Abstract

Systems design involves decomposing a system into interconnected subsystems and allocating resources to the teams responsible for designing each subsystem. The outcome of the process depends on how well limited resources are allocated across teams and on the strategy each team uses to design its subsystem. This article presents an approach based on hierarchical reinforcement learning (RL) to generate heuristics for solving complex design problems under resource constraints. The approach formulates systems design problems as hierarchical multi-armed bandit (MAB) problems, in which decisions are made at both the system level (allocating budget across subsystems) and the subsystem level (selecting heuristics for sequential information acquisition). The approach is demonstrated on an illustrative race car optimization problem in The Open Racing Car Simulator (TORCS) environment. The results indicate that the RL agent can learn to allocate resources strategically, prioritize the subsystems with the greatest influence on overall performance, and identify effective information acquisition heuristics for each subsystem. For example, the RL agent learned to allocate a larger portion of the budget to the gearbox subsystem, which has a higher-dimensional design space than the other subsystems. The results also indicate that the extracted heuristics converge to high-performing car configurations more efficiently than Bayesian optimization does.
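The two-level decision structure described in the abstract can be illustrated with a minimal sketch. This is not the article's method: it assumes a plain UCB1 rule at both levels instead of a learned RL policy, and the names `UCB1`, `run_hierarchical_bandit`, and `toy_reward`, as well as the success probabilities, are hypothetical placeholders chosen only for illustration.

```python
import math
import random


class UCB1:
    """UCB1 bandit over a fixed number of arms (assumed stand-in for a learned policy)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select(self):
        # Play every arm once before applying the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the running mean reward for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def toy_reward(subsystem, heuristic, rng):
    """Hypothetical Bernoulli reward; the probabilities are invented for this sketch."""
    probs = [[0.2, 0.3], [0.5, 0.8], [0.1, 0.4]]
    return 1.0 if rng.random() < probs[subsystem][heuristic] else 0.0


def run_hierarchical_bandit(reward_fn, n_subsystems, n_heuristics, budget, seed=0):
    """Spend one unit of budget per round: pick a subsystem, then a heuristic within it."""
    rng = random.Random(seed)
    system = UCB1(n_subsystems)  # system level: which subsystem gets the next unit
    subsystems = [UCB1(n_heuristics) for _ in range(n_subsystems)]  # subsystem level
    for _ in range(budget):
        s = system.select()
        h = subsystems[s].select()
        reward = reward_fn(s, h, rng)
        subsystems[s].update(h, reward)
        system.update(s, reward)
    # Budget units per subsystem, and per-heuristic usage inside each subsystem.
    return system.counts, [bandit.counts for bandit in subsystems]
```

Calling `run_hierarchical_bandit(toy_reward, 3, 2, budget=200)` returns how many budget units each subsystem received and how often each heuristic was used within it. In the article, the reward would instead come from evaluating car configurations in TORCS.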
