Abstract
Systems design involves decomposing a system into interconnected subsystems and allocating resources to the teams responsible for designing each subsystem. The outcomes of this process depend on how well the limited resources are allocated across teams and on the strategies the teams use to design their subsystems. This article presents an approach based on hierarchical reinforcement learning (RL) for generating heuristics to solve complex design problems under resource constraints. The approach formulates systems design problems as hierarchical multiarmed bandit (MAB) problems, where decisions are made at both the system level (allocating budget across subsystems) and the subsystem level (selecting heuristics for sequential information acquisition). The approach is demonstrated on an illustrative example of race car optimization in The Open Racing Car Simulator (TORCS) environment. The results indicate that the RL agent can learn to allocate resources strategically, prioritize the subsystems with the greatest influence on overall performance, and identify effective information acquisition heuristics for each subsystem. For example, the RL agent learned to allocate a larger portion of the budget to the gearbox subsystem, which has a higher-dimensional design space than the other subsystems. The results also indicate that the extracted heuristics converge to high-performing car configurations more efficiently than Bayesian optimization.
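The hierarchical MAB formulation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the subsystem names, the UCB1 selection rule, and the toy reward function are all assumptions made for the example.

```python
import math
import random


class UCB1Bandit:
    """Simple UCB1 multiarmed bandit over a discrete set of arms."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # times each arm was pulled
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select(self):
        # Play each arm once before applying the UCB1 rule.
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm
        total = sum(self.counts)
        return max(
            range(len(self.counts)),
            key=lambda a: self.values[a]
            + math.sqrt(2 * math.log(total) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def hierarchical_design(budget, subsystems, heuristics, evaluate, seed=0):
    """System-level bandit allocates budget units across subsystems;
    each subsystem-level bandit selects an information-acquisition heuristic."""
    random.seed(seed)
    system_bandit = UCB1Bandit(len(subsystems))
    sub_bandits = {s: UCB1Bandit(len(heuristics)) for s in subsystems}
    for _ in range(budget):
        s_idx = system_bandit.select()            # system level: pick a subsystem
        subsystem = subsystems[s_idx]
        h_idx = sub_bandits[subsystem].select()   # subsystem level: pick a heuristic
        reward = evaluate(subsystem, heuristics[h_idx])
        sub_bandits[subsystem].update(h_idx, reward)
        system_bandit.update(s_idx, reward)
    return system_bandit, sub_bandits


# Toy reward model (hypothetical): the gearbox subsystem yields higher
# expected design improvement, so the agent should allocate it more budget.
def toy_evaluate(subsystem, heuristic):
    base = 0.8 if subsystem == "gearbox" else 0.3
    return base + random.uniform(-0.1, 0.1)


sys_bandit, _ = hierarchical_design(
    budget=500,
    subsystems=["gearbox", "suspension", "wings"],
    heuristics=["explore", "exploit"],
    evaluate=toy_evaluate,
)
print(sys_bandit.counts)  # gearbox receives the largest share of the budget
```

Under this toy reward model, the system-level bandit concentrates most of the 500-unit budget on the subsystem with the highest expected payoff, mirroring the behavior the abstract attributes to the RL agent.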