Abstract

We propose a deep learning-based approach to continuous-time leader synchronization in graphical games on large networks. In this setting, a distributed, coordinated swarm is deployed to track the trajectory of a leader while each agent minimizes its local neighborhood tracking error and control cost. The goal is to learn optimal control policies for this problem using deep neural networks. We discretize the agents' model by sampling, which allows gradient descent methods to be adapted for learning the control policies. The swarm is deployed for a set amount of time, with the control input of each agent held constant during each sampling period. After collecting state and input data at each sampling time over one iteration, we update the weights of each agent's deep neural network using the collected data to minimize a loss function that characterizes the agent's local neighborhood tracking error and control cost. A modified gradient descent method is presented to overcome existing limitations. The proposed method is compared with two reinforcement learning-based methods in terms of robustness to initial neural network weights and initial local neighborhood tracking errors, and in terms of scalability to networks with a large number of agents. In these comparisons, our approach achieves superior performance to the other two methods.
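To make the quantities named in the abstract concrete, the following is a minimal sketch of how a local neighborhood tracking error and a sampled cost might be computed. The abstract gives no formulas, so everything here is an assumption: the error definition is the one commonly used in the graphical games literature (edge weights `A`, pinning gains `g`), the cost is a quadratic form with hypothetical weights `Q` and `R` and a factor of one half, and the integral is approximated by a Riemann sum over samples with the input held constant for a period `dt`. The function names are illustrative and are not taken from the paper.

```python
import numpy as np

def neighborhood_errors(x, x0, A, g):
    """Assumed definition: delta_i = sum_j a_ij (x_i - x_j) + g_i (x_i - x0)."""
    # x: (N, n) agent states, x0: (n,) leader state
    # A: (N, N) edge weights a_ij, g: (N,) pinning gains to the leader
    deg = A.sum(axis=1)  # weighted in-degree of each agent
    return (deg + g)[:, None] * x - A @ x - g[:, None] * x0

def sampled_loss(deltas, us, Q, R, dt):
    """Riemann-sum approximation of 0.5 * integral(delta'Q delta + u'R u) dt."""
    # deltas: (K, n) errors and us: (K, m) inputs for one agent,
    # one row per sampling instant, input held constant over each period
    track = np.einsum("ki,ij,kj->", deltas, Q, deltas)
    effort = np.einsum("ki,ij,kj->", us, R, us)
    return 0.5 * dt * (track + effort)

# Hypothetical two-agent example: agent 0 sees the leader, agent 1 sees agent 0.
A = np.array([[0.0, 0.0], [1.0, 0.0]])
g = np.array([1.0, 0.0])
x = np.array([[1.0, 0.0], [2.0, 1.0]])
x0 = np.zeros(2)
deltas = neighborhood_errors(x, x0, A, g)
loss = sampled_loss(deltas[1:2], np.array([[2.0]]), np.eye(2), np.eye(1), dt=0.1)
```

In a learning loop of the kind the abstract describes, a loss like this would be evaluated on the data collected during one deployment and then differentiated with respect to each agent's network weights. That step, and the modification to gradient descent mentioned above, are not reproduced here.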
