Abstract

To identify the underlying mechanisms of human motor control, parametric models are utilized. One approach to employing these models is inferring the control intent (estimating the motor control strategy). A well-accepted assumption is that human motor control is optimal; thus, the intent is inferred by solving an inverse optimal control (IOC) problem. The linear quadratic regulator (LQR) is a well-established optimal controller, and its inverse LQR (ILQR) problem has been used in the literature to infer the control intent of one subject. That implementation used a cost function with a gain penalty, minimizing the error between the LQR gain and a preliminarily estimated gain. We hypothesize that relying on an estimated gain may limit the ILQR optimization capability. In this study, we derive an ILQR optimization with an output penalty, minimizing the error between the model output and the measured output. We conducted the test on 30 healthy subjects who sat on a robotic seat capable of rotation. The task utilized a physical human–robot interaction with a perturbation torque as input and lower and upper body angles as output. Our method significantly improved the goodness of fit compared to the gain-penalty ILQR. Moreover, the dominant inferred intent was not statistically different between the two methods. To our knowledge, this work is the first to infer motor control intent for a sample of healthy subjects. This is a step closer to investigating control intent differences between healthy subjects and subjects with altered motor control, e.g., low back pain.

1 Introduction

System identification has been applied to the study of human motor control over the last few decades. System identification methods have enabled researchers to determine physiological parameters, control gains, and control bandwidths for various motor control tasks [1–4]. One subfield of system identification is inverse optimal control (IOC) theory. In IOC, a stabilizing feedback control law is first constructed for a given plant. Then, meaningful cost functions are retrieved based on state variables and control inputs [5–8]. These cost functions determine how much weight the controller assigns to various states compared to the control effort. Several previous studies have estimated optimal control cost functions from human motion data in an effort to explain human control intent [8–11]. Unlike the general potential cost functions used in these studies, we propose the use of a control theoretical method that uses the linear quadratic regulator (LQR) framework [12].

The inverse LQR (ILQR) problem has received some attention, with some results allowing potentially unstable controllers [13]. Since controller stabilization is important when examining engineering or biological systems, we focus on the stable LQR problem. In addition, when the cross term S of the LQR cost function is included, any controller K is optimal for some cost function [14]. However, for several reasons, we chose to exclude S from our LQR cost function. The cross term is rarely used in the design of LQR controllers for real systems, and inverse results that include cross terms provide less meaningful information about the control intent than results with a direct separation of state and control costs (following the principle of parsimony). As a result, in the remainder of this paper, we focus on the stable LQR problem with S = 0.

There have been some studies on ILQR [15–17]. These studies (based on Ref. [15], the gain-penalty ILQR) use two optimization steps, where the estimated control gain K from the first step is used to construct the cost function for the second step. This is a relatively complex process, and errors between the two steps can accumulate. Moreover, the optimization capability may be limited because the experimental data are not used in the second step. Therefore, this study aims to overcome these limitations. Other studies presented alternative approaches as well [18,19]; they used particle swarm optimization and an off-the-shelf genetic algorithm, respectively, which do not provide an exact solution.

This study proposes an output-penalty ILQR method that uses the experimental data in the ILQR cost function. We applied normalization and auto-annealing techniques [20,21] to improve performance and speed of operation. We hypothesize that the output-penalty ILQR will yield a better fit since it uses the experimental data directly. A better fit is highly desirable to increase trust in the model and hence in the inferred intent. We are the first to build a framework intended to investigate the feedback control aspects of low back pain. Our long-term clinical question is what differences (in terms of feedback control) exist between healthy subjects and subjects with low back pain. We approach this question from an optimality perspective, i.e., each subject tries to optimize a quantity while doing the task. This is a step closer to investigating control intent differences between healthy subjects and subjects with altered motor control, e.g., low back pain patients.

2 Materials and Methods

2.1 Subjects.

Thirty healthy subjects participated in this study, with an average age of 33.2 ± 12.0 yr (age range = 18–59 yr), height of 168.2 ± 9.2 cm, and weight of 76.3 ± 14.2 kg. The group consisted of 11 males and 19 females. No participant had a neurological or musculoskeletal disorder affecting motor control. Ethical approval for the study was obtained from the MSU institutional review board, and subjects provided informed consent prior to enrollment.

2.2 Data Collection.

The seated balance test was performed by each subject during three laboratory visits, and each test was identical. For this test, the subject sat on a back-drivable robotic seat capable of rotation about an axis perpendicular to the coronal plane (Figs. 1 and 2). The robotic seat was driven by a motor (C062C-13-3305, Kollmorgen, Radford, VA). Subjects were securely strapped to the seat with their knees and hips positioned at 90 deg, and their arms crossed on their chests. They were instructed to balance on the seat, while the robot provided both spring–damper action and torque perturbations. Additional details regarding the test setup can be found in Ref. [22].

Fig. 1
The seated balance task using a back-drivable robot

Fig. 2
Simplified rigid body model of the seated balance experiment

Data acquisition during the test was conducted using a data acquisition board (PCI-DAS6036, Measurement Computing, Norton, MA) and a quadrature encoder board (PCI-QUAD04, Measurement Computing, Norton, MA) at a rate of 1000 samples/s. The recorded data included seat angle (α1), upper body angle (α2), robot stiffness (kr), robot damping (cr), perturbation torque (u), total robot torque (ur), and the subject's weight and height. Robot stiffness (kr), damping (cr), and the amplitude of the perturbation torque were tuned for each subject before testing to normalize test difficulty across subjects [22,23]. Specifically, each subject was asked to balance on the seat using a high stiffness value, which the operator then decreased gradually until the subject could no longer balance. After fine-tuning the estimate of this critical stiffness, the experiment stiffness was set to twice this value; more details are given in Ref. [22]. In less demanding settings, subjects retain flexibility in how they achieve balance, and such choices may reflect individual preferences rather than health status. To reduce this ambiguity in evaluation, it is advisable to challenge trunk neuromuscular control maximally [24]. The perturbation torque (u) was designed as a pseudorandom ternary sequence with a power spectral density up to approximately 1.6 Hz. The measurement of α1 was obtained from an encoder in the motor; previous work indicated that the encoder accurately reflects the lower body angle [22]. α2 was measured using two string potentiometers (SP2-50, Celesco, Chatsworth, CA) attached at the fourth thoracic (T4) and L4 levels. Prior studies have demonstrated the reliability of the measurements obtained from the seated balance test (intraclass correlation coefficients: 0.98 for α1 and 0.89 for α2 in the time domain) [22].
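For illustration, a hedged sketch of how a perturbation of this kind could be generated is shown below: a pseudorandom ternary sequence (levels −1, 0, +1) held constant over fixed-length stages whose duration roughly sets the upper edge of the excited band (about 1.6 Hz here). The function name and the stage-length heuristic are assumptions for illustration only; the actual sequence design used in the experiment may differ (see Ref. [22] for setup details).

```python
# Hedged sketch (not the authors' generator): a pseudorandom ternary
# perturbation with most of its power below roughly `bandwidth_hz`.
import numpy as np

def ternary_perturbation(duration_s, fs=1000.0, bandwidth_hz=1.6,
                         amplitude=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Hold each ternary level for a fixed stage; the stage length is a rough
    # heuristic linking hold time to the excited bandwidth (an assumption).
    stage_len = int(fs / (2.0 * bandwidth_hz))
    n_samples = int(duration_s * fs)
    n_stages = int(np.ceil(n_samples / stage_len))
    levels = rng.integers(-1, 2, size=n_stages)   # values in {-1, 0, +1}
    return amplitude * np.repeat(levels, stage_len)[:n_samples]
```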

2.3 Model and Preliminary Estimation.

For the analysis of physical human–robot interaction, we used the model from Ref. [25] (Fig. 2). The model incorporates the torque uh that the subject applies about the L4 vertebra, as well as the intrinsic lumbar stiffness (kh) and damping (ch). The dynamic model of the physical human–robot interaction system has the following equations of motion; details of the derivation can be found in Ref. [25]. The model parameters are summarized in Table 1
(1)
(2)

where g is gravity. The lower body and robot seat below the fourth lumbar (L4) vertebra are lumped into a single rigid element with mass M1 and moment of inertia J1 with respect to its center of mass (COM), which lies at a distance l1 from the pivot O of the seat. Similarly, the upper body above the L4 vertebra is lumped into a rigid element with mass M2 and moment of inertia J2 relative to its COM, which lies at a distance l2 from the L4 vertebra. The L4 vertebra itself is at a distance l12 from the seat pivot O. The subject applies a control torque uh about the L4 vertebra and possesses an intrinsic lumbar stiffness kh and damping ch. In addition to the torque disturbance u, a virtual stiffness kr and a virtual damping cr about the pivot point O are applied via feedback. The sum of these torques produces the total robot torque ur about the pivot point O, where ur = u − krα1 − crα̇1.

Table 1
Description of model parameters

Notation   Description
α1         Angle of the lower body from vertical
α2         Angle of the upper body from vertical
M1         Mass of the subject's lower body below the fourth lumbar vertebra (L4) and the seat
J1         Moment of inertia of the subject's lower body and the seat about its center of mass
l1         Distance between the pivot point and the M1 center of mass
M2         Mass of the subject's upper body above L4
J2         Moment of inertia of the subject's upper body about its center of mass
l2         Distance between L4 and the M2 center of mass
l12        Distance between the pivot and L4
uh         Human control torque about L4
kh         Intrinsic rotational stiffness about L4
ch         Intrinsic rotational damping about L4
u          Perturbation torque about the pivot
kr         Robot stiffness about the pivot
cr         Robot damping about the pivot
ur         Total robot torque about the pivot point, i.e., ur = u − krα1 − crα̇1

The dynamic model of the physical human–robot interaction system was formulated as a closed-loop feedback control system. The block diagram of the closed-loop system is shown in Fig. 3. The plant P represents the mechanical dynamics linearized about the upright equilibrium point of the system in Fig. 2 [25]. The kinematic vector [α1 α̇1 α2 α̇2]^T represents the states x of the plant P. Active human motor control was modeled as a gain feedback control law, uh = −K1α1 − K2α̇1 − K3α2 − K4α̇2 = −Kx (Fig. 3), where the kinematic vector [α1 α2]^T represents the output measurement in the seated balance test.

Fig. 3
Block diagram of the seated balance experiment. The measured output is y = [α1 α2]^T. Active human motor control is a gain feedback control law, uh = −K1α1 − K2α̇1 − K3α2 − K4α̇2 = −Kx.
Certain model parameters were fixed, while others were estimated from the measured input–output data. The values of M1 and M2 for each subject were determined from anthropometric tables (e.g., Ref. [26]). l1, l2, and l12 were estimated from the anthropometric data, after which stringent bounds were applied around these estimates to refine them further. cr and kr were determined before the test to standardize the difficulty level of the test across subjects [22]. ch and kh were set to the average values fitted in a similar study [3]. Therefore, our preliminary estimation parameters for each dataset are
(3)
The preliminary estimation used nonlinear weighted least-squares optimization to solve
(4)

where ŷ is the output of the model using ρ, ye is the experimental measurement (α1, α2), and Ymax is a diagonal matrix whose diagonal holds the maximum absolute values of ye (used for normalization). Twenty initial points selected by a Latin hypercube method were used for estimating ρ.
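As a minimal sketch of this preliminary estimation step, the code below assumes a generic simulator simulate_model(rho, u) (a hypothetical stand-in for the two-DoF model of Fig. 2) and runs a bounded nonlinear least-squares fit of the normalized output error from 20 Latin-hypercube starting points:

```python
# Hedged sketch of the preliminary estimation (Eq. (4)): bounded nonlinear
# weighted least squares from 20 Latin-hypercube starting points.
# `simulate_model` is a hypothetical stand-in for the two-DoF model simulator.
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import qmc

def normalized_residuals(rho, u, y_exp, y_max_inv, simulate_model):
    """Output error weighted by Ymax^{-1}, flattened for the solver."""
    y_hat = simulate_model(rho, u)               # model output (N x 2) for rho
    return (y_max_inv @ (y_hat - y_exp).T).ravel()

def preliminary_estimate(u, y_exp, bounds_lo, bounds_hi, simulate_model,
                         n_starts=20, seed=0):
    # Ymax normalization: diagonal of max |y_exp| per output channel
    y_max_inv = np.diag(1.0 / np.abs(y_exp).max(axis=0))
    # Latin-hypercube starting points spread over the parameter bounds
    sampler = qmc.LatinHypercube(d=len(bounds_lo), seed=seed)
    starts = qmc.scale(sampler.random(n_starts), bounds_lo, bounds_hi)
    best = None
    for rho0 in starts:
        sol = least_squares(normalized_residuals, rho0,
                            bounds=(bounds_lo, bounds_hi),
                            args=(u, y_exp, y_max_inv, simulate_model))
        if best is None or sol.cost < best.cost:
            best = sol
    return best.x
```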

2.4 Inverse Linear Quadratic Regulator.

The LQR problem has both discrete-time and continuous-time formulations. For convenience, we present here the discrete-time LQR for linear time invariant (LTI) systems. For the discrete-time LTI system
$x_{k+1} = A x_k + B u_k$    (5)
The LQR problem minimizes the cost function
$J = \sum_{k=0}^{\infty} \left( x_k^T Q x_k + u_k^T R u_k + 2 x_k^T S u_k \right)$    (6)
where Q ⪰ 0 (positive semidefinite) and R ≻ 0 (positive definite) are symmetric weighting matrices for the state x and the control u, respectively. Assuming that S = 0, the optimal feedback control is
$u_k = -K x_k$    (7)
where
$K = (R + B^T P B)^{-1} B^T P A$    (8)
and P is the unique positive-semidefinite solution to the discrete-time algebraic Riccati equation (ARE)
$P = A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A + Q$    (9)
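For reference, the forward LQR computation in Eqs. (7)–(9) can be carried out with standard numerical tools; the helper below (an illustrative name) solves the discrete-time ARE and forms the gain of Eq. (8). The ILQR optimization repeatedly evaluates such a routine for each candidate (Q, R).

```python
# Sketch of the forward discrete-time LQR computation in Eqs. (7)-(9),
# using SciPy's Riccati solver; A, B, Q, R are assumed given.
import numpy as np
from scipy.linalg import solve_discrete_are

def dlqr_gain(A, B, Q, R):
    """Return the LQR gain K and ARE solution P for x_{k+1} = A x_k + B u_k."""
    P = solve_discrete_are(A, B, Q, R)                  # Eq. (9)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # Eq. (8)
    return K, P
```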

Our ILQR problem is to estimate the LQR weighting matrices from the input–output measurements of the seated balance test, such that the information in the experimental data is used as much as possible. We will solve the general case of this ILQR problem in which both weighting matrices Q and R are unknown.

The objective can be stated as follows.

Problem 1. Given the experimental data, fit a model to obtain the model parameters, including an initial value of the control gain K. Then, given the experimental posture dataset {ye} ⊂ ℝ^(N×ny), apply the gradient-based least-squares minimization (ILQR) with the cost function ‖y(θ) − ye‖_F^2 to estimate the weighting matrices Q̂ and R̂, where ‖·‖_F is the Frobenius norm and θ collects the upper triangular elements of Q and R.

The previous formulation of the ILQR [15] used a gain penalty, where the objective was minimizing ‖K(θ) − Ke‖_F^2. In this study, we opt to use an output penalty by minimizing ‖y(θ) − ye‖_F^2 in order to achieve a better fit between the model with LQR optimal feedback and the experimental measurements.

Our ILQR problem with output penalty in the discrete-time case is
$\min_{\theta}\ \|y(\theta) - y_e\|_F^2 \quad \text{subject to } Q(\theta) \succeq 0,\ R(\theta) \succ 0$    (10)
where ye = [α1 α2]^T with N observations (y, ye ∈ ℝ^(2×N)). We can rewrite the minimization as
(11)

2.5 Gradient-Descent-Based Solution.

The gradient-descent method searches for a local minimum of the cost in Eq. (10). To derive a gradient-descent law in concise form, we first vectorize y(θ) and ye using the row-wise vectorization operator vecr(·), i.e., yv(θ) ≜ vecr(y(θ)) and yev ≜ vecr(ye),
where vecr(·) converts an arbitrary matrix A ∈ ℂ^(m×n) into a column vector by stacking the rows of the matrix A such that vecr(A) = [a11, …, a1n, a21, …, a2n, …, am1, …, amn]^T.
Now, we define e ≜ yv(θ) − yev. The cost becomes e^T e = ‖yv(θ) − yev‖^2 = ‖y(θ) − ye‖_F^2, where ‖·‖ is the L2-norm. Our gradient-descent law for the element θi of θ is
$\dot{\theta}_i(t) = -\lambda \left( \partial e / \partial \theta_i \right)^T e$    (12)
where λ>0 is a learning rate that controls the convergence rate of the algorithm. For concise notation, we use
Next, we need to analytically derive the derivative of the error
$\partial e / \partial \theta_i = \partial y_v(\theta) / \partial \theta_i$    (13)
The closed-loop system (Fig. 3) can be formulated into a discrete-time LTI state-space model since all the subsystems are linear and rational-ordered
$x_{k+1} = A x_k + B u_k, \qquad y_k = C x_k + w_k$    (14)

with xk ∈ ℝ^4, uk ∈ ℝ, and yk ∈ ℝ^2. The measurement noise wk ∼ N(0, Σ) ∈ ℝ^2 is white and uncorrelated in time. ρ0 is the true model parameter vector in Eq. (4).

The system is observed during an experiment over the time indices k ∈ K ≜ {1, …, N} with a sampling time T such that tk = kT. The input sequence is u ≜ [u1, …, u_{N−1}]^T. Note that the system output yk can be determined at an arbitrary time index k ≥ 1 when the input sequence and the initial state x1 are known. The complete solution for the discrete-time state-space system given in Eq. (14) is
We can write this solution in a matrix form
(15)

where yn = [α1 α2]^T. We now have a nonrecursive solution yv for all time steps k ≥ 1 given U. Note that the first element in U is x1.
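Equation (15) stacks the free and forced responses into the batch map yv = G U. The exact block layout of G is not reproduced here; the sketch below gives one consistent construction under a zero-noise assumption, with batch_output_matrix an illustrative helper name rather than the authors' implementation.

```python
# Sketch of one consistent construction of the batch map y_v = G U in
# Eq. (15), ignoring measurement noise.
import numpy as np

def batch_output_matrix(A, B, C, N):
    """Build G so that vec_r(y) = G @ [x_1; u_1; ...; u_{N-1}]."""
    n, m, p = A.shape[0], B.shape[1], C.shape[0]
    G = np.zeros((p * N, n + m * (N - 1)))
    A_pow = np.eye(n)
    for k in range(N):                        # k = 0 corresponds to time index 1
        G[p * k:p * (k + 1), :n] = C @ A_pow  # free response: C A^k x_1
        A_pow = A @ A_pow
    for k in range(1, N):                     # forced response terms
        for j in range(k):                    # input u_{j+1} affects y at index k+1
            G[p * k:p * (k + 1), n + m * j:n + m * (j + 1)] = (
                C @ np.linalg.matrix_power(A, k - 1 - j) @ B)
    return G
```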

The derivative of the error in Eq. (13) becomes
$\partial e / \partial \theta_i = \left( \partial G / \partial \theta_i \right) U$    (16)
The derivative of G with respect to θi can be calculated from Eq. (15). Note that B and C are constant with respect to θi
(17)
Let the plant system matrices be Ap, Bp, and Cp, with Bp = [b1 b2] (b1 is associated with ur and b2 with uh). Ap and Bp can be found in Ref. [3], and Cp = I4. Then, from Fig. 3, uh = −Kx, which makes the closed-loop matrices A = Ap − b2K and B = b1. Therefore
and
(18)
To compute ∂P/∂θi, we note that the stabilizing solution P of the discrete-time ARE is analytic in A, B, and M (where Q = M^T M ⪰ 0). Thus, it can be differentiated implicitly with respect to θ. The derivative of the discrete-time ARE from Eq. (9) with respect to θi is
(19)
The derivative equation obtained above can be formulated in the Sylvester equation
(20)

By solving this Sylvester equation, we get ∂P/∂θi. Then, we determine ∂e/∂θi for all elements i of θ. This is a directional derivative that minimizes the cost in Eq. (11).
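As a sketch of how ∂P/∂θi and ∂K/∂θi can be computed, assume (consistent with the text) that only Q and R depend on θi while the design matrices A and B are fixed; implicit differentiation of the ARE then yields a Stein (discrete Lyapunov) equation, a special case of the Sylvester form, which SciPy solves directly. The function name and exact algebraic arrangement are illustrative and may differ in detail from Eqs. (19) and (20).

```python
# Sketch of the ARE sensitivity used in the gradient: assuming only Q and R
# depend on theta_i (A, B fixed), implicit differentiation of the discrete
# ARE gives a Stein (discrete Lyapunov) equation for dP, and dK follows from
# Eq. (8).
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

def lqr_sensitivities(A, B, Q, R, dQ, dR):
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    A_cl = A - B @ K
    # dP = A_cl^T dP A_cl + dQ + K^T dR K   (Stein equation in dP)
    dP = solve_discrete_lyapunov(A_cl.T, dQ + K.T @ dR @ K)
    # dK = (R + B^T P B)^{-1} (B^T dP A_cl - dR K)
    dK = np.linalg.solve(R + B.T @ P @ B, B.T @ dP @ A_cl - dR @ K)
    return P, K, dP, dK
```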

We formulate the gradient-descent law in discrete-time form as
$\theta_i[n+1] = \theta_i[n] - \lambda \left( \partial e / \partial \theta_i \right)^T e$    (21)

which is iterated a preset number of times. We introduced a projection rule for θ to maintain Q̂ ⪰ 0 and R̂ ≻ 0. If an update of θ compromises these conditions, the rule adjusts θ to the nearest matrix complying with them.
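One standard realization of such a projection (a sketch, not necessarily the authors' exact implementation) clips negative eigenvalues, which yields the nearest symmetric positive-semidefinite matrix in the Frobenius norm; a small positive floor can be used for R to keep it positive definite.

```python
# Sketch of a projection rule: clip negative eigenvalues to obtain the
# nearest symmetric PSD matrix; eps > 0 enforces positive definiteness.
import numpy as np

def project_psd(M, eps=0.0):
    M_sym = 0.5 * (M + M.T)                 # symmetrize first
    eigval, eigvec = np.linalg.eigh(M_sym)
    eigval = np.clip(eigval, eps, None)     # clip eigenvalues below eps
    return eigvec @ np.diag(eigval) @ eigvec.T
```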

The starting point of θ holds significant importance, as it greatly influences the solutions obtained by the gradient-descent algorithm. Similar to our previous research [15], we leverage the fact that there always exists an exact solution to the ILQR problem when S ≠ 0 [14] to obtain a good initial θ. In our prior studies [15], we confirmed that solving the following set of linear matrix inequalities for Qs, Rs, and Ss results in an almost optimal approximation for the starting point when S = 0:
(22)
Complete details of the algorithm are in the Supplemental Material on the ASME Digital Collection. For a thorough comparison of the two ILQR methods, we also present the algorithm of gain-penalty ILQR.

2.6 Primary Outcomes and Comparisons.

The first primary outcome of interest is the human control intent. However, clear information about the intent may not be readily apparent from the estimated weighting matrix Q̂ itself. To address this, we applied a similarity transformation to Q̂ and the state x to obtain a diagonal matrix Q̃, which provides a clear representation of the intent. The transformed state x̃ is a linear combination of the elements of x. Let V be an orthogonal matrix whose columns are the eigenvectors of Q̂, and let Λ be a square diagonal matrix whose diagonal elements are the eigenvalues corresponding to each eigenvector. Then, to satisfy x^T Q̂ x = x̃^T Q̃ x̃, the transformation is given by Q̃ = Λ and x̃ = V^T x. The eigenvector corresponding to the largest eigenvalue represents the most dominant linear combination of body angles and velocities during the task. In other words, it represents the most dominant human motor control intent.
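A compact sketch of this step: eigendecompose the estimated Q̂ and take the eigenvector associated with the largest eigenvalue as the dominant combination of [α1, α̇1, α2, α̇2] (the function name is illustrative).

```python
# Sketch of extracting the dominant intent from the estimated Q_hat.
import numpy as np

def dominant_intent(Q_hat):
    """Return the eigenvector (and eigenvalue) of the largest eigenvalue."""
    eigval, V = np.linalg.eigh(Q_hat)   # Q_hat symmetric: V is orthogonal
    idx = int(np.argmax(eigval))
    return V[:, idx], eigval[idx]
```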

The second primary outcome is the goodness of fit of the simulated model data to the experimental data, which we quantified using the variance accounted for (VAF) [27]
$\mathrm{VAF}_{\alpha} = \left( 1 - \frac{\sum_{k=1}^{N} \left( \alpha_e(k) - \hat{\alpha}(k,\theta) \right)^2}{\sum_{k=1}^{N} \alpha_e(k)^2} \right) \times 100\%$    (23)

A higher VAF value indicates a better fit, with 100% indicating a perfect match between the estimated model output α̂(k,θ) and the measured output αe(k).
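A direct transcription of the VAF measure as reconstructed in Eq. (23), for one output channel; the exact VAF variant (sum-of-squares versus variance based) is an assumption here.

```python
# Transcription of the VAF measure of Eq. (23) for one output channel
# (alpha_1 or alpha_2).
import numpy as np

def vaf(alpha_meas, alpha_model):
    """Variance accounted for, in percent; 100% indicates a perfect fit."""
    num = np.sum((alpha_meas - alpha_model) ** 2)
    den = np.sum(alpha_meas ** 2)
    return 100.0 * (1.0 - num / den)
```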

We set a VAF threshold of 50% to remove any case or subject from further analysis. This threshold was selected to balance the tradeoff between how well the model describes the data and how many cases/subjects would be excluded from the analysis. Although this is not the topic of this article, we would like to elaborate for future endeavors. Most balance biomechanics studies model the body as a single degree-of-freedom (DoF) with VAF values close to our VAFα1 [27,28]. In our group, we started with a two-degree-of-freedom model without reporting VAF or a similar fit measure [3]. In general, it is expected that VAFα1 > VAFα2, since we demonstrated that the reliability measures of α1 were better than those of α2 [22]. One reason for the low VAFα2 is that this DoF was not perturbed; therefore, its response was more confounded by other physiological or nonlinear control factors, such as muscle co-activation or gain scheduling.

To investigate the differences between the two ILQR methods (output penalty versus gain penalty), we employed repeated-measures multivariate analysis of variance (MANOVA) [29]. Details about MANOVA's procedures and statistical calculations can be found in Ref. [30]. This allowed us to compare the multivariate weights in the dominant V (wα1, wα̇1, wα2, wα̇2) as well as the goodness of fit (VAFα1, VAFα2) between the two methods, treating them as the repeated measure. In cases where MANOVA showed a significant difference, a univariate repeated-measures ANOVA was performed. We opted for a multivariate repeated-measures ANOVA instead of a paired t-test due to the nonscalar nature of the variables under consideration. Repeated-measures ANOVA is typically used when multiple measurements are taken on the same subjects across various conditions or time points. While most literature studies apply repeated-measures ANOVA when running the same method at different time points on the same subject, here the repeated measures are the different methods applied to the same subject/trial. Statistical significance was set at p ≤ 0.05. We conducted this statistical analysis using SPSS version 26 (IBM, Armonk, NY).

For ILQR, the computations of G and ∂G/∂θi in Eqs. (15) and (17) must be performed several times per iteration, which causes a high computational load. Therefore, we downsampled the data from 30,000 to 300 samples per trial. The VAF difference before and after downsampling was at most 5%, indicating no significant difference in the fitting.
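The paper does not state how the downsampling was performed; filtered decimation, as sketched below (two cascaded stages of 10 to reach the overall factor of 100), is one reasonable choice that avoids aliasing.

```python
# Sketch of downsampling a trial from 30,000 to 300 samples (factor 100)
# using two cascaded anti-aliased decimation stages of 10 each.
import numpy as np
from scipy.signal import decimate

def downsample_trial(y, stages=(10, 10)):
    """y: (N_samples, n_channels) array; returns the decimated record."""
    for q in stages:
        y = decimate(y, q, axis=0, ftype="fir", zero_phase=True)
    return np.asarray(y)
```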

3 Comparative Results and Discussion

We analyzed a total of 90 cases (30 subjects × 3 visits/subject) for this study. Of these, two subjects (six cases) were excluded due to invalid height or weight records. After conducting the preliminary estimation of the model parameters ρ and applying the ILQR algorithm, 33 cases (19 subjects) remained for further analysis. We opted to draw meaningful insights from subjects whose data align well with the model's descriptions; including subjects with poor fit could obscure valuable observations obtained from subjects with good fit. We recognize that the inability to achieve a reasonable goodness of fit in certain subjects represents a limitation of the current model. We can speculate on reasons why the model fails to explain data from all subjects. One plausible explanation is being trapped in a local minimum, as depicted in Fig. 4, where the weights at the initial point show minimal change from the final solution. Thus, we acknowledge that the current inverse LQR model exhibits limited generalizability across subjects. The analysis results for all subjects, excluding those with invalid records, can be found in the Supplemental Material on the ASME Digital Collection.

Fig. 4
Weights of the kinematic variables in the dominant V (eigenvector). No effect for the methods on the multivariate weight suggests that the intent is similar in the two methods (output-penalty ILQR and gain-penalty ILQR) (repeated-measures MANOVA p = 0.38 for the multivariate weights). Additionally, no significant effect for methods (output-penalty ILQR, gain-penalty ILQR, and starting Qs) (repeated-measures MANOVA p = 0.26 for the multivariate weights). The methods had no effect on each weight (repeated-measures univariate ANOVA p > 0.06).

The output-penalty ILQR yielded an average VAF of 88.31% for α1 and 66.92% for α2, while the gain-penalty ILQR resulted in lower values of 87.40% for α1 and 61.54% for α2. (For reference, the VAF for Qs was 85.89% and 56.47%, respectively.) Our model does not account for physiological nonlinearities and other unmodeled dynamics. Given that the VAF of α1 is approximately 88%, the model may have reached its maximum capacity to describe the data, indicating limited potential for further improvement. Van Drunen et al. [27] implemented various biomechanical models for a 1DoF trunk task, reporting a maximum VAF of 89.5% in their relax task. Our VAF for α2 showed an improvement four times greater than that of α1, suggesting that there is still considerable room for enhancing the fit of α2.

Regarding the first primary outcome, which involved comparing the weights in dominant V, the MANOVA analysis showed no effect for the method (p = 0.38) (Fig. 4). This suggests that there is no significant difference in the inferred (dominant) intent between the two ILQR methods. To investigate this further, we included the dominant V from the starting point of the ILQR algorithm (Qs). By comparing among three methods (two ILQR methods and Qs), repeated-measures MANOVA showed no significant effect for the method (p = 0.26) (Fig. 4). This indicates that the gradient-descent formulation of ILQR renders the inferred intent dependent on the starting point of the algorithm regardless of the cost function. A detailed example of the first primary outcome can be found in the Supplemental Material on the ASME Digital Collection. We observed in Fig. 4 that the weights (wα1, wα˙1, wα˙2) exhibited much less variation compared to wα2 which can also take on negative values. Consequently, we hypothesized that wα2 is associated with different balance strategies within healthy subjects, and further analysis of this will be addressed in future studies.

In contrast, the second primary outcome, which involved the VAF, showed a significant effect for the method (p = 0.004) in the MANOVA analysis. A post hoc univariate ANOVA on VAFα1 and VAFα2 was also significant (p = 0.007 and 0.001, respectively) (Fig. 5). This confirms that the goodness of fit is better in the output-penalty ILQR, supporting our initial hypothesis. This outcome is expected since the cost function in the output-penalty ILQR directly minimizes the output fit error.

Fig. 5
VAF of α1 and α2. Significant difference between the methods (repeated-measures MANOVA p = 0.004) as well as for each univariate VAF (*p = 0.007 and **p = 0.001).

Notably, the fitting of α1 was significantly better than that of α2 within each method (Fig. 5). This result is consistent with a reliability study of this experiment [22], which showed higher reliability measures for α1 than for α2. A potential explanation for this difference lies in the fact that α1 is an actuated DoF with a known perturbation torque. In contrast, α2 is not actuated, making its response more susceptible to unknown neural system noise. Moreover, α2 exhibits smaller response amplitudes than α1, resulting in a lower signal-to-noise ratio.

We acknowledge the capability of subjects to learn environmental dynamics. However, we opted to model only feedback pathways, cautious of the potential for overfitting with the inclusion of feedforward paths [31]. Furthermore, subjects' learning of environmental dynamics does not represent clinically significant differences [32].

The gain-penalty ILQR uses the output y to estimate ρ (including K) initially, and then optimizes Q and R through the estimated gain K̂ [15]. In the process of estimating K first and obtaining Q and R later from K̂, the parameter estimation error in K̂ can propagate. In the output-penalty ILQR, on the other hand, we use y directly to obtain Q and R, which minimizes this error propagation and results in a better goodness of fit in terms of VAF.

Despite these significant findings, there are some limitations to this study. First, we did not collect electromyography data during the test, which could have provided valuable insights into muscle activations. Novice strategies might have relied more on muscle co-contractions, as previous research has shown that performance declines with increasing muscle co-contraction [33]. Future studies should investigate whether our observed kinematic patterns correspond to distinct electromyography patterns to test this hypothesis. Additionally, some cases were excluded due to poor fitting, particularly with regard to the upper body angle. Although we improved the fitting by introducing the weight Ymax into the preliminary estimation and adopting the output-penalty ILQR, there is still room for improvement in model fitting. Future research should explore methods to enhance the accuracy of the model fitting process. Furthermore, our current model includes the physical human–robot interaction (2DoF pendulum) and linear feedback of positions and velocities, reflecting the proprioceptive feedback pathway. Muscle dynamics and sensory delay were not explicitly modeled. However, in our previous research [34], where we incorporated muscle dynamics and sensory delay, we found that the inferred intent was predominantly influenced by kinematic states rather than by delay or muscle dynamics. Therefore, our current model is considered a good approximation.

4 Conclusion

An output-penalty ILQR was derived in this paper. Unlike a previous study with gain-penalty ILQR [15], we used the experimental data directly in the cost function. Our method can be applied not only to the estimation of human control intent but also to the reverse engineering of black-box controllers. By employing the output-penalty ILQR method on the seated balance experimental data, we observed a meaningful enhancement in goodness of fit: the average VAF improvement over the gain-penalty ILQR approach was 1.04% for α1 and 8.74% for α2. There was no significant difference in the inferred (dominant) intent between the two methods, which we attribute to the shared starting point Qs of the algorithm. This is a step closer to investigating control intent differences between healthy subjects and subjects with altered motor control, e.g., low back pain patients.

Acknowledgment

The contents are solely the responsibility of the authors and do not necessarily represent the official views of NCCIH.

Funding Data

  • National Center for Complementary and Integrative Health (NCCIH) at the National Institutes of Health (Grant No. U19AT006057; Funder ID: 10.13039/100008460).

  • National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00221762; Funder ID: 10.13039/501100003725).

Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

References

1. Peng, G. C., Hain, T. C., and Peterson, B. W., 1996, "A Dynamical Model for Reflex Activated Head Movements in the Horizontal Plane," Biol. Cybern., 75(4), pp. 309–319. doi: 10.1007/s004220050297
2. Goodworth, A. D., and Peterka, R. J., 2009, "Contribution of Sensorimotor Integration to Spinal Stabilization in Humans," J. Neurophysiol., 102(1), pp. 496–512. doi: 10.1152/jn.00118.2009
3. Reeves, N. P., Cholewicki, J., and Narendra, K. S., 2009, "Effects of Reflex Delays on Postural Control During Unstable Seated Balance," J. Biomech., 42(2), pp. 164–170. doi: 10.1016/j.jbiomech.2008.10.016
4. Ghasemi, Z., Kim, C.-S., Ginsberg, E., Gupta, A., and Hahn, J.-O., 2017, "Model-Based Blind System Identification Approach to Estimation of Central Aortic Blood Pressure Waveform From Noninvasive Diametric Circulatory Signals," ASME J. Dyn. Syst., Meas., Control, 139(6), p. 061003. doi: 10.1115/1.4035451
5. Zhang, H., Feng, T., Yang, G.-H., and Liang, H., 2014, "Distributed Cooperative Optimal Control for Multiagent Systems on Directed Graphs: An Inverse Optimal Approach," IEEE Trans. Cybern., 45(7), pp. 1315–1326. doi: 10.1109/TCYB.2014.2350511
6. Mainprice, J., Hayne, R., and Berenson, D., 2016, "Goal Set Inverse Optimal Control and Iterative Replanning for Predicting Human Reaching Motions in Shared Workspaces," IEEE Trans. Rob., 32(4), pp. 897–908. doi: 10.1109/TRO.2016.2581216
7. Haddad, W. M., and Jin, X., 2020, "Universal Feedback Controllers and Inverse Optimality for Nonlinear Stochastic Systems," ASME J. Dyn. Syst., Meas., Control, 142(2), p. 021003. doi: 10.1115/1.4045153
8. Mombaur, K., Truong, A., and Laumond, J.-P., 2010, "From Human to Humanoid Locomotion—An Inverse Optimal Control Approach," Auton. Robots, 28(3), pp. 369–383. doi: 10.1007/s10514-009-9170-7
9. Arechavaleta, G., Laumond, J.-P., Hicheur, H., and Berthoz, A., 2008, "An Optimality Principle Governing Human Walking," IEEE Trans. Rob., 24(1), pp. 5–14. doi: 10.1109/TRO.2008.915449
10. Mainprice, J., Hayne, R., and Berenson, D., 2015, "Predicting Human Reaching Motion in Collaborative Tasks Using Inverse Optimal Control and Iterative Re-Planning," 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, May 26–30, pp. 885–892. doi: 10.1109/ICRA.2015.7139282
11. Nguyen, V. Q., Johnson, R. T., Sup, F. C., and Umberger, B. R., 2019, "Bilevel Optimization for Cost Function Determination in Dynamic Simulation of Human Gait," IEEE Trans. Neural Syst. Rehabil. Eng., 27(7), pp. 1426–1435. doi: 10.1109/TNSRE.2019.2922942
12. Antsaklis, P. J., and Michel, A. N., 2006, Linear Systems, Springer Science & Business Media, Berlin, Germany.
13. Fujii, T., and Narazaki, M., 1984, "A Complete Optimality Condition in the Inverse Problem of Optimal Control," SIAM J. Control Optim., 22(2), pp. 327–341. doi: 10.1137/0322022
14. Kreindler, E., and Jameson, A., 1972, "Optimality of Linear Control Systems," IEEE Trans. Autom. Control, 17(3), pp. 349–351. doi: 10.1109/TAC.1972.1099985
15. Priess, M. C., Conway, R., Choi, J., Popovich, J. M., and Radcliffe, C., 2015, "Solutions to the Inverse LQR Problem With Application to Biological Systems Analysis," IEEE Trans. Control Syst. Technol., 23(2), pp. 770–777. doi: 10.1109/TCST.2014.2343935
16. Monfort, M., Liu, A., and Ziebart, B., 2015, "Intent Prediction and Trajectory Forecasting Via Predictive Inverse Linear-Quadratic Regulation," 29th AAAI Conference on Artificial Intelligence, Austin, TX, Jan. 25–30, pp. 3672–3678. doi: 10.1609/aaai.v29i1.9674
17. Priess, M. C., Choi, J., and Radcliffe, C., 2013, "Determining Human Control Intent Using Inverse LQR Solutions," ASME Paper No. DSCC2013-3874. doi: 10.1115/DSCC2013-3874
18. El-Hussieny, H., Asker, A., and Salah, O., 2017, "Learning the Sit-to-Stand Human Behavior: An Inverse Optimal Control Approach," 2017 13th International Computer Engineering Conference (ICENCO), Cairo, Egypt, Dec. 27–28, pp. 112–117. doi: 10.1109/ICENCO.2017.8289773
19. El-Hussieny, H., Abouelsoud, A., Assal, S. F., and Megahed, S. M., 2016, "Adaptive Learning of Human Motor Behaviors: An Evolving Inverse Optimal Control Approach," Eng. Appl. Artif. Intell., 50, pp. 115–124. doi: 10.1016/j.engappai.2016.01.024
20. Ljung, L., 1977, "Analysis of Recursive Stochastic Algorithms," IEEE Trans. Autom. Control, 22(4), pp. 551–575. doi: 10.1109/TAC.1977.1101561
21. Pan, H., Niu, X., Li, R., Dou, Y., and Jiang, H., 2020, "Annealed Gradient Descent for Deep Learning," Neurocomputing, 380, pp. 201–211. doi: 10.1016/j.neucom.2019.11.021
22. Ramadan, A., Cholewicki, J., Radcliffe, C. J., Popovich, J. M., Jr., Reeves, N. P., and Choi, J., 2017, "Reliability of Assessing Postural Control During Seated Balancing Using a Physical Human-Robot Interaction," J. Biomech., 64, pp. 198–205. doi: 10.1016/j.jbiomech.2017.09.036
23. Ramadan, A., Choi, J., Cholewicki, J., Reeves, N. P., Popovich, J. M., and Radcliffe, C. J., 2019, "Feasibility of Incorporating Test-Retest Reliability and Model Diversity in Identification of Key Neuromuscular Pathways During Head Position Tracking," IEEE Trans. Neural Syst. Rehabil. Eng., 27(2), pp. 275–282. doi: 10.1109/TNSRE.2019.2891525
24. Reeves, N. P., Sal y Rosas Celi, V. G., Ramadan, A., Popovich, J. M., Jr., Radcliffe, C. J., Choi, J., and Cholewicki, J., 2020, "Quantifying Trunk Neuromuscular Control Using Seated Balancing and Stability Threshold," J. Biomech., 112, p. 110038. doi: 10.1016/j.jbiomech.2020.110038
25. Cody Priess, M., Choi, J., Radcliffe, C., Popovich, J. M., Jr., Cholewicki, J., and Peter Reeves, N., 2015, "Time-Domain Optimal Experimental Design in Human Seated Postural Control Testing," ASME J. Dyn. Syst., Meas., Control, 137(5), p. 054501. doi: 10.1115/1.4028850
26. Zatsiorsky, V. M., 2002, Kinetics of Human Motion, Human Kinetics, Champaign, IL.
27. Van Drunen, P., Maaswinkel, E., Van der Helm, F., Van Dieën, J., and Happee, R., 2013, "Identifying Intrinsic and Reflexive Contributions to Low-Back Stabilization," J. Biomech., 46(8), pp. 1440–1446. doi: 10.1016/j.jbiomech.2013.03.007
28. Peterka, R. J., Murchison, C. F., Parrington, L., Fino, P. C., and King, L. A., 2018, "Implementation of a Central Sensorimotor Integration Test for Characterization of Human Balance Control During Stance," Front. Neurol., 9, p. 1045. doi: 10.3389/fneur.2018.01045
29. Girden, E. R., 1992, ANOVA: Repeated Measures, Sage, Thousand Oaks, CA.
30. Huang, F. L., 2020, "MANOVA: A Procedure Whose Time Has Passed?," Gifted Child Q., 64(1), pp. 56–60. doi: 10.1177/0016986219887200
31. Ramadan, A., Boss, C., Choi, J., Peter Reeves, N., Cholewicki, J., Popovich, J. M., Jr., and Radcliffe, C. J., 2018, "Selecting Sensitive Parameter Subsets in Dynamical Models With Application to Biomechanical System Identification," ASME J. Biomech. Eng., 140(7), p. 074503. doi: 10.1115/1.4039677
32. Reeves, N. P., Narendra, K. S., and Cholewicki, J., 2007, "Spine Stability: The Six Blind Men and the Elephant," Clin. Biomech., 22(3), pp. 266–274. doi: 10.1016/j.clinbiomech.2006.11.011
33. Reeves, N. P., Everding, V. Q., Cholewicki, J., and Morrisette, D. C., 2006, "The Effects of Trunk Stiffness on Postural Control During Unstable Seated Balance," Exp. Brain Res., 174(4), pp. 694–700. doi: 10.1007/s00221-006-0516-5
34. Ramadan, A., Choi, J., Radcliffe, C. J., Popovich, J. M., and Reeves, N. P., 2019, "Inferring Control Intent During Seated Balance Using Inverse Model Predictive Control," IEEE Rob. Autom. Lett., 4(2), pp. 224–230. doi: 10.1109/LRA.2018.2886407