Model calibration and validation are two activities in system model development, and both of them make use of test data. Limited testing budget creates the challenge of test resource allocation, i.e., how to optimize the number of calibration and validation tests to be conducted. Test resource allocation is conducted before any actual test is performed, and therefore needs to use synthetic data. This paper develops a test resource allocation methodology to make the system response prediction “robust” to test outcome, i.e., insensitive to the variability in test outcome; therefore, consistent system response predictions can be achieved under different test outcomes. This paper analyzes the uncertainty sources in the generation of synthetic data regarding different test conditions, and concludes that the robustness objective can be achieved if the contribution of model parameter uncertainty in the synthetic data can be maximized. Global sensitivity analysis (Sobol’ index) is used to assess this contribution, and to formulate an optimization problem to achieve the desired consistent system response prediction. A simulated annealing algorithm is applied to solve this optimization problem. The proposed method is suitable either when only model calibration tests are considered or when both calibration and validation tests are considered. Two numerical examples are provided to demonstrate the proposed approach.

## Introduction

In engineering applications, it is often required to estimate the system response under untested conditions using available computational models and test data at different conditions. The computational model aims to describe the physics of the system and can be denoted as $Y=F(X;\theta)$, where $Y$ is the system response, $X$ is the set of model inputs, and $\theta$ is the set of model parameters. The uncertainty in an input $X$ can be described by a probability distribution $\pi_X(x)$. In the actual test, in some cases, we can control an input at a nominal value but the control is not perfect; thus, $\pi_X(x)$ characterizes this imperfect control. In other cases, an input such as outdoor temperature cannot be controlled but can be measured; thus, $\pi_X(x)$ characterizes the natural variability in $X$. The model parameters $\theta$ have fixed but unknown values in all tests on the same specimen. The uncertainty regarding the values of $\theta$ is epistemic uncertainty due to lack of information, which can be reduced using test data. (In some problems, the model parameters $\theta$ may not be physical quantities but simply artifacts of modeling, in which case the concept of true value may not be applicable; such cases are not considered in this paper. Also, in some problems, the model parameters could be input-dependent; this paper does not consider such cases.)

Two important questions in the system response prediction are: (1) how to quantify and reduce the uncertainty in $\theta$ and (2) how to validate the agreement of the computational model with the true physics or quantify their difference. These two questions are addressed by model calibration and model validation, respectively. Usually, model calibration is conducted first to quantify the values of $\theta$ or reduce the uncertainty about their values, and then model validation follows. Various approaches to model calibration and validation have been studied in the literature. Consider an example of model calibration using Bayesian inference. While some researchers directly use the computational model $Y=F(X;\theta)$ and calibrate $\theta$, others [1] use a model discrepancy term $\delta(X)$ to correct the computational model and calibrate both $\theta$ and $\delta(X)$. Consider another example regarding the use of test data. Some researchers treat all the data as calibration data and use the calibrated model parameters in predicting the system response [2,3]; others integrate the results of model calibration and model validation (each done with different sets of data) in predicting the system response [4–6].

No matter what approaches are pursued, model calibration and validation always require test data. Due to the variability in test outcomes, two sets of test data of the same size may lead to two distinct system response predictions (after calibration and/or validation) even if the same computational model and the same framework of model calibration/validation are used. Here, “test outcome” is defined as the value of test data, i.e., the measurements of test inputs and outputs. The variability in the test outcome is due to the following reasons: (1) the input is controlled at a nominal value but the control is imperfect; (2) the input has natural variability, which means that the input cannot be controlled; and (3) there is measurement error in the input and output data.

If a single data point is used in model calibration/validation, the calibration/validation result will be affected significantly by the value of this data point. However, as more data points are applied, the calibration/validation result will converge; thus, the consequent system response prediction will also converge. As the number of tests increases, the model prediction uncertainty therefore becomes less and less sensitive to variability in the test outcomes. This raises the following questions when the test budget is limited: (1) is it possible to organize the test campaign to make the system response prediction robust to variability in test outcomes and (2) how many tests of each type are necessary to achieve the robustness objective. Note that in this paper, the term “test type” refers to two attributes: (1) whether the test data are for calibration or validation and (2) the physical quantity measured in the test. For example, if three quantities are measured in tests and all data are used for calibration, we have $3\times 1=3$ types of tests; but if part of the data is used for calibration and the remaining data are used for validation, then we have $3\times 2=6$ types of tests. The focus of this paper is to develop an optimization approach to answer these questions, assuming the computational model and the framework of model calibration/validation are given. The design variables of this optimization are the numbers of each type of test, denoted as $N\in\mathbb{N}^q$ if $q$ types of tests are available; the objective function and constraints will be discussed later. Note that (1) this optimization needs to be solved before any actual test is conducted [4] and (2) this optimization needs to consider test outcome uncertainty, due to which the subsequent system response prediction is also uncertain.

Several approaches for test resource allocation have been studied in the literature [4,7–11], and the main difference among these approaches is the choice of the objective function. Note that model calibration aims to reduce the uncertainty in model parameters, and thus reduce the uncertainty in the subsequent system response prediction. Thus, in the case that only model calibration is considered in system response prediction, generally the objective of test resource allocation optimization is to minimize the system response prediction uncertainty subject to limited budget. Several quantities have been used to represent system response prediction uncertainty, and the first one is variance. Sankararaman et al. [4] minimized $E(V(Y))$ where $V(Y)$ is the variance of the system response prediction $Y$ at given numbers of each type of test, and $E(\cdot)$ denotes the average of $V(Y)$ over different synthetic data sets. Similarly, Vanlier et al. [8] defined the variance reduction of $Y$ via model calibration as $1-E(\sigma_{\text{new}}^2/\sigma_{\text{old}}^2)$ and maximized it, where $\sigma_{\text{new}}^2$ is the variance of the system response prediction using the posterior distribution and $\sigma_{\text{old}}^2$ is the variance of the system response prediction using the prior distribution. Entropy measures have also been used to represent system response prediction uncertainty. In Ref. [9], the authors maximized the relative entropy (Kullback–Leibler divergence) from the system response prediction $\pi'(y)$ using the prior distribution to the system response prediction $\pi''(y)$ using the posterior distribution; while in Refs. [10] and [11], the authors maximized the mutual information, i.e., the change of entropy from $\pi'(y)$ to $\pi''(y)$.

The previously mentioned approaches that select only calibration tests to minimize the uncertainty in the system response prediction are not applicable when model validation is also incorporated in the system response prediction. The reason is that model validation may indicate that the calibrated model is not exactly valid; accounting for this result increases the uncertainty in the system response prediction. Thus, the earlier optimization formulations would lead to the conclusion that model validation is not necessary. Mullins et al. [12] proposed a method considering both model calibration and model validation, in which model calibration is via Bayesian inference, and model validation is via a stochastic model reliability metric, i.e., describing model validity through a probability distribution. In this method, the objective regarding model validation tests was to minimize the spread in the family of system response predictions that results from the uncertainty in model validity, denoted as $E\{V[E(Y)]\}$, where the inner $E(Y)$ is the system response prediction mean at a given synthetic data set and given value of model validity, $V[\cdot]$ is the variance over the distribution of model validity, and the outer $E\{\cdot\}$ is the average over the different data sets. The objective regarding model calibration tests was still to minimize the variance of the system response prediction, denoted as $E\{E[V(Y)]\}$, where $V(Y)$ is the system response prediction variance based on a given synthetic data set and given value of model validity; the inner $E[\cdot]$ is the average over the distribution of model validity, and the outer $E\{\cdot\}$ is the average over different synthetic data sets.

In this paper, the proposed concept of “test resource allocation for system response prediction robustness” means that the system response prediction becomes insensitive to the variability in test outcomes; thus, at the optimal value of the design variables (i.e., number of tests $N\in\mathbb{N}^q$), different test outcomes result in consistent system response predictions. This concept and the required objective function will be explained in Sec. 2. The approach is suitable in different situations when only model calibration tests are considered or when both calibration and validation tests are considered. Note that the proposed methodology only selects the number of each type of test; it does not design the actual tests, i.e., select the input for the test. Experimental design is a subsequent step to test selection; we only focus on test selection.

The constraint in the optimization of test resource allocation is generally the budget. Note that the constraint and objective are interchangeable, i.e., the optimization may have two alternative formats: (1) subject to the budget constraint, optimize the design variable $N\in\mathbb{N}^q$ (the number of each type of test) to reach the most robust system response prediction; or (2) subject to the robustness requirement in the system response prediction, find $N$ to minimize the budget. The proposed concept can be realized with either formulation.

In addition, it is important to note that the data considered in test resource allocation analysis has to be synthetic since it is done before any actual test. The actual physical test data from a test are obtained by: (1) selecting the values of inputs $X$; (2) applying $X$ to the physical test configuration where the model parameters $\theta $ are at their true but unknown values; and (3) recording the input–output data, where both the input and output measurements may be subject to measurement errors. In actual tests where the values of $X$ have been decided, the test outcome uncertainty arises only from experimental variability, including measurement errors. The generation of synthetic data is a simulation of the three steps mentioned earlier, with the physical test configuration replaced by a computational model and the model parameters being unknown. Thus, two additional uncertainty sources are introduced in the synthetic data: (1) uncertainty regarding the value of $\theta $ and (2) model discrepancy, i.e., the difference between the computational model and the actual physics. In a Bayesian framework, the first one can be represented by the prior distribution of $\theta $ based on available knowledge. But no information on the model discrepancy is available before any testing.

In addition, compared to the actual test, the physical meaning of the input distribution $\pi_X(x)$ may be changed in generating the synthetic data. As explained at the beginning of Sec. 1, for an actual test, the uncertainty characterized by $\pi_X(x)$ is due to the following sources: (1) imperfect control over the true value, (2) natural variability of the input, and (3) measurement errors. In generating the synthetic data, $\pi_X(x)$ accounts for the same uncertainty sources in the case that test conditions are known (for example, the nominal values of the inputs are known). But in the case of unknown test conditions, $\pi_X(x)$ mainly accounts for the uncertainty about which experimental conditions will be subsequently selected. For example, if the tester only mentions that the possible nominal value of an input is between 5 and 10, then we may use a uniform distribution $X\sim U(5,10)$ to represent this uncertainty in the nominal value. In this case, the uncertainty represented by $\pi_X(x)$ is epistemic. The proposed method is able to handle both cases. It is possible for the decision-maker to apply the proposed method before and after knowing the test conditions, and different answers can be obtained due to the changed availability of knowledge.

In summary, the objectives of this paper are to: (1) find the optimal number of each type of test such that different data sets result in consistent system response predictions; (2) develop solutions for both formats of the optimization problem; and (3) adapt to different cases when only model calibration tests are considered or when both calibration and validation tests are considered. The rest of this paper is organized as follows: Section 2 proposes the objective in the optimization of robust test allocation. Section 3 analyzes the uncertainty sources in the synthetic data and the use of Sobol’ indices to assess their contributions toward the uncertainty in the system response prediction. Section 4 develops a flexible approach for test resource allocation optimization. Section 5 uses two numerical examples to illustrate the proposed approach.

## Global Sensitivity Analysis of Uncertainty in Synthetic Test Data

### Objective of Robust Test Resource Allocation.

The objective of the proposed test resource allocation optimization can be visually represented as in Fig. 1, which shows the families of the system response prediction probability density functions (PDFs) at different values of the design variables $N$. Within a sub-figure, the variation between the PDFs is caused by the test outcome variability among different data sets. From Figs. 1(a)–1(c), this variation becomes smaller and the system response predictions reveal stronger consistency due to: (1) the decreased variability of mean values $E(Y)$ across the PDFs, meaning that the centroids of the family members are closer; and (2) the decreased variability of the variance $V(Y)$ across the PDFs, meaning that the ranges of values covered by the PDF are similar. In other words, at the value of optimal $N$ in Fig. 1(c), the effects of test outcome uncertainty on $E(Y)$ and $V(Y)$ are small so that consistent system response predictions can be obtained with different sets of test data. Note that this paper is only concerned about the mean value $E(Y)$ and variance $V(Y)$ in the system response prediction, not the exact shape of the PDF in Fig. 1. Here, the “variability” of $E(Y)$ and $V(Y)$ is captured by their variance, i.e., $V(E(Y))$ and $V(V(Y))$ across different data sets.

Therefore, this paper defines the objective for robust test resource allocation as: minimize the contribution of test outcome uncertainty toward the variability (i.e., variance) in the system response prediction mean value $E(Y)$ and the system response prediction variance $V(Y)$.

Global sensitivity analysis (GSA) using Sobol’ indices is a prominent approach [13–15] to quantify the contributions of input uncertainty toward the variance in the output. A brief introduction to Sobol’ indices is given in Sec. 2.2. One challenge is to establish a deterministic function required by the Sobol’ indices computation, in mapping the test outcome uncertainty to the system response prediction uncertainty. This challenge will be analyzed and overcome in Sec. 3.

### Sobol’ Indices.

$S_{X_p}$ is a combined measure of the individual contributions of the components of $X_p$ and of the interactions among them.

where $X_{-p}$ is the complementary subset of $X_p$. $S^T_{X_p}$ is a combined measure of the individual contributions of the components of $X_p$, the interactions among them, and the interactions between $X_p$ and $X_{-p}$.
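For reference, the standard variance-based forms of the first-order and total-effect indices of a subset $X_p$ can be written in the notation used here as follows. This is a restatement of the textbook definitions, offered as a sketch; the numbering and exact layout of the paper's own equations are not assumed.

$$S_{X_p}=\frac{V\big(E(Y|X_p)\big)}{V(Y)},\qquad S^T_{X_p}=1-\frac{V\big(E(Y|X_{-p})\big)}{V(Y)}=\frac{E\big(V(Y|X_{-p})\big)}{V(Y)}$$

The second equality follows from the law of total variance, $V(Y)=V\big(E(Y|X_{-p})\big)+E\big(V(Y|X_{-p})\big)$.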

The direct computation of Sobol’ indices requires double-loop Monte Carlo simulation and thus is expensive. Taking $S_l$ in Eq. (1) as an example, we need: (1) an inner loop to compute the conditional mean $E(Y|X_l)$ using $n_1$ random samples of $X_{-l}$ and (2) an outer loop to compute $V(E(Y|X_l))$ by iterating the inner loop $n_2$ times at different values of $X_l$. In addition, another $n_3$ Monte Carlo simulation iterations are required to compute $V(Y)$. Various algorithms have been developed in the literature to reduce the computational cost [21–25]. Any one of them can be used to compute the Sobol’ indices in this paper. Several illustrative examples on computing Sobol’ indices can be found in Ref. [19].
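The double-loop procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function `first_order_sobol`, the toy additive model, and the sample sizes are all hypothetical choices made for the example.

```python
import random
import statistics


def first_order_sobol(model, samplers, l, n_inner=200, n_outer=400,
                      n_total=5000, seed=0):
    """Double-loop Monte Carlo estimate of S_l = V(E(Y|X_l)) / V(Y)."""
    rng = random.Random(seed)
    d = len(samplers)

    # Outer loop: fix X_l at n_outer different values.
    cond_means = []
    for _ in range(n_outer):
        x_l = samplers[l](rng)
        # Inner loop: average Y over n_inner samples of the other inputs X_{-l}.
        total = 0.0
        for _ in range(n_inner):
            x = [x_l if k == l else samplers[k](rng) for k in range(d)]
            total += model(x)
        cond_means.append(total / n_inner)

    # Separate Monte Carlo runs to estimate the total variance V(Y).
    ys = [model([samplers[k](rng) for k in range(d)]) for _ in range(n_total)]
    return statistics.pvariance(cond_means) / statistics.pvariance(ys)


def toy_model(x):
    # Hypothetical additive model Y = X_0 + 2 X_1.
    return x[0] + 2.0 * x[1]


def std_normal(rng):
    return rng.gauss(0.0, 1.0)


# With independent standard normal inputs, V(Y) = 5, so analytically
# S_0 = 1/5 and S_1 = 4/5; the estimates should fall near these values.
s0 = first_order_sobol(toy_model, [std_normal, std_normal], l=0)
s1 = first_order_sobol(toy_model, [std_normal, std_normal], l=1)
```

Each index costs roughly `n_inner * n_outer + n_total` model evaluations, which is why cheaper estimators are preferred in practice. The finite inner loop also adds a small positive bias to the numerator, which shrinks as `n_inner` grows.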

The Sobol’ index computation requires (1) a deterministic input–output function and (2) the representation of all the inputs by uncorrelated continuous probabilistic distributions. These two requirements need to be achieved before applying Sobol’ indices in the proposed approach for test resource allocation. Section 3 analyzes the uncertainty sources in test outcomes and develops an approach to achieve both requirements.

## Uncertainty Sources in Test Outcomes

Recall that all the data considered in test resource allocation analysis have to be synthetic since the analysis is done before any actual test. The uncertainty in the synthetic data depends on specific test conditions, including: (1) the possible values of inputs $X$; (2) the number of test types; and (3) whether a single test specimen or multiple specimens are used for each type of test.

Regarding the first condition, this paper assumes that a distribution of $X$ is provided by the testing personnel or assumed based on some information. For example, for a single model input $X\in\boldsymbol{X}$, we may have $X\sim U(L_X,U_X)$ where $L_X$ is the lower bound and $U_X$ is the upper bound. We can also use other types of distribution, such as a Gaussian distribution, to capture the uncertainty in $X$ if additional information is available.

This section will analyze the uncertainty sources in the synthetic data regarding the second and third conditions; the corresponding deterministic function required by the Sobol’ indices also varies correspondingly. The rest of this section starts from the simplest case of one type of test and single specimen, and subsequently extends it to multiple types of tests and multiple test specimens.

### Single Type of Test and Single Test Specimen.

If only one type of test is available and all tests are conducted on a single specimen, the actual test data are a set of $N$ data points obtained from the same specimen. Figure 2 shows the generation and usage of the synthetic data in this case. As shown in the left part of Fig. 2, to generate a data set of $N$ synthetic data points, four steps should be followed: (1) select and fix the values of $\theta\in\mathbb{R}^{d_\theta}$, where $d_\theta$ is the dimension of model parameters; (2) generate $N$ samples of model inputs $x_j\in\mathbb{R}^{d_X}$ ($j=1$ to $N$), where $d_X$ is the dimension of model inputs; (3) propagate $x_j$ ($j=1$ to $N$) and $\theta$ through the computational model $F(\cdot)$; and (4) record the model input and output with measurement errors added. The resultant data set contains pairwise data points $\{\omega_j,z_j\}$ ($j=1$ to $N$) as

$$\omega_j=x_j+e_j,\qquad z_j=F(x_j;\theta)+\epsilon_j \tag{5}$$

where $e_j\in\mathbb{R}^{d_X}$ is the model input measurement error and $\epsilon_j\in\mathbb{R}$ is the model output measurement error. If the model input measurement error is ignored, then $\omega_j=x_j$.
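The four steps for generating one synthetic data set can be sketched as follows. This is an illustrative sketch only: the linear model, the Gaussian prior on $\theta$, the input range, and the noise levels are hypothetical, and the input measurement errors are assumed independent with a common standard deviation.

```python
import random


def generate_synthetic_data(model, theta, n_tests, sample_input,
                            sigma_in, sigma_out, rng):
    """Return N pairs (omega_j, z_j) for one fixed realization of theta."""
    data = []
    for _ in range(n_tests):
        x = sample_input(rng)                  # step 2: draw the inputs x_j
        y = model(x, theta)                    # step 3: propagate through F
        # Step 4: record input and output with measurement errors added.
        omega = [xi + rng.gauss(0.0, sigma_in) for xi in x]
        z = y + rng.gauss(0.0, sigma_out)
        data.append((omega, z))
    return data


def linear_model(x, theta):
    # Hypothetical one-input model F(x; theta) = theta_0 * x_0.
    return theta[0] * x[0]


def uniform_input(rng):
    # Hypothetical input distribution U(5, 10).
    return [rng.uniform(5.0, 10.0)]


rng = random.Random(1)
# Step 1: draw theta once from an assumed prior, then keep it fixed.
theta = [rng.gauss(2.0, 0.5)]
# Input measurement error is ignored here (sigma_in = 0), so omega_j = x_j.
data = generate_synthetic_data(linear_model, theta, 4, uniform_input,
                               sigma_in=0.0, sigma_out=0.1, rng=rng)
```

Drawing `theta` outside the loop mirrors the single-specimen assumption: the parameter value is uncertain across synthetic data sets but constant within one.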

A crucial point in the generation of synthetic data concerns the model parameters $\theta$. For a single specimen, $\theta$ have true but unknown values, meaning that the uncertainty in $\theta$ is epistemic. Thus, the uncertainty caused by $\theta$ is the uncertainty in selecting the values of $\theta$ *before* generating a synthetic data set; once selected, the values of $\theta$ are *fixed* within the synthetic data set. This uncertainty in $\theta$ only exists in the synthetic data; actual tests will fix $\theta$ at their true values.

The four steps mentioned earlier indicate three uncertainty sources in generating a pairwise synthetic data point $\{\omega_j,z_j\}$, including:

- (1) Uncertainty regarding the values of model parameters $\theta$, which can be represented by their prior distribution $\pi'(\theta)$ based on available knowledge before conducting any physical test. This uncertainty is epistemic since $\theta$ have unknown but fixed true values.

- (2) Uncertainty regarding the possible values of inputs $x_j$ to be used in the tests. As mentioned earlier, a distribution of $X$ has been provided or assumed. This uncertainty is also epistemic if the values of $X$ are unknown during test selection analysis, but will be decided by the test personnel in actual tests.

- (3) Uncertainty regarding input measurement errors $e_j$ and output measurement errors $\epsilon_j$. Usually, measurement error is assumed to have a zero-mean Gaussian distribution; thus, $e_j\sim N(0,\Sigma_X)$ and $\epsilon_j\sim N(0,\sigma^2)$. The uncertainty in $e_j$ and $\epsilon_j$ is aleatory if the values of $\Sigma_X$ and $\sigma$ are known; but additional epistemic uncertainty regarding $\Sigma_X$ and $\sigma$ will be introduced if their values are unknown.

where $\alpha_j=\{x_j,e_j,\epsilon_j\}\in\mathbb{R}^{2d_X+1}$ for $j=1$ to $N$ represents the uncertainty sources in generating a single pairwise data point $\{\omega_j,z_j\}$, and $N$ is the number of pairwise data points; $G(\cdot)$ represents the entire process shown in Fig. 2, including both synthetic data generation and model calibration/validation analyses before predicting the system response.

In Eq. (6), the uncertainty in $\{\alpha_1,\ldots,\alpha_N\}$ represents the variability in the actual test outcomes, while the epistemic uncertainty in $\theta$ only exists in the synthetic data, not in actual test data. To minimize the sensitivity of the system response prediction to the variability in the test outcomes, we need to minimize the sensitivity index of $\{\alpha_1,\ldots,\alpha_N\}$ in Eq. (6) so that $E(Y)$ and $V(Y)$ are insensitive to the variability in test outcomes and consistent system response prediction distributions can be achieved under different actual test outcomes. However, this minimization drives the sensitivity index toward zero, and small sensitivity indices are difficult to estimate with good numerical accuracy.

Instead, this paper chooses to maximize the sensitivity index of $\theta $. If that is achieved, the epistemic uncertainty in $\theta $ will be dominant toward the uncertainty in the system response prediction mean $E(Y)$ and the system response prediction variance $V(Y)$ (based on synthetic data). In the system response prediction using actual test data where $\theta $ are fixed at their true values, the most dominant uncertainty contribution to $E(Y)$ and $V(Y)$ will be removed. Therefore, the uncertainty in $E(Y)$ and $V(Y)$ caused by test outcome uncertainty will reduce significantly and consistent system response prediction distributions can be achieved under different actual test outcomes. In sum, the basic idea of the proposed approach is to *maximize the contribution of epistemic uncertainty regarding model parameters in the synthetic data*.
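The effect described above can be illustrated on a toy problem. The sketch below assumes a hypothetical linear model $Y=\theta x$ with a Gaussian prior on $\theta$ and known output noise, so that calibration has a closed-form conjugate update; the total-effect index of $\theta$ on $E(Y)$ is estimated with a Jansen-type pick-freeze estimator. All constants and function names are assumptions made for illustration, not values or methods taken from the paper.

```python
import random
import statistics

MU0, TAU0 = 2.0, 0.5   # assumed Gaussian prior N(MU0, TAU0^2) on theta
SIGMA = 5.0            # assumed known std of the output measurement error
X_PRED = 8.0           # assumed untested condition for the prediction


def predicted_mean(theta, xs, eps):
    """Map (theta, alpha) to E(Y): synthetic data, conjugate update, prediction."""
    zs = [theta * x + e for x, e in zip(xs, eps)]   # synthetic outputs
    precision = 1.0 / TAU0**2 + sum(x * x for x in xs) / SIGMA**2
    post_mean = (MU0 / TAU0**2
                 + sum(x * z for x, z in zip(xs, zs)) / SIGMA**2) / precision
    return post_mean * X_PRED


def total_effect_theta(n_tests, n_samples=4000, seed=0):
    """Jansen pick-freeze estimate of the total-effect index of theta on E(Y)."""
    rng = random.Random(seed)
    base, sq_diff = [], 0.0
    for _ in range(n_samples):
        xs = [rng.uniform(5.0, 10.0) for _ in range(n_tests)]
        eps = [rng.gauss(0.0, SIGMA) for _ in range(n_tests)]
        theta_a = rng.gauss(MU0, TAU0)
        theta_b = rng.gauss(MU0, TAU0)
        y_a = predicted_mean(theta_a, xs, eps)
        y_b = predicted_mean(theta_b, xs, eps)   # same alpha, resampled theta
        base.append(y_a)
        sq_diff += (y_a - y_b) ** 2
    return sq_diff / (2.0 * n_samples) / statistics.pvariance(base)


# Index of theta for 1, 5, and 20 calibration tests.
indices = {n: total_effect_theta(n) for n in (1, 5, 20)}
```

In this toy setting the index of $\theta$ grows with the number of tests, which is the behavior the proposed objective exploits. Only $E(Y)$ is examined because, in this linear-Gaussian example, the posterior variance does not depend on $\theta$; a realistic model would require both indices.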

Note that the proposed approach guarantees consistent system response predictions regardless of what the true values of $\theta $ are, since the Sobol’ index is a global sensitivity analysis method and considers the entire distribution of $\theta $.

### Single Type of Test and Multiple Test Specimens.

For a single type of test, multiple test specimens are required if the test is destructive so that each specimen can be used only once. Two examples of destructive tests are the fatigue test and the tensile strength test. The true value of a model parameter $\theta_l\in\theta$ for $l=1$ to $d_\theta$ is fixed for a single specimen, but varies across different specimens. This variability of $\theta$ may be represented by a probability distribution $\pi(\theta_l|P_{\theta_l})$ where $P_{\theta_l}$ are the distribution parameters of $\theta_l$. For example, $P_{\theta_l}=\{\mu,\sigma\}$ if $\theta_l$ has a Gaussian distribution $N(\mu,\sigma^2)$ where $\mu$ is the mean value and $\sigma$ is the standard deviation. In addition, the entire set of distribution parameters for all components of $\theta$ is denoted as $P_\theta$ where $P_{\theta_l}\in P_\theta$ for $l=1$ to $d_\theta$. In this case, $P_\theta$ have unknown true values; thus, the uncertainty in $P_\theta$ is epistemic, and this uncertainty can be represented by a prior distribution $\pi(P_\theta)$ based on available knowledge. Thus, model calibration aims to quantify the uncertainty in $P_\theta$, instead of $\theta$. (Note that $\theta$ have both aleatory and epistemic uncertainty, whereas the uncertainty in $P_\theta$ is epistemic.)

In the case of a single type of test and multiple test specimens, the steps in generation and usage of the synthetic data set of $N$ data points are similar to those in Fig. 2, but the box “model parameters $\theta$” should be replaced by “$P_\theta\to\theta_j$,” where $\theta_j$ is the value of $\theta$ generated for the $j$th specimen (i.e., the $j$th test). Compared to Fig. 2, the values of $P_\theta$ are now selected *before* generating a synthetic data set; once selected, the values of $P_\theta$ are *fixed* within the synthetic data set. The values of model parameters $\theta_j$ ($j=1$ to $N$) for each of the $N$ specimens are generated from the conditional distribution $\pi(\theta_l|P_{\theta_l})$ for $l=1$ to $d_\theta$.

It seems natural to replace $\theta$ in Eq. (6) with $P_\theta$ and build new functions for the Sobol’ indices computation. However, the new functions will not be deterministic functions as required by the Sobol’ indices. A specific realization of $P_\theta$ does not determine the values of $\theta$ but only the distribution $\pi(\theta_l|P_{\theta_l})$ for $l=1$ to $d_\theta$; thus, $\theta$ are still stochastic at given $P_\theta$. Only deterministic values of $\theta$ and $\alpha_j=\{x_j,e_j,\epsilon_j\}$ ($j=1$ to $N$) can decide the subsequent system response prediction distribution $\pi_Y(y)$ and its mean value $E(Y)$ and variance $V(Y)$. In sum, an approach to establish a deterministic relationship from $P_\theta$ to $\theta$ is needed. Such a relationship is obtained by introducing an auxiliary variable $U_{\theta_l}$ through the probability integral transform:

$$\theta_l=F^{-1}_{\theta_l|P_{\theta_l}}(U_{\theta_l}) \tag{7}$$

where $F^{-1}_{\theta_l|P_{\theta_l}}(\cdot)$ is the inverse CDF of $\theta_l$ at given $P_{\theta_l}$. Note that $U_{\theta_l}$ has the standard uniform distribution $U(0,1)$. Equation (7) indicates three steps: (1) generate the values of $P_{\theta_l}$ from their prior distribution to produce the conditional distribution $\pi(\theta_l|P_{\theta_l})$; (2) generate the value of $U_{\theta_l}$ from $U(0,1)$; and (3) substitute $U_{\theta_l}$ into the inverse CDF $F^{-1}_{\theta_l|P_{\theta_l}}(\cdot)$ to obtain a unique value of $\theta_l$.
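The three steps can be sketched for a Gaussian-distributed parameter as follows. This is an illustration under assumptions: the prior on the distribution parameters and the helper name `sample_theta` are hypothetical, and the standard-library `statistics.NormalDist.inv_cdf` stands in for the inverse CDF.

```python
import random
from statistics import NormalDist


def sample_theta(p_theta, u):
    """Deterministic map (P_theta, U) -> theta via the Gaussian inverse CDF."""
    mu, sigma = p_theta
    return NormalDist(mu, sigma).inv_cdf(u)


rng = random.Random(0)
# Step 1: draw the distribution parameters (mu, sigma) from an assumed prior.
p_theta = (rng.gauss(10.0, 1.0), rng.uniform(0.5, 1.5))
# Step 2: draw the auxiliary variable from U(0, 1), independently of p_theta.
# The clamp keeps u strictly inside (0, 1), as inv_cdf requires.
u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
# Step 3: the pair (p_theta, u) now determines theta uniquely.
theta = sample_theta(p_theta, u)
```

Because `sample_theta` is a pure function of `(p_theta, u)`, repeating the call with the same pair returns the same value, which is the deterministic relationship the Sobol’ index computation requires.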

The uncertainty in model parameter $\theta_l$ consists of two components: (1) the epistemic uncertainty in distribution parameters $P_{\theta_l}$, represented by the prior distribution $\pi(P_{\theta_l})$; and (2) the aleatory uncertainty in $\theta_l$ at given $P_{\theta_l}$, represented by the conditional distribution $\pi(\theta_l|P_{\theta_l})$. These two parts are coupled since $\pi(\theta_l|P_{\theta_l})$ depends on the value of $P_{\theta_l}$. The introduced auxiliary variable $U_{\theta_l}$ captures the aleatory uncertainty, and also helps to decouple the aleatory and epistemic uncertainties [26] since the distribution $U_{\theta_l}\sim U(0,1)$ does not depend on $P_{\theta_l}$.

As explained earlier, the basic idea of the proposed approach is to maximize the contribution of epistemic uncertainty of $\theta$ in the synthetic data, in the case of a single specimen. In the case of multiple specimens, we need the contribution of $P_\theta$ to be dominant in the context of Eq. (8). If that is achieved, in the system response prediction using actual test data where $P_\theta$ are fixed at their true values, the most dominant uncertainty contribution to $E(Y)$ and $V(Y)$ will be removed. Therefore, the uncertainty in $E(Y)$ and $V(Y)$ caused by test outcome uncertainty will be reduced significantly, and different actual test outcomes will lead to consistent system response predictions.

### Multiple Types of Tests and Single Test Specimen.

In the case that $q$ different types of tests are to be considered and each type utilizes only one specimen (nondestructive test), Fig. 2 expands to Fig. 3, and Eq. (6) expands to

Equation (9) gives the required deterministic functions for Sobol’ indices computation. In Eq. (9), $A_i=\{\alpha_1^i,\ldots,\alpha_{N_i}^i\}$ for $i=1$ to $q$ represents the uncertainty regarding inputs and measurement errors in generating the synthetic data for the $i$th type of test, where $\alpha_j^i=\{x_j^i,e_j^i,\epsilon_j^i\}$ for $i=1$ to $q$ and $j=1$ to $N_i$; $j$ represents the test number and $N_i$ is the total number of tests of the $i$th type. Note that here $\theta$ is the vector of the model parameters in all types of tests, and test type refers to calibration test versus validation test, and the output quantities measured, as explained in Sec. 1.

Similar to the earlier discussion, in the test resource allocation optimization regarding Eq. (9), we need the contribution of the epistemic uncertainty in $\theta $ toward the uncertainty in $E(Y)$ and $V(Y)$ to be dominant. For the case of multiple types of tests and single test specimen, an example with a framework considering only model calibration is considered in Sec. 5.1; another example of a framework incorporating both model calibration and model validation is considered in Sec. 5.2.

### Multiple Types of Tests and Multiple Test Specimens.

Similarly, in the test resource allocation optimization regarding Eq. (10), we need the contribution of the epistemic uncertainty in $P_\theta$ toward the uncertainty in $E(Y)$ and $V(Y)$ to be dominant.

### Selection of Sobol’ Indices.

Thus far, deterministic functions for Sobol’ indices computation in different test conditions have been established. Robust design of resource allocation can be achieved by maximizing the contribution of the epistemic uncertainty regarding either $\theta$ (single specimen) or $P_\theta$ (multiple specimens). This epistemic uncertainty is represented by a set of random variables ($\theta$ in Eqs. (6) and (9); $P_\theta$ in Eqs. (8) and (10)). The total effect sensitivity index considers the interactions between the subset of random variables and its complement; thus, to be more comprehensive, the optimization in this paper uses Eq. (4) to compute the total effect index for the subset of epistemic uncertainty (either $\theta$ or $P_\theta$). In the rest of the paper, Sobol’ index indicates the total effect index in Eq. (4). The computed Sobol’ indices are denoted as $S_m^{E(Y)}$ for $E(Y)$ and $S_m^{V(Y)}$ for $V(Y)$. In the case of a single specimen, $m=\theta$ so that $S_m^{E(Y)}$ and $S_m^{V(Y)}$ are the Sobol’ indices of $\theta$; in the case of multiple specimens, $m=P_\theta$ so that $S_m^{E(Y)}$ and $S_m^{V(Y)}$ are the Sobol’ indices of $P_\theta$.

## Optimum Test Resource Allocation

### Formulation.

where $C_i>0$ is the unit cost of the $i$th ($i=1$ to $q$) type of test and $N_i$ is the number of tests of the $i$th type; $C_0$ is the budget constraint; and $p_1$ and $p_2$ are user-defined positive constant weight coefficients.

where $\lambda^{E(Y)}$ and $\lambda^{V(Y)}$ are the desired lower bounds of the Sobol’ index for $E(Y)$ and $V(Y)$, respectively.

Equations (11) and (12) are both integer optimization problems since the decision variables $N_i$ ($i=1$ to $q$) are integers. Sometimes, integer optimization is solved using a relaxation approach [28], where the integer constraint is first relaxed, and the integers nearest to the resultant optimal solution are used as the solution of the original (unrelaxed) problem. Unfortunately, this approach is not applicable here because the synthetic data to be used in model calibration/validation can be generated only if $N_i$ ($i=1$ to $q$) are integers. *It is not possible to generate test data for a noninteger number of tests.*

### Solution Algorithm.

A simulated annealing algorithm [29] is used for the solution of Eqs. (11) and (12) because it can handle stochastic discrete optimization problems without requiring relaxation. For discrete optimization problems such as in Eqs. (11) and (12), this algorithm aims to minimize an objective function $f(s)$ where $s=\{s_1,\ldots,s_L\}$ is a vector of integers and its feasible region is $\Omega$. If the objective is to maximize $f(s)$ as shown in Eq. (11), $-f(s)$ is minimized instead.

As shown in Fig. 4, the simulated annealing algorithm starts from an initial value $s_0\in\Omega$. If $s$ is the current solution in an iteration, a new value $s'$ will be randomly selected within the neighborhood of $s$. This neighborhood, denoted as $\aleph(s)$, can be defined by different proposal density functions; this paper defines $\aleph(s)=[s_1\pm d_1,\ldots,s_L\pm d_L]\cap\Omega$ where $d_l$ is a user-defined positive integer for $l=1$ to $L$. In one iteration, if $f(s')<f(s)$, the new value $s'$ is accepted as the new current solution; otherwise, the probability to accept $s'$ is

where $T_0$ is the user-defined starting value of $T$, $k$ is the current iteration number, $K$ is the total number of iterations allowed, and $\alpha$ is a user-defined exponent that determines the rate of decrease of $T$. This iteration proceeds until the total allowed number of iterations $K$ is expended.
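The search described above can be sketched as follows. This is a minimal illustration and not the paper's implementation: it assumes the standard Metropolis acceptance probability $\exp(-(f(s')-f(s))/T)$ and a polynomial cooling schedule $T=T_0(1-k/K)^\alpha$, both of which are assumptions here. The objective `toy_objective` is a hypothetical deterministic stand-in for the negated Sobol’ index sum, so its optimum need not match the results reported in Sec. 5; the budget check uses unit costs of 4 and 1 with a budget of 16.

```python
import math
import random

def simulated_annealing(f, s0, feasible, d, T0=1.0, K=500, alpha=2.0, seed=0):
    """Minimize f over integer vectors; returns (best_s, best_f)."""
    rng = random.Random(seed)
    s, fs = list(s0), f(s0)
    best, fbest = list(s), fs
    for k in range(K):
        # Assumed cooling schedule; T stays positive for k < K.
        T = T0 * (1.0 - k / K) ** alpha
        # Propose a random integer point in the box s_l +/- d_l.
        cand = [sl + rng.randint(-dl, dl) for sl, dl in zip(s, d)]
        if not feasible(cand):
            continue  # proposals outside the feasible region are discarded
        fc = f(cand)
        # Always accept improvements; otherwise accept with the
        # (assumed) Metropolis probability exp(-(fc - fs) / T).
        if fc < fs or rng.random() < math.exp(-(fc - fs) / T):
            s, fs = cand, fc
            if fs < fbest:
                best, fbest = list(s), fs
    return best, fbest

# Hypothetical stand-in objective (negated so that minimizing it maximizes
# the bracketed quantity); this is NOT the paper's Sobol' index sum.
def toy_objective(s):
    n1, n2 = s
    return -(2.0 - 1.0 / n1 - 1.0 / n2)

def within_budget(s):
    n1, n2 = s
    return n1 >= 1 and n2 >= 1 and 4 * n1 + n2 <= 16

best, fbest = simulated_annealing(toy_objective, (1, 1), within_budget, d=(1, 2))
print(best, fbest)
```

Tracking the best visited point separately from the current point is a design choice of this sketch; in the stochastic setting of the paper, the objective at a given $s$ varies between evaluations, so repeated evaluation or averaging may be needed before declaring a point best.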

### Summary.

This section proposed formulations for test resource allocation optimization, considering two formats: (1) maximizing the Sobol’ index of the epistemic uncertainty in $\theta$ or $P_\theta$ subject to budget constraint and (2) minimizing the cost subject to the Sobol’ index threshold. Both formats are applicable to the cases of single or multiple specimens and single or multiple types of tests. As a result, the system response predictions become insensitive to the variability in test outcomes. The decision variables (numbers of tests) are discrete variables, and a simulated annealing algorithm is used to solve this discrete optimization. In this optimization, the Sobol’ index of the epistemic uncertainty in $\theta$ or $P_\theta$ is computed by the method discussed in Sec. 2.

## Numerical Examples

This section uses two examples to illustrate the proposed method. The first example is a mathematical problem and the second example is a structural dynamics problem. Regarding the types of tests, specimens, and calibration/validation, the first example considers: (1) multiple types of tests, (2) model calibration only, and (3) both the cases of single and multiple specimens. The second example considers: (1) multiple types of tests, (2) both model calibration and validation, and (3) single specimen only.

### Mathematical Example.

The inputs $X_1$ and $X_2$ are assumed to be independent random variables; the uncertainty regarding their values in tests is represented by uniform distributions $X_1\sim U(90,110)$ and $X_2\sim U(40,60)$, based on ranges obtained from the test personnel.

Two types of tests are available. Test type I measures $W_1$ with measurement error $\epsilon_1\sim N(0,50^2)$; and test type II measures $W_2$ with measurement error $\epsilon_2\sim N(0,40^2)$. The resultant synthetic data are pairwise data $\{X_1,W_1\}$ and $\{X_2,W_2\}$, respectively. Assume that the unit cost of type I test is 4 and the unit cost of type II test is 1.

Two cases are considered in this example: single test specimen versus multiple test specimens. In case 1 of single specimen, model parameter $\theta=\{\theta_1,\theta_2\}$ has true but unknown values to be calibrated. In case 2 of multiple specimens, $\{\theta_1,\theta_2\}$ follow normal distributions $N(\mu_{\theta_1},\sigma_{\theta_1}^2)$ and $N(\mu_{\theta_2},\sigma_{\theta_2}^2)$ across specimens, and the parameters to be calibrated are $P_\theta=\{\mu_{\theta_1},\sigma_{\theta_1},\mu_{\theta_2},\sigma_{\theta_2}\}$.

The process to realize the system response prediction $Y$, i.e., the framework of model calibration/validation with the synthetic data, is shown in Fig. 5, where the posterior distributions of calibration parameters together with the known distributions of $X_1$ and $X_2$ are propagated through the computational model in Eq. (15) to obtain the distribution of $Y$. Note that model validation is not considered in this example; only calibration is considered. The proposed test resource allocation approach can also handle model validation, as shown in the next numerical example.

#### Case 1: Single Test Specimen.

##### Optimization formulation 1.

where $N_1$ is the number of type I tests and $N_2$ is the number of type II tests. $N_1$ and $N_2$ are the decision variables, i.e., we need to decide the number of replications of each type of test.

The simulated annealing algorithm is used to solve Eq. (16), and Fig. 6 records the process of optimization. Figure 6(a) shows that the optimization starts at an initial design point $(N_1,N_2)=(1,1)$ and terminates at the optimal solution $(N_1,N_2)=(2,8)$. Figure 6(b) shows that only some of the random walks are accepted and the maximized Sobol’ index sum $S_\theta^{E(Y)}+S_\theta^{V(Y)}$ is 1.89. The feasible region in Fig. 6(a) covers the combinations of $N_1$ and $N_2$ such that $4N_1+N_2\le 16$. Note that (1) this feasible region is obtained by extra computation and (2) this feasible region is shown only to help in visualizing the result but is not needed in the optimization.
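The budget-feasible region for this case can be enumerated directly, as in the small sketch below. It uses the unit costs (4 and 1) and the budget (16) stated above; the lower bound of one test per type is an assumption suggested by the initial design point $(1,1)$, and the names `cost` and `feasible` are illustrative.

```python
C1, C2, BUDGET = 4, 1, 16  # unit costs and budget from the text

def cost(n1, n2):
    """Total cost of n1 type I tests and n2 type II tests."""
    return C1 * n1 + C2 * n2

# Assumption: at least one test of each type is conducted.
feasible = [(n1, n2)
            for n1 in range(1, BUDGET // C1 + 1)
            for n2 in range(1, BUDGET // C2 + 1)
            if cost(n1, n2) <= BUDGET]

print(len(feasible))             # 24 candidate allocations
print(cost(2, 8), cost(1, 12))   # 16 16
```

Both $(2,8)$ and $(1,12)$ exhaust the budget, so comparing them isolates the effect of the allocation rather than the amount spent.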

As discussed in Sec. 3.1, since the robustness objective $S_\theta^{E(Y)}+S_\theta^{V(Y)}$ is maximized, the optimal solution $(N_1,N_2)=(2,8)$ for Eq. (16) should lead to consistent system response prediction regardless of the true values of $\theta$. Three steps are pursued to verify this: (1) assume “true” values of $\theta$; (2) generate multiple sets of data with the size of $(N_1,N_2)=(2,8)$ based on the assumed value of $\theta$ from step 1; and (3) plot the family of system response prediction PDFs using the data sets in step 2 and observe whether they are consistent. Although the data are still synthetic, this simulates the system response prediction using actual test data, since the model parameters $\theta$ are fixed at the same value across different data sets; in contrast, in the synthetic data generation for test resource allocation shown in Fig. 2, the model parameters are fixed within a single data set but vary across different data sets. The results of this verification are shown in Fig. 7. Figure 7(a) indicates that $(N_1,N_2)=(2,8)$ leads to consistent system response predictions if the true values of model parameters are $\{\theta_1,\theta_2\}=\{4.9,9.5\}$; similarly, Figs. 7(b) and 7(c) show that consistent system response predictions are also obtained if $\{\theta_1,\theta_2\}=\{5.4,9.8\}$ or $\{\theta_1,\theta_2\}=\{5.0,10.5\}$.

As a comparison, Fig. 8 shows the same results as Fig. 7 but at a suboptimal solution of $(N_1,N_2)=(1,12)$. This suboptimal solution incurs the same cost as the optimal solution, but the enlarged variation across different PDFs in Fig. 8 indicates that it does not give predictions as consistent as those of the optimal solution. To quantify this conclusion, Table 1 compares the “variance of the variance of the prediction” $V(V(Y))$ at the optimal and suboptimal solutions. For all three assumed values of $\theta$, the optimal solution has a smaller value of $V(V(Y))$, which indicates that the optimal solution gives more consistent predictions.

$\theta$ | $\theta_1=4.9,\theta_2=9.5$ | $\theta_1=5.4,\theta_2=9.8$ | $\theta_1=5.0,\theta_2=10.5$ |
---|---|---|---|
Optimal solution of $(N_1,N_2)=(2,8)$ | $5.7\times 10^{3}$ | $6.2\times 10^{3}$ | $6.2\times 10^{3}$ |
Suboptimal solution of $(N_1,N_2)=(1,12)$ | $1.4\times 10^{4}$ | $1.3\times 10^{4}$ | $1.3\times 10^{4}$ |
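The consistency metric $V(V(Y))$ can be computed from the prediction samples obtained under each synthetic data set, as in the following minimal sketch. The function name and the toy numbers are hypothetical, and the use of the unbiased sample variance at both levels is an assumption; the values in the table are not reproduced by this toy input.

```python
from statistics import variance

def variance_of_variance(prediction_samples):
    """prediction_samples: one list of Y samples per synthetic data set.

    Returns the variance, across data sets, of the per-data-set
    prediction variance V(Y).
    """
    per_dataset_var = [variance(samples) for samples in prediction_samples]
    return variance(per_dataset_var)

# Toy input: three data sets whose prediction variances are 1, 4, and 0.
toy_predictions = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [0.0, 0.0, 0.0],
]
vv = variance_of_variance(toy_predictions)
print(vv)
```

A smaller value means that the spread of the predicted distribution changes less from one test outcome to another, which is the sense of consistency used in the comparison above.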

##### Optimization formulation 2.

The simulated annealing algorithm is used to solve Eq. (17), and Fig. 9 records the process of optimization. Figure 9(a) shows that the optimization starts at an initial design point $(N_1,N_2)=(8,8)$ and terminates at the optimal solution $(N_1,N_2)=(3,7)$. Figure 9(b) shows that only some of the random walks are accepted and the minimized cost is 19. The feasible region in Fig. 9(a) covers the combinations of $N_1$ and $N_2$ such that $S_\theta^{E(Y)}\ge 0.95$ and $S_\theta^{V(Y)}\ge 0.95$. Similar to Fig. 6, note that (1) this feasible region is obtained by extra computation and (2) this feasible region is shown only to help in visualizing the result but is not needed in the optimization.

As discussed in Sec. 3.1, since the robustness constraints $S_\theta^{E(Y)}\ge 0.95$ and $S_\theta^{V(Y)}\ge 0.95$ are satisfied, the optimal solution $(N_1,N_2)=(3,7)$ for Eq. (17) should lead to consistent system response prediction regardless of the true values of $\theta$. The same three steps as for Fig. 7 are pursued to verify this. The results of this verification are shown in Fig. 10. Figure 10(a) indicates that $(N_1,N_2)=(3,7)$ leads to consistent system response predictions if the true values of model parameters are $\{\theta_1,\theta_2\}=\{5.7,10.5\}$; similarly, Figs. 10(b) and 10(c) show that consistent system response predictions are also obtained if $\{\theta_1,\theta_2\}=\{5.2,9.1\}$ or $\{\theta_1,\theta_2\}=\{4.6,10.8\}$.

#### Case 2: Multiple Test Specimens.

In this case, model parameters $P_\theta=\{\mu_{\theta_1},\sigma_{\theta_1},\mu_{\theta_2},\sigma_{\theta_2}\}$ have unknown deterministic values, and uniform prior distributions $\mu_{\theta_1}\sim U(4,6)$, $\sigma_{\theta_1}\sim U(0.2,1)$, $\mu_{\theta_2}\sim U(8,10)$, and $\sigma_{\theta_2}\sim U(0.8,1.5)$ are assumed for them. The two optimization formulations in Eqs. (11) and (12) are also applied to this case. The unit cost of type I test is 4 and the unit cost of type II test is 1.
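The two-level sampling implied by this case (first the distribution parameters, then the specimen-level parameters) can be sketched as follows. The prior bounds are those stated above; the function names and the choice of drawing one $\{\theta_1,\theta_2\}$ pair per specimen are illustrative assumptions, not the paper's code.

```python
import random

# Uniform prior bounds for the distribution parameters, as stated in the text.
PRIORS = {
    "mu_theta1": (4.0, 6.0),
    "sigma_theta1": (0.2, 1.0),
    "mu_theta2": (8.0, 10.0),
    "sigma_theta2": (0.8, 1.5),
}

def sample_p_theta(rng):
    """Draw one realization of the distribution parameters from the priors."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PRIORS.items()}

def sample_specimens(p_theta, n_specimens, rng):
    """Draw (theta1, theta2) for each specimen from the normal distributions
    defined by one realization of the distribution parameters."""
    return [
        (rng.gauss(p_theta["mu_theta1"], p_theta["sigma_theta1"]),
         rng.gauss(p_theta["mu_theta2"], p_theta["sigma_theta2"]))
        for _ in range(n_specimens)
    ]

rng = random.Random(0)
p_theta = sample_p_theta(rng)
specimens = sample_specimens(p_theta, n_specimens=5, rng=rng)
print(p_theta)
print(specimens)
```

Within one synthetic data set, the distribution parameters stay fixed while $\theta$ varies from specimen to specimen; across data sets, the distribution parameters themselves are redrawn from the priors.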

##### Optimization formulation 1.

The simulated annealing algorithm is used to solve Eq. (18), and Fig. 11 records the process of optimization. Figure 11(a) shows that the optimization starts at an initial design point $(N_1,N_2)=(5,5)$ and terminates at the optimal solution $(N_1,N_2)=(5,13)$. Figure 11(b) shows that only some of the random walks are accepted and the maximized Sobol’ index sum $S_{P_\theta}^{E(Y)}+S_{P_\theta}^{V(Y)}$ is 1.92.

As discussed in Sec. 3.1, since the robustness objective $S_{P_\theta}^{E(Y)}+S_{P_\theta}^{V(Y)}$ is maximized, the optimal solution $(N_1,N_2)=(5,13)$ for Eq. (18) should lead to consistent system response prediction regardless of the true values of $P_\theta$. The results of this verification are shown in Fig. 12.

As a comparison, Fig. 13 shows the same results as in Fig. 12 but at a suboptimal solution of $(N_1,N_2)=(4,17)$. This suboptimal solution incurs the same cost as the optimal solution, but the enlarged variation across different PDFs in Fig. 13 indicates that it does not give predictions as consistent as those of the optimal solution. To quantify this conclusion, Table 2 compares $V(V(Y))$ at the optimal and suboptimal solutions. For all three assumed values of $P_\theta$, the optimal solution has a smaller value of $V(V(Y))$, which indicates that the optimal solution gives more consistent predictions.

$P_\theta$ | $\{4.2,0.9,8.3,1.1\}$ | $\{5.8,0.4,9.1,0.9\}$ | $\{4.7,0.6,9.6,1.2\}$ |
---|---|---|---|
Optimal solution of $(N_1,N_2)=(5,13)$ | $4.1\times 10^{2}$ | $3.7\times 10^{2}$ | $8.1\times 10^{2}$ |
Suboptimal solution of $(N_1,N_2)=(4,17)$ | $3.4\times 10^{3}$ | $2.7\times 10^{3}$ | $3.1\times 10^{3}$ |

##### Optimization formulation 2.

The simulated annealing algorithm is used to solve Eq. (19), and Fig. 14 records the process of optimization. Figure 14(a) shows that the optimization starts at an initial design point $(N_1,N_2)=(12,12)$ and terminates at the optimal solution $(N_1,N_2)=(5,10)$. Figure 14(b) shows that only some of the random walks are accepted and the minimized cost is 30.

As discussed in Sec. 3.1, since the robustness constraints $S_{P_\theta}^{E(Y)}\ge 0.95$ and $S_{P_\theta}^{V(Y)}\ge 0.95$ are satisfied, the optimal solution $(N_1,N_2)=(5,10)$ for Eq. (19) should lead to consistent system response prediction regardless of the true values of $P_\theta$. The results of this verification are shown in Fig. 15.

### Multilevel Problem.

The second numerical example is a multilevel structural dynamics challenge problem provided by Sandia National Laboratories [30]. In this example, we have four types of tests and a single specimen, as explained in Sec. 3.3. As shown in Fig. 16, this multilevel problem consists of three levels. Tests are available at level 1 and level 2, and it is required to predict the system response in level 3.

*Level 1*: The three mass-spring-damper components are connected in series (Fig. 16(a)), and a sinusoidal force input $P=300\sin(500t)$ is applied to $m_1$. The observable quantity is the maximum acceleration $A_3^{L1}$ at the top mass and the measurement error is $\epsilon_1\sim N(0,100^2)$. The computational model for $A_3^{L1}$ can be found in structural dynamics textbooks [31]; thus, synthetic data of $A_3^{L1}$ can be generated.

*Level 2*: The mass-spring-damper system is mounted on a beam supported by a hinge at one end and a spring at the other end (Fig. 16(b)), and a sinusoidal force input $P=3000\sin(350t)$ is applied on the beam. The observable quantity is the maximum acceleration $A_3^{L2}$ at the top mass and the measurement error is $\epsilon_2\sim N(0,400^2)$. The computational model for $A_3^{L2}$ based on finite element analysis is provided by Sandia National Laboratories [30]; thus, synthetic data of $A_3^{L2}$ can be generated. Level 1 and level 2 are defined as lower levels, and test data are assumed to be available only at lower levels.

*Level 3*: This has the same configuration as level 2, but the input is a random process loading (indicating a difference in usage condition), as shown in Fig. 16(c). Level 3 is the prediction configuration of interest, and the response to be predicted is the maximum acceleration $A_3^{L3}$ at the top mass at level 3. No test data are available at level 3. The computational models for $A_3^{L3}$ are also provided by Sandia National Laboratories [30].

All three levels have the same model parameters, i.e., the three spring stiffnesses $k=\{k_1,k_2,k_3\}$. This example assumes the case of single test specimen; thus, $k$ are the parameters to be calibrated. They are assumed to be deterministic but unknown, with independent prior distributions $k_1\sim N(5000,500^2)$, $k_2\sim N(10000,1000^2)$, and $k_3\sim N(9000,900^2)$.

Four types of tests are available in this example:

- (1) Type I test measures $A_3^{L1}$ and the resultant data set $D_1^C$ is used in model calibration;
- (2) Type II test measures $A_3^{L1}$ but the resultant data set $D_1^V$ is used in model validation;
- (3) Type III test measures $A_3^{L2}$ and the resultant data set $D_2^C$ is used in model calibration;
- (4) Type IV test measures $A_3^{L2}$ but the resultant data set $D_2^V$ is used in model validation.

The unit costs of these four types of tests are denoted as $C_i$ ($i=1$ to $4$), respectively, and the number of each type of test is denoted as $N_i$ ($i=1$ to $4$), respectively.

The key step to predict $A_3^{L3}$ is to estimate the values of the model parameters $k=\{k_1,k_2,k_3\}$. A reasonable route is to quantify the model parameters $k$ using lower level calibration data of $A_3^{L1}$ and $A_3^{L2}$, and propagate the results through the computational model at the system level. However, either $A_3^{L1}$ or $A_3^{L2}$ can be used to calibrate the same model parameters; thus, three calibration options are possible: (1) calibration using the data on $A_3^{L1}$ alone; (2) calibration using the data on $A_3^{L2}$ alone; and (3) calibration using the data on both $A_3^{L1}$ and $A_3^{L2}$. The challenge in such a multilevel problem is how to select from or combine these alternative calibration results. This paper uses the roll-up method developed in Refs. [5] and [32] to address this challenge. This roll-up method uses Bayesian model averaging of various calibration results, and the weights for the averaging are obtained from model validation in each lower level. Thus, the framework of model calibration/validation for system response prediction considers both model calibration and validation. A brief introduction of this framework is given here:

- (1) Model calibration by Bayesian inference to obtain the posterior distributions $\pi(k|D_1^C)$, $\pi(k|D_2^C)$, and $\pi(k|D_1^C,D_2^C)$, respectively.
- (2) Model validation at lower levels using the model reliability metric [5,33]. The resultant model validity at level 1 and level 2 is denoted as $P(G_1)$ and $P(G_2)$, respectively.
- (3) Obtain the integrated distribution $\pi(k|D_1^{C,V},D_2^{C,V})$ by the roll-up formula [5,32,34]:

  $$\pi(k|D_1^{C,V},D_2^{C,V})=P(G_1)P(G_2)\,\pi(k|D_1^C,D_2^C)+P(G_1')P(G_2)\,\pi(k|D_2^C)+P(G_1)P(G_2')\,\pi(k|D_1^C)+P(G_1')P(G_2')\,\pi(k) \tag{20}$$

  where $P(G_1')=1-P(G_1)$ and $P(G_2')=1-P(G_2)$, and $\pi(k)$ denotes the prior distribution of $k$. In Eq. (20), the integrated distribution $\pi(k|D_1^{C,V},D_2^{C,V})$ is a weighted average of four terms: in the first term, the posterior distribution $\pi(k|D_1^C,D_2^C)$ uses the calibration data of both level 1 and level 2 and its weight $P(G_1)P(G_2)$ is the probability that both models are valid; in the second and third terms, the posterior distribution $\pi(k|D_i^C)$ uses the calibration data at level $i$ alone and its weight is the probability that the model at level $i$ is valid but the model at the other level is invalid; in the last term, the weight $P(G_1')P(G_2')$ of the prior distribution $\pi(k)$ is the probability that both of the models are invalid. Recently, a more comprehensive approach incorporating the relevance between lower levels and level 3 has been developed in Ref. [6]; the proposed method in this paper is also applicable to this new approach.
- (4) Propagate $\pi(k|D_1^{C,V},D_2^{C,V})$ through the computational model of $A_3^{L3}$ to predict the distribution of $A_3^{L3}$.
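The weighting in Eq. (20) can be illustrated with a short sketch that computes the four mixture weights and draws samples from the integrated distribution. The function names, the dictionary keys, and the placeholder samplers are hypothetical; in practice each sampler would draw $k$ from the corresponding posterior or prior obtained in steps (1) and (2).

```python
import random

def rollup_weights(p_g1, p_g2):
    """Weights of the four terms in the roll-up formula, Eq. (20)."""
    return {
        "both_levels": p_g1 * p_g2,                # pi(k | D1C, D2C)
        "level2_only": (1.0 - p_g1) * p_g2,        # pi(k | D2C)
        "level1_only": p_g1 * (1.0 - p_g2),        # pi(k | D1C)
        "prior": (1.0 - p_g1) * (1.0 - p_g2),      # pi(k)
    }

def sample_integrated(samplers, weights, n, rng):
    """Draw n samples from the weighted mixture of the four distributions.

    samplers: dict with the same keys as `weights`; each value is a
              function(rng) returning one draw of k from that distribution.
    """
    names = list(weights)
    picks = rng.choices(names, weights=[weights[name] for name in names], k=n)
    return [samplers[name](rng) for name in picks]

weights = rollup_weights(0.9, 0.8)
print(weights)

# Placeholder samplers (hypothetical): each returns a label instead of a
# stiffness vector, only to show how mixture components are selected.
placeholder = {name: (lambda rng, name=name: name) for name in weights}
draws = sample_integrated(placeholder, weights, n=1000, rng=random.Random(0))
print(draws.count("both_levels") / len(draws))
```

The four weights always sum to one, so the integrated distribution is a proper mixture; as model validity at a level decreases, more weight shifts to the terms that do not use that level's calibration data.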

Since the computational models and measurement errors are known, synthetic data of the four types of test can be generated. With the framework of model calibration/validation also known, the proposed approach of test resource allocation is used to optimize the number of each type of test.

#### Optimization Formulation 1.

The simulated annealing algorithm is used to solve Eq. (21). The initial value is $N_1=N_2=N_3=N_4=3$. Among 500 iterations, the random walks of 226 iterations are accepted. Figure 17 shows the change of index sum over the iterations and the maximized index sum at the optimal solution is 1.88. The final optimal solution is $N_1=11$, $N_2=9$, $N_3=6$, $N_4=2$.

As discussed in Sec. 3.1, since the robustness objective $S_\theta^{E(Y)}+S_\theta^{V(Y)}$ is maximized, the optimal solution $(N_1,N_2,N_3,N_4)=(11,9,6,2)$ should result in consistent system response predictions regardless of the true value of model parameters $k$. Similar to the mathematical example in Sec. 5.1, verification of this multilevel test allocation result is shown in Fig. 18. Figure 18 indicates that consistent system response predictions are obtained with three different assumed true values of the model parameters.

#### Optimization Formulation 2.

The simulated annealing algorithm is used to solve Eq. (22). The initial value is $N_1=N_2=N_3=N_4=15$. Among 500 iterations, the random walks of 164 iterations are accepted. Figure 19 shows the change of cost over the iterations and the minimized cost at the optimal solution is 66. The final optimal solution is $N_1=11$, $N_2=10$, $N_3=6$, $N_4=3$.

As discussed in Sec. 3.1, since the robustness constraints $S_\theta^{E(Y)}\ge 0.95$ and $S_\theta^{V(Y)}\ge 0.95$ are satisfied, the optimal solution $(N_1,N_2,N_3,N_4)=(11,10,6,3)$ should lead to consistent system response prediction regardless of the true value of model parameters $k$. Similar to the mathematical example in Sec. 5.1, verification of this multilevel test allocation result is shown in Fig. 20. Figure 20 indicates that consistent system response predictions are obtained with three different assumed true values of the model parameters.

## Summary

Test resource allocation aims to optimize the number of each type of test before any actual test is conducted. This paper focuses on the proposed robust test resource allocation, which means that the system response prediction is insensitive to the variability in the test outcomes so that consistent system response predictions can be achieved under different test outcomes.

The main challenge for the proposed approach is to quantify the contribution of test outcome uncertainty toward the uncertainty in the system response prediction. Since test resource allocation is needed before any actual test, this test outcome uncertainty is simulated by the uncertainty in the synthetic data. This paper analyzes the uncertainty sources in the synthetic data regarding different test conditions and concludes that consistent system response predictions will be achieved if the contribution of epistemic uncertainty regarding model parameters in the synthetic data can be maximized. This paper uses Sobol’ indices from global sensitivity analysis to assess this contribution, so that the desired consistent system response predictions can be guaranteed regardless of the true values of the parameters in the actual tests ($\theta$ for a single specimen and $P_\theta$ for multiple specimens).

Two cases of optimization are considered in this paper: (1) subject to the budget constraint, optimize the number of each type of test to reach the most robust design or (2) subject to the robustness requirement, find the number of each type of test to minimize the budget. In addition, the proposed approach can be applied in multiple situations: (1) only model calibration tests are performed or (2) both model calibration and model validation tests are performed. The method can also be applied to tests involving single or multiple specimens. The proposed method results in a discrete stochastic optimization problem, and a simulated annealing algorithm is used to solve this problem.

This paper assumes that the test inputs are from a range of values and represents the uncertainty regarding the test inputs through uniform distributions. Note that this paper is only focused on choosing the number of experiments after the available physical tests are identified. To answer the question of how to choose the physical tests, several factors should be considered, in particular the relevance and sensitivity of the experiments to the calibration quantity of interest. The assessment of relevance and sensitivity addressed in Ref. [1] may be useful in identifying the useful physical test configurations. This paper only addresses the variability of the test data and the optimization of the number of each type of test, so that consistent predictions can be obtained under different test data outcomes.

This paper assumes that the quantity of interest to predict is a scalar, so we can easily use variance as its uncertainty indicator; thus, the variance-based Sobol’ index can be easily used. If the quantity of interest is a vector, another indicator instead of variance may be needed, and the corresponding sensitivity index is also required. Thus, further work is needed to extend the proposed method to vector and field outputs.

Another direction for further work is regarding test design. The context of the proposed method is during the stage of budget planning, and usually at this stage, details of the test design are not known or considered. Thus, this paper only focuses on optimizing the number of each type of the test. The extension of the proposed approach to include test design, i.e., deciding the specific test conditions, can be studied in future work such that the resultant system response prediction uncertainty can be further reduced. This can be addressed in two ways: (1) by simultaneously optimizing the number of tests and the test inputs or (2) by adaptively deciding the number of tests and their input conditions based on the observation data as the test campaign progresses.

## Acknowledgment

The authors appreciate valuable discussions with Joshua Mullins from Sandia National Laboratories.

## Funding Data

Sandia National Laboratories (Contract No. BG-7732).