CHOICE OF EXPLANATORY VARIABLES: A MULTIOBJECTIVE OPTIMIZATION APPROACH

Danyang Liu, Sirish L. Shah and D. Grant Fisher
Department of Chemical and Materials Engineering, University of Alberta, Edmonton, Canada T6G 2G6

14th Triennial World Congress of IFAC, Beijing, P.R. China, 1999

Abstract: In process control applications it is often necessary to derive a suitable process model from input/output data using multivariable regression analysis. However, problems frequently arise in selecting an appropriate set of independent (explanatory) variables and/or in handling problems caused by correlation between the measurements. The multiobjective function based approach presented in this paper addresses both problems and is shown to produce excellent results in practical applications. The proposed method minimizes a user-specified combination of the following three criteria: 1) the data matrix related residue, 2) the observation or measurement related residue, and 3) the condition number of the new data matrix of the selected explanatory variables. Four different algorithms for solving the resulting multiobjective optimization are compared. The model structure determined by applying the proposed method to a simulated process identification problem was identical to the actual model structure. A second successful application involved model structure determination for a 2 x 2 pilot plant process based on experimental input/output data. Copyright (c) 1999 IFAC

Keywords: Multivariate analysis; multiple linear regression; modelling; system identification

1. Introduction

One of the most difficult problems in regression analysis is the selection of a set of independent or explanatory variables to be included in the model (Neter and Wasserman, 1974). This is a problem of general interest in multiple linear and principal component regression. Suppose the original regression model is

    y(k)^T = h(k)^T Theta + e(k)^T    (1)

where y(k) in R^p is the p-dimensional dependent variable, h(k) in R^n is an n-vector containing n independent variables, Theta in R^{n x p} is the parameter matrix, and e(k) is the random error. If y(k) and h(k) are measured m times, then from (1) it follows that

    Y = X Theta + E    (2)

where Theta is the matrix to be estimated and E is a random error matrix with zero mean.

Each column of the data matrix X corresponds to a series of measurements of an independent variable in h(k). In model (1) there are n such independent variables, also called explanatory variables. If X is of full column rank, the least squares estimate of Theta is

    Theta_hat = (X^T X)^{-1} X^T Y    (3)

However, in many applications the set of n explanatory variables may contain variables that have very little effect on the prediction of Y. In addition, approximate linear dependence may also exist amongst the n explanatory variables, i.e. some of the explanatory variables are redundant. In such cases, the data matrix X may be seriously ill-conditioned or even close to rank deficient, and hence the estimated parameter matrix is very sensitive to measurement errors. This will usually lead to a poor model, and hence the prediction error given by the model may be unacceptable.

One natural way to solve this problem is to select a subset of explanatory variables from the original variable set. This is equivalent to finding a new data matrix X_r whose columns are selected from the columns of the original data matrix X. This paper mainly focuses on this explanatory variable selection problem. Another way to solve this problem is to select the best set of linear combinations of the original explanatory variables; that problem has also been considered by Liu et al. (1998).

Copyright 1999 IFAC
ISBN: 0 08 043248 4
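The sensitivity to ill-conditioning described above is easy to demonstrate numerically. The following sketch (an illustration, not from the paper; the data and all variable names are invented) builds a data matrix with one nearly redundant column and computes the least squares estimate of Equation (3):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: m = 100 measurements of n = 3 explanatory variables,
# where the third column is almost a copy of the second (near collinearity).
m = 100
x1 = rng.standard_normal(m)
x2 = rng.standard_normal(m)
x3 = x2 + 1e-4 * rng.standard_normal(m)   # nearly redundant variable
X = np.column_stack([x1, x2, x3])

theta_true = np.array([1.0, 2.0, 0.0])
Y = X @ theta_true + 0.1 * rng.standard_normal(m)

# Least squares estimate, Eq. (3): theta_hat = (X^T X)^{-1} X^T Y
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# A seriously ill-conditioned data matrix makes theta_hat very sensitive
# to small perturbations in the measurements.
Y_perturbed = Y + 1e-3 * rng.standard_normal(m)
theta_hat_perturbed = np.linalg.solve(X.T @ X, X.T @ Y_perturbed)

print("cond(X)   =", np.linalg.cond(X))
print("theta_hat =", theta_hat)
print("shift     =", np.linalg.norm(theta_hat - theta_hat_perturbed))
```

With the redundant third column the condition number is on the order of 10^4, and a tiny perturbation of the observations can shift the estimated coefficients by an amount far larger than the perturbation itself.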

The literature on the explanatory variable selection problem is vast, and the reader is referred to Hotelling (1957) for a review of early works, Van Huffel and Vandewalle (1987) for a total least squares approach based algorithm, and Golub and Van Loan (1996) for an algorithm based on the singular value decomposition and QR factorization methods. Some other approaches are described in Neter and Wasserman (1974). A different way to pose the explanatory variable selection problem is via principal component analysis of the X and Y blocks separately. Alternatively, one can carry out PLS (projection onto latent structures) analysis on the data, as first proposed by Wold (1966). This paper presents a multiobjective optimization approach to solve the explanatory variable selection problem. The basic idea behind this approach is given in Section 2, where three criteria are chosen to formulate the explanatory variable selection problem. Algorithms to solve the resultant multiobjective optimization problem are presented in Section 3. Section 4 provides a brief discussion on how to deal with the multiple output observation case, i.e. the case where y(k) in model (1) is a p-dimensional vector and p > 1. In Section 5, two examples are given to show how the proposed method can be used to determine an appropriate structure for the dynamic model identification of a process. Finally, concluding remarks are given in Section 6.

2. Problem formulation

The explanatory variable selection problem described above arises in many practical applications. For example, in system identification, the output y(k) of a dynamic process at sampling instant k can be expressed approximately as a linear function of the previous process inputs u(k-d), u(k-d-1), ..., u(k-m) and outputs y(k-1), y(k-2), ..., y(k-n), where d >= 1 is called the delay and the pair (m, n) represents the order of the process. Finding suitable values for d, m, and n is an explanatory variable selection problem.

Another relevant problem is the sensor selection problem. In a practical chemical process, for example, there are usually a large number of sensors and hence a large number of measurements are available. Typically these measurements have strong collinearity among them. The problem of selecting those sensors whose measurements are most useful for a specific purpose can be formulated as an explanatory variable selection problem.

Let X_r in R^{m x r} be an m by r matrix whose columns are selected from X, and let R(X_r) denote the range of X_r, i.e. the space spanned by all columns of X_r. Then the orthogonal projection of X on R(X_r) is

    X_r X_r^+ X

where X_r^+ is the Moore-Penrose inverse of X_r (Ben-Israel and Greville, 1974). This projection is the part of the original data matrix X that is explained by X_r. The unexplained part is the residue X - X_r X_r^+ X, which cannot be explained by X_r because it is orthogonal to R(X_r). The original data matrix X is thus divided into two parts:

    X = X_r X_r^+ X + (X - X_r X_r^+ X)    (4)

Similarly, the observation matrix Y can also be divided into two parts:

    Y = X_r X_r^+ Y + (Y - X_r X_r^+ Y)    (5)

The first part, X_r X_r^+ Y, is the orthogonal projection of Y on R(X_r) and is explained by X_r. The second part, Y - X_r X_r^+ Y, is the residue that is orthogonal to R(X_r) and hence cannot be explained by X_r. Therefore, for a given number r, the choice of X_r should be such that the two criteria

    f1 = ||X - X_r X_r^+ X||^2    (6)
    f2 = ||Y - X_r X_r^+ Y||^2    (7)

are as small as possible, where ||.|| can be any user-specified matrix norm; in this paper it is restricted to the Frobenius norm. In other words, for a specific value of r, the choice of X_r should minimize the residues in the X- and Y-spaces. From (4) and (5) and the orthogonality between the projection and the residue it follows that

    ||X||^2 = ||X_r X_r^+ X||^2 + f1    (8)
    ||Y||^2 = ||X_r X_r^+ Y||^2 + f2    (9)

Therefore, minimizing f1 and f2 is equivalent to maximizing ||X_r X_r^+ X||^2 and ||X_r X_r^+ Y||^2, and hence the problem of residue minimization and the problem of projection maximization have the same solutions.
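The residues f1 and f2 of Equations (6)-(7) and the orthogonality identities (8)-(9) can be checked numerically. A minimal sketch, on randomly generated data with an arbitrary column selection (both invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
Y = X @ rng.standard_normal((5, 2)) + 0.1 * rng.standard_normal((50, 2))

cols = [0, 2, 4]                      # a hypothetical selection of r = 3 columns
Xr = X[:, cols]
P = Xr @ np.linalg.pinv(Xr)           # orthogonal projector onto R(Xr)

f1 = np.linalg.norm(X - P @ X, 'fro') ** 2   # Eq. (6): residue in the X-space
f2 = np.linalg.norm(Y - P @ Y, 'fro') ** 2   # Eq. (7): residue in the Y-space

# Orthogonality of projection and residue gives Eqs. (8)-(9):
# ||X||^2 = ||Xr Xr^+ X||^2 + f1, and similarly for Y.
assert np.isclose(np.linalg.norm(X, 'fro')**2, np.linalg.norm(P @ X, 'fro')**2 + f1)
assert np.isclose(np.linalg.norm(Y, 'fro')**2, np.linalg.norm(P @ Y, 'fro')**2 + f2)
```

The projector P = Xr Xr^+ is symmetric and idempotent, which is exactly why the Pythagorean split in (8)-(9) holds.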



Equations (8) and (9) also suggest that the two numbers

    g1 = ||X_r X_r^+ X||^2 / ||X||^2 x 100%    (10)
    g2 = ||X_r X_r^+ Y||^2 / ||Y||^2 x 100%    (11)

can be used to measure or quantify the percentages of X and Y that are explained by X_r. Given the number r, the solution to this problem should be the choice of X_r that will maximize these two percentages.

As has been pointed out in the introduction, one of the original concerns in the regression problem is the ill-conditioning of the data matrix X. Therefore, another criterion that should be taken into account when selecting X_r is the condition number

    f3 = cond(X_r)    (12)

Minimization of the condition number (12) guarantees good numerical behavior of the selected data matrix X_r. The following measure

    g3 = (cond(X) - cond(X_r)) / cond(X) x 100%

reflects the improvement of the numerical condition of the selected new data matrix X_r.

Thus, the explanatory variable or column selection problem can be formulated as a multiobjective optimization problem of the form

    min (f1, f2, f3)  s.t.  X_r ⊂ X    (13)

or equivalently, in terms of column sets,

    min (f1, f2, f3)  s.t.  {columns of X_r} ⊂ {columns of X}    (14)

where in (14) the constraint set on the right represents the set of all columns of the matrix X. The basic idea behind this problem formulation can be summarized as follows: the choice of explanatory variables should be such that the resultant new data matrix X_r (smaller in size than X) should

1. be a good representation of the original data matrix X, i.e., f1 should be small;
2. provide a good prediction of the dependent variable y(k), i.e., f2 should be small; and
3. be well conditioned numerically, i.e., f3 should be small.

Unlike other existing methods for solving the explanatory variable selection problem, the main feature of the approach developed in this paper is the use of multiple model selection criteria. The solution to the explanatory variable selection problem is obtained by solving the resultant multiobjective optimization problem. It is worth noting that the use of the third criterion cond(X_r) is important, as it guarantees that explanatory variables that can be approximately expressed as linear combinations of the other explanatory variables will not be selected.

Using a similar approach, the problem of choosing the best set of latent explanatory variables that are linear combinations of the original explanatory variables is considered in Liu et al. (1998). The resultant multiobjective optimization problem is of the same form as (13) or (14) except that the constraint X_r ⊂ X is replaced by X_r = XL, where L, whose columns are coefficients of the linear combinations, is the transformation matrix to be optimized.

Various techniques (see, e.g., Grace (1994), Cohon (1978), Zionts (1985)) can be used to solve this multiobjective optimization problem. Some of these techniques are discussed in the next section with the objective of solving problem (13).

3. Multiobjective Optimization Algorithms

3.1. Algorithm 1

One of the frequently used methods of solving a multiobjective optimization problem such as (13) is to specify a weight for each of the three criteria and form a single objective function that is the weighted sum of these criteria, i.e.,

    min f = w1 f1 + w2 f2 + w3 f3  s.t.  X_r ⊂ X    (15)

In order to solve (15), the number r should be specified. One way to find an appropriate value for r is to solve (15) and find the minimum value of f(r) for each r = r0, r0 + 1, ..., n, where r0 is slightly less than the initial guess of the appropriate value. As r increases, f1 and f2 decrease and f3 increases. The appropriate choice of r should be such that f(r) is minimum, or such that the rate of decrease in f(r) as r increases is insignificant. In the following discussion it is assumed that the appropriate value of r has already been obtained.
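For modest n, problem (15) can be solved by exhaustive search over all r-column subsets, scoring each by the weighted sum w1 f1 + w2 f2 + w3 f3. The following is an illustrative sketch of this idea (the data, the weights, and the function name select_columns are assumptions for the example, not from the paper):

```python
import numpy as np
from itertools import combinations

def select_columns(X, Y, r, w=(1.0, 1.0, 1.0)):
    """Exhaustive weighted-sum search in the spirit of Algorithm 1:
    find the r columns of X minimizing w1*f1 + w2*f2 + w3*f3."""
    w1, w2, w3 = w
    best, best_cols = np.inf, None
    for cols in combinations(range(X.shape[1]), r):
        Xr = X[:, cols]
        P = Xr @ np.linalg.pinv(Xr)                   # projector onto R(Xr)
        f1 = np.linalg.norm(X - P @ X, 'fro') ** 2    # Eq. (6)
        f2 = np.linalg.norm(Y - P @ Y, 'fro') ** 2    # Eq. (7)
        f3 = np.linalg.cond(Xr)                       # Eq. (12)
        f = w1 * f1 + w2 * f2 + w3 * f3
        if f < best:
            best, best_cols = f, cols
    return best_cols, best

# Small demonstration: Y depends only on columns 0 and 3 of X.
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 6))
Y = X[:, [0, 3]] @ np.array([[1.0], [2.0]]) + 0.01 * rng.standard_normal((200, 1))
cols, _ = select_columns(X, Y, r=2, w=(1.0, 1000.0, 1.0))
print(cols)
```

With f2 weighted heavily (as in the paper's first example, where w2 = 1000), the search recovers the two truly informative columns. Note the cost grows combinatorially in n, which is why the lexicographic and constrained variants of the next subsections can be attractive.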

3.2. Algorithm 2

Another way to solve the multiobjective optimization problem is to use lexicographic preference (Fishburn, 1974). Suppose (i, j, k) is a permutation of (1, 2, 3). If we assume that it is overwhelmingly more important to make fi small than to make fj and fk small, we can first solve the problem

    min fi  s.t.  X_r ⊂ X    (16)

Now suppose that making fj small is overwhelmingly more important than making fk small. We solve the second optimization problem

    min fj  s.t.  X_r is a solution to (16)    (17)

Finally, the solution to the problem

    min fk  s.t.  X_r is a solution to (17)

is the optimal solution we want. For simplicity, we denote the above procedure by

    min (fi, fj, fk)  s.t.  X_r ⊂ X    (18)

Because the permutation (i, j, k) of (1, 2, 3) has six results,

    (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)    (19)

Algorithm 2 has six combinatorial options.

3.3. Algorithm 3

Suppose (i, j, k) are as in Algorithm 2, and assume that making fi small is overwhelmingly more important than making fj small, and that fk should not be larger than c_k. Then, using the notation convention in (18), the optimal X_r is a solution to the problem

    min (fi, fj)  s.t.  fk <= c_k,  X_r ⊂ X

Like Algorithm 2, this algorithm also has six variations due to the six possible permutations (i, j, k) shown in (19).

3.4. Algorithm 4

If two of the criteria, say fj and fk, are subject to the hard constraints fj <= c_j and fk <= c_k, then the optimal X_r can be found by solving the problem

    min fi  s.t.  fj <= c_j,  fk <= c_k,  X_r ⊂ X

This algorithm has three possible variations because the subscript i has three possible choices, i.e. 1, 2, and 3.

4. A remark on the case p > 1

In the above discussion Y was considered to be a matrix with p < n columns and no other restrictions were imposed on p. This is equivalent to the assumption that the dependent variables contained in y(k) (see Equation (1)) have the same explanatory variables. However, it is easy to show that this assumption is not always valid. For example, suppose X and Y are such that it is required that two columns of X be selected to explain Y. Clearly, in order to explain the first column of Y, the best choice may be the first and second columns of X, while in order to explain the second column of Y, the best choice may be the second and third columns of X. In each case the corresponding column of Y can be explained completely. However, it is impossible to find two columns of X that can simultaneously and completely explain both columns of Y. Therefore, when using the above algorithms, unless it is required that the same selection of explanatory variables be used to explain all dependent variables, it is necessary to consider each of the columns of Y separately.

The above remark can be applied to the problem of multivariable process identification. If the process to be identified is a multi-input multi-output (MIMO) process and the number of outputs is p, then the same number p of multi-input single-output (MISO) models should be used. In other words, a MIMO process with p outputs should be treated as a set of p MISO subprocesses. This remark is supported by the example given in the next section.

5. Examples

5.1. A SISO process identification problem

Suppose the process to be identified is a SISO plant with a delay of 3, described by

    y(k) = 0.6 y(k-1) + 0.8 u(k-3) + w(k)    (20)

where w(k) is Gaussian white noise with zero mean and unit variance, and the input u(k) is generated by a measurable white noise sequence v(k)




Table 1. Selected explanatory variables.

    r    Indices of selected explanatory variables
    1    1
    2    1, 7
    3    1, 4, 7
    4    1, 2, 4, 7
    5    1, 2, 4, 7, 9
    6    1, 2, 4, 6, 7, 9
    7    1, 2, 3, 4, 6, 7, 9
    8    1, 2, 3, 4, 5, 6, 7, 9
    9    1, 2, 3, 4, 5, 6, 7, 8, 9

with unit variance. Process (20) represents a system with relatively low signal-to-noise ratio. It is assumed that the structure (order, delay, etc.) of the process is unknown, but the upper bound for the order is known to be 4. Therefore the original model is chosen to be a fourth order model:


Fig. 1. f(r) versus r.

    y(k) = h(k)^T theta + e(k)    (21)

where the vector h(k) contains all nine possible explanatory variables, numbered from 1 to 9:

    h1(k) = y(k-1),  h2(k) = y(k-2),  h3(k) = y(k-3),  h4(k) = y(k-4),
    h5(k) = u(k-1),  h6(k) = u(k-2),  h7(k) = u(k-3),  h8(k) = u(k-4),
    h9(k) = 1

theta in R^9 is the parameter vector and e(k) is the model error. A total of 400 input-output data points of process (20) were recorded. Based on these data points, and for each r = 1, 2, 3, ..., 9, Algorithm 1 was applied to model (21) to find the r most important explanatory variables. The three weights used in (15) were w1 = 1, w2 = 1000, and w3 = 1. The results are shown in Table 1 and Figure 1. From Figure 1 it can be seen that the best choice of r is 2. Table 1 shows that the best choice of the explanatory variables corresponding to r = 2 is {h1(k), h7(k)}. The least squares regression model for this choice is

    y(k) = 0.619 y(k-1) + 0.778 u(k-3)

The structure (i.e., order and delay) of the above identified model is identical to that of the actual process (20).

5.2. A MIMO process identification problem

Consider the 2 x 2 computer-interfaced liquid-level control system shown in Figure 2. The closed-loop input-output data were sampled every 20 seconds, with the results shown in Figure 3. These results were obtained using a multiple model based adaptive predictive control scheme developed by Liu et al. (1997).

Fig. 2. A pilot scale 2 x 2 level control system.

Suppose the original regression model is a fourth order model of the form

    y(k)^T = h(k)^T Theta + e(k)^T    (22)

where k is the discrete time,

    h(k)^T = [ y(k-1)^T ... y(k-4)^T  u(k-1)^T ... u(k-4)^T  1 ]    (23)

u(k) in R^2 and y(k) in R^2 are the input and output at time k respectively, Theta in R^{17 x 2} is the parameter matrix, and e(k) is the model error.

Note that the regression vector h(k) has 17 components, which represent all the explanatory variables numbered from 1 to 17, e.g. h1(k) = y1(k-1), h2(k) = y2(k-1), h3(k) = y1(k-2), etc. The model (22) is ill-conditioned, as evidenced by its large condition number cond(X) = 2239. Therefore, it is necessary to find a better model structure. This is an explanatory variable selection problem. From these 17 variables, the following nine variables are selected by Algorithm 1 for the first liquid level y1(k):

    h1(k), h7(k), h8(k), h9(k), h10(k), h12(k), ..., h15(k), h16(k)
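The condition-number improvement g3 obtained by such a selection is straightforward to verify. The sketch below uses stand-in collinear data invented for illustration; it does not reproduce the pilot-plant regression matrix (with cond(X) = 2239) used in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in data matrix: 4 independent columns plus 4 columns that are
# near-linear combinations of them, giving strong collinearity.
base = rng.standard_normal((120, 4))
extra = base @ rng.standard_normal((4, 4)) + 1e-3 * rng.standard_normal((120, 4))
X = np.column_stack([base, extra])

cols = [0, 1, 2, 3]            # a hypothetical well-conditioned selection
Xr = X[:, cols]

cond_X = np.linalg.cond(X)
cond_Xr = np.linalg.cond(Xr)
g3 = (cond_X - cond_Xr) / cond_X * 100.0   # percentage condition-number decrease

print("cond(X)  =", cond_X)
print("cond(Xr) =", cond_Xr)
print("g3       =", g3, "%")
```

Dropping the nearly redundant columns shrinks the condition number by orders of magnitude, which is the role of criterion f3 in the selection.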



Fig. 3. Plot of the closed-loop input-output data.

Fig. 4. f(r) versus r for the first liquid level y1(k).

According to (23), this selection corresponds to a model of the form

    y1(k) = [ y1(k-1)  y1(k-4)  y2(k-4)  u1(k-1)  u2(k-1)  u2(k-2)  u(k-3)  u1(k-4)  u2(k-4) ] theta1 + e1(k)

where

    theta1 = [ 1.216  -0.143  -0.071  0.492  0.041  0.083  -0.333  -0.199  -0.028 ]^T

This model is numerically well conditioned because its data matrix has a much smaller condition number of 69, and, according to Equations (10) and (11), 99.96% of X and 99.998% of Y are explained by the new model.

6. Conclusions

The main contribution of this paper is the presentation of a multiobjective optimization based approach that can be used to select an appropriate subset of explanatory variables for a linear model. Three criteria have been proposed: 1) the data matrix X related residue f1 defined by Equation (6), 2) the observation Y related residue f2 defined by Equation (7), and 3) the condition number f3 of the selected new data matrix X_r (Equation (12)). The choice of the explanatory variables should minimize a user-specified combination of these criteria. Another equivalent approach is also mentioned which maximizes 1) the percentage g1 of X explained by X_r, 2) the percentage g2 of Y explained by X_r, and 3) the percentage g3 of condition number decrease. The proposed optimization approach has been applied to the problem of model structure determination for a simulated process and a pilot-scale, real-time, multivariable process. The results obtained were excellent.

References

Ben-Israel, A. and T.N.E. Greville (1974). Generalized Inverses: Theory and Applications. John Wiley & Sons, New York.
Cohon, J.L. (1978). Multiobjective Programming and Planning. Academic Press, New York.
Fishburn, P.C. (1974). Lexicographic orders, utilities and decision rules: a survey. Management Science 20, 1442-1472.
Golub, G.H. and C.F. Van Loan (1996). Matrix Computations, 3rd edition. The Johns Hopkins University Press, Baltimore, Maryland.
Grace, A. (1994). Optimization Toolbox for Use with Matlab. The MathWorks Inc., Natick, Massachusetts.
Hotelling, H. (1957). The relations of the newer multivariate statistical methods to factor analysis. Brit. J. Stat. Psych. 10, 69-79.
Liu, D.Y., S.L. Shah and D.G. Fisher (1998). Choice of latent explanatory variables: a multiobjective optimization method. Submitted.
Liu, D.Y., S.L. Shah, D.G. Fisher and X.H. Liu (1997). Multimodel-based minimum-bias control of a benchmark paper machine process. The Canadian Journal of Engineering 75, 152-160.
Van Huffel, S. and J. Vandewalle (1987). Subset selection using the total least squares approach in collinearity problems with errors in the variables. Lin. Alg. and Its Applic. 88/89, 695-714.
Neter, J. and W. Wasserman (1974). Applied Linear Statistical Models. Richard D. Irwin, Inc., Homewood, Illinois.
Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In: Multivariate Analysis (P.R. Krishnaiah, Ed.), pp. 391-420. Academic Press, New York.
Zionts, S. (1985). Multiple criteria mathematical programming: an overview and several approaches. In: Mathematics of Multiobjective Optimization (P. Serafini, Ed.). Springer-Verlag, New York.

