# Detection techniques in least squares identification


Automatica, Vol. 17, No. 6, pp. 805-819, 1981. Printed in Great Britain. 0005-1098/81/060805-15 $02.00/0. Pergamon Press Ltd. © 1981 International Federation of Automatic Control.

Detection Techniques in Least Squares Identification*

RAJENDRA KUMAR† and JOHN B. MOORE‡

Parameter estimation schemes based on least squares identification and detection ideas are easily implemented, give reduced numerical difficulties, and achieve bias reduction when the noise is correlated with the states of the signal generating system.

Key Words—Stochastic systems; identification; parameter estimation; detection; bias reduction.

Abstract—Parameter estimation schemes based on least squares identification and detection ideas are proposed for ease of computation, reduced numerical difficulties, and bias reduction in the presence of colored noise correlated with the states of the signal generating system. The algorithms are simpler because in the calculations the state vector is at one point replaced by a quantized version. This technique avoids to some extent the numerical difficulties associated with ill-conditioning in least squares schemes, and thus obviates the need for square root algorithms and for high order precision calculations. In recursive form, the schemes are designed to yield parameter estimates with negligible bias without the additional computational effort or instability risks associated with generalized and extended least squares, recursive maximum likelihood schemes, or the method of instrumental variables. Nonrecursive schemes are designed to minimize computational effort in a batch processing situation while at the same time giving some reduction of bias in the state-dependent colored noise situation. The novel algorithms have the limitation that they are suboptimal, and there is thus a consequent reduction in the speed of convergence for some applications. The merits of the proposed schemes are assessed via simulation studies in this paper and an adaptive equalization application in a companion paper.

*Received 27 May 1980; revised 27 January 1981. The original version of this paper was presented at the 5th IFAC Symposium on Identification and System Parameter Estimation which was held in Darmstadt, F.R. Germany during September 1979. The published proceedings of this IFAC Meeting may be ordered from Pergamon Press Ltd, Headington Hill Hall, Oxford OX3 0BW, U.K. This paper was recommended for publication in revised form by Associate Editor H. Sorenson. †Division of Applied Mathematics, Brown University, Providence, RI 02912, U.S.A. Previous address: Department of Electrical Engineering, University of Newcastle, Australia. ‡Department of Electrical Engineering, University of Newcastle, New South Wales 2308, Australia.

1. INTRODUCTION

LEAST squares identification schemes have immediate appeal because of simplicity in both their theory and application, and we may refer to various references (Moore, 1978; Solo, 1978; Goodwin and Payne, 1977) for a discussion of some of these. However, they suffer two major drawbacks.

Optimal least squares (LS) schemes may require too much computational effort, particularly when square root versions and high order precision arithmetic are required to avoid ill-conditioning problems. The computational effort increases as the square of the dimension of the state vectors and, for high order plants, may be prohibitive. Simple stochastic approximation (SA) schemes [as in the references of Kumar and Moore (1979) and Maschner (1970)], including the state inverse (SI) scheme of Kumar and Moore (1979) and the clipped state (CS) scheme of Maschner (1970), lead to computational simplifications at the cost of speed of convergence. Their complexity increases only linearly with state dimension but, in rough terms, their convergence rate is up to 10 times slower for SI schemes and up to 100 times slower for SA and CS schemes. Fast LS schemes of Ljung and Falconer (1978) (non-square-root versions) achieve the speed of the LS algorithm with a complexity that increases only linearly with the state dimension, but at the cost of a relatively lengthy computer program. These schemes are also not very robust, in the sense that they require a high order of precision in the arithmetic for their implementation. There is still the need, then, for a scheme which has the numerical advantages of the square root LS scheme, has a fast and robust version option, is less complex than the LS scheme, and is faster than the SI, SA and CS schemes. A second major drawback of least squares schemes is that in environments where the noise is colored and correlated with the states, they lead to bias in the estimates. In some applications, as in certain prediction and control problems, this bias may be of very little consequence, but in other applications a precise system identification is important. There are two approaches to bias reduction. In the first, the color in the noise is estimated, which adds considerably to the computational burden. In


generalized least squares (Hastings-James and Sage, 1969) and extended least squares (Moore and Ledwich, 1979), estimates may diverge in certain colored noise environments. The recursive maximum likelihood method (Söderström, Ljung and Gustavsson, 1978) is even more complex, but appears to avoid this latter problem. In the second approach to bias reduction, state estimates are employed which are uncorrelated with the noise, as in output-error schemes (Moore and Ledwich, 1979) and the method of instrumental variables (Wong and Polak, 1967; Young, 1976). These schemes are not suitable for closed-loop control applications. In this paper, parameter estimation schemes based on least squares identification and detection ideas are proposed for ease of computation, robustness, reduced numerical difficulties, and bias reduction in the presence of colored noise correlated with the signal generating system states. The algorithms are termed quantized state (QS) schemes. The algorithms are simpler because in the calculations the state vector is, at one or two points, replaced by a quantized (clipped) version. Thus some multiplications are replaced by additions, giving robustness. Moreover, numerical difficulties associated with ill-conditioning in least squares schemes are avoided to some extent. In some examples of the novel schemes, the condition number of the relevant 'covariance' matrix is reduced by a square root factor, which obviates the need for square root versions. Fast versions of the algorithms are also presented, which are more robust than fast least squares algorithms, which diverge unless high precision arithmetic is employed. In recursive form, the schemes are enhanced in their bias reduction capabilities by using inlier rejection techniques, which reject measurements when the prediction error magnitudes are small. In batch processing form, the schemes are designed to minimize computational effort while at the same time achieving bias reduction.

Simplified versions are also possible using Toeplitz matrix approximations, as explored more fully in a companion applications paper (Kumar and Moore, 1980). The schemes we propose have evolved using ideas from detection theory and least squares theory, and from simulation studies. From a limited perspective, the algorithms are merely variations on the method of instrumental variables and output-error methods using measurement weighting and clipped-state techniques. However, we claim that the techniques are not completely ad hoc, nor are they merely variations on the folklore in identification. We would claim that

the attractive performance properties achieved by our simple algorithms, and demonstrated in this paper and in a companion applications paper, arise from systematic application of a novel geometric interpretation of parameter estimation schemes and standard detection ideas. Strong consistency results are available for the algorithms when the noise is uncorrelated with the states. Ergodicity assumptions simplify the convergence analysis. However, at this stage in our research, we are able to explain only heuristically why the schemes work as well as they do in the correlated noise case, and we support the explanations with sample calculations, analysis and simulation results. A full analysis and theoretical treatment for this case appears tedious, formidable, and perhaps unrewarding from our present perspectives. As in the case of the schemes of earlier literature, there may be some delay between the proposal of an algorithm and a more complete theory. In Section 2, a variation on the method of instrumental variables is proposed, but where filtering of the state variable to achieve the instrumental variable is replaced by the computationally simpler process of detection. In Section 3, nonrecursive batch processing schemes are proposed, and a full rationale for the QS schemes follows in Section 4 with analysis results. In Section 5, an inlier rejection scheme, viewed as a weighted least squares scheme, is proposed with weightings zero or one depending on the magnitude of the residuals. In Section 6, comparative simulation results are given with discussion; conclusions and areas for future research are noted in Section 7. For the remainder of this section some preliminaries are briefly noted. For clarity of presentation, we restrict attention to the simple class of signal models

$$z_k = x_k'\theta + v_k \tag{1}$$

where $z_k$ are the scalar measurements, $x_k$ are known states, $\theta$ is the unknown parameter vector, and $v_k$ is a zero mean noise process with unknown noise model or spectrum. For the least squares index $\sum_{k=1}^{m}(z_k - x_k'\theta)^2$, the minimization is achieved by the 'least squares' parameter estimate

$$\hat{\theta}_m = \left(\sum_{k=1}^{m} x_k x_k'\right)^{-1} \sum_{k=1}^{m} x_k z_k = \hat{\theta}_{m-1} + \left(\sum_{k=1}^{m} x_k x_k'\right)^{-1} x_m\left(z_m - x_m'\hat{\theta}_{m-1}\right). \tag{2}$$
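As a concrete illustration of (2), the following sketch implements the recursive update in NumPy. The small regularising initial value for the accumulated matrix, and the test model itself, are implementation conveniences assumed here, not part of the algorithm in the text.

```python
import numpy as np

def rls_identify(X, z, eps=1e-6):
    """Recursive least squares in the form of (2):
    theta_m = theta_{m-1} + (sum_k x_k x_k')^{-1} x_m (z_m - x_m' theta_{m-1})."""
    m, n = X.shape
    R = eps * np.eye(n)   # accumulates sum_k x_k x_k'; eps keeps it invertible early on
    theta = np.zeros(n)
    for k in range(m):
        x = X[k]
        R += np.outer(x, x)
        theta = theta + np.linalg.solve(R, x * (z[k] - x @ theta))
    return theta

# Consistency check with zero mean white noise, per the discussion below.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -0.5])
X = rng.standard_normal((5000, 2))
z = X @ theta_true + 0.1 * rng.standard_normal(5000)
theta_hat = rls_identify(X, z)
```

With the regulariser negligible, the recursion reproduces the batch estimate of (2), so with white noise the estimate is close to the true parameter.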



with estimation error $\tilde{\theta}_m = \theta - \hat{\theta}_m$,

$$\tilde{\theta}_m = -\left(\sum_{k=1}^{m} x_k x_k'\right)^{-1} \sum_{k=1}^{m} x_k v_k. \tag{3}$$

Under reasonable persistently exciting conditions the error $\tilde{\theta}_m$ converges almost surely to some value $\tilde{\theta}_\infty$. With $v_k$ zero mean and white, $\hat{\theta}_\infty$ is $\theta$ itself and the parameter estimate is consistent (Moore, 1978). With $v_k$ zero mean, as in our assumption above, but colored, $\hat{\theta}_\infty \neq \theta$ and $\tilde{\theta}_\infty = \theta - \hat{\theta}_\infty$ represents the bias. Under appropriate ergodicity assumptions, the bias has the convenient expression

$$\tilde{\theta}_\infty = -E[x_k x_k']^{-1} E[x_k v_k]. \tag{4}$$
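The bias expression (4) is easy to check numerically. The following sketch uses a hypothetical scalar model in which the state carries a component of the measurement noise; the model and its coefficients are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical scalar model with state correlated with the noise:
# x_k = e_k + 0.8 v_k, so E[x_k v_k] = 0.8 != 0.
rng = np.random.default_rng(1)
m = 200_000
theta = 2.0
v = rng.standard_normal(m)
x = rng.standard_normal(m) + 0.8 * v
z = x * theta + v

theta_ls = (x @ z) / (x @ x)             # scalar least squares estimate (2)
bias = -np.mean(x * v) / np.mean(x * x)  # -E[x x']^{-1} E[x v], eq (4)
```

Here the sample value of $-E[x_k x_k']^{-1}E[x_k v_k] \approx -0.8/1.64$ matches the offset $\theta - \hat{\theta}_m$ of the least squares estimate.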

2. DETECTION APPLIED TO THE STATE VECTOR

The algorithms proposed in this section are termed quantized state (QS) schemes. For pedagogical reasons we present the schemes initially as variations of the method of instrumental variables.

2.1. The method of instrumental variables

Consider the situation where the state $x_k$ is replaced in (2) by some modified $\tilde{x}_k$ as follows:

$$\hat{\theta}_m = \left(\sum_{k=1}^{m} \tilde{x}_k x_k'\right)^{-1} \sum_{k=1}^{m} \tilde{x}_k z_k = \hat{\theta}_{m-1} + \left(\sum_{k=1}^{m} \tilde{x}_k x_k'\right)^{-1} \tilde{x}_m\left(z_m - x_m'\hat{\theta}_{m-1}\right) \tag{5}$$

where the relevant inverse is assumed to exist. Then the estimation error is

$$\tilde{\theta}_m = -\left(\sum_{k=1}^{m} \tilde{x}_k x_k'\right)^{-1} \sum_{k=1}^{m} \tilde{x}_k v_k \tag{6}$$

and the bias, under ergodicity assumptions, is

$$\tilde{\theta}_\infty = -E[\tilde{x}_k x_k']^{-1} E[\tilde{x}_k v_k]. \tag{7}$$

In the method of instrumental variables (IV), $\tilde{x}_k$ is some state vector, the instrumental variable, correlated with $x_k$ but uncorrelated with the noise $v_k$. The state $\tilde{x}_k$ is frequently a delayed estimate of $x_k$, or is generated independently of $z_k$ in a parallel state model. With $E[\tilde{x}_k x_k']$ nonsingular and $E[\tilde{x}_k v_k] = 0$, then $\tilde{\theta}_\infty = 0$ and there is zero bias, even for the case when $v_k$ is colored. A limitation of the approach, as shown by Söderström (1974), is that the nonsingularity assumption can fail even when the inputs to the system with state $x_k$ are persistently exciting. It is known (Finnegan and Rowe, 1974) that with white noise inputs to the system generating $x_k$ and $\tilde{x}_k$, the nonsingularity condition is satisfied.

2.2. A clipped-state pseudo-instrumental variable approach

In this section the key idea studied is to generate a pseudo-instrumental variable by passing $x_k$ through a detector (quantizer) to yield a 'clipped state' vector as

$$\tilde{x}_k' = \left[\operatorname{sgn} x_k^{(1)}\;\; \operatorname{sgn} x_k^{(2)}\;\; \ldots\;\; \operatorname{sgn} x_k^{(n)}\right] \tag{8}$$

where $x_k^{(i)}$ is the $i$th element of the $n$-vector $x_k$. Correlation between $\tilde{x}_k$ and $v_k$ may still exist. However, as a later analysis example illustrates, this correlation is less than that between $x_k$ and $v_k$, and so a reduction in bias relative to that of a least squares scheme is expected. Put another way, the quantizer will obscure certain correlations between $x_k$ and $v_k$, thereby reducing the overall correlation. This is in contrast to the instrumental variables approach, where the bias is completely eliminated under some conditions, but at a cost in complexity. The recursive form of the resulting algorithm, termed a quantized-state algorithm, is as follows.

The quantized-state (QS1) algorithm. With $\tilde{x}_k$ as in (8),

$$\hat{\theta}_m = \hat{\theta}_{m-1} + K_m \tilde{z}_m, \qquad \tilde{z}_m = z_m - x_m'\hat{\theta}_{m-1}$$

$$K_m = Q_{m-1}\tilde{x}_m\left[1 + x_m'Q_{m-1}\tilde{x}_m\right]^{-1} = Q_m\tilde{x}_m$$

$$Q_m = Q_{m-1} - Q_{m-1}\tilde{x}_m\left[1 + x_m'Q_{m-1}\tilde{x}_m\right]^{-1} x_m'Q_{m-1}. \tag{9}$$

The quantized-state (QS2) algorithm. With $\tilde{x}_k$ as in (8), a variation of the method of instrumental variables is to replace $Q_m$ in the recursive form by a symmetric $\bar{Q}_m$ built from the products $\tilde{x}_k\tilde{x}_k'$. The corresponding quantized-state recursions are

$$\hat{\theta}_m = \hat{\theta}_{m-1} + \bar{K}_m \tilde{z}_m, \qquad \tilde{z}_m = z_m - \tilde{x}_m'\hat{\theta}_{m-1}$$

$$\bar{K}_m = \bar{Q}_{m-1}\tilde{x}_m\left[1 + \tilde{x}_m'\bar{Q}_{m-1}\tilde{x}_m\right]^{-1} = \bar{Q}_m\tilde{x}_m. \tag{10}$$

This algorithm is not studied in detail in this paper for space limitations. Comparative performance simulation results for the various algorithms, when used for adaptive equalization in digital communications, are studied in the companion paper.
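A minimal sketch of the QS1 recursion (9) follows, together with an illustrative comparison against least squares on a hypothetical model whose state is correlated with the noise. The model, noise level and initial $Q_0$ are assumptions made for the demonstration, not values from the paper.

```python
import numpy as np

def qs1_identify(X, z, q0=1e3):
    """Quantized-state (QS1) recursion of (9): the clipped state sgn(x_k)
    from (8) replaces x_k on one side of the gain and 'covariance' updates,
    so Q_m tracks (sum_k sgn(x_k) x_k')^{-1}."""
    m, n = X.shape
    Q = q0 * np.eye(n)            # Q_0: large initial value (an assumption)
    theta = np.zeros(n)
    for k in range(m):
        x = X[k]
        xs = np.sign(x)           # clipped state, eq (8)
        Qxs = Q @ xs
        K = Qxs / (1.0 + x @ Qxs)          # K_m of (9)
        theta = theta + K * (z[k] - x @ theta)
        Q = Q - np.outer(K, x @ Q)         # Q_m update of (9)
    return theta

# Hypothetical model: x_k = 2 s_k + v_k with s_k = +/-1, so sgn(x_k) is
# almost always s_k and hence nearly uncorrelated with v_k, while x_k is not.
rng = np.random.default_rng(7)
m = 40_000
theta_true = np.array([1.5])
s = rng.choice([-1.0, 1.0], size=m)
v = 0.5 * rng.standard_normal(m)
X = (2.0 * s + v).reshape(-1, 1)
z = X @ theta_true + v
theta_qs1 = qs1_identify(X, z)
theta_ls = np.linalg.solve(X.T @ X, X.T @ z)   # biased least squares (2)
```

On this model the least squares estimate carries a bias of roughly $E[x_k v_k]/E[x_k^2] \approx 0.06$, while the QS1 estimate is close to unbiased, in line with the bias reduction discussed in Section 4.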


2.3. Fast versions of QS algorithms

The derivation of fast versions of the QS algorithms follows precisely that of Ljung and Falconer (1978), with substitution of the appropriate terms with quantized versions. A more attractive approach for the QS recursions is via Toeplitz approximations, which is presented in a companion paper (Kumar and Moore, 1980).

Remarks

1. The algorithm has the general structure of a least squares algorithm, but is in fact simpler to implement. For an $n$-vector $x_k$, and counting $n$ additions as one multiplication, the effective number of multiplications for LS, QS1 and QS2 is approximately $3n^2$, $2n^2 + 4n + 3$ and $\tfrac{1}{2}n^2 + 3\tfrac{1}{2}n + 2$ respectively. For fast least squares we have $10n + 16$, and for the Toeplitz versions of QS2, $6n + 2$ of these operations.

2. Numerical difficulties for the recursive algorithms of interest here are influenced by the condition number of the matrices $Q_m$, $\bar{Q}_m$, $R_m$ as $m$ becomes large, for QS1, QS2 and LS respectively. General results appear difficult to obtain; however, it is reasonable to take as a guideline a consideration of the Gaussian ergodic case with zero mean, and to study specific examples. Appendix A studies two such examples, from which we can conclude that the condition number of $E[\tilde{x}x']$ is, in rough terms, the square root of the condition number of $E[xx']$. The numerical difficulties associated with the QS1 algorithm are therefore expected to be considerably less than for the least squares scheme and, in rough terms, are expected to be comparable to the more sophisticated square root least squares schemes. The scheme QS2 will have enhanced numerical robustness. It is also clear that in passing from an LS scheme to a QS scheme, multiplications are replaced by additions, and thus fewer numerical problems are expected.

3. In the update equation for $Q_m$, the inverse $\left[1 + x_m'Q_{m-1}\tilde{x}_m\right]^{-1}$ exists with probability 1, and thus there is no concern about the existence of $Q_m$. That $\left(\sum_{k=1}^{m}\tilde{x}_k x_k'\right)^{-1}$ exists for $m \geq l$, for some $l$, is assumed in the analysis to follow.

4. One expects that, in general, $\operatorname{sgn} x_k^{(i)}$ is less correlated with $v_k$ than $x_k^{(i)}$ is, and thus that the bias term (6) is less than that for least squares (2). Assuming for the moment that $v_k$ is uncorrelated with $\tilde{x}_k$, then $\tilde{\theta}_\infty = -E[\tilde{x}_k x_k']^{-1}E[\tilde{x}_k v_k] = 0$ as long as $E[\tilde{x}_k x_k']$ is nonsingular, under ergodicity. Using the results of Appendix A, we see that in the case when $x_k$ is zero mean and Gaussian,

$$E[\tilde{x}_k x_k'] = \sqrt{\frac{2}{\pi}}\,\operatorname{diag}\left\{E[(x_k^{(1)})^2], \ldots, E[(x_k^{(n)})^2]\right\}^{-1/2} E[x_k x_k'].$$
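The clipped-state correlation expression for the zero mean Gaussian case can be checked by Monte Carlo, as in the sketch below; the covariance matrix used is an arbitrary assumption.

```python
import numpy as np

# Monte Carlo check of E[sgn(x) x'] = sqrt(2/pi) D^{-1/2} E[x x'] for a
# zero mean Gaussian state, where D = diag of the state variances.
rng = np.random.default_rng(2)
P = np.array([[2.0, 0.6],
              [0.6, 0.5]])                 # an assumed state covariance
X = rng.standard_normal((400_000, 2)) @ np.linalg.cholesky(P).T

emp = (np.sign(X).T @ X) / len(X)          # sample E[sgn(x) x']
pred = np.sqrt(2 / np.pi) * np.diag(1 / np.sqrt(np.diag(P))) @ P
```

The sample correlation matrix agrees with the prediction to within Monte Carlo error.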

Thus with $E[x_k x_k'] > 0$, then $E[\tilde{x}_k x_k'] \neq 0$, in contrast to other realizations of the method of instrumental variables. We conclude that when $v_k$ is correlated [uncorrelated] with $\tilde{x}_k$, then the bias term $\tilde{\theta}_\infty = -E[\tilde{x}_k x_k']^{-1}E[\tilde{x}_k v_k]$ is small [zero] if $E[\tilde{x}_k v_k]$ is small [zero] and $E[x_k x_k'] > \delta I$ for $\delta \gg 0$. Other analysis results, which study bias reduction and also convergence in the absence of ergodicity assumptions, are given in Section 4. The case when $\tilde{x}_k$ is chosen as a clipped instrumental variable rather than a clipped-state variable is also studied.

5. Simulation experience shows that there is typically less than a factor of two difference in convergence rate between the QS1 scheme and the optimal one, and perhaps more for QS2. Simulations also show that in the QS algorithms bias is invariably reduced in the colored noise case over a wide range of color variations. A bias reduction of 50% is a typical value for autoregressive moving average (ARMA) models over a wide range of noise situations. The question as to why the QS algorithms work as well as they do is left to Section 4, in which a rationale for the algorithm is given within the theoretical framework built up in the next Section 3 for nonrecursive schemes.

3. DETECTION AND NONRECURSIVE ESTIMATION

In this section we propose novel parameter estimation schemes which apply detection ideas in a nonrecursive approach where considerable computational savings can be achieved. The bias reduction capability of the new schemes turns out to be significant only in the low noise situation. A case when the state $x_k$ is discrete is considered first as a lead-in to the more general case.

(A) Discrete-state case

Consider the case when the $n$-vector $x_k$ can take only the values $\phi_1, \phi_2, \ldots, \phi_n$, where the $\phi_j$ are linearly independent. Consider now the set of measurements up to time $m$ and the collection of $n$ subsets $\{k \mid x_k = \phi_j\}$, denoted $S_j$, containing $m_j$ elements. Then $z_k = \phi_j'\theta + v_k$ for $k \in S_j$ for all $j$, and moreover,

$$\frac{1}{m_j}\sum_{k \in S_j} z_k = \phi_j'\theta + \frac{1}{m_j}\sum_{k \in S_j} v_k, \qquad j = 1, 2, \ldots, n. \tag{11}$$

Now observe that when $v_k$ is uncorrelated with $x_k$, then

$$\lim_{m_j \to \infty} \frac{1}{m_j}\sum_{k \in S_j} v_k = 0 \qquad \text{for } j = 1, 2, \ldots, n. \tag{12}$$

This property suggests the following estimation scheme even when $v_k$ is correlated to some extent with $x_k$.


Estimation scheme

The nonrecursive parameter estimation scheme we propose for the process model (1) in this discrete-state case is to solve the set of simultaneous equations (11), neglecting the noise term. Thus

$$\hat{\theta}_m = (\Phi')^{-1}\left[\frac{1}{m_1}\sum_{k \in S_1} z_k \;\; \frac{1}{m_2}\sum_{k \in S_2} z_k \;\; \cdots \;\; \frac{1}{m_n}\sum_{k \in S_n} z_k\right]' \tag{13}$$

where $\Phi = [\phi_1\; \phi_2\; \ldots\; \phi_n]$ has an inverse by assumption. Likewise, the estimation error is

$$\tilde{\theta}_m = -(\Phi')^{-1}\left[\frac{1}{m_1}\sum_{k \in S_1} v_k \;\; \frac{1}{m_2}\sum_{k \in S_2} v_k \;\; \cdots \;\; \frac{1}{m_n}\sum_{k \in S_n} v_k\right]'. \tag{14}$$

Consistent estimation

From (14), we have immediately that if (12) is satisfied, then $\lim_{m\to\infty}\tilde{\theta}_m = 0$. There is a persistent excitation assumption implicit in (12), since we assume $m_j \to \infty$ for each $j$, and thus that $x_k = \phi_j$ for each $j$ infinitely often. Further, under the ergodicity assumption made to keep the theory simple,

$$\lim_{m_j\to\infty}\frac{1}{m_j}\sum_{k \in S_j} v_k = E[v_k \mid x_k = \phi_j].$$

Then the noise condition for unbiased parameter estimation for the scheme (13) is that $E[v_k \mid x_k = \phi_j] = 0$ for all $j$. This consistency condition is in essence the same as for the least squares algorithm. To see this, note that the $\phi_j$ are linearly independent, and

$$E[v_k x_k] = \sum_{j=1}^{n} \left\{P(\phi_j)\,E[v_k \mid x_k = \phi_j]\right\}\phi_j.$$

Thus, the least squares bias term $E[x_k x_k']^{-1}E[x_k v_k] = 0$ if and only if $E[v_k \mid x_k = \phi_j] = 0$ for all $j$.

Computational advantages

The scheme proposed requires $m(n+1)$ additions, together with one $n \times n$ matrix inversion at the time instants $m$ of interest. We see immediately that this represents a significant reduction from the nonrecursive least squares approach of (2), which requires, in addition to the above computational effort, $m[n(n-1)/2 + n]$ multiplications and $m[n(n-1)/2 - 1]$ additions. Moreover, the condition number of the matrix to be inverted in the proposed scheme is approximately the square root of the corresponding one for the least squares case, which reduces the chances of numerical difficulties.

(B) Case of states in a continuous range

The concepts outlined above can be applied, using transformations, to the case when the state vector belongs to a continuous range, to achieve bias reduction as now outlined. When the state $x_k$ belongs to a continuous range, the $E^n$ space can be partitioned into $n$ biconical sections $S_j$, so that for each $k$, $x_k$ can be classified as belonging to one of these sections. Consider now the set of equations corresponding to (11) as

$$\left(\frac{1}{m_j}\sum_{k \in S_j} z_k\right) = \left(\frac{1}{m_j}\sum_{k \in S_j} x_k'\right)\theta + \left(\frac{1}{m_j}\sum_{k \in S_j} v_k\right), \qquad j = 1, 2, \ldots, n. \tag{15}$$

This equation can now be solved to yield parameter estimates. It is desirable that the $x_k$, at least to a first approximation, are distributed uniformly in their $n$-dimensional space, so that for each $m$, $m_1 \approx m_2 \approx m_3 \ldots \approx m_n$. Transformations on the state vector can be employed to achieve this property.

A transformation

One possibility, assuming for the moment that the state covariance matrix $E[x_k x_k']$ is available and time invariant, is to transform the state as

$$\psi_k = P^{-1/2}x_k, \qquad z_k = \psi_k'\bar{\theta} + v_k, \qquad \bar{\theta} = (P^{1/2})'\theta \tag{16}$$

where $P = E[x_k x_k']$. Now $\psi_k$ has the desired property $E[\psi_k\psi_k'] = I$, so that its components are uncorrelated with one another and of equal variance. In the usual Gaussian state case, this means that the $\psi_k$ directions are uniformly distributed over the $\psi$ space. It now makes sense to organize groupings with all $\psi_k$ in the same group having directions that belong to a specified biconical sector. For example, define the groupings $S_j$ as

$$\psi_k \in S_j \quad \text{iff} \quad |\psi_k^{(j)}| \geq |\psi_k^{(i)}| \quad \text{for } i = 1, 2, \ldots, n.$$

Detection with threshold

A refinement is to reject measurements $z_k$ where the state $\psi_k$ is near a boundary of a decision region. A justification for this is given later in the section. Thus, for some $\delta_j \geq 0$, set

$$\psi_k \in S_j \quad \text{iff} \quad |\psi_k^{(j)}| \geq (1 + \delta_j)\,|\psi_k^{(i)}| \quad \text{for all } i \neq j.$$
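The discrete-state scheme (A), equations (11)-(13), can be sketched as follows; the state values $\phi_j$, noise level and parameter are illustrative assumptions.

```python
import numpy as np

# Sketch of (13): group measurements by the discrete state value phi_j,
# average z over each group (eq (11)), and solve the n simultaneous
# equations phi_j' theta = group mean, neglecting the noise term.
rng = np.random.default_rng(3)
Phi = np.array([[1.0, 0.0],      # rows are phi_1', phi_2' (linearly independent)
                [1.0, 1.0]])
theta = np.array([0.7, -0.3])
m = 20_000
idx = rng.integers(0, 2, size=m)  # which phi_j occurs at time k
X = Phi[idx]
z = X @ theta + 0.2 * rng.standard_normal(m)

means = np.array([z[idx == j].mean() for j in range(2)])
theta_hat = np.linalg.solve(Phi, means)   # solve phi_j' theta = mean_j, eq (13)
```

Only additions (for the group sums) and one small linear solve are required, in line with the computational advantages noted above.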


The relevant summations of (15), transformed via (16), are now

$$\frac{1}{m_j}\sum_{k \in \bar{S}_j} \alpha_k^{(j)} z_k = \left(\frac{1}{m_j}\sum_{k \in \bar{S}_j} \alpha_k^{(j)}\psi_k'\right)\bar{\theta} + \frac{1}{m_j}\sum_{k \in \bar{S}_j} \alpha_k^{(j)} v_k \tag{17}$$

where $\bar{S}_j$ denotes the set $\{k \mid \psi_k \in S_j\}$, and $\alpha_k^{(j)} = 1$ if $\psi_k^{(j)} > 0$, $\alpha_k^{(j)} = -1$ if $\psi_k^{(j)} < 0$, and zero otherwise. The effect of the $\alpha_k^{(j)}$ is to ensure that, in calculating $\bar{\theta}$ and thus $\theta$ from the solution of the simultaneous equations, the matrix to be inverted has nonnegative diagonal elements increasing with $m$.
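The transformation (16), the sector detection, and the solution of (17) can be sketched as below. The covariance, parameters and the choice of zero thresholds ($\delta_j = 0$) are illustrative assumptions.

```python
import numpy as np

# Whiten the state with P^{-1/2} (eq (16)), assign each psi_k to the
# biconical sector of its largest-magnitude component with weight
# alpha_k = sgn of that component, then solve the n averaged equations (17).
rng = np.random.default_rng(4)
n, m = 2, 100_000
theta = np.array([1.0, 0.5])
P = np.array([[1.5, 0.4],
              [0.4, 0.8]])
X = rng.standard_normal((m, n)) @ np.linalg.cholesky(P).T
z = X @ theta + 0.3 * rng.standard_normal(m)

w, V = np.linalg.eigh(P)
P_half = V @ np.diag(np.sqrt(w)) @ V.T      # symmetric P^{1/2}
Psi = X @ np.linalg.inv(P_half)             # psi_k = P^{-1/2} x_k

j_max = np.abs(Psi).argmax(axis=1)          # sector with largest |component|
alpha = np.sign(Psi[np.arange(m), j_max])   # alpha_k^{(j)} of (17)

A = np.zeros((n, n))
b = np.zeros(n)
for j in range(n):
    sel = j_max == j
    A[j] = (alpha[sel, None] * Psi[sel]).mean(axis=0)  # (1/m_j) sum alpha psi'
    b[j] = (alpha[sel] * z[sel]).mean()                # (1/m_j) sum alpha z
theta_bar = np.linalg.solve(A, b)                      # (17), noise neglected
theta_hat = np.linalg.solve(P_half.T, theta_bar)       # theta_bar = (P^{1/2})' theta
```

The matrix to be inverted has positive diagonal entries by construction of the $\alpha_k^{(j)}$, as the text requires.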

Simplified transformation

A simplification of the above scheme, which is useful when $P$ is not available or readily calculated, is to replace $P^{-1/2}$ with a constant matrix $L$ which is designed to distribute the $\psi_k$ into the groups $S_j$ on an even basis. (Actually, the persistently exciting condition simply requires that $x_k \in S_j$ infinitely often.) A suitable selection of $L$ for some problems we have found is a lower triangular matrix with unity diagonal elements, negative unity elements in the one place below the diagonal elements, and zeros otherwise. [This corresponds to a differencing transformation as discussed in Kumar and Moore (1979).] A much better selection is $\bar{P}^{-1/2}L$, where $\bar{P}$ is a diagonal matrix with $i$th element the variance of the $i$th element of $Lx_k$, or a sample estimate of this.

The detection-estimation scheme

In summary, the nonrecursive parameter estimation scheme we propose for the process model (1), when $x_k$ belongs to a continuous range, is first to transform $x_k$ as $\psi_k = P^{-1/2}x_k$ and $\theta$ as $\bar{\theta} = (P^{1/2})'\theta$ (see the above discussion), to distribute the $\psi_k$ directions uniformly (to a first approximation) in the $\psi_k$ space, and then to solve the set of simultaneous equations (17), neglecting the noise term, to achieve an estimate of $\bar{\theta}$ and thereby $\theta$. If the state covariance $P$ is not available, then the transformation may be replaced by $\bar{P}^{-1/2}L$ as noted above.

As for the discrete-state scheme above, there is a significant computational effort advantage over the least squares scheme. The only additional effort here over the discrete-state case is the implementation of the transformations on the state. The $\bar{P}^{-1/2}L$ transformation, which works well in all examples we have simulated, requires but $n$ multiplications and $n/2$ additions per iteration. For moving average model identification it is not necessary to apply any transformations.

Convergence conditions

A mild extension of the theory for the discrete-state case yields that the noise condition for consistent estimation, under ergodicity, is that $E[\alpha_k^{(j)}v_k \mid \psi_k \in S_j] = 0$ for all $j$. Sufficient conditions are that $E[v_k \mid \psi_k \in S_j,\ \alpha_k^{(j)} = 1] = E[v_k \mid \psi_k \in S_j,\ \alpha_k^{(j)} = -1] = 0$ for all $j$. These conditions are weaker than the corresponding ones for the least squares case, which are that $E[\psi_k v_k] = 0$ or, equivalently, that $E[v_k \mid \psi_k] = 0$ for the Gaussian zero mean case. To see that $E[\psi_k v_k] = 0$ implies $E[v_k \mid \psi_k] = 0$, we note that $v_k$ and $\psi_k$ are then also independent, since they have a jointly Gaussian distribution, from which it follows that $E[v_k \mid \psi_k] = E[v_k] = 0$. The required condition can be seen to lie between the relatively weak condition $E[v_k] = 0$, which may be rewritten as

$$\sum_{j} P(\psi_k \in S_j)\,E[v_k \mid \psi_k \in S_j] = 0,$$

and the stronger least squares condition for unbiased estimation, namely $E[\psi_k v_k] = 0$. It is difficult to set up a measure of just how much weaker the conditions are.

Bias reduction capability

Simulations indicate that in colored noise situations, a bias reduction of 50% is a typical figure in the low noise situation. The above convergence analysis, which indicates that the noise conditions for consistent estimation of the transformation-detection-estimation scheme are weaker than the standard least squares ones, offers at least some explanation of the bias reduction. A deeper quantitative analysis is not available at this stage, but qualitatively it can be seen that the detection aspect is decorrelating the noise and state vectors. The greater the threshold $\delta_j$, the more restrictive are the groups $S_j$, the greater the decorrelation, and the greater the bias reduction; but of course the slower is the convergence, because more measurements, percentage-wise, are rejected.

Application

A typical situation where the above estimation scheme is attractive is in channel equalization over a finite test sequence (Kumar and Moore, 1980). For such applications it is important to have computational robustness, minimal computational effort, minimal test sequence length, and minimal bias on the estimates. It should be noted that the nonrecursive scheme here requires considerably less computational effort than even simple stochastic approximation schemes.

4. RATIONALE FOR QS ALGORITHM AND ANALYSIS

In Section 2, the QS1 scheme is presented, for pedagogical reasons, as an ad hoc variation on the method of instrumental variables so as to reduce bias, but such a justification for the scheme falls short of explaining why the scheme converges as well as it does. Also, the existing theory of the method of instrumental variables provides no clear motivation to generate a variation such as the QS1 algorithm since, in fact, the QS1 scheme does not achieve fully the objectives of this method. The computational aspects of the QS1 algorithm differ markedly from those of the method of instrumental variables, and indeed the QS1 technique can be applied to advantage to the method of instrumental variables itself, as well as to other schemes such as recursive maximum likelihood algorithms. The advantage is in numerical simplicity and robustness, and in the latter case to reduce any bias. In this section, we first generalize the nonrecursive versions of the parameter estimation scheme of the previous section and obtain a convenient recursive implementation, which turns out to be the QS1 scheme of Section 2. The IV method is also derived in the same context. The new interpretation of the method of instrumental variables is of independent interest, but is included here to clarify the relationship to the QS1 approach. With the nonrecursive scheme (15) of the previous section in mind, rather than classify $x_k$ as belonging to only one of the biconical sections $S_j$, there is a potential advantage in assigning $x_k$ to each of the $n$ groups $S_j$ with weightings given by an $n$-element vector $c_k$. Thus

$$\hat{\theta}_m = \left(\sum_{k=1}^{m} c_k x_k'\right)^{-1}\sum_{k=1}^{m} c_k z_k, \qquad \tilde{\theta}_m = -\left(\sum_{k=1}^{m} c_k x_k'\right)^{-1}\sum_{k=1}^{m} c_k v_k. \tag{18}$$

The nonrecursive scheme of the previous section can now be interpreted as a special case of this scheme where $c_k$ consists of all zeros save the $j$th element, which is unity if $x_k \in S_j$. The objective of this more general procedure, with its increased computational cost, is to speed convergence. This is achieved if the parameter estimation error term (14) is reduced by the more general weighting coefficient selection. To keep our thinking simple, consider the scalar case: when $c_k$ is correlated with $x_k$ and uncorrelated with $v_k$, the error term $\tilde{\theta}_m$ of (18) will be reduced. The same is true for the vector case. Thus we see that $c_k$ achieves its convergence speed objectives if it is an instrumental variable. With $c_k$ an instrumental variable, the estimate $\hat{\theta}_m$, solved in recursive form, is in fact the method of instrumental variables, and is thus computationally far less attractive in either recursive or nonrecursive form than the scheme of the previous subsection. In order to gain advantages in terms of computational speed over the scheme of subsection (B), but without the computational cost of the method of instrumental variables, it makes sense to restrict the elements of $c_k$ to the set $\{-1, 0, 1\}$. In this case $c_k$ should be selected as a clipped version of an instrumental variable or, more simply, we claim that $c_k$ can be selected as a clipped version of $x_k$ itself. The corresponding recursive equations constitute the QS1 algorithm of Section 2. The QS1 and IV algorithms are now seen as more sophisticated versions of the detection approach of the previous section, with the QS1 scheme being the simpler approach with attractive numerical properties. Some analysis results are now studied.
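In nonrecursive form, the choice $c_k = \operatorname{sgn} x_k$ gives the batch counterpart of the QS1 estimate. A short sketch, with an assumed illustrative model, is:

```python
import numpy as np

# Nonrecursive weighted scheme with c_k = sgn(x_k): since c_k has entries
# in {-1, 0, 1}, the sums sum c_k x_k' and sum c_k z_k reduce to signed
# additions. Model values are illustrative assumptions.
rng = np.random.default_rng(5)
theta = np.array([0.8, -0.4])
m = 50_000
X = rng.standard_normal((m, 2)) @ np.array([[1.0, 0.3],
                                            [0.0, 1.0]])
z = X @ theta + 0.2 * rng.standard_normal(m)

C = np.sign(X)                                  # c_k = clipped x_k
theta_hat = np.linalg.solve(C.T @ X, C.T @ z)   # (sum c x')^{-1} sum c z
```

With white noise uncorrelated with the state, the clipped-weight estimate is consistent, as the discussion above indicates.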

Bias reduction analysis example

The bias reduction capabilities of the QS1 algorithm, when the states $x_k$ of the system are correlated with the additive noise $v_k$, are of interest. For the purpose of illustration, consider the case where the signal model (1) state vector $x_k$ can be decomposed as the sum of a term $\bar{x}_k$ uncorrelated with $v_k$ and a term $\gamma v_k$ for some vector $\gamma$ orthogonal to $\bar{x}_k$. Also take $v_k$ Gaussian (zero mean) with $E[v_k^2] = \sigma^2$, and restrict attention to the case when the components of $\bar{x}_k$ are bounded below in magnitude. More precisely, we require $|\bar{x}_k^{(i)}| \geq \delta$ for some $\delta$ and those $i$ which correspond to nonzero $\gamma^{(i)}$ (in obvious notation). For the situation when $E[x_k v_k] = 0$, then of course $\gamma = 0$ and there is a zero bias $\tilde{\theta}_{LS}$. This situation arises, for example, when $z_k$ is a moving average of white noise $v_k$ and $x_k = [v_{k-1}, v_{k-2}, \ldots, v_{k-n}]'$ with $E[v_j v_k] = 0$ for $j \neq k$. If in addition $v_k$ is zero mean and Gaussian, then also for the clipped-state vector $\tilde{x}_k$, $E[\tilde{x}_k v_k] = 0$ and the bias $\tilde{\theta}_{QS1}$ is zero. More generally, of course, $v_k$ is colored, $x_k$ is correlated with $v_k$, and the decomposition $x_k = \bar{x}_k + \gamma v_k$ is relevant. Thus, for the LS scheme,


Hence

||β_LS|| ≤ σ² ||ζ|| ||{E[x_k x_k']}^{-1}||.   (19)

The bias of the QS1 algorithm, denoted β_QS1, is now derived as follows. Under ergodicity,

β_QS1 = {E[x̂_k x_k']}^{-1} E[v_k x̂_k] = {lim_{k→∞} (1/k) Q_k^{-1}}^{-1} E[v_k x̂_k]

where x̂_k is the clipped state vector. Now, as shown in Appendix B, this bias is bounded in norm as

||β_QS1|| ≤ 2√n σ √(2/π) exp(-δ²/2ζ*²σ²) ||{E[x̂_k x_k']}^{-1}||   (20)

where ζ* = max_i |ζ^(i)| and n is the state dimension.

Remarks
1. The bounds (19) and (20) are not necessarily tight, but they do point to the power of the QS1 algorithms in that ||β_QS1|| is decreased exponentially as the signal-to-noise ratio is increased. In contrast, ||β_LS|| decreases only as the inverse of the signal-to-noise ratio. However, for a very poor signal-to-noise ratio, defined in the above sense, there may not be any bias reduction over the LS algorithm.
2. The restrictions imposed on ξ_k are not satisfied for all models. Relaxing this restriction on ξ_k makes any analysis formidable, and it is expected that the exponential term in (20) would be weakened.
3. Although in this example the noise distribution is taken to be Gaussian, similar bounds can be derived for other noise distributions. It is to be noted that the precise distribution of x_k is not used in the derivation of the bounds, though if such a distribution is known, a more accurate estimate of the bias can be obtained.
4. It appears that the lower bound restriction |ξ_k^(i)| ≥ δ can be achieved for a wide class of signal models using a specific known input test sequence. Theory for this is limited at the moment, but some simulations reported here suggest that the specific signal can be constructed by prefiltering a pseudo-random input sequence with a filter which is an estimate of the inverse of the plant, updated at each iteration as the plant parameter estimates are updated. Under parameter convergence the plant output will then, of course, converge to a pseudo-random binary sequence with added noise, thereby achieving |ξ_k^(i)| = δ, where δ² is the output signal power. When there is a small bias in the parameter convergence, it is plausible that |ξ_k^(i)| ≈ δ, and the above theory suggests that the bias will then become even smaller.

Strong consistency of IV and QS1 algorithms
When φ_k is chosen as a clipped instrumental variable, the noise v_k is uncorrelated with φ_k, and the QS1 algorithm is then a special, novel class of instrumental variable method which can be analysed without the ergodicity assumptions of Section 2. The following convergence analysis approach is, to our knowledge, also novel. First consider the fictitious least squares problem of identifying θ from the measurements z_k = θ'φ_k + v̄_k. The estimation error θ̃_k according to the standard theory is given from

θ̃_k = Q̄_k Σ_{j=1}^k φ_j v̄_j,   where Q̄_k = (Σ_{j=1}^k φ_j φ_j')^{-1}.

With v̄_j = w_j + c_1 w_{j-1} + ⋯ + c_N w_{j-N} and w_j zero mean, of bounded variance, and white*, it then suffices to study

Σ_{j=1}^k Q̄_k φ_j w_{j-i}

for each i. Applying the results of Moore (1978), we have with minor adaptations:

Lemma 4.1. For the fictitious least squares scheme above, with v̄_k uncorrelated with φ_k, lim_{k→∞} θ̃_k = 0 almost surely under the persistently exciting conditions: for some α > 0,

Σ_{l=0}^∞ l^{-α} φ_l' Q̄_l φ_l < ∞   and   lim_{k→∞} λ_max(Q̄_k) = 0

where Q̄_k = (Σ_{j=1}^k φ_j φ_j')^{-1}. Moreover, for the case when φ_k is a clipped (or quantized) state, then in lieu of the former condition

Σ_{l=0}^∞ l^{-α} ||Q̄_l φ_l|| < ∞

suffices.

*More precisely, E[w_k | F_{k-1}] = 0 and E[w_k² | F_{k-1}] ≤ σ̄² for all k and some σ̄, where F_k is the σ-algebra generated by w_1, w_2, ..., w_k.

The next observation is that, depending on the definition of φ_k,

θ̃_k^IV = Q_k Q̄_k^{-1} θ̃_k   or   θ̃_k^QS1 = Q_k Q̄_k^{-1} θ̃_k

where Q_k is the gain matrix of the IV (or QS1) algorithm of Section 2 and Q̄_k that of the fictitious least squares problem above.

This leads to the following extension of the above lemma.

Lemma 4.2. Consider the model (1) and the IV and QS1 algorithms as in Section 2. Then, with the noise v_k uncorrelated with φ_k, there is strong consistency:

lim_{k→∞} θ̃_k^IV = lim_{k→∞} θ̃_k^QS1 = 0,   i.e.   ||θ̃_k|| → 0 a.s.

under the following persistency of excitation conditions: for some α > 0,

Σ_{l=0}^∞ l^{-α} φ_l' Q_l φ_l < ∞   and   lim_{k→∞} λ_max(Q_k) = 0.

Moreover, for the case when φ_k is a quantized state, in lieu of the former condition the following condition suffices:

Σ_{l=0}^∞ l^{-α} ||Q_l φ_l|| < ∞.

Remarks
1. The persistently exciting conditions for QS1, in which φ_k is quantized, appear less restrictive than in other IV methods where φ_k is not quantized.
2. The above results clearly have relevance for the case when v_k is correlated with φ_k, as when φ_k is a clipped state vector, but the analysis of this latter case appears tedious and unrewarding. The results of Section 2, assuming ergodicity, are all that is currently available for this case.
3. Insight into the above persistently exciting conditions is perhaps best gleaned from a study of them assuming ergodicity. In particular, if the probabilities P(x_k^(i) > 0) = P(x_k^(i) < 0) = 1/2 hold, then under ergodicity

E[x̂_k x_k'] = lim_{k→∞} (1/k) Q_k^{-1}

so that Q_k ≈ (1/k){E[x̂_k x_k']}^{-1} for large k, the condition lim_{k→∞} λ_max(Q_k) = 0 holds, and

Σ_{l=0}^∞ l^{-α} ||Q_l φ_l|| ≈ Σ_{l=0}^∞ l^{-(1+α)} ||{E[x̂_k x_k']}^{-1} φ_l|| < ∞.

Observe that with these relationships the persistently exciting conditions are satisfied with φ_k a quantized state, but not necessarily for φ_k an instrumental variable or a quantized instrumental variable. For the case of nonstationary processes, refer to Appendix C.
4. Convergence rates can also be extracted from the above theoretical approach. Suffice it to say that convergence is at least at the rate at which λ_max(Q_k) → 0, which in the ergodic case with φ_k a quantized state vector is arbitrarily close to 1/k.

5. DETECTION APPLIED TO THE RESIDUALS
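The ergodic limit invoked in the remarks above can be checked numerically. The sketch below (our own construction, under the stated ergodicity and zero-mean Gaussian assumptions; the mixing matrix A is hypothetical) verifies that (1/k) Σ_j sgn(x_j) x_j' converges to E[sgn(x) x'], whose entries follow from the Gaussian clipping lemma of Appendix A, so that Q_k^{-1} grows linearly with k and λ_max(Q_k) decays like 1/k:

```python
import numpy as np

# Ergodic check: the (i,j) entry of E[sgn(x) x'] for zero-mean Gaussian states
# is sqrt(2/pi) * C[i,j] / sqrt(C[i,i]), with C the state covariance.
rng = np.random.default_rng(1)
A = np.array([[1.0, 0.0], [1.0, 1.5]])        # hypothetical mixing matrix
C = A @ A.T                                   # state covariance
x = rng.standard_normal((100_000, 2)) @ A.T   # ergodic Gaussian state sequence

k = len(x)
M = (np.sign(x).T @ x) / k                    # sample (1/k) * Q_k^{-1}
T = np.sqrt(2 / np.pi) * C / np.sqrt(np.diag(C))[:, None]  # theoretical limit
print(np.round(M, 3))
print(np.round(T, 3))
```

The two matrices agree to Monte Carlo accuracy, supporting the claim that (1/k) Q_k^{-1} approaches a fixed nonsingular matrix under ergodicity.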

In this section we introduce what might at first appear to be a counter-intuitive technique for reducing the bias in least squares (LS) or quantized state (QS) identification due to noise correlated with the states. In essence, measurements are ignored if the measurement prediction error (residual) is low. This contrasts with the techniques employed in the presence of impulsive noise superimposed on Gaussian noise, where measurements with high prediction error are ignored to reduce the disturbing effect of the impulsive noise. We refer to the algorithm as the Inlier Rejection Algorithm.

Inlier rejection (IR) algorithm
Measurements are processed in an IR algorithm as for LS or QS algorithms, save that a measurement is ignored if its residual v̂_k is less in magnitude than a threshold

δ |x_k' θ̂_k|

for some δ > 0. Experience suggests a threshold value of δ ≈ 1. The term |x_k' θ̂_k| is one possible normalization in the absence of a more precise measurable one.
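A minimal sketch of how such a rejection rule might be grafted onto a recursive least squares update (our own hedged illustration; the paper's QS algorithms differ in detail, and the function name and small regularizer are ours):

```python
import numpy as np

def ir_rls_step(theta, P, x, z, delta=1.0):
    """One RLS step with inlier rejection: skip the update when the
    residual is below delta * |x' theta| (the normalization from the text)."""
    resid = z - x @ theta
    norm = abs(x @ theta) + 1e-12          # guard against a zero normalization
    if abs(resid) < delta * norm:          # low residual: likely noise-correlated
        return theta, P                    # ignore the measurement
    Px = P @ x
    gain = Px / (1.0 + x @ Px)
    return theta + gain * resid, P - np.outer(gain, Px)
```

Early on, when θ̂ is far from the truth, residuals are large relative to |x'θ̂| and measurements are accepted; near convergence most measurements are inliers and are skipped, which is precisely where the correlated-noise bias would otherwise accumulate.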

Performance
Simulation studies, examples of which are given in Section 6, indicate that the bias reduction of an IR algorithm when applied to an LS algorithm can be typically 50%, and considerably greater for a QS1 algorithm. When the IR technique is employed in a QS1 algorithm there is frequently negligible bias for a wide range of noise environments. Moreover, the algorithm inherits the numerical stability and computational simplicity advantages of the QS1 algorithm. However, there is some reduction in convergence rate due to measurement rejection during the period when the noise dominates the convergence. In the presence of noise spikes, the various algorithms can of course be suitably modified to reject measurements when the residuals exceed a certain high threshold, thus incorporating an outlier rejection facility.


Justification
First observe that the least squares index

lim_{m→∞} (1/m) Σ_{k=0}^m (z_k - x_k' θ_LS)²

in the colored noise case is, by virtue of its optimality, less than or equal to

lim_{m→∞} (1/m) Σ_{k=0}^m (z_k - x_k' θ)².

More precisely:

Lemma 5.1. Under ergodicity assumptions,

lim_{m→∞} (1/m) Σ_{k=0}^m (z_k - x_k' θ°)² = lim_{m→∞} (1/m) Σ_{k=0}^m (z_k - x_k' θ_LS)² + ||E[v_k ψ_k]||²   (21)

where ψ_k = P^{-1/2} x_k with P = E[x_k x_k'].

Proof. See Appendix D.

Consider now the case when measurements are rejected if the residuals are low, as in the IR algorithm. Then (21) may hold only approximately, because of the lack of ergodicity. But even with (21) holding only approximately, we can see that this rejection process will reduce the difference between

(1/m) Σ_{k=0}^m (z_k - x_k' θ°)²   and   (1/m) Σ_{k=0}^m (z_k - x_k' θ_LS)²

and thus reduce the term ||E[ψ_k v_k]|| and consequently the bias θ̃_LS = -(P^{-1/2})' E[ψ_k v_k]. More precise results for bias reduction seem formidable to obtain. At a more intuitive level, consider the case when the QS1 algorithm is nearly converged with some bias b. Then the prediction error is z_k - x_k' θ̂_k ≈ b' x_k + v_k. Now, if this error is small, then v_k ≈ -b' x_k, and we see that v_k is highly correlated with x_k. Rejecting measurements where the prediction error is small thus has the effect of reducing this correlation as far as the algorithm is concerned, thereby reducing the bias. (Recall from Section 2 that the bias is proportional to E[x_k v_k] or E[x̂_k v_k].)

6. SIMULATIONS AND RESULTS

Algorithm performance
The performance of the various algorithms is evaluated for the estimation of the parameters of an ARMA model in the presence of colored noise given by

z_k = -a_1 z_{k-1} - a_2 z_{k-2} - ⋯ - a_n z_{k-n} + b_1 u_{k-1} + ⋯ + b_m u_{k-m} + v_k.

There is no limitation on the model of v_k; however, the algorithm will yield better estimates of the parameters for some models of v_k than for others. The parameter vector of interest is given by

θ° = [-a_1, ..., -a_n, b_1, ..., b_m]'.

(A) White noise case. In the first example the noise v_k is taken to be white, so as to evaluate a reference convergence rate of the algorithm relative to optimal least squares estimation. The input signal u_k consists of a pseudo-random binary sequence of length 127. Both the versions described in Sections 2 and 3 have been simulated for this case. Figure 1 plots the ensemble mean square error

(1/N) Σ_{i=1}^N ||θ̂_k^(i) - θ°||²

versus the number of iterations k, where N is the number of sequences in the ensemble. It is clear that the performance of all the versions is within a factor of two of the optimal least squares algorithm. Optimal grouping of measurements induced by the P^{-1/2}

[Figure: ensemble mean square error versus number of iterations; ensemble statistics over 25 simulation runs, noise variance 1.]

FIG. 1. Second-order ARMA model identification (white noise).

transformation on the state vectors, as in Section 3, gives the best convergence rate for the suboptimal schemes in the white noise case.

(B) Moving average of white noise case. The QS1 algorithm, with and without a prediction error threshold, has been used for the estimation of the a_i and b_i parameters in varied colored noise conditions. The results for some of these are reported below. The variance of the noise E[v_k²] and the variance of the input signal are both set equal to unity, to represent a typical medium noise level situation. Consider the cases

v_k = e_k - e_{k-1} + 0.2 e_{k-2}   (noise model 1)

v_k = e_k - e_{k-1} + 0.2 e_{k-2} - 0.1 e_{k-3} - 0.3 e_{k-4} + 0.1 e_{k-5} + 0.8 e_{k-6}   (noise model 2)

where e_k denotes a white noise sequence. Figure 2 plots the ensemble mean square error for the QS1 and RLS algorithms initialized with P = I. In the first 400 iterations no threshold on the prediction error is applied; after that a threshold of 1 is applied to the normalized residuals. However, since the states [u_{k-1}, u_{k-2}] associated with the input are known to be uncorrelated with v_k, a matrix weighting rather than a scalar weighting is applied, in which x_k is replaced by W x_k, where W = diag[w_1 w_2 w_3 w_4] with w_3 = w_4 = 1, and w_1 and w_2 are 0 [1] if the normalized prediction error is below [above] the threshold in magnitude.
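The component-wise weighting just described might be sketched as follows (our own illustration; the state ordering and function name are assumptions):

```python
import numpy as np

def weight_state(x, resid_norm, threshold=1.0):
    """Replace x_k by W x_k: on an inlier (small normalized residual), zero only
    the components that may be noise-correlated; keep the input lags, which are
    known to be uncorrelated with v_k. Assumes x = [z_{k-1}, z_{k-2}, u_{k-1}, u_{k-2}]."""
    w = np.ones_like(x)
    if resid_norm < threshold:   # inlier: suppress the output-lag components only
        w[:2] = 0.0
    return w * x
```

The design choice here is that inlier rejection need not discard a measurement wholesale: components of the regressor known a priori to be noise-free can continue to contribute information.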


As may be inferred from Fig. 2, in about 3000 iterations a reduction in mean square error by a factor of 10 is achieved over the LS algorithm. The improvement continues from there onwards, but the marginal improvement versus the number of iterations is quite small. Exponential data weighting yields some improvement in the rate of convergence. A time-varying data weighting (Kumar and Moore, 1979) could be used, though in the simulation a fixed factor of 0.998 is used for the first 2000 iterations. For varying values of the threshold, it has been observed from the simulations that the performance improves as the threshold is reduced from a high value to about 1. We do not present the additional simulations here in this regard, due to space limitations.

In Fig. 3 another measure of convergence is plotted. For convergence to the true parameter it is expected that, with v̂_k = z_k - θ̂'_{k-1} x_k, E[v̂_k²] must approach E[v_k²] as k → ∞. Hence in the figure the loss functions

(1/N) Σ_{i=1}^N (v̂_k^(i))²   and   (1/N) Σ_{i=1}^N (v_k^(i))²

are plotted versus the number of iterations k (the superscript denotes the sample sequence number). As may be inferred from the figure, the two curves remain very close to each other, thus again indicating convergence.

[Figure: ensemble statistics over 25 simulation runs, noise variance 1; mean square error versus number of iterations (thousands), with curves for the IR algorithm under noise models 1 and 2.]

FIG. 2. Convergence of IR algorithm in colored noise.

[Figure: the two loss functions versus number of iterations (thousands).]

FIG. 3. Convergence of the loss function V.


TABLE 1. PERFORMANCE OF QS AND LS ALGORITHMS FOR NOISE MODEL 1

                     a1 = -1.5        a2 = 0.7          b1 = 1.0          b2 = 0.5
RLS    mean          -1.336           0.5501            1.001             0.6617
QS1    mean          -1.385           0.5955            0.9927            0.6239
       variance      0.315 x 10^-3    0.290 x 10^-3     0.112 x 10^-3     0.493 x 10^-3
IRQS1  mean          -1.512           0.67              1.004             0.4935
       variance      0.279 x 10^-2    0.1398 x 10^-2    0.164 x 10^-3     0.300 x 10^-2
IRRLS  mean          -1.381           0.5815            1.045             0.640
       variance      0.387 x 10^-3    0.269 x 10^-3     0.522 x 10^-3     0.772 x 10^-3

In Table 1 the ensemble mean and variance of the estimates for the QS1 and RLS algorithms are shown (after 8000 iterations). For the RLS algorithm only the mean is of interest, since the variance becomes very small. For comparison purposes, the RLS algorithm employing a prediction error threshold is also included. Thus the separate improvements due to quantization of the state elements and due to employing a threshold on the prediction error can be clearly discerned. More important is the fact that the two techniques together yield almost zero bias in this class of identification problems. Also, there is some degradation of performance for noise model 2 with its longer 'tail'. Simulation results for the algorithms of Section 4 are not included for the colored noise case, since their relative merit is apparent only in low noise.

(C) More general colored noise. Consider the case when the noise v_k generated as in (B) is passed through a nonlinear device. In the example the nonlinearity is given by |v_k|^{1/2} sgn v_k. Table 2 gives the estimates achieved after about 10 000 iterations. It is clear that the performance of the algorithm under varied noise conditions (low to medium signal-to-noise ratio in colored noise) is quite satisfactory. The convergence rate seems slow, but that is because the threshold on residuals allows only about 10% of the measurements to be fully utilized. However, most of the improvement is achieved in the first few thousand iterations, as shown in Fig. 2. It is possible to improve the convergence rate by using an adaptive threshold: initially one starts with a high threshold and, as convergence proceeds, the threshold is reduced.

The performance of the QS algorithm is somewhat slower than that of the RML1 algorithm, as shown in Fig. 2, where the mean square error obtained in applying the RML1 algorithm to noise model 1 is also plotted. This discrepancy no doubt results from the partial rejection of a large number of measurements. Note that the mean square error for RML1 also includes the noise parameter error.

Processing the measurements in the high noise case
When the additive noise is high, it is possible to increase the signal-to-noise ratio, and possibly the noise whiteness, by averaging the measurements over a finite interval. Thus the modified measurement equation for any of the algorithms proposed can be taken as

z̄_k = (1/l) Σ_{i=1}^l z_{k-i} = x̄_k' θ° + (1/l) Σ_{i=1}^l v_{k-i}

where x̄_k is the correspondingly averaged state vector. There is no point in increasing l beyond the time interval over which x_k is significantly correlated with x_{k-l}.
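The averaging preprocessing could be sketched as follows (our own illustration; non-overlapping block averages are one possible realization of the finite-interval average, and the function name is ours):

```python
import numpy as np

def average_measurements(z, X, l):
    """Return (z_bar, X_bar): non-overlapping length-l block averages of the
    measurements z and the corresponding state (regressor) rows of X, for use
    as the modified measurement equation z_bar = x_bar' theta + v_bar."""
    m = (len(z) // l) * l                       # drop any ragged tail
    z_bar = z[:m].reshape(-1, l).mean(axis=1)
    X_bar = X[:m].reshape(-1, l, X.shape[1]).mean(axis=1)
    return z_bar, X_bar
```

Averaging attenuates the noise variance by roughly 1/l when the noise is nearer white than the signal, at the cost of an l-fold reduction in the number of processed measurements.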

Identification in colored noise with special inputs
As discussed in an earlier section, when specific input signals are permitted in the identification, the QS1 algorithm can lead to almost zero bias. In the simulation example of this section for such a situation, we use the same plant model as

TABLE 2. PERFORMANCE OF QS1 ALGORITHM UNDER COLORED NOISE WITH LONG MEMORY AND ALSO NONLINEAR NOISE MODEL

                     a1               a2               b1                b2
Case (C)  mean       -1.443           0.6472           0.9979            0.5581
          variance   0.1303 x 10^-2   0.799 x 10^-3    0.1072 x 10^-3    0.137 x 10^-2

[Figure: squared parameter estimation error versus number of iterations.]

FIG. 4. Identification with QS algorithm in colored noise with special inputs.

Summary of performance of QS schemes

(1) Convergence rate. The QS1 and related algorithms are typically within a factor of two or so of the convergence rate of the optimal least squares scheme.
(2) Bias reduction. The QS1 algorithm with IR typically gives a bias reduction of a factor of 2-10 over that for optimal least squares.
(3) Computational effort. The QS2 schemes using Toeplitz approximations, studied in a companion paper, require less computational effort than fast least squares (FLS).
(4) Simplicity of implementation. The QS2 schemes are simpler to program on a microprocessor (say) than LS or FLS.
(5) Robustness to low precision arithmetic. The QS schemes are considerably more robust than LS or FLS, but full details have not been assessed as yet.

earlier, with the noise having the moving average model v_k = e_k - e_{k-1} + 0.2 e_{k-2}. The input signal to the plant is generated by passing a pseudo-random binary signal, with period 127 and 4 samples per bit period, through an inverse model of the plant. This model is updated recursively as the identification proceeds. The output signal-to-noise ratio in steady state is 18, which corresponds roughly to an input SNR E[u²]/E[e²] of 2, the same as used in the earlier simulation example. When the QS1 algorithm is used with these specific inputs, the squared parameter estimation error is as plotted in Fig. 4. For the same signal-to-noise ratio but with pseudo-random binary inputs, the minimum error achieved by the LS algorithm is represented by a dashed line in the figure. It is thus apparent that the bias does become negligible when the QS1 algorithm is used in this procedure. Table 3 shows the parameter estimates achieved after 1000, 2000 and 4000 iterations for this example.
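The inverse-model input construction might be sketched, for the second-order plant of these examples, as follows (our own hedged reconstruction; the recursion simply solves the noise-free model for the input, and all names are ours):

```python
import numpy as np

def inverse_plant_input(d, a_hat, b_hat):
    """Given a desired (e.g. pseudo-random binary) output sequence d and current
    estimates of the plant z_k = -a1 z_{k-1} - a2 z_{k-2} + b1 u_{k-1} + b2 u_{k-2},
    choose each input so the noise-free model output reproduces d."""
    a1, a2 = a_hat
    b1, b2 = b_hat
    u = np.zeros(len(d))
    for k in range(2, len(d)):
        # solve the model for u[k-1] so the predicted output at time k equals d[k]
        u[k - 1] = (d[k] + a1 * d[k - 1] + a2 * d[k - 2] - b2 * u[k - 2]) / b1
    return u
```

In an adaptive implementation a_hat, b_hat would be the current parameter estimates, updated at each iteration; the inverse recursion is stable here since |b2/b1| < 1 for the assumed parameters.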

7. CONCLUSIONS AND AREAS FOR FURTHER WORK We have applied detection techniques to improve least squares algorithms for parameter identification in terms of the trade-offs between numerical difficulties, complexity, speed, stability and bias. The techniques taken separately have been shown to achieve significant (factor of 2) bias reduction in typical examples, and together can achieve dramatic bias reduction. Their advantages over more sophisticated schemes are their inherent simplicity, stability, numerical stability and their wide applicability. This gain is at the expense of a small slow down in convergence rate--in loose terms, a factor of two. The question is now raised as to the extent of improvement for the more sophisticated techniques of generalized least squares, extended least squares, recursive maximum likelihood or prediction error method and the method of instrumental variables when modified using the techniques of this paper. Our simulation studies to date are encouraging. In particular, for the


TABLE 3. PERFORMANCE OF QS ALGORITHM UNDER COLORED NOISE WITH SPECIAL INPUTS

                                  a1 = -1.5   a2 = 0.7   b1 = 1.0   b2 = 0.5
Estimates after 1000 iterations   -1.428      0.636      1.026      0.585
After 2000 iterations             -1.459      0.662      0.9902     0.531
After 4000 iterations             -1.479      0.681      0.989      0.500


recursive prediction error case, our simulations show that a quantized state version with a residual quantization facility, and including a bounding of the rate of divergence of Q^{-1}, obviates the need for a stability test and a projection into any stability domain.

Acknowledgement--This work was supported by the Australian Research Grants Committee.

REFERENCES
Cramér, H. and M. R. Leadbetter (1967). Stationary and Related Stochastic Processes. John Wiley.
Goodwin, G. C. and R. L. Payne (1977). Dynamic System Identification. Academic Press, New York.
Hastings-James, R. and M. W. Sage (1969). Recursive generalized least squares procedure for on-line identification of process parameters. IEEE Proc. 116, 2057.
Finnegan, B. M. and I. H. Rowe (1974). Strongly consistent parameter estimates by the introduction of strong instrumental variables. IEEE Trans. Aut. Control AC-19, 825.
Kumar, R. and J. B. Moore (1979). State inverse and decorrelated state stochastic approximation. Automatica 16, 295.
Kumar, R. and J. B. Moore (1980). Adaptive equalization via fast quantized-state methods. IEEE Trans. Comm. (to be published).
Ljung, L. and D. Falconer (1978). Fast calculation of gain matrices for recursive estimation schemes. Int. J. Control 27, 1.
Maschner, J. L. (1970). Adaptive filter with clipped input data. Tech. Rept. No. 6796-1, Centre for Systems Research, Stanford University, California.
Moore, J. B. and G. F. Ledwich (1979). Multivariable adaptive parameter and state estimators with convergence analysis. J. Australian Math. Soc. 21, 176.
Moore, J. B. (1978). On strong consistency of least squares identification algorithms. Automatica 14, 505.
Söderström, T. (1974). Convergence of identification methods based on the instrumental variable approach. Automatica 10, 685.
Söderström, T., L. Ljung and I. Gustavsson (1978). A theoretical analysis of recursive identification methods. Automatica 14, 231.
Solo, V. (1978). Time series recursions and stochastic approximation. Ph.D. Thesis, The Australian National University, Canberra.
Wong, K. Y. and E. Polak (1967). Identification of linear discrete-time systems using the instrumental variable approach. IEEE Trans. Aut. Control AC-12, 707.
Young, P. C. (1976). Some observations on instrumental variable methods of time-series analysis. Int. J. Control 23, 593.

APPENDIX A: CONDITION NUMBERS OF Q_k
In this appendix we study the condition number of the matrix (1/m) Σ_k x̂_k x_k' for the case when x_k is zero mean and ergodic with a Gaussian distribution. We first state a useful lemma from Maschner (1970).

Lemma. Let u and v be two jointly normal random variables with

E{u} = E{v} = 0,  E{u²} = σ_u²,  E{v²} = σ_v²,  E{uv} = ρ σ_u σ_v.

Then

E{sgn(u) v} = √(2/π) ρ σ_v.

Now, under the above assumptions, the condition number of the (1/k) Q_k^{-1} (or Q_k) matrix will approach that of the matrix E[x̂x']. The (i,j)th element of this matrix can be computed from the above lemma and is given by

{E[x̂x']}_{ij} = √(2/π) ρ_{ij} E^{1/2}[x^(j)]²,  where ρ_{ij} = E[x^(i) x^(j)] / (E^{1/2}[x^(i)]² E^{1/2}[x^(j)]²).

To calculate the improvement in the condition number (CN) of this matrix over the least squares matrix E[xx'], one has to compute these explicitly for specific examples. We illustrate this for the following two examples:

E^{1/2}[x^(1)]² = 1,  E^{1/2}[x^(2)]² = 10,  ρ_{12} = 0.5.   (1)

Then, in obvious notation, CN^LS = 100 and CN^QS1 = 14.3.

E^{1/2}[x^(1)]² = 1,  E^{1/2}[x^(2)]² = 30,  ρ_{12} = 0.8.   (2)

Then CN^LS = 900 and CN^QS1 = 76. It may be noted that these are typical practical examples when one is trying to identify the parameters of an ARMA process with or without exogenous inputs, especially so for the higher signal-to-noise ratio case. In this latter situation, certain x^(i) represent the output of the plant (high variance) whereas others may be the input samples (low variance). One may notice that when the CN for the LS algorithm is small, the QS1 algorithm does not provide any advantage.

APPENDIX B: BIAS CALCULATIONS
Consider the situation where the state vector x_k is correlated with the measurement noise v_k, and x_k is decomposed as x_k = ζ v_k + ξ_k for some vector ζ, with E[ξ_k v_k] = 0. Assuming that |ξ_k^(i)| ≥ δ for some δ and those i for which ζ^(i) is nonzero, we derive a bound for β_QS1 as follows. By simple manipulations, and denoting the set {k : sgn x_k^(i) ≠ sgn ξ_k^(i)} by K,

Σ_{k=0}^m v_k sgn x_k^(i) = Σ_{k∉K} v_k sgn ξ_k^(i) + Σ_{k∈K} v_k sgn x_k^(i)
                          = Σ_{k=0}^m v_k sgn ξ_k^(i) - Σ_{k∈K} v_k sgn ξ_k^(i) + Σ_{k∈K} v_k sgn x_k^(i)
                          = Σ_{k=0}^m v_k sgn ξ_k^(i) + 2 Σ_{k∈K} v_k sgn x_k^(i)
                          = Σ_{k=0}^m v_k sgn ξ_k^(i) + 2 sgn(ζ^(i)) Σ_{k∈K} |v_k|.

The first two equalities are obvious; the third follows since, for k ∈ K, sgn x_k^(i) = -sgn ξ_k^(i). The fourth follows since, over the set K, sgn x_k^(i) = sgn(ζ^(i) v_k), for otherwise one could not have sgn ξ_k^(i) ≠ sgn x_k^(i). Since |ξ_k^(i)| ≥ δ, membership of K requires |ζ^(i) v_k| > δ, so that, with P denoting probability,

lim_{m→∞} (2/m) Σ_{k∈K} |v_k| ≤ 2 E[|v_k| | |ζ^(i) v_k| > δ] P[|ζ^(i) v_k| > δ].

Now, due to the Gaussian distribution of the noise v_k,

E[|v_k| | |ζ^(i) v_k| > δ] P[|ζ^(i) v_k| > δ] = σ √(2/π) exp(-δ²/2ζ^(i)²σ²).

Thus, since lim_{m→∞} (1/m) Σ_{k=0}^m v_k sgn ξ_k^(i) = 0,

||β_QS1|| = ||{E[x̂_k x_k']}^{-1} E[x̂_k v_k]|| ≤ 2√n σ √(2/π) exp(-δ²/2ζ*²σ²) ||{E[x̂_k x_k']}^{-1}||

where ζ* = max_i |ζ^(i)| and n is the state dimension. □

APPENDIX C: NONSTATIONARY PROCESSES AND PERSISTENCE OF EXCITATION
For the case of nonstationary processes, certain properties of the sample functions of the process are required to establish the existence of such an α. See, for example, Cramér and Leadbetter (1967) for a detailed study of the zero crossings of a stochastic process. There it is shown that, for a continuous nonstationary normal process ξ(t) satisfying certain regularity conditions [the mean function m(t) has a continuous derivative, the covariance function r(t,u) has a second mixed partial derivative, and ξ(t) and its quadratic mean derivative ξ'(t) have a nonsingular joint distribution for each t], if C_0 denotes the number of zero crossings in 0 ≤ t ≤ T, then

E{C_0} = ∫_0^T (γ/σ)(1 - ρ²)^{1/2} φ(m/σ) {2φ(η) + η[2Φ(η) - 1]} dt

where

γ(t) = [∂²r(t,u)/∂t ∂u]^{1/2}_{u=t},  ρ(t) = [∂r(t,u)/∂u]_{u=t} / (γ(t)σ(t)),  σ²(t) = r(t,t),

η(t) = [m'(t) - γ(t)ρ(t)m(t)/σ(t)] / (γ(t)[1 - ρ²(t)]^{1/2})

and the time argument t has been dropped in the integral. Further, φ denotes the Gaussian density function and Φ its indefinite integral. It is clear that if the process x has as its elements the sampled values of such a nonstationary process, then conditions on the functions m(t), r(t,u) and the sampling rate could be derived to obtain divergence of P^{-1} in all its eigenvalues. Obviously the problem is very complex, and is not pursued here. In intuitive terms, however, it is sufficient that E{C_0} → ∞ as T → ∞ and that the sampling is sufficiently fast.

APPENDIX D: PROOF OF LEMMA 5.1
Introduce the transformations ψ_k = P^{-1/2} x_k and θ̄ = (P^{1/2})' θ, with P = E[x_k x_k'], so that ψ_k has mutually orthogonal components with E[ψ_k ψ_k'] = I. Also define

v̄_k = v_k - E[v_k ψ_k'] ψ_k   (A.1)

from which

E[v̄_k ψ_k] = 0,  E[v_k²] = E[v̄_k²] + ||E[v_k ψ_k]||².   (A.2)

Now (1) can be written as

z_k = ψ_k' (θ̄ + E[v_k ψ_k]) + v̄_k.   (A.3)

Introducing stationarity assumptions, E[v_k ψ_k] is time invariant, and for all m

(Σ_{k=0}^m ψ_k ψ_k')^{-1} Σ_{k=0}^m ψ_k z_k = (P^{1/2})' θ_LS.

Thus if θ̄_LS converges, so also does θ_LS, and moreover

lim θ̄_LS = θ̄ + E[v_k ψ_k].   (A.4)

Thus (A.3) may be rewritten, in the limit, as

z_k = ψ_k' θ̄_LS + v̄_k = x_k' θ_LS + v̄_k   (A.5)

and substitution into (A.2) yields

lim_{m→∞} (1/m) Σ_{k=0}^m (z_k - x_k' θ°)² = lim_{m→∞} (1/m) Σ_{k=0}^m (z_k - x_k' θ_LS)² + ||E[v_k ψ_k]||².   (A.6)

Under ergodicity, which implies that (A.4)-(A.6) hold, (A.6) can be written as (21) and the lemma is established.
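The jointly-Gaussian clipping lemma quoted in Appendix A can be checked numerically; the following sketch (ours, with arbitrary illustrative moments) estimates E{sgn(u) v} by Monte Carlo and compares it with √(2/π) ρ σ_v:

```python
import numpy as np

# Construct jointly normal (u, v) with prescribed correlation rho, then check
# that the sample mean of sgn(u)*v matches the lemma's closed form.
rng = np.random.default_rng(2)
rho, sigma_u, sigma_v = 0.6, 2.0, 3.0
n = 500_000
u = sigma_u * rng.standard_normal(n)
w = rng.standard_normal(n)                              # independent component
v = sigma_v * (rho * u / sigma_u + np.sqrt(1 - rho**2) * w)

estimate = np.mean(np.sign(u) * v)
theory = np.sqrt(2 / np.pi) * rho * sigma_v
print(estimate, theory)
```

The agreement (to Monte Carlo accuracy) is what underlies the condition-number comparison of the two worked examples in Appendix A.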