# Estimating the Noise Covariance Matrix for Input-Output Filters

Copyright © IFAC Control Science and Technology (8th Triennial World Congress), Kyoto, Japan, 1981

J. Gertler

Institute for Computing and Automation, Hungarian Academy of Sciences, Budapest, Hungary

**Abstract.** An algorithm is introduced to estimate the zero-shift covariance of additive input and output noise in a situation where the process inputs and outputs are measured and the input-output model of the process is known. The noise to be estimated is assumed to be uncorrelated in time. The estimates, which are optimal in a least-square sense, are based upon a measurement error vector that is easy to compute from the actual measurements. Though the algorithm is rather simple, it is prone to numerical difficulties and, if the process parameters are not exactly known or the noises are not really white, to significant estimation errors.

**Keywords.** Adaptive filtering; noise estimation; least-square estimation; input-output filtering.

## INTRODUCTION

In two previous publications (Gertler, 1978, 1979), we reported on a constrained minimum variance input-output filter for linear discrete dynamic systems. This filter estimates the input and output variables, based upon measurements of the same. The filter utilizes the input-output model of the process and the zero-shift covariance matrix of the white additive input and output noise. The estimates minimize a weighted minimum variance performance index while satisfying the input-output equation of the system. In two follow-up papers, computational experience gained with the filter was reported (Gertler, Varga and Sipos, 1980) and some sensitivity problems were discussed (Gertler and Sipos, 1981). All these preliminary studies assume that the process model and the zero-shift covariance matrix of the noise are known. While the process model is normally obtained from process identification, this is not so much the case for the noise covariance. The present paper is devoted to the estimation of the latter.

The problem of estimating noise covariances for Kalman-filtering has been treated in several papers (Mehra, 1970, 1972; Carew and Belanger, 1973; Godbole, 1974; Belanger, 1974; etc.). All these works utilize explicitly the structure and properties of the Kalman-filter. The common approach in the referred papers is to base the estimation of the noise covariances on the covariance matrices of the coloured innovation sequence, obtained with an approximate (suboptimal) filter. In addition, Mehra's survey (1972) reports on a variety of other approaches as well, all associated with the Kalman-filter.

The present paper is concerned with input-output filtering. An algorithm will be shown to estimate the zero-shift covariance of additive input and output noise in a situation where the process inputs and outputs are measured and the input-output model of the process is known. It is assumed that the noise to be estimated is uncorrelated in time. The estimates will be obtained from a measurement error vector that is easy to compute from the actual measurements. The algorithm is based on a least-square type performance criterion.

## SYSTEM DESCRIPTION

Consider a multiple-input multiple-output discrete linear dynamic system with input vector $\mathbf{u}(t) = [u_1(t), \dots, u_k(t)]^T$ and output vector $\mathbf{y}(t) = [y_1(t), \dots, y_m(t)]^T$. Let the system equation be

$$\mathbf{y}(t) = \sum_{i=0}^{n} \mathbf{G}_i z^{-i} \mathbf{u}(t) - \sum_{i=1}^{n} h_i z^{-i} \mathbf{y}(t) \tag{1}$$

where the matrices $\mathbf{G}_i$ ($i = 0, \dots, n$) and the scalars $h_i$ ($i = 1, \dots, n$) are the known system parameters and $n$ is the order of the system. For the sake of further convenience, re-arrange the system equation as

$$\sum_{i=0}^{n} \mathbf{F}_i z^{-i} \mathbf{x}(t) = \mathbf{0} \tag{2}$$

where

$$\mathbf{x}(t) = \begin{bmatrix} \mathbf{u}(t) \\ \mathbf{y}(t) \end{bmatrix} \tag{3}$$

is a $(k+m)$ vector and

$$\mathbf{F}_i = [\, \mathbf{G}_i \;\; -h_i \mathbf{I} \,], \qquad i = 0, \dots, n, \quad h_0 = 1 \tag{4}$$

are $m \times (k+m)$ matrices.

Assume that all the system variables are measured and that their measurements $\boldsymbol{\xi}(t)$ are corrupted by a noise $\boldsymbol{\eta}(t)$:

$$\boldsymbol{\xi}(t) = \mathbf{x}(t) + \boldsymbol{\eta}(t) \tag{5}$$

The noises are random, uncorrelated in time, independent of the true system variables, and have zero mean. Any non-adaptive filter design will require, in addition to the system matrices $\mathbf{F}_i$, the knowledge of the zero-shift correlation (covariance) matrix of the noise vector:

$$\mathbf{P} = E\{\boldsymbol{\eta}(t)\, \boldsymbol{\eta}^T(t)\} \tag{6}$$
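The model manipulation above is mechanical enough to sketch in code. The fragment below is a minimal illustration, not the author's program; the helper names (`build_F`, `simulate`) are mine, and the numerical values are borrowed from the paper's later second-order example with $a = 1$. It forms the matrices $\mathbf{F}_i = [\mathbf{G}_i \; -h_i\mathbf{I}]$ of Eqn (4) and checks that the rearranged equation (2) annihilates noise-free data generated by Eqn (1):

```python
import numpy as np

def build_F(G_list, h_list):
    """Eqn (4): F_i = [G_i  -h_i * I], with h_0 = 1."""
    m = G_list[0].shape[0]
    return [np.hstack([G, -h * np.eye(m)]) for G, h in zip(G_list, h_list)]

def simulate(G_list, h_list, U):
    """Eqn (1): y(t) = sum_i G_i u(t-i) - sum_{i>=1} h_i y(t-i),
    zero initial conditions; returns x(t) = [u(t); y(t)] of Eqn (3)."""
    n = len(G_list) - 1
    m = G_list[0].shape[0]
    Y = np.zeros((U.shape[0], m))
    for t in range(U.shape[0]):
        acc = np.zeros(m)
        for i in range(min(n, t) + 1):
            acc += G_list[i] @ U[t - i]
            if i >= 1:
                acc -= h_list[i] * Y[t - i]
        Y[t] = acc
    return np.hstack([U, Y])

# second-order single-input single-output example (a = 1)
G = [np.array([[0.978]]), np.array([[0.763]]), np.array([[0.0]])]
h = [1.0, -1.3853, 0.4723]
F = build_F(G, h)

U = np.random.default_rng(0).standard_normal((200, 1))
X = simulate(G, h, U)

# Eqn (2): sum_i F_i x(t-i) = 0 for the true (noise-free) variables
n = len(F) - 1
resid = np.array([sum(F[i] @ X[t - i] for i in range(n + 1))
                  for t in range(n, len(X))])
```

With additive measurement noise the same sum, applied to the noisy measurements instead of the true variables, is no longer zero; that residual is exactly the measurement error vector exploited in the next section.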

## ESTIMATING NOISE COVARIANCES FROM THE MEASUREMENT ERROR

Consider a measurement error vector defined as

$$\boldsymbol{\varepsilon}(t) = \sum_{i=0}^{n} \mathbf{F}_i z^{-i} \boldsymbol{\xi}(t) \tag{7}$$

Utilizing Eqns (5) and (2), this is rewritten as

$$\boldsymbol{\varepsilon}(t) = \sum_{i=0}^{n} \mathbf{F}_i z^{-i} \boldsymbol{\eta}(t) \tag{8}$$

This is an $n$-th order moving-average (MA) equation. Notice that the measurement error is computed directly from the noisy measurements, utilizing only the system matrices $\mathbf{F}_i$.

Find now the relationship between the zero-shift covariance matrix $\mathbf{P}$ of the noise and the $j$-shift ($j = 0, 1, 2, \dots$) covariance matrices

$$\boldsymbol{\Pi}_j = E\{\boldsymbol{\varepsilon}(t)\, \boldsymbol{\varepsilon}^T(t - j\Delta t)\} \tag{9}$$

of the measurement error. From Eqn (8) this is obtained as

$$\boldsymbol{\Pi}_j = \sum_{i=j}^{n} \mathbf{F}_i\, \mathbf{P}\, \mathbf{F}_{i-j}^T, \quad j = 0, \dots, n; \qquad \boldsymbol{\Pi}_j = \mathbf{0}, \quad j > n \tag{10}$$

where we utilize that $\boldsymbol{\eta}(t)$ is uncorrelated in time.

Equation (10) seems suitable for estimating matrix $\mathbf{P}$: the matrices $\boldsymbol{\Pi}_j$ ($j = 0, \dots, n$) are obtained from the measurements and the matrices $\mathbf{F}_i$ ($i = 0, \dots, n$) are assumed known. A similar problem, with respect to the Kalman-filter, was treated by Mehra (1970, 1972) in an elegant closed matrix form. However, while in Mehra's problem only the left-hand-side coefficients of the unknown matrix were different in the consecutive equations, we have different coefficients on both sides. Thus Mehra's elegant solution is not applicable here; instead, a more straightforward approach is needed.

Rearrange the elements of matrix $\mathbf{P}$, column-wise, into a vector:

$$\mathbf{p} = \mathrm{vec}\, \mathbf{P} \tag{11}$$

Since $\mathbf{P}$ is symmetric, $\mathbf{p}$ need only contain the elements of its upper triangle submatrix. Also, rearrange the elements of the matrices $\boldsymbol{\Pi}_j$ ($j = 0, \dots, n$), column-wise, into a single vector:

$$\boldsymbol{\pi} = [\, \mathrm{vec}\, \boldsymbol{\Pi}_0 ;\; \mathrm{vec}\, \boldsymbol{\Pi}_1 ;\; \dots ;\; \mathrm{vec}\, \boldsymbol{\Pi}_n \,] \tag{12}$$

$\boldsymbol{\Pi}_0$ is symmetric, thus only elements from its upper triangle submatrix are included; the other matrices $\boldsymbol{\Pi}_j$ are non-symmetric. Now Eqn (10) can be re-written as

$$\boldsymbol{\pi} = \mathbf{B}\, \mathbf{p} \tag{13}$$

where $\mathbf{B}$ is a coefficient matrix; its elements depend only on those of the matrices $\mathbf{F}_i$. Obviously, if $\mathrm{size}(\boldsymbol{\pi}) < \mathrm{size}(\mathbf{p})$, Eqn (13) cannot be solved for $\mathbf{p}$. If $\mathrm{size}(\boldsymbol{\pi}) > \mathrm{size}(\mathbf{p})$, a least-square solution can be obtained, while in the limit case of $\mathrm{size}(\boldsymbol{\pi}) = \mathrm{size}(\mathbf{p})$, Eqn (13) can be directly inverted.

### Calculating the Elements of Matrix B

As stated before, matrix $\mathbf{B}$ depends only on the matrices $\mathbf{F}_i$. Unfortunately, no closed matrix formula is available to describe this relationship. However, it is easy to derive a formula between the elements of $\mathbf{B}$ and those of the $\mathbf{F}_i$-s.

In any matrix product $\mathbf{F}_i \mathbf{P} \mathbf{F}_{i-j}^T$, the $(s,t)$-th element is obtained as

$$(\mathbf{F}_i \mathbf{P} \mathbf{F}_{i-j}^T)^{s,t} = \sum_{r=1}^{m+k} \sum_{q=1}^{m+k} f_i^{s,q}\, f_{i-j}^{t,r}\, p^{q,r} \tag{14}$$

where $f_i^{s,q}$ is the $(s,q)$-th element of matrix $\mathbf{F}_i$ and $p^{q,r}$ is the $(q,r)$-th element of matrix $\mathbf{P}$. Now, taking into account Eqn (10), the $(s,t)$-th element of matrix $\boldsymbol{\Pi}_j$ is obtained as

$$\Pi_j^{s,t} = \sum_{i=j}^{n} \sum_{r=1}^{m+k} \sum_{q=1}^{m+k} f_i^{s,q}\, f_{i-j}^{t,r}\, p^{q,r} \tag{15}$$

Thus the coefficient of $p^{q,r}$ in $\Pi_j^{s,t}$ is $\sum_{i=j}^{n} f_i^{s,q} f_{i-j}^{t,r}$. Further, remember that $\mathbf{P}$ is symmetric and only its upper triangle will appear in vector $\mathbf{p}$. Thus the element of matrix $\mathbf{B}$ belonging to the $(s,t)$-th element of $\boldsymbol{\Pi}_j$ (a row in $\mathbf{B}$) and the $(q,r)$-th element of $\mathbf{P}$ (a column in $\mathbf{B}$) will be

$$b = \sum_{i=j}^{n} f_i^{s,q}\, f_{i-j}^{t,q} \quad \text{if } q = r, \qquad b = \sum_{i=j}^{n} \left( f_i^{s,q}\, f_{i-j}^{t,r} + f_i^{s,r}\, f_{i-j}^{t,q} \right) \quad \text{if } q \neq r \tag{16}$$

### The Least-Square Solution

A least-square estimate for vector $\mathbf{p}$ will be derived from Eqn (13). Introduce

$$\hat{\boldsymbol{\pi}} = \mathbf{B}\, \hat{\mathbf{p}} \tag{17}$$

and define a least-square performance index as

$$Q = (\boldsymbol{\pi} - \hat{\boldsymbol{\pi}})^T (\boldsymbol{\pi} - \hat{\boldsymbol{\pi}}) \tag{18}$$

Substituting Eqn (17) and differentiating with respect to $\hat{\mathbf{p}}$, the solution is obtained as

$$\hat{\mathbf{p}} = (\mathbf{B}^T \mathbf{B})^{-1} \mathbf{B}^T \boldsymbol{\pi} \tag{19}$$

In the limit case, when $\mathbf{B}$ is quadratic, Eqn (19) reduces to $\hat{\mathbf{p}} = \mathbf{B}^{-1} \boldsymbol{\pi}$.

As has been pointed out, a least-square estimate can only be obtained if $\mathrm{size}(\boldsymbol{\pi}) \geq \mathrm{size}(\mathbf{p})$. $\mathbf{P}$ is a $(k+m) \times (k+m)$ matrix; its upper triangle submatrix contains $(k+m+1)(k+m)/2$ elements. The $\boldsymbol{\Pi}_j$ matrices are of size $m \times m$; thus an upper triangle (for $\boldsymbol{\Pi}_0$) contains $(m+1)m/2$ elements, while each of the other (full) matrices contains $m \times m$ elements. Accordingly, the above condition is expanded as

$$\frac{(m+1)m}{2} + n m^2 \geq \frac{(k+m+1)(k+m)}{2} \tag{20}$$
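Formula (16) translates directly into code. The following sketch (an illustration under the paper's assumptions, not the author's program; all function names are mine) assembles $\mathbf{B}$ row by row, stacks the vector $\boldsymbol{\pi}$ of Eqn (12) in the same column-wise order, and applies the least-square solution (19). It is exercised on the second-order example used later in the paper, for which $\mathbf{B}$ is square:

```python
import numpy as np

def upper_indices(d):
    """Column-wise (q, r) index pairs of the upper triangle, q <= r."""
    return [(q, r) for r in range(d) for q in range(r + 1)]

def build_B(F):
    """Matrix B of Eqn (13) via the element formula (16)."""
    n = len(F) - 1
    m, d = F[0].shape
    rows = []
    for j in range(n + 1):
        if j == 0:   # Pi_0 is symmetric: upper triangle only
            st_pairs = [(s, t) for t in range(m) for s in range(t + 1)]
        else:        # Pi_j, j >= 1, is a full matrix
            st_pairs = [(s, t) for t in range(m) for s in range(m)]
        for s, t in st_pairs:
            row = []
            for q, r in upper_indices(d):
                b = sum(F[i][s, q] * F[i - j][t, r] for i in range(j, n + 1))
                if q != r:   # symmetric P: p^{q,r} appears twice
                    b += sum(F[i][s, r] * F[i - j][t, q]
                             for i in range(j, n + 1))
                row.append(b)
            rows.append(row)
    return np.array(rows)

def stack_pi(Pi_list):
    """Vector pi of Eqn (12), column-wise, Pi_0 upper triangle only."""
    m = Pi_list[0].shape[0]
    parts = [[Pi_list[0][s, t] for t in range(m) for s in range(t + 1)]]
    parts += [list(Pi.flatten(order='F')) for Pi in Pi_list[1:]]
    return np.concatenate(parts)

def estimate_P(F, Pi_list):
    """Least-square solution (19); p_hat is refolded into symmetric P."""
    B = build_B(F)
    p_hat, *_ = np.linalg.lstsq(B, stack_pi(Pi_list), rcond=None)
    d = F[0].shape[1]
    P = np.zeros((d, d))
    for (q, r), v in zip(upper_indices(d), p_hat):
        P[q, r] = P[r, q] = v
    return P

# second-order SISO example; exact Pi_j generated from Eqn (10)
F = [np.array([[0.978, -1.0]]),
     np.array([[0.763, 1.3853]]),
     np.array([[0.0, -0.4723]])]
P_true = np.array([[1.0, 0.3], [0.3, 2.0]])
Pi = [sum(F[i] @ P_true @ F[i - j].T for i in range(j, 3)) for j in range(3)]
P_est = estimate_P(F, Pi)
```

For this example condition (20) holds with equality: $(m+1)m/2 + nm^2 = 3 = (k+m+1)(k+m)/2$, so $\mathbf{B}$ is square and the least-square solution coincides with direct inversion.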

### An Extended Least-Square Solution

If matrix $\mathbf{B}$ is of less than full rank (i.e. if its rank is less than the number of its columns), the solution of Eqn (13) is undefined even if condition (20) is met. In terms of the least-square approach this means that the derivatives of (18) are not independent. To handle such a situation, some additional information or assumption is necessary. One possibility is to utilize some pre-estimate $\bar{\mathbf{p}}$ of vector $\mathbf{p}$ in an extended least-square performance index:

$$Q' = (\boldsymbol{\pi} - \hat{\boldsymbol{\pi}})^T (\boldsymbol{\pi} - \hat{\boldsymbol{\pi}}) + (\hat{\mathbf{p}} - \bar{\mathbf{p}})^T \mathbf{C}\, (\hat{\mathbf{p}} - \bar{\mathbf{p}}) \tag{21}$$

where $\mathbf{C} = \mathrm{diag}(c_1, \dots, c_l, \dots)$. The solution for such a performance index is obtained as

$$\hat{\mathbf{p}} = (\mathbf{B}^T \mathbf{B} + \mathbf{C})^{-1} (\mathbf{B}^T \boldsymbol{\pi} + \mathbf{C}\, \bar{\mathbf{p}}) \tag{22}$$

Normally only one or a few elements $c_l$ of matrix $\mathbf{C}$ are non-zero, according to the situation that pre-estimates are available only for one or a few elements of vector $\mathbf{p}$. A plausible pre-estimate is zero for some off-diagonal element of matrix $\mathbf{P}$. The actual value of any non-zero $c_l$ can be selected by numerical conditioning considerations. It is also possible to handle the rank-defect problem by size reduction, that is, by simply eliminating the unknown for which a pre-estimate is assumed.

### The Numerical Condition of the Solution

In addition to the true rank-defect discussed above, the special structure of matrix $\mathbf{B}$ makes the algorithm prone to numerical ill-conditioning. It can be seen from Eqn (16) that any column of matrix $\mathbf{B}$ is derived from one or two columns of the matrices $\mathbf{F}_i$. In particular, the elements in a column $(q,r)$ of $\mathbf{B}$ are obtained from products of the $f^{\cdot,q} f^{\cdot,r}$ kind, while those in a column $(q,q)$ from products of the $f^{\cdot,q} f^{\cdot,q}$ kind. This implies that if the elements in column $q$ of the matrices $\mathbf{F}_i$ are small relative to the other columns, this will be so in all columns $(q,r)$ of matrix $\mathbf{B}$, and quadratically so in column $(q,q)$. This property will then be transmitted to the respective rows and columns of $\mathbf{B}^T \mathbf{B}$, resulting in some especially small values in the main diagonal and causing numerical ill-conditioning. Another consequence will be the appearance of large values in the respective row of $(\mathbf{B}^T \mathbf{B})^{-1} \mathbf{B}^T$, anticipating sensitivity problems.

To deal with ill-conditioned equations one has to resort to special numerical techniques and double-precision computation. The extension (21) of the least-square performance index can ease this problem, too, if the position and the value of the non-zero elements of $\mathbf{C}$ are selected in such a way that the smallest elements of the main diagonal get closer to the others.
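The extended solution (22) can be sketched as follows. The matrix below is a hypothetical toy example of mine, chosen so that two columns coincide (the way two columns of $\mathbf{B}$ can when built from the $\mathbf{F}_i$), making $\mathbf{B}^T\mathbf{B}$ singular; the penalty term pins the pre-estimated unknown and restores a unique solution, and the second call shows how an error in the pre-estimate spreads over the dependent unknowns:

```python
import numpy as np

def extended_ls(B, pi, p_bar, c):
    """Eqn (22): p_hat = (B'B + C)^{-1} (B' pi + C p_bar), C = diag(c)."""
    C = np.diag(c)
    return np.linalg.solve(B.T @ B + C, B.T @ pi + C @ p_bar)

# toy rank-deficient B: columns 0 and 1 are identical (rank 2 < 3 columns)
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [2.0, 2.0, 0.0]])
p_true = np.array([0.4, 0.6, 1.0])
pi = B @ p_true

c = np.array([10.0, 0.0, 0.0])            # penalize unknown 0 only

# exact pre-estimate for unknown 0: the solution becomes unique and exact
p_hat = extended_ls(B, pi, np.array([0.4, 0.0, 0.0]), c)

# pre-estimate with a 0.1 error: the bias is shared between the two
# linearly dependent unknowns, while the independent one stays exact
p_bad = extended_ls(B, pi, np.array([0.5, 0.0, 0.0]), c)
```

Only the data of unknowns with nonzero $c_l$ in `p_bar` matter; the remaining entries are multiplied by zero, matching the remark that pre-estimates are usually available for only a few elements of $\mathbf{p}$.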


## SOURCES OF ESTIMATION ERRORS

The main sources causing errors in estimating noise covariances are related to those deteriorating the performance of the filter. These have been discussed in a previous paper on filter sensitivity (Gertler and Sipos, 1981). Out of the factors enumerated there, the misestimation of the process parameters $\mathbf{F}_i$ and the colouredness of the noise are to be considered at this point. In addition, any error of the eventual pre-estimates will also propagate over the rest of the estimated noise vector.

To assess the consequence of the misestimation of the process parameters, note that Eqn (2) will not hold for $\hat{\mathbf{F}}_i \neq \mathbf{F}_i$. Thus an additional error term will appear in Eqn (8):

$$\boldsymbol{\varepsilon}(t) = \sum_{i=0}^{n} \hat{\mathbf{F}}_i z^{-i} \boldsymbol{\eta}(t) + \sum_{i=0}^{n} (\hat{\mathbf{F}}_i - \mathbf{F}_i) z^{-i} \mathbf{x}(t) \tag{23}$$

(Here, in the first term, we neglected that $\hat{\mathbf{F}}_i \neq \mathbf{F}_i$.) Accordingly, an additional error term will also appear in Eqn (10):

$$\boldsymbol{\Pi}_j = \sum_{i=j}^{n} \hat{\mathbf{F}}_i\, \mathbf{P}\, \hat{\mathbf{F}}_{i-j}^T + \sum_{i=0}^{n} \sum_{l=0}^{n} (\hat{\mathbf{F}}_i - \mathbf{F}_i)\, \mathbf{R}_{i-j-l}\, (\hat{\mathbf{F}}_l - \mathbf{F}_l)^T \tag{24}$$

where

$$\mathbf{R}_{i-j-l} = E\{\mathbf{x}(t - i\Delta t)\, \mathbf{x}^T(t - j\Delta t - l\Delta t)\} \tag{25}$$

is normally nonzero for any shift. This will cause an error term $\Delta\boldsymbol{\pi}$ in vector $\boldsymbol{\pi}$, resulting in an additive error in Eqns (19) and (22). Since the error is proportional to the correlations (25) of the system variables $\mathbf{x}(t)$, it may become rather significant.

If the noise $\boldsymbol{\eta}(t)$ is coloured, this will cause an estimation error in $\mathbf{P}$ through a similar mechanism. Now the matrices $\boldsymbol{\Pi}_j$ are actually constituted as

$$\boldsymbol{\Pi}_j = \sum_{l=0}^{n} \sum_{i=0}^{n} \mathbf{F}_i\, \mathbf{P}_{i-l-j}\, \mathbf{F}_l^T \tag{26}$$

where

$$\mathbf{P}_{i-l-j} = E\{\boldsymbol{\eta}(t - i\Delta t)\, \boldsymbol{\eta}^T(t - j\Delta t - l\Delta t)\} \tag{27}$$

is normally nonzero for any shift. Since our estimation algorithm assumes whiteness of the noise, it attributes the full value of $\boldsymbol{\Pi}_j$ to the zero-shift matrix $\mathbf{P} = \mathbf{P}_0$. Thus all the other terms in Eqn (26) constitute an error term, resulting in an additive estimation error in $\hat{\mathbf{p}}$.

If a pre-estimate $\bar{\mathbf{p}}$ is different from the true vector $\mathbf{p}$, then the respective estimation error is obtained from Eqn (22), with $\boldsymbol{\pi} = \mathbf{B}\mathbf{p}$, as

$$\Delta\hat{\mathbf{p}} = \hat{\mathbf{p}} - \mathbf{p} = (\mathbf{B}^T \mathbf{B} + \mathbf{C})^{-1}\, \mathbf{C}\, (\bar{\mathbf{p}} - \mathbf{p}) \tag{28}$$

## COMPUTATIONAL RESULTS

The object of the computational studies was to demonstrate the functioning of the algorithm and, especially, to investigate its sensitivity with respect to different errors. In order to clearly separate the error sources, the investigations were based on calculated rather than simulated data.

First a single-input single-output second-order system

$$W(z) = \frac{a\,(0.978 + 0.763\, z^{-1})}{1 - 1.3853\, z^{-1} + 0.4723\, z^{-2}}$$

was treated, with $a = 1$ and $a = 0.1$. The noise matrix $\mathbf{P}$ is $2 \times 2$; the number of unknowns is 3. The $\mathbf{F}_i$ matrices are

$$\mathbf{F}_0 = [\, 0.978a \;\; -1 \,], \qquad \mathbf{F}_1 = [\, 0.763a \;\; 1.3853 \,], \qquad \mathbf{F}_2 = [\, 0 \;\; -0.4723 \,]$$

With these, matrix $\mathbf{B}$ is obtained as

$$\mathbf{B} = \begin{bmatrix} 1.5387a^2 & 0.1580a & 3.1421 \\ 0.7462a^2 & 0.2315a & -2.0396 \\ 0 & -0.4619a & 0.4723 \end{bmatrix}$$

Since $\mathbf{B}$ is quadratic, the system can be handled by direct inversion. The inverse matrix is

$$\mathbf{B}^{-1} = \begin{bmatrix} 0.3441/a^2 & 0.6306/a^2 & 0.4337/a^2 \\ 0.1456/a & -0.3003/a & -2.2657/a \\ 0.1424 & -0.2937 & -0.0985 \end{bmatrix}$$

Matrices $\mathbf{B}$ and $\mathbf{B}^{-1}$ clearly show the effect of a "small" column in the matrices $\mathbf{F}_i$.

The system was investigated with $\mathbf{P} = \mathrm{diag}(1, 1)$. In the error-free situation, the true values for $\mathbf{P}$ were obtained with little numerical error. Four error situations were treated:

1. 10% error in the gain factor $a$, with exponentially correlated input $u(t)$: $E\{u(t)\, u(t - i\Delta t)\} = 10 \cdot 0.9^i$.
2. 10% error in one of the poles ($z_1^{-1} = 1.65$), with $E\{u(t)\, u(t - i\Delta t)\} = 10 \cdot 0.9^i$.
3. Coloured input and output noise $\eta_1(t)$ and $\eta_2(t)$, with $E\{\eta(t)\, \eta(t - i\Delta t)\} = 0.1^i$ for both noises.

4. Like case 3, but with $E\{\eta(t)\, \eta(t - i\Delta t)\} = 0.3^i$.

The estimation errors for the four cases are shown in Table 1.

*TABLE 1. Estimation errors $\Delta\hat{P}_{11}$, $\Delta\hat{P}_{12}$, $\Delta\hat{P}_{22}$ of the single-input single-output system for error cases 1 to 4.*
As can be seen, the errors caused by process parameter misestimation, and even more so the ones due to the colouredness of the noise, are rather significant.

Also treated was a two-input two-output third-order system. Now matrix $\mathbf{P}$ is $4 \times 4$; the number of unknowns is 10. The $\mathbf{F}_i$ matrices are

$$\mathbf{F}_0 = \begin{bmatrix} 0.2695 & 0.0189 & -1 & 0 \\ -0.3912 & 0.1838a & 0 & -1 \end{bmatrix}$$

$$\mathbf{F}_1 = \begin{bmatrix} 0.1781 & 0.0125 & 1.2372 & 0 \\ 0.0768 & -0.2197a & 0 & 1.2372 \end{bmatrix}$$

$$\mathbf{F}_2 = \begin{bmatrix} -0.0084 & 0.0666 & -0.4029 & 0 \\ -0.0006 & 0.0646a & 0 & -0.4029 \end{bmatrix}$$

$$\mathbf{F}_3 = \begin{bmatrix} 0 & 0.0154 & 0 & 0 \\ 0 & 0.005 & 0 & 0 \end{bmatrix}$$

The computations were done for $a = 1$ and $a = 100$. The true noise matrix is $\mathbf{P} = \mathrm{diag}(0.5,\, 0.5,\, 1,\, 4)$. The $\mathbf{B}$ matrix is now $15 \times 10$ but its rank is only 9. Therefore the extended least-square approach is used, with pre-estimate $\bar{p}_7 = 0$ and, for the $a = 1$ case, $c_7 = 10$ (all the other coefficients $c_l$ being zero). In the $a = 100$ case numerical difficulties were experienced with the extended least-square approach; therefore it was treated by the direct elimination of $p_7$.

The following error situations were investigated:

1. 10% error in all $f_j^{1,1}$ elements, with exponentially correlated independent inputs $u_1(t)$ and $u_2(t)$: $E\{u_l(t)\, u_l(t - i\Delta t)\} = 10 \cdot 0.9^i$.
2. 10% error in all $f_j^{1,2}$ elements, inputs as above.
3. A 0.1 error in the pre-estimate (the true value being $p_7 = 0$, the pre-estimate $\bar{p}_7 = 0.1$).

The computational results are shown in Table 2.

*TABLE 2. Estimation errors of the elements of $\hat{\mathbf{P}}$ for the two-input two-output system, error cases 1 to 3.*

The errors in the $a = 1$ case are significant but, in our opinion, still tolerable. Not so in the $a = 100$ case, where some errors were found to be unbearably large (in the order of magnitude of hundreds). Though the $a = 100$ case may seem rather extreme and artificial, it is in fact the model of a real-life system (a motor-generator unit).

## CONCLUSION

An algorithm has been introduced to estimate the zero-shift covariance of additive input and output noise in a situation where the process inputs and outputs are measured and the input-output model of the process is known. It is assumed that the noise to be estimated is uncorrelated in time. The estimates are obtained from a measurement error vector that is easy to compute from the actual measurements.

The algorithm is based upon a least-square scheme. Though the method is rather simple, it is prone to numerical difficulties and, if the process parameters are not exactly known or the noises are not really white, also to significant estimation errors. Therefore it should not be applied without careful prior considerations, especially a sensitivity analysis related to the particular system.

## ACKNOWLEDGEMENTS

The author is indebted to his colleagues Gy. Varga for the computational support and G. Almasy for his ideas.


## REFERENCES

Belanger, P.R. (1974). Estimation of noise covariance matrices for linear time-varying stochastic processes. *Automatica*, 10, 267-275.

Carew, B. and Belanger, P.R. (1973). Identification of optimum filter steady-state gain for systems with unknown noise covariances. *IEEE Transactions on Automatic Control*, AC-18, 582-587.

Gertler, J. (1978). Dynamic balancing of measurements - another approach to statistical filtering. *7th Triennial World Congress of IFAC*, Helsinki, Finland, pp. 2167-2174.

Gertler, J. (1979). A constrained minimum variance input-output estimator for linear dynamic systems. *Automatica*, 15, 353-358.

Gertler, J., F. Sipos and Gy. Varga (1980). Computational experience with the constrained minimum variance input-output estimator. *Automatica*, 16, 423-427.

Gertler, J. and P. Sipos (1981). Performance index sensitivity of the constrained minimum variance input-output estimator. *Automatica*, 17, to appear.

Godbole, S.S. (1974). Kalman-filtering with no a priori information about noise - white noise case: identification of covariances. *IEEE Transactions on Automatic Control*, AC-19, 561-563.

Mehra, R.K. (1970). On the identification of variances and adaptive Kalman-filtering. *IEEE Transactions on Automatic Control*, AC-15, 175-184.

Mehra, R.K. (1972). Approaches to adaptive filtering. *IEEE Transactions on Automatic Control*, AC-17, 693-698.

## Discussion to Paper 25.1

**T.S. Ng (Australia):** In your problem formulation, you require two statistically independent input measurements to estimate the system parameters. Could you give a practical example for such a system?

**M.T.H.M. Van den Dungen (Netherlands):** Example 1: Assume we measure the concentration $u$ of a substance A in an amount of material B using samples of B. Assume the errors in these measurements are due to the fact that the concentration in a sample is not equal to the mean concentration in B, for A is not necessarily homogeneously distributed in B. Such errors will be statistically independent if the coordinates of the samples taken are random.

Example 2: Let $v_2$ be the desired temperature in an experiment, which can be adjusted using a thermostat. Assume the real value of the temperature $u$ does not equal $v_2$. We measure the temperature $u$ and denote this measurement by $v_1$. Considering $v_2$ also as a "measurement" of $u$, we might have two statistically independent "measurements" of $u$ with errors that have zero mean.

Example 3: Consider situations where the measurement principle has a random nature. One might think, for example, of measuring thickness using a radioactive source and a Geiger-Müller counter. If there are no systematic errors, statistically independent measurements can be achieved by repeating the measurements.

Example 4: Instrumental variables.

Note: In some situations it doesn't matter whether the errors have a zero mean or not, because only deviations from the mean are considered.

## Discussion to Paper 25.2

**P. Eykhoff (Netherlands):** Could you give some indication of the computational load needed for this type of estimation, e.g. how it compares to maximum likelihood and/or instrumental variable methods?

**J. Gertler (Hungary):** Since a direct least-squares approach is considered, the algorithm is simpler than the extended or generalised least-squares or maximum likelihood methods. However, the algorithm is sensitive to numerical errors, due to the special properties of matrix $\mathbf{B}$; therefore high-accuracy computational techniques are necessary that, of course, are time-consuming.