Copyright © IFAC Control Science and Technology (8th Triennial World Congress), Kyoto, Japan, 1981

ESTIMATING THE NOISE COVARIANCE MATRIX FOR INPUT-OUTPUT FILTERS

J. Gertler

Institute for Computing and Automation, Hungarian Academy of Sciences, Budapest, Hungary
Abstract. An algorithm is introduced to estimate the zero-shift covariance of additive input and output noise in a situation where the process inputs and outputs are measured and the input-output model of the process is known. It is assumed that the noise to be estimated is uncorrelated in time. The estimates, which are optimal in a least-square sense, are based upon a measurement error vector that is easy to compute from the actual measurements. Though the algorithm is rather simple, it is prone to numerical difficulties and, if the process parameters are not exactly known or the noises are not really white, to significant estimation errors.

Keywords. Adaptive filtering; noise estimation; least-square estimation; input-output filtering.

INTRODUCTION

In two previous publications (Gertler, 1978; 1979), we reported on a constrained minimum variance input-output filter for linear discrete dynamic systems. This filter estimates the input and output variables, based upon the measurements of the same. The filter utilizes the input-output model of the process and the zero-shift covariance matrix of the white additive input and output noise. The estimates minimize a weighted minimum variance performance index while satisfying the input-output equation of the system. In two follow-up papers, computational experience gained with the filter was reported (Gertler, Varga and Sipos, 1980) and some sensitivity problems were discussed (Gertler and Sipos, 1981). All these preliminary studies assume that the process model and the zero-shift covariance matrix of the noise are known. While the process model is normally obtained from process identification, this is not so much the case for the noise covariance. The present paper is devoted to the estimation of the latter.
The problem of estimating noise covariances for Kalman filtering has been treated in several papers (Mehra, 1970, 1972; Carew and Belanger, 1973; Godbole, 1974; Belanger, 1974; etc.). All these works utilize explicitly the structure and properties of the Kalman filter. The common approach in the referred papers is basing the estimation of noise covariances on the covariance matrices of the coloured innovation sequence, obtained with an approximate (suboptimal) filter. In addition, Mehra's survey (1972) reports on a variety of other approaches as well, all associated with the Kalman filter.

The present paper is concerned with input-output filtering. An algorithm will be shown to estimate the zero-shift covariance of additive input and output noise in a situation where the process inputs and outputs are measured and the input-output model of the process is known. It is assumed that the noise to be estimated is uncorrelated in time. The estimates will be obtained from a measurement error vector that is easy to compute from the actual measurements. The algorithm is based on a least-square type performance criterion.

SYSTEM DESCRIPTION
Consider a multiple-input multiple-output discrete linear dynamic system with input vector u(t) = [u_1(t), ..., u_k(t)]^T and output vector y(t) = [y_1(t), ..., y_m(t)]^T. Let the system equation be
y(t) = \sum_{i=0}^{n} G_i z^{-i} u(t) - \sum_{i=1}^{n} h_i z^{-i} y(t)        (1)

where the matrices G_i (i = 0, ..., n) and the scalars h_i (i = 1, ..., n) are the known system parameters, n is the order of the system, and z^{-1} denotes the backward-shift operator. For the sake of further convenience, rearrange the system equation as

\sum_{i=0}^{n} F_i z^{-i} x(t) = 0        (2)

where

x(t) = [u(t); y(t)]        (3)

is a (k+m)-vector and

F_i = [G_i | -h_i I],   i = 0, ..., n  (with h_0 = 1)        (4)

are m x (k+m) matrices.

Assume that all the system variables are measured and that their measurements \tilde x(t) are corrupted by a noise \varepsilon(t):

\tilde x(t) = x(t) + \varepsilon(t)        (5)

The noises are random, uncorrelated in time, independent of the true system variables, and have zero mean. Any nonadaptive filter design will require, in addition to the system matrices F_i, the knowledge of the zero-shift correlation (covariance) matrix of the noise vector:

P = E{ \varepsilon(t) \varepsilon^T(t) }        (6)
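As a concrete illustration of Eqns. (1)-(5), the model and its noisy measurements can be sketched numerically. The system below is hypothetical: the coefficients G_i, h_i and the noise covariance are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2nd-order SISO system (k = m = 1, n = 2), Eqn. (1):
# y(t) = sum_i G_i u(t-i) - sum_{i>=1} h_i y(t-i)
G = [np.array([[0.5]]), np.array([[0.3]]), np.array([[0.0]])]   # G_0 .. G_n
h = [1.0, -1.2, 0.45]                                           # h_0 = 1, h_1, h_2

n = 2
# F_i = [G_i | -h_i I], Eqn. (4); each F_i is m x (k+m)
F = [np.hstack([G[i], -h[i] * np.eye(1)]) for i in range(n + 1)]

# Simulate the true system and build x(t) = [u(t); y(t)], Eqn. (3)
T = 5000
u = rng.normal(size=T)
y = np.zeros(T)
for t in range(n, T):
    y[t] = sum(G[i][0, 0] * u[t - i] for i in range(n + 1)) \
         - sum(h[i] * y[t - i] for i in range(1, n + 1))
x = np.stack([u, y], axis=1)            # shape (T, k+m)

# Noisy measurements, Eqn. (5): x_meas(t) = x(t) + eps(t), E{eps eps^T} = P
P_true = np.diag([0.2, 0.5])
eps = rng.multivariate_normal(np.zeros(2), P_true, size=T)
x_meas = x + eps

# Check Eqn. (2): sum_i F_i x(t-i) = 0 for the noise-free variables
resid = np.array([sum(F[i] @ x[t - i] for i in range(n + 1)) for t in range(n, T)])
assert np.allclose(resid, 0.0, atol=1e-8)
```

The residual check confirms that the rearrangement (2) with F_i = [G_i | -h_i I] is equivalent to the recursion (1).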
ESTIMATING NOISE COVARIANCES FROM THE MEASUREMENT ERROR

Consider a measurement error vector defined as

\eta(t) = \sum_{i=0}^{n} F_i z^{-i} \tilde x(t)        (7)

Utilizing Eqns. (5) and (2), this is rewritten as

\eta(t) = \sum_{i=0}^{n} F_i z^{-i} \varepsilon(t)        (8)

This is an n-th order MA (moving average) equation. Notice that the measurement error is computed directly from the noisy measurements, utilizing only the system matrices F_i.

Find now the relationship between the zero-shift covariance matrix P of the noise and the j-shift (j = 0, 1, 2, ...) covariance matrices

\Pi_j = E{ \eta(t) \eta^T(t - j \Delta t) }        (9)

of the measurement error. From Eqn. (8) this is obtained as

\Pi_j = \sum_{i=j}^{n} F_i P F_{i-j}^T,   j = 0, ..., n;        \Pi_j = 0,   j > n        (10)

where we utilize that \varepsilon(t) is uncorrelated in time.
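The relationship (7)-(10) is easy to verify numerically. A sketch with hypothetical F_i matrices, exploiting the fact (used in Eqn. (8)) that the measurement error depends only on the noise:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical F_i of a second-order model with k = m = 1 (illustrative numbers)
F = [np.array([[0.5, -1.0]]),
     np.array([[0.3, 1.2]]),
     np.array([[0.0, -0.45]])]
n = len(F) - 1
P_true = np.diag([0.2, 0.5])

# Eqn. (8): eta(t) is an n-th order MA process driven by the white noise eps(t)
T = 200_000
eps = rng.multivariate_normal(np.zeros(2), P_true, size=T)
eta = np.array([sum(F[i] @ eps[t - i] for i in range(n + 1)) for t in range(n, T)])

def sample_Pi(eta, j):
    """Sample estimate of Pi_j = E{eta(t) eta(t - j*dt)^T}, Eqn. (9)."""
    return eta[j:].T @ eta[:len(eta) - j] / (len(eta) - j)

def theoretical_Pi(F, P, j):
    """Pi_j from Eqn. (10); zero for j > n."""
    n = len(F) - 1
    if j > n:
        return np.zeros((F[0].shape[0], F[0].shape[0]))
    return sum(F[i] @ P @ F[i - j].T for i in range(j, n + 1))

for j in range(n + 2):
    print(j, sample_Pi(eta, j).ravel(), theoretical_Pi(F, P_true, j).ravel())
```

For large sample sizes the sample covariances approach the values predicted by Eqn. (10), and the (n+1)-shift covariance vanishes, as the MA structure requires.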
Equation (10) seems suitable for estimating matrix P: the matrices \Pi_j (j = 0, ..., n) are obtained from the measurements, and the matrices F_i (i = 0, ..., n) are assumed known. A similar problem, with respect to the Kalman filter, was treated by Mehra (1970, 1972) in an elegant closed matrix form. However, while in Mehra's problem only the left-hand-side coefficients of the unknown matrix differed in the consecutive equations, we have different coefficients on both sides. Thus Mehra's elegant solution is not applicable here; instead, a more straightforward approach is needed.

Rearrange the elements of matrix P, column-wise, into a vector p (11). Since P is symmetric, p need only contain the elements of its upper triangle submatrix. Also, rearrange the elements of the matrices \Pi_j (j = 0, ..., n), column-wise, into a single vector \pi (12). \Pi_0 is symmetric, thus only elements from its upper triangle submatrix are included; the other matrices \Pi_j are nonsymmetric. Now Eqn. (10) can be rewritten as

\pi = B p        (13)

where B is a coefficient matrix; its elements depend only on those of the matrices F_i. Obviously, if size(\pi) < size(p), Eqn. (13) cannot be solved for p. If size(\pi) > size(p), a least-square solution can be obtained, while in the limit case of size(\pi) = size(p), Eqn. (13) can be inverted directly.

Calculating the Elements of Matrix B

As stated before, matrix B depends only on the matrices F_i. Unfortunately, no closed matrix formula is available to describe this relationship. However, it is easy to derive a formula between the elements of B and those of the F_i. In any matrix product F_i P F_{i-j}^T, the (s,t)-th element is obtained as

(F_i P F_{i-j}^T)^{s,t} = \sum_{q=1}^{m+k} \sum_{r=1}^{m+k} f_i^{s,q} f_{i-j}^{t,r} p^{q,r}        (14)

where f_i^{s,q} is the (s,q)-th element of matrix F_i and p^{q,r} is the (q,r)-th element of matrix P.
Now, taking into account Eqn. (10), the (s,t)-th element of matrix \Pi_j is obtained as

\Pi_j^{s,t} = \sum_{i=j}^{n} \sum_{q=1}^{m+k} \sum_{r=1}^{m+k} f_i^{s,q} f_{i-j}^{t,r} p^{q,r}        (15)

Thus the coefficient of p^{q,r} in \Pi_j^{s,t} is \sum_{i=j}^{n} f_i^{s,q} f_{i-j}^{t,r}. Further, remember that P is symmetric and only its upper triangle appears in vector p. Thus the element of matrix B belonging to the (s,t)-th element of \Pi_j (a row in B) and the (q,r)-th element of P (a column in B) is

b = \sum_{i=j}^{n} f_i^{s,q} f_{i-j}^{t,q}                                      if q = r
                                                                                        (16)
b = \sum_{i=j}^{n} ( f_i^{s,q} f_{i-j}^{t,r} + f_i^{s,r} f_{i-j}^{t,q} )        if q \ne r
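The element formula (16) translates directly into code. The sketch below builds B from a list of F_i matrices; the upper-triangle ordering of p and the stacking order of \pi are implementation choices that mirror Eqns. (11)-(12), and the routine is checked against a direct evaluation of Eqn. (10).

```python
import numpy as np

def build_B(F):
    """Coefficient matrix B of pi = B p, Eqn. (13), built from Eqn. (16).

    F : list of (m x (k+m)) arrays F_0 .. F_n.
    The unknown vector p holds the upper triangle of P, ordered (q, r), q <= r.
    Row blocks correspond to Pi_0 (upper triangle only) and Pi_1 .. Pi_n (full).
    """
    n = len(F) - 1
    m, km = F[0].shape
    cols = [(q, r) for q in range(km) for r in range(q, km)]
    rows = []                                  # (j, s, t) index of each Pi element
    for j in range(n + 1):
        for s in range(m):
            for t in range(m):
                if j == 0 and t < s:           # Pi_0 is symmetric: upper triangle only
                    continue
                rows.append((j, s, t))
    B = np.zeros((len(rows), len(cols)))
    for a, (j, s, t) in enumerate(rows):
        for b, (q, r) in enumerate(cols):
            c = sum(F[i][s, q] * F[i - j][t, r] for i in range(j, n + 1))
            if q != r:                          # symmetric P: p_{qr} appears twice
                c += sum(F[i][s, r] * F[i - j][t, q] for i in range(j, n + 1))
            B[a, b] = c
    return B, cols

# Sanity check against Eqn. (10) for a random symmetric P (hypothetical m=2, k=2, n=2)
rng = np.random.default_rng(2)
F = [rng.normal(size=(2, 4)) for _ in range(3)]
A = rng.normal(size=(4, 4))
P = A @ A.T
B, cols = build_B(F)
p = np.array([P[q, r] for q, r in cols])
pi = B @ p

direct = []                                     # Pi_j evaluated directly from Eqn. (10)
for j in range(3):
    Pi = sum(F[i] @ P @ F[i - j].T for i in range(j, 3))
    for s in range(2):
        for t in range(2):
            if j == 0 and t < s:
                continue
            direct.append(Pi[s, t])
assert np.allclose(pi, np.array(direct))
```

With m = 2, k = 2, n = 2 this yields an 11 x 10 matrix B, in line with the counting argument of Eqn. (20) below.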
The Least-Square Solution

A least-square estimate for vector p will be derived from Eqn. (13). Introduce

\hat\pi = B p        (17)

and define a least-square performance index as

Q = (\pi - \hat\pi)^T (\pi - \hat\pi)        (18)

Substituting Eqn. (17) and differentiating with respect to p, the solution is obtained as

\hat p = (B^T B)^{-1} B^T \pi        (19)

In the limit case, when B is square, Eqn. (19) reduces to \hat p = B^{-1} \pi.

As has been pointed out, a least-square estimate can only be obtained if size(\pi) >= size(p). P is a (k+m) x (k+m) matrix; its upper triangle submatrix contains (k+m+1)(k+m)/2 elements. The \Pi_j matrices are of size m x m; thus the upper triangle (for \Pi_0) contains (m+1)m/2 elements, while each of the other (full) matrices contributes m^2 elements. Accordingly, the above condition is expanded as

(m+1)m/2 + n m^2 >= (k+m+1)(k+m)/2        (20)
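A minimal sketch of the least-square step (19) and of the dimension condition (20). The solver and the refolding of p into the symmetric matrix P are hypothetical helpers, written under the same upper-triangle ordering as above:

```python
import numpy as np

def ls_noise_covariance(B, pi_vec, km, cols):
    """Least-square solution p = (B^T B)^{-1} B^T pi, Eqn. (19),
    then refold p into the symmetric (km x km) matrix P."""
    p, *_ = np.linalg.lstsq(B, pi_vec, rcond=None)
    P = np.zeros((km, km))
    for val, (q, r) in zip(p, cols):
        P[q, r] = P[r, q] = val
    return P

def condition_20(k, m, n):
    """Dimension condition (20): number of Pi elements >= number of unknowns."""
    n_equations = (m + 1) * m // 2 + n * m * m
    n_unknowns = (k + m + 1) * (k + m) // 2
    return n_equations >= n_unknowns

# e.g. the single-input single-output second-order case: k = m = 1, n = 2
print(condition_20(1, 1, 2))   # 3 equations, 3 unknowns -> True
```

For k = m = 1 and n = 2 the condition holds with equality, which is exactly the directly invertible limit case mentioned after Eqn. (19).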
An Extended Least-Square Solution

If matrix B is of less than full rank (i.e. if its rank is less than the number of its columns), the solution of Eqn. (13) is undefined even if condition (20) is met. In terms of the least-square approach this means that the derivatives of (18) are not independent. To handle such a situation, some additional information or assumption is necessary. One possibility is to utilize some pre-estimate \bar p of vector p in an extended least-square performance index:

Q' = (\pi - \hat\pi)^T (\pi - \hat\pi) + (p - \bar p)^T S (p - \bar p)        (21)

where S = diag(c_1, ..., c_l, ...). The solution for such a performance index is obtained as

\hat p = (B^T B + S)^{-1} (B^T \pi + S \bar p)        (22)

Normally only one or a few elements c_l of matrix S are nonzero, according to the situation that pre-estimates are available only for one or a few elements of vector p. A plausible pre-estimate is zero for some off-diagonal element of matrix P. The actual value of any nonzero c_l can be selected by numerical-conditioning considerations. It is also possible to handle the rank-defect problem by size reduction, that is, by simply eliminating the unknown for which a pre-estimate is assumed.

The Numerical Condition of the Solution

In addition to the true rank defect discussed above, the special structure of matrix B makes the algorithm prone to numerical ill-conditioning. It can be seen from Eqn. (16) that any column of matrix B is derived from one or two columns of the matrices F_i. In particular, the elements in a column (q,r) of B are obtained from products of the f^{.,q} f^{.,r} kind, while those in a column (q,q) come from products of the f^{.,q} f^{.,q} kind. This implies that if the elements in column q of the matrices F_i are small relative to the other columns, this will be so in all columns (q,r) of matrix B and quadratically so in column (q,q). This property is then transmitted to the respective rows and columns of B^T B, resulting in some especially small values in the main diagonal and causing numerical ill-conditioning. Another consequence is the appearance of large values in the respective rows of (B^T B)^{-1} B^T, anticipating sensitivity problems. To deal with ill-conditioned equations one has to resort to special numerical techniques and double-precision computation. The extension (21) of the least-square performance index can ease this problem, too, if the position and value of the nonzero elements in S are selected in such a way that the smallest elements in the main diagonal get closer to the others.
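The regularising effect of Eqn. (22) can be sketched on a synthetic rank-deficient problem. Everything below is illustrative: two columns of B are made identical, so Eqn. (13) alone cannot separate the corresponding unknowns, while a zero pre-estimate with one nonzero weight c_l resolves the ambiguity:

```python
import numpy as np

def extended_ls(B, pi_vec, s_diag, p_bar):
    """Extended least-square estimate, Eqn. (22):
    p = (B^T B + S)^{-1} (B^T pi + S p_bar), with S = diag(s_diag)."""
    S = np.diag(s_diag)
    return np.linalg.solve(B.T @ B + S, B.T @ pi_vec + S @ p_bar)

# Rank-deficient toy example: column 2 duplicates column 1
rng = np.random.default_rng(3)
B = rng.normal(size=(6, 3))
B[:, 2] = B[:, 1]
p_true = np.array([1.0, 0.4, 0.0])
pi_vec = B @ p_true

# Pre-estimate p_bar[2] = 0 with weight c_2 = 10 regularises the defect
s_diag = np.array([0.0, 0.0, 10.0])
p_bar = np.zeros(3)
p_hat = extended_ls(B, pi_vec, s_diag, p_bar)
print(p_hat)   # p_hat is approximately [1.0, 0.4, 0.0]
```

Since the residual can be driven to zero with p[2] = 0, the penalised criterion (21) has its global minimum exactly at the true vector here; with a genuinely wrong pre-estimate the bias of Eqn. (28) below would appear instead.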
SOURCES OF ESTIMATION ERRORS
The main sources causing errors in estimating noise covariances are related to those deteriorating the performance of the filter. These have been discussed in a previous paper on filter sensitivity (Gertler and Sipos, 1981). Of the factors enumerated there, the misestimation of the process parameters F_i and the colouredness of the noise are to be considered at this point. In addition, any error of the eventual pre-estimates will also propagate over the rest of the estimated noise vector.

To assess the consequence of the misestimation of the process parameters, note that Eqn. (2) will not hold for \hat F_i \ne F_i. Thus an additional error term will appear in Eqn. (8):

\eta(t) = \sum_{i=0}^{n} \hat F_i z^{-i} \varepsilon(t) + \sum_{i=0}^{n} (F_i - \hat F_i) z^{-i} x(t)        (23)

(Here, in the first term, we neglected that \hat F_i \ne F_i.) Accordingly, an additional error term will also appear in Eqn. (10):

\Pi_j = \sum_{i=j}^{n} \hat F_i P \hat F_{i-j}^T + \sum_{i=0}^{n} \sum_{l=0}^{n} (F_i - \hat F_i) \Xi_{i|l} (F_l - \hat F_l)^T        (24)

where

\Xi_{i|l} = E{ x(t - i \Delta t) x^T(t - j \Delta t - l \Delta t) }        (25)

is normally nonzero for any shift. This will cause an error term \Delta\pi in vector \pi, resulting in an additive error in Eqns. (19) and (22). Since the error is proportional to the correlations (25) of the system variables x(t), it may become rather significant.

If the noise \varepsilon(t) is coloured, this will cause an estimation error in P through a similar mechanism. Now the matrices \Pi_j are actually constituted as

\Pi_j = \sum_{i=0}^{n} \sum_{l=0}^{n} F_i P_{i|l} F_l^T        (26)

where

P_{i|l} = E{ \varepsilon(t - i \Delta t) \varepsilon^T(t - j \Delta t - l \Delta t) }        (27)

is normally nonzero for any shift. Since our estimation algorithm assumes whiteness of the noise, it attributes the full value of \Pi_j to the zero-shift matrix P = P_0. Thus all the other terms in Eqn. (26) constitute an error term, resulting in an additive estimation error in p.

If a pre-estimate \bar p is different from the true vector p, then the respective estimation error is obtained from Eqn. (22), with \Delta\bar p = \bar p - p, as

\Delta\hat p = (B^T B + S)^{-1} S \Delta\bar p        (28)

COMPUTATIONAL RESULTS

The object of the computational studies was to demonstrate the functioning of the algorithm and, especially, to investigate its sensitivity with respect to different errors. In order to clearly separate the error sources, the investigations were based on calculated rather than simulated data.

First a single-input single-output second-order system

W(z) = a (0.978 + 0.763 z^{-1}) / (1 - 1.3853 z^{-1} + 0.4723 z^{-2})

was treated, with a = 1 and a = 0.1. The noise matrix P is 2 x 2; the number of unknowns is 3. The F_i matrices are

F_0 = [0.978a   -1],    F_1 = [0.763a   1.3853],    F_2 = [0   -0.4723]

With these, matrix B (ordering the unknowns as p^{1,1}, p^{1,2}, p^{2,2}) is obtained as

B = | 1.5387a^2    0.1580a    3.1421  |
    | 0.7462a^2    0.2315a   -2.0396  |
    | 0           -0.4619a    0.4723  |

Since B is square, the system can be handled by direct inversion. The inverse matrix is

B^{-1} = | 0.3442/a^2    0.6304/a^2    0.4328/a^2 |
         | 0.1453/a     -0.2996/a     -2.2605/a   |
         | 0.1424       -0.2937       -0.0983     |

Matrices B and B^{-1} clearly show the effect of a "small" column in the matrices F_i.

The system was investigated with P = diag(1, 1). In the error-free situation, the true values of P were obtained with little numerical error. Four error situations were treated:
1. 10% error in the gain factor a, with exponentially correlated input u(t): E{u(t) u(t - i \Delta t)} = 10 \cdot 0.9^i.
2. 10% error in one of the poles (z_1 = 1.65), with E{u(t) u(t - i \Delta t)} = 10 \cdot 0.9^i.
3. Coloured input and output noise \varepsilon_1(t) and \varepsilon_2(t), with E{\varepsilon(t) \varepsilon(t - i \Delta t)} = 0.1^i for both noises.
4. Like 3, with E{\varepsilon(t) \varepsilon(t - i \Delta t)} = 0.3^i.
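The second-order example can be checked in code: rebuilding B from the printed F_i via Eqn. (16) (at a = 1) and solving Eqn. (13) with exact \Pi_j values recovers the true P = diag(1, 1). A sketch:

```python
import numpy as np

# F_i of the paper's second-order example, a = 1
a = 1.0
F = [np.array([[0.978 * a, -1.0]]),
     np.array([[0.763 * a, 1.3853]]),
     np.array([[0.0, -0.4723]])]
n = 2

# Build B row by row from Eqn. (16); unknowns ordered p11, p12, p22
cols = [(0, 0), (0, 1), (1, 1)]
B = np.zeros((3, 3))
for j in range(n + 1):                  # rows: Pi_0, Pi_1, Pi_2 (scalars, since m = 1)
    for b, (q, r) in enumerate(cols):
        c = sum(F[i][0, q] * F[i - j][0, r] for i in range(j, n + 1))
        if q != r:
            c += sum(F[i][0, r] * F[i - j][0, q] for i in range(j, n + 1))
        B[j, b] = c

# Exact Pi_j for the true P = diag(1, 1), from Eqn. (10)
P_true = np.eye(2)
pi = np.array([sum(F[i] @ P_true @ F[i - j].T for i in range(j, n + 1))[0, 0]
               for j in range(n + 1)])

p = np.linalg.solve(B, pi)              # B is square (3 x 3): direct inversion
print(p)                                # recovers [p11, p12, p22] = [1, 0, 1] up to rounding
```

The first row of B computed this way is [0.978^2 + 0.763^2, 2(0.978(-1) + 0.763 \cdot 1.3853), 1 + 1.3853^2 + 0.4723^2] = [1.5387, 0.1580, 3.1421], matching the matrix quoted above.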
The estimation errors for the four cases are shown in Table 1.

TABLE 1. Estimation errors \Delta P_{11}, \Delta P_{12}, \Delta P_{22} in error cases 1-4 (second-order system).
As is seen, the errors caused by process parameter misestimation, and even more those due to the colouredness of the noise, are rather significant.

Also treated was a two-input two-output third-order system. Now matrix P is 4 x 4; the number of unknowns is 10. The F_i matrices are

F_0 = | 0.2695   0.0189    -1        0  |
      | 0.3912   0.1838a    0       -1  |

F_1 = | 0.1781   0.0125     1.2372   0      |
      | 0.0768   0.2197a    0        1.2372 |

F_2 = | 0.0084   0.0666    -0.4029   0       |
      | 0.0006   0.0646a    0       -0.4029  |

F_3 = | ·        0.0154     ·        ·  |
      | ·        ·          ·        ·  |

Computations were done for a = 1 and a = 100. The true noise matrix is P = diag(0.5; 0.5; 1; 4). Matrix B is now 15 x 10, but its rank is only 9. Therefore the extended least-square approach was used, with pre-estimate \bar p_7 = 0 and, for the a = 1 case, c_7 = 10 (all the other coefficients c_l being zero). In the a = 100 case, numerical difficulties were experienced with the extended least-square approach; therefore it was treated by the direct elimination of p_7.

The following error situations were investigated:
1. 10% error in all f_j^{11} elements, with exponentially correlated independent inputs u_1(t) and u_2(t): E{u_l(t) u_l(t - i \Delta t)} = 10 \cdot 0.9^i.
2. 10% error in all f_j^{12} elements, inputs as above.
3. 0.1 error in the pre-estimate \bar p_7 (the true value being p_7 = 0.1).

The computational results are shown in Table 2.
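The size-reduction remedy used in the a = 100 case (eliminating the unknown with pre-estimate zero, here p_7) can be sketched generically. The matrix below is synthetic, merely mimicking a 15 x 10 coefficient matrix B of rank 9:

```python
import numpy as np

def solve_with_elimination(B, pi_vec, idx, fixed_value=0.0):
    """Size reduction: fix unknown p[idx] at its pre-estimate, drop its column,
    and solve the reduced least-square problem."""
    keep = [i for i in range(B.shape[1]) if i != idx]
    rhs = pi_vec - B[:, idx] * fixed_value
    p_red, *_ = np.linalg.lstsq(B[:, keep], rhs, rcond=None)
    p = np.zeros(B.shape[1])
    p[keep] = p_red
    p[idx] = fixed_value
    return p

rng = np.random.default_rng(4)
B = rng.normal(size=(15, 10))
B[:, 6] = B[:, 3]                  # rank defect: two unknowns are indistinguishable
p_true = np.zeros(10)
p_true[:5] = [0.5, 0.1, 0.5, 0.2, 1.0]
pi_vec = B @ p_true

print(np.linalg.matrix_rank(B))    # 9

p_hat = solve_with_elimination(B, pi_vec, idx=6)
```

Once the redundant column is removed, the remaining columns are independent and the reduced problem has a unique least-square solution, which is why direct elimination sidesteps the conditioning trouble of the weighted extension.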
TABLE 2. Estimation errors \Delta P_{st} of the third-order system in error cases 1-3, for a = 1 and a = 100.
The errors in the case a = 1 are significant but, in our opinion, still tolerable. Not so in the a = 100 case, where some errors were found to be unbearably large (in the order of magnitude of hundreds). Though the a = 100 case may seem rather extreme and artificial, it is, in fact, the model of a real-life system (a motor-generator unit).

CONCLUSION

An algorithm has been introduced to estimate the zero-shift covariance of additive input and output noise in a situation where the process inputs and outputs are measured and the input-output model of the process is known. It is assumed that the noise to be estimated is uncorrelated in time. The estimates are obtained from a measurement error vector that is easy to compute from the actual measurements.
The algorithm is based upon a least-square scheme. Though the method is rather simple, it is prone to numerical difficulties and, if the process parameters are not exactly known or the noises are not really white, also to significant estimation errors. Therefore it should not be applied without careful prior consideration, especially a sensitivity analysis related to the particular system.

ACKNOWLEDGEMENTS

The author is indebted to his colleagues Gy. Varga for the computational support and G. Almasy for his ideas.
REFERENCES
Belanger, P.R. (1974). Estimation of noise covariance matrices for linear time-varying stochastic processes. Automatica 10, 267-275.
Carew, B. and Belanger, P.R. (1973). Identification of optimum filter steady-state gain for systems with unknown noise covariances. IEEE Transactions on Automatic Control AC-18, 582-587.
Gertler, J. (1978). Dynamic balancing of measurements - another approach to statistical filtering. 7th Triennial World Congress of IFAC, Helsinki, Finland, pp. 2167-2174.
Gertler, J. (1979). A constrained minimum variance input-output estimator for linear dynamic systems. Automatica 15, 353-358.
Gertler, J., F. Sipos and Gy. Varga (1980). Computational experience with the constrained minimum variance input-output estimator. Automatica 16, 423-427.
Gertler, J. and F. Sipos (1981). Performance index sensitivity of the constrained minimum variance input-output estimator. Automatica 17, to appear.
Godbole, S.S. (1974). Kalman filtering with no a priori information about noise - white noise case: identification of covariances. IEEE Transactions on Automatic Control AC-19, 561-563.
Mehra, R.K. (1970). On the identification of variances and adaptive Kalman filtering. IEEE Transactions on Automatic Control AC-15, 175-184.
Mehra, R.K. (1972). Approaches to adaptive filtering. IEEE Transactions on Automatic Control AC-17, 693-698.

Discussion to Paper 25.1
T.S. Ng (Australia): In your problem formulation, you require two statistically independent input measurements to estimate the system parameters. Could you give a practical example of such a system?

M.T.H.M. van den Dungen (Netherlands): Example 1: Assume we measure the concentration u of a substance A in an amount of material B using samples of B. Assume the errors in these measurements are due to the fact that the concentration in a sample is not equal to the mean concentration in B, for A is not necessarily homogeneously distributed in B. Such errors will be statistically independent if the coordinates of the samples taken are random.

Example 2: Let v_2 be the desired temperature in an experiment, which can be adjusted using a thermostat. Assume the real value of the temperature u does not equal v_2. We measure the temperature u and denote this measurement by v_1. Considering v_2 also as a "measurement" of u, we might have two statistically independent "measurements" of u with errors that have zero mean.

Example 3: Consider situations where the measurement principle has a random nature. One might think, for example, of measuring thickness using a radioactive source and a Geiger-Muller counter. If there are no systematic errors, statistically independent measurements can be achieved by repeating the measurements.

Example 4: Instrumental variables.

Note: In some situations it does not matter whether the errors have a zero mean or not, because only deviations from the mean are considered.

Discussion to Paper 25.2

P. Eykhoff (Netherlands): Could you give some indication of the computational load needed for this type of estimation, e.g. how it compares to maximum likelihood and/or instrumental variable methods?

J. Gertler (Hungary): Since a direct least-squares approach is considered, the algorithm is simpler than the extended or generalised least-squares or maximum likelihood methods.
However, the algorithm is sensitive to numerical errors, due to the special properties of matrix B; therefore high-accuracy computational techniques are necessary that, of course, are time-consuming.