Automatica, Vol. 13, pp. 265-277. Pergamon Press, 1977. Printed in Great Britain
Least Squares Estimation of Nonstationary Covariance Parameters in Linear Systems*

H. W. BREWER† and C. T. LEONDES‡

A near optimal Kalman filter estimates nonstationary covariance parameters with greater accuracy, less complexity, and with fewer restrictions on system noise than in the past.

Key Word Index: Adaptive systems; identification; Kalman filters; optimal filtering; parameter estimation.

Summary: Least squares estimation techniques are employed to overcome previous difficulties encountered in accurately estimating the state and measurement noise covariance parameters in linear stochastic systems. In the past accurate and rapidly converging covariance parameter estimates have been achieved with complex estimation algorithms only after specifying the statistical nature of the noise in the system and constraining the time variation of the covariance parameters. Weighted least squares estimation allows these restrictions to be removed while achieving near optimal accuracy using a filter on the same order of complexity as a Kalman filter. Allowing the covariance parameters to vary in as general a manner in time as the state in a linear discrete time stochastic system, and assuming that a Kalman filter is applied to this system using incorrect knowledge of the a priori statistics, it is shown how a covariance system is developed similar to the original system. Unbiased least squares estimates of the covariance parameters and of the original state are obtained without the necessity of specifying the distribution on the noise in either system. The accuracy of these estimates approaches optimal accuracy with increasing measurements when adaptive Kalman filters are applied to each system.

I. INTRODUCTION

THE MORE accurately the covariance parameters of a linear stochastic system can be determined, the more accurate are the resulting estimates of a Kalman filter applied to this system. The history of work done in the area of covariance parameter estimation has been that of imposing constraints on the system, solving the restricted problem, then relaxing restrictions to solve a more general problem. For nonstationary systems the distributions on the state, state noise, and measurement noise have been specified beforehand and either Bayesian, maximum likelihood, or simple least squares techniques have been employed to estimate constant covariance parameters. Bayesian estimation techniques[1, 13] have been applied when the covariance parameters have been treated as random variables and their distributions have also been specified a priori. When the covariance parameters have been more generally considered as unknown constants, either maximum likelihood[2, 3] or simple least squares[4] techniques have been applied; maximum likelihood being more accurate but also more complicated than simple least squares techniques. When the covariance parameters have been constrained to vary in a more restricted manner in time than the original state, the application of a Bayesian[5] and a maximum likelihood[6] method has been successful with a degradation in either accuracy or simplicity or both. A suboptimal state noise covariance Kalman filter[12] has been utilized to estimate the state noise covariance parameters assuming that these parameters are linearly varying in time without error. The state noise covariance parameter estimates did converge slowly after steady state conditions were reached in the original filter for the examples considered. When the covariance parameters have been allowed to vary more generally in time, simple least squares[4] or maximum likelihood[2] techniques can eventually converge. The accuracy of these last two techniques increases the more slowly the covariance parameters are varying in time. All of the above methods, which will hereafter be referred to as 'previous algorithms', can yield unbiased estimates which can converge. The convergence rate and accuracy of these estimates, however, have gone down with the relaxation of constraints on system noise. In addition the complexity of the algorithms yielding these estimates has generally gone up with the relaxation of these constraints. It is the purpose of the algorithms described in this paper to achieve optimal accuracy rapidly while at the same time allowing both the state and measurement noise covariance parameters to vary similarly in time as the original state, without the necessity of specifying the system noise distribution.

The covariance filter involved estimates the covariance parameters of the state estimate, state noise, and measurement noise; and it is of a complexity similar to that of the Kalman filter. It is different from the state noise covariance filter discussed in [12] in being more general and accurate. The form of the state noise covariance

*Received 24 May 1976; revised 27 September 1976; revised 23 November 1976. The original version of this paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by associate editor A. Sage.
†Beam Control Department, High Energy Laser Systems Laboratory, Hughes Aircraft, Culver City, CA 90230, U.S.A.
‡Engineering Systems Department, School of Engineering and Applied Science, University of California, Los Angeles, CA 90024, U.S.A.
filter in [12] is similar to an extended Kalman filter without modeling error. The form of the covariance filter analyzed herein is an unbiased Kalman filter which has the capability of compensating for errors in modeling the time variation of the covariance parameters through its own input modeling error variance matrix. The covariance Kalman filter has zero mean measurement noise and yields unbiased estimates at any point in time; whereas the mean in the measurement noise of the state noise covariance Kalman filter[12], due to the errors in the estimated state noise covariance parameters, seriously affects the accuracy of the filter. The accuracy of the state noise covariance Kalman filter, as a consequence, is heavily dependent on initial conditions; and, as indicated in [12], it is important to initialize the state noise covariance Kalman filter after the convergence of the system Kalman filter. The covariance Kalman filter, on the other hand, is not sensitive to conservative initial conditions and can converge rapidly even in the cases where the system Kalman filter never converges. In order to achieve these objectives a covariance system is formulated similar to the original system; covariance system statistics are determined as accurately as feasible; and a filter similar in form to a Kalman filter is applied to the covariance system so that unbiased linear minimum variance estimates of the covariance parameters are obtained in the limiting case when the covariance system statistics are known without error. Section II describes the system and the uncertain system noise statistics. In addition, pertinent properties of Kalman filtering which can be applied to the covariance system are discussed. In Section III, a system model is formulated for covariance parameter estimation.
The time variation of the covariance parameters is formulated similarly to that of the state; covariance state and measurement models and covariance system noise statistics are formulated. In Section IV, nonadaptive covariance parameter estimates are obtained based on the application of Kalman filtering to the covariance system model when approximate values of the covariance parameters are known a priori. Expressions for the accuracy of nonadaptive covariance parameter estimates are also formulated in this section. In Section V an algorithm is obtained for the adaptive estimation of covariance parameters when no a priori covariance information is available. Numerical examples are presented in Section VI illustrating the nonadaptive and adaptive covariance parameter estimation algorithms.

II. SYSTEM DESCRIPTION
Consider the discrete time linear stochastic system
X(i+1) = A(i+1, i)X(i) + Γ(i)W(i)    (1)

Z(i+1) = h(i+1)X(i+1) + v(i+1)    (2)
where X(i+1) is the n dimensional state at sampling instant i+1, W(i) is the g dimensional state noise, Z(i+1) is the m dimensional measurement, v(i+1) is the m dimensional measurement noise, and A(i+1, i), Γ(i), h(i+1) are the corresponding dynamic plant matrices. The statistics on this system are given from the known mean on the initial state X̄(0), the variance matrix on the initial state P(0), the nonstationary state noise variance matrix Q(i), and the nonstationary variance matrix on the measurement noise R(i+1). The state and measurement noise are assumed to be zero mean white noise processes uncorrelated with each other or with the state. When the needed variance matrices P, Q, and R are uncertain, a Kalman filter using approximate values of the variance matrices P*, Q*, R* may be applied to this system to yield estimates of the state which are suboptimal and residuals which are usually correlated in time. The more accurately the covariance parameters are known in these assumed variance matrices, the more accurate are the resulting estimates of the state and the less correlated are the residuals in time. The problem investigated herein is to estimate these covariance parameters as accurately as possible from the residuals of the suboptimal filter. It is convenient at this point to describe briefly the pertinent properties of a Kalman filter, as various forms of the Kalman filter will be applied to covariance system equations similar to (1) and (2) to yield different types of estimates in terms of accuracy and computational efficiency. A Kalman filter processes system measurements in a sequential fashion to estimate the system state from knowledge of the a priori statistics P*, Q*, R* using the plant matrices in the following algorithm[7]
X̂*(i+1/i+1) = X̂*(i+1/i) + K*(i+1)V*(i+1)    (3)

X̂*(i+1/i) = A(i+1, i)X̂*(i/i)    (4)

V*(i+1) = Z(i+1) - h(i+1)X̂*(i+1/i)    (5)

K*(i+1) = P*(i+1/i)h^T(i+1)[h(i+1)P*(i+1/i)h^T(i+1) + R*(i+1)]^-1    (6)

P*(i+1/i) = A(i+1, i)P*(i/i)A^T(i+1, i) + Γ(i)Q*(i)Γ^T(i)    (7)

P*(i+1/i+1) = [I - K*(i+1)h(i+1)]P*(i+1/i)    (8)

Initial conditions: X̂*(0/0) = X̄(0),   P*(0/0) = P*(0).    (9)
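A minimal numpy sketch of the recursion (3)-(9) may make the bookkeeping concrete. The function and variable names are ours, not the paper's; the filter is "starred" (possibly suboptimal) in the sense of the text because Q and R are whatever assumed values the caller supplies.

```python
import numpy as np

def kalman_step(x, P, z, A, Gamma, h, Q, R):
    """One cycle of the (possibly suboptimal) filter, equations (3)-(8).

    x, P are the filtered estimate and its (assumed) variance at instant i;
    Q, R are the assumed noise variance matrices Q*, R* of the text.
    """
    # (4) predicted estimate and (7) predicted variance
    x_pred = A @ x
    P_pred = A @ P @ A.T + Gamma @ Q @ Gamma.T
    # (5) measurement residual
    V = z - h @ x_pred
    # (6) gain matrix
    S = h @ P_pred @ h.T + R
    K = P_pred @ h.T @ np.linalg.inv(S)
    # (3) filtered estimate and (8) filtered variance
    x_filt = x_pred + K @ V
    P_filt = (np.eye(len(x)) - K @ h) @ P_pred
    return x_filt, P_filt, V, K
```

For a scalar system with A = Γ = h = 1, Q = 0, R = 1, P = 1 and measurement z = 2 from x = 0, one step gives gain 0.5, filtered estimate 1.0, and filtered variance 0.5, which is easy to verify against (3)-(8) by hand.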
where X̂*(i+1/i+1) is the filtered estimate of the state with an approximate variance matrix of P*(i+1/i+1); K*(i+1) is the Kalman gain matrix; and V*(i+1) is the measurement residual.† X̂*(i+1/i) is the predicted estimate of the state at instant i+1 given i measurements, and P*(i+1/i) is the associated approximate variance matrix. When the assumed covariance parameters are equal to the actual covariance parameters, this filter is optimal in the sense that the filtered estimate of the state is an unbiased linear minimum variance (LMV) estimate of the state; the filtered estimate is the most accurate estimate of the state for any linear filter. Under these conditions the approximate variance matrices on the state estimates are actual variance matrices, and these matrices are in turn minimum variance matrices. If, in addition, the distributions on the initial state, state noise, and measurement noise are gaussian, the state estimate is a minimum variance estimate. The optimal Kalman filter also yields unbiased weighted least squares (WLS) estimates where the appropriate weighting matrices are determined from the actual a priori statistics[8]. When the assumed a priori variance matrices are not all true and the Kalman gain matrix is not optimal, the resulting estimates are still unbiased over the ensemble of initial conditions, but their accuracy is degraded. The estimates will still usually converge, but not as rapidly as in the optimal case. In general this degradation in accuracy and convergence rate depends upon the error in the assumed covariance parameters.
In the case of no prior knowledge, the uncertainty in the initial state estimate may be assumed to be infinite so that P*(0) = ∞; the state noise may be assumed to be zero so that Q*(i) = 0; and the measurement noise variance matrix may be assumed to be an identity matrix so that R*(i+1) = I. In this case the Kalman filter reduces to a sequential simple least squares (SLS) filter which estimates the state so as to minimize the sum of the squares of the residuals, all errors being treated equally.

III. COVARIANCE SYSTEM FORMULATION
†The * notation is used to indicate quantities which may or may not be optimal and which may or may not be actual. Actual quantities do not have *.

System equations for the covariance parameters will be formulated similar to those for the original state, so that WLS estimates of the covariance parameters may be obtained by applying a Kalman filter to the covariance system. In considering those situations for which the covariance parameters may be assumed to vary in as general a manner as the original state, it is convenient to formulate the covariance parameters in vector notation. Defining Ū as the vector of covariance parameters in the k × k variance matrix U, these parameters are assumed to be dynamically represented in time by
Ū(i+1) = A_U(i+1, i)Ū(i) + Γ_U(i)W_U(i)    (10)

where Ū is the covariance parameter vector

Ū^T = [Ū₁^T  Ū₂^T ... Ū_k^T]    (11)

Ū_L being a 1 × (k-L+1) vector of covariance parameters from the diagonal element in the Lth row of the variance matrix U

Ū_L^T = [U_LL  U_L,L+1 ... U_Lk].    (12)

The order of the covariance vector is then k(k+1)/2. From the properties of a variance matrix, U is a symmetric positive semidefinite matrix

U = U^T ≥ 0.    (13)
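The stacking convention of (11)-(12) is easy to mechanize. The sketch below (function and variable names are ours) builds the covariance parameter vector from the upper triangle of a symmetric matrix, row by row from the diagonal, giving k(k+1)/2 entries for a k × k matrix.

```python
import numpy as np

def vec_cov(U):
    """Stack the rows of the upper triangle of a symmetric k x k variance
    matrix U into the covariance parameter vector of (11)-(12)."""
    k = U.shape[0]
    return np.concatenate([U[L, L:] for L in range(k)])

U = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
u = vec_cov(U)   # [U11 U12 U13 U22 U23 U33], of order 3*(3+1)/2 = 6
```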
In particular U may represent either the state or the measurement noise variance matrix Q or R; and W_U may represent the error in modeling the state or measurement noise covariance parameters in time, a g_Q or g_R dimensional vector W_Q or W_R. This modeling error is nongaussian noise due to the properties of a variance matrix, and (13) defines the constraints on any distribution assigned to this noise. A_U and Γ_U are the appropriate transition matrices
A_U = A_Q, a ḡ × ḡ matrix when U = Q
A_U = A_R, an m̄ × m̄ matrix when U = R
Γ_U = Γ_Q, a ḡ × g_Q matrix when U = Q
Γ_U = Γ_R, an m̄ × g_R matrix when U = R.

The statistics on the modeling of the covariance parameters in time are given from the assumed and actual variance matrices on W_U, q*_U and q_U; the assumed and actual variance matrices on Ū(0) are P*_U(0) and P_U(0). For consistency with the original state system model, W_Q and W_R are assumed to be zero mean white noise processes uncorrelated with each other or with the initial covariance parameters. Reference [9] considers some situations where the variance matrices are diagonal and Γ_U(i) is an identity matrix. Reference [12] considers (10) in those situations where there is no error in modeling the covariance parameters in time. In going a step further by considering all those situations for which
(10) is representative of the time variation of the covariance parameters, a covariance state model may be formulated similar to the original state model. Several situations could be described, such as those situations for which the covariance parameter modeling errors were uniformly distributed between certain time varying equal positive and negative limits. If these limits were such that U_LL > 0 and small enough so that

-√(U_LL U_JJ) < U_LJ < +√(U_LL U_JJ)

then (13) and the other above conditions on U(i) would be satisfied and (10) would be applicable.

A. Covariance state model
From the original system equations, the Kalman filtering equations, and the formulation of the covariance parameters in time, the state system equation for the covariance parameters is given by

Γ(i+1) = Φ_c(i+1, i)Γ(i) + Γ_c(i)W_c(i)    (14)

Γ^T(i+1) = [P̄^T(i+1/i)  Q̄^T(i+1)  R̄^T(i+1)]    (15)

Φ_c(i+1, i) =
[ dP̄(i+1/i)/dP̄(i/i-1)   dP̄(i+1/i)/dQ̄(i)   dP̄(i+1/i)/dR̄(i) ]
[ 0                       A_Q(i+1, i)         0                 ]
[ 0                       0                   A_R(i+1, i)       ]    (16)

Γ_c(i) =
[ 0        0      ]
[ Γ_Q(i)   0      ]
[ 0        Γ_R(i) ]    (17)

W_c^T(i) = [W_Q^T(i)  W_R^T(i)]    (18)

where Γ is an n_c = n̄ + ḡ + m̄ state vector composed of the vector of the actual variance matrix on the predicted state estimate, the vector of the state noise variance matrix, and the vector of the measurement noise variance matrix; W_c is a g_c = g_Q + g_R vector of the state noise on the covariance parameters; Γ_c is an n_c × g_c transition matrix on this noise; and Φ_c is an n_c × n_c state transition matrix, partially partitioned into derivative matrices, on the covariance parameters. The derivative matrices may be calculated from the propagation of the actual predicted variance matrix on the original state estimate linearly in time (see [10] for details)

P(i+1/i) = Φ*(i+1, i)P(i/i-1)Φ*^T(i+1, i) + Γ(i)Q(i)Γ^T(i) + A(i+1, i)K*(i)R(i)K*^T(i)A^T(i+1, i)    (19)

Φ*(i+1, i) = A(i+1, i)[I - K*(i)h(i)]    (20)

which may be written in vector notation as

P̄(i+1/i) = [dP̄(i+1/i)/dP̄(i/i-1)]P̄(i/i-1) + [dP̄(i+1/i)/dQ̄(i)]Q̄(i) + [dP̄(i+1/i)/dR̄(i)]R̄(i)    (21)

where dŪ_k(i)/dŪ_L(j) is a k̄ × L̄ matrix of the derivative of the vector of covariance parameters in the k × k variance matrix U_k(i) at time i with respect to the vector of covariance parameters in the L × L variance matrix U_L(j) at time j, and is a function of the parameters in the matrices Φ*, Γ, A, and K*. The calculation of this matrix is illustrated in Appendix A.

B. Covariance measurement model
The measurement system equation for the covariance parameters evolves from the definition of the residual matrix as an m × m matrix

c(i+1, j) = V*(i+1)V*^T(j).    (22)

If j = i+1 the residual matrix may be statistically represented as the variance matrix of residuals plus a zero mean error matrix

c(i+1, i+1) = c(i+1) = ĉ(i+1) + Δc(i+1)    (23)

where Δ indicates the error between a quantity and its mean, and the variance matrix of residuals is

ĉ(i+1) = h(i+1)P(i+1/i)h^T(i+1) + R(i+1).    (24)

The measurement system equation for the covariance parameters may now be described, from (23), by

c̄(i+1) = H_c(i+1)Γ(i+1) + Δc̄(i+1)    (25)

where H_c is an m̄ × n_c observation matrix

H_c(i+1) = [dĉ(i+1)/dP̄(i+1/i)   0   dĉ(i+1)/dR̄(i+1)]    (26)

which is calculated in Appendix A from (24). The residual matrix defined for residuals at different
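Appendix A gives closed forms for the derivative blocks appearing in Φ_c; as a hedged illustration of what those blocks are, a finite-difference version can be built directly from the propagation of the predicted variance matrix linearly in time. The helper names and the half-vectorization convention below are ours; the measurement-noise term of that propagation, being constant with respect to Q̄, drops out of this particular block, so it is omitted here.

```python
import numpy as np

def vec(U):
    """Row-wise upper-triangle stacking of a symmetric matrix."""
    k = U.shape[0]
    return np.concatenate([U[i, i:] for i in range(k)])

def unvec(u, k):
    """Inverse of vec: rebuild the symmetric k x k matrix."""
    U = np.zeros((k, k))
    idx = 0
    for i in range(k):
        U[i, i:] = u[idx:idx + k - i]
        idx += k - i
    return U + np.triu(U, 1).T

def dPbar_dQbar(A, Gamma, K, h, eps=1e-6):
    """Finite-difference approximation of the derivative block
    dP(i+1/i)/dQ(i) of (16), built from the propagation (19)-(20).
    The closed forms are in Appendix A; this numeric version is only a
    cross-check under our own naming conventions."""
    n, g = Gamma.shape
    Phi = A @ (np.eye(n) - K @ h)                    # (20)
    def prop(qbar, pbar):
        P = unvec(pbar, n)
        Q = unvec(qbar, g)
        Pn = Phi @ P @ Phi.T + Gamma @ Q @ Gamma.T   # (19), R term omitted
        return vec(Pn)
    gbar = g * (g + 1) // 2
    nbar = n * (n + 1) // 2
    q0, p0 = np.zeros(gbar), np.zeros(nbar)
    base = prop(q0, p0)
    D = np.zeros((nbar, gbar))
    for j in range(gbar):
        dq = np.zeros(gbar)
        dq[j] = eps
        D[:, j] = (prop(dq, p0) - base) / eps
    return D
```

Because the propagation is linear in the covariance parameters, the finite difference here is exact up to rounding; for example, with A = Γ = I and K = 0 the block reduces to the identity.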
points in time where j ≠ i+1 was investigated in [10], where it was concluded that, for stable systems, the use of these additional measurements in increasing the convergence rate of the covariance parameter estimates is not worth the additional computational burden involved.
C. Covariance system statistics
The statistics on these system equations are given from the moments on the covariance parameters initially; the moments on the modeling errors in the covariance state system model; and the moments on the covariance system measurement noise. The mean and the uncertainty on the covariance parameters initially are

Γ̄^T(0) = E[P̄^T(0)  Q̄^T(0)  R̄^T(0)]    (27)

P_c(0) =
[ P_P(0)   0        0      ]
[ 0        P_Q(0)   0      ]
[ 0        0        P_R(0) ]    (28)

where P_P(0) = E[ΔP̄(0)ΔP̄^T(0)] defines the initial uncertainty in the covariance parameters on the original state estimate. The first two moments on the modeling error are

E[W_c] = 0,   E[W_c(i)W_c^T(j)] = Q_c(i)δ_ij

where

Q_c =
[ q_Q   0   ]
[ 0     q_R ]    (29)

The mean on the covariance system measurement noise is

E[Δc̄] = 0.    (30)

The covariance matrix on the covariance system measurement noise may be determined from the zero mean m × m covariance system measurement noise matrix Δc [10],

Δc(i) = H_b(i)Θ̃(i)H_b^T(i)    (31)

H_b(i) = [h(i)   0   I]    (32)

where Θ is an (n+g+m) × (n+g+m) error matrix

Θ(i) = B(i)B^T(i)    (33)

B is an (n+g+m) dimensional error vector

B^T(i) = [ΔX^T(i/i-1)   W^T(i)   v^T(i)]    (34)

ΔX(i/i-1) is the error in the predicted state estimate, ΔX(i/i-1) = X(i) - X*(i/i-1), Δ indicating the error between a quantity and its estimate, and Θ̃(i) = Θ(i) - Θ̄(i), where Θ̄(i) = E[Θ(i)] is given by (37) below. If the gain matrix is optimal, then the covariance matrix on the covariance system measurement noise is given by

E[Δc̄(i)Δc̄^T(j)] = c̃(i)δ_ij    (35)

where the m̄ × m̄ variance matrix c̃ is

c̃(i) = [dΔc̄(i)/dΘ̃̄(i)] E[Θ̃̄(i)Θ̃̄^T(i)] [dΔc̄(i)/dΘ̃̄(i)]^T    (36)

where Θ̃̄(i) denotes the vector of parameters in Θ̃(i); dΔc̄(i)/dΘ̃̄(i) is calculated in Appendix A from (31), and E[Θ̃̄(i)Θ̃̄^T(i)] may be calculated in Appendix B from

Θ̄(i) =
[ P(i/i-1)   0      0    ]
[ 0          Q(i)   0    ]
[ 0          0      R(i) ]    (37)

if the distributions on the state, state noise, and measurement noise are known and may be characterized by their first two moments. If more than these two moments are required to specify these distributions, the third or fourth moments on these random variables must be known.

If the original gain matrix is not optimal, E[Δc̄(i)Δc̄^T(j)] ≠ 0 for i ≠ j, and the covariance system measurement noise is correlated in time. Reference [2] shows that for stable systems the greater the residual displacement in time the weaker the residual time correlation. Reference [10] shows that a similar type of correlation exists for the covariance system measurement noise, concluding that assuming this noise to be white noise only causes the accuracy of the resulting estimates to be degraded for a gain in computation time and computer storage. The estimates of the covariance parameters are still unbiased, and the degradation in accuracy is not serious for most stable physical systems.

It is clear from equation (37) that the covariance matrix on the covariance system measurement noise may itself be a function of the covariance parameters being estimated. In applying a Kalman filter to the covariance system, nonadaptive least squares estimates of the covariance parameters may be obtained by determining the m̄ × m̄ variance matrix on the covariance system measurement noise c̃* from the assumed a priori statistics. Adaptive least squares estimates may be obtained by updating c̃* from the estimated covariance parameters.
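The identity (24) underlying the covariance measurements is distribution-free, but it can be spot-checked by Monte Carlo. In the sketch below (dimensions, values, and names are all ours, chosen arbitrarily), gaussian samples are used only to generate residuals V* = hΔX + v; the mean of the covariance measurement V*V*^T should match hPh^T + R.

```python
import numpy as np

rng = np.random.default_rng(0)

# Residual V* = h dX + v with dX ~ (0, P) and v ~ (0, R); by (24) the mean
# of the covariance measurement c = V* V*^T is c_hat = h P h^T + R.
h = np.array([[1.0, 0.5]])
P = np.array([[2.0, 0.3], [0.3, 1.0]])
R = np.array([[0.5]])

N = 200_000
dX = rng.multivariate_normal(np.zeros(2), P, size=N)
v = rng.normal(0.0, np.sqrt(R[0, 0]), size=N)
V = dX @ h.T[:, 0] + v

c_hat = h @ P @ h.T + R          # (24): 2.55 + 0.5 = 3.05 here
sample = np.mean(V * V)          # Monte Carlo estimate of E[V*^2]
```

With 200,000 samples the Monte Carlo mean agrees with ĉ to within a few hundredths, which is consistent with the standard error of a second moment at this sample size.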
IV. NONADAPTIVE COVARIANCE PARAMETER ESTIMATES
With the assumed a priori statistics, one suboptimal Kalman filter may be applied to the original system while another suboptimal Kalman filter may be applied to the covariance system using the residuals of the first filter as data to estimate the actual statistics of the original system. If there is no prior knowledge of the statistics, SLS estimates may be obtained of the original state and covariance parameters. If there is approximate knowledge of the statistics beforehand, the suboptimal Kalman filters yield WLS estimates. The more accurate is the a priori knowledge, the more accurate are the resulting estimates. In particular the performance of the covariance estimates in terms of accuracy may be determined from the actual filtered variance matrix
P_c(i+1/i+1) = [I - K_c*(i+1)H_c(i+1)]P_c(i+1/i)[I - K_c*(i+1)H_c(i+1)]^T + K_c*(i+1)c̃(i+1)K_c*^T(i+1) - S_c(i+1/i) - S_c^T(i+1/i)    (38)

S_c(i+1/i) = K_c*(i+1)D_c(i+1/i)[I - K_c*(i+1)H_c(i+1)]^T    (39)

where the general correlation term in time

D_c(i+1/j) = E[Δc̄(i+1)ΔΓ^T(j+1/j)]

may be calculated from

D_c(i+1/j) = D_c(i+1/j-1)[I - K_c*(j)H_c(j)]^T Φ_c^T(j+1, j) + c̃_c(i+1, j)K_c*^T(j)Φ_c^T(j+1, j),   D_c(i+1/0) = 0    (40)

P_c(i+1/i) = Φ_c(i+1, i)P_c(i/i)Φ_c^T(i+1, i) + Γ_c(i)Q_c(i)Γ_c^T(i)    (41)

c̃_c(i+1, j) = E[Δc̄(i+1)Δc̄^T(j)] is the correlation of the covariance system measurement noise in time and may be calculated similarly to c̃(i+1) from

Θ(i, j) = E[B(i)B^T(j)] =
[ Φ*(i, j)P(j/j-1)   Φ*(i, j+1)Γ(j)Q(j)   -Φ*(i, j+1)A(j+1, j)K*(j)R(j) ]
[ 0                  0                     0                             ]
[ 0                  0                     0                             ]    (42)

The actual accuracy of the suboptimal state estimates may similarly be determined from

P(i+1/i) = A(i+1, i)P(i/i)A^T(i+1, i) + Γ(i)Q(i)Γ^T(i);   P(0/0) = P(0)    (43)

P(i+1/i+1) = [I - K*(i+1)h(i+1)]P(i+1/i)[I - K*(i+1)h(i+1)]^T + K*(i+1)R(i+1)K*^T(i+1)    (44)

If the state and measurement noise variance matrices are known to be diagonal, then the state vector of covariance parameters may be shortened and the derivative and noise matrices appropriately calculated[10].

V. ADAPTIVE COVARIANCE PARAMETER ESTIMATES
If greater accuracy of the original state and the covariance parameter estimates is desired, adaptive estimates may be obtained. With no a priori knowledge of the covariance parameters, SLS estimates of both the original state and the covariance parameters are obtained until the covariance parameters are known well enough[10] to be used in WLS estimates. Assuming at this point that there is approximate knowledge of the variance matrix on the covariance parameter modeling noise and that the distributions on the random variables are known and may be characterized by their first two moments, WLS estimates at instant i+1 may be obtained by using the appropriate predicted values of the covariance parameters:

Γ*(i+1) = Φ_c(i+1, i)Γ*(i/i)

if the predicted values P̄*(i+1/i), Q̄*(i+1), R̄*(i+1) are all positive semidefinite; otherwise there is no correction to the values of the covariance parameters and

Γ*(i+1) = Φ_c(i+1, i)Γ*(i).

Γ*(i+1) is used in evaluating the approximate covariance system measurement noise variance matrix at instant i+1. If this assumption about the random variable distributions does not hold, then more complicated adaptive techniques are required. The simple least squares techniques outlined in [10] utilizing third or fourth order residuals to estimate
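The positive semidefiniteness test that gates the correction in Section V is just the variance-matrix constraint (13) applied to each predicted block, and can be written directly. A small sketch, with names ours:

```python
import numpy as np

def is_psd(U, tol=1e-10):
    """Check the variance-matrix constraint (13), U = U^T >= 0, before the
    predicted covariance parameters are accepted for the adaptive update."""
    if not np.allclose(U, U.T):
        return False
    return np.min(np.linalg.eigvalsh(U)) >= -tol

# A predicted Q* or R* with a negative eigenvalue fails the test, in which
# case no correction is applied to the covariance parameters at that step.
```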
third or fourth moments on the original state, state noise, and measurement noise could be applied.

The performance of the covariance Kalman filter is expected to be equivalent to a suboptimal Kalman filter becoming more and more optimal with increasing measurements. As the covariance parameters are estimated more and more accurately, the Kalman gain matrix becomes more optimal, and the correlation of the covariance system measurement noise decreases. The covariance parameter estimates, as illustrated by the numerical examples in the following section, are expected to converge to an accuracy limited by the modeling error on the covariance parameters[10, 11]. If there is no modeling error in the covariance system, and there is no assumed modeling error variance matrix in the Kalman filter applied to the covariance system, then the covariance estimates asymptotically approach exact estimates with increasing measurements[11]. In the limit, then, the original gain matrix asymptotically approaches the Kalman gain matrix, the correlation of the residuals in time asymptotically approaches zero, and the updated variance matrix on the covariance system measurement noise asymptotically approaches the actual variance matrix. The estimates of the original state and the covariance parameters are thus asymptotically optimal. If the distributions on the random variables are gaussian, the optimal estimate of the state is an unbiased minimum variance estimate; and that of the covariance parameters is an unbiased linear minimum variance estimate, due to the fact that the covariance system measurement noise is a nonlinear function of gaussian random variables and is therefore nongaussian itself. If the distributions on the random variables are nongaussian, then both estimates are unbiased linear minimum variance estimates.

In summary, the adaptive covariance estimation algorithm is presented in Fig. 1. Figure 1 illustrates the flow of information in the usual mode of operation of the adaptive Kalman filters, when they are yielding WLS estimates and the distributions on the random variables are known. The approximate system statistics, including the covariance on the state estimate, the covariance on the state noise, and the covariance on the measurement noise, are input to a system Kalman filter along with the initial state estimate and the observations to produce filtered WLS state estimates with their associated approximate accuracy and their corresponding residuals. The residuals are used to form covariance system measurements via (22) and are an input into the covariance Kalman filter. In addition, the initial covariance parameter estimates with their associated uncertainty, the assumed modeling noise statistics in the covariance system, and the approximate statistics on the covariance system measurement noise as determined from the approximate system statistics by (31) are inputs to the covariance Kalman filter. The covariance Kalman filter then yields filtered WLS estimates of the covariance parameters along with their approximate accuracy, and these estimates are then fed back through the adaptive loop to be used in updating the system statistics. The system Kalman filter is defined by the system model and system statistics
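As a hedged, much-simplified illustration of the closed loop of Fig. 1, the following scalar sketch estimates a single constant measurement noise variance r adaptively: a system filter runs with the current estimate r*, its squared residuals feed a two-state covariance filter on [p̄, r] with observation row [1, 1] from (24), and the filtered r is fed back. The white-noise assumption on the covariance measurement noise and the gaussian fourth-moment approximation 2(p+r)² for its variance are used, all names and tuning values are ours, and this is only a toy version of the full algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Scalar system (1)-(2) with a, q known; the unknown parameter is the
# constant measurement noise variance r_true.
a, q, r_true = 0.9, 1.0, 4.0
z, x = [], 0.0
for _ in range(4000):
    x = a * x + rng.normal(0.0, np.sqrt(q))
    z.append(x + rng.normal(0.0, np.sqrt(r_true)))

r_star = 2.0                         # wrong a priori value of r
xh, p_star = 0.0, 10.0               # system filter state and variance
G = np.array([10.0, r_star])         # covariance state [p_pred, r]
Pc = np.diag([50.0, 50.0])           # its assumed uncertainty
Hc = np.array([1.0, 1.0])            # observation row, from (24)
for zi in z:
    # system Kalman filter with the current assumed r*, equations (3)-(8)
    p_pred = a * a * p_star + q
    k = p_pred / (p_pred + r_star)
    V = zi - a * xh
    xh = a * xh + k * V
    p_star = (1.0 - k) * p_pred
    # covariance filter: propagate [p_pred, r] as in (14)-(16)
    phi = a * (1.0 - k)
    Phi_c = np.array([[phi * phi, a * a * k * k], [0.0, 1.0]])
    G = Phi_c @ G + np.array([q, 0.0])
    Pc = Phi_c @ Pc @ Phi_c.T + np.diag([0.0, 1e-4])
    # covariance measurement c = V^2, variance ~ 2(p+r)^2 (gaussian case)
    c_tilde = 2.0 * max(Hc @ G, 0.5) ** 2
    Kc = Pc @ Hc / (Hc @ Pc @ Hc + c_tilde)
    G = G + Kc * (V * V - Hc @ G)
    Pc = Pc - np.outer(Kc, Hc) @ Pc
    Pc = 0.5 * (Pc + Pc.T)           # keep the 2 x 2 variance symmetric
    r_star = max(G[1], 0.05)         # adaptive feedback of the r estimate

# r_star should have moved from the wrong value 2.0 toward r_true = 4.0
```

The design choice mirrors the text: the system filter never sees r_true, only the covariance filter's current estimate, and the covariance filter's measurement noise variance is itself updated from the estimated parameters.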
X(i+1) = A(i+1, i)X(i) + Γ(i)W(i)
Z(i+1) = h(i+1)X(i+1) + v(i+1)
E[ΔX(0)ΔX^T(0)] = P*(0)

FIG. 1. Flow chart for weighted least squares adaptive Kalman filters.
E[W(i)W^T(j)] = Q*(i)δ_ij
E[v(i)v^T(j)] = R*(i)δ_ij.

The covariance Kalman filter is defined by the covariance system model and covariance system statistics

Γ(i+1) = Φ_c(i+1, i)Γ(i) + Γ_c(i)W_c(i)
c̄(i+1) = H_c(i+1)Γ(i+1) + Δc̄(i+1)
E[ΔΓ(0)ΔΓ^T(0)] = P_c*(0)
E[W_c(i)W_c^T(j)] = Q_c*(i)δ_ij
E[Δc̄(i+1)Δc̄^T(j+1)] = c̃*(i)δ_ij.

All system statistics are determined from

Γ*(i+1) = Φ_c(i+1, i)Γ*(i/i);   P̄*(i+1/i), Q̄*(i+1/i), R̄*(i+1/i) ≥ 0
Γ*(i+1) = Φ_c(i+1, i)Γ*(i);   otherwise

where

Γ*^T(i+1) = [P̄*^T(i+1/i)  Q̄*^T(i+1/i)  R̄*^T(i+1/i)].

In particular the covariance system measurement noise statistics are

c̃*(i) = [dΔc̄(i)/dΘ̃̄(i)]* E[Θ̃̄(i)Θ̃̄^T(i)]* [dΔc̄(i)/dΘ̃̄(i)]*^T

where

Θ̄*(i) =
[ P*(i/i-1)   0       0     ]
[ 0           Q*(i)   0     ]
[ 0           0       R*(i) ]

VI. NUMERICAL EXAMPLES
Three examples were chosen to illustrate the performance of nonadaptive and adaptive estimates of stationary covariance parameters as well as the performance of adaptive estimates of nonstationary covariance parameters. A third order stable system observed by two simultaneous scalar measurements was simulated to yield WLS estimates of stationary covariance parameters in diagonal state and measurement noise variance matrices using only a priori knowledge. The appropriate plant matrices were

A =
[ e^-.15   0        0      ]
[ 0        e^-.30   0      ]
[ 0        0        e^-.45 ]

Γ(i) =
[ -5 sin(ω₁t_i)   0              0             ]
[ 0               3 sin(ω₂t_i)   0             ]
[ 0               0              2 sin(ω₃t_i)  ]

h^T(i) =
[ 1   cos(ω₃t_i) ]
[ 0   1          ]
[ 0   1          ]

ω₁ = 0.0208,   ω₂ = 0.0416,   ω₃ = 0.00359.

The constant system statistics were

X̄^T(0) = X*^T(0) = [4000  6000  8000],   P(0) = P*(0) = 0,
Q̄^T = [900  400  200],   R̄^T = [200  100].

The constant covariance system statistics were Γ̄^T(0) = [0 0 0 0 0 0 900 400 200 200 100], q_c* = q_c = 0, with the nondiagonal elements of P_c*(0) and P_c(0) being zero; the diagonal elements of P_c(0) being zero; and the diagonal elements of P_c*(0) being very large ≈ ∞.

For the nonadaptive cases, assumed values of the variance matrices resulting from a priori knowledge were used in the original system and in evaluating the second moment on the covariance system measurement noise. Figures 2a and 2b illustrate the performance of the least accurate covariance parameter estimate R̂₁₁. In Fig. 2a the accuracy of estimates for a priori covariance parameter knowledge 20% in error (WLS) is compared to that for no a priori knowledge (SLS) and to that for perfect a priori knowledge (LMV). The LMV or most accurate curve utilizes the exact statistics P, Q, and R in determining the assumed variance c̃* on the covariance system measurement noise; the SLS or least accurate curve assumes that c̃* is an identity matrix; and the WLS curve utilizes Q* and R* such that

Q*_ii/Q_ii = R*_jj/R_jj = 0.8.

For this 20% error the resulting estimates are near optimal, better than those using no knowledge by a factor of 5. The most accurate covariance parameter estimate, Q̂₁₁, converged nearly 10 times as fast in comparison. Figure 2b illustrates the fact that as the
FIG. 2. Accuracy of estimate of covariance parameter $R_1$.
a priori knowledge decreases, the performance of the estimates decreases. A 50% error in the a priori knowledge, although still better than no knowledge by a factor of 3, doubled the inaccuracy of the estimates. In extreme cases, when the assumed values were off by a factor of 4 or more, estimates using no prior knowledge were better than those using the wrong knowledge available. In general the a priori knowledge must be accurate enough before a WLS technique using this information is more accurate than an SLS technique using no information.

The performance of adaptive estimates of constant covariance parameters for the same plant is illustrated in Figs. 3a and 3b. SLS estimates of the covariance parameters, independent of initial covariance parameter estimates, are obtained until these estimates are just accurate enough to be used in obtaining WLS estimates of the covariance parameters. This optimal crossover point is determined by varying the time at which SLS estimates are used in obtaining WLS estimates until the accuracy of the covariance parameter estimates is maximized. This point occurs at 150 measurements, and from this point on the adaptive estimates are equivalent to the near optimal estimates using only a priori knowledge 20% in error as shown in Fig. 2a. In this instance, then, adaptive techniques perform better than nonadaptive techniques due to the greater accuracy whenever the error in the assumed values of the covariance parameters is greater than 20%; otherwise the reverse is true due to the greater computational efficiency and simplicity of nonadaptive estimates.

The performance of adaptive estimates of nonstationary covariance parameters in terms of convergence rate and accuracy is presented in Figs. 4a and 4b; $\sigma(\hat Q^*)$ and $\sigma(\hat R^*)$ represent the one sigma error on the adaptive estimates of Q and R respectively. A scalar stable system was constructed using the first components of X and Z in the previous system. The statistical time evolution of the covariance parameters was chosen to be

$$Q(i+1)=[(0.9991)\sin(\omega_1 t_i)]Q(i)+[(0.4)\sin(\omega_2 t_i)]W_Q(i)$$

$$R(i+1)=[(0.9992)\sin(\omega_2 t_i)]R(i)+[(0.35)\sin(\omega_2 t_i)]W_R(i)$$

where $Q(0)=196$, $R(0)=100$, and $W_Q$, $W_R$ are uniformly distributed with $-10<W_Q(i)<10$ and $-5<W_R(i)<5$. To ensure that the variances remained positive, they were not chosen to decrease too rapidly in this example. The principle of the variances varying linearly in time, similar to the state, is illustrated; the variance estimates remain
FIG. 3. Adaptive estimates of diagonal covariance parameters.
FIG. 4. Performance of adaptive estimates of nonstationary covariance parameters.
within their respective accuracy bounds most of the time, as is expected, and they do converge. The adaptive estimates are now less optimal, however, compared to the optimal case, and 300 measurements are required before the covariance parameters are learned well enough to be used in increasing the convergence rate of covariance estimates.

These examples are chosen to illustrate the relative accuracy and generality of the covariance algorithms, particularly of the adaptive algorithm. The accuracy of stationary covariance parameter SLS estimates is considered representative of the maximum accuracy expected of previous algorithms with equivalent generality using no a priori covariance parameter information. The stationary covariance parameter adaptive example indicates that the adaptive algorithm can yield estimates at least five times more accurate than previous algorithms. It is recognized that along with this increase in accuracy there is an associated increase in computational expense due to the dimension of the covariance parameter state vector. A prime objective in presenting the covariance algorithms, however, is to provide a means by which accuracy requirements may be satisfied with a minimum of measurements in situations where there is little prior noise information and where more emphasis is placed on accuracy than on computational expense. The covariance algorithms are capable of satisfying low convergence time requirements not satisfied by previous algorithms. For systems limited by computational requirements the covariance algorithms can still be more advantageous than previous algorithms. Since computational requirements in general increase approximately as the square of the number of parameters being estimated, the computational requirements of the covariance algorithms are more sensitive to an increase in the state dimension than are previous algorithms.
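The accuracy argument behind the SLS/WLS comparison above, namely that weighting by (approximate) inverse error variances beats equal weighting, can be sketched with a two-estimate example; the numbers are illustrative and not from the paper:

```python
# Sketch: variance of a normalized linear combination of two unbiased,
# uncorrelated estimates of the same parameter. Equal (SLS-style) weights
# are compared with inverse-variance (WLS-style) weights.
import numpy as np

def combined_variance(weights, variances):
    """Variance of sum(w_i * x_i) after normalizing the weights to sum to 1."""
    w = np.asarray(weights, float)
    w = w / w.sum()
    return float(np.sum(w**2 * np.asarray(variances, float)))

variances = [900.0, 100.0]                               # illustrative errors
sls = combined_variance([1.0, 1.0], variances)           # equal weighting
wls = combined_variance([1 / 900.0, 1 / 100.0], variances)  # inverse-variance
# sls = 250.0, wls = 90.0: correct weighting is markedly more accurate
```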
As the ratio of the predicted state estimate covariance parameters to all other required estimated parameters, $\tilde n/(n+\tilde m+\tilde c)$, decreases, the advantage of the covariance algorithms relative to previous algorithms increases. In particular consider a system required to converge with minimum computation time but with no requirement on the number of measurements for convergence. If $f_{ac}$ is the expected ratio of the instantaneous accuracy of the adaptive algorithm to that of the SLS algorithm, and if $f$ is the maximum ratio of the computation time per measurement required for the covariance algorithms compared to that required for previous algorithms, $f\approx[1+\tilde n/(n+\tilde m+\tilde c)]^2$, then the advantage in terms of the convergence computational time of the previous algorithms compared to that of the adaptive algorithm is at least $f_a=f_{ac}/f$. The ratio of the minimum computational convergence time required for the previous algorithms to the maximum time required for the adaptive algorithm is represented by $f_a$. For the numerical examples considered, $f_{ac}\approx 5$. For the first order system with 2 required estimated covariance parameters ($\tilde m+\tilde c=2$), $f_a\approx 3$ and the adaptive algorithm can computationally converge at least three times faster than expected for previous algorithms. This system is similar to that considered for the Bayesian estimation algorithm[13]. Increasing the system to third order reduces this advantage by a third ($f_a\approx 2$) for 9 required estimated covariance parameters and yields potentially no computational advantage ($f_a\approx 1$) for only one required estimated covariance parameter. In this last instance, for the adaptive algorithm, the reduction in the number of measurements to converge could be compensated by the increase in the computation time per measurement.

VII. CONCLUSIONS
A filter no more complex than that being used on a linear system with uncertain noise statistics may be used to estimate these statistics. Fewer restrictions than in the past may be placed on the time variation of these statistics and on the probability distribution of system noise, with an increase in estimation accuracy. This increased accuracy is obtained by making use of either prior or learned information in determining the weighting matrices for WLS estimates of the original state or the covariance parameters. When the learned information is used in an adaptive technique, linear minimum variance estimates are approached with increasing measurements. Numerical examples show that adaptive weighted least squares techniques do converge, and that the convergence is more rapid when the time variation of the covariance parameters is known without error. In addition, for covariance parameters constant in time, near optimal estimates may be obtained with approximate a priori knowledge. For no a priori knowledge, adaptive techniques can converge nearly as fast as nonadaptive techniques using only approximate knowledge, and at least five times faster (for the typical system considered) than SLS techniques using no a priori knowledge. Although the derivation of the least squares estimates does not require the specification of the distribution of system noise, such information does facilitate the application of weighted least squares techniques. Conditions favoring the use of these covariance algorithms are (1) system performance requirements emphasizing accuracy and convergence time, (2) small state dimension, (3) a relatively large number of necessary but uncertain covariance parameters, and (4) no severe computational limitations. The main applications of these algorithms are in the use of covariance parameter information obtained from simulations before operation, during operation, or from records after operation to improve future and real time filter performance. Filter performance can be improved by modeling and compensating for not only unknown gaussian errors, but also for nonlinearities, roundoff errors, and other types of nongaussian errors. An example illustrating a situation particularly suited for the use of the covariance algorithm could be a system with a large state dimension and many unknown noise sources. If an approximate model of this system could be developed through assumptions decoupling some of the system dynamics, then the computational efficiency and accuracy of the system filter could be increased through the use of several lower dimensional adaptive Kalman filters accounting for the effects of both the unknown noise sources and the approximations.
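For concreteness, the computational-advantage ratios quoted in Section VI can be reproduced under the reading $f\approx[1+\tilde n/(n+\tilde m+\tilde c)]^2$ with $\tilde n=n(n+1)/2$ predicted state covariance parameters; this interpretation of the symbols is an assumption, though it matches the printed values $f_a\approx 3$, $2$, and $1$:

```python
# Sketch (interpretation, not a verified formula): convergence-computation
# advantage f_a = f_ac / f, with f ~ [1 + n~/(n + m~ + c~)]^2 and
# n~ = n(n+1)/2 the number of predicted state covariance parameters.
def f_advantage(n, n_cov_params, f_ac=5.0):
    n_tilde = n * (n + 1) // 2                    # predicted covariance params
    f = (1.0 + n_tilde / (n + n_cov_params)) ** 2 # per-measurement cost ratio
    return f_ac / f

fa_first_order = f_advantage(1, 2)   # scalar system, 2 covariance parameters
fa_third_order = f_advantage(3, 9)   # third-order system, 9 parameters
fa_one_param = f_advantage(3, 1)     # third-order system, 1 parameter
# rounds to roughly 3, 2, and 1 respectively, matching the quoted figures
```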
REFERENCES

[1] A. P. SAGE and G. W. HUSA: Adaptive filtering with unknown prior statistics. Proc. 1969 Joint Automatic Control Conference, pp. 760-769 (1969).
[2] PATRICK L. SMITH: Estimation of the covariance parameters in time-discrete linear dynamic systems with applications to adaptive filtering. Ph.D. in Engineering, University of California, Los Angeles (1971).
[3] J. C. SHELLENBERGER: Estimation of covariance parameters for an adaptive Kalman filter. National Electronics Conference Proceedings, Vol. 22, pp. 698-702, October (1966).
[4] J. C. SHELLENBERGER: A multivariance learning technique for improved dynamic performance. National Electronics Conference Proceedings, October (1967).
[5] GERALD L. SMITH: Sequential estimation of observation error variances in a trajectory estimation problem. AIAA J. 5 (11), 1964-1970.
[6] ANDREW P. SAGE: Optimum Systems Control. Prentice-Hall, Englewood Cliffs, N.J. (1968).
[7] R. E. KALMAN: A new approach to linear filtering and prediction problems. J. bas. Engng Ser. D, 82, 35-45 (1960).
[8] S. L. FAGIN: Recursive linear regression theory, optimal filter theory, and error analysis of optimal systems. IEEE Conv. Record, 147-204 (1964).
[9] P. D. ABRAMSON: Simultaneous estimation of the state and noise statistics in linear dynamic systems. NASA Technical Report NASA TR R-332, March (1970).
[10] H. W. BREWER: Identification of the noise characteristics in a Kalman filter. Ph.D. in Engineering, University of California, Los Angeles (1973).
[11] ANDREW P. SAGE and JAMES MELSA: Estimation Theory with Applications to Communications and Control. McGraw-Hill, New York (1971).
[12] H. HAGAR and B. D. TAPLEY: A sequential filter for estimating state noise covariances. Proceedings of the Fourth Symposium on Nonlinear Estimation, September (1973).
[13] A. P. SAGE and G. W. HUSA: Algorithms for sequential adaptive estimation of prior statistics. Proceedings Eighth Symposium on Adaptive Processes, pp. 52-61, November (1969).

APPENDIX A

Calculation of derivative matrices

In this appendix derivative matrices are calculated. The derivative of a vector of a matrix with respect to a vector of another matrix, $d\bar A/d\bar C$, when A and C are nonsymmetric and linearly related to each other, is calculated as an illustration. A and C are then constrained to be square symmetric, and the derivative of a vector of a variance matrix with respect to another vector of a variance matrix is calculated. Define the linear relationship of A and C to be

$$A=BCD+EFG+\cdots$$

where A is a $\beta\times\gamma$ matrix, C is a $\delta\times\varepsilon$ matrix, B is a $\beta\times\delta$ matrix, D is an $\varepsilon\times\gamma$ matrix, and E, F, G are similar in form to B, C, D. Define the vector of a nonsymmetric matrix H to be

$$\bar H=[\bar H_1\ \bar H_2\ \cdots]^T$$

where $\bar H_l$ is the $l$th row of H. Then

$$\frac{dA_{KJ}}{dC_{LS}}=B_{KL}D_{SJ}$$

so that

$$\frac{d\bar A_K}{d\bar C_L}=B_{KL}D^T$$

and $d\bar A/d\bar C$ is the $\beta\gamma\times\delta\varepsilon$ matrix whose $KL$th partitioned block is $B_{KL}D^T$.

When A and C are square symmetric they may be variance matrices; $\gamma=\beta$; $\varepsilon=\delta$; $D=B^T$; and E, F, G are similar in form to B, C, D. H is now symmetric, and $\bar H_l$ is the $l$th row of H measured from the $l$th diagonal element of H. Considering A and C similarly to H, the $KL$th partitioned matrix of $d\bar A/d\bar C$ is now

$$\frac{d\bar A_K}{d\bar C_L}=\begin{bmatrix}\dfrac{dA_{KK}}{dC_{LL}}&\dfrac{dA_{KK}}{dC_{L,L+1}}&\cdots&\dfrac{dA_{KK}}{dC_{L\delta}}\\[1ex]\dfrac{dA_{K,K+1}}{dC_{LL}}&&&\vdots\\[1ex]\vdots&&&\\[1ex]\dfrac{dA_{K\beta}}{dC_{LL}}&\cdots&&\dfrac{dA_{K\beta}}{dC_{L\delta}}\end{bmatrix},\qquad(\beta-K+1)\times(\delta-L+1)$$

where

$$\frac{dA_{KJ}}{dC_{LS}}=\begin{cases}B_{KL}D_{SJ}&\text{if }L=S\\B_{KL}D_{SJ}+B_{KS}D_{LJ}&\text{if }L\neq S.\end{cases}$$

$d\bar A/d\bar C$ is partitioned into these $KL$th matrices as before, and is now a $\tilde\beta\times\tilde\delta$ matrix where

$$\tilde\beta=\frac{\beta}{2}(\beta+1),\qquad\tilde\delta=\frac{\delta}{2}(\delta+1).$$

APPENDIX B

Calculation of the covariance of a noise matrix

Given the covariance matrix on a zero mean noise vector X, the covariance on the random vector of the associated noise matrix is derived. Define the $\eta\times\eta$ covariance matrix on the $\eta$ dimensional zero mean noise vector X to be

$$\Psi(i,j)=E[X(i)X(j)^T].$$

Define the associated $\eta\times\eta$ noise matrix as

$$\Lambda(i,j)=X(i)X(j)^T.$$

The covariance on the $\tilde\eta$ dimensional noise vector $\bar\Lambda(i)=\bar\Lambda(i,i)$, where $\tilde\eta=(\eta/2)(\eta+1)$, is then the $\tilde\eta\times\tilde\eta$ covariance matrix

$$B(i,j)=E[\Delta\bar\Lambda(i)\Delta\bar\Lambda(j)^T]$$

where

$$E[\Delta\Lambda_{\alpha\beta}(i)\Delta\Lambda_{\gamma\delta}(j)]=E[X_\alpha(i)X_\beta(i)X_\gamma(j)X_\delta(j)]-\Lambda_{\alpha\beta}(i)\Lambda_{\gamma\delta}(j)$$

and this last scalar expected value can be evaluated in general when the second and fourth moments on X are known. If the probability density function on X is known, then these moments may be calculated with a moment generating function[2]. In particular this moment generating function for the random $\rho$ dimensional vector Y is $Z(W)=E[\exp(W^TY)]$ where W is a $\rho$ dimensional vector. The fourth moment on X may now be calculated from

$$\frac{\partial^4[Z(W)]}{\partial W_\alpha\,\partial W_\beta\,\partial W_\gamma\,\partial W_\delta}\bigg|_{W=0},\qquad Y^T=[X^T(i)\ X^T(j)],\quad 1\times 2\eta.$$

B may thus be calculated from the second and fourth moments on X when the distribution on X may be characterized by these moments. If X is gaussian then

$$E[\Delta\Lambda_{\alpha\beta}(i)\Delta\Lambda_{\gamma\delta}(j)]=\Psi_{\alpha\gamma}(i,j)\Psi_{\beta\delta}(i,j)+\Psi_{\alpha\delta}(i,j)\Psi_{\beta\gamma}(i,j).$$
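The nonsymmetric-case derivative of Appendix A, $dA_{KJ}/dC_{LS}=B_{KL}D_{SJ}$ for $A=BCD$, can be checked numerically with a finite difference; the dimensions and indices below are illustrative assumptions:

```python
# Sketch: finite-difference check of dA_kj/dC_ls = B_kl * D_sj for A = B C D.
# Since A is linear in C, the finite difference is exact up to float error.
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((2, 3))    # beta x delta
C = rng.standard_normal((3, 4))    # delta x epsilon
D = rng.standard_normal((4, 2))    # epsilon x gamma

eps = 1e-6
k, j, l, s = 1, 0, 2, 3            # arbitrary in-range indices
Cp = C.copy()
Cp[l, s] += eps                    # perturb one element of C
fd = ((B @ Cp @ D) - (B @ C @ D))[k, j] / eps   # finite-difference derivative
analytic = B[k, l] * D[s, j]                    # Appendix A expression
```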