Computation of the analysis error covariance in variational data assimilation problems with nonlinear dynamics

Computation of the analysis error covariance in variational data assimilation problems with nonlinear dynamics

Journal of Computational Physics 230 (2011) 7923–7943 Contents lists available at SciVerse ScienceDirect Journal of Computational Physics journal ho...

4MB Sizes 0 Downloads 30 Views

Journal of Computational Physics 230 (2011) 7923–7943

Contents lists available at SciVerse ScienceDirect

Journal of Computational Physics journal homepage: www.elsevier.com/locate/jcp

Computation of the analysis error covariance in variational data assimilation problems with nonlinear dynamics I.Yu. Gejadze a,⇑, G.J.M. Copeland a, F.-X. Le Dimet b, V. Shutyaev c a

Department of Civil Engineering, University of Strathclyde, 107 Rottenrow, Glasgow, G4 ONG, UK MOISE project (CNRS, INRIA, UJF, INPG); LJK, Université Joseph Fourier, BP 51, 38051 Grenoble Cedex 9, France c Institute of Numerical Mathematics, Russian Academy of Sciences, Gubkina 8, Moscow 119333, Russia b

a r t i c l e

i n f o

Article history: Received 26 April 2010 Received in revised form 18 March 2011 Accepted 19 March 2011 Available online 30 March 2011 Keywords: Large-scale flow models Nonlinear dynamics Data assimilation Optimal control Analysis error covariance Inverse Hessian Ensemble methods Monte Carlo

a b s t r a c t The problem of variational data assimilation for a nonlinear evolution model is formulated as an optimal control problem to find the initial condition function. The data contain errors (observation and background errors), hence there will be errors in the optimal solution. For mildly nonlinear dynamics, the covariance matrix of the optimal solution error can often be approximated by the inverse Hessian of the cost functional. Here we focus on highly nonlinear dynamics, in which case this approximation may not be valid. The equation relating the optimal solution error and the errors of the input data is used to construct an approximation of the optimal solution error covariance. Two new methods for computing this covariance are presented: the fully nonlinear ensemble method with sampling error compensation and the ‘effective inverse Hessian’ method. The second method relies on the efficient computation of the inverse Hessian by the quasi-Newton BFGS method with preconditioning. Numerical examples are presented for the model governed by Burgers equation with a nonlinear viscous term. Ó 2011 Elsevier Inc. All rights reserved.

1. Introduction The methods of variational data assimilation (DA) have become a very important tool for state observation and parameter identification for geophysical fluid flow models (e.g. [17,3,7,11,21,25]). From the mathematical point of view, these problems can be formulated as optimal control problems (e.g. [18,17]) to find unknown control variables in such a way that a cost functional related to the observation and a priori data takes its minimum value. A necessary optimality condition leads to the so-called optimality system which contains the original model and the adjoint model. Due to the input errors (background and observation errors), there is an error in the optimal solution. Statistical properties of the optimal solution error (in terms of the covariance) are very important for quantifying the accuracy of data assimilation techniques [29], for sequential variational state estimation [4] and design of optimal (or adaptive) observation schemes [1,31]. In this paper we consider a nonlinear evolution model with an unknown initial condition (analysis), which must be retrieved using some initial guess (background) and incomplete observation data of the state evolution. However, the results of this paper are valid for any prescribed set of unknowns. If the errors of the input data are random and normally distributed, then for a linearized finite-dimensional evolution model the optimal solution error covariance matrix is given by the inverse Hessian of the cost functional (see e.g. [29,30,26,11,15]). For a nonlinear evolution model no exact relationship exists. In practice it is very often assumed that

⇑ Corresponding author. Tel.: +44 0 141 5483774; fax: +44 0 141 5532066. E-mail address: [email protected] (I.Yu. Gejadze). 0021-9991/$ - see front matter Ó 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.jcp.2011.03.039

7924

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

the dynamics of errors can be satisfactorily described by the tangent linear model. This assumption is called the tangent linear hypothesis (TLH). It is assumed that if the TLH is valid, then the covariance matrix can be well approximated by the inverse Hessian of the cost functional. A similar result in the continuous case was presented in [9]. In terms of continuous representation it is said that the analysis error covariance operator can be approximated by the inverse Hessian of the auxiliary data assimilation problem, which involves the tangent linear model constraints. This Hessian does not coincide in general with the Hessian of the original nonlinear DA problem. It was also demonstrated in [9] that this approximation could be sufficiently accurate even though the TLH is not valid. However, in the case of highly nonlinear dynamics such an approximation may not be valid. This paper presents the development of ideas of [9,10] to the case of highly nonlinear dynamical models. A fully nonlinear ensemble method (first introduced in [9]) is modified to reduce the sampling error in the sample covariance. The sampling error is evaluated through the Hessian of an auxiliary DA problem within the standard Gauss–Newton optimization method. This method could be feasible for large-scale applications from the point of view of memory requirements. However, it is computationally very expensive because it requires multiple solution of the nonlinear DA problem with perturbed data to accumulate a sample of optimal solutions. Thus, it is only used here to compute the reference value of the covariance for the purpose of comparison. A new ‘effective inverse Hessian’ (EIH) method is developed to estimate the covariance matrix. It is based on ensemble computations of the inverse Hessians. However, this method does not require a sample of optimal solutions to be computed. Instead of optimal solutions, one can use artificially generated sample vectors, which must satisfy certain statistical properties. The efficient BFGS method with preconditioning is used to build the quasi-Newton approximation of the inverse Hessian. The evaluation of the covariance by the EIH method is significantly less computationally expensive as compared to the fully nonlinear ensemble method (by an order of magnitude, at least) and the method can be implemented in a form feasible for large-scale applications in terms of memory requirements. The results of numerical experiments are presented for the model governed by Burgers equation with a nonlinear viscous term. The tests are specially designed to produce as strong nonlinear effects as possible. Numerical examples allow estimates of the covariance matrix which are obtained by different methods (fully nonlinear ensemble method with sampling error compensation, the inverse Hessian method and the EIH method) to be compared. The EIH method proved to provide better approximations of the covariance matrix than both the inverse Hessian and the sample covariance obtained with the same sample size. We suggest that it is instructive to distinguish between the following types of error in the covariance estimate: linearization error, sampling error and the origin error. The linearization error is the error in the covariance estimated by the inverse Hessian. The sample covariance computed by the fully nonlinear ensemble method has no linearization error, but it has a sampling error. 
The estimate of the covariance by the EIH method contains a smaller linearization error as compared to the inverse Hessian, and a smaller sampling error as compared to the sample covariance. However, with all methods one may encounter the ‘origin’ error. This error appears because an optimal solution is used for evaluation of the covariance instead of the exact state. It is worth mentioning that the issue of the origin error is hardly paid enough attention in the literature. In the numerical examples we compute this error and demonstrate that in the case of highly nonlinear dynamics it could actually be larger than the linearization error. We also discuss how a sample of variance vectors which are computed with the EIH method can be used to reduce the possible impact of the origin error. The paper is organized as follows. In Section 2, we provide the statement of the variational DA problem to identify the initial condition for a nonlinear evolution model. In Section 3, the equation of the optimal solution error is given through the errors of the input data using the Hessian of the auxiliary DA problem and the basic relationship between the analysis error covariance and the inverse of this Hessian is established. A fully nonlinear ensemble method with sampling error compensation is described in Section 4. A new EIH method suitable in the case of highly nonlinear dynamics is introduced in Section 5. Section 6 presents the model employed, description of the BFGS method for computing the inverse Hessian and other details of numerical implementation. Most numerical examples are presented in Section 7. Numerical examples relevant to the origin error are presented and discussed in 8. Main results of this paper are summarized in Section 9. 2. Statement of the problem Consider the mathematical model of a physical process that is described by the evolution problem

(

@u @t

¼ FðuÞ þ f ;

ujt¼0 ¼ u;

t 2 ð0; TÞ;

ð2:1Þ

where u = u(t) is the unknown function belonging for any t to a Hilbert space X, u 2 X, F is a nonlinear operator mapping X into X. Let Y ¼ L2 ð0; T; XÞ; k  kY ¼ ð; Þ1=2 Y ; f 2 Y. Suppose that for a given u 2 X, f 2 Y there exists a unique solution u 2 Y to (2.1).  be the ‘exact’ initial state and u  – the solution to the problem (2.1) with u ¼ u  , i.e. the ‘exact’ state evolution. Let u  þ nb and the observations y 2 Yo, y ¼ C u  þ no , We define the input data as follows: the background function ub 2 X, ub ¼ u where C : Y ? Yo is a linear bounded operator (observation operator) and Yo is a Hilbert space (observation space). The functions nb 2 X and no 2 Yo may be regarded as the background and the observation error, respectively. Assuming that

7925

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

these errors are normally distributed, unbiased and mutually uncorrelated we Vb  =E[(, nb)X nb] and V o  ¼ E½ð; no ÞY o no , where E[] is the expectation. We suppose Vb invertible. Let us formulate the following DA problem (optimal control problem) with the aim given f 2 Y find u 2 X and u 2 Y such that they satisfy (2.1), and on the set of solutions the minimum value, i.e.

define the covariance operators and Vo are positive definite, hence to identify the initial condition: for to (2.1), a cost functional J(u) takes

8 @u > ¼ FðuÞ þ f ; t 2 ð0; TÞ; > < @t ujt¼0 ¼ u; > > : JðuÞ ¼ inf Jðv Þ;

ð2:2Þ

v 2X

where

JðuÞ ¼

  1  1 1  1 V b ðu  ub Þ; u  ub þ V o ðC u  yÞ; C u  y : X Yo 2 2

ð2:3Þ

The necessary optimality condition reduces the problem (2.2)–(2.3) to the following system [18]:

(

(

@u @t

¼ FðuÞ þ f ; t 2 ð0; TÞ; ujt¼0 ¼ u;

ð2:4Þ



u  @@t  ðF 0 ðuÞÞ u ¼ C  V 1 o ðC u  yÞ;

t 2 ð0; TÞ;

ð2:5Þ

u jt¼T ¼ 0; 

 V 1 b ðu  ub Þ  u jt¼0 ¼ 0

ð2:6Þ

with the unknowns u, u⁄, u, where (F0 (u))⁄ is the adjoint to the Frechet derivative of F, and C⁄ is the adjoint to C defined by ðC u; wÞY o ¼ ðu; C  wÞY ; u 2 Y; w 2 Y o . Having assumed that the system (2.4)–(2.6) has a unique solution, we will study the impact of the errors nb, no on the optimal solution u and generalize the results presented in [9,10] to the case when Eq. (2.1) describes highly nonlinear dynamics. 3. The analysis error covariance via inverse Hessian  . Assuming this error is unbiased, i.e. E[du] = 0, we also define the Let us define the optimal solution error du ¼ u  u  was derived from covariance operator Vdu = E[(, du)X du]. In [9] the following exact equation for the analysis error du ¼ u  u (2.4)–(2.6):

Hdu ¼ V 1 b n1 þ Rn2 ;

ð3:1Þ

where the operator H : X ! X is defined by the successive solutions of the following problems:

(

@w @t

~ Þw ¼ 0;  F 0 ðu

t 2 ð0; TÞ;

ð3:2Þ

wjt¼0 ¼ v ; (



 @w  ðF 0 ðuÞÞ w ¼ C  V 1 o Cw; @t

t 2 ð0; TÞ;

ð3:3Þ



w jt¼T ¼ 0;  Hv ¼ V 1 b v  w jt¼0

ð3:4Þ 0

~ ¼u  þ sðu  u  Þ; s 2 ½0; 1, where s is chosen such that the Taylor–Lagrange formula FðuÞ  Fðu  Þ ¼ F ðu ~ Þdu is valid with u [20]. The operator R : Yobs ? X acts on the functions g 2 Yobs according to the formula

Rg ¼ h jt¼0 ;

ð3:5Þ



where h is the solution to the adjoint problem

(



 @[email protected]  ðF 0 ðuÞÞ h ¼ C  V 1 o g; 

h jt¼T ¼ 0:

t 2 ð0; TÞ;

ð3:6Þ

 and u ¼ u  þ du dependent on u  and on Here, the definitions of H and R involve the parameter s and the functions u  þ du via (2.1). Therefore, we can write as follows H ¼ Hðu  ; du; sÞ and R ¼ Rðu  ; duÞ. In general, the operator u¼u  ; du; sÞ is neither symmetric, nor positive definite. For given / 2 Y let us introduce the Hessian H of the following auxHðu iliary control problem [9]: for given nb, no find du and du such that

7926

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

8 0 @du > > < @t  F ð/Þdu ¼ 0; dujt¼0 ¼ du; > > : J 1 ðduÞ ¼ inf J 1 ðv Þ;

t 2 ð0; TÞ; ð3:7Þ

v

where

J 1 ðduÞ ¼

  1  1 1  1 V ðdu  nb Þ; du  nb þ V ðCdu  no Þ; Cdu  no : X Yo 2 b 2 o

ð3:8Þ

 ; 0Þ ¼ Hðu  ; 0; sÞ and for  in (3.7) we have H ¼ Hðu The operator H is positive definite and self-adjoint. For / ¼ u  þ du we have H ¼ Hðu  ; duÞ ¼ Hðu  ; du; 1Þ. /¼u¼u Under the hypothesis that F is twice continuously Frechet differentiable, the error equation (3.1) is approximated in [9] by

 ; 0Þdu ¼ V 1  Hðu b nb þ Rðu; 0Þno :

ð3:9Þ

From (3.9) we obtain for the optimal solution error

   ; 0Þ V 1  du ¼ H1 ðu b nb þ Rðu; 0Þno and

   1  ; 0Þ V 1    V du ¼ H1 ðu b þ Rðu; 0ÞV o R ðu; 0Þ H ðu; 0Þ: Therefore

 ; 0ÞHðu  ; 0ÞH1 ðu  ; 0Þ ¼ H1 ðu ; 0Þ: V du ¼ H1 ðu

ð3:10Þ

This is a well established result [3,26,29] which is usually deduced (without considering Eq. (3.1)) by straightforwardly linearizing the original nonlinear DA problem (2.2)–(2.3) under the assumption that

 Þ  F 0 ðu  Þdu: FðuÞ  Fðu

ð3:11Þ

This assumption is often called the ‘tangent linear hypothesis’. It is said that the analysis error covariance Vdu can be approx ; 0Þ if the TLH (3.11) is valid. That usually happens if the nonlinearity is mild and/or the imated by the inverse Hessian H1 ðu error du and, subsequently, du is small. We derive (3.10) via Eq. (3.1). From this derivation one can see that the accuracy of  ; du; sÞ ! Hðu  ; 0Þ and Rðu  ; duÞ ! Rðu  ; 0Þ in (3.1). Clearly, the tran(3.10) depends on the accuracy of the approximations Hðu sition from Eq. (3.1) to Eq. (3.9) could still be valid even though (3.11) is not satisfied. For example, since both operators H and R include integrations, what really matters is the cumulative property of the local approximation error (rather than its modulus). We shall call the linearization error the error in the estimate of Vdu which results from the approximations discussed above.  is never known unless we deal with Before we proceed, let us make the following clarification. In reality, the exact state u a ‘game’ situation, such as identical twin experiments for example. One may know the background state ub (before DA) or a  þ Du (after DA), where Du is an optimal solution error. Let us notice that Du is not conparticular optimal solution u ¼ u sidered as a random function (in contrast with du), though it is a single implementation (event) of du and must be consistent  in (3.10), which causes another error in the estimate of Vdu to appear. We shall with Vdu. Thus, one has to use u⁄ instead of u call this error the origin error. For the reason discussed here we will distinguish two tasks: the first task means computation  Þ, i.e. given the exact solution u  , the second task – computation of Vdu(u⁄), i.e. given a particular optimal solution u⁄. of V du ðu Despite the fact that the exact state is not known in reality, the first task is important. For example, it is necessary to compute  Þ merely in order to understand a possible impact of Du upon the estimate of Vdu. From now on we speak about the first V du ðu task unless stated otherwise. The discussion focused on the issue of the origin error is presented in Section 8. As we have already said, we can use the formula (3.10) if the tangent linear hypothesis is valid and, in some cases beyond  ; 0Þ always to be a satisfactory approximation to Vdu. its validity range. In the general case, however, one may not expect H1 ðu Here we present specially designed examples for the evolution model governed by the 1-D Burgers equations where the inverse Hessian does not seem a good approximation of the covariance. Thus, in Fig. 1(left/right) the reference variance is presented in circle marked line, the diagonal of the inverse Hessian is presented in bold solid line and the background variance – in dashed line. One can see that the difference between the reference variance and its estimate by the inverse Hessian is mostly noticeable in those parts of the domain (here outlined by the ellipses) where two conditions are met: (a) the dynamics is highly nonlinear; (b) there are no observations available. The reference variance is obtained by the method described in the next Section. All implementation details and particular conditions to which the results in Fig. 1 correspond are summarized in Section 6. As we learn from these examples, the inverse Hessian approximates well the covariance in certain parts of the domain where the nonlinearity is locally mild, but outside these parts the linearization error must be taken into account. 
Furthermore, as the nonlinearity level goes up, the probability density function (p.d.f.) of the optimal solution error may increasingly deviate from the normal; thus the covariance matrix may eventually become of little use as a representative of such p.d.f.

7927

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

reference inv. Hessian background

0.15

0.1

0.05

0

reference inv. Hessian background

0.2

variance

variance

0.2

0.15

0.1

0.05

0

0.2

0.4

x

0.6

0.8

1

0

0

0.2

0.4

x

0.6

0.8

1

Fig. 1. Reference variance, variance by the inverse Hessian and background variance: left – case A, right – case B.

Since we consider no and nb as normally distributed errors, the optimal solution u of the problem (2.2)–(2.3) is the classical nonlinear least-squares estimator. Let us additionally suppose that no is an independent identically distributed (i.i.d.) error, i.e. Vo is diagonal. The large sample estimation theory [22] asserts that such an estimator is unbiased (consistent) and asymptotically normal, where ‘asymptotically’ in our case means T ? 1, given a certain finite time observation step dt. Moreover,  ; 0Þ converges in probability to the covariance matrix Vdu irrespective of the magnitude with T ? 1 the inverse Hessian H1 ðu of the linearization error. The mentioned asymptotic properties are valid under very general regularity requirements to the nonlinear operators involved. This partially explains why in practice of variational DA, which implies a finite but possibly significant number of observation instants, the approximation (3.10) is often valid even though the TLH breaks down. In reality the observation window t 2 [0, T] is, of course, always finite, hence implying the finite number of i.i.d. observations. Some nonlinear estimation problems in which the consistency and normality of the estimation error hold for ‘practically relevant’ sample sizes are said to exhibit a ‘close-to-linear’ statistical behavior [27]. The method to verify this behavior suggested in [27] is testing the normality of the marginal distributions of state variables using an artificially generated sample of optimal solutions. Obviously, this type of analysis is hardly feasible for realistic large-scale applications. However, here we demonstrate that the p.d.f. of those particular samples used in computations are indeed reasonably close-to-normal. The results of the multivariate normality test based on measures of bias, skewness and kurtosis [28] and the frequency histogram associated with the worst (most skewed) sample are presented in Appendix B. In conclusion, for certain highly nonlinear evolution models a reasonably close-to-normal p.d.f. of the optimal solution error might be expected under the condition that the number of observations in time (with the i.i.d. errors) is significant and the observation network is sufficiently dense in space. In this case, the covariance matrix remains the key statistics describing the p.d.f. of the optimal solution error. Remark 3.1. It is interesting to notice that the ‘classical’ meaning of the tangent linear hypothesis in the form (3.11) is relevant to the first task only. For the second task it should probably be generalized as follows:

 Þ  F 0 ðu  þ DuÞdu; FðuÞ  Fðu

 ; DuÞ; Du ¼ Du ð u

ð3:12Þ

where Du is any statistically significant implementation (event) of du. 4. Fully nonlinear ensemble method with sampling error compensation Here we describe an ensemble approach to estimate Vdu introduced in [9]. Let us consider discretized problem (2.2)–(2.3),  and the corresponding exact field with M being the dimension of the state vector. We assume that we know the exact state u  from (2.1). By definition, the analysis error covariance matrix is u

V du ¼ E½ðu  E½uÞðu  E½uÞT ;

ð4:1Þ

 þ nb and y ¼ C u  þ no . In the ensemble where u is the solution of the original nonlinear DA problem (2.2)–(2.3) with ub ¼ u method Vdu is approximated by the sample covariance. We generate a ‘synthetic’ sample of perturbations nb,l, no,l, l = 1, . . . , L,  þ nb;l and y ¼ C u  þ no;l and solve the where L stands for the sample size. For l-th element of this sample we compute ub ¼ u original nonlinear DA problem (2.2)–(2.3) to find the solution ul. This procedure is repeated L times to get the sample of optimal solutions, then the sample covariance matrix is computed as follows

b ¼1 V L

L X ðul  b E½uÞðul  b E½uÞT ; l¼1

1 b E½u ¼ L

L X

ul :

ð4:2Þ

l¼1

 , and u  is known, we can directly compute the sample covariAssuming that the optimal solution bias is small, i.e. b E½u  u ance matrix using the formula below:

7928

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

b¼V b L :¼ 1 V L

L X

dul duTl :

ð4:3Þ

l¼1

In this case we do not need to accumulate ul, l = 1, . . . , L. Obviously, the fully nonlinear ensemble method is computationally very expensive because it requires multiple solution of the original nonlinear DA problem (2.2)–(2.3) with perturbed data. Because of that, for realistic problems the sample size L is likely to be much smaller than the state vector dimension M, in which case special methods of regularization (‘shrinking’) must be applied. For a comprehensive review of existing methods see, for example [2]. In this sense, the EIH method presented in Section 5 should be regarded as a new method for estimation of the sample covariance with a very small sample b . This method, however, requires a available. Below we present a simple algorithm for ‘cleaning’ the sampling error out of V significant sample size L, thus it has been only used to obtain the reference values of Vdu, which serve for justification of the EIH. Let us consider a linearized error equation (3.9). From (3.9), we get

  ; 0Þ V 1  du1l ¼ H1 ðu b nb;l þ Rðu; 0Þno;l ;

ð4:4Þ

where l is the sample element’s index. The superscript ‘1’ indicates that du1l is obtained as a result of the first iteration of the  (see in the Gauss–Newton (GN) method applied for solving the problem (2.2)–(2.3), starting from the initial value u0 ¼ u Appendix A). It can be seen that for linear operators R and H1 involved in (4.4) and for the Gaussian errors nb, no the following condition holds

lim

L>1

L  T 1X  ; 0Þ: du1 du1l ¼ H1 ðu L l¼1 l

ð4:5Þ

Therefore, for any integer L, the difference

b¼ dV

L  T 1X  ; 0Þ du1 du1l  H1 ðu L l¼1 l

ð4:6Þ

gives us the sampling error for linearized case. The sampling error or its norm could be used in some regularization procedures described in [2], which is worth further investigating. Here we accept a simple approach by assuming that in the nonb for large enough sample size L. Therefore, it can be eliminated linear case the sampling error can also be approximated by d V b to obtain the reference covariance in the form: (compensated) simply by subtracting from V

b ¼ V b i;j  ai aj d V b i;j ; V i;j

i; j ¼ 1; . . . ; M;

ð4:7Þ

b i;i  a2 d V b i;i > 2 ( is a small positive number) which guarantees that where the weights ai are chosen from the condition V i the resulting variance is positive. The presented approach with compensation requires some extra memory: in addition to  ; 0Þ and the sample of du1l obtained using Eq. (4.4). the sample of dul one must keep the inverse Hessian H1 ðu k Let ul be the k-iterate of the GN method applied for solving the problem (2.2)–(2.3). Then the algorithm which implements the fully nonlinear ensemble method with compensation takes the form:  solve the nonlinear problem (2.1), keep the field u  ðx; tÞ 1. Given u ¼ u  ; 0Þ 2. Compute the inverse Hessian H1 ðu 3. Start ensemble loop l = 1, . . . , L  þ nb;l and y ¼ C u  þ no;l 3.1 Define nb,l, no,l, compute ub ¼ u  3.2 Start GN loop k = 1, . . . with u0l ¼ u  , keep in memory 3.2.1 if k = 1, compute du1l ¼ u1l  u 3.2.2 iterate until the nonlinear DA problem (2.2)–(2.3) is solved  , keep in memory 3.2.3 compute dul ¼ ukl  u 3.3 End of GN loop 4. End of ensemble loop 5. Compute the covariance using (4.3), (4.6) and (4.7). Remark 4.1. It is necessary to mention that a generic problem of ensemble methods including the presented method is that the sample of perturbations nb,l, no,l (and related sample of optimal solutions ul) does not occur naturally, in which case the formula (4.2) would be an unbiased estimator of Vdu, but is artificially generated using a certain approximation of the state  is used as an origin, but in a vector (which may also be called the ‘sample origin’). In an idealized setup the exact solution u  þ Du. Let us notice that the Hessian of the problem (3.7)–(3.8) realistic setup this could be an optimal solution u ¼ u depends on / defined by a chosen origin. If the same origin function is used for the sample generation, the error compensation algorithm presented above is valid.

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

7929

Remark 4.2. Since in a realistic setup one must use the available optimal solution u⁄ which is likely to differ from the exact  , the sample covariance computed at u⁄ may contain the origin error for any sample size L. A common assumption solution u is that the sample covariance computed at u⁄ should be a sensible approximation of Vdu, which means that the origin error can be neglected. We will show in Section 8 that in a general nonlinear case this assumption could be rather delusive. Remark 4.3. Apparently, in terms of memory requirements the method presented above is feasible for large-scale applica ; 0Þ, which is computed by the BFGS method tions. One has to keep L vectors du1l , L vectors dul and the inverse Hessian H1 ðu b  v , where v is a given vector, can be computed by applyand stored in the compact quasi-Newton form. Then, the product V ing the formulas (4.3), (4.6) and (4.7). Remark 4.4. This ensemble method must be distinguished from sequential ensemble methods used in filtering, e.g. [5,35]. The main difference is that, in filtering, the unknown vector is the current state (the cost functional is formulated for a given time instant), while we are looking for the initial condition (‘retrospective’ analysis). Therefore, these ensemble methods are not suitable for the hind-cast problem. Since we do not propagate the covariance matrix in time (and, therefore, do not use any linearization or approximations) we call our ensemble method fully nonlinear.

5. Effective inverse Hessian method 5.1. General consideration Here we present a new method for evaluation of the covariance Vdu to be used in the case of highly nonlinear dynamics. As we said in Section 4, the main drawback of the ‘fully nonlinear ensemble method’ in large-scale applications is an extremely high computational cost of a sample of optimal solutions, whereas even computation of one optimal solution may constitute a challenging task. Further we discuss an approach which may help computational costs to be reduced very significantly. Let us first consider a related abstract problem. Let g be a random vector of dimension M defined as follows

g ¼ Aðu; gÞn; where n is a Gaussian random vector of dimension M with zero expectation and the covariance matrix Vn, g(n) is a random vector with elements which are nonlinearly dependent on n, u is a given non-random vector and A is a M  M matrix with elements ai,j nonlinearly dependent on u and g. Let us assume that the matrix A can be presented in the form

Aðu; gÞ ¼ AðuÞ þ dAðu; gÞ; where AðuÞ is the non-random part of A and dA(u, g) is the random part, such that E[dA(u, g)] = 0 (which implies that E½Aðu; gÞ ¼ AðuÞ). The last condition is in agreement with the assumption on approximate consistency of the nonlinear least-squares estimator accepted in this paper. The covariance matrix of the vector g is

h i V g ¼ E Aðu; gÞnnT AT ðu; gÞ with elements

v i;j

vi,j:

h i ¼ E Ai nnT ATj ;

where Ai is the i-th row of A(u, g). Let us introduce an approximation of the element

h   i h i v~ i;j ¼ E Ai E nnT ATj ¼ E Ai V n ATj

ð5:1Þ

vi,j in the form ð5:2Þ

and consider the difference as follows:

h

i h   i Ai ðnnT  V n ÞATj ¼ E Ai þ dAi ðnnT  V n Þ ATj þ dATj h i h i   ¼ Ai E nnT dATj þ E dAi nnT ATj þ E dAi HdATj ;

v i;j  v~ i;j ¼ E

ð5:3Þ

where H = nnT  Vn. If we assume in (5.1) that Aðu; gÞ ¼ AðuÞ, i.e. we ignore the random part of A(u, g) which depends on g, then we obtain an approximation of vi,j in the form

h

v i;j ¼ E

i Ai nnT ATj ¼ Ai V n ATj

and the approximation error is as follows:

v i;j  v i;j ¼ Ai E

h i h i   nnT dATj þ E dAi nnT ATj þ E dAi nnT dATj :

ð5:4Þ

By comparing (5.3) and (5.4) we notice that the first two terms coincide; however the third term in (5.3) is always smaller than in (5.4). The approximation error given by (5.4) corresponds to the linearization error in Vdu as discussed in Section 3. ~ instead of v  we suppose that the linearization error will be reduced (see the next subsection). Based on the Thus, by using v

7930

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

causal relationships between variables (n ) H, n ) g ) d A) one would suggest that the correlation between dAi and the rows of H are spurious (i.e. indirect). Let us note that the correlations between directly related variables, for example, between g and n could still be weak if they are related via complex nonlinear mapping. ~ i;j is too complex, this error can be analysed numerically. For example, While the theoretical analysis of the error v i;j  v one can define arbitrary nonlinear mappings g(n) and dA(g) and, given a large sample of random vectors n, compute this error. Our numerical experiments have demonstrated that the more nonlinear (chaotic) the nonlinear mappings are, the smaller is the third term in (5.3) as compared to (5.4), whereas the magnitude of the first two terms also decreases. 5.2. Derivation of the method Let us now consider the discretized nonlinear error equation (3.1) and write down the expression for du

   ; du; sÞ V 1  du ¼ H1 ðu b nb þ Rðu; duÞno :

For the covariance Vdu we have an expression as follows:

h i h i T 1 T T 1  1  1   ; du; sÞV 1   V du : ¼ E½duduT  ¼ E H1 ðu b nb nb V b H ðu; du; sÞ þ E H ðu; du; sÞRðu; duÞno no R ðu; duÞH ðu; du; sÞ h i h i T T T 1 1  1  1   ; du; sÞV 1   þ E H1 ðu ð5:5Þ b nb no R ðu; duÞH ðu; du; sÞ þ E H ðu; du; sÞRðu; duÞno nb V b H ðu; du; sÞ :

As discussed in the beginning of this Section, we approximate the products nb nTb ; no nTo ; nb nTo and no nTb in (5.5) by             E nb nTb ; E no nTo ; E nb nTo and E no nTb , respectively. Since nb and no are mutually uncorrelated, E nb nTo ¼ E no nTb ¼ 0 and the two last terms in (5.5) vanish. Thus, we write an approximation of Vdu as follows:

h   i 1 T 1   ; du; sÞ V 1   V ¼ E H1 ðu b V b V b þ Rðu; duÞV o R ðu; duÞ H ðu; du; sÞ ;

ð5:6Þ

where the expression in the round brackets is the Hessian of the auxiliary problem (3.7)–(3.8) with / = u: T  ; duÞ ¼ V 1   Hðu b þ Rðu; duÞV o R ðu; duÞ:

ð5:7Þ

Therefore, Eq. (5.6) can be written in the form

 ; du; sÞHðu  ; duÞH1 ðu  ; du; sÞ: V ¼ E½H1 ðu

ð5:8Þ

Remark 5.1. From numerical experiments for the evolution model considered in this paper we found that s is narrowly distributed around 1/2 and it could be fixed at this value should we decide to use (5.8). There are two reasons why the latest formula cannot be used in practice, especially for large-scale DA problems: (a) it is extremely sensitive to the errors which result from the accepted approximations (as numerical experiments have demon ; du; sÞ is asymmetric and indefinite. However, strated); (b) it is hardly feasible for implementation since the operator Hðu  ; du; sÞ and keeping in mind that Hðu  ; duÞ :¼ Hðu  ; du; 1Þ this formula can be further approximated by assuming s = 1 in Hðu by a simple expression as follows:

 ; duÞ: V ¼ E½H1 ðu

ð5:9Þ

The right-hand side of (5.9) may be called the effective inverse Hessian, hence the name of the suggested method. In order to compute V directly using this equation, the expectation must be substituted by the sample mean:



L 1X  ; dul Þ: H1 ðu L l¼1

ð5:10Þ

As we discussed throughout this paper, the main difficulty with the fully nonlinear ensemble method is a need of computing  þ du be an optimal solution. Let us a sample of optimal solutions. However, the formula (5.9) does not necessarily require u explain this point. If we denote by fdu the multivariate p.d.f. of du, then Eq. (5.9) can be re-written in the form



Z

þ1

 ; v Þfdu ðv Þdv : H1 ðu

ð5:11Þ

1

As we assume that in our nonlinear case the covariance matrix V describes meaningfully the p.d.f of the optimal solution error then, with the same level of validity, we should also accept the p.d.f. fdu to be approximately normal with zero expectation and the covariance V, in which case we obtain



Z

1 ð2pÞ

M=2

jVj

1=2

þ1

1

 1  ; v Þ exp  v T V 1 v dv : H1 ðu 2

ð5:12Þ

The formula (5.10) gives V explicitly, but requires a sample of optimal solutions dul, l = 1, . . . , L to be computed. In contrast, the latest expression is a nonlinear matrix integral equation with respect to V, while v is a dummy variable. It is also interesting to notice that this is a deterministic equation.

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

7931

We admit that in derivation of (5.12) we have relied on a number of assumptions the validity of which has yet to be rigorously analyzed. For this paper we have checked all approximation steps as well as the overall performance of the suggested method numerically. 5.3. Remarks on implementation Remark 5.2. Preconditioning is used in variational DA to accelerate convergence of the conjugate gradient algorithm at the stage of inner iterations of the GN method, but it also can be used to accelerate formation of the inverse Hessian by the Lanczos algorithm [7] or by the BFGS [10]. Since H is self-adjoint, we must consider a projected Hessian in a symmetric form:

e ¼ ðB1 Þ HB1 H e is clustered around 1, with some operator B : X ? X, defined in such a way that the eigenspectrum of the projected Hessian H e are equal or close to 1. Since the condition number of H e is supposed to be much smaller i.e. the majority of eigenvalues of H e 1 can usually be obtained (either by Lanczos or BFGS) with a than the condition number of H, a sensible approximation of H e 1 , one can easily recover H1 using the formula relatively small number of iterations. After that, having H

e 1 ðB1 Þ : H1 ¼ B1 H Assuming that B preconditioning:

V ¼B

1

ð5:13Þ

does not depend on dul we substitute (5.13) into (5.10) and obtain the version of (5.10) with

! L 1X 1  e H ðu; dul Þ ðB1 Þ : L l¼1

1

ð5:14Þ

Similarly, assuming that B1 does not depend on the variable of integration we substitute (5.13) into (5.12) and obtain the version of (5.12) with preconditioning:

V ¼B

Z

1

1

ð2pÞM=2 jVj1=2

!  1 T 1 1  e H ðu; v Þ exp  v V v dv ðB1 Þ : 2

þ1

1

ð5:15Þ

e 1 which is much less expensive to compute and store in memory. A The formulas (5.14) and (5.15) instead of H1 involve H particular choice of the operator B1 is discussed in Section 6.2. Let us only add here that the suggested method would hardly be feasible without preconditioning. Remark 5.3. The nonlinear equation (5.15) can be solved by the fixed point iterative process

V

pþ1

1

¼B

1 ð2pÞM=2 jV p j1=2

Z

þ1

1

!  1 T p 1 1  e H ðu; v Þ exp  v ðV Þ v dv ðB1 Þ ; 2

ð5:16Þ

 ; 0Þ. for p = 0, 1, . . . , starting with V 0 ¼ H1 ðu Remark 5.4. Different methods can be used for evaluation of the multi-dimensional integral in (5.16), such as quasi-Monte Carlo [23]. Here, for simplicity, we use the standard Monte Carlo method. This actually implies a return to the formula (5.14). Taking into account (5.13), the iterative process takes the form

V

pþ1

1

¼B

! L   1X p 1  e H u; dul ðB1 Þ ; L l¼1

p ¼ 0; 1; . . . ;

ð5:17Þ

where dupl is generated as a random vector with zero expectation and the covariance matrix Vp. One can see that for each p the last formula looks similar to (5.14) with one key difference: dupl in (5.17) is not an optimal solution, but a vector having the statistical properties of the optimal solution. Let us notice that a few tens of outer iterations by the GN method may be e 1 is equivalent (in terms of computational required to obtain one optimal solution, while an approximate evaluation of H costs) to just one outer iteration of the GN method. One has to repeat these computations p times, however only a few iterations on index p are required in practice. Therefore, one should expect an order of magnitude reduction of computational costs by the method (5.17) as compared to (5.14) for the same sample size. Remark 5.5. Let us underline that the implementation of (5.16) or (5.17) does not require the full M  M storage of V, but allows a compact storage schemes involving k  M eigenvalues/eigenvectors (Lanczos) or 2k  M secant pairs (BFGS), where k M is the number of vectors representing V. This is a vital condition for the presented method to be feasible for large-scale applications. In order to compute w = (Vp)1v for the exponent in (5.16) one should solve the system of equations Vpw = v instead, using for example the conjugate gradient method. This only requires the matrix-vector product Vpw to be computed. In the summation (integration) procedure in (5.17), instead of adding matrices one should look for the eigenvalues/eigenvectors of a new e 1 ðu  matrix H new ; dul Þ using the matrix-vector product

7932

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

l  1 e 1 1 e 1  ; dul1 Þw þ H  ; dul Þw; H new ðu ðu l l where

e 1 ðu  ; dul1 Þ  H new

l1 1 X e 1 ðu  ; dui Þ H l  1 i¼1

is already computed. Thus, in the sequential summation (integration) scheme one would need memory for 3k  M instead of 2M  M real numbers. 6. Numerical implementation 6.1. Numerical model As a model we use the 1D Burgers equation with a nonlinear viscous term

@ u 1 @ðu2 Þ @ ¼ þ @x @t 2 @x



lðuÞ

@u ; @x

u ¼ uðx; tÞ; t 2 ð0; TÞ; x 2 ð0; 1Þ

ð6:1Þ

with the Neumann boundary conditions



@ u

@ u

¼ ¼0

@x x¼0 @x x¼1

ð6:2Þ

and the viscosity coefficient

lðuÞ ¼ l0 þ l1

 2 @u ; @x

l0 ; l1 ¼ const > 0:

ð6:3Þ

This type of l(u) allows us to formally qualify the problem (6.1)–(6.3) as strongly nonlinear [8]. Burgers equations are sometimes considered in data assimilation context as a simple model describing elements of atmospheric flow motion. We use the implicit time discretization as follows

ui  ui1 ht

þ

 @ 1 @ ui wðui Þui  lðui Þ ¼ 0; @x 2 @x

i 2 ð1; . . . ; NÞ; x 2 ð0; 1Þ;

ð6:4Þ

where i is the time integration index, ht = T/N is a time step. The spatial operator is discretized on a uniform grid (hx is the spatial discretization step, j = 1, . . . , M is the node number, M is the total number of grid nodes) using the ‘power law’ firstorder scheme as described in [24], which yields quite a stable discretization scheme (this scheme allows l(u) down to 0.5  104 for M = 200 without noticeable oscillations). For each time step we perform nonlinear iterations on coefficients w(u) = u and l(u), assuming initially that l(ui) = l(ui1) and w(ui) = ui1, and keep until (6.4) is satisfied (i.e. pffiffiffiffiiterating ffi the norm of the left-hand side in (6.4) becomes smaller than a threshold 1 ¼ 1012 M). In all computations presented in this paper we use the following parameters: observation period T = 0.312, discretization steps ht = 0.004, hx = 0.005, state vector dimension M = 200, and parameters in (6.3) l0 = 104, l1 = 106. For numerical experiments we consider two cases A and B which correspond to two different initial conditions; the resulting field evolutions u(x, t) are presented in Fig. 2. A general property of Burgers solutions is that an initially smooth wave moves and forms shocks. This behavior can be clearly seen in Fig. 2(left), case A. In case B, Fig. 2(right), a more complex behavior is emulated. Here we observe positive and negative parts of the wave moving towards the center of the domain, where the shocks eventually collide. These initial conditions are chosen to stimulate as highly nonlinear behavior as possible. 6.2. BFGS for computing the inverse Hessian e u  ; duÞ is computed as a collateral result of the BFGS iterations (see e.g. [6,33,9]) while The projected inverse Hessian Hð solving the following auxiliary DA problem:

8 @du  F 0 ðuÞdu ¼ 0; > > < @t dujt¼0 ¼ B1 du; > > : J 1 ðduÞ ¼ inf J 1 ðv Þ;

t 2 ð0; TÞ; ð6:5Þ

v

where

J 1 ðduÞ ¼

  1  1 1 1  1 V b B ðdu  nb Þ; B1 ðdu  nb Þ þ V o ðCdu  no Þ; Cdu  no : X Yo 2 2

ð6:6Þ

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

ϕ

7933

ϕ

Fig. 2. Field evolution for different initial conditions: left – case A, right – case B.

The formulation (6.5)–(6.6) follows from (3.7)–(3.8) as a result of the change of variables d u ? B1du and nb ? B1nb. The optimality system which corresponds to the problem (6.5)–(6.6) looks as follows:

( @du @t

 F 0 ðuÞdu ¼ 0;

t 2 ð0; TÞ;

ð6:7Þ

1

dujt¼0 ¼ B du; (



u  @@t  ðF 0 ðuÞÞ u ¼ C  V 1 o ðCdu  no Þ;

t 2 ð0; TÞ;

u jt¼T ¼ 0; 1  V 1 b B ðdu  nb Þ  u jt¼0 ¼ 0:

ð6:8Þ

ð6:9Þ

Then, the inverse Hessian in the original space is computed by the formula

e 1 ðu ; duÞ ¼ B1 H  ; duÞðB1 Þ : H1 ðu 1

ð6:10Þ 1

1=2 Vb .

A usual choice of B used in variational DA is B ¼ This approach is often being referred as the ‘first level preconditioning’ [7]. The corresponding projected Hessian is as follows:

  e u  ; duÞ ¼ I þ V 1=2  ; duÞV o R ðu  ; duÞV b1=2 : Hð Rðu b

ð6:11Þ

e u  ; duÞ are bounded from below by 1, and its condition number is simply its largest It can be seen that the eigenvalues of Hð eigenvalue. However, in order to reduce the condition number of the sought inverse Hessian even further, we use ‘the second e u e 1=2 ðu  ; duÞ is projected using the square root H  ; 0Þ. This choice is based on a natural level preconditioning’. That is, Hð 1  1  e e assumption that H ðu; duÞ is still reasonably close to H ðu; 0Þ, at least in parts where nonlinearity is relatively mild. In this case the BFGS iterations are needed to account for nonlinearity only. Thus, the preconditioner in our method is:

e 1=2 ðu  ; 0Þ: B1 ¼ V 1=2 b H

ð6:12Þ

e 1=2 ðu e 1 . However, it is important  ; 0Þ we apply Cholesky factorization of the explicitly formed matrix H In order to compute H e 1=2 w can be computed by a recursive procedure using the accumulated secant to note that the square root-vector product H e 1 and its subsepairs (BFGS) or eigenvalues/eigenvectors (Lanczos) as described in [32], without a need of formation of H quent Cholesky factorization. Applied for solving the problem (6.5)–(6.6), the BFGS algorithm with the exact step search [9] has the form: 1. 2. 3. 4. 5. 6.

Given u, solve the nonlinear problem (2.1), keep the field u(x, t) e 1 ¼ I Put each element of du1 equal to max and H 1 1 Solve the TLM equation (6.7) with du = du and keep Cdu1 = Cdu 1 Compute J1(du ) (6.6) Compute J 01 ðdu1 Þ by solving the adjoint Eq. (6.8) Start BFGS iteration loop k = 1, . . . e 1 J01 ðduk Þ 6.1 Compute directions dk ¼ H k 6.2 Solve the TLM equation (6.7) with du = dk and keep ud = Cdu 6.3 Compute the step using the formula

7934

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

    k k k V 1 þ V 1 b ðdu  nb Þ; d o ðCdu  no Þ; ud Yo  X   ak ¼  : k k V 1 þ V 1 b d ;d o ud ; ud X

ð6:13Þ

Yo

Update the solution duk+1 = duk + akdk Update model predictions Cduk+1 = Cd uk + akud Compute J1(duk+1) (6.6) Compute J 01 ðdukþ1 Þ by solving the adjoint Eq. (6.8) Check stopping criteria Compute sk ¼ dukþ1  duk ; yk ¼ J 01 ðdukþ1 Þ  J 01 ðduk Þ e 1 using the LBFGS update forUsing the secant pair vectors (si, yi), i = 1, . . . , k, compute the inverse Hessian H kþ1 mula [19]. 7. End of BFGS iteration loop. 6.4 6.5 6.6 6.7 6.8 6.9 6.10

Generally, any continuous function nb can be chosen as a driving function in (6.5)–(6.6), while no must be consistent with nb, i.e. one must compute no = Cdu, where du is the solution of the TLM equation (6.7) with du = nb. In this case, the solution of ~ (6.5)–(6.6) is du = nb and J1(du) = 0. In practice we use nb ¼ V 1=2 b nb , where

~nb ðxÞ ¼ 1 þ

M=2 X ðsinðpkxÞ þ cosðpkxÞÞ

ð6:14Þ

k¼1

~b ðxÞ is chosen such that it contains all ‘frequencies’ that could be represented on the mesh. The for even M. The function n consistent tangent linear and adjoint models have been generated from the original forward model by the Automatic Differentiation tool TAPENADE [12] and checked using the standard gradient test. 6.3. Additional implementation details 1 Finally we have to define V 1 b and V o in (6.6). The first one is computed assuming that the background error belongs to the Sobolev space W 22 ½0; 1 (see [10] for details). The corresponding correlation function is presented in Fig. 3, while the background error variance is presented in Fig. 1(left/right) by dashed line. We assume that the observation error is not correlated and, therefore, can be defined by the variance. Previously (in Section 6.1) we introduced two cases A and B, which correspond to two different boundary conditions. Furthermore, for each case we consider a different sensor scheme: case A-5 sensors located at ^ xk ¼ 0:35; 0:4; 0:5; 0:6; 0:65; case B-4 sensors located at ^ xk ¼ 0:4; 0:45; 0:55; 0:6. For case A we also consider two sub-cases which differ by the magnitude of the background and observation errors. Thus, in cases A1 and B the background error variance (in the central part of the domain) is r2b ¼ 0:2, the observation error variance is r2o ¼ 103 ; in case A2 we have an order of magnitude smaller errors, i.e. r2b ¼ 0:02 and r2o ¼ 104 . The background error correlation function is the same in all cases (as presented in Fig. 3). In the next section, Figs. 4, 7 and 9(left) correspond to case A1, Figs. 5, 8 and 10 – to case A2, and Figs. 6, 9(right) – to case B, and Fig. 11 comprises cases A1 and B.

7. Numerical results First of all, we compute the analysis covariance matrix using the fully nonlinear ensemble method with sampling error b  . In the course of compucompensation (Section 4) for a large sample size L = 2500, to be regarded as a reference value V  b b L Þ; L ¼ 100 tation of V we also successively evaluate (using the formula (4.3)) twenty five sample variance vectors diagð V b L Þ; L ¼ 25 (the subscript L means that each sample variance is estimated and one hundred sample variance vectors diagð V using an independent sample dul of size L). Then, the relative error in the sample variance (or relative sampling error) can be defined as the vector ^eL with the components:

b LÞ =V b   1; ð^eL Þi ¼ ð V i;i i;i

i ¼ 1; . . . ; M:

Let us also define the relative error in a certain approximation of V as the vector e with the components

ei ¼ V i;i = Vb i;i  1; i ¼ 1; . . . ; M

ð7:1Þ

and compute this error with V in (7.1) being estimated by one of the following methods:  ; 0Þ; (1) by the inverse Hessian method, i.e. simply using V du ¼ H1 ðu (2a) by the EIH method implemented in the form (5.14), which requires a sample of optimal solutions dul to be computed; (2b) by the EIH method implemented as the iterative process (5.17) (see Remark 5.4), which requires a sample of dul, but does not require that dul are optimal solutions.

7935

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

1

correlation

0.8

0.6

0.4

0.2

0 -0.3

-0.2

-0.1

0

0.1

x

0.2

0.3

Fig. 3. Correlation function.

For computation of V by methods 2a or 2b a sample of dul is required, hence the result depends on the sample size L (for these two case we denote V by VL). Thus, we compute the ‘asymptotic’ value of VL (to be denoted V) with a large sample size L = 2500. As we did before with the sample covariance, in the course of computation of V we successively evaluate twenty five variance vectors diag(VL), L = 100, and one hundred variance vectors diag(VL), L = 25. Then, the relative error in the variance estimate obtained by methods 2a or 2b is defined as the vector eL with the components:

b   1; ðeL Þi ¼ ðV L Þi;i = V i;i

i ¼ 1; . . . ; M:

In method 2b we currently allow enough iterations on index p for the iterative process (5.17) to converge in terms of the distance between the successive iterates. In practice, this requires just a few iterations, typically 3–4. In the right panels in Figs. 4–6 a set of one hundred vectors ^e25 is presented in dark lines, and a set of twenty five vectors ^e100 – in the overlaying white lines. These plots reveal the envelopes for the relative error in the sample variance obtained with the formula (4.3) with L = 25 and L = 100, respectively. The envelopes increasingly widen for the sub-diagonals as one moves away from the diagonal. It is interesting to notice in Fig. 4 that both the dark and white envelopes have an anomaly in the area x 2 (0.35, 0.4), which can be explained by the presence of another solution (playing a role of an attractor) in the vicinity  . This manifests limitations of the Gaussian approach as applied to highly nonlinear DA problems; nevof the exact solution u ertheless we still rely on the ‘close-to-linear’ assumption described at the beginning of Section 4. The graphs of e are presented in the left panels of the same figures: line 1 corresponds to method 1 (the inverse Hessian method), lines 2 and 3 – to methods 2a and 2b (variants of the EIH method). For the last two methods the asymptotic value V is used in the formula (7.1) as V. Looking at Figs. 4–6 we observe that the relative error in the sample variance ^e25 (dark envelope) exceeds 50% almost everywhere, which is certainly beyond reasonable margins, and ^e100 (white envelope) is around 25% (that is still fairly large). In order to reduce the white envelope two times, one would need to use a sample size L = 400, etc., which is hardly feasible for large scale problems. One should also keep in mind that the envelope for the variance (diagonal of the covariance matrix) is the narrowest as compared to the sub-diagonals. Thus, methods for evaluation of the covariance other than the sample 2

2 1 2 3

1.5

1.5 1

1

ε

^

ε 0.5

0

0

-0.5

-0.5 0.1

0.5

0.2

0.3

0.4

x

0.5

0.6

0.7

0.1

0.2

0.3

0.4

0.5

0.6

0.7

x

Fig. 4. Case A1. Left: the relative error e by the inverse Hessain – line 1, and by the EIH methods using the asymptotic estimate V : method 2a – line 2 and method 2b – line 3. Right: the sample relative error ^e. Set of ^e for L = 25 – dark envelope and set of ^ e for L = 100 – white envelope.

7936

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

1

1 1 2 3

0.5

0.5

ε^

ε 0

0

-0.5

-0.5

0.1

0.2

0.3

0.4

x

0.1

0.6

0.5

0.2

0.3

0.4

0.6

0.5

x

0.7

Fig. 5. Case A2. The same as Fig. 4.

1 2 3

1

ε

1

0.5

ε^

0

0

-0.5 0.1

0.5

-0.5 0.2

0.3

0.4

0.5

x

0.6

0.7

0.8

0.9

0.1

0.2

0.3

0.4

0.5

x

0.6

0.7

0.8

0.9

Fig. 6. Case B. The same as Fig. 4.

methods are valuable. Whereas method 1 (the inverse Hessian method) gives an estimate of Vdu with a small relative error (as compared to the sample covariance) in the areas of mild nonlinearity, this error can be much larger in the areas of high nonlinearity, for example one can observe line 1 jumping at certain locations far outside the dark envelope (Figs. 4 and 6). At the same time, the relative error in the asymptotic value V obtained by methods 2a and 2b (variants of the EIH method) is much smaller as compared to the error in line 1 as it generally remains within the white envelope. The difference between the estimates by methods 2a and 2b is not significant. Let us note that e computed with the asymptotic value V is the smallest relative error achievable with the EIH method and represents the bias of the method. This bias results from approximations made in the derivation of the method (see Section 5). If V is evaluated using smaller samples, then e increases as the sampling error associated with the EIH method must be growing. In order to see the influence of the sample size L let us refer to Fig. 7(case A1) and Fig. 8(case A2). Here, at each panel we present a set of one hundred vectors e25 (lines 1), a set of twenty five vectors e100 (lines 2), as well as diag(V) in the solid black line. These plots reveal the envelopes for e obtained by method 2a (left panels) and by method 2b (right panels) with L = 25 and L = 100, respectively. Looking at these figures one may conclude that: (a) the width of the envelope depends on the level of nonlinearity, e.g. in the areas of mild nonlinearity the envelope is narrowing; (b) in the areas of high nonlinearity the envelope for e is still much narrower than the corresponding envelope for the sample variance error ^e (see Fig. 4(right) and Fig. 6(right)); (c) the difference between envelopes obtained by methods 2a and 2b is small as compared to the magnitude of envelopes themselves, though we notice, for example, that for L = 100 method 2b gives a wider envelope. However, method 2b does not require a set of optimal solutions to be computed, which is a key factor which enables the major reduction of computational costs to be achieved. One may also expect that if the integration of (5.16) is performed by a method that converges faster than the basic Monte Carlo method, the envelopes in Figs. 7 and 8 would be narrower.

7937

I.Yu. Gejadze et al. / Journal of Computational Physics 230 (2011) 7923–7943

0.8

0.8 1 2 3

0.6

0.6

0.4

ε

0.4

0.2

ε

0.2

0

0

-0.2

-0.2

-0.4

-0.4

-0.6

0.4

-0.6

0.7

0.6

0.5

0.4

x

0.5

0.6

0.7

x

Fig. 7. Case A1. Set of e25 – dark envelope (lines 1), set of e100 – white envelope (lines 2), the relative error e – solid line 3. Left – method 2a, right – method 2b.

1 2 3

0.2

0.2

0

0

ε

ε -0.2

-0.2

-0.4

-0.4 0.4

0.7

0.6

0.5

x

0.4

0.5

0.6

0.7

x Fig. 8. Case A2. The same as Fig. 7.

Complimentary to Figs. 4–6 showing the relative error in the variance estimate, Fig. 9 shows the absolute value of the b  Þ (line 2), its estimate by the inverse Hessian method (line 1) and asymptotic estimates diag(V) reference variance diagð V obtained by the EIH methods 2a (line 3) and 2b (line 4), case A1 – the left panel, case B – the right panel. By definition the correlation function is

ri;j ¼ V i;j =ðV i;i V j;j Þ1=2 ;

i; j ¼ 1; . . . ; M;

ð7:2Þ

b  , to the covariance Let us denote the correlation matrices r , r and r which correspond to the reference covariance matrix V  obtained by method 1 and to the asymptotic covariance matrix V obtained by method 2a respectively. Then, the left panel in Fig. 11 shows the reference correlation function ^r , the lower triangle in the right panel shows the error r   ^r and the upper triangle – the error r  ^r  . The upper panel in Fig. 11 corresponds to case A1, the lower panel – to case B. One can see that the error r  ^r  is significantly smaller than r   ^r (the former makes about 20  30% of the later), which means that the EIH method yields a noticeably more accurate estimate of the correlation matrix than the inverse Hessian method. In summary, the EIH method (both variants 2a and 2b) allows the linearization error in the covariance estimate to be noticeably reduced as compared to the inverse Hessian method (method 1). The best improvement can be achieved for its diagonal elements (the variance). The covariance estimate by the EIH method is also noticeably better than the sample covariance obtained with the equivalent sample size. The variant 2b has an obvious advantage over the variant 2a in terms of computational efficiency, while in terms of accuracy the results by both variants are comparable. ^





8. Analysis error covariance and the origin error

In Section 3 we suggested distinguishing two tasks: computation of V_δu given the exact solution ū, and computation of V_δu given an optimal solution u*. If V_δu(ū) is approximated by the inverse Hessian (method 1), the estimate contains the linearization error; if it is approximated by the sample covariance V̂, the estimate contains the sampling error.

Fig. 9. The reference variance – line 2, variance by the inverse Hessian – line 1, variance by the EIH method 2a – line 3, variance by the EIH method 2b – line 4. Left – case A1, right – case B.

The EIH method allows the linearization error to be reduced, whereas the sampling error associated with this particular method is relatively small and only appears in the areas of high nonlinearity. The second task (computation of V_δu(u*)) is characterized by the presence of the origin error. None of the methods presented in this paper is able to deal with this error. However, the second task is very important for such problems as sequential state estimation or the design of adaptive measurements. For example, in Kalman filtering (classical, ensemble [5] or unscented [34]) applied to the observation of highly nonlinear dynamical systems, the posterior covariance is computed given the posterior estimate of the state vector, which is, in our notation, the optimal solution u*. Whereas the importance of essentially nonlinear covariance propagation has been recognized and appropriate algorithms (based on the so-called unscented transformation [13]) have been developed, the issue of the origin error is hardly discussed at all.

Let us look at this issue more carefully. To this end we consider a sample of optimal solutions of size L = 100 produced during computation of the reference covariance V̂ by the fully nonlinear ensemble method for case A2. Let us compute the inverse Hessian at each optimal solution from this sample (which is a part of the computations by the EIH method) and keep the diagonals, i.e. the variance vectors. Next, we plot all variance vectors as dots, and the reference variance as a continuous blue line, in Fig. 10(left). This figure shows the central part of the domain where the nonlinear effects are strongest. The corresponding optimal solutions are presented in Fig. 10(right) by dots, and the exact solution by a blue line. The colours are introduced to distinguish clusters of 'small variance' (black), 'medium variance' (orange) and 'large variance' (red) vectors in the area x = (0.4, 0.5). The variance clusters correspond to the sets of optimal solutions represented by the same colour.

The first thing to notice in Fig. 10(left) is that, for different optimal solutions, the variance estimate varies from quite small values all the way back to the background values. Any optimal solution from the sample is a statistically significant event, which means it might occur as an outcome of the DA procedure with realistic (instrumental rather than synthetic) data and subsequently serve as u*. It is obvious that for some u* from the sample the variance estimate by H^{-1}(u*, 0) is very far from the reference variance (blue line). Let us note that here we consider case A2 (the relative error is presented in Fig. 5(left), line 1), where the errors in the input data are an order of magnitude smaller than in cases A1 and B.

Fig. 10. Case A2. Left – set of variance vectors (obtained by the inverse Hessian method). Right – set of optimal solutions.


The examples presented clearly demonstrate that the origin error can easily supersede the linearization error. Let us underline that the origin error cannot be eliminated by means of ensemble methods which rely on a 'synthetic' sample. Another thing to notice in Fig. 10(left) is that there are fewer variance vectors in the 'medium variance' cluster (orange) than in either the 'small variance' (black) or the 'large variance' (red) cluster; consequently, the distribution of variances around the reference variance does not look Gaussian at all. An appropriate question is therefore whether the mean variance computed over the sample would represent the sample. It is also interesting to observe in Fig. 10(right) that the optimal solutions which correspond to the different clusters are also structured in a certain way. In particular, the solutions associated with the 'small variance' cluster (black) cross the line x = 0.5 from quadrant A to quadrant C, whereas those associated with the 'large variance' cluster (red) cross from quadrant D to quadrant B. Thus, by analyzing a sample of variance vectors it might be possible to reveal certain properties of the solution space which determine the potential observability of the system.

The optimal solution is the best available estimate of the state vector consistent with the data. Naturally, the analysis error covariance evaluated given the optimal solution (by any method) also seems to be the best available estimate of the true covariance. However, in the highly nonlinear case this estimate contains the origin error, and, as demonstrated above, this error can be very significant. The question is then: is there useful information in a sample of variance vectors which can be used to reduce the possible impact of the origin error? Consider, for example, a sequential data assimilation procedure, where the covariance defines the weights of the background term for the next DA window. A simple approach is to use as the variance the upper bound of the variance envelope (see Fig. 10(left), x ∈ (0.4, 0.5), red cluster) instead of the mean variance (blue line), as sketched below. This approach would guarantee reliable (rather than optimal) weights for the background term and should result in a more robust state estimation procedure. It is worth mentioning that a far more delicate approach to using the information contained in a sample of variance vectors can be suggested (currently under development).
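A minimal sketch of this conservative choice (our own illustration; the array of variance vectors, one inverse-Hessian diagonal per optimal solution in the sample, is assumed to be available from the EIH computations):

    import numpy as np

    def robust_background_variance(variance_vectors):
        # variance_vectors: shape (L, M), one variance vector per optimal solution.
        # Instead of the sample mean, take the element-wise upper bound of the
        # variance envelope, which gives reliable (rather than optimal) weights
        # for the background term in the next DA window.
        v = np.asarray(variance_vectors)
        return v.max(axis=0)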

9. Conclusions

In this paper we consider the hind-cast data assimilation problem, which is a typical problem in geophysical applications. From the mathematical point of view, this is an initial value control problem for a nonlinear evolution model governed by partial differential equations. Methods for computing the analysis error covariance matrix are suggested, keeping in mind the large-scale nature of the models involved.

While the so-called tangent linear hypothesis (TLH) holds, the covariance is often approximated by the inverse Hessian of the objective function. Moreover, the same approximation can often remain valid even though the TLH is violated. In this paper we deal with dynamics so highly nonlinear that the inverse Hessian approach is no longer sensible, yet we assume that the p.d.f. of the optimal solution error remains reasonably close-to-normal. This implies that the covariance matrix is still a meaningful representative of the p.d.f. and, therefore, a better approximation than that given by the inverse Hessian needs to be evaluated. The 'close-to-normal' assumption is based on two points: (a) large sample estimation theory asserts that the nonlinear least-squares estimator is asymptotically (for T → ∞) consistent and normal; (b) in variational data assimilation one often has a significant number of observation instants (simultaneously involved in a single estimation event), so the asymptotic properties of the estimator should manifest themselves.

The following types of error in the estimate of the covariance must be distinguished: the sampling error, the linearization error and the origin error. The sampling error can be defined as the difference between the asymptotic (for L → ∞) sample covariance and the sample covariance evaluated with a finite sample size L. The linearization error can be defined as the difference between the asymptotic sample covariance and the inverse Hessian; this error results from substitution of the exact error equation (3.1) by the linearized equation (3.9). The origin error can be defined as the difference between the covariance evaluated given the exact state and the covariance evaluated given an optimal solution, for any chosen method. According to this classification, the methods introduced in this paper allow us to deal with the linearization error or the sampling error. The issue of the origin error is briefly discussed in Section 8.

It has been shown in Section 7 that the linearization error can be large in certain areas of the computational domain where the dynamics is highly nonlinear; hence we look for means of reducing this error. We consider the fully nonlinear ensemble method, in which we evaluate the sample covariance given a set of optimal solutions. These solutions are obtained by solving the nonlinear DA problem with data perturbed around the exact data (which correspond to the known exact state). The sample covariance obtained by this method contains only the sampling error. However, for large-scale problems the sample size is likely to be very small, and therefore a rather large sampling error by this method might be expected. In Section 4 a modification of this method is suggested in which the sampling error is partly compensated. This method has proved to perform well for a significant sample size (L > 400), thus it was used only to evaluate the reference covariance for the purpose of comparison. In Section 5 a new method for computing the covariance, called the 'effective inverse Hessian' (EIH) method, is suggested.
This method requires a sample of inverse Hessians to be computed at points specified in a certain vicinity of the exact state, which is an achievable task with properly designed preconditioning. Both variants of the EIH method allow the linearization error to be noticeably reduced as compared to the inverse Hessian method, while variant 2b offers a dramatic reduction of computational costs as compared to the fully nonlinear ensemble method.

A likely presence of the origin error in the analysis error covariance estimate is another important issue. This error might actually be larger than the linearization or sampling error. None of the methods presented in this paper can deal with the origin error. Since the given optimal solution is considered the best estimate of the state vector consistent with the data, the corresponding asymptotic sample covariance is naturally believed to be the best available estimate of the true covariance.


However, the best available estimate does not mean the most reliable estimate. A possible magnitude of the origin error can be assessed by analyzing a sample of variance vectors, as discussed in Section 8. In order to achieve robust sequential state estimation for highly nonlinear dynamical systems one may use the upper margin of the variance envelope instead of the mean variance. We would also like to underline that the sample of variance vectors certainly contains a lot of information about the solution space in the vicinity of an optimal solution; thus methods for the extraction and use of this information should be developed further.

Acknowledgments

The first author acknowledges the funding through the Glasgow Research Partnership in Engineering by the Scottish Funding Council. The co-authors acknowledge the MOISE project (CNRS, INRIA, UJF, INPG), the project ADAMS, the project 09-01-00284 of the Russian Foundation for Basic Research, the Russian Federal Research Program 'Kadry' and the program ECO-NET (EGIDE).

Appendix A. On the Gauss-Newton method

Let us re-write the first equation of the optimality system (2.4)–(2.6) in an equivalent form:

    ∂φ/∂t − F′(φ)φ = F(φ) − F′(φ)φ + f,    t ∈ (0, T),
    φ|_{t=0} = u.                                                              (A.1)

Now, to solve the system (2.4)–(2.6) we can consider the following iterative algorithm:

    ∂φ̃^{k+1}/∂t − F′(φ^k) φ̃^{k+1} = F(φ^k) − F′(φ^k) φ^k + f,    t ∈ (0, T),
    φ̃^{k+1}|_{t=0} = u^{k+1},                                                  (A.2)

    −∂φ*^{(k+1)}/∂t − (F′(φ^k))* φ*^{(k+1)} = −C* V_o^{-1} (C φ̃^{k+1} − y),    t ∈ (0, T),
    φ*^{(k+1)}|_{t=T} = 0,                                                      (A.3)

    V_b^{-1} (u^{k+1} − u_b) − φ*^{(k+1)}|_{t=0} = 0,                            (A.4)

where φ^k is the solution to the original nonlinear problem (2.1) with u = u^k. The system (A.2)–(A.4) can be presented as an operator equation

    H_k u^{k+1} = Q_k,                                                          (A.5)

where H_k is defined by (3.2)–(3.4) with φ_1 = φ_2 = φ^k and the right-hand side Q_k is defined by

    ∂h/∂t − F′(φ^k) h = F(φ^k) − F′(φ^k) φ^k + f,    t ∈ (0, T),
    h|_{t=0} = 0,                                                               (A.6)

    −∂h*/∂t − (F′(φ^k))* h* = −C* V_o^{-1} (C h − y),    t ∈ (0, T),
    h*|_{t=T} = 0,                                                              (A.7)

    Q_k = V_b^{-1} u_b + h*|_{t=0}.                                              (A.8)

It is easily seen that

    Q_k = H_k u^k − G_k,

where G_k is the gradient of the functional S at the point u^k: G_k = grad S|_{u=u^k}, defined by the equations:

    ∂φ^k/∂t = F(φ^k) + f,    t ∈ (0, T),
    φ^k|_{t=0} = u^k,                                                           (A.9)

    −∂φ*^k/∂t − (F′(φ^k))* φ*^k = −C* V_o^{-1} (C φ^k − y),    t ∈ (0, T),
    φ*^k|_{t=T} = 0,                                                            (A.10)

    G_k = V_b^{-1} (u^k − u_b) − φ*^k|_{t=0}.                                    (A.11)

Then, from (A.5) we get

    u^{k+1} = u^k − H_k^{-1} G_k.                                               (A.12)

Thus, from (A.12) we see that the iterative method (A.2)–(A.4) looks like the Newton method (see e.g. [16]) with H_k instead of the Hessian of the original functional (2.3). This is the so-called Gauss-Newton (GN) method [14]. The difference between H_k and the Hessian of the original functional lies in omitting the second-order terms (see e.g. [9] for the definition of the original Hessian through the second-order adjoint). As follows from [14,32], this method coincides with the incremental formulation of variational data assimilation [3]. Note that our derivation of this method differs from the usual one [32]: we have derived it starting from the optimality system, and this approach may be used for constructing other modifications of the GN algorithm.

Let us take the 'exact' solution ū as the initial guess of the GN method: u^0 = ū; then φ^0 = φ̄, and the operator H_0 for k = 0 coincides with the Hessian H of the problem given by (3.7) and (3.8) for φ = φ̄. Moreover, the gradient G_0 defined by (A.9)–(A.11) coincides with P, where P = V_b^{-1} ξ_b + R ξ_o is the right-hand side of (3.9). Therefore, with the initial guess u^0 = ū, the first iteration of the GN method

    u^1 = u^0 − H_0^{-1} G_0                                                     (A.13)

coincides with the error equation (4.4). If instead of ū another origin u* is used for u^0, then the formulas (A.13) and (4.4) are valid with φ̄ substituted by φ^0 = φ*, where φ* is the solution of (2.1) for u = u*.
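For illustration only, the iteration (A.12) can be organized as follows. This is a sketch under our own naming conventions: the callables gradient and gn_hessian_matvec stand for the adjoint-based computations (A.9)–(A.11) and (3.2)–(3.4), which in practice are provided by forward, tangent linear and adjoint model runs.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    def gauss_newton(u0, gradient, gn_hessian_matvec, n_iter=10):
        # Gauss-Newton iteration (A.12): u^{k+1} = u^k - H_k^{-1} G_k.
        # The linear system H_k s = G_k is solved by conjugate gradients,
        # which is legitimate since H_k is symmetric positive definite.
        u = np.array(u0, dtype=float)
        n = u.size
        for _ in range(n_iter):
            G = gradient(u)                                    # G_k, Eqs. (A.9)-(A.11)
            H = LinearOperator((n, n), dtype=float,
                               matvec=lambda v, u=u: gn_hessian_matvec(u, v))
            s, info = cg(H, G)                                 # solve H_k s = G_k
            u = u - s
        return u

In practice the inner solve would be preconditioned and combined with a limited-memory approximation of the inverse Hessian, along the lines discussed in the body of the paper.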

 another origin u⁄ is used for u0, then the formulas (A.13) and (4.4) are coincides with the error equation (4.4). If instead of u  substituted by u0 = u⁄, where u⁄ is the solution of (2.1) for u = u⁄. valid with u Appendix B. Multivariate normality test Here we consider a well established multivariate normality test, based on Srivastava’s measures of skewness and kurtosis [28], being applied to the samples of optimal solutions ul, l = 1, . . . , L, where L = 1600. These samples have been accumulated b  for cases A1, A2 and B (see Section 7). Our aim is to in the course of evaluation of the reference values of the covariance V demonstrate that in the cases under consideration the optimal solution error has a reasonably close-to-normal p.d.f. and, therefore, the covariance matrix remains useful to characterize it. b from (4.2) in the form V b ¼ WSW T , Let us consider the singular value decomposition of the sample covariance matrix V where W is a matrix of singular vectors wi and S is the diagonal matrix of singular values si. Then, the sample measures 2 of skewness b1;M and kurtosis b2 are defined in [28] as follows: 2

b1;M ¼

M L 1 X 1X s3=2 ðr i;l  r i Þ3 i M i¼1 L l¼1

!2 ;

b2 ¼

M L 1 X 1X s2 ðri;l  r i Þ4 ; i M i¼1 L l¼1

where M is the state vector dimension (also the number of degrees of freedom), L is the sample size, ri;l ¼ wTi ul and P r i ¼ ð1=LÞ Ll¼1 r i;l . Furthermore, the following theorem is proved: for large L 2

(a) c1;M :¼ ðLM=6Þb1;M ! v2M ; (b) c2 :¼ (LM/24)1/2(b2  3) ? N(0, 1). The above theorem allows us to check the statistical hypothesis that the sample of ul belongs to the normal population.  is always used instead of the sample mean Since in the EIH method the approximation of Vdu = E[duduT] is sought, u P b E½u ¼ ð1=LÞ Ll¼1 ul . The measure of a bias is then introduced as follows:

 ÞÞ1=2 :  ÞT ð b  Þ=ðu T u c0 ¼ ðð b E½u  u E½u  u The results of the normality tests which involve these samples are summarized in Table 1. Let us recall that the 5% critical point, which is typically used for testing the null-hypothesis, is 235.08 for v2201 and 1.96 for N(0, 1). In cases A2 and B the relative bias c0 is less than 5%, c2 is less than 1.96, however c1,201 is larger than 235.08, albeit not too significantly (the corresponding probabilities of exceeding the values of c1,201 are presented in the adjacent column). This comparison asserts that the deviations from the normal for the samples A2 and B are reasonably small. In case A1 we have both c0 and c2 slightly exceeding the 5% critical values, but in particular, c1,201 = 351.61 exceeds 235.08 quite significantly. In order to assess what this single value actually means we present as a 2D plot in Fig. 12 a frequency histogram for each state variable. Whereas some asymmetry can definitely be noticed in certain parts of the spatial domain, for example, in the vicinity of the point x = 0.5, the marginal distributions does not seem (at least visually) to be critically skewed anywhere. Thus we believe that in case A1 too, the p.d.f. of the optimal solution error remains reasonably close-to-normal.
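A compact sketch of how the statistics c_0, c_{1,M} and c_2 reported in Table 1 below can be evaluated from a sample of optimal solutions (our own illustration; an eigendecomposition is used in place of the SVD, which is equivalent for the symmetric matrix V̂):

    import numpy as np

    def srivastava_normality_stats(U, u_exact):
        # U: (L, M) array, one optimal solution u_l per row; u_exact: exact state u-bar.
        L, M = U.shape
        Vhat = np.cov(U, rowvar=False)                       # sample covariance V-hat
        s, W = np.linalg.eigh(Vhat)                          # V-hat = W diag(s) W^T
        R = U @ W                                            # scores r_{i,l} = w_i^T u_l
        Rc = R - R.mean(axis=0)                              # centred scores
        b1 = np.mean(((Rc**3).mean(axis=0) / s**1.5)**2)     # squared skewness b^2_{1,M}
        b2 = np.mean((Rc**4).mean(axis=0) / s**2)            # kurtosis b_2
        c1 = L * M / 6.0 * b1                                # to be compared with chi^2_M
        c2 = np.sqrt(L * M / 24.0) * (b2 - 3.0)              # to be compared with N(0, 1)
        c0 = np.linalg.norm(U.mean(axis=0) - u_exact) / np.linalg.norm(u_exact)
        return c0, c1, c2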
Table 1
Summary of normality tests, L = 1600.

    Case    c0       c1,201    Probability    c2
    A1      6.692    351.61    >0.99999       2.759
    A2      1.612    243.71    0.978          1.664
    B       4.942    253.75    0.993          1.426
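The critical points and the probabilities quoted above can be reproduced, for instance, with scipy.stats (a convenience check added here; it is not part of the original computations):

    from scipy.stats import chi2, norm

    print(chi2.ppf(0.95, df=201))          # ~235.08, 5% critical point for c_{1,201}
    print(norm.ppf(0.975))                 # ~1.96, two-sided 5% critical point for c_2
    for c1 in (351.61, 243.71, 253.75):    # cases A1, A2, B
        print(chi2.cdf(c1, df=201))        # ~>0.99999, ~0.978, ~0.993 (cf. Table 1)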

Fig. 11. Left: the correlation function (reference value). Right: lower triangle – absolute error by the inverse Hessian method, upper triangle – by the EIH method 2a. Upper panel – case A1, lower panel – case B.


Fig. 12. Frequency histogram for the optimal solution error, case A1, L = 1600.


References

[1] M. Berliner, Z.-Q. Lu, C. Snyder, Statistical design for adaptive weather observations, J. Atm. Sci. 56 (1999) 2436–2552.
[2] P.J. Bickel, E. Levina, Regularized estimation of large covariance matrices, Ann. Statist. 36 (1) (2008) 199–227.
[3] P. Courtier, J.N. Thépaut, A. Hollingsworth, A strategy for operational implementation of 4D-Var, using an incremental approach, Q. J. R. Meteorol. Soc. 120 (1994) 1367–1388.
[4] S. Dobricic, A sequential variational algorithm for data assimilation in oceanography and meteorology, Mon. Wea. Rev. 137 (1) (2009) 269–287.
[5] G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte-Carlo methods to forecast error statistics, J. Geophys. Res. 99 (C5) (1994) 10143–10162.
[6] M. Fisher, P. Courtier, Estimating the covariance matrices of analysis and forecast error in variational data assimilation, ECMWF Res. Depart. Techn. Memo. 220, 1995.
[7] M. Fisher, J. Nocedal, Y. Trémolet, S.J. Wright, Data assimilation in weather forecasting: a case study in PDE-constrained optimization, Optim. Eng. (2008), doi:10.1007/s11081-008-9051-5.
[8] S. Fučík, A. Kufner, Nonlinear Differential Equations, Elsevier, Amsterdam, 1980.
[9] I. Gejadze, F.-X. Le Dimet, V. Shutyaev, On analysis error covariances in variational data assimilation, SIAM J. Sci. Comput. 30 (4) (2008) 1847–1874.
[10] I. Gejadze, F.-X. Le Dimet, V. Shutyaev, On optimal solution error covariances in variational data assimilation problems, J. Comput. Phys. 229 (2010) 2159–2178.
[11] J.R. Gunson, P. Malanotte-Rizzoli, Assimilation studies of open-ocean flows. 2. Error measures with strongly nonlinear dynamics, J. Geophys. Res. 101 (C12) (1996) 28473–28488.
[12] L. Hascoët, V. Pascual, TAPENADE 2.1 user's guide, INRIA Technical Report no. 0300, 2004, 78 pp.
[13] S.J. Julier, J.K. Uhlmann, A new extension of the Kalman filter to nonlinear systems, in: Proceedings of AeroSense: The 11th International Symposium on Aerospace/Defence Sensing, Simulation and Controls, 1997.
[14] A.S. Lawless, S. Gratton, N.K. Nichols, Approximate iterative methods for variational data assimilation, Int. J. Numer. Methods Fl. 1 (2005) 1–6.
[15] F.-X. Le Dimet, I.M. Navon, D.N. Daescu, Second-order information in data assimilation, Mon. Wea. Rev. 130 (3) (2002) 629–648.
[16] F.-X. Le Dimet, V.P. Shutyaev, On Newton methods in data assimilation, Russ. J. Numer. Anal. Math. Modell. 15 (5) (2000) 419–434.
[17] F.-X. Le Dimet, O. Talagrand, Variational algorithms for analysis and assimilation of meteorological observations: theoretical aspects, Tellus 38A (1986) 97–110.
[18] J.L. Lions, Contrôle Optimal des Systèmes Gouvernés par des Équations aux Dérivées Partielles, Dunod, Paris, 1968.
[19] D.C. Liu, J. Nocedal, On the limited memory BFGS method for large scale minimization, Math. Prog. 45 (1989) 503–528.
[20] G.I. Marchuk, V.I. Agoshkov, V.P. Shutyaev, Adjoint Equations and Perturbation Algorithms in Nonlinear Problems, CRC Press Inc., New York, 1996.
[21] I.M. Navon, Variational data assimilation, optimal parameter estimation and sensitivity analysis for environmental problems, in: Computational Mechanics '95, vol. 1, Springer, New York, 1995, pp. 740–746.
[22] W.K. Newey, D. McFadden, Large sample estimation and hypothesis testing, Handbook of Econometrics 4 (1994) 2111–2245 (Chapter 36).
[23] H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 63, SIAM, 1992.
[24] S.V. Patankar, Numerical Heat Transfer and Fluid Flow, Hemisphere Publishing Corporation, New York, 1980.
[25] B.S. Powell, A.M. Moore, Estimating the 4DVAR analysis error of GODAE products, Ocean Dynam. 59 (2009) 121–138.
[26] F. Rabier, P. Courtier, Four-dimensional assimilation in the presence of baroclinic instability, Quart. J. Roy. Meteorol. Soc. 118 (1992) 649–672.
[27] D.A. Ratkowsky, Nonlinear Regression Modelling: A Unified Practical Approach, Marcel Dekker, New York, 1983.
[28] M.S. Srivastava, A measure of skewness and kurtosis and a graphical method for assessing multivariate normality, Statist. Probab. Lett. 2 (1984) 263–267.
[29] W.C. Thacker, The role of the Hessian matrix in fitting models to measurements, J. Geophys. Res. 94 (C5) (1989) 6177–6196.
[30] J.N. Thepaut, P. Courtier, Four-dimensional variational assimilation using the adjoint of a multilevel primitive equation model, Quart. J. Roy. Meteorol. Soc. 117 (1991) 1225–1254.
[31] Y. Trémolet, Computation of observation sensitivity and observation impact in incremental variational data assimilation, Tellus A 60 (2008) 964–978.
[32] J. Tshimanga, S. Gratton, A.T. Weaver, A. Sartenaer, Limited-memory preconditioners, with application to incremental four-dimensional variational assimilation, Q. J. R. Meteorol. Soc. 134 (2008) 751–769.
[33] F. Veersé, Variable-storage quasi-Newton operators as inverse forecast/analysis error covariance matrices in variational data assimilation, INRIA Technical Report 3685, 1999, 28 pp.
[34] E. Wan, R. Van der Merwe, The unscented Kalman filter for nonlinear estimation, in: Proceedings of the IEEE Symposium 2000, Lake Louise, Alberta, Canada, 2000.
[35] M. Zupanski, Maximum likelihood ensemble filter: theoretical aspects, Mon. Wea. Rev. 133 (2005) 1710–1726.