Procedia Computer Science 9 (2012) 1047 – 1055

International Conference on Computational Science, ICCS 2012

Information Theoretic Metrics to Characterize Observations in Variational Data Assimilation

K. Singh^a, A. Sandu^a,1, M. Jardak^a, M. Lee^b, K. Bowman^b

^a Computational Science Laboratory, Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, USA. URL: http://csl.cs.vt.edu
^b Jet Propulsion Laboratory, 4800 Oak Grove Drive, Pasadena, CA 91109, USA.

Abstract

Data assimilation obtains improved estimates of the state of a physical system by combining imperfect model results with sparse and noisy observations of reality. Not all observations used in data assimilation are equally valuable. The ability to characterize the usefulness of different observation locations is important for analyzing the effectiveness of the assimilation system, for data pruning, and for the design of future sensor systems. This paper proposes a new approach to characterize the usefulness of different observations in four dimensional variational (4D-Var) data assimilation. Metrics from information theory are used to quantify the contribution of observations to decreasing the uncertainty with which the system state is known. We derive ensemble-based, computationally feasible procedures to estimate the information content of various observations.

Keywords: Four dimensional variational data assimilation, information theory

1. Introduction

In this paper we employ metrics from information theory to quantify the contribution of observations to improving the state estimate obtained through data assimilation. The information content of observations is loosely defined by their contribution to decreasing the uncertainty in the state estimate [14]. Several of the information theoretic metrics employed here measure the decrease in the (co-)variance of the error, while others measure the benefit of data assimilation in terms of adjusting the mean of the distribution. Information theory has been used in atmospheric sciences for quantifying the lack of information in climate systems [1, 19], for developing remote-sounding instruments [21, 22, 23, 20, 30], for data selection [20], and for quantifying the information content of observations [15, 2, 31, 32].

1 Corresponding author. Email address: [email protected]

1877-0509 © 2012 Published by Elsevier Ltd. doi:10.1016/j.procs.2012.04.113


In this paper we discuss a characterization of the information content of observations in the context of the four dimensional variational (4D-Var) data assimilation framework [4, 13, 3]. The computational procedure proposed here uses the posterior statistics of the variational cost function and its gradient to quantify the information content of observations.

The paper is organized as follows. Section 2 reviews the variational approach to data assimilation from a Bayesian perspective. Various metrics for information content are discussed in Section 3. Section 4 develops computationally feasible estimation techniques for the information content of observations in the context of 4D-Var data assimilation; this is the main contribution of this work. The numerical results are presented and discussed in Section 5. Section 6 summarizes the findings of this work and points to future research directions.

2. Four Dimensional Variational Data Assimilation

Data assimilation estimates the true state of a physical system by combining three different sources of information: the prior, the model, and observations. Variational methods solve the data assimilation problem in an optimal control framework [12, 17, 18]. Specifically, one finds the control variable values (e.g., initial conditions) which minimize the discrepancy between the model forecast and observations; the minimization is subject to the governing dynamic equations, which are imposed as strong constraints.

The prior, or the background state, $x^B \in \mathbb{R}^n$, represents the current best estimate of the true state. The background estimation errors are usually assumed to have a normal distribution, $\varepsilon^B = x^B - x^{\rm true} \sim \mathcal{N}(0, B)$, where $B \in \mathbb{R}^{n \times n}$ is the background error covariance matrix.

The model encapsulates our knowledge about the physical laws that govern the evolution of the system. The model evolves an initial state $x_0 \in \mathbb{R}^n$ at the initial time $t_0$ to future state values $x_i \in \mathbb{R}^n$ at future times $t_i$,
$$ x_i = \mathcal{M}_{t_0 \to t_i}(x_0)\,. \qquad (1) $$

Observations represent snapshots of reality available at several discrete time moments. Specifically, measurements $y_i \in \mathbb{R}^m$ of the true state are taken at times $t_i$, $i = 1, \dots, N$,
$$ y_i = \mathcal{H}(x_i) - \varepsilon_i^{\rm obs}\,. \qquad (2) $$

The observation error term $\varepsilon_i^{\rm obs}$ accounts for both the measurement (instrument) errors and the representativeness errors (i.e., errors in the accuracy with which the model can reproduce reality). Typically observation errors are assumed to be normally distributed, $\varepsilon_i^{\rm obs} \sim \mathcal{N}(0, R_i)$, and observation errors at different times are assumed to be independent.

Based on these three sources of information, data assimilation computes the analysis (posterior) estimate $x^A$, i.e., the best estimate when the new observations are taken into account. In strongly-constrained 4D-Var data assimilation all observations (2) at all times $t_1, \dots, t_N$ are considered simultaneously. The 4D-Var cost function measures the distance of the initial conditions to the background initial state $x_0^B$, as well as the discrepancy between model predictions $\mathcal{H}(x_i)$ and data $y_i$ for all measurement times:
$$ \mathcal{J}(x_0) = \mathcal{J}^B(x_0) + \mathcal{J}^{\rm obs}(x_0) = \mathcal{J}^B(x_0) + \sum_{i=1}^{N} \mathcal{J}_i^{\rm obs}(x_0)\,, \qquad (3) $$
$$ \mathcal{J}^B(x_0) = \frac{1}{2}\left(x_0 - x_0^B\right)^T B_0^{-1}\left(x_0 - x_0^B\right)\,, \qquad
\mathcal{J}_i^{\rm obs}(x_0) = \frac{1}{2}\left(\mathcal{H}(x_i) - y_i\right)^T R_i^{-1}\left(\mathcal{H}(x_i) - y_i\right)\,. $$


All terms are weighted by the inverses of the corresponding error covariances. The 4D-Var analysis is computed as the initial condition which minimizes (3) subject to the model equation constraints (1):
$$ x_0^A = \arg\min \mathcal{J}(x_0) \quad \text{subject to (1)}\,. \qquad (4) $$

The optimization problem (4) can be regarded as a least squares fit problem in the deterministic setting. In the Bayesian interpretation, when all densities are Gaussian, the cost function (3) is proportional to the negative logarithm of the analysis probability density,
$$ \mathcal{J}(x_0) \propto -\log \mathcal{P}^A(x_0)\,, \qquad (5) $$
and (4) represents a maximum likelihood estimation problem.

The numerical optimization (4) is usually solved via a gradient-based technique. The gradient of (3) reads
$$ \nabla \mathcal{J}(x_0) = B_0^{-1}\left(x_0 - x_0^B\right) + \sum_{i=1}^{N} \left(\frac{\partial x_i}{\partial x_0}\right)^T H_i^T R_i^{-1}\left(\mathcal{H}(x_i) - y_i\right)\,. \qquad (6) $$
The 4D-Var gradient requires not only the linearized observation operator $H_i = \mathcal{H}'(x_i)$, but also the transposed derivative of future states with respect to the initial conditions. The 4D-Var gradient can be obtained efficiently by forcing the adjoint $(\mathcal{M}')^*$ of the linearized model (1) with observation increments, and running it backwards in time; the adjoint variables $\lambda$ at time $t_0$ are related to the gradient via
$$ \nabla \mathcal{J}(x_0) = B_0^{-1}\left(x_0 - x_0^B\right) + \lambda_0(x_0)\,. $$

3. Information Theoretic Metrics of Observation Impact

The 4D-Var data assimilation changes the distribution of errors (uncertainty) in the initial conditions from the background probability density $\mathcal{P}^B(x)$ to the analysis probability density $\mathcal{P}^A(x)$. If the data assimilation is beneficial, the uncertainty associated with the new distribution $\mathcal{P}^A$ is smaller than the uncertainty associated with the original distribution $\mathcal{P}^B$. Roughly speaking, the information content of the observations $y$ is measured by the decrease in uncertainty from before data assimilation ($\mathcal{P}^B$) to after data assimilation ($\mathcal{P}^A$). The information content depends not only on the data ($y_i$) but also on the data accuracy ($R_i^{-1}$), on the background uncertainty ($B_0^{-1}$), and on the model dynamics $\mathcal{M}$. We are interested in rigorously quantifying the information content of observations in 4D-Var. For this we use several information theoretic metrics, which are reviewed below.

3.1. Fisher information matrix

The Fisher information matrix (FIM) [14] associated with the probability density function $\mathcal{P}(x)$ is defined as
$$ \mathcal{F}(\mathcal{P}) = \int_{\mathbb{R}^n} \left(\frac{\partial(-\ln \mathcal{P}(x))}{\partial x}\right)^T \left(\frac{\partial(-\ln \mathcal{P}(x))}{\partial x}\right) \mathcal{P}(x)\, dx \;\in\; \mathbb{R}^{n \times n}\,. \qquad (7) $$
The trace of the FIM offers a measure of the total level of uncertainty associated with the distribution. Under the assumption that the background errors are normally distributed, the Fisher information matrix of the background error probability density $\mathcal{P}^B(x) = \mathcal{N}(x_0^B, B_0)$ is just the inverse of the background error covariance: $\mathcal{F}(\mathcal{P}^B) = B_0^{-1}$.
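For concreteness, the cost function (3) and gradient (6) can be evaluated as follows for a linear, time-invariant model and a linear observation operator. This is a minimal sketch, not the authors' implementation; the function name and argument names (`M`, `H`, `Rinv_list`, `y_list`) are illustrative.

```python
import numpy as np

def cost_and_gradient(x0, xb, Binv, M, H, Rinv_list, y_list):
    """Evaluate the 4D-Var cost (3) and its gradient (6) for a linear,
    time-invariant model x_i = M x_{i-1} and linear observations H x_i.
    For the linear model dx_i/dx_0 = M^i, so the adjoint sum in (6)
    can be accumulated directly."""
    J = 0.5 * (x0 - xb) @ Binv @ (x0 - xb)       # background term J^B
    grad = Binv @ (x0 - xb)                      # its gradient
    xi, Mi = x0.copy(), np.eye(len(x0))
    for Rinv, yi in zip(Rinv_list, y_list):
        xi, Mi = M @ xi, M @ Mi                  # advance state and tangent, eq. (1)
        d = H @ xi - yi                          # innovation H(x_i) - y_i
        J += 0.5 * d @ Rinv @ d                  # observation term J_i^obs
        grad += Mi.T @ (H.T @ (Rinv @ d))        # (dx_i/dx_0)^T H^T R^{-1} d
    return J, grad
```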


Using (7) and (5), the FIM associated with the analysis probability density is
$$ \mathcal{F}\left(\mathcal{P}^A\right) = \int_{\mathbb{R}^n} \left[\nabla \mathcal{J}(x_0)\right] \left[\nabla \mathcal{J}(x_0)\right]^T \mathcal{P}^A(x_0)\, dx_0\,. \qquad (8) $$
The information content of the entire set of observations used in data assimilation can be measured as the difference between the traces of the background and analysis FIMs [23, 22]:
$$ I^{\rm FIM} = \mathrm{trace}\left(\mathcal{F}\left(\mathcal{P}^B\right)\right) - \mathrm{trace}\left(\mathcal{F}\left(\mathcal{P}^A\right)\right)\,, \qquad (9) $$
with
$$ \mathrm{trace}\left(\mathcal{F}\left(\mathcal{P}^A\right)\right) = \int_{\mathbb{R}^n} \left\|\nabla \mathcal{J}(x_0)\right\|^2 \mathcal{P}^A(x_0)\, dx_0 = \mathrm{E}^A\left[\left\|\nabla \mathcal{J}(x_0)\right\|^2\right]\,. \qquad (10) $$

The trace of the analysis FIM is the average of the squared gradient (adjoint variable) norm with respect to the analysis distribution. The statistical average can be approximated by an ensemble average. After the data assimilation has been performed, we run the forward and the adjoint models $N_{\rm ens}$ times, starting with forward initial conditions sampled from the analysis probability density. Each run produces an adjoint gradient, whose norm is computed. The ensemble average of these squared gradient norms estimates the trace of the analysis FIM via (10).
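A minimal sketch of the ensemble estimate (10), assuming a `grad_fn` callable that returns the adjoint-based gradient $\nabla\mathcal{J}$ (for instance, the `cost_and_gradient` sketch above):

```python
import numpy as np

def fim_trace_analysis(samples_x0, grad_fn):
    """Ensemble estimate (10): the trace of the analysis FIM is the mean
    squared norm of the cost function gradient over initial conditions
    sampled from the analysis density."""
    return np.mean([np.linalg.norm(grad_fn(x0)) ** 2 for x0 in samples_x0])

# The metric (9) then follows as I_FIM = trace(B0^{-1}) - fim_trace_analysis(...),
# since the background FIM of a Gaussian prior is B0^{-1}.
```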

3.2. Shannon information

The Shannon entropy [27] associated with a probability density is defined as
$$ \mathcal{H}(\mathcal{P}) = -\int_{\mathbb{R}^n} \mathcal{P}(x) \ln\left(\mathcal{P}(x)\right) dx $$
and offers a measure of the average uncertainty with which one knows the state $x$, if the estimation error has a probability density $\mathcal{P}$. The Shannon information content of the observations $y$ used in 4D-Var data assimilation is defined as the decrease in the average uncertainty with which the initial state is known. Specifically, the Shannon information content is given by the difference between the background entropy and the analysis entropy,
$$ I^{\rm Shannon} = \mathcal{H}\left(\mathcal{P}^B\right) - \mathcal{H}\left(\mathcal{P}^A\right) = \frac{1}{2} \ln\det\left(A_0^{-1/2} B_0 A_0^{-1/2}\right)\,, \qquad (11) $$
where the last equality holds when both the background and the analysis error probability densities are Gaussian. This term depends only on the reduction in covariance, and corresponds to the scaling factors of the Gaussian probability densities (i.e., the factors in front of the exponential). This information is ignored by the 4D-Var cost function, which uses only the exponent of the Gaussian density. Therefore, one cannot expect to obtain accurate estimates of the Shannon information content by mining the cost function information.

Assume that we have the ability to sample both the background and the analysis probability densities. Both $B_0$ and $A_0$ can then be approximated by ensemble covariances computed from samples of $N_{\rm ens}$ members. The nonzero eigenvalues of these matrices give a rough approximation of the Shannon information content (11) as follows:
$$ \frac{1}{2} \ln\det\left(B_0 A_0^{-1}\right) = \frac{1}{2} \sum_{i=1}^{N_{\rm ens}} \ln\left(\lambda_i^B / \lambda_i^A\right)\,. \qquad (12) $$
For a small number of ensemble members, the ensemble covariance eigenvalues may poorly represent the eigenvalues of the true covariances. In this case the resulting estimates of the Shannon information content are expected to be inaccurate.
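A possible ensemble implementation of (12). The sample covariances of $N_{\rm ens}$ members have at most $N_{\rm ens}-1$ nonzero eigenvalues, and pairing the background and analysis eigenvalues by sorted order is a simplifying assumption on our part:

```python
import numpy as np

def shannon_information(ens_b, ens_a):
    """Rough ensemble estimate of (12). Columns of ens_b / ens_a are
    background / analysis members; the covariances are rank-deficient,
    so only the leading eigenvalues are used."""
    k = ens_b.shape[1] - 1                                    # nonzero eigenvalue count
    lam_b = np.sort(np.linalg.eigvalsh(np.cov(ens_b)))[-k:]   # approximates B0 spectrum
    lam_a = np.sort(np.linalg.eigvalsh(np.cov(ens_a)))[-k:]   # approximates A0 spectrum
    return 0.5 * np.sum(np.log(lam_b / lam_a))
```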


3.3. Degrees of freedom for signal

The degrees of freedom for signal (DFS) [21, 15, 2, 28, 32] measures the total reduction in variance and is defined as
$$ I^{\rm DFS} = \mathrm{trace}\left(I_{n\times n} - \Sigma\right) = n - \mathrm{trace}\left(B_0^{-1/2} A_0 B_0^{-1/2}\right) = n - \mathrm{trace}\left(B_0^{-1} A_0\right)\,. \qquad (13) $$
To estimate this metric we assume that the model operator (1) is linear ($\mathcal{M}_{t_0\to t_i} = M_i$), the observation operator in (2) is also linear ($\mathcal{H} = H_i$), and both the background errors and the observation errors are normally distributed.

Consider running the model with random initial conditions taken from the posterior distribution, $x_0 \sim \mathcal{N}\left(x_0^A, A_0\right)$. Each run results in a different value of the 4D-Var cost function; we are interested in understanding the information provided by the statistics of the ensemble of cost function values. Using the properties of quadratic functions of Gaussian random variables we can show that
$$ \mathrm{E}^A\left[\mathcal{J}^B(x_0)\right] = \mathcal{J}^B\left(x_0^A\right) + \frac{1}{2}\,\mathrm{trace}\left(A_0^{1/2} B_0^{-1} A_0^{1/2}\right)\,, $$
$$ \mathrm{E}^A\left[\mathcal{J}^{\rm obs}(x_0)\right] = \mathcal{J}^{\rm obs}\left(x_0^A\right) + \frac{n}{2} - \frac{1}{2}\,\mathrm{trace}\left(A_0^{1/2} B_0^{-1} A_0^{1/2}\right)\,, $$
$$ \mathrm{E}^A\left[\mathcal{J}(x_0)\right] = \mathcal{J}\left(x_0^A\right) + \frac{n}{2}\,. $$

The DFS information content (13) of all observations $y_1, \dots, y_N$ is:
$$ I_{y_1 \cdots y_N}^{\rm DFS} = n - \mathrm{trace}\left(A_0^{1/2} B_0^{-1} A_0^{1/2}\right) = 2\,\mathrm{E}^A\left[\mathcal{J}^{\rm obs}(x_0)\right] - 2\,\mathcal{J}^{\rm obs}\left(x_0^A\right)\,. \qquad (14) $$
The contribution of the data point $y_i$ to the DFS information is:
$$ I_{y_i}^{\rm DFS} = 2\,\mathrm{E}^A\left[\mathcal{J}_i^{\rm obs}(x_0)\right] - 2\,\mathcal{J}_i^{\rm obs}\left(x_0^A\right)
= \mathrm{E}^A\left[\left(H_i x_i - y_i\right)^T R_i^{-1}\left(H_i x_i - y_i\right)\right] - \left(H_i x_i^A - y_i\right)^T R_i^{-1}\left(H_i x_i^A - y_i\right)\,. $$
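The identity (14) suggests a simple ensemble estimator: average the observation part of the cost over posterior samples and compare with its value at the analysis. A sketch, with `Jobs_fn` an assumed callable evaluating $\mathcal{J}^{\rm obs}$ (or $\mathcal{J}_i^{\rm obs}$ for a per-observation estimate):

```python
import numpy as np

def dfs_estimate(samples_x0, Jobs_fn, x0_analysis):
    """Ensemble estimate of the DFS via (14): twice the gap between the
    posterior-average observation cost and its value at the analysis."""
    mean_Jobs = np.mean([Jobs_fn(x0) for x0 in samples_x0])
    return 2.0 * mean_Jobs - 2.0 * Jobs_fn(x0_analysis)
```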

3.4. Signal information content

The signal information content of all observations is defined as
$$ I_y^{\rm Signal} = \mathcal{J}^B\left(x_0^A\right) = \frac{1}{2}\left(x_0^A - x_0^B\right)^T B_0^{-1}\left(x_0^A - x_0^B\right)\,. $$

We now try to estimate the contributions of individual observations. Assume a linear model, linear observation operators, and Gaussian uncertainties. Denote
$$ b_\ell = M_\ell^T H_\ell^T R_\ell^{-1}\left(y_\ell - H_\ell M_\ell x_0^B\right)\,, \qquad b = \sum_{\ell=1}^{N} b_\ell\,, $$
$$ D_\ell = M_\ell^T H_\ell^T R_\ell^{-1} H_\ell M_\ell\,, \qquad A_0^{-1} = B_0^{-1} + \sum_{\ell=1}^{N} D_\ell\,. $$
Here $A_0^{-1}$ is the Hessian of the 4D-Var cost function, $A_0$ is the analysis covariance matrix at time $t_0$, and $b = -\nabla\mathcal{J}(x_0^B)$ is the negative gradient of the 4D-Var cost function evaluated at the background. The 4D-Var problem is equivalent to solving the linear system
$$ A_0^{-1} \cdot \left(x_0^A - x_0^B\right) = b\,. $$
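For small $n$ the system above can be assembled and solved directly. A sketch under the stated linear/Gaussian assumptions, with a time-invariant model so that $M_\ell = M^\ell$; all names are illustrative:

```python
import numpy as np

def linear_4dvar_analysis(xb, Binv, M, H, Rinv_list, y_list):
    """Assemble b_l and D_l as defined above and solve
    A0^{-1} (x0^A - x0^B) = b for the analysis."""
    n = len(xb)
    Ainv, b, Ml = Binv.copy(), np.zeros(n), np.eye(n)
    for Rinv, yl in zip(Rinv_list, y_list):
        Ml = M @ Ml                          # M_l = M^l
        G = H @ Ml                           # H_l M_l
        Ainv += G.T @ Rinv @ G               # accumulate D_l into the Hessian
        b += G.T @ Rinv @ (yl - G @ xb)      # accumulate b_l
    xa = xb + np.linalg.solve(Ainv, b)       # analysis x0^A
    return xa, np.linalg.inv(Ainv)           # analysis covariance A0
```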


In addition, we consider the 4D-Var problem of assimilating all the data except $y_\ell$. This second problem leads to another analysis $x_0^C$ via
$$ \left(A_0^{-1} - D_\ell\right) \cdot \left(x_0^C - x_0^B\right) = b - b_\ell\,, $$
and therefore
$$ A_0^{-1} \cdot \left(x_0^C - x_0^B\right) = b - b_\ell + D_\ell \cdot \left(x_0^C - x_0^B\right)
\approx b - b_\ell + D_\ell \cdot \left(x_0^A - x_0^B\right)
= \left(I + D_\ell A_0\right) \cdot b - b_\ell\,. $$
We assume a case where there are many observations, such that the contribution of $b_\ell$ to the total right hand side vector is relatively small, $b - b_\ell \approx b$, and the contribution of $D_\ell$ to the total inverse covariance is relatively small, $A_0^{-1} - D_\ell \approx A_0^{-1}$. The following approximations are obtained:
$$ A_0^{-1} \cdot \left(x_0^C - x_0^B\right) \approx b - b_\ell\,, \qquad A_0^{-1} \cdot \left(x_0^A - x_0^C\right) \approx b_\ell\,. $$

The difference in the signal part due to the assimilation of observation $y_\ell$ is
$$ \begin{aligned}
I_{y_\ell}^{\rm Signal} &= \frac{1}{2}\left(x_0^A - x_0^B\right)^T B_0^{-1}\left(x_0^A - x_0^B\right) - \frac{1}{2}\left(x_0^C - x_0^B\right)^T B_0^{-1}\left(x_0^C - x_0^B\right) \\
&= \frac{1}{2}\left[A_0^{-1}\left(x_0^A - x_0^C\right)\right]^T A_0 B_0^{-1} A_0 \left[A_0^{-1}\left(x_0^A - x_0^B\right) + A_0^{-1}\left(x_0^C - x_0^B\right)\right] \\
&\approx \frac{1}{2}\, b_\ell^T A_0 B_0^{-1} A_0 \left(2b - b_\ell\right) \\
&\approx b_\ell^T A_0 B_0^{-1} A_0\, b \\
&= \left(y_\ell - H_\ell M_\ell x_0^B\right)^T R_\ell^{-1} H_\ell M_\ell A_0 B_0^{-1}\left(x_0^A - x_0^B\right)\,.
\end{aligned} $$

Let
$$ \widetilde{x}_0^A = A_0 B_0^{-1} x_0^A\,, \qquad \widetilde{x}_0^B = A_0 B_0^{-1} x_0^B\,, \qquad
H_\ell M_\ell A_0 B_0^{-1}\left(x_0^A - x_0^B\right) \approx H_\ell\left(\widetilde{x}_\ell^A\right) - H_\ell\left(\widetilde{x}_\ell^B\right)\,. \qquad (15) $$
The contribution of measurement $y_\ell$ to the signal information can therefore be approximated as:
$$ I_{y_\ell}^{\rm Signal} \approx \left(y_\ell - H_\ell\left(x_\ell^B\right)\right)^T R_\ell^{-1}\left(H_\ell\left(\widetilde{x}_\ell^A\right) - H_\ell\left(\widetilde{x}_\ell^B\right)\right)\,. \qquad (16) $$

Two modified initial conditions are computed by (15). If this is not feasible, the background and the analysis initial conditions can be used, at the price of a larger approximation error. The model is run from the modified analysis, and the "synthetic observations" $H_\ell(\widetilde{x}_\ell^A)$ are recorded. The model is run again from the modified background, and the "synthetic observations" $H_\ell(\widetilde{x}_\ell^B)$ are also recorded. Finally, the model is run from the background state, and the estimate (16) is evaluated for each data point $y_\ell$.
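The three model runs described above can be advanced together. A sketch of the per-observation estimate (16) for a linear model and observation operator; the function and argument names are illustrative:

```python
import numpy as np

def signal_contributions(x0_tilde_a, x0_tilde_b, x0_b, M, H, Rinv_list, y_list):
    """Per-observation signal information (16): advance the two modified
    states (15) and the background together, record the synthetic
    observations, and combine them with the background innovations."""
    xa, xm, xb = x0_tilde_a.copy(), x0_tilde_b.copy(), x0_b.copy()
    contrib = []
    for Rinv, yl in zip(Rinv_list, y_list):
        xa, xm, xb = M @ xa, M @ xm, M @ xb         # three parallel model runs
        innovation = yl - H @ xb                    # y_l - H_l(x_l^B)
        signal = H @ xa - H @ xm                    # H_l(x~_l^A) - H_l(x~_l^B)
        contrib.append(innovation @ Rinv @ signal)  # eq. (16)
    return contrib
```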


4. Averages with respect to the posterior distribution

The proposed estimates are based on computing expected values with respect to the analysis probability density.

The first idea is to sample (approximately) from the posterior error distribution in 4D-Var. Sampling from the posterior probability density at $t_0$ is challenging, since this probability density is not explicitly computed by 4D-Var. One approach is to use the fact that the analysis covariance matrix is approximated by the inverse Hessian of the cost function evaluated at the optimum [29, 16]. Approximate sampling can then be performed using second order adjoints: a few eigenvectors corresponding to the dominant eigenvalues of the inverse Hessian are computed; they approximate the principal components of the posterior error and can be used for approximate sampling from the posterior distribution. An alternative approach is based on a subspace analysis of 4D-Var and is explained in [7]. This is the approach used in this paper. It provides the ability to obtain the following sample of initial conditions from the posterior distribution:
$$ x_0^r \sim \mathcal{P}^A(x_0)\,, \qquad r = 1, \dots, N_{\rm ens}\,. \qquad (17) $$
Based on it we can approximate the expected value of any functional $f(x)$ with respect to the posterior density by a posterior ensemble average as follows:
$$ \mathrm{E}^A\left[f(x_0)\right] \approx \left\langle f(x_0) \right\rangle^A = \frac{1}{N_{\rm ens}} \sum_{r=1}^{N_{\rm ens}} f\left(x_0^r\right)\,. \qquad (18) $$

The second idea is to consider samples from the background distribution,
$$ x_0^q \sim \mathcal{P}^B(x_0)\,, \qquad q = 1, \dots, N_{\rm ens}\,, \qquad (19) $$
and to calculate expected values with respect to the posterior density by weighted ensemble averages as follows:
$$ \mathrm{E}^A\left[f(x_0)\right] \approx \sum_{q=1}^{N_{\rm ens}} w_q\, f\left(x_0^q\right)\,, \qquad
w_q = \frac{\exp\left(-\mathcal{J}^{\rm obs}(x_0^q)\right)}{\sum_{i=1}^{N_{\rm ens}} \exp\left(-\mathcal{J}^{\rm obs}(x_0^i)\right)}\,. \qquad (20) $$
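A sketch of the weighted average (20) from background samples; the shift by the minimum cost before exponentiation is a standard numerical safeguard against overflow and is not part of the original formula:

```python
import numpy as np

def posterior_average(f, samples_bg, Jobs_fn):
    """Weighted ensemble average (20): posterior expectation of f from
    background samples, with weights proportional to exp(-J^obs)."""
    J = np.array([Jobs_fn(x0) for x0 in samples_bg])
    w = np.exp(-(J - J.min()))       # unnormalized weights exp(-J^obs), shifted
    w /= w.sum()                     # normalize as in (20)
    return sum(wq * f(x0) for wq, x0 in zip(w, samples_bg))
```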

5. Numerical Experiments

In order to illustrate the estimates of the various information metrics described in Section 4, we consider a linear test case. The model is
$$ x_k = M \cdot x_{k-1}\,, \qquad k = 1, \dots, 4\,, \qquad x_k \in \mathbb{R}^{10}\,. \qquad (21) $$
The model matrix $M$ has eigenvalues log-equally distributed in the interval $[10^{-2}, 10^{2}]$. There are 5 eigenvalues greater than 1 (with the errors growing along the corresponding eigendirections) and 5 eigenvalues smaller than 1 (with the errors decreasing along the corresponding eigendirections). Observations of the odd numbered states (1, 3, 5, 7, and 9) are taken at each step. The background errors are normal and characterized by a diagonal background covariance matrix; the standard deviation of the error in each component is 10% of the background mean value. The observation errors are assumed normal and independent of each other; the standard deviation of each observation error is 1% of the reference observation value.

For this problem, analytical solutions are available for the analysis state $x_0^A$ and for the analysis covariance matrix $A_0$. Based on them, a direct evaluation of the different information metrics is possible. The results are summarized in Table 1 and show that the ensemble estimates of the information metrics are accurate.
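One possible construction of such a model matrix, assuming (as the paper does not specify) a random orthogonal eigenvector basis:

```python
import numpy as np

rng = np.random.default_rng(0)
eigvals = np.logspace(-2, 2, 10)                    # 5 growing, 5 decaying modes
Q, _ = np.linalg.qr(rng.standard_normal((10, 10)))  # random orthogonal basis (assumed)
M = Q @ np.diag(eigvals) @ Q.T                      # model matrix for (21)
```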


Table 1: Results with the linear test problem (21). The Fisher information is estimated using equation (10), DFS using (14), Shannon using (12), and the signal using (16).

Metric    Direct      Nens = 10          Nens = 10^2   Nens = 10^3   Nens = 10^4
Fisher    2.001e+05   1.923e+05          2.138e+05     1.977e+05     1.979e+05
DFS       4.999e+00   4.802e+00          5.336e+00     4.934e+00     4.942e+00
Shannon   2.234e+01   1.998e+01+1.571i   2.222e+01     2.245e+01     2.232e+01
Signal    3.347e+00   3.347e+00          3.347e+00     3.347e+00     3.347e+00

6. Conclusions and Future Work

This paper employs several metrics from information theory to quantify the information content of data in the context of four dimensional variational (4D-Var) data assimilation. The ability to characterize the usefulness of different data points is important for analyzing the effectiveness of the assimilation system, for data pruning, and for the design of future sensor systems. The metrics of interest are the trace of the Fisher information matrix, the Shannon information, the relative entropy, the signal information, and the degrees of freedom for signal. Computationally efficient estimates for the information metrics are obtained under the assumptions that the errors have Gaussian distributions, and that the model dynamics and observation operators are linear. The information content estimation approach is illustrated on a linear test problem. Future work will focus on applying the estimation methodology to nonlinear systems of interest in applications. We will seek to quantify the impact of nonlinearity, non-normality, approximation of posterior distributions, and small sample sizes on the accuracy of the information content estimates.

Acknowledgments

This work has been supported in part by NASA through the ROSES-2005 AIST project, by NSF through the awards NSF CCF-0635194, NSF OCI-0904397, NSF CCF-0916493, NSF CMMI-1130667, and NSF DMS-0915047, and by the Computational Science Laboratory at Virginia Tech.

References

[1] Abramov, R. V. and Majda, A. J. (2004), Quantifying uncertainty for non-Gaussian ensembles in complex systems. SIAM Journal on Scientific Computing; 26(2), 411-447.
[2] Cardinali, C., Pezzulli, S., Andersson, E. (2004), Influence-matrix diagnostic of a data assimilation system. Quarterly Journal of the Royal Meteorological Society; 130, 2767-2786.
[3] Carmichael, G. R., Daescu, D. N., Sandu, A., Chai, T. (2003), Computational aspects of chemical data assimilation into atmospheric models. Computational Science - ICCS 2003, Part IV, Lecture Notes in Computer Science, 2660: 269-278.
[4] Carmichael, G. R., Sandu, A., Chai, T., Daescu, D. N., Constantinescu, E. M., Tang, Y. (2008), Predicting air quality: Improvements through advanced methods to integrate models and measurements. Journal of Computational Physics, 227(7): 3540-3571.
[5] Chai, T. F., Carmichael, G. R., Sandu, A., Tang, Y., Daescu, D. N. (2006), Chemical data assimilation of Transport and Chemical Evolution over the Pacific (TRACE-P) aircraft measurements. Journal of Geophysical Research - Atmospheres, 111(D2): Art. No. D02301.
[6] Chai, T. F., Carmichael, G. R., Tang, Y. H., Sandu, A., Hardesty, M., Pilewskie, P., Whitlow, S., Browell, E. V., Avery, M. A., Nedelec, P., Merrill, J. T., Thompson, A. M., Williams, E. (2007), Four-dimensional data assimilation experiments with International Consortium for Atmospheric Research on Transport and Transformation ozone measurements. Journal of Geophysical Research - Atmospheres, 112(D12): Art. No. D12S15.
[7] Cheng, H., Jardak, M., Alexe, M. and Sandu, A. (2010), A hybrid approach to estimating error covariances in variational data assimilation. Tellus A, 62(3): 288-297.


[8] Constantinescu, E. M., Sandu, A., Chai, T. F. (2007), Ensemble-based chemical data assimilation. I: General approach. Quarterly Journal of the Royal Meteorological Society, 133(626): 1229-1243, Part A.
[9] Constantinescu, E. M., Sandu, A., Chai, T. F., Carmichael, G. R. (2007), Ensemble-based chemical data assimilation. II: Covariance localization. Quarterly Journal of the Royal Meteorological Society, 133(626): 1245-1256, Part A.
[10] Constantinescu, E. M., Sandu, A., Chai, T. F., Carmichael, G. R. (2007), Assessment of ensemble-based chemical data assimilation in an idealized setting. Atmospheric Environment, 41(1): 18-36.
[11] Constantinescu, E. M., Chai, T. F., Sandu, A., Carmichael, G. R. (2007), Autoregressive models of background errors for chemical data assimilation. Journal of Geophysical Research - Atmospheres, 112(D12): Art. No. D12309.
[12] Courtier, P. and Talagrand, O. (1987), Variational assimilation of meteorological observations with the adjoint vorticity equation. Part 2: Numerical results. Quarterly Journal of the Royal Meteorological Society; 113, 1329-1347.
[13] Daescu, D., Carmichael, G. R., Sandu, A. (2000), Adjoint implementation of Rosenbrock methods applied to variational data assimilation problems. Journal of Computational Physics, 165(2): 496-510.
[14] Fisher, R. A. (1922), On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London; Series A, 222, 309-368. URL: http://www.jstor.org/stable/91208.
[15] Fisher, M. (2003), Estimation of entropy reduction and degrees of freedom for signal for large variational analysis systems. ECMWF Technical Memoranda; 397.
[16] Gejadze, I. Y., Le Dimet, F. X., and Shutyaev, V. (2008), On analysis error covariances in variational data assimilation. SIAM Journal on Scientific Computing; 30(4), 1847-1874.
[17] Le Dimet, F. X. and Talagrand, O. (1986), Variational algorithms for analysis and assimilation of meteorological observations: theoretical aspects. Tellus A, 38, 97-110.
[18] Lions, J. L. (1971), Optimal control of systems governed by partial differential equations. Springer-Verlag.
[19] Majda, A. J. and Wang, X. (2006), Nonlinear dynamics and statistical theories for basic geophysical flows. Cambridge University Press.
[20] Rabier, F., Fourrie, N., Chafa, D., and Prunet, P. (2002), Channel selection methods for Infrared Atmospheric Sounding Interferometer radiances. Quarterly Journal of the Royal Meteorological Society; 128, 1011-1027.
[21] Rodgers, C. D. (1996), Information content and optimization of high spectral resolution measurements. Optical Spectroscopic Techniques and Instrumentation for Atmospheric and Space Research, SPIE Volume 2830, 136-147.
[22] Rodgers, C. D. (1998), Information content and optimization of high spectral resolution measurements. Advances in Space Research; 21, 361-367.
[23] Rodgers, C. D. (2000), Inverse methods for atmospheric sounding: Theory and Practice. World Scientific: Singapore.
[24] Sasaki, Y. K. (1958), An objective analysis based on the variational method. Journal of the Meteorological Society of Japan; II(36), 77-88.
[25] Sandu, A., Daescu, D. N., Carmichael, G. R., and Chai, T. (2005), Adjoint sensitivity analysis of regional air quality models. Journal of Computational Physics; 204, 222-252.
[26] Sandu, A. and Zhang, L. (2008), Discrete second order adjoints in atmospheric chemical transport modeling. Journal of Computational Physics; 227(12), 5949-5983.
[27] Shannon, C. E. and Weaver, W. (1949), The mathematical theory of communication. University of Illinois Press, Urbana, IL.
[28] Stewart, L. M., Dance, S. L., Nichols, N. K. (2008), Correlated observation errors in data assimilation. International Journal for Numerical Methods in Fluids; 56, 1521-1527.
[29] Thacker, W. C. (1989), The role of the Hessian matrix in fitting models to measurements. Journal of Geophysical Research; 94(C5), 6177-6196.
[30] Worden, J. R., Bowman, K. W., and Jones, D. B. A. (2004), Characterization of atmospheric profile retrievals from limb sounding observations of an inhomogeneous atmosphere. Journal of Quantitative Spectroscopy & Radiative Transfer; 86, (03)00274-7.
[31] Xu, Q. (2006), Measuring information content from observations for data assimilation: relative entropy versus Shannon entropy difference. Tellus A, 198-209.
[32] Zupanski, D. (2009), Information measures in ensemble data assimilation. Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications.