Adv. Space Res. Vol. 12, No.7, pp. (7)193—(7)200, 1992 Printed in Great Britain.
0273—1177/92 $15.00
1992 COSPAR
VARIATIONAL DATA ASSIMILATION IN METEOROLOGY USING THE ADJOINT METHOD J. Lewis~!~ and J. Bao** Meteorological Research, National Severe Storms Laboratory, NOAAIERL, 1313 Halley Circle, Nonnan, OK 73069, U.S.A.(and University of Oklahoma, Norman, OK 73019, U.S.A.) ** Department of Meteorology, Pennsylvania State University, University Park, PA 16802, U.S.A. *
ABSTRACT Variational data assimilation using the adjoint method is being pursued in the Norman, Oklahoma, meteorological community by a concerted effort between meteorologists, computer scientists and mathematicians. A series of experiments are under way that use some of the simplest atmospheric models still useful operationally (barotropic model) and more complex models (dry nonhydrostatic model). The effort is aimed at determining the strengths and weaknesses of the adjoint method and its potential for operational use. Of particular interest to us is the potential use of this method for the operational track prediction of hurricanes using a barotropic model with a sophisticated data assimilation scheme. We also want to know if this approach will prove useful in the determination of the three component wind field in the field of view of a single Doppler radar, i.e., a radar that measures only the radial component of the wind field. We report on progress with these models and use a simple example to give the reader some idea of both the mechanics and assumptions inherent with this method. BACKGROUND With the upsurge in new observing systems for mesoscale meteorology such as the Next Generation Radar Network (NEXRAD) and the wind profiler demonstration network, there has been increased emphasis on the assimilation of these data into numerical prediction models. One of the most promising strategies is the so called adjoint method /1,2,3/. This adjoint method is a strategy for finding the optimal initial state of the prediction model that minimizes the discrepancy between the forecast and the corresponding data collected during the model evolution. The functional that is minimized is a scalar that is generally given by the sum of squared differences between the model state and the corresponding observations. The gradient of this functional with respect to the initial state parameters is formed by an iterative procedure that involves the forward integration of the model over the time period where observations are available and a backward integration of the adjoint model (of similar complexity to the forward model). With a cycle of one forward and one backward integration, the gradient of the functional is found. But to find the minimum, the procedure further requires determination of a stepsize along which the gradient is followedin effect a steepestdescent type process. To find the stepsize, another forward integration of the model is made and, consequently, the equivalent of three model integrations per iteration are required. For the simplest atmospheric models, such as that used by Talagrand and Courtier /2/, 1520 iterations are generally necessary to find the optimal state. Consequently, about 50 model integrations are needed to get the initial conditions for the model. As the models are made more complex, it is only reasonable to assume that the number of iterations will increase; this will be especially true for the mesoscale models that include water substance conversion and associated latent heat release. These processes tend to be modeled or parameterized
Author is Head of Working Group on Data Assimilation for the National Science Foundation’s Center for the Analysis and Prediction of Storms (CAPS). The work presented involves the work of the author and the following members of the CAPS: Professor S. Lakshmivarahan (Computer Science, OU); Professor Luther White (Mathematics, OU); Dr. Hartmut Kapitza (NSSL and OU postdoctorate fellow); Mr. Yong Li (Ph.D. candidate, OU Meteorology). 1
(7)193
J. Lewis and J. Bao
(7)194
in a discontinuous fashion and will add complicated structure to the cost functional surface in the space of initial conditions. At the Center for the Analysis and Prediction of Storms (CAPS) at the University of Oklahoma, a major effort is under way to implement the adjoint method of data assimilation on a variety of meteorology models. We are testing two models, the barotropic model that is still used to forecast the largescale winds associated with hurricane tracking, and a nonhydrostatic dryconvection model that is used to forecast winds in the vicinity of smaller scale fronts such as dry lines. Because of the large computational overhead with these methods, it is imperative that the most efficient use of largescale computing is achieved. Furthermore, because of the demands related to iteration toward a minimum, the development of alternate strategies is paramount. Accordingly, we combined the efforts of applied mathematics, computer science, and meteorology to address these questions. THE ADJOINT METHOD APPLIED TO A SIMPLE NONLINEAR MODEL The available literature on this subject is still sparse, and even the standard texts on the calculus of variations address the standard approach to finding the EulerLagrange equations and generally do not cover material related to the iterative methods of finding extremes. Thus, a relatively simple example will go far in outlining the methodology. We choose the oftstudied equation of Burgers, which is examined exhaustively in the paper by Platzman /4/. To simplify even further, we look at the truncated spectral solution with only two components: U(x,t)
=
u1(t)sin x
u2(t) sin 2x
+
where the general solution u(x,t) satisfies u~+uu~=Oand
u=sinxatt=O.
The truncated spectral system that is associated with Burgers’ equation is du1
~ ~U1U2
=
=
dt
(1)
~u~ 2 1
(2)
In this hypothetical problem we seek to find the initial state [u1(o) and u2(o)) that minimizes the cost functional J: 
Lx
where X

2>dt
=
~ is the “observed” amplitude in [0,1],and <.,.> is the inner product. the equations above, written in vector form as =
(3)
.7(X)
X must satisfy
(4)
Data Assimilation inMeteorology Using the Adjoint Method
.! ~1
.7(x)
where
(7)195
;U
=
(5)
12
The classic variational approach leads to the following formulation: Lagrangian (if) where
we minimize the
f{)~~(~~ +
=
T
du1
(6)
1

+ f{.~’2—~
2
.u1)}dt
+
J
Proceeding through the mechanics of integration by parts and setting ÔL
0
=
we get 1 ..~A
~l
~
dt
o
_~.~1
A=x—i
(7)
with A(o)
=
A(T)
=
0 and A
(A1)
=
These are the EulerLagrange equations and, solved in conjunction with (4), give the optimal solution. The conditions on A at t 0 and t T were chosen, otherwise the solution would not allow adjustment to X at t 0 or t 1. =
=
=
=
Now, to discuss the alternate strategy of finding the minimum to J under constraint by iteration, it is useful to talk about the grad J or gradient of the cost functional J. For deterministic models such as Burgers’ equation, the evolution of the field X is determined by the initial state, U X(o), and the boundary condition. In our simple example, the boundary conditions are assumed to be correct and known throughout the assimilation interval, so that the solution and accordingly the functional J are controlled by U alone. We call U the control variable for the problem. =
Grad J is now defined as VJ
=
Grad ~
=
äu1(o)
(8)
au2 (o)
where U
=
(~‘~) =
X(o)
and J can be thought of as a scalar in the space of u1(o) and u2(o). In an iterative method, we make an estimate of the optimal initial state and then calculate the grad J at that point. Once obtained, a new estimate is found by marching down the grad J direction
(7)196
J.LewisandJ.Bao
by a certain amount, called the stepsize. To find the stepsize is not at all trivial and involves another integration of the dynamics, as stated in the background section. We will forego the explanation of this process in the paper since the primary concern is to understand the rudiments of adjoint method which is basically the method of finding grad J. To proceed, we must define the directional derivative and call it 8~O where ~öis a function of X and we wish the derivative in the direction H, then = Urn ijr(X + ~ H)  ~p(X) (9)
Let us consider X(U) as a solution of the governing dynamical constraint, =~7(X).
(10)
Then for a given perturbation H on U, the directional derivative of X in direction H is given by the solution of the equation, (11)
with 8 X (0)
=
H.
This can be best understood by writing the solution to the equations as the sum of a basic state [whichis a solution to the nonlinear dynamics and denoted by ( )J and a perturbation. That is, =
+
=
+
(12)
and dii
1
1—— =
1U2
dU2 dt

2
(13)
1
Upon substitution of (12) into (10), using (13), we get (14)
~ÔX=
where it can be seen that the matrix is the Jacobian 3.W3X as indicated in (11). The derivation of (14) used the limiting process defined by (9), and accordingly the secondorder terms vanish. By the same process, we find that ôJ=
JfT
dt
(15)
2 Jo and by definition, ÔJ=
ÔX(0)>.
(16)
To find grad J, which is germane to the method, it is necessary to manipulate (15) so that instead of SX as part of inner product, we have 8X(0). Intuitively, one can see that a
Data Assimilation in Meteorology Using the AdjointMethod
(7)197
series of “back substitutions” could accomplish this task since each 8X is ultimately related to 8X(0). To accomplish this goal, we form the scalar product of (11) with an arbitrary vector P and integrate over [0,T]. JTdt = fT
We integrate by parts, <ôx(T),p(T)>
fT<ô~ dP>
=
— fT<8~?~P>dt
(18)
where ( )T indicates the transpose matrix. Since we are trying to capture 6X(O), it is advisable to get rid of first term in (18), and since we do not want to restrict ourselves to cases where the adjustment at T is zero, we choose P(T)
0.
=
(19)
Then —<ÔX(o)
,
P(0)>
=
f
dPa7T dt.
(20)
We then require P to satisfy (21)
where P(T)
=
0.
Then
< —8x(0)
,
P(0)>
=
f
—
i>dt
(22)
=
=
From this it can be seen that grad J
=
P(O).
Equation (21) is called the adjoint because the elements of the matrix are the transpose of the matrix that m~vedthe perturbation forward in time. See Friedman /,,5/, for example, where the operator M is said to be adjoint to operator M, if = , and for real matrices, the adjoint matrix is the transpose. If we look at the EL equations in (7) it is clear that the adjoint equation (21) is identical to (7) except for the notational difference. They have identical form but it is important to realize the following: (1) The EL equations require A(T) = A(0) = 0, whereas the grad J given by P(0) is generally nonzero; i.e., only at the minimum will P(O) = 0. Thus, there is a consistency between the developments. Also, recall that the solution to the adjoint required P(T) = 0. (2) The adjoint equation uses the current estimate of the model state, ~(t), the solution to the nonlinear dynamics as coefficients in the matrix, whereas the EL equations assume a state at the minimum.
(7)198
J. Lewis andJ. Bao
What is not clear from this simple example is that finding grad J by this approach is extremely efficient: the equivalent of only two model integrations no matter how many components are in the control vector X(0). A “brute force” way of finding grad J would be to perturb one component of X(0) and see how it affected J and thereby get one component of grad J. One model integration per component is required by this approach. With two components for the control vector, the cost is the same. But with a realistic atmospheric model with thousands of components, the advantages are obvious. APPLICATION OF A SIMPLE NONHYDROSTATIC MODEL AND ITS ADJOINT TO DATA ASSIMILATION A threedimensional, nonhydrostatic, Boussinesq model and its adjoint developed by Wolfsberg /6/ and Sun /7/ are used to test dry convection in simulated experiments. The aim of these tests is to look into the possibility of reconstructing the initial state of such a model with fragmentary information on the basic meteorological variables. We are especially interested in the case where only horizontal wind data are available during the assimilation period. In the experiments to be described, a set of simulated observations are generated from a resting initial state with a superimposed temperature perturbation. The evolution of the convection caused by the initial temperature perturbation is considered to be the “truth” or control state. The basic variables of the model are V(u,v,w), 8 and 71, i.e., velocity, the perturbation potential temperature and the perturbation pressure divided by the reference density, respectively. In the control state, the initial temperature perturbation is 1°Kand confined to 9 grid points in a domain of 19x19x17 grid points. The horizontal grid length is lOOm and the vertical grid length is 88m. The time step is 20 sec. In all of the experiments we assume that data are available at the first 10 time steps after initial time. The guess initial temperature is zero. Table 1 briefly describes the experiments. Table 1
Description of Experiments
Exp. 1
0

200 secs
u
Fig. 2
Exp. 2
0

200 secs
u,v
Fig. 3
Exp. 3
0

200 secs.
u,v,w
Fig. 4
Exp. 4
0

200 secs.
u,v,w,8
Fig. 5
Figure 1 shows vertical cross section of initial 0 for control simulation. By comparing it with Figure 2, we can see that using single wind component, u, is able to retrieve the initial temperature perturbation in terms of the structure and location. However, the intensity of the retrieved temperature perturbation is about 0.7K. Using more wind components (v,w) we will improve the retrieval of the intensity a little (see Figure 3 and Figure 4). Only when both of the velocity and temperature are used, the initial temperature perturbation can be retrieved completely (see Figure 5). Therefore, the results of these tests give rise to the following question which needs further research: how many independent variables at least are necessary to retrieve full threecomponent wind and temperature fields that are closest to the observation? There is another question related to the above one that we have not looked at in detail. Since in reality the observed variables only provide fragmentary information about the state of atmosphere, we need temporal information on the evolution of the state. How many time levels are needed to give the best retrieval? Is that true that the more the time levels, the better? What is the maximum time; interval for the time levels? WORK ON PARALLEL PROCESSING OF COMPUTER CODE FOR ADJOINT DATA ASSIMILATION In both models mentioned earlier, the barotropic model and the anelastic nonhydrostatic model, a primary ingredient in the computations is the solution of elliptic equations, for the streamfunction in the barotropic model and for the pressure in the nonhydrostatic model. The corresponding adjoint codes involve these same elliptic operators and are the
Data Assimilation inMeteorology Using the Adjoint Method
(7)199
chief bottleneck in the computational time. To alleviate this timedrain, we have made an effort to design efficient elliptic solvers for parallel processing machinery. To date, we have tried 3 versions of the block cyclic reduction (8CR) algorithm based on polynomial factorization, partial fraction expansion and rational approximations. Each of these versions is implemented using (a) the vectororiented LU decomposition method and (b) the scaler cyclic reduction method. It is found that 8CR with first degree rational approximation using the LU decomposition method performs better than that using the scaler cyclic reduction. In the other two cases of BCR, the scaler cyclic reduction based implementation performs better than the ones using the vectororiented LU decomposition. All our timings are based on the Alliant FX/8 at Argonne National Laboratory. SUMMARY We are in the process of trying the adjoint data assimilation strategy on the case of Hurricane Elena for 1985 using a barotropic model. The nonhydrostatic model discussed in this paper is currently being tested on the dryline phenomenon. We are also hopeful that an estimate of the computational cost will indicate whether this strategy has promise for more complicated models. REFERENCES 1.
F.X. LeDimet and 0. Talagrand, Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects, Tellus, 38A, 97110 (1986).
2.
0. Talagrand and P. Courtier, Variational assimilation of meteorological observations with the adjoint vorticity equation. Part I: Theory, U J. Royal Meteor. Soc., 113, 13111328 (1987).
3.
W.C. Thacker and B.R. Long, Fitting dynamics to data, J. Geophys. Res., 93, 12271240 (1988).
4.
G.W. Platzman, An exact integral of complete spectral equations, Tellus, 21, 422431 (1964).
5.
B. Friedman, Principles and Technigues of Applied Mathematics, John Wiley and Sons, 1956.
6.
D.G. Wolfsberg, Retrieval of threedimensional wind and temperature fields from singleDoppler radar data. Ph.D. dissertation, University of Oklahoma, 91 pp. CIMMS report No. 84, Cooperative Institution for Mesoscale Meteorology Studies, 815 Jenkins, Norman, OK 73019 (1987).
7.
3. Sun, D.W. Flicker, and D.K. Lilly, Recovery of threedimensional wind and temperature fields from singleDoppler radar data, J. Atmos. Sci., in press (1990).
JASR 12:7—N
1. Lewis andJ. Baa
(7)200
111111
III
liii
Ill_I
•
.

11111
liii
H
L 0
H 0
0
L
L 0
0.
Figure 1.
I
—
ii
Figure 2.
L 0.000 H H .000 t..001 001
I
Iii
I
I
• .~I~eeI~
I
•
I ~
I
I
I
I
L .061 ~
Figure 3.
I
I
I
I
I
.~oo
I
L~
•
H .000
I
I
III
I
—
L
~‘
~
.061 ~r~rT
r
Figure 4.
I
I
Figure 1. Vertical cross section of perturbation potential temperature I
initial •
I
L —.003
L 0.000 H .000 H .001
H .001
•
III
o~ H .000
~
H .000
•
.c~o
I
for control simulation. The interval of the isotherms is 0.06K. Figure 2. Retrieved perturbation potential temperature at initial time in Exp. 1. The interval of the isotherms is 0.06K. (Max. temp. = 0.7K.) Figure 3. The same as Fig.2 for Exp. 2. (Max.
_____
L
mum
_____
I
Figure 5.
L
001115
=
0.72.)
Figure 4. The same as Fig.2 for Exp. 3. (Max. = 0.76.) Figure 5. The same as Fig.2 for Exp. 4. (Max. = 1.00.)