Physica D 300 (2015) 34–40 — journal homepage: www.elsevier.com/locate/physd
(Non)Uniqueness of critical points in variational data assimilation

Graham Cox
Department of Mathematics, UNC Chapel Hill, Phillips Hall CB #3250, Chapel Hill, NC 27599, United States
E-mail address: [email protected]

Highlights

• Bayesian formulation of variational assimilation for quasilinear equations.
• Uniqueness of minimizers for small observational times.
• Uniqueness of minimizers for small prior covariance.
• Existence of critical points with large Morse index.

Article history: Received 30 September 2013; received in revised form 28 November 2014; accepted 18 January 2015; available online 24 January 2015. Communicated by E.S. Titi.

Keywords: Variational data assimilation; inverse problems; quasilinear evolution equations

Abstract

In this paper we apply the 4D-Var data assimilation scheme to the initialization problem for a family of quasilinear evolution equations. The resulting variational problem is non-convex, so it need not have a unique minimizer. We comment on the implications of non-uniqueness for numerical applications, then prove uniqueness results in the following situations: (1) the observational times are sufficiently small; (2) the prior covariance is sufficiently small. We also give an example of a data set where the cost functional has a critical point of arbitrarily large Morse index, thus demonstrating that the geometry can be highly non-convex even for a relatively mild nonlinearity.

1. Introduction

An important problem in data assimilation is to estimate the initial state of a physical system given only noisy, incomplete observations of the state at later times. To make this precise, suppose y(t) solves an evolution equation y_t = F(y) in some function space V and the observations of the state are given by a bounded linear operator H : V → R^q. Given observations z_1, …, z_N ∈ R^q at times t_1 < ⋯ < t_N, one would like to find the initial condition u = y(0) that best matches the empirical data. Of course it is important to carefully formulate what is meant by the ''best'' initial condition, to ensure that the problem is well-posed and has a physically meaningful solution. A common approach, which we adopt in this paper, is to minimize the cost functional

J(u) := \frac{1}{2}\sum_{i=1}^N \left|R^{-1/2}\left(Hy(t_i) - z_i\right)\right|^2 + \frac{1}{2\sigma^2}\|u - u_0\|_V^2    (1)

for some fixed u_0 ∈ V and σ > 0. It can be shown through standard variational methods that J admits a minimizer (see [1,2] for details). However, if the forward problem is nonlinear, J may fail to be convex and so uniqueness is not guaranteed.

It is common practice (see, for instance, [3–6]) to solve a suitable discretization of the regularized variational problem using a gradient-based algorithm. Implicit in the implementation of such an algorithm is the assumption of a unique minimizer for the variational problem—gradient descent methods are of course local and do not have the ability to distinguish between local and global minima. However, while widely recognized as a difficulty inherent in the nonlinear case (cf. [7,8]), the problem of uniqueness has so far received little attention in the literature.

A short-time uniqueness result for Burgers' equation appeared in [2] under the assumption of continuous-in-time observations, using the cost functional

\int_0^T |Hy(t) - z(t)|^2\,dt + \frac{1}{2\sigma^2}\|u - u_0\|_V^2.

There it was proved that the variational problem admits a unique minimizer when the maximal observation time T is sufficiently small, by showing that critical points are fixed points of a map (cf. (13)) that is a contraction when T is small. This turns out to be quite different from the case of discrete-time observations considered in the current paper—in the continuous case the data term becomes negligible as T → 0, whereas in the discrete case the data will always have a nontrivial contribution to the regularized cost function, no matter how small the observational times.
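Before proceeding, it is worth seeing the two competing terms of (1) in the simplest possible numerical form. The sketch below is purely illustrative: the forward model is an assumed stand-in (the linear heat equation stepped by implicit Euler), and the grid, observation operator H and covariance R are all hypothetical choices. It only shows how the data misfit and the prior penalty are assembled.

```python
import numpy as np

# Illustrative sketch of the cost (1). The forward model here is an assumed
# stand-in: the linear heat equation y_t = y_xx with Dirichlet conditions,
# advanced by implicit Euler on a uniform interior grid of (0, 1).
n, dt = 50, 1e-3
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
h = x[1] - x[0]
lap = (np.diag(np.ones(n - 1), -1) - 2.0 * np.eye(n)
       + np.diag(np.ones(n - 1), 1)) / h**2          # Dirichlet Laplacian
prop = np.linalg.inv(np.eye(n) - dt * lap)           # implicit Euler step

def forward(u, t):
    y = u.copy()
    for _ in range(int(round(t / dt))):
        y = prop @ y
    return y

H = np.zeros((3, n))
H[0, 10] = H[1, 25] = H[2, 40] = 1.0                 # three point observations
R_inv_half = np.eye(3)                               # covariance R = identity

def J(u, u0, times, data, sigma):
    misfit = sum(np.sum((R_inv_half @ (H @ forward(u, t) - z))**2)
                 for t, z in zip(times, data))
    penalty = np.sum(np.diff(u - u0, prepend=0.0, append=0.0)**2) / h
    return 0.5 * misfit + penalty / (2.0 * sigma**2)  # the two terms of (1)
```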


The discrete-time problem was investigated numerically in [9], where a unique minimizer was observed as long as σ > 0. For the non-regularized (σ = 0) case—corresponding to an improper prior in the Bayesian formulation—multiple minimizers were found numerically.

An interesting approach to uniqueness appeared in [10], in which a so-called curvature × size condition was used to establish uniqueness for the class of ''weakly nonlinear inverse problems'' (which includes the initialization problem for semilinear evolution equations). A current summary of these geometric methods can be found in [11]. It was shown there that the initialization problem admits a unique solution provided one is given an H^1 observation of the solution (in both space and time). While geometrically appealing, these methods seem less suited to situations where only partial information is available.

The variational problem for (1) can be viewed as a Tikhonov regularization of the log-likelihood functional

u \mapsto \sum_{i=1}^N \left|R^{-1/2}\left(Hy(t_i) - z_i\right)\right|^2,

where R is the observational covariance matrix and y(t_i) is the solution to the evolution equation with initial condition u, evaluated at time t_i. Such regularizations for linear problems are well established; some classical sources are [12–15] and a modern overview can be found in [16]. The non-regularized problem is ill-posed, in the sense that it does not necessarily possess a minimizer in the space V.

Ill-posed inverse problems are well represented in the literature, with classic examples being the determination of a diffusion coefficient for the heat equation, or a source term for a semilinear reaction–diffusion equation. By definition these problems have one of the following properties: (1) no solutions exist; (2) a solution exists but is not stable with respect to perturbations of the data; (3) multiple solutions exist. In the literature one can find many results of existence [7,17,18]; uniqueness [19]; existence and uniqueness [20,21]; and uniqueness and stability [22]. A recent survey of some analytic uniqueness results can be found in Chapter 6 of [23]. However, the question of uniqueness for regularized, nonlinear variational problems seems to be less well studied.

There is a Bayesian interpretation of (1) in which the ∥·∥_V term corresponds to a prior distribution with covariance proportional to σ², and minimizers of J correspond to modes of the posterior distribution; hence the variational problem for J admits multiple minimizers precisely when the posterior distribution is multimodal. This interpretation also gives the parameter σ a useful meaning, and suggests how it should be chosen based on prior information. The goal of this paper is to describe this Bayesian formalism for a family of quasilinear evolution equations (which includes reaction–diffusion equations and viscous conservation laws) and determine sufficient conditions to guarantee unimodality of the resulting posterior distribution. We emphasize that our methods only assume finite data at each observational time (for instance, projection onto the first N Fourier modes), rather than complete L² or H^1 knowledge of the state.

1.1. Some notation and conventions

Throughout we denote the L²(0,1) norm and inner product by ∥·∥ and ⟨·,·⟩, respectively. We let V := H_0^1(0,1), with norm ∥u∥_V := ∥u_x∥. This is equivalent to the standard H^1(0,1) norm, because


π∥u∥ ≤ ∥u_x∥ for any u ∈ H_0^1(0,1). It is well known that H_0^1(0,1) ⊂ L^∞(0,1), with sup|u| ≤ ∥u_x∥. We will frequently make use of the inequality between the arithmetic and geometric means,

2ab \leq \lambda a^2 + \lambda^{-1} b^2    (2)

for any positive a, b and λ, which we refer to as the AM–GM inequality.

2. Statement of results

For the remainder of the paper we consider a quasilinear parabolic equation

y_t + f(y)_x = y_{xx} + r(y)    (3)

on the interval [0,1], with Dirichlet boundary conditions. We make the standing assumption that f and r are both of class C². This is more than sufficient to guarantee that the initial value problem for (3) is well-posed, as will be seen in Proposition 3. The additional regularity is needed in computing the first and second variations of the cost functional. We also need to ensure that the initial value problem admits a global (in time) solution for any initial value, so that J is well-defined on all of H_0^1. This will be the case if



\int_{-\infty}^0 \frac{dy}{|r(y)|+1} = \int_0^\infty \frac{dy}{|r(y)|+1} = \infty.    (4)

If this condition is not satisfied, there may exist initial conditions for which the solution blows up in a finite amount of time. We also assume that the observation operator H is bounded on L², and hence has a bounded adjoint H* : R^q → L².
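Equations of the form (3) are easy to solve numerically by the method of lines. In the sketch below the flux f(y) = y²/2 and the reaction term r(y) = sin y are assumed examples; note that r(y) = sin y satisfies (4), since the integrand 1/(|sin y| + 1) is bounded below by 1/2.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Method-of-lines sketch for (3) on (0, 1) with Dirichlet boundary conditions.
f = lambda y: 0.5 * y**2          # assumed Burgers-type flux
r = np.sin                        # assumed reaction term; satisfies (4)

n = 100
x = np.linspace(0.0, 1.0, n + 2)
h = x[1] - x[0]

def rhs(t, y):
    ye = np.concatenate(([0.0], y, [0.0]))            # Dirichlet padding
    yxx = (ye[2:] - 2.0 * ye[1:-1] + ye[:-2]) / h**2  # second derivative
    fx = (f(ye[2:]) - f(ye[:-2])) / (2.0 * h)         # centered d/dx of f(y)
    return yxx - fx + r(y)

u = np.sin(np.pi * x[1:-1])                           # initial condition
sol = solve_ivp(rhs, (0.0, 0.5), u, method="BDF", rtol=1e-6)
print("sup-norm at t = 0.5:", np.abs(sol.y[:, -1]).max())
```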

Our first result is that the problem has a natural Bayesian formulation with respect to a Gaussian prior distribution, the significance of which will be discussed in Section 5. This requires an additional regularity assumption on r and f that will not be needed elsewhere in the paper. We let ∆_D denote the Dirichlet Laplacian on L²(0,1)—that is, the unbounded, self-adjoint operator with domain H²(0,1) ∩ H_0^1(0,1)—and note that it has a bounded inverse ∆_D^{-1} : L²(0,1) → H²(0,1) ∩ H_0^1(0,1).

Theorem 1. Let µ_0 denote the Gaussian measure on L²(0,1) with covariance C_0 = −σ²∆_D^{-1} and mean u_0, and suppose that r(y) and f′(y) are uniformly Lipschitz. Given observations {z_i}, i = 1, …, N, with i.i.d. N(0,R) Gaussian noise, there is a well-defined posterior measure µ^z, with Radon–Nikodym derivative

\frac{d\mu^z}{d\mu_0}(u) \propto \exp\left(-\frac{1}{2}\sum_{i=1}^N \left|R^{-1/2}\left(Hy(t_i) - z_i\right)\right|^2\right).    (5)

Moreover, the mean and covariance of the posterior distribution are continuous functions of the data, z = {z_i}. In fact, one has that the posterior measure is Lipschitz with respect to the Hellinger metric; the reader is referred to [1] for further details.

The nontriviality of this result is due to the infinite-dimensional setting of the problem. Because there is no analog of the Lebesgue measure for infinite-dimensional spaces, one cannot define the posterior measure using the exponential of J as a density, as is done in finite dimensions. Thus it is necessary to define the posterior relative to the prior distribution, and care must be taken to ensure that this density, given by (5), is in fact µ_0-integrable and hence can be normalized. This normalizability will follow from estimates on solutions to the nonlinear evolution equation.

There is thus a Bayesian formulation of the regularized variational problem, for which the MAP (maximum a posteriori) estimators are precisely the global minima of the cost functional (1).
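The finite-dimensional picture behind this correspondence can be written in a few lines. The sketch below is an assumed toy problem on R^n, with a linear forward map G standing in for u ↦ Hy(t_i; u); in this setting the posterior has density exp(−Φ(u)) with respect to the prior, and the MAP point minimizes the analogue of (1).

```python
import numpy as np

# Finite-dimensional caricature of (5): posterior density exp(-Phi) relative
# to the Gaussian prior N(u0, C0). All objects below are assumed toy data.
rng = np.random.default_rng(0)
n, q = 4, 2
G = rng.standard_normal((q, n))            # stand-in for u -> H y(t_i; u)
z = rng.standard_normal(q)                 # observations
u0, C0 = np.zeros(n), 0.1 * np.eye(n)      # prior mean and covariance

Phi = lambda u: 0.5 * np.sum((G @ u - z)**2)
J = lambda u: Phi(u) + 0.5 * (u - u0) @ np.linalg.solve(C0, u - u0)

# Because this G is linear, the MAP point solves a normal equation exactly:
# grad J = G^T (G u - z) + C0^{-1} (u - u0) = 0.
u_map = np.linalg.solve(G.T @ G + np.linalg.inv(C0),
                        G.T @ z + np.linalg.solve(C0, u0))
print("J at MAP:", J(u_map), "<= J at prior mean:", J(u0))
```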


This correspondence between MAP estimators and minimizers, and even the precise definition of a MAP estimator, is nontrivial for infinite-dimensional problems, because the posterior does not have a density with respect to a background Lebesgue measure, as described above. However, it was shown in [24] that the MAP estimators can be rigorously defined by comparing posterior measures of metric balls as their radii go to zero, and that these estimators correspond precisely to minimizers of (1).

With this framework in mind, we study the uniqueness and non-uniqueness of minima for J(u). We assume throughout that the data are uniformly bounded, with

|z_i| \leq D    (6)

for all i. While this is the case in any practical observational situation, it is not consistent with the Bayesian formulation of Theorem 1, which assumes normally distributed errors. However, as noted in the introduction, the Bayesian formulation is not necessary for the following uniqueness results; it is only intended as a guiding framework in which to interpret the results and choose the regularization parameter.

Our first result is that J has a unique minimizer when all of the observational times are sufficiently small.

Theorem 2. There is a constant T_0, depending on N, D, ∥u_0∥ and σ, such that (1) has a unique global minimizer in H_0^1(0,1) if t_N < T_0.

The time T_0 also depends on the observation operator H and the observational covariance R, but we consider these to be fixed throughout, and hence will not explicitly note this dependence. We will similarly not mention any dependence of constants on the functions f and r in (3), though this dependence can easily be deduced from the proofs if desired.

The theorem is proved in Section 6 by first observing that all potential minimizers are contained in a fixed ball B ⊂ H_0^1(0,1), then showing that the cost functional is convex in B as long as the observational times are small enough that nonlinear effects are not yet dominant. This differs from the uniqueness result in [2] because in the discrete-time case there are non-vanishing contributions to the cost functional even as t_N → 0, whereas in the continuous case the observational term

\int_0^T |Hy(t) - z(t)|^2\,dt

vanishes in the T = 0 limit. For this reason we need to compute the second variation of the cost functional. In the continuous case the Euler–Lagrange equation can be expressed as a fixed-point equation for a nonlinear map that is a contraction when T is sufficiently small, but this contractive property is easily seen to fail in the discrete case, even for linear equations.

We next show that it is possible to obtain a uniqueness result for any set of observational times, provided the observational covariance is sufficiently small.

Theorem 3. There is a constant σ_0 > 0, depending on N, D, ∥u_0∥ and t_N, such that (1) has a unique global minimizer in H_0^1(0,1) if σ < σ_0.

We will see explicitly in (15) how f″ and r″ can lead to non-convexity in J. The general idea behind the preceding uniqueness theorems, which are proved in Section 6, is to determine under what conditions these nonlinear effects can be controlled by the linear term coming from the Gaussian prior distribution.

It will be seen in the proofs below that Theorems 2 and 3 (as well as the uniqueness result in [2]) in fact establish the stronger result that the cost functional has a unique critical point in the closed subset B ⊂ H_0^1 (described in Lemma 6) that necessarily contains any global minimizers. This observation could be useful in implementing a gradient descent method, because it says that one can avoid spurious local minima by ensuring that the algorithm starts in the bounded region given by Lemma 6, where J is known to be convex.

Our final result, which is proved in Section 7, shows that the behavior of J can be rather complicated in general.

Theorem 4. Consider a reaction–diffusion equation y_t = y_{xx} + r(y) where r(0) = r′(0) = 0 and r″(0) ≠ 0. Let H denote projection onto the first Fourier coefficient, and set R = 1. Then for any positive integer q and times t_1 < ⋯ < t_N there exist data {z_i} and a prior u_0 such that u ≡ 0 is a critical point of J with Morse index greater than or equal to q.

Thus there are cases in which J has at least two critical points, one of which is very far from being a minimizer, in the sense that it has large Morse index.

3. The variational framework

We start by deriving the Euler–Lagrange equation for the variational problem (1) in the space V = H_0^1(0,1). We also compute the second variation of the cost functional, as it will be needed in proving the uniqueness theorems.

We first recall that y denotes the unique solution to (3) with Dirichlet boundary conditions and y(0) = u. The variation of y with respect to the initial value u, in the v-direction, is denoted η := Dy(u)v, and satisfies the initial value problem

\eta_t + \left(f'(y)\eta\right)_x = \eta_{xx} + r'(y)\eta, \qquad \eta(0) = v.    (7)

Similarly, the second variation of y is denoted ω := D²y(u)(v,v) and satisfies

\omega_t + \left(f'(y)\omega + f''(y)\eta^2\right)_x = \omega_{xx} + r'(y)\omega + r''(y)\eta^2, \qquad \omega(0) = 0.    (8)

We observe that ω ≡ 0 if f″ and r″ vanish, which happens precisely when the forward equation is linear.

We let p denote the solution to the adjoint equation

-p_t - f'(y)p_x = p_{xx} + r'(y)p, \qquad p(t_N) = 0    (9)

with a discontinuous jump

p(t_i^+) - p(t_i^-) = H^*R^{-1}\left(Hy(t_i) - z_i\right)    (10)

prescribed at each observation time t_i. The definition is such that

\langle p_t, \eta\rangle + \langle p, \eta_t\rangle = 0    (11)

for all t ≠ t_i. The first variation of the cost functional (1) can thus be written

DJ(u)(v) = \sum_{i=1}^N \left\langle \eta(t_i),\, H^*R^{-1}\left(Hy(t_i) - z_i\right)\right\rangle + \frac{1}{\sigma^2}\langle v, u - u_0\rangle_V,

and from the definition of p we obtain

\sum_{i=1}^N \left\langle \eta(t_i), H^*R^{-1}\left(Hy(t_i) - z_i\right)\right\rangle = \sum_{i=1}^N \left\langle \eta(t_i), p(t_i^+) - p(t_i^-)\right\rangle = -\langle \eta(0), p(0)\rangle - \sum_{i=1}^N \left[\left\langle \eta(t_i), p(t_i^-)\right\rangle - \left\langle \eta(t_{i-1}), p(t_{i-1}^+)\right\rangle\right],

where we have set t_0 = 0. Then (11) implies

\left\langle \eta(t_i), p(t_i^-)\right\rangle - \left\langle \eta(t_{i-1}), p(t_{i-1}^+)\right\rangle = \int_{t_{i-1}}^{t_i} \frac{d}{dt}\langle p, \eta\rangle\, dt = 0


for each i, hence

DJ(u)v = -\langle v, p(0)\rangle + \sigma^{-2}\langle v, u - u_0\rangle_V.

Integrating the first term by parts, we arrive at the following.

Proposition 1. The V-gradient of J is given by

DJ(u) = \Delta_D^{-1} p(0) + \sigma^{-2}(u - u_0).    (12)

To better understand this result, we emphasize that the solution to the adjoint equation depends on y and hence on the initial condition, u. With this dependence explicitly written as p[y(u)], the Euler–Lagrange equation can be viewed as a fixed-point equation for the map

u \mapsto u_0 - \sigma^2 \Delta_D^{-1} p[y(u)](0).    (13)

Proceeding similarly for the second variation, we find

D^2 J(u)(v,v) = \sum_{i=1}^N \left[\left\langle \omega(t_i), H^*R^{-1}\left(Hy(t_i) - z_i\right)\right\rangle + \|R^{-1/2}H\eta(t_i)\|^2\right] + \frac{1}{\sigma^2}\|v\|_V^2    (14)

and

\sum_{i=1}^N \left\langle \omega(t_i), H^*R^{-1}\left(Hy(t_i) - z_i\right)\right\rangle = -\sum_{i=1}^N \left[\left\langle \omega(t_i), p(t_i^-)\right\rangle - \left\langle \omega(t_{i-1}), p(t_{i-1}^+)\right\rangle\right]

because ω(0) = 0. Using (8) and (9) and integrating by parts, we find that

\langle p_t, \omega\rangle + \langle p, \omega_t\rangle = \left\langle r''(y)\eta^2, p\right\rangle + \left\langle f''(y)\eta^2, p_x\right\rangle,

with the following consequence.

Proposition 2. The Hessian of J is given by

D^2 J(u)(v,v) = \int_0^{t_N} \left[\left\langle r''(y)\eta^2, p\right\rangle + \left\langle f''(y)\eta^2, p_x\right\rangle\right] dt + \sum_{i=1}^N \|R^{-1/2}H\eta(t_i)\|^2 + \frac{1}{\sigma^2}\|v\|_V^2.    (15)

We observe that this is positive definite (hence J is convex) if f″ = r″ = 0, even though the Euler–Lagrange map in (13) may fail to be a contraction in that case.
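Proposition 1 is exactly what a practical 4D-Var implementation uses: one backward integration of (9), with the jump (10) injected at each observation time, yields the full gradient. The sketch below records only this bookkeeping; the time-steppers forward and adjoint_step, the observation maps H and Ht, and the solver for −∆_D are assumed building blocks supplied by the user.

```python
import numpy as np

# Assemble the V-gradient (12) by one backward sweep of the adjoint equation.
# All callables are assumptions: forward(u, t) solves (3); adjoint_step(p, s, t)
# integrates (9) from time t down to time s; H/Ht apply the observation
# operator and its adjoint; neg_lap_solve applies (-Delta_D)^{-1}.
def v_gradient(u, u0, times, data, sigma,
               forward, adjoint_step, H, Ht, R_inv, neg_lap_solve):
    ys = {t: forward(u, t) for t in times}
    p = np.zeros_like(u)                     # terminal condition p(t_N) = 0
    t_hi = None
    for t, z in sorted(zip(times, data), key=lambda tz: -tz[0]):
        if t_hi is not None:
            p = adjoint_step(p, t, t_hi)     # integrate (9) down to t_i
        p = p - Ht(R_inv @ (H(ys[t]) - z))   # jump (10): p(t-) = p(t+) - H*R^{-1}(...)
        t_hi = t
    p = adjoint_step(p, 0.0, t_hi)           # continue down to t = 0
    # (12): DJ(u) = Delta_D^{-1} p(0) + sigma^{-2}(u - u0), with Delta_D < 0.
    return -neg_lap_solve(p) + (u - u0) / sigma**2
```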

4. Analytic preliminaries

In this section we review some standard analysis for quasilinear evolution equations. We restrict our attention to equations that admit global solutions for all initial conditions, as this guarantees the cost functional is defined on all of V.

Proposition 3. Suppose r and f are locally Lipschitz and (4) is satisfied. Then for any initial condition u ∈ H_0^1(0,1), (3) has a unique classical solution on [0,1] × (0,∞).

This result is a direct consequence of the material in Chapter 3 of [25]; for the sake of completeness we verify some of the necessary details here.

Proof. The local existence and uniqueness follow from Theorem 3.3.3 of [25], because the map y ↦ r(y) − f′(y)y_x is locally Lipschitz from H_0^1 to L². This yields a solution y ∈ C([0,T); L²), with y(t) ∈ H_0^1 ∩ H² for all t ∈ (0,T). Moreover, from Theorem 3.5.2 of [25], the map t ↦ y_t ∈ H_0^1 is locally Hölder continuous. In particular this implies y and y_t are continuous in both x and t. We also know that y ∈ H², so by the Sobolev embedding theorem y_x ∈ H^1 is Hölder continuous. We then have for each fixed t that

y_{xx} = y_t + f(y)_x - r(y)

is in C^δ[0,1] for some δ > 0, hence by elliptic regularity y(t) ∈ C^{2+δ}[0,1]. Thus y is a classical solution of (3). The long-time existence claim follows from Corollary 3.3.5 of [25] together with the pointwise bound that will follow in Lemma 1. □

We now gather some estimates on y, η and p that will be needed in proving the uniqueness theorems. The first of these shows that y is uniformly bounded on any finite time interval.

Lemma 1. Suppose y(t) solves (3) for t ∈ (0,T), with ∥y(0)∥_∞ ≤ A. Then

\|y(t)\|_\infty \leq B    (16)

for any t < T, where B depends on A and T.

Proof. Let ψ_+ and ψ_− solve the initial value problems

\psi_+' = |r(\psi_+)|, \qquad \psi_+(0) = A,
\psi_-' = -|r(\psi_-)|, \qquad \psi_-(0) = -A.

Then the parabolic maximum principle (cf. Theorem 1 of [26]) guarantees that

\psi_-(t) \leq y(x,t) \leq \psi_+(t).

From (4) we know that ψ_± exist and are continuous for all t ≥ 0, so we complete the proof by setting

B := \max_{0\leq t\leq T} \max\{-\psi_-(t),\, \psi_+(t)\}. □

Since f and r are of class C², for any given value of A and k ∈ {0,1,2} the quantities

R_k(A) := \sup_{|y|\leq B} |r^{(k)}(y)|    (17)

F_k(A) := \sup_{|y|\leq B} |f^{(k)}(y)|    (18)

are well defined, with B as in the proof of Lemma 1.

We next estimate the L⁴ norm of η. To simplify notation, we observe that \|\eta\|_{L^4(0,1)} = \|\eta^2\|^{1/2}.

Lemma 2. Suppose η(t) solves (7) for t ∈ (0,T), and ∥y(0)∥_∞ ≤ A. Then

\|\eta^2(t)\| \leq \|v^2\| e^{\alpha t}    (19)

for any t < T, where α depends on A and T.

Proof. We differentiate and then integrate by parts to obtain

\frac{1}{4}\frac{d}{dt}\int_0^1 \eta^4\,dx = \int_0^1 \eta^3\left(\eta_{xx} - [f'(y)\eta]_x + r'(y)\eta\right) dx = \int_0^1 \left(-3\eta^2\eta_x^2 + 3f'(y)\eta^3\eta_x + r'(y)\eta^4\right) dx.

Then by the AM–GM inequality

\left|\eta^3\eta_x\right| \leq \frac{F_1(A)}{4}\,\eta^4 + \frac{1}{F_1(A)}\,\eta^2\eta_x^2,

so we find that

\frac{d}{dt}\int_0^1 \eta^4\,dx \leq \left(4R_1(A) + 3F_1(A)^2\right)\int_0^1 \eta^4\,dx.

The result follows from Gronwall's inequality. □

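Before turning to the adjoint equation, note that the bound of Lemma 1 is completely explicit: B comes from two scalar ODEs and no PDE solve is needed. A minimal numerical check, with the assumed reaction term r(y) = sin y used earlier:

```python
import numpy as np
from scipy.integrate import solve_ivp

# The constant B of Lemma 1, computed from the barrier ODEs for psi_+ and
# psi_-. The reaction term r(y) = sin y is an assumed example.
r = np.sin
A, T = 1.0, 0.5
plus = solve_ivp(lambda t, y: np.abs(r(y)), (0.0, T), [A], max_step=1e-3)
minus = solve_ivp(lambda t, y: -np.abs(r(y)), (0.0, T), [-A], max_step=1e-3)
B = max(plus.y[0].max(), (-minus.y[0]).max())
print("every solution of (3) with |y(0)| <= 1 obeys |y(t)| <=", B, "for t <= 0.5")
```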

We finally turn to the adjoint equation. Invoking linearity, the solution can be expressed as p = p_1 + ⋯ + p_N, where p_i satisfies (9) with terminal condition p(t_N) = 0 and the single jump

p(t_i^+) - p(t_i^-) = H^*R^{-1}\left(Hy(t_i) - z_i\right).    (20)

Therefore it suffices to bound each p_i individually and sum the resulting estimates.

Lemma 3. Suppose p(t) solves (9), and ∥y(0)∥_∞ ≤ A. Then

\|p(t)\| \leq Ce^{\beta(t_N - t)}    (21)

for any t ≤ t_N, and

\int_0^{t_N} \|p_x(t)\|\,dt \leq C\sqrt{t_N}\, e^{2\beta t_N},    (22)

where β depends on A and t_N, and C depends on N, D, A and t_N.

Proof. Differentiating and applying the AM–GM inequality, as in the proof of Lemma 2, we have

\frac{1}{2}\frac{d}{dt}\|p\|^2 = \left\langle p,\, p_{xx} + r'(y)p + f'(y)p_x\right\rangle \leq -\|p_x\|^2 + R_1(A)\|p\|^2 + F_1(A)\|p\|\|p_x\| \leq \left(R_1(A) + \frac{F_1(A)^2}{4}\right)\|p\|^2,

and so an application of Gronwall's inequality to the function ∥p(t_i − t)∥² yields

\|p_i(t)\| \leq \left\|H^*R^{-1}\left(Hy(t_i) - z_i\right)\right\| e^{\beta(t_i - t)}    (23)

for any t ≤ t_i, with β = R_1(A) + F_1(A)²/4. We next recall that y(t) is bounded uniformly (and hence in L²), so

\left\|H^*R^{-1}\left(Hy(t_i) - z_i\right)\right\| \leq C',

where C′ depends on A and t_N (through Lemma 1) and D. To complete the proof of (21) we simply note that p_i(t) = 0 for t > t_i, then let C = NC′.

With a different choice of constants in the AM–GM inequality, we obtain

\frac{1}{2}\frac{d}{dt}\|p\|^2 \leq -\frac{1}{2}\|p_x\|^2 + \left(R_1(A) + \frac{F_1(A)^2}{2}\right)\|p\|^2

and subsequently, letting γ = 2R_1(A) + F_1(A)²,

\|p_x\|^2 \leq \frac{d}{dt}\left(e^{\gamma t}\|p\|^2\right).

Integrating, we find

\int_0^{t_i} \|p^i_x(t)\|^2\,dt \leq e^{\gamma t_i}\|p_i(t_i)\|^2 - \|p_i(0)\|^2 \leq e^{\gamma t_i} C'^2

for each i. Now from the Cauchy–Schwarz inequality,

\int_0^{t_N} \|p_x(t)\|\,dt \leq \sum_{i=1}^N \sqrt{t_i}\left(\int_0^{t_i} \|p^i_x(t)\|^2\,dt\right)^{1/2} \leq NC'\sqrt{t_N}\, e^{2\beta t_N},

where we have used the fact that γ ≤ 4β. □

5. The Bayesian formulation

Before proving Theorem 1 we elaborate on the meaning of the Gaussian prior measure µ_0 = N(u_0, −σ²∆_D^{-1}), following the presentation of [1] throughout.

We first note that the covariance operator C_0 = −σ²∆_D^{-1} has eigenvalues γ_n = (σ/nπ)², with normalized eigenfunctions φ_n(x) = √2 sin(nπx). Then we can express a random variable u ∼ µ_0 using the Karhunen–Loève expansion:

u = u_0 + \sum_{n=1}^\infty \frac{\sqrt{2}\,\sigma}{n\pi}\,\xi_n \sin(n\pi x),

where {ξ_n} is an i.i.d. sequence of N(0,1) random variables. This means the nth Fourier coefficient of u is distributed according to N(a_n, (σ/nπ)²), where a_n is the nth Fourier coefficient of the prior mean, u_0. It follows that

\|u - u_0\|^2 = \sum_{n=1}^\infty \frac{\sigma^2 \xi_n^2}{(n\pi)^2}

and so

E\|u - u_0\|^2 = \frac{\sigma^2}{6}.

Thus σ² measures the expected value of ∥u − u_0∥².
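Both the expansion and the identity E∥u − u_0∥² = σ²/6 are easy to confirm by simulation. A minimal sketch, truncating the series at an assumed 200 modes:

```python
import numpy as np

# Sample the prior mu_0 = N(u0, -sigma^2 Delta_D^{-1}) by truncating the
# Karhunen-Loeve expansion, then Monte Carlo the identity E||u - u0||^2 = sigma^2/6.
rng = np.random.default_rng(1)
sigma, modes, samples = 0.5, 200, 2000
x = np.linspace(0.0, 1.0, 513)
n = np.arange(1, modes + 1)
phi = np.sqrt(2.0) * np.sin(np.pi * np.outer(n, x))   # eigenfunctions of C_0
xi = rng.standard_normal((samples, modes))
fluct = (xi * (sigma / (np.pi * n))) @ phi            # u - u0 on the grid
sq_norms = np.trapz(fluct**2, x, axis=1)              # squared L^2 norms
print(sq_norms.mean(), "vs", sigma**2 / 6)            # agree up to MC/truncation error
```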

We now give some bounds that will be useful in the proof of Theorem 1.

Lemma 4. Suppose r(y) is uniformly Lipschitz. Then there exist positive constants a and b so that

\|y(t)\|^2 \leq e^{at}\left(\|y(0)\|^2 + bt\right)    (24)

and

\int_0^t \|y_x(s)\|^2\,ds \leq \|y(0)\|^2 - \|y(t)\|^2 + a\int_0^t \|y(s)\|^2\,ds + bt    (25)

for all t ≥ 0 and any solution y(t) to (3).

Proof. Differentiating, we have

\frac{1}{2}\frac{d}{dt}\|y\|^2 = \left\langle y,\, y_{xx} + r(y) - f(y)_x\right\rangle.

Letting g(y) be an antiderivative of yf′(y), we find that

\langle y, f(y)_x\rangle = \int_0^1 g(y)_x\,dx

vanishes by the fundamental theorem of calculus, so

\frac{d}{dt}\|y\|^2 \leq -\|y_x\|^2 + 2\langle y, r(y)\rangle.

It follows immediately from the Lipschitz condition that |yr(y)| ≤ K|y|² + |r(0)||y|, which implies 2|yr(y)| ≤ a|y|² + b for some a and b. Then (24) is a consequence of Gronwall's inequality, and (25) is obtained by integrating from 0 to t. □

Lemma 5. Suppose r(y) and f′(y) are uniformly Lipschitz. Then for any ρ > 0 there exists a positive constant L so that

\|y_1(t) - y_2(t)\|^2 \leq L\|u_1 - u_2\|^2

for all t ≤ t_N, provided y_1(t) and y_2(t) satisfy (3) with max{∥y_1(0)∥, ∥y_2(0)∥} < ρ.

Proof. For convenience we let K_r and K_f denote the Lipschitz constants of r and f′, respectively. Differentiating and then integrating by parts as in the proof of Lemma 4, we have

\frac{1}{2}\frac{d}{dt}\|y_1 - y_2\|^2 = \left\langle y_1 - y_2,\, (y_1 - y_2)_{xx} + r(y_1) - r(y_2) - f(y_1)_x + f(y_2)_x\right\rangle \leq -\|(y_1 - y_2)_x\|^2 + K_r\|y_1 - y_2\|^2 + \left\langle y_1 - y_2,\, f(y_2)_x - f(y_1)_x\right\rangle.

For the final term we write

f(y_1)_x - f(y_2)_x = f'(y_1)(y_1 - y_2)_x + \left(f'(y_1) - f'(y_2)\right) y_{2x}

and thus obtain

\left|\left\langle y_1 - y_2,\, f(y_2)_x - f(y_1)_x\right\rangle\right| \leq \left[K_f\left(\|y_{1x}\| + \|y_{2x}\|\right) + |f'(0)|\right] \|y_1 - y_2\|\,\|(y_1 - y_2)_x\|.

Then after an application of the AM–GM inequality, we find that

\frac{1}{2}\frac{d}{dt}\|y_1 - y_2\|^2 \leq \left[K_r + \frac{1}{4}\left(K_f(\|y_{1x}\| + \|y_{2x}\|) + |f'(0)|\right)^2\right] \|y_1 - y_2\|^2 = \alpha(t)\|y_1 - y_2\|^2,

where we have defined α(t) to be the term in brackets on the right-hand side. From (25) we know that ∫_0^t α(s) ds is bounded above by a constant depending only on ∥y_1(0)∥, ∥y_2(0)∥ and t_N, so the result follows from Gronwall's inequality. □

Proof of Theorem 1. This is a straightforward consequence of Corollary 4.4 of [1]. To see this we must show that: (i) L²(0,1) has full measure under µ_0; (ii) for every ϵ > 0 there exists M ∈ R such that

\sum_{i=1}^N \left|R^{-1/2}Hy(t_i)\right|^2 \leq \exp\left(\epsilon\|y(0)\|^2 + M\right)

whenever y(t) is a solution to (3); and (iii) for every ρ > 0 there exists L ∈ R such that

\sum_{i=1}^N \left|R^{-1/2}H\left(y_1(t_i) - y_2(t_i)\right)\right|^2 \leq L\|y_1(0) - y_2(0)\|^2

whenever y_1(t) and y_2(t) satisfy (3) with max{∥y_1(0)∥, ∥y_2(0)∥} < ρ.

To establish (i) we use Lemma 6.25 of [1], which says that any function u ∼ µ_0 is almost surely α-Hölder continuous for any α < 1/2. In particular, this implies u is almost surely contained in L²(0,1), hence µ_0[L²(0,1)] = 1. Since the observation operator H is bounded on L², (iii) follows easily from Lemma 5. For (ii) it suffices to prove

\|y(t)\|^2 \leq \exp\left(\epsilon\|u\|^2 + M\right)

for any t ≤ t_N. Thus fixing ϵ > 0 and defining e^M = max{bt_N e^{at_N}, ϵ^{-1}e^{at_N}}, we have from Lemma 4 that

\|y(t)\|^2 \leq e^M\left(1 + \epsilon\|u\|^2\right) \leq e^M e^{\epsilon\|u\|^2},

as required. □

6. The uniqueness theorems

Our main tool for proving Theorems 2 and 3 will be the second variation formula (15), together with the following a priori estimate for minimizers of J. We recall that V = H_0^1(0,1), with the norm ∥u∥_V = ∥u_x∥.

Lemma 6. Let u* achieve the infimum of the cost functional (1). Then

\|u^*\|_V \leq A,    (26)

where A depends on N, D, t_N, ∥u_0∥ and σ. It is clear from the proof that A can be assumed to be nondecreasing with respect to σ.

Proof. Since u* is a minimizer it satisfies J(u*) ≤ J(0). Letting y(t) solve (3) with y(0) = 0, Lemma 1 implies that y(t) is uniformly bounded for t ≤ t_N. Therefore

\|u^*\|_V^2 \leq 2\sigma^2 J(u^*) \leq \sigma^2 \sum_{i=1}^N \left|R^{-1/2}\left(Hy(t_i) - z_i\right)\right|^2 + \|u_0\|_V^2

is bounded above as claimed. □

We now use the estimates of Section 4 to prove that, under the conditions of Theorems 2 and 3, J is convex on the ball ∥u∥_V ≤ A.

Proof of Theorem 2. From Lemmas 2 and 3 we have

\left|\left\langle r''(y)\eta^2, p\right\rangle\right| \leq CR_2\|v^2\|\,e^{\alpha t + \beta(t_N - t)}

and

\int_0^{t_N} \left|\left\langle f''(y)\eta^2, p_x\right\rangle\right| dt \leq CF_2\|v^2\|\sqrt{t_N}\, e^{(\alpha + 2\beta)t_N}.

Combining these estimates, we find that

\int_0^{t_N} \left|\left\langle r''(y)\eta^2, p\right\rangle + \left\langle f''(y)\eta^2, p_x\right\rangle\right| dt \leq \Gamma\|v\|_V^2\sqrt{t_N},    (27)

where Γ depends on A (from Lemma 6), t_N, N and D. It is clear that Γ remains bounded above as t_N → 0, so we can choose t_N small enough that Γ√t_N < σ^{-2}. We thus have

\int_0^{t_N} \left|\left\langle r''(y)\eta^2, p\right\rangle + \left\langle f''(y)\eta^2, p_x\right\rangle\right| dt < \frac{1}{\sigma^2}\|v\|_V^2,

and the convexity of J follows from (15). □

Proof of Theorem 3. We start with (27) and observe that Γ remains bounded above as σ → 0. Therefore it is possible to choose σ small enough that σ^{-2} > Γ√t_N, and the result again follows from (15). □

7. The non-uniqueness theorem

Turning now to the proof of Theorem 4, we must establish that u = 0 is a critical point of J, and that the Hessian D²J(0) has at least q negative eigenvalues. The key to the proof is the observation that the Euler–Lagrange equation depends on both the data and the prior, whereas the Hessian is independent of the prior. Thus we can first construct data to ensure D²J(0) has the required number of negative eigenvalues, and then choose the prior mean to ensure that 0 is a critical point of J.

Proof of Theorem 4. The hypothesis r(0) = 0 ensures that y(t) ≡ 0 is the unique solution of (3) with y(0) = 0. Then because r′(0) = 0, the linearized equation reduces to the heat equation, η_t = η_{xx}, and the adjoint equation becomes the backward heat equation, −p_t = p_{xx}.

We compute the Hessian of J in the direction of the first q Fourier modes, setting v_n = sin(nπx) for 1 ≤ n ≤ q, so ∥v_n∥²_V = n²π²/2. The corresponding solution to the linearized forward equation is

\eta(x,t) = e^{-n^2\pi^2 t}\sin(n\pi x),

and so

\sum_{i=1}^N \left|R^{-1/2}H\eta(t_i)\right|^2 \leq \sum_{i=1}^N e^{-2\pi^2 t_i}.

For each observation z_i ∈ R we have H*z_i = z_i sin(πx),


hence the solution to the adjoint equation is given by

p(x,t) = \sum_{\{i\,:\, t < t_i\}} z_i e^{-\pi^2(t_i - t)} \sin(\pi x)

for t ≠ t_i. We thus find that

\left\langle r''(y)\eta^2, p\right\rangle = r''(0)\sum_{\{i\,:\, t < t_i\}} z_i e^{-2n^2\pi^2 t} e^{-\pi^2(t_i - t)} \left\langle \sin^2(n\pi x), \sin(\pi x)\right\rangle = \frac{4n^2 r''(0)}{\pi(4n^2 - 1)} \sum_{\{i\,:\, t < t_i\}} z_i e^{-\pi^2\left[(2n^2 - 1)t + t_i\right]}.

Integrating, we have

\int_0^{t_N} \sum_{\{i\,:\, t < t_i\}} z_i e^{-\pi^2\left[(2n^2 - 1)t + t_i\right]}\, dt = \frac{1}{(2n^2 - 1)\pi^2} \sum_{i=1}^N z_i e^{-2n^2\pi^2 t_i}\left(e^{(2n^2 - 1)\pi^2 t_i} - 1\right),

and so we find from (15) that

D^2 J(0)(v_n, v_n) \leq \frac{4n^2 r''(0)}{(2n^2 - 1)(4n^2 - 1)\pi^3} \sum_{i=1}^N z_i e^{-2n^2\pi^2 t_i}\left(e^{(2n^2 - 1)\pi^2 t_i} - 1\right) + \sum_{i=1}^N e^{-2\pi^2 t_i} + \frac{n^2\pi^2}{2\sigma^2}.

All N terms in the first summation are positive (with the exception of the z_i coefficients) and decreasing with respect to n, so if we choose the {z_i} so that r″(0)z_i is sufficiently negative that D²J(0)(v_q, v_q) < 0, the Hessian will also be negative for all v_n with 1 ≤ n ≤ q. To complete the proof, we choose the prior

u_0 := \sigma^2 \Delta_D^{-1} p(0).

It follows immediately from (12) that u = 0 is a critical point of J. □
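The sign condition used in the proof is easy to evaluate numerically, with the constants as reconstructed above. The observation times, data, r″(0) and σ in the following sketch are all assumed sample values.

```python
import numpy as np

# Evaluate the upper bound on D^2 J(0)(v_n, v_n) derived above, with r''(0),
# sigma, the times t_i and the data z_i all assumed sample values.
r2, sigma, q = 1.0, 1.0, 5
t = np.array([0.05, 0.1, 0.2])
z = np.full(t.shape, -2.0e5)            # 'sufficiently negative' data

def hessian_bound(n):
    c = 4.0 * n**2 * r2 / ((2 * n**2 - 1) * (4 * n**2 - 1) * np.pi**3)
    data_term = c * np.sum(z * np.exp(-2 * n**2 * np.pi**2 * t)
                           * (np.exp((2 * n**2 - 1) * np.pi**2 * t) - 1.0))
    return (data_term + np.sum(np.exp(-2 * np.pi**2 * t))
            + n**2 * np.pi**2 / (2.0 * sigma**2))

print([hessian_bound(n) < 0 for n in range(1, q + 1)])   # expect all True
```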

Acknowledgments

The author would like to thank Damon McDougall for numerous enlightening conversations throughout the preparation of this work. This research has been supported by the Office of Naval Research under MURI grant N00014-11-1-0087 and the National Science Foundation under grant DMS-1312906.

References

[1] A.M. Stuart, Inverse problems: a Bayesian perspective, Acta Numer. 19 (2010) 451–559.
[2] L.W. White, A study of uniqueness for the initialization problem for Burgers' equation, J. Math. Anal. Appl. 172 (2) (1993) 412–431.
[3] C. Castro, F. Palacios, E. Zuazua, An alternating descent method for the optimal control of the inviscid Burgers equation in the presence of shocks, Math. Models Methods Appl. Sci. 18 (3) (2008) 369–416.

[4] F.-X. Le Dimet, O. Talagrand, Variational algorithms for analysis and assimilation of meteorological observations: theoretical aspects, Tellus A 38A (2) (1986) 97–110.
[5] J. Lundvall, V. Kozlov, P. Weinerfelt, Iterative methods for data assimilation for Burgers' equation, J. Inverse Ill-Posed Probl. 14 (5) (2006) 505–535.
[6] A. Noack, A. Walther, Adjoint concepts for the optimal control of Burgers equation, Comput. Optim. Appl. 36 (1) (2007) 109–133.
[7] C.R. Hagelberg, A.F. Bennett, D.A. Jones, Local existence results for the generalized inverse of the vorticity equation in the plane, Inverse Problems 12 (4) (1996) 437–454.
[8] T. Hohage, M. Pricop, Nonlinear Tikhonov regularization in Hilbert scales for inverse boundary value problems with random noise, Inverse Probl. Imaging 2 (2) (2008) 271–290.
[9] A. Apte, D. Auroux, M. Ramaswamy, Variational data assimilation for discrete Burgers equation, in: Proceedings of the Eighth Mississippi State–UAB Conference on Differential Equations and Computational Simulations, Electron. J. Differ. Equ. Conf., vol. 19, Southwest Texas State Univ., San Marcos, TX, 2010, pp. 15–30.
[10] G. Chavent, K. Kunisch, On weakly nonlinear inverse problems, SIAM J. Appl. Math. 56 (2) (1996) 542–572.
[11] G. Chavent, Nonlinear Least Squares for Inverse Problems: Theoretical Foundations and Step-by-Step Guide for Applications, Scientific Computation, Springer, New York, 2009.
[12] J.N. Franklin, On Tikhonov's method for ill-posed problems, Math. Comp. 28 (1974) 889–907.
[13] C.W. Groetsch, The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind, Research Notes in Mathematics, vol. 105, Pitman (Advanced Publishing Program), Boston, MA, 1984.
[14] V.A. Morozov, Methods for Solving Incorrectly Posed Problems, Springer-Verlag, New York, 1984. Translated from the Russian by A.B. Aries; translation edited by Z. Nashed.
[15] A.N. Tikhonov, V.Y. Arsenin, Solutions of Ill-Posed Problems, Scripta Series in Mathematics, V.H. Winston & Sons, Washington, DC; John Wiley & Sons, New York–Toronto–London, 1977. Translated from the Russian; preface by translation editor Fritz John.
[16] H.W. Engl, M. Hanke, A. Neubauer, Regularization of Inverse Problems, Mathematics and its Applications, vol. 375, Kluwer Academic Publishers Group, Dordrecht, 1996.
[17] M.V. Klibanov, A class of inverse problems for nonlinear parabolic equations, Sibirsk. Mat. Zh. 27 (5) (1986) 83–94, 205.
[18] M.V. Klibanov, Global uniqueness of a multidimensional inverse problem for a nonlinear parabolic equation by a Carleman estimate, Inverse Problems 20 (4) (2004) 1003–1032.
[19] Y. Luchko, W. Rundell, M. Yamamoto, L. Zuo, Uniqueness and reconstruction of an unknown semilinear term in a time-fractional reaction–diffusion equation, Inverse Problems 29 (6) (2013) 065019, 16 pp.
[20] F.T. Akyildiz, S. Tatar, S. Ulusoy, Existence and uniqueness for a nonlinear inverse reaction–diffusion problem with a nonlinear source in higher dimensions, Math. Methods Appl. Sci. 36 (17) (2013) 2397–2402.
[21] M.S. Pilant, W. Rundell, An inverse problem for a nonlinear parabolic equation, Comm. Partial Differential Equations 11 (4) (1986) 445–457.
[22] H. Egger, H.W. Engl, M.V. Klibanov, Global uniqueness and Hölder stability for recovering a nonlinear source term in a parabolic equation, Inverse Problems 21 (1) (2005) 271–290.
[23] A.G. Ramm, Inverse Problems: Mathematical and Analytical Techniques with Applications to Engineering, Springer, New York, 2005. With a foreword by Alan Jeffrey.
[24] M. Dashti, K.J.H. Law, A.M. Stuart, J. Voss, MAP estimators and their consistency in Bayesian nonparametric inverse problems, Inverse Problems 29 (9) (2013) 095017, 27 pp.
[25] D. Henry, Geometric Theory of Semilinear Parabolic Equations, Lecture Notes in Mathematics, vol. 840, Springer-Verlag, Berlin, 1981.
[26] S. Kaplan, On the growth of solutions of quasi-linear parabolic equations, Comm. Pure Appl. Math. 16 (1963) 305–330.