CHAPTER 5
A Simple Application of the Invariance Theorems
The purpose of this chapter is simply to illustrate, on a simple example, the approximation ideas (as well as some of the needed assumptions) which we will develop further in the subsequent chapters. With the exception of one result at the end of the chapter, no proofs are given. The chapter begins with an example showing how the concepts of weak convergence of measures can give us very useful generalizations of the classical limit theorems for sequences of vector-valued random variables. We then apply this result to a problem arising in the numerical analysis of a simple two-point boundary value problem for a second-order ordinary differential equation. We take a finite difference approximation to the equation. If the approximation is chosen with a little care, then the resulting finite difference equation has a probabilistic interpretation, which we can exploit via weak convergence methods to get the convergence of the finite difference solution to the true solution.
5.1 A Functional Limit Theorem

We will try to illustrate the power of the ideas of weak convergence of measures in applications to problems in the numerical analysis of elliptic equations. First, we consider a special and simple convergence problem for a sequence of random variables. For each n, let {ψ_k^n, k = 0, 1, ...} denote a sequence of independent, identically distributed random variables which satisfy, for some λ ∈ (0, 1),

P{ψ_k^n = 1/n} = P{ψ_k^n = -1/n} = λ/2,
P{ψ_k^n = 0} = 1 - λ,

and define the random walk {ξ_k^n} by

ξ_0^n = 0,    ξ_{k+1}^n = ξ_k^n + ψ_k^n.

Let n → ∞, but keep k/n fixed at a constant t. Then the classical central limit theorem immediately yields that n^{1/2} ξ_k^n converges in distribution to a normally distributed random variable with zero mean and variance λt. As a consequence of this convergence, we know that Ef(n^{1/2} ξ_k^n) → Ef(X) as n → ∞, k/n = t, where X is normally distributed with mean 0 and variance λt and f(·) is any bounded almost everywhere continuous function. The value of Ef(X) can be approximated by Ef(n^{1/2} ξ_k^n) if desired, although in this case the limit will often be easier to calculate.

Now let us consider a strengthening of the convergence result. Define t_k^n = k/n^2 and Δt_k^n = 1/n^2, and define the piecewise constant process ξ^n(·) (with paths in D[0, ∞)) by

ξ^n(t) = ξ_k^n    on [t_k^n, t_{k+1}^n).
By a straightforward calculation, it can be shown that there is a real number K for which

E|ξ^n(t + Δ) - ξ^n(t)|^4 = E| Σ_{k: t ≤ t_k^n < t+Δ} ψ_k^n |^4 ≤ K[λΔ + O(1/n^2)]^2.    (1.1)
By Theorem 2.4.2, {ξ^n(·)} is tight† on D[0, ∞), and if ξ(·) is a separable process to which a subsequence converges in distribution, then ξ(·) is continuous w.p. 1. The limit is actually a Wiener process with covariance λt. There are several ways to show this. For the first method, we start by calculating the limit of the characteristic functions of a weakly convergent subsequence.
† It is only necessary to check the conditions of the theorem for Δ ≥ 1/2n^2.
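As a concrete (and entirely illustrative) check on the convergence just described, the walk and its interpolation are easy to simulate. The sketch below is ours, not the book's; the values of λ, n, t, and the number of trials are arbitrary choices. It estimates the mean and variance of ξ^n(t) by Monte Carlo and compares the variance with λt, the covariance of the limiting Wiener process.

```python
import random

# Illustrative sketch (not from the text): simulate xi^n(t), which takes
# k = floor(t * n^2) i.i.d. steps psi of size +/- 1/n (each with probability
# lam/2) or 0 (with probability 1 - lam), and check Var[xi^n(t)] ~ lam * t.

def xi_n_at(t, n, lam, rng):
    """Sample xi^n(t) = sum of the first floor(t * n^2) steps of the walk."""
    k = int(t * n * n)
    x = 0.0
    for _ in range(k):
        u = rng.random()
        if u < lam / 2.0:
            x += 1.0 / n
        elif u < lam:
            x -= 1.0 / n
    return x

rng = random.Random(0)
lam, n, t, trials = 0.4, 30, 1.0, 2000
samples = [xi_n_at(t, n, lam, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
print(mean, var)  # mean near 0, variance near lam * t = 0.4
```

Run for a mesh of time points, the same simulation traces out a piecewise constant path of ξ^n(·); the fourth-moment bound (1.1) is what keeps such paths from oscillating too wildly as n grows.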
For any integer m and any real 0 ≤ t_1 < t_2 < ... < t_m and real λ_1, ..., λ_m, we can get (via the weak convergence)

E exp i Σ_{j=1}^m λ_j [ξ^n(t_j) - ξ^n(t_{j-1})] → exp - Σ_{j=1}^m λ λ_j^2 (t_j - t_{j-1})/2.
The limit does not depend on the subsequence. Thus the multivariate distributions converge to those of said Wiener process. This and tightness imply the asserted convergence (Theorem 2.4.3). Often it is difficult to show a priori that the finite dimensional distributions converge to those of a specified process, and other methods must be used.

A second method does not require the fact that the multidimensional distributions converge, but uses some results in martingale theory. Note that (1.1) implies that {ξ^n(t)} and {|ξ^n(t)|^2} are uniformly integrable for any t ≥ 0, since E|ξ^n(t)|^4 is bounded uniformly in n for each t ≥ 0. Fix a particular weakly convergent subsequence and index it by n. Let ξ(·) denote the limit process; it is continuous w.p. 1. We will argue that the limit is a continuous martingale with quadratic variation λt, hence a Wiener process with covariance λt by Section 1.4.4. To see this, let q denote an arbitrary integer, let t_1, ..., t_q ≤ t be arbitrary real numbers, and let g(·) be a bounded real-valued continuous function on R^q. By a direct evaluation, we see that
Q_n ≡ E g(ξ^n(t_1), ..., ξ^n(t_q))(ξ^n(t + Δ) - ξ^n(t)) = 0.    (1.2)

The function on D[0, ∞) with values g(x(t_1), ..., x(t_q)) · (x(t + Δ) - x(t))^α at x(·) ∈ D[0, ∞), α = 1, 2, is not bounded, but owing to the uniform integrability of {|ξ^n(s)|^α}, α = 1, 2, for each s ≥ 0, and the weak convergence, we have

Q_n → E g(ξ(t_1), ..., ξ(t_q))(ξ(t + Δ) - ξ(t)),    (1.3)

which equals zero by (1.2). Equation (1.3) and the arbitrariness of q, t_1, ..., t_q, and g(·) imply that

E[ξ(t + Δ) - ξ(t) | ξ(s), s ≤ t] = 0,

and hence that ξ(·) is a continuous martingale with respect to some family of σ-algebras. Similarly, by the uniform integrability and weak convergence,
R_n ≡ E g(ξ^n(t_1), ..., ξ^n(t_q))(ξ^n(t + Δ) - ξ^n(t))^2 → E g(ξ(t_1), ..., ξ(t_q)) λΔ,    (1.4)
R_n → E g(ξ(t_1), ..., ξ(t_q))[ξ(t + Δ) - ξ(t)]^2.    (1.5)

Equations (1.4) and (1.5) and the arbitrariness of q, t_1, ..., t_q, and of g(·) imply that

E[(ξ(t + Δ) - ξ(t))^2 | ξ(s), s ≤ t] = λΔ.
Hence the quadratic variation is λt and ξ(·) is a Wiener process. Since the limit of any convergent subsequence is a Wiener process, the entire original sequence {ξ^n(·)} tends to a Wiener process in distribution.

A third method depends on the fact that there is a Wiener process w(·) and, for each n, an increasing sequence of random variables {τ_k^n} so that ψ_k^n has the distribution of w(τ_{k+1}^n) - w(τ_k^n) (see Breiman [B4]).

The convergence of the process ξ^n(·) to ξ(·) in distribution is considerably stronger and more useful than the convergence of ξ^n(t) to ξ(t) in distribution for each t ≥ 0. In particular, if g(·): D[0, ∞) → R is bounded and almost everywhere continuous with respect to the measure that ξ(·) induces on D[0, ∞), then Eg(ξ^n(·)) → Eg(ξ(·)). For example, let g(·) be the minimum of T and the first hitting time of a point x. The next section deals with a useful application to numerical analysis.

5.2 An Application to Numerical Analysis

To provide a simple introduction to some of the ideas and techniques of succeeding chapters, we consider a relatively simple problem of numerical analysis; namely, solving the differential equation
a(x)V_xx(x) + f(x)V_x(x) + k(x) = 0,    x ∈ (0, 1),
V(0) = α,    V(1) = β,    (2.1)

where

a(·) ≥ 0;    a(·), f(·), k(·) bounded, continuous;
inf_{0≤x≤1} (|a(x)| + |f(x)|) > 0.

The last line is clearly necessary (but not sufficient) if (2.1) is to have a bounded solution for each continuous k(·). For any real-valued function g(·) define

g^+(x) = max[0, g(x)],    g^-(x) = max[0, -g(x)].
Let h be a positive real number such that 1/h is an integer, and suppose that the stochastic differential equation

dX = f(X) dt + (2a(X))^{1/2} dw    (2.2)

has a unique solution in the sense of probability distributions. Let τ = inf{t: X(t) ∉ (0, 1)}. By Section 4.5, (2.1) is the equation formally satisfied by (if sup_{x∈(0,1)} E_x τ < ∞)

R(x) = E_x ∫_0^τ k(X(s)) ds + αP_x{X(τ) = 0} + βP_x{X(τ) = 1},    x ∈ (0, 1).    (2.3)
We will try to approximate the solution to (2.1) by a special finite difference method (finite difference interval h), which will have a very nice probabilistic interpretation. The solution is defined to be (2.3), so we will want to show that the approximate solutions tend to (2.3) as h → 0. Use the approximations
V_xx(x) → [V(x + h) - 2V(x) + V(x - h)]/h^2,    (2.4)

V_x(x) → [V(x + h) - V(x)]/h    if f(x) ≥ 0,
V_x(x) → [V(x) - V(x - h)]/h    if f(x) < 0.    (2.5)
The choice in (2.5) is very important for reasons which will soon be clear. Let V^h(·) denote the solution to the finite difference equations, and define

p^h(x, x ± h) = [a(x) + hf^±(x)]/Q_h(x),    x = 0, ±h, ...,
Δt^h(x) = h^2/Q_h(x),

where Q_h(x) = 2a(x) + h|f(x)|.
Define p^h(x, y) = 0 for y ≠ x ± h, where y, x are multiples of h. Next, substitute (2.4) and (2.5) into (2.1), collect terms, divide by Q_h(x), the negative of the coefficient of V^h(x), and get

V^h(x) = V^h(x + h)p^h(x, x + h) + V^h(x - h)p^h(x, x - h) + k(x)Δt^h(x),    x = h, ..., 1 - h,    (2.6)
V^h(0) = α,    V^h(1) = β.
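The scheme is easy to carry out. The sketch below is illustrative only (the Thomas tridiagonal solver and the test coefficients a ≡ 1/2, f ≡ 0, k ≡ 0 are our choices, not the book's): it builds p^h(x, x ± h) and Δt^h(x) exactly as defined above and solves the linear system (2.6). In this pure Wiener test case the solution of (2.1) with V(0) = 0, V(1) = 1 is V(x) = x, which the scheme reproduces to rounding error.

```python
# Illustrative sketch (not from the text): assemble and solve the finite
# difference system (2.6) with transition probabilities p^h(x, x +/- h) and
# intervals Delta t^h(x) as defined in Section 5.2.

def solve_fd(a, f, k, alpha, beta, h):
    """Solve (2.6) on the grid h, 2h, ..., 1 - h by the Thomas algorithm."""
    m = round(1.0 / h) - 1
    xs = [(i + 1) * h for i in range(m)]
    p_up, p_dn, dt = [], [], []
    for x in xs:
        fp, fm = max(f(x), 0.0), max(-f(x), 0.0)
        Q = 2.0 * a(x) + h * abs(f(x))      # Q_h(x), the normalizer
        p_up.append((a(x) + h * fp) / Q)    # p^h(x, x + h)
        p_dn.append((a(x) + h * fm) / Q)    # p^h(x, x - h)
        dt.append(h * h / Q)                # Delta t^h(x)
    # (2.6) reads V(x) - p_up V(x+h) - p_dn V(x-h) = k(x) dt(x), a
    # tridiagonal system; forward elimination, then back substitution.
    c, d = [0.0] * m, [0.0] * m
    for i in range(m):
        rhs = k(xs[i]) * dt[i]
        if i == 0:
            rhs += p_dn[i] * alpha          # boundary value V(0) = alpha
        if i == m - 1:
            rhs += p_up[i] * beta           # boundary value V(1) = beta
        denom = 1.0 - (p_dn[i] * c[i - 1] if i > 0 else 0.0)
        c[i] = p_up[i] / denom
        d[i] = (rhs + (p_dn[i] * d[i - 1] if i > 0 else 0.0)) / denom
    V = [0.0] * m
    V[m - 1] = d[m - 1]
    for i in range(m - 2, -1, -1):
        V[i] = d[i] + c[i] * V[i + 1]
    return xs, V

# Test case a = 1/2, f = 0, k = 0, V(0) = 0, V(1) = 1: exact answer V(x) = x.
xs, V = solve_fd(lambda x: 0.5, lambda x: 0.0, lambda x: 0.0, 0.0, 1.0, 0.05)
print(max(abs(v - x) for v, x in zip(V, xs)))  # rounding error only
```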
Equation (2.6) has an interesting probabilistic interpretation. The p^h(x, y) are ≥ 0 and sum over y to unity for each x. Thus {p^h(x, y)} is the transition probability for a Markov chain on the state space 0, ±h, ±2h, .... Let {ξ_n^h, n = 0, 1, ...} denote the random variables of the chain, and define N_h, the escape time of the chain from (0, 1), by N_h = min{n: ξ_n^h ∉ (0, 1)}. Equation (2.6) can be rewritten in the form

V^h(x) = E_x V^h(ξ_1^h) + k(x)Δt^h(x),    x = h, ..., 1 - h,    (2.7)
V^h(0) = α,    V^h(1) = β.
Let b(·) be any continuous function such that b(0) = α, b(1) = β. If E_x N_h < ∞, then by Section 3.1.2, (2.7) has the unique solution

V^h(x) = E_x [ Σ_{i=0}^{N_h - 1} k(ξ_i^h) Δt_i^h + b(ξ_{N_h}^h) ],    (2.8)

where we use Δt_i^h = Δt^h(ξ_i^h). If k(x) ≡ 0 and P_x{N_h < ∞} = 1, then (2.7) has the unique solution V^h(x) = E_x b(ξ_{N_h}^h).
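Alternatively, (2.8) can be evaluated by simulating the chain itself, which is the interpretation the rest of the section exploits. The sketch below is ours, not the book's; the test data a ≡ 1/2, f ≡ 0, k ≡ 1, b ≡ 0 are chosen because then R(x) = E_x τ = x(1 - x) is known in closed form, so the Monte Carlo estimate can be checked.

```python
import random

# Illustrative sketch (not from the text): estimate V^h(x) of (2.8) by running
# the chain {xi_n^h} until it leaves (0, 1), accumulating sum k(xi_i) Dt_i
# plus the boundary cost b(xi_N). The chain state is kept as an integer
# grid index j (so x = j * h stays exactly on the grid).

def v_h_estimate(x0, h, a, f, k, b, trials, rng):
    M = round(1.0 / h)
    total = 0.0
    for _ in range(trials):
        j, cost = round(x0 / h), 0.0
        while 0 < j < M:
            x = j * h
            fp = max(f(x), 0.0)
            Q = 2.0 * a(x) + h * abs(f(x))
            cost += k(x) * h * h / Q                      # k(xi_i) Dt_i^h
            j += 1 if rng.random() < (a(x) + h * fp) / Q else -1
        total += cost + b(j * h)
    return total / trials

rng = random.Random(1)
est = v_h_estimate(0.5, 0.1, lambda x: 0.5, lambda x: 0.0,
                   lambda x: 1.0, lambda x: 0.0, 4000, rng)
print(est)  # near R(0.5) = 0.5 * (1 - 0.5) = 0.25
```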
Equation (2.8) has a superficial similarity to (2.3); if we interpret Δt_i^h as a time interval, then the sum in (2.8) resembles a Riemann sum approximation to the integral in (2.3). In fact, the resemblance is very close, as we shall see, and under certain conditions (which are not stringent in applications), V^h(x) → R(x) as h → 0. First, let us note that

E[ξ_{n+1}^h - ξ_n^h | ξ_n^h = x] = f(x) Δt^h(x),    (2.9)
cov[ξ_{n+1}^h - ξ_n^h | ξ_n^h = x] = 2a(x) Δt^h(x) + (h|f(x)| - f^2(x) Δt^h(x)) Δt^h(x)
                                   = 2a(x) Δt^h(x) + o(Δt^h(x)).

The terms in (2.9) are just what we would get [see Eqs. (1.5.15) and (1.5.16)] for the conditional increments of the solution to (2.2) over an interval† Δ. These calculations suggest that there are limit theorems connecting {ξ_n^h} and the diffusion X(·). This is, indeed, just where the ideas of weak convergence enter. But first we must interpolate {ξ_n^h} into a continuous parameter process. The Δt^h(x) factors in (2.8) and (2.9) suggest the following interpolation. Define t_n^h and ξ^h(·) by

t_0^h = 0,    t_n^h = Σ_{i=0}^{n-1} Δt_i^h,
ξ^h(t) = ξ_n^h    on [t_n^h, t_{n+1}^h).

So ξ^h(·) is held constant on random intervals and is our approximation to X(·). In Chapter 6, it will be shown that {ξ^h(·)} is tight on D[0, ∞) and that if h indexes a subsequence which converges in distribution to a limit X(·), then X(·) must satisfy (2.2). As a further check on the scaling, consider the special case where a(·) ≡ 0. Then ẋ = f(x) replaces (2.2), and Δt^h(x) = h/|f(x)| = h/(velocity at x). This is the correct time interval, since we must hold the process at a fixed value until enough time passes so that it can jump h. This time is just h times the inverse of the velocity at x. Define ρ_h = t_{N_h}^h, the escape time of ξ^h(·) from (0, 1). Thus (2.8) becomes

V^h(x) = E_x [ ∫_0^{ρ_h} k(ξ^h(s)) ds + b(ξ^h(ρ_h)) ].    (2.10)

Assume, for the moment:

ρ_h < ∞ w.p. 1 for each h, and {ρ_h} is uniformly integrable;    (2.11)
P_x{τ = τ'} = 1,    x ∈ (0, 1).    (2.12)
† A Lipschitz condition was assumed in Section 1.5. But the estimates are valid for any solution to the stochastic differential equation if f(·) and σ(·) are uniformly continuous.
We shall return to these assumptions later. Under (2.11), we can "essentially" replace ρ_h in (2.10) by ρ_h ∧ T for large T and all h. Under (2.12), the functions on D[0, ∞) given by

g_1(x(·)) = ∫_0^τ k(x(s)) ds,    g_2(x(·)) = b(x(τ)),

where τ = escape time of the path x(·) from (0, 1), are continuous w.p. 1 relative to the X(·) measure. Then weak convergence (Section 2.2) implies that E_x g_1(ξ^h(·)) → E_x g_1(X(·)), and similarly for g_2(·). Owing to the uniqueness of the solution to (2.2) (in the sense of distributions), the distribution of each limit is the same; it does not depend on the particular convergent subsequence that we choose. Thus E_x g_i(X(·)) is well defined by the approximation sequence. Condition (2.12) holds if 0 and 1 are regular (τ'). A sufficient condition for this is

(a(0) > 0 or f(0) < 0)    and    (a(1) > 0 or f(1) > 0).
If a(0) > 0 and X(τ) = 0, then the "wildness" of the function of t defined by ∫_τ^{τ+t} [2a(X(s))]^{1/2} dw(s) forces X(τ + t) to be less than 0 infinitely often on any interval [0, s), s > 0. If a(0) = 0 and f(0) < 0, the drift forces the process to the left. If k(·) ≡ 0, then (2.11) can be replaced by

P_x{τ < ∞} = 1.    (2.13)

If (2.13) does not hold at some x ∈ (0, 1), then the approximations (indeed, any finite difference approximation) may not converge as h → 0 for that value of x, although the approximations will still converge for the x values for which (2.12) and (2.13) do hold if k(·) ≡ 0 (Chapter 6). If (2.13) and (2.12) hold, then we will have ρ_h → τ w.p. 1 (using the Skorokhod imbedding), and so ξ^h(ρ_h) → X(τ) w.p. 1. If (2.13) does not hold, then there is a nonnull set on which X(t) ∈ (0, 1) for all t < ∞. But for each h, the corresponding paths ξ^h(t) may still hit the boundary at a finite time (although those times will go to ∞ as h → 0); hence E_x b(ξ^h(ρ_h)) will not necessarily converge to the correct limit E_x b(X(τ))I_{τ<∞}. It will be shown below that

inf_{x∈[0,1]} P_x{τ < T} ≥ 2c > 0    for some positive real T and c    (2.14)

(together with (2.12)) implies (2.11). In general, for a problem in R^r, uniform positive definiteness of the matrix a(·) guarantees (2.11). In the degenerate case, we must check whether the problem is well defined (in particular, whether escape is possible), for otherwise the functionals may not be well defined.
Consider the special case where there is a real σ^2 > 0 such that a(·) ≡ σ^2/2 and where f(x) ≡ 0. Then X(·) is a Wiener process with covariance σ^2 t, Δt^h(x) = h^2/σ^2, a constant, and {ξ_n^h} is a random walk. Then, by Section 5.1, ξ^h(·) converges in distribution to a Wiener process with covariance σ^2 t, which we will also denote by X(·). Also, (2.12) holds, and so does (2.11) (see below). Hence V^h(x) → R(x). Let us use the Skorokhod imbedding (Theorem 2.2.2). Thus we can assume that the probability space is chosen so that (w.p. 1)

ξ^h(t) → X(t)    uniformly on bounded t sets.

Then, since 0 and 1 are regular (τ'), ρ_h → τ w.p. 1, and (w.p. 1)

b(ξ^h(ρ_h)) → b(X(τ)),
∫_0^{ρ_h} k(ξ^h(s)) ds → ∫_0^τ k(X(s)) ds.
The expectations of the b(·) terms converge also. By (2.11), the expectations of the integral terms converge. For this simple case, more classical (and simpler) methods also work. But our intention was to illustrate the weak convergence ideas on a simple example, where the ideas are useful to prove convergence of approximations to an interesting unbounded path functional, which is the solution of a differential equation. The probabilistic approach required that we work with random variables that are elements of the abstract space D[0, ∞), or that we treat convergence of processes rather than of vector-valued variables. We could not have easily treated the problem by using convergence of multivariate distributions only.

Characteristic Operator
The definition of the domain of Â, the weak infinitesimal operator, involves a nonlocal calculation, in the sense that X(t) may not be arbitrarily close to X(0) = x, uniformly in ω, for small t, and we must calculate lim_{t→0} [E_x F(X(t)) - F(x)]/t. Even if |X(t) - x| were suitably and uniformly (in ω) bounded for small t, the limit may not exist. But the characteristic operator is a local operator. Define τ_h = inf{t: X(t) = x ± h} and let X(0) = x. It is not hard to check that if f(x) = 0, then E_x τ_h = h^2/2a(x) + o(h^2), and if f(x) ≠ 0 but a(·) ≡ 0, then E_x τ_h = h/|f(x)| + o(h). Using (2.7), we write

[E_x V^h(ξ_1^h) - V^h(x)]/Δt^h(x) + k(x) = 0,

which is just an approximation to the "local" characteristic operator. This
suggests that the probabilistic approach taken here is quite natural. Indeed, the functionals of concern (in the uncontrolled case) will usually be in the domain of the characteristic operator.

PROOF THAT (2.12) AND inf_{x∈[0,1]} P_x{τ < T} ≥ 2c > 0 IMPLY (2.11) FOR SMALL h, WHERE T AND c ARE POSITIVE REAL NUMBERS. Suppose that there are sequences h → 0, x_h → x ∈ [0, 1], such that

P_{x_h}{ρ_h < T} → 0    as h → 0.    (2.15)

The sequence {ξ^h(·), ξ^h(0) = x_h} can be shown to be tight in D[0, ∞), and all limits have the form (2.2) for X(0) = x. The measure induced by each solution (for perhaps different Wiener processes) of (2.2) is the same, by our uniqueness assumption. Since, by (2.12), the escape time τ is continuous w.p. 1, and ξ^h(t) → X(t) uniformly on finite intervals (using the Skorokhod imbedding), we have that ρ_h → τ w.p. 1 as h → 0. The set of paths in D[0, ∞) for which† τ < T is open. Thus (Chapter 2), lim_{h→0} P_{x_h}{ρ_h < T} ≥ P_x{τ < T} ≥ 2c > 0. This contradicts (2.15). Hence

inf_{x∈[0,1]} P_x{ρ_h < T} ≥ c,    all small h.    (2.16)
Let ξ̃^h(·) and ξ̃_n^h denote the stopped processes ξ^h(t ∧ ρ_h) and ξ^h_{n ∧ N_h}, respectively. Then ρ_h ≥ 2T if and only if ξ̃^h(T) ∈ (0, 1) and the escape time is ≥ T when the initial condition is ξ̃^h(T). Thus (for small h)

P_x{ρ_h ≥ 2T} = E_x I_{{ξ̃^h(T) ∈ (0,1)}} I_{{ξ̃^h(2T) ∈ (0,1)}}.

Using the Markov property of {ξ_n^h},

P_x{ρ_h ≥ 2T} = E_x I_{{ξ̃^h(T) ∈ (0,1)}} P_{ξ̃^h(T)}{ρ_h ≥ T}
             ≤ E_x I_{{ξ̃^h(T) ∈ (0,1)}} (1 - c) ≤ (1 - c)^2.

In general (for small h), P_x{ρ_h ≥ nT} ≤ (1 - c)^n, which implies that for each s there is a number M_s < ∞ for which (for small h)

E_x ρ_h^s ≤ M_s    uniformly in x ∈ (0, 1),

which implies the uniform integrability.

On the Finite Difference Method and the Choice (2.5)
The purpose of the choice between forward and backward differences in the approximation (2.5) is to obtain a finite difference equation whose coefficients are nonnegative. With either a forward or backward difference approximation, the coefficients would sum to unity, but they could be negative. With our choice, the dynamics help "push" the particle to the right if f(x) > 0 and to the left if f(x) < 0. Any approximation method which provides a computationally convenient chain {ξ_n^h} and whose interpolations converge to X(·) in distribution would be satisfactory. However, the finite difference procedure provides an automatic method of obtaining such processes. Neither f(·) nor a(·) need be continuous (Chapter 6) if the proper approximations are used.

† In a more formal notation, let τ(x(·)) denote the escape time of x(·) from (0, 1), where x(·) is an arbitrary element of D[0, ∞). In this terminology, ρ_h(ω) = τ(ξ^h(ω, ·)).
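This point is easy to see numerically. In the sketch below (ours, not from the text; the values a = 0.01, f = 1, h = 0.1 are chosen so the drift term dominates the diffusion term), the central difference candidates [a(x) ± hf(x)/2]/2a(x) sum to unity but one of them is negative, while the one-sided choice (2.5) always yields values in [0, 1].

```python
# Illustrative comparison (not from the text) of the coefficients produced by
# central differencing of V_x versus the one-sided (upwind) choice (2.5).

def central_probs(a, f, h):
    """Central difference for V_x: coefficients [a +/- h f / 2] / (2 a)."""
    return (a + h * f / 2.0) / (2.0 * a), (a - h * f / 2.0) / (2.0 * a)

def upwind_probs(a, f, h):
    """The choice (2.5): p^h(x, x + h), p^h(x, x - h); always in [0, 1]."""
    fp, fm = max(f, 0.0), max(-f, 0.0)
    Q = 2.0 * a + h * abs(f)
    return (a + h * fp) / Q, (a + h * fm) / Q

a, f, h = 0.01, 1.0, 0.1          # strong drift, weak diffusion
pc_up, pc_dn = central_probs(a, f, h)
pu_up, pu_dn = upwind_probs(a, f, h)
print(pc_up, pc_dn)   # second value is negative: not a probability
print(pu_up, pu_dn)   # both nonnegative and summing to one
```

Both schemes are consistent with (2.1), but only the upwind one produces transition probabilities, and hence a Markov chain, for every h.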