# A note on martingale deviation bounds

Statistics and Probability Letters 111 (2016) 8–11

Thomas R. Boucher, Department of Mathematics, Texas A&M University-Commerce, Commerce, TX 75428, USA

Article history: Received 5 September 2015; Accepted 30 December 2015; Available online 6 January 2016.

Keywords: Martingale; Martingale difference sequence; Deviations

abstract n

Let {Yi }ni=1 be a martingale difference sequence and Sn = i=1 Yi . Probability deviation bounds for martingale difference sequences generally focus on upper bounds for probabilities of large deviations P (Sn > λ), particularly of maxima of Sn . In this article bounds for probabilities of moderate deviations P (Sn < λ) are studied. The motivation is estimating the probability that the cumulative drift of a Markov chain is moderate, and thus estimates derived from sampling the chain are reliable. © 2016 Elsevier B.V. All rights reserved.

## 1. Introduction

Let $\{Y_i\}_{i=1}^n$ be a martingale difference sequence with $E|Y_i|^p = M$ for some $p > 1$. Let $S_n = \sum_{i=1}^n Y_i$. A good deal of work (see for example Lesigne and Volny, 2001; Li, 2003) has gone into developing upper bounds for probabilities of large deviations for maxima of sums of the martingale difference sequence, i.e., for a given $\lambda > 0$ interest is in bounds for

$$P\left(\max_{1 \le i \le n} |S_i| > \lambda\right).$$

This article concerns upper bounds for probabilities of moderate, rather than large, deviations, i.e., bounds for

$$P\left(\max_{1 \le i \le n} |S_i| < \lambda\right).$$

Lower bounds can be obtained from existing results which provide upper bounds for

$$P\left(\max_{1 \le i \le n} |S_i| > \lambda\right).$$

The motivation for studying moderate deviations comes from Markov chain applications. The cumulative drift of a Markov chain can be expressed as the sum of a martingale difference sequence and two other terms. When the cumulative drift of a Markov chain is moderate the Markov chain is near a stable region in the state-space and sampling can occur with an eye on, for example, producing good estimates of $E[f(X_t)]$ using $\frac{1}{n}\sum_{i=0}^{n-1} f(X_i)$ (see Kontoyiannis et al., 2006). Thus, it is worthwhile to monitor the drift of the chain in order to improve the effectiveness of MCMC techniques, as one example application. Specifically, let $\{X_t\}$ be a time-homogeneous discrete-time Markov chain. For a function $V \ge 1$ the drift of the process at each transition with respect to $V$ is given by $E[V(X_{i+1})|X_i] - V(X_i)$ and the cumulative drift of the process is

$$S_n(V) = \sum_{i=0}^{n-1} \left( E[V(X_{i+1})|X_i] - V(X_i) \right) = \sum_{i=0}^{n-1} \left( E[V(X_{i+1})|X_i] - V(X_{i+1}) \right) - V(X_0) + V(X_n),$$
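This telescoping identity can be checked numerically. The sketch below is illustrative and not from the paper: it assumes an AR(1) chain $X_{t+1} = \phi X_t + \epsilon_{t+1}$ with Gaussian noise and $V(x) = 1 + x^2$, for which $E[V(X_{i+1})|X_i] = 1 + \phi^2 X_i^2 + \sigma^2$ is available in closed form; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, sigma, n = 0.5, 1.0, 1000  # illustrative AR(1) parameters and horizon

# Simulate an AR(1) chain X_{t+1} = phi * X_t + eps_{t+1}
X = np.zeros(n + 1)
for t in range(n):
    X[t + 1] = phi * X[t] + rng.normal(0.0, sigma)

def V(x):
    return 1.0 + x**2

# Closed-form one-step conditional expectation E[V(X_{i+1}) | X_i] for this chain
condEV = 1.0 + phi**2 * X[:-1] ** 2 + sigma**2

# Left side: cumulative drift S_n(V) = sum_i (E[V(X_{i+1})|X_i] - V(X_i))
drift = np.sum(condEV - V(X[:-1]))
# Right side: sum of martingale differences plus the two boundary terms
decomposed = np.sum(condEV - V(X[1:])) - V(X[0]) + V(X[-1])
```

The two quantities agree exactly (up to floating-point rounding), since the decomposition is an algebraic identity for any path of the chain.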


where $\sum_{i=0}^{n-1} \left( E[V(X_{i+1})|X_i] - V(X_{i+1}) \right)$ is the sum of the martingale difference sequence $\{E[V(X_{i+1})|X_i] - V(X_{i+1})\}_{i=0}^{n-1}$. The object then is to bound the probability the cumulative drift is moderate,

$$P\left(\max_{1 \le i \le n} |S_i(V)| < \lambda\right),$$

as a complement to the conditional probability that the estimate is near the target given the moderate drift; i.e., for $\epsilon > 0$

$$P\left(\left|\frac{1}{n}\sum_{i=0}^{n-1} f(X_i) - E[f(X_t)]\right| > \epsilon \,\middle|\, \max_{1 \le i \le n} |S_i(V)| < \lambda\right).$$
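For a concrete chain, the first of these probabilities can be approximated by simulation. The sketch below is an illustration, not part of the paper: it estimates $P(\max_{1 \le i \le n} |S_i(V)| < \lambda)$ by Monte Carlo for the same assumed AR(1) chain with $V(x) = 1 + x^2$, where the per-step drift is $(\phi^2 - 1)X_i^2 + \sigma^2$; the values of $\phi$, $\sigma$, $n$, $\lambda$ and the replication count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, sigma, n, reps, lam = 0.5, 1.0, 100, 1000, 50.0  # illustrative values

# Simulate `reps` independent AR(1) paths X_{t+1} = phi * X_t + eps_{t+1}
X = np.zeros((reps, n + 1))
eps = rng.normal(0.0, sigma, size=(reps, n))
for t in range(n):
    X[:, t + 1] = phi * X[:, t] + eps[:, t]

# Per-step drift with V(x) = 1 + x^2:
# E[V(X_{i+1})|X_i] - V(X_i) = (phi^2 - 1) X_i^2 + sigma^2
steps = (phi**2 - 1.0) * X[:, :-1] ** 2 + sigma**2
S = np.cumsum(steps, axis=1)  # S_i(V) for i = 1..n, one row per path

# Monte Carlo estimate of P(max_i |S_i(V)| < lam)
p_hat = np.mean(np.max(np.abs(S), axis=1) < lam)
```

Comparing such an empirical estimate against the closed-form bounds of Section 2 is one way to gauge how conservative the bounds are for a given chain.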

This being said, the emphasis on Markov chain applications will be suppressed in this article in favor of speaking generally about martingale difference sequences.

## 2. Results

The main result depends upon the following lemmas, which appear in Li (2003) and borrow from Hall and Heyde (1980) and Lesigne and Volny (2001). The lemmas use arguments based on Burkholder's inequality for $L^p$ martingales (see Hall and Heyde, 1980) and convexity to arrive at the results. Both lemmas establish bounds on $E|S_n|^p$, treating the cases $1 < p < 2$ and $p > 2$ separately.

**Lemma 1.** Let $\{X_i\}_{i=1}^n$ be a finite martingale difference sequence where $X_i \in L^p$ for some $p \in (1, 2]$, and let $S_n = \sum_{i=1}^n X_i$. Then

$$E|S_n|^p \le b_p^p \sum_{i=1}^n E|X_i|^p,$$

$$E|S_n|^p \ge a_p^{-p} n^{p/2-1} \sum_{i=1}^n E|X_i|^p,$$

where $b_p = 18 p q^{1/2}$, $a_p = 18 p^{1/2} q$, and $q$ is such that $1/p + 1/q = 1$.

Slightly different results are to be expected when $p > 2$. When $p = 2$, establishing bounds on the probabilities of deviations is not difficult.

**Lemma 2.** Let $\{X_i\}_{i=1}^n$ be a finite martingale difference sequence where $X_i \in L^p$ for some $p > 2$, and let $S_n = \sum_{i=1}^n X_i$. Then

$$E|S_n|^p \le n^{p/2-1} b_p^p \sum_{i=1}^n E|X_i|^p,$$

$$E|S_n|^p \ge a_p^{-p} \sum_{i=1}^n E|X_i|^p,$$

where $b_p = 18 p q^{1/2}$, $a_p = 18 p^{1/2} q$, and $q$ is such that $1/p + 1/q = 1$.
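As a quick sanity check, not part of the paper, both lemmas can be verified by Monte Carlo for an i.i.d. Rademacher sequence, which is a martingale difference sequence with $E|X_i|^p = 1$ for every $p$; the sample size $n$, the exponents tested, and the replication count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 64, 20000  # illustrative sample size and replication count

# i.i.d. Rademacher steps form a martingale difference sequence with E|X_i|^p = 1
Y = rng.choice([-1.0, 1.0], size=(reps, n))
S = Y.sum(axis=1)

def holds(p):
    """Check the Lemma 1 (1 < p <= 2) or Lemma 2 (p > 2) bounds at this p."""
    q = p / (p - 1.0)
    bp = 18.0 * p * np.sqrt(q)       # b_p = 18 p q^{1/2}
    ap = 18.0 * np.sqrt(p) * q       # a_p = 18 p^{1/2} q
    m = np.mean(np.abs(S) ** p)      # Monte Carlo estimate of E|S_n|^p
    moments = float(n)               # sum_i E|X_i|^p = n here
    if p <= 2:                       # Lemma 1
        upper = bp**p * moments
        lower = ap**-p * n ** (p / 2 - 1) * moments
    else:                            # Lemma 2
        upper = n ** (p / 2 - 1) * bp**p * moments
        lower = ap**-p * moments
    return lower <= m <= upper

ok = holds(1.5) and holds(3.0)
```

Since $E|S_n|^p$ grows like $n^{p/2}$ for this sequence, both the upper and lower bounds hold with a wide margin, which reflects the size of the constants $a_p$ and $b_p$.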

The following is the main result. The previous lemmas are applied to establish bounds on the probabilities of moderate deviations in the separate cases $1 < p < 2$ and $p > 2$.

**Theorem 1.** Let $\{Y_i\}_{i=1}^n$ denote a finite martingale difference sequence. Let $S_n = \sum_{i=1}^n Y_i$. Suppose $E|Y_i|^p = M < \infty$ and $(E|Y_i|^{p+\delta})^{1/(p+\delta)} < \infty$ for some $\delta > 0$.

(i) If $1 < p < 2$, then

$$P\left(\max_{i \in \{1,\ldots,n\}} |S_i| \le \lambda\right) \le 1 - \left[\frac{c_n E|S_n|^p - \lambda^p}{[\lambda + M]^p}\right]^{(p+\delta)/\delta},$$

for $\lambda > 0$ such that

$$2^{-1/p} (c_n E|S_n|^p)^{1/p} - M/2 \le \lambda < (c_n E|S_n|^p)^{1/p},$$

where $c_n = \left[\prod_{i=1}^{n-1} \left(1 + a_p/\sqrt{n-i}\right)\right]^{-p}$ for $n \ge 2$, $c_1 = 1$, and $a_p = 18 p^{3/2}/(p-1)$.

(ii) If $p > 2$, then

$$P\left(\max_{i \in \{1,\ldots,n\}} |S_i| \le \lambda\right) \le 1 - \left[\frac{c_n E|S_n|^p - \lambda^p}{[\lambda + M]^p}\right]^{(p+\delta)/\delta},$$


for $\lambda > 0$ such that

$$2^{-1/p} (c_n E|S_n|^p)^{1/p} - M/2 \le \lambda < (c_n E|S_n|^p)^{1/p},$$

where $c_n = \left[\prod_{i=1}^{n-1} \left(1 + a_p^p/(n-i)\right)\right]^{-1}$, $c_1 = 1$, and $a_p = 18 p^{3/2}/(p-1)$.
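To see the theorem in action, the sketch below (an illustration under assumed values, not from the paper) evaluates the case (i) bound for a Rademacher martingale difference sequence with $n = 2$, where $M = E|Y_i|^p = 1$ and $E|S_2|^p = 2^{p-1}$ exactly; it also confirms numerically that the bound solves the equation $f(p_\lambda) = 0$ used in the proof. The choices of $p$, $\delta$, and $\lambda$ are arbitrary within the theorem's constraints.

```python
import numpy as np

p, delta = 1.5, 0.4                  # assumed: 1 < p < 2 and 1 < p + delta < 2
ap = 18.0 * p**1.5 / (p - 1.0)       # a_p = 18 p^{3/2} / (p - 1)
M = 1.0                              # E|Y_i|^p = 1 for Rademacher Y_i

n = 2
# c_n = [prod_{i=1}^{n-1} (1 + a_p / sqrt(n - i))]^{-p}, with c_1 = 1
c_n = np.prod([1.0 + ap / np.sqrt(n - i) for i in range(1, n)]) ** (-p)

ESnp = 2.0 ** (p - 1.0)              # exact E|S_2|^p for S_2 = Y_1 + Y_2
cE = c_n * ESnp

# admissible lambda range from the theorem
lam_hi = cE ** (1.0 / p)
lam_lo = max(0.0, 2.0 ** (-1.0 / p) * lam_hi - M / 2.0)
lam = 0.5 * (lam_lo + lam_hi)        # pick a lambda inside the range

# the case (i) upper bound on P(max |S_i| <= lam)
p_star = 1.0 - ((cE - lam**p) / (lam + M) ** p) ** ((p + delta) / delta)

# p_star should solve f(x) = [lam+M]^p (1-x)^{delta/(p+delta)} + lam^p - cE = 0
f_at_p_star = (lam + M) ** p * (1.0 - p_star) ** (delta / (p + delta)) + lam**p - cE
```

Because $c_n$ is a product of small factors, the admissible $\lambda$ interval is narrow and the resulting bound is close to one; the computation mainly illustrates the mechanics of the theorem rather than a tight bound.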

**Proof.** (i) By Minkowski's inequality,

$$E|S_n|^p \le \left( [E|S_{n-1}|^p]^{1/p} + [E|Y_n|^p]^{1/p} \right)^p = [E|S_{n-1}|^p] \left( 1 + (E|Y_n|^p / E|S_{n-1}|^p)^{1/p} \right)^p.$$

Since $1 < p < 2$, Lemma 1 implies

$$E|S_n|^p \ge a_p^{-p} n^{p/2-1} \sum_{i=1}^n E|Y_i|^p,$$

where $a_p = 18 p^{1/2} q$. Then since $E|Y_i|^p = M$ we have $E|Y_n|^p / E|S_{n-1}|^p \le a_p^p/(n-1)^{p/2}$ and

$$E|S_n|^p \le E|S_{n-1}|^p \left( 1 + \frac{a_p}{\sqrt{n-1}} \right)^p.$$

For $n \ge 2$ define the decreasing positive sequence $\{c_n\}$ by $c_n = (1 + a_p/\sqrt{n-1})^{-p} c_{n-1}$. By convexity $E|S_2|^p = E|Y_1 + Y_2|^p \le 2^{p-1}(E|Y_1|^p + E|Y_2|^p) = 2^p M = 2^p E|Y_1|^p$, implying $(1 + a_p)^{-p} E|S_2|^p \le (1 + a_p)^{-p} 2^p E|Y_1|^p \le E|Y_1|^p$ since $a_p > 1$, and so define $c_1 = 1$. Then $c_n E|S_n|^p \le c_n E|S_{n-1}|^p (1 + a_p/\sqrt{n-1})^p = c_{n-1} E|S_{n-1}|^p$. This, combined with the facts that $S_n$ is a martingale and $|\cdot|^p$ is a convex function, implies $|S_n|^p - c_n E|S_n|^p$ is a submartingale. Given $\lambda > 0$ let $N = \min\{i : i \ge 1, |S_i| > \lambda\}$ and note $N \wedge n$ is a stopping time. Then using optional stopping

$$0 = E|Y_1|^p - c_1 E|Y_1|^p \le E\left( |S_{N \wedge n}|^p - c_{N \wedge n} E|S_{N \wedge n}|^p \right). \tag{1}$$

Note that $N \wedge n = n$ and $S_N = S_n \le \lambda$ if $\max_{i \in \{1,\ldots,n\}} |S_i| \le \lambda$, while $|S_N| \le (\lambda + |Y_N|)$ if $\max_{i \in \{1,\ldots,n\}} |S_i| > \lambda$. Since $1 < p < 2$ choose $\delta > 0$ so that $1 < p + \delta < 2$. Hölder's inequality with $p' = (p+\delta)/p$ and $q' = (p+\delta)/\delta$ implies

$$E[(\lambda + |Y_N|)^p I_{N \le n}] \le E[(\lambda + |Y_N|)^{p+\delta}]^{p/(p+\delta)} (1 - P(N > n))^{\delta/(p+\delta)}.$$

Minkowski's inequality and convexity imply

$$(E[\lambda + |Y_N|]^{p+\delta})^{p/(p+\delta)} = \left[ (E[\lambda + |Y_N|]^{p+\delta})^{1/(p+\delta)} \right]^p \le \left[ (E[\lambda^{p+\delta}])^{1/(p+\delta)} + (E|Y_N|^{p+\delta})^{1/(p+\delta)} \right]^p \le (\lambda + E|Y_N|)^p \le [\lambda + M]^p.$$

Then from (1), letting $p_\lambda = P(N > n) = P\left( \max_{i \in \{1,\ldots,n\}} |S_i| \le \lambda \right)$, since $c_n$ is decreasing,

$$\begin{aligned}
0 = E|Y_1|^p - c_1 E|Y_1|^p &\le E\left( |S_{N \wedge n}|^p - c_{N \wedge n} E|S_{N \wedge n}|^p \right) \\
&= E\left[ (|S_N|^p - c_N E|S_N|^p) I_{N \le n} \right] + E\left[ (|S_n|^p - c_n E|S_n|^p) I_{N > n} \right] \\
&\le (E[\lambda + |Y_N|]^{p+\delta})^{p/(p+\delta)} (1 - p_\lambda)^{\delta/(p+\delta)} - c_n E|S_n|^p (1 - p_\lambda) + (\lambda^p - c_n E|S_n|^p) p_\lambda \\
&\le [\lambda + M]^p (1 - p_\lambda)^{\delta/(p+\delta)} + \lambda^p p_\lambda - c_n E|S_n|^p \\
&\le [\lambda + M]^p (1 - p_\lambda)^{\delta/(p+\delta)} + \lambda^p - c_n E|S_n|^p.
\end{aligned}$$

In other words,

$$0 \le [\lambda + M]^p (1 - p_\lambda)^{\delta/(p+\delta)} + \lambda^p - c_n E|S_n|^p. \tag{2}$$

The bound on $p_\lambda$ is determined from inequality (2), i.e., from $p_\lambda \le p^*$, where $p^*$ solves $f(p_\lambda) = [\lambda + M]^p (1 - p_\lambda)^{\delta/(p+\delta)} + \lambda^p - c_n E|S_n|^p = 0$ subject to $0 \le p^* \le 1$, since $f(p_\lambda)$ is a decreasing function of $p_\lambda$. When $\lambda^p - c_n E|S_n|^p < 0$ and $[\lambda + M]^p + \lambda^p - c_n E|S_n|^p > 0$ the non-trivial solution is

$$p^* = 1 - \left[ \frac{c_n E|S_n|^p - \lambda^p}{[\lambda + M]^p} \right]^{(p+\delta)/\delta}.$$

Note that $\lambda^p - c_n E|S_n|^p < 0$ is implied by $\lambda < (c_n E|S_n|^p)^{1/p}$, while $[\lambda + M]^p + \lambda^p - c_n E|S_n|^p > 0$ is implied by $\lambda \ge 2^{-1/p} (c_n E|S_n|^p)^{1/p} - M/2$, since $[\lambda + M]^p + \lambda^p \ge 2^{1-p} (2\lambda + M)^p$ by convexity and setting $2^{1-p} (2\lambda + M)^p \ge c_n E|S_n|^p$. Also note that $c_n E|S_n|^p > 0$ is always true, so there are always $\lambda$ satisfying $2^{-1/p} (c_n E|S_n|^p)^{1/p} - M/2 \le \lambda < (c_n E|S_n|^p)^{1/p}$.

(ii) The proof is similar. Since $p \ge 2$, Lemma 2 implies

$$E|S_n|^p \ge a_p^{-p} \sum_{i=1}^n E|Y_i|^p.$$


Then since $E|Y_i|^p = M$ we have $E|Y_n|^p / E|S_{n-1}|^p \le a_p^p/(n-1)$ and

$$E|S_n|^p \le E|S_{n-1}|^p \left( 1 + \frac{a_p^p}{n-1} \right).$$

For $n \ge 2$ define the decreasing positive sequence $\{c_n\}$ by $c_n = (1 + a_p^p/(n-1))^{-1} c_{n-1}$, with $c_1 = 1$.

## References

- Hall, P., Heyde, C.C., 1980. Martingale Limit Theory and Its Application. Academic Press, New York.
- Kontoyiannis, I., Lastras-Montano, L.A., Meyn, S.P., 2006. Exponential bounds and stopping rules for MCMC and general Markov chains. In: ACM International Conference Proceeding Series, vol. 180.
- Lesigne, E., Volny, D., 2001. Large deviations for martingales. Stochastic Process. Appl. 96, 143–159.
- Li, Y., 2003. A martingale inequality and large deviations. Statist. Probab. Lett. 62, 317–321.