Journal of Statistical Planning and Inference 142 (2012) 2330–2344

A regression approach for estimating the parameters of the covariance function of a stationary spatial random process

Jung Won Hyun*, Prabir Burman, Debashis Paul
Department of Statistics, University of California, Davis, United States

* Corresponding author. E-mail address: [email protected] (J.W. Hyun).

Article history: Received 6 August 2010; received in revised form 22 February 2012; accepted 9 March 2012; available online 17 March 2012.

Abstract

We consider the problem of estimating the parameters of the covariance function of a stationary spatial random process. In spatial statistics, there are widely used parametric forms for the covariance functions, and various methods for estimating the parameters have been proposed in the literature. We develop a method for estimating the parameters of the covariance function that is based on a regression approach. Our method utilizes pairs of observations whose distances are closest to a value $h > 0$, which is chosen so that the estimated correlation at distance $h$ equals a predetermined value. We demonstrate the effectiveness of our procedure through simulation studies and an application to a water pH data set. The simulation studies show that our method outperforms the well-known least squares-based approaches to variogram estimation and is comparable to maximum likelihood estimation of the parameters of the covariance function. We also show that, under a mixing condition on the random field, the proposed estimator is consistent for standard one-parameter models of stationary correlation functions. © 2012 Elsevier B.V. All rights reserved.

Keywords: Spatial process; Covariance function; Variogram

1. Introduction

We consider a random process $\{Z(s), s \in D\}$, where $D$ is a subset of $\mathbb{R}^d$ for $d \ge 1$. We assume that the process $Z(\cdot)$ is second-order stationary with $E[Z(s)] \equiv \mu$ and $\mathrm{Cov}[Z(S_1), Z(S_2)] = C(S_1 - S_2)$ for $S_1, S_2 \in D$, where $C(\cdot)$ is a positive definite function, as is required of a covariance function. It is commonly assumed that the covariance function follows a known parametric form. Widely used parametric models for the covariance function are available (see, e.g., Schabenberger and Gotway, 2005, p. 199), and several estimators of the model parameters have been proposed, including various least squares (ordinary and weighted) estimators and the maximum likelihood estimator (Zimmerman and Zimmerman, 1991; Müller, 1999). The maximum likelihood estimator, in particular, requires considerably more computational effort.

In this paper, we develop a method for estimating the parameters of the covariance function of a stationary spatial random process that is based on a regression approach. It utilizes pairs of observations whose distances are closest to some positive value $h$, which is chosen so that the estimated correlation at distance $h$ equals a predetermined value. We also point out that the proposed method is entirely distribution-free. In Section 4, we show that the proposed estimator is consistent when the spatial process satisfies a $\rho$-mixing condition, has bounded fourth moments, and the correlation function is monotone in the parameter. Simulation studies presented in Section 5 indicate that the proposed method outperforms widely used least squares-based approaches. Our method is also comparable to the MLE, while requiring considerably less computational effort than




the MLE. It is well known that the implementation of the MLE is challenging, especially for large data sets. In contrast, the estimation procedure presented in this paper is comparable to the MLE in performance, yet computationally simple, which makes its implementation practical.

We now present an outline of the rest of the paper. In Section 2, we discuss existing methods for estimating the parameters defining the covariance structure of the random process. In Section 3, we describe our proposed method along with its motivation. In Section 4, we establish the consistency of the proposed estimator. In Section 5, we present the results of simulations and discuss the performance of the various estimation procedures. In Section 6, we apply the various procedures to a water pH data set. Proofs of the asymptotic results are given in Section 7.

2. Existing methods

We first give a brief overview of the most commonly used estimation procedures. Suppose that $Z(\cdot)$ is an intrinsically stationary (see Cressie, 1993), isotropic random field. In geostatistics, the semivariogram is widely used to represent second-order spatial dependence. In this setting, it is defined as

$$\gamma(\|S_1 - S_2\|) = \tfrac{1}{2}\,\mathrm{Var}[Z(S_1) - Z(S_2)].$$
Let $Z(S_1), \ldots, Z(S_n)$ denote $n$ observations from the random field. A natural moment-based semivariogram estimator, due to Matheron (1962), is
$$\hat\gamma(h) = \frac{1}{2|N(h)|} \sum_{N(h)} (Z(S_i) - Z(S_j))^2, \qquad (1)$$
where $N(h)$ denotes the set of all pairs $(S_i, S_j)$ such that $\|S_i - S_j\| = h$, and $|N(h)|$ denotes the cardinality of $N(h)$. This semivariogram estimator is commonly referred to as the classical or Matheron estimator. For practical purposes, however, we modify the set $N(h)$ to $N_\epsilon(h) = \{(S_i, S_j) : h - \epsilon < \|S_i - S_j\| \le h + \epsilon\}$, so that a sufficient number of pairs is included at each lag $h$. A graph of $\hat\gamma(h)$ against $h$ is called the empirical semivariogram. Since $\hat\gamma(h)$ is an unbiased estimator of the semivariogram $\gamma(h)$, a least squares method is commonly employed to fit a parametric semivariogram model to the empirical semivariogram. Let $\theta$ denote the vector of all unknown parameters in the semivariogram model.
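As an illustration (ours, not the authors' code), here is a minimal sketch of the binned Matheron estimator (1) over the modified bins $N_\epsilon(h)$; the bin centers hs and the tolerance eps are hypothetical user choices.

    import numpy as np
    from scipy.spatial.distance import pdist

    def empirical_semivariogram(S, Z, hs, eps):
        """Matheron estimator (1) computed over the modified bins N_eps(h)."""
        d = pdist(S)                                    # pairwise distances ||S_i - S_j||
        sq = pdist(Z.reshape(-1, 1)) ** 2               # squared differences (Z(S_i) - Z(S_j))^2
        gamma_hat = np.full(len(hs), np.nan)
        counts = np.zeros(len(hs), dtype=int)
        for k, h in enumerate(hs):
            in_bin = (d > h - eps) & (d <= h + eps)     # pairs with h - eps < ||S_i - S_j|| <= h + eps
            counts[k] = in_bin.sum()
            if counts[k] > 0:
                gamma_hat[k] = 0.5 * sq[in_bin].mean()  # (1 / (2|N(h)|)) sum over N(h)
        return gamma_hat, counts

Returning the bin counts $|N(h_i)|$ alongside $\hat\gamma(h_i)$ is convenient for the weighted least squares fit discussed next.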

The generalized least squares estimator of $\theta$ is then defined as
$$\hat\theta_{GLS} = \arg\min_\theta\, (\hat\gamma - \gamma(\theta))^T \,\mathrm{Cov}[\hat\gamma]^{-1}\, (\hat\gamma - \gamma(\theta)),$$
where $\hat\gamma = (\hat\gamma(h_1), \ldots, \hat\gamma(h_k))^T$ and $\gamma(\theta) = (\gamma(h_1,\theta), \ldots, \gamma(h_k,\theta))^T$. In practice, the correlation structure of $\hat\gamma$ is ignored. If one also ignores the unequal dispersions among the elements of $\hat\gamma$, this procedure yields the ordinary least squares estimator, denoted $\hat\theta_{OLS}$. Assuming Gaussianity of the random field, Cressie (1993) suggested the approximation
$$\mathrm{Var}[\hat\gamma(h_i)] \approx \frac{2\gamma(h_i,\theta)^2}{|N(h_i)|}.$$
If one uses this approximation, taking into account only the diagonal entries of $\mathrm{Cov}[\hat\gamma]$, then $\arg\min_\theta \sum_{i=1}^k (\hat\gamma(h_i) - \gamma(h_i,\theta))^2 |N(h_i)| / (2\gamma(h_i,\theta)^2)$ gives a weighted least squares estimator, denoted $\hat\theta_{WLS}$. However, the performance of the WLS estimator has been reported to be more or less similar to that of the OLS estimator (Zimmerman and Zimmerman, 1991), and some authors have reported poor finite sample behavior of this variogram estimator (Müller, 1999).

Another common approach to variogram estimation is to use $(Z(S_i) - Z(S_j))^2$ directly, without binning and averaging. The collection of pairs $\{(\|S_i - S_j\|, \tfrac{1}{2}(Z(S_i) - Z(S_j))^2) : 1 \le i < j \le n\}$ is commonly referred to as the (semi-)variogram cloud (Müller, 1999). Since $(Z(S_i) - Z(S_j))^2$ is an unbiased estimator of $2\gamma(\|S_i - S_j\|)$, a parametric variogram model can be fitted to the variogram cloud by least squares. Ignoring the correlation structure of the cloud, one can employ either ordinary or weighted least squares; the resulting estimators are denoted the OLS (Cloud) and WLS (Cloud) estimators, respectively. When weighted least squares fitting is employed, the random field is implicitly assumed to be Gaussian, and the variance of $(Z(S_i) - Z(S_j))^2$ is then $8\gamma(\|S_i - S_j\|,\theta)^2$. Here $(Z(S_i) - Z(S_j))^2$ can be viewed as the special case of (1) in which $N(h)$ contains a single location pair $(S_i, S_j)$. If we call the set $N(h)$ a distance bin, each bin then contains exactly one element, $|N(\|S_i - S_j\|)| \equiv 1$, and hence the number of bins tends to infinity as $n \to \infty$. Müller (1999) demonstrated the inconsistency of the WLS (Cloud) variogram estimator under this asymptotic structure.
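Before turning to that argument, here is a sketch (again our illustration, not the authors' code) of the OLS and WLS fits to the empirical semivariogram just described; the exponential semivariogram gamma_exp is a hypothetical model choice, and the bins are assumed nonempty.

    import numpy as np
    from scipy.optimize import minimize

    def gamma_exp(h, theta, sigma2):
        # hypothetical exponential semivariogram model: gamma(h) = sigma2 * (1 - exp(-theta h))
        return sigma2 * (1.0 - np.exp(-theta * h))

    def fit_semivariogram(hs, gamma_hat, counts, weighted=True):
        """OLS (weighted=False) or WLS with Cressie's weights (weighted=True)."""
        def objective(p):
            theta, sigma2 = p
            g = gamma_exp(hs, theta, sigma2)
            if weighted:   # sum_i (gamma_hat(h_i) - gamma(h_i))^2 |N(h_i)| / (2 gamma(h_i)^2)
                return np.sum((gamma_hat - g) ** 2 * counts / (2.0 * g ** 2))
            return np.sum((gamma_hat - g) ** 2)
        return minimize(objective, x0=[1.0, np.nanmean(gamma_hat)],
                        method="Nelder-Mead").x

The cloud versions are the same criteria applied to the unbinned pairs, with counts identically one.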

Adapting the proof of inconsistency of Fedorov (1974), Müller rewrote the objective function to be minimized (in the limit) as
$$\sum_{i=1}^{\infty} \frac{(\gamma(h_i,\theta) - \gamma(h_i,\theta_t))^2}{(\gamma(h_i,\theta))^2} + \sum_{i=1}^{\infty} \frac{(\hat\gamma(h_i) - \gamma(h_i,\theta_t))^2}{(\gamma(h_i,\theta))^2},$$


where $\theta_t$ denotes the true parameter. By the law of large numbers this gives
$$\sum_{i=1}^{\infty} \left(\frac{\gamma(h_i,\theta_t)}{\gamma(h_i,\theta)} - 1\right)^2 + \sum_{i=1}^{\infty} \frac{2\gamma^2(h_i,\theta_t)}{\gamma^2(h_i,\theta)},$$
which is equivalent to minimizing
$$\sum_{i=1}^{\infty} \left(\frac{\gamma(h_i,\theta_t)}{\gamma(h_i,\theta)} - \frac{1}{3}\right)^2.$$
Thus he proposed to correct the bias by multiplying $\gamma(h, \tilde\theta_{WLS})$ by the correction factor $1/3$, and argued in the same paper that the corrected WLS (Cloud) estimator performs much better.

If one assumes that the random field is Gaussian with unknown mean $\mu$, the distribution of $Z = (Z(S_1), \ldots, Z(S_n))^T$ is $N(\mu\mathbf{1},\, \sigma^2 R(\theta) + \tau^2 I)$. The parameter $\tau^2$ represents a nugget effect (see Cressie, 1993), that is, a discontinuity of the variogram at the origin. More specifically, by the definition of the semivariogram, $\gamma(0) = 0$; if $\gamma(h) \to \tau^2$ as $h \to 0$, then $\tau^2$ is called the nugget effect. In practice, it is interpreted as the effect of measurement error, of spatial variation on a scale smaller than the minimum separation distance in the data, or of any combination of the two (Diggle and Ribeiro, 2007, p. 57). The negative log likelihood is then given by
$$-\log L(\mu, \theta, \sigma^2, \tau^2) = \mathrm{const.} + \tfrac{1}{2}\log|\sigma^2 R(\theta) + \tau^2 I| + \tfrac{1}{2}(Z - \mu\mathbf{1})^T (\sigma^2 R(\theta) + \tau^2 I)^{-1} (Z - \mu\mathbf{1}), \qquad (2)$$

which is minimized to yield the maximum likelihood estimates of the model parameters. More precisely, writing $\nu = \tau^2/\sigma^2$ and $V(\theta,\nu) = R(\theta) + \nu I$, for given $\theta$ and $\nu$ the negative log likelihood is minimized at
$$\hat\mu(\theta,\nu) = (\mathbf{1}^T V(\theta,\nu)^{-1} \mathbf{1})^{-1} \mathbf{1}^T V(\theta,\nu)^{-1} Z \qquad (3)$$
and
$$\hat\sigma^2(\theta,\nu) = n^{-1} (Z - \hat\mu\mathbf{1})^T V(\theta,\nu)^{-1} (Z - \hat\mu\mathbf{1}). \qquad (4)$$
Substituting these expressions in (2), the reduced negative log likelihood for $(\theta,\nu)$ is
$$-\log L_0(\theta,\nu) = \mathrm{const.} + \frac{n}{2}\log\hat\sigma^2(\theta,\nu) + \frac{1}{2}\log|V(\theta,\nu)|,$$
which is minimized with respect to $\theta$ and $\nu$ to yield $\hat\theta_{MLE}$ and $\hat\nu_{MLE}$. Then $\hat\sigma^2_{MLE}$ is obtained by substituting these back in (3) and (4), followed by $\hat\tau^2_{MLE}$, which equals $\hat\sigma^2_{MLE}$ multiplied by $\hat\nu_{MLE}$. Based on asymptotic theory, the MLE is expected to be more efficient than the other estimators, but the computational burden, including the inversion of an $n \times n$ matrix, poses a considerable challenge. When $V(\theta,\nu)$ is close to a singular matrix, minimizing the negative log likelihood is computationally impractical.
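The profiling steps (2)–(4) translate directly into code; the sketch below is ours, under the assumption that the user supplies a correlation model corr mapping a distance matrix to $R(\theta)$ (e.g. lambda D, t: np.exp(-t * D) for the exponential model).

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    def neg_profile_loglik(theta, nu, S, Z, corr):
        """Reduced negative Gaussian log likelihood of (2)-(4), up to an additive constant."""
        n = len(Z)
        V = corr(squareform(pdist(S)), theta) + nu * np.eye(n)  # V(theta, nu) = R(theta) + nu I
        one = np.ones(n)
        Vinv_one = np.linalg.solve(V, one)
        mu = Vinv_one @ Z / (Vinv_one @ one)                    # GLS mean, eq. (3)
        r = Z - mu
        sigma2 = r @ np.linalg.solve(V, r) / n                  # profiled variance, eq. (4)
        return 0.5 * n * np.log(sigma2) + 0.5 * np.linalg.slogdet(V)[1]

This can be minimized over a grid or with a generic optimizer; the repeated $n \times n$ solves and determinant are exactly what makes the MLE expensive for large $n$.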

3. Proposed method

In this section, we describe our proposed method. The method does not require Gaussianity, as it is based on a regression approach; however, to motivate it we begin by assuming Gaussianity, and we consider the Gaussian random field described in the maximum likelihood estimation above. We first define the h-nearest neighbors among the $n$ observations. For each $Z(S_i)$, we search for another observation whose distance from $S_i$ is closest to a given $h > 0$ (we postpone the discussion of how to choose $h$ to the end of this section). We denote this observation by $Z(S_{i,h})$ and call it the h-nearest neighbor of $Z(S_i)$. Since the covariance of $Z(S_i)$ and $Z(S_{i,h})$ depends on their distance $h_i := \|S_{i,h} - S_i\|$, we have $\mathrm{Cov}[Z(S_i), Z(S_{i,h})] = \sigma^2 \rho_{i,h}(\theta)$, where $\rho_{i,h}(\theta) := \rho_{h_i}(\theta)$ and $\rho_x(\theta) := \mathrm{Corr}[Z(s), Z(s + xu)]$ is the autocorrelation function; here $x \ge 0$ and $u \in \mathbb{R}^d$ has unit $l_2$-norm. The joint distribution of $(Z(S_i), Z(S_{i,h}))^T$ is then
$$\begin{pmatrix} Z(S_i) \\ Z(S_{i,h}) \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu \\ \mu \end{pmatrix}, \begin{pmatrix} \sigma^2 + \tau^2 & \sigma^2 \rho_{i,h}(\theta) \\ \sigma^2 \rho_{i,h}(\theta) & \sigma^2 + \tau^2 \end{pmatrix} \right). \qquad (5)$$
We consider these joint distributions for $i = 1, \ldots, n$. Writing $\nu = \tau^2/\sigma^2$ and ignoring the correlations between distinct pairs, the negative log quasi-likelihood function is
$$Q(\mu, \sigma^2, \theta, \nu) = \mathrm{const.} + 2n\log\{\sigma^2(1+\nu)\} + \sum_{i=1}^n \log\left\{1 - \frac{\rho_{i,h}^2(\theta)}{(1+\nu)^2}\right\} + \frac{1}{\sigma^2(1+\nu)} \sum_{i=1}^n \frac{\{(1+\nu)(Z(S_{i,h}) - \mu) - \rho_{i,h}(\theta)(Z(S_i) - \mu)\}^2}{(1+\nu)^2 - \rho_{i,h}^2(\theta)} + \frac{1}{\sigma^2(1+\nu)} \sum_{i=1}^n (Z(S_i) - \mu)^2. \qquad (6)$$
The estimator $\hat\theta$ is obtained by minimizing (6). As can be seen from (6), the proposed method is based on the regression principle, utilizing the covariance structure of $Z(S_i)$ and $Z(S_{i,h})$; Gaussianity is thus only a working assumption, used to derive the form of the objective function to be optimized.
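A minimal sketch (ours, not the authors' code) of the h-nearest-neighbor pairing and the quasi-likelihood (6); rho(x, theta) is a user-supplied correlation model standing in for $\rho_x(\theta)$.

    import numpy as np
    from scipy.spatial.distance import cdist

    def h_nearest_neighbors(S, h):
        """For each location S_i, the index j != i with ||S_j - S_i|| closest to h."""
        D = cdist(S, S)
        np.fill_diagonal(D, np.inf)              # exclude the point itself
        j = np.argmin(np.abs(D - h), axis=1)     # ties broken by first occurrence here
        return j, D[np.arange(len(S)), j]        # neighbor indices and distances h_i

    def Q(mu, sigma2, theta, nu, Z, Zh, h_i, rho):
        """Negative log quasi-likelihood (6), up to an additive constant."""
        r = rho(h_i, theta)                      # rho_{i,h}(theta)
        a = 1.0 + nu
        n = len(Z)
        resid = (a * (Zh - mu) - r * (Z - mu)) ** 2
        return (2.0 * n * np.log(sigma2 * a)
                + np.sum(np.log(1.0 - (r / a) ** 2))
                + np.sum(resid / (a ** 2 - r ** 2)) / (sigma2 * a)
                + np.sum((Z - mu) ** 2) / (sigma2 * a))

With j, h_i = h_nearest_neighbors(S, h) and Zh = Z[j], minimizing Q over a grid in $(\theta, \nu)$, with $\mu$ and $\sigma^2$ profiled out via (7) and (8) below, reproduces Method 1.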


We also note that in cases where an observation has more than one h-nearest neighbor, i.e., there is a tie (e.g., for data on a regular grid or lattice), we choose the h-nearest neighbor randomly; for instance, we can choose the one closer to the origin. In Section 4 we show that $\hat\theta$ is a consistent estimator of $\theta$ under the simplified setting $\mu = \nu = 0$ and $\sigma^2 = 1$, but under fairly weak assumptions. In particular, we assume that a $\rho$-mixing condition holds for the spatial process $\{Z(s)\}$, that the process has bounded fourth moments, and that $\rho_x(\theta)$ is a monotone function of $\theta$.

We propose two different methods that use the negative log quasi-likelihood function (6) to obtain estimates.

Method 1: Estimate all the parameters by minimizing (6). It is straightforward to show that, given $\theta$ and $\nu$, (6) is minimized at
$$\hat\mu(\theta,\nu) = \frac{\sum_{i=1}^n (Z(S_{i,h}) + Z(S_i))/(1 + \nu + \rho_{i,h}(\theta))}{2\sum_{i=1}^n 1/(1 + \nu + \rho_{i,h}(\theta))} \qquad (7)$$
and
$$\hat\sigma^2(\theta,\nu) = \frac{1}{2n}\sum_{i=1}^n \frac{(1+\nu)(Z(S_{i,h}) - \hat\mu)^2}{(1+\nu)^2 - \rho_{i,h}^2(\theta)} - \frac{1}{n}\sum_{i=1}^n \frac{\rho_{i,h}(\theta)(Z(S_{i,h}) - \hat\mu)(Z(S_i) - \hat\mu)}{(1+\nu)^2 - \rho_{i,h}^2(\theta)} + \frac{1}{2n}\sum_{i=1}^n \frac{(1+\nu)(Z(S_i) - \hat\mu)^2}{(1+\nu)^2 - \rho_{i,h}^2(\theta)}. \qquad (8)$$
By substituting these expressions in (6), the reduced negative log quasi-likelihood function is
$$Q_0(\theta,\nu) = \mathrm{const.} + 2n\log\hat\sigma^2(\theta,\nu) + \sum_{i=1}^n \log\{(1+\nu)^2 - \rho_{i,h}^2(\theta)\}, \qquad (9)$$
which is minimized with respect to $\theta$ and $\nu$, by a grid search, to yield $\hat\theta$ and $\hat\nu$. Then $\hat\sigma^2$ is obtained by substituting these back in (7) and (8), followed by $\hat\tau^2$, in the same way as for the maximum likelihood estimator.

Method 2: Estimate $\nu$ separately and substitute $\hat\nu$ in (6), which is then minimized with respect to $\mu$, $\sigma^2$, and $\theta$. We propose to estimate $\nu$ empirically. Unless the spatial correlation is very small, the variance of $Z(S_i) - Z(S_{i,0})$ should be reasonably close to $2\tau^2$, where $Z(S_{i,0})$ is the nearest neighbor of $Z(S_i)$. Thus we propose the following moment-based initial estimator of $\tau^2$:
$$\hat\tau^2_{\mathrm{initial}} = \frac{1}{2n}\sum_{i=1}^n (Z(S_i) - Z(S_{i,0}))^2.$$

The initial estimator is biased, because $E[(Z(S_i) - Z(S_{i,0}))^2] - 2\tau^2 = 2\sigma^2(1 - \rho_{i,0}(\theta))$, where $\sigma^2\rho_{i,0}(\theta)$ denotes $\mathrm{Cov}[Z(S_i), Z(S_{i,0})]$. In practice, we can iterate the estimation procedure to subtract the bias from the initial estimate of $\tau^2$. Since $\sigma^2 + \tau^2$ can be estimated by the sample variance of the observations, the estimate of $\sigma^2$ is easily obtained once $\tau^2$ is estimated. Substituting $\hat\nu$ in (6), the other parameters are estimated by direct minimization.

The motivation for using a particular distance $h$ in this way comes from both theoretical considerations and computational convenience; in particular, we consider the stability of the estimator and the identifiability of the parameters. The proposed method adopts the framework of a simple regression model. Unlike the least squares-based approaches to semivariogram estimation, the regression approach does not use (the averages of) the squared differences of observations as pseudo-data. The latter can be unstable, particularly if the observations are somewhat heavy-tailed, since the estimates then involve fourth-order moments of the observations. The computation involved in the proposed method is also simple: unlike maximum likelihood estimation, the minimization of the objective function (6) does not require dealing with large matrices, which makes the implementation faster and more stable.

We now take up the issue of the identifiability of the parameters and the stability of the estimator under two different scenarios. First, we consider the case where there is no nugget effect, i.e., $\tau^2 = 0$ or $\tau^2$ is known. In this case, the parameters are identifiable and the proposed estimator is stable, as can be seen in the simulation study presented in Section 5. Next, we consider the case where the nugget effect is nonzero, i.e., $\tau^2 > 0$ and unknown. In this case, if we use only one distance $h$, the parameters are not identifiable, and the estimates of $\theta$, $\sigma^2$, and $\tau^2$ are not precise, which is reflected in the simulation results for Method 1. Hence we need to choose more than one value of $h$; in fact, in this case using two values $h_1 \ne h_2$ suffices, so that we can use the observed vector $(Z(S_i), Z(S_{i,h_1}), Z(S_{i,h_2}))^T$ in (5). At the end of this section, we describe how to choose a distance $h$; the principle can be generalized to choose a larger number of different values of $h$. We note that Method 2 reduces the dimension of the parameter by estimating $\nu$ separately, and it does perform better in the simulation study. In general, when $\theta \in \mathbb{R}^d$ for $d > 1$ and the nugget effect is nonzero in (5), we need to choose $d+1$ different values of $h$, so that the observed vector $(Z(S_i), Z(S_{i,h_1}), \ldots, Z(S_{i,h_{d+1}}))^T$ is $(d+2)$-dimensional.

The proposed estimation procedure involves choosing a distance $h$ and obtaining the h-nearest neighbor of each observation. The value $h > 0$ is chosen so that the estimated correlation at distance $h$ equals a predetermined value. The motivation for choosing $h$ this way is the observation that, at distances where the true correlation is close to 0, empirically calculated correlations can be negative, and regressing on $Z(S_{i,h})$ at those distances can result in a very large value of $\hat\theta$, that is, an under-estimate of the spatial correlation of the process. We also observe that choosing a distance corresponding to a true correlation close to $1/(1+\nu)$ tends to result in a poor estimate. (We note that the correlation function is discontinuous at the origin due to the nugget effect, and the correlation tends to $1/(1+\nu)$ as $h \to 0$.)
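Returning briefly to Method 2, a minimal sketch of its initial nugget estimate (our illustration, with the iterative bias correction left to the caller):

    import numpy as np
    from scipy.spatial.distance import cdist

    def tau2_initial(S, Z):
        """Moment-based initial nugget estimate: (1/(2n)) sum_i (Z(S_i) - Z(S_{i,0}))^2."""
        D = cdist(S, S)
        np.fill_diagonal(D, np.inf)
        j = np.argmin(D, axis=1)          # ordinary nearest neighbors S_{i,0}
        return 0.5 * np.mean((Z - Z[j]) ** 2)

The bias $2\sigma^2(1 - \rho_{i,0}(\theta))$ can then be subtracted iteratively once $\sigma^2$ and $\theta$ have been estimated, with $\sigma^2 + \tau^2$ estimated by the sample variance of the observations.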


Based on these observations, we give a formal explanation of how to choose the distance $h$. We focus on the fourth term in (6) and take the exponential correlation function as an example, so that $\rho_h(\theta) = \exp(-\theta h)$. Our goal is then to find the positive value $h$ maximizing
$$\left|\frac{\partial}{\partial\theta}\,\frac{\rho_h(\theta)}{1+\nu}\right| = \left|\frac{\partial}{\partial\theta}\,\frac{e^{-\theta h}}{1+\nu}\right| = \frac{h e^{-\theta h}}{1+\nu}. \qquad (10)$$
Since
$$\frac{\partial}{\partial h}\left(\frac{h e^{-\theta h}}{1+\nu}\right) = 0 \iff \theta h = 1,$$
it is easily seen that $h = 1/\theta$ maximizes (10), and this distance corresponds to the true correlation $e^{-1}/(1+\nu)$. Thus we first estimate the correlation function and choose $\hat h$ such that the estimated correlation is close to $e^{-1}/(1+\hat\nu) \approx 0.368/(1+\hat\nu)$, assuming that we can obtain a reasonable estimate of $\nu$. This is a very specific model for the correlation function, and as we vary the model, the value of $\rho_h(\theta)$ at the "optimal" $h$ will change. However, for the Matérn family of correlation functions, which is widely used in spatial data analysis (see, e.g., Diggle and Ribeiro, 2007, and (11) for an expression of the autocovariance function), it is reasonable to choose $h$ such that the estimated correlation at $h$ is around $e^{-1}/(1+\hat\nu)$. In practice, we found that the value $0.5/(1+\hat\nu)$ works quite well across the different correlation models considered. We therefore first estimate correlations empirically at various distances and smooth them to represent the correlation as a function of distance; we then choose the distance $h$ at which the smoothed correlation is close to $0.5/(1+\hat\nu)$. Fig. 1 illustrates an empirically estimated correlation function and its smoothed version, along with the true correlation function.
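A sketch of this selection rule (ours; the paper does not specify its smoother, so a simple moving average stands in for it, and the window length is a hypothetical choice):

    import numpy as np

    def choose_h(dists, corrs, nu_hat, target=0.5):
        """Pick h where a smoothed empirical correlation crosses target/(1 + nu_hat).

        dists, corrs: empirical correlations at a grid of distances, assumed
        precomputed (e.g. from binned pairs of observations)."""
        k = 5                                            # moving-average window (hypothetical)
        smooth = np.convolve(corrs, np.ones(k) / k, mode="same")
        level = target / (1.0 + nu_hat)
        return dists[np.argmin(np.abs(smooth - level))]  # distance closest to the target level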

Fig. 1. The correlation function for a simulated data set, plotted as $\rho(h)/(1+\nu)$ against $h$. The dashed line is an empirically estimated correlation function, and the solid line is its smoothed version. The dotted line is the true correlation function.

Fig. 2. Smoky Mountain pH residuals: fitted variogram $2\gamma(h; \hat\varphi, \hat\sigma^2)$ against $h$ for the h-NN regression, OLS, WLS (Cloud), and MLE fits.


4. Asymptotic theory

In this section, we present a result on the consistency of the proposed estimator as the sample size increases. For simplicity of exposition, we assume that $\mu = 0$, $\nu = 0$ (no nugget effect) and $\sigma^2 = 1$ (equivalently, $\sigma^2$ is known) in (6). We first state the following assumptions on the design; note that they can be viewed as an increasing domain asymptotic regime.

A1. The locations $\{S_i\}$ are i.i.d. in the domain $D_n = N D_0$, where $N \to \infty$ as $n \to \infty$ and $D_0$ is a compact set in $\mathbb{R}^d$ containing the origin, with $\mathrm{vol}(D_0) = 1$.
A2. $S_i = N U_i$, where the $U_i$ are i.i.d. uniform on $D_0$.
A3. There exist $\delta > 0$ and $c > 0$ such that for any annulus $A(u, d_1, d_2)$, centered at any $u$ in $D_0$ with radii $0 < d_1 < d_2 \le \delta$, $\mathrm{vol}(A(u, d_1, d_2) \cap D_0)/\mathrm{vol}(A(u, d_1, d_2)) \ge c$.
A4. $N^d = o(n/\log n)$ as $N, n \to \infty$.

Note that condition A4 is only slightly stronger than the requirement that $n \propto N^d$, which would be the case if the observation locations were on a regular grid in $\mathbb{R}^d$ whose dimensions grow at rate $N$. Indeed, it can be shown, using simpler technical arguments than those presented here, that in the latter setting the proposed method produces a consistent estimator of the parameter under the conditions on the process and its correlation function described below.

We assume a $\rho$-mixing condition (Bradley, 1987, 1993; Kolmogorov and Rozanov, 1960) for the spatial process $\{Z(s)\}$. Specifically, we assume:

C1. For any two sets $A$ and $B$ in $D_n$, define the distance between the sets by $D(A,B) = \inf\{|s - t| : s \in A,\ t \in B\}$. Let $X_A$ and $X_B$ be bounded measurable functions of $Z(s)$, $s \in A$, and $Z(t)$, $t \in B$, respectively. The $\rho$-mixing assumption is that there is a nonnegative, non-increasing function $g$ on $[0,\infty)$ with $g(0) = 1$ such that $|\mathrm{Corr}[X_A, X_B]| \le g(D(A,B))$.
C2. $\int_0^\infty g(u)\, du < \infty$.
C3. $E|Z(s)|^4 < \infty$.

About the correlation function $\rho_x(\theta)$ we make the following assumptions:

D1. For every $x \in \mathbb{R}_+$, the function $\rho_x(\theta)$ is monotonically decreasing in $\theta$.
D2. For every $x \in \mathbb{R}_+$, $\rho_x(\theta)$ is a differentiable function of $\theta$ and $\partial\rho_x(\theta)/\partial\theta$ is a bounded function.
D3. Both $\rho_x(\theta)$ and $\partial\rho_x(\theta)/\partial\theta$ are Lipschitz functions of $x$ for each $\theta$.
D4. The parameter space $\Theta \subset \mathbb{R}_+$ is such that $\inf_{\theta\in\Theta}(1 - (\rho_x(\theta))^2) \ge \delta_3$ and $\inf_{\theta\in\Theta}|\partial\rho_x(\theta)/\partial\theta| \ge \delta_4$ for some $\delta_3, \delta_4 > 0$. Also, the Lipschitz norms of the functions in D3 are bounded over $\Theta$.
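As a quick worked check of these conditions (our illustration, for a fixed lag $x > 0$ and a compact $\Theta = [\theta_1, \theta_2] \subset (0,\infty)$, anticipating the remark below), consider the exponential correlation:
$$\rho_x(\theta) = e^{-\theta x}, \qquad \frac{\partial \rho_x(\theta)}{\partial\theta} = -x e^{-\theta x} < 0,$$
so $\rho_x(\theta)$ is decreasing in $\theta$ (D1) with bounded derivative (D2); both $x \mapsto e^{-\theta x}$ and $x \mapsto -x e^{-\theta x}$ have bounded derivatives, giving D3; and
$$\inf_{\theta\in\Theta}\bigl(1 - e^{-2\theta x}\bigr) = 1 - e^{-2\theta_1 x} > 0, \qquad \inf_{\theta\in\Theta}\bigl|{-x e^{-\theta x}}\bigr| = x e^{-\theta_2 x} > 0,$$
giving D4.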

To show the applicability of conditions D1–D4, note that they are satisfied for the exponential (i.e., $\rho_x(\theta) = \exp(-\theta x)$), Gaussian (i.e., $\rho_x(\theta) = \exp(-\theta x^2)$) and Matérn (with a given smoothness parameter $\kappa \ge 0.5$) autocorrelation functions when $\Theta$ is a compact subset of $(0,\infty)$. The reason for condition D4 is purely technical, and it can be relaxed at the cost of considerable additional computation. We then have the following result on the consistency of the proposed estimator; the proof is given in Section 7.

Theorem 1. Suppose that conditions A1–A4, C1–C3 and D1–D4 hold. Let $\theta_t$ denote the true parameter, an interior point of $\Theta$. Let $\hat\theta = \arg\min_{\theta\in\Theta} Q(\theta)$, where $Q(\theta)$ is defined through (6) with $\mu = \nu = 0$ and $\sigma = 1$. Then $\hat\theta$ converges to $\theta_t$ in probability.

5. Simulation

In this simulation study, we compare our proposed method (henceforth, h-NN regression) with the five other methods discussed in Section 2. Their performances are compared for non-Gaussian as well as Gaussian random fields. In the first part of the study, we compare the accuracy of each estimation procedure using Gaussian random fields; in the second part, we compare the robustness of each estimator using a class of non-Gaussian random fields. As pointed out in Section 3, the least squares-based approaches use (the averages of) the squared differences of observations as pseudo-data, so the quality of their estimates can be significantly affected by the tail behavior of the underlying distribution. In contrast, we show that our method performs well for non-Gaussian random


fields as well as Gaussian random fields. For simplicity, the simulation study is conducted mostly without a nugget effect, and results including a nugget effect are reported only for Gaussian random fields.

We conduct the study using the Matérn family of covariance functions, which is widely used in spatial data analysis (see, e.g., Diggle and Ribeiro, 2007). The parametric (isotropic) covariogram model with parameters $\kappa$, $\theta$, and $\sigma^2$ is given by
$$C(h) = \frac{\sigma^2}{\Gamma(\kappa)} \left(\frac{\theta h}{2}\right)^{\kappa} 2 K_\kappa(\theta h), \qquad (11)$$
where $K_\kappa(\cdot)$ denotes the modified Bessel function of the second kind of order $\kappa$. The parameter $\kappa > 0$ determines the analytic smoothness of the underlying process. We consider $\kappa = 0.5$ and $1.5$; the former leads to the exponential covariogram model. Owing to space limitations, the simulation results are reported only for the Matérn model with $\kappa = 1.5$, which poses more of a computational challenge since it involves the computation of a modified Bessel function; the results for the exponential covariogram model are given in the Supplementary material. For each covariogram model, we fix $\sigma^2$ and let $\theta$ take the values 4, 8, 16, and 32. For models with larger values of $\theta$, the spatial autocorrelation drops more rapidly.

For the case without a nugget effect, we report the mean, median, and standard deviation of the estimates of $\theta$ and $\sigma^2$ divided by the corresponding true values; to evaluate the estimation accuracy, the mean squared errors are also reported. If a nugget effect is included in the model, we instead use the operator norm of the difference between the estimated covariance matrix and the true one as a performance measure, reporting the mean, median, and standard deviation of this norm divided by the norm of the true covariance matrix. To find the optimum values of $\theta$ and $\nu = \tau^2/\sigma^2$, we use a grid search, in which $\nu$ is restricted to lie between 0 and 1 while $\theta$ ranges from 0.001 to 260. As we shall see, there are cases in which boundary values are returned as optima; the percentage of cases in which the optimum value of $\theta$ equals 0.001 or 260 is reported in the last column of each table. If this percentage is not zero, we also calculate the mean, median, SD, and MSE with the boundary values removed; these quantities are reported in parentheses.

5.1. Gaussian random fields

One hundred observations from a Gaussian random field are generated at randomly selected spatial locations in the unit square. In this setting, the Gaussian random field has $\mu = 0$ and $\sigma^2 = 1$. Table 2 shows the performance of the six estimation procedures for a Matérn covariogram model with $\kappa = 1.5$, assuming no nugget effect. We also plot the MSEs of $\hat\theta/\theta$ and $\hat\sigma^2/\sigma^2$ against the true values of $\theta$ in Figs. 3 and 4, respectively; cases in which the MSE is out of the range of the graph are omitted from the plots. For $\theta = 4$, in terms of the MSE for $\hat\theta/\theta$, the MLE is much better than the other estimators, but it shows more variability in estimating $\sigma^2$. The h-NN regression estimator has the smallest MSE for $\hat\sigma^2/\sigma^2$, but its MSE for $\hat\theta/\theta$ is large. For $\theta = 8$, 16, and 32, the MLE outperforms the other estimators, and the proposed estimator performs better than the least squares estimators. Comparing the OLS and WLS estimators, for small $\theta$ the WLS estimator performs better, especially in estimating $\sigma^2$, but for large $\theta$ the difference is not clear.
In general, the WLS (Cloud) estimator performs better than the other least squares estimators. Based on the above results, we select the OLS estimator, the WLS (Cloud) estimator, and the MLE as the main competitors of our method, and compare them with our proposed method assuming a nugget effect in Table 3. The results are reported for the nugget effect $\tau^2 = 0.5$. For the Matérn model with $\kappa = 1.5$ and $\theta = 4$, the MLE shows the smallest mean of the operator norm, and both Method 1 and Method 2 of the h-NN regression estimator show means quite close to that of the MLE; we also observe that the proposed estimator has much smaller SD than the other estimators. For $\theta = 8$, the MLE shows the smallest mean, and Method 2 has the smallest SD. For $\theta = 16$ and 32, the MLE outperforms the other estimators, and Method 2 performs quite competitively. Neither the OLS estimator nor the WLS (Cloud) estimator performs competitively.

5.2. Non-Gaussian random fields

A class of non-Gaussian random fields is generated by squaring Gaussian random fields whose correlations are the square roots of the targeted correlations. More specifically, squaring the Gaussian random field of Section 5.1 with $\mathrm{Corr}[Z(S_i), Z(S_j)] = \rho^{1/2}(\|S_i - S_j\|)$ gives a non-Gaussian random field $(Z(S_1)^2, \ldots, Z(S_{100})^2)^T$ with $\mathrm{Corr}[Z(S_i)^2, Z(S_j)^2] = \rho(\|S_i - S_j\|)$.

Table 1
Smoky Mountain pH residuals: $C(h) = \sigma^2 \exp(-h/\varphi)$.

Estimator        φ̂        σ̂²
h-NN regression  8         0.1467
OLS              11.1000   0.1616
WLS              10.9600   0.1681
OLS (Cloud)      11.980    0.1633
WLS (Cloud)      5.11      0.1677
MLE              8.180     0.1538
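To make the simulation setup concrete, here is a sketch (ours, not the authors' code) that builds the Matérn covariogram (11) and draws both the Gaussian field of Section 5.1 and the squared-Gaussian field above. The value $\theta = 8$ and the seed are arbitrary, and we assume, as the paper's construction requires, that the elementwise square root of the Matérn correlation matrix remains a valid covariance matrix.

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.special import gamma as Gamma, kv

    def matern_cov(D, theta, sigma2=1.0, kappa=1.5):
        """Matern covariogram (11): C(h) = (sigma2/Gamma(kappa)) (theta h / 2)^kappa 2 K_kappa(theta h)."""
        x = theta * np.maximum(D, 1e-12)     # keep the Bessel argument strictly positive
        C = (sigma2 / Gamma(kappa)) * (x / 2.0) ** kappa * 2.0 * kv(kappa, x)
        return np.where(D > 0, C, sigma2)    # C(0) = sigma2

    rng = np.random.default_rng(0)
    S = rng.uniform(size=(100, 2))           # 100 random locations in the unit square
    D = squareform(pdist(S))

    # Gaussian field (Section 5.1): mu = 0, sigma2 = 1.
    Z = rng.multivariate_normal(np.zeros(100), matern_cov(D, theta=8.0))

    # Non-Gaussian field (Section 5.2): square a Gaussian field whose correlation is the
    # square root of the target; the squares have mean 1, variance 2, correlation rho(.).
    Z2 = rng.multivariate_normal(np.zeros(100), np.sqrt(matern_cov(D, theta=8.0))) ** 2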


Table 2
Number of replicates = 400, Gaussian dist., $C(h) = (1/\Gamma(1.5))(\theta h/2)^{1.5}\, 2K_{1.5}(\theta h)$ (no nugget effect). Values in parentheses are computed with the boundary values removed.

θ = 4                       Mean               Median            SD                  MSE                   Boundary value (%)
h-NN regression  θ̂/θ       1.596              1.563             0.418               0.530                 0
                 σ̂²/σ²     0.568              0.492             0.347               0.307
OLS              θ̂/θ       1.227              1.169             0.607               0.420                 0
                 σ̂²/σ²     3.058              0.818             18.347              340.848
WLS              θ̂/θ       1.367              1.325             0.542               0.428                 0
                 σ̂²/σ²     1.260              0.715             3.017               9.170
OLS (Cloud)      θ̂/θ       1.264 (1.313)      1.175 (1.213)     0.743 (0.713)       0.622 (0.606)         3.75
                 σ̂²/σ²     1.48e+005 (4.868)  0.869 (0.830)     8.23e+005 (40.394)  2.18e+010 (1646.637)
WLS (Cloud)      θ̂/θ       1.421              1.313             0.559               0.490                 0
                 σ̂²/σ²     0.773              0.578             0.661               0.488
MLE              θ̂/θ       1.138              1.113             0.255               0.084                 0
                 σ̂²/σ²     0.899              0.761             0.581               0.348

θ = 8                       Mean               Median            SD                  MSE                   Boundary value (%)
h-NN regression  θ̂/θ       1.229              1.225             0.291               0.137                 0
                 σ̂²/σ²     0.818              0.737             0.350               0.156
OLS              θ̂/θ       1.098              1.053             0.381               0.155                 0
                 σ̂²/σ²     1.140              0.963             1.144               1.328
WLS              θ̂/θ       1.251              1.216             0.349               0.185                 0
                 σ̂²/σ²     0.991              0.906             0.472               0.223
OLS (Cloud)      θ̂/θ       1.102              1.069             0.475               0.236                 0
                 σ̂²/σ²     1.133              0.986             0.650               0.440
WLS (Cloud)      θ̂/θ       1.188              1.125             0.356               0.162                 0
                 σ̂²/σ²     0.872              0.823             0.356               0.143
MLE              θ̂/θ       1.073              1.063             0.166               0.033                 0
                 σ̂²/σ²     0.930              0.869             0.361               0.135

θ = 16                      Mean               Median            SD                  MSE                   Boundary value (%)
h-NN regression  θ̂/θ       1.096              1.064             0.240               0.067                 0
                 σ̂²/σ²     0.925              0.878             0.249               0.068
OLS              θ̂/θ       1.122              1.111             0.325               0.121                 0
                 σ̂²/σ²     0.971              0.926             0.307               0.095
WLS              θ̂/θ       1.264              1.241             0.307               0.164                 0
                 σ̂²/σ²     0.954              0.913             0.284               0.083
OLS (Cloud)      θ̂/θ       1.080              1.069             0.385               0.155                 0
                 σ̂²/σ²     0.996              0.954             0.328               0.108
WLS (Cloud)      θ̂/θ       1.117              1.081             0.286               0.095                 0
                 σ̂²/σ²     0.906              0.871             0.249               0.071
MLE              θ̂/θ       1.064              1.055             0.145               0.025                 0
                 σ̂²/σ²     0.932              0.899             0.226               0.056

θ = 32                      Mean               Median            SD                  MSE                   Boundary value (%)
h-NN regression  θ̂/θ       1.048              1.021             0.207               0.045                 0
                 σ̂²/σ²     0.975              0.970             0.176               0.032
OLS              θ̂/θ       1.169              1.117             0.375               0.169                 0
                 σ̂²/σ²     1.003              0.990             0.192               0.037
WLS              θ̂/θ       1.308              1.247             0.318               0.196                 0
                 σ̂²/σ²     1.008              0.996             0.192               0.037
OLS (Cloud)      θ̂/θ       1.039              1.024             0.321               0.105                 0
                 σ̂²/σ²     1.009              0.989             0.196               0.038
WLS (Cloud)      θ̂/θ       1.035              0.988             0.268               0.073                 0
                 σ̂²/σ²     0.970              0.946             0.196               0.039
MLE              θ̂/θ       1.039              1.032             0.158               0.026                 0
                 σ̂²/σ²     0.980              0.968             0.173               0.030

The non-Gaussian random field has $\mu = 1$ and $\sigma^2 = 2$. For the non-Gaussian random fields, we report results for the Matérn model with $\kappa = 1.5$ without a nugget effect in Table 4. For the non-Gaussian random fields, the mean squared error increases for all the estimators; in particular, all the least squares estimators perform poorly, so the discussion focuses on the performance of the MLE and the proposed estimator.


Fig. 3. MSE of $\hat\theta/\theta$ against $\theta = 4, 8, 16, 32$ for the h-NN regression, OLS, WLS, OLS (Cloud), WLS (Cloud), and MLE estimators: Matérn model ($\kappa = 1.5$) without a nugget effect, Gaussian dist.

Fig. 4. MSE of $\hat\sigma^2/\sigma^2$ against $\theta = 4, 8, 16, 32$: Matérn model ($\kappa = 1.5$) without a nugget effect, Gaussian dist.

For $\theta = 4$, the MLE has a much smaller MSE for $\hat\theta/\theta$, but it shows quite large variability in estimating $\sigma^2$; the h-NN regression estimator has a much smaller MSE for $\hat\sigma^2/\sigma^2$. For $\theta = 16$, the MLE outperforms the other estimators in terms of the MSE for both $\hat\theta/\theta$ and $\hat\sigma^2/\sigma^2$.

6. Application

We apply the six estimation procedures to the Smoky Mountain pH data. Kaufmann et al. (1988) measured water pH at various points within the Great Smoky Mountains in a study of the chemical characteristics of streams in the mid-Atlantic and southeastern United States. The data set is taken from Waller and Gotway (2004) and consists of irregularly spaced measurements of water pH at 75 locations. We fit a first-order trend surface to the Smoky Mountain pH data to obtain residuals and calculate an empirical semivariogram based on the residuals. The semivariogram reaches a definite sill, and the nugget effect appears to be negligible. Without much information about the shape of the semivariogram near the origin, either an exponential model or a spherical model is suggested as a good choice (Waller and Gotway, 2004).
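The detrending step is a plain OLS regression on the coordinates; a minimal sketch (ours, with a hypothetical two-column coordinate layout):

    import numpy as np

    def trend_surface_residuals(coords, pH):
        """Residuals from a first-order trend surface, pH ~ b0 + b1 x + b2 y, fit by OLS."""
        X = np.column_stack([np.ones(len(coords)), coords[:, 0], coords[:, 1]])
        beta, *_ = np.linalg.lstsq(X, pH, rcond=None)
        return pH - X @ beta

The empirical semivariogram of these residuals can then be computed with the sketch given in Section 2.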


Table 3
Operator norm of the difference between the estimated covariance matrix and the true one, divided by the norm of the true covariance matrix; number of replicates = 400, Gaussian dist., $C(h) = (1/\Gamma(1.5))(\theta h/2)^{1.5}\, 2K_{1.5}(\theta h)$, $\tau^2 = 0.5$ (nugget effect). Values in parentheses are computed with the boundary values removed.

                           Mean              Median           SD                 Boundary value (%)
θ = 4
h-NN regression Method 1   0.783 (0.782)     0.846 (0.856)    0.300 (0.220)      11.75
h-NN regression Method 2   0.737 (0.751)     0.807 (0.819)    0.270 (0.263)      8.25
OLS                        3.206 (3.213)     0.757 (0.758)    6.045 (6.051)      0.25
WLS (Cloud)                2.714 (2.725)     0.719 (0.721)    6.096 (6.110)      0.5
MLE                        0.669 (0.666)     0.579 (0.576)    0.644 (0.646)      1
θ = 8
h-NN regression Method 1   0.928 (0.707)     0.673 (0.650)    1.047 (0.596)      8.25
h-NN regression Method 2   0.651 (0.645)     0.606 (0.604)    0.457 (0.450)      1
OLS                        1.596             0.625            4.637              0
WLS (Cloud)                3.693             0.644            13.593             0
MLE                        0.584             0.500            0.544              0
θ = 16
h-NN regression Method 1   1.125 (0.699)     0.490 (0.474)    1.948 (0.825)      5.75
h-NN regression Method 2   0.692 (0.591)     0.427 (0.419)    1.050 (0.668)      2
OLS                        2.084             0.423            13.468             0
WLS (Cloud)                14.414            0.551            50.477             0
MLE                        0.432             0.347            0.377              0
θ = 32
h-NN regression Method 1   0.800 (0.524)     0.345 (0.340)    2.438 (0.712)      1.75
h-NN regression Method 2   0.508 (0.430)     0.304 (0.300)    1.194 (0.528)      1.25
OLS                        6.016 (5.968)     0.368 (0.368)    47.926 (47.977)    0.25
WLS (Cloud)                33.136 (33.217)   0.370 (0.369)    105.237 (105.357)  0.25
MLE                        0.408             0.299            0.772              0

Table 4
Number of replicates = 400, non-Gaussian dist., $C(h) = (2/\Gamma(1.5))(\theta h/2)^{1.5}\, 2K_{1.5}(\theta h)$ (no nugget effect). Values in parentheses are computed with the boundary values removed.

θ = 4                       Mean               Median            SD                  MSE                   Boundary value (%)
h-NN regression  θ̂/θ       1.825              1.713             0.574               1.010                 0
                 σ̂²/σ²     0.587              0.265             0.922               1.021
OLS              θ̂/θ       1.573              1.400             0.912               1.160                 0
                 σ̂²/σ²     3.204              0.393             26.438              703.825
WLS              θ̂/θ       1.747              1.575             0.826               1.240                 0
                 σ̂²/σ²     1.318              0.352             3.618               13.191
OLS (Cloud)      θ̂/θ       1.456 (1.549)      1.294 (1.363)     1.036 (0.999)       1.281 (1.299)         6
                 σ̂²/σ²     1.67e+005 (5.561)  0.561 (0.494)     1.08e+006 (45.457)  1.20e+012 (2087.142)
WLS (Cloud)      θ̂/θ       2.058              1.907             0.864               1.866                 0
                 σ̂²/σ²     0.835              0.361             1.322               1.775
MLE              θ̂/θ       1.259              1.228             0.426               0.249                 0
                 σ̂²/σ²     1.287              0.456             3.209               10.380

θ = 16                      Mean               Median            SD                  MSE                   Boundary value (%)
h-NN regression  θ̂/θ       1.205              1.181             0.375               0.183                 0
                 σ̂²/σ²     0.952              0.685             0.874               0.766
OLS              θ̂/θ       1.225              1.181             0.489               0.290                 0
                 σ̂²/σ²     0.969              0.680             0.930               0.866
WLS              θ̂/θ       1.413              1.333             0.495               0.416                 0
                 σ̂²/σ²     0.959              0.673             0.925               0.857
OLS (Cloud)      θ̂/θ       1.109 (1.126)      1.152 (1.163)     0.633 (0.623)       0.413 (0.404)         1.5
                 σ̂²/σ²     6.83e+004 (1.842)  0.779 (0.748)     6.27e+005 (12.486)  3.98e+011 (156.609)
WLS (Cloud)      θ̂/θ       1.422              1.359             0.546               0.476                 0
                 σ̂²/σ²     1.719              1.194             1.753               3.590
MLE              θ̂/θ       1.132              1.095             0.285               0.099                 0
                 σ̂²/σ²     0.977              0.717             0.837               0.701

We assume an exponential model without a nugget effect and use the previously discussed methods to estimate its parameters based on the residuals from the trend surface model. Table 1 shows the estimates for each method assuming no nugget effect. We also display the fitted variograms for the h-NN regression, OLS, WLS


(Cloud), and the MLE in Fig. 2. It appears that the proposed method and the MLE give fairly close results. The various least squares methods give slightly larger estimates of both $\varphi$ and $\sigma^2$, except for the WLS (Cloud) estimator, which gives considerably different estimates, as shown in Fig. 2.

7. Proofs

The proof of Theorem 1 employs Lemmas 1–4 stated below. We first introduce some notation. We define $b_{t,i} := \rho_{h_i}(\theta_t)$ and $b_t := \rho_h(\theta_t)$, where $\theta_t$ denotes the true parameter; and, for arbitrary $\theta$, $b_i := \rho_{h_i}(\theta)$ and $b := \rho_h(\theta)$.

Lemma 1. Suppose that conditions A1–A4 and C1–C3 hold. Let $W(S_i) = Z(S_i)(Z(S_{i,h}) - b_i Z(S_i))$. Then, under $\theta$,
$$n^{-1}\sum_i W(S_i) = O_p(N^{-1/2}).$$

Lemma 2. Suppose that conditions A1–A4 and C1–C3 hold. Let $U(S_i) = (Z(S_{i,h}) - b_i Z(S_i))^2 - (1 - b_i^2)$. Then, under $\theta$,
$$n^{-1}\sum_i U(S_i) = O_p(N^{-1/2}).$$

Lemma 3. Suppose that conditions A1–A4 and C1–C3 hold. Then, under $\theta$,
$$n^{-1}\sum_i (Z(S_i)^2 - 1) = O_p(N^{-1/2}).$$

Lemma 4. Suppose that conditions A1–A4 hold. Define $R := \max_{1\le i\le n} |h_i - h|$. Then $R = o_p(1)$ as $n \to \infty$.

Proof of Theorem 1. In the following, all probabilistic statements are under the true parameter $\theta_t$. The negative log quasi-likelihood function is given by
$$Q \equiv Q(\theta) = \mathrm{const.} + \sum_{i=1}^n \log(1 - b_i^2) + \sum_{i=1}^n \frac{(Z(S_{i,h}) - b_i Z(S_i))^2}{1 - b_i^2} + \sum_{i=1}^n Z(S_i)^2.$$

Thus
$$\frac{1}{n}\frac{\partial Q(\theta)}{\partial\theta} = -\frac{2}{n}\sum_{i=1}^n \frac{(Z(S_{i,h}) - b_i Z(S_i))Z(S_i)}{1 - b_i^2}\frac{\partial b_i}{\partial\theta} + \frac{2}{n}\sum_{i=1}^n \frac{\{(Z(S_{i,h}) - b_i Z(S_i))^2 - (1 - b_i^2)\}\,b_i}{(1 - b_i^2)^2}\frac{\partial b_i}{\partial\theta}. \qquad (12)$$

Let $W(S_i) = (Z(S_{i,h}) - b_i Z(S_i))Z(S_i)$ and look at the first term on the right-hand side of (12) for an arbitrary $\theta \in \Theta$. We have
$$\frac{1}{n}\sum_{i=1}^n \frac{W(S_i)}{1 - b_i^2}\frac{\partial b_i}{\partial\theta} = \frac{1}{1 - b^2}\frac{\partial b}{\partial\theta}\left(\frac{1}{n}\sum_{i=1}^n W(S_i)\right) + \frac{1}{n}\sum_{i=1}^n W(S_i)\left\{\frac{1}{1 - b_i^2} - \frac{1}{1 - b^2}\right\}\frac{\partial b_i}{\partial\theta} + \frac{1}{1 - b^2}\,\frac{1}{n}\sum_{i=1}^n W(S_i)\left(\frac{\partial b_i}{\partial\theta} - \frac{\partial b}{\partial\theta}\right). \qquad (13)$$

It is easy to see that
$$\left|\frac{1}{n}\sum_{i=1}^n \frac{W(S_i)}{1 - b_i^2}\frac{\partial b_i}{\partial\theta} - \frac{1}{n}\sum_{i=1}^n \frac{W(S_i)}{1 - b^2}\frac{\partial b}{\partial\theta}\right| \le \max_i\left|\frac{1}{1 - b_i^2} - \frac{1}{1 - b^2}\right|\,\max_i\left|\frac{\partial b_i}{\partial\theta}\right|\,\frac{1}{n}\sum_{i=1}^n |W(S_i)| + \frac{1}{1 - b^2}\,\max_i\left|\frac{\partial b_i}{\partial\theta} - \frac{\partial b}{\partial\theta}\right|\,\frac{1}{n}\sum_{i=1}^n |W(S_i)|.$$
Since $\max_{1\le i\le n}|h_i - h| = o_p(1)$ by Lemma 4, conditions D1–D4 give
$$\sup_{\theta\in\Theta}\max_{1\le i\le n}|b_i - b| = o_p(1) \quad\text{and}\quad \sup_{\theta\in\Theta}\max_{1\le i\le n}\left|\frac{\partial b_i}{\partial\theta} - \frac{\partial b}{\partial\theta}\right| = o_p(1). \qquad (14)$$
Thus $\sup_{\theta\in\Theta}\max_i |\partial b_i/\partial\theta| = O_p(1)$. Since, by D4, $1 - b^2$ is bounded away from zero for all $\theta \in \Theta$, from (14) we also have
$$\sup_{\theta\in\Theta}\max_i \left|\frac{1}{1 - b_i^2} - \frac{1}{1 - b^2}\right| = o_p(1). \qquad (15)$$
Finally,
$$(E|W(S_i)|)^2 \le E[(Z(S_{i,h}) - b_i Z(S_i))^2]\,E[Z(S_i)^2] \le c^2,$$
and so $E[n^{-1}\sum_i |W(S_i)|] \le c$, which shows that $n^{-1}\sum_i |W(S_i)| = O_p(1)$. Thus we have proved that
$$\left|\frac{1}{n}\sum_{i=1}^n \frac{W(S_i)}{1 - b_i^2}\frac{\partial b_i}{\partial\theta} - \frac{1}{n}\sum_{i=1}^n \frac{W(S_i)}{1 - b^2}\frac{\partial b}{\partial\theta}\right| = o_p(1). \qquad (16)$$

Let $U(S_i) = (Z(S_{i,h}) - b_i Z(S_i))^2 - (1 - b_i^2)$ and look at the second term on the right-hand side of (12). It is given by
$$\frac{1}{n}\sum_{i=1}^n \frac{U(S_i)\,b_i}{(1 - b_i^2)^2}\frac{\partial b_i}{\partial\theta} = \frac{b}{(1 - b^2)^2}\frac{\partial b}{\partial\theta}\left(\frac{1}{n}\sum_{i=1}^n U(S_i)\right) + \frac{1}{n}\sum_{i=1}^n U(S_i)\left\{\frac{b_i}{(1 - b_i^2)^2} - \frac{b}{(1 - b^2)^2}\right\}\frac{\partial b_i}{\partial\theta} + \frac{b}{(1 - b^2)^2}\,\frac{1}{n}\sum_{i=1}^n U(S_i)\left(\frac{\partial b_i}{\partial\theta} - \frac{\partial b}{\partial\theta}\right). \qquad (17)$$
It is easy to see that
$$\left|\frac{1}{n}\sum_{i=1}^n \frac{U(S_i)\,b_i}{(1 - b_i^2)^2}\frac{\partial b_i}{\partial\theta} - \frac{1}{n}\sum_{i=1}^n \frac{U(S_i)\,b}{(1 - b^2)^2}\frac{\partial b}{\partial\theta}\right| \le \max_i\left|\frac{b_i}{(1 - b_i^2)^2} - \frac{b}{(1 - b^2)^2}\right|\,\max_i\left|\frac{\partial b_i}{\partial\theta}\right|\,\frac{1}{n}\sum_{i=1}^n |U(S_i)| + \frac{b}{(1 - b^2)^2}\,\max_i\left|\frac{\partial b_i}{\partial\theta} - \frac{\partial b}{\partial\theta}\right|\,\frac{1}{n}\sum_{i=1}^n |U(S_i)|.$$
Since
$$E|U(S_i)| \le E[(Z(S_{i,h}) - b_i Z(S_i))^2] + (1 - b_i^2) \le c,$$
so that $E[n^{-1}\sum_i |U(S_i)|] \le c$, it follows that $n^{-1}\sum_i |U(S_i)| = O_p(1)$. Thus, arguments similar to those above show that
$$\left|\frac{1}{n}\sum_{i=1}^n \frac{U(S_i)\,b_i}{(1 - b_i^2)^2}\frac{\partial b_i}{\partial\theta} - \frac{1}{n}\sum_{i=1}^n \frac{U(S_i)\,b}{(1 - b^2)^2}\frac{\partial b}{\partial\theta}\right| = o_p(1). \qquad (18)$$

As before, the $o_p$ and $O_p$ terms are uniform in $\theta \in \Theta$. Next, consider the term
$$\frac{1}{n}\sum_{i=1}^n W(S_i) = \frac{1}{n}\sum_{i=1}^n (Z(S_{i,h}) - b_i Z(S_i))Z(S_i) = \frac{1}{n}\sum_{i=1}^n (Z(S_{i,h}) - b_{t,i} Z(S_i) + b_{t,i} Z(S_i) - b_i Z(S_i))Z(S_i) = \frac{1}{n}\sum_{i=1}^n Z(S_i)^2 (b_{t,i} - b_i) + o_p(1),$$
by Lemma 1. It is easy to see that
$$\frac{1}{n}\sum_i Z(S_i)^2 (b_{t,i} - b_i) = (b_t - b)\frac{1}{n}\sum_i Z(S_i)^2 + \frac{1}{n}\sum_i Z(S_i)^2 (b_{t,i} - b_t) - \frac{1}{n}\sum_i Z(S_i)^2 (b_i - b).$$
By (14) and Lemma 3, it follows that
$$\frac{1}{n}\sum_{i=1}^n W(S_i) = (b_t - b) + o_p(1),$$
where the $o_p$ term is uniform in $\theta \in \Theta$. Next, we consider the term
$$\frac{1}{n}\sum_{i=1}^n U(S_i) = \frac{1}{n}\sum_{i=1}^n [(Z(S_{i,h}) - b_i Z(S_i))^2 - (1 - b_i^2)] = \frac{1}{n}\sum_{i=1}^n [(Z(S_{i,h}) - b_{t,i} Z(S_i) + b_{t,i} Z(S_i) - b_i Z(S_i))^2 - (1 - b_{t,i}^2 + b_{t,i}^2 - b_i^2)]$$
$$= \frac{1}{n}\sum_{i=1}^n \bigl[\{(Z(S_{i,h}) - b_{t,i} Z(S_i))^2 - (1 - b_{t,i}^2)\} + (b_{t,i} - b_i)^2 Z(S_i)^2 + 2(b_{t,i} - b_i)(Z(S_{i,h}) - b_{t,i} Z(S_i))Z(S_i) - (b_{t,i}^2 - b_i^2)\bigr].$$
Using Lemmas 1–3 and (14) we can show that
$$\frac{1}{n}\sum_{i=1}^n (b_{t,i} - b_i)(Z(S_{i,h}) - b_{t,i} Z(S_i))Z(S_i) = o_p(1), \qquad (19)$$
$$\frac{1}{n}\sum_{i=1}^n \{(Z(S_{i,h}) - b_{t,i} Z(S_i))^2 - (1 - b_{t,i}^2)\} = o_p(1), \qquad \frac{1}{n}\sum_{i=1}^n (b_{t,i} - b_i)^2 Z(S_i)^2 = (b_t - b)^2 + o_p(1)$$
and
$$\frac{1}{n}\sum_{i=1}^n (b_{t,i}^2 - b_i^2) = (b_t^2 - b^2) + o_p(1),$$
where the $o_p(1)$ terms are uniform in $\theta \in \Theta$. Hence
$$\frac{1}{n}\sum_{i=1}^n U(S_i) = (b_t - b)^2 - (b_t^2 - b^2) + o_p(1) = -2b(b_t - b) + o_p(1), \qquad (20)$$

uniformly in $\theta \in \Theta$. By (12), (13) and (16)–(20) we have
$$\frac{1}{n}\frac{\partial Q(\theta)}{\partial\theta} = \frac{2}{(1 - b^2)^2}\frac{\partial b}{\partial\theta}\bigl\{-(1 - b^2)(b_t - b) - 2b^2(b_t - b)\bigr\} + o_p(1) = -2\,\frac{1 + b^2}{(1 - b^2)^2}\frac{\partial b}{\partial\theta}(b_t - b) + o_p(1),$$
uniformly in $\theta \in \Theta$. Together with D1–D4, this shows that for any given $\epsilon > 0$, with probability tending to 1, the function $Q(\theta)$ is strictly decreasing in $\theta$ on $(0, \theta_t - \epsilon) \cap \Theta$ and strictly increasing in $\theta$ on $(\theta_t + \epsilon, \infty) \cap \Theta$. The proof of Theorem 1 is now completed by invoking the following proposition.

Proposition 1. Suppose that the conditions of Theorem 1 hold. Then, for every $\epsilon > 0$, with probability tending to 1, there exists a local minimum of $Q(\theta)$ in the interval $[\theta_t - \epsilon, \theta_t + \epsilon] \cap \Theta$.

Clearly, Proposition 1 together with the facts established above shows that, given any $\epsilon > 0$, with probability tending to 1, $\hat\theta = \arg\min_{\theta\in\Theta} Q(\theta)$ belongs to the interval $[\theta_t - \epsilon, \theta_t + \epsilon] \cap \Theta$. This completes the proof. □

Proof of Proposition 1. Let $\epsilon_0 > 0$ be small enough that $[\theta_t - \epsilon_0, \theta_t + \epsilon_0] \subset \Theta$ and
$$6\,|\rho_h(\theta_t) - \rho_h(\theta_t \mp \epsilon_0)| < 1 - (\rho_h(\theta_t))^2. \qquad (21)$$
Let $\theta \in \{\theta_t - \epsilon, \theta_t + \epsilon\}$, where $0 < \epsilon < \epsilon_0$. Observe that by (21) and the monotonicity of $b$ as a function of $\theta$, we have $|b_t^2 - b^2|/(1 - b^2) < 1/2$. Turning to the main quantity of interest, we have
$$\frac{1}{n}(Q(\theta_t) - Q(\theta)) = \frac{1}{n}\sum_{i=1}^n [\log(1 - b_{t,i}^2) - \log(1 - b_i^2)] + \frac{1}{n}\sum_{i=1}^n \left[\frac{(Z(S_{i,h}) - b_{t,i} Z(S_i))^2}{1 - b_{t,i}^2} - \frac{(Z(S_{i,h}) - b_i Z(S_i))^2}{1 - b_i^2}\right]$$
$$= \frac{1}{n}\sum_{i=1}^n \log\!\left(\frac{1 - b_{t,i}^2}{1 - b_i^2}\right) + \frac{1}{n}\sum_{i=1}^n \bigl((Z(S_{i,h}) - b_{t,i} Z(S_i))^2 - (1 - b_{t,i}^2)\bigr)\left(\frac{1}{1 - b_{t,i}^2} - \frac{1}{1 - b_i^2}\right) + \frac{1}{n}\sum_{i=1}^n \frac{b_{t,i}^2 - b_i^2}{1 - b_i^2} - \frac{2}{n}\sum_{i=1}^n \frac{(b_{t,i} - b_i)Z(S_i)(Z(S_{i,h}) - b_{t,i} Z(S_i))}{1 - b_i^2} - \frac{1}{n}\sum_{i=1}^n \frac{(b_{t,i} - b_i)^2 (Z(S_i))^2}{1 - b_i^2}$$
$$= \log\!\left(1 - \frac{b_t^2 - b^2}{1 - b^2}\right) + \frac{b_t^2 - b^2}{1 - b^2} - \frac{(b_t - b)^2}{1 - b^2} + o_p(1), \qquad (22)$$
where the last step follows from (14) and (15) and Lemmas 1–3. Now, for $|x| \le 1/2$, we have
$$\log(1 - x) = -\sum_{k=1}^\infty \frac{x^k}{k} \le -x - \frac{x^2}{2} + \frac{|x|^3}{3(1 - |x|)} \le -x - \frac{x^2}{6}.$$
Using this in the last line of (22) with $x = (b_t^2 - b^2)/(1 - b^2)$, we obtain
$$\frac{1}{n}(Q(\theta_t) - Q(\theta)) \le -\frac{1}{6}\left(\frac{b_t^2 - b^2}{1 - b^2}\right)^2 - \frac{(b_t - b)^2}{1 - b^2} + o_p(1),$$

since by the choice of $\epsilon$, $|b_t^2 - b^2|/(1 - b^2) < 1/2$. This proves the result, since it shows that $P(Q(\theta_t) < \min\{Q(\theta_t - \epsilon), Q(\theta_t + \epsilon)\}) \to 1$ as $n, N \to \infty$, and so, with probability tending to 1, there is a local minimum of the function $Q(\theta)$ in the interval $[\theta_t - \epsilon, \theta_t + \epsilon]$. □

Proof of Lemma 1. Throughout the proof, $c$ denotes a generic positive constant. Let $\mathcal{S} = \{S_1, \ldots, S_n\}$. Define
$$W_1^{(n)}(s) = W(s)\,I(|W(s)| \le n) \quad\text{and}\quad W_2^{(n)}(s) = W(s)\,I(|W(s)| > n).$$
Then we can write
$$n^{-1}\sum_i W(S_i) = n^{-1}\sum_i \bigl(W_1^{(n)}(S_i) - E[W_1^{(n)}(S_i)\mid\mathcal{S}]\bigr) + n^{-1}\sum_i E[W_1^{(n)}(S_i)\mid\mathcal{S}] + n^{-1}\sum_i W_2^{(n)}(S_i) = I_1 + I_2 + I_3, \quad\text{say}.$$
Let us first look at the terms $I_2$ and $I_3$. Since $E[W(S_i)\mid\mathcal{S}] = 0$, we have $E[W_1^{(n)}(S_i)\mid\mathcal{S}] = -E[W_2^{(n)}(S_i)\mid\mathcal{S}]$. So
$$E|I_2| \le n^{-1}\sum_i E\bigl|E(W_2^{(n)}(S_i)\mid\mathcal{S})\bigr| \le n^{-1}\sum_i E\bigl[E(|W_2^{(n)}(S_i)|\mid\mathcal{S})\bigr].$$
Now it is easy to see that $E[|W_2^{(n)}(S_i)|\mid\mathcal{S}] \le n^{-1}E[|W_2^{(n)}(S_i)|^2\mid\mathcal{S}] \le c n^{-1}$. So we have $E|I_2| \le c n^{-1}$, and the same argument shows that $E|I_3| \le c n^{-1}$. We have thus proved that $I_2 + I_3 = O_p(n^{-1})$.

Let us now look at the term $I_1$. Since the fourth moment of $Z(s)$ is bounded, we have $\mathrm{Var}[W_1^{(n)}(S_i)\mid\mathcal{S}] \le c$ for some constant $c > 0$, for all $i$. Now the $\rho$-mixing condition tells us that
$$E[I_1^2\mid\mathcal{S}] = n^{-2}\sum_{1\le i,j\le n} \mathrm{Cov}[W_1^{(n)}(S_i), W_1^{(n)}(S_j)\mid\mathcal{S}] \le n^{-2}\sum_{1\le i,j\le n} g\bigl((|S_i - S_j| - (h_i + h_j))_+\bigr)\bigl\{\mathrm{Var}[W_1^{(n)}(S_i)\mid\mathcal{S}]\,\mathrm{Var}[W_1^{(n)}(S_j)\mid\mathcal{S}]\bigr\}^{1/2}$$
$$\le c n^{-2}\sum_{1\le i,j\le n} g\bigl((|S_i - S_j| - 2h - 2R)_+\bigr) = c/n + c n^{-2}\sum_{1\le i\ne j\le n} g\bigl((|S_i - S_j| - 2h - 2R)_+\bigr). \qquad (23)$$
For any $i \ne j$, denote $Q_{ij} = g((|S_i - S_j| - 2h - 2R)_+)$. Let $V = |U_1 - U_2|$ and denote its p.d.f. by $f_V$; by the conditions on the design, $f_V$ is bounded. Then
$$E(Q_{ij}) = E[Q_{ij}\,I(R > \epsilon)] + E[Q_{ij}\,I(R \le \epsilon)] \le P(R > \epsilon) + E\bigl[g((|S_i - S_j| - 2h - 2\epsilon)_+)\bigr]$$
$$= P(R > \epsilon) + E\bigl[g((NV - 2h - 2\epsilon)_+)\bigr] = P(R > \epsilon) + \int_0^\infty g((Nv - 2h - 2\epsilon)_+) f_V(v)\,dv \le P(R > \epsilon) + c\int_0^\infty g((Nv - 2h - 2\epsilon)_+)\,dv \le P(R > \epsilon) + cN^{-1}\int_0^\infty g(v)\,dv.$$
Since $P(R > \epsilon) = o(n^{-1})$ by (24) in the proof of Lemma 4, it follows from (23) that
$$E(I_1^2) \le c/n + cN^{-1}\int_0^\infty g(v)\,dv = O(1/n) + O(1/N).$$
This proves that $I_1 = O_p(N^{-1/2})$. The result follows since we have already shown that $I_2$ and $I_3$ are $O_p(1/n)$. □

Proof of Lemma 2. Define $U_1^{(n)}(s) = U(s)\,I(|U(s)| \le n)$ and $U_2^{(n)}(s) = U(s)\,I(|U(s)| > n)$. Since $E[U(S_i)\mid\mathcal{S}] = 0$, we can write
$$n^{-1}\sum_i U(S_i) = n^{-1}\sum_i \bigl(U_1^{(n)}(S_i) - E[U_1^{(n)}(S_i)\mid\mathcal{S}]\bigr) - n^{-1}\sum_i E[U_2^{(n)}(S_i)\mid\mathcal{S}] + n^{-1}\sum_i U_2^{(n)}(S_i) = O_p(N^{-1/2}),$$
by arguments similar to those in the proof of Lemma 1. □

Proof of Lemma 3. It is similar to the proof of Lemma 1 and hence omitted. □

Proof of Lemma 4. Note that
$$|h_i - h| = \Bigl|\inf_{j\ne i}\|S_j - S_i\| - h\Bigr|.$$
Thus, for any $h > \epsilon > 0$,
$$P(R > \epsilon) = P\bigl(\max_i |h_i - h| > \epsilon\bigr) \le \sum_{1\le i\le n} P(|h_i - h| > \epsilon) = \sum_{1\le i\le n} P\Bigl(\Bigl|\inf_{j\ne i}\|S_j - S_i\| - h\Bigr| > \epsilon\Bigr)$$
$$= \sum_{1\le i\le n} E\bigl[\{P(|N|U_1 - U_2| - h| > \epsilon \mid U_1)\}^{n-1}\bigr] = nE\bigl[\{P(\,|\,\|U_1 - U_2\| - h/N\,| > \epsilon/N \mid U_1)\}^{n-1}\bigr]$$
$$= nE\bigl[\{1 - P(U_2 \in A(U_1, (h-\epsilon)/N, (h+\epsilon)/N) \mid U_1)\}^{n-1}\bigr] = nE\bigl[\{1 - \mathrm{vol}(A(U_1, (h-\epsilon)/N, (h+\epsilon)/N) \cap D_0)\}^{n-1}\bigr]$$
$$\le nE\bigl[\{1 - c\,\mathrm{vol}(A(U_1, (h-\epsilon)/N, (h+\epsilon)/N))\}^{n-1}\bigr] \quad\text{(by A3)} \le n\{1 - c\,\mathrm{vol}(A(0, (h-\epsilon)/N, (h+\epsilon)/N))\}^{n-1}$$
for some constants $c > 0$ (which may vary from one line to another). For all $\epsilon \in (0, \epsilon_0)$, where $0 < \epsilon_0 \le h$ depends only on the dimension $d$, we have
$$\mathrm{vol}\bigl(A(0, (h-\epsilon)/N, (h+\epsilon)/N)\bigr) \ge \frac{c\,\epsilon\,h^{d-1}}{N^d}$$
for some $c > 0$ depending only on $d$. Thus
$$nP(R > \epsilon) \le n^2\left(1 - \frac{c\,\epsilon\,h^{d-1}}{N^d}\right)^{n-1} \le \exp\bigl(-c\,\epsilon\,h^{d-1}(n/N^d) + 2\log n\bigr)(1 + o(1)). \qquad (24)$$
By condition A4, this shows in particular that for each $\epsilon > 0$, $P(R > \epsilon) = o(n^{-1})$, and hence the lemma is proved. □

Acknowledgments

Hyun and Burman's research was partially supported by the grant DMS-0907622 from the National Science Foundation. Paul's research was partially supported by the grants DMR-1035468 and DMS-1106690 from the National Science Foundation. The authors would like to thank the reviewers for their helpful comments.

Appendix A. Supplementary materials

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.jspi.2012.03.005.

References

Bradley, R.C., 1987. The central limit question under ρ-mixing. Rocky Mountain Journal of Mathematics 17, 95–114.
Bradley, R.C., 1993. Equivalent mixing conditions for random fields. Annals of Probability 21, 1921–1926.
Cressie, N., 1993. Statistics for Spatial Data. Wiley.
Diggle, P.J., Ribeiro Jr., P.J., 2007. Model-Based Geostatistics. Springer.
Fedorov, V.V., 1974. Regression problems with controllable variables subject to error. Biometrika 61, 49–56.
Kaufmann, P.R., Herlihy, A.T., Elwood, J.W., Mitch, M.E., Overton, W.S., Sale, M.J., Messer, J.J., Cougan, K.A., Peck, D.V., Reckhow, K.H., Kinney, A.J., Christie, S.J., Brown, D.D., Hagley, C.A., Jager, H.I., 1988. Chemical Characteristics of Streams in the Mid-Atlantic and Southeastern United States, Volume I: Population Descriptions and Physico-Chemical Relationships, EPA/600/3-88/021a. U.S. Environmental Protection Agency, Washington, D.C.
Kolmogorov, A.N., Rozanov, Y.A., 1960. On strong mixing conditions for stationary Gaussian processes. Theory of Probability and its Applications 5, 204–208.
Matheron, G., 1962. Traité de Géostatistique Appliquée, Tome I. In: Mémoires du Bureau de Recherches Géologiques et Minières, No. 14. Editions Technip, Paris.
Müller, W.G., 1999. Least-squares fitting from the variogram cloud. Statistics and Probability Letters 43, 93–98.
Schabenberger, O., Gotway, C., 2005. Statistical Methods for Spatial Data Analysis. Chapman & Hall.
Waller, L., Gotway, C., 2004. Applied Spatial Statistics for Public Health Data. John Wiley & Sons.
Zimmerman, D.L., Zimmerman, M.B., 1991. A comparison of spatial semivariogram estimators and corresponding ordinary kriging predictors. Technometrics 33, 77–91.