

Covariance matrix estimation for estimators of mixing weak ARMA models

Christian Francq^a, Jean-Michel Zakoïan^{b,∗}

^a Université du Littoral – Côte d'Opale, Laboratoire de Mathématiques Pures et Appliquées Joseph Liouville, Centre Universitaire de la Mi-Voix, 50, rue Ferdinand Buisson – BP 699, 62228 Calais Cedex, France
^b Université de Lille I, Laboratoire de Statistique et Probabilités, and CREST, 15 Bd Gabriel Péri, 92245 Malakoff Cedex, France

Received 14 July 1997; accepted 6 April 1999

Abstract

For the statistical analysis of ARMA models, the standard methods require that the linear innovations are martingale differences. This property is not satisfied for ARMA representations of non-linear processes. In such a case, the standard method typically entails an underestimation of the variance of the least-squares estimator of the ARMA parameters (and consequently a serious risk of overparameterization). In this paper, the martingale difference assumption is relaxed. We propose a consistent estimator of the covariance matrix of the least-squares estimator under a mixing assumption on the observed process. © 2000 Elsevier Science B.V. All rights reserved.

MSC: 62M10

Keywords: ARMA models; Non-linear models; Least-squares estimator; Consistency; Robust covariance matrix estimate

1. Introduction

The well-known Wold decomposition states that any purely non-deterministic stationary process with finite variance can be written as an infinite moving average in terms of its linear innovation process. The ARMA representations in which the noise is the linear innovation process (called in this paper weak ARMA models) appear to be fairly general when approximating the infinite moving average of the Wold decomposition. Moreover, a number of non-linear processes admit exact weak ARMA representations which are obviously not strong ones (i.e. in which the noise is not the strong innovation process).

∗ Corresponding author.
E-mail addresses: [email protected] (C. Francq), [email protected] (J.-M. Zakoïan)

0378-3758/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0378-3758(99)00109-3


C. Francq, J.-M. Zakoïan / Journal of Statistical Planning and Inference 83 (2000) 369–394

Therefore, the class of weak ARMA models is likely to be better suited than that of strong ARMA models in a number of practical situations. However, ready-made packages used by practitioners are typically based on the asymptotic theory derived for strong ARMA models. This theory requires that the noise involved in the ARMA equation be an i.i.d. sequence, or at least a martingale difference sequence (see, for example, Brockwell and Davis (1991) for the theory and practice of strong ARMA models). Although such strong assumptions on the noise are generally not satisfied for non-linear models, it is common practice to use the strong ARMA software packages for fitting weak ARMA representations of non-linear time series.

If the non-linear model were known, it is clear that the weak ARMA representation would be of little interest (it can only provide the optimal linear predictions, not the optimal predictions in the mean-square error sense). However, the selection of a suitable non-linear model is often a real challenge, and the practitioner generally begins his study by fitting ARMA models. A natural question arises: in such situations, can we rely on the aforementioned ready-made packages? In particular, can we use standard procedures to estimate weak ARMA representations? As we will show, the answer is partly negative. The aim of this paper is to propose suitable modifications of the standard procedures.

Under moment and mixing conditions (which hold for many non-linear processes), Francq and Zakoïan (1998) have shown that the least-squares estimator of the weak ARMA representations is strongly consistent and asymptotically normal. However, the asymptotic covariance matrix is generally different from that of the standard theory. Typically, the dependence structure of the linear innovation process results in a reduced asymptotic accuracy (although some examples show that the reverse can also be true).
Consequently, standard identification routines can lead to severe misspecifications (with a serious risk of overparameterization). Similarly, correct tests and confidence intervals cannot be obtained from the standard routines developed for the analysis of strong ARMA models. To solve these problems, it is therefore essential to have a weakly consistent estimator of the asymptotic covariance matrix. The estimators used in the standard theory of strong ARMA processes are generally not consistent for weak ARMA representations. The aim of this paper is to circumvent this lack of robustness of the standard procedures. The weak consistency of the estimator proposed in this paper is proved without additional assumptions on the observed process (i.e. under the assumptions made for the asymptotic normality). The method consists in weighting appropriately some empirical fourth-order moments by means of a window and a truncation point. This technique or similar ones are used, as is the case here, in the estimation of the covariance matrix of estimators (see e.g. Andrews, 1991; Gallant and White, 1988; Hansen, 1992; Newey and West, 1987), and in various other fields. This includes spectral analysis (see e.g. Anderson, 1971; Priestley, 1981; Robinson, 1991), kernel density estimation (see e.g. Bosq, 1996; Bosq and Lecoutre, 1987; Devroye, 1987), generalized method of moments estimation (Hansen, 1982), unit root tests (Phillips, 1987; Phillips and Perron, 1987), and information matrix estimation (White, 1984). The approach taken in this paper is more closely related to that used



by Berlinet and Francq (1997), Mélard et al. (1991) and Robinson (1977) for the estimation of covariances between sample autocovariances.

The paper is organized as follows. The covariance matrix estimator and its asymptotic properties are presented in Section 2. The estimator is defined by means of a window and a truncation point. The choice of these two parameters is discussed in Section 3. Section 4 proposes empirical illustrations. Section 5 concludes. The proofs of the asymptotic results of Section 2 are collected in the appendix.

2. Main results

Let (X_t)_{t∈Z} be a second-order stationary process such that, for all t ∈ Z,

X_t + ∑_{i=1}^p a_i X_{t−i} = ε_t + ∑_{i=1}^q b_i ε_{t−i},    (1)

where (ε_t) is a sequence of uncorrelated random variables defined on some probability space (Ω, A, P) with zero mean and common variance σ² > 0, and where the polynomials φ(z) = 1 + a_1 z + ⋯ + a_p z^p and ψ(z) = 1 + b_1 z + ⋯ + b_q z^q have all their zeros outside the unit disk and have no zero in common. Without loss of generality, assume that a_p and b_q are not both equal to zero (by convention a_0 = b_0 = 1). Let θ_0 = (a_1, …, a_p, b_1, …, b_q)′, θ = (θ_1, …, θ_p, θ_{p+1}, …, θ_{p+q})′, and denote by Θ the parameter space Θ = {θ ∈ R^{p+q}: φ_θ(z) = 1 + θ_1 z + ⋯ + θ_p z^p and ψ_θ(z) = 1 + θ_{p+1} z + ⋯ + θ_{p+q} z^q have all their zeros outside the unit disk}. For all θ ∈ Θ, let (ε_t(θ)) be the second-order stationary process obtained by replacing θ_0 by θ and ε_{t−i} by ε_{t−i}(θ) in (1). Given a realization of length n, X_1, X_2, …, X_n, ε_t(θ) can be approximated, for 0 < t ≤ n, by e_t(θ) defined recursively by

e_t(θ) = X_t + ∑_{i=1}^p θ_i X_{t−i} − ∑_{i=1}^q θ_{p+i} e_{t−i}(θ),    (2)

where the unknown starting values are set to zero: e_0(θ) = e_{−1}(θ) = ⋯ = e_{−q+1}(θ) = X_0 = X_{−1} = ⋯ = X_{−p+1} = 0. Let δ be a strictly positive constant chosen so that the true parameter θ_0 belongs to the compact set

Θ_δ := {θ ∈ R^{p+q}: the zeros of the polynomials φ_θ(z) and ψ_θ(z) have moduli ≥ 1 + δ}.

The random variable θ̂_n is called a least-squares estimator if it satisfies, almost surely,

Q_n(θ̂_n) = min_{θ∈Θ_δ} Q_n(θ),  where  Q_n(θ) = (1/n) ∑_{t=1}^n e_t²(θ).    (3)

The consistency of these least-squares estimators can be proved when (X_t)_{t∈Z} is an ergodic strictly stationary process. For the derivation of the asymptotic distribution, it is supposed that the sequence (α_X(k))_{k∈N*} of the strong mixing coefficients of the stationary process (X_t)_{t∈Z} satisfies

∑_{k=0}^∞ [α_X(k)]^{ν/(2+ν)} < ∞    (4)

for some ν > 0. Pham (1986) has shown that, for a wide class of processes, this mixing condition holds. For any θ ∈ Θ, let

O_n(θ) = (1/n) ∑_{t=1}^n ε_t²(θ)

and (∂/∂θ)O_n(θ) = ((∂/∂θ_1)O_n(θ), …, (∂/∂θ_{p+q})O_n(θ))′. We consider the following matrices:

I(θ) = lim_{n→∞} Var √n (∂/∂θ) O_n(θ)  and  J(θ) = lim_{n→∞} [(∂²/∂θ_i ∂θ_j) Q_n(θ)]  a.s.,

where [A(i, j)] denotes the matrix A with elements A(i, j). We have the following result.

Theorem 1 (Francq and Zakoïan, 1998). Let (X_t)_{t∈Z} be a strictly stationary process satisfying (1). Suppose that (X_t)_{t∈Z} satisfies E|X_t|^{4+2ν} < ∞ and (4) for some ν > 0. Then θ̂_n → θ_0 a.s. as n → ∞, and

√n (θ̂_n − θ_0)

has a limiting centered normal distribution with covariance matrix

Σ := J^{−1}(θ_0) I(θ_0) J^{−1}(θ_0).    (5)

It is important to note that, in the case of strong ARMA, we have

I(θ_0) = lim_{n→∞} Var (1/√n) ∑_{t=1}^n (∂/∂θ) ε_t²(θ_0) = 4σ² Var (∂/∂θ) ε_t(θ_0) = 2σ² J(θ_0)

and

Σ := lim_{n→∞} Var √n (θ̂_n − θ_0) = 2σ² J^{−1}(θ_0).    (6)

However, this simplification can generally not be made when the strong assumptions do not hold (as illustrated in Example (12) below). It is also worth noticing the analogy between the expression of Σ in (5) and that of the so-called quasi-maximum likelihood (QML) method studied, for example, by Gouriéroux et al. (1984). In this method, the asymptotic covariance matrix takes the form J^{−1} I J^{−1}, where I and J involve first and second derivatives of the log-likelihood function. It is important to note that this method is generally not valid under the weak assumptions we consider. In the framework of ARMA models, the Gaussian QML estimator is the same as the least-squares estimator. However, the asymptotic matrix provided by the QML does not correspond to (5) but to (6).

We are now in a position to state our first result, which shows the strong consistency of an empirical estimator of J(θ_0). Write (∂/∂θ)e_t(θ) = ((∂/∂θ_1)e_t(θ), …, (∂/∂θ_{p+q})e_t(θ))′, where the (∂/∂θ_i)e_t(θ)'s are defined recursively from (2). Let

Ĵ_n(θ) = (2/n) ∑_{t=1}^n [(∂/∂θ) e_t(θ)] [(∂/∂θ) e_t(θ)]′.
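To make the recursion (2) and the criterion (3) concrete, here is a minimal Python sketch (our own illustration, not the authors' code; the simulated MA(1) data, the seed, and the crude grid search are illustrative assumptions):

```python
import numpy as np

def residuals(theta, x, p, q):
    """Approximate residuals e_t(theta) from recursion (2), with zero starting values."""
    n = len(x)
    xpad = np.concatenate([np.zeros(p), np.asarray(x, dtype=float)])  # X_0 = ... = 0
    epad = np.zeros(q + n)                                            # e_0 = ... = 0
    for t in range(n):
        ar = sum(theta[i - 1] * xpad[p + t - i] for i in range(1, p + 1))
        ma = sum(theta[p + i - 1] * epad[q + t - i] for i in range(1, q + 1))
        epad[q + t] = xpad[p + t] + ar - ma
    return epad[q:]

def Qn(theta, x, p, q):
    """Least-squares criterion (3)."""
    return np.mean(residuals(theta, x, p, q) ** 2)

# Illustration: recover the MA(1) coefficient from simulated data by a grid search.
rng = np.random.default_rng(0)
eta = rng.standard_normal(2001)
x = eta[1:] + 0.5 * eta[:-1]          # strong MA(1) with b = 0.5
grid = np.linspace(-0.9, 0.9, 181)
b_hat = grid[np.argmin([Qn(np.array([b]), x, 0, 1) for b in grid])]
```

In practice the minimization in (3) would be carried out by a numerical optimizer over Θ_δ; the grid search merely keeps the sketch self-contained.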



Theorem 2. Under the assumptions that (X_t)_{t∈Z} is a strictly stationary and ergodic process satisfying (1) and that EX_t² < ∞, we have

Ĵ_n(θ̂_n) → J(θ_0)  a.s. as n → ∞.

In the case of strong ARMA models, the matrix Σ can be estimated by

Σ̇_n = 2 σ̂²_n Ĵ_n^{−1}(θ̂_n),    (7)

where σ̂²_n = Q_n(θ̂_n). Alternative estimators can be used (e.g. by plugging θ̂_n into the explicit expression of J(·) obtained for a given model, as in De Gooijer (1985)). However, for larger sample sizes (such as those of Section 4), the difference becomes negligible. In the general case of weak ARMA, it remains to build a consistent estimator of I(θ_0). Let

γ_i(θ) := 4 E [ε_t(θ) (∂/∂θ) ε_t(θ)] [ε_{t+i}(θ) (∂/∂θ) ε_{t+i}(θ)]′.
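Before turning to the weak case, the strong-case shortcut (7) can be sketched on the simplest model. For an AR(1), e_t(θ) = X_t + θ X_{t−1} and (∂/∂θ)e_t(θ) = X_{t−1} in closed form, so every ingredient of (7) is a sample average. The following Python toy example (our own; data, sample size and seed are illustrative assumptions, not from the paper) computes Σ̇_n:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3000
eps = rng.standard_normal(n + 1)
x = np.zeros(n + 1)
for t in range(1, n + 1):
    x[t] = -0.5 * x[t - 1] + eps[t]      # X_t + 0.5 X_{t-1} = eps_t, i.e. a_1 = 0.5
x = x[1:]

# AR(1): e_t(theta) = X_t + theta X_{t-1}, so (d/dtheta) e_t(theta) = X_{t-1}
xt, xlag = x[1:], x[:-1]
theta_hat = -np.sum(xt * xlag) / np.sum(xlag ** 2)   # minimizes Q_n in closed form
e = xt + theta_hat * xlag                            # residuals e_t(theta_hat)
sigma2_hat = np.mean(e ** 2)                         # = Q_n(theta_hat)
J_hat = 2.0 * np.mean(xlag ** 2)                     # J_hat_n(theta_hat) for this model
var_strong = 2.0 * sigma2_hat / J_hat                # estimator (7) of the asymptotic variance
```

For a strong AR(1) the asymptotic variance of √n(θ̂_n − θ_0) is 1 − a_1² = 0.75, and var_strong should be close to that value; the point of the paper is that this shortcut can be badly off when the noise is only weak.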

It can be shown (Francq and Zakoïan, 1998, Lemma 3) that, under the assumptions of Theorem 1, the sequence (γ_i(θ_0)) is absolutely summable. Therefore, from the stationarity of the centred process (ε_t(θ_0)(∂/∂θ)ε_t(θ_0)) we have

I(θ_0) = lim_n (1/n) ∑_{t=1}^n ∑_{s=1}^n Cov((∂/∂θ) ε_t²(θ_0), (∂/∂θ) ε_s²(θ_0))
       = lim_n (4/n) ∑_{t=1}^n ∑_{s=1}^n Cov(ε_t(θ_0) (∂/∂θ) ε_t(θ_0), ε_s(θ_0) (∂/∂θ) ε_s(θ_0))
       = lim_n (4/n) ∑_{|i|<n} (n − |i|) Cov(ε_1(θ_0) (∂/∂θ) ε_1(θ_0), ε_{1+i}(θ_0) (∂/∂θ) ε_{1+i}(θ_0))
       = ∑_{i=−∞}^{+∞} γ_i(θ_0).    (8)

The moment γ_i(θ_0) will be estimated by γ̂_i(θ̂_n), where, for 0 ≤ i < n,

γ̂_i(θ) = (4/n) ∑_{t=1}^{n−i} [e_t(θ) (∂/∂θ) e_t(θ)] [e_{t+i}(θ) (∂/∂θ) e_{t+i}(θ)]′

and γ̂_{−i}(θ) = γ̂_i(θ)′. Under the assumptions of Theorem 1, it can be shown that γ̂_i(θ̂_n) is a consistent estimator of γ_i(θ_0). Therefore, one can wonder whether the simple estimator

I_n(θ̂_n) = ∑_{i=−n+1}^{n−1} γ̂_i(θ̂_n)

is a consistent estimator of I(θ_0). The answer is negative since, for all n,

I_n(θ̂_n) = (4/n) [∑_{t=1}^n e_t(θ̂_n) (∂/∂θ) e_t(θ̂_n)] [∑_{t=1}^n e_t(θ̂_n) (∂/∂θ) e_t(θ̂_n)]′ = n [(∂/∂θ) Q_n(θ̂_n)] [(∂/∂θ) Q_n(θ̂_n)]′ = 0.



To remedy the failure of this estimator, it can be noted that, as in Berlinet and Francq (1997), the estimand is an infinite sum of moments, and that some of these moments are likely to be poorly estimated since their estimators are based on only a few observations. The classical solution to this problem consists in weighting the empirical moments γ̂_i(θ̂_n). The weight given to γ̂_i(θ̂_n) takes the form ω(i b_n), where (b_n)_{n∈N*} is a sequence of real numbers and the function ω: R → R is supposed to be bounded, with compact support [−a, a], and continuous at the origin with ω(0) = 1. Examples of such weight functions are the rectangular window ω(x) = 1_{[−1,1]}(x) and the Bartlett window ω(x) = (1 − |x|) 1_{[−1,1]}(x).

With regard to the choice of (b_n), a few heuristic remarks can be made. When i is small relative to n, a weight ω(i b_n) close to one is required. Therefore, it is supposed that b_n decreases to zero as n tends to infinity. On the contrary, when i is large relative to n, one wants a weight ω(i b_n) close to zero (to avoid the troubles occurring with the estimator I_n(θ̂_n)). Therefore, it is supposed that b_n does not decrease to zero too quickly. In the following theorem, a sufficient condition on the decrease rate of b_n is given. A consistent estimator of I(θ_0) is obtained without additional assumptions on the observed process (X_t).

Theorem 3. Let the assumptions of Theorem 1 be satisfied. Consider the matrix

Î_n(θ) = ∑_{i=−T_n}^{+T_n} ω(i b_n) γ̂_i(θ),

where T_n = [a/b_n] ([x] denoting the integer part of x). If b_n is chosen such that

b_n → 0  and  n b_n^{4+10/ν} → ∞  as n → ∞,    (9)

then

Î_n(θ̂_n) → I(θ_0)  in probability as n → ∞.

Theorems 2 and 3 show that

Σ̂_n := Ĵ_n^{−1}(θ̂_n) Î_n(θ̂_n) Ĵ_n^{−1}(θ̂_n)

is a weakly consistent estimator of Σ. Even when Σ is positive definite, for finite samples the positive definiteness of Σ̂_n is not guaranteed. For instance, in simulation experiments we have done, negative estimates of the variance of real parameters have been observed. To avoid such troubles, the following theorem gives a simple condition ensuring the non-negative definiteness of Σ̂_n. Note, however, that this property is not always desirable since, as pointed out to us by a referee, an estimate Σ̂_n which is not positive semi-definite, even for a large sample, may indicate a rank deficiency of Σ (which arises, for instance, when the model is overparameterized).

Theorem 4. Under the assumptions of Theorem 3, Σ̂_n is a weakly consistent estimator of Σ. If, in addition, ω(·) is an even and non-negative definite function, then Σ̂_n is a non-negative definite matrix.
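As an illustration of the full sandwich construction, the following Python sketch (our own example; the data-generating process X_t = η_t η_{t−1}, a weak but non-strong white noise, and the bandwidth choice are illustrative assumptions, not taken from the paper) fits an AR(1) and compares Σ̂_n, built with the Bartlett window, to the strong-case formula (7):

```python
import numpy as np

rng = np.random.default_rng(3)
eta = rng.standard_normal(20001)
x = eta[1:] * eta[:-1]        # X_t = eta_t * eta_{t-1}: uncorrelated but not independent

# Fit AR(1): e_t(theta) = X_t + theta X_{t-1}, with derivative X_{t-1}
xt, xlag = x[1:], x[:-1]
theta_hat = -np.sum(xt * xlag) / np.sum(xlag ** 2)
e = xt + theta_hat * xlag
n = len(e)
z = e * xlag                  # e_t(theta_hat) * (d/dtheta) e_t(theta_hat)

J_hat = 2.0 * np.mean(xlag ** 2)

def gamma_hat(i):
    """Empirical gamma_hat_i (scalar-parameter case)."""
    return 4.0 * np.sum(z[: n - i] * z[i:]) / n

bn = 1.0 / np.log(n)                      # bandwidth, as in the paper's experiments
Tn = int(1.0 / bn)                        # T_n = [a / b_n] with a = 1
w = lambda u: max(0.0, 1.0 - abs(u))      # Bartlett window
I_hat = gamma_hat(0) + 2.0 * sum(w(i * bn) * gamma_hat(i) for i in range(1, Tn + 1))

var_robust = I_hat / J_hat ** 2           # sandwich: J^{-1} I J^{-1}
var_naive = 2.0 * np.mean(e ** 2) / J_hat # strong-ARMA formula (7)
```

For this process one can check that J(θ_0) = 2 and I(θ_0) = γ_0(θ_0) = 12, so Σ = 3, whereas (7) targets 2σ²/J(θ_0) = 1: the naive formula underestimates the asymptotic variance by a factor of about three, which is precisely the phenomenon the paper addresses.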



3. On the choice of the estimator parameters

In the practical situation of finite samples, we are confronted with the problem of choosing the weight function ω(·) and the truncation point T_n. From (8), the problem is closely related to the behavior of the sequence (γ_i(θ_0))_{i≥0}. Note that, for a strong ARMA process, we have γ_i(θ_0) = 0 for all i ≥ 1. More generally, the mixing assumption made on (X_t) entails that γ_i(θ_0) tends quickly to zero as i → ∞. Suppose therefore that, for some h ≥ 1, ∑_{i>h} γ_i(θ_0) is roughly the null matrix. Then

Ĩ_h(θ_0) = ∑_{|i|≤h} γ_i(θ_0)

is a good approximation of I(θ_0) depending on a fixed number of γ_i(θ_0)'s. A natural estimator of Ĩ_h(θ_0) is the empirical estimate

∑_{|i|≤h} γ̂_i(θ̂_n),

which corresponds to the estimator Î_n(θ̂_n) with the weight function ω(x) = 1_{[−1,1]}(x) and the truncation point T_n = h. However, it should be noted that there is no guarantee that the corresponding estimator Σ̂_n is a non-negative definite matrix, since the rectangular function is not non-negative definite. Moreover, for a given particular sequence (γ_i(θ_0)), one can generally construct a better estimator of Σ by considering a linear combination of γ̂_{−h}(θ̂_n), …, γ̂_h(θ̂_n) with non-equal coefficients, which corresponds to a non-rectangular weight function. However, since in practice the γ_i(θ_0)'s are unknown, the rectangular weight function seems to be a reasonable choice.

The choice of the truncation point is more tricky. It is clear that a too small truncation point (T_n < h) will yield an estimator which may have a large bias. On the other hand, a too large truncation point may yield an estimator with a large variance, since it takes into account too many useless γ̂_i(θ̂_n)'s. It is worth noting that, for the standard lag-window estimator of the spectral density or for the kernel density estimator, the effect of the truncation point is opposite (the bias increases and the variance decreases as the width of the window increases). To solve the problem of the choice of the truncation point, we propose here a heuristic method consisting in the examination of the behaviour of the sequence γ̂_i(θ̂_n), i = 0, 1, … For a strong ARMA model with finite eighth-order moments, the standard deviation of the element of the m_1th row and the m_2th column of γ̂_i(θ̂_n), say γ̂_i(θ̂_n)(m_1, m_2), is approximately, for i ≥ 1,

(4/√n) { E [ε_1(θ_0) (∂/∂θ_{m_1}) ε_1(θ_0) ε_{1+i}(θ_0) (∂/∂θ_{m_2}) ε_{1+i}(θ_0)]² }^{1/2},

which can be estimated by

(4/√n) { (1/n) ∑_{t=1}^{n−i} [e_t(θ̂_n) (∂/∂θ_{m_1}) e_t(θ̂_n) e_{t+i}(θ̂_n) (∂/∂θ_{m_2}) e_{t+i}(θ̂_n)]² }^{1/2}.    (10)
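This heuristic is easy to automate. The sketch below (our own illustration on a simulated strong AR(1); the lag range 1–30, the seed and all variable names are arbitrary choices) computes the γ̂_i's and the estimated standard deviation (10) for a scalar parameter, and flags the lags that look significant:

```python
import numpy as np

def gamma_hat(z, i):
    """gamma_hat_i built from z_t = e_t * (d/dtheta) e_t (scalar-parameter case)."""
    n = len(z)
    return 4.0 * np.sum(z[: n - i] * z[i:]) / n

def sd_hat(z, i):
    """Estimated standard deviation of gamma_hat_i under the strong-ARMA benchmark, cf. (10)."""
    n = len(z)
    return (4.0 / np.sqrt(n)) * np.sqrt(np.mean((z[: n - i] * z[i:]) ** 2))

# Strong AR(1) data: for i >= 1 most gamma_hat_i should lie within two estimated sd's of zero.
rng = np.random.default_rng(2)
n = 3000
eps = rng.standard_normal(n + 1)
x = np.zeros(n + 1)
for t in range(1, n + 1):
    x[t] = 0.4 * x[t - 1] + eps[t]
x = x[1:]
xt, xlag = x[1:], x[:-1]
theta_hat = -np.sum(xt * xlag) / np.sum(xlag ** 2)
z = (xt + theta_hat * xlag) * xlag
significant = [i for i in range(1, 31) if abs(gamma_hat(z, i)) > 2.0 * sd_hat(z, i)]
# A short 'significant' list suggests a small truncation point T_n.
```

Here T_n would be chosen just large enough that lags beyond it are mostly insignificant; for a strong model the list is short and a small T_n suffices.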



It should be noted that for a weak ARMA model the previous expression of the standard deviation of γ̂_i(θ̂_n)(m_1, m_2) is no longer valid. However, in the numerical illustrations of Section 4, and in other numerical simulations not reported here for the sake of brevity, the truncation point T_n has been chosen in such a way that, for most of the indices i greater than T_n, |γ̂_i(θ̂_n)(m_1, m_2)| is small relative to its estimated standard deviation given by (10).

4. Numerical illustrations

We illustrate the estimation procedure of Section 2 by applying it to some simulated and real data sets. The first simulated example is used to demonstrate that the proposed procedure and the standard one give very close results when the underlying process is strongly linear. In the second example (a weak ARMA representation of a non-linear process) our procedure outperforms the standard one. The detailed calculations of the matrix Σ are not reported here, but are available from the authors. By means of the real example, we show that the two procedures may lead the practitioner to different conclusions.

4.1. Two simulation experiments

In this first experiment, our aim is to check that our procedure can do a good job for strong ARMA models (although it is mainly designed to deal with weak ones). Therefore, we compare it with standard procedures: first for the estimation of the ARMA parameters and, second, for the estimation of the asymptotic covariance matrix Σ. Consider the following strong ARMA(2,1) process:

X_t − 0.4 X_{t−1} − 0.2 X_{t−2} = ε_t + 0.7 ε_{t−1},    (11)

where (ε_t) is a sequence of independent standard normal random variables. One thousand independent trajectories of size 300 of model (11) have been simulated using the NAG Fortran workstation library. For each trajectory, θ := (a_1, a_2, b_1) has been estimated by the least-squares procedure defined in Section 2 (denoted by LS) and by the standard procedure implemented by means of the NAG routine G13AFF. The latter includes backforecasting which, however, can be avoided since Ansley (1979), Mélard (1984) and Newbold (1974) provided very efficient algorithms for the computation of the exact likelihood of ARMA models. For the sample size we used, the effect of backforecasting is negligible. Actually, Fig. 1 shows that, as expected, both procedures provide the same numerical results for â_1, â_2, b̂_1. Let us now compare the two procedures for the estimation of the finite-sample standard deviations of â_1, â_2, b̂_1, denoted by σ_{â_1}, σ_{â_2} and σ_{b̂_1}, respectively. To determine these unknown values, our asymptotic results (Theorem 1) can be useful, although they provide only approximations (because we work with a finite sample size). The approximations obtained in that way are equal, respectively, to 0.11, 0.10 and 0.09. A second approximation of the theoretical



Fig. 1. Comparison between the parameter estimates obtained by the procedure proposed in Section 2 (denoted by LS) and the standard method (NAG Routine G13AFF) for the strong ARMA(2,1) model (11). A1, A2 and B1 correspond, respectively, to the boxplots of the 1000 estimation errors for the parameters a_1, a_2 and b_1.

Fig. 2. As in Fig. 1, but for the estimates of σ_{â_1} (A1), σ_{â_2} (A2) and σ_{b̂_1} (B1).

finite-sample standard deviations is given by the observed standard deviations over the 1000 values of â_1, â_2, b̂_1. They are equal, respectively, to 0.12, 0.10 and 0.10. These values are very close to those obtained from the asymptotic theory. This leads us to think that, at least for this simulated process of size 300, the asymptotic standard deviations provide a good approximation of the exact ones. Of course, in practical situations, the previous approximations are not available, since only one sample is observed and the data-generating process is unknown. Therefore, for each of the 1000 simulated trajectories, the estimates of the theoretical standard deviations obtained by the LS and NAG methods were compared. In this comparison, the approximations of the theoretical standard deviations obtained from the asymptotic theory are considered as the true values. For the implementation of the LS method, the rectangular window ω(x) = 1_{[−1,1]}(x) was used. We have taken b_n equal to 1/ln(n), which corresponds to the truncation point T_n = 5 for n = 300. As shown in Fig. 2, both methods again provide very close estimates of σ_{â_1}, σ_{â_2}, σ_{b̂_1}. The observed standard deviations of the estimates of σ_{â_1}, σ_{â_2}, σ_{b̂_1} are, respectively, 0.02, 0.02 and 0.03 with the LS method, and 0.02, 0.01 and 0.02 with the NAG method. The proposed method thus has the nice feature of performing almost as well as the standard procedure in the case of the strong ARMA process (11). Fig. 3 shows that, as expected for strong ARMA models, the γ̂_i(θ̂_n)(1,1)'s are very close to zero for i ≥ 1. In practice, such a graph should suggest choosing a small truncation point (for a strong ARMA model the best truncation point is T_n = 0, since



Fig. 3. The bars correspond to the γ̂_i(θ̂_n)(1,1)'s, for i = 0, 1, …, 30, computed on a realization of length 300 of (11). The dotted lines correspond to ± the estimated standard deviations of the γ̂_i(θ̂_n)(1,1)'s given by (10).

γ_i(θ_0) = 0 for all i ≠ 0). Similar graphs have been obtained for the other sequences (γ̂_i(θ̂_n)(m_1, m_2))_{i≥0}. However, it is to be noted that, as expected, the performance of the estimator defined in Section 2 deteriorates as T_n increases. As an example, the same simulation experiment was repeated with T_n = 40. The observed standard deviations of the estimates of σ_{â_1}, σ_{â_2}, σ_{b̂_1} are now approximately 0.04, and several observed values are strictly negative. In view of Theorem 4, such strictly negative values occur because the rectangular window is not a non-negative definite function. These strictly negative values were avoided with the non-negative definite Bartlett window ω(x) = (1 − |x|) 1_{[−1,1]}(x).

Our second experiment is based on a simple Markov-switching model. Markov-switching models are widely used in econometrics to model changes in regime (see e.g. Hamilton, 1989, 1994). Let

X_t = η_t + (0.7 − 1.4 Δ_t) η_{t−1},    (12)

where (η_t) is an i.i.d. N(0,1) sequence, (Δ_t) is a stationary Markov chain independent of (η_t) with state space {0, 1}, and the transition probabilities are P(Δ_t = 1 | Δ_{t−1} = 0) = P(Δ_t = 0 | Δ_{t−1} = 1) = 0.05. The two regimes (Δ_t = 0 and Δ_t = 1) correspond to MA(1) models with respective coefficients 0.7 and −0.7. To assess the validity of our procedure, we will estimate the following weak MA(1) representation:

X_t = ε_t + b ε_{t−1}.    (13)
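A quick simulation makes the nature of (12) tangible (a sketch under our own conventions: η_t and Δ_t as above; sample size and seed are arbitrary). The sample autocorrelation of X_t is negligible, as for a white noise, while products X_t X_{t+1} are clearly autocorrelated, because their conditional mean follows the regime coefficient 0.7 − 1.4Δ_t:

```python
import numpy as np

rng = np.random.default_rng(12)
n = 50000
eta = rng.standard_normal(n + 1)

# Stationary Markov chain on {0, 1}: switches state with probability 0.05
delta = np.zeros(n + 1, dtype=int)
delta[0] = rng.integers(0, 2)
flips = rng.random(n) < 0.05
for t in range(1, n + 1):
    delta[t] = 1 - delta[t - 1] if flips[t - 1] else delta[t - 1]

x = eta[1:] + (0.7 - 1.4 * delta[1:]) * eta[:-1]      # model (12)

def corr(a, b):
    return np.mean((a - a.mean()) * (b - b.mean())) / (a.std() * b.std())

r1 = corr(x[:-1], x[1:])       # lag-one autocorrelation of X_t: ~ 0 (weak white noise)
zt = x[:-1] * x[1:]            # products X_t X_{t+1}
r2 = corr(zt[:-2], zt[2:])     # lag-two autocorrelation of the products: clearly positive
```

One can check that E X_t² = 1.49 and Corr(X_t, X_{t+1}) = 0, yet Cov(X_t X_{t+1}, X_{t+2} X_{t+3}) = 0.49 × 0.9² ≠ 0: it is exactly this dependence among the products ε_t(θ)(∂/∂θ)ε_t(θ) that the γ_i(θ_0)'s capture.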

It is easy to show that (X_t) is actually a weak white noise. The true value of b is therefore equal to zero, and the variance of the noise (ε_t) is EX_t² = 1.49. One thousand independent trajectories of size 500 of model (12) have been simulated. Fig. 4 clearly shows that the simulated process is not a strong MA(1), since γ̂_i(θ̂_n) decreases slowly as i increases and a lot of the γ̂_i(θ̂_n)'s are significantly different from



Fig. 4. As in Fig. 3, but for a realization of length 500 from model (12).

Fig. 5. Comparison between the errors observed with the procedure proposed in Section 2 (LS) and the standard method (NAG Routine G13AFF) to estimate σ_{b̂} in the weak MA(1) model (13).

zero. This figure also inclined us to choose a large value for the truncation point; since, for i ≤ 26, most of the |γ̂_i(θ̂_n)(1,1)|'s seem large relative to their estimated standard deviation, we decided to use the rectangular window and the truncation point T_n = 26. Theorem 1 and standard computations of moments of X_t show that the approximation of σ_{b̂}, obtained from the asymptotic distribution of b̂, is 0.108. This is close to the value of the observed standard deviation of b̂ over the 1000 simulations. Therefore, this value is taken as a gauge to compare the two procedures. As shown in Fig. 5, the performances of the procedure of Section 2 are much more satisfactory than those of the standard method; the LS method slightly underestimates σ_{b̂} (the median of the estimation errors is close to 0.01), but the NAG estimation errors are always greater than the LS ones (with NAG, the median of the estimation errors is close to 0.06, and the estimated standard deviation is less than one-half the true standard deviation). The NAG estimates are close to the value obtained for a strong model (in this case



the standard deviation would be 1/√500 ≈ 0.045), instead of the true value, which is equal to 0.108. To show the possibility of model misspecification, let us test the null hypothesis H_0: "(X_t) is a weak white noise" by rejecting H_0 when b̂_n is greater than twice its estimated standard deviation. If the standard deviation of b̂_n is accurately estimated, then the error of the first kind is approximately 5%. With the LS method, the frequency of rejection of H_0 is 10.4% for the 1000 trajectories, whereas it is 40.2% with the NAG method. The frequency of bad decisions is therefore much greater with the standard method.

4.2. A real data set

Our example concerns the hourly data of the wind velocity at Athens (Climatological Bulletin – Year 1990, National Observatory of Athens, Institute of Meteorology and Physics of the Atmospheric Environment) in 1990, denoted by X_t, t = 1, …, 8760. Let Y_t = X_{t+1} − X_t, t = 1, …, 8759, be the differenced series. Fig. 6 displays the first 191 values. At first sight it seems plausible that (Y_t) is a martingale difference sequence (the best point prediction of the wind velocity at time t + 1 would be the wind velocity at time t). We note that the empirical partial autocorrelation of order one is greater than twice its estimated standard deviation, given by the usual Bartlett formula. Unfortunately, the Bartlett formula is generally wrong for non-linear processes. Berlinet and Francq (1997) and Romano and Thombs (1996) have proposed alternative methods to estimate the standard deviation of functions of the empirical autocovariances. The estimators based on the Bartlett formula remain the most widely used, however, and they can lead to misspecification. To mimic practical situations, where overparameterization can occur, let us fit an AR(1) model. Using the NAG routine G13AFF, the

Fig. 6. The differenced series of the wind velocity from the first of January 1990 at 2 h to the eighth of January at 24 h.



Fig. 7. As in Fig. 3, for the estimated AR(1) model of the wind velocity.

fitted AR(1) model is

Y_t + 0.036 Y_{t−1} = ε_t.    (14)

With the NAG routine, the estimate of the standard deviation of the AR(1) parameter is equal to 0.011. The estimate of the parameter is more than three times as great as its estimated standard deviation. Therefore, the standard procedure implemented by the NAG routine concludes that the AR(1) parameter is significantly different from zero. However, the validity of the NAG routine's conclusion requires that the noise of the AR(1) model be a strong one. Fig. 7 clearly indicates that this assumption is false, since many γ̂_i(θ̂_n)'s are significantly different from zero. Since Fig. 7 indicates that many γ̂_i(θ̂_n)'s must be taken into account, the truncation point T_n = 20 (with the rectangular window) has been chosen. The estimated standard deviation is now equal to 0.034, which does not lead to rejecting the assumption that the AR(1) coefficient is zero. The same experiment has been repeated using only the first 6000 values. The same conclusions hold. Once again, in contrast with the NAG inference, the significance of the coefficient is doubtful. The two corresponding models (the strong AR(1) model and the weak white noise model) have been tested on the last 2759 out-of-sample values. On these last 2759 values, the mean-square error of prediction is nearly the same for both the AR(1) and the white noise models. Therefore, in opposition to the conclusion given by the standard Box–Jenkins methodology, it seems that the hypothesis that the differenced series is a weak white noise cannot be rejected (in this case the wind velocity at time t is the best linear prediction of the wind velocity at time t + 1). Moreover, Fig. 7 clearly indicates that the martingale difference assumption must be rejected and that a non-linear model should be considered, in order to obtain optimal (in the mean-square error sense) predictions.



5. Conclusion

Standard diagnostic tools for analyzing ARMA models are sensitive to misspecifications of the noise dependence structure. In fact, these misspecifications cause standard methods for computing variances of estimators to present a misleading view of their accuracy. The use of the proposed estimator in applied work can protect against incorrect inferences. The numerical results presented in this paper, and others not reported here (i.e. for n = 50, n = 100, n = 200) but available from the authors, suggest that the estimator performs remarkably well for moderate sample sizes. Moreover, the empirical covariances (the γ̂_i(θ̂_n)'s) used to estimate the covariance matrix can serve as a diagnostic tool for detecting departures from the standard strong ARMA case. However, building a formal statistical test based on these quantities remains to be done.

Acknowledgements We would like to thank two anonymous referees for their helpful comments and suggestions.

Appendix

The appendix is organized as follows. Lemmas A.1 and A.2 state that ε_t(θ), e_t(θ) and their derivatives can be expressed as linear combinations of X_t, X_{t−1}, … Since the coefficients of these linear combinations decay at an exponential rate, Lemma A.3 shows that e_t(θ) can be replaced by ε_t(θ) in the expression of Ĵ_n(θ). Therefore, in Lemma A.4, the ergodic theorem can be applied to the stationary process (ε_t(θ)), which shows, in Lemma A.5, that Ĵ_n(θ_0) converges to the unknown matrix J(θ_0). However, Ĵ_n(θ_0) is not an estimator, since θ_0 is unknown. Using a standard Taylor expansion and the consistency of θ̂_n, Lemma A.6 shows that θ_0 can be replaced by θ̂_n, which completes the proof of Theorem 2. Using Lemma A.1, it is shown in Lemmas A.7 and A.8 that e_t(θ) can be replaced by ε_t(θ) in the expression of Î_n(θ). Lemma A.9 states a moment inequality. Two kinds of truncations are required. A first truncation is used to ensure moment existence (this sort of truncation is made in the standard proof of the weak law of large numbers; see e.g. Feller, 1957, pp. 231–233). A second truncation is used to approximate ε_t(θ) by a function of a finite number of past values of X_t, in order to keep the mixing property. Using this lemma, together with Lemma A.10, it is shown in Lemma A.11 that Î_n(θ_0) converges to the unknown matrix I(θ_0). Using once again a Taylor expansion and the asymptotic normality of √n(θ̂_n − θ_0), Lemma A.12 shows that the unknown parameter θ_0 can be replaced by θ̂_n, which completes the proof of Theorem 3. The proof of Theorem 4 is based on Theorems 2 and 3 and Lemma A.13.

C. Francq, J.-M. Zakoïan / Journal of Statistical Planning and Inference 83 (2000) 369-394


Lemma A.1. For any $\theta\in\Theta$ and any $(m_1,m_2)\in\{1,\ldots,p+q\}^2$, there exist absolutely summable sequences $(c_i(\theta))_{i\ge 0}$, $(c_{i,m_1}(\theta))_{i\ge 1}$ and $(c_{i,m_1,m_2}(\theta))_{i\ge 2}$ such that, almost surely,
\[
\varepsilon_t(\theta)=\sum_{i=0}^{\infty}c_i(\theta)X_{t-i},\qquad
\frac{\partial}{\partial\theta_{m_1}}\varepsilon_t(\theta)=\sum_{i=1}^{\infty}c_{i,m_1}(\theta)X_{t-i},\qquad
\frac{\partial^2}{\partial\theta_{m_1}\partial\theta_{m_2}}\varepsilon_t(\theta)=\sum_{i=2}^{\infty}c_{i,m_1,m_2}(\theta)X_{t-i} \tag{A.1}
\]
and
\[
\sup_{\theta\in\Theta}|c_i(\theta)|=O(\rho^i),\qquad
\sup_{\theta\in\Theta}|c_{i,m_1}(\theta)|=O(\rho^i),\qquad
\sup_{\theta\in\Theta}|c_{i,m_1,m_2}(\theta)|=O(\rho^i) \tag{A.2}
\]
for some $\rho\in[0,1)$.

Proof. The result is obvious for $q=0$ or $(p,q)=(0,1)$. In the other cases the proof can be obtained from a straightforward extension of results given by Francq and Zakoïan (1998, Lemmas 9 and 10).

Lemma A.2. For any $\theta\in\Theta$ and any $(m_1,m_2)\in\{1,\ldots,p+q\}^2$, we have
\[
e_t(\theta)=\sum_{i=0}^{t-1}c_i(\theta)X_{t-i},\qquad
\frac{\partial}{\partial\theta_{m_1}}e_t(\theta)=\sum_{i=1}^{t-1}c_{i,m_1}(\theta)X_{t-i},\qquad
\frac{\partial^2}{\partial\theta_{m_1}\partial\theta_{m_2}}e_t(\theta)=\sum_{i=2}^{t-1}c_{i,m_1,m_2}(\theta)X_{t-i}, \tag{A.3}
\]
where $(c_i(\theta))_{i\ge 0}$, $(c_{i,m_1}(\theta))_{i\ge 1}$ and $(c_{i,m_1,m_2}(\theta))_{i\ge 2}$ are the sequences defined in Lemma A.1.

Proof. First note that, since the variance $\sigma^2$ of the linear innovations is supposed to be strictly positive, the sequence $(c_i(\theta))$ satisfying (A.1) is unique. From (1) and (2) it is easy to check that
\[
e_t(\theta)=X_t+\sum_{i=1}^{t-1}c_i(\theta)X_{t-i}.
\]
Since $\varepsilon_t(\theta)$ can be differentiated twice under the summation sign, we have
\[
c_{i,m_1}(\theta)=\frac{\partial}{\partial\theta_{m_1}}c_i(\theta)\qquad\text{and}\qquad
c_{i,m_1,m_2}(\theta)=\frac{\partial^2}{\partial\theta_{m_1}\partial\theta_{m_2}}c_i(\theta).
\]

The conclusion follows.

Lemma A.3. For any $(m_1,m_2)\in\{1,\ldots,p+q\}^2$ we have
\[
\sup_{\theta\in\Theta}\Big|\frac{\partial^2}{\partial\theta_{m_1}\partial\theta_{m_2}}\varepsilon_t^2(\theta)-\frac{\partial^2}{\partial\theta_{m_1}\partial\theta_{m_2}}e_t^2(\theta)\Big|\to 0\quad\text{a.s.} \tag{A.4}
\]
and
\[
\sup_{\theta\in\Theta}\Big|\frac{\partial}{\partial\theta_{m_1}}\varepsilon_t(\theta)\frac{\partial}{\partial\theta_{m_2}}\varepsilon_t(\theta)-\frac{\partial}{\partial\theta_{m_1}}e_t(\theta)\frac{\partial}{\partial\theta_{m_2}}e_t(\theta)\Big|\to 0\quad\text{a.s.} \tag{A.5}
\]
as $t\to\infty$.

Proof. From Lemmas A.1 and A.2 there exist almost surely constants $K_1>0$ and $\rho\in[0,1)$ such that
\[
\begin{aligned}
\sup_{\theta\in\Theta}\Big|\frac{\partial^2}{\partial\theta_{m_1}\partial\theta_{m_2}}\varepsilon_t^2(\theta)-\frac{\partial^2}{\partial\theta_{m_1}\partial\theta_{m_2}}e_t^2(\theta)\Big|
&\le 2\sup_{\theta\in\Theta}|\varepsilon_t(\theta)|\,\sup_{\theta\in\Theta}\Big|\frac{\partial^2}{\partial\theta_{m_1}\partial\theta_{m_2}}(\varepsilon_t(\theta)-e_t(\theta))\Big|
+2\sup_{\theta\in\Theta}|\varepsilon_t(\theta)-e_t(\theta)|\,\sup_{\theta\in\Theta}\Big|\frac{\partial^2}{\partial\theta_{m_1}\partial\theta_{m_2}}e_t(\theta)\Big|\\
&\quad+2\sup_{\theta\in\Theta}\Big|\frac{\partial}{\partial\theta_{m_1}}\varepsilon_t(\theta)\Big|\,\sup_{\theta\in\Theta}\Big|\frac{\partial}{\partial\theta_{m_2}}(\varepsilon_t(\theta)-e_t(\theta))\Big|
+2\sup_{\theta\in\Theta}\Big|\frac{\partial}{\partial\theta_{m_1}}(\varepsilon_t(\theta)-e_t(\theta))\Big|\,\sup_{\theta\in\Theta}\Big|\frac{\partial}{\partial\theta_{m_2}}e_t(\theta)\Big|\\
&\le K_1\rho^t\Big(\sup_{\theta\in\Theta}|\varepsilon_t(\theta)|+\sup_{\theta\in\Theta}\Big|\frac{\partial^2}{\partial\theta_{m_1}\partial\theta_{m_2}}e_t(\theta)\Big|+\sup_{\theta\in\Theta}\Big|\frac{\partial}{\partial\theta_{m_1}}\varepsilon_t(\theta)\Big|+\sup_{\theta\in\Theta}\Big|\frac{\partial}{\partial\theta_{m_2}}e_t(\theta)\Big|\Big).
\end{aligned}
\]
From Lemma A.1 we have
\[
E\sup_{\theta\in\Theta}|\varepsilon_t(\theta)|\le E|X_1|\sum_{i=0}^{\infty}\sup_{\theta\in\Theta}|c_i(\theta)|=M_1<\infty.
\]
Therefore, by Markov's inequality, for all $\varepsilon>0$,
\[
\sum_{t=0}^{\infty}P\Big(\rho^t\sup_{\theta\in\Theta}|\varepsilon_t(\theta)|>\varepsilon\Big)\le\sum_{t=0}^{\infty}\frac{M_1\rho^t}{\varepsilon}<\infty.
\]
By the Borel-Cantelli lemma,
\[
\rho^t\sup_{\theta\in\Theta}|\varepsilon_t(\theta)|\to 0\quad\text{a.s.}
\]
Similarly,
\[
\rho^t\sup_{\theta\in\Theta}\Big|\frac{\partial}{\partial\theta_{m_1}}\varepsilon_t(\theta)\Big|\to 0\quad\text{a.s.}
\]


From Lemma A.2 we have
\[
E\sup_{\theta\in\Theta}\Big|\frac{\partial^2}{\partial\theta_{m_1}\partial\theta_{m_2}}e_t(\theta)\Big|\le E|X_1|\sum_{i=2}^{\infty}\sup_{\theta\in\Theta}|c_{i,m_1,m_2}(\theta)|=M_2<\infty.
\]
We deduce that
\[
\rho^t\sup_{\theta\in\Theta}\Big|\frac{\partial^2}{\partial\theta_{m_1}\partial\theta_{m_2}}e_t(\theta)\Big|\to 0\quad\text{a.s.}
\]
By the same arguments,
\[
\rho^t\sup_{\theta\in\Theta}\Big|\frac{\partial}{\partial\theta_{m_2}}e_t(\theta)\Big|\to 0\quad\text{a.s.},
\]
which completes the proof of (A.4); (A.5) follows in the same way.

Lemma A.4. For any $\theta\in\Theta$ let
\[
J_n^*(\theta)=\frac{2}{n}\sum_{t=1}^{n}\frac{\partial}{\partial\theta}\varepsilon_t(\theta)\,\frac{\partial}{\partial\theta'}\varepsilon_t(\theta).
\]
We have $J_n^*(\theta_0)\to J(\theta_0)$ a.s. as $n\to\infty$.

Proof. Using Lemma A.1 together with the assumption that $EX_t^2<\infty$, we show that $\varepsilon_t(\theta)$, $(\partial/\partial\theta_i)\varepsilon_t(\theta)$ and $(\partial^2/\partial\theta_i\partial\theta_j)\varepsilon_t(\theta)$ have moments of order 2. Therefore
\[
E\Big|\frac{\partial}{\partial\theta_i}\varepsilon_t(\theta)\frac{\partial}{\partial\theta_j}\varepsilon_t(\theta)\Big|<\infty
\qquad\text{and}\qquad
E\Big|\varepsilon_t(\theta)\frac{\partial^2}{\partial\theta_i\partial\theta_j}\varepsilon_t(\theta)\Big|<\infty.
\]
From Lemma A.3 and the ergodic theorem we have
\[
J(\theta_0)(i,j)=\lim_{n\to\infty}\frac{\partial^2}{\partial\theta_i\partial\theta_j}O_n(\theta_0)
=\lim_{n\to\infty}\frac{2}{n}\sum_{t=1}^{n}\Big(\frac{\partial}{\partial\theta_i}\varepsilon_t\,\frac{\partial}{\partial\theta_j}\varepsilon_t+\varepsilon_t\frac{\partial^2}{\partial\theta_i\partial\theta_j}\varepsilon_t\Big)(\theta_0)
=2E_{\theta_0}\frac{\partial}{\partial\theta_i}\varepsilon_t(\theta_0)\frac{\partial}{\partial\theta_j}\varepsilon_t(\theta_0)+2E_{\theta_0}\varepsilon_t(\theta_0)\frac{\partial^2}{\partial\theta_i\partial\theta_j}\varepsilon_t(\theta_0).
\]
Since $(\partial^2/\partial\theta_i\partial\theta_j)\varepsilon_t(\theta_0)$ belongs to the Hilbert space generated by $(X_s,\ s<t)$, we have $E_{\theta_0}\varepsilon_t(\partial^2/\partial\theta_i\partial\theta_j)\varepsilon_t(\theta_0)=0$, which shows that
\[
J(\theta_0)=2E_{\theta_0}\frac{\partial}{\partial\theta}\varepsilon_t(\theta_0)\,\frac{\partial}{\partial\theta'}\varepsilon_t(\theta_0).
\]
Then the proof follows from the ergodic theorem.
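As a quick numerical illustration of Lemma A.4 (our own, not part of the paper), consider the simplest strong AR(1) case, where $\varepsilon_t(\theta)=X_t-\theta X_{t-1}$, so $\partial\varepsilon_t/\partial\theta=-X_{t-1}$, $J_n^*(\theta_0)=(2/n)\sum_t X_{t-1}^2$, and the ergodic limit $2EX_t^2$ is available in closed form:

```python
import numpy as np

# Toy check of Lemma A.4 for an AR(1): X_t = theta0 * X_{t-1} + eps_t with
# unit-variance innovations.  Here J*_n(theta0) = (2/n) * sum X_{t-1}^2 and the
# ergodic limit is J(theta0) = 2 E X_t^2 = 2 / (1 - theta0**2)  (= 8/3 below).
rng = np.random.default_rng(12345)
theta0, n = 0.5, 100_000
eps = rng.standard_normal(n)
X = np.zeros(n)
for t in range(1, n):
    X[t] = theta0 * X[t - 1] + eps[t]

J_star = 2.0 / n * np.sum(X[:-1] ** 2)
J_limit = 2.0 / (1.0 - theta0 ** 2)
print(J_star, J_limit)
```

For weak (non-strong) ARMA representations the limit $J(\theta_0)$ has the same form, which is precisely why $\hat J_n$ alone underestimates nothing: it is the middle matrix $I(\theta_0)$ that changes.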


Lemma A.5. For any matrix norm $\|\cdot\|$ we have
\[
\lim_{n\to\infty}\sup_{\theta\in\Theta}\|\hat J_n(\theta)-J_n^*(\theta)\|=0\quad\text{a.s.}
\]

Proof. From Lemma A.3 we obtain
\[
\sup_{\theta\in\Theta}\|\hat J_n(\theta)-J_n^*(\theta)\|
\le\frac{2}{n}\sum_{t=1}^{n}\sup_{\theta\in\Theta}\Big\|\frac{\partial}{\partial\theta}\varepsilon_t(\theta)\,\frac{\partial}{\partial\theta'}\varepsilon_t(\theta)-\frac{\partial}{\partial\theta}e_t(\theta)\,\frac{\partial}{\partial\theta'}e_t(\theta)\Big\|\to 0\quad\text{a.s.}
\]

Lemma A.6. We have
\[
\lim_{n\to\infty}\|J_n^*(\hat\theta_n)-J_n^*(\theta_0)\|=0\quad\text{a.s.}
\]

Proof. Without loss of generality suppose that $\|\cdot\|$ is the Euclidean norm. From Lemma A.1 we easily show that
\[
E_{\theta_0}\sup_{\theta\in\Theta}\Big\|\frac{\partial}{\partial\theta}\Big(\frac{\partial}{\partial\theta_{m_1}}\varepsilon_t(\theta)\,\frac{\partial}{\partial\theta_{m_2}}\varepsilon_t(\theta)\Big)\Big\|<\infty. \tag{A.6}
\]
Doing a Taylor expansion and using the ergodic theorem, (A.6) and Theorem 1, we obtain
\[
|J_n^*(\hat\theta_n)(m_1,m_2)-J_n^*(\theta_0)(m_1,m_2)|
\le\Bigg(\frac{2}{n}\sum_{t=1}^{n}\sup_{\theta\in\Theta}\Big\|\frac{\partial}{\partial\theta}\Big(\frac{\partial}{\partial\theta_{m_1}}\varepsilon_t(\theta)\,\frac{\partial}{\partial\theta_{m_2}}\varepsilon_t(\theta)\Big)\Big\|\Bigg)\|\hat\theta_n-\theta_0\|\to 0\quad\text{a.s.}
\]

Proof of Theorem 2. We have
\[
\|\hat J_n(\hat\theta_n)-J(\theta_0)\|\le\|\hat J_n(\hat\theta_n)-J_n^*(\hat\theta_n)\|+\|J_n^*(\hat\theta_n)-J_n^*(\theta_0)\|+\|J_n^*(\theta_0)-J(\theta_0)\|.
\]
Therefore the conclusion follows from Lemmas A.4-A.6.

Lemma A.7. For $0\le i<n$, let
\[
\Gamma_i^*(\theta)=\frac{4}{n}\sum_{t=1}^{n-i}\varepsilon_t(\theta)\frac{\partial}{\partial\theta}\varepsilon_t(\theta)\,\varepsilon_{t+i}(\theta)\frac{\partial}{\partial\theta'}\varepsilon_{t+i}(\theta)
\]
and $\Gamma_{-i}^*(\theta)=\Gamma_i^*(\theta)'$. We have
\[
\sup_{-n<i<n}\,\sup_{\theta\in\Theta}|\hat\Gamma_i(\theta)(m_1,m_2)-\Gamma_i^*(\theta)(m_1,m_2)|=O_P\Big(\frac{1}{\sqrt n}\Big).
\]


Proof. Proceeding as in the proof of Lemma A.3, we show that there exist almost surely constants $K_2>0$ and $\rho\in[0,1)$, independent of $\theta$ and $i$, such that, for $(m_1,m_2)\in\{1,\ldots,p+q\}^2$,
\[
\Big|\varepsilon_t\frac{\partial\varepsilon_t}{\partial\theta_{m_1}}\varepsilon_{t+i}\frac{\partial\varepsilon_{t+i}}{\partial\theta_{m_2}}-e_t\frac{\partial e_t}{\partial\theta_{m_1}}e_{t+i}\frac{\partial e_{t+i}}{\partial\theta_{m_2}}\Big|
\le K_2\Big\{\rho^t\Big(\Big|\frac{\partial\varepsilon_t}{\partial\theta_{m_1}}\Big|+|e_t|\Big)\Big|\varepsilon_{t+i}\frac{\partial\varepsilon_{t+i}}{\partial\theta_{m_2}}\Big|
+\rho^{t+i}\Big(\Big|\frac{\partial\varepsilon_{t+i}}{\partial\theta_{m_2}}\Big|+|e_{t+i}|\Big)\Big|e_t\frac{\partial e_t}{\partial\theta_{m_1}}\Big|\Big\},
\]
where the argument $\theta$ is omitted. By the Cauchy-Schwarz inequality we have
\[
\sup_{0\le i<n}\frac{1}{\sqrt n}\sum_{t=1}^{n-i}\rho^t\sup_{\theta\in\Theta}\Big(\Big|\frac{\partial\varepsilon_t}{\partial\theta_{m_1}}(\theta)\Big|+|e_t(\theta)|\Big)\sup_{\theta\in\Theta}\Big|\varepsilon_{t+i}(\theta)\frac{\partial\varepsilon_{t+i}}{\partial\theta_{m_2}}(\theta)\Big|
\le\sum_{t=1}^{n}\rho^t\sup_{\theta\in\Theta}\Big(\Big|\frac{\partial\varepsilon_t}{\partial\theta_{m_1}}(\theta)\Big|+|e_t(\theta)|\Big)\sqrt{\frac{1}{n}\sum_{t=1}^{n}\sup_{\theta\in\Theta}\Big(\varepsilon_t(\theta)\frac{\partial\varepsilon_t}{\partial\theta_{m_2}}(\theta)\Big)^2}.
\]
By the ergodic theorem and Lemma A.1 we have, almost surely,
\[
\frac{1}{n}\sum_{t=1}^{n}\sup_{\theta\in\Theta}\Big(\varepsilon_t(\theta)\frac{\partial\varepsilon_t}{\partial\theta_{m_2}}(\theta)\Big)^2\to E_{\theta_0}\sup_{\theta\in\Theta}\Big(\varepsilon_t(\theta)\frac{\partial\varepsilon_t}{\partial\theta_{m_2}}(\theta)\Big)^2<\infty.
\]
The Markov inequality shows that the sequence
\[
\sum_{t=1}^{n}\rho^t\sup_{\theta\in\Theta}\Big(\Big|\frac{\partial\varepsilon_t}{\partial\theta_{m_1}}(\theta)\Big|+|e_t(\theta)|\Big)
\]
is bounded in probability. Similarly, we show that
\[
\sup_{0\le i<n}\frac{1}{\sqrt n}\sum_{t=1}^{n-i}\rho^{t+i}\sup_{\theta\in\Theta}\Big(\Big|\frac{\partial\varepsilon_{t+i}}{\partial\theta_{m_2}}(\theta)\Big|+|e_{t+i}(\theta)|\Big)\sup_{\theta\in\Theta}\Big|e_t(\theta)\frac{\partial e_t}{\partial\theta_{m_1}}(\theta)\Big|=O_P(1),
\]
which completes the proof.

Lemma A.8. Let
\[
I_n^*(\theta)=\sum_{i=-n+1}^{n-1}\omega(ib_n)\Gamma_i^*(\theta).
\]
We have $\sup_{\theta\in\Theta}\|\hat I_n(\theta)-I_n^*(\theta)\|\to 0$ in probability as $n\to\infty$.

Proof. The assumptions made on $\omega(\cdot)$ and $b_n$ entail that
\[
b_n\sum_{-n<i<n}|\omega(ib_n)|=O(1).
\]


Thus, from Lemma A.7 we obtain
\[
\sup_{\theta\in\Theta}\|\hat I_n(\theta)-I_n^*(\theta)\|
\le\sup_{-n<i<n}\,\sup_{\theta\in\Theta}\|\hat\Gamma_i(\theta)-\Gamma_i^*(\theta)\|\sum_{|i|<n}|\omega(ib_n)|
=O_P(n^{-1/2}b_n^{-1})\,b_n\sum_{|i|<n}|\omega(ib_n)|=o_P(1).
\]
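The bound $b_n\sum_{|i|<n}|\omega(ib_n)|=O(1)$ invoked in the proof above is essentially a Riemann-sum fact: the left-hand side approximates $\int|\omega(x)|\,dx$ as $b_n\to 0$. A small numerical check (our own illustration, taking the Bartlett kernel as an assumed weight function):

```python
import numpy as np

# Riemann-sum fact behind Lemma A.8: for an integrable weight function omega,
# b_n * sum_{|i| < n} |omega(i * b_n)|  ->  integral of |omega(x)| dx = O(1).
# Illustrated with the Bartlett kernel omega(x) = max(1 - |x|, 0), whose
# integral equals 1.

def weighted_sum(n, b_n):
    i = np.arange(-(n - 1), n)
    omega = np.maximum(1.0 - np.abs(i * b_n), 0.0)   # Bartlett kernel
    return b_n * np.sum(np.abs(omega))

for n, b_n in [(100, 0.1), (1000, 0.01), (10000, 0.001)]:
    print(n, weighted_sum(n, b_n))
```

For this triangular kernel the discrete sum even matches the integral exactly, since the trapezoidal sum of a piecewise-linear function is exact.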

Lemma A.9. Let $Y_{t,i}=4(Y_tY_{t+i}-EY_tY_{t+i})$, where
\[
Y_t=\varepsilon_t\varepsilon_t^*\qquad\text{and}\qquad \varepsilon_t^*=\frac{\partial}{\partial\theta_{m_1}}\varepsilon_t(\theta_0)\quad(m_1\in\{1,\ldots,p+q\}).
\]
Let
\[
\tilde\varepsilon_t=\sum_{j=0}^{\infty}|c_j(\theta_0)X_{t-j}|\qquad\text{and}\qquad
\tilde\varepsilon_t^*=\sum_{j=1}^{\infty}|c_{j,m_1}(\theta_0)X_{t-j}|,
\]
where the sequences $(c_j(\theta_0))$ and $(c_{j,m_1}(\theta_0))$ are defined in Lemma A.1. For any positive constant $\delta$ define the truncated variable
\[
Y_{t,i}^{\delta}=Y_{t,i}\quad\text{if}\quad\max\{\tilde\varepsilon_t,\tilde\varepsilon_{t+i},\tilde\varepsilon_t^*,\tilde\varepsilon_{t+i}^*\}\le\delta^{1/4}\ \text{and}\ |Y_{t,i}|\le\delta,
\qquad\text{and}\quad Y_{t,i}^{\delta}=0\ \text{otherwise}.
\]
For $|i|<n$ and $\delta_0>0$ there exists a constant $K_4$, independent of $\delta$, $i$ and $n$, such that
\[
\operatorname{var}\Bigg(\frac{1}{n}\sum_{t=1}^{n-|i|}Y_{t,i}^{\delta}\Bigg)\le\frac{K_4\,\delta(|i|+1)}{n},\qquad\forall\delta\ge\delta_0.
\]

Proof. In order to lighten the notation, consider only the case $0\le i<n$; the case $-n<i<0$ is similar. By stationarity we have
\[
\operatorname{var}\Bigg(\frac{1}{n}\sum_{t=1}^{n-i}Y_{t,i}^{\delta}\Bigg)\le\frac{1}{n}\sum_{k=-\infty}^{+\infty}|\operatorname{cov}(Y_{1,i}^{\delta},Y_{1+k,i}^{\delta})|.
\]
We have
\[
|EY_{1,i}^{\delta}Y_{1+k,i}^{\delta}|\le\delta E|Y_{1,i}^{\delta}|\le 8\delta EY_t^2
\qquad\text{and}\qquad
|EY_{1,i}^{\delta}\,EY_{1+k,i}^{\delta}|\le\delta E|Y_{1,i}^{\delta}|\le 8\delta EY_t^2.
\]
Therefore we have
\[
|\operatorname{cov}(Y_{1,i}^{\delta},Y_{1+k,i}^{\delta})|\le 16\,\delta EY_t^2.
\]
Now we will show that there exist a constant $\rho\in[0,1)$ and a constant $K_5$, independent of $i$, $k$ and $\delta$, such that, for $2i\le|k|$,
\[
|\operatorname{cov}(Y_{1,i}^{\delta},Y_{1+k,i}^{\delta})|\le K_5\,\delta\big(\rho^{[|k|/2]}+\{\alpha_X([|k|/2]-i)\}^{\nu/(2+\nu)}\big). \tag{A.7}
\]
The conclusion will follow. For any positive integer $r$ write
\[
{}_r\varepsilon_t=\sum_{j=0}^{r}c_j(\theta_0)X_{t-j},\qquad
{}_r\varepsilon_t^*=\sum_{j=1}^{r}c_{j,m_1}(\theta_0)X_{t-j},\qquad
{}_r\tilde\varepsilon_t=\sum_{j=0}^{r}|c_j(\theta_0)X_{t-j}|,\qquad
{}_r\tilde\varepsilon_t^*=\sum_{j=1}^{r}|c_{j,m_1}(\theta_0)X_{t-j}|.
\]

Consider also the twice-truncated variables
\[
{}_rY_{t,i}^{\delta}=4({}_r\varepsilon_t\,{}_r\varepsilon_t^*\,{}_r\varepsilon_{t+i}\,{}_r\varepsilon_{t+i}^*-EY_tY_{t+i})\quad\text{if}\quad |Y_{t,i}^{\delta}|\ne 0,
\qquad\text{and}\quad {}_rY_{t,i}^{\delta}=0\ \text{otherwise}.
\]
Let
\[
{}_r\Delta Y_{t,i}^{\delta}=Y_{t,i}^{\delta}-{}_rY_{t,i}^{\delta}.
\]
We have $|{}_r\Delta Y_{t,i}^{\delta}|\le 4(d_1+d_2+d_3+d_4)$, where
\[
d_1=\sum_{j=r+1}^{\infty}|c_j(\theta_0)X_{t-j}|\,\tilde\varepsilon_t^*\tilde\varepsilon_{t+i}\tilde\varepsilon_{t+i}^*,\qquad
d_2={}_r\tilde\varepsilon_t\sum_{j=r+1}^{\infty}|c_{j,m_1}(\theta_0)X_{t-j}|\,\tilde\varepsilon_{t+i}\tilde\varepsilon_{t+i}^*,
\]
\[
d_3={}_r\tilde\varepsilon_t\,{}_r\tilde\varepsilon_t^*\sum_{j=r+1}^{\infty}|c_j(\theta_0)X_{t+i-j}|\,\tilde\varepsilon_{t+i}^*,\qquad
d_4={}_r\tilde\varepsilon_t\,{}_r\tilde\varepsilon_t^*\,{}_r\tilde\varepsilon_{t+i}\sum_{j=r+1}^{\infty}|c_{j,m_1}(\theta_0)X_{t+i-j}|
\]
if $\max\{\tilde\varepsilon_t,\tilde\varepsilon_{t+i},\tilde\varepsilon_t^*,\tilde\varepsilon_{t+i}^*\}\le\delta^{1/4}$, and $d_1=d_2=d_3=d_4=0$ otherwise. Using Lemma A.1 it is easy to show that, for $j=1,2,3,4$,
\[
|d_j|\le\delta\qquad\text{and}\qquad E|d_j|=O(\rho^r)
\]
uniformly in $i$, for some $\rho\in[0,1)$. Therefore there exists a constant $K_7$, independent of $i$, $r$ and $\delta$, such that
\[
E|{}_r\Delta Y_{t,i}^{\delta}|^{2}\le K_7\,\delta\rho^{r},
\]
and there exists a constant $K_8$, independent of $i$, $k$, $r$ and $\delta$, such that
\[
E|{}_r\Delta Y_{t,i}^{\delta}\,Y_{t+k,i}^{\delta}|\le K_8\,\delta\rho^{r}
\qquad\text{and}\qquad
|E\,{}_r\Delta Y_{t,i}^{\delta}\,EY_{t+k,i}^{\delta}|\le K_8\,\delta\rho^{r}.
\]
Thus we have
\[
|\operatorname{cov}({}_r\Delta Y_{t,i}^{\delta},Y_{t+k,i}^{\delta})|\le 2K_8\,\delta\rho^{r}. \tag{A.8}
\]
The same bound holds for $|\operatorname{cov}({}_rY_{t,i}^{\delta},{}_r\Delta Y_{t+k,i}^{\delta})|$. Denoting by $I_F$ the indicator function of an event $F$, we have
\[
\begin{aligned}
\|{}_rY_{t,i}^{\delta}\|_{2+\nu}
&\le 4EY_t^2+4\big\|\tilde\varepsilon_t\tilde\varepsilon_{t+i}\tilde\varepsilon_t^*\tilde\varepsilon_{t+i}^*\,I_{\{|Y_{t,i}^{\delta}|\ne 0\}}\big\|_{2+\nu}\\
&\le 4EY_t^2+4\Big\|\sqrt{\tilde\varepsilon_t\tilde\varepsilon_{t+i}\tilde\varepsilon_t^*\tilde\varepsilon_{t+i}^*}\,I_{\{\max\{\tilde\varepsilon_t,\tilde\varepsilon_{t+i},\tilde\varepsilon_t^*,\tilde\varepsilon_{t+i}^*\}\le\delta^{1/4}\}}\sqrt{\tilde\varepsilon_t\tilde\varepsilon_{t+i}\tilde\varepsilon_t^*\tilde\varepsilon_{t+i}^*}\Big\|_{2+\nu}\\
&\le 4EY_t^2+4\big(\delta^{(2+\nu)/2}E|\tilde\varepsilon_t\tilde\varepsilon_{t+i}\tilde\varepsilon_t^*\tilde\varepsilon_{t+i}^*|^{(2+\nu)/2}\big)^{1/(2+\nu)}\\
&\le 4EY_t^2+4\delta^{1/2}\|\tilde\varepsilon_t\|_{4+2\nu}\|\tilde\varepsilon_t^*\|_{4+2\nu}.
\end{aligned}
\]


Now note that ${}_rY_{t,i}^{\delta}$ is measurable with respect to the $\sigma$-field generated by $X_{t+i},X_{t+i-1},\ldots,X_{t-r}$. Set $\alpha_X(j)=\alpha_X(0)$ for all $j\le 0$. The Davydov (1968) inequality gives
\[
|\operatorname{cov}({}_rY_{t,i}^{\delta},{}_rY_{t+k,i}^{\delta})|
\le 12\|{}_rY_{t,i}^{\delta}\|_{2+\nu}^{2}\,\{\alpha_X(|k|-i-r)\}^{\nu/(2+\nu)}
\le K_9\,\delta\,\{\alpha_X(|k|-i-r)\}^{\nu/(2+\nu)} \tag{A.9}
\]
for some constant $K_9$ independent of $i$, $k$, $r$ and $\delta$. Since
\[
|\operatorname{cov}(Y_{t,i}^{\delta},Y_{t+k,i}^{\delta})|\le|\operatorname{cov}({}_r\Delta Y_{t,i}^{\delta},Y_{t+k,i}^{\delta})|+|\operatorname{cov}({}_rY_{t,i}^{\delta},{}_r\Delta Y_{t+k,i}^{\delta})|+|\operatorname{cov}({}_rY_{t,i}^{\delta},{}_rY_{t+k,i}^{\delta})|,
\]
Eq. (A.7) is obtained from (A.8) and (A.9) by taking $r=[|k|/2]$.

Lemma A.10. Let $T_n$ be the integer part of $a/b_n$. For all $\varepsilon>0$ we have
\[
\sup_{|i|\le T_n}P\Bigg(\Bigg|\frac{1}{n}\sum_{t=1}^{n-|i|}Y_{t,i}\Bigg|>\varepsilon b_n\Bigg)=o(b_n).
\]

Proof. The proof is based on a standard method of truncation. In view of (9) we can consider a sequence of numbers $\delta_n$ such that
\[
\delta_n n^{\nu/(2+\nu)}b_n^{2/(2+\nu)}\to\infty\qquad\text{and}\qquad\delta_n b_n^{-4}\to 0
\]
as $n\to\infty$. Note that if
\[
\frac{1}{n}\sum_{t=1}^{n-|i|}Y_{t,i}\ne\frac{1}{n}\sum_{t=1}^{n-|i|}Y_{t,i}^{\delta_n n}
\]
then there exists $t\in\{1,\ldots,n\}$ such that $\max\{\tilde\varepsilon_t,\tilde\varepsilon_{t+i},\tilde\varepsilon_t^*,\tilde\varepsilon_{t+i}^*\}>(\delta_n n)^{1/4}$ or $|Y_{t,i}|>\delta_n n$. To prove the lemma it suffices therefore to show that
\[
n\big(P(\tilde\varepsilon_t>(\delta_n n)^{1/4})+P(\tilde\varepsilon_t^*>(\delta_n n)^{1/4})\big)=o(b_n), \tag{A.10}
\]
\[
n\sup_i P(|Y_{t,i}|>\delta_n n)=o(b_n) \tag{A.11}
\]
and
\[
\sup_{|i|\le T_n}P\Bigg(\Bigg|\frac{1}{n}\sum_{t=1}^{n-|i|}Y_{t,i}^{\delta_n n}\Bigg|>\varepsilon b_n\Bigg)=o(b_n). \tag{A.12}
\]
First note that Lemma A.1 entails that $M:=E\tilde\varepsilon_t^{4+2\nu}<\infty$. From the Markov inequality we obtain
\[
P(\tilde\varepsilon_t>(\delta_n n)^{1/4})\le\frac{M}{(\delta_n n)^{(4+2\nu)/4}}.
\]
The same inequality holds with $\tilde\varepsilon_t^*$, which entails (A.10). Since $\sup_i E|Y_{t,i}|^{(2+\nu)/2}<\infty$, (A.11) is obtained in the same way.

Now note that
\[
\Bigg|E\frac{1}{n}\sum_{t=1}^{n-|i|}Y_{t,i}^{\delta_n n}\Bigg|\le|EY_{t,i}^{\delta_n n}|\le E|Y_{t,i}-Y_{t,i}^{\delta_n n}|
=\int_{\max\{\tilde\varepsilon_t,\tilde\varepsilon_{t+i},\tilde\varepsilon_t^*,\tilde\varepsilon_{t+i}^*\}>(\delta_n n)^{1/4}}|y|\,dP_{Y_{t,i}}(y)+\int_{|y|>\delta_n n}|y|\,dP_{Y_{t,i}}(y)
\le\frac{4M}{(\delta_n n)^{\nu/2}}+\frac{E|Y_{t,i}|^{1+\nu/2}}{(\delta_n n)^{\nu/2}}=o(b_n)
\]
uniformly in $i$. From the Chebyshev inequality and Lemma A.9 we obtain, for $n$ sufficiently large,
\[
\sup_{|i|\le T_n}P\Bigg(\Bigg|\frac{1}{n}\sum_{t=1}^{n-|i|}Y_{t,i}^{\delta_n n}\Bigg|>2\varepsilon b_n\Bigg)
\le\sup_{|i|\le T_n}P\Bigg(\Bigg|\frac{1}{n}\sum_{t=1}^{n-|i|}Y_{t,i}^{\delta_n n}-E\frac{1}{n}\sum_{t=1}^{n-|i|}Y_{t,i}^{\delta_n n}\Bigg|>\varepsilon b_n\Bigg)
\le\frac{K_4\,\delta_n(T_n+1)}{\varepsilon^2 b_n^2}
\le\frac{K_{10}\,\delta_n}{\varepsilon^2 b_n^3},
\]
which entails (A.12).

Lemma A.11. We have $I_n^*(\theta_0)\to I(\theta_0)$ in probability as $n\to\infty$.

Proof. It is easy to show (see Francq and Zakoïan, 1998, Lemma 3) that
\[
\|\Gamma_i(\theta_0)\|\le M_3\rho^{i}+M_4\{\alpha_X(i)\}^{\nu/(2+\nu)} \tag{A.13}
\]
for some constants $M_3>0$, $M_4>0$ and $\rho\in[0,1)$. We have

\[
\|I(\theta_0)-I_n^*(\theta_0)\|\le d_5+d_6+d_7,
\]
where
\[
d_5=\sum_{|i|\le T_n}|\omega(ib_n)|\,\|\Gamma_i^*(\theta_0)-\Gamma_i(\theta_0)\|,\qquad
d_6=\sum_{|i|\le T_n}|1-\omega(ib_n)|\,\|\Gamma_i(\theta_0)\|,\qquad
d_7=\sum_{|i|>T_n}\|\Gamma_i(\theta_0)\|.
\]
In view of (A.13), $d_7\to 0$ as $n\to\infty$. Let $l$ be a fixed integer and write $d_6=s_1+s_2$, where
\[
s_1=\sum_{|i|\le l}|1-\omega(ib_n)|\,\|\Gamma_i(\theta_0)\|\qquad\text{and}\qquad
s_2=\sum_{l<|i|\le T_n}|1-\omega(ib_n)|\,\|\Gamma_i(\theta_0)\|.
\]
For $|i|\le l$, $\omega(ib_n)\to 1$, so $s_1\to 0$. Using (A.13) and the fact that $\omega(\cdot)$ is bounded, we show that $s_2$ can be made arbitrarily small by choosing $l$ sufficiently large. Hence $d_6\to 0$. We have $d_5\le\sup_x|\omega(x)|\,(s_3+s_4)$, where
\[
s_3=\sum_{|i|\le T_n}\|\Gamma_i^*(\theta_0)-E\Gamma_i^*(\theta_0)\|\qquad\text{and}\qquad
s_4=\sum_{|i|\le T_n}\|\Gamma_i(\theta_0)-E\Gamma_i^*(\theta_0)\|.
\]


Now note that
\[
\frac{1}{n}\sum_{t=1}^{n-i}Y_{t,i}=(\Gamma_i^*(\theta_0)-E\Gamma_i^*(\theta_0))(m_1,m_1).
\]
Therefore, a straightforward extension of Lemma A.10 shows that, for all $\varepsilon>0$,
\[
P(s_3>\varepsilon)\le(2T_n+1)\sup_{|i|\le T_n}P\Big(\|\Gamma_i^*(\theta_0)-E\Gamma_i^*(\theta_0)\|>\frac{\varepsilon}{2T_n+1}\Big)=o(1).
\]
Noting that $\sup_i\|\Gamma_i(\theta_0)\|<\infty$, we obtain
\[
s_4=\sum_{|i|\le T_n}\frac{|i|}{n}\|\Gamma_i(\theta_0)\|=o(1)
\]
since $nb_n^2\to\infty$.

Lemma A.12. We have $\|I_n^*(\hat\theta_n)-I_n^*(\theta_0)\|\to 0$ in probability.

Proof. Doing a Taylor expansion around $\theta_0$, we obtain
\[
|I_n^*(\hat\theta_n)(m_1,m_2)-I_n^*(\theta_0)(m_1,m_2)|\le\|\hat\theta_n-\theta_0\|\sup_{\theta^*\in\Theta}\Big\|\frac{\partial}{\partial\theta}I_n^*(\theta^*)(m_1,m_2)\Big\|.
\]
Using the Cauchy-Schwarz inequality, the ergodic theorem and Lemma A.1 we obtain
\[
\frac{1}{2T_n+1}\sup_{\theta^*\in\Theta}\Big\|\frac{\partial}{\partial\theta}I_n^*(\theta^*)(m_1,m_2)\Big\|
\le\sup_x|\omega(x)|\sup_{0\le i\le T_n}\frac{4}{n}\sum_{t=1}^{n-i}\sup_{\theta^*\in\Theta}\Big\|\frac{\partial}{\partial\theta}\Big(\varepsilon_t(\theta^*)\frac{\partial}{\partial\theta_{m_1}}\varepsilon_t(\theta^*)\,\varepsilon_{t+i}(\theta^*)\frac{\partial}{\partial\theta_{m_2}}\varepsilon_{t+i}(\theta^*)\Big)\Big\|
=O(1)\quad\text{a.s.}
\]
Now note from Theorem 1 that $u_n\sqrt n(\hat\theta_n-\theta_0)\to 0$ in probability for any sequence $(u_n)$ tending to zero. Therefore $\|\hat\theta_n-\theta_0\|=o_P(T_n^{-1})$ and the conclusion follows.

Proof of Theorem 3. From Lemmas A.8, A.11 and A.12.

Lemma A.13. Under the assumptions of Theorem 4, $\hat I_n(\hat\theta_n)$ is a non-negative-definite matrix.


Proof. We first show that the sequence $(\hat\Gamma_i(\hat\theta_n),\ i\in\mathbb Z)$ is of non-negative-definite type. Let
\[
\hat Y_t=e_t(\hat\theta_n)\frac{\partial}{\partial\theta}e_t(\hat\theta_n)
\]
for $1\le t\le n$ and $\hat Y_t=0$ otherwise. For any positive integer $m$ and all vectors $v_1,\ldots,v_m\in\mathbb C^{p+q}$ we have
\[
\sum_{i,j=1}^{m}v_i^*\hat\Gamma_{j-i}(\hat\theta_n)v_j
=\sum_{i,j=1}^{m}v_i^*\,\frac{4}{n}\sum_{t=-\infty}^{+\infty}\hat Y_{t+i}\hat Y_{t+j}^*\,v_j
=\frac{4}{n}\sum_{t=-\infty}^{+\infty}\sum_{i,j=1}^{m}v_i^*\hat Y_{t+i}\hat Y_{t+j}^*v_j
=\frac{4}{n}\sum_{t=-\infty}^{+\infty}\Bigg(\sum_{i=1}^{m}v_i^*\hat Y_{t+i}\Bigg)\Bigg(\sum_{j=1}^{m}v_j^*\hat Y_{t+j}\Bigg)^*\ge 0,
\]
where $*$ stands for conjugate transpose. The sequence $(\hat\Gamma_i(\hat\theta_n),\ i\in\mathbb Z)$ is therefore the autocovariance function of a $(p+q)$-dimensional second-order stationary process $(W_t)_{t\in\mathbb Z}$ (see e.g. Rozanov, 1967, p. 20). Similarly, $(\omega(ib_n),\ i\in\mathbb Z)$ is the autocovariance function of a zero-mean second-order stationary process $(w_t)_{t\in\mathbb Z}$, which we can select to be independent of $(W_t)_{t\in\mathbb Z}$. It is easy to show that $(\omega(ib_n)\hat\Gamma_i(\hat\theta_n),\ i\in\mathbb Z)$ is the autocovariance function of $(w_tW_t)_{t\in\mathbb Z}$. Therefore we have
\[
\hat I_n(\hat\theta_n)=\sum_{i=-\infty}^{\infty}\omega(ib_n)\hat\Gamma_i(\hat\theta_n)
=\sum_{i=-\infty}^{\infty}\operatorname{cov}(w_1W_1,w_{1+i}W_{1+i})
=\lim_{m\to\infty}\frac{1}{m}\operatorname{var}\Bigg(\sum_{t=1}^{m}w_tW_t\Bigg),
\]

which proves the result.

Proof of Theorem 4. From Theorems 2, 3 and Lemma A.13.

References

Anderson, T.W., 1971. The Statistical Analysis of Time Series. Wiley, New York.
Andrews, D.W.K., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817-858.
Ansley, C.F., 1979. An algorithm for the exact likelihood of a mixed autoregressive-moving average process. Biometrika 66, 59-65.
Berlinet, A., Francq, C., 1997. On Bartlett's formula for nonlinear processes. J. Time Series Anal. 18, 535-552.
Bosq, D., 1996. Nonparametric Statistics for Stochastic Processes -- Estimation and Prediction. Lecture Notes in Statistics, Vol. 110. Springer, New York.
Bosq, D., Lecoutre, J.P., 1987. Théorie de l'estimation fonctionnelle. Economica.
Brockwell, P.J., Davis, R.A., 1991. Time Series: Theory and Methods. Springer, Berlin.
Davydov, Y.A., 1968. Convergence of distributions generated by stationary stochastic processes. Theory Probab. Appl. 13, 691-696.
De Gooijer, J.G., 1985. A Monte Carlo study of the small-sample properties of some estimators for ARMA models. Comput. Statist. Quart. 3, 245-266.
Devroye, L., 1987. A Course in Density Estimation. Birkhäuser, Basel.
Feller, W., 1957. An Introduction to Probability Theory and its Applications, Vol. 1, 2nd Edition. Wiley, New York.
Francq, C., Zakoïan, J.-M., 1998. Estimating linear representations of nonlinear processes. J. Statist. Plann. Inference 68, 145-165.
Gallant, A.R., White, H., 1988. A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell, New York.
Gouriéroux, C., Monfort, A., Trognon, A., 1984. Pseudo maximum likelihood methods: theory. Econometrica 52, 681-700.
Hamilton, J.D., 1989. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57, 357-384.
Hamilton, J.D., 1994. Time Series Analysis. Princeton University Press, Princeton, NJ.
Hansen, L.P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50, 1029-1054.
Hansen, B.E., 1992. Consistent covariance matrix estimation for dependent heterogeneous processes. Econometrica 60, 967-972.
Mélard, G., 1984. Algorithm AS 197: a fast algorithm for the exact likelihood of autoregressive-moving average models. J. Roy. Statist. Soc. Ser. C 33, 104-114.
Mélard, G., Paesmans, M., Roy, R., 1991. Consistent estimation of the asymptotic covariance structure of multivariate serial correlations. J. Time Series Anal. 12, 351-361.
Newbold, P., 1974. The exact likelihood function for a mixed autoregressive-moving average process. Biometrika 61, 423-426.
Newey, W.K., West, K.D., 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703-708.
Pham, D.T., 1986. The mixing property of bilinear and generalized random coefficient autoregressive models. Stochastic Process. Appl. 23, 291-300.
Phillips, P.C.B., 1987. Time series regression with a unit root. Econometrica 55, 277-301.
Phillips, P.C.B., Perron, P., 1988. Testing for a unit root in time series regression. Biometrika 75, 335-346.
Priestley, M.B., 1981. Spectral Analysis and Time Series. Academic Press, New York.
Robinson, P.M., 1977. Estimating variances and covariances of sample autocorrelations and autocovariances. Austral. J. Statist. 19, 236-240.
Robinson, P.M., 1991. Automatic frequency domain inference on semiparametric and nonparametric models. Econometrica 59, 1329-1363.
Romano, J.P., Thombs, L.A., 1996. Inference for autocorrelations under weak assumptions. J. Amer. Statist. Assoc. 91, 590-600.
Rozanov, Y.A., 1967. Stationary Random Processes. Holden-Day, San Francisco.
White, H., 1984. Asymptotic Theory for Econometricians. Academic Press, New York.