

Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda

Robust estimation for the covariance matrix of multivariate time series based on normal mixtures

Byungsoo Kim, Sangyeol Lee ∗

Department of Statistics, Seoul National University, Seoul, 151-742, Republic of Korea

Article info

Article history: Received 24 November 2011; received in revised form 7 May 2012; accepted 12 June 2012; available online 21 June 2012.

Keywords: Density-based divergence measures; Robust estimation; Autocovariance function; Consistency; Asymptotic normality

Abstract

In this paper, we study robust estimation of the covariance matrix of stationary multivariate time series. As a robust estimator, we propose to use the minimum density power divergence estimator (MDPDE) designed by Basu et al. (1998). To supplement the result of Kim and Lee (2011), we employ a multivariate normal mixture family instead of a multivariate normal family. As a special case, we consider the robust estimator for the autocovariance function of univariate stationary time series. It is shown that the MDPDE is strongly consistent and asymptotically normal under regularity conditions. Simulation results are provided for illustration. A real data analysis applied to the portfolio selection problem is also considered. © 2012 Elsevier B.V. All rights reserved.

1. Introduction

The maximum likelihood estimator (MLE) for normal mixture models has been studied by many authors; see, for instance, Sundberg (1974), Laird (1978), Redner (1981), Lindsay (1983), Redner and Walker (1984), Hathaway (1985), and the articles cited therein. It is widely appreciated that the MLE performs very poorly either when outliers exist or when the likelihood function explodes, as happens when one of the means in the model equals one of the data points and the corresponding variance is close to 0. To cope with this defect, researchers have developed minimum distance estimators based on the Wolfowitz distance (Choi, 1969), the Cramér–von Mises distance (Woodward et al., 1984), the squared L2 norm of the cumulative distribution function (Clarke and Heathcote, 1994), the Hellinger distance (Cutler and Cordero-Braña, 1996) and the L2 distance of the density function (Scott, 2001). Unlike the others, the minimum Hellinger distance estimator attains the same asymptotic efficiency as the MLE when the observations follow the hypothesized model. However, this method has the drawback of requiring nonparametric smoothing, where one may encounter a demanding problem such as bandwidth selection.

Estimation of the autocovariance function (ACF) has been a core issue in time series analysis, since the ACF describes the dependence structure of a time series and its estimation is closely connected with the model selection problem. Due to its importance, some authors have studied robust estimation of the ACF in univariate time series; see, for instance, Ma and Genton (2000). Ma and Genton's estimator is proven to be highly robust for the ACF. However, it has the shortcoming that the normalizing factor in the estimator must be chosen differently according to the underlying distribution of the data.

To overcome this defect, Kim and Lee (2011) proposed using the minimum density power divergence estimator (MDPDE) designed by Basu et al. (1998) (BHHJ) in the ACF estimation problem and demonstrated its superiority to Ma and Genton's estimator. The MDPDE is proven to have strong robustness properties with little loss in asymptotic efficiency relative to the MLE under various circumstances. As a relevant paper, we refer the reader to

∗ Corresponding author. E-mail addresses: [email protected], [email protected] (S. Lee).

0167-9473/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2012.06.012


B. Kim, S. Lee / Computational Statistics and Data Analysis 57 (2013) 125–140

Fujisawa and Eguchi (2006), who show that the objective function of the MDPDE is bounded under mild conditions in iid univariate normal mixture models.

The objective of this paper is to provide a robust estimator for the mean and covariance of multivariate time series. In this study, we use the MDPDE method but employ a multivariate normal mixture family instead of a multivariate normal family since, according to the results of Kim and Lee (2011), the normal distribution approach does not perform well when the distribution of the data is far from normal. It will be shown that the normal mixture approach performs better in terms of both efficiency and robustness. Although we emphasize robust estimation of the mean and covariance matrix of multivariate time series, if the true distribution of the data belongs to the multivariate normal mixture family, we can also provide robust estimators for the normal mixture parameters.

This paper is organized as follows. In Section 2, we introduce the construction of the robust estimator using BHHJ's procedure. In Section 3, we show the asymptotic properties of the MDPDE and its robustness by analyzing the influence function. In Section 4, we apply our method to the estimation of the parameters of multivariate normal mixture time series and compare its performance with that of the MLE. Further, we conduct a simulation study to compare the performance of the proposed estimator for the ACF with the sample autocovariance function (SACF). In Section 5, we apply our method to the portfolio optimization problem by using Dow Jones Industrial Average data. In Section 6, we provide the proofs.

2. MDPDE with multivariate normal mixture family

Consider a parametric family of models $\{F_\theta\}$, indexed by the unknown parameter $\theta \in \Theta \subset \mathbb{R}^{\rho}$, possessing densities $\{f_\theta\}$ with respect to the Lebesgue measure, and let $\mathcal{G}$ be the class of all distributions having densities with respect to the Lebesgue measure.
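For estimating the unknown parameter $\theta$, BHHJ introduced a family of density power divergences

$$
d_\alpha(g, f) :=
\begin{cases}
\displaystyle \int \left\{ f^{1+\alpha}(z) - \left(1 + \frac{1}{\alpha}\right) g(z) f^{\alpha}(z) + \frac{1}{\alpha}\, g^{1+\alpha}(z) \right\} dz, & \alpha > 0, \\[2ex]
\displaystyle \int g(z) \{\log g(z) - \log f(z)\}\, dz, & \alpha = 0,
\end{cases}
$$

where $g$ and $f$ are density functions, and defined the minimum density power divergence functional $T_\alpha(\cdot)$ for $G$ in $\mathcal{G}$ by

$$
d_\alpha(g, f_{T_\alpha(G)}) = \min_{\theta \in \Theta} d_\alpha(g, f_\theta),
$$

where $g$ is the density of $G$. Note that if $G$ belongs to $\{F_\theta\}$, $T_\alpha(G) := \theta_\alpha = \theta$ for some $\theta \in \Theta$. Based on these, given the random sample $X_1, \ldots, X_n$ with unknown density $g$, they defined the MDPDE as

$$
\hat\theta_{\alpha,n} = \operatorname*{argmin}_{\theta \in \Theta} H_{\alpha,n}(\theta), \tag{2.1}
$$

where $H_{\alpha,n}(\theta) = \frac{1}{n} \sum_{t=1}^{n} V_\alpha(\theta; X_t)$ and

$$
V_\alpha(\theta; x) =
\begin{cases}
\displaystyle \int f_\theta^{1+\alpha}(z)\, dz - \left(1 + \frac{1}{\alpha}\right) f_\theta^{\alpha}(x), & \alpha > 0, \\[2ex]
-\log f_\theta(x), & \alpha = 0.
\end{cases} \tag{2.2}
$$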
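To make (2.1) and (2.2) concrete, the following is a minimal sketch for the simplest possible model, a univariate normal density with known scale, rather than the mixture family studied in this paper. The closed form used for the integral term holds for a single normal density only. The function names, the toy data, and the grid search over the location are illustrative assumptions, not part of BHHJ's procedure.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def v_alpha(x, mu, sigma, alpha):
    """V_alpha(theta; x) from (2.2) for a single univariate normal model."""
    if alpha == 0.0:
        return -math.log(normal_pdf(x, mu, sigma))
    # Closed form of the integral of f^(1+alpha) for a normal density.
    integral = (2.0 * math.pi * sigma ** 2) ** (-alpha / 2.0) / math.sqrt(1.0 + alpha)
    return integral - (1.0 + 1.0 / alpha) * normal_pdf(x, mu, sigma) ** alpha

def h_alpha(xs, mu, sigma, alpha):
    """Empirical objective H_{alpha,n}(theta): the sample average of V_alpha."""
    return sum(v_alpha(x, mu, sigma, alpha) for x in xs) / len(xs)

def grid_argmin_mu(xs, sigma, alpha, grid):
    """Crude minimizer of H over a grid of locations (illustration only)."""
    return min(grid, key=lambda mu: h_alpha(xs, mu, sigma, alpha))

# Toy data (hypothetical): five points around 0 and one gross outlier at 20.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 20.0]
grid = [i / 10 for i in range(-20, 61)]
mu_mle = grid_argmin_mu(xs, 1.0, 0.0, grid)   # alpha = 0 reproduces the MLE
mu_dpd = grid_argmin_mu(xs, 1.0, 0.5, grid)   # alpha = 0.5 downweights the outlier
print(mu_mle, mu_dpd)
```

In this toy example the α = 0 fit is pulled toward the outlier, while the α = 0.5 fit stays with the bulk of the data, which is the qualitative behavior the MDPDE is designed to deliver.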

BHHJ demonstrated that the estimator is robust against outliers but still has high efficiency when the true density belongs to the parametric family $\{F_\theta\}$ and $\alpha$ is close to 0. Note that when $\alpha = 0$ and $\alpha = 1$, the MDPDE is the same as the MLE and the L2 distance estimator, respectively.

In this paper, we study the MDPDE for the mean and covariance matrix of a d-dimensional strictly stationary and ergodic time series $\{X_t, t = 1, 2, \ldots\}$. Since the $\alpha > 1$ case can cause a great loss of efficiency for some basic models, as described by Basu et al., we focus on the case $0 < \alpha \le 1$. In order to obtain the MDPDE for the mean and covariance matrix of $X_t$, we consider a d-dimensional multivariate normal mixture parametric family in BHHJ's procedure. Let $\mathcal{F} = \{f_\theta : \theta \in \Theta\}$ be the set of m-component d-dimensional multivariate normal mixture densities of the form

$$
f_\theta(x) = \sum_{j=1}^{m} \omega_j\, \phi(x; \mu_j, \Sigma_j),
$$

where $m$ is known and

$$
\phi(x; \mu_j, \Sigma_j) = \frac{1}{(\sqrt{2\pi})^{d}\, |\Sigma_j|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_j)' \Sigma_j^{-1} (x - \mu_j) \right\}
$$

satisfies, for $j = 1, \ldots, m$,

$$
\Sigma_j \text{ is symmetric}, \quad
0 < \frac{1}{\alpha} (1+\alpha)^{d/2+1} \frac{1}{n} \le \min_j \omega_j \le 1, \quad
\sum_{j=1}^{m} \omega_j = 1, \quad
\|\mu_j\| \le c_1 < \infty, \quad
0 < c_2 \le \lambda_{\min}(\Sigma_j) \le \lambda_{\max}(\Sigma_j) \le c_3 < \infty \tag{2.3}
$$

for some positive constants $c_1, c_2, c_3$, where $\lambda_{\min}(\Sigma_j)$ and $\lambda_{\max}(\Sigma_j)$ denote the minimal and maximal eigenvalues of $\Sigma_j$.


Further, we assume that the model $\mathcal{F}$ is weakly identifiable, that is,

$$
\sum_{j=1}^{m} \omega_j^{1}\, \phi(x; \mu_j^{1}, \Sigma_j^{1}) = \sum_{j=1}^{m} \omega_j^{2}\, \phi(x; \mu_j^{2}, \Sigma_j^{2}) \ \text{a.e.}
\iff
\sum_{j=1}^{m} \omega_j^{1}\, \delta_{(\mu_j^{1}, \Sigma_j^{1})} = \sum_{j=1}^{m} \omega_j^{2}\, \delta_{(\mu_j^{2}, \Sigma_j^{2})},
$$

where $\delta_{(\mu_j, \Sigma_j)}(\cdot)$ is a function with $\delta_{(\mu_j, \Sigma_j)}(\mu_j, \Sigma_j) = 1$ and $\delta_{(\mu_j, \Sigma_j)}(x, y) = 0$ for all $(x, y) \ne (\mu_j, \Sigma_j)$. We set

$$
\theta = (\omega_1, \ldots, \omega_{m-1}, (\mu_j)_t, (\Sigma_j)_{r,s},\ j = 1, \ldots, m,\ t = 1, \ldots, d,\ r = 1, \ldots, d,\ s = 1, \ldots, r)'
$$

and denote by $\Theta \subset (0, 1]^{m-1} \times \mathbb{R}^{md} \times \mathbb{R}^{md(d+1)/2}$ the set of all $\theta$'s satisfying (2.3). For notational convenience, we denote by $\rho$ the dimension of the parameters, i.e., $\rho = (m - 1) + md + md(d + 1)/2$. Note that due to the assumption on $\omega_j$, $H_{\alpha,n}(\theta)$ is bounded (see Fujisawa and Eguchi, 2006, for details). We assume that the order $m$ is known; the selection of $m$ can be made by comparing the robust version of AIC (Ronchetti, 1997) values. The assumptions in (2.3) guarantee the compactness of the parameter space.

Since $V_\alpha(\theta; x)$ is differentiable with respect to $\theta$, we can get the MDPDE $\hat\theta_{\alpha,n} := (\hat\omega_{1,\alpha}, \ldots, \hat\omega_{m-1,\alpha}, (\hat\mu_{j,\alpha})_t, (\hat\Sigma_{j,\alpha})_{r,s},\ j = 1, \ldots, m,\ t = 1, \ldots, d,\ r = 1, \ldots, d,\ s = 1, \ldots, r)'$ from the estimating equation:

$$
U_{\alpha,n}(\hat\theta_{\alpha,n}) = \frac{1}{1+\alpha} \frac{\partial H_{\alpha,n}(\hat\theta_{\alpha,n})}{\partial \theta} = 0.
$$

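To obtain $\hat\theta_{\alpha,n}$, we can feasibly develop an algorithm similar to the EM algorithm; see Section 4. By using $\hat\theta_{\alpha,n}$, we define the MDPDE for the mean and covariance matrix of $X_t$ by

$$
\hat\mu_\alpha := \sum_{j=1}^{m-1} \hat\omega_{j,\alpha}\, \hat\mu_{j,\alpha} + \left(1 - \sum_{j=1}^{m-1} \hat\omega_{j,\alpha}\right) \hat\mu_{m,\alpha}
$$

and

$$
\hat\Sigma_\alpha := \sum_{j=1}^{m-1} \hat\omega_{j,\alpha} \left( \hat\Sigma_{j,\alpha} + \hat\mu_{j,\alpha} \hat\mu_{j,\alpha}' \right) + \left(1 - \sum_{j=1}^{m-1} \hat\omega_{j,\alpha}\right) \left( \hat\Sigma_{m,\alpha} + \hat\mu_{m,\alpha} \hat\mu_{m,\alpha}' \right) - \hat\mu_\alpha \hat\mu_\alpha'.
$$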
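The two formulas above are the usual mean and covariance of a finite mixture, evaluated at the estimated component parameters. A minimal pure-Python sketch follows; the function names are hypothetical, and all m weights are passed explicitly, with the last one playing the role of one minus the sum of the first m − 1. The helper `lag_embed` illustrates, under the same caveat, the overlapping-window construction this section uses to turn a univariate series into d-dimensional vectors for ACF estimation.

```python
def mixture_mean_cov(weights, means, covs):
    """Overall mean and covariance of a finite mixture.

    weights: m mixing proportions summing to 1
    means:   m mean vectors, each of length d
    covs:    m covariance matrices, each d x d (nested lists)
    """
    d = len(means[0])
    mean = [sum(w * mu[i] for w, mu in zip(weights, means)) for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    # Accumulate the weighted second moments Sigma_j + mu_j mu_j'.
    for w, mu, sig in zip(weights, means, covs):
        for r in range(d):
            for s in range(d):
                cov[r][s] += w * (sig[r][s] + mu[r] * mu[s])
    # Subtract the outer product of the overall mean.
    for r in range(d):
        for s in range(d):
            cov[r][s] -= mean[r] * mean[s]
    return mean, cov

def lag_embed(series, d):
    """Overlapping windows (X_t, ..., X_{t+d-1}) of a univariate series."""
    return [series[t:t + d] for t in range(len(series) - d + 1)]

# Two-component example with the parameter values used in Section 4.
mean, cov = mixture_mean_cov(
    [0.5, 0.5],
    [[0.0, -3.0], [0.0, 3.0]],
    [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]]],
)
print(mean, cov)
```

For the embedded vectors, the (1, h + 1) entry of the resulting covariance matrix corresponds to the autocovariance at lag h, for h = 0, ..., d − 1.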

In particular, the result can be applied to obtain a robust estimator for the ACF of a univariate time series. Suppose that $\{X_t, t = 1, 2, \ldots\}$ is a strictly stationary and ergodic univariate time series. From this, we can construct the d-dimensional multivariate time series $\mathbf{X}_1 = (X_1, \ldots, X_d)'$, $\mathbf{X}_2 = (X_2, \ldots, X_{d+1})'$, $\ldots$. By applying the MDPDE method, we can get a robust estimator for the ACF from lag 0 to $d - 1$.

3. Asymptotic properties of MDPDE

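In this section, we verify the strong consistency and asymptotic normality of the MDPDE. To establish the asymptotic properties of the MDPDE, we assume the following conditions.

(A1) $\{X_t, t = 1, 2, \ldots\}$ is d-dimensional strictly stationary and ergodic.

(A2) $\theta_\alpha = T_\alpha(G)$ exists uniquely in the interior of $\Theta$.

(A3) $J_\alpha := -\dfrac{1}{1+\alpha}\, E\left[ \dfrac{\partial^2 V_\alpha(\theta_\alpha; X)}{\partial\theta\, \partial\theta'} \right]$ is nonsingular.

(A4) $\{X_t\}$ is strong mixing with mixing order $\tau(\cdot)$ of size $-\gamma/(\gamma - 2)$ for some $\gamma > 2$, i.e., $\sum_{n=1}^{\infty} \tau(n)^{1-2/\gamma} < \infty$.

(A5) $K_\alpha$ is positive definite, where the $(i, j)$-th component is $(K_\alpha)_{i,j} = \dfrac{1}{(1+\alpha)^2} \sum_{k=-\infty}^{\infty} \Gamma_{\alpha,ij}(k)$ and

$$
\Gamma_{\alpha,ij}(t - s) := E\left[ \frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta_i} \frac{\partial V_\alpha(\theta_\alpha; X_s)}{\partial\theta_j} \right] \quad \text{for all } t, s \ge 1.
$$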

Remark 1. For instance, the VARMA(p, q) process $\{X_t\}$ from the equation $\sum_{i=0}^{p} \Phi_i X_{t-i} = \sum_{j=0}^{q} \Psi_j \epsilon_{t-j}$, where $\epsilon_t$ are iid normal random vectors with zero mean, $\Phi_1, \ldots, \Phi_p, \Psi_1, \ldots, \Psi_q$ are $d \times d$ matrices, $\Phi_0, \Psi_0$ are $d \times d$ identity matrices, and $\det\left( \sum_{i=0}^{p} \Phi_i z^i \right) \ne 0$ for all $|z| \le 1$, satisfies (A1)–(A4): with regard to the mixing condition, see Doukhan (1994) and Bradley (2007). However, verifying (A5) is not straightforward compared to the others. Note that if $G$ belongs to the multivariate normal mixture family and $\theta_0$ is the true parameter, for any $y \ne 0$,

$$
y'(-J_\alpha)\, y = \int f_{\theta_0}^{\alpha-1}(z) \left( y' \frac{\partial f_{\theta_0}(z)}{\partial\theta} \right)^{2} dz.
$$

Hence, if $\left\{ z : y' \frac{\partial f_{\theta_0}(z)}{\partial\theta} \ne 0 \right\}$ is not of measure 0 for each $y \ne 0$, (A3) is satisfied. This happens, for instance, if $f_{\theta_0}$ is positive and $\left\{ z : \frac{\partial f_{\theta_0}(z)}{\partial\theta} \ne 0 \right\}$ contains a closed ball. Note that in the iid case, $(K_\alpha)_{i,j}$ is reduced to

$$
\frac{1}{(1+\alpha)^2}\, E\left[ \frac{\partial V_\alpha(\theta_\alpha; X)}{\partial\theta_i} \frac{\partial V_\alpha(\theta_\alpha; X)}{\partial\theta_j} \right]
$$

due to (3.5) below.

Theorem 3.1. Let $\{\hat\theta_{\alpha,n}\}$ be the sequence of the MDPDEs satisfying (2.1). Then under (A1) and (A2), $P\{\hat\theta_{\alpha,n} \to \theta_\alpha \text{ as } n \to \infty\} = 1$.


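Remark 2. Since the MDPDE is Fisher consistent (cf. Basu et al., 1998), if the distribution of the data belongs to the multivariate normal mixture family and $\theta_0$ is the true parameter, it must hold that $\hat\theta_{\alpha,n} \to \theta_0$ a.s.

Based on the above, we obtain the asymptotic normality of the MDPDE. Since $H_{\alpha,n}(\theta)$ is three-times differentiable with respect to $\theta \in \Theta$ and $U_{\alpha,n}(\hat\theta_{\alpha,n}) = 0$, by Taylor's theorem, we get

$$
0 = U_{\alpha,n}(\hat\theta_{\alpha,n}) = U_{\alpha,n}(\theta_\alpha) - R_{\alpha,n} (\hat\theta_{\alpha,n} - \theta_\alpha),
$$

where $R_{\alpha,n}$ is the $\rho \times \rho$ matrix whose $(i, j)$-th entry is

$$
(R_{\alpha,n})_{i,j} = -\frac{1}{1+\alpha} \left\{ \frac{\partial^2 H_{\alpha,n}(\theta_\alpha)}{\partial\theta_i\, \partial\theta_j} + \frac{1}{2} \sum_{k=1}^{\rho} \left( (\hat\theta_{\alpha,n})_k - (\theta_\alpha)_k \right) \frac{\partial^3 H_{\alpha,n}(\theta^{*}_{\alpha,n})}{\partial\theta_i\, \partial\theta_j\, \partial\theta_k} \right\} \tag{3.4}
$$

for some $\theta^{*}_{\alpha,n}$ satisfying $\theta^{*}_{\alpha,n} = \theta_\alpha + u(\hat\theta_{\alpha,n} - \theta_\alpha)$, $u \in [0, 1]$. Then, we have the following equation:

$$
\hat\theta_{\alpha,n} - \theta_\alpha = J_\alpha^{-1} U_{\alpha,n}(\theta_\alpha) + \Delta_{\alpha,n},
$$

where $\Delta_{\alpha,n} = J_\alpha^{-1} (J_\alpha - R_{\alpha,n}) (\hat\theta_{\alpha,n} - \theta_\alpha)$. Consequently, we have

$$
\sqrt{n}\, (\hat\theta_{\alpha,n} - \theta_\alpha) = J_\alpha^{-1} \sqrt{n}\, U_{\alpha,n}(\theta_\alpha) + \sqrt{n}\, \Delta_{\alpha,n}.
$$

From this expression and Lemmas 6.3–6.5, we get the following result.

Theorem 3.2. Suppose that conditions (A1)–(A5) hold. Then, we have

$$
\sqrt{n}\, (\hat\theta_{\alpha,n} - \theta_\alpha) \xrightarrow{d} N(0, J_\alpha^{-1} K_\alpha J_\alpha^{-1}) \quad \text{as } n \to \infty,
$$

where 0 denotes the $\rho$-dimensional zero vector and $J_\alpha$ and $K_\alpha$ are the ones defined in (A3) and (A5), respectively.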

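Note that $\hat\mu_\alpha$ and $\mathrm{vech}(\hat\Sigma_\alpha)$ can be represented by $\hat\mu_\alpha = h_1(\hat\theta_{\alpha,n})$ and $\mathrm{vech}(\hat\Sigma_\alpha) = h_2(\hat\theta_{\alpha,n})$, where $h_1, h_2$ are continuously differentiable functions in a neighborhood of $\theta_0$ and vech converts the upper triangular components of $\hat\Sigma_\alpha$ into a column vector. Therefore, by using the delta method, we have

$$
\sqrt{n}\, (\hat\mu_\alpha - \mu_\alpha) \xrightarrow{d} N\left( 0,\ \frac{\partial h_1(\theta_0)}{\partial\theta'}\, J_\alpha^{-1} K_\alpha J_\alpha^{-1} \left( \frac{\partial h_1(\theta_0)}{\partial\theta'} \right)' \right)
$$

and

$$
\sqrt{n}\, \left( \mathrm{vech}(\hat\Sigma_\alpha) - \mathrm{vech}(\Sigma_\alpha) \right) \xrightarrow{d} N\left( 0,\ \frac{\partial h_2(\theta_0)}{\partial\theta'}\, J_\alpha^{-1} K_\alpha J_\alpha^{-1} \left( \frac{\partial h_2(\theta_0)}{\partial\theta'} \right)' \right).
$$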
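As an illustration of the Jacobian that enters the delta-method variance for the mean, consider the two-component case $m = 2$. The ordering of the parameter vector used here, $\theta = (\omega_1, \mu_1', \mu_2', \mathrm{vech}(\Sigma_1)', \mathrm{vech}(\Sigma_2)')'$, is an assumption made for this sketch; any other ordering only permutes the columns.

$$
h_1(\theta) = \omega_1 \mu_1 + (1 - \omega_1)\, \mu_2,
\qquad
\frac{\partial h_1(\theta)}{\partial\theta'} =
\Big[\ \mu_1 - \mu_2 \quad \omega_1 I_d \quad (1 - \omega_1) I_d \quad 0_{d \times d(d+1)}\ \Big],
$$

which is a $d \times \rho$ matrix with $\rho = 1 + 2d + d(d+1)$. The zero block reflects the fact that the mixture mean does not depend on the component covariance matrices.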

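Remark 3. As seen in the above, the strong mixing condition is only needed for deriving the asymptotic normality; see Section 6.

Remark 4. In general, there are no universal rules for the selection of $\alpha$. Some guidelines are provided in Fujisawa and Eguchi (2006).

Now, we discuss the influence function of the MDPDE. The influence function is widely used to describe the robust properties of the estimator. We follow the arguments in Chapter 8.3b of Hampel et al. (1986). Suppose that (A1)–(A3) hold. The statistical functional $T$ at $G$ corresponding to the MDPDE can be defined as

$$
\operatorname*{argmin}_{\theta \in \Theta} \int V_\alpha(\theta; x)\, dG,
$$

where $V_\alpha(\theta; x)$ is defined in (2.2). Note that since $E V_\alpha(\theta; X) = d_\alpha(g, f_\theta) - \frac{1}{\alpha} \int g^{1+\alpha}(z)\, dz$,

$$
\theta_\alpha = \operatorname*{argmin}_{\theta \in \Theta} d_\alpha(g, f_\theta) = \operatorname*{argmin}_{\theta \in \Theta} E V_\alpha(\theta; X).
$$

According to Hampel et al., for $x_0 \in \mathbb{R}^d$, the influence function of $T$ at $G$ is obtained as

$$
\begin{aligned}
IF_\alpha(x_0; T, G) &= -\left[ \int \frac{\partial^2}{\partial\theta\, \partial\theta'} V_\alpha(T(G); x)\, dG \right]^{-1} \frac{\partial}{\partial\theta} V_\alpha(T(G); x_0) \\
&= -\left[ E\, \frac{\partial^2 V_\alpha(\theta_\alpha; X)}{\partial\theta\, \partial\theta'} \right]^{-1} \frac{\partial}{\partial\theta} V_\alpha(\theta_\alpha; x_0).
\end{aligned} \tag{3.5}
$$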


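The first term on the right-hand side of the equality does not depend on $x_0$ and is finite by (A3). Meanwhile, if $\alpha > 0$, $\left| \frac{\partial}{\partial\theta_i} V_\alpha(\theta_\alpha; x_0) \right| \le l_{\alpha,1}$ for $i = 1, \ldots, \rho$, and $l_{\alpha,1}$ does not depend on $x_0$ by Lemma 6.1 in Section 6. Hence, the influence function of the MDPDE with $\alpha > 0$ has a finite gross error sensitivity, i.e., $\sup_{x \in \mathbb{R}^d} \| IF_\alpha(x; T, G) \| < \infty$. On the other hand, if $\alpha = 0$, we get

$$
\frac{\partial}{\partial\omega_j} V_\alpha(\theta_\alpha; x_0) = -\frac{1}{f_{\theta_\alpha}(x_0)} \left\{ \phi(x_0; \mu_{j,\alpha}, \Sigma_{j,\alpha}) - \phi(x_0; \mu_{m,\alpha}, \Sigma_{m,\alpha}) \right\}, \quad \text{for } 1 \le j \le m - 1,
$$

$$
\frac{\partial}{\partial(\mu_j)_t} V_\alpha(\theta_\alpha; x_0) = -\frac{1}{f_{\theta_\alpha}(x_0)}\, \omega_{j,\alpha}\, \phi(x_0; \mu_{j,\alpha}, \Sigma_{j,\alpha}) \sum_{l=1}^{d} \left( (x_0)_l - (\mu_{j,\alpha})_l \right) (\Sigma_{j,\alpha}^{-1})_{l,t}, \quad \text{for } 1 \le j \le m,\ 1 \le t \le d,
$$

$$
\frac{\partial}{\partial(\Sigma_j)_{r,s}} V_\alpha(\theta_\alpha; x_0) = \frac{1}{2} \frac{1}{f_{\theta_\alpha}(x_0)}\, \omega_{j,\alpha}\, \phi(x_0; \mu_{j,\alpha}, \Sigma_{j,\alpha}) \left\{ (\Sigma_{j,\alpha}^{-1})_{r,s} - (x_0 - \mu_{j,\alpha})' \Sigma_{j,\alpha}^{-1} J_{rs} \Sigma_{j,\alpha}^{-1} (x_0 - \mu_{j,\alpha}) \right\},
$$

for $1 \le j \le m$, $1 \le r \le d$, $1 \le s \le r$. We can easily see that $\left| \frac{\partial}{\partial\omega_j} V_\alpha(\theta_\alpha; x_0) \right| \le 2 / \min_j \omega_{j,\alpha} \le 2n\alpha / (1+\alpha)^{d/2+1}$, but $\frac{\partial}{\partial(\mu_j)_t} V_\alpha(\theta_\alpha; x_0)$ and $\frac{\partial}{\partial(\Sigma_j)_{r,s}} V_\alpha(\theta_\alpha; x_0)$ are obviously unbounded in $x_0$. For example, if we consider the 1-component normal mixture, we have that for $1 \le t \le d$,

$$
\left| \frac{\partial}{\partial(\mu_1)_t} V_\alpha(\theta_\alpha; x_0) \right| = \left| \sum_{l=1}^{d} \left( (x_0)_l - (\mu_{1,\alpha})_l \right) (\Sigma_{1,\alpha}^{-1})_{l,t} \right|
$$

and for $1 \le r \le d$, $1 \le s \le r$,

$$
\left| \frac{\partial}{\partial(\Sigma_1)_{r,s}} V_\alpha(\theta_\alpha; x_0) \right| = \frac{1}{2} \left| (\Sigma_{1,\alpha}^{-1})_{r,s} - (x_0 - \mu_{1,\alpha})' \Sigma_{1,\alpha}^{-1} J_{rs} \Sigma_{1,\alpha}^{-1} (x_0 - \mu_{1,\alpha}) \right|,
$$

which imply $\sup_{x \in \mathbb{R}^d} \| IF_0(x; T, G) \| = \infty$. Therefore, we can conclude that the MDPDE with $\alpha > 0$ has a robust property unlike the MLE.

4. Simulation

In this section, we compare the performance of the MDPDE with $\alpha > 0$ for multivariate normal mixture parameters with that of the MLE. Further, we compare the performance of the MDPDE with $\alpha > 0$ for the ACF with that of the SACF. To obtain the MDPDE, we extend Fujisawa and Eguchi's (2006) EM-like algorithm to the multivariate normal mixture models. Assume that we are given a (multivariate) time series sample $x_1, \ldots, x_n$. According to Fujisawa and Eguchi (2006), given the current parameter $\theta^{(k)} = (\omega_1^{(k)}, \ldots, \omega_m^{(k)}, \mu_1^{(k)}, \ldots, \mu_m^{(k)}, \Sigma_1^{(k)}, \ldots, \Sigma_m^{(k)})$, the EM-like algorithm for multivariate normal mixture models iterates as follows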

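$$
\omega_j^{(k+1)} = \frac{ \dfrac{1}{n} \sum_{t=1}^{n} \chi_\alpha(j | x_t; \theta^{(k)}) - \displaystyle\int f_{\theta^{(k)}}(z)\, \chi_\alpha(j | z; \theta^{(k)})\, dz + \omega_j^{(k)} \sum_{j=1}^{m} \displaystyle\int f_{\theta^{(k)}}(z)\, \chi_\alpha(j | z; \theta^{(k)})\, dz }{ \dfrac{1}{n} \sum_{t=1}^{n} \sum_{j=1}^{m} \chi_\alpha(j | x_t; \theta^{(k)}) },
$$

$$
\mu_j^{(k+1)} = \frac{ \dfrac{1}{n} \sum_{t=1}^{n} x_t\, \chi_\alpha(j | x_t; \theta^{(k)}) - \displaystyle\int f_{\theta^{(k)}}(z)\, \chi_\alpha(j | z; \theta^{(k)}) (z - \mu_j^{(k)})\, dz }{ \dfrac{1}{n} \sum_{t=1}^{n} \chi_\alpha(j | x_t; \theta^{(k)}) },
$$

$$
\Sigma_j^{(k+1)} = \frac{ \dfrac{1}{n} \sum_{t=1}^{n} x_t x_t'\, \chi_\alpha(j | x_t; \theta^{(k)}) - \displaystyle\int f_{\theta^{(k)}}(z)\, \chi_\alpha(j | z; \theta^{(k)}) \left( z z' - \Sigma_j^{(k)} - \mu_j^{(k)} \mu_j^{(k)\prime} \right) dz }{ \dfrac{1}{n} \sum_{t=1}^{n} \chi_\alpha(j | x_t; \theta^{(k)}) } - \mu_j^{(k+1)} \mu_j^{(k+1)\prime},
$$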

where $\chi_\alpha(j | x; \theta) = \omega_j\, \phi(x; \mu_j, \Sigma_j) / f_\theta(x)^{1-\alpha}$, for $j = 1, \ldots, m$. Note that the EM-like algorithm is the same as the EM algorithm when $\alpha = 0$ and it holds that $\sum_{j=1}^{m} \omega_j^{(k+1)} = 1$. We use 10 initial values as constructed by Ingrassia (2004).

First, we conduct a simulation study for the multivariate normal mixture parameter estimation. Assume that we observe a stationary multivariate normal mixture time series $X_t = (1 - P_t) Y_t + P_t Z_t$, where $Y_t = a_1 + A Y_{t-1} + \epsilon_{1,t}$, $Z_t = a_2 + A Z_{t-1} + \epsilon_{2,t}$, $\epsilon_{1,t}$ and $\epsilon_{2,t}$ are iid $N(0, \Sigma_{\epsilon,1})$ and $N(0, \Sigma_{\epsilon,2})$ random vectors, respectively, and $P_t$ are iid Bernoulli random variables with success probability $\omega$. It is assumed that $Y_t$, $Z_t$, and $P_t$ are all independent. In this study, we consider the case of $a_1 = (3/2, -9/2)'$, $a_2 = (-3/2, 9/2)'$, $A = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}$, $\Sigma_{\epsilon,1} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}$, $\Sigma_{\epsilon,2} = \begin{pmatrix} 5/4 & -1/4 \\ -1/4 & 1/4 \end{pmatrix}$, and $\omega = 1/2$. Then, $X_t$ has the true density of the form $(1 - \omega)\phi(\cdot; \mu_1, \Sigma_1) + \omega\phi(\cdot; \mu_2, \Sigma_2)$, where $\omega = 1/2$, $\mu_1 = (0, -3)'$,


Table 1. Sample mean (variance × 10²/MSE × 10²) of estimators for the multivariate normal mixture parameters, p = 0.

α

ωˆ

(µ ˆ 1 )1

(µ ˆ 1 )2

(µ ˆ 2 )1

M

0(MLE)

0.503(0.020/0.021)

D

0.1

0.503(0.020/0.021)

−0.007(0.653/0.651) −0.007(0.635/0.633)

−3.004(0.194/0.193) −3.004(0.199/0.198)

P

0.3

0.503(0.020/0.021)

−0.006(0.627/0.625)∗

−3.004(0.218/0.218)

−0.002(1.158/1.146)

D

0.5

0.504(0.021/0.022)

E

1

0.503(0.027/0.028)

−0.006(0.642/0.640) −0.007(0.731/0.729)

−3.005(0.249/0.249) −3.007(0.366/0.368)

0.001(1.474/1.459)

α

(µ ˆ 2 )2

ˆ 1 )1,1 (Σ

ˆ 1 )1,2 (Σ

0(MLE) 0.1 0.3 0.5 1

2.998(0.180/0.179) 2.998(0.183/0.182) 2.999(0.198/0.196) 3.000(0.221/0.218) 3.002(0.309/0.306)

α

ˆ 2 )1,1 (Σ

0(MLE) 0.1 0.3 0.5 1

1.973(2.310/2.359) 1.971(2.380/2.439) 1.966(2.777/2.863) 1.962(3.384/3.493) 1.958(5.376/5.501)

M D P D E

M D P D E

∗

1.001(0.708/0.701) 1.004(0.732/0.727) 1.007(0.807/0.804) 1.010(0.901/0.901) 1.014(1.219/1.225)

∗

−0.003(0.255/0.253) −0.002(0.268/0.266) −0.001(0.311/0.308) 0.000(0.365/0.361) 0.002(0.526/0.522)

∗

ˆ 2 )1,2 (Σ −0.005(0.526/0.523) −0.006(0.539/0.537) −0.008(0.610/0.611) −0.012(0.730/0.736) −0.021(1.180/1.211)

∗

∗

0.000(1.234/1.222)

ˆ 1 )2 , 2 (Σ ∗

0.998(0.712/0.705)∗ 0.998(0.714/0.707) 0.997(0.770/0.763) 0.996(0.857/0.850) 0.990(1.160/1.160)

1.006(0.653/0.650)∗ 1.009(0.661/0.663) 1.012(0.737/0.746) 1.015(0.869/0.882) 1.020(1.344/1.369)

−0.006(0.393/0.393)∗ −0.005(0.394/0.393) −0.004(0.412/0.410) −0.003(0.440/0.437) −0.003(0.532/0.527)

ˆ 2 )2,2 (Σ ∗

−0.005(1.088/1.079)∗ −0.004(1.100/1.091)

ˆ )1,2 (Σ

(µ) ˆ 1

α

(µ) ˆ 2

ˆ )1,1 (Σ

M

0(MLE)

1.488(0.706/0.712)∗

0.000(4.341/4.298)

D

0.1

1.489(0.728/0.733)

0.001(4.315/4.272)∗

10.00(4.516/4.472)

P D E

0.3 0.5 1

−0.024(0.751/0.799)∗ −0.023(0.761/0.808) −0.023(0.786/0.833) −0.024(0.825/0.872) −0.022(1.111/1.150)

ˆ )2,2 (Σ

1.488(0.859/0.865) 1.487(1.052/1.058) 1.489(1.601/1.597)

0.003(4.436/4.393) 0.004(4.680/4.636) 0.005(5.547/5.493)

10.01(4.782/4.740) 10.01(5.195/5.160) 10.03(6.822/6.819)

9.999(4.458/4.413)∗

$\mu_2 = (0, 3)'$, $\Sigma_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $\Sigma_2 = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$, and $E(X_t) = \mu$, $\mathrm{Var}(X_t) = \Sigma$ are computed to be $(0, 0)'$ and $\begin{pmatrix} 3/2 & 0 \\ 0 & 10 \end{pmatrix}$, respectively.

To compare the robustness of the estimators, we consider the contaminated data $W_t = (1 - P_t') X_t + P_t' N_t$, where $P_t'$ are iid Bernoulli random variables with success probability $p$, and $N_t$ are $N(\mu_N, \Sigma_N)$ random variables with $\mu_N = (8, 5)'$, $\Sigma_N = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. Also, we assume that $X_t$, $N_t$, and $P_t'$ are all independent. We consider the cases $p = 0$, 0.05, and 0.1. For the comparison, we investigate the sample mean, variance, and mean squared error (MSE) of the estimators. The sample size under consideration is $n = 1000$, and the repetition number in each simulation is 100.

Table 1 shows that when the data are not contaminated by outliers, the MLE outperforms the MDPDE. In the tables, the figures marked by the symbol ∗ stand for minimal MSEs. From the table, one can see that the MLE has minimal MSEs for almost all parameters and the performance of the MDPDE with $\alpha$ close to 0 is similar to that of the MLE. This result confirms that, as anticipated, the efficiency of the MDPDE decreases as $\alpha$ increases. Tables 2 and 3 show the results when the data are contaminated by outliers. In Tables 2 and 3, the dark area represents the MDPDE with smaller MSEs than the MLE. Some parameters are not severely affected by outliers, so that the MLE performs better than the MDPDE for them. However, for most parameters, the MDPDE is more robust than the MLE. For example, in the case of $(\hat\Sigma_2)_{1,1}$ in Table 3, the relative MSE of the MLE to the MDPDE with $\alpha = 0.5$ is about 1050. We can observe that as $p$ increases, the dark area becomes wider. This result suggests that the more the data are contaminated by outliers, the better the MDPDE performs relative to the MLE. Further, we can see that as $p$ increases, the symbol ∗ moves downward. This means that if the data are severely contaminated by outliers, the MDPDE with high $\alpha$ performs better.

Next, we consider the case of non-invertible multivariate time series. Assume that we observe $X_t = (1 - P_t) Y_t + P_t Z_t$, where $Y_t = a_1 + \epsilon_{1,t} + A \epsilon_{1,t-1}$, $Z_t = a_2 + \epsilon_{2,t} + A \epsilon_{2,t-1}$, $\epsilon_{1,t}$ and $\epsilon_{2,t}$ are iid $N(0, \Sigma_{\epsilon,1})$ and $N(0, \Sigma_{\epsilon,2})$ random vectors, respectively, and $P_t$ are iid Bernoulli random variables with success probability $\omega$. We employ $a_1 = (2, -2)'$, $a_2 = (-2, 2)'$, $A = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, $\Sigma_{\epsilon,1} = \begin{pmatrix} 2/3 & -1/3 \\ -1/3 & 1/3 \end{pmatrix}$, $\Sigma_{\epsilon,2} = \begin{pmatrix} 1/3 & 1/3 \\ 1/3 & 2/3 \end{pmatrix}$, and $\omega = 1/2$. Then, $X_t$ has the true density of the form $(1 - \omega)\phi(\cdot; \mu_1, \Sigma_1) + \omega\phi(\cdot; \mu_2, \Sigma_2)$, where $\omega = 1/2$, $\mu_1 = (2, -2)'$, $\mu_2 = (-2, 2)'$, $\Sigma_1 = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$, $\Sigma_2 = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$, and $E(X_t) = \mu$, $\mathrm{Var}(X_t) = \Sigma$ are computed to be $(0, 0)'$ and $\begin{pmatrix} 11/2 & -4 \\ -4 & 11/2 \end{pmatrix}$, respectively. We consider the contaminated data $W_t = (1 - P_t') X_t + P_t' N_t$, where $P_t'$ are iid Bernoulli random variables with success probability $p$, and $N_t$ are $N(\mu_N, \Sigma_N)$ random variables with $\mu_N = (7, 7)'$, $\Sigma_N = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. We also consider the cases $p = 0$, 0.05, and 0.1. The results are presented in Tables 4–6, which show behavior similar to that in Tables 1–3. Hence, we can see that the proposed method is not sensitive to the invertibility condition.

Further, we carry out a simulation study for the ACF estimation. We consider the situation in which the univariate time series contaminated by outliers, $W_t = (1 - P_t) X_t + P_t N_t$, is observed, where $X_t = \xi X_{t-1} + \epsilon_t$, $P_t$ are iid Bernoulli random variables with success probability $p$, and $N_t$ are $N(\mu_N, \sigma_N^2)$ random variables. We consider the cases that $\epsilon_t$ are iid skewed


Table 2. Sample mean (variance × 10²/MSE × 10²) of estimators for the multivariate normal mixture parameters, p = 0.05.

α

ωˆ

(µ ˆ 1 )1

(µ ˆ 1 )2

(µ ˆ 2 )1

M

0(MLE)

0.476(0.027/0.082)

0.000(0.698/0.691)

D

0.1

0.478(0.027/0.077)

0.001(0.721/0.714)

−3.000(0.215/0.213) −3.001(0.216/0.214)

P

0.3

0.498(0.027/0.027)∗

0.001(0.784/0.776)

−3.003(0.227/0.226)

0.021(1.242/1.276)

D

0.5

0.499(0.027/0.027)

0.002(0.860/0.852)

−3.005(0.252/0.252)

0.007(1.270/1.262)∗

E

1

0.497(0.033/0.034)

0.003(1.088/1.078)

−3.008(0.348/0.351)

0.005(1.493/1.480)

α

(µ ˆ 2 )2

M

0(MLE)

3.196(0.320/4.162)

0.985(0.622/0.638)∗

D

0.1

3.129(0.330/1.982)

P

0.3

3.003(0.263/0.262)∗

D

0.5

E

1

α

ˆ 2 )1 , 1 (Σ

M

0(MLE)

7.611(45.73/3194)

1.426(4.658/207.8)

1.356(0.961/13.62)

0.410(0.809/17.62)

D

0.1

6.192(62.82/1819)

1.050(5.447/115.6)

1.261(1.000/7.795)

0.273(0.838/8.305)

P

0.3

2.192(6.505/10.13)

0.030(0.941/1.022)

1.028(0.944/1.016)∗

0.012(0.520/0.528)∗

D

0.5

2.086(4.483/5.174)

1.040(1.161/1.308)

0.005(0.545/0.541)

E

1

2.120(6.987/8.347)

−0.011(1.432/1.430)

1.075(1.869/2.413)

0.004(0.645/0.640)

α

(µ) ˆ 2

ˆ )1 , 1 (Σ

∗

ˆ 1 )1 , 1 (Σ

∗

0.783(1.921/63.20) 0.523(2.108/29.39)

ˆ 1 )1,2 (Σ

ˆ 1 )2,2 (Σ 0.993(0.828/0.824)∗

0.985(0.637/0.653)

−0.002(0.233/0.231) −0.004(0.231/0.230)∗

0.992(0.853/0.851)

1.002(0.720/0.713)

−0.007(0.253/0.256)

1.005(0.977/0.969)

3.001(0.274/0.272)

1.014(0.833/0.844)

−0.009(0.296/0.301)

1.016(1.157/1.171)

3.002(0.317/0.314)

1.031(1.260/1.342)

−0.013(0.488/0.500)

1.029(1.702/1.771)

ˆ 2 )1 , 2 (Σ

∗

−0.003(0.875/0.868)

ˆ 2 )2,2 (Σ

∗

ˆ )1 , 2 (Σ

(µ) ˆ 1

ˆ )2,2 (Σ

M

0(MLE)

0.245(1.221/7.212)

4.614(16.04/985.5)

1.956(9.260/391.7)

10.75(5.617/61.84)

D

0.1

0.201(1.212/5.232)

3.780(20.15/539.7)

1.346(10.53/191.5)

10.50(5.667/30.55)

P

0.3

0.013(1.106/1.113)∗

1.605(1.898/2.973)

0.043(5.051/5.184)

10.03(5.086/5.117)∗

D

0.5

0.004(1.135/1.126)

1.556(1.357/1.662)∗

0.002(5.229/5.177)∗

10.04(5.487/5.558)

E

1

0.016(1.396/1.408)

1.586(1.963/2.690)

−0.008(6.854/6.792)

10.07(7.193/7.637)

normal random variables with location 0, scale 1, and shape parameter $\eta$. It is assumed that $X_t$, $P_t$, and $N_t$ are all independent. We study the cases of $\xi = 1/2$, $\mu_N = 5$, $\sigma_N = 5\sqrt{\mathrm{Var}(X_t)}$, $\eta = 3, 6$, and $p = 0, 0.05, 0.1$. Note that if we denote the true ACF at lag $h$ of $X_t$ by $\gamma(h)$, then

$$
\gamma(h) = \frac{\xi^{h} \{ \pi(1 + \eta^2) - 2\eta^2 \}}{\pi (1 + \eta^2)(1 - \xi^2)}.
$$

In all cases, we use the two-component normal mixture to compare the performance of the MDPDE and the SACF. Since the multivariate version of the EM-like algorithm is not so efficient, we only consider the cases of lag 0 and 1. The multiple integral terms in the algorithm, approximated by the Monte Carlo method in this simulation study, are troublesome (if this problem is resolved, the EM-like algorithm is easily applicable). In Table 7, we can see that the MDPDE has very small MSEs when $p = 0$, for both $\eta = 3$ and $\eta = 6$. The SACF also performs well in this case. However, Table 7 shows that as $p$ increases ($p = 0.05$ and 0.1), the MDPDE with high $\alpha$ outperforms the SACF. Overall, our findings strongly support that the MDPDE method is a functional tool to yield a robust estimator.

5. Real data analysis

In this section, we apply the MDPDE for the covariance matrix to the portfolio optimization problem. We consider the global minimum variance (GMV) portfolio optimization, since estimating the mean return vector of assets is extremely difficult (Merton, 1980) and the GMV portfolio depends only on the covariance matrix. We first introduce the GMV portfolio optimization. Suppose that an investor has $N$ risky assets and is conservative with respect to risk. We denote by $r = (r_1, r_2, \ldots, r_N)'$ the returns of the assets over a certain time interval and denote by $\Sigma$ the covariance matrix of $r$. The investor's choice is embodied in an N-dimensional vector $w = (w_1, w_2, \ldots, w_N)'$ of weights with $\sum_{i=1}^{N} w_i = 1$, where each weight $w_i$ represents the percentage of the i-th asset held in the portfolio.

Since in mean–variance optimization theory the risk of a portfolio is measured by the variance of the portfolio return, $w' \Sigma w$, the GMV portfolio optimization is formulated as

$$
\min_{w}\ w' \Sigma w \quad \text{subject to } w' \mathbf{1} = 1, \qquad \mathbf{1} = (1, 1, \ldots, 1)',
$$

which has the solution

$$
w = \frac{1}{\mathbf{1}' \Sigma^{-1} \mathbf{1}}\, \Sigma^{-1} \mathbf{1}.
$$
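The closed-form GMV solution can be sketched in a few lines of pure Python. Instead of inverting the covariance matrix, the sketch solves the linear system Σz = 1 by Gaussian elimination and normalizes z to sum to one, which gives the same weights. The function names and the 2 × 2 covariance matrix are hypothetical illustrations; in practice any covariance estimate, robust or not, could be plugged in.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def gmv_weights(cov):
    """GMV weights: Sigma^{-1} 1 divided by 1' Sigma^{-1} 1."""
    z = solve_linear(cov, [1.0] * len(cov))
    total = sum(z)
    return [zi / total for zi in z]

# Hypothetical covariance matrix of two asset returns.
weights = gmv_weights([[4.0, 1.0], [1.0, 2.0]])
print(weights)
```

The weights sum to one by construction, and the less volatile asset receives the larger share in this example.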


Table 3. Sample mean (variance × 10²/MSE × 10²) of estimators for the multivariate normal mixture parameters, p = 0.1.

α

ωˆ

(µ ˆ 1 )1

(µ ˆ 1 )2

(µ ˆ 2 )1

M

0(MLE)

0.495(1.641/1.627)

0.002(0.660/0.654)

0.1

0.474(0.682/0.741)

0.001(0.687/0.680)

−2.692(80.35/89.03) −2.816(41.83/44.78)

2.160(374.0/836.8)

D P

0.3

0.473(0.059/0.130)

0.003(0.764/0.757)

−3.005(0.235/0.235)∗

0.436(17.62/36.42)

D

0.5

0.497(0.026/0.027)∗

0.003(0.832/0.824)

−3.007(0.255/0.258)

0.011(1.292/1.291)∗

E

1

0.492(0.032/0.038)

0.003(1.048/1.039)

−3.010(0.343/0.350)

0.008(1.486/1.477)

α

(µ ˆ 2 )2

ˆ 1 )1,1 (Σ

ˆ 1 )1,2 (Σ

M

0(MLE)

3.536(22.27/50.82)

1.028(2.434/2.489)

0.000(0.632/0.625)

1.968(777.1/863.1)

D

0.1

3.396(8.945/24.54)

0.996(0.984/0.975)

−0.013(0.829/0.837)

1.719(615.7/661.2)

P

0.3

3.103(1.316/2.374)

1.002(0.817/0.809)∗

−0.006(0.266/0.267)∗

1.005(1.042/1.034)∗

D

0.5

3.000(0.267/0.264)

1.046(0.942/1.144)

−0.008(0.322/0.326)

1.044(1.200/1.385)

E

1

3.001(0.300/0.297)

1.079(1.483/2.086)

−0.012(0.517/0.526)

1.073(1.801/2.314)

ˆ 2 )1,2 (Σ

ˆ 2 )2,2 (Σ

∗

∗

1.592(138.6/390.8)

ˆ 1 )2,2 (Σ

α

ˆ 2 )1,1 (Σ

M

0(MLE)

11.32(1304/9974)

2.345(68.47/617.6)

1.576(3.753/36.90)

0.851(1.698/74.07)

(µ) ˆ 1

D

0.1

12.20(1671/12059)

2.500(62.84/687.34)

1.594(2.173/37.48)

0.745(3.235/58.69)

P

0.3

5.962(1258/2815)

0.953(80.30/170.4)

1.255(4.770/11.21)

0.238(5.434/11.05)

D

0.5

2.200(5.567/9.496)∗

−0.002(0.983/0.973)∗

1.081(1.219/1.859)∗

0.007(0.550/0.550)∗

E

1

2.283(8.149/16.08)

−0.016(1.557/1.568)

1.148(2.089/4.263)

0.006(0.648/0.645)

α

(µ) ˆ 2

ˆ )1,1 (Σ

ˆ )1,2 (Σ

ˆ )2,2 (Σ

M

0(MLE)

0.510(1.424/27.38)

7.779(71.76/4013)

3.850(25.50/1507)

11.44(7.368/214.3)

D

0.1

0.477(1.598/24.34)

7.528(241.6/3873)

3.487(60.45/1276)

11.31(18.77/189.9)

P

0.3

0.215(3.890/8.465)

3.768(413.6/923.9)

1.181(127.6/265.9)

10.42(17.99/35.79)

D

0.5

0.017(1.096/1.116)∗

1.632(1.676/3.408)∗

0.008(5.279/5.232)∗

10.08(5.302/5.836)∗

E

1

0.042(1.358/1.522)

1.698(2.322/6.219)

−0.006(6.709/6.646)

10.13(6.750/8.434)

Table 4. Sample mean (variance × 10²/MSE × 10²) of estimators for the multivariate normal mixture parameters, non-invertible case, p = 0.

α

ωˆ

(µ ˆ 1 )1

(µ ˆ 1 )2

(µ ˆ 2 )1

M

0(MLE)

0.505(0.023/0.025)∗

2.007(0.250/0.251)

D

0.1

0.505(0.023/0.025)

2.006(0.249/0.251)∗

P D E

0.3 0.5 1

0.505(0.024/0.025) 0.505(0.024/0.026) 0.505(0.030/0.032)

2.006(0.264/0.264) 2.005(0.289/0.289) 2.003(0.382/0.379)

−2.002(0.291/0.288)∗ −2.002(0.293/0.290) −2.002(0.318/0.316) −2.002(0.367/0.364) −2.002(0.559/0.554)

−1.992(0.495/0.497)∗ −1.993(0.505/0.505) −1.995(0.555/0.552) −1.997(0.626/0.621) −1.999(0.855/0.847)

α

(µ ˆ 2 )2

ˆ 1 )1,1 (Σ

ˆ 1 )1,2 (Σ

ˆ 1 )2 , 2 (Σ

0(MLE)

2.001(0.123/0.122)

0.995(0.556/0.554)

D

0.1

2.001(0.123/0.122)

0.996(0.582/0.577)

0.014(0.466/0.480)

P D E

0.3 0.5 1

2.001(0.132/0.130) 2.001(0.148/0.146) 2.002(0.217/0.215)

0.998(0.650/0.644) 0.998(0.740/0.733) 0.999(1.052/1.042)

0.016(0.499/0.519) 0.019(0.573/0.602) 0.026(0.887/0.945)

α

ˆ 2 )1,1 (Σ

ˆ 2 )1,2 (Σ

ˆ 2 )2,2 (Σ

(µ) ˆ 1

0(MLE) 0.1 0.3 0.5 1

2.031(2.469/2.542)∗ 2.033(2.507/2.592) 2.034(2.792/2.883) 2.034(3.253/3.338) 2.030(5.020/5.062)

−0.023(0.476/0.525)∗ −0.026(0.496/0.559) −0.030(0.573/0.659) −0.034(0.678/0.786) −0.041(1.065/1.220)

1.004(0.519/0.516)∗ 1.005(0.524/0.522) 1.006(0.576/0.574) 1.005(0.662/0.659) 1.003(1.000/0.991)

0.026(0.462/0.524)∗ 0.025(0.466/0.524) 0.024(0.488/0.540) 0.023(0.522/0.569) 0.022(0.684/0.727)

M

M D P D E

∗

∗

0.013(0.471/0.482)

1.988(2.773/2.760)∗ ∗

1.991(2.802/2.783) 1.993(3.022/2.997) 1.993(3.409/3.379) 1.996(4.969/4.921)

α

(µ) ˆ 2

ˆ )1,1 (Σ

ˆ )1,2 (Σ

ˆ )2,2 (Σ

M

0(MLE)

5.503(3.788/3.751)∗

0.1

P D E

0.3 0.5 1

−4.002(0.715/0.708)∗ −4.004(0.768/0.762) −4.006(0.943/0.937) −4.007(1.188/1.181) −4.007(2.015/1.999)

5.503(2.144/2.124)

D

−0.019(0.434/0.465)∗ −0.019(0.438/0.469) −0.019(0.457/0.488) −0.019(0.488/0.520) −0.021(0.626/0.661)

5.507(3.820/3.787) 5.512(4.175/4.148) 5.514(4.763/4.735) 5.513(6.788/6.737)

5.505(2.122/2.103)∗ 5.506(2.260/2.241) 5.506(2.583/2.561) 5.510(3.903/3.874)

B. Kim, S. Lee / Computational Statistics and Data Analysis 57 (2013) 125–140


Table 5
Sample mean (variance × 10^2 / MSE × 10^2) of estimators for the multivariate normal mixture parameters, non-invertible case, p = 0.05.

MDPDE α    $\hat{\omega}$          $(\hat{\mu}_1)_1$         $(\hat{\mu}_1)_2$         $(\hat{\mu}_2)_1$
0 (MLE)    0.902(1.865/18.03)      0.220(42.86/359.1)        −0.181(29.76/360.3)       6.171(661.3/7331)
0.1        0.651(4.492/6.741)      1.402(102.0/136.7)        −1.175(77.56/144.9)       1.254(1751/2793)
0.3        0.504(0.028/0.029)      2.000(0.351/0.347)∗       −1.999(0.341/0.337)∗      −2.008(0.755/0.753)∗
0.5        0.504(0.027/0.028)∗     1.999(0.382/0.378)        −2.001(0.363/0.360)       −2.010(0.836/0.838)
1          0.503(0.032/0.033)      1.998(0.492/0.487)        −2.006(0.476/0.475)       −2.014(1.095/1.104)

MDPDE α    $(\hat{\mu}_2)_2$       $(\hat{\Sigma}_1)_{1,1}$  $(\hat{\Sigma}_1)_{1,2}$  $(\hat{\Sigma}_1)_{2,2}$
0 (MLE)    6.538(202.3/2259)       5.111(150.1/1839)         −3.473(266.7/1470)        5.331(122.4/1231)
0.1        3.811(536.1/858.7)      3.010(431.8/831.3)        −0.762(780.5/830.8)       4.557(506.4/1155)
0.3        2.004(0.160/0.160)∗     1.016(0.646/0.665)∗       0.001(0.527/0.522)∗       2.019(2.278/2.290)∗
0.5        2.005(0.178/0.179)      1.026(0.789/0.846)        0.002(0.623/0.617)        2.042(2.527/2.677)
1          2.008(0.249/0.252)      1.040(1.336/1.486)        0.005(1.027/1.019)        2.089(3.650/4.403)

MDPDE α    $(\hat{\Sigma}_2)_{1,1}$  $(\hat{\Sigma}_2)_{1,2}$  $(\hat{\Sigma}_2)_{2,2}$  $(\hat{\mu})_1$
0 (MLE)    1.552(397.8/413.8)      0.256(97.79/103.4)        1.156(29.55/31.69)        0.369(0.794/14.43)
0.1        3.428(713.5/910.2)      0.993(169.3/266.2)        1.528(48.03/75.45)        0.271(1.751/9.102)
0.3        2.081(3.317/3.933)∗     0.009(0.633/0.635)∗       1.023(0.528/0.576)∗       0.010(0.734/0.738)
0.5        2.101(3.728/4.714)      0.008(0.746/0.744)        1.034(0.631/0.743)        0.009(0.735/0.736)∗
1          2.143(5.583/7.582)      0.007(1.230/1.222)        1.053(1.133/1.407)        0.006(0.842/0.837)

MDPDE α    $(\hat{\mu})_2$         $(\hat{\Sigma})_{1,1}$    $(\hat{\Sigma})_{1,2}$    $(\hat{\Sigma})_{2,2}$
0 (MLE)    0.357(0.729/13.45)      7.700(11.50/495.5)        −1.363(11.52/706.9)       7.668(9.703/479.6)
0.1        0.262(1.611/8.457)      7.323(44.79/376.7)        −2.100(33.95/394.5)       7.268(51.04/363.0)
0.3        −0.012(0.498/0.507)∗    5.557(4.858/5.137)∗       −4.000(1.131/1.119)∗      5.527(2.427/2.476)∗
0.5        −0.012(0.503/0.513)     5.577(5.647/6.191)        −4.006(1.324/1.315)       5.552(2.789/3.027)
1          −0.013(0.610/0.620)     5.613(7.995/9.194)        −4.015(2.154/2.154)       5.601(4.098/5.074)
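Each table entry has the form mean(variance × 10^2 / MSE × 10^2), and the two bracketed numbers are linked by the usual decomposition MSE = variance + bias^2. The short sketch below checks this on the MLE entry for $\hat{\omega}$ in Table 5. It assumes that the true mixing weight is 0.5, a value that is not restated in the table; the function name is illustrative only.

```python
def mse_from_mean_and_variance(sample_mean, variance, true_value):
    """Return variance + squared bias, the standard MSE decomposition."""
    bias = sample_mean - true_value
    return variance + bias ** 2


# Table 5, MLE row, omega-hat: 0.902(1.865/18.03), i.e. variance = 1.865e-2.
reported_mean = 0.902
reported_variance = 1.865e-2

# ASSUMPTION: the true mixing weight is 0.5 (not stated in the table itself).
assumed_true_omega = 0.5

mse_times_100 = 100 * mse_from_mean_and_variance(
    reported_mean, reported_variance, assumed_true_omega
)
print(round(mse_times_100, 2))  # close to the reported 18.03
```

Under that assumption, almost all of the MLE's error in this contaminated setting comes from bias rather than variance.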

Then, $w$ can be estimated by
\[
\hat{w} = \frac{\hat{\Sigma}^{-1}\mathbf{1}}{\mathbf{1}'\hat{\Sigma}^{-1}\mathbf{1}}.
\]
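As a minimal sketch of this estimator in the bivariate case, the weights can be computed from the three distinct entries of $\hat{\Sigma}$ using the closed-form inverse of a 2 × 2 matrix. The function name is hypothetical, and the inputs are the sample covariance matrix (SCM) entries reported for $r_{1,t}$ in Table 8.

```python
def min_variance_weights_2x2(s11, s12, s22):
    """Weights w = S^{-1} 1 / (1' S^{-1} 1) for a symmetric 2x2 covariance S."""
    det = s11 * s22 - s12 * s12
    if s11 <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Components of S^{-1} 1, from the closed-form inverse of a 2x2 matrix.
    u1 = (s22 - s12) / det
    u2 = (s11 - s12) / det
    total = u1 + u2  # equals 1' S^{-1} 1
    return u1 / total, u2 / total


# SCM entries for r_{1,t} taken from Table 8: (1,1), (1,2), (2,2).
w1, w2 = min_variance_weights_2x2(12.86, 9.112, 18.52)
print(round(w1, 3), round(w2, 3))  # 0.715 0.285, matching the SCM row of Table 8
```

The determinant cancels in the ratio, so in two dimensions the weights depend only on the differences between each variance and the covariance.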

ˆ is influenced by temporal economic or social events, portfolio cannot be constructed according to an It is manifest that if w investor’s object. Since this can do damage to investors, the robust estimation method for w has been studied extensively by many researchers. For example, see Fabozzi et al. (2007). To illustrate the behavior of the MDPDE in the presence of outliers, we analyze the weekly returns of the International Business Machine Corp.(IBM) and 3M Company(MMM), which constitute the Dow Jones Industrial Average, over the period from November 5, 1973 to June 29, 1981 (400 observations). We use adjusted close prices which are adjusted for all splits and dividends. We split the data into two subseries in which the first series r1,t covers the period from November 5, 1973 to August 29, 1977 (200 observations) and the second subseries r2,t covers the period from September 6, 1977 to June 29, 1981 (200 observations). In Fig. 1, we can observe that r1,t have some aberrant observations while r2,t seemingly have no such observations. As the covariance matrix estimator of IBM and MMM, we use the sample covariance matrix and MDPDE ˆ for the two subseries, based on two component normal mixtures. By using these covariance matrix estimators, we obtain w separately. The results are reported in Tables 8 and 9. Therein, we can see that a remarkable difference exists between the sample covariance matrix and MDPDE with α > 0 for the subseries r1,t . Contrastively, the behavior of the MDPDE for r2,t is similar to that of the sample covariance matrix, particularly when α is small. The result suggests that outliers can severely affect the optimal portfolio selection procedure and the MDPDE could be used to better estimate the covariance matrix when outliers exist: nevertheless, this does not mean that the optimal portfolio based on the MDPDE method must necessarily result in better returns. 6. Proofs For d-dimensional vectors y, ∥y∥ denotes the Euclidean norm. 
To prove Theorems 3.1 and 3.2, we first establish the following lemma.


Table 6
Sample mean (variance × 10^2 / MSE × 10^2) of estimators for the multivariate normal mixture parameters, non-invertible case, p = 0.1.

MDPDE α    $\hat{\omega}$          $(\hat{\mu}_1)_1$         $(\hat{\mu}_1)_2$         $(\hat{\mu}_2)_1$
0 (MLE)    0.898(0.009/15.89)      0.009(0.760/397.3)        −0.009(0.516/396.7)       7.029(0.850/8154)
0.1        0.893(0.203/15.67)      0.039(4.822/389.3)        −0.020(4.631/396.6)       6.941(62.71/8055)
0.3        0.542(1.361/1.524)      1.804(35.29/38.77)        −1.803(36.76/40.27)       −1.100(746.9/820.4)
0.5        0.504(0.027/0.028)∗     1.999(0.366/0.362)∗       −2.007(0.418/0.419)∗      −2.011(0.828/0.832)∗
1          0.503(0.031/0.032)      1.999(0.464/0.459)        −2.014(0.531/0.544)       −2.016(1.063/1.078)

MDPDE α    $(\hat{\mu}_2)_2$       $(\hat{\Sigma}_1)_{1,1}$  $(\hat{\Sigma}_1)_{1,2}$  $(\hat{\Sigma}_1)_{2,2}$
0 (MLE)    7.003(0.957/2504)       5.506(4.022/2034)         −3.982(1.131/1587)        5.487(2.521/1218)
0.1        6.943(21.50/2465)       5.773(29.27/2307)         −4.263(19.05/1837)        5.772(17.47/1440)
0.3        2.505(226.3/249.6)      1.572(266.1/296.2)        −0.486(221.4/242.8)       2.495(180.0/202.8)
0.5        2.008(0.187/0.191)∗     1.057(0.846/1.163)∗       0.009(0.705/0.706)∗       2.106(2.822/3.926)∗
1          2.011(0.255/0.266)      1.095(1.505/2.400)        0.017(1.137/1.155)        2.194(4.326/8.052)

MDPDE α    $(\hat{\Sigma}_2)_{1,1}$  $(\hat{\Sigma}_2)_{1,2}$  $(\hat{\Sigma}_2)_{2,2}$  $(\hat{\mu})_1$
0 (MLE)    0.994(2.215/103.4)      −0.012(0.958/0.962)       1.029(2.398/2.460)        0.722(0.901/52.95)
0.1        1.220(258.9/317.2)      0.260(489.2/491.1)        1.461(1677/1682)          0.733(1.261/55.00)
0.3        2.011(16.10/15.95)      0.015(0.746/0.762)∗       1.032(0.812/0.903)∗       0.094(6.889/7.706)
0.5        2.177(3.949/7.032)∗     0.015(0.821/0.835)        1.075(0.783/1.341)        0.009(0.721/0.723)∗
1          2.268(6.243/13.37)      0.021(1.333/1.363)        1.123(1.414/2.902)        0.004(0.820/0.813)

MDPDE α    $(\hat{\mu})_2$         $(\hat{\Sigma})_{1,1}$    $(\hat{\Sigma})_{1,2}$    $(\hat{\Sigma})_{2,2}$
0 (MLE)    0.703(0.872/50.22)      9.542(19.15/1652)         0.908(18.08/2427)         9.517(15.73/1629)
0.1        0.712(1.107/51.83)      9.837(31.80/1913)         0.650(47.49/2209)         9.858(79.20/1978)
0.3        0.067(6.389/6.774)      6.103(247.9/281.8)        −3.543(190.5/209.5)       6.088(247.8/279.8)
0.5        −0.015(0.491/0.507)∗    5.631(5.539/7.200)∗       −4.007(1.393/1.385)∗      5.621(3.383/4.820)∗
1          −0.013(0.600/0.611)     5.708(7.885/12.13)        −4.015(2.203/2.205)       5.711(4.854/9.238)

Table 7
Sample mean (variance × 10^2 / MSE × 10^2) of estimators for the ACF at lags 0 and 1, with p = 0, 0.05, 0.1 and η = 3, 6.

η = 3
               p = 0                                        p = 0.05                                     p = 0.1
               $\hat{\gamma}(0)$      $\hat{\gamma}(1)$      $\hat{\gamma}(0)$      $\hat{\gamma}(1)$      $\hat{\gamma}(0)$      $\hat{\gamma}(1)$
SACF           0.567(0.128/0.127)     0.281(0.088/0.089)     1.839(7.717/168.9)     0.252(0.476/0.579)     3.036(14.15/622.6)     0.217(1.082/1.523)
MDPDE α = 0.1  0.561(0.120/0.127)∗    0.277(0.081/0.086)∗    2.291(69.74/365.3)     0.011(8.349/15.74)     4.068(122.3/1345)      −0.268(15.61/45.95)
MDPDE α = 0.3  0.552(0.122/0.152)     0.265(0.078/0.114)     1.209(45.02/85.44)     0.194(2.500/3.299)     2.987(78.98/662.7)     −0.102(7.320/22.19)
MDPDE α = 0.5  0.541(0.127/0.207)     0.255(0.084/0.174)     0.619(0.369/0.609)∗    0.261(0.111/0.168)∗    1.754(155.0/293.9)     0.087(4.420/8.299)
MDPDE α = 1    0.525(0.159/0.355)     0.242(0.151/0.333)     0.595(0.967/1.026)     0.243(0.291/0.459)     0.774(2.473/6.641)∗    0.245(0.523/0.677)∗

η = 6
               p = 0                                        p = 0.05                                     p = 0.1
               $\hat{\gamma}(0)$      $\hat{\gamma}(1)$      $\hat{\gamma}(0)$      $\hat{\gamma}(1)$      $\hat{\gamma}(0)$      $\hat{\gamma}(1)$
SACF           0.504(0.109/0.109)     0.250(0.072/0.073)     1.681(6.423/144.1)     0.224(0.403/0.488)     2.787(11.87/531.3)     0.194(0.901/1.253)
MDPDE α = 0.1  0.499(0.094/0.100)∗    0.247(0.060/0.063)∗    1.818(32.98/204.4)     0.112(2.719/4.693)     3.313(68.52/854.8)     −0.092(6.835/18.72)
MDPDE α = 0.3  0.485(0.092/0.144)     0.233(0.055/0.096)     0.609(6.647/7.615)     0.226(0.263/0.337)     2.366(57.05/402.0)     0.010(3.242/9.138)
MDPDE α = 0.5  0.471(0.097/0.227)     0.221(0.064/0.167)     0.520(0.162/0.177)     0.222(0.076/0.179)∗    0.835(85.98/95.88)     0.202(0.493/0.751)
MDPDE α = 1    0.452(0.102/0.414)     0.202(0.090/0.354)     0.508(0.174/0.172)∗    0.207(0.110/0.331)     0.600(0.364/1.211)∗    0.215(0.163/0.315)∗

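Lemma 6.1. There exist constants $l_{\alpha,0}, \ldots, l_{\alpha,3}$ such that, for all $\theta \in \Theta$ and $x \in \mathbb{R}^d$, $|V_\alpha(\theta; x)| \le l_{\alpha,0}$ and, for $1 \le i_1, \ldots, i_r \le \rho$ ($r = 1, 2, 3$),
\[
\left| \frac{\partial^r V_\alpha(\theta; x)}{\partial\theta_{i_1} \cdots \partial\theta_{i_r}} \right| \le l_{\alpha,r}.
\]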


Table 8
Estimation results of $\hat{\Sigma}$ and $\hat{w}$ for $r_{1,t}$.

                 $(\hat{\Sigma})_{1,1}$  $(\hat{\Sigma})_{1,2}$  $(\hat{\Sigma})_{2,2}$  $(\hat{w})_1$  $(\hat{w})_2$
SCM              12.86                   9.112                   18.52                   0.715          0.285
MDPDE α = 0.1    26.83                   21.99                   36.53                   0.750          0.250
MDPDE α = 0.2    22.22                   17.93                   31.76                   0.763          0.237
MDPDE α = 0.3    16.96                   13.83                   27.51                   0.814          0.186
MDPDE α = 0.4    7.199                   6.029                   18.06                   0.911          0.089
MDPDE α = 0.5    8.131                   7.198                   15.70                   0.901          0.099
MDPDE α = 0.6    7.841                   7.182                   16.82                   0.936          0.064
MDPDE α = 0.7    7.733                   7.194                   16.81                   0.947          0.053
MDPDE α = 0.8    7.644                   7.213                   16.86                   0.957          0.043
MDPDE α = 0.9    7.566                   7.233                   16.95                   0.967          0.033
MDPDE α = 1      7.631                   7.244                   16.85                   0.961          0.039

SCM: Sample Covariance Matrix.

Table 9
Estimation results of $\hat{\Sigma}$ and $\hat{w}$ for $r_{2,t}$.

                 $(\hat{\Sigma})_{1,1}$  $(\hat{\Sigma})_{1,2}$  $(\hat{\Sigma})_{2,2}$  $(\hat{w})_1$  $(\hat{w})_2$
SCM              6.416                   3.968                   9.384                   0.689          0.311
MDPDE α = 0.1    6.588                   4.192                   9.666                   0.696          0.304
MDPDE α = 0.2    6.402                   4.107                   9.755                   0.711          0.289
MDPDE α = 0.3    6.659                   4.344                   9.704                   0.698          0.302
MDPDE α = 0.4    6.310                   4.143                   9.820                   0.724          0.276
MDPDE α = 0.5    6.265                   4.145                   9.892                   0.730          0.270
MDPDE α = 0.6    6.212                   4.140                   9.981                   0.738          0.262
MDPDE α = 0.7    5.957                   4.638                   9.830                   0.797          0.203
MDPDE α = 0.8    5.931                   4.672                   9.887                   0.806          0.194
MDPDE α = 0.9    5.872                   4.692                   9.952                   0.817          0.183
MDPDE α = 1      5.615                   4.765                   10.27                   0.866          0.134

SCM: Sample Covariance Matrix.

Fig. 1. The return series of r1,t and r2,t .


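Proof. Note that
\[
|V_\alpha(\theta; x)| = \left| \int f_\theta^{1+\alpha}(z)\,dz - \left(1 + \frac{1}{\alpha}\right) f_\theta^{\alpha}(x) \right|
\le \int f_\theta^{1+\alpha}(z)\,dz + \left(1 + \frac{1}{\alpha}\right) f_\theta^{\alpha}(x)
:= W_{\alpha,0}(\theta) + Q_{\alpha,0}(\theta; x).
\]
Let $w_{\alpha,0} := \sup_{\theta\in\Theta} W_{\alpha,0}(\theta)$ and $q_{\alpha,0}(x) := \sup_{\theta\in\Theta} Q_{\alpha,0}(\theta; x)$. Since $\Theta$ is compact and $W_{\alpha,0}(\theta)$ and $Q_{\alpha,0}(\theta; x)$ are continuous in $\theta \in \Theta$, we have $W_{\alpha,0}(\theta) \le w_{\alpha,0} < \infty$ and $Q_{\alpha,0}(\theta; x) \le q_{\alpha,0}(x) < \infty$ for all $x \in \mathbb{R}^d$. Further, $q_{\alpha,0}(x)$ is uniformly bounded on $\mathbb{R}^d$, so that $q_{\alpha,0}(x) \le q'_{\alpha,0}$ for some constant $q'_{\alpha,0}$. Let $l_{\alpha,0} := w_{\alpha,0} + q'_{\alpha,0}$. Then, $|V_\alpha(\theta; x)| \le l_{\alpha,0}$.

Now, we consider the case of r = 1. Note that
\begin{align*}
\frac{\partial f_\theta(x)}{\partial\omega_j} &= \phi(x; \mu_j, \Sigma_j) - \phi(x; \mu_m, \Sigma_m) && \text{for } 1 \le j \le m-1, \\
\frac{\partial f_\theta(x)}{\partial(\mu_j)_t} &= \omega_j \phi(x; \mu_j, \Sigma_j) \sum_{l=1}^{d} (x_l - (\mu_j)_l)(\Sigma_j^{-1})_{l,t} && \text{for } 1 \le j \le m,\ 1 \le t \le d, \\
\frac{\partial f_\theta(x)}{\partial(\Sigma_j)_{r,s}} &= -\frac{1}{2}\,\omega_j \phi(x; \mu_j, \Sigma_j) \left\{ (\Sigma_j^{-1})_{r,s} - (x - \mu_j)' \Sigma_j^{-1} J_{rs} \Sigma_j^{-1} (x - \mu_j) \right\} && \text{for } 1 \le j \le m,\ 1 \le r \le d,\ 1 \le s \le r,
\end{align*}
where $J_{rs}$ is a $d \times d$ matrix whose $(r, s)$-th element is equal to 1 and all other elements are 0. From the fact that $\omega_j \phi(x; \mu_j, \Sigma_j)/f_\theta(x) \le 1$ for $1 \le j \le m$, we have
\begin{align*}
\left| \frac{\partial V_\alpha(\theta; x)}{\partial\omega_j} \right|
&= \left| \int \frac{\partial f_\theta^{1+\alpha}(z)}{\partial\omega_j}\,dz - \left(1 + \frac{1}{\alpha}\right) \frac{\partial f_\theta^{\alpha}(x)}{\partial\omega_j} \right| \\
&\le (1+\alpha) \left\{ \left| \int f_\theta^{\alpha}(z) \big(\phi(z; \mu_j, \Sigma_j) - \phi(z; \mu_m, \Sigma_m)\big)\,dz \right| + f_\theta^{\alpha}(x) \left( \frac{1}{\omega_j} + \frac{1}{\omega_m} \right) \right\} \\
&:= (1+\alpha) \left\{ W_{\alpha,1,1}(\theta) + Q_{\alpha,1,1}(\theta; x) \right\} \quad \text{for } 1 \le j \le m-1, \\[4pt]
\left| \frac{\partial V_\alpha(\theta; x)}{\partial(\mu_j)_t} \right|
&= \left| \int \frac{\partial f_\theta^{1+\alpha}(z)}{\partial(\mu_j)_t}\,dz - \left(1 + \frac{1}{\alpha}\right) \frac{\partial f_\theta^{\alpha}(x)}{\partial(\mu_j)_t} \right| \\
&\le (1+\alpha) \left\{ \left| \omega_j \sum_{l=1}^{d} (\Sigma_j^{-1})_{l,t} \int f_\theta^{\alpha}(z) (z_l - (\mu_j)_l) \phi(z; \mu_j, \Sigma_j)\,dz \right| + f_\theta^{\alpha}(x) \left| \sum_{l=1}^{d} (x_l - (\mu_j)_l)(\Sigma_j^{-1})_{l,t} \right| \right\} \\
&:= (1+\alpha) \left\{ W_{\alpha,1,2}(\theta) + Q_{\alpha,1,2}(\theta; x) \right\} \quad \text{for } 1 \le j \le m,\ 1 \le t \le d, \\[4pt]
\left| \frac{\partial V_\alpha(\theta; x)}{\partial(\Sigma_j)_{r,s}} \right|
&= \left| \int \frac{\partial f_\theta^{1+\alpha}(z)}{\partial(\Sigma_j)_{r,s}}\,dz - \left(1 + \frac{1}{\alpha}\right) \frac{\partial f_\theta^{\alpha}(x)}{\partial(\Sigma_j)_{r,s}} \right| \\
&\le \frac{1+\alpha}{2} \left\{ \left| \int f_\theta^{\alpha}(z)\,\omega_j \phi(z; \mu_j, \Sigma_j) \big( (\Sigma_j^{-1})_{r,s} - (z - \mu_j)' \Sigma_j^{-1} J_{rs} \Sigma_j^{-1} (z - \mu_j) \big)\,dz \right| \right. \\
&\qquad\qquad \left. +\, f_\theta^{\alpha}(x) \left| (\Sigma_j^{-1})_{r,s} - (x - \mu_j)' \Sigma_j^{-1} J_{rs} \Sigma_j^{-1} (x - \mu_j) \right| \right\} \\
&:= \frac{1+\alpha}{2} \left\{ W_{\alpha,1,3}(\theta) + Q_{\alpha,1,3}(\theta; x) \right\} \quad \text{for } 1 \le j \le m,\ 1 \le r \le d,\ 1 \le s \le r.
\end{align*}
Let $w_{\alpha,1,i} = \sup_{\theta\in\Theta} W_{\alpha,1,i}(\theta)$ and $q_{\alpha,1,i}(x) = \sup_{\theta\in\Theta} Q_{\alpha,1,i}(\theta; x)$, $i = 1, 2, 3$. Since $W_{\alpha,1,i}(\theta)$ and $Q_{\alpha,1,i}(\theta; x)$ are continuous in $\theta \in \Theta$, we have $W_{\alpha,1,i}(\theta) \le w_{\alpha,1,i} < \infty$ and $Q_{\alpha,1,i}(\theta; x) \le q_{\alpha,1,i}(x) < \infty$ for all $x \in \mathbb{R}^d$, $i = 1, 2, 3$. Further, $q_{\alpha,1,i}(x)$ is uniformly bounded on $\mathbb{R}^d$, so that $q_{\alpha,1,i}(x) \le q'_{\alpha,1,i}$ for some constants $q'_{\alpha,1,i}$, $i = 1, 2, 3$. Let
\[
l_{\alpha,1} = \max\left\{ (1+\alpha)(w_{\alpha,1,1} + q'_{\alpha,1,1}),\ (1+\alpha)(w_{\alpha,1,2} + q'_{\alpha,1,2}),\ \frac{1+\alpha}{2}(w_{\alpha,1,3} + q'_{\alpha,1,3}) \right\}.
\]
Then, $|\partial V_\alpha(\theta; x)/\partial\theta_i| \le l_{\alpha,1}$ for $1 \le i \le \rho$. The cases of r = 2, 3 can be derived similarly. This establishes the lemma.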
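The bound $|V_\alpha(\theta; x)| \le W_{\alpha,0}(\theta) + Q_{\alpha,0}(\theta; x)$ can be checked numerically in a simple setting. The sketch below is only an illustration under stated assumptions: it uses a univariate (d = 1) two-component mixture with hypothetical parameter values, approximates the integral by the trapezoidal rule, and replaces the supremum of $Q_{\alpha,0}$ over $x$ by the elementary bound $f_\theta(x) \le (2\pi \min_j \sigma_j^2)^{-1/2}$. None of the function names come from the paper.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def integral_term(alpha, weights, means, variances, lo=-40.0, hi=40.0, steps=8000):
    """Trapezoidal approximation of W = int f^(1 + alpha)(z) dz."""
    h = (hi - lo) / steps
    values = [mixture_pdf(lo + i * h, weights, means, variances) ** (1.0 + alpha)
              for i in range(steps + 1)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def v_alpha(x, alpha, w_term, weights, means, variances):
    """V_alpha(theta; x) = W - (1 + 1/alpha) * f^alpha(x), for alpha > 0."""
    return w_term - (1.0 + 1.0 / alpha) * mixture_pdf(x, weights, means, variances) ** alpha


# Hypothetical parameter values, chosen only for illustration.
weights, means, variances = (0.5, 0.5), (-2.0, 2.0), (1.0, 2.0)
alpha = 0.5
w_term = integral_term(alpha, weights, means, variances)

# f(x) <= 1 / sqrt(2 * pi * min variance) gives a bound on Q that is free of x.
q_bound = (1.0 + 1.0 / alpha) * (2.0 * math.pi * min(variances)) ** (-alpha / 2.0)
bound = w_term + q_bound

worst = max(abs(v_alpha(x / 10.0, alpha, w_term, weights, means, variances))
            for x in range(-500, 501))
print(worst <= bound)  # True
```

The bound does not depend on $x$ because a normal mixture density is bounded above once the component variances are bounded away from zero, which is where compactness of $\Theta$ enters.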

The following is also needed to verify Theorem 3.1.

Lemma 6.2. Let $X_1, X_2, \ldots$ be strictly stationary and ergodic. If
(a) $\Theta$ is compact;
(b) $A(x, \theta)$ is continuous in $\theta$ for all $x$;
(c) there exists $B(x)$ such that $EB(X) < \infty$ and $|A(x, \theta)| \le B(x)$ for all $x$ and $\theta$,
then
\[
P\left\{ \lim_{n\to\infty} \sup_{\theta\in\Theta} \left| \frac{1}{n} \sum_{t=1}^{n} A(X_t, \theta) - a(\theta) \right| = 0 \right\} = 1, \tag{6.6}
\]
where $a(\theta) = EA(X, \theta)$. In addition, if $\theta_0 = \operatorname{argmin}_{\theta\in\Theta} a(\theta)$ exists and is unique, then
\[
P\{\hat{\theta}_n \to \theta_0,\ n \to \infty\} = 1, \tag{6.7}
\]
where $\hat{\theta}_n = \operatorname{argmin}_{\theta\in\Theta} \frac{1}{n} \sum_{t=1}^{n} A(X_t, \theta)$.

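Proof. Since the proof of (6.6) is standard (cf. Ferguson, 1996, pp. 107–111), we do not provide a detailed proof. Below, we prove (6.7). For any $\delta > 0$, let $S = \{\theta \in \Theta : \|\theta - \theta_0\| \ge \delta\}$, which is a compact subset of $\Theta$. Then, $\|\hat{\theta}_n - \theta_0\| > \delta$ implies
\[
\min_{\theta\in\Theta} \frac{1}{n} \sum_{t=1}^{n} A(X_t, \theta) = \frac{1}{n} \sum_{t=1}^{n} A(X_t, \hat{\theta}_n) = \min_{\theta\in S} \frac{1}{n} \sum_{t=1}^{n} A(X_t, \theta).
\]
Hence, we have
\begin{align*}
\left| a(\theta_0) - \min_{\theta\in S} a(\theta) \right|
&\le \left| \min_{\theta\in\Theta} \frac{1}{n} \sum_{t=1}^{n} A(X_t, \theta) - \min_{\theta\in\Theta} a(\theta) \right| + \left| \min_{\theta\in S} \frac{1}{n} \sum_{t=1}^{n} A(X_t, \theta) - \min_{\theta\in S} a(\theta) \right| \\
&\le 2 \sup_{\theta\in\Theta} \left| \frac{1}{n} \sum_{t=1}^{n} A(X_t, \theta) - a(\theta) \right|.
\end{align*}
Note that due to conditions (b), (c) and the dominated convergence theorem, we have
\[
a(\theta') = EA(X, \theta') \to EA(X, \theta) = a(\theta) \quad \text{as } \theta' \to \theta,
\]
and subsequently, $a(\theta)$ is continuous in $\theta$. Hence, from the uniqueness of $\theta_0$, it follows that
\[
\left| a(\theta_0) - \min_{\theta\in S} a(\theta) \right| = \min_{\theta\in S} a(\theta) - a(\theta_0) := \epsilon_\delta > 0.
\]
Therefore, by (6.6) we have
\[
0 \le P\left( \bigcap_{N=1}^{\infty} \bigcup_{n=N}^{\infty} \left( \|\hat{\theta}_n - \theta_0\| > \delta \right) \right)
\le P\left( \bigcap_{N=1}^{\infty} \bigcup_{n=N}^{\infty} \left( \sup_{\theta\in\Theta} \left| \frac{1}{n} \sum_{t=1}^{n} A(X_t, \theta) - a(\theta) \right| \ge \frac{\epsilon_\delta}{2} > 0 \right) \right) = 0.
\]
This completes the proof.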
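The consistency statement (6.7) can be illustrated with a small simulation. The sketch below is not part of the proof and rests on assumptions made only for illustration: the data follow a Gaussian AR(1) process (approximately stationary after a burn-in), the criterion is $A(x, \theta) = (x - \theta)^2$, whose expectation has the unique minimiser $\theta_0 = EX$, and the compact set $\Theta = [-5, 5]$ is replaced by a finite grid. On this $\Theta$, condition (c) holds with $B(x) = 2x^2 + 50$.

```python
import random


def simulate_ar1(n, mean, phi, rng, burn_in=500):
    """Draw n values of X_t = mean + phi * (X_{t-1} - mean) + e_t with N(0, 1) noise."""
    x = mean
    path = []
    for t in range(n + burn_in):
        x = mean + phi * (x - mean) + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            path.append(x)
    return path


def argmin_on_grid(data, grid):
    """Minimise the sample average of A(x, theta) = (x - theta)^2 over a finite grid."""
    def average_loss(theta):
        return sum((x - theta) ** 2 for x in data) / len(data)
    return min(grid, key=average_loss)


rng = random.Random(0)                         # fixed seed for reproducibility
theta_0 = 1.0                                  # minimiser of a(theta) = E (X - theta)^2
grid = [-5.0 + 0.05 * i for i in range(201)]   # grid over the compact set [-5, 5]
data = simulate_ar1(5000, theta_0, 0.5, rng)
theta_hat = argmin_on_grid(data, grid)
print(abs(theta_hat - theta_0))                # small for large n
```

Increasing the sample size (and refining the grid) should shrink the printed distance, in line with the almost sure convergence in (6.7).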

Proof of Theorem 3.1. Since $V_\alpha(\theta; x)$ is continuous in $\theta \in \Theta$ for all $x \in \mathbb{R}^d$ and $|V_\alpha(\theta; x)|$ is bounded by $l_{\alpha,0}$ for all $x \in \mathbb{R}^d$ and $\theta \in \Theta$ by Lemma 6.1, the conditions in Lemma 6.2 are fulfilled. Therefore, by (3.5), $P\{\hat{\theta}_{\alpha,n} \to \theta_\alpha,\ n \to \infty\} = 1$. This completes the proof.

The following three lemmas are useful to prove Theorem 3.2.

Lemma 6.3. Under the conditions in Theorem 3.2,
\[
\sqrt{n}\,U_{\alpha,n}(\theta_\alpha) \xrightarrow{d} N(0, K_\alpha) \quad \text{as } n \to \infty.
\]

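Proof. We use the Cramér–Wold device. For any $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_\rho) \in \mathbb{R}^\rho$ with $\|\lambda\| = 1$, let
\[
Y_t^\lambda := \frac{1}{1+\alpha} \sum_{i=1}^{\rho} \lambda_i \frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta_i} = \frac{1}{1+\alpha}\,\lambda' \frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta}.
\]
Then, we can easily check that
(a) $\{Y_t^\lambda\}$ is a strictly stationary real-valued process;
(b) $\{Y_t^\lambda\}$ is strong mixing of size $-\gamma/(\gamma - 2)$ for some $\gamma > 2$;
(c) $EY_t^\lambda = \frac{1}{1+\alpha}\,\lambda' E\frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta} = \frac{1}{1+\alpha}\,\lambda' \frac{\partial d_\alpha(g, f_{\theta_\alpha})}{\partial\theta} = 0$.

Further, by the Cauchy inequality, Jensen's inequality with the convex function $\varphi(t) = t^{\gamma/2}$, and Lemma 6.1, we have
\begin{align*}
E|Y_t^\lambda|^\gamma
&= \frac{1}{(1+\alpha)^\gamma}\,E\left| \sum_{i=1}^{\rho} \lambda_i \frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta_i} \right|^\gamma \\
&\le \frac{1}{(1+\alpha)^\gamma}\,E\left( \sum_{i=1}^{\rho} \left| \frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta_i} \right|^2 \right)^{\gamma/2} \\
&\le \frac{1}{(1+\alpha)^\gamma}\,\rho^{(\gamma-2)/2} \sum_{i=1}^{\rho} E\left| \frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta_i} \right|^\gamma < \infty \tag{6.8}
\end{align*}
and
\begin{align*}
\frac{1}{n}\operatorname{Var}\left( \sum_{t=1}^{n} Y_t^\lambda \right)
&= \frac{1}{n(1+\alpha)^2}\operatorname{Var}\left( \sum_{t=1}^{n} \sum_{i=1}^{\rho} \lambda_i \frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta_i} \right) \\
&= \frac{1}{n(1+\alpha)^2} \left[ E\left( \sum_{t=1}^{n} \sum_{i=1}^{\rho} \lambda_i \frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta_i} \right)^2 - \left( E\sum_{t=1}^{n} \sum_{i=1}^{\rho} \lambda_i \frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta_i} \right)^2 \right]. \tag{6.9}
\end{align*}
Since
\[
E\sum_{t=1}^{n} \sum_{i=1}^{\rho} \lambda_i \frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta_i} = \lambda' \sum_{t=1}^{n} E\frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta} = \lambda' \sum_{t=1}^{n} \frac{\partial d_\alpha(g, f_{\theta_\alpha})}{\partial\theta} = 0,
\]
the second term in (6.9) is 0. Hence, the expression in (6.9) equals
\[
\lambda' E\left[ \frac{1}{(1+\alpha)^2}\,\frac{1}{n} \sum_{t=1}^{n} \sum_{s=1}^{n} \frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta}\,\frac{\partial V_\alpha(\theta_\alpha; X_s)}{\partial\theta'} \right] \lambda := \lambda' K_{\alpha,n} \lambda.
\]
Let
\[
E\left[ \frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta_i}\,\frac{\partial V_\alpha(\theta_\alpha; X_s)}{\partial\theta_j} \right] := \Gamma_{\alpha,ij}(t - s).
\]
Note that by Lemma 2 of Billingsley (1995, p. 365) and Lemma 6.1,
\[
|\Gamma_{\alpha,ij}(k)| = \left| E\left[ \frac{\partial V_\alpha(\theta_\alpha; X_{k+1})}{\partial\theta_i}\,\frac{\partial V_\alpha(\theta_\alpha; X_1)}{\partial\theta_j} \right] \right| \le 4 l_{\alpha,1}^2\,\tau(k)
\]
and
\[
|\Gamma_{\alpha,ij}(-k)| = \left| E\left[ \frac{\partial V_\alpha(\theta_\alpha; X_1)}{\partial\theta_i}\,\frac{\partial V_\alpha(\theta_\alpha; X_{k+1})}{\partial\theta_j} \right] \right| \le 4 l_{\alpha,1}^2\,\tau(k).
\]
Using this fact and $\tau(k) \le B k^\zeta$ for some $\zeta < -\gamma/(\gamma - 2) < -1$, $B > 0$, we have
\begin{align*}
\sum_{k=-\infty}^{\infty} |\Gamma_{\alpha,ij}(k)|
&= \sum_{k=-\infty}^{-1} |\Gamma_{\alpha,ij}(k)| + \sum_{k=1}^{\infty} |\Gamma_{\alpha,ij}(k)| + |\Gamma_{\alpha,ij}(0)| \\
&= \sum_{k=1}^{\infty} |\Gamma_{\alpha,ij}(-k)| + \sum_{k=1}^{\infty} |\Gamma_{\alpha,ij}(k)| + |\Gamma_{\alpha,ij}(0)| \\
&\le 2 \cdot 4 l_{\alpha,1}^2 \sum_{k=1}^{\infty} \tau(k) + 4 l_{\alpha,1}^2\,\tau(0) \\
&< 4 l_{\alpha,1}^2 \left( 2B \sum_{k=1}^{\infty} k^{-\gamma/(\gamma-2)} + \tau(0) \right) < \infty,
\end{align*}
which indicates that $\Gamma_{\alpha,ij}(k)$ is absolutely summable. Consequently,
\begin{align*}
E\left[ \frac{1}{(1+\alpha)^2}\,\frac{1}{n} \sum_{t=1}^{n} \sum_{s=1}^{n} \frac{\partial V_\alpha(\theta_\alpha; X_t)}{\partial\theta_i}\,\frac{\partial V_\alpha(\theta_\alpha; X_s)}{\partial\theta_j} \right]
&= \frac{1}{(1+\alpha)^2}\,\frac{1}{n} \sum_{k=-(n-1)}^{n-1} (n - |k|)\,\Gamma_{\alpha,ij}(k) \\
&\to \frac{1}{(1+\alpha)^2} \sum_{k=-\infty}^{\infty} \Gamma_{\alpha,ij}(k) \quad \text{as } n \to \infty,
\end{align*}
which implies
\[
\frac{1}{n}\operatorname{Var}\left( \sum_{t=1}^{n} Y_t^\lambda \right) \to \lambda' K_\alpha \lambda \quad \text{as } n \to \infty,
\]
where $(K_\alpha)_{i,j} = \frac{1}{(1+\alpha)^2} \sum_{k=-\infty}^{\infty} \Gamma_{\alpha,ij}(k)$. Hence, by (a)–(c), (6.8), and Theorem 1.7 of Peligrad (1986), we have
\[
\lambda' \sqrt{n}\,U_{\alpha,n}(\theta_\alpha) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} Y_t^\lambda \xrightarrow{d} N(0, \lambda' K_\alpha \lambda).
\]
This validates the lemma.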
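The step from the double sum over $(t, s)$ to the weighted single sum $\frac{1}{n}\sum_{|k|<n}(n - |k|)\Gamma_{\alpha,ij}(k)$, and its convergence to $\sum_k \Gamma_{\alpha,ij}(k)$ under absolute summability, can be checked on a toy autocovariance. The sketch below assumes the hypothetical choice $\Gamma(k) = 0.6^{|k|}$, which is not taken from the paper, and drops the factor $1/(1+\alpha)^2$.

```python
def autocovariance(k, scale=1.0, rate=0.6):
    """Illustrative absolutely summable autocovariance: scale * rate**|k|."""
    return scale * rate ** abs(k)


def double_sum_average(n):
    """(1/n) * sum over t, s of Gamma(t - s)."""
    return sum(autocovariance(t - s) for t in range(n) for s in range(n)) / n


def weighted_single_sum(n):
    """(1/n) * sum over |k| < n of (n - |k|) * Gamma(k)."""
    return sum((n - abs(k)) * autocovariance(k) for k in range(-(n - 1), n)) / n


# Closed form of the sum over all integers k of 0.6**|k|.
limit = (1 + 0.6) / (1 - 0.6)

for n in (10, 100, 1000, 10000):
    # The gap to the limit shrinks roughly like 1/n.
    print(n, weighted_single_sum(n), limit - weighted_single_sum(n))
```

The two finite-sample expressions agree because each lag $k$ occurs exactly $n - |k|$ times among the pairs $(t, s)$ with $t - s = k$.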

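Lemma 6.4. Under the conditions in Theorem 3.2, $R_{\alpha,n}$ converges to $J_\alpha$ a.s.

Proof. The second term of (3.4) converges almost surely to 0, since
\begin{align*}
\left| \sum_{k=1}^{\rho} \frac{\partial^3 H_{\alpha,n}(\theta^*_{\alpha,n})}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k} \big( (\hat{\theta}_{\alpha,n})_k - (\theta_\alpha)_k \big) \right|
&= \left| \frac{1}{n} \sum_{k=1}^{\rho} \sum_{t=1}^{n} \frac{\partial^3 V_\alpha(\theta^*_{\alpha,n}; X_t)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k} \big( (\hat{\theta}_{\alpha,n})_k - (\theta_\alpha)_k \big) \right| \\
&\le \frac{1}{n} \sum_{k=1}^{\rho} \sum_{t=1}^{n} l_{\alpha,3}\,\|\hat{\theta}_{\alpha,n} - \theta_\alpha\| \quad \text{by Lemma 6.1} \\
&= \rho\,l_{\alpha,3}\,\|\hat{\theta}_{\alpha,n} - \theta_\alpha\| \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty \text{ by Theorem 3.1.}
\end{align*}
Note that since $\{X_t\}$ is strictly stationary and ergodic, $\partial^2 V_\alpha(\theta_\alpha; X_t)/\partial\theta_i\,\partial\theta_j$ is also stationary and ergodic for all $1 \le i, j \le \rho$. Hence,
\[
\frac{\partial^2 H_{\alpha,n}(\theta_\alpha)}{\partial\theta_i\,\partial\theta_j} = \frac{1}{n} \sum_{t=1}^{n} \frac{\partial^2 V_\alpha(\theta_\alpha; X_t)}{\partial\theta_i\,\partial\theta_j} \xrightarrow{a.s.} E\frac{\partial^2 V_\alpha(\theta_\alpha; X_1)}{\partial\theta_i\,\partial\theta_j} = -(1+\alpha)(J_\alpha)_{i,j}.
\]
Therefore, $(R_{\alpha,n})_{i,j} \xrightarrow{a.s.} (J_\alpha)_{i,j}$ as $n \to \infty$. This completes the proof.

Lemma 6.5. Under the conditions in Theorem 3.2,
\[
\sqrt{n}\,\|\Delta_{\alpha,n}\| = o_p(1).
\]

Proof. Since $\det(R_{\alpha,n}) \xrightarrow{a.s.} \det(J_\alpha)$ in view of Lemma 6.4, by Egoroff's Theorem, for given $\epsilon > 0$, there exist an event $A$ with $P(A) < \epsilon$ and an integer $N_0$ such that on $A^c$ and for all $n > N_0$,
\[
\big|\,|\det(R_{\alpha,n})| - |\det(J_\alpha)|\,\big| \le |\det(R_{\alpha,n}) - \det(J_\alpha)| < \frac{1}{2}\,|\det(J_\alpha)| \ne 0.
\]
In particular,
\[
|\det(R_{\alpha,n})| > \frac{1}{2}\,|\det(J_\alpha)| \ne 0.
\]
Therefore, on $A^c$ and for all $n > N_0$, $R_{\alpha,n}^{-1}$ exists. Now, for any $\delta > 0$,
\begin{align*}
P\left( \sqrt{n}\,\|\Delta_{\alpha,n}\| \ge \delta \right)
&= P\left( \sqrt{n}\,\|\Delta_{\alpha,n}\| \ge \delta,\ A \right) + P\left( \sqrt{n}\,\|\Delta_{\alpha,n}\| \ge \delta,\ A^c \right) \\
&\le P(A) + P\left( \sqrt{n}\,\|\Delta_{\alpha,n}\| \ge \delta,\ A^c \right).
\end{align*}
Further, on $A^c$ and for all $n > N_0$,
\begin{align*}
\sqrt{n}\,\|\Delta_{\alpha,n}\|
&= \sqrt{n}\,\| J_\alpha^{-1}(J_\alpha - R_{\alpha,n}) R_{\alpha,n}^{-1} U_{\alpha,n}(\theta_\alpha) \| \\
&\le \| R_{\alpha,n}^{-1} - J_\alpha^{-1} \|\,\| \sqrt{n}\,U_{\alpha,n}(\theta_\alpha) \| = o_p(1) \quad \text{as } n \to \infty
\end{align*}
by Lemmas 6.3 and 6.4. Hence, there exists $N_1$ such that for all $n > N_1 > N_0$, $P\left( \sqrt{n}\,\|\Delta_{\alpha,n}\| \ge \delta,\ A^c \right) \le \epsilon$. Therefore, for all $n > N_1$, $P\left( \sqrt{n}\,\|\Delta_{\alpha,n}\| \ge \delta \right) \le 2\epsilon$. This establishes the lemma.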
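The bound on $\sqrt{n}\,\|\Delta_{\alpha,n}\|$ used in the proof of Lemma 6.5 rests on a short matrix identity. It is spelled out below as a routine expansion, assuming that $\|\cdot\|$ applied to a matrix denotes a norm compatible with the Euclidean vector norm (for instance, the operator norm); the text does not state this choice explicitly.

```latex
\begin{align*}
J_\alpha^{-1}(J_\alpha - R_{\alpha,n})R_{\alpha,n}^{-1}
  &= J_\alpha^{-1} J_\alpha R_{\alpha,n}^{-1} - J_\alpha^{-1} R_{\alpha,n} R_{\alpha,n}^{-1}
   = R_{\alpha,n}^{-1} - J_\alpha^{-1}, \\
\sqrt{n}\,\bigl\| (R_{\alpha,n}^{-1} - J_\alpha^{-1})\, U_{\alpha,n}(\theta_\alpha) \bigr\|
  &\le \bigl\| R_{\alpha,n}^{-1} - J_\alpha^{-1} \bigr\| \,
       \bigl\| \sqrt{n}\, U_{\alpha,n}(\theta_\alpha) \bigr\|.
\end{align*}
```

The first factor tends to 0 on $A^c$ by Lemma 6.4 and the continuity of matrix inversion, while the second is bounded in probability by Lemma 6.3, so the product is $o_p(1)$.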

Acknowledgments

We are grateful to the associate editor and two referees for their valuable comments. This research was supported by the Mid-career Researcher Program through an NRF grant funded by the MEST (No. 2010-0000374).


References

Basu, A., Harris, I.R., Hjort, N.L., Jones, M.C., 1998. Robust and efficient estimation by minimizing a density power divergence. Biometrika 85, 549–559.
Billingsley, P., 1995. Probability and Measure, third ed. Wiley, New York.
Bradley, R.C., 2007. Introduction to Strong Mixing Conditions. Kendrick Press, Heber City.
Choi, K., 1969. Estimators for the parameters of a finite mixture of distributions. Annals of the Institute of Statistical Mathematics 21, 107–116.
Clarke, B.R., Heathcote, C.R., 1994. Robust estimation of k-component univariate normal mixtures. Annals of the Institute of Statistical Mathematics 46, 83–93.
Cutler, A., Cordero-Braña, O.I., 1996. Minimum Hellinger distance estimation for finite mixture models. Journal of the American Statistical Association 91, 1716–1723.
Doukhan, P., 1994. Mixing: Properties and Examples. Springer, New York.
Fabozzi, F.J., Kolm, P.N., Pachamanova, D.A., Focardi, S.M., 2007. Robust Portfolio Optimization and Management. Wiley, New Jersey.
Ferguson, T.S., 1996. A Course in Large Sample Theory. Chapman and Hall, London.
Fujisawa, H., Eguchi, S., 2006. Robust estimation in the normal mixture model. Journal of Statistical Planning and Inference 136, 3989–4011.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A., 1986. Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.
Hathaway, R.J., 1985. A constrained formulation of maximum likelihood estimation for normal mixture distributions. Annals of Statistics 13, 795–800.
Ingrassia, S., 2004. A likelihood-based constrained algorithm for multivariate normal mixture models. Statistical Methods and Applications 13, 151–166.
Kim, B., Lee, S., 2011. Robust estimation for the covariance matrix of multivariate time series. Journal of Time Series Analysis 32, 469–481.
Laird, N.M., 1978. Nonparametric maximum likelihood estimation of a mixing distribution. Journal of the American Statistical Association 73, 805–811.
Lindsay, B.G., 1983. The geometry of mixing likelihoods: a general theory. Annals of Statistics 11, 86–94.
Ma, Y., Genton, M.G., 2000. Highly robust estimation of the autocovariance function. Journal of Time Series Analysis 21, 663–684.
Merton, R., 1980. On estimating expected returns on the market: an exploratory investigation. Journal of Financial Economics 8, 323–361.
Peligrad, M., 1986. Recent advances in the central limit theorem and its weak invariance principle for mixing sequences of random variables (a survey). In: Eberlein, E., Taqqu, M.S. (Eds.), Dependence in Probability and Statistics. Birkhäuser, Boston, pp. 193–223.
Redner, R., 1981. Note on the consistency of the maximum likelihood estimate for nonidentifiable distributions. Annals of Statistics 9, 225–228.
Redner, R., Walker, H.F., 1984. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review 26, 195–239.
Ronchetti, E., 1997. Robustness aspects of model choice. Statistica Sinica 7, 327–338.
Scott, D.W., 2001. Parametric statistical modelling by minimum integrated square error. Technometrics 43, 274–285.
Sundberg, R., 1974. Maximum likelihood theory for incomplete data from an exponential family. Scandinavian Journal of Statistics 1, 49–58.
Woodward, W.A., Parr, W.C., Schucany, W.R., Lindsay, H., 1984. A comparison of minimum distance and maximum likelihood estimation of a mixture proportion. Journal of the American Statistical Association 79, 590–598.