
On asymptotic distributions of normal theory MLE in covariance structure analysis under some nonnormal distributions

Ke-Hai Yuan (a), Peter M. Bentler (b,*)

(a) Department of Psychology, University of North Texas, Denton, TX 76203-1280, USA
(b) Departments of Psychology and Statistics, University of California, Box 951563, Los Angeles, CA 90095-1563, USA

Received November 1996; received in revised form September 1997

Abstract

Within some classes of nonnormal distributions, we study the asymptotic distribution of the MLE in a covariance structure model based on an incorrect assumption of normality. The asymptotic covariance matrix of the MLE has a form similar to that found when the sampling distribution is elliptical, though the true sampling distribution can have arbitrary marginal skewnesses and kurtoses. The asymptotic covariance of some subsets of the parameter estimators can be obtained by rescaling their normal theory counterparts. Specific models are considered as examples. © 1999 Elsevier Science B.V. All rights reserved.

MSC: 62E20; 62H25

Keywords: MLE; Pseudo-elliptical distribution; Factor analysis

1. Introduction

The asymptotic distribution of the MLE of parameters in a covariance structure model based on a multivariate normality assumption is studied under violation of assumptions, that is, when the true sampling distributions are far from normal. This topic is related to previous work by Browne (1984), Kano (1992), Satorra and Bentler (1988, 1994), and Shapiro and Browne (1987). These authors studied properties of the MLE within the class of elliptical distributions (Fang et al., 1990). We briefly review existing results within the class of elliptical distributions, some of which remain valid in the classes of asymmetric distributions studied in Section 2. Examples are presented in Section 3.

* Corresponding author. Tel.: +1-310-825-2893; fax: +1-310-206-4315; e-mail: [email protected]. This research was supported by grants DA01070 and DA00017 of the National Institute on Drug Abuse at the U.S. National Institutes of Health.

0167-7152/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0167-7152(98)00171-0


K.-H. Yuan, P.M. Bentler / Statistics & Probability Letters 42 (1999) 107 – 113

Let $X$ be a $p$-dimensional random vector with $E(X) = \mu$ and $\mathrm{Cov}(X) = \Sigma$. For a hypothetical structure $\Sigma = \Sigma(\theta_0)$, an estimator $\hat\theta_n$ of $\theta_0$ is classically obtained by minimizing the Wishart likelihood function

$$F_{ML}(\theta) = \mathrm{tr}\{S\Sigma^{-1}(\theta)\} - \log|S\Sigma^{-1}(\theta)| - p, \qquad (1.1)$$

where $S$ is the sample covariance matrix based on a sample of size $N = n + 1$. For a symmetric matrix $B$, let $\mathrm{vech}(B)$ be the vector formed by stacking the columns of $B$ leaving out the elements above the diagonal. Denote $\sigma(\theta) = \mathrm{vech}(\Sigma(\theta))$ and $\Gamma = \mathrm{Cov}[\mathrm{vech}\{(X - \mu)(X - \mu)'\}]$. We are interested in the asymptotic distribution of $\hat\theta_n$. For this, we will assume that regularity conditions, such as those in Yuan and Bentler (1997a), hold for $\Sigma(\theta)$. Let $D_p$ be the duplication matrix as in Magnus and Neudecker (1988, p. 49), $W = 2^{-1} D_p'(\Sigma^{-1} \otimes \Sigma^{-1})D_p$, and $\dot\sigma = \partial\sigma(\theta_0)/\partial\theta'$. Then

$$\sqrt{n}(\hat\theta_n - \theta_0) \stackrel{L}{\to} N(0, \Omega), \qquad (1.2)$$

where $\Omega = \Omega_N \Pi \Omega_N$ with $\Omega_N = (\dot\sigma' W \dot\sigma)^{-1}$ and $\Pi = \dot\sigma' W \Gamma W \dot\sigma$. There are some special cases for which the matrix $\Omega$ can be simplified. When $X \sim N_p(\mu, \Sigma)$, $\Gamma = W^{-1}$, so $\Omega = \Omega_N$ in such a case. When $X$ follows an elliptical distribution with marginal kurtosis $3\kappa$,

$$\Gamma = 2\kappa D_p^+(\Sigma \otimes \Sigma)D_p^{+\prime} + (\kappa - 1)\sigma_0\sigma_0', \qquad (1.3)$$

where $\sigma_0 = \sigma(\theta_0)$. If we further assume that there exist constants $c_1, \ldots, c_q$ such that

$$\sigma = c_1\dot\sigma_1 + \cdots + c_q\dot\sigma_q, \qquad (1.4)$$

then

$$\Omega = \kappa\Omega_N + (\kappa - 1)cc', \qquad (1.5)$$
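As a numerical illustration of the sandwich form in (1.2) (ours, not part of the original derivation; a minimal sketch assuming NumPy), the following builds $D_p$, $W$, and $\Omega$ for the simple structure $\Sigma(\theta) = \mathrm{diag}(\theta)$ and checks that $\Omega = \Omega_N$ when $\Gamma$ is the normal-theory matrix $2D_p^+(\Sigma \otimes \Sigma)D_p^{+\prime} = W^{-1}$:

```python
import numpy as np

def duplication_matrix(p):
    """D_p with vec(B) = D_p vech(B) for symmetric p x p B."""
    D = np.zeros((p * p, p * (p + 1) // 2))
    col = 0
    for j in range(p):            # vech stacks columns, diagonal and below
        for i in range(j, p):
            D[i + j * p, col] = 1.0
            D[j + i * p, col] = 1.0
            col += 1
    return D

def vech(B):
    p = B.shape[0]
    return np.concatenate([B[j:, j] for j in range(p)])

p = 3
Sigma = np.diag([1.0, 2.0, 3.0])
Dp = duplication_matrix(p)
Dp_plus = np.linalg.inv(Dp.T @ Dp) @ Dp.T      # Moore-Penrose inverse of D_p
Si = np.linalg.inv(Sigma)
W = 0.5 * Dp.T @ np.kron(Si, Si) @ Dp

# sigma-dot for Sigma(theta) = diag(theta): column i = vech(e_i e_i')
sdot = np.column_stack([vech(np.outer(e, e)) for e in np.eye(p)])

Omega_N = np.linalg.inv(sdot.T @ W @ sdot)
Gamma = 2 * Dp_plus @ np.kron(Sigma, Sigma) @ Dp_plus.T   # Gamma under normality (= W^{-1})
Omega = Omega_N @ (sdot.T @ W @ Gamma @ W @ sdot) @ Omega_N
```

For diagonal $\Sigma$ the result is $\Omega = \Omega_N = \mathrm{diag}(2\sigma_{ii}^2)$, the familiar normal-theory asymptotic variance of the sample variances.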

where $q$ is the number of unknown parameters, $\dot\sigma_j = \partial\sigma(\theta_0)/\partial\theta_j$, and $c = (c_1, \ldots, c_q)'$. The result in (1.5) has been discussed by Browne (1984), Shapiro and Browne (1987), and Satorra and Bentler (1988, 1994). Let $\hat\theta_n^{(r)} = (\hat\theta_{n1}, \ldots, \hat\theta_{nr})'$ and $\Omega_N^{(r)}$ be the corresponding submatrix of $\Omega_N$; then when $c_1 = \cdots = c_r = 0$,

$$\sqrt{n}(\hat\theta_n^{(r)} - \theta_0^{(r)}) \stackrel{L}{\to} N(0, \kappa\Omega_N^{(r)}), \qquad (1.6)$$

implying that the asymptotic covariance of $\hat\theta_n^{(r)}$ can be obtained by rescaling $\Omega_N^{(r)}$. The $c_1, \ldots, c_q$ can be obtained from the model structure and parameter estimates. So we can get a consistent $\hat\Omega$ if a consistent estimator of $\kappa$ is available. Browne (1984) suggested estimating $\kappa$ by Mardia's (1970) multivariate kurtosis parameter

$$\eta = E\{(X - \mu)'\Sigma^{-1}(X - \mu)\}^2/\{p(p + 2)\}. \qquad (1.7)$$

Since $\eta = \kappa$ within the class of elliptical distributions,

$$\hat\eta = \frac{1}{N}\sum_{i=1}^{N}\{(X_i - \bar X)'S^{-1}(X_i - \bar X)\}^2/\{p(p + 2)\}$$

is a consistent estimate of $\kappa$. Let $p^* = p(p + 1)/2$ and

$$U = W - W\dot\sigma(\dot\sigma' W \dot\sigma)^{-1}\dot\sigma' W. \qquad (1.8)$$

Satorra and Bentler (1994) observed that all of the $p^* - q$ nonzero eigenvalues of $U\Gamma$ equal $\kappa$ if $\Gamma$ is given by (1.3). Thus $\kappa = \mathrm{tr}(U\Gamma)/(p^* - q)$. Let $Y_i = \mathrm{vech}\{(X_i - \bar X)(X_i - \bar X)'\}$ and let $S_Y$ be the corresponding sample covariance matrix of the $Y_i$. Then $\hat\kappa = \mathrm{tr}(\hat U S_Y)/(p^* - q)$ is also consistent for $\kappa$ if $X$ follows an elliptical distribution, where $\hat U$ is obtained by substituting $\hat\theta_n$ for $\theta_0$ in (1.8). Even though $\eta$ and $\kappa$ are equal when $X$ follows an elliptical distribution, $\eta$ and $\kappa$ are generally not equal. We will further discuss this for specific asymmetric distributions in the next section.
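The two estimators $\hat\eta$ and $\hat\kappa$ can be sketched as follows (illustrative code, ours, assuming NumPy; it uses the uncorrelated-variables model of Section 3, for which the MLE is $\hat\Sigma = \mathrm{diag}(S)$). For normal data both estimates should be close to 1:

```python
import numpy as np

def duplication_matrix(p):
    """D_p with vec(B) = D_p vech(B) for symmetric p x p B."""
    D = np.zeros((p * p, p * (p + 1) // 2))
    col = 0
    for j in range(p):
        for i in range(j, p):
            D[i + j * p, col] = 1.0
            D[j + i * p, col] = 1.0
            col += 1
    return D

def vech(B):
    p = B.shape[0]
    return np.concatenate([B[j:, j] for j in range(p)])

rng = np.random.default_rng(0)
p, q, N = 3, 3, 20000
Sigma0 = np.diag([1.0, 2.0, 3.0])
X = rng.multivariate_normal(np.zeros(p), Sigma0, size=N)
Xc = X - X.mean(axis=0)
S = np.cov(X, rowvar=False)

# eta-hat: sample analogue of Mardia's kurtosis parameter in (1.7)
d = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S), Xc)
eta_hat = np.mean(d**2) / (p * (p + 2))

# kappa-hat = tr(U-hat S_Y) / (p* - q) for the model Sigma(theta) = diag(theta)
ps = p * (p + 1) // 2
Dp = duplication_matrix(p)
Sig_hat = np.diag(np.diag(S))                       # MLE of the diagonal model
Si = np.linalg.inv(Sig_hat)
W = 0.5 * Dp.T @ np.kron(Si, Si) @ Dp
sdot = np.column_stack([vech(np.outer(e, e)) for e in np.eye(p)])
U = W - W @ sdot @ np.linalg.inv(sdot.T @ W @ sdot) @ sdot.T @ W
Y = np.array([vech(np.outer(x, x)) for x in Xc])    # Y_i = vech{(X_i - Xbar)(X_i - Xbar)'}
SY = np.cov(Y, rowvar=False)
kappa_hat = np.trace(U @ SY) / (ps - q)
```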


2. Asymptotic covariance of MLE under some nonnormal distributions

Motivated by data sets encountered in practice and the stochastic decomposition of an elliptical distribution (Fang et al., 1990), Yuan and Bentler (1997b, 1999) defined two classes of nonnormal distributions, both of which possess heterogeneous skewnesses and kurtoses. They studied the likelihood ratio statistic within these classes of distributions, but they did not study the asymptotic distributions of $\hat\theta_n$, which will be given below. We briefly restate the two classes of nonnormal distributions and their fourth-order moment matrices. These will then be used for the study of the asymptotic distribution of $\hat\theta_n$. Since the population mean $\mu = E(X)$ is a nuisance parameter and is not involved in covariance structure analysis, without loss of generality we will assume $\mu = 0$ in the rest of this paper.

Let $\xi_1, \ldots, \xi_m$ be independent random variables with $E(\xi_j) = 0$, $E(\xi_j^2) = 1$, $E(\xi_j^3) = \zeta_j$, $E(\xi_j^4) = \kappa_j$, and $\xi = (\xi_1, \ldots, \xi_m)'$. Let $r$ be a random variable which is independent of $\xi$, with $E(r^2) = 1$, $E(r^3) = \gamma$, and $E(r^4) = \beta$. Also, let $m \geq p$ and let $A = (a_{ij}) = (a_1, \ldots, a_m)$ be a $p \times m$ matrix of rank $p$ such that $AA' = \Sigma$. Then the random vector

$$X = rA\xi \qquad (2.1)$$

generally follows an asymmetric distribution with $\mathrm{Cov}(X) = \Sigma$. Actually, the $X$ can have arbitrary marginal skewness and kurtosis by properly selecting $m$, $A$, $\zeta_j$, $\kappa_j$, $\gamma$, and $\beta$. The fourth-order moment matrix $\Gamma = \mathrm{Cov}\{\mathrm{vech}(XX')\}$ corresponding to (2.1) is given by

$$\Gamma = 2\beta D_p^+(\Sigma \otimes \Sigma)D_p^{+\prime} + (\beta - 1)\sigma\sigma' + \beta\sum_{j=1}^{m}(\kappa_j - 3)\,\mathrm{vech}(a_j a_j')\,\mathrm{vech}'(a_j a_j'). \qquad (2.2)$$
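A draw from (2.1) is straightforward to simulate. The sketch below (ours, assuming NumPy; the particular choices of $\xi_j$ and $r$ are illustrative) uses standardized $\chi^2_1$ variables for the $\xi_j$, so the marginals are strongly skewed, and $r = (\chi^2_5/5)^{1/2}$, so $E(r^2) = 1$ and $\beta = 1 + 2/5$; the sample covariance of $X$ still matches $\Sigma = AA'$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, N = 2, 3, 200_000
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])
Sigma = A @ A.T

# xi_j: standardized chi-square(1): mean 0, variance 1, skewness 2*sqrt(2), kurtosis 15
xi = (rng.chisquare(1, size=(N, m)) - 1.0) / np.sqrt(2.0)
# r = sqrt(chi2_5 / 5), independent of xi: E(r^2) = 1, beta = E(r^4) = 1.4
r = np.sqrt(rng.chisquare(5, size=N) / 5.0)
X = r[:, None] * (xi @ A.T)          # X = r A xi, row by row

S = np.cov(X, rowvar=False)          # close to Sigma despite the nonnormality
x1 = X[:, 0] - X[:, 0].mean()
skew1 = np.mean(x1**3) / np.mean(x1**2) ** 1.5   # clearly nonzero marginal skewness
```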

When all the $\kappa_j$ equal 3, the $\Gamma$ in (2.2) reduces to that corresponding to an elliptical distribution in (1.3). Yuan and Bentler (1997b, 1999) called the corresponding distribution of $X$ in (2.1) a pseudo-elliptical distribution, since the distribution of $X$ in (2.1) is no longer symmetric. When $\beta = 1$ in addition to $\kappa_j = 3$, the corresponding distribution of $X$ in (2.1) was called a pseudo-normal distribution. It was also noted that, for a given matrix $A$, different marginal skewnesses will not influence the matrix $\Gamma$ in (2.2).

Let $\xi_1, \ldots, \xi_m$ be independent random variables with $E(\xi_j) = 0$, $E(\xi_j^2) = 1$, $E(\xi_j^3) = \zeta_j$, $E(\xi_j^4) = 3$, and $\xi = (\xi_1, \ldots, \xi_m)'$. So there is no excess kurtosis in the distribution of each $\xi_j$. Let $r_1, \ldots, r_p$ be independent random variables with $E(r_i) = \tau$, $E(r_i^2) = 1$, $E(r_i^3) = \gamma_i$, $E(r_i^4) = \beta_i$, $R = \mathrm{diag}(r_1, \ldots, r_p)$, and let $R$ and $\xi$ be independent. Let $A = (a_{ij})$ be a $p \times m$ nonstochastic matrix of rank $p$ with $AA' = G = (g_{ij})$. Another class of nonelliptical random vectors is given by

$$X = RA\xi. \qquad (2.3)$$

It is obvious that

$$\Sigma = E(XX') = \tau^2 G + (1 - \tau^2)D_G, \qquad (2.4)$$

where $D_G = \mathrm{diag}(G)$, and $\tau$ cannot be too small in order for $\Sigma$ to be positive definite. Similar to the $X$ in (2.1), the $X$ in (2.3) can also have arbitrary marginal skewness and kurtosis. The matrix $\Gamma = \mathrm{Cov}\{\mathrm{vech}(XX')\}$ is much more complicated, and so will be the corresponding matrix $\Omega$. We only give the fourth-order moment matrix when $\Sigma = \mathrm{diag}(\sigma_{11}, \ldots, \sigma_{pp})$, that is,

$$\Gamma = 2D_p^+(\Sigma \otimes \Sigma)D_p^{+\prime} + 3\sum_{i=1}^{p}(\beta_i - 1)\sigma_{ii}^2\,\mathrm{vech}(e_i e_i')\,\mathrm{vech}'(e_i e_i'), \qquad (2.5)$$

where $e_i$ is the $i$th unit vector of dimension $p$. When $\beta_i = 1$, (2.5) corresponds to the fourth-order moment matrix of a pseudo-normal distribution. Similarly, the different marginal skewnesses of $X$ in (2.3) will not affect $\Gamma$ in (2.5).
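The second class is just as easy to simulate. In this sketch (ours, assuming NumPy; $m = p$ for simplicity) the $r_i$ are random signs with $P(r_i = 1) = (1 + \tau)/2$, so $E(r_i) = \tau$ and $E(r_i^2) = E(r_i^4) = 1$, and the $\xi_j$ are standard normal, whose fourth moment is exactly 3; the sample covariance then reproduces (2.4):

```python
import numpy as np

rng = np.random.default_rng(2)
p, N = 2, 200_000
A = np.array([[1.0, 0.3],
              [0.3, 1.0]])
G = A @ A.T
tau = 0.6

# r_i = +/-1 with P(+1) = (1 + tau)/2: E(r_i) = tau, E(r_i^2) = E(r_i^4) = 1
R = np.where(rng.random((N, p)) < (1.0 + tau) / 2.0, 1.0, -1.0)
xi = rng.standard_normal((N, p))     # E(xi_j^4) = 3: no excess kurtosis
X = R * (xi @ A.T)                   # X = R A xi, row by row

Sigma = tau**2 * G + (1.0 - tau**2) * np.diag(np.diag(G))   # formula (2.4)
S = np.cov(X, rowvar=False)
```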


We will first study the asymptotic distribution of $\hat\theta_n$ when sampling from (2.1). We need to assume that, for $j = 1, \ldots, m$, there exist constants $c_1^{(j)}, \ldots, c_q^{(j)}$ such that

$$a_j a_j' = c_1^{(j)}\dot\Sigma_1 + \cdots + c_q^{(j)}\dot\Sigma_q, \qquad (2.6)$$

where $\dot\Sigma_j = \partial\Sigma(\theta_0)/\partial\theta_j$, so that $\mathrm{vech}(\dot\Sigma_j) = \dot\sigma_j$. Note that (1.4) is implied by (2.6), which depends on the data generation mechanism and the model structure. We will discuss this for specific models in the next section. For a sample from (2.1), the asymptotic distribution of $\hat\theta_n$ is given in the following theorem.

Theorem 2.1. Let $X$ be as in (2.1); then under condition (2.6) the asymptotic covariance of $\hat\theta_n$ is given by

$$\Omega = \beta\Omega_N + (\beta - 1)cc' + \beta\sum_{j=1}^{m}(\kappa_j - 3)c^{(j)}c^{(j)\prime}, \qquad (2.7)$$

where $c = (c_1, \ldots, c_q)'$ and $c^{(j)} = (c_1^{(j)}, \ldots, c_q^{(j)})'$.

Proof. Let $\Gamma_1 = 2D_p^+(\Sigma \otimes \Sigma)D_p^{+\prime}$, $\Gamma_2 = \sigma\sigma'$, and $\Gamma_{3j} = \mathrm{vech}(a_j a_j')\,\mathrm{vech}'(a_j a_j')$. Then

$$\Pi_1 = \dot\sigma' W \Gamma_1 W \dot\sigma = \Omega_N^{-1}. \qquad (2.8)$$

It follows from (1.4) and (2.6) that $\sigma = \dot\sigma c$ and $\mathrm{vech}(a_j a_j') = \dot\sigma c^{(j)}$, which leads to

$$\Pi_2 = \dot\sigma' W \Gamma_2 W \dot\sigma = \Omega_N^{-1}cc'\Omega_N^{-1} \qquad (2.9)$$

and

$$\Pi_{3j} = \dot\sigma' W \Gamma_{3j} W \dot\sigma = \Omega_N^{-1}c^{(j)}c^{(j)\prime}\Omega_N^{-1}. \qquad (2.10)$$

The theorem follows from (1.2), (2.2), and (2.8)–(2.10).

The matrix $\Omega$ in (2.7) is very similar to the one in (1.5). When $\kappa_j = 3$, condition (2.6) can be replaced by condition (1.4) in Theorem 2.1, and (2.7) reduces to (1.5). So the $\Omega$ corresponding to a pseudo-elliptical distribution is exactly the same as that corresponding to an elliptical distribution. When $\beta = 1$, $\Omega = \Omega_N$, indicating that the normal theory method can be used for a skewed data set sampled from a pseudo-normal distribution. If the model hypothesis implies that $c_1 = \cdots = c_r = 0$ and $c_1^{(j)} = \cdots = c_r^{(j)} = 0$, the upper left submatrix of $\Omega$ in (2.7) reduces to $\beta\Omega_N^{(r)}$. This is also the same as when the sample is from an elliptical distribution. We need a consistent estimator of $\beta$ in order to rescale $\Omega_N^{(r)}$. The following lemma is from Yuan and Bentler (1999).

Lemma 2.1. Let $X$ be given as in (2.1); then under (2.6) the scaling factors are given respectively by $\kappa = \beta$ and

$$\eta = \beta + \beta\sum_{i=1}^{m}(\kappa_i - 3)(a_i'\Sigma^{-1}a_i)^2\big/\{p(p + 2)\}. \qquad (2.11)$$

Under condition (2.6), $\hat\kappa$ is always consistent for $\beta$ if sampling from (2.1). On the other hand, $\hat\eta$ is generally not consistent for $\beta$. When $\kappa_j = 3$ and $X$ follows a pseudo-elliptical distribution, $\hat\eta$ is also a consistent estimator for $\beta$. So the scaling factor that originated from an elliptical distribution remains valid in the class of pseudo-elliptical distributions. But $\hat\kappa$ is consistent in a much larger class of distributions. Interestingly, when $\beta = 1$, i.e. $P(r^2 = 1) = 1$, the $X$ in (2.1) can be far from normal, but the normal theory standard error is still valid for $\hat\theta_n^{(r)}$ in some special cases. This is essentially the result obtained by Anderson and Amemiya (1988).
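The two population scaling factors in Lemma 2.1 are simple to evaluate for a concrete construction (2.1). The sketch below (ours, assuming NumPy; all numerical values are made up) takes $m = p$ with $A$ invertible, so $a_i'\Sigma^{-1}a_i = 1$ for every $i$, with $\beta = 1.4$ (e.g. $r^2 \sim \chi^2_5/5$) and fourth moments $\kappa_1 = 15$, $\kappa_2 = 5$ for the $\xi_j$:

```python
import numpy as np

p = 2
A = np.array([[1.0, 0.0],
              [0.5, 1.0]])           # m = p and A invertible
Sigma = A @ A.T
Si = np.linalg.inv(Sigma)
beta = 1.4                           # E(r^4)
kappas = np.array([15.0, 5.0])       # E(xi_j^4)

kappa = beta                         # first scaling factor in Lemma 2.1
quad = np.array([A[:, j] @ Si @ A[:, j] for j in range(A.shape[1])])
eta = beta + beta * np.sum((kappas - 3.0) * quad**2) / (p * (p + 2))
# quad = (1, 1) since A' Sigma^{-1} A = I, so eta = 1.4 * (1 + 14/8) = 3.85
```

Here $\eta \neq \kappa$, illustrating that the Mardia-based factor $\hat\eta$ estimates the wrong quantity outside the pseudo-elliptical subclass.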


Now we turn to the situation of sampling from (2.3) with $\Sigma = \mathrm{diag}(\sigma_{11}, \ldots, \sigma_{pp})$. The following condition is needed: there exist constants $c_1^{(i)}, \ldots, c_q^{(i)}$ such that

$$e_i e_i' = c_1^{(i)}\dot\Sigma_1 + \cdots + c_q^{(i)}\dot\Sigma_q \qquad (2.12)$$

for $i = 1, \ldots, p$. Comparing (2.5) with (2.2), we have the following result.

Theorem 2.2. Let $X$ be as in (2.3) with $G = AA'$ being a diagonal matrix; then under (2.12) the $\Omega$ in (1.2) is given by

$$\Omega = \Omega_N + 3\sum_{i=1}^{p}(\beta_i - 1)\sigma_{ii}^2 c^{(i)}c^{(i)\prime}, \qquad (2.13)$$

where $c^{(i)} = (c_1^{(i)}, \ldots, c_q^{(i)})'$.

Even when $\Sigma$ is diagonal, the marginals $x_i$ of $X$ may not be independent. Notice that $3\beta_i$ is the marginal kurtosis. When $\beta_i = 1$, the $\Omega$ in (2.13) reduces to $\Omega_N$, and the normal theory standard error will give correct inference. A rescaling factor not approaching 1 cannot be used. The following lemma is from Yuan and Bentler (1999).

Lemma 2.2. Let $X$ be given as in (2.3); then under (2.12) the two scaling factors are $\kappa = 1$ and

$$\eta = 1 + \frac{3}{p(p + 2)}\sum_{i=1}^{p}(\beta_i - 1). \qquad (2.14)$$

Obviously, $\eta = 1$ when $\beta_i = 1$; this corresponds to a pseudo-normal distribution. Generally, $\hat\eta$ is not a consistent estimator of $\kappa$, while $\hat\kappa$ is always consistent under (2.12). Kano (1992) gave another correction factor which is also consistent in this case and in the case of a pseudo-elliptical distribution.

3. Examples

We consider two models: one is a confirmatory factor model, which is commonly used in psychology, education, and the social sciences; the other is the classical uncorrelated variables model. Here we consider the form of the matrix $\Omega$ in (1.2) corresponding to each model and each distribution. A confirmatory factor model is given by

$$X = \Lambda f + \varepsilon, \qquad (3.1)$$

where $\Lambda$ is the factor loading matrix, $f = (f_1, \ldots, f_s)'$ is a vector of common factors with $\mathrm{Cov}(f) = \Phi$, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_p)'$ is a vector of unique factors or errors with $\mathrm{Cov}(\varepsilon) = \Psi$, and $f$ and $\varepsilon$ are assumed uncorrelated. This leads to a covariance structure

$$\Sigma = \Lambda\Phi\Lambda' + \Psi. \qquad (3.2)$$

One popular structure for $\Lambda$ is

$$\Lambda = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_s \end{pmatrix},$$

where $\lambda_j = (\lambda_{1j}, \ldots, \lambda_{s_j j})'$; that is, each observed variable depends on only one common factor. $\Phi$ is an $s \times s$ symmetric matrix, and $\Psi = \mathrm{diag}(\psi_1, \ldots, \psi_p)$ is a diagonal matrix. In order for model (3.2) to be identifiable,
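For concreteness, the block-diagonal loading structure can be assembled numerically as follows (a sketch with made-up parameter values, ours, assuming NumPy; $p = 6$, $s = 2$, and the last loading of each factor is fixed at 1 to set its scale, as discussed below):

```python
import numpy as np

lam1 = np.array([0.8, 0.7, 1.0])     # lambda_1, last element fixed at 1
lam2 = np.array([0.9, 0.6, 1.0])     # lambda_2, last element fixed at 1
s1 = len(lam1)
Lambda = np.zeros((s1 + len(lam2), 2))
Lambda[:s1, 0] = lam1                # each variable loads on exactly one factor
Lambda[s1:, 1] = lam2

Phi = np.array([[1.2, 0.3],          # Cov(f), symmetric
                [0.3, 0.9]])
Psi = np.diag([0.5, 0.6, 0.4, 0.5, 0.7, 0.3])

Sigma = Lambda @ Phi @ Lambda.T + Psi        # structure (3.2)
# e.g. Sigma[0, 1] = lam1[0] * lam1[1] * Phi[0, 0]
```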


it is necessary to fix the scale of each factor $f_j$. This can be obtained by fixing the last element $\lambda_{s_j j}$ in each $\lambda_j$. Under these conditions, we have

$$\dot\Sigma_{\lambda_{ij}} = (0_{p\times(j-1)}, e_k, 0_{p\times(s-j)})\Phi\Lambda' + \Lambda\Phi(0_{p\times(j-1)}, e_k, 0_{p\times(s-j)})' \qquad (3.3a)$$

for $i = 1, \ldots, s_j - 1$, $j = 1, \ldots, s$, where $e_k$ is a $p$-dimensional unit vector with $k = \sum_{l=1}^{j-1} s_l + i$ and $0_{p\times(j-1)}$ is a matrix of 0's;

$$\dot\Sigma_{\phi_{ij}} = \Lambda(e_i e_j' + e_j e_i')\Lambda', \quad \dot\Sigma_{\phi_{ii}} = \Lambda e_i e_i'\Lambda', \qquad i, j = 1, \ldots, s, \qquad (3.3b)$$

where $e_i$ and $e_j$ are of dimension $s$; and

$$\dot\Sigma_{\psi_i} = e_i e_i', \qquad i = 1, \ldots, p, \qquad (3.3c)$$

where $e_i$ is of dimension $p$.

The uncorrelated variables model is

$$\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_p) \qquad (3.4)$$

and

$$\dot\Sigma_i = e_i e_i', \qquad i = 1, \ldots, p. \qquad (3.5)$$

When sampling from (2.1) with the matrix $A$ given by

$$A = (a_1, \ldots, a_s, a_{s+1}, \ldots, a_{s+p}) = (\Lambda\Phi^{1/2}, \Psi^{1/2}) \qquad (3.6)$$

in the confirmatory factor model, it is obvious that the $a_i a_i'$ satisfy (2.6). In particular,

$$\Sigma = \sum_{i \leq j} \phi_{ij}\dot\Sigma_{\phi_{ij}} + \sum_{i=1}^{p} \psi_i\dot\Sigma_{\psi_i},$$

so $c = (0_{p-s}', \mathrm{vech}'(\Phi), \psi_1, \ldots, \psi_p)'$, where $0_{p-s}$ is a vector of $p - s$ zeros. Let $B = (b_{ij}) = (b_1, \ldots, b_s) = \Phi^{1/2}$; then for $i = 1, \ldots, s$,

$$a_i a_i' = \Lambda b_i b_i'\Lambda' = \sum_{l \leq j} b_{ji}b_{li}\dot\Sigma_{\phi_{lj}},$$

so $c^{(i)} = (0_{p-s}', \mathrm{vech}'(b_i b_i'), 0_p')'$ for $i = 1, \ldots, s$. For $i = s + j$,

$$a_i a_i' = \psi_j e_j e_j' = \psi_j\dot\Sigma_{\psi_j},$$

so $c^{(s+j)} = (0_{p-s+s^*}', 0_{j-1}', \psi_j, 0_{p-j}')'$ for $j = 1, \ldots, p$, where $s^* = s(s+1)/2$. Since $c_1 = \cdots = c_{p-s} = 0$ and $c_1^{(i)} = \cdots = c_{p-s}^{(i)} = 0$, the estimators of the $r = p - s$ free factor loadings have a distribution of the form (1.6). If $A$ is different from that in (3.6), e.g., $A = \Sigma^{1/2}$, then (2.6) may not be met, and we cannot have the form (2.7) for the asymptotic covariance of $\sqrt{n}(\hat\theta_n - \theta_0)$.

For the uncorrelated variables model, if we choose $A = (a_1, \ldots, a_p)$ with $a_i = \sigma_i^{1/2}e_i$, then (1.4) and (2.6) are obviously met. So the estimator $\hat\theta_n$ follows a distribution of the form (2.7). No subset of the parameter estimators has an asymptotic covariance as in (1.6). Other forms of $A$ may not satisfy (2.6), and no simple form as in (2.7) for $\Omega$ can be obtained.

For a sample generated from (2.3), it is obvious that (2.12) is satisfied for the uncorrelated variables model with any matrix $A$ in (2.3) such that $AA'$ is diagonal. The asymptotic covariance of $\hat\theta_n$ is then as in (2.13), and no subset of the parameter estimators possesses an asymptotic distribution as in (1.6).


References

Anderson, T.W., Amemiya, Y., 1988. The asymptotic normal distribution of estimators in factor analysis under general conditions. Ann. Statist. 16, 759–771.
Browne, M.W., 1984. Asymptotically distribution-free methods for the analysis of covariance structures. British J. Math. Statist. Psychol. 37, 62–83.
Fang, K.-T., Kotz, S., Ng, K.W., 1990. Symmetric Multivariate and Related Distributions. Chapman & Hall, London.
Kano, Y., 1992. Robust statistics for test-of-independence and related structural models. Statist. Probab. Lett. 15, 21–26.
Magnus, J.R., Neudecker, H., 1988. Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, New York.
Mardia, K.V., 1970. Measures of multivariate skewness and kurtosis with applications. Biometrika 57, 519–530.
Satorra, A., Bentler, P.M., 1988. Scaling corrections for chi-square statistics in covariance structure analysis. In: Proc. Business and Economic Statistics Section, American Statistical Association, Alexandria, VA, pp. 308–313.
Satorra, A., Bentler, P.M., 1994. Corrections to test statistics and standard errors in covariance structure analysis. In: von Eye, A., Clogg, C.C. (Eds.), Latent Variables Analysis: Applications for Developmental Research. Sage, Thousand Oaks, CA, pp. 399–419.
Shapiro, A., Browne, M., 1987. Analysis of covariance structures under elliptical distributions. J. Amer. Statist. Assoc. 82, 1092–1097.
Yuan, K.-H., Bentler, P.M., 1997a. Mean and covariance structure analysis: theoretical and practical improvements. J. Amer. Statist. Assoc. 92, 767–774.
Yuan, K.-H., Bentler, P.M., 1997b. Generating multivariate distributions with specified marginal skewness and kurtosis. In: Bandilla, W., Faulbaum, F. (Eds.), SoftStat'97: Advances in Statistical Software 6. Lucius & Lucius, Stuttgart, pp. 385–391.
Yuan, K.-H., Bentler, P.M., 1999. On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statist. Sinica, to appear.