# The likelihood ratio test for a separable covariance matrix

## The likelihood ratio test for a separable covariance matrix

ARTICLE IN PRESS Statistics & Probability Letters 73 (2005) 449–457 www.elsevier.com/locate/stapro The likelihood ratio test for a separable covaria...
ARTICLE IN PRESS

Statistics & Probability Letters 73 (2005) 449–457 www.elsevier.com/locate/stapro

The likelihood ratio test for a separable covariance matrix Nelson Lua,1, Dale L. Zimmermanb, b

a Wyeth Research, Pearl River, NY 10965, USA Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, USA

Received 4 November 2004; received in revised form 11 March 2005 Available online 10 May 2005

Abstract We consider the problem of testing whether a covariance matrix has a separable (Kronecker product) structure. Such structure is of particular interest when the observed variables can be cross-classiﬁed by two factors, as occurs for example when comparable or identical characteristics are measured on several parts of each subject. We derive the likelihood ratio test for separability on the basis of a random sample from a multivariate normal population, and we establish an invariance property of the test statistic that allows us to table its null distribution. An example illustrates the methodology. r 2005 Elsevier B.V. All rights reserved. Keywords: Kronecker product; Likelihood ratio test; Separability

1. Introduction This article is concerned with inference on the covariance structure of a particular type of multivariate data for which the variables, not the subjects, can be cross-classiﬁed by two factors. Such data occur in several important sampling situations, such as when comparable or identical characteristics are measured on several parts of each subject. An example we consider herein involves length and width measurements on the sepals and petals of a sample of iris ﬂowers. Another situation giving rise to crossed variables is that of a multivariate longitudinal study; in Corresponding author. Tel.: +1 319 335 0818; fax: +1 319 335 3017. 1

ARTICLE IN PRESS N. Lu, D.L. Zimmerman / Statistics & Probability Letters 73 (2005) 449–457

450

this situation, ‘‘characteristics’’ (the collection of variables measured on each subject at a single time) is one factor and ‘‘time’’ the other. Suppose that Y1 ; . . . ; YN represent N continuous multivariate observations of this type, each of dimensions mn  1, where m and n are the numbers of levels of the two crossed factors. We assume throughout that these observations are a random sample from a multivariate normal population with unknown mean vector l and unknown positive deﬁnite covariance matrix R ¼ ðsij Þ. A classical multivariate approach to the modeling and analysis of these data would take R to be unstructured, yielding maximum-likelihood estimates (mles) l^ ¼ Y ¼

N 1X Yt N t¼1

N X b¼ 1 and R ðYt  YÞðYt  YÞ0 , N t¼1

(1)

provided that N4mn. However, for reasons of parsimony or more efﬁcient mean estimation, or b to be positive deﬁnite almost surely, it because the sample size may be insufﬁciently large for R may be desirable or necessary to assume that R has a more structured form (Wolﬁnger, 1996). In the context of crossed variables, a natural structure to consider is the Kronecker product structure, i.e., R ¼ R1  R2 for two matrices R1 and R2 . This structure separates, in a sense, the covariance structure of all the variables into covariance structures attributable to each of the two factors, hence a covariance matrix with this structure is commonly said to be separable. More precisely, we say that R is ðm; nÞ-separable if R ¼ R1  R2 where R1 is m  m and positive deﬁnite, and R2 is n  n and positive deﬁnite. Separability imposes a number of constraints on the variances of, and correlations among, the observed variables. Consider ﬁrst the simplest case, m ¼ n ¼ 2. Writing  R1 ¼

a

b

b

c

 and R2 ¼

a

b

b

g

!

and letting q represent the correlation matrix corresponding to R ¼ R1  R2 , we have 0 B B R¼B B @

aa

ab

ba

ag

bb ca

1

0

C bg C C cb C A cg

B B and q ¼ ðrij Þ ¼ B @

bb

pﬃﬃﬃﬃﬃ 1 b= ag 1

pﬃﬃﬃﬃﬃ pﬃﬃﬃﬃﬃﬃﬃﬃﬃ 1 b= ac bb= acag pﬃﬃﬃﬃﬃ pﬃﬃﬃﬃﬃﬃﬃﬃﬃ bb= acag b= ac C C pﬃﬃﬃﬃﬃ C. 1 b= ag A 1

Thus, (2,2)-separability imposes ﬁve non-redundant constraints on the variances and correlations, which can be expressed as follows: s11 s22 ¼ ; s33 s44

r12 ¼ r34 ;

r13 ¼ r24 ;

r23 ¼ r14 ;

r14 ¼ r12 r13 .

(2)

More generally, it can be shown that ðm; nÞ-separability imposes the following ﬁve non-redundant sets of constraints on variances and correlations, where we write r½ij ½kl for the ðk; lÞth element of

ARTICLE IN PRESS N. Lu, D.L. Zimmerman / Statistics & Probability Letters 73 (2005) 449–457

451

the ði; jÞth n  n block of q: s11 s22 snn ¼ ¼ ¼ sinþ1;inþ1 sinþ2;inþ2 sinþn;inþn

ði ¼ 1; . . . ; m  1Þ,

ð3Þ

r½ii ½kl ¼ r½11 ½kl

ði ¼ 2; . . . ; m; k ¼ 1; . . . ; n; l ¼ k þ 1; . . . ; nÞ,

ð4Þ

r½ij ½kk ¼ r½ij ½11

ði ¼ 1; . . . ; m; j ¼ i þ 1; . . . ; m; k ¼ 2; . . . ; nÞ,

ð5Þ

r½ij ½kl ¼ r½ij ½lk

ði ¼ 1; . . . ; m; j ¼ i þ 1; . . . ; m; k ¼ 1; . . . ; n; l ¼ k þ 1; . . . ; nÞ;

ð6Þ

r½ij ½kl ¼ r½ij ½11 r½11 ½kl

ði ¼ 1; . . . ; m; j ¼ i þ 1; . . . ; m; k ¼ 1; . . . ; n; l ¼ k þ 1; . . . ; nÞ.

ð7Þ

Of course, the constraints given by (3)–(7) could, in principle, be satisﬁed by an arbitrary set of mn variables, but they have much more meaningful interpretations when the variables are crossed (by two factors). For such variables, (3) is equivalent to the property that variances of variables within one level of a factor are proportional to the variances of variables within any other level of the same factor. Constraints (4) and (5) say that the correlation structure within one level of either factor is identical to the correlation structure within any other level of the same factor. Constraint (6) says that the two cross-correlations corresponding to any two distinct levels of the ﬁrst factor and any two distinct levels of the second factor are equal, and (7) adds that each cross-correlation factors into the product of the corresponding within-factor correlations. This article presents methodology for testing for ðm; nÞ-separability on the basis of a random sample from a multivariate normal population. The remainder of the article is organized as follows. Section 2 gives a brief review of an existing algorithm for obtaining the mle of R when it is ðm; nÞ-separable. In Section 3, we derive the likelihood ratio test statistic, establish its limiting distribution and some other properties, and obtain its ﬁnite-sample distribution empirically. An example is presented in Section 4. Some discussion in Section 5 completes the article. 2. An algorithm for maximum-likelihood estimation The likelihood ratio test for ðm; nÞ-separability to be derived in the next section requires that we be able to obtain the mle of a ðm; nÞ-separable covariance matrix. Although no explicit form exists for the mle, it can be obtained by an efﬁcient ‘‘ﬂip–ﬂop’’ algorithm that we now describe. This algorithm, which was derived, apparently independently, by Mardia and Goodall (1993), Dutilleul (1999), and Brown et al. (2001), is much faster than a more general-purpose optimization algorithm such as Newton–Raphson; for example, the ﬂip–ﬂop algorithm is about twice as fast when m ¼ n ¼ 2, 50 times faster when m ¼ n ¼ 3, and 5000 times faster when m ¼ n ¼ 6 (Lu and Zimmerman, 2004). Under our normality assumption and using well-known properties of the determinant and inverse of a nonsingular Kronecker product, the log-likelihood function is given by Nmn Nn Nm log 2p  log jR1 j  log jR2 j 2 2 2 N 1X 1 ðYt  lÞ0 ðR1  1  R2 ÞðYt  lÞ. 2 t¼1

log Lðl; R1  R2 Þ ¼ 

ARTICLE IN PRESS 452

N. Lu, D.L. Zimmerman / Statistics & Probability Letters 73 (2005) 449–457

By essentially the same arguments invoked in the classical case of an unstructured covariance matrix (see e.g. Anderson, 1984, Section 3.2), it can be shown that for any ﬁxed positive deﬁnite R1  R2 , log Lðl; R1  R2 Þ is maximized with respect to l by Y, and it sufﬁces to maximize the resulting proﬁle log-likelihood function Nmn Nn Nm log 2p  log jR1 j  log jR2 j 2 2 2 N 1X 1 ðYt  YÞ0 ðR1  1  R2 ÞðYt  YÞ. 2 t¼1

log LðY; R1  R2 Þ ¼ 

ð8Þ

Provided that ðN  1ÞnXm, it is easily shown that for any ﬁxed positive deﬁnite R2 , (8) is maximized with respect to positive deﬁnite R1 by N X n X n X e1 ¼ 1 R sð2Þ ðYtv  Yv ÞðYtu  Yu Þ0 , Nn t¼1 u¼1 v¼1 uv

(9)

where sð2Þ element of R1 2 , Ytu is the uth column of the m  n array of observations uv is the ðu; vÞth P N for subject t, and Yu ¼ t¼1 Ytu =N. Similarly, provided that ðN  1ÞmXn, it can be shown that for any ﬁxed positive deﬁnite R1 , log LðY; R1  R2 Þ is maximized with respect to positive deﬁnite R2 by N X m X m X   e2 ¼ 1 R sð1Þ ðY  Yv ÞðYtu  Yu Þ0 , Nm t¼1 u¼1 v¼1 uv tv

(10)

0

 where sð1Þ of R1 1 , Ytu is the uth row of the m  n array of observations for uv is the ðu; vÞth PN element  subject t, and Yu ¼ t¼1 Ytu =N. Expressions (9) and (10) indicate that log LðY; R1  R2 Þ can be maximized using an iterative e 2 . To begin this process, an e 1 and R ‘‘ﬂip–ﬂop’’ scheme, alternating between the computation of R initial value of R2 must be supplied; one obvious possibility is In . The process may be continued ðkÞ until a convergence criterion is satisﬁed. Letting RðkÞ 1 and R2 be the values produced at iteration k of (9) and (10), and letting kc be the iteration at which convergence is deemed to occur, we put

e ðkc Þ  R e ðkc Þ .  R2 ¼ R R1d 1 2

(11)

Provided that ðN  1ÞmXn and ðN  1ÞnXm, or equivalently that NX maxðmn ; mnÞ þ 1, every  R2 . Note that if m ¼ n then every iterate is positive deﬁnite with probability one, hence so is R1d iterate is positive deﬁnite even when the sample size is as small as two. By construction, values of the log-likelihood corresponding to successive iterates of the ﬂip–ﬂop b algorithm cannot decrease and, provided that N4mn, they are bounded above by log LðY; RÞ b where R was given in (1). Consequently, the algorithm is guaranteed to converge, at least when N4mn. But does it converge to a mle? Obtaining an answer to this question analytically is difﬁcult because the parameter space of ðm; nÞ-separable covariance matrices is not convex. Lu and Zimmerman (2004) demonstrate that when N ¼ 2 or N ¼ 3, the algorithm can converge to many different estimates, depending on the starting value, but the likelihood function at each of these estimates is identical. When NX4, however, it appears that the algorithm always converges to the same estimate regardless of starting value. Thus, for situations with samples of a size

ARTICLE IN PRESS N. Lu, D.L. Zimmerman / Statistics & Probability Letters 73 (2005) 449–457

453

commonly encountered in practice, the limit point of the ﬂip–ﬂop algorithm can safely be regarded as the unique mle.

3. The likelihood ratio test for separability Consider now the likelihood ratio test for ðm; nÞ-separability, i.e., the likelihood ratio test of H0 : R ¼ R1  R2

against HA : not H0 ,

(12)

where R1 and R2 are unspeciﬁed m  m and n  n positive deﬁnite matrices, respectively, but m and n are speciﬁed. We should note that several authors have previously considered likelihood ratio tests for hypotheses that superﬁcially resemble (12) but differ by specifying more assumptions on either R1 or R2 or both. For example, Rogers and Young (1975) test for the Kronecker product structure of (12) with R1 known, and Naik and Rao (2001) also test for (12) but assume that R1 has Huynh–Feldt structure, i.e. that R1 ¼ s2 ðI þ a10 þ 1a0 Þ where s2 40 and a is an unspeciﬁed m-vector. First observe that ðm; nÞ-separability and ðn; mÞ-separability are not equivalent when man; however, it can be readily demonstrated that ðm; nÞ-separability of varðYÞ is equivalent to ðn; mÞseparability of varðPYÞ where P is an appropriately chosen row-permutation matrix. Hence without loss of generality, we may always assume that mpn. We have that sup log Lðl; R; Y1 ; . . . ; YN Þ ¼  H0

Nmn N  R2 j log 2p  log jR1d 2 2

N 1X  ðYt  YÞ0 ðR1d  R2 Þ1 ðYt  YÞ 2 t¼1

Nmn N Nmn  R2 j  log 2p  log jR1d , 2 2 2 where R1d  R2 was given by (11) and where the second equality follows from Theorem 16.2.2 of Harville (1997). Using (1) we obtain ¼ 

sup log Lðl; R; Y1 ; . . . ; YN Þ ¼  H0 [HA

Nmn N b  Nmn log 2p  log jRj 2 2 2

and thus minus twice the log of the likelihood ratio test statistic is given by   b . T  N log jR1d  R2 j  log jRj By standard theory, T converges in distribution as N ! 1 to a chi-square random variable with mnðmn þ 1Þ nðn þ 1Þ mðm þ 1Þ   þ1 2 2 2 degrees of freedom. An important property of T is that it is invariant with respect to nonsingular linear transformations of the observations, as we now demonstrate. Let C be any mn  mn nonsingular n

ARTICLE IN PRESS 454

N. Lu, D.L. Zimmerman / Statistics & Probability Letters 73 (2005) 449–457

matrix and put Zt ¼ CYt for t ¼ 1; . . . ; N. Then lZ  EðZt Þ ¼ Cl and RZ  varðZt Þ ¼ CRC0 , and hence jRZ j ¼ jCj2 jRj: From this we obtain log LðlZ ; RZ ; Z1 ; . . . ; ZN Þ ¼ N log jCj þ log Lðl; R; Y1 ; . . . ; YN Þ, by which it follows that the likelihood ratio test statistic based on the transformed observations is equal to that based on the original observations. To investigate how well the ﬁnite-sample null distribution of T is approximated by the limiting chi-square distribution, we obtained the ﬁnite-sample null distribution via simulation. Several analytical results streamline this task. First and foremost, the invariance result noted above 1=2 1=2  R2 ) that the ﬁnite-sample null distribution of T does not implies (by considering C ¼ R1 depend on the actual values of R1 and R2 (or l). Therefore, for simplicity we may as well sample observations from a distribution for which R1 ¼ Im and R2 ¼ In (in which case R1  R2 ¼ Imn Þ and l ¼ 0, i.e. from N mn ð0; IÞ. Second, it P can be seen from (1), (9), and (10) that T depends on the observations only through 0 the value of N t¼1 ðYt  YÞðYt  YÞ , which under our N mn ð0; IÞ distribution for the observations has a Wishart distribution with N  1 degrees of freedom and covariance parameter Imn . Thus, it sufﬁces to draw samples from this Wishart distribution, which we accomplished using essentially the method of Smith and Hocking (1972). This method consists of ﬁrst constructing a lower triangular matrix T ¼ ðtij Þ for which the tij ’s are i.i.d. Nð0; 1Þ for i4j, t2ii has a w2 distribution with N  i degrees of freedom ðiP¼ 1; . . . ; mnÞ, and all nonnull tij ’s are independent. Next, TT0 is N 0 inP (1), and the appropriate submatrices of TT0 formed and used in place PNof t¼1 ðYt  YÞðYt  YÞ   0 N 0   are used in place of t¼1 ðYtv  Yv ÞðYtu  Yu Þ and t¼1 ðYtv  Yv ÞðYtu  Yu Þ in (9) and (10). This approach is more time-efﬁcient than sampling N times independently from N mn ð0; IÞ, especially when N is large; e.g., when m ¼ n ¼ 4 and N ¼ 400 we found it to be about 30 times faster. Table 1 displays estimated 90th, 95th, and 99th percentiles of the empirical null distribution of T for selected values of m, n, and N (with mpn). Standard approximate 95% nonparametric conﬁdence intervals (not shown) for the corresponding true percentiles indicated that the relative difference between the estimated and true 90th percentiles ranges from o0:1% when m ¼ n ¼ 8 to o1:0% when m ¼ n ¼ 2. Likewise, the relative difference between the estimated and true 99th percentiles ranges from o0:2% when m ¼ n ¼ 8–o2:0% when m ¼ n ¼ 2. (Note: the relative difference increases with decreasing m and n due to an attendant increase in the coefﬁcient of variation of the null distribution of T.) Results for the 95th percentile were intermediate to those of the other two. Percentiles of the limiting w2 distribution are also given in Table 1. Note that when N is relatively small, the null distribution of T is quite different from the limiting w2 distribution, and that this disparity increases (relative to the percentile’s magnitude) as mn increases. Also, for ﬁxed m and n, the null distribution function of T based on smaller N appears to be stochastically greater than that based on larger N. Consequently, comparing an observed T to one of the given percentiles of the limiting w2 distribution will result in an anti-conservative test. The discrepancy between the nominal and actual sizes of the test is large enough to be of concern, except perhaps when m ¼ n ¼ 2 and NX100 or when m ¼ n ¼ 3 and NX200. For example, when N ¼ 50, we estimated (from the same simulations used to produce Table 1) PH0 ðT4w2n;0:05 Þ to be 0.066, 0.125, 0.370, 0.921, and 1.000 as m ¼ n ¼ 2; 3; 4; 5; 6; when N ¼ 100, the corresponding estimates were

ARTICLE IN PRESS N. Lu, D.L. Zimmerman / Statistics & Probability Letters 73 (2005) 449–457

455

Table 1 Empirical 90th, 95th, and 99th percentiles of the null distribution of T for selected values of m, n, and N, based on 50,000 simulations for each combination of values N

90th

m¼n¼2 5 32.55 6 22.28 7 18.50 8 16.45 9 14.89 10 14.17 12 12.97 15 12.07 20 11.13 50 10.01 100 9.61 1 9.24 m¼n¼5 26 771.9 30 571.7 35 499.7 40 462.3 45 438.8 50 422.5 60 401.7 70 388.4 80 378.6 100 398.4 200 375.0 1 327.6 m ¼ 2, n ¼ 3 7 8 9 10 12 15 20 25 50 100 150 1

62.74 45.74 39.10 35.14 30.88 27.65 25.14 23.79 21.68 20.77 20.36 19.81

95th 39.66 26.85 22.28 19.74 17.93 16.97 15.54 14.46 13.31 12.00 11.47 11.07 811.7 590.1 515.0 476.3 452.1 435.4 413.3 399.9 390.0 377.4 355.3 337.1 72.91 52.38 44.42 39.76 34.98 31.28 28.38 26.92 24.54 23.37 22.97 22.36

99th 55.46 37.36 30.56 27.00 24.75 23.07 21.48 19.51 18.05 16.63 15.64 15.09 897.0 626.3 544.5 503.5 477.5 460.8 436.5 422.5 411.2 366.7 345.4 355.5 95.86 66.61 55.83 49.38 43.71 38.85 35.03 33.26 30.63 28.91 28.45 27.69

N

90th

m¼n¼3 10 125.52 12 86.13 14 74.16 16 67.74 18 63.94 20 61.32 25 56.89 30 54.12 50 50.18 100 47.24 200 46.09 1 44.90 m¼n¼6 37 1518.2 40 1236.4 45 1079.9 50 999.4 55 946.9 60 909.6 70 860.5 80 826.8 100 786.6 150 742.9 200 722.6 1 670.7 m ¼ 2, n ¼ 4 9 10 12 16 20 25 30 40 50 100 200 1

99.57 75.45 59.83 48.95 44.38 41.40 39.85 37.78 36.99 35.08 33.97 33.20

95th 139.81 93.97 80.64 73.62 69.26 66.37 61.65 58.66 54.38 51.17 49.82 48.60 1574.6 1266.5 1104.4 1020.6 966.4 928.3 877.9 843.5 803.0 758.1 737.0 684.3 112.73 83.88 65.90 53.89 48.88 45.46 43.66 41.47 40.59 38.47 37.20 36.42

99th 172.21 110.51 93.40 85.13 79.88 76.71 71.26 68.21 62.44 58.85 57.48 56.06 1699.6 1324.8 1148.4 1061.4 1004.6 964.5 911.0 875.4 834.7 786.4 764.3 710.2 142.07 101.66 78.67 63.40 58.18 53.49 51.81 49.18 47.98 45.19 43.79 42.98

N

90th

m¼n¼4 17 343.93 20 244.90 25 203.73 30 186.58 35 176.23 40 170.13 45 165.18 50 162.11 70 153.69 100 147.99 200 142.20 1 136.98 m¼n¼8 65 4519.4 70 3751.0 75 3449.5 80 3256.2 85 3117.0 90 3010.1 100 2855.2 120 2667.0 150 2512.1 200 2384.6 400 2223.5 1 2090.7 m ¼ 4, n ¼ 6 25 27 30 35 40 45 50 60 80 100 200 1

713.63 576.84 504.76 448.25 417.51 397.55 383.65 365.58 344.82 335.06 316.52 300.18

95th 368.60 257.52 213.51 195.26 184.31 178.03 172.70 169.26 160.80 154.76 148.53 143.25 4622.9 3803.1 3494.3 3296.2 3154.0 3044.8 2887.1 2697.4 2541.1 2411.0 2248.2 2114.4 751.22 597.87 521.66 462.36 430.18 409.65 395.90 376.75 355.48 345.25 325.85 309.33

99th 426.14 282.60 232.89 212.41 200.04 193.02 188.25 183.90 174.63 168.10 161.11 155.50 4844.8 3904.1 3575.7 3372.7 3223.8 3113.8 2948.8 2756.1 2595.3 2463.0 2295.6 2159.4 831.54 641.59 555.32 490.78 456.25 433.43 418.25 398.61 375.45 364.75 344.54 326.98

0.057, 0.080, 0.152, 0.404, and 0.884. Thus, apart from the few exceptions noted, we recommend that the empirical null distribution, rather than the limiting w2 distribution, be used to carry out the test.

ARTICLE IN PRESS N. Lu, D.L. Zimmerman / Statistics & Probability Letters 73 (2005) 449–457

456

4. Example Fisher’s classical iris data (Fisher, 1936) consist of measurements on four variables—sepal length, sepal width, petal length, and petal width—for each of 150 iris ﬂowers, 50 from each of three species. Thus, these variables arise by crossing factors that could be termed ‘‘plant part’’ (sepal or petal) and ‘‘physical dimension’’ (length or width), and it is of interest to test whether these factors are separable. We consider the Iris versicolor data only; results for the other two species are qualitatively similar. Let Y t ¼ ðY ijt Þ represent the measurement vector on plant t (t ¼ 1; . . . ; 50Þ where i ¼ 1; 2 correspond to sepal and petal, respectively, and j ¼ 1; 2 correspond to length and width. Assuming that the data are a random sample from a N 4 ðl; RÞ population with no restrictions on R except positive-deﬁniteness, the mles are 1 1 0 0 26:11 8:35 17:92 5:47 5:936 B B 2:770 C 9:65 8:10 4:04 C C C B b ¼ 102 B l^ ¼ B C. C and R B @ @ 4:260 A 21:64 7:16 A 3:83

1:326

The likelihood ratio test for separability yields T ¼ 27:73. From Table 1 we observe that the Pvalue is less than 0.01, and we conclude that the covariance matrix is not separable. A careful b and the associated correlation matrix examination of the diagonal elements of R 0 1 1:00 0:526 0:754 0:546 B 1:00 0:561 0:664 C B C R¼B C @ 1:00 0:787 A 1:00 suggests, in light of (2), that the rejection of separability is attributable primarily to (a) a ratio of sepal length variance to sepal width variance that is about half the ratio of petal length variance to petal width variance, and (b) a correlation between sepal length and sepal width that is about 30% smaller than the correlation between petal length and petal width.

5. Discussion In this article, we have developed the likelihood ratio test and attendant methodology for testing for ðm; nÞ separability. This methodology is applicable and relevant when replicate independent observations are available from a multivariate normal population with crossclassiﬁed variables. However, there are several important settings in which the variables are crossclassiﬁed but independent replications are not available. Examples include spatially correlated agricultural ﬁeld-plot data on a two-dimensional rectangular grid (with factors ‘‘row’’ and ‘‘column’’ of the grid), spatio-temporal data (with factors ‘‘site’’ and ‘‘time’’), and multivariate spatial data. A few methods exist for testing for separability in some of these settings (Modjeska and Rawlings, 1983; Shitan and Brockwell, 1995; Lu, 2002), but more and better procedures are needed.

ARTICLE IN PRESS N. Lu, D.L. Zimmerman / Statistics & Probability Letters 73 (2005) 449–457

457

Because separability imposes quite severe constraints on the covariance matrix, we do not expect it to hold for a great many data sets. Indeed, in the iris ﬂower example we considered here, separability was rejected. However, an example for which separability is not rejected can be found in Lu and Zimmerman (2004). Plausible alternatives to separability include covariance structures that satisfy some, but not all, of constraints (3)–(7). An investigation of the power of the likelihood ratio test for separability under various alternatives is presented by Lu and Zimmerman (2004). The methodology presented herein can be extended without difﬁculty to situations with variables cross-classiﬁed by three or more factors. In the three-factor case, for example, ðm; n; qÞseparability could be deﬁned as the property whereby R ¼ R1  R2  R3 for m  m, n  n, and q  q positive deﬁnite matrices R1 , R2 , and R3 respectively. In addition to testing for ðm; n; qÞseparability versus an unstructured R, it is straightforward to test for it against the intermediate cases of ðmn; qÞ-separability and ðm; nqÞ-separability.

References Anderson, T.W., 1984. An Introduction to Multivariate Statistical Analysis, second ed. Wiley, New York. Brown, P.J., Kenward, M.G., Bassett, E.E., 2001. Bayesian discrimination with longitudinal data. Biostatistics 2, 417–432. Dutilleul, P., 1999. The MLE algorithm for the matrix normal distribution. J. Statist. Comput. Simulation 64, 105–123. Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. J. Eugenics 7, 179–188. Harville, D.A., 1997. Matrix Algebra from a Statistician’s Perspective. Springer, New York. Lu, N., 2002. Tests on multiplicative covariance structures. Ph.D. Thesis, University of Iowa, Iowa City, Iowa, unpublished. Lu, N., Zimmerman, D.L., 2004. On likelihood-based inference for a separable covariance matrix. Technical Report No. 337, Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa. Mardia, K.V., Goodall, C., 1993. Spatial-temporal analysis of multivariate environmental monitoring data. In: Patil, G.P., Rao, C.R. (Eds.), Multivariate Environmental Statistics, vol. 6, North-Holland, New York, pp. 347–385. Modjeska, J.S., Rawlings, J.O., 1983. Spatial correlation analysis of uniformity trial data. Biometrics 39, 373–384. Naik, D.N., Rao, S.S., 2001. Analysis of multivariate repeated measures data with a Kronecker product structured covariance matrix. J. Appl. Statist. 28, 91–105. Rogers, G.S., Young, D.L., 1975. Some likelihood ratio tests when a normal covariance matrix has certain reducible linear structures. Comm. Statist. 4, 519–535. Shitan, M., Brockwell, P.J., 1995. An asymptotic test for separability of a spatial auto-regressive model. Comm. Statist.—Theory Methods 24, 2027–2040. Smith, W.B., Hocking, R.R., 1972. Wishart variate generator. Appl. Statist. 21, 341–345. Wolﬁnger, R.D., 1996. Heterogeneous variance–covariance structures for repeated measures. J. Agri. Biol. Environ. Statist. 1, 205–230.