E-mail: [email protected]

Journal of Statistical Planning and Inference

Reduced-rank estimation of the difference between two covariance matrices

James R. Schott
Department of Statistics, University of Central Florida, Orlando, FL 32816-2370, USA


Abstract

Article history: Received 29 September 2008; received in revised form 9 October 2009; accepted 14 October 2009; available online 22 October 2009.

We consider $m \times m$ covariance matrices, $\Sigma_1$ and $\Sigma_2$, which satisfy $\Sigma_2 - \Sigma_1 = \Delta$, where $\Delta$ has a specified rank. Maximum likelihood estimators of $\Sigma_1$ and $\Sigma_2$ are obtained when sample covariance matrices having Wishart distributions are available and $\mathrm{rank}(\Delta)$ is known. The likelihood ratio statistic for a test about the value of $\mathrm{rank}(\Delta)$ is also given and some properties of its null distribution are obtained. The methods developed in this paper are illustrated through an example.

Keywords: Comparison of covariance matrices; Likelihood ratio test; Maximum likelihood estimators

1. Introduction

We consider the comparison of the covariance matrices $\Sigma_1$ and $\Sigma_2$ of an $m \times 1$ random vector $x$ obtained from two related populations. If the covariance matrices are not the same, then the identification of features that are in common for the groups, as well as those that are different, may lead to a better understanding of the two groups. One of the most popular ways of comparing covariance matrices involves the comparison of their spectral decompositions. Examples of analyses of this type can be found in, for instance, Krzanowski (1979), Flury (1984), Schott (1991) and Boik (2002). Applications of the common principal components procedures developed by Flury (1984, 1987, 1988) have been particularly widespread in biology, for instance, in the comparison of related species. However, Houle et al. (2002) argued that there may be many situations in which none of these common principal component models fit, yet the covariance matrices are very similar. They illustrated this by using a factor analysis structure for the covariance matrices; that is,

$\Sigma_i = \Phi_i \Lambda_i \Phi_i' + \Psi_i, \quad i = 1, 2, \qquad (1)$

where $\Phi_i$ is $m \times p$, $p \le m$, and $\Lambda_i$ and $\Psi_i$ are diagonal matrices. An alternative way of comparing $\Sigma_1$ and $\Sigma_2$ is through their difference $\Delta = \Sigma_2 - \Sigma_1$. Covariance matrices which are very similar may yield a difference matrix $\Delta$ which has rank much smaller than $m$. For covariance matrices satisfying (1), $\Delta = \Phi_2 \Lambda_2 \Phi_2' - \Phi_1 \Lambda_1 \Phi_1'$ when $\Psi_1 = \Psi_2$. If portions of $\Phi_1$ and $\Phi_2$ are the same and portions of $\Lambda_1$ and $\Lambda_2$ are the same, then it may be possible to simplify $\Delta$ further. For instance, if $\Phi_i = (\Phi_{i1}, \Phi_{i2})$ and $\Lambda_i = \mathrm{diag}(\Lambda_{i1}, \Lambda_{i2})$, where $\Phi_{ij}$ is $m \times p_j$, $\Lambda_{ij}$ is $p_j \times p_j$ and $p_1 + p_2 = p$, then $\Delta = \Phi_{22}\Lambda_{22}\Phi_{22}' - \Phi_{12}\Lambda_{12}\Phi_{12}'$ when $\Phi_{11} = \Phi_{21}$ and $\Lambda_{11} = \Lambda_{21}$. In this case, $\mathrm{rank}(\Delta) \le 2p_2$, so that small values of $p_2$ will correspond to situations in which $\Delta$ is less than full rank. For example, when $p_2 = 1$ and $\Phi_{12} = \Phi_{22}$ but $\Lambda_{12} \ne \Lambda_{22}$, then $\mathrm{rank}(\Delta) = 1$. It is not difficult to construct examples like this for which $\mathrm{rank}(\Delta)$ is small yet the spectral decompositions of $\Sigma_1$ and $\Sigma_2$ have nothing in common.

doi:10.1016/j.jspi.2009.10.005
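The last claim is easy to check numerically. The sketch below (all matrices are invented for illustration) builds $\Sigma_2 = \Sigma_1 + vv'$, so that $\mathrm{rank}(\Delta) = 1$, and shows that the leading eigenvectors of the two matrices nevertheless differ:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5

# An arbitrary positive definite Sigma_1 (invented for illustration).
A = rng.standard_normal((m, m))
Sigma1 = A @ A.T + m * np.eye(m)

# A generic rank-one perturbation Delta = v v'.
v = rng.standard_normal(m)
Delta = np.outer(v, v)
Sigma2 = Sigma1 + Delta

print(np.linalg.matrix_rank(Delta))   # 1

# The spectral decompositions need not agree: compare leading eigenvectors.
e1 = np.linalg.eigh(Sigma1)[1][:, -1]
e2 = np.linalg.eigh(Sigma2)[1][:, -1]
print(abs(e1 @ e2))                   # generally well below 1
```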

J.R. Schott / Journal of Statistical Planning and Inference 140 (2010) 1038–1043

The comparison of $\Sigma_1$ and $\Sigma_2$ through their difference $\Sigma_2 - \Sigma_1$ is related to the comparison of $\Sigma_1$ and $\Sigma_2$ based on $\Sigma_1^{-1}\Sigma_2$. If $\mathrm{rank}(\Delta) = s$, then $m - s$ of the eigenvalues of $\Sigma_1^{-1}\Sigma_2$ will be equal to 1. The eigenvectors corresponding to the $s$ eigenvalues of $\Sigma_1^{-1}\Sigma_2$ that do not equal 1 may shed light on how $\Sigma_1$ differs from $\Sigma_2$. Let $\phi_1 \ge \cdots \ge \phi_m$ be the eigenvalues of $\Sigma_1^{-1}\Sigma_2$ and let $b_1, \ldots, b_m$ be corresponding eigenvectors scaled so that $b_i'\Sigma_1 b_i = 1$ for $i = 1, \ldots, m$. Define $B_1$ to be the $m \times (m-s)$ matrix whose columns are the $b_i$'s corresponding to the eigenvalues of 1, while the $m \times s$ matrix $B_2$ has as its columns the remaining eigenvectors. Then, with $B = (B_1, B_2)$, using the transformation $y = (y_1', y_2')' = (x'B_1, x'B_2)' = B'x$ in both populations yields the covariance matrices $\Sigma_1^* = B'\Sigma_1 B = I_m$ and $\Sigma_2^* = B'\Sigma_2 B = \mathrm{diag}(I_{m-s}, \Phi^*)$, where $\Phi^*$ is a diagonal matrix containing the non-unity eigenvalues of $\Sigma_1^{-1}\Sigma_2$. Thus, information about the difference in covariance matrices for the two populations arises only from the variables in $y_2 = B_2'x$. That is, the description of differences is obtained by examining the coefficients in $B_2$ rather than those in $\Delta$. Studying the difference in two populations via the eigenvectors of $\Sigma_1^{-1}\Sigma_2$ is not a new concept, as it was first proposed by Flury (1983). In particular, he motivated the choice of $b_1, \ldots, b_m$ in $b'x$ as they yield the extremal values of $b'\Sigma_2 b / b'\Sigma_1 b$. In a subsequent paper (Flury, 1985), a method was developed for testing that some of the variables have zero coefficients in several of the $b_i$'s simultaneously. While this second approach of comparing covariance matrices via $\Sigma_1^{-1}\Sigma_2$ appears to hold as much promise as the approach based on common principal component models, there is very little evidence in the literature to date of applications using it. Rao (1983) obtained the likelihood ratio tests for several models that identify the relationship between two covariance matrices.
One of the models he considered is the reduced-rank model that is the subject of this paper. He also discusses applications of these models to problems of inference on familial correlations. In this paper, we add to the work that Flury (1983, 1985) and Rao (1983) have done in this area. First, we consider the estimation of $\Sigma_1$ and $\Sigma_2$ when $\mathrm{rank}(\Delta)$ is known. In particular, we find the maximum likelihood estimators for the situation in which the sample covariance matrices have independent Wishart distributions. In addition, a procedure for estimating the rank of $\Delta$ is developed. We conclude with an example to illustrate our methods.
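The canonical reduction described above can be computed with a single generalized eigendecomposition; `scipy.linalg.eigh(Sigma2, Sigma1)` returns eigenvectors normalized so that $B'\Sigma_1 B = I$. A sketch, with invented matrices playing the roles of $\Sigma_1$ and $\Sigma_2$:

```python
import numpy as np
from scipy.linalg import eigh

# Invented positive definite matrices playing the roles of Sigma_1, Sigma_2.
Sigma1 = np.array([[4.0, 1.0, 0.5],
                   [1.0, 3.0, 0.2],
                   [0.5, 0.2, 2.0]])
v = np.array([1.0, -0.5, 0.25])
Sigma2 = Sigma1 + np.outer(v, v)      # rank-one difference, so s = 1

# Solve Sigma2 b = phi Sigma1 b; columns of B are the eigenvectors b_i.
phi, B = eigh(Sigma2, Sigma1)

# B' Sigma1 B = I and B' Sigma2 B = diag(phi); with rank(Delta) = 1,
# m - 1 = 2 of the eigenvalues phi equal 1.
print(np.round(B.T @ Sigma1 @ B, 8))
print(np.round(phi, 8))
```

Only the eigenvector paired with the non-unity eigenvalue (the analogue of $B_2$ above) carries information about the difference between the two populations.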

2. Estimation of $\Sigma_1$ and $\Sigma_2$

We assume that we have independent random samples from the two $m$-dimensional multivariate normal populations, with a sample size of $n_i + 1$ from the $i$th population, from which we compute the sample covariance matrices $S_1$ and $S_2$. Thus, $A_1 = n_1 S_1 \sim W_m(\Sigma_1, n_1)$ independently of $A_2 = n_2 S_2 \sim W_m(\Sigma_1 + \Delta, n_2)$. We will derive the maximum likelihood estimators of $\Sigma_1$ and $\Sigma_2 = \Sigma_1 + \Delta$ when $\mathrm{rank}(\Delta) = s$ from the distributions of $A_1$ and $A_2$. Rao (1983) gave estimating equations for these maximum likelihood estimators, but he did not obtain explicit solutions. Now the log likelihood function of $\Sigma_1$ and $\Delta$, after dropping an additive constant term, is

$L_1(\Sigma_1, \Delta) = -\tfrac{1}{2}\mathrm{tr}(\Sigma_1^{-1}A_1) - \tfrac{n_1}{2}\log|\Sigma_1| - \tfrac{1}{2}\mathrm{tr}\{(\Sigma_1 + \Delta)^{-1}A_2\} - \tfrac{n_2}{2}\log|\Sigma_1 + \Delta|.$

We need the solution $(\hat\Sigma_1, \hat\Delta)$ which maximizes this function or, equivalently, minimizes

$-2L_1(\Sigma_1, \Delta) = \mathrm{tr}(\Sigma_1^{-1}A_1) + n_1\log|\Sigma_1| + \mathrm{tr}\{(\Sigma_1 + \Delta)^{-1}A_2\} + n_2\log|\Sigma_1 + \Delta|,$

subject to $\Sigma_1 \in \mathcal{P}_m$, $\Sigma_1 + \Delta \in \mathcal{P}_m$ and $\Delta \in \bigcup_{j=0}^{s}\mathcal{S}_j$, where $\mathcal{P}_i$ is the set of all $m \times m$ nonnegative definite matrices of rank $i$ and $\mathcal{S}_j$ is the set of all $m \times m$ symmetric matrices of rank $j$. Although we are constraining $\Delta$ so that $\mathrm{rank}(\Delta) \le s$ instead of $\mathrm{rank}(\Delta) = s$, the maximum likelihood estimate of $\Delta$ that we obtain will have rank $s$ with probability 1. Let $C$ be an $m \times m$ nonsingular matrix satisfying $CS_1C' = I_m$ and $CS_2C' = D = \mathrm{diag}(d_1, \ldots, d_m)$, so that $d_1 \ge \cdots \ge d_m > 0$ are the eigenvalues of $S_1^{-1}S_2$. Then, with $\tilde\Sigma_1 = C\Sigma_1C'$ and $\tilde\Delta = C\Delta C'$, our minimization problem can be reexpressed as that of minimizing

$c(\tilde\Sigma_1, \tilde\Sigma_1 + \tilde\Delta, D, n_1, n_2) = n_1\{\mathrm{tr}(\tilde\Sigma_1^{-1}) + \log|\tilde\Sigma_1|\} + n_2[\mathrm{tr}\{(\tilde\Sigma_1 + \tilde\Delta)^{-1}D\} + \log|\tilde\Sigma_1 + \tilde\Delta|], \qquad (2)$

subject to $(\tilde\Sigma_1, \tilde\Sigma_1 + \tilde\Delta) \in B_s$, where $B_s = \{(A, B) : A \in \mathcal{P}_m,\ B \in \mathcal{P}_m,\ B - A \in \bigcup_{j=0}^{s}\mathcal{S}_j\}$. We first state a result which will simplify the process of finding the required minimal solution. This result can be proven by making some minor adjustments to the proof of an analogous result given by Schott and Saw (1984) for the minimization of $c$ over $C_s = \{(A, B) : A \in \mathcal{P}_m,\ B \in \mathcal{P}_m,\ B - A \in \bigcup_{j=0}^{s}\mathcal{P}_j\}$.

Theorem 1. The absolute minimum of $c(A, B, D, n_1, n_2)$ subject to $(A, B) \in B_s$ occurs when both $A$ and $B$ are diagonal. When $A = \mathrm{diag}(a_1, \ldots, a_m)$ and $B = \mathrm{diag}(b_1, \ldots, b_m)$, $c(A, B, D, n_1, n_2)$ simplifies to

$c(A, B, D, n_1, n_2) = n_1\sum_{i=1}^{m}(1/a_i + \log a_i) + n_2\sum_{i=1}^{m}(d_i/b_i + \log b_i).$

The minimal solution for $(A, B)$ in this case is given next.


Theorem 2. Let $N_s$ be the set of $m - s$ integers, $i$, yielding the smallest values of

$h(d_i) = \dfrac{(n_1 + n_2 d_i)^{n_1 + n_2}}{d_i^{n_2}}.$

If $A = \mathrm{diag}(a_1, \ldots, a_m)$ and $B = \mathrm{diag}(b_1, \ldots, b_m)$, then $c(A, B, D, n_1, n_2)$ is minimized subject to $(A, B) \in B_s$ when $A = Y_s = \mathrm{diag}(y_{1s}, \ldots, y_{ms})$ and $B = Z_s = \mathrm{diag}(z_{1s}, \ldots, z_{ms})$, where $y_{is} = z_{is} = (n_1 + n_2 d_i)/(n_1 + n_2)$ for $i \in N_s$, and $y_{is} = 1$, $z_{is} = d_i$ for $i \notin N_s$.

For $d_i > 0$, the function $h(d_i)$ is convex and is minimized at $d_i = 1$, so the values in $N_s$ correspond to the $d_i$ among $d_1, \ldots, d_m$ that are "closest" to 1, where here by closest we mean having the smallest $h(d_i)$. When $n_1 = n_2$, things are particularly simple in that the reciprocals of the $d_i$'s less than 1 can be compared directly to the $d_i$'s greater than 1. That is, if $d_i < 1$ and $d_j > 1$, the one closer to 1 will be determined by the smaller of $d_i^{-1}$ and $d_j$.

An immediate consequence of Theorem 2 is that the minimal solution to (2) is given by $\tilde\Sigma_1 = Y_s$ and $\tilde\Sigma_1 + \tilde\Delta = Z_s$. The maximum likelihood estimators of $\Sigma_1$ and $\Sigma_2$ are then given by $\hat\Sigma_1 = C^{-1}Y_s(C^{-1})'$ and $\hat\Sigma_2 = C^{-1}Z_s(C^{-1})'$.

3. A test for determining the rank of $\Delta$

Typically the rank of $\Delta$ will not be known, and so we will need to estimate it. One way of estimating the rank is through a sequence of tests of hypotheses of the form

$H_0^s: \mathrm{rank}(\Delta) = s;$

$H_a^s: \mathrm{rank}(\Delta) > s. \qquad (3)$

To estimate the rank of $\Delta$, one would test (3) first with $s = 0$, then $s = 1$, and continue until either $H_0^s$ is not rejected for some $s$ or it is rejected for $s = m - 1$. We will find the likelihood ratio test of (3). The likelihood function of $\Sigma_1$ and $\Sigma_2$ based on $A_1$ and $A_2$, after dropping a multiplicative constant, is

$L_2(\Sigma_1, \Sigma_2) = |\Sigma_1|^{-n_1/2}\exp\{-\tfrac{1}{2}\mathrm{tr}(\Sigma_1^{-1}A_1)\}\,|\Sigma_2|^{-n_2/2}\exp\{-\tfrac{1}{2}\mathrm{tr}(\Sigma_2^{-1}A_2)\}.$

Consequently, the likelihood ratio test is based on the statistic

$\lambda_s(A_1, A_2) = \dfrac{L_2(C^{-1}Y_s(C^{-1})', C^{-1}Z_s(C^{-1})')}{L_2(C^{-1}Y_m(C^{-1})', C^{-1}Z_m(C^{-1})')} = \prod_{i \in N_s} d_i^{n_2/2}\left(\dfrac{n_1 + n_2 d_i}{n_1 + n_2}\right)^{-(n_1+n_2)/2}$

$= \dfrac{(n_1 + n_2)^{(m-s)(n_1+n_2)/2}}{n_1^{(m-s)n_1/2}\, n_2^{(m-s)n_2/2}} \prod_{i \in N_s}\{\delta_i^{n_2/2}(1 + \delta_i)^{-(n_1+n_2)/2}\},$

where $\delta_i = n_2 d_i / n_1$, $i = 1, \ldots, m$, are the eigenvalues of $A_1^{-1}A_2$. Under the general theory of likelihood ratio statistics, it follows that $-2\log\lambda_s(A_1, A_2)$ is distributed asymptotically as a chi-squared random variable if $H_0^s$ holds. Since an $m \times m$ symmetric matrix of rank $s$ requires $ms - s(s-1)/2$ parameters, this chi-squared distribution will have $m(m+1)/2 - \{ms - s(s-1)/2\} = (m-s)(m-s+1)/2$ degrees of freedom. This result is not new, as the likelihood ratio test of $H_0^s$ was given by Rao (1983). When $s = 0$, we get

$\lambda_0(A_1, A_2) = \dfrac{(n_1 + n_2)^{m(n_1+n_2)/2}}{n_1^{mn_1/2}\, n_2^{mn_2/2}} \prod_{i=1}^{m}\{\delta_i^{n_2/2}(1 + \delta_i)^{-(n_1+n_2)/2}\} = \dfrac{(n_1 + n_2)^{m(n_1+n_2)/2}}{n_1^{mn_1/2}\, n_2^{mn_2/2}} \cdot \dfrac{|A_1|^{n_1/2}|A_2|^{n_2/2}}{|A_1 + A_2|^{(n_1+n_2)/2}}.$
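The estimators of Section 2 and the statistic $-2\log\lambda_s$ are straightforward to compute together. The sketch below is an assumed implementation (not code from the paper); it works with $\log h(d_i)$ for numerical stability:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.stats import chi2

def reduced_rank_fit(S1, S2, n1, n2, s):
    """MLEs of Sigma_1, Sigma_2 under rank(Delta) = s (Theorem 2) and the
    likelihood ratio statistic -2 log lambda_s with its chi-squared df."""
    m = S1.shape[0]
    # Simultaneous diagonalization: C S1 C' = I, C S2 C' = diag(d).
    d, B = eigh(S2, S1)              # S2 b = d S1 b with B' S1 B = I
    C = B.T
    # N_s: the m - s indices with the smallest h(d_i).
    log_h = (n1 + n2) * np.log(n1 + n2 * d) - n2 * np.log(d)
    Ns = np.argsort(log_h)[: m - s]
    y = np.ones(m)
    z = d.copy()
    y[Ns] = z[Ns] = (n1 + n2 * d[Ns]) / (n1 + n2)
    Cinv = np.linalg.inv(C)
    Sigma1_hat = Cinv @ np.diag(y) @ Cinv.T
    Sigma2_hat = Cinv @ np.diag(z) @ Cinv.T
    # -2 log lambda_s and the chi-squared degrees of freedom.
    n = n1 + n2
    stat = -2 * np.sum(n2 / 2 * np.log(d[Ns])
                       - n / 2 * np.log((n1 + n2 * d[Ns]) / n))
    df = (m - s) * (m - s + 1) // 2
    p = chi2.sf(stat, df) if df > 0 else None
    return Sigma1_hat, Sigma2_hat, stat, df, p
```

By construction $\hat\Sigma_2 - \hat\Sigma_1$ has rank $s$, and $s = m$ returns $S_1$ and $S_2$ themselves with a statistic of zero.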

The statistic $\lambda_0(A_1, A_2)$ is the familiar modified likelihood ratio statistic for testing the equality of $\Sigma_1$ and $\Sigma_2$ (see, for example, Muirhead, 1982, Section 8.2). It is well known (see, for example, Tyler, 1983) that many normal-theory likelihood ratio tests can be adjusted by a simple scalar multiple so as to retain their asymptotic null distribution over the class of elliptical distributions. Conditions under which such an adjustment is possible have been given by Shapiro and Browne (1987). In particular, their condition of positive homogeneity is not satisfied in the current problem since, in general, $\mathrm{rank}(\Sigma_1 - \Sigma_2) \ne \mathrm{rank}(a_1\Sigma_1 - a_2\Sigma_2)$, where $a_1$ and $a_2$ are arbitrary constants. That is, such an adjustment is not possible in our setting. More details on this issue for the special case involving the test of the equality of covariance matrices can be found in Schott (2000).

The exact distribution of $\lambda_s(A_1, A_2)$ depends on $\phi_1 \ge \cdots \ge \phi_m$, the eigenvalues of $\Sigma_1^{-1}\Sigma_2$. Under $H_0^s$, $\phi_1 \ge \cdots \ge \phi_{s_1} \ge 1$, $\phi_{s_1+1} = \cdots = \phi_{m-s_2} = 1$, and $1 \ge \phi_{m-s_2+1} \ge \cdots \ge \phi_m$, where $s_1$ and $s_2$ are nonnegative integers satisfying $s_1 + s_2 = s$. As an alternative reference distribution for $\lambda_s(A_1, A_2)$, we consider its asymptotic distribution, as $\phi_i \to \infty$ for $i = 1, \ldots, s_1$ and $\phi_{m-j+1} \to 0$ for $j = 1, \ldots, s_2$, with $n_1$ and $n_2$ fixed. This is given in the following theorem.

Theorem 3. Let $V_1 \sim W_{m-s}(I_{m-s}, n_1 - s_2)$ and $V_2 \sim W_{m-s}(I_{m-s}, n_2 - s_1)$, independently, and define

$\lambda_{s_1,s_2}(V_1, V_2) = \dfrac{(n_1 + n_2)^{(m-s)(n_1+n_2)/2}}{n_1^{(m-s)n_1/2}\, n_2^{(m-s)n_2/2}} \cdot \dfrac{|V_1|^{n_1/2}|V_2|^{n_2/2}}{|V_1 + V_2|^{(n_1+n_2)/2}}.$

If $\phi_{s_1+1} = \cdots = \phi_{m-s_2} = 1$, then $\lambda_s(A_1, A_2)$ converges in distribution to $\lambda_{s_1,s_2}(V_1, V_2)$ as $\phi_i \to \infty$ for $i = 1, \ldots, s_1$ and $\phi_{m-j+1} \to 0$ for $j = 1, \ldots, s_2$.


If $c_\alpha$ is the constant satisfying

$P(\lambda_{s_1,s_2}(V_1, V_2) < c_\alpha) = \alpha, \qquad (4)$

then Theorem 3 suggests that a test that rejects $H_0^s$ when $\lambda_s(A_1, A_2) < c_\alpha$ would have type I error approximately equal to $\alpha$ as long as $\phi_{s_1}$ and $\phi_{m-s_2+1}$ are not too close to 1. We conjecture that the distribution of $\lambda_{s_1,s_2}(V_1, V_2)$ is a lower bound for the distribution of $\lambda_s(A_1, A_2)$, so that the test with the rejection region described above would be conservative. Note that $H_0^s$ only specifies $s$, not $s_1$ and $s_2$. In employing (4), a reasonable approach would be to use the number of $d_i > 1$ with $i \notin N_s$ for $s_1$ and the number of $d_i < 1$ with $i \notin N_s$ for $s_2$.

Some exact percentage points for the modified likelihood ratio statistic $\lambda_0(V_1, V_2)$ have been tabulated by Korin (1968) and Gupta and Tang (1984) when $\alpha = 0.05$ and $n_1 = n_2$. These can be used to find corresponding percentage points for $\lambda_{s_1,s_2}(V_1, V_2)$ when $n_1 = n_2$ and $s_1 = s_2$. In this case, $\{\lambda_{s_1,s_2}(V_1, V_2)\}^{(n_i - s_i)/n_i}$ has the same distribution as $\lambda_0(V_1, V_2)$.

Some simulation results were obtained so as to estimate the actual type I error probabilities realized when using the asymptotic distribution given in Theorem 3. We used $n_1 = n_2 = 7$ and $n_1 = n_2 = 17$, $m = 6$, and each estimate was based on 15,000 simulations. We considered the test of $H_0^2$ when $s_1 = s_2 = 1$ and the test of $H_0^4$ when $s_1 = s_2 = 2$. In each case we had $\phi_i = \phi_{m-i+1}^{-1} = d$ for $i = 1, \ldots, s_1$, while $d$ varied over the different cases. For comparison purposes, we also estimated the type I error probabilities when using the chi-squared approximation. The simulation results are given in Table 1. These results support the conjecture that the approximation of Theorem 3 is conservative, and the significance level generally increases as $d$ increases. The degree of conservatism appears to depend on the sample size, as the estimated significance levels when $n_1 = n_2 = 7$ are substantially lower than those when $n_1 = n_2 = 17$ for small to moderate values of $d$. The chi-squared approximation yields very inflated significance levels except in some cases in which $d$ is very small. We also estimated type I error probabilities associated with the chi-squared approximation for larger sample sizes, and some of these are given in Table 2. For the sample sizes of 50 and 100, we see that in most cases the type I error probabilities are slightly inflated.

The percentage points tabulated for $\lambda_0(V_1, V_2)$ by Korin (1968) and Gupta and Tang (1984) are only for $m \le 6$ and $n_1 = n_2$. However, Booth et al. (1995) have shown that saddlepoint approximations can be used to calculate extremely accurate tail probabilities for $\lambda_0(A_1, A_2)$ for any values of $m$, $n_1$ and $n_2$. Due to the similarity of $\lambda_{s_1,s_2}(V_1, V_2)$ and $\lambda_0(V_1, V_2)$, it should be straightforward to modify their methods for $\lambda_{s_1,s_2}(V_1, V_2)$.
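Outside the tabulated cases, the reference distribution of Theorem 3 is also easy to simulate directly. A Monte Carlo sketch (an assumed alternative to the saddlepoint route; the settings $m = 6$, $s_1 = s_2 = 1$, $n_1 = n_2 = 17$ match one of the simulation cases below):

```python
import numpy as np
from scipy.stats import wishart

def lam_ref_draws(m, s1, s2, n1, n2, reps=20000, seed=0):
    """Monte Carlo draws of -2 log lambda_{s1,s2}(V1, V2), with
    V1 ~ W_{m-s}(I, n1 - s2) and V2 ~ W_{m-s}(I, n2 - s1) as in Theorem 3."""
    rng = np.random.default_rng(seed)
    p = m - (s1 + s2)
    n = n1 + n2
    V1 = wishart.rvs(df=n1 - s2, scale=np.eye(p), size=reps, random_state=rng)
    V2 = wishart.rvs(df=n2 - s1, scale=np.eye(p), size=reps, random_state=rng)
    ld1 = np.linalg.slogdet(V1)[1]
    ld2 = np.linalg.slogdet(V2)[1]
    ld12 = np.linalg.slogdet(V1 + V2)[1]
    log_lam = (p * n / 2 * np.log(n) - p * n1 / 2 * np.log(n1)
               - p * n2 / 2 * np.log(n2)
               + n1 / 2 * ld1 + n2 / 2 * ld2 - n / 2 * ld12)
    return -2 * log_lam

# Estimated critical value for the alpha = 0.05 test, which rejects H0 for
# small lambda_s, i.e. for large -2 log lambda_s.
draws = lam_ref_draws(m=6, s1=1, s2=1, n1=17, n2=17)
c_crit = np.quantile(draws, 0.95)
```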

Table 1
Estimated significance levels for the likelihood ratio test of $H_0^{s_1+s_2}$ when $\alpha = 0.05$.

             n1 = n2 = 7                     n1 = n2 = 17
        s1 = s2 = 1   s1 = s2 = 2     s1 = s2 = 1   s1 = s2 = 2
d       λ      χ²     λ      χ²       λ      χ²     λ      χ²
1       0.000  0.070  0.000  0.001    0.001  0.003  0.000  0.000
2       0.000  0.092  0.000  0.002    0.001  0.010  0.000  0.001
4       0.001  0.147  0.000  0.013    0.010  0.042  0.004  0.001
8       0.003  0.243  0.001  0.049    0.028  0.098  0.023  0.056
16      0.008  0.334  0.008  0.113    0.042  0.125  0.041  0.080
32      0.020  0.405  0.018  0.168    0.049  0.133  0.047  0.088
64      0.031  0.437  0.032  0.199    0.047  0.135  0.051  0.092
128     0.037  0.453  0.040  0.209    0.051  0.136  0.054  0.099
256     0.041  0.459  0.047  0.216    0.046  0.132  0.047  0.091
512     0.048  0.469  0.048  0.217    0.051  0.134  0.048  0.093
1024    0.048  0.464  0.049  0.221    0.050  0.136  0.048  0.090

(λ: test based on the Theorem 3 reference distribution; χ²: chi-squared approximation.)

Table 2
Estimated significance levels for the likelihood ratio test of $H_0^{s_1+s_2}$ for larger sample sizes (chi-squared approximation).

        s1 = s2 = 1                    s1 = s2 = 2
d       n1 = n2 = 50  n1 = n2 = 100   n1 = n2 = 50  n1 = n2 = 100
1       0.001         0.000           0.000         0.000
2       0.011         0.027           0.003         0.014
4       0.056         0.055           0.043         0.050
8       0.070         0.063           0.056         0.055
32      0.072         0.058           0.059         0.052
128     0.071         0.059           0.063         0.053
512     0.073         0.059           0.060         0.054


4. An example

To illustrate the methods of this paper, we will use the iris data analyzed by Fisher (1936). The data set consisted of measurements on 50 plants from each of the three species of iris: versicolor, virginica and setosa. The four variables measured were sepal length, sepal width, petal length and petal width. We will focus here only on the two species versicolor and virginica, which we will identify as populations 1 and 2, respectively. The sample covariance matrices, based on 49 degrees of freedom each, are

$S_1 = \begin{pmatrix} 26.643 & 8.518 & 18.290 & 5.578 \\ 8.518 & 9.847 & 8.265 & 4.120 \\ 18.290 & 8.265 & 22.082 & 7.310 \\ 5.578 & 4.120 & 7.310 & 3.911 \end{pmatrix}, \qquad S_2 = \begin{pmatrix} 40.434 & 9.376 & 30.329 & 4.909 \\ 9.376 & 10.400 & 7.138 & 4.763 \\ 30.329 & 7.138 & 30.459 & 4.882 \\ 4.909 & 4.763 & 4.882 & 7.543 \end{pmatrix}.$

A test of the hypothesis that $\Sigma_1$ and $\Sigma_2$ have common principal components is rejected at the 0.05 significance level (Flury, 1984), so the comparison of spectral decompositions does not reveal any similarities between $\Sigma_1$ and $\Sigma_2$. We will use the procedure of Section 3 to determine the rank of $\Sigma_2 - \Sigma_1$. The eigenvalues of $S_1^{-1}S_2$ are 5.653, 1.531, 1.130 and 0.717. From these we find that $-2\log\lambda_0(A_1, A_2) = 36.65$. Since the sample sizes are fairly large, we refer to the large-sample asymptotic distribution of $-2\log\lambda_0(A_1, A_2)$, which is chi-squared with 10 degrees of freedom, and find that the hypothesis that $\mathrm{rank}(\Delta) = 0$ is rejected at any reasonable significance level; that is, clearly $\Sigma_1$ and $\Sigma_2$ are not the same. For the test that $\mathrm{rank}(\Delta) = 1$, we compute $-2\log\lambda_1(A_1, A_2) = 3.74$, which is compared to the chi-squared distribution with 6 degrees of freedom. This yields an observed significance level of 0.71, so it is reasonable to say that $\Sigma_2 - \Sigma_1$ has rank of only one. Since $N_1 = \{2, 3, 4\}$, we can conclude that $\phi_1 > \phi_2 = \phi_3 = \phi_4 = 1$. The maximum likelihood estimators of $\Sigma_1$ and $\Sigma_2$ under this rank constraint are given by

$\hat\Sigma_1 = \begin{pmatrix} 31.755 & 9.351 & 22.434 & 6.970 \\ 9.351 & 10.033 & 8.126 & 4.051 \\ 22.434 & 8.126 & 24.298 & 7.911 \\ 6.970 & 4.051 & 7.911 & 4.057 \end{pmatrix}, \qquad \hat\Sigma_2 = \begin{pmatrix} 35.323 & 8.544 & 26.185 & 3.517 \\ 8.544 & 10.215 & 7.278 & 4.832 \\ 26.185 & 7.278 & 28.243 & 4.281 \\ 3.517 & 4.832 & 4.281 & 7.397 \end{pmatrix}.$

The matrix $C$ of Section 2, computed from these data, provides an estimator of the matrix $B$ which transforms the observations within each population to canonical form. The components of $y = Bx$ are uncorrelated within each population; the first component has larger variance in the second population, and the remaining components have constant variance across the two populations.

The analysis of all three iris species would require a generalization of the methods developed in this paper to the situation in which $k > 2$ populations are involved. Several extensions are possible. For instance, for each $i < j$, we could require only that each $\Delta_{ij} = \Sigma_i - \Sigma_j$ has rank $s$. A more restricted generalization would impose additional assumptions on the $\Delta_{ij}$'s; one reasonable restriction would be that $\Delta_{ij} - \Delta_{hl}$ also has rank $s$ for all such differences. Generalizations of this type will be considered in a future paper.
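The computations in this example can be reproduced from the printed covariance matrices. A sketch (the matrices are entered as rounded in Section 4, so reproduced values may differ slightly in the last digit):

```python
import numpy as np
from scipy.linalg import eigh
from scipy.stats import chi2

# Sample covariance matrices for versicolor (S1) and virginica (S2),
# each based on n1 = n2 = 49 degrees of freedom.
S1 = np.array([[26.643,  8.518, 18.290, 5.578],
               [ 8.518,  9.847,  8.265, 4.120],
               [18.290,  8.265, 22.082, 7.310],
               [ 5.578,  4.120,  7.310, 3.911]])
S2 = np.array([[40.434,  9.376, 30.329, 4.909],
               [ 9.376, 10.400,  7.138, 4.763],
               [30.329,  7.138, 30.459, 4.882],
               [ 4.909,  4.763,  4.882, 7.543]])
n1 = n2 = 49
n = n1 + n2
m = 4

# Eigenvalues of S1^{-1} S2, in decreasing order (approx. 5.653, 1.531,
# 1.130, 0.717).
d = eigh(S2, S1, eigvals_only=True)[::-1]

def neg2_log_lam(s):
    log_h = (n1 + n2) * np.log(n1 + n2 * d) - n2 * np.log(d)
    Ns = np.argsort(log_h)[: m - s]
    return -2 * np.sum(n2 / 2 * np.log(d[Ns])
                       - n / 2 * np.log((n1 + n2 * d[Ns]) / n))

for s in (0, 1):
    df = (m - s) * (m - s + 1) // 2
    stat = neg2_log_lam(s)
    print(s, round(stat, 2), df, round(chi2.sf(stat, df), 3))
```

The $s = 0$ statistic (about 36.6 on 10 degrees of freedom) and the $s = 1$ statistic (about 3.7 on 6 degrees of freedom, observed significance level about 0.71) agree with the values reported above.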

Acknowledgment

I am grateful to a referee for some helpful comments.

Appendix

Proof of Theorem 2. Let $g(a, b, d) = n_1(1/a + \log a) + n_2(d/b + \log b)$, so that $c(A, B, D, n_1, n_2) = \sum_{i=1}^{m} g(a_i, b_i, d_i)$. Now $g(a_i, b_i, d_i)$ is minimized at

$a_i = 1 \quad \text{and} \quad b_i = d_i, \qquad (5)$

while the minimum subject to $a_i = b_i$ is attained at

$a_i = b_i = \dfrac{n_1 + n_2 d_i}{n_1 + n_2}. \qquad (6)$

Thus, the minimum of $c(A, B, D, n_1, n_2)$ subject to $(A, B) \in B_s$ will have (5) holding for $s$ values of $i$ while (6) will hold for the remaining $m - s$ values of $i$. In particular, we need (6) to hold for the values of $i$ that yield the smallest values of

$f(d_i) = g\!\left(\dfrac{n_1 + n_2 d_i}{n_1 + n_2}, \dfrac{n_1 + n_2 d_i}{n_1 + n_2}, d_i\right) - g(1, d_i, d_i) = (n_1 + n_2)\log\dfrac{n_1 + n_2 d_i}{n_1 + n_2} - n_2\log d_i = \log\!\left(\dfrac{(n_1 + n_2 d_i)^{n_1+n_2}}{d_i^{n_2}}\right) - (n_1 + n_2)\log(n_1 + n_2),$

from which the result immediately follows. $\square$

We will need the following result, the proof of which can be found in Section 4 of Schott and Saw (1984).

Lemma 1. Suppose that $A_1 \sim W_m(I_m, n_1)$ independently of $A_2 \sim W_m(\Phi, n_2)$, where $\Phi = \mathrm{diag}(\phi_1, \ldots, \phi_m)$ with $\phi_1 \ge \cdots \ge \phi_m$. Let $d_1 \ge \cdots \ge d_m$ be the eigenvalues of $A_1^{-1}A_2$ and let $e_1 \ge \cdots \ge e_{m-s}$ be the eigenvalues of $U_1^{-1}U_2$, where $U_1 \sim W_{m-s}(I_{m-s}, n_1)$ independently of $U_2 \sim W_{m-s}(\Phi_2, n_2 - s)$ and $\Phi_2 = \mathrm{diag}(\phi_{s+1}, \ldots, \phi_m)$. Then as $\phi_i \to \infty$ for $i = 1, \ldots, s$, the eigenvalues $d_1, \ldots, d_s$ approach infinity with probability one and the eigenvalues $d_{s+1}, \ldots, d_m$ converge in distribution to $e_1, \ldots, e_{m-s}$.

Proof of Theorem 3. Since the eigenvalues of $A_1^{-1}A_2$ are invariant under transformations of the form $TA_iT'$, where $T$ is an $m \times m$ nonsingular matrix, we may assume without loss of generality that $\Sigma_1 = I_m$ and $\Sigma_2 = \Phi = \mathrm{diag}(\phi_1, \ldots, \phi_m)$. It follows from Lemma 1 that, as $\phi_i \to \infty$ for $i = 1, \ldots, s_1$, the eigenvalues $d_{s_1+1}, \ldots, d_m$ converge in distribution to $e_1, \ldots, e_{m-s_1}$, the eigenvalues of $U_1^{-1}U_2$, where $U_1 \sim W_{m-s_1}(I_{m-s_1}, n_1)$ independently of $U_2 \sim W_{m-s_1}(\Phi_2, n_2 - s_1)$ and $\Phi_2 = \mathrm{diag}(1, \ldots, 1, \phi_{m-s_2+1}, \ldots, \phi_m)$. The distribution of $e_1, \ldots, e_{m-s_1}$ is identical to the distribution of $f_{m-s_1}^{-1}, \ldots, f_1^{-1}$, where $f_1, \ldots, f_{m-s_1}$ are the eigenvalues of $U_2^{*-1}U_1^*$, with $U_1^* \sim W_{m-s_1}(\Phi_2^{-1}, n_1)$ independently of $U_2^* \sim W_{m-s_1}(I_{m-s_1}, n_2 - s_1)$. Now applying Lemma 1 again, we find that as $\phi_{m-j+1} \to 0$ for $j = 1, \ldots, s_2$, the eigenvalues $f_{s_2+1}, \ldots, f_{m-s_1}$ converge in distribution to $l_1, \ldots, l_{m-s}$, the eigenvalues of $V_2^{-1}V_1$, where $V_1$ and $V_2$ are given in the theorem. Thus, we have shown that as $\phi_i \to \infty$ for $i = 1, \ldots, s_1$ and $\phi_{m-j+1} \to 0$ for $j = 1, \ldots, s_2$, the eigenvalues $d_{s_1+1}, \ldots, d_{m-s_2}$ converge in distribution to $l_{m-s}^{-1}, \ldots, l_1^{-1}$, the reciprocals of the eigenvalues of $V_2^{-1}V_1$ or, equivalently, to the eigenvalues of $V_1^{-1}V_2$. The result now follows since, if $l_1^*, \ldots, l_{m-s}^*$ denote the eigenvalues of $V_1^{-1}V_2$,

$\prod_{i=1}^{m-s}\{(l_i^*)^{n_2/2}(1 + l_i^*)^{-(n_1+n_2)/2}\} = \dfrac{|V_1|^{n_1/2}\,|V_2|^{n_2/2}}{|V_1 + V_2|^{(n_1+n_2)/2}}. \qquad \square$
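Lemma 1 can be checked by simulation. In the sketch below (settings invented for illustration), $\Phi = \mathrm{diag}(\phi_1, 1, 1, 1)$ with $\phi_1$ large: the top eigenvalue of $A_1^{-1}A_2$ blows up with $\phi_1$, while the remaining eigenvalues behave like those of the $(m-1)$-dimensional problem with $n_2 - 1$ degrees of freedom:

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(2)
m, n1, n2, reps = 4, 30, 30, 4000
phi1 = 1e6

A1 = wishart.rvs(df=n1, scale=np.eye(m), size=reps, random_state=rng)
A2 = wishart.rvs(df=n2, scale=np.diag([phi1, 1.0, 1.0, 1.0]), size=reps,
                 random_state=rng)
d = np.sort(np.linalg.eigvals(np.linalg.solve(A1, A2)).real, axis=1)[:, ::-1]

# The reduced problem of Lemma 1 with s = 1.
U1 = wishart.rvs(df=n1, scale=np.eye(m - 1), size=reps, random_state=rng)
U2 = wishart.rvs(df=n2 - 1, scale=np.eye(m - 1), size=reps, random_state=rng)
e = np.sort(np.linalg.eigvals(np.linalg.solve(U1, U2)).real, axis=1)[:, ::-1]

print(d[:, 0].min())                       # huge: grows with phi1
print(np.mean(d[:, 1]), np.mean(e[:, 0]))  # close: matching distributions
```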

References

Boik, R.J., 2002. Spectral models for covariance matrices. Biometrika 89, 159–182.
Booth, J.G., Butler, R.W., Huzurbazar, S., Wood, A.T.A., 1995. Saddlepoint approximations for p-values of some tests of covariance matrices. Journal of Statistical Computation and Simulation 53, 165–180.
Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179–188.
Flury, B., 1983. Some relations between the comparison of covariance matrices and principal component analysis. Computational Statistics & Data Analysis 1, 97–109.
Flury, B.N., 1984. Common principal components in k groups. Journal of the American Statistical Association 79, 892–898.
Flury, B.N., 1985. Analysis of linear combinations with extreme ratios of variance. Journal of the American Statistical Association 80, 915–922.
Flury, B.K., 1987. Two generalizations of the common principal component model. Biometrika 74, 59–69.
Flury, B., 1988. Common Principal Components and Related Multivariate Models. Wiley, New York.
Gupta, A.K., Tang, J., 1984. Distribution of likelihood ratio statistic for testing equality of covariance matrices of multivariate Gaussian models. Biometrika 71, 555–559.
Houle, D., Mezey, J., Galpern, P., 2002. Interpretation of the results of common principal components analyses. Evolution 56, 433–440.
Korin, B.P., 1968. On the distribution of a statistic used for testing a covariance matrix. Biometrika 55, 171–178.
Krzanowski, W.J., 1979. Between-group comparison of principal components. Journal of the American Statistical Association 74, 703–707.
Muirhead, R.J., 1982. Aspects of Multivariate Statistical Theory. Wiley, New York.
Rao, C.R., 1983. Likelihood ratio tests for relationships between two covariance matrices. In: Karlin, S., Amemiya, T., Goodman, L.A. (Eds.), Studies in Econometrics, Time Series, and Multivariate Statistics. Academic Press, New York, pp. 529–543.
Schott, J.R., 1991. Some tests for common principal component subspaces in several groups. Biometrika 78, 771–777.
Schott, J.R., 2000. Some tests for the equality of covariance matrices. Journal of Statistical Planning and Inference 94, 25–36.
Schott, J.R., Saw, J.G., 1984. A multivariate one-way classification model with random effects. Journal of Multivariate Analysis 15, 1–12.
Shapiro, A., Browne, M.W., 1987. Analysis of covariance structures under elliptical distributions. Journal of the American Statistical Association 82, 1092–1097.
Tyler, D.E., 1983. Robustness and efficiency of scatter matrices. Biometrika 70, 411–420.