Maximum likelihood estimators for extended growth curve model with orthogonal between-individual design matrices


Daniel Klein*, Ivan Žežula
Institute of Mathematics, P. J. Šafárik University, Jesenná 5, SK-04154 Košice, Slovakia
*Corresponding author. E-mail addresses: [email protected] (D. Klein), [email protected] (I. Žežula).

Article history: Received 10 February 2014; Received in revised form 12 August 2014; Accepted 25 September 2014; Available online 8 October 2014.
MSC: 62H12
Keywords: Extended growth curve model; Maximum likelihood estimators; Orthogonality; Moments

Abstract: The extended growth curve model is discussed in this paper. Two versions of the model are studied in the literature; they differ in the way the column spaces of the design matrices are nested. The nesting is applied either to the between-individual or to the within-individual design matrices. Although the two versions are equivalent via reparametrization, the properties of estimators cannot be transferred directly because of the nonlinearity of the estimators. Since in many applications the between-individual matrices are one-way ANOVA matrices, it is reasonable to assume orthogonality of the column spaces of the between-individual design matrices along with nestedness of the column spaces of the within-individual design matrices. We present the maximum likelihood estimators and their basic moments for the model with such an orthogonality condition.

1. Introduction

In this paper, $\mathcal{R}(A)$ denotes the column space of a matrix $A$, $r(A)$ its rank, and $\operatorname{Tr}(A)$ its trace. $P_A = A(A'A)^{-}A'$ and $Q_A = I - P_A$ denote the orthogonal projection matrices onto the column space $\mathcal{R}(A)$ and onto its orthogonal complement, respectively. $P_A^B = A(A'BA)^{-}A'B$ and $Q_A^B = I - P_A^B$ denote the corresponding projection matrices in the (semi)metric given by a positive (semi)definite matrix $B$. If $A$ is a random matrix, then the variance-covariance matrix $\operatorname{Var}[A]$ is defined as $\operatorname{Var}[\operatorname{vec} A]$, where $\operatorname{vec}$ is the column-wise vectorization operator.




The symbol $\otimes$ denotes the Kronecker product, and $\prod_{l=i}^{j} A_l$ for $j < i$ denotes the decreasing-indices product $A_i A_{i-1} \cdots A_{j+1} A_j$ of a sequence of matrices.

This paper deals with the model known as the extended growth curve model (EGCM) with fixed effects (also called the sum-of-profiles model):
$$Y = \sum_{i=1}^{k} X_i B_i Z_i' + E, \qquad \operatorname{vec} E \sim N_{np}\left(0,\; \Sigma \otimes I_n\right), \tag{1}$$

where $Y \in \mathbb{R}^{n\times p}$ is a matrix of independent $p$-variate observations, and $X_i \in \mathbb{R}^{n\times r_i}$ and $Z_i \in \mathbb{R}^{p\times q_i}$, $i = 1,\dots,k$, are the between-individual and the within-individual design matrices, respectively. $B_i \in \mathbb{R}^{r_i\times q_i}$, $i = 1,\dots,k$, are matrices of unknown parameters. The matrix $E$ is a matrix of random errors whose rows are independently normally distributed with unknown common covariance matrix $\Sigma > 0$. This model was introduced by Verbyla and Venables [6] as a generalization of the growth curve model (GCM), which has only one profile, i.e. $Y = XBZ' + E$. They gave several examples of how this model may arise. Since the maximum likelihood equations cannot be solved explicitly in this model, they also discussed an iterative algorithm for obtaining the maximum likelihood estimators (MLEs). Von Rosen [7] derived the MLEs of the unknown parameters in partially explicit form under the additional nested-subspace condition on the between-individual matrices
$$\mathcal{R}(X_k) \subseteq \cdots \subseteq \mathcal{R}(X_1), \tag{2}$$

while nothing is assumed about the within-individual matrices $Z_i$. Many results have been presented under this condition, such as uniqueness conditions for the MLEs or moments of the estimators (see e.g. Kollo and von Rosen [3]). However, there are situations where the column spaces of the between-individual design matrices should be disjoint, or at least their intersection should be as small as possible. Filipiak and von Rosen [1] discussed the model with the nested-subspace condition on the within-individual design matrices
$$\mathcal{R}(Z_k) \subseteq \cdots \subseteq \mathcal{R}(Z_1). \tag{3}$$

The conditions (2) and (3) lead to different parametrizations of the model (1); however, Filipiak and von Rosen [1] showed that via reparametrization one can derive the model with condition (3) from the model with condition (2) and vice versa, i.e. the two models are equivalent. Let us refer to the model with condition (2) as Model I, and to the model with condition (3) as Model II. Because of the non-linearity of the estimators, the properties of estimators from one model cannot be transferred directly to the other one. For Model II, Filipiak and von Rosen [1] also gave the MLEs of the unknown parameters for the three-component model, and they discussed the uniqueness conditions and the moments of the MLEs. Hu [2] proposed a modification of Model I, assuming that the column spaces of the between-individual design matrices are orthogonal, i.e.
$$X_i' X_j = 0 \quad \forall\, i \neq j, \tag{4}$$
while no ordering among the $\mathcal{R}(Z_i)$'s is assumed. His idea is to separate groups rather than models. The idea is illustrated in Example 1, which also demonstrates that it is very natural to assume the nested-subspace condition on the within-individual design matrices $Z_i$ in the case of orthogonal between-individual design matrices.

Example 1. This is Example 4.1.2, p. 374, of Kollo and von Rosen [3]. Let there be 3 treatment groups of animals, with $n_j$ animals in the $j$th group, and let each group be subject to different treatment conditions. The aim is to investigate the weight increase of the animals in all groups. All animals have been measured at the same $p$ time points (say $t_r$, $r = 1,\dots,p$). The expected growth curve of each treatment group is supposed to be a polynomial in time, and the groups differ by the order of the polynomial. Thus, the means of the three treatment groups at time $t$ are

$$\begin{aligned}
\mu_1 &= \beta_{11} + \beta_{21} t + \cdots + \beta_{(q-2)1} t^{q-3},\\
\mu_2 &= \beta_{12} + \beta_{22} t + \cdots + \beta_{(q-2)2} t^{q-3} + \beta_{(q-1)2} t^{q-2},\\
\mu_3 &= \beta_{13} + \beta_{23} t + \cdots + \beta_{(q-2)3} t^{q-3} + \beta_{(q-1)3} t^{q-2} + \beta_{q3} t^{q-1}.
\end{aligned}\tag{5}$$


In order to describe these different responses, we can use the model
$$Y = X_1 B_1 Z_1' + X_2 B_2 Z_2' + X_3 B_3 Z_3' + E, \tag{6}$$
where $Y_{n\times p}$ is the observation matrix, $E_{n\times p}$ is the matrix of random errors, and the remaining matrices are defined as follows:

$$X_1 = \begin{pmatrix} 1_{n_1} & 0_{n_1} & 0_{n_1} \\ 0_{n_2} & 1_{n_2} & 0_{n_2} \\ 0_{n_3} & 0_{n_3} & 1_{n_3} \end{pmatrix}, \qquad
X_2 = \begin{pmatrix} 0_{n_1} & 0_{n_1} \\ 1_{n_2} & 0_{n_2} \\ 0_{n_3} & 1_{n_3} \end{pmatrix}, \qquad
X_3 = \begin{pmatrix} 0_{n_1} \\ 0_{n_2} \\ 1_{n_3} \end{pmatrix},$$
$$B_1 = \begin{pmatrix} \beta_{11} & \beta_{21} & \cdots & \beta_{(q-2)1} \\ \beta_{12} & \beta_{22} & \cdots & \beta_{(q-2)2} \\ \beta_{13} & \beta_{23} & \cdots & \beta_{(q-2)3} \end{pmatrix}, \qquad
B_2 = \begin{pmatrix} \beta_{(q-1)2} \\ \beta_{(q-1)3} \end{pmatrix}, \qquad
B_3 = \beta_{q3},$$
$$Z_1 = \begin{pmatrix} 1 & t_1 & \cdots & t_1^{q-3} \\ 1 & t_2 & \cdots & t_2^{q-3} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & t_p & \cdots & t_p^{q-3} \end{pmatrix}, \qquad
Z_2 = \begin{pmatrix} t_1^{q-2} \\ t_2^{q-2} \\ \vdots \\ t_p^{q-2} \end{pmatrix}, \qquad
Z_3 = \begin{pmatrix} t_1^{q-1} \\ t_2^{q-1} \\ \vdots \\ t_p^{q-1} \end{pmatrix}.$$

This is clearly Model I, since $\mathcal{R}(X_3) \subseteq \mathcal{R}(X_2) \subseteq \mathcal{R}(X_1)$. Notice that it is not possible to model the mean structure with only one component, i.e. with a single-profile GCM. We say that Model I separates models, since every component contains polynomial growth of a different order. The same mean structure (5) could be modeled by model (6) with the following matrices:

$$X_1 = \begin{pmatrix} 1_{n_1} \\ 0_{n_2} \\ 0_{n_3} \end{pmatrix}, \qquad
X_2 = \begin{pmatrix} 0_{n_1} \\ 1_{n_2} \\ 0_{n_3} \end{pmatrix}, \qquad
X_3 = \begin{pmatrix} 0_{n_1} \\ 0_{n_2} \\ 1_{n_3} \end{pmatrix},$$
$$B_1 = \bigl(\beta_{11}\;\; \beta_{21}\;\; \cdots\;\; \beta_{(q-2)1}\bigr), \qquad
B_2 = \bigl(\beta_{12}\;\; \beta_{22}\;\; \cdots\;\; \beta_{(q-2)2}\;\; \beta_{(q-1)2}\bigr), \qquad
B_3 = \bigl(\beta_{13}\;\; \beta_{23}\;\; \cdots\;\; \beta_{(q-2)3}\;\; \beta_{(q-1)3}\;\; \beta_{q3}\bigr),$$
$$Z_1 = \begin{pmatrix} 1 & t_1 & \cdots & t_1^{q-3} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & t_p & \cdots & t_p^{q-3} \end{pmatrix}, \qquad
Z_2 = \begin{pmatrix} 1 & t_1 & \cdots & t_1^{q-3} & t_1^{q-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & t_p & \cdots & t_p^{q-3} & t_p^{q-2} \end{pmatrix}, \qquad
Z_3 = \begin{pmatrix} 1 & t_1 & \cdots & t_1^{q-3} & t_1^{q-2} & t_1^{q-1} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 1 & t_p & \cdots & t_p^{q-3} & t_p^{q-2} & t_p^{q-1} \end{pmatrix}.$$

Here we have a model similar to Model II but with the opposite order of the spaces, since $\mathcal{R}(Z_1) \subseteq \mathcal{R}(Z_2) \subseteq \mathcal{R}(Z_3)$. We say that this model separates groups (or treatments), since each component models the growth of only one group of observations. Moreover, the column spaces of the between-individual matrices in this model are orthogonal, i.e. $X_i' X_j = 0$ for $i \neq j$.

We will refer to the model with the assumption of nestedness of the column spaces of the within-individual design matrices and with the additional assumption of orthogonality of the between-individual design matrices as Model III:
$$Y = \sum_{i=1}^{k} X_i B_i Z_i' + E, \qquad \operatorname{vec} E \sim N_{np}\left(0,\; \Sigma \otimes I_n\right), \qquad X_i' X_j = 0 \;\;\forall\, i \neq j, \qquad \mathcal{R}(Z_1) \subseteq \cdots \subseteq \mathcal{R}(Z_k). \tag{7}$$
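As a quick illustration of the structure of Model III, the following NumPy sketch builds the design matrices of Example 1 (second parametrization) and checks numerically that $X_i'X_j = 0$ for $i \neq j$ and that $\mathcal{R}(Z_1) \subseteq \mathcal{R}(Z_2) \subseteq \mathcal{R}(Z_3)$. The group sizes, number of time points and polynomial order $q$ are illustrative choices of ours, not values taken from the paper.

```python
import numpy as np

# Illustrative sizes: 3 groups, q = 5, p = 6 time points (not from the paper)
n1, n2, n3, q, p = 4, 5, 6, 5, 6
t = np.linspace(1.0, 2.0, p)                      # measurement times t_1, ..., t_p

# Between-individual matrices: one indicator column per group (mutually orthogonal)
X1 = np.concatenate([np.ones(n1), np.zeros(n2 + n3)]).reshape(-1, 1)
X2 = np.concatenate([np.zeros(n1), np.ones(n2), np.zeros(n3)]).reshape(-1, 1)
X3 = np.concatenate([np.zeros(n1 + n2), np.ones(n3)]).reshape(-1, 1)

# Within-individual matrices: polynomial bases of increasing degree (nested column spaces)
Z1 = np.vander(t, q - 2, increasing=True)         # columns 1, t, ..., t^{q-3}
Z2 = np.vander(t, q - 1, increasing=True)         # columns 1, t, ..., t^{q-2}
Z3 = np.vander(t, q,     increasing=True)         # columns 1, t, ..., t^{q-1}

# Orthogonality of the between-individual design matrices: X_i' X_j = 0 for i != j
for A, B in [(X1, X2), (X1, X3), (X2, X3)]:
    assert np.allclose(A.T @ B, 0)

# Nestedness R(Z1) <= R(Z2) <= R(Z3): projecting Z_i onto R(Z_{i+1}) leaves it unchanged
proj = lambda Z: Z @ np.linalg.pinv(Z)            # orthogonal projector onto R(Z)
assert np.allclose(proj(Z2) @ Z1, Z1)
assert np.allclose(proj(Z3) @ Z2, Z2)
print("orthogonality and nestedness verified")
```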


In order to have all parameters of interest estimable, we also assume that $n > \sum_{i=1}^{k} r(X_i) + p$. Example 1 indicates that via reparametrization we can derive Model III from Model I.

Lemma 1. Model I and Model III are equivalent via reparametrization.

Proof. Let us assume Model I, where without loss of generality $r_1 \geq r_2 \geq \cdots \geq r_k$. Then it follows from condition (2) that there exist matrices $H_i$ of dimension $n \times (r_i - r_{i+1})$, $i = 1,\dots,k-1$, such that
$$\mathcal{R}(X_i) = \mathcal{R}(X_{i+1}) \boxplus \mathcal{R}(H_i), \qquad i = 1,\dots,k-1,$$
where $\boxplus$ denotes the orthogonal sum of subspaces. Then there exist matrices $\Xi_i$ such that
$$X_i = (X_k : H_{k-1} : \cdots : H_i)\,\Xi_i, \qquad i = 1,\dots,k-1.$$
Let us partition each transformed matrix of unknown parameters $\Xi_i B_i$ according to $(X_k : H_{k-1} : \cdots : H_i)$, i.e.
$$\Xi_i B_i = \begin{pmatrix} \Theta_{i,1} \\ \vdots \\ \Theta_{i,k-i+1} \end{pmatrix}, \qquad i = 1,\dots,k-1,$$
where $\Theta_{i,1}$ is of dimension $r_k \times q_i$ and $\Theta_{i,j}$ is of dimension $(r_{k-j+1} - r_{k-j}) \times q_i$ for $j = 2,\dots,k-i+1$. Then we can reparametrize Model I as
$$\begin{aligned}
Y &= \sum_{i=1}^{k} X_i B_i Z_i' + E\\
&= (X_k : H_{k-1} : \cdots : H_1)\left(\Theta_{1,1}' : \Theta_{1,2}' : \cdots : \Theta_{1,k}'\right)' Z_1' + (X_k : H_{k-1} : \cdots : H_2)\left(\Theta_{2,1}' : \Theta_{2,2}' : \cdots : \Theta_{2,k-1}'\right)' Z_2'\\
&\quad + \cdots + (X_k : H_{k-1})\left(\Theta_{k-1,1}' : \Theta_{k-1,2}'\right)' Z_{k-1}' + X_k B_k Z_k' + E\\
&= H_1 \Theta_{1,k} Z_1' + H_2\left(\Theta_{1,k-1} : \Theta_{2,k-1}\right)(Z_1 : Z_2)' + \cdots + H_{k-1}\left(\Theta_{1,2} : \Theta_{2,2} : \cdots : \Theta_{k-1,2}\right)(Z_1 : Z_2 : \cdots : Z_{k-1})'\\
&\quad + X_k\left(\Theta_{1,1} : \Theta_{2,1} : \cdots : \Theta_{k-1,1} : B_k\right)(Z_1 : Z_2 : \cdots : Z_k)' + E,
\end{aligned}$$
which is Model III. $\square$

Remark 1. It can be observed that the same result is obtained in the previous lemma if a nestedness condition on the within-individual design matrices is also fulfilled in Model I, i.e. in the GCM with the double nestedness condition $\mathcal{R}(X_k) \subseteq \cdots \subseteq \mathcal{R}(X_1)$ and $\mathcal{R}(Z_1) \subseteq \cdots \subseteq \mathcal{R}(Z_k)$.

The EGCM with Hu's condition, i.e. Model III, is much easier to handle in the case of a known variance-covariance matrix $\Sigma$. To illustrate the situation, let us look at the two-component Model I. If $X_1, X_2, Z_1, Z_2$ are of full rank and $\mathcal{R}(Z_1) \cap \mathcal{R}(Z_2) = \{0\}$, then the (unbiased) least-squares estimators of $B_1, B_2$ are known to be
$$\begin{aligned}
\widehat{B}_1 &= \left(X_1'X_1\right)^{-1} X_1' Y \Sigma^{-1} Z_1\left(Z_1'\Sigma^{-1}Z_1\right)^{-1} - \left(X_1'X_1\right)^{-1} X_1' P_{X_2} Y\left(P_{Z_2}^{\Sigma^{-1}Q_{Z_1}^{\Sigma^{-1}}}\right)' \Sigma^{-1} Z_1\left(Z_1'\Sigma^{-1}Z_1\right)^{-1},\\
\widehat{B}_2 &= \left(X_2'X_2\right)^{-1} X_2' Y \Sigma^{-1} Q_{Z_1}^{\Sigma^{-1}} Z_2\left(Z_2'\Sigma^{-1}Q_{Z_1}^{\Sigma^{-1}}Z_2\right)^{-1},
\end{aligned}$$
see Žežula [9]. The situation is much more complicated for more than two components. On the other hand, the explicit form of the least-squares estimators of all $B_i$'s in Model III with an arbitrary number of components is known to be
$$\widehat{B}_i = \left(X_i'X_i\right)^{-1} X_i' Y \Sigma^{-1} Z_i\left(Z_i'\Sigma^{-1}Z_i\right)^{-1},$$
see Hu [2].


The aim of this paper is to present results for Model III. We derive the maximum likelihood estimators of the unknown parameters as well as basic moments of these estimators. The benefit of Hu's condition shows up most clearly in deriving the variances of the estimators of the location parameters, since the maximum likelihood estimators do not depend on each other (as is the case in Model I, Model II and the GCM with the double nestedness condition). Also, these variances are known in Model I and Model II only for the three-component model, while we are able to solve this problem for the general $k$-component model in Model III. Results analogous to those presented for Model III in this paper can be found in Kollo and von Rosen [3] and Filipiak and von Rosen [1] for Model I and Model II, respectively.

2. The maximum likelihood estimators

The next theorem gives the MLEs in Model III.

Theorem 2. Let us consider Model III, and denote $X = (X_1, X_2, \dots, X_k)$. All solutions of the likelihood equations for the unknown parameters in this model are

 −  ′ −1 − 1 1 ′ −1 −  Bi = X′i Xi X′i YS− + Gi − (X′i Xi )− X′i Xi Gi Z′i S− i Zi Zi S i Zi i Zi (Zi Si Zi ) ,  ′   k k   Xi Bi Z′ Y− Xi n 6= Y− Bi Z′ i

i

i=1

= S1 +

k 

i = 1, . . . , k,

i=1 S



−1

S

−1

QZii Y′ PXi Y QZii

′

,

i=1 −1



−1

S 1 S 1 Y′ PXi−1 Y QZii− where Gi , i = 1, . . . , k, are arbitrary ri × qi matrices, S1 = Y′ QX Y and Si = Si−1 + QZii− −1 −1

′

for i = 2, . . . , k. Proof. The likelihood function equals − np 2

$$L\left(B_1,\dots,B_k,\Sigma;\, Y\right) = (2\pi)^{-\frac{np}{2}}\,|\Sigma|^{-\frac{n}{2}}\,\exp\left\{-\tfrac{1}{2}\operatorname{Tr}\left[\Sigma^{-1}\left(Y - \sum_{i=1}^{k} X_i B_i Z_i'\right)'\left(Y - \sum_{i=1}^{k} X_i B_i Z_i'\right)\right]\right\}.$$

Differentiating the log-likelihood with respect to the unknown parameters and setting these derivatives to zero, we get the following likelihood equations:
$$0 = X_i'\left(Y - X_i B_i Z_i'\right)\Sigma^{-1} Z_i, \qquad i = 1,\dots,k, \tag{8}$$
$$n\Sigma = \left(Y - \sum_{i=1}^{k} X_i B_i Z_i'\right)'\left(Y - \sum_{i=1}^{k} X_i B_i Z_i'\right). \tag{9}$$

According to Puntanen et al. [5], Theorem 11, p. 267, the general solution of the system (8) is
$$B_i = \left(X_i'X_i\right)^{-} X_i' Y \Sigma^{-1} Z_i\left(Z_i'\Sigma^{-1}Z_i\right)^{-} + G_i - \left(X_i'X_i\right)^{-} X_i'X_i\, G_i\, Z_i'\Sigma^{-1}Z_i\left(Z_i'\Sigma^{-1}Z_i\right)^{-}, \qquad i = 1,\dots,k, \tag{10}$$

where $G_i$ is an arbitrary matrix of appropriate order. Since the column spaces of the matrices $X_i$ are orthogonal, it holds that $P_X = \sum_{i=1}^{k} P_{X_i}$. As a result, we can rewrite Eq. (9) as
$$n\Sigma = S_1 + \sum_{i=1}^{k}\left(P_{X_i}Y - X_iB_iZ_i'\right)'\left(P_{X_i}Y - X_iB_iZ_i'\right),$$
where $S_1 = Y'Q_XY$, and replacing $B_i$ by formula (10) we obtain
$$n\Sigma = S_1 + \sum_{i=1}^{k} Q_{Z_i}^{\Sigma^{-1}}\, Y'P_{X_i}Y\left(Q_{Z_i}^{\Sigma^{-1}}\right)'. \tag{11}$$




It is easy to see that $\left(Q_{Z_i}^{\Sigma^{-1}}\right)'\Sigma^{-1}Z_j = 0$ for $j \leq i$ because of the nested-subspace condition $\mathcal{R}(Z_1) \subseteq \cdots \subseteq \mathcal{R}(Z_k)$. Therefore we first multiply (11) by $\Sigma^{-1}Z_1$ and obtain $nZ_1 = S_1\Sigma^{-1}Z_1$, which is equivalent to
$$\Sigma^{-1}Z_1 = nS_1^{-1}Z_1.$$

Replacing this expression into (11) we have
$$n\Sigma = S_2 + \sum_{i=2}^{k} Q_{Z_i}^{\Sigma^{-1}}\, Y'P_{X_i}Y\left(Q_{Z_i}^{\Sigma^{-1}}\right)',$$
where $S_2 = S_1 + Q_{Z_1}^{S_1^{-1}}\, Y'P_{X_1}Y\left(Q_{Z_1}^{S_1^{-1}}\right)'$. Now we multiply the last expression by $\Sigma^{-1}Z_2$ and obtain

$$\Sigma^{-1}Z_2 = nS_2^{-1}Z_2.$$
Continuing this process we finally obtain
$$\Sigma^{-1}Z_i = nS_i^{-1}Z_i, \qquad i = 1,2,\dots,k, \tag{12}$$
where $S_i = S_{i-1} + Q_{Z_{i-1}}^{S_{i-1}^{-1}}\, Y'P_{X_{i-1}}Y\left(Q_{Z_{i-1}}^{S_{i-1}^{-1}}\right)'$ for $i = 2,\dots,k$, and $S_1 = Y'Q_XY$. Finally, replacing (12) into (10) and (11) we obtain the desired result for $\widehat{B}_i$ and $n\widehat{\Sigma}$. $\square$



It is obvious that $\widehat{\Sigma}$ is unique, and thus it is the MLE. However, the estimators $\widehat{B}_i$ need not be unique, since not all $B_i$ are necessarily estimable. The uniqueness conditions for Model III are given in the next theorem.

Theorem 3. The maximum likelihood estimator $\widehat{B}_i$, $i = 1,\dots,k$, is unique if and only if $X_i$ and $Z_i$ are of full rank, i.e. $r(X_i) = r_i$ and $r(Z_i) = q_i$. In that case
$$\widehat{B}_i = \left(X_i'X_i\right)^{-1} X_i' Y S_i^{-1} Z_i\left(Z_i'S_i^{-1}Z_i\right)^{-1},$$
where $S_i$ is given in Theorem 2. The linear functions $X_iB_iZ_i'$, $i = 1,\dots,k$, are always estimable, their (unique) MLEs being

 −1 ′ S ′  Xi Bi Zi = PXi Y PZii . 1 ′ −1 − Proof. It follows from Theorem 2 that  Bi is unique if and only if Gi = (X′i Xi )− X′i Xi Gi Z′i S− i Zi ( Zi S i Zi ) for all g-inverses, which holds if and only if Xi and Zi are of full rank. The second claim follows from the fact that



S

−1

Xi Bi Z′i = PXi Y PZii

′

 −1 ′ S + Xi Gi Z′i − PXi Xi Gi Z′i PZii ,

and the last two terms cancel out.
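The estimators of Theorems 2 and 3 are fully explicit once the recursion for $S_i$ is in place, so they are easy to compute. The sketch below is our own helper (not code from the paper); it assumes full column rank of every $X_i$ and $Z_i$ and that the components are supplied in an order consistent with $\mathcal{R}(Z_1) \subseteq \cdots \subseteq \mathcal{R}(Z_k)$. It follows the formulas of Theorem 2 literally: it builds $S_1 = Y'Q_XY$, updates it through the $k$ components, and returns the $\widehat{B}_i$ together with $\widehat{\Sigma}$.

```python
import numpy as np

def model3_mle(Y, Xs, Zs):
    """MLEs in Model III per Theorems 2 and 3 (full-rank design matrices assumed)."""
    n, p = Y.shape
    X = np.hstack(Xs)
    Q_X = np.eye(n) - X @ np.linalg.pinv(X)           # projector onto the orthogonal complement of R(X)
    S = Y.T @ Q_X @ Y                                  # S_1
    B_hats = []
    for X_i, Z_i in zip(Xs, Zs):
        Sinv = np.linalg.inv(S)
        # B_i = (X_i'X_i)^{-1} X_i' Y S_i^{-1} Z_i (Z_i' S_i^{-1} Z_i)^{-1}
        B_i = np.linalg.solve(X_i.T @ X_i, X_i.T @ Y) @ Sinv @ Z_i @ np.linalg.inv(Z_i.T @ Sinv @ Z_i)
        B_hats.append(B_i)
        # update S: S_{i+1} = S_i + Q_{Z_i}^{S_i^{-1}} Y' P_{X_i} Y (Q_{Z_i}^{S_i^{-1}})'
        P_Xi = X_i @ np.linalg.pinv(X_i)
        Q_Zi = np.eye(p) - Z_i @ np.linalg.inv(Z_i.T @ Sinv @ Z_i) @ Z_i.T @ Sinv
        S = S + Q_Zi @ Y.T @ P_Xi @ Y @ Q_Zi.T
    Sigma_hat = S / n                                  # after k updates, S equals n * Sigma_hat
    return B_hats, Sigma_hat
```

For instance, with the design matrices of Example 1 and a simulated $Y$, the call model3_mle(Y, [X1, X2, X3], [Z1, Z2, Z3]) returns the three $\widehat{B}_i$ (unbiased by Theorem 7 below) and $\widehat{\Sigma}$ (whose bias is quantified in Theorem 8).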



3. Basic moments of estimators

In the following we assume that all design matrices $X_i$ and $Z_i$ are of full rank.

Lemma 4. Let $A$ and $V > 0$ be any $p \times q$ and $p \times p$ matrices, respectively. Then
(i) $P_A^{V^{-1}} = \left(Q_{Q_A}^{V}\right)'$;
(ii) $P_A^{V} = \left(Q_{Q_A}^{V^{-1}}\right)'$;
(iii) $V^{-1} = A\left(A'VA\right)^{-}A' + V^{-1}Q_A\left(Q_AV^{-1}Q_A\right)^{-}Q_AV^{-1}$.




Proof. Let us start with (i). Following Markiewicz [4], proof of Lemma 1, it is easy to see that $P_{V^{-1/2}A}\,V^{1/2}Q_A = 0$. Then
$$P_{V^{1/2}Q_A} = V^{1/2}Q_A\left(Q_AVQ_A\right)^{-}Q_AV^{1/2} = \left(I - P_{V^{-1/2}A}\right)V^{1/2}Q_A\left(Q_AVQ_A\right)^{-}Q_AV^{1/2} = \left(I - P_{V^{-1/2}A}\right)P_{V^{1/2}Q_A}.$$
However, since $r\left(V^{1/2}Q_A\right) = p - r\left(V^{-1/2}A\right)$, it follows that $P_{V^{1/2}Q_A} = I - P_{V^{-1/2}A} = Q_{V^{-1/2}A}$, which is equivalent to $P_{V^{-1/2}A} = Q_{V^{1/2}Q_A}$. Then
$$\begin{aligned}
P_A^{V^{-1}} &= A\left(A'V^{-1}A\right)^{-}A'V^{-1} = V^{1/2}\,V^{-1/2}A\left(A'V^{-1/2}\,V^{-1/2}A\right)^{-}A'V^{-1/2}\,V^{-1/2} = V^{1/2}P_{V^{-1/2}A}V^{-1/2}\\
&= V^{1/2}Q_{V^{1/2}Q_A}V^{-1/2} = I - V^{1/2}\,V^{1/2}Q_A\left(Q_AV^{1/2}\,V^{1/2}Q_A\right)^{-}Q_AV^{1/2}\,V^{-1/2} = I - VQ_A\left(Q_AVQ_A\right)^{-}Q_A = \left(Q_{Q_A}^{V}\right)'.
\end{aligned}$$
Similarly, (ii) can be proved. For (iii), observe that $A\left(A'VA\right)^{-}A' = A\left(A'VA\right)^{-}A'V\,V^{-1} = P_A^V V^{-1}$. Then using (ii) we have

$$P_A^V V^{-1} = \left(Q_{Q_A}^{V^{-1}}\right)'V^{-1} = V^{-1} - \left(P_{Q_A}^{V^{-1}}\right)'V^{-1} = V^{-1} - V^{-1}Q_A\left(Q_AV^{-1}Q_A\right)^{-}Q_AV^{-1},$$
which is (iii). $\square$

Lemma 5. Let us consider Model III and define $W_1 = \Sigma^{-1/2}S_1\Sigma^{-1/2}$ and $W_i = W_{i-1} + N_{i-1}$ for $i = 2,\dots,k$, where $S_1 = E'Q_XE$ and $N_i = \Sigma^{-1/2}E'P_{X_i}E\,\Sigma^{-1/2}$. Then
$$W_i \sim W_p\left(n - \sum_{j=i}^{k} r(X_j),\; I_p\right) \qquad \forall\, i = 1,\dots,k.$$

Proof. Denoting $H = E\,\Sigma^{-1/2}$, it is easy to see that $H \sim N_{n\times p}(0, I_n, I_p)$, i.e. $\operatorname{vec} H \sim N(0, I_p \otimes I_n)$. Then it is obvious that $W_1 = H'Q_XH \sim W_p\left(n - \sum_{j=1}^{k} r(X_j),\; I_p\right)$. Observe that, denoting $T_i = Q_X + \sum_{j=1}^{i-1} P_{X_j} = I_n - \sum_{j=i}^{k} P_{X_j}$, we can write
$$W_i = H'Q_XH + \sum_{j=1}^{i-1} H'P_{X_j}H = H'T_iH \tag{13}$$
for $i = 2,\dots,k$. Since the matrix $T_i$ is symmetric and idempotent with $r(T_i) = n - \sum_{j=i}^{k} r(X_j)$, the result follows. $\square$

Lemma 6. Let us consider Model III with $W_1,\dots,W_k$ defined as in Lemma 5. Let $D_i = \Sigma^{1/2}Q_{Z_i}$, $i = 1,\dots,k$. Then for any $i, j$ such that $1 \leq j < i < k$ it holds that
$$\operatorname{Tr}\left\{E\left[P_{D_i}^{W_i}\cdots P_{D_{j+1}}^{W_{j+1}} D_j\left(D_j'W_jD_j\right)^{-}D_j'\left(P_{D_{j+1}}^{W_{j+1}}\right)'\cdots\left(P_{D_i}^{W_i}\right)'\right]\right\} = d_j d_{j+1}\cdots d_{i-1}\,c_i,$$
where
$$d_j = \frac{n - \sum_{l=j+1}^{k} r(X_l) - r(D_j) - 1}{n - \sum_{l=j}^{k} r(X_l) - r(D_j) - 1} \qquad \text{and} \qquad c_i = \frac{r(D_i)}{n - \sum_{l=i}^{k} r(X_l) - r(D_i) - 1}.$$

Proof. It is clear that $D_i$ is a $p \times p$ matrix of rank $r(D_i) = p - r(Z_i)$, so that it is not of full rank. However, $D_i\left(D_i'W_iD_i\right)^{-}D_i'W_i$ is a projection matrix, and such a matrix depends only on the space it projects onto; therefore the matrix $D_i$ can be replaced by any $\Delta_i \in \mathbb{R}^{p\times r(D_i)}$ such that $\mathcal{R}(D_i) = \mathcal{R}(\Delta_i)$ and
$$D_i\left(D_i'W_iD_i\right)^{-}D_i' = \Delta_i\left(\Delta_i'W_i\Delta_i\right)^{-1}\Delta_i'.$$
This implies

 

$$\operatorname{Tr}\left\{E\left[P_{D_i}^{W_i}\cdots P_{D_{j+1}}^{W_{j+1}} D_j\left(D_j'W_jD_j\right)^{-}D_j'\left(P_{D_{j+1}}^{W_{j+1}}\right)'\cdots\left(P_{D_i}^{W_i}\right)'\right]\right\}
= \operatorname{Tr}\left\{E\left[P_{\Delta_i}^{W_i}\cdots P_{\Delta_{j+1}}^{W_{j+1}} \Delta_j\left(\Delta_j'W_j\Delta_j\right)^{-1}\Delta_j'\left(P_{\Delta_{j+1}}^{W_{j+1}}\right)'\cdots\left(P_{\Delta_i}^{W_i}\right)'\right]\right\}. \tag{14}$$

Since the column spaces of the $Z_i$'s are nested, we have $\mathcal{R}(\Delta_k) \subseteq \cdots \subseteq \mathcal{R}(\Delta_1)$. Therefore for any $1 \leq j < k$ there exists a matrix $U_j$ such that $\Delta_{j+1} = \Delta_j U_j$. Let us denote
$$K_j = \left(\Delta_j'W_{j+1}\Delta_j\right)^{-1/2}\left(\Delta_j'W_j\Delta_j\right)\left(\Delta_j'W_{j+1}\Delta_j\right)^{-1/2}, \qquad j = 1,\dots,k-1.$$



Lemma 5 implies that $\Delta_j'W_j\Delta_j \sim W_{r(\Delta_j)}\left(n - \sum_{l=j}^{k} r(X_l),\; \Delta_j'\Delta_j\right)$ and $\Delta_j'H'P_{X_j}H\Delta_j \sim W_{r(\Delta_j)}\left(r(X_j),\; \Delta_j'\Delta_j\right)$. These two statistics are independent due to the orthogonality of the column spaces of the $X_i$'s. Since $W_{j+1} = W_j + H'P_{X_j}H$, it follows that $K_j$ has a multivariate beta distribution of type I (see e.g. Kollo and von Rosen [3], Theorem 2.4.8 and Definition 2.4.2, pp. 248–249). Then $E\left[K_j^{-1}\right] = d_j I$, where
$$d_j = \frac{n - \sum_{l=j+1}^{k} r(X_l) - r(\Delta_j) - 1}{n - \sum_{l=j}^{k} r(X_l) - r(\Delta_j) - 1}$$
(see e.g. von Rosen [8], Lemma 2.3(vi)). Taking only the middle part of

the right-hand side of (14), we may write
$$P_{\Delta_{j+1}}^{W_{j+1}}\Delta_j\left(\Delta_j'W_j\Delta_j\right)^{-1}\Delta_j'\left(P_{\Delta_{j+1}}^{W_{j+1}}\right)'
= \Delta_{j+1}\left(\Delta_{j+1}'W_{j+1}\Delta_{j+1}\right)^{-1}U_j'\left(\Delta_j'W_{j+1}\Delta_j\right)^{1/2}K_j^{-1}\left(\Delta_j'W_{j+1}\Delta_j\right)^{1/2}U_j\left(\Delta_{j+1}'W_{j+1}\Delta_{j+1}\right)^{-1}\Delta_{j+1}'.$$

According to Kollo and von Rosen [3], Corollary 2.4.8.1, p. 250, $K_j$ is independent of $\Delta_j'W_{j+1}\Delta_j$. Then $K_j$ is also independent of $\Delta_l'W_l\Delta_l$ for $l \geq j+1$, since there exists a matrix $U_{l,j+1}$ such that
$$\Delta_l'W_l\Delta_l = U_{l,j+1}'\left(\Delta_j'W_{j+1}\Delta_j + \Delta_j'H'\left(\sum_{s=j+1}^{l-1}P_{X_s}\right)H\Delta_j\right)U_{l,j+1},$$

and $W_{j+1}$ and $H'\left(\sum_{s=j+1}^{l-1}P_{X_s}\right)H$ are independent. Thus, denoting
$$C = P_{\Delta_i}^{W_i}\cdots P_{\Delta_{j+2}}^{W_{j+2}}\Delta_{j+1}\left(\Delta_{j+1}'W_{j+1}\Delta_{j+1}\right)^{-1}U_j'\left(\Delta_j'W_{j+1}\Delta_j\right)^{1/2},$$

$C$ is independent of $K_j$, and therefore
$$\begin{aligned}
\operatorname{Tr}\left\{E\left[P_{\Delta_i}^{W_i}\cdots P_{\Delta_{j+1}}^{W_{j+1}}\Delta_j\left(\Delta_j'W_j\Delta_j\right)^{-1}\Delta_j'\left(P_{\Delta_{j+1}}^{W_{j+1}}\right)'\cdots\left(P_{\Delta_i}^{W_i}\right)'\right]\right\}
&= \operatorname{Tr}\left\{E\left[CK_j^{-1}C'\right]\right\} = \operatorname{Tr}\left\{E\left[K_j^{-1}\right]E\left[C'C\right]\right\}\\
&= d_j\operatorname{Tr}\left\{E\left[C'C\right]\right\} = d_j\operatorname{Tr}\left\{E\left[CC'\right]\right\}\\
&= d_j\operatorname{Tr}\left\{E\left[P_{\Delta_i}^{W_i}\cdots P_{\Delta_{j+2}}^{W_{j+2}}\Delta_{j+1}\left(\Delta_{j+1}'W_{j+1}\Delta_{j+1}\right)^{-1}\Delta_{j+1}'\left(P_{\Delta_{j+2}}^{W_{j+2}}\right)'\cdots\left(P_{\Delta_i}^{W_i}\right)'\right]\right\}.
\end{aligned}$$


We can repeatedly apply the same technique, and at the end we obtain
$$\operatorname{Tr}\left\{E\left[P_{\Delta_i}^{W_i}\cdots P_{\Delta_{j+1}}^{W_{j+1}}\Delta_j\left(\Delta_j'W_j\Delta_j\right)^{-1}\Delta_j'\left(P_{\Delta_{j+1}}^{W_{j+1}}\right)'\cdots\left(P_{\Delta_i}^{W_i}\right)'\right]\right\}
= d_j d_{j+1}\cdots d_{i-1}\operatorname{Tr}\left\{E\left[\Delta_i\left(\Delta_i'W_i\Delta_i\right)^{-1}\Delta_i'\right]\right\}
= d_j d_{j+1}\cdots d_{i-1}\,\frac{r(\Delta_i)}{n - \sum_{l=i}^{k} r(X_l) - r(\Delta_i) - 1}.$$

The last expectation is due to Theorem 2.4.14(iii) of Kollo and von Rosen [3], p. 257. Since $r(\Delta_i) = r(D_i)$ for all $i$, the result follows. $\square$

Theorem 7. $\widehat{B}_i$ is an unbiased estimator of $B_i$ for all $i = 1,\dots,k$, i.e. $E[\widehat{B}_i] = B_i$. The variance of the estimator is
$$\operatorname{Var}\left[\widehat{B}_i\right] = \left[(1 + c_i)\left(Z_i'\Sigma^{-1}Z_i\right)^{-1} + \sum_{j=1}^{i-1}\left(d_j - 1\right)d_{j+1}\cdots d_{i-1}\,c_i\left(Z_i'Z_i\right)^{-1}Z_i'Z_j\left(Z_j'\Sigma^{-1}Z_j\right)^{-1}Z_j'Z_i\left(Z_i'Z_i\right)^{-1}\right] \otimes \left(X_i'X_i\right)^{-1},$$
where
$$c_i = \frac{p - r(Z_i)}{n - \sum_{l=i}^{k} r(X_l) - p + r(Z_i) - 1} \qquad \text{and} \qquad d_j = \frac{n - \sum_{l=j+1}^{k} r(X_l) - p + r(Z_j) - 1}{n - \sum_{l=j}^{k} r(X_l) - p + r(Z_j) - 1} \quad \text{for } 1 \leq j < i.$$
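Before turning to the proof, note that the variance formula is fully explicit and easy to evaluate. The following sketch is our own helper (assuming full-rank design matrices and a known or plugged-in $\Sigma$); it computes the constants $c_i$ and $d_j$ and assembles $\operatorname{Var}[\operatorname{vec}\widehat{B}_i]$ exactly as stated in Theorem 7, with 1-based indices to match the notation.

```python
import numpy as np

def var_B_hat(i, Xs, Zs, Sigma, n):
    """Var[vec B_hat_i] from Theorem 7 (full-rank X_i and Z_i; i is 1-based)."""
    p = Sigma.shape[0]
    rX = [X.shape[1] for X in Xs]                    # r(X_l) under the full-rank assumption
    rZ = [Z.shape[1] for Z in Zs]                    # r(Z_l)

    def c(l):
        return (p - rZ[l - 1]) / (n - sum(rX[l - 1:]) - p + rZ[l - 1] - 1)

    def d(j):
        return (n - sum(rX[j:]) - p + rZ[j - 1] - 1) / (n - sum(rX[j - 1:]) - p + rZ[j - 1] - 1)

    Sinv = np.linalg.inv(Sigma)
    Xi, Zi = Xs[i - 1], Zs[i - 1]
    ZiZi_inv = np.linalg.inv(Zi.T @ Zi)
    z_part = (1 + c(i)) * np.linalg.inv(Zi.T @ Sinv @ Zi)
    for j in range(1, i):
        Zj = Zs[j - 1]
        coef = (d(j) - 1) * np.prod([d(l) for l in range(j + 1, i)]) * c(i)
        z_part = z_part + coef * ZiZi_inv @ Zi.T @ Zj @ np.linalg.inv(Zj.T @ Sinv @ Zj) @ Zj.T @ Zi @ ZiZi_inv
    return np.kron(z_part, np.linalg.inv(Xi.T @ Xi))
```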

Proof. First, observe that $S_1,\dots,S_k$ can be written by means of the random errors, i.e. $S_1 = E'Q_XE$ and
$$S_i = S_{i-1} + Q_{Z_{i-1}}^{S_{i-1}^{-1}}\, E'P_{X_{i-1}}E\left(Q_{Z_{i-1}}^{S_{i-1}^{-1}}\right)', \qquad i = 2,\dots,k.$$
Then it is easy to see that
$$\widehat{B}_i - B_i = \left(X_i'X_i\right)^{-1}X_i'E\,S_i^{-1}Z_i\left(Z_i'S_i^{-1}Z_i\right)^{-1}. \tag{15}$$

It follows from Kollo and von Rosen [3], Theorem 2.2.4(iv), p. 196, that $X_i'E$ and $S_i$ are independent. Therefore
$$E\left[\widehat{B}_i - B_i\right] = \underbrace{E\left[\left(X_i'X_i\right)^{-1}X_i'E\right]}_{=0}\cdot E\left[S_i^{-1}Z_i\left(Z_i'S_i^{-1}Z_i\right)^{-1}\right] = 0.$$

Now let us derive the variance of $\widehat{B}_i$. Using (15), Lemma 4 and properties of the $\operatorname{vec}$ operator, we have
$$\begin{aligned}
\operatorname{Var}\left[\widehat{B}_i\right] &= E\left[\left(Z_i'S_i^{-1}Z_i\right)^{-1}Z_i'S_i^{-1}\,\Sigma\,S_i^{-1}Z_i\left(Z_i'S_i^{-1}Z_i\right)^{-1}\right] \otimes \left(X_i'X_i\right)^{-1}\\
&= \left(Z_i'Z_i\right)^{-1}Z_i'\,E\left[P_{Z_i}^{S_i^{-1}}\,\Sigma\left(P_{Z_i}^{S_i^{-1}}\right)'\right]Z_i\left(Z_i'Z_i\right)^{-1} \otimes \left(X_i'X_i\right)^{-1}\\
&= \left(Z_i'Z_i\right)^{-1}Z_i'\,E\left[\left(Q_{Q_{Z_i}}^{S_i}\right)'\Sigma\,Q_{Q_{Z_i}}^{S_i}\right]Z_i\left(Z_i'Z_i\right)^{-1} \otimes \left(X_i'X_i\right)^{-1}\\
&\overset{\mathrm{df}}{=} \left(Z_i'Z_i\right)^{-1}Z_i'F_iZ_i\left(Z_i'Z_i\right)^{-1} \otimes \left(X_i'X_i\right)^{-1}.
\end{aligned}\tag{16}$$




First, let $i = 1$. Since $S_1 \sim W_p\left(n - \sum_{j=1}^{k} r(X_j),\; \Sigma\right)$, via calculations similar to Kollo and von Rosen [3], (4.2.13)–(4.2.23), pp. 412–413, we obtain
$$\operatorname{Var}\left[\widehat{B}_1\right] = E\left[\left(Z_1'S_1^{-1}Z_1\right)^{-1}Z_1'S_1^{-1}\,\Sigma\,S_1^{-1}Z_1\left(Z_1'S_1^{-1}Z_1\right)^{-1}\right] \otimes \left(X_1'X_1\right)^{-1}
= \frac{n - \sum_{j=1}^{k} r(X_j) - 1}{n - \sum_{j=1}^{k} r(X_j) - p + r(Z_1) - 1}\left(Z_1'\Sigma^{-1}Z_1\right)^{-1} \otimes \left(X_1'X_1\right)^{-1}.$$

Now let $i > 1$. Let us define
$$V_1 = \Sigma^{-1/2}S_1\Sigma^{-1/2} \qquad \text{and} \qquad V_i = V_{i-1} + \left(P_{D_{i-1}}^{V_{i-1}}\right)'N_{i-1}\,P_{D_{i-1}}^{V_{i-1}} \qquad \text{for } i = 2,\dots,k,$$
where $N_i = \Sigma^{-1/2}E'P_{X_i}E\,\Sigma^{-1/2}$ for all $i$ as in Lemma 5, and $D_i = \Sigma^{1/2}Q_{Z_i}$ for all $i$ as in Lemma 6. Then we can rewrite the expression for $F_i$ as



  ′    ′ Si Si Si Fi = 6 − E 6 − E 6PQZ + E PQZ 6PQZ i i i i         ′ ′ V V V V = 6 − 61/2 E PDii + E PDii − E PDii PDii 61/2 . S PQiZ

Since R(Dk ) ⊆ · · · ⊆ R(D1 ), for any non-singular matrix A we obtain D′i Vi Di = D′i Wi Di ,

(17)

Di Vi Di−1 = Di Wi Di−1 , ′



PADi Dj = Dj ,

∀j > i.

Using these relations, we can write
$$\begin{aligned}
P_{D_i}^{V_i} &= D_i\left(D_i'V_iD_i\right)^{-}D_i'V_i = D_i\left(D_i'W_iD_i\right)^{-}\left(D_i'V_{i-1} + D_i'N_{i-1}D_{i-1}\left(D_{i-1}'W_{i-1}D_{i-1}\right)^{-}D_{i-1}'V_{i-1}\right)\\
&= D_i\left(D_i'W_iD_i\right)^{-}D_i'\left(I + N_{i-1}D_{i-1}\left(D_{i-1}'W_{i-1}D_{i-1}\right)^{-}D_{i-1}'\right)V_{i-1}\\
&= D_i\left(D_i'W_iD_i\right)^{-}D_i'\left(I + N_{i-1}D_{i-1}\left(D_{i-1}'W_{i-1}D_{i-1}\right)^{-}D_{i-1}'\right)\left(P_{D_{i-1}}^{V_{i-1}}\right)'V_{i-1}\\
&= D_i\left(D_i'W_iD_i\right)^{-}D_i'\left(I + N_{i-1}D_{i-1}\left(D_{i-1}'W_{i-1}D_{i-1}\right)^{-}D_{i-1}'\right)V_{i-1}P_{D_{i-1}}^{V_{i-1}}\\
&= D_i\left(D_i'W_iD_i\right)^{-}D_i'V_iD_{i-1}\left(D_{i-1}'V_{i-1}D_{i-1}\right)^{-}D_{i-1}'V_{i-1}\\
&= D_i\left(D_i'W_iD_i\right)^{-}D_i'W_iD_{i-1}\left(D_{i-1}'V_{i-1}D_{i-1}\right)^{-}D_{i-1}'V_{i-1}\\
&= P_{D_i}^{W_i}P_{D_{i-1}}^{V_{i-1}}.
\end{aligned}\tag{18}$$

Then, using (13), we get for any $2 \leq j \leq i$
$$\begin{aligned}
P_{D_i}^{V_i}\left(P_{D_{j-1}} - P_{D_j}\right) &= P_{D_i}^{W_i}P_{D_{i-1}}^{V_{i-1}}\cdots P_{D_{j+1}}^{W_{j+1}}P_{D_j}^{V_j}\left(P_{D_{j-1}} - P_{D_j}\right)\\
&= P_{D_i}^{W_i}P_{D_{i-1}}^{W_{i-1}}\cdots P_{D_{j+1}}^{W_{j+1}}P_{D_j}^{W_j}\left(P_{D_{j-1}} - P_{D_j}\right) = \prod_{l=i}^{j} P_{D_l}^{W_l}\left(P_{D_{j-1}} - P_{D_j}\right)\\
&= \prod_{l=i}^{j} D_l\left(D_l'H'T_lHD_l\right)^{-}D_l'H'T_lH\left(P_{D_{j-1}} - P_{D_j}\right).
\end{aligned}$$










Since $H\left(P_{D_{j-1}} - P_{D_j}\right)$ is independent of $HD_l$ for any $l \geq j$, we have
$$E\left[P_{D_i}^{V_i}\left(P_{D_{j-1}} - P_{D_j}\right)\right] = E\left[\prod_{l=i}^{j} D_l\left(D_l'H'T_lHD_l\right)^{-}D_l'H'T_l\;\underbrace{E\left[H\left(P_{D_{j-1}} - P_{D_j}\right)\right]}_{=0}\right] = 0. \tag{19}$$

Similarly, since $V_1 = W_1$ and $HQ_{D_1}$ is independent of $HD_i$ for all $i \geq 1$, we obtain
$$E\left[P_{D_i}^{V_i}Q_{D_1}\right] = E\left[\prod_{j=i}^{1} P_{D_j}^{W_j}\,Q_{D_1}\right] = E\left[\prod_{j=i}^{1} D_j\left(D_j'H'T_jHD_j\right)^{-}D_j'H'T_j\,HQ_{D_1}\right]
= E\left[\prod_{j=i}^{1} D_j\left(D_j'H'T_jHD_j\right)^{-}D_j'H'T_j\;\underbrace{E\left[HQ_{D_1}\right]}_{=0}\right] = 0. \tag{20}$$

It is easy to see that $P_{D_i}^{V_i}P_{D_i} = P_{D_i}$, therefore
$$E\left[P_{D_i}^{V_i}P_{D_i}\right] = P_{D_i}. \tag{21}$$

Observe that $I = P_{D_i} + \sum_{j=2}^{i}\left(P_{D_{j-1}} - P_{D_j}\right) + Q_{D_1}$. Therefore, using (19)–(21) we obtain
$$E\left[P_{D_i}^{V_i}\right] = E\left[P_{D_i}^{V_i}P_{D_i}\right] + \sum_{j=2}^{i} E\left[P_{D_i}^{V_i}\left(P_{D_{j-1}} - P_{D_j}\right)\right] + E\left[P_{D_i}^{V_i}Q_{D_1}\right] = P_{D_i}.$$

Let us now look at $\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}$. Since $P_{D_i}^{V_i}P_{D_i} = P_{D_i}$ and $P_{D_i}P_{D_i}^{V_i} = P_{D_i}^{V_i}$, it holds that
$$E\left[P_{D_i}\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}Q_{D_1}\right] = E\left[P_{D_i}^{V_i}Q_{D_1}\right] = 0.$$

Using a similar principle as in (19) and (20), we obtain
$$E\left[P_{D_i}\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}\left(P_{D_{j-1}} - P_{D_j}\right)\right] = 0, \qquad j = 2,\dots,i,$$
$$E\left[\left(P_{D_{j-1}} - P_{D_j}\right)\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}Q_{D_1}\right] = 0, \qquad j = 2,\dots,i,$$
$$E\left[\left(P_{D_{j-1}} - P_{D_j}\right)\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}\left(P_{D_{k-1}} - P_{D_k}\right)\right] = 0, \qquad k, j = 2,\dots,i,\; k \neq j.$$

Therefore we can write
$$E\left[\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}\right] = E\left[P_{D_i}\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}P_{D_i}\right] + \sum_{j=2}^{i} E\left[\left(P_{D_{j-1}} - P_{D_j}\right)\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}\left(P_{D_{j-1}} - P_{D_j}\right)\right] + E\left[Q_{D_1}\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}Q_{D_1}\right]. \tag{22}$$






Trivially, $E\left[P_{D_i}\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}P_{D_i}\right] = P_{D_i}$. Using (18), we express the last term as
$$Q_{D_1}\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}Q_{D_1} = Q_{D_1}\left(\prod_{j=i}^{1} D_j\left(D_j'H'T_jHD_j\right)^{-}D_j'H'T_jH\right)'\;\prod_{j=i}^{1} D_j\left(D_j'H'T_jHD_j\right)^{-}D_j'H'T_jH\;Q_{D_1}.$$

Since $HQ_{D_1}$ is independent of $HD_j$, $j = 1,\dots,k$, and $T_1 = Q_X$, using Lemma 6 and Theorem 2.2.9(i) of Kollo and von Rosen [3], we may write
$$\begin{aligned}
Q_{D_1}\,E\left[\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}\right]Q_{D_1}
&= E\left[\operatorname{Tr}\left\{D_1\left(D_1'H'T_1HD_1\right)^{-}D_1'H'T_1T_1H\,D_1\left(D_1'H'T_1HD_1\right)^{-}D_1'\;\prod_{j=2}^{i}\left(P_{D_j}^{W_j}\right)'\;\prod_{j=i}^{2} P_{D_j}^{W_j}\right\}\right]Q_{D_1}\\
&= E\left[\operatorname{Tr}\left\{\prod_{j=i}^{2} P_{D_j}^{W_j}\,D_1\left(D_1'W_1D_1\right)^{-}D_1'W_1D_1\left(D_1'W_1D_1\right)^{-}D_1'\;\prod_{j=2}^{i}\left(P_{D_j}^{W_j}\right)'\right\}\right]Q_{D_1}\\
&= E\left[\operatorname{Tr}\left\{\prod_{j=i}^{2} P_{D_j}^{W_j}\,D_1\left(D_1'W_1D_1\right)^{-}D_1'\;\prod_{j=2}^{i}\left(P_{D_j}^{W_j}\right)'\right\}\right]Q_{D_1}
= d_1\cdots d_{i-1}\,c_i\,Q_{D_1}.
\end{aligned}\tag{23}$$

It can be obtained in a similar way, for $j = 2,\dots,i$, that
$$E\left[\left(P_{D_{j-1}} - P_{D_j}\right)\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}\left(P_{D_{j-1}} - P_{D_j}\right)\right]
= E\left[\operatorname{Tr}\left\{P_{D_i}^{W_i}\cdots P_{D_{j+1}}^{W_{j+1}}D_j\left(D_j'W_jD_j\right)^{-}D_j'\left(P_{D_{j+1}}^{W_{j+1}}\right)'\cdots\left(P_{D_i}^{W_i}\right)'\right\}\right]\left(P_{D_{j-1}} - P_{D_j}\right)
= d_j d_{j+1}\cdots d_{i-1}\,c_i\left(P_{D_{j-1}} - P_{D_j}\right). \tag{24}$$

Combining (22)–(24), we have
$$\begin{aligned}
E\left[\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}\right] &= P_{D_i} + \sum_{j=2}^{i} d_j d_{j+1}\cdots d_{i-1}\,c_i\left(P_{D_{j-1}} - P_{D_j}\right) + d_1\cdots d_{i-1}\,c_i\,Q_{D_1}\\
&= P_{D_i} + \sum_{j=2}^{i} d_j d_{j+1}\cdots d_{i-1}\,c_i\left(Q_{D_j} - Q_{D_{j-1}}\right) + d_1\cdots d_{i-1}\,c_i\,Q_{D_1}\\
&= P_{D_i} + c_i Q_{D_i} + \sum_{j=1}^{i-1}\left(d_j - 1\right)d_{j+1}\cdots d_{i-1}\,c_i\,Q_{D_j}.
\end{aligned}\tag{25}$$


Since $Q_{D_i} = P_{\Sigma^{-1/2}Z_i}$ and $r(D_i) = p - r(Z_i)$ for any $i = 1,\dots,k$, using (22) and (25) we get
$$\begin{aligned}
F_i &= \Sigma^{1/2}\left[Q_{D_i} + c_iQ_{D_i} + \sum_{j=1}^{i-1}\left(d_j - 1\right)d_{j+1}\cdots d_{i-1}\,c_i\,Q_{D_j}\right]\Sigma^{1/2}\\
&= \Sigma^{1/2}\left[(1 + c_i)P_{\Sigma^{-1/2}Z_i} + \sum_{j=1}^{i-1}\left(d_j - 1\right)d_{j+1}\cdots d_{i-1}\,c_i\,P_{\Sigma^{-1/2}Z_j}\right]\Sigma^{1/2}\\
&= (1 + c_i)\,Z_i\left(Z_i'\Sigma^{-1}Z_i\right)^{-1}Z_i' + \sum_{j=1}^{i-1}\left(d_j - 1\right)d_{j+1}\cdots d_{i-1}\,c_i\,Z_j\left(Z_j'\Sigma^{-1}Z_j\right)^{-1}Z_j',
\end{aligned}\tag{26}$$

where in $d_j$ and $c_i$ the ranks $r(D_l)$ are replaced by $p - r(Z_l)$. Finally, the variance of $\widehat{B}_i$, $i = 2,\dots,k$, is obtained from (16) and (26). $\square$

Observe that $c_i > 0$ and $d_i > 1$ for all $i$. It may be noted that in the literature the variances of the maximum likelihood estimators are given only for the three-component model in Model I and Model II, since the calculations are very tedious for general $k$ components. In Model III, however, the moments can be derived for the general $k$-component model. The following theorem presents $E[\widehat{\Sigma}]$ in Model III.

Theorem 8. Let us consider the estimator $\widehat{\Sigma}$ given by Theorem 2. Then

$$E\left[\widehat{\Sigma}\right] = \left(I_p + \sum_{i=1}^{k}\frac{r(X_i)}{n}\left[(c_i - 1)P_{Z_i}^{\Sigma^{-1}} + \sum_{j=1}^{i-1}\left(d_j - 1\right)d_{j+1}\cdots d_{i-1}\,c_i\,P_{Z_j}^{\Sigma^{-1}}\right]\right)\Sigma, \tag{27}$$

where $c_i$ and $d_j$, for all $i$ and $j < i$, are given in Theorem 7.

Proof. Observe that we can write
$$n\widehat{\Sigma} = S_1 + \sum_{i=1}^{k} Q_{Z_i}^{S_i^{-1}}\,Y'P_{X_i}Y\left(Q_{Z_i}^{S_i^{-1}}\right)' = S_1 + \sum_{i=1}^{k}\left(P_{Q_{Z_i}}^{S_i}\right)'E'P_{X_i}E\,P_{Q_{Z_i}}^{S_i},$$

where $S_1 = E'Q_XE$. Since $P_{X_i}E$ and $S_i$ are independent, utilizing the first moment of a Wishart matrix we obtain
$$\begin{aligned}
E\left[n\widehat{\Sigma}\right] &= E\left[S_1\right] + \sum_{i=1}^{k} E\left[\left(P_{Q_{Z_i}}^{S_i}\right)'E\left[E'P_{X_i}E\right]P_{Q_{Z_i}}^{S_i}\right]\\
&= \left(n - \sum_{i=1}^{k} r(X_i)\right)\Sigma + \sum_{i=1}^{k} r(X_i)\,E\left[\left(P_{Q_{Z_i}}^{S_i}\right)'\Sigma\,P_{Q_{Z_i}}^{S_i}\right]\\
&= \left(n - \sum_{i=1}^{k} r(X_i)\right)\Sigma + \sum_{i=1}^{k} r(X_i)\,\Sigma^{1/2}E\left[\left(P_{D_i}^{V_i}\right)'P_{D_i}^{V_i}\right]\Sigma^{1/2}.
\end{aligned}$$

The result now follows from relation (25).
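The right-hand side of (27) is again fully explicit, so the systematic bias of $\widehat{\Sigma}$ can be evaluated directly. The sketch below is our own helper (full-rank design matrices assumed, with $c_i$ and $d_j$ as in Theorem 7); it returns $E[\widehat{\Sigma}]$ for a given $\Sigma$, and comparing the result with $\Sigma$ quantifies the bias mentioned in the Conclusions.

```python
import numpy as np

def expected_sigma_hat(Xs, Zs, Sigma, n):
    """Right-hand side of (27): E[Sigma_hat] in Model III (full-rank design matrices assumed)."""
    k, p = len(Xs), Sigma.shape[0]
    rX = [X.shape[1] for X in Xs]
    rZ = [Z.shape[1] for Z in Zs]
    Sinv = np.linalg.inv(Sigma)

    def P(Z):                                        # P_Z^{Sigma^{-1}} = Z (Z'Sigma^{-1}Z)^{-1} Z'Sigma^{-1}
        return Z @ np.linalg.inv(Z.T @ Sinv @ Z) @ Z.T @ Sinv

    def c(l):
        return (p - rZ[l - 1]) / (n - sum(rX[l - 1:]) - p + rZ[l - 1] - 1)

    def d(j):
        return (n - sum(rX[j:]) - p + rZ[j - 1] - 1) / (n - sum(rX[j - 1:]) - p + rZ[j - 1] - 1)

    M = np.eye(p)
    for i in range(1, k + 1):
        term = (c(i) - 1) * P(Zs[i - 1])
        for j in range(1, i):
            term = term + (d(j) - 1) * np.prod([d(l) for l in range(j + 1, i)]) * c(i) * P(Zs[j - 1])
        M = M + (rX[i - 1] / n) * term
    return M @ Sigma
```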



4. Conclusions

To conclude, the authors would like to stress that orthogonal decompositions are very useful in all areas of statistics and mathematics. If the modeling of a real-world phenomenon allows creating an orthogonal structure, the resulting model can be expected to have much nicer properties than all others. In our case it is the EGCM Model III, where we assume an orthogonal structure in the ANOVA part (which is always the case in a simple one-way model). Under these assumptions, we have derived the explicit form of the MLEs of both the first- and the second-order parameters, as well as their basic moments, in the general $k$-profile setting. This is very useful especially for comparing the small-sample properties of the MLEs with other competing estimators. A drawback of ML estimation in this setting is the possibly large bias of $\widehat{\Sigma}$, which cannot be easily corrected.

Acknowledgments

The research was supported by grants VEGA MŠ SR 1/0832/12 and VEGA MŠ SR 1/0344/14. The authors thank the anonymous referees for their comments, which helped to improve the quality of the paper and inspired further research.

References

[1] K. Filipiak, D. von Rosen, On MLEs in an extended multivariate linear growth curve model, Metrika 75 (2012) 1069–1092.
[2] J. Hu, Properties of the explicit estimators in the extended growth curve model, Statistics 44 (2009) 477–492.
[3] T. Kollo, D. von Rosen, Advanced Multivariate Statistics with Matrices, Springer, Dordrecht, 2005.
[4] A. Markiewicz, On dependence structure preserving optimality, Statist. Probab. Lett. 53 (2001) 415–419.
[5] S. Puntanen, G.P.H. Styan, J. Isotalo, Matrix Tricks for Linear Statistical Models, Springer-Verlag, Berlin Heidelberg, 2011.
[6] A.P. Verbyla, W.N. Venables, An extension of the growth curve model, Biometrika 75 (1988) 129–138.
[7] D. von Rosen, Maximum likelihood estimators in multivariate linear normal models, J. Multivariate Anal. 31 (1989) 187–200.
[8] D. von Rosen, Moments for a multivariate linear model with an application to the growth curve model, J. Multivariate Anal. 35 (1990) 243–259.
[9] I. Žežula, Remarks on unbiased estimation of the sum-of-profiles model parameters, Tatra Mt. Math. Publ. 39 (2008) 45–52.