Local influence assessment in the growth curve model with unstructured covariance


Journal of Statistical Planning and Inference 62 (1997) 263-278


Jian-Xin Pan^a, Kai-Tai Fang^b, Dietrich von Rosen^c

^a Hong Kong Baptist College and Yunnan University, China
^b Hong Kong Baptist College and the Institute of Applied Mathematics, Academia Sinica, China
^c Department of Mathematics, Uppsala University, Box 480, 75106 Uppsala, Sweden

Abstract

In this paper, a local influence approach is employed to assess the adequacy of the growth curve model with an unstructured covariance, based on likelihood displacement. The Hessian matrix of the model is investigated in detail under an abstract perturbation scheme. For illustration, covariance-weighted perturbation is discussed and used to analyze two real-life biological data sets, which show that the criteria presented in this article are useful in practice. © 1997 Elsevier Science B.V.

AMS classifications: Primary 62H12; secondary 62A10

Keywords: Curvature; Growth curve model; Hessian matrix; Local influence; Likelihood displacement; Perturbation; Statistical diagnostic

1. Introduction

The growth curve model (GCM), which is useful when investigating short time series, including growth curves, in economics, biology, medical research and epidemiology (see, e.g., Grizzle and Allen, 1969; Lee and Geisser, 1975), was first proposed by Potthoff and Roy (1964) and subsequently studied by many authors, including Rao (1965, 1966, 1967), Khatri (1966), Gleser and Olkin (1970), Geisser (1970) and von Rosen (1989, 1990, 1991). The GCM is defined as

$$Y_{p\times n} = X_{p\times m}\,B_{m\times r}\,Z_{r\times n} + E_{p\times n}, \qquad (1.1)$$

where X and Z are known design matrices of ranks m < p and r < n, respectively, and B is an unknown regression coefficient matrix. The columns of the error matrix E are independent p-variate normal with mean 0 and unknown covariance matrix Σ > 0, i.e. Y ~ N_{p,n}(XBZ, Σ, I_n). Usually, p is the number of time points observed on each of


the n cases, (m − 1) is the degree of the polynomial which describes the mean structure of the time series, and r is the number of treatment groups. The GCM has been applied to many real-life examples, for instance, by Potthoff and Roy (1964) and Keramidas and Lee (1990). For the parameters B and Σ in the GCM, Khatri (1966), Gleser and Olkin (1970), Srivastava and Khatri (1979), and von Rosen (1989) have shown various ways of finding the maximum likelihood estimators. The following expressions were obtained by these authors:

$$\hat{B} = (X^{t}S^{-1}X)^{-1}X^{t}S^{-1}YZ^{t}(ZZ^{t})^{-1}$$

and

$$\hat{\Sigma} = \frac{1}{n}\bigl(S + Q_{S}\,Y P_{Z^{t}} Y^{t}\,Q_{S}^{t}\bigr), \qquad (1.2)$$

where S = Y(I_n − P_{Z^t})Y^t and the projections are Q_S = S X_o (X_o^t S X_o)^{-1} X_o^t and P_A = A(A^t A)^{-1} A^t, where X_o is any matrix which spans the orthogonal complement of the column space of X, i.e. X_o^t X = 0 and rank(X_o) = p − rank(X). The matrix S is positive definite with probability one as long as n > p + r (Okamoto, 1973).

Various aspects of the estimators in (1.2) have been considered. For reviews of the model see Woolson and Leeper (1980), Seber (1984) or von Rosen (1991). In particular, some asymptotic properties as well as formulae for higher moments of B̂ and Σ̂ have been obtained (von Rosen, 1990). These can be used when giving asymptotic expansions of the distribution and density functions. Hence, statistical inference for the GCM based on the maximum likelihood estimators can be carried through. Furthermore, the prediction problem, which is of special interest for growth curve data, has been studied by, among others, Rao (1984, 1987) and Lee (1988, 1991).

Statistical regression diagnostics have been investigated systematically over the last two decades. For the GCM, however, there are few works on regression diagnostics; in fact, most investigations on this subject have appeared very recently. For the GCM with the special spherical covariance structure, i.e. Σ = σ²G, where G > 0 is known, Liski (1991) presented an approach for detecting outliers and influential observations. When Σ is arbitrary, Pan and Fang (1994) derived a convenient formula for the empirical influence function for B̂. Based on the empirical influence function, a likelihood ratio criterion for detecting multiple outliers and a generalized Cook's distance for identifying influential observations have been established (see also Kish and Chinchilli, 1990; Pan, 1994). Another approach was taken by von Rosen (1991), who studied influential observations with the help of perturbations. One may view that approach as an extended version of the approaches presented by Belsley et al. (1980) and Pregibon (1981).

Most of the procedures mentioned above are based on the so-called case-deletion methodology, which is also known as the global influence approach. The main idea is to delete observations and then to choose an appropriate metric for measuring changes of the model, an estimator or any other statistic. For details see, for example, Cook and Weisberg (1982) or Chatterjee and Hadi (1988). This approach is not only of theoretical interest; it is often applied in practice (see, e.g., Cook and Weisberg, 1982; Beckman and Cook, 1983). However, several problems arise when applying it (see Cook, 1986). For example: (i) How should one in advance


decide the size of the subset of observations which is to be deleted? (ii) How should one choose the subset of observations even if its size is known? The first question comprises the so-called masking and swamping phenomena, in which the diagnostic statistics based on case deletion sometimes detect too many outliers (or influential observations) and sometimes detect too few. These problems have not been satisfactorily solved (see, e.g., Rousseeuw and van Zomeren, 1990). Moreover, especially for dependent data, the second question usually meets some difficulties (see, e.g., Barnett and Lewis, 1984).

In order to deal with these problems, Cook (1986) developed a general method for assessing the influence of local departures from model assumptions, which is now known as the local influence approach. The method assumes only a well-behaved likelihood. For the ordinary regression model with normal errors, Cook (1986) demonstrated the local influence approach and introduced some diagnostic statistics measuring the effects of various perturbation schemes. Since then, this method has been widely adopted and now plays an important role in statistical diagnostics. However, for more complicated models such as the GCM, many difficulties arise when applying the local influence approach in order to diagnose the models in an adequate manner, as pointed out by Beckman et al. (1987), who employed local influence to diagnose the mixed-model analysis of variance.

The purpose of this paper is to apply the local influence approach to the GCM. In the beginning of Section 2, we give a very brief sketch of the local influence approach and then emphasize multivariate techniques which are useful for a matrix version. The Hessian matrix of a statistical model (see, e.g., Cook, 1986) serves as a basis of the local influence approach. It will be shown that the Hessian is invariant under any one-to-one measurable transformation of the parameters; this fact can significantly simplify investigations. In Section 3, estimated information matrices for the GCM are established when the regression coefficient B, the covariance Σ or the parameter pair (B, Σ) is of interest. Thereafter, in Section 4, dB̂(W)/dW and dΣ̂(W)/dW are obtained, where W is a perturbation matrix and B̂(W) and Σ̂(W) are the maximum likelihood estimators in the perturbed model. Based on the results of Sections 3 and 4, the Hessian matrices are established in Section 5 for the regression coefficient B, the covariance Σ and the parameter pair (B, Σ). Furthermore, in Section 5, a variance-weighted perturbation scheme, which has proved popular in practice, is investigated in some detail. The Dental Data (Potthoff and Roy, 1964) and the Mouse Data (Rao, 1984) are analyzed using the methods presented in this article. The analyses show that the approach may be useful in practice, especially since the computation is not overwhelming.
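Although the paper's focus is on diagnostics, it may help to see the estimators in (1.2) computed explicitly. The following sketch is ours, not the authors' code: the function name gcm_mle and the simulated data are hypothetical, and the standard identity Q_S = I − X(X^tS^{-1}X)^{-1}X^tS^{-1} is used in place of the X_o form.

```python
import numpy as np

def gcm_mle(Y, X, Z):
    """MLEs of B and Sigma in the growth curve model Y = X B Z + E, eq. (1.2).
    A sketch assuming n > p + r so that S is invertible."""
    p, n = Y.shape
    # S = Y (I_n - P_{Z'}) Y',  with P_{Z'} = Z'(ZZ')^{-1}Z
    PZt = Z.T @ np.linalg.solve(Z @ Z.T, Z)
    S = Y @ (np.eye(n) - PZt) @ Y.T
    Sinv = np.linalg.inv(S)
    XtSX = X.T @ Sinv @ X
    B_hat = np.linalg.solve(XtSX, X.T @ Sinv @ Y @ Z.T) @ np.linalg.inv(Z @ Z.T)
    # Q_S = I - X(X'S^{-1}X)^{-1}X'S^{-1}, equivalent to S X_o (X_o'S X_o)^{-1} X_o'
    QS = np.eye(p) - X @ np.linalg.solve(XtSX, X.T @ Sinv)
    Sigma_hat = (S + QS @ Y @ PZt @ Y.T @ QS.T) / n
    return B_hat, Sigma_hat

# small simulated example: p = 4 time points, m = 2, r = 2 groups, n = 20 cases
rng = np.random.default_rng(0)
t = np.arange(1.0, 5.0)
X = np.column_stack([np.ones(4), t])
Z = np.kron(np.eye(2), np.ones(10))          # 2 x 20 group indicators
B_true = np.array([[10.0, 12.0], [0.5, 0.8]])
Y = X @ B_true @ Z + rng.normal(scale=1.0, size=(4, 20))
B_hat, Sigma_hat = gcm_mle(Y, X, Z)
```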

2. Local influence and the related multivariate version

In this section, the local influence approach is briefly described and an extension is investigated. Furthermore, some multivariate techniques related to the GCM are presented.


2.1. Local influence

Let L(θ) represent the log-likelihood function of a certain postulated model, where θ is a p-variate parameter vector. Denote the MLE of θ by θ̂. When the model is perturbed by some factor, say ω, where ω ∈ Ω is a q-variate vector and Ω is the perturbation space, the log-likelihood function and the MLE of θ are denoted L(θ|ω) and θ̂(ω), respectively. Suppose that there exists a null perturbation, i.e. a point ω₀ ∈ Ω such that L(θ|ω₀) = L(θ) for all θ and θ̂(ω₀) = θ̂. One of the most important issues is to choose an appropriate measure to assess the difference between θ̂ and θ̂(ω). The likelihood displacement LD(ω) ≡ 2{L(θ̂) − L(θ̂(ω))}, suggested by Cook (1986), is one such measure. Since the MLE θ̂ maximizes L(θ), we always have LD(ω) ≥ 0, and large values of LD(ω) indicate that θ̂(ω) and θ̂ differ. Furthermore, the first derivative of LD(ω) with respect to ω, evaluated at ω₀, vanishes, and the likelihood displacement LD(ω) attains its local minimum at ω₀. In order to observe the changes of LD(ω) in a neighbourhood of ω₀, Cook (1986) suggested the use of the geometric curvature, say C_d, of LD(ω) along a direction d ∈ R^q. Obviously, the larger the value of C_d, the more strongly the perturbation in the direction d influences the likelihood. Without loss of generality the direction d can be restricted to the unit sphere S^q in q-dimensional space, i.e. ||d|| = √(d^t d) = 1. In particular, the direction, say d_max, that maximizes the curvature C_d (d ∈ S^q) shows how to perturb the postulated model so as to obtain the largest local change in the likelihood displacement. The direction d_max thus serves as a basis for diagnosing local changes of the postulated model.

How should one find the direction d_max on the unit sphere S^q? As noted by Cook (1986), it is nothing but the unit eigenvector corresponding to the largest absolute eigenvalue of the Hessian matrix F_θ̂ = Ġ_θ̂ L̈ Ġ_θ̂^t, where L̈ = d²L(θ)/dθ dθ^t|_{θ=θ̂} is the p × p observed information matrix with (i,j)th element d²L(θ)/dθ_i dθ_j|_{θ=θ̂} (1 ≤ i, j ≤ p),

and Ġ_θ̂ = dθ̂^t(ω)/dω|_{ω=ω₀} is a q × p matrix with (i,j)th element dθ̂_j(ω)/dω_i|_{ω=ω₀} (1 ≤ i ≤ q, 1 ≤ j ≤ p). Therefore, the Hessian matrix F_θ̂ plays a pivotal role in the local influence approach.

Sometimes we are only interested in an r-dimensional subset θ₁ of the p-dimensional θ = (θ₁^t, θ₂^t)^t. In this case the likelihood displacement is given by LD(ω) = 2{L(θ̂) − L(θ̃(ω))}, where θ̃(ω) = (θ̂₁^t(ω), θ̂₂^t(θ̂₁(ω)))^t, θ̂₂(θ₁) is the parameter vector such that L(θ₁, θ̂₂(θ₁)) = sup_{θ₂} L(θ₁, θ₂), and θ̂₁(ω) is the MLE of θ₁ in the perturbed model. In a similar manner the Hessian matrix, denoted F_θ̂₁, can be decomposed into F_θ̂₁ = Ġ_θ̂₁ L̈ Ġ_θ̂₁^t, where Ġ_θ̂₁ = (dθ̂₁^t(ω)/dω|_{ω=ω₀}) · (I, dθ̂₂^t(θ₁)/dθ₁).

Lemma 2.1. The Hessian matrix F_θ̂ (or F_θ̂₁) is invariant under any one-to-one measurable transformation of the parameter vector θ (or θ₁).

Proof. We only show the invariance of F_θ̂; for F_θ̂₁ the invariance can be proven in the same manner. Let η = g(θ) be a one-to-one measurable transformation from θ


to η. Noticing that the log-likelihood function satisfies L(θ) = L(g^{-1}(η)) ≡ L̃(η), the MLE of η is η̂ = g(θ̂). Denote by η̂(ω) the MLE of η when the model is perturbed by a factor ω. By the chain rule for vector derivatives (see, e.g., Magnus and Neudecker (1988) or Fang and Zhang (1990)), we have

$$\frac{d^{2}\tilde L(\eta)}{d\eta\,d\eta^{t}}
  =\frac{d}{d\eta}\Bigl(\frac{d\tilde L(\eta)}{d\eta}\Bigr)^{t}
  =\frac{d}{d\eta}\Bigl\{\frac{d\theta^{t}}{d\eta}\Bigl(\frac{dL(\theta)}{d\theta}\Bigr)^{t}\Bigr\}
  =\frac{d\theta^{t}}{d\eta}\,\frac{d^{2}L(\theta)}{d\theta\,d\theta^{t}}\Bigl(\frac{d\theta^{t}}{d\eta}\Bigr)^{t},$$

where the term involving dL(θ)/dθ vanishes when the expression is evaluated at the maximum, and

$$\frac{d\hat\eta^{t}(\omega)}{d\omega}
  =\frac{d\hat\theta^{t}(\omega)}{d\omega}\,\frac{d\eta^{t}}{d\theta}\bigg|_{\theta=\hat\theta(\omega)}.$$

Hence,

$$\ddot L_{\hat\eta}=\frac{d\theta^{t}}{d\eta}\,\ddot L_{\hat\theta}\,\Bigl(\frac{d\theta^{t}}{d\eta}\Bigr)^{t}\bigg|_{\eta=\hat\eta}
\qquad\text{and}\qquad
\dot G_{\hat\eta}=\dot G_{\hat\theta}\,\frac{d\eta^{t}}{d\theta}\bigg|_{\theta=\hat\theta}.$$

Therefore, we obtain F_η̂ = Ġ_η̂ L̈_η̂ Ġ_η̂^t = Ġ_θ̂ L̈_θ̂ Ġ_θ̂^t = F_θ̂, which implies that the Hessian matrix F_θ̂ is invariant under any one-to-one measurable transformation, and the proof is complete. □

Obviously, Lemma 2.1 also implies that the direction d_max is invariant under any one-to-one measurable transformation of the parameters.
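As a numerical illustration of the machinery of this subsection (not part of the paper), the sketch below approximates L̈ and Ġ by central finite differences for a user-supplied perturbed log-likelihood and extracts d_max as the eigenvector of F = Ġ L̈ Ġ^t with largest absolute eigenvalue. The function names and the finite-difference scheme are our own choices; in practice the analytic Hessians of Sections 3-5 are preferable.

```python
import numpy as np

def local_influence_dmax(loglik, theta_hat_of, omega0, eps=1e-5):
    """Largest-curvature direction d_max for the likelihood displacement.

    loglik(theta, omega): perturbed log-likelihood L(theta | omega)
    theta_hat_of(omega):  MLE of theta in the perturbed model
    omega0:               the null perturbation, so theta_hat_of(omega0) is the MLE
    All derivatives are approximated by central finite differences."""
    omega0 = np.asarray(omega0, float)
    theta0 = np.asarray(theta_hat_of(omega0), float)
    p, q = theta0.size, omega0.size

    # observed information  Ldd = d^2 L / dtheta dtheta'  at (theta_hat, omega0)
    def L(th):
        return loglik(th, omega0)
    Ldd = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ei, ej = np.eye(p)[i] * eps, np.eye(p)[j] * eps
            Ldd[i, j] = (L(theta0 + ei + ej) - L(theta0 + ei - ej)
                         - L(theta0 - ei + ej) + L(theta0 - ei - ej)) / (4 * eps**2)

    # G = d theta_hat(omega)' / d omega  (q x p), by central differences
    G = np.empty((q, p))
    for i in range(q):
        e = np.eye(q)[i] * eps
        G[i] = (np.asarray(theta_hat_of(omega0 + e), float)
                - np.asarray(theta_hat_of(omega0 - e), float)) / (2 * eps)

    F = G @ Ldd @ G.T                       # Hessian of the likelihood displacement
    eigval, eigvec = np.linalg.eigh((F + F.T) / 2)
    k = int(np.argmax(np.abs(eigval)))
    return eigvec[:, k], eigval[k]
```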

2.2. Matrix derivatives

In order to derive the Hessian matrix for the GCM we need some basic facts about matrix derivatives. Matrix derivatives are in this paper defined by either

$$\frac{dB}{dA}=\frac{d\,\mathrm{vec}(B)}{d\,\mathrm{vec}^{t}(A)}$$

or

$$\frac{dB}{dA}=\frac{d\,\mathrm{vec}(B)}{d\,\mathrm{svec}^{t}(A)} \qquad \text{for symmetric } A$$

or

$$\frac{dB}{dA}=\frac{d\,\mathrm{svec}(B)}{d\,\mathrm{svec}^{t}(A)} \qquad \text{for symmetric } B \text{ and } A,$$

where for a p × p symmetric matrix A, svec(A) ≡ (a₁₁, …, a_{p1}, a₂₂, …, a_{p2}, …, a_{pp})^t ∈ R^{p(p+1)/2}, which in the literature is also denoted vech(A). The choice among the three versions of the derivative at various places in this paper will become clear from the


context. Second-order derivatives are defined recursively, i.e.

$$\frac{d^{2}B}{dA^{2}}=\frac{d}{dA}\Bigl(\frac{dB}{dA}\Bigr).$$

Let

$$H = I_{p^{2}} + K_{p,p} - (K_{p,p})_{d}, \qquad (2.1)$$

where K_{p,q} is the commutation matrix, i.e. K_{p,q} vec(A) = vec(A^t) for A of size p × q, and (K_{p,p})_d stands for the matrix K_{p,p} with its off-diagonal elements set to 0. Furthermore, let D_p denote the duplication matrix, i.e.

$$\mathrm{vec}(A) = D_{p}\,\mathrm{svec}(A), \qquad (2.2)$$

where A is a p × p symmetric matrix. Useful results for K_{p,q} and D_p can be found in Magnus and Neudecker (1988) and Fang and Xu (1990). Among others, D_p^+ vec(A) = svec(A), where D_p^+ denotes the Moore–Penrose inverse of D_p, K_{p,p}D_p = D_p and D_p^+H = D_p^t. The next lemma is given without proof since related results can be found in Magnus and Neudecker (1988), Fang and Zhang (1990) and Kollo (1991).

Lemma 2.2. Let B: p × q, C: q × r and A: s × t in (i), and B: p × q, C: r × s in (ii). Then

(i) $\dfrac{d\,BC}{dA} = (C^{t}\otimes I_{p})\dfrac{dB}{dA} + (I_{r}\otimes B)\dfrac{dC}{dA}$;

(ii) $\dfrac{d\,B\otimes C}{dA} = (I_{q}\otimes K_{s,p}\otimes I_{r})\Bigl(\dfrac{dB}{dA}\otimes \mathrm{vec}(C) + \mathrm{vec}(B)\otimes \dfrac{dC}{dA}\Bigr)$;

(iii) $\dfrac{d\,\mathrm{tr}(BM^{t})}{dA} = \mathrm{vec}^{t}(M)\dfrac{dB}{dA}$;

(iv) $\dfrac{d\,\mathrm{tr}(BMB^{t}N^{t})}{dA} = \bigl(\mathrm{vec}^{t}(NBM^{t}) + \mathrm{vec}^{t}(N^{t}BM)\bigr)\dfrac{dB}{dA}$;

(v) $\dfrac{dB^{-1}}{dA} = -\bigl((B^{t})^{-1}\otimes B^{-1}\bigr)\dfrac{dB}{dA}$;

(vi) $\dfrac{d\det(B)}{dA} = \det(B)\,\mathrm{vec}^{t}\bigl((B^{t})^{-1}\bigr)\dfrac{dB}{dA}$;

(vii) $\dfrac{d\,\mathrm{vec}(A)}{d\,\mathrm{vec}^{t}(A)} = H$ if A is symmetric;

(viii) $\dfrac{d\,\mathrm{vec}(A)}{d\,\mathrm{svec}^{t}(A)} = H(D_{p}^{+})^{t} = D_{p}$ if A is symmetric. □

Simple consequences of Lemma 2.2 which will be used later are:

Corollary 2.1. Let A be a symmetric p × p matrix. Then

(i) $\dfrac{dA}{dA^{-1}} = -D_{p}^{+}(A\otimes A)D_{p}$;


(ii) $\dfrac{d\log\det(A)}{dA} = \mathrm{vec}^{t}(A^{-1})D_{p}$;

(iii) $\dfrac{d^{2}\log\det(A)}{dA^{2}} = \dfrac{d^{2}\log\det(A)}{d\,\mathrm{svec}(A)\,d\,\mathrm{svec}^{t}(A)} = -D_{p}^{t}(A^{-1}\otimes A^{-1})D_{p}$;

where the matrix D_p is defined by (2.2).

Corollary 2.2. Let A be a p × q matrix. Then

$$\frac{d\,AA^{t}}{dA} = (I_{p^{2}} + K_{p,p})(A\otimes I_{p}).$$
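The commutation matrix K_{p,q}, the duplication matrix D_p and the matrix H of (2.1) can be constructed explicitly, which makes it easy to check the identities quoted above numerically. The construction below is a standard one and only a sketch; the helper names are ours.

```python
import numpy as np

def commutation(p, q):
    """K_{p,q} with K vec(A) = vec(A') for A of size p x q (column-major vec)."""
    K = np.zeros((p * q, p * q))
    for i in range(p):
        for j in range(q):
            K[i * q + j, j * p + i] = 1.0
    return K

def duplication(p):
    """D_p with vec(A) = D_p svec(A) for symmetric p x p A (svec = vech)."""
    D = np.zeros((p * p, p * (p + 1) // 2))
    col = 0
    for j in range(p):
        for i in range(j, p):
            D[j * p + i, col] = 1.0
            if i != j:
                D[i * p + j, col] = 1.0
            col += 1
    return D

p = 3
K, D = commutation(p, p), duplication(p)
H = np.eye(p * p) + K - np.diag(np.diag(K))            # H of (2.1)
A = np.random.default_rng(1).normal(size=(p, p)); A = A + A.T
vecA = A.flatten(order="F")
svecA = np.concatenate([A[j:, j] for j in range(p)])
assert np.allclose(D @ svecA, vecA)                     # vec(A) = D_p svec(A)
assert np.allclose(K @ D, D)                            # K_{p,p} D_p = D_p
assert np.allclose(np.linalg.pinv(D) @ vecA, svecA)     # D_p^+ vec(A) = svec(A)
assert np.allclose(H @ np.linalg.pinv(D).T, D)          # H (D_p^+)' = D_p, Lemma 2.2(viii)
```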

3. Local influence in the growth curve model

We will now consider the information matrix for the GCM so that the Hessian matrix Ġ L̈ Ġ^t can be obtained.

3.1. B and Σ are of interest

The log-likelihood function of the GCM equals

$$L \equiv L(B,\Sigma^{-1}) = c + \tfrac{n}{2}\log\det(\Sigma^{-1}) - \tfrac{1}{2}\,\mathrm{tr}\{\Sigma^{-1}(Y-XBZ)(Y-XBZ)^{t}\}, \qquad (3.1)$$

where the constant c = −(1/2)pn log(2π). In Lemma 2.1 it was shown that the parametrization is immaterial. Therefore we consider the parameter pair (B, Σ^{-1}) instead of (B, Σ). Let θ₁ = vec(B) and θ₂ = svec(Σ^{-1}), i.e. θ = (θ₁^t, θ₂^t)^t ∈ R^{mr+p(p+1)/2} is the parameter vector of the model, and L̈ = d²L/dθ dθ^t|_{θ=θ̂}. In this case L̈ takes the following form.

Theorem 3.1. The observed information matrix for the GCM, based on the parameters B and Σ^{-1}, can be expressed as

$$\ddot L_{B,\Sigma} = \begin{pmatrix}
 -n(ZZ^{t})\otimes(X^{t}S^{-1}X) & (ZY^{t}Q_{S}^{t}\otimes X^{t})D_{p} \\
 D_{p}^{t}(Q_{S}YZ^{t}\otimes X) & -\tfrac{n}{2}D_{p}^{t}(\hat\Sigma\otimes\hat\Sigma)D_{p}
\end{pmatrix}, \qquad (3.2)$$

where S, Σ̂, Q_S and D_p have been defined previously.

Proof. Partition L̈_{B,Σ} into

$$\ddot L = \begin{pmatrix} \ddot L_{11} & \ddot L_{12}\\ \ddot L_{21} & \ddot L_{22}\end{pmatrix}
 = \begin{pmatrix} d^{2}L/d\theta_{1}\,d\theta_{1}^{t} & d^{2}L/d\theta_{1}\,d\theta_{2}^{t}\\ d^{2}L/d\theta_{2}\,d\theta_{1}^{t} & d^{2}L/d\theta_{2}\,d\theta_{2}^{t}\end{pmatrix}\bigg|_{\theta=\hat\theta}. \qquad (3.3)$$

For L̈_11 it is enough to consider −½ tr(Σ^{-1}XBZZ^tB^tX^t). Now, using Lemma 2.2(iv) gives

$$\frac{d}{dB}\bigl(-\tfrac{1}{2}\,\mathrm{tr}(\Sigma^{-1}XBZZ^{t}B^{t}X^{t})\bigr) = -\mathrm{vec}^{t}(X^{t}\Sigma^{-1}XBZZ^{t}).$$


Furthermore,

$$\ddot L_{11} = -\frac{d}{dB}\,\mathrm{vec}(X^{t}\Sigma^{-1}XBZZ^{t}) = -(ZZ^{t})\otimes(X^{t}\Sigma^{-1}X),$$

and after inserting Σ̂ we obtain −n(ZZ^t) ⊗ (X^tS^{-1}X), since X^tΣ̂^{-1} = nX^tS^{-1}. For L̈_21 we consider

$$\frac{d^{2}}{dB\;d\,\mathrm{svec}(\Sigma^{-1})}\Bigl[-\tfrac{1}{2}\,\mathrm{tr}\{\Sigma^{-1}(Y-XBZ)(Y-XBZ)^{t}\}\Bigr]. \qquad (3.4)$$

With the help of Lemma 2.2(iii) and Corollary 2.2 we find that (3.4) equals D_p^t{(Y − XBZ)Z^t ⊗ X}, and inserting B̂ establishes the expression of the theorem. Finally, we note that Corollary 2.1(iii) implies L̈_22. □

3.2. B is of interest

When we are only interested in diagnosing influence on the regression coefficient B, the estimate, say Σ̂(B), of Σ, given by

$$\hat\Sigma(B) = \frac{1}{n}(Y - XBZ)(Y - XBZ)^{t},$$

implies that

$$\sup_{\Sigma} L(B,\Sigma) = L(B,\hat\Sigma(B)) = c - \frac{n}{2}\log\det\{(Y-XBZ)(Y-XBZ)^{t}\},$$

where c is a constant (see, e.g., Srivastava and Khatri (1979) or von Rosen (1990)).

Theorem 3.2. If only the regression coefficient B is of interest in the GCM, the estimated information matrix is given by

$$\ddot L_{B} = nZ(Y-X\hat BZ)^{t}\hat V^{-1}(Y-X\hat BZ)Z^{t}\otimes(X^{t}S^{-1}X) - n(ZZ^{t})\otimes(X^{t}S^{-1}X),$$

where V = (Y − XBZ)(Y − XBZ)^t and V̂ stands for V when B̂ has been inserted.

Proof. Note that

$$\frac{dV}{dB} = -(I + K_{p,p})\{(Y-XBZ)Z^{t}\otimes X\},$$

$$\frac{dL(B,\hat\Sigma(B))}{dB} = -\frac{n}{2}\,\mathrm{vec}^{t}(V^{-1})\frac{dV}{dB} = n\,\mathrm{vec}^{t}\{X^{t}V^{-1}(Y-XBZ)Z^{t}\},$$

and hence

$$\frac{d^{2}L(B,\hat\Sigma(B))}{dB^{2}} = n\{Z(Y-XBZ)^{t}\otimes X^{t}\}\frac{dV^{-1}}{dB} - nZZ^{t}\otimes X^{t}V^{-1}X$$
$$= nK_{r,m}\{X^{t}V^{-1}(Y-XBZ)Z^{t}\otimes Z(Y-XBZ)^{t}V^{-1}X\} + nZ(Y-XBZ)^{t}V^{-1}(Y-XBZ)Z^{t}\otimes(X^{t}V^{-1}X) - nZZ^{t}\otimes X^{t}V^{-1}X.$$

Inserting B̂ and noting that X^tV̂^{-1}(Y − XB̂Z)Z^t = 0 and X^tV̂^{-1} = X^tS^{-1}, the theorem is verified. □
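For readers who want to compute L̈_B directly, the following sketch (ours, with hypothetical helper names) transcribes the formula of Theorem 3.2 with NumPy Kronecker products; it assumes B̂ has already been obtained, e.g. from (1.2).

```python
import numpy as np

def info_matrix_B(Y, X, Z, B_hat):
    """Estimated information matrix of Theorem 3.2 (a sketch; notation as in the text):
    L_B = n Z(Y - XBZ)' V^{-1} (Y - XBZ) Z' (x) (X'S^{-1}X) - n ZZ' (x) (X'S^{-1}X),
    with V = (Y - XBZ)(Y - XBZ)', everything evaluated at B = B_hat."""
    p, n = Y.shape
    PZt = Z.T @ np.linalg.solve(Z @ Z.T, Z)
    S = Y @ (np.eye(n) - PZt) @ Y.T
    XtSinvX = X.T @ np.linalg.solve(S, X)
    R = Y - X @ B_hat @ Z                           # residual matrix
    V = R @ R.T
    A1 = Z @ R.T @ np.linalg.solve(V, R) @ Z.T      # Z(Y-XBZ)' V^{-1} (Y-XBZ) Z'
    return n * np.kron(A1, XtSinvX) - n * np.kron(Z @ Z.T, XtSinvX)
```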

3.3. Σ is of interest

Suppose that the covariance matrix Σ in the GCM is of interest. From Srivastava and Khatri (1979) or Potthoff and Roy (1964) it follows that

$$\sup_{B} L(B,\Sigma) = L(\hat B(\Sigma),\Sigma) = c - \frac{n}{2}\log\det(\Sigma) - \frac{1}{2}\,\mathrm{tr}\{\Sigma^{-1}(Y-X\hat BZ)(Y-X\hat BZ)^{t}\},$$

where c is a constant and

$$\hat B = \hat B(\Sigma) = (X^{t}\Sigma^{-1}X)^{-1}X^{t}\Sigma^{-1}YZ^{t}(ZZ^{t})^{-1}.$$

We will determine the estimated information matrix with respect to Σ (instead of Σ^{-1}).

Theorem 3.3. Let R = X_o^t Y Z^t (ZZ^t)^{-1} Z Y^t X_o. If only Σ is of interest in the GCM, the estimated information matrix is given by

$$\ddot L_{\Sigma} = \frac{n}{2}D_{p}^{t}(\hat\Sigma^{-1}\otimes\hat\Sigma^{-1})D_{p}
 - D_{p}^{t}(\hat\Sigma^{-1}S\hat\Sigma^{-1}\otimes\hat\Sigma^{-1})D_{p}
 - D_{p}^{t}\{X_{o}(X_{o}^{t}\hat\Sigma X_{o})^{-1}R(X_{o}^{t}\hat\Sigma X_{o})^{-1}X_{o}^{t}\otimes X_{o}(X_{o}^{t}\hat\Sigma X_{o})^{-1}X_{o}^{t}\}D_{p}.$$

Proof. Note that

$$\mathrm{tr}\{\Sigma^{-1}(Y - X\hat BZ)(Y - X\hat BZ)^{t}\} = \mathrm{tr}(\Sigma^{-1}S) + \mathrm{tr}\{(X_{o}^{t}\Sigma X_{o})^{-1}R\}.$$

Then, with the help of Lemma 2.2 and some calculations,

$$\frac{d^{2}}{d\,\mathrm{svec}(\Sigma)^{2}}\,\mathrm{tr}(\Sigma^{-1}S) = 2D_{p}^{t}(\Sigma^{-1}S\Sigma^{-1}\otimes\Sigma^{-1})D_{p} \qquad (3.5)$$

and

$$\frac{d^{2}}{d\,\mathrm{svec}(\Sigma)^{2}}\,\mathrm{tr}\{(X_{o}^{t}\Sigma X_{o})^{-1}R\} = 2D_{p}^{t}\{X_{o}(X_{o}^{t}\Sigma X_{o})^{-1}R(X_{o}^{t}\Sigma X_{o})^{-1}X_{o}^{t}\otimes X_{o}(X_{o}^{t}\Sigma X_{o})^{-1}X_{o}^{t}\}D_{p}. \qquad (3.6)$$

Finally, we note that

$$\frac{d^{2}\log\det(\Sigma)}{d\,\mathrm{svec}(\Sigma)^{2}} = -D_{p}^{t}(\Sigma^{-1}\otimes\Sigma^{-1})D_{p}, \qquad (3.7)$$

and inserting the estimators in (3.5)–(3.7) together with a summation of these expressions proves the theorem. □


Remark. In Theorem 3.3, X_o^tΣ̂X_o = (1/n)X_o^tSX_o + (1/n)R, which gives some alternative ways to express L̈_Σ.
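The identity in the remark is straightforward to verify numerically. The sketch below is ours: it assumes simulated data, builds X_o from the SVD of X, and checks X_o^tΣ̂X_o = (X_o^tSX_o + R)/n.

```python
import numpy as np

def orth_complement(X):
    """Columns spanning the orthogonal complement of the column space of X
    (assumes X has full column rank)."""
    U, s, _ = np.linalg.svd(X, full_matrices=True)
    return U[:, X.shape[1]:]

rng = np.random.default_rng(1)
p, m, r, n = 4, 2, 2, 20
X = np.column_stack([np.ones(p), np.arange(1.0, p + 1)])
Z = np.kron(np.eye(r), np.ones(n // r))
Y = X @ rng.normal(size=(m, r)) @ Z + rng.normal(size=(p, n))

PZt = Z.T @ np.linalg.solve(Z @ Z.T, Z)
S = Y @ (np.eye(n) - PZt) @ Y.T
Xo = orth_complement(X)
QS = np.eye(p) - X @ np.linalg.solve(X.T @ np.linalg.solve(S, X), X.T) @ np.linalg.inv(S)
Sigma_hat = (S + QS @ Y @ PZt @ Y.T @ QS.T) / n
R = Xo.T @ Y @ Z.T @ np.linalg.solve(Z @ Z.T, Z) @ Y.T @ Xo
assert np.allclose(Xo.T @ Sigma_hat @ Xo, (Xo.T @ S @ Xo + R) / n)
```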

4. Covariance perturbation

In order to form the Hessian we need dB̂(W)/dW, where B̂(W) is a function of the perturbation matrix W. In this paper we will assume that instead of the model Y ~ N_{p,n}(XBZ, Σ, I_n) we have a perturbed model Y ~ N_{p,n}(XBZ, Σ, W^{-1}). This means that the columns of the model, i.e. the columns of Y − XBZ, are not identically and independently distributed. Under the perturbed model the maximum likelihood estimator of B equals

$$\hat B(W) = \{X^{t}S(W)^{-1}X\}^{-1}X^{t}S(W)^{-1}YWZ^{t}(ZWZ^{t})^{-1} = (X^{t}X)^{-1}X^{t}(I-P)Q,$$

where

$$S(W) = Y(Z^{t})_{o}\{(Z^{t})_{o}^{t}W^{-1}(Z^{t})_{o}\}^{-1}(Z^{t})_{o}^{t}Y^{t}, \qquad
P = S(W)X_{o}\{X_{o}^{t}S(W)X_{o}\}^{-1}X_{o}^{t}, \qquad
Q = YWZ^{t}(ZWZ^{t})^{-1}.$$

Furthermore,

$$n\hat\Sigma(W) = \{Y - X\hat B(W)Z\}\{Y - X\hat B(W)Z\}^{t}.$$

In the rest of this section d W will be interpreted as dsvec(W).

Theorem 4.1. Let E = Y(I − WZ^{t}(ZWZ^{t})^{-1}Z). For the covariance perturbation,

$$\frac{d\hat B(W)}{dW} = \Bigl[(ZWZ^{t})^{-1}Z\{I - WY^{t}X_{o}(X_{o}^{t}S(W)X_{o})^{-1}X_{o}^{t}E\}\otimes\{X^{t}S(W)^{-1}X\}^{-1}X^{t}S(W)^{-1}E\Bigr]D_{n}.$$

Proof. The theorem is immediately established by noting the following facts:

$$\frac{dS(W)}{dW} = (E\otimes E)D_{n},$$

$$\frac{dQ}{dW} = \{(ZWZ^{t})^{-1}Z\otimes E\}D_{n},$$

$$\frac{dP}{dW} = \bigl[X_{o}\{X_{o}^{t}S(W)X_{o}\}^{-1}X_{o}^{t}E\otimes X\{X^{t}S(W)^{-1}X\}^{-1}X^{t}S(W)^{-1}E\bigr]D_{n}. \qquad\Box$$

For the dispersion matrix we note the following two theorems which are straightforward to show. Explicit expressions can be obtained with the help of Theorem 4.1.


Theorem 4.2. For the covariance perturbation,

$$\frac{d\hat\Sigma(W)}{dW} = -\frac{1}{n}(I + K_{p,p})\{(Y - X\hat B(W)Z)Z^{t}\otimes X\}\frac{d\hat B(W)}{dW}.$$

Theorem 4.3. For the covariance perturbation,

$$\frac{d\hat\Sigma(W)^{-1}}{dW} = -\bigl(\hat\Sigma(W)^{-1}\otimes\hat\Sigma(W)^{-1}\bigr)\frac{d\hat\Sigma(W)}{dW}.$$
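To make the covariance-weighted perturbation concrete, the sketch below (ours, with hypothetical names) computes B̂(W) and Σ̂(W) as defined at the beginning of this section, replacing the (Z^t)_o form of S(W) by the equivalent expression Y(W − WZ^t(ZWZ^t)^{-1}ZW)Y^t, and differentiates B̂(W) numerically with respect to a single diagonal weight as a finite-difference counterpart of Theorem 4.1.

```python
import numpy as np

def gcm_mle_perturbed(Y, X, Z, W):
    """MLEs under the perturbed model Y ~ N(XBZ, Sigma, W^{-1}), Section 4 (a sketch)."""
    p, n = Y.shape
    ZWZt = Z @ W @ Z.T
    SW = Y @ (W - W @ Z.T @ np.linalg.solve(ZWZt, Z @ W)) @ Y.T   # = S(W)
    SWinv = np.linalg.inv(SW)
    B_W = np.linalg.solve(X.T @ SWinv @ X, X.T @ SWinv @ Y @ W @ Z.T) @ np.linalg.inv(ZWZt)
    Rw = Y - X @ B_W @ Z
    Sigma_W = Rw @ Rw.T / n        # n Sigma(W) = (Y - X B(W) Z)(Y - X B(W) Z)'
    return B_W, Sigma_W

def dB_dw_numeric(Y, X, Z, i, eps=1e-6):
    """Finite-difference version of dB(W)/dw_i at the null perturbation W = I_n,
    perturbing only the i-th diagonal weight; Theorem 4.1 gives the analytic form."""
    n = Y.shape[1]
    Wp, Wm = np.eye(n), np.eye(n)
    Wp[i, i] += eps
    Wm[i, i] -= eps
    Bp, _ = gcm_mle_perturbed(Y, X, Z, Wp)
    Bm, _ = gcm_mle_perturbed(Y, X, Z, Wm)
    return (Bp - Bm) / (2 * eps)
```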

5. Two illustrative examples

In this section we are going to illustrate some of the above-mentioned ideas. Maybe the most natural perturbation scheme is when W in Section 4 is diagonal. Let the vector ω consist of the diagonal elements of the perturbation matrix W given in Section 4, i.e. ω = (w₁₁, w₂₂, …, w_nn)^t. Moreover, the null perturbation W₀ is given by W₀ = I_n. We will concentrate on the Hessian matrix for B, but the Hessian matrices for (B, Σ) and Σ could also have been specified in more detail than in the next theorem.

Theorem 5.1. Let W be the perturbation matrix in Section 4. Then:

(i)
$$\frac{dW}{d\omega} = \sum_{i=1}^{n} D_{n}^{+}(u_{i}\otimes u_{i})u_{i}^{t},$$
where u_i is the ith column of I_n.

(ii) The Hessian matrix F_{B,Σ} equals
$$\Bigl(\frac{dW}{d\omega}\Bigr)^{t}\Bigl(\frac{d(\hat B,\hat\Sigma^{-1})}{dW}\Bigr)^{t}\,\ddot L_{B,\Sigma}\,\Bigl(\frac{d(\hat B,\hat\Sigma^{-1})}{dW}\Bigr)\Bigl(\frac{dW}{d\omega}\Bigr)\bigg|_{W=I,\,B=\hat B,\,\Sigma=\hat\Sigma},$$
where d(B̂, Σ̂^{-1})/dW is obtained from Theorems 4.1 and 4.2, and L̈_{B,Σ} was given in Theorem 3.1.

(iii) The Hessian matrix F_B equals
$$n\bigl[\{I - E^{t}X_{o}(X_{o}^{t}SX_{o})^{-1}X_{o}^{t}Y\}Z^{t}(ZZ^{t})^{-1}\{Z(Y-X\hat BZ)^{t}\hat V^{-1}(Y-X\hat BZ)Z^{t} - ZZ^{t}\}(ZZ^{t})^{-1}Z\{I - Y^{t}X_{o}(X_{o}^{t}SX_{o})^{-1}X_{o}^{t}E\}\bigr] * \bigl[E^{t}S^{-1}X(X^{t}S^{-1}X)^{-1}X^{t}S^{-1}E\bigr],$$
where E is defined in Theorem 4.1 (evaluated at W = I_n), V̂ is as in Theorem 3.2, and * denotes the Hadamard product of matrices.

(iv) F_Σ equals
$$\Bigl(\frac{dW}{d\omega}\Bigr)^{t}\Bigl(\frac{d\hat\Sigma}{dW}\Bigr)^{t}\,\ddot L_{\Sigma}\,\Bigl(\frac{d\hat\Sigma}{dW}\Bigr)\Bigl(\frac{dW}{d\omega}\Bigr)\bigg|_{W=I,\,\Sigma=\hat\Sigma},$$
where dΣ̂/dW is obtained from Theorem 4.2 and L̈_Σ was given in Theorem 3.3.


Proof. We just indicate how to prove (i) and (iii). Part (i) follows since svec(W) = D_n^+ Σ_{i=1}^{n}Σ_{j=1}^{n} w_{ij}(u_j ⊗ u_i), ω = Σ_{k=1}^{n} w_{kk}u_k and dw_{ij}/dw_{kk} = 1 if i = j = k and 0 otherwise. The expression in (iii) is obtained by combining (i) and Theorems 3.2 and 4.1. In particular, one should note that D_nD_n^+(u_i ⊗ u_i) = u_i ⊗ u_i. □

5.1. Dental data

This data set was first considered by Potthoff and Roy (1964) and later analysed by Lee and Geisser (1975), Rao (1987) and Lee (1988, 1991), where estimation of the parameters, hypothesis testing and prediction were considered. Dental measurements were made on 11 girls and 16 boys at ages 8, 10, 12 and 14 years. Each measurement is the distance, in millimeters, from the center of the pituitary to the pterygomaxillary fissure. Hence, the design matrices X and Z are given by

$$X = \begin{pmatrix} 1 & 8\\ 1 & 10\\ 1 & 12\\ 1 & 14\end{pmatrix}
\qquad\text{and}\qquad
Z = \begin{pmatrix} \mathbf{1}_{11}^{t} & \mathbf{0}\\ \mathbf{0} & \mathbf{1}_{16}^{t}\end{pmatrix},$$

where 1_m is an m × 1 vector with all components equal to 1. Based on the case-deletion approach, Pan and Fang (1994) showed that in this data set the 24th observation, which is from the boy group, is a discordant outlier. The 24th, 20th and 15th observations were also regarded as influential observations (in decreasing order) (Pan, 1994). Similar results were obtained by von Rosen (1995). In this paper we use a local approach to analyze the data set. A diagonal variance perturbation scheme is assumed. When both the parameters B and Σ are of interest, or only one of them, the Hessian matrices F_(B,Σ), F_B and F_Σ are calculated using Theorem 5.1. For each case, Table 1 provides the largest absolute eigenvalue |λ|_max of the Hessian matrix as well as the unit eigenvector d_max corresponding to |λ|_max. An index plot of |d_max| is given in Fig. 1. From the first column of Table 1 (see also the dash-dotted line in Fig. 1), which corresponds to the case when only the regression coefficient B is of interest, it follows that individual 24 is the most influential observation because d_max^(24) = 0.7265 is the largest absolute component of d_max. The vector d_max also shows that the 15th individual is the second most influential observation for B. On the other hand, the situations for F_Σ and F_(B,Σ) are completely different from that of F_B. When only the covariance matrix Σ or the parameter pair (B,Σ) is of interest, the second or third column of Table 1 implies that the 20th observation has the strongest influence on Σ or on (B,Σ) (see also the dotted and solid lines in Fig. 1). Compared with the case-deletion approach (Pan and Fang, 1994; Pan, 1994), the local influence approach not only shows that the 20th, 24th and 15th observations are influential, but also emphasizes that the 20th observation has the largest influence on the statistical inference based on the MLE of (B,Σ) or of Σ alone. For the MLE of the regression coefficient B, however, the largest influence is exerted by the 24th observation.
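The entries of Table 1 are simply the dominant eigenpair of the corresponding Hessian. Given any of the 27 × 27 Hessians of Theorem 5.1 as a NumPy array, the sketch below (ours) sets up the dental-data design matrices and extracts |λ|_max and d_max; the index plot of |d_max| then reproduces Fig. 1.

```python
import numpy as np

# Design matrices for the dental data (11 girls, 16 boys; ages 8, 10, 12, 14)
ages = np.array([8.0, 10.0, 12.0, 14.0])
X = np.column_stack([np.ones(4), ages])          # 4 x 2
Z = np.zeros((2, 27))
Z[0, :11] = 1.0                                  # girls
Z[1, 11:] = 1.0                                  # boys

def dominant_direction(F):
    """|lambda|_max and d_max of a Hessian matrix F (Table 1 / Fig. 1 post-processing)."""
    vals, vecs = np.linalg.eigh((F + F.T) / 2)
    k = int(np.argmax(np.abs(vals)))
    return np.abs(vals[k]), vecs[:, k]

# Given the 27 x 27 Hessian F_B of Theorem 5.1 (iii), an index plot of |d_max|
# reproduces Fig. 1, e.g. with matplotlib:  plt.stem(np.abs(d_max)).
```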


Table 1
The largest eigenvalue and its eigenvector of F_B, F_Σ and F_(B,Σ) for the dental data

                    F_B        F_Σ        F_(B,Σ)
|λ|_max             2.2992     7.8524     7.9400

Eigenvectors (d_max)
  1                 0.0180     0.0012     0.0009
  2                 0.0358     0.0081     0.0067
  3                -0.0224     0.0054     0.0078
  4                -0.0046     0.0160     0.0152
  5                 0.0067     0.0121     0.0115
  6                -0.0004     0.0049     0.0047
  7                 0.0058     0.0104     0.0101
  8                -0.0168     0.0032     0.0041
  9                 0.0091     0.0180     0.0174
 10                -0.0109     0.0178     0.0162
 11                 0.0002     0.0146     0.0129
 12                -0.0436     0.0118     0.0033
 13                -0.0491     0.0255     0.0322
 14                -0.0405     0.0171     0.0216
 15                 0.5246     0.0407     0.0093
 16                -0.0409     0.0570     0.0632
 17                 0.0990     0.0037     0.0065
 18                 0.0570     0.0037     0.0112
 19                 0.0489     0.0175     0.0236
 20                -0.1027     0.9944     0.9919
 21                 0.1443     0.0077     0.0180
 22                 0.1553     0.0078     0.0070
 23                -0.1148     0.0395     0.0453
 24                -0.7265     0.0271     0.0687
 25                 0.2917     0.0048    -0.0079
 26                -0.1309     0.0312     0.0322
 27                 0.0215     0.0044     0.0120

Fig. 1. Index plot of |d_max| for the dental data. Solid line: (B,Σ); dash-dotted line: B; dotted line: Σ.

5.2. Mouse data

The data set was analyzed by Rao (1984, 1987) and later by Lee (1988, 1991). It consists of weights of 13 male mice measured at intervals of 3 days from birth to weaning. For this data set, following Rao (1984), a second-degree polynomial in time was assumed for the growth function, and hence the design matrix X takes the following form

$$X = \begin{pmatrix} 1&1&1\\ 1&2&4\\ 1&3&9\\ 1&4&16\\ 1&5&25\\ 1&6&36\\ 1&7&49\end{pmatrix}$$


Table 2
The largest eigenvalue and its eigenvector of F_B, F_Σ and F_(B,Σ) for the mouse data

                    F_B        F_Σ        F_(B,Σ)
|λ|_max             0.2450     4.3124     4.7545

Eigenvectors (d_max)
  1                -0.0253    -0.1819    -0.0480
  2                 0.1422    -0.1296    -0.1008
  3                 0.3507    -0.0407     0.0385
  4                 0.0088    -0.1444    -0.0368
  5                 0.0007    -0.3272    -0.1186
  6                -0.1719    -0.1183    -0.0803
  7                 0.2760    -0.0604     0.0321
  8                 0.2282    -0.1001    -0.0487
  9                 0.0379    -0.1352    -0.0134
 10                 0.1446    -0.0851    -0.0099
 11                -0.1093    -0.5132    -0.5247
 12                -0.0708    -0.2593    -0.0951
 13                -0.8117    -0.6608    -0.8223

Fig. 2. Index plot of |d_max| for the mouse data. Solid line: (B,Σ); dash-dotted line: B; dotted line: Σ.

and Z^t = 1_13, respectively. The case-deletion approach shows that the 13th individual is an influential observation (Pan and Fang, 1994). Now the local influence approach is used to analyze this data set. For the parameters B, Σ and (B, Σ) of interest, respectively, the largest absolute eigenvalue |λ|_max and its unit eigenvector d_max of the Hessian matrix are calculated and presented in Table 2. The index plot of |d_max| is displayed in Fig. 2. If only the regression coefficient B is of interest, the first column of Table 2 shows that the 13th individual is the most influential observation since |d_max^(13)| = 0.8117 is the largest absolute component of d_max; the dash-dotted line in Fig. 2 also shows this clearly. If we are only interested in the covariance Σ, the influences of the 13th and 11th observations on Σ are significantly larger than those of the others (see the second column of Table 2 and the dotted line in Fig. 2). When both the parameters B and Σ are of interest, the situation is the same as that for Σ. In other words, the 13th and 11th individuals are the two most influential observations for the statistical inference based on the MLE of (B, Σ) or of Σ.
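For completeness, the corresponding design matrices for the mouse data can be set up in the same way; the sketch below (ours) only prepares X and Z, after which the hypothetical dominant_direction helper from the dental-data sketch yields the entries of Table 2.

```python
import numpy as np

# Mouse data: 7 equally spaced occasions, second-degree polynomial growth, 13 mice
t = np.arange(1.0, 8.0)
X = np.column_stack([np.ones(7), t, t**2])   # 7 x 3, rows (1, t, t^2)
Z = np.ones((1, 13))                         # Z' = 1_13: a single treatment group

# With the 13 x 13 Hessians of Theorem 5.1 computed for these X and Z, the entries
# of Table 2 are again obtained from dominant_direction(F).
```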


6. Discussion

We have studied the curvature of the likelihood displacement in three different situations: L(B,Σ), sup_Σ L(B,Σ) and sup_B L(B,Σ). As noted from the analyses of the data sets, the results depend on which likelihood we study; indeed, the analyses provide different information. If we, for example, are interested in the maximum likelihood estimator of B, we note that sup_Σ L(B,Σ) is closer to L(B̂,Σ̂) than L(B,Σ) is. Hence the results based on sup_Σ L(B,Σ) are, vaguely speaking, more local than those based on L(B,Σ). Therefore, if the dispersion of B̂ is small, the main results should be based on sup_Σ L(B,Σ). However, we stress that any analysis of influential observations can only guide us to identify some set of observations which should then be further inspected. From that point of view we recommend that in practice many analyses be performed.

An advantage of the approach of this paper is that, in comparison with the case-deletion procedures or the perturbation procedure applied by von Rosen (1995), the calculations are less extensive. Here we just calculate the Hessian, and these calculations are made once, whereas for the case-deletion approach the calculations have to be repeated for each observation. Finally, we note that a further step in our approach would be to consider the likelihood displacements for single elements in B. This would have made sense for the dental data, where two different treatment groups exist. For example, it is reasonable to believe that individual 15, who is a boy, does not influence the parameters which correspond to the girls in the same manner as those which correspond to the boys.

Acknowledgements

The careful work of one referee is greatly appreciated.

References

Barnett, V. and T. Lewis (1984). Outliers in Statistical Data. Wiley, New York.
Beckman, R.J. and R.D. Cook (1983). Outlier...s. Technometrics 25, 119-149.
Beckman, R.J., C.J. Nachtsheim and R.D. Cook (1987). Diagnostics for mixed-model analysis of variance. Technometrics 29, 413-426.
Belsley, D.A., E. Kuh and R.E. Welsch (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley, New York.
Chatterjee, S. and A.S. Hadi (1988). Sensitivity Analysis in Linear Regression. Wiley, New York.
Cook, R.D. and S. Weisberg (1982). Residuals and Influence in Regression. Chapman & Hall, New York.
Cook, R.D. (1986). Assessment of local influence (with discussion). J. R. Statist. Soc., Ser. B 48, 133-169.
Fang, K.T. and J.L. Xu (1990). The direct operations of symmetric and lower-triangular matrices with their applications. In: K.T. Fang and T.W. Anderson, Eds., Statistical Inference in Elliptically Contoured and Related Distributions. Allerton Press, New York, 441-455.
Fang, K.T. and Y.T. Zhang (1990). Generalized Multivariate Analysis. Springer, Berlin, and Science Press, Beijing.
Geisser, S. (1970). Bayesian analysis of growth curves. Sankhya Ser. A 32, 53-64.


Gleser, L.J. and I. Olkin (1970). Linear models in multivariate analysis. In: R.C. Bose, Ed., Essays in Probability and Statistics. University of North Carolina Press, Chapel Hill, NC, 267-292.
Grizzle, J.E. and D.M. Allen (1969). Analysis of growth and response curves. Biometrics 25, 357-381.
Keramidas, E.M. and J.C. Lee (1990). Forecasting technological substitutions with short time series. JASA 85, 625-632.
Khatri, C.G. (1966). A note on a MANOVA model applied to problems in growth curve. Ann. Inst. Statist. Math. 18, 75-86.
Kish, C.W. and V.M. Chinchilli (1990). Diagnostics for identifying influential cases in GMANOVA. Comm. Statist. Theory Methods 19, 2683-2704.
Kollo, T. (1991). Matrix Derivative in Multivariate Statistics. Tartu University Press, Tartu (in Russian).
Lee, J.C. (1988). Prediction and estimation of growth curves with special covariance structure. JASA 83, 432-440.
Lee, J.C. (1991). Tests and model selection for the general growth curve model. Biometrics 47, 147-159.
Lee, J.C. and S. Geisser (1975). Applications of growth curve prediction. Sankhya Ser. A 37, 239-256.
Liski, E.P. (1991). Detecting influential measurements in a growth curve model. Biometrics 47, 659-668.
Magnus, J.R. and H. Neudecker (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, Chichester.
Okamoto, M. (1973). Distinctness of the eigenvalues of a quadratic form in a multivariate sample. Ann. Statist. 1, 763-765.
Pan, J.X. and K.T. Fang (1994). Multiple outlier detection in growth curve model with unstructured covariance matrix. Ann. Inst. Statist. Math. (to appear).
Pan, J.X. (1994). Detecting influential observations in growth curve model with arbitrary covariance structure. Technical Report No. 51, Department of Mathematics, Hong Kong Baptist College.
Potthoff, R.F. and S.N. Roy (1964). A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika 51, 313-326.
Pregibon, D. (1981). Logistic regression diagnostics. Ann. Statist. 9, 705-724.
Rao, C.R. (1965). The theory of least squares when the parameters are stochastic and its application to the analysis of growth curves. Biometrika 52, 447-458.
Rao, C.R. (1966). Covariance adjustment and related problems in multivariate analysis. In: P. Krishnaiah, Ed., Multivariate Analysis. Academic Press, New York, 87-103.
Rao, C.R. (1967). Least squares theory using an estimated dispersion matrix and its application to measurement of signals. Proc. 5th Berkeley Symp., Vol. 1, 355-372.
Rao, C.R. (1984). Prediction of future observations in polynomial growth curve models. Proc. Indian Statistical Institute Golden Jubilee Internat. Conf. on Statistics: Applications and New Directions. Indian Statistical Institute, Calcutta, 512-520.
Rao, C.R. (1987). Prediction of future observations in growth curve models. Statist. Sci. 2, 434-471.
von Rosen, D. (1989). Maximum likelihood estimators in multivariate linear normal models. J. Multivariate Anal. 31, 187-200.
von Rosen, D. (1990). Moments for a multivariate linear normal model with application to the growth curve model. J. Multivariate Anal. 35, 243-259.
von Rosen, D. (1991). The growth curve model: a review. Comm. Statist. Theory Methods 20, 2791-2822.
von Rosen, D. (1995). Influential observations in multivariate linear models. Scand. J. Statist. 22, 207-222.
Rousseeuw, P.J. and B.C. van Zomeren (1990). Unmasking multivariate outliers and leverage points (with discussion). JASA 85, 633-651.
Seber, G.A.F. (1984). Multivariate Observations. Wiley, New York.
Srivastava, M.S. and C.G. Khatri (1979). An Introduction to Multivariate Statistics. North-Holland, New York.
Woolson, R.F. and J.D. Leeper (1980). Growth curve analysis of complete and incomplete longitudinal data. Comm. Statist. Theory Methods 9, 1491-1513.