
Robust shrinkage estimation for elliptically symmetric distributions with unknown covariance matrix

Dominique Fourdrinier,(a) William E. Strawderman,(b,1) and Martin T. Wells(c,*,2)

(a) Université de Rouen, UMR CNRS 6085, 76821 Mont Saint Aignan Cedex, France
(b) Department of Statistics, Rutgers University, New Brunswick, NJ 08903, USA
(c) Department of Social Statistics, Cornell University, 358 Ives Hall, Ithaca, NY 14851-0952, USA

Received 9 March 2000

Abstract

Let $X, V_1, \ldots, V_{n-1}$ be $n$ random vectors in $\mathbb{R}^p$ with joint density of the form

$$f\Big((X-\theta)'\Sigma^{-1}(X-\theta) + \sum_{j=1}^{n-1} V_j'\Sigma^{-1}V_j\Big),$$

where both $\theta$ and $\Sigma$ are unknown. We consider the problem of the estimation of $\theta$ under the invariant loss $(\delta-\theta)'\Sigma^{-1}(\delta-\theta)$ and propose estimators which dominate the usual estimator $\delta_0(X) = X$ simultaneously for the entire class of such distributions. The proof involves the development of expressions which are analogous to unbiased estimators of risk and which in fact reduce to unbiased estimators of risk in the normal case. The method is applicable to the case where $\Sigma$ is structured. As an example, we examine the case where $\Sigma$ is diagonal. © 2003 Elsevier Science (USA). All rights reserved.

AMS 1991 subject classifications: 62F10; 62H12; 62C20

Keywords: Elliptically symmetric distributions; James–Stein estimation; Location parameter; Minimax; Quadratic loss; Risk function; Robustness; Unknown covariance

*Corresponding author. Fax: 607-255-8484. E-mail address: [email protected] (M.T. Wells).
1 Research supported by NSF Grant DMS-97-04524.
2 The support of NSF Grant DMS-9971586 is gratefully acknowledged.

0047-259X/03/$ - see front matter © 2003 Elsevier Science (USA). All rights reserved. doi:10.1016/S0047-259X(02)00023-4

D. Fourdrinier et al. / Journal of Multivariate Analysis 85 (2003) 24–39


1. Introduction

Let $X, V_1, \ldots, V_{n-1}$ be $n$ random vectors in $\mathbb{R}^p$ with joint density of the form

$$f\Big((X-\theta)'\Sigma^{-1}(X-\theta) + \sum_{j=1}^{n-1} V_j'\Sigma^{-1}V_j\Big), \qquad (1)$$

where the $p \times 1$ location vector $\theta$ and the scale matrix $\Sigma$ are unknown. This density arises as a joint density in the canonical form of the general linear model, where $X$ is a projection onto the space spanned by $\theta$ and the $V_i$'s are projections onto the orthogonal complement of that space. For more on elliptical symmetry and the various choices of $f(\cdot)$ in (1), see [4,6]. The class in (1) contains models such as the multivariate normal, $t$, and Kotz-type distributions. We consider the problem of estimating $\theta$ with the invariant loss

$$L(\theta,\delta) = (\delta-\theta)'\Sigma^{-1}(\delta-\theta). \qquad (2)$$

The usual estimator under $L(\cdot,\cdot)$ is $\delta_0(X) = X$. It is minimax and admissible when $p \le 2$. However, when $p \ge 3$, $\delta_0(X)$ remains minimax but is no longer admissible. Explicit improvements are known in the normal case, where $f(\cdot)$ is proportional to $\exp(-\cdot/2)$ in (1) (see [1–3,10,11,13–15] for scale mixtures of normal distributions). We concentrate on the case $p \ge 3$ and construct a class of estimators, depending on the sufficient statistics $(X,S)$, of the form

$$\delta(X,S) = X + g(X,S), \qquad (3)$$

where $S = \sum_{i=1}^{n-1} V_iV_i'$, which dominate $\delta_0(X)$ simultaneously for the entire class of distributions defined in (1). Note that sufficiency of $(X,S)$ follows from the equality

$$f\Big((X-\theta)'\Sigma^{-1}(X-\theta) + \sum_{j=1}^{n-1} V_j'\Sigma^{-1}V_j\Big) = f\big(\mathrm{tr}\big(\Sigma^{-1}\big((X-\theta)(X-\theta)' + S\big)\big)\big).$$

Note also that, although the loss in (2) is invariant, the estimators in (3) need not be invariant (except for $\delta_0(X)$). This class of estimators enriches the class of estimators studied previously in [1–3,10,11,13,15] for the normal distribution. An example of such an estimator is the James–Stein estimator

$$\delta(X,S) = \Big(1 - \frac{a}{X'S^{-1}X}\Big)X \qquad \text{for } 0 \le a \le \frac{2(p-2)}{n-p+2}.$$

Our method of proof relies on the unbiased estimator of the risk difference, $\rho(X,S)$, obtained in the normal case. We express the risk difference in the general case as $E^*_\theta[\rho(X,S)]$, where $E^*_\theta$ denotes expectation with respect to a distribution related to $f$. For example, we show that, for $g(X,S)$ of the form $h(X)/(X'S^{-1}X)$, the estimator $\delta(X,S) = X + g(X,S)$ dominates $X$ provided

$$\frac{2}{n-p+2}\,\nabla_X \cdot g(X,S) + g'(X,S)S^{-1}g(X,S) \le 0.$$
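This domination claim can be illustrated by simulation. The following sketch is our own illustration, not code from the paper; the dimensions, the $t$ degrees of freedom, and the choice $a = (p-2)/(n-p+2)$ are arbitrary. It draws $X, V_1, \ldots, V_{n-1}$ from a multivariate-$t$ member of the class (1) (all $n$ vectors share one chi-square mixing variable, so the joint law is elliptical but far from normal) and compares empirical risks:

```python
import numpy as np

# Monte Carlo comparison (illustrative sketch) of the James-Stein-type estimator
#   delta(X, S) = (1 - a / (X' S^{-1} X)) X,  S = sum_i V_i V_i',
# with delta_0(X) = X under the invariant loss, for a multivariate-t model.
rng = np.random.default_rng(0)
p, n, nu = 5, 20, 6                 # dimension, number of vectors, t degrees of freedom
theta = 0.5 * np.ones(p)            # arbitrary location; domination holds for every theta
a = (p - 2) / (n - p + 2)           # inside the allowed range (0, 2(p-2)/(n-p+2)]

loss_x, loss_js = [], []
for _ in range(5000):
    w = rng.chisquare(nu)           # one mixing variable per replication makes
    scale = np.sqrt(nu / w)         # X, V_1, ..., V_{n-1} jointly elliptical, not normal
    X = theta + scale * rng.standard_normal(p)
    V = scale * rng.standard_normal((n - 1, p))
    S = V.T @ V                     # the "residual" matrix
    q = X @ np.linalg.solve(S, X)   # X' S^{-1} X
    delta = (1.0 - a / q) * X
    loss_x.append((X - theta) @ (X - theta))        # Sigma = I here, so the
    loss_js.append((delta - theta) @ (delta - theta))  # invariant loss is squared error

risk_x, risk_js = np.mean(loss_x), np.mean(loss_js)
print(risk_x, risk_js)
```

The shrinkage estimator shows a smaller empirical risk than $X$ even though the data are far from normal, which is the robustness property established below.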


We develop the details in Section 2. It is worth noting that the method works when the scale matrix $\Sigma$ has a specified structure. We illustrate this fact by considering a diagonal $\Sigma$ in Section 3. We make some concluding comments in Section 4. Finally, an appendix contains an integration-by-slice result and a technical lemma. The integration by slices and the application of Stokes' theorem give a new perspective and geometric insight into Stein's classical integration-by-parts result. This approach is quite general and may be used to handle integration-by-parts type identities on more general manifolds than Euclidean spheres. For instance, Fourdrinier and Lemaire [8] consider distributions which have a symmetry with respect to the $\ell_1$ norm (that is, with density of the form $f(\|X-\theta\|_1)$, where $\|y\|_1 = \sum_{j=1}^{p} |y_j|$ is the $\ell_1$ norm of $y = (y_1, \ldots, y_p)' \in \mathbb{R}^p$). They obtain improved estimators by applying an integration-by-slice technique with "appropriate slices." Again, what is remarkable about these results is that the domination over $\delta_0(X)$ holds not only for the normal case but simultaneously for the entire class of distributions. The robustness of the domination with respect to the distributional assumption is analogous to that found in [5] for the case of spherically symmetric distributions. In each case, a "residual term" (for us, $S = \sum_{j=1}^{n-1} V_jV_j'$) plays a crucial role in the domination result.

2. Minimax estimators with completely unknown $\Sigma$

In this section we develop an expression for the risk difference $\Delta_\theta$ between $\delta(X,S)$ given by (3) and $\delta_0(X) = X$. We then apply the result to obtain useful classes of estimators which dominate $\delta_0(X)$ for the entire class of distributions (1). A standard calculation gives

$$\Delta_\theta = E_\theta[(\delta(X,S)-\theta)'\Sigma^{-1}(\delta(X,S)-\theta)] - E_\theta[(\delta_0(X)-\theta)'\Sigma^{-1}(\delta_0(X)-\theta)]$$

$$= 2E_\theta[g(X,S)'\Sigma^{-1}(X-\theta)] + E_\theta[g(X,S)'\Sigma^{-1}g(X,S)], \qquad (4)$$

where $E_\theta$ denotes the expectation with respect to (1). We first give a lemma which expresses the two terms in the last expression of (4) as expectations $E^*_\theta$ with respect to the distribution

$$C^{-1}\,F\Big((X-\theta)'\Sigma^{-1}(X-\theta) + \sum_{j=1}^{n-1} V_j'\Sigma^{-1}V_j\Big),$$

where $F$ and $C$ are defined as

$$F(t) = \frac{1}{2}\int_t^{\infty} f(s)\,ds,$$

$$C = \int_{\mathbb{R}^p \times \cdots \times \mathbb{R}^p} F\Big((x-\theta)'\Sigma^{-1}(x-\theta) + \sum_{j=1}^{n-1} v_j'\Sigma^{-1}v_j\Big)\,dx\,dv_1\cdots dv_{n-1}.$$
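As a check on these definitions, consider the normal case, where $f(t) = Ke^{-t/2}$ for a normalizing constant $K$. Then $F$ coincides with $f$, and $C = 1$ because $f$ integrates to one as a joint density, so that $E^*_\theta = E_\theta$:

```latex
F(t) = \frac{1}{2}\int_t^{\infty} K e^{-s/2}\, ds
     = K e^{-t/2} = f(t),
\qquad
C = \int F = \int f = 1 .
```

Outside the normal case $F$ and $f$ differ, and $E^*_\theta$ is a genuinely different expectation.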


Lemma 1. (i) Suppose $g(x,\cdot)$ is weakly differentiable. Then

$$E_\theta[g'(X,S)\Sigma^{-1}(X-\theta)] = C\,E^*_\theta[\nabla_X \cdot g(X,S)], \qquad (5)$$

where $\nabla_X \cdot g(X,S)$ is the divergence of $g(X,S)$ with respect to $X$.
(ii) For any $p \times p$ matrix function $T(X,S)$, we have

$$E_\theta[\mathrm{tr}(T(X,S)\Sigma^{-1})] = 2C\,E^*_\theta[D^*_{1/2}T(X,S)] + C(n-p-2)\,E^*_\theta[\mathrm{tr}(S^{-1}T)], \qquad (6)$$

where

$$D^*_{1/2}T(X,S) = \sum_{i=1}^{p}\frac{\partial T_{ii}(X,S)}{\partial S_{ii}} + \frac{1}{2}\sum_{i\ne j}\frac{\partial T_{ij}(X,S)}{\partial S_{ij}}. \qquad (7)$$

The proof of Lemma 1 is given in the appendix. Note that, when $X, V_1, \ldots, V_{n-1}$ are independent normal vectors with covariance $\Sigma$, then $f = F$ and therefore $E_\theta[\cdot] = E^*_\theta[\cdot]$. Hence Lemma 1(i) essentially reduces to Stein's 1981 lemma (cf. [15]) and Lemma 1(ii) corresponds to a result of Haff [12]. We apply Lemma 1 to get an expression for the risk difference between $\delta(X,S)$ and $\delta_0(X)$.

Theorem 1. Assume that $E_\theta[X'X] < \infty$ and $E_\theta[g'(X,S)g(X,S)] < \infty$. Then the risk difference $\Delta_\theta$ between $\delta(X,S)$ and $\delta_0(X)$ equals

$$C\,E^*_\theta\big[2\nabla_X \cdot g(X,S) + (n-p-2)\,g'(X,S)S^{-1}g(X,S) + 2D^*_{1/2}\big(g(X,S)g'(X,S)\big)\big]. \qquad (8)$$

Proof. Applying Lemma 1(i) to the first term in (4) and Lemma 1(ii) to the second term in (4) with $T(X,S) = g(X,S)g(X,S)'$, we have the desired result. □

Using Theorem 1, an obvious sufficient condition for $\delta(X,S)$ to dominate $\delta_0(X)$, and hence to be minimax, is

$$2\nabla_X \cdot g(X,S) + (n-p-2)\,g'(X,S)S^{-1}g(X,S) + 2D^*_{1/2}\big(g(X,S)g'(X,S)\big) \le 0. \qquad (9)$$

Note that, as $E^*_\theta[\cdot] = E_\theta[\cdot]$ in the normal case, the left-hand side of (9) is an unbiased estimator of the risk difference between $\delta(X,S)$ and $\delta_0(X)$. Perhaps most importantly, observe that the theorem leads to an extremely strong robustness property for estimators satisfying (9): any such estimator dominates $\delta_0(X)$ for the entire class of distributions (1). This property is analogous to the result in [5] for the case $\Sigma = I$. The following corollaries and examples give useful illustrations of estimators for which condition (9) is satisfied.
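Since $E^*_\theta = E_\theta$ and $C = 1$ in the normal case, identity (6) can be checked by direct simulation. The sketch below is our own illustration with arbitrary dimensions and $\theta$; it takes $T(X,S) = XX'$, for which the $D^*_{1/2}$ term vanishes (no dependence on $S$), so (6) reduces to $E_\theta[X'\Sigma^{-1}X] = (n-p-2)\,E_\theta[X'S^{-1}X]$:

```python
import numpy as np

# Simulation check of identity (6) in the normal case (E* = E, C = 1),
# with T(X,S) = XX' so that the D*_{1/2} term is zero:
#   E[X' Sigma^{-1} X] = (n - p - 2) E[X' S^{-1} X].
rng = np.random.default_rng(3)
p, n = 4, 16
theta = np.array([1.0, 0.0, -1.0, 0.5])

lhs_samples, rhs_samples = [], []
for _ in range(20000):
    X = theta + rng.standard_normal(p)            # Sigma = I
    V = rng.standard_normal((n - 1, p))
    S = V.T @ V                                   # S = sum_i V_i V_i'
    lhs_samples.append(X @ X)                     # X' Sigma^{-1} X with Sigma = I
    rhs_samples.append(X @ np.linalg.solve(S, X)) # X' S^{-1} X

lhs = np.mean(lhs_samples)
rhs = (n - p - 2) * np.mean(rhs_samples)
print(lhs, rhs)   # both near p + theta'theta = 6.25
```

Both sides agree with $p + \theta'\theta$ up to Monte Carlo error, as the Haff-type identity predicts.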


Corollary 1. Let $\delta(X,S) = X - \frac{r(X'S^{-1}X)}{X'S^{-1}X}\,X$, where $r(\cdot)$ is a nondecreasing function bounded between $0$ and $\frac{2(p-2)}{n-p+2}$. Assume that $E_\theta[X'X] < \infty$ and $E_\theta\big[\frac{X'X}{(X'S^{-1}X)^2}\big] < \infty$. Then $\delta(X,S)$ dominates $\delta_0(X)$ and is minimax.

Proof. Setting $g(X,S) = -\frac{r(X'S^{-1}X)}{X'S^{-1}X}\,X$, according to Theorem 1 we first have to calculate

$$\nabla_X \cdot g(X,S) = -\nabla_X \cdot\Big(\frac{r(X'S^{-1}X)}{X'S^{-1}X}\,X\Big) = -\Big(p\,\frac{r(X'S^{-1}X)}{X'S^{-1}X} + X'\,\nabla_X\frac{r(X'S^{-1}X)}{X'S^{-1}X}\Big).$$

By routine vector calculus it follows that

$$\nabla_X\frac{r(X'S^{-1}X)}{X'S^{-1}X} = 2\,\frac{r'(X'S^{-1}X)\,X'S^{-1}X - r(X'S^{-1}X)}{(X'S^{-1}X)^2}\,S^{-1}X.$$

Hence

$$\nabla_X \cdot g(X,S) = -\Big((p-2)\,\frac{r(X'S^{-1}X)}{X'S^{-1}X} + 2r'(X'S^{-1}X)\Big). \qquad (10)$$

Similarly, for the second term of (9) we make the identification

$$g'(X,S)S^{-1}g(X,S) = \frac{r^2(X'S^{-1}X)}{X'S^{-1}X}.$$

Finally, for the third term of (9) we have

$$D^*_{1/2}\big(g(X,S)g'(X,S)\big) = \sum_{i=1}^p \frac{\partial}{\partial S_{ii}}\Big[\frac{r^2(X'S^{-1}X)}{(X'S^{-1}X)^2}\,X_i^2\Big] + \frac{1}{2}\sum_{i\ne j}\frac{\partial}{\partial S_{ij}}\Big[\frac{r^2(X'S^{-1}X)}{(X'S^{-1}X)^2}\,X_iX_j\Big]$$

$$= 2\,\frac{(X'S^{-1}X)^2\,r(X'S^{-1}X)\,r'(X'S^{-1}X) - (X'S^{-1}X)\,r^2(X'S^{-1}X)}{(X'S^{-1}X)^4}\Big[\sum_{i=1}^p \frac{\partial(X'S^{-1}X)}{\partial S_{ii}}\,X_i^2 + \frac{1}{2}\sum_{i\ne j}\frac{\partial(X'S^{-1}X)}{\partial S_{ij}}\,X_iX_j\Big]. \qquad (11)$$

Using the fact that $\frac{\partial}{\partial S_{ij}}(X'S^{-1}X) = -(2-\delta_{ij})(X'S^{-1})_i(X'S^{-1})_j$, the bracketed sum equals

$$-\Big[\sum_{i=1}^p (X'S^{-1})_i^2\,X_i^2 + \sum_{i\ne j}(X'S^{-1})_i(X'S^{-1})_j\,X_iX_j\Big] = -(X'S^{-1}X)^2,$$

so that

$$D^*_{1/2}\big(g(X,S)g'(X,S)\big) = 2\Big[\frac{r^2(X'S^{-1}X)}{X'S^{-1}X} - r(X'S^{-1}X)\,r'(X'S^{-1}X)\Big]. \qquad (12)$$

Hence, by (10)–(12), the risk difference is given by

$$\Delta_\theta = C\,E^*_\theta\Big[-2\Big((p-2)\,\frac{r(X'S^{-1}X)}{X'S^{-1}X} + 2r'(X'S^{-1}X)\Big) + (n-p-2)\,\frac{r^2(X'S^{-1}X)}{X'S^{-1}X} + 4\Big(\frac{r^2(X'S^{-1}X)}{X'S^{-1}X} - r(X'S^{-1}X)\,r'(X'S^{-1}X)\Big)\Big]$$

$$= C\,E^*_\theta\Big[\frac{r(X'S^{-1}X)}{X'S^{-1}X}\,\big\{-2(p-2) + (n-p+2)\,r(X'S^{-1}X)\big\} - 4r'(X'S^{-1}X)\,\big\{1 + r(X'S^{-1}X)\big\}\Big] \le 0. \qquad \square$$
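The divergence formula (10) is easy to sanity-check numerically. The sketch below is our own illustration: $r(t) = ct/(1+t)$ is just one nondecreasing bounded choice, and the constant $c$ and the matrix used to build $S^{-1}$ are hypothetical. It compares the closed form with a central finite-difference approximation of $\nabla_X \cdot g$:

```python
import numpy as np

def r(t, c=0.3):   # a nondecreasing function bounded by c (illustrative choice)
    return c * t / (1.0 + t)

def rp(t, c=0.3):  # its derivative r'
    return c / (1.0 + t) ** 2

def g(x, Sinv):    # shrinkage term g(X,S) = -(r(u)/u) X with u = X' S^{-1} X
    u = x @ Sinv @ x
    return -(r(u) / u) * x

rng = np.random.default_rng(1)
p = 4
A = rng.standard_normal((p, p))
Sinv = np.linalg.inv(A @ A.T + p * np.eye(p))   # some positive definite S^{-1}
x = rng.standard_normal(p)
u = x @ Sinv @ x

# closed form (10): div_X g = -[(p-2) r(u)/u + 2 r'(u)]
closed = -((p - 2) * r(u) / u + 2.0 * rp(u))

# central finite differences of the divergence
eps = 1e-6
num = 0.0
for i in range(p):
    e = np.zeros(p); e[i] = eps
    num += (g(x + e, Sinv)[i] - g(x - e, Sinv)[i]) / (2 * eps)

print(closed, num)   # agree to finite-difference accuracy
```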

Corollary 2. Let $\delta(X,S) = X + \frac{h(X)}{X'S^{-1}X}$, where $h$ is a vector-valued function. Assume that $E_\theta[X'X] < \infty$ and $E_\theta\big[\frac{h'(X)h(X)}{(X'S^{-1}X)^2}\big] < \infty$. Then $\delta(X,S)$ dominates $\delta_0(X)$ and is minimax provided

$$\frac{2}{n-p+2}\Big(\nabla_X \cdot h(X) - 2\,\frac{h'(X)S^{-1}X}{X'S^{-1}X}\Big) + \frac{h'(X)S^{-1}h(X)}{X'S^{-1}X} \le 0. \qquad (13)$$

Remark. It is worth noting that, setting $g(X,S) = \frac{h(X)}{X'S^{-1}X}$, condition (13) is equivalent to

$$\frac{2}{n-p+2}\,\nabla_X \cdot g(X,S) + g'(X,S)S^{-1}g(X,S) \le 0. \qquad (14)$$


Proof of Corollary 2. Using the form of $g(\cdot,\cdot)$ in the remark, we show that (9) holds. The only troublesome term is $D^*_{1/2}(g(X,S)g'(X,S))$, which is expressed as

$$\sum_{i=1}^p \frac{\partial}{\partial S_{ii}}\frac{h_i^2(X)}{(X'S^{-1}X)^2} + \frac{1}{2}\sum_{i\ne j}\frac{\partial}{\partial S_{ij}}\frac{h_i(X)h_j(X)}{(X'S^{-1}X)^2}$$

$$= 2\sum_{i=1}^p h_i^2(X)\,\frac{(X'S^{-1})_i^2}{(X'S^{-1}X)^3} + 2\sum_{i\ne j} h_i(X)h_j(X)\,\frac{(X'S^{-1})_i(X'S^{-1})_j}{(X'S^{-1}X)^3} = 2\,\frac{(h'(X)S^{-1}X)^2}{(X'S^{-1}X)^3}.$$

Now, by the Cauchy–Schwarz inequality, we have $(h'(X)S^{-1}X)^2 \le h'(X)S^{-1}h(X)\;X'S^{-1}X$. Hence

$$D^*_{1/2}\big(g(X,S)g'(X,S)\big) \le 2\,\frac{h'(X)S^{-1}h(X)}{(X'S^{-1}X)^2} = 2g'(X,S)S^{-1}g(X,S),$$

and the left-hand side of (9) is bounded above by

$$2\nabla_X \cdot g(X,S) + (n-p+2)\,g'(X,S)S^{-1}g(X,S).$$

Therefore (9) holds as soon as (14) is satisfied. □
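The key computation above, the closed form $D^*_{1/2}(gg') = 2(h'S^{-1}X)^2/(X'S^{-1}X)^3$ together with its Cauchy–Schwarz bound, can also be verified numerically. In this sketch (our own illustration, with arbitrary $x$, $h$, and $S$), derivatives with respect to $S_{ij}$ follow the symmetric-matrix convention used in the paper, so an off-diagonal perturbation moves $S_{ij}$ and $S_{ji}$ together:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 4
x = rng.standard_normal(p)
h = rng.standard_normal(p)             # h(X) enters only through its value here
M = 0.1 * rng.standard_normal((p, p))
S = np.eye(p) + M @ M.T                # some positive definite S

def T_entry(Svar, i, j):               # (i,j) entry of g g' = h h' / (x' S^{-1} x)^2
    u = x @ np.linalg.solve(Svar, x)
    return h[i] * h[j] / u ** 2

def fd(i, j, eps=1e-5):                # derivative in the symmetric convention
    P = np.zeros((p, p))
    if i == j:
        P[i, i] = eps
    else:
        P[i, j] = P[j, i] = eps
    return (T_entry(S + P, i, j) - T_entry(S - P, i, j)) / (2 * eps)

# D*_{1/2} T = sum_i dT_ii/dS_ii + (1/2) sum_{i != j} dT_ij/dS_ij
D_num = sum(fd(i, i) for i in range(p)) + 0.5 * sum(
    fd(i, j) for i in range(p) for j in range(p) if i != j)

Sinv_x = np.linalg.solve(S, x)
u = x @ Sinv_x
D_closed = 2.0 * (h @ Sinv_x) ** 2 / u ** 3          # closed form from the proof
bound = 2.0 * (h @ np.linalg.solve(S, h)) / u ** 2   # Cauchy-Schwarz upper bound

print(D_num, D_closed, bound)
```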

Example 1. The classical James–Stein estimator $\delta(X,S) = \big(1 - \frac{a}{X'S^{-1}X}\big)X$ is seen to dominate $\delta_0(X) = X$ for the entire class of distributions of the form (1) provided $0 \le a \le \frac{2(p-2)}{n-p+2}$, by either Corollary 1 or 2. While Corollary 1 allows the inclusion of a monotone function $r(\cdot)$ in the James–Stein estimator, Corollary 2 gives minimax estimators which are not necessarily scalar multiples of $X$.

Example 2. Let $A$ be a $p \times p$ symmetric positive definite matrix. Consider an estimator of the form $\delta(X,S) = X - \frac{r(X'X)}{X'S^{-1}X}\,AX$. Then Corollary 2 corresponds to $h(X) = -r(X'X)AX$, and condition (13) for minimaxity is

$$\frac{2}{n-p+2}\Big(-2r'(X'X)\,X'AX - r(X'X)\,\mathrm{tr}(A) + 2r(X'X)\,\frac{X'AS^{-1}X}{X'S^{-1}X}\Big) + r^2(X'X)\,\frac{X'AS^{-1}AX}{X'S^{-1}X} \le 0.$$

Suppose that the function $r(\cdot)$ is monotone nondecreasing. Use the facts that

$$X'AS^{-1}X \le \lambda_{\max}\,X'S^{-1}X \qquad \text{and} \qquad X'AS^{-1}AX \le \lambda_{\max}^2\,X'S^{-1}X,$$

where $\lambda_{\max}$ is the maximum eigenvalue of $A$. Then we have that $\delta(X,S)$ is minimax provided

$$0 \le r(X'X) \le \frac{2}{n-p+2}\,\frac{\mathrm{tr}(A) - 2\lambda_{\max}}{\lambda_{\max}^2}.$$

In particular, if $A = I$ the condition is $0 \le r(X'X) \le \frac{2(p-2)}{n-p+2}$.

3. Minimax estimation in a case when $\Sigma$ is structured

The method of proof adapts to situations where the scale matrix $\Sigma$ is structured, and to some other situations related to model (1). As an illustration, we give a result for the case where $\Sigma$ is assumed to be diagonal and equal to $\mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$. Then model (1) reduces to

$$f\Big(\sum_{i=1}^p \Big(\frac{(X_i-\theta_i)^2}{\sigma_i^2} + \frac{1}{\sigma_i^2}\sum_{j=1}^{n-1} V_{ij}^2\Big)\Big)$$

with sufficient statistics $(X_1, \ldots, X_p, S_1, \ldots, S_p) = (X,S)$, where $S_i = \sum_{j=1}^{n-1} V_{ij}^2$, and the loss in (2) reduces to $\sum_{i=1}^p \frac{(d_i(X,S)-\theta_i)^2}{\sigma_i^2}$. We consider estimators of the form

$$d_i(X,S) = X_i + S_i h_i(X,S) = X_i + g_i(X,S), \qquad i = 1, \ldots, p. \qquad (15)$$

We first give an expression for the risk difference between $\delta(X,S)$ and $\delta_0(X)$.

Theorem 2. Assume that $E_\theta[X'X] < \infty$ and $E_\theta[g(X,S)'g(X,S)] < \infty$. Then the risk difference $\Delta_\theta$ between $\delta(X,S)$ and $\delta_0(X)$ equals

$$C\,E^*_\theta\Big[\sum_{i=1}^p \Big((n+1)\,S_i\,h_i^2(X,S) + 2S_i\,\frac{\partial}{\partial X_i}h_i(X,S) + 4S_i^2\,h_i(X,S)\,\frac{\partial}{\partial S_i}h_i(X,S)\Big)\Big], \qquad (16)$$

where $C$ and $E^*_\theta$ are defined in Section 2.

Proof. We have

$$\Delta_\theta = E_\theta\Big[\sum_{i=1}^p \frac{S_i^2\,h_i^2(X,S)}{\sigma_i^2} + 2\sum_{i=1}^p \frac{X_i-\theta_i}{\sigma_i^2}\,S_i\,h_i(X,S)\Big]$$

$$= E_\theta\Big[\sum_{i=1}^p\sum_{j=1}^{n-1} \frac{V_{ij}}{\sigma_i^2}\,\big(V_{ij}\,S_i\,h_i^2(X,S)\big) + 2\sum_{i=1}^p \frac{X_i-\theta_i}{\sigma_i^2}\,S_i\,h_i(X,S)\Big]$$

$$= C\,E^*_\theta\Big[\sum_{i=1}^p\Big(\sum_{j=1}^{n-1} \frac{\partial}{\partial V_{ij}}\big(V_{ij}\,S_i\,h_i^2(X,S)\big) + 2S_i\,\frac{\partial}{\partial X_i}h_i(X,S)\Big)\Big],$$

where we have used integration by parts as in Lemma 1(i).


Now

$$\sum_{j=1}^{n-1} \frac{\partial}{\partial V_{ij}}\big(V_{ij}\,S_i\,h_i^2(X,S)\big) = \sum_{j=1}^{n-1} S_i\,h_i^2(X,S) + 2\sum_{j=1}^{n-1} V_{ij}^2\,h_i^2(X,S) + 4\sum_{j=1}^{n-1} V_{ij}^2\,S_i\,h_i(X,S)\,\frac{\partial}{\partial S_i}h_i(X,S)$$

$$= (n+1)\,S_i\,h_i^2(X,S) + 4S_i^2\,h_i(X,S)\,\frac{\partial}{\partial S_i}h_i(X,S).$$

Hence $\Delta_\theta$ has the form given by the theorem. □

As in Section 2, if an estimator of the form (15) is such that the term in brackets in (16) is nonpositive, this estimator dominates $\delta_0(X)$ simultaneously for all distributions of the form (1) with a diagonal $\Sigma$. An attractive class of estimators is given by

$$d_i(X,S) = X_i - \frac{r\big(\sum_{i=1}^p X_i^2/S_i\big)}{\sum_{i=1}^p X_i^2/S_i}\,X_i, \qquad i = 1, \ldots, p. \qquad (17)$$

The next corollary gives sufficient conditions under which an estimator of the form (17) dominates $\delta_0(X)$.

Corollary 3. Assume that $E_\theta[X'X] < \infty$ and $E_\theta\big[\sum_{i=1}^p X_i^2\big/\big(\sum_{i=1}^p X_i^2/S_i\big)^2\big] < \infty$. Then an estimator of the form (17) dominates $\delta_0(X)$ provided that $r(\cdot)$ is nondecreasing and $0 \le r \le \frac{2(p-2)}{n+1}$.

Proof. Let $W$ be the $p \times p$ diagonal matrix $\mathrm{diag}(S_1, \ldots, S_p)$. Note that $\sum_{i=1}^p X_i^2/S_i = X'W^{-1}X$. When (17) is expressed in the form (15),

$$h_i(X,S) = -\frac{r(X'W^{-1}X)}{X'W^{-1}X}\,\frac{X_i}{S_i}.$$

Write the three terms in brackets in (16) as $A$, $B$ and $C$, respectively. We have

$$A = \sum_{i=1}^p (n+1)\,S_i\,h_i^2(X,S) = (n+1)\,\frac{r^2(X'W^{-1}X)}{X'W^{-1}X}.$$

Then we have

$$B = \sum_{i=1}^p 2S_i\,\frac{\partial}{\partial X_i}h_i(X,S) = -2\sum_{i=1}^p\Big[\frac{r(X'W^{-1}X)}{X'W^{-1}X} - 2\,\frac{r(X'W^{-1}X)}{(X'W^{-1}X)^2}\,\frac{X_i^2}{S_i} + 2\,\frac{r'(X'W^{-1}X)}{X'W^{-1}X}\,\frac{X_i^2}{S_i}\Big]$$

$$= -2(p-2)\,\frac{r(X'W^{-1}X)}{X'W^{-1}X} - 4r'(X'W^{-1}X) \le -2(p-2)\,\frac{r(X'W^{-1}X)}{X'W^{-1}X},$$

since $r(\cdot)$ is nondecreasing. Finally, we have

$$C = \sum_{i=1}^p 4S_i^2\,h_i(X,S)\,\frac{\partial}{\partial S_i}h_i(X,S) = 4\sum_{i=1}^p S_i^2\,\frac{r(X'W^{-1}X)}{X'W^{-1}X}\,\frac{X_i}{S_i}\,\frac{\partial}{\partial S_i}\Big(\frac{r(X'W^{-1}X)}{X'W^{-1}X}\,\frac{X_i}{S_i}\Big)$$

$$= 4\sum_{i=1}^p S_i\,\frac{r(X'W^{-1}X)}{X'W^{-1}X}\,\frac{\partial}{\partial S_i}\Big(\frac{r(X'W^{-1}X)}{X'W^{-1}X}\,\frac{X_i^2}{S_i}\Big) \le 0,$$

since it is straightforward to show that

$$\frac{\partial}{\partial S_i}\Big(\frac{r(X'W^{-1}X)}{X'W^{-1}X}\,\frac{X_i^2}{S_i}\Big) \le 0.$$

Consequently, $A + B + C$ is bounded above by

$$(n+1)\,\frac{r^2(X'W^{-1}X)}{X'W^{-1}X} - 2(p-2)\,\frac{r(X'W^{-1}X)}{X'W^{-1}X},$$

which is nonpositive as soon as $0 \le r \le \frac{2(p-2)}{n+1}$. □
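The diagonal-case estimator is simple enough to simulate directly. The sketch below is our own illustration (the variances, $\theta$, and the constant choice $r(\cdot) \equiv c$ are arbitrary); it uses a normal model, which is one member of the class (1) with diagonal $\Sigma$:

```python
import numpy as np

# Monte Carlo comparison (illustrative sketch) of estimator (17) with a constant
# r(.) = c in (0, 2(p-2)/(n+1)], versus delta_0(X) = X, for diagonal Sigma.
rng = np.random.default_rng(2)
p, n = 6, 20
theta = 0.3 * np.ones(p)
sig2 = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])   # diagonal of Sigma (unknown to estimator)
c = (p - 2) / (n + 1)                             # within the allowed range

loss_x, loss_d = [], []
for _ in range(5000):
    X = theta + np.sqrt(sig2) * rng.standard_normal(p)
    V = np.sqrt(sig2) * rng.standard_normal((n - 1, p))
    S = (V ** 2).sum(axis=0)                      # S_i = sum_j V_ij^2
    q = np.sum(X ** 2 / S)                        # sum_i X_i^2 / S_i = X' W^{-1} X
    d = (1.0 - c / q) * X                         # estimator (17) with r(.) = c
    loss_x.append(np.sum((X - theta) ** 2 / sig2))  # invariant loss, diagonal case
    loss_d.append(np.sum((d - theta) ** 2 / sig2))

risk_x, risk_d = np.mean(loss_x), np.mean(loss_d)
print(risk_x, risk_d)   # risk_x is near p; the shrinkage estimator does better
```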

Remark. A model closely related to the above corresponds to unequal sample sizes for the residuals $V_{ij}$. Suppose for the $i$th component we observe $X_i, V_{i1}, \ldots, V_{i(n_i-1)}$ $(i = 1, \ldots, p)$ and assume that the density has the form

$$f\Big(\sum_{i=1}^p \Big[\frac{(X_i-\theta_i)^2}{\sigma_i^2} + \sum_{j=1}^{n_i-1}\frac{V_{ij}^2}{\sigma_i^2}\Big]\Big).$$

An analysis as above shows that

$$\delta(X,S) = X - \frac{r(X'W^{*-1}X)}{X'W^{*-1}X}\,X$$

dominates $\delta_0(X)$ provided the function $r$ is nondecreasing and bounded between $0$ and $2(p-2)$. Here we set $W^* = \mathrm{diag}\big(\frac{S_1}{n_1+1}, \frac{S_2}{n_2+1}, \ldots, \frac{S_p}{n_p+1}\big)$, where $S_i = \sum_{j=1}^{n_i-1} V_{ij}^2$.

Note that, when $n_i = n$, $i = 1, \ldots, p$, this result is equivalent to Corollary 3. See [1] for a treatment of the normal model under a loss of the form $\frac{(\delta-\theta)'Q(\delta-\theta)}{\mathrm{tr}(Q\Sigma)}$, where $Q$ is a fixed known $p \times p$ positive definite matrix.


4. Concluding remarks

We developed a method of integration by parts which is applicable to a large class of models. This technique provides, for a class of elliptically symmetric distributions and under the invariant loss, expressions for the risks of estimators of the location vector which are analogous to unbiased estimators of the risks. These expressions, in fact, are expectations of the unbiased estimators of risk for the normal model, but with respect to a distribution related to the underlying distribution. In the normal case this related distribution coincides with the original distribution, but in general they are different. It seems that the existence of an unbiased estimator of the risk difference in the normal case, together with a "residual" for estimating the covariance structure, are the essentials for our method to work.

We employ this technique to produce classes of estimators which have a very strong robustness property. In particular, we give estimators which dominate the usual estimator $\delta_0(X) = X$ simultaneously for the entire class of distributions. This robustness property is analogous to that found by Cellier and Fourdrinier [5] in the spherically symmetric case (i.e. the case $\Sigma = I$). In fact, an analysis along the lines of that in Section 3 gives essentially the result of [5] (except that here we require the existence of a density). This paper indicates that such robustness is to be expected over a natural class of models containing the normal model, when a residual vector is present for estimating the scale matrix.

We indicate the generality of the method and the expected robustness property by developing the case where a specific structure of the scale matrix is assumed; we treat the case where this matrix is diagonal. Due to the special structure of $\Sigma$ we can give a more specific form of the shrinkage function. In the case of a diagonal $\Sigma$, the results of Sections 2 and 3 are both applicable to produce superior estimators. Those of Section 3 are based on the sufficient statistics for the problem, while those of Section 2 depend on the cross-product terms of the "sample covariance matrix" as well, and hence may be further improved by Rao–Blackwellizing. It seems preferable in this case to use directly estimators of the form in Section 3. In contrast, note that even if $\Sigma$ is known, it is not clear that it is better to use the known value of $\Sigma$ in place of an estimated value in constructing improved estimators in either Section 2 or Section 3. See Fourdrinier and Strawderman [9] for examples in the case $\Sigma = I$. In the case of a normal distribution, $X$ is a complete sufficient statistic and Rao–Blackwellization is possible if $S$ is used in the estimator. In the nonnormal case, however, typically $X$ will not even be sufficient. Hence Rao–Blackwellization (conditional on $X$) will not result in a better estimator. Furthermore, Rao–Blackwellization of the estimator in the normal case will result in a procedure that may not dominate $\delta_0$ for all distributions $f$, and hence robustness of domination will likely be lost.

Acknowledgments

The authors thank the referees for their valuable comments which improved and sharpened the paper.


Appendix A

We first state an integration-by-slice result which is often used in the particular case where the "slices" are Euclidean spheres of radius $r$. This result can be derived from the co-area theorem stated by Federer [7] (i.e. Theorem 3.2.12).

Lemma A.1. For any real number $r$, let $[\varphi = r]$ be the submanifold of $\mathbb{R}^p$ associated with a given continuously differentiable function $\varphi$ defined on $\mathbb{R}^p$. Then, for any Lebesgue integrable function $f$, we have

$$\int_{\mathbb{R}^p} f(x)\,dx = \int_{\{r \in \mathbb{R} \mid [\varphi=r] \ne \emptyset\}} \int_{[\varphi=r]} \frac{f(x)}{\|\nabla\varphi(x)\|}\,d\sigma_r(x)\,dr,$$

where $\sigma_r$ is the area measure defined on $[\varphi = r]$.

Remark. When $\varphi: x \mapsto \|x\|$, the submanifold $[\varphi = r]$ is the usual sphere and Lemma A.1 coincides with the classical result.

The following corollary is useful when the function $f$ is defined through the inner product of $\nabla\varphi$ and a function $g$, that is, when $f$ is of the form $f(x) = \nabla\varphi(x)\cdot g(x)$.

Corollary A.1. Assume that $g$ is a function defined on $\mathbb{R}^p$ such that the function $\nabla\varphi \cdot g$ is integrable. Then

$$\int_{\mathbb{R}^p} \nabla\varphi(x)\cdot g(x)\,dx = \int_{\{r \in \mathbb{R} \mid [\varphi=r] \ne \emptyset\}} \int_{B_r} \nabla\cdot g(x)\,dx\,dr,$$

where $B_r$ is the set with boundary $[\varphi = r]$ corresponding, for any $x \in [\varphi = r]$, to the outward normal vector $\nabla\varphi(x)$.

Proof. The result immediately follows from an application of Stokes' theorem, since the outward unit normal vector at $x$ is $\nabla\varphi(x)/\|\nabla\varphi(x)\|$. □

We now give the proof of Lemma 1, which was repeatedly used in Sections 2 and 3.

Proof of Lemma 1. (i) By definition we have

$$E_\theta[g(X,S)'\Sigma^{-1}(X-\theta)] = \int_{\mathbb{R}^p \times \cdots \times \mathbb{R}^p}\Big[\int_{\mathbb{R}^p} g(x,s)'\Sigma^{-1}(x-\theta)\,f\Big((x-\theta)'\Sigma^{-1}(x-\theta) + \sum_{j=1}^{n-1} v_j'\Sigma^{-1}v_j\Big)\,dx\Big]\,dv_1\cdots dv_{n-1}.$$

Now apply Corollary A.1 with $\varphi(x) = \sqrt{(x-\theta)'\Sigma^{-1}(x-\theta)}$ to the innermost integral. This gives

$$\nabla\varphi(x) = \frac{\Sigma^{-1}(x-\theta)}{\sqrt{(x-\theta)'\Sigma^{-1}(x-\theta)}}$$

and

$$\int_{\mathbb{R}^p} g(x,s)'\Sigma^{-1}(x-\theta)\,f\Big((x-\theta)'\Sigma^{-1}(x-\theta) + \sum_{j=1}^{n-1} v_j'\Sigma^{-1}v_j\Big)\,dx$$

$$= \int_0^{\infty} f\Big(R^2 + \sum_{j=1}^{n-1} v_j'\Sigma^{-1}v_j\Big)\int_{[\varphi=R]} \frac{g(x,s)'\Sigma^{-1}(x-\theta)}{\|\nabla\varphi(x)\|}\,d\sigma_R(x)\,dR$$

$$= \int_0^{\infty} f\Big(R^2 + \sum_{j=1}^{n-1} v_j'\Sigma^{-1}v_j\Big)\int_{[\varphi=R]} g(x,s)'\,\sqrt{(x-\theta)'\Sigma^{-1}(x-\theta)}\,\frac{\nabla\varphi(x)}{\|\nabla\varphi(x)\|}\,d\sigma_R(x)\,dR$$

$$= \int_0^{\infty} R\,f\Big(R^2 + \sum_{j=1}^{n-1} v_j'\Sigma^{-1}v_j\Big)\int_{[\varphi=R]} g(x,s)'\,\frac{\nabla\varphi(x)}{\|\nabla\varphi(x)\|}\,d\sigma_R(x)\,dR$$

$$= \int_0^{\infty} R\,f\Big(R^2 + \sum_{j=1}^{n-1} v_j'\Sigma^{-1}v_j\Big)\int_{[\varphi \le R]} \nabla_x\cdot g(x,s)\,dx\,dR$$

$$= \int_{\mathbb{R}^p} \nabla_x\cdot g(x,s)\int_{\sqrt{(x-\theta)'\Sigma^{-1}(x-\theta)}}^{\infty} R\,f\Big(R^2 + \sum_{j=1}^{n-1} v_j'\Sigma^{-1}v_j\Big)\,dR\,dx$$

$$= \int_{\mathbb{R}^p} \nabla_x\cdot g(x,s)\,\frac{1}{2}\int_{(x-\theta)'\Sigma^{-1}(x-\theta)}^{\infty} f\Big(r + \sum_{j=1}^{n-1} v_j'\Sigma^{-1}v_j\Big)\,dr\,dx$$

$$= \int_{\mathbb{R}^p} \nabla_x\cdot g(x,s)\,F\Big((x-\theta)'\Sigma^{-1}(x-\theta) + \sum_{j=1}^{n-1} v_j'\Sigma^{-1}v_j\Big)\,dx. \qquad (\mathrm{A.1})$$
Finally, integrating (A.1) with respect to the $v_j$ gives an expression for the expectation $E_\theta[g(X,S)'\Sigma^{-1}(X-\theta)]$ and yields the desired result.

(ii) First note that

$$\mathrm{tr}(T(X,S)\Sigma^{-1}) = \mathrm{tr}(T(X,S)\Sigma^{-1}SS^{-1}) = \mathrm{tr}\Big(T(X,S)\Sigma^{-1}\sum_{i=1}^{n-1} V_iV_i'\,S^{-1}\Big) = \sum_{i=1}^{n-1}\mathrm{tr}\big(V_i'S^{-1}T(X,S)\Sigma^{-1}V_i\big) = \sum_{i=1}^{n-1} V_i'S^{-1}T(X,S)\Sigma^{-1}V_i.$$

Then, by the argument in Lemma 1(i),

$$E_\theta[\mathrm{tr}(T(X,S)\Sigma^{-1})] = C\sum_{i=1}^{n-1} E^*_\theta\big[\nabla_{V_i}\cdot\big(T(X,S)S^{-1}V_i\big)\big] = C\sum_{i=1}^{n-1} E^*_\theta\Big[\sum_{j=1}^{p}\frac{\partial}{\partial V_{ij}}\Big(\sum_{\ell=1}^p\sum_{m=1}^p T_{j\ell}(X,S)\,S^{\ell m}\,V_{im}\Big)\Big] = C\,E^*_\theta[A_1 + A_2 + A_3],$$

where $S^{\ell m}$ denotes the $(\ell,m)$ entry of $S^{-1}$ and

$$A_1 = \sum_{i=1}^{n-1}\sum_{j=1}^p\sum_{m=1}^p\sum_{\ell=1}^p \frac{\partial V_{im}}{\partial V_{ij}}\,T_{j\ell}(X,S)\,S^{\ell m},$$

$$A_2 = \sum_{i=1}^{n-1}\sum_{j=1}^p\sum_{m=1}^p\sum_{\ell=1}^p V_{im}\,\frac{\partial T_{j\ell}(X,S)}{\partial V_{ij}}\,S^{\ell m},$$

$$A_3 = \sum_{i=1}^{n-1}\sum_{j=1}^p\sum_{m=1}^p\sum_{\ell=1}^p V_{im}\,T_{j\ell}(X,S)\,\frac{\partial S^{\ell m}}{\partial V_{ij}}.$$

First, it is easy to see that

$$A_1 = \sum_{i=1}^{n-1}\sum_{j=1}^p\sum_{m=1}^p\sum_{\ell=1}^p \delta_{jm}\,T_{j\ell}(X,S)\,S^{\ell m} = (n-1)\sum_{j=1}^p\sum_{\ell=1}^p T_{j\ell}(X,S)\,S^{\ell j} = (n-1)\,\mathrm{tr}(T(X,S)S^{-1}).$$

Now, since $S$ is symmetric, we have

$$A_2 = \sum_{i=1}^{n-1}\sum_{j=1}^p\sum_{m=1}^p\sum_{\ell=1}^p V_{im}\Big[\sum_{q\le r}\frac{\partial T_{j\ell}(X,S)}{\partial S_{qr}}\,\frac{\partial S_{qr}}{\partial V_{ij}}\Big]S^{\ell m}.$$

By definition of $S$, the last derivative is

$$\frac{\partial S_{qr}}{\partial V_{ij}} = \frac{\partial}{\partial V_{ij}}\Big(\sum_{k=1}^{n-1} V_{kq}V_{kr}\Big) = V_{iq}\delta_{jr} + V_{ir}\delta_{jq}.$$

Multiplying the last expression by $V_{im}$ and summing on $i$, we obtain

$$A_2 = \sum_{j=1}^p\sum_{m=1}^p\sum_{\ell=1}^p\Big[\sum_{q\le r}\frac{\partial T_{j\ell}(X,S)}{\partial S_{qr}}\,S^{\ell m}\,(S_{mq}\delta_{jr} + S_{mr}\delta_{jq})\Big]. \qquad (\mathrm{A.2})$$

Summing on $m$ and using the fact that $\sum_{m=1}^p S^{am}S_{mb} = \delta_{ab}$, it follows that

$$A_2 = \sum_{j=1}^p\sum_{\ell=1}^p\sum_{q\le r}\frac{\partial T_{j\ell}(X,S)}{\partial S_{qr}}\,(\delta_{\ell q}\delta_{jr} + \delta_{\ell r}\delta_{jq}) = \sum_{j=1}^p\sum_{\ell=1}^p \frac{\partial T_{j\ell}(X,S)}{\partial S_{\ell j}}\sum_{q\le r}(\delta_{\ell q}\delta_{jr} + \delta_{\ell r}\delta_{jq}).$$

Note that

$$\sum_{q\le r}(\delta_{\ell q}\delta_{jr} + \delta_{\ell r}\delta_{jq}) = \begin{cases} 2 & \text{if } j = \ell, \\ 1 & \text{if } j \ne \ell. \end{cases}$$

Hence

$$A_2 = 2\Big[\sum_{j=1}^p \frac{\partial T_{jj}(X,S)}{\partial S_{jj}} + \frac{1}{2}\sum_{j\ne \ell}\frac{\partial T_{j\ell}(X,S)}{\partial S_{j\ell}}\Big] = 2D^*_{1/2}T.$$

We now treat the term $A_3$. Using the same argument which led to (A.2), we can write $A_3$ as

$$A_3 = \sum_{j=1}^p\sum_{m=1}^p\sum_{\ell=1}^p T_{j\ell}(X,S)\sum_{q\le r}\frac{\partial S^{\ell m}}{\partial S_{qr}}\,(S_{mq}\delta_{jr} + S_{mr}\delta_{jq}).$$

Using the fact that

$$\frac{\partial S^{\ell m}}{\partial S_{qr}} = \begin{cases} -S^{\ell q}S^{rm} - S^{mq}S^{r\ell} & \text{if } q \ne r, \\ -S^{\ell q}S^{qm} & \text{if } q = r, \end{cases}$$

we have that $A_3$ is expressed as

$$A_3 = \sum_{j=1}^p\sum_{m=1}^p\sum_{\ell=1}^p T_{j\ell}(X,S)\Big\{-2S_{mj}\,S^{\ell j}S^{jm} - \sum_{q<r}(S_{mq}\delta_{jr} + S_{mr}\delta_{jq})\,(S^{\ell q}S^{rm} + S^{mq}S^{r\ell})\Big\}.$$

Summing on $m$, using the fact that $\sum_{m=1}^p S^{am}S_{mb} = \delta_{ab}$ and the symmetry of $S$ and $S^{-1}$, we obtain

$$A_3 = \sum_{j=1}^p\sum_{\ell=1}^p T_{j\ell}(X,S)\Big\{-2S^{\ell j} - \sum_{q<r}(\delta_{jr}S^{r\ell} + \delta_{jq}S^{\ell q})\Big\} = -\sum_{j=1}^p\sum_{\ell=1}^p T_{j\ell}(X,S)\,S^{\ell j}\Big\{2 + \sum_{q<r}(\delta_{jr} + \delta_{jq})\Big\}.$$

Using the fact that, for any fixed $j$, $\sum_{q<r}(\delta_{jr} + \delta_{jq}) = p - 1$, we have $A_3 = -(p+1)\,\mathrm{tr}(TS^{-1})$. Finally, summing

$$A_1 + A_2 + A_3 = (n-1)\,\mathrm{tr}(TS^{-1}) + 2D^*_{1/2}T - (p+1)\,\mathrm{tr}(TS^{-1}) = (n-p-2)\,\mathrm{tr}(TS^{-1}) + 2D^*_{1/2}T,$$

the lemma follows. □

References

[1] J.O. Berger, M.E. Bock, Combining independent normal mean estimation problems with unknown variances, Ann. Statist. 4 (3) (1976) 642–648.
[2] J.O. Berger, M.E. Bock, L.D. Brown, G. Casella, L. Gleser, Minimax estimation of a normal mean vector for arbitrary quadratic loss and unknown covariance matrix, Ann. Statist. 5 (1977) 763–771.
[3] J.O. Berger, L.R. Haff, A class of minimax estimators of a normal mean vector for arbitrary quadratic loss and unknown covariance matrix, Statist. Decisions 1 (1983) 105–130.
[4] M. Bilodeau, D. Brenner, Theory of Multivariate Statistics, Springer, Berlin, 1999.
[5] D. Cellier, D. Fourdrinier, Shrinkage estimators under spherical symmetry for the general linear model, J. Multivariate Anal. 52 (2) (1995) 338–351.
[6] K.T. Fang, S. Kotz, K.W. Ng, Symmetric Multivariate and Related Distributions, Chapman & Hall, London, 1990.
[7] H. Federer, Geometric Measure Theory, Springer, Berlin, 1969.
[8] D. Fourdrinier, A.-S. Lemaire, Estimation of the mean of a $\ell_1$-exponential multivariate distribution, Statist. Decisions 18 (2000) 259–273.
[9] D. Fourdrinier, W.E. Strawderman, A paradox concerning shrinkage estimators: should a known scale parameter be replaced by an estimated value in the shrinkage factor?, J. Multivariate Anal. 59 (2) (1996) 109–140.
[10] L.J. Gleser, Minimax estimation of a normal mean vector when the covariance matrix is unknown, Ann. Statist. 7 (1979) 838–846.
[11] L.J. Gleser, Minimax estimators of a normal mean vector for arbitrary quadratic loss and unknown covariance matrix, Ann. Statist. 14 (1986) 1625–1633.
[12] L.R. Haff, An identity for the Wishart distribution with applications, J. Multivariate Anal. 9 (1979) 531–544.
[13] W. James, C. Stein, Estimation with quadratic loss, in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, 1961, pp. 361–380.
[14] M.S. Srivastava, Stein estimation under elliptical distributions, J. Multivariate Anal. 28 (1989) 247–259.
[15] C. Stein, Estimation of the mean of a multivariate normal distribution, Ann. Statist. 9 (6) (1981) 1135–1151.