
Improved Multivariate Prediction in a General Linear Model with an Unknown Error Covariance Matrix

Anoop Chaturvedi
University of Allahabad, Allahabad, India

Alan T. K. Wan
City University of Hong Kong, Kowloon, Hong Kong

Shri P. Singh
University of Allahabad, Allahabad, India

Received August 23, 1999; published online February 5, 2002

This paper deals with the problem of Stein-rule prediction in a general linear model. Our study extends the work of Gotway and Cressie (1993) by assuming that the covariance matrix of the model's disturbances is unknown. Also, predictions are based on a composite target function that incorporates allowance for the simultaneous prediction of the actual and average values of the target variable. We employ large sample asymptotic theory to derive and compare expressions for the bias vectors, mean squared error matrices, and risks based on a quadratic loss structure of the Stein-rule and the feasible best linear unbiased predictors. The results are applied to a model with first order autoregressive disturbances. Moreover, a Monte-Carlo experiment is conducted to explore the performance of the predictors in finite samples.

© 2002 Elsevier Science (USA)
AMS 1991 subject classification: 62J05.
Key words and phrases: large sample asymptotics; prediction; quadratic loss; risk; Stein-rule.

1. INTRODUCTION

Gotway and Cressie (1993) considered a class of linear and nonlinear predictors in the context of a general linear model. Their work was motivated by earlier works of Copas (1983) and Copas and Jones (1987). The former is related to the prediction of a single random variable in regression using a Stein-rule predictor, whereas the latter considers Stein-rule prediction in an autoregressive model. Gotway and Cressie


(1993) discussed a class of nonlinear predictors that is found to have uniformly smaller risk than the best linear unbiased predictor (BLUP), and constructed a range of predictors, including the Stein-rule predictor, as special cases of this class. A problem with Gotway and Cressie's (1993) work is that they assumed that the parameters in the covariance structure of the linear model's disturbances are known, except for a scalar multiple. On a practical level, it is not clear how often this assumption can be satisfied. Indeed, despite the amount of research that has been carried out on Stein-rule estimation over the past 30 years, much of the analysis has focused on regression models with spherical disturbances, or models that assume a known covariance structure for the disturbances; see, for example, Judge and Bock (1978) for a comprehensive discussion of the relevant literature. By comparison, only scant attention has been given to Stein-rule estimation in models where the disturbance covariance matrix is of an unknown form. Chaturvedi and Shukla (1990) considered a Stein-rule estimator based on the feasible generalized least squares estimator (FGLSE) and obtained an Edgeworth-type asymptotic expansion for its distribution when the sample size is large (see also Wan and Chaturvedi (2000, 2001)). In this paper, we consider a Stein-rule predictor in a model where the disturbances' covariance matrix is unknown. This work is motivated in part by the studies of Kariya and Toyooka (1992) and Usami and Toyooka (1997), who derived normal approximations for the feasible BLUP (FBLUP) and the FGLSE when the sample size is large, but it is best thought of as an extension of Gotway and Cressie (1993). However, unlike Gotway and Cressie (1993), who considered the class of optimal heterogeneous linear predictors given in Toutenburg (1982, p. 140), we consider predictors based on a composite target function that incorporates allowance for the simultaneous prediction of the actual and average values of the target variable (see Shalabh (1995), Toutenburg and Shalabh (1996, 2000), and Toutenburg et al. (2000)). We derive the large sample asymptotic distribution of a class of predictors based on this target function and compare the risk of the Stein-rule predictor with that of the FBLUP under a quadratic loss structure. Furthermore, the findings are elaborated by considering a model with first order autoregressive disturbances. Finally, a Monte-Carlo experiment is conducted to explore the performance of the predictors in finite samples.

2. THE MODEL AND PREDICTORS

Consider the general linear model,

Y = Xβ + u,  (2.1)


where Y is an n × 1 vector of observations on the dependent (target) variable, X is an n × p nonstochastic matrix of observations on p explanatory variables, β is a p × 1 vector of unknown coefficients, and u is an n × 1 vector of disturbances. Let Y_f be a T × 1 vector of unobserved values of the dependent variable for T forecast periods, generated by the model

Y_f = X_f β + u_f,  (2.2)

where X_f is a T × p matrix of prespecified values of the explanatory variables for the T forecast periods and u_f is a T × 1 vector of disturbances. Further, we assume that the disturbance vector (u′, u_f′)′ follows a normal distribution with mean vector 0 and covariance matrix σ²Σ, where

$$\Sigma = \begin{pmatrix} \Phi & V \\ V' & \Psi \end{pmatrix}. \qquad (2.3)$$

So σ²Φ is the n × n covariance matrix of u, σ²Ψ is the T × T covariance matrix of u_f, and σ²V is the n × T matrix of covariances between u and u_f. Working on the assumption that Σ is known, Gotway and Cressie (1993) considered a Stein-rule predictor of Y_f and examined its risk under a quadratic loss structure. Here, we assume instead that Σ = Σ(θ) is a function of an unknown q × 1 vector θ belonging to an open subset of q-dimensional Euclidean space, and that θ is estimated by an estimator θ̂. In predicting the dependent variable of a regression model, the traditional practice is to obtain the prediction for either the actual values of the dependent variable or its average value, but not both simultaneously. In some circumstances, however, it may be desirable to consider the simultaneous prediction of both the actual and the average values of a variable. Consider the situation of a real estate agent engaged by vendors to provide market valuations of houses. In assessing the merit of the agent, consideration should be given to the average absolute error resulting from the agent's valuations; on the other hand, an individual vendor will be concerned entirely with the accuracy of the appraisal of his or her own house. By virtue of these considerations, both the average and the actual values of the target variable are of importance. Shalabh (1995) gave other examples of practical situations in which one is required to predict both the average and the actual values of a variable. In these circumstances, one should use a prediction function that reflects more than one desideratum. Shalabh (1995) and Toutenburg and Shalabh (1996) considered the target function

y = λY_f + (1 − λ)E(Y_f) = X_f β + λu_f,  (2.4)


which allows the prediction of both Y_f and E(Y_f), where λ (0 ≤ λ ≤ 1) is a nonstochastic scalar assigning weights to the actual and expected values of Y_f, respectively. Shalabh (1995) further established the bias and mean squared error matrix of the predictor of y based on (2.4) for the special case of Σ = I. For convenience, we write Ω = Φ⁻¹. If θ is known, then the BLUP for y is given by

ỹ = λỸ_f + (1 − λ)X_f β̃,  (2.5)

where

Ỹ_f = X_f β̃ + V′Ω(Y − Xβ̃)  (2.6)

is the BLUP of Y_f (see Toutenburg (1982, p. 138)) and

β̃ = (X′ΩX)⁻¹X′ΩY  (2.7)

is the generalized least squares estimator of β. Substituting (2.6) in (2.5), we obtain

ỹ = X_f β̃ + λV′Ω(Y − Xβ̃).  (2.8)
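As a quick numerical check of the algebra, substituting (2.6) into (2.5) reproduces (2.8). The following minimal numpy sketch (synthetic data, a synthetic positive definite Ω, and an arbitrary V, all of our own choosing for illustration) verifies the identity:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, T, lam = 12, 3, 2, 0.3
X = rng.normal(size=(n, p))
Xf = rng.normal(size=(T, p))
Y = rng.normal(size=n)
A = rng.normal(size=(n, n))
Omega = A @ A.T + np.eye(n)                              # synthetic positive definite Omega
V = rng.normal(size=(n, T))
XtO = X.T @ Omega
beta_t = np.linalg.solve(XtO @ X, XtO @ Y)               # GLS estimator (2.7)
Yf_t = Xf @ beta_t + V.T @ Omega @ (Y - X @ beta_t)      # BLUP of Y_f, (2.6)
y1 = lam * Yf_t + (1 - lam) * Xf @ beta_t                # composite BLUP (2.5)
y2 = Xf @ beta_t + lam * V.T @ Omega @ (Y - X @ beta_t)  # form (2.8)
assert np.allclose(y1, y2)
```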

On the other hand, if θ is unknown and estimated by an estimator θ̂, then the replacement of θ by θ̂ in (2.8) leads to the feasible BLUP of y,

ŷ = X_f β̂ + λV̂′Ω̂(Y − Xβ̂),  (2.9)

where Ω̂ and V̂ are obtained by replacing θ by θ̂ in Ω and V, respectively, and

β̂ = (X′Ω̂X)⁻¹X′Ω̂Y  (2.10)

is the FGLSE of β. Note that the first term on the r.h.s. of (2.9) is an estimator of the nonstochastic part X_f β of y, whereas V̂′Ω̂(Y − Xβ̂) is an estimator of the disturbance term u_f. Now, the Stein-rule estimator considered in Chaturvedi and Shukla (1990) is given by

β̂_sr = [1 − (a/(n − p + 2)) · ((Y − Xβ̂)′Ω̂(Y − Xβ̂))/(β̂′X′Ω̂Xβ̂)] β̂,  (2.11)


where a (≥ 0) is a characterizing scalar. Chaturvedi and Shukla (1990) derived an Edgeworth-type asymptotic expansion for the distribution of β̂_sr and conditions for the dominance of β̂_sr over β̂ under the criteria of risk under quadratic loss and concentration of the distribution around the true parameter values. If β̂ is replaced by β̂_sr in (2.9), then we obtain the Stein-rule predictor

ŷ_sr = X_f β̂_sr + λV̂′Ω̂(Y − Xβ̂_sr).  (2.12)

Obviously, for a = 0, the predictor ŷ_sr reduces to ŷ.
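The construction (2.9)–(2.12) can be sketched in code. Below is a minimal numpy implementation, assuming Ω̂ and V̂ have already been formed from θ̂; the function name and the toy usage are ours, for illustration only:

```python
import numpy as np

def stein_rule_predictor(Y, X, Xf, Omega_hat, V_hat, lam, a):
    """Feasible Stein-rule predictor (2.12) of the composite target
    y = lam*Yf + (1 - lam)*E(Yf); a = 0 recovers the feasible BLUP (2.9)."""
    n, p = X.shape
    XtO = X.T @ Omega_hat
    beta_hat = np.linalg.solve(XtO @ X, XtO @ Y)   # FGLSE (2.10)
    resid = Y - X @ beta_hat
    # shrinkage factor of the Stein-rule estimator (2.11)
    shrink = 1.0 - (a / (n - p + 2)) * (
        (resid @ Omega_hat @ resid) / (beta_hat @ XtO @ X @ beta_hat))
    beta_sr = shrink * beta_hat
    return Xf @ beta_sr + lam * V_hat.T @ Omega_hat @ (Y - X @ beta_sr)
```

With Ω̂ = I, V̂ = 0, λ = 0, and a = 0, this reduces to ordinary least squares prediction of X_f β.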

3. ASYMPTOTIC DISTRIBUTION AND DOMINANCE CONDITIONS

In this section, we consider the asymptotic distribution of the predictor ŷ_sr when the sample size is large. We assume that (i) for any n × n finite matrix C with elements of order O(1), the quantity X′CX/n is of order O(1) as n → ∞; (ii) for any such matrix C, the quantity X′Cu/√n is of order O_p(1); and (iii) the estimator θ̂ of θ is an even function of Mu, where M = I_n − X(X′X)⁻¹X′, and √n(θ̂ − θ) is of order O_p(1) as n → ∞. For purposes of analysis, we write

B = X′ΩX/n,  B̂ = X′Ω̂X/n,  ê1 = X′Ω̂u/(σ√n),

and c = √n(ŷ_sr − y)/σ. Since M is an idempotent matrix of rank n − p, there exists an n × (n − p) matrix P such that P′P = I_{n−p} and PP′ = M. Consider the transformation e1 = X′Ωu/(σ√n) and e2 = P′u/σ. From the normality of u and the fact that P′X = 0, it follows that e1 and e2 are independently distributed, with e1 ~ N(0, B) and e2 ~ N(0, P′Ω⁻¹P). Note also that assumption (iii) implies that θ̂ is an even function of e2. In what follows, we derive the conditional asymptotic distribution of c given e2.


Theorem 3.1. The conditional asymptotic distribution of c given e2, up to order O_p(n⁻¹), is normal with mean vector m(e2) and variance–covariance matrix Ξ, where

m(e2) = −(aσu/(√n β′B̂β))(X_f − λV̂′Ω̂X)β
        + [(X_f − λV̂′Ω̂X)B̂⁻¹X′Ω̂/√n + λ√n(V̂′Ω̂ − V′Ω)] Ω⁻¹P(P′Ω⁻¹P)⁻¹e2,  (3.1)

u = e2′(P′Ω⁻¹P)⁻¹P′Ω⁻¹(Ω̂ − Ω̂X(X′Ω̂X)⁻¹X′Ω̂)Ω⁻¹P(P′Ω⁻¹P)⁻¹e2/n,  (3.2)

and

Ξ = λ²n(Ψ − V′ΩV) + (X_f − λV′ΩX)B⁻¹(X_f − λV′ΩX)′
    − (2aσ²/(nβ′Bβ))(X_f − λV′ΩX)[B⁻¹ − 2ββ′/(β′Bβ)](X_f − λV′ΩX)′.  (3.3)

Proof. We can write c as

c= = = =

`n s `n s `n s

(yˆsr − y) (yˆsr − Xf b − luf ) ˆ ŒW ˆ (Y − Xbˆsr ) − luf ] [Xf (bˆsr − b)+lV

ˆ (Y − Xbˆ) a (Y − Xbˆ)Œ W 5 ˆ ŒW ˆ X)(bˆ − b) − (X − lV ˆ Xbˆ (n − p+2) bˆŒXŒW s ˆ ŒW ˆ X) bˆ+lV ˆ ŒW ˆ u − lu 6 . × (X − lV (3.4) `n

f

f

f

Furthermore, we have

Ω̂ − Ω = O_p(n^{−1/2}),  (3.5)

V̂ − V = O_p(n^{−1/2}),  (3.6)

and

ê1 − e1 = O_p(n^{−1/2}).  (3.7)

Now, following Kariya and Toyooka (1992), we observe that

X(X′ΩX)⁻¹X′Ω + Ω⁻¹P(P′Ω⁻¹P)⁻¹P′ = I_n,  (3.8)

so that

u = [X(X′ΩX)⁻¹X′Ω + Ω⁻¹P(P′Ω⁻¹P)⁻¹P′]u = σ[XB⁻¹e1/√n + Ω⁻¹P(P′Ω⁻¹P)⁻¹e2].  (3.9)
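The identity (3.8) is easy to verify numerically. In the sketch below (toy dimensions and a synthetic positive definite Ω of our own choosing), P is built from the eigenvectors of the idempotent matrix M:

```python
import numpy as np

rng = np.random.default_rng(11)
n, p = 10, 3
X = rng.normal(size=(n, p))
A = rng.normal(size=(n, n))
Omega = A @ A.T + np.eye(n)                       # synthetic positive definite Omega
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T) # idempotent, rank n - p
w, U = np.linalg.eigh(M)
P = U[:, w > 0.5]                                 # n x (n-p), P'P = I, PP' = M
Oinv = np.linalg.inv(Omega)
lhs = X @ np.linalg.solve(X.T @ Omega @ X, X.T @ Omega) \
      + Oinv @ P @ np.linalg.solve(P.T @ Oinv @ P, P.T)
assert np.allclose(lhs, np.eye(n))                # the decomposition (3.8)
```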

Hence,

(1/(n − p + 2))(Y − Xβ̂)′Ω̂(Y − Xβ̂)
  = (σ²/n)(1 − (p − 2)/n)⁻¹ e2′(P′Ω⁻¹P)⁻¹P′Ω⁻¹(Ω̂ − Ω̂X(X′Ω̂X)⁻¹X′Ω̂)Ω⁻¹P(P′Ω⁻¹P)⁻¹e2 + O_p(n⁻¹)
  = σ²u + O_p(n⁻¹).  (3.10)

Further, up to order O_p(n^{−1/2}),

β̂/(β̂′B̂β̂) = (β + (σ/√n)B̂⁻¹ê1) / [(β + (σ/√n)B̂⁻¹ê1)′B̂(β + (σ/√n)B̂⁻¹ê1)]
           = (1/(β′B̂β))(1 − 2σβ′ê1/(√n β′B̂β))(β + (σ/√n)B̂⁻¹ê1)
           = β/(β′B̂β) + (σ/(√n β′B̂β))[B̂⁻¹ − 2ββ′/(β′B̂β)]ê1.  (3.11)

Hence, up to order O_p(n⁻¹), c can be written as

c = (X_f − λV̂′Ω̂X)B̂⁻¹ê1 − (aσu/(√n β′B̂β))(X_f − λV̂′Ω̂X)β
    − (aσ²/(nβ′Bβ))(X_f − λV′ΩX)[B⁻¹ − 2ββ′/(β′Bβ)]e1 + (λ√n/σ)(V̂′Ω̂u − u_f).  (3.12)


Now, let us consider the transformation

$$\begin{pmatrix} u \\ u_f \end{pmatrix} = \begin{pmatrix} I_n & 0 \\ V'\Omega & I_T \end{pmatrix}\begin{pmatrix} u \\ \sigma e_0 \end{pmatrix}. \qquad (3.13)$$

Further, we observe from the normality of (u′, u_f′)′ that e0 is distributed independently of u (and hence of e1 and e2) and follows a normal distribution with mean vector 0 and covariance matrix Ψ − V′ΩV. Making use of this transformation, c can be written, to the order of our approximation, as

c = (X_f − λV̂′Ω̂X)B̂⁻¹ê1 − (aσu/(√n β′B̂β))(X_f − λV̂′Ω̂X)β
    − (aσ²/(nβ′Bβ))(X_f − λV′ΩX)[B⁻¹ − 2ββ′/(β′Bβ)]e1
    + (λ√n/σ)(V̂′Ω̂ − V′Ω)u − λ√n e0.  (3.14)

Now, recognizing that B̂⁻¹ê1 = (X′Ω̂X/n)⁻¹(X′Ω̂u/(σ√n)), making use of (3.8), and after some manipulation, we can write

c = −λ√n e0 − (aσu/(√n β′B̂β))(X_f − λV̂′Ω̂X)β
    + [(X_f − λV̂′Ω̂X)B̂⁻¹X′Ω̂/√n + λ√n(V̂′Ω̂ − V′Ω)] Ω⁻¹P(P′Ω⁻¹P)⁻¹e2
    + (X_f − λV′ΩX)[B⁻¹ − (aσ²/(nβ′Bβ))(B⁻¹ − 2ββ′/(β′Bβ))]e1.  (3.15)

Since e0, e1, and e2 are independently distributed and u, B̂, Ω̂, and V̂ are functions of e2, we observe that, up to order O(n⁻¹), the conditional distribution of c given e2 is normal with mean vector m(e2) and variance–covariance matrix Ξ.

Corollary 3.1. Let f(m(e2), Ξ) denote the p.d.f. of a normal distribution with mean vector m(e2) and variance–covariance matrix Ξ. Then, up to order O_p(n⁻¹), the asymptotic unconditional distribution of c is given by f(c) = E_{e2}[f(m(e2), Ξ)].


Proof. Notice that, to the order of our approximation, the conditional variance–covariance matrix of c, i.e., Ξ, does not depend on e2. Therefore, the unconditional variance–covariance matrix of c is also Ξ.

Corollary 3.2. The bias vector of the predictor ŷ_sr, up to order O(n⁻¹), is given by

E(ŷ_sr − y) = −(aσ²/n) E_{e2}[(u/(β′B̂β))(X_f − λV̂′Ω̂X)β]
              + (σ/√n) E_{e2}{[(X_f − λV̂′Ω̂X)B̂⁻¹X′Ω̂/√n + λ√n(V̂′Ω̂ − V′Ω)] Ω⁻¹P(P′Ω⁻¹P)⁻¹e2}.  (3.16)

Proof. The proof follows directly from Theorem 3.1. Note that, since θ̂ is an even function of e2, the second term in Eq. (3.16) vanishes, and the expression for the bias vector of ŷ_sr reduces to

E(ŷ_sr − y) = −(aσ²/n) E_{e2}[(u/(β′B̂β))(X_f − λV̂′Ω̂X)β]
            = −(aσ²/(nβ′Bβ))(X_f − λV′ΩX)β + O(n^{−3/2}).  (3.17)

From (3.17), we observe that the bias of ŷ_sr increases in magnitude as β′Bβ/σ² decreases. The magnitude of the bias is also a decreasing function of n. Obviously, ŷ, the feasible BLUP, is unbiased to order O(n⁻¹).

Corollary 3.3. The MSE matrix of the predictor ŷ_sr, up to order O(n⁻²), is given by

E[(ŷ_sr − y)(ŷ_sr − y)′]
  = (σ²/n){λ²n(Ψ − V′ΩV) + (X_f − λV′ΩX)B⁻¹(X_f − λV′ΩX)′
      − (2aσ²/(nβ′Bβ))(X_f − λV′ΩX)[B⁻¹ − 2ββ′/(β′Bβ)](X_f − λV′ΩX)′
      + (a²σ²/(n(β′Bβ)²))(X_f − λV′ΩX)ββ′(X_f − λV′ΩX)′}.  (3.18)


Proof. The proof follows directly from Theorem 3.1 and by observing that E[(ŷ_sr − y)(ŷ_sr − y)′] = (σ²/n)Ξ + [E(ŷ_sr − y)][E(ŷ_sr − y)]′.

Substituting a = 0 in (3.18) leads to the expression for the MSE matrix of the FBLUP. Further, to order O(n⁻²), the difference between the MSE matrices of the predictors ŷ and ŷ_sr is given by

E[(ŷ − y)(ŷ − y)′] − E[(ŷ_sr − y)(ŷ_sr − y)′]
  = (2aσ⁴/(n²β′Bβ))(X_f − λV′ΩX)[B⁻¹ − ((4 + a)/(2β′Bβ))ββ′](X_f − λV′ΩX)′.  (3.19)

Now, to establish the dominance of ŷ_sr over ŷ under the MSE matrix criterion, we observe that (3.19) is positive semidefinite if and only if B⁻¹ − ((4 + a)/(2β′Bβ))ββ′ is positive semidefinite. However, by Rao and Toutenburg (1995, Theorem A.57, p. 303), B⁻¹ − ((4 + a)/(2β′Bβ))ββ′ is positive semidefinite if and only if (4 + a)/2 ≤ 1, which is impossible as a ≥ 0. Hence the predictor ŷ_sr cannot uniformly dominate ŷ under the MSE matrix criterion. Similarly, ŷ dominates ŷ_sr if and only if ((4 + a)/(2β′Bβ))ββ′ − B⁻¹ is positive semidefinite; but by Rao and Toutenburg (1995, Theorem A.59, p. 304), this matrix also cannot be positive semidefinite for p > 1. This leads to the conclusion that neither of the predictors ŷ and ŷ_sr uniformly dominates the other under the MSE matrix criterion, at least to order O(n⁻²). Now, the question arises whether some weaker condition exists that ensures ŷ_sr to be better than ŷ. It turns out that a dominance condition can be derived if the predictors are compared in terms of risk under the quadratic loss function

L(ỹ, y) = (ỹ − y)′Q(ỹ − y),  (3.20)

where ỹ is a predictor of y and Q is a positive definite, symmetric matrix with elements of order O(1).

Corollary 3.4. The risk of the predictor ŷ_sr, up to order O(n⁻²), is given by

R(ŷ_sr) = (σ²/n)[λ²n · tr(Q(Ψ − V′ΩV)) + tr(Q(X_f − λV′ΩX)B⁻¹(X_f − λV′ΩX)′)]
          − (aσ⁴/(n²β′Bβ))[2 tr(Q(X_f − λV′ΩX)B⁻¹(X_f − λV′ΩX)′)
              − (4 + a)β′(X_f − λV′ΩX)′Q(X_f − λV′ΩX)β/(β′Bβ)].  (3.21)


Proof. The proof follows by taking the trace of Q times the MSE matrix expression given in (3.18).

Corollary 3.5. Under the quadratic loss function of (3.20), a sufficient condition for the dominance of the predictor ŷ_sr over ŷ is given by

0 ≤ a ≤ 2(w − 2),  w > 2,  (3.22)

where

w = tr{B⁻¹(X_f − λV′ΩX)′Q(X_f − λV′ΩX)} / μ₁{B⁻¹(X_f − λV′ΩX)′Q(X_f − λV′ΩX)},

and μ₁(·) denotes the maximum characteristic root of its argument.

Proof. Note that R(ŷ) is obtained by substituting a = 0 in (3.21). Hence,

R(ŷ) − R(ŷ_sr) = (aσ⁴/(n²β′Bβ))[2 tr{Q(X_f − λV′ΩX)B⁻¹(X_f − λV′ΩX)′}
                   − (4 + a)β′(X_f − λV′ΩX)′Q(X_f − λV′ΩX)β/(β′Bβ)]
               ≥ (aσ⁴/(n²β′Bβ))[2 tr{B⁻¹(X_f − λV′ΩX)′Q(X_f − λV′ΩX)}
                   − (4 + a)μ₁{B⁻¹(X_f − λV′ΩX)′Q(X_f − λV′ΩX)}].  (3.23)

Given (3.23), it is straightforward to obtain Corollary 3.5.
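Both halves of the comparison can be illustrated numerically. The sketch below (a synthetic B, β, and X_f of our own choosing, with V = 0 and Q = I) checks that B⁻¹ − ((4 + a)/(2β′Bβ))ββ′ is indefinite for a ≥ 0, so no MSE-matrix dominance is possible, and then computes the scalar w of Corollary 3.5:

```python
import numpy as np

rng = np.random.default_rng(7)
p, T, a = 5, 3, 3.0

# (i) MSE-matrix criterion: for a >= 0 the matrix B^{-1} - ((4+a)/(2 b'Bb)) bb'
# is never p.s.d., so the difference (3.19) cannot be p.s.d.
G = rng.normal(size=(p, p))
B = G @ G.T / p + np.eye(p)               # synthetic positive definite B
beta = rng.normal(size=p)
k = (4 + a) / (2 * (beta @ B @ beta))
D = np.linalg.inv(B) - k * np.outer(beta, beta)
eigs = np.linalg.eigvalsh(D)
assert eigs[0] < 0 < eigs[-1]             # indefinite: no uniform MSE dominance

# (ii) Risk criterion (Corollary 3.5); with V = 0, the term Xf - lam*V'Omega*X
# reduces to Xf, and Q = I.
Xf = rng.normal(size=(T, p))
M = np.linalg.solve(B, Xf.T @ Xf)         # B^{-1} Xf' Q Xf
roots = np.linalg.eigvals(M).real
w = roots.sum() / roots.max()             # trace / largest characteristic root
a_max = 2 * (w - 2) if w > 2 else None    # sufficient range 0 <= a <= 2(w - 2)
```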

4. AN ILLUSTRATIVE EXAMPLE AND MONTE-CARLO RESULTS

The results presented in the preceding section are confined to the situation of large samples. Further, the analysis does not include a numerical measure of the risk magnitudes of the estimators. The main purpose of this section is to extend the analysis of Section 3 to sample sizes that are more typical of those faced in practice. Unfortunately, given our present knowledge, finite sample properties cannot be determined analytically, and thus we have to rely on the results of simulations. Another purpose of this section is to compare the simulation results with those obtained using large-n approximations, so that the accuracy of the findings based on the latter approach can be evaluated.


In what follows, we focus on the case where the elements of u are generated by a stationary first-order autoregressive process,

u_t = ρu_{t−1} + ε_t,  |ρ| < 1,  t = 1, ..., n,  (4.1)

where the ε_t's are i.i.d. N(0, σ_ε²), so that var(u_t) = σ_ε²/(1 − ρ²). Further,

$$V = \frac{1}{1-\rho^{2}}\begin{pmatrix} \rho^{n} & \rho^{n+1} & \cdots & \rho^{n+T-1} \\ \rho^{n-1} & \rho^{n} & \cdots & \rho^{n+T-2} \\ \vdots & \vdots & & \vdots \\ \rho & \rho^{2} & \cdots & \rho^{T} \end{pmatrix}$$

and

$$\Omega = \begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 & 0 & 0 \\ -\rho & 1+\rho^{2} & -\rho & \cdots & 0 & 0 & 0 \\ \vdots & & & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & -\rho & 1+\rho^{2} & -\rho \\ 0 & 0 & 0 & \cdots & 0 & -\rho & 1 \end{pmatrix},$$

so that

V′Ω = (0  v),  (4.2)

where v = (ρ, ρ², ..., ρ^T)′ and 0 denotes a T × (n − 1) null matrix. So the expression for the predictor ŷ_sr becomes

ŷ_sr = X_f β̂_sr + λ(y_n − x_n′β̂_sr)v̂,  (4.3)


where y_n is the nth element of Y, x_n′ = (x_{n1}, x_{n2}, ..., x_{np}) is the nth row of X, and v̂ = (ρ̂, ρ̂², ..., ρ̂^T)′ is an estimator of v. In this case, the risk difference of the predictors ŷ and ŷ_sr, to order O(n⁻²), is given by

R(ŷ) − R(ŷ_sr) = (aσ⁴/(n²β′Bβ))[2 tr{Q(X_f − λvx_n′)B⁻¹(X_f − λvx_n′)′}
                   − (4 + a)β′(X_f − λvx_n′)′Q(X_f − λvx_n′)β/(β′Bβ)].  (4.4)

Next, we numerically evaluate (4.4) and compare the results with those obtained by Monte-Carlo simulations. We consider a transformed model in which the regressors are orthogonal, i.e., H′H = I, where H = (X | X_f). Also, we consider n = 16, 96; T = 4; p = 4, 10; λ = 0, 0.3, 0.5, 0.8, 1.0; ρ = −0.90, ..., 0.90; and σ² = 1. In addition, the parameter vector β is chosen so that β′β = 5 or 15, and the matrix Ω̂ is constructed using the Prais–Winsten (1954) transformation. Each part of the experiment is based on 5000 repetitions, and the estimators' risk performance is compared by setting Q = X′ΩX. Further, we choose a = p − 2 as the characterizing scalar of β̂_sr; Chaturvedi and Shukla (1990) showed that this choice of a minimizes the risk of β̂_sr when Q is set to X′ΩX. A selection of representative results from the study is given in Table I, which reports, in addition to the Monte-Carlo risks of the estimators, a measure of relative efficiency defined as re = R̄(ŷ_sr)/R̄(ŷ), where R̄(ŷ_sr) and R̄(ŷ) are the Monte-Carlo risks of ŷ_sr and ŷ, respectively. So ŷ_sr is deemed more efficient than ŷ if re < 1, and vice versa. The numerical values of (4.4) and the corresponding values obtained by simulation are denoted by D_a and D_s, respectively, in the table. Three general patterns in the simulation results may be noted. First, the results indicate that, without exception, ŷ_sr has smaller risk than ŷ in all regions of the parameter space considered. Second, as λ increases, that is, as more weight is assigned to the prediction of the actual unobserved values of the dependent variable, re and the risks of both ŷ and ŷ_sr increase, ceteris paribus; in other words, the predictor based on the Stein-rule estimator is relatively more efficient when λ is small than when it is large. Third, other things being equal, increasing p always results in a smaller value of re, and hence an improvement in the efficiency of ŷ_sr relative to ŷ, although the risks of both ŷ and ŷ_sr also increase with p. So one broad conclusion that can be drawn from the study is that, in general, the risk reduction from using ŷ_sr over ŷ (as measured by re) is most pronounced when p is large and relatively more weight is given to the prediction of the mean value of the dependent variable. Broadly speaking, ŷ_sr is relatively more efficient for small values than for large values of β′β, and the behaviour of

TABLE I
Monte-Carlo Risks of Estimators

[Table I reports, for each design (n = 16, 96; T = 4; p = 4, 10; β′β = 5, 15) and for ρ = −0.80, −0.45, 0.00, 0.25, 0.65, 0.80, the values of R̄(ŷ), R̄(ŷ_sr), re, D_s, and D_a at λ = 0, 0.5, and 0.8. The tabulated entries are not reproduced here.]

Note. R̄(ŷ), Monte-Carlo risk of ŷ; R̄(ŷ_sr), Monte-Carlo risk of ŷ_sr; re = R̄(ŷ_sr)/R̄(ŷ); D_s = R̄(ŷ) − R̄(ŷ_sr); D_a = R(ŷ) − R(ŷ_sr) = numerical value of Eq. (4.4).


the predictors' risks for varying choices of ρ depends largely on the values of the model's other parameters. On a practical level, however, few prescriptions can be offered on the basis of the latter two findings, as both β′β and ρ are unknown in practice. Now, comparing D_s and D_a, we observe that the large-n approximation approach is reasonably reliable when the sample size is relatively large, and that it works better when p is small than when it is large, and when β′β is large than when it is small. On the other hand, if |ρ| is large, then D_a frequently differs from D_s by over 50%. As expected, the reliability of the large sample approximation results declines as n decreases.
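For readers wishing to reproduce the flavour of the experiment, the following scaled-down sketch uses a simplified design of our own (known Ω, σ_ε = 1, unweighted loss Q = I, and a single small configuration, unlike the paper's 5000-repetition study) to estimate the risks of ŷ and ŷ_sr under AR(1) disturbances:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, T, rho, lam, reps = 30, 4, 4, 0.5, 0.5, 2000
i = np.arange(n)
Phi = rho ** np.abs(i[:, None] - i[None, :]) / (1 - rho ** 2)
L = np.linalg.cholesky(Phi)
Omega = np.linalg.inv(Phi)
X = rng.normal(size=(n, p))
Xf = rng.normal(size=(T, p))
beta = np.ones(p)
a = p - 2                                   # characterizing scalar, as in Section 4
v = rho ** np.arange(1, T + 1)              # V'Omega = (0 | v), cf. (4.2)
XtO = X.T @ Omega
risk = {"fblup": 0.0, "stein": 0.0}
for _ in range(reps):
    u = L @ rng.normal(size=n)              # in-sample AR(1) disturbances
    Y = X @ beta + u
    uf = np.empty(T)                        # future AR(1) disturbances
    prev = u[-1]
    for t in range(T):
        prev = rho * prev + rng.normal()
        uf[t] = prev
    y = Xf @ beta + lam * uf                # composite target (2.4)
    b = np.linalg.solve(XtO @ X, XtO @ Y)   # GLS estimator, Omega treated as known
    r = Y - X @ b
    shrink = 1 - (a / (n - p + 2)) * (r @ Omega @ r) / (b @ XtO @ X @ b)
    for name, bb in (("fblup", b), ("stein", shrink * b)):
        pred = Xf @ bb + lam * (Y[-1] - X[-1] @ bb) * v   # predictor form (4.3)
        risk[name] += (pred - y) @ (pred - y) / reps
```

The relative magnitudes of risk["fblup"] and risk["stein"] in such runs play the role of the re measure of Table I for the chosen configuration.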

ACKNOWLEDGMENTS The authors are grateful to the editor in charge, the referees, and Professors V. K. Srivastava and A. Ullah for their helpful comments on an earlier version of the paper. The second author’s work was supported partially by a strategic grant from the City University of Hong Kong.

REFERENCES

A. Chaturvedi and G. Shukla, Stein-rule estimation in linear models with non-scalar error covariance matrix, Sankhyā Ser. B (1990), 293–304.
A. T. K. Wan and A. Chaturvedi, Double K-class estimators in regression models with non-spherical disturbances, J. Multivariate Anal. 79 (2001), in press.
A. T. K. Wan and A. Chaturvedi, Operational variants of the minimum mean squared error estimator in linear regression models with non-spherical disturbances, Ann. Inst. Statist. Math. 52 (2000), 332–342.
J. B. Copas, Regression, prediction and shrinkage (with discussion), J. Roy. Statist. Soc. Ser. B 45 (1983), 311–354.
J. B. Copas and M. C. Jones, Regression shrinkage methods and autoregressive time series prediction, Australian J. Statist. 29 (1987), 264–277.
C. A. Gotway and N. Cressie, Improved multivariate prediction under a general linear model, J. Multivariate Anal. 45 (1993), 56–72.
G. G. Judge and M. E. Bock, "The Statistical Implications of Pre-test and Stein-Rule Estimators in Econometrics," North-Holland, Amsterdam, 1978.
T. Kariya and Y. Toyooka, Bounds for normal approximations to the distributions of generalized least squares predictors and estimators, J. Statist. Planning Inference 30 (1992), 213–221.
S. J. Prais and C. B. Winsten, "Trend Estimation and Serial Correlation," Cowles Commission Discussion Paper 383, Chicago, 1954.
C. R. Rao and H. Toutenburg, "Linear Models: Least Squares and Alternatives," Springer-Verlag, New York, 1995.
Shalabh, Performance of Stein-rule procedure for simultaneous prediction of actual and average values of study variable in linear regression models, Bull. Internat. Statist. Inst. 56 (1995), 1375–1390.
H. Toutenburg, "Prior Information in Linear Models," Wiley, New York, 1982.


H. Toutenburg, A. Fieger, and C. Heumann, Regression modeling with fixed effects: missing values and related problems, in "Statistics for 21st Century" (C. R. Rao and G. J. Székely, Eds.), pp. 423–439, Marcel Dekker, New York, 2000.
H. Toutenburg and Shalabh, Predictive performance of the methods of restricted and mixed regression estimators, Biometrical J. 38 (1996), 951–959.
H. Toutenburg and Shalabh, Improved prediction in linear regression model with stochastic linear constraints, Biometrical J. 42 (2000), 71–86.
Y. Usami and Y. Toyooka, Errata of Kariya and Toyooka (1992), Bounds for normal approximations to the distributions of generalized least squares predictors and estimators, J. Statist. Planning Inference 58 (1997), 399–405.