Testing and estimating location vectors when the error covariance matrix is unknown*

Journal of Econometrics 54 (1992) 121-138. North-Holland

William Griffiths
University of New England, Armidale, NSW 2351, Australia

George Judge
University of California, Berkeley, CA 94720, USA

Received February 1990, final version received September 1991

An exact test proposed by Weerahandi (1987) for testing the equality of location vectors under heteroskedasticity is compared with a commonly used, computationally simple asymptotic test. The results from a variety of sampling experiments indicate that in most instances the nominal size of Weerahandi's test (F_W) overstates the probability of a Type I error and the nominal size of the asymptotic test (F_A) understates the probability of a Type I error. Consequently, without size correction, the probability of a Type II error is less for F_A than it is for F_W. With size correction the powers of the two tests are virtually identical. Within an estimation context the risk properties of the pre-test estimators generated by F_A and F_W are compared, and an empirical Bayes estimator is developed within the framework of the more general seemingly unrelated regressions model. Under a squared error loss measure the empirical Bayes estimator is shown to behave in a minimax way.

1. Introduction

In a recent article, Weerahandi (1987) considers the problem of testing the equality between two location vectors when the corresponding scale parameters are possibly unequal and proposes a simple exact test that is similar in form to that of Chow (1960). Several earlier articles also consider this problem. Toyoda (1974) and Schmidt and Sickles (1977) demonstrate how poorly the conventional Chow test can perform under heteroskedasticity.

*We are indebted to D. Giles, S. Weerahandi, F. Wolak, and two anonymous referees for helpful comments.

Jayatissa (1977) suggests an exact test that allows for the different variances, but it is one that has few degrees of freedom and it lacks some desirable invariance properties [Tsurumi (1984)]. Bounds tests have been suggested by Dalal and Mudholkar (1988), Kobayashi (1986), and Ohtani and Kobayashi (1986). Several recent papers compare and seek to improve the finite sampling performance of alternative asymptotic tests, and some compare asymptotic tests with the Jayatissa test. See, for example, Ali and Silver (1985), Honda (1982), Honda and Ohtani (1989), Ohtani and Toyoda (1985a, b), Tsurumi and Sheflin (1985), Rothenberg (1984), Conerly and Mansfield (1989), and Watt (1979). Related work that examines the consequences of testing for heteroskedasticity on the properties of location estimators and tests is that of Greenberg (1980), Yancey, Judge, and Miyazaki (1984), Ohtani (1987), and Toyoda and Ohtani (1986).

In this paper we first focus on Weerahandi's exact test (F_W). This test uses the magnitude of a p-value (observed significance level) as the measure of evidence, or as the criterion for rejection or acceptance, for a null hypothesis that equates location vectors. It is an unconventional test in the sense that its size is not necessarily equal to the critical p-value upon which the decision to accept or reject is based. The test emphasizes the evidence (p-value) from a single sample, not the repeated sample considerations of size and power. However, because size and power are criteria upon which the choice of a test is frequently based, it is important to investigate the performance of F_W in this regard. In particular, since computation of F_W requires numerical integration, it is important to compare the finite sample size and power of F_W with a conventional, less computationally demanding, asymptotic test. In this paper a Monte Carlo sampling experiment is used to compare the F_W-test with an approximate F-test (F_A). For the experimental designs considered, our results indicate that F_W is not better everywhere in the parameter space than the more conventional, computationally easier, F_A-test.

Sometimes hypothesis testing is done for its own sake, but, frequently, it is a preliminary step towards the choice of an appropriate estimator of some parameters of interest. There are two general approaches to estimation under the existence of an uncertain hypothesis. One approach is to use a preliminary test estimator that is a discontinuous function of two alternative estimators, one that has desirable properties when the null hypothesis is true and one that has desirable properties when the null hypothesis is false. The other approach is to use a continuous weighted average of the two alternative estimators, with weights given by considerations from Stein-rule estimation, empirical Bayes estimation, or posterior odds. Examples of these various procedures can be found in Judge and Bock (1978), Zellner and Vandaele (1975), and Judge, Hill, and Bock (1990). Within this context, a secondary objective of this paper is to develop an empirical Bayes estimator that is relevant when the uncertain hypothesis is one of equality of coefficient


vectors. This estimator is developed within the more general seemingly unrelated regression framework [Zellner (1962)] that, relative to the two-equation heteroskedastic error model, permits more than two equations, as well as contemporaneous correlation among the errors. Because the empirical Bayes estimator and its preliminary test estimator counterparts are biased estimators, a squared error loss measure, or risk, is used to compare the sampling theory performance of the various estimators. We find that under a squared error measure the empirical Bayes estimator not only is uniformly superior to the conventional least squares and seemingly unrelated regressions estimators, but it also outperforms preliminary test estimators based on F_A and F_W and thus offers an attractive alternative to traditional solutions to the pooling problem. We begin by examining the testing question in section 2. The problem of estimation under hypothesis uncertainty is taken up in section 3.

2. Comparison of tests

2.1. Test statistics

Assume that we observe two (T × 1) vectors of sample observations y_1, y_2 such that

y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = X\beta + e,    (2.1a)

where X_1 and X_2 are (T × K) design matrices, β_1 and β_2 are K-dimensional unknown location vectors, and e_1 and e_2 are unobservable (T × 1) normal random vectors with mean vectors 0 and covariance matrix

E[ee'] = \begin{bmatrix} \sigma_1^2 I_T & 0 \\ 0 & \sigma_2^2 I_T \end{bmatrix}.    (2.1b)

Given that σ_1^2 and σ_2^2 are not (necessarily) equal, we are interested in testing whether the location vectors are identical (β_1 = β_2). In section 3 the sampling properties of alternative estimators for β, given the uncertainty about the equality of β_1 and β_2, are of interest. To describe the two statistics for testing the hypothesis H_0: β_1 = β_2, consider the usual least squares location and scale estimators b_i = (X_i'X_i)^{-1}X_i'y_i and σ̂_i^2 = (y_i - X_ib_i)'(y_i - X_ib_i)/(T - K) for i = 1, 2. It is well known that, under H_0,

F = \frac{(b_1 - b_2)'[\sigma_1^2(X_1'X_1)^{-1} + \sigma_2^2(X_2'X_2)^{-1}]^{-1}(b_1 - b_2)/K}{(\hat{\sigma}_1^2/\sigma_1^2 + \hat{\sigma}_2^2/\sigma_2^2)/2} \sim F[K, 2(T - K)].    (2.2)

If σ_1^2 = σ_2^2, then F becomes the usual Chow statistic for testing the equality β_1 = β_2 under the assumption of equal variances. When σ_1^2 ≠ σ_2^2, these unknown parameters remain in F, and an alternative strategy is necessary. One way to proceed is to replace σ_1^2 and σ_2^2 by consistent estimators σ̂_1^2 and σ̂_2^2, in which case F becomes

F_A = (b_1 - b_2)'[\hat{\sigma}_1^2(X_1'X_1)^{-1} + \hat{\sigma}_2^2(X_2'X_2)^{-1}]^{-1}(b_1 - b_2)/K.    (2.3)

Under the usual assumptions about the limiting behavior of X_1 and X_2, F_A and F converge to a multiple of the same asymptotic χ^2 distribution. Thus, a commonly used approximate large-sample test for testing β_1 = β_2 under heteroskedasticity is that based on the statistic F_A. Other asymptotic alternatives exist. The statistic F_A is equal to the Wald statistic divided by its degrees of freedom K. The Wald test by itself could be used, as could the Lagrange multiplier or likelihood ratio tests. However, it has been argued that the F-test version is likely to be better in terms of the accuracy with which the actual size approximates the nominal size. See, for example, Woodland (1986) and the Wald test results in Watt (1979), Honda (1982), and Ohtani and Toyoda (1985a, b).
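As a concrete illustration, the following minimal sketch computes F_A in (2.3) from the least squares output of the two samples and refers it to the F[K, 2(T - K)] reference distribution. The function and variable names are our own illustrative choices and are not taken from the paper.

```python
# Minimal sketch of the approximate test based on F_A in (2.3).
# y1, y2 are (T,) response vectors and X1, X2 are (T, K) design matrices.
import numpy as np
from scipy import stats

def f_a_test(y1, X1, y2, X2):
    T, K = X1.shape
    b1, b2 = [np.linalg.solve(X.T @ X, X.T @ y) for X, y in ((X1, y1), (X2, y2))]
    s1 = np.sum((y1 - X1 @ b1) ** 2) / (T - K)      # sigma_1^2 hat
    s2 = np.sum((y2 - X2 @ b2) ** 2) / (T - K)      # sigma_2^2 hat
    d = b1 - b2
    V = s1 * np.linalg.inv(X1.T @ X1) + s2 * np.linalg.inv(X2.T @ X2)
    f_a = d @ np.linalg.solve(V, d) / K              # statistic (2.3)
    p_value = stats.f.sf(f_a, K, 2 * (T - K))        # approximate F[K, 2(T-K)] reference
    return f_a, p_value
```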

Another alternative to F_A when σ_1^2 ≠ σ_2^2 is the test procedure F_W suggested by Weerahandi (1987). To outline this test we note that

B = \frac{\hat{\sigma}_1^2/\sigma_1^2}{\hat{\sigma}_1^2/\sigma_1^2 + \hat{\sigma}_2^2/\sigma_2^2} \sim \text{beta}[(T - K)/2, (T - K)/2],    (2.4)

and, furthermore, that B is independent of the two χ^2-statistics that comprise the numerator and denominator of F in (2.2). Substituting B into (2.2) yields

F_W = 2(b_1 - b_2)'\left[\frac{\hat{\sigma}_1^2}{B}(X_1'X_1)^{-1} + \frac{\hat{\sigma}_2^2}{1 - B}(X_2'X_2)^{-1}\right]^{-1}(b_1 - b_2)/K.    (2.5)


To examine the test statistic F_W let us momentarily return to the test based on F_A. As is well known, when using F_A we can proceed in one of two ways. Assuming a fixed significance level of 5 percent, we can find the observed value of F_A and reject the null hypothesis (H_0: β_1 = β_2) if this observed value is greater than the 5 percent critical value. Alternatively, we can find (under H_0) the probability of the test statistic F_A exceeding its observed value, and reject H_0 if this probability (p-value) is less than 0.05. Both procedures are equivalent and, in a sufficiently large sample, the p-value has a uniform distribution in repeated sampling in the sense that P(p < 0.05) = 0.05. That is, a 5 percent significance level implies a correct null is rejected 5 percent of the time.

With the test statistic F_W it is not possible to calculate an observed value of the test statistic because B is unknown. Thus, the procedure of rejecting H_0 when an observed F_W exceeds some critical value is not possible. However, it is possible to replace the statistics b_1, b_2, σ̂_1^2, and σ̂_2^2 in F_W by their observed sample values, to form a partially observed statistic F̂_W. Then, recognizing that B has a beta distribution and that it is independent of b_1, b_2, σ̂_1^2, and σ̂_2^2, it is possible to compute the probability of obtaining an F_W greater than the partially observed statistic F̂_W. As Weerahandi demonstrates, this probability is given by

p = 1 - E_B\left[F_{K,2(T-K)}(\hat{F}_W)\right],    (2.6)

where F_{K,2(T-K)}(·) is the distribution function value from the F-distribution with K and 2(T - K) degrees of freedom, and E_B[·] represents the expectation of this value with respect to the distribution of B. The integral given by this expectation can be evaluated numerically. The null H_0 may then be rejected if this value is less than some prescribed value, say 0.05. An intuitive explanation of the procedure is as follows: For each value of B between 0 and 1 an observed value of F_W (say f_w) is computed; then, for each f_w, we find P[F_W > f_w]. These probabilities are averaged with the beta distribution providing the weights for the averaging process. If the average probability is less than 0.05, the hypothesis is rejected.

This test provides an intuitively appealing measure of the sample evidence for or against the null hypothesis, but it is not a conventional test with a fixed level of significance. The p-value in (2.6) does not have a uniform distribution in the sense that P[p < 0.05] = 0.05. Thus, in repeated sampling, with a rule that says reject if p < 0.05, H_0 will not necessarily be rejected 5 percent of the time and the size of the test will not necessarily be 0.05. This characteristic raises the question of whether, using conventional criteria, the F_W-test that involves the use of a numerical integration program is preferable to more commonly used, computationally easier, asymptotic tests such as the F_A-test.
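The expectation in (2.6) can be approximated directly. The sketch below averages the F[K, 2(T - K)] distribution function of the partially observed statistic over the beta distribution of B; the midpoint-rule grid and all names are our own illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the Weerahandi p-value in (2.6); b1, b2, s1, s2 are the
# least squares coefficient vectors and variance estimates, X1, X2 the designs.
import numpy as np
from scipy import stats

def weerahandi_p_value(b1, s1, X1, b2, s2, X2, grid_size=2000):
    T, K = X1.shape
    d = b1 - b2
    A1 = np.linalg.inv(X1.T @ X1)
    A2 = np.linalg.inv(X2.T @ X2)
    df = T - K
    # B ~ beta((T-K)/2, (T-K)/2), independent of b1, b2, s1, s2; see (2.4).
    grid = (np.arange(grid_size) + 0.5) / grid_size
    weights = stats.beta.pdf(grid, df / 2, df / 2) / grid_size
    cdf_vals = np.empty(grid_size)
    for i, B in enumerate(grid):
        V = (s1 / B) * A1 + (s2 / (1.0 - B)) * A2
        f_w = 2.0 * d @ np.linalg.solve(V, d) / K    # partially observed F_W from (2.5)
        cdf_vals[i] = stats.f.cdf(f_w, K, 2 * df)
    return 1.0 - np.sum(weights * cdf_vals)          # p-value (2.6); reject H0 if small
```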


The reason for looking for alternatives to F_A is, presumably, to obtain a test that is more powerful in finite samples, and that has a finite sample size corresponding more closely to the presumed size.

2.2. Sampling experimental design

Because of the intractability of analytical procedures, we use Monte Carlo sampling procedures to examine the questions of finite sample size and power. The experiment was conducted using three sample sizes, T = 8, 20, and 40, and a location vector of dimension K = 4. The value of 4 was chosen because it is sufficiently large to be realistic for both the hypothesis testing and estimation objectives. Two alternatives for X_1 and X_2 were chosen. For the first one we set X_1'X_1 = X_2'X_2 = nI, so that the problem was one of estimating K means with n replications on each mean process. The values for n were 2, 5, and 10 for T = 8, 20, and 40, respectively. For the other chosen alternative all the columns of X_1 and X_2 except for the first were filled with uniform random numbers between 0 and 10. The first columns contained constant terms. Once X_1 and X_2 were set, they were held fixed in repeated samples. The sum of the variances was kept constant throughout, σ_1^2 + σ_2^2 = 10, but three different variance ratios, γ = σ_2^2/σ_1^2 = 1, 9, and 25, were considered. For the location vectors we set β_1 = (1, 1, 1, 1)' and β_2 = αβ_1, where α is a scalar that controls the extent to which β_2 differs from β_1. A large number of values of α were considered. In the reporting of the power results, the 'difference' between β_2 and β_1 was measured by the noncentrality parameter

\lambda = (\beta_1 - \beta_2)'[\sigma_1^2(X_1'X_1)^{-1} + \sigma_2^2(X_2'X_2)^{-1}]^{-1}(\beta_1 - \beta_2).    (2.7)
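As an illustration of one cell of this design, the sketch below builds the X_1'X_1 = X_2'X_2 = nI design, generates heteroskedastic normal samples under H_0, and estimates the size of the F_A-test. It is illustrative code only; f_a_test is the function sketched earlier and the specific parameter values shown are one setting of the experiment.

```python
# Illustrative sketch of one cell of the sampling experiment: T = 8, K = 4,
# n = 2 replications per mean, sigma_1^2 + sigma_2^2 = 10 and gamma = 9.
import numpy as np

def estimate_size(T=8, K=4, gamma=9.0, reps=5000, seed=0):
    rng = np.random.default_rng(seed)
    n = T // K
    X = np.kron(np.eye(K), np.ones((n, 1)))          # gives X'X = nI
    sigma1_sq = 10.0 / (1.0 + gamma)
    sigma2_sq = gamma * sigma1_sq
    beta = np.ones(K)                                # beta_1 = beta_2 under H0
    rejections = 0
    for _ in range(reps):
        y1 = X @ beta + rng.normal(0.0, np.sqrt(sigma1_sq), T)
        y2 = X @ beta + rng.normal(0.0, np.sqrt(sigma2_sq), T)
        _, p = f_a_test(y1, X, y2, X)                # from the earlier sketch
        rejections += p < 0.05
    return rejections / reps                         # estimated size of the F_A-test
```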

For each parameter setting 5000 samples were generated. With this number of samples and p = 0.05, the standard deviation of an estimate for p is 0.00308.

2.3. Size and power results

The estimated sizes of both tests for each variance ratio, γ = σ_2^2/σ_1^2, each sample size T, and both design matrices are given in table 1. A 5 percent significance level was used for the F_A-test and a 'critical p-value' of 0.05 was used for the F_W-test. When γ = 1 and X_1'X_1 and X_2'X_2 are proportional to the identity matrix, the F_A-test is valid in small as well as large samples. Thus, we would expect all the F_A-test sizes in the upper part of the table, and for γ = 1, to be within a reasonable sampling error of 0.05. With the exception of the entry for T = 8, which lies just outside a 95 percent confidence interval, such is indeed the case. In the remainder of the table the actual sizes for the F_A-test exceed the nominal size of 0.05; the greater the degree of heteroskedasticity and the smaller the sample size, the larger the actual size of the F_A-test.


Table 1
Sizes of the tests.^a

                              γ = σ_2^2/σ_1^2 = 1     γ = σ_2^2/σ_1^2 = 9     γ = σ_2^2/σ_1^2 = 25
Sample size                   F_A        F_W          F_A        F_W          F_A        F_W

X_1'X_1 = X_2'X_2 = nI
T = 8                         0.043      0.013        0.079      0.033        0.098      0.044
T = 20                        0.049      0.038        0.064      0.047        0.068      0.049
T = 40                        0.045      0.040        0.054      0.047        0.058      0.049

General X_1'X_1 and X_2'X_2
T = 8                         0.068      0.032        0.079      0.040        0.092      0.042
T = 20                        0.050      0.039        0.061      0.045        0.064      0.049
T = 40                        0.052      0.047        0.058      0.052        0.059      0.052

^a The standard error for all the estimated sizes is approximately 0.003.

For T ≤ 20 and γ ≥ 9, the asymptotic theory has clearly not yet taken hold. For the F_W-test we first note that its size is never greater than that of the F_A-test. The size of the F_W-test is approximately 'correct' for T = 20, 40 and γ = 9, 25. However, for T = 8 or γ = 1, the actual size of the F_W-test is clearly less than the nominal size; the lower the degree of heteroskedasticity and the smaller the sample size, the smaller the actual size of the F_W-test. If a choice between the two tests is to be made on the basis of size alone, then this choice depends on which is the bigger sin, overstatement or understatement of the probability of a Type I error. If understatement is more sinful, F_W is better; if overstatement is more sinful, F_A is better.

When the location vectors are not equal, the relevant criterion for comparing the two tests is the probability of a Type II error, or the power of the test. We find that, when the tests are not size-corrected, the power of the F_A-test is always greater than that of the F_W-test. The difference in power may mean that the F_A-test is superior to the F_W-test, but it could be attributable to the fact that the size of the F_A-test is always greater than that of the F_W-test. To investigate this question further, a sampling experiment was used to find critical values that lead to exact sizes of 0.05 for each of the tests and for each experimental setup. These critical values appear in table 2. Although all applied problems will be different, the values in this table provide some kind of practitioner's guide to the way in which nominal critical values need to be adjusted to achieve an exact size. When a power comparison between F_A and F_W is made using the size-corrected critical values that appear in table 2, we find that the performance of the two tests is virtually identical. Some selected results that typify this close correspondence between F_A and F_W appear in table 3.
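In computational terms, the size-correction step amounts to replacing the nominal cut-offs with empirical 5 percent quantiles of the simulated null distributions. The following is a minimal, purely illustrative sketch of that step, not the authors' code.

```python
# Given arrays of F_A statistics and F_W p-values simulated under H0 for a
# particular (T, gamma, design) setting, size-corrected cut-offs are simply
# empirical quantiles.
import numpy as np

def size_corrected_cutoffs(f_a_draws, f_w_pvalue_draws, size=0.05):
    c_a = np.quantile(f_a_draws, 1.0 - size)   # reject if F_A exceeds c_a
    c_w = np.quantile(f_w_pvalue_draws, size)  # reject if the F_W p-value falls below c_w
    return c_a, c_w
```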


Table 2
Simulated exact critical values for a 0.05 size.

                                F_A                                      F_W
             Nominal   γ = 1     γ = 9     γ = 25      Nominal   γ = 1      γ = 9      γ = 25

X_1'X_1 = X_2'X_2 = nI
T = 8        3.838     3.691     4.902     5.690       0.05      0.1188     0.0740     0.0573
T = 20       2.668     2.657     2.882     2.940       0.05      0.0647     0.0532     0.0520
T = 40       2.499     2.440     2.560     2.602       0.05      0.0606     0.0536     0.0512

General X_1'X_1 and X_2'X_2
T = 8        3.838     4.340     5.070     5.376       0.05      0.0757     0.0625     0.0615
T = 20       2.668     2.666     2.860     2.957       0.05      0.0628     0.0547     0.0506
T = 40       2.499     2.532     2.641     2.650       0.05      0.0529     0.0481     0.0483

Table 3
Power functions for size-corrected tests F_A and F_W.^a

      T = 8, γ = 25                  T = 20, γ = 25                  T = 40, γ = 25
   X_1'X_1 = X_2'X_2 = nI       General X_1'X_1 and X_2'X_2     General X_1'X_1 and X_2'X_2
   λ       F_A      F_W          λ       F_A      F_W            λ       F_A      F_W
   0.0    0.0500   0.0500        0.0    0.0500   0.0500          0.0    0.0500   0.0500
   3.2    0.1306   0.1288        1.4    0.1104   0.1110          0.9    0.0884   0.0882
   3.9    0.1506   0.1484        2.6    0.1744   0.1726          2.0    0.1444   0.1448
   5.0    0.1832   0.1806        3.3    0.2086   0.2084          2.9    0.2006   0.2020
   5.6    0.1998   0.1984        4.0    0.2522   0.2514          3.5    0.2444   0.2446
   6.5    0.2250   0.2220        4.9    0.2964   0.2966          4.3    0.2874   0.2874
   8.2    0.2760   0.2716        5.8    0.3484   0.3490          5.1    0.3418   0.3422
  11.3    0.3650   0.3586        6.8    0.4086   0.4076          6.0    0.4026   0.4034
  12.2    0.3908   0.3852        7.3    0.4368   0.4360          7.0    0.4624   0.4634
  14.1    0.4412   0.4344        7.9    0.4676   0.4660          7.5    0.4962   0.4966
  15.5    0.4718   0.4662        9.1    0.5286   0.5290          8.0    0.5294   0.5302
  18.4    0.5454   0.5370       10.3    0.5964   0.5964          9.1    0.5954   0.5954
  21.6    0.6142   0.6054       11.6    0.6604   0.6590         10.2    0.6572   0.6576
  25.1    0.6798   0.6720       13.0    0.7212   0.7198         11.5    0.7186   0.7186
  30.8    0.7690   0.7578       16.1    0.8188   0.8178         12.8    0.7694   0.7698
  34.9    0.8136   0.8070       17.7    0.8562   0.8546         14.2    0.8148   0.8148
  39.2    0.8548   0.8498       19.5    0.8912   0.8902         17.2    0.8914   0.8914
  45.0    0.8984   0.8918       23.2    0.9404   0.9390         20.4    0.9418   0.9416
  51.2    0.9330   0.9282       27.2    0.9676   0.9666         24.0    0.9744   0.9740
  57.8    0.9576   0.9536       36.2    0.9952   0.9950         31.9    0.9966   0.9966

^a The standard errors for the estimated powers range from 0.001, when the estimated power is 0.995, to 0.007, when the estimated power is 0.5.


3. Estimation under hypothesis uncertainty

3.1. Statistical model and estimators

To examine the question of estimation under uncertainty about the equality of location vectors we use a more general framework where there are more than two equations and where, in addition to heteroskedasticity, there can be contemporaneous correlation between the errors in different equations. This generalization results in the so-called seemingly unrelated regressions statistical model [Zellner (1962)] that may be written as

\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} X_1 & & & \\ & X_2 & & \\ & & \ddots & \\ & & & X_M \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_M \end{bmatrix},    (3.1)

or, more compactly, the M possibly error-related equations may be written as

y = X\beta + e,    (3.2)

where

e \sim N(0, \Sigma \otimes I_T).    (3.3)

The statistical model in (2.1a) and (2.1b) is a special case of this model where M = 2 and Σ is diagonal. The hypothesis of interest is that which Zellner (1962) considered under the heading 'testing for aggregation bias', namely,

H_0: \beta_1 = \beta_2 = \cdots = \beta_M.    (3.4)

One statistic that can be used to test this hypothesis is an extended counterpart of the approximate F-test in (2.3), namely

F_A = \hat{\beta}'R'(R\hat{C}R')^{-1}R\hat{\beta}/[(M - 1)K],    (3.5)

where R is an [(M - 1)K × MK] matrix that is constructed so that the null hypothesis (3.4) can be written as Rβ = 0; Ĉ = [X'(Σ̂^{-1} ⊗ I_T)X]^{-1}; β̂ = ĈX'(Σ̂^{-1} ⊗ I_T)y; and Σ̂ has elements given by σ̂_{ij} = (y_i - X_ib_i)'(y_j - X_jb_j)/(T - K). Under the null hypothesis, F_A in (3.5) will have an approximate F-distribution with [(M - 1)K, M(T - K)] degrees of freedom.

If the null hypothesis is known to be false, then the feasible generalized least squares estimator β̂ is an appropriate estimator for β. On the other hand, if the null hypothesis is known to be true, β can be estimated via the restricted estimator (1_M ⊗ I_K)β̄, where

\bar{\beta} = [Z'(\hat{\Sigma}^{-1} \otimes I_T)Z]^{-1}Z'(\hat{\Sigma}^{-1} \otimes I_T)y,    (3.6)

Z' = (X_1', X_2', \ldots, X_M'), and 1_M is an M-dimensional vector with all its elements equal to one.
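A compact way to see how β̂, Σ̂, F_A in (3.5), and the restricted estimator β̄ in (3.6) fit together is the following sketch. It is an illustrative implementation under our own assumptions (equal T and K across equations, adjacent-difference form of R), not the authors' code.

```python
# Illustrative sketch of the SUR quantities used in section 3: the feasible GLS
# estimator beta_hat, the restricted estimator beta_bar in (3.6), and F_A in (3.5).
# Xs and ys are lists of the M design matrices (T x K) and response vectors.
import numpy as np
from scipy import linalg

def sur_quantities(ys, Xs):
    M, (T, K) = len(Xs), Xs[0].shape
    y = np.concatenate(ys)
    X = linalg.block_diag(*Xs)                                  # block-diagonal X in (3.1)
    Z = np.vstack(Xs)                                           # Z' = (X_1', ..., X_M')
    # Equation-by-equation least squares and the elements of Sigma hat.
    b = [np.linalg.solve(Xi.T @ Xi, Xi.T @ yi) for Xi, yi in zip(Xs, ys)]
    resid = np.column_stack([yi - Xi @ bi for yi, Xi, bi in zip(ys, Xs, b)])
    sigma_hat = resid.T @ resid / (T - K)
    omega_inv = np.kron(np.linalg.inv(sigma_hat), np.eye(T))    # (Sigma hat)^-1 kron I_T
    C_hat = np.linalg.inv(X.T @ omega_inv @ X)
    beta_hat = C_hat @ X.T @ omega_inv @ y                      # feasible GLS estimator
    beta_bar = np.linalg.solve(Z.T @ omega_inv @ Z, Z.T @ omega_inv @ y)   # (3.6)
    # R beta = 0 expresses beta_1 = ... = beta_M via adjacent differences.
    D = np.eye(M - 1, M) - np.eye(M - 1, M, k=1)
    R = np.kron(D, np.eye(K))
    Rb = R @ beta_hat
    F_A = Rb @ np.linalg.solve(R @ C_hat @ R.T, Rb) / ((M - 1) * K)        # (3.5)
    return beta_hat, beta_bar, sigma_hat, C_hat, F_A
```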

3.1.1. A pre-test estimator

When there is uncertainty about the null hypothesis, it is common practice to use a weighted average of β̂ and (1_M ⊗ I_K)β̄ with weights provided by the sample evidence. Thus, estimators that are relevant under uncertainty about H_0 can be written in the general form

\beta^* = w(1_M \otimes I_K)\bar{\beta} + (1 - w)\hat{\beta},    (3.7)

where the weight w depends on the choice of estimator. With a preliminary test estimator the weights are one and zero and depend on the outcome of the test statistic used in the hypothesis test (3.4). When using F_A and the corresponding critical value c, the value of w that appears in the preliminary test estimator (3.7) is given by

w = \begin{cases} 1 & \text{if } F_A \le c, \\ 0 & \text{if } F_A > c. \end{cases}    (3.8)

For the special case where Σ is diagonal, β̂ is replaced by the estimator b obtained from separate application of least squares to each equation. If in addition M = 2, and the preliminary test estimator in (3.7) is based on F_W and a 'critical p-value' of 0.05, then w is defined as

w = \begin{cases} 0 & \text{if } p \le 0.05, \\ 1 & \text{if } p > 0.05. \end{cases}    (3.9)

3.1.2. An empirical Bayes estimator

An alternative to (3.8) and (3.9) is to use weights suggested by an empirical Bayes estimator. In this case the weights are a continuous function of the sample information. As the first step towards developing such an estimator, we specify the conjugate prior distribution

\beta \sim N[(1_M \otimes I_K)\delta, \tau^2 C],    (3.10)

where C = [X'(Σ^{-1} ⊗ I_T)X]^{-1} is the covariance matrix of the generalized least squares estimator β̂ = CX'(Σ^{-1} ⊗ I_T)y, and δ (a K-dimensional vector) and τ^2 are prior parameters. Since the null hypothesis is one of equality of location vectors, it is appropriate to use a prior specification where all location vectors have the same prior mean δ. The prior covariance matrix τ^2 C can be viewed as an extension of that used for the g-prior suggested by Zellner (1986) and used in a similar context by Judge, Hill, and Bock (1990) and Zellner and Hong (1989). Given (3.10) and the likelihood function implied by (3.2) and (3.3), the posterior mean for β, conditional on Σ, is given by

\tilde{\beta} = \left[\frac{1}{\tau^2}C^{-1} + C^{-1}\right]^{-1}\left[\frac{1}{\tau^2}C^{-1}(1_M \otimes I_K)\delta + C^{-1}\hat{\beta}\right]
             = \left(\frac{1}{\tau^2} + 1\right)^{-1}\left[\frac{1}{\tau^2}(1_M \otimes I_K)\delta + \hat{\beta}\right]
             = \left(\frac{1}{1 + \tau^2}\right)(1_M \otimes I_K)\delta + \left(\frac{\tau^2}{1 + \tau^2}\right)\hat{\beta}.    (3.11)

In this estimation rule the parameters δ, τ^2, and Σ are unknown. To make use of empirical Bayes ideas to replace the unknowns δ and τ^2 with estimates, conditional on Σ, we write

\hat{\beta} = \beta + v,   E[vv'] = C,    (3.12)

and

\beta = (1_M \otimes I_K)\delta + u,   E[uu'] = \tau^2 C.    (3.13)

Consequently, we have the statistical model

\hat{\beta} = (1_M \otimes I_K)\delta + u + v,   E[(u + v)(u + v)'] = (1 + \tau^2)C.    (3.14)

Application of generalized least squares to (3.14) yields the estimator of δ,

\hat{\delta} = [(1_M \otimes I_K)'C^{-1}(1_M \otimes I_K)]^{-1}(1_M \otimes I_K)'C^{-1}\hat{\beta} = \bar{\beta},    (3.15)

which is the restricted estimator given in (3.6) with Σ̂ replaced by Σ.


The next step is to obtain unbiased estimates of the weights [1/(1 + τ^2)] and [τ^2/(1 + τ^2)] that appear in (3.11). Working in this direction, from (3.14) we know the quadratic form

[\hat{\beta} - (1_M \otimes I_K)\hat{\delta}]'C^{-1}[\hat{\beta} - (1_M \otimes I_K)\hat{\delta}]/(1 + \tau^2)    (3.16)

is distributed as a χ^2 random variable with (MK - K) degrees of freedom. If we rewrite (3.16) as q/(1 + τ^2), where

q = [\hat{\beta} - (1_M \otimes I_K)\hat{\delta}]'C^{-1}[\hat{\beta} - (1_M \otimes I_K)\hat{\delta}] = \hat{\beta}'R'(RCR')^{-1}R\hat{\beta},

then

E\left[\frac{K(M - 1) - 2}{q}\right] = \frac{1}{1 + \tau^2}.    (3.17)

Therefore, an unbiased estimator of [1/(1 + τ^2)] is

\frac{K(M - 1) - 2}{q}.    (3.18)

If we now return to the expression for the posterior mean in (3.11) and replace δ with δ̂ = β̄ given in (3.15) and [1/(1 + τ^2)] with its estimate given in (3.18), we have, as an estimator for β,

\tilde{\beta} = \left[\frac{K(M - 1) - 2}{\hat{\beta}'R'(RCR')^{-1}R\hat{\beta}}\right](1_M \otimes I_K)\bar{\beta} + \left[1 - \frac{K(M - 1) - 2}{\hat{\beta}'R'(RCR')^{-1}R\hat{\beta}}\right]\hat{\beta}.    (3.19)

Finally, if Σ is replaced by Σ̂, and hence C becomes Ĉ and β̂ and β̄ become their feasible counterparts, then, using (3.5), we have

\hat{\beta}_E = \left[\frac{K(M - 1) - 2}{K(M - 1)F_A}\right](1_M \otimes I_K)\bar{\beta} + \left[1 - \frac{K(M - 1) - 2}{K(M - 1)F_A}\right]\hat{\beta}.    (3.20)
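Computationally, (3.20) reduces to a single scalar weight applied to the restricted and unrestricted estimators from the earlier SUR sketch. The following illustrative sketch (our own names and layout) also allows for the positive-rule modification discussed below.

```python
# Illustrative sketch of the empirical Bayes (Stein-like) estimator in (3.20),
# given beta_hat, beta_bar, and F_A from the earlier SUR sketch.
import numpy as np

def empirical_bayes_estimator(beta_hat, beta_bar, F_A, M, K, positive_rule=True):
    pooled = np.tile(beta_bar, M)                    # (1_M kron I_K) beta_bar
    w = (K * (M - 1) - 2) / (K * (M - 1) * F_A)      # weight w_E in (3.20)
    if positive_rule:
        w = min(1.0, w)                              # positive-rule weight discussed below
    return w * pooled + (1.0 - w) * beta_hat
```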


This empirical Bayes estimator belongs to the class of estimators specified in (3.7) with weight w_E = [K(M - 1) - 2]/[K(M - 1)F_A]. Unlike the discontinuous pre-test estimators whose weights are given in (3.8) and (3.9), the empirical Bayes estimator is a continuous weighted average of (1_M ⊗ I_K)β̄ and β̂. Another interpretation can also be placed on (3.20). For this interpretation first note that it is possible for w_E to exceed unity. Under this circumstance it seems reasonable to use instead the weight w^+ = min{1, w_E}. Making this substitution and rearranging (3.20) yields

\hat{\beta}_E^+ = (1_M \otimes I_K)\bar{\beta} + \left[1 - \frac{K(M - 1) - 2}{K(M - 1)F_A}\right]^+ [\hat{\beta} - (1_M \otimes I_K)\bar{\beta}],    (3.21)

where [·]^+ denotes the positive part. Written in this way β̂_E^+ can be viewed as a positive rule Stein-like estimator that is a natural extension of the Stein estimators outlined in Judge and Bock (1978). Other Stein-like estimators that have been suggested for this statistical model can be found in Zellner and Hong (1989), Garcia-Ferrer et al. (1987), Srivastava and Giles (1987), and references therein.

3.2. Risk comparisons

If the estimator β̂_E^+ is to be a practical alternative to the traditionally used preliminary test estimators, then it is essential that it possess desirable risk characteristics relative to these estimators. To get some information on sampling performance, the same sampling experiments discussed in section 2.2 were used to compare the risks of the various estimators. Under the setup of these experiments, where M = 2 and Σ is diagonal, the seemingly unrelated regressions estimator β̂ is replaced by the least squares estimator b = (b_1', b_2')' and the restricted two-stage Aitken (RTSA) estimator is

\bar{\beta} = \left[\frac{X_1'X_1}{\hat{\sigma}_1^2} + \frac{X_2'X_2}{\hat{\sigma}_2^2}\right]^{-1}\left[\frac{X_1'X_1}{\hat{\sigma}_1^2}b_1 + \frac{X_2'X_2}{\hat{\sigma}_2^2}b_2\right].    (3.22)
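For this two-equation diagonal-Σ case, the estimators compared in the risk experiment can be sketched as follows. This is illustrative code only: f_a_test is the function sketched in section 2.1, and the nominal 5 percent critical value is computed from the F[K, 2(T - K)] distribution as an assumption about the pre-test rule.

```python
# Illustrative sketch of the estimators compared in the risk experiment when
# M = 2 and Sigma is diagonal: separate least squares, the RTSA estimator in
# (3.22), and the pre-test estimator (3.7)-(3.8) based on F_A.
import numpy as np
from scipy import stats

def rtsa_and_pretest(y1, X1, y2, X2, alpha=0.05):
    T, K = X1.shape
    b1 = np.linalg.solve(X1.T @ X1, X1.T @ y1)
    b2 = np.linalg.solve(X2.T @ X2, X2.T @ y2)
    s1 = np.sum((y1 - X1 @ b1) ** 2) / (T - K)
    s2 = np.sum((y2 - X2 @ b2) ** 2) / (T - K)
    A = X1.T @ X1 / s1 + X2.T @ X2 / s2
    rtsa = np.linalg.solve(A, X1.T @ X1 @ b1 / s1 + X2.T @ X2 @ b2 / s2)   # (3.22)
    f_a, _ = f_a_test(y1, X1, y2, X2)                 # statistic (2.3), earlier sketch
    c = stats.f.ppf(1.0 - alpha, K, 2 * (T - K))      # nominal critical value
    b = np.concatenate([b1, b2])                      # separate least squares
    pooled = np.concatenate([rtsa, rtsa])             # (1_2 kron I_K) beta_bar
    pretest = pooled if f_a <= c else b               # weight w from (3.8)
    return b, rtsa, pretest
```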

The finite sample properties of this estimator have been investigated by Taylor (1977, 1978) and Kariya (1981). The pre-test estimators that were considered are those defined by the weights in (3.8) and (3.9). In both cases a nominal 5 percent significance level was used. The mean squared prediction error of each estimator was used as the measure of sampling performance.

In fig. 1, for X_1'X_1 = X_2'X_2 = nI, T = 20, and γ = 9, we have graphed the risks of the various estimators relative to the risk from separate least squares estimation of each equation. When an estimator's risk falls below one, it is risk-superior to b, and the converse is true when it is greater than one.

[Figure 1: empirical risks of the RTSA, F_A pre-test, and F_W pre-test estimators, relative to the risk of separate least squares estimation, plotted against λ.]

Fig. 1. Empirical risk functions.

Looking first at the results for the pre-test estimators, we find that in the λ parameter space (2.7) the F_W pre-test estimator is slightly risk-superior to the F_A pre-test estimator when λ < 4 and risk-inferior when λ > 4. These results are a reflection of the fact that the F_A-test rejects β_1 = β_2 more frequently. More frequent rejection is desirable when λ is large, because then b has smaller risk than β̄ (labelled RTSA); less frequent rejection is desirable when λ is small, since β̄ is then risk-superior to b. A choice between the two pre-test estimators is not possible without the assignment of a prior distribution to λ, or without changing the loss function. However, the computationally more expensive 'exact' F_W-test has no obvious risk advantages over the conventional F_A-test.

The empirical risk of the empirical Bayes or Stein-like estimator is considerably less than that of the pre-test estimators over a large range of the λ parameter space and is only slightly higher than the pre-test risk over a small range of the parameter space where λ is close to zero. The restriction β_1 - β_2 = 0 provides a natural shrinkage direction for the Stein-like estimators.


Although theoretical analysis of these estimators is difficult due to the complicated dependencies, it is interesting to note that in all cases considered for X_1, X_2, γ, and T the Stein-like estimator β̂_E^+ dominated b. Consequently, strong evidence is provided that β̂_E^+ behaves in a minimax way.

The graphs for other settings of X_1, X_2, γ, and T yield qualitatively similar risk characteristics. For example, for T = 40 there is little difference between the empirical risks of the two pre-test estimators, reflecting the similar performance of the two tests. When T = 8 and the F_A-test rejects far more frequently, the two pre-test risks differ considerably. For example, at λ = 28.8, γ = 9, and X_1'X_1 = X_2'X_2 = nI, the relative empirical risks for the F_W pre-test and the F_A pre-test are, respectively, 2.482 and 1.456. Examples of other results for the Stein-like estimator are as follows. For X_1'X_1 = X_2'X_2 = nI and T = 40, at the origin when λ = 0, the relative risks of β̂_E^+ for γ = 1, 9, and 25 are 0.690, 0.476, and 0.405, respectively. For T = 8, λ = 0, and γ = 1, 9, and 25, the relative risks for β̂_E^+ are 0.731, 0.556, and 0.496, respectively. In each case for T and γ, as λ increased the relative risks of the Stein-like estimators approached one.

A sampling experiment using a nondiagonal Σ and following the format of section 2 was also conducted. In this case the estimators involved were the general variants defined in section 3.1. In the experiment we report results for T = 20, M = 2, and a nondiagonal error covariance matrix Σ.

The empirical risk outcomes from this experiment are presented in table 4.

Table 4
Relative empirical risks of the seemingly unrelated regression estimator and the corresponding pre-test and Stein-like estimators.

   λ        RTSA      F_A pre-test   Stein (β̂_E^+)   SUR (β̂)
   0.00     0.198     0.302          0.498            1
   1.14     0.467     0.616          0.641            1
   2.24     0.729     0.882          0.742            1
   3.30     0.983     1.121          0.815            1
   4.56     1.286     1.349          0.877            1
  10.26     2.657     1.790          0.987            1
  15.61     3.943     1.658          1.000            1
  18.25     4.577     1.526          1.002            1
  21.09     5.261     1.396          1.002            1
  28.51     7.048     1.134          1.002            1
  41.06    10.070     1.029          1.001            1
  55.89    13.641     1.000          1.000            1
  72.99    17.764     1.000          1.000            1


The conclusions that can be drawn from the empirical risk results for the nondiagonal Σ presented in table 4 are similar to those for the diagonal case. Again the Stein-like estimator is risk-superior to the corresponding F_A pre-test estimator over a large range of the λ parameter space, and it appears to be uniformly risk-superior to the traditional seemingly unrelated regression estimator. Since the empirical risk functions cross, without the assignment of a prior distribution it is impossible to use risk properties to make a clear choice between the preliminary test and Stein-like empirical Bayes estimators. If a very high prior probability is placed on the null hypothesis of equality of coefficient vectors, then the preliminary test estimator is better. However, if there is even moderate uncertainty about H_0, the empirical Bayes estimator would appear to be a better choice. Thus, under hypothesis uncertainty when estimation is the problem, the empirical Bayes estimator has appealing sampling performance and is a real practical alternative to the traditional pre-test and seemingly unrelated regression estimators.

4. Final comments

The problem of finding an exact finite sample test for the equality of two location vectors in the presence of different scale parameters is an old one. Weerahandi (1987) has suggested a novel and appealing procedure for filling this void. However, if criteria such as known finite sample size, power, and risk of pre-test estimation are regarded as important, our sampling results indicate that the Weerahandi test F_W is not everywhere better than the more conventional, computationally easier, F_A-test. It is important to report this finding because, without it, practitioners are likely to unnecessarily adopt the more complicated approach that would appear to have little normative content. If avoidance of a Type I error is regarded as paramount, then the F_W-test does have the advantage of a probability of a Type I error that is smaller than the nominal probability. In any event it is clear that proposers of new tests need to provide a statistical evaluation of the relative performance of their tests and consequent pre-test estimators, so that practitioners are in a position to make better decisions about the choice of their testing and estimation techniques.

Monte Carlo experiments, by their nature, produce results with limited generality. It is possible that other designs, such as one with unbalanced sample data, may produce different results. However, a balanced data setup appears to be the more common one in econometrics and, until evidence to the contrary is uncovered, our results for balanced data should be adequate to guide practitioners regarding test statistics and estimators.

The question of whether or not to pool two or more samples of data is frequently encountered in applied work.


Continuous Stein-like estimators that make use of the F_A-test statistic appear to offer, under a squared error loss measure, a risk-superior alternative to traditional linear model estimators and to conventional pre-test estimators that are normally used when the pooling question arises. Another interesting question relates to the properties of preliminary test estimators generated by a multiple decision problem where a test for heteroskedasticity precedes the test for equality of the location vectors. In this case it remains to be seen whether superior continuous counterparts to the traditional preliminary test estimators can be developed.

References Ah,

M.M. and J.L. Silver, 1985, Test for equality between sets of coefficients in two linear regressions under heteroscedasticity, Journal of the American Statistical Association 80, 730-73s. Chow, G.C., 1960, Tests of equality between sets of coefficients in two linear regressions, Econometrica 28, 591-605. Conerly, M.D. and E.R. Mansfield, 1989, An approximate test for comparing independent regression models with unequal error variances, Journal of Econometrics 40, 239-260. Dalal, S.R. and G.S. Mudholkar, 1988, A conservative test and confidence region for comparing heteroscedastic regressions, Biometrika 75, 149-152. Garcia-Ferrer, A., R.A. Highfield, F. Palm, and A. Zellner, 1987, Macro-economic forecasting using pooled economic data, Journal of Business and Economic Statistics 5, 53-67. Greenberg, E., 1980, Finite sample moments of a preliminary test estimator in the case of possible heteroskedasticity, Econometrica 48, 1805-1813. Honda, Y., 1982, On tests of-equality between sets of coefficients in two linear regressions when disturbance variances are unequal, Manchester School of Economic and Social Studies 50, 116-125. Honda, Y. and K. Ohtani, 1989, Modified Wald tests of equality between sets of coefficients in two linear regressions under heteroscedasticity, Journal of the Royal Statistical Society B 51, 71-79. Jayatissa, W.A., 1977, Tests of equality between sets of coefficients in two linear regressions when disturbance variances are unequal, Econometrica 45, 1291-1292. Judge, G. and M.E. Bock, 1978, The statistical implications of pre-test and Stein-rule estimators in econometrics (North-Holland, Amsterdam). Judge, G., R.C. Hill, and M.E. Bock, 1990, An adaptive empirical Bayes estimator of the multivariate normal mean under quadratic loss, Journal of Econometrics 44, 189-214. Kariya, T., 1981, Bounds for the covariance matrices of Zellner’s estimator in the SUR model and the 2SAE in a heteroskedastic model, Journal of the American Statistical Association 76, 975-979. Kobayashi, M., 1986, A bounds test of equality between sets of coefficients in two linear regressions when disturbance variances are unequal, Journal of the American Statistical Association 81, 510-513. Ohtani, K., 1987, On pooling disturbance variances when the goal is testing restrictions on regression coefficients, Journal of Econometrics 35, 219-231. Ohtani, K. and M. Kobayashi, 1986, A bounds test for equality between sets of coefficients in two linear regression models under heteroscedasticity, Econometric Theory 2, 220-231. Ohtani, K. and T. Toyoda, 1985a, A Monte Carlo study of the Wald, LM and LR tests in a heteroscedastic linear model, Communications in Statistics: Simulation and Computation 14, 735-746. Ohtani, K. and T. Toyoda, 1985b, Small sample properties of tests of equality between sets of coefficients in two linear regressions, International Economic Review 26, 37-44. Rothenberg, T.S., 1984, Hypothesis testing in linear models when the error covariance matrix is nonscalar, Econometrica 52, 827-842.


Schmidt, P. and R. Sickles, 1977, Some further evidence on the use of the Chow test under heteroskedasticity, Econometrica 45, 1293-1298.
Srivastava, V.K. and D.E.A. Giles, 1987, Seemingly unrelated regression equation models: Estimation and inference (Marcel Dekker, New York, NY).
Taylor, W.E., 1977, Small sample properties of a class of two-stage Aitken estimators, Econometrica 45, 497-508.
Taylor, W.E., 1978, The heteroskedastic linear model: Exact finite sample results, Econometrica 46, 663-675.
Toyoda, T., 1974, Use of the Chow test under heteroscedasticity, Econometrica 42, 601-608.
Toyoda, T. and K. Ohtani, 1986, Testing equality between sets of coefficients after a preliminary test for equality of disturbance variances in two linear regressions, Journal of Econometrics 31, 67-80.
Tsurumi, H., 1984, On Jayatissa's test of constancy of regressions under heteroskedasticity, Economic Studies Quarterly 35, 57-62.
Tsurumi, H. and N. Sheflin, 1985, Some tests for the constancy of regressions under heteroskedasticity, Journal of Econometrics 27, 221-234.
Watt, P.A., 1979, Tests of equality between sets of coefficients in linear regressions when disturbance variances are unequal: Some small sample properties, Manchester School of Economic and Social Studies 47, 391-396.
Weerahandi, S., 1987, Testing regression equality with unequal variances, Econometrica 55, 1211-1216.
Woodland, A.D., 1986, An aspect of the Wald test for linear restrictions in the seemingly unrelated regressions model, Economics Letters 20, 165-169.
Yancey, T.A., G. Judge, and S. Miyazaki, 1984, Some improved estimators in the case of possible heteroskedasticity, Journal of Econometrics 25, 133-150.
Zellner, A., 1962, An efficient method of estimating seemingly unrelated regressions and tests of aggregation bias, Journal of the American Statistical Association 57, 348-368.
Zellner, A., 1986, On assessing prior distributions and Bayesian regression analysis with g-prior distributions, in: P.K. Goel and A. Zellner, eds., Bayesian inference and decision techniques: Essays in honor of Bruno de Finetti (North-Holland, Amsterdam) 233-234.
Zellner, A. and C. Hong, 1989, Forecasting international growth rates using Bayesian shrinkage and other procedures, Journal of Econometrics 40, 183-202.
Zellner, A. and W. Vandaele, 1975, Bayes-Stein estimators for k-means, regression and simultaneous equation models, in: S.E. Fienberg and A. Zellner, eds., Studies in Bayesian econometrics and statistics in honor of Leonard J. Savage (North-Holland, Amsterdam) 627-653.