Journal of Econometrics 23 (1983) 275-283. North-Holland

PARTIALLY GENERALIZED LEAST SQUARES AND TWO-STAGE LEAST SQUARES ESTIMATORS*

Takeshi AMEMIYA

Stanford University, Stanford, CA 94305, USA

Received April 1982, final version received November 1982

A class of partially generalized least squares estimators and a class of partially generalized two-stage least squares estimators in regression models with heteroscedastic errors are proposed. By using these estimators a researcher can attain higher efficiency than that attained by the least squares or the two-stage least squares estimators without explicitly estimating each component of the heteroscedastic variances. However, the efficiency is not as high as that of the generalized least squares or the generalized two-stage least squares estimator calculated using the knowledge of the true variances. Hence the use of the term partially.

1. Introduction

In this paper I propose a class of partially generalized least squares estimators and a class of partially generalized two-stage least squares estimators in regression models with heteroscedastic errors. By using these estimators a researcher can attain higher efficiency than that attained by the least squares or the two-stage least squares estimators without explicitly estimating each component of the heteroscedastic variances. However, the efficiency is not as high as that of the generalized least squares or the generalized two-stage least squares estimator calculated using the knowledge of the true variances. This is why I use the term partially above.

This paper is motivated by Chamberlain (1982), who suggests a way to improve on the least squares or two-stage least squares estimator in heteroscedastic regression models without explicitly estimating each variance.¹ In this paper I carry Chamberlain's idea further to define a class of estimators more efficient than his. Chamberlain assumes that the exogenous variables are i.i.d. random variables, but I work with the more standard assumption that the exogenous

*This work was supported by NSF Grant SES 79-12965 to Stanford University. The paper has greatly improved through numerous discussions with Tom MaCurdy.

¹The idea should also be credited to White (1982), who proposed an instrumental variables estimator more efficient than the two-stage least squares estimator in a heteroscedastic stratified cross-section model.


variables are known constants. So Chamberlain's estimators are reinterpreted to conform to my setting. Chamberlain also considers a modification of the three-stage least squares estimator, but I do not, because the results regarding the two-stage least squares estimator can be easily generalized to this situation.

2. A heteroscedastic regression model

In this section I will consider a heteroscedastic regression model,

y = Xβ + u,   (2.1)

where X is a T × K matrix of known constants with full column rank and the elements {u_t} of the T-vector u are independent but heteroscedastic with bounded second moments and finite fourth moments. We define Σ = Euu′. Note that Σ is a diagonal matrix by our assumption. In order to prove certain asymptotic results later, I will assume that the elements of X are bounded and lim T⁻¹X′X exists and is non-singular. The boundedness of X is not necessary and can be easily replaced by a set of slightly more general assumptions, but I will not do so because it seems to be a rather uninteresting mathematical exercise. I assume that there are q linear constraints on β written as

Qβ = 0,   (2.2)

where Q is a q × K matrix of known constants with full row rank. The possibility of no constraint is subsumed under this assumption. First, I will assume that Σ is known, and later I will consider the more interesting case where Σ is unknown. I will consider three estimators: the constrained least squares (CLS), the constrained generalized least squares (CGLS), and Chamberlain's estimator. The first two are well known. They are defined as follows:

β⁺ = {I − (X′X)⁻¹Q′[Q(X′X)⁻¹Q′]⁻¹Q} β̂,   (2.3)

where β̂ = (X′X)⁻¹X′y, and

β_G⁺ = {I − (X′Σ⁻¹X)⁻¹Q′[Q(X′Σ⁻¹X)⁻¹Q′]⁻¹Q} β̂_G,   (2.4)

where β̂_G = (X′Σ⁻¹X)⁻¹X′Σ⁻¹y.
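As a concrete illustration, both constrained estimators are short matrix computations. The following numpy sketch implements (2.3) and (2.4); the function names, the simulated design, and the particular constraint are mine, not the paper's:

```python
import numpy as np

def cls(y, X, Q):
    """Constrained least squares beta+ of (2.3): OLS projected onto Q beta = 0."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                                   # unconstrained OLS beta-hat
    return b - XtX_inv @ Q.T @ np.linalg.inv(Q @ XtX_inv @ Q.T) @ Q @ b

def cgls(y, X, Q, Sigma):
    """Constrained GLS beta_G+ of (2.4): the same projection in the GLS metric."""
    Si = np.linalg.inv(Sigma)
    XSX_inv = np.linalg.inv(X.T @ Si @ X)
    bg = XSX_inv @ X.T @ Si @ y                             # unconstrained GLS beta-hat_G
    return bg - XSX_inv @ Q.T @ np.linalg.inv(Q @ XSX_inv @ Q.T) @ Q @ bg
```

Both estimators satisfy the constraint exactly: Q times either estimate vanishes up to rounding error.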


To define Chamberlain's estimator β̃, premultiply (2.1) by X′ to obtain

X′y = X′Xβ + X′u.   (2.5)

Then, β̃ is CGLS applied to (2.5). Thus,

β̃ = [I − AQ′(QAQ′)⁻¹Q] β̂,   (2.6)

where A = (X′X)⁻¹X′ΣX(X′X)⁻¹.
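The point of the construction is that β̃ depends on Σ only through A = Var(β̂); GLS on the K equations in (2.5) never needs Σ⁻¹. A minimal sketch (the function name and the numerical check against direct GLS on (2.5) are mine):

```python
import numpy as np

def chamberlain(y, X, Q, Sigma):
    """Chamberlain's beta-tilde of (2.6): CGLS applied to X'y = X'X beta + X'u.

    The transformed error X'u has covariance X' Sigma X, so the estimator
    depends on Sigma only through A = (X'X)^-1 X' Sigma X (X'X)^-1.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                               # OLS beta-hat
    A = XtX_inv @ (X.T @ Sigma @ X) @ XtX_inv           # Var(beta-hat)
    return b - A @ Q.T @ np.linalg.inv(Q @ A @ Q.T) @ Q @ b
```

One can verify numerically that this coincides with constrained GLS run directly on the premultiplied system (2.5).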

All three estimators are unbiased and their variance-covariance matrices can be easily derived from their definitions. A direct comparison of the variance-covariance matrices will show

V β⁺ ≥ V β̃ ≥ V β_G⁺,   (2.7)

where the inequalities are in the sense of matrices. Strict inequalities generally hold except in the special case where there is no constraint, in which case β̃ = β⁺ = β̂.

I will give an alternative, more intuitive explanation for (2.7). That β̃ is better than β⁺ (assuming there is a constraint) can be shown by noting that β⁺ is CLS applied to (2.5), whereas β̃ is CGLS applied to (2.5) as I stated earlier. To show that β_G⁺ is superior to β̃, define a T × (T−K) matrix of constants W such that [X, W] is non-singular and W′X = 0. Premultiply (2.1) by W′ and obtain

W′y = W′u.   (2.8)

Then, β_G⁺ can be interpreted as CGLS applied to (2.5) and (2.8) jointly and hence superior to β̃.

Now, I will consider the case where Σ is unknown. Let y_t and x_t′ be the tth rows of y and X, respectively, and let D be the T-dimensional diagonal matrix whose tth diagonal element is equal to (y_t − x_t′β̂)². Then, under our assumptions we have

plim (X′DX/T) = lim (X′ΣX/T),   (2.9)

which I will prove below.²

²Eq. (2.9) was first demonstrated and used in estimating the variance-covariance matrix of the least squares estimator in a heteroscedastic regression model by Eicker (1963). Eicker's idea was further developed by White (1980).

Therefore, it is clear from the definition of β̃ that


if we replace X’CX with X’DX in the definition (2.6) we obtain the asymptotically equivalent estimator. Note that even if we cannot estimate Z we can consistently estimate lim T-‘X’CX, which is all that is needed for the present purpose. Now, a proof of (2.9). Consider the i, j th element of the right-hand side of (2.9). We have T-‘x~ZX~ = T-’ $I EXi,Xj,U:.

(2.10)

Consider the same for the left-hand side. We have

-2T-’

~

XitXjtX;

(8-a)~t.

(2.11)

t=1

The first term of the right-hand side of (2.11) converges in probability to the limit of the right-hand side of (2.10) by a law of large numbers under the assumptions I stated earlier. Also under our assumptions, the second and the third terms of the right-hand side of (2.11) converge to zero in probability. Thus, (2.9) is proved.

Now I want to ask: (1) Can I define an estimator which is more efficient than Chamberlain's and yet does not require the estimation of Σ? (2) Can such an estimator be asymptotically as efficient as CGLS? The answer to the first question is yes and the answer to the second is generally no, as I will show below.

When we define CGLS as in (2.4), it seems that we cannot calculate it unless we can estimate Σ, since there does not seem to be a consistent estimator of lim T⁻¹X′Σ⁻¹X. Suppose we rewrite (2.4) using the interpretation of β_G⁺ given after (2.8). Then we have

β_G⁺ = {I − (X′X)⁻¹A(X′X)⁻¹Q′[Q(X′X)⁻¹A(X′X)⁻¹Q′]⁻¹Q} β̂_G,   (2.12)

where

A = X′ΣX − X′ΣW(W′ΣW)⁻¹W′ΣX,   (2.13)

and

β̂_G = β̂ − (X′X)⁻¹X′ΣW(W′ΣW)⁻¹W′y.   (2.14)


An equivalence of (2.12) to (2.4) can also be directly demonstrated by using the identity

Σ^{1/2}W(W′ΣW)⁻¹W′Σ^{1/2} = I − Σ^{−1/2}X(X′Σ⁻¹X)⁻¹X′Σ^{−1/2}.   (2.15)
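The identity (2.15) is the usual decomposition of the identity matrix into complementary projections after transforming by Σ^{1/2}: the columns of Σ^{1/2}W span the orthogonal complement of the columns of Σ^{−1/2}X. A quick numerical check, assuming a diagonal Σ as in the paper; constructing W from a complete QR factorization is my choice:

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 12, 3
X = rng.standard_normal((T, K))
Sigma = np.diag(0.5 + rng.random(T))

# W: a T x (T-K) matrix with W'X = 0 and [X, W] non-singular,
# taken from the last T-K columns of a complete QR factorization of X.
Qfull, _ = np.linalg.qr(X, mode="complete")
W = Qfull[:, K:]

Sh = np.sqrt(Sigma)                       # Sigma^(1/2) (Sigma is diagonal)
Shi = np.linalg.inv(Sh)                   # Sigma^(-1/2)
lhs = Sh @ W @ np.linalg.inv(W.T @ Sigma @ W) @ W.T @ Sh
rhs = np.eye(T) - Shi @ X @ np.linalg.inv(X.T @ np.linalg.inv(Sigma) @ X) @ X.T @ Shi
assert np.allclose(lhs, rhs)
```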

We have made progress, since the right-hand side of (2.12) depends only on Σ and not on Σ⁻¹. However, a difficulty in calculating β_G⁺ still exists: each element of the matrices T⁻¹X′DW and T⁻¹W′DW converges in probability to the limit of the corresponding element of T⁻¹X′ΣW and T⁻¹W′ΣW, respectively, for the same reason as (2.9) holds, but replacing X′ΣW and W′ΣW with X′DW and W′DW does not produce an asymptotically equivalent estimator because the sizes of the matrices increase with the sample size T.

The above consideration suggests that we should use only a subset W₁ of W in defining an estimator of the form (2.12). I assume that W₁ is a T × N matrix of full column rank, where N is a finite fixed number, such that its elements are bounded and lim T⁻¹W₁′W₁ exists and is non-singular. I define the class of constrained partially generalized least squares estimators (CPGLS) by

β_P⁺ = {I − (X′X)⁻¹A₁(X′X)⁻¹Q′[Q(X′X)⁻¹A₁(X′X)⁻¹Q′]⁻¹Q} β̂_P,   (2.16)

where

A₁ = X′ΣX − X′ΣW₁(W₁′ΣW₁)⁻¹W₁′ΣX,   (2.17)

and

β̂_P = β̂ − (X′X)⁻¹X′ΣW₁(W₁′ΣW₁)⁻¹W₁′y.   (2.18)

One can replace Σ with D in the above without changing the asymptotic distribution of the estimator. I can show that β_P⁺ is more efficient than Chamberlain's β̃ in exactly the same way as I earlier showed the superiority of β_G⁺ over β̃. If there is no constraint, β_P⁺ reduces to β̂_P. Note that β̂_P, with D in place of Σ, is asymptotically more efficient than the least squares estimator, even though Chamberlain's estimator cannot do any better than least squares in the case of no constraint. More precisely, we have

V β̂ − V β̂_P = (X′X)⁻¹X′ΣW₁(W₁′ΣW₁)⁻¹W₁′ΣX(X′X)⁻¹.   (2.19)
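In practice β̂_P is computed with D in place of Σ, as noted above. A sketch of that feasible version (the function name and simulation design are mine); W1 should be orthogonal to the columns of X, as in the construction of W:

```python
import numpy as np

def pgls(y, X, W1):
    """Feasible beta-hat_P of (2.18), with Sigma replaced by the diagonal
    matrix D of squared OLS residuals; asymptotically equivalent by (2.9).
    W1 (T x N, N fixed) should satisfy W1'X = 0."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]        # OLS beta-hat
    d = (y - X @ b) ** 2                            # diagonal of D
    XDW = X.T @ (d[:, None] * W1)                   # X'D W1
    WDW = W1.T @ (d[:, None] * W1)                  # W1'D W1
    return b - np.linalg.inv(X.T @ X) @ XDW @ np.linalg.inv(WDW) @ (W1.T @ y)
```

In a two-variance design (unit variances in one half of the sample, variance a in the other) with W1 the plus/minus-one split vector, the variance reduction in (2.19) can be seen directly in simulation.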

The equality (2.19) suggests that W₁ should be chosen so as to maximize the correlations between the columns of W₁ and ΣX. Unfortunately, I cannot at the moment offer any concrete formula which is generally useful for finding


the optimal W₁. I will only give a simple example below, where the optimal W₁ can be easily found.

Consider a special case of the model (2.1) where β is a scalar and X is a T-vector of ones. Assume that the first T/2 diagonal elements of Σ are ones and the remaining T/2 elements have the same value a, which is an unknown parameter. (Actually, the number of elements in each of the two groups may differ from T/2 by any finite number without affecting our asymptotic results.) Then, the optimal W₁ is the vector whose first T/2 elements are ones and remaining T/2 elements are minus ones. Then, using (2.19), we can easily show

V β̂_P = (1/T)(2a/(1 + a)),   (2.20)

whereas

V β̂ = (1/T)((1 + a)/2).   (2.21)

3. A heteroscedastic simultaneous equations model

In this section I will consider a limited information simultaneous equations model with heteroscedastic errors defined by

y = Yγ + X₁β + u = Zα + u,   (3.1)

and

Y = XΠ + V,   (3.2)

where X₁ is a subset of X, X satisfies the same conditions as in section 2, and u also has the same properties as in section 2, with Euu′ = Σ diagonal as before. As for V, I assume that the elements of its tth row v_t′ may be correlated among one another and also with u_t, but are serially independent with bounded variances. Here I do not assume any constraints among the parameters γ and β, though such constraints can be easily handled.

As in section 2, I first assume that Σ is known and compare the following three estimators: the two-stage least squares (2SLS), the generalized two-stage least squares (G2SLS), and Chamberlain's estimator. The 2SLS estimator of α is defined by

α̂ = (Z′PZ)⁻¹Z′Py,   (3.3)

where P = X(X′X)⁻¹X′, and its asymptotic variance-covariance matrix is given by

V α̂ = (Z̄′Z̄)⁻¹Z̄′ΣZ̄(Z̄′Z̄)⁻¹,   (3.4)

where Z̄ = (XΠ, X₁). I define the G2SLS estimator as

8,=(z’PC_‘PZ)_lz’PC_‘Py,

(3.5)

and its asymptotic variance-covariance

matrix is given by

VB,=(Z’c~‘Z)-‘.

(3.6)
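A numpy sketch of (3.3) and (3.5). To keep the computation cheap, the projection P is applied through least squares rather than formed as a T × T matrix, and the diagonal Σ is passed as the vector of its diagonal elements; the function names and the simulated design in the check are mine:

```python
import numpy as np

def tsls(y, Z, X):
    """2SLS alpha-hat of (3.3): (Z'PZ)^-1 Z'Py, with PZ computed by regression."""
    Zhat = X @ np.linalg.lstsq(X, Z, rcond=None)[0]     # PZ, P = X(X'X)^-1 X'
    return np.linalg.solve(Zhat.T @ Z, Zhat.T @ y)      # Z'PZ = Zhat'Z, Z'Py = Zhat'y

def g2sls(y, Z, X, sigma2):
    """G2SLS alpha-hat_G of (3.5) with Sigma = diag(sigma2):
    (Z'P Sigma^-1 P Z)^-1 Z'P Sigma^-1 P y."""
    Zhat = X @ np.linalg.lstsq(X, Z, rcond=None)[0]     # PZ
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]     # Py
    Zw = Zhat / sigma2[:, None]                         # Sigma^-1 PZ
    return np.linalg.solve(Zhat.T @ Zw, Zw.T @ yhat)
```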

Theil (1961, p. 345) defined G2SLS as (Z′PΣ⁻¹Z)⁻¹Z′PΣ⁻¹y, which has the same asymptotic distribution as (3.5). I defined it as (3.5) in order to rewrite it in a certain way, which I will show later.

Chamberlain's estimator can be derived by premultiplying (3.1) by X′ to obtain

X′y = X′Zα + X′u,   (3.7)

and then applying GLS to (3.7), as

α̃ = [Z′X(X′ΣX)⁻¹X′Z]⁻¹Z′X(X′ΣX)⁻¹X′y.   (3.8)

Its asymptotic variance-covariance matrix is given by

V α̃ = [Z̄′X(X′ΣX)⁻¹X′Z̄]⁻¹.   (3.9)

It is straightforward to show

V α̂ ≥ V α̃ ≥ V α̂_G.   (3.10)

If Σ is unknown, one can replace X′ΣX by X′DX in (3.8) without changing the asymptotic distribution because of (2.9), where D is now defined as the diagonal matrix whose tth diagonal element is equal to (y_t − z_t′α̂)². Using the identity (2.15) we can rewrite (3.5) as

α̂_G = (Z′XA⁻¹X′Z)⁻¹Z′XA⁻¹X′y,   (3.11)

where A is as defined in (2.13). Either in the form (3.5) or (3.11), however, one cannot replace Σ with D without changing the asymptotic distribution, for the same reason I explained in section 2.

As in section 2, I define the class of partially generalized two-stage least squares estimators (PG2SLS) by

α̂_P = (Z′XA₁⁻¹X′Z)⁻¹Z′XA₁⁻¹X′y,   (3.12)

where A₁ is as defined in (2.17). Its asymptotic variance-covariance matrix is given by

V α̂_P = (Z̄′XA₁⁻¹X′Z̄)⁻¹.   (3.13)
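A feasible PG2SLS, with A₁ of (2.17) built from D using squared 2SLS residuals, takes only a few lines. In this sketch the function name, the simulation design, and the choice of W₁ (a heteroscedasticity proxy residualized against X so that W₁′X = 0) are all mine:

```python
import numpy as np

def pg2sls(y, Z, X, W1):
    """Feasible PG2SLS alpha-hat_P of (3.12): A1 of (2.17) with Sigma
    replaced by D = diag of squared 2SLS residuals, justified by (2.9).
    W1 should satisfy W1'X = 0."""
    Zhat = X @ np.linalg.lstsq(X, Z, rcond=None)[0]
    a0 = np.linalg.solve(Zhat.T @ Z, Zhat.T @ y)        # 2SLS, for residuals
    d = (y - Z @ a0) ** 2                               # diagonal of D
    XDX = X.T @ (d[:, None] * X)
    XDW = X.T @ (d[:, None] * W1)
    WDW = W1.T @ (d[:, None] * W1)
    A1 = XDX - XDW @ np.linalg.inv(WDW) @ XDW.T
    XZ, Xy = X.T @ Z, X.T @ y
    A1i_XZ = np.linalg.solve(A1, XZ)                    # A1^-1 X'Z
    return np.linalg.solve(XZ.T @ A1i_XZ, A1i_XZ.T @ Xy)
```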


It is easy to show

V α̃ ≥ V α̂_P ≥ V α̂_G.   (3.14)

The asymptotic distribution of α̂_P is unchanged if Σ is replaced with D in its definition.

All the estimators considered in this section can be straightforwardly generalized to the full information simultaneous equations model to yield the G3SLS, PG3SLS, and the corresponding Chamberlain’s estimator. I will briefly indicate how this can be done. Write n structural equations as

𝐲 = 𝐙α + 𝐮,   (3.15)

where

𝐲 = (y₁′, y₂′, …, yₙ′)′,  α = (α₁′, α₂′, …, αₙ′)′,  𝐮 = (u₁′, u₂′, …, uₙ′)′,

and 𝐙 is the block-diagonal matrix

𝐙 = diag(Z₁, Z₂, …, Zₙ).

Also define

𝐗 = I ⊗ X,

where ⊗ is the Kronecker product, and the block-diagonal matrix

𝐖₁ = diag(W₁, W₁, …, W₁).

So far everything is essentially the same as the model (3.1) using these newly defined matrices, which appear in bold italics. The only significant new feature of (3.15) as compared to (3.1) is that here 𝚺 = E𝐮𝐮′ is not diagonal,


but is of the form

𝚺 = [Σᵢⱼ],  i, j = 1, 2, …, n,

where each Σᵢⱼ is a diagonal matrix. However, this does not create any significantly new problem because, for example, plim T⁻¹X′DᵢⱼX = lim T⁻¹X′ΣᵢⱼX, where the tth diagonal element of Dᵢⱼ is (y_{it} − z_{it}′α̂ᵢ)(y_{jt} − z_{jt}′α̂ⱼ).

References

Chamberlain, Gary, 1982, Multivariate regression models for panel data, Journal of Econometrics 18, 5-46.
Eicker, F., 1963, Asymptotic normality and consistency of the least squares estimators for families of linear regressions, Annals of Mathematical Statistics 34, 447-456.
Theil, Henri, 1961, Economic forecasts and policy, 2nd rev. ed. (North-Holland, Amsterdam).
White, Halbert, 1980, A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity, Econometrica 48, 817-838.
White, Halbert, 1982, Instrumental variables regression with independent observations, Econometrica 50, 483-499.