Estimating the asymptotic covariance matrix for quantile regression models a Monte Carlo study

Estimating the asymptotic covariance matrix for quantile regression models a Monte Carlo study

JOURNAL OF Econometrics Journal EIXWIER of Econometrics 68 (1995) 303-338 Estimating the asymptotic covariance matrix for quantile regression mod...

2MB Sizes 0 Downloads 29 Views

JOURNAL OF

Econometrics Journal

EIXWIER

of Econometrics

68 (1995) 303-338

Estimating the asymptotic covariance matrix for quantile regression models A Monte Carlo study Moshe Buchinsky Department

of Economics, Yale University. New Haven, CT 06520-8264, USA (Received

July 1991; final version

received April 1994)

Abstract This Monte Carlo study examines several estimation procedures of the asymptotic covariance matrix in the quantile and censored quantile regression models: design matrix bootstrap, error bootstrapping, order statistic, sigma bootstrap, homoskedastic kernel. and heteroskedastic kernel. The Monte Carlo samples are drawn from two alternative data sets: (a) the unaltered Current Population Survey (CPS) for 1987 and (b) this CPS data with independence between error term and regressors imposed. This special setup allows one to evaluate the estimators under various realistic scenarios. The results favor the design bootstrap for the general case, but also support the order statistic when the error term is independent of the regressors. Key words: Quantile and censored quantile JEL classification: C13; C14; C15; C24

regression;

Asymptotic

covariance

matrix

1. Introduction

Quantile regression models have gained considerable interest recently, especially in theoretical discussions. The availability of efficient linear programming algorithms together with the rapid development of computers have opened the

This paper is, in part, from chapter 2 of my dissertation ‘The Theory and Practice of Quantile Regression’, Harvard University, 1991. I wish to thank Don Andrews, Joshua Angrist, Chris Cavanagh, Paul Gunther, Bo Honor& three anonymous referees, and especially Jim Powell for many discussions and comments. Comments made by the participants in various universities’ Gary Chamberlain error is, of course,

0304-4076/95/$09.50 SSDl 030440769401652

mine.

1995 Elsevier Science S.A.

304

M. BuchinskylJournal

of Econometrics 68 (1995) 303-338

gates for more common use of such models in empirical applications. Of special interest in estimation of the asymptotic covariance matrix for quantile regression estimators. This paper reports on a Monte Carlo study, using real data, of various order statistic, bootstrap, and kernel estimators for both the quantile and the censored quantile regression models. Much recent literature has been devoted to the theoretical aspects of quantile regression estimators, and in general, robust estimators.’ Nevertheless, little attention has been given to the practical problems of determining the estimators’ precision. In limited experiments relating to median regression Dielman and Pfaffenberger (1986,1988) compared the nominal size and power of a Wald test to the empirical Monte Carlo size and power, while Stangerhaus (1987) investigated aspects related to computation of confidence intervals. A common assumption in these studies in that the error (uO) density function at zero is independent of the covariate vector x (i.e.,fU,(0)x) =f.,(O) for all x). Under this assumption the regressors and the error term for the Monte Carlo samples were drawn independently from hypothetical distributions. In addition, the dependent variable y was constructed from x, [email protected], and an assumed coefficient vector Be. No investigation have been made into practical problems associated with estimation in the more general case, whenf,,(Olx) #f&(O). In this study the Monte Carlo samples are not drawn from a hypothetical distribution but rather from the actual Current Population Survey (CPS) for 1987. Moreover, samples are drawn from the joint distribution of x and y, so that the distribution of the error term is that which is actually contained in the data. One can thereby investigate several issues significant for empirical application. Various techniques that are consistent in the general case can be examined for an actual important data set. In addition, since the CPS data on wages are available only in a discrete form, one can examine the importance in practice of the continuity assumption for the error term density. Finally, we can also examine practical issues relating, for example, to the bootstrap sample size, the kernel estimators bandwidths, etc. The paper is organized as follows: Section 2 briefly introduces the quantile and censored quantile regression models. The estimators examined in this study are presented in Section 3. The Monte Carlo simulation setup and explanations about some subjective choices for the order statistic and the bootstrap estimators are discussed in Section 4. Section 5 studies the nature of the heteroskedasticity in the CPS data set. The results of the Monte Carlo simulations are presented and discussed in Section 6, while Section 7 discusses the standard errors for the various estimation schemes. Section 8 presents conclusions and remarks.

1 See, for example, studies by Bickel (1973), Carroll (1989), and Newey and Powell (1990).

and Ruppert

(1984), Portnoy

and Koenker

M.

BuchinskylJournal

oj’ Econometrics

68 (1995)

303-338

305

2. Quantile and censored quantile regressions

2.1. Quantile reyression The conditional written as

quantile of Koenker

and Bassett’s (1978) model can be

Qe(Yilxi) = X~PO, i = 1, . . . ,n,

(1)

where fi8 and xi are K x 1 vectors and Xii = 1. The error us = y - x’/& is assumed to have a continuously differentiable c.d.f., F,,( - 1x), and density functionf,,( . Ix). An estimator for be is obtained from

minf ,C i4dyi 8

.db),

(2)

I-1

where [email protected](I)= (0 - I(,? < O)),? is the check function and I(A) is the usual indicator function. The problem in (2) is a linear programming problem (see, e.g.. Koenker and Bassett, 1978). Powell (1984,1986) showed that (2) fits into the generalized method of moments (GMM) framework for the censored quantile regression model, under Huber’s (1967) conditions. In particular one can show that

where /10 = U(1 - H)(E[,f,,(0~x)xx’])-‘E[.~x’](E[f~,(0~.~).~x’])-‘. If the density of ue, at 0, is independent simplifies to A

=

0

et1 - 0)(E[xx’])-’

f’o

= o;(E[xx’])-

of x, i.e., jJ0l.x)

(3 =&(O), then &

‘,

U# where fI(l - 0) cH2=m.

(5)

2.2. Censored quantile regression The censored quantile regression model, which allows one to deal with censored data, was proposed by Powell (1984,1986) and can be written similarly

M. BuchinskylJournal

306

of Econometrics 68 (1995j 303-338

to (1) as [email protected](ylx) = min(y’, x’Pe), where y” is the censoring value. A consistent estimator rffefor /$ is obtained as a solution to mjn $ ,i

PO(Yi

-

min(y”,

XXV).

(6)

1-l

If follows that

where /I; = 0(1 - G)(E[fU,(O(x)1(x’/3, < yO)xx’])-’ E[r(x’/?, < y’)xx’] x (ECfue(Olx)l(x’Be <

Y~).=‘I)-

‘.

(7)

= af(E[Z(x’/& < y’)xx’])-‘.

(8)

If~U,(Olx)=,fu,(O), then n: simplifies to /ip = &l - 0) (E[~(x’/$ < yO)xx’])-’ 0 f’ 4

Since min (y’, x$) is not linear in /I, the problem in (6) is not an LP problem. Nonetheless, an LP algorithm - which I call the iterative linear programming algorithm (ILPA) - can be used in the following manner: In thejth iteration we solve BP’ using only the observations for which x$r- ‘) < Y’.~ The algorithm is terminated when the sets of observations in two consecutive iterations are the same. Convergence to a solution is not guaranteed, but if it is achieved then a local minimum is obtained.3

3. Estimators for the asymptotic covariance matrix Several estimators for the asymptotic covariance matrix for [email protected] examined: (a) order statistic, (b) design matrix bootstrap, (c) error bootstrapping, (d) sigma bootstrapping, (e) general kernel, and (f) homoskedastic kernel, with several variations for each of these estimators.

’ Dropping the observations for which [email protected] 2 y0 is of no consequence y, - y’, which does not depend on B8.

since yi - min(ya, xi/&) =

3 For a proof see Buchinsky (1991). Typically, for the CPS data set, convergence has occurred in five to twenty iterations regardless of the starting value pi. The starting value in all the estimations here is the median regression estimate, with the constant properly adjusted. Similar algorithm was suggested by Osborne and Watson (1971). Other algorithms for nonlinear I, regression were suggested by Womersley (1986) and Koenker and Park (1993).

M. Buchinsky/Journal

of Econometrics 68 (1995) 303-338

307

3.1. Order statistic estimator (OS) This estimator is valid only under the independence assumption (i.e., for the covariance matrices in (4) and (8)). An estimator for ai in (5) can be obtained by matching an exact confidence interval for the 0th quantile, pe, from a binomial distribution, with a confidence interval using a normal approximation. From the binomial distribution B(n, Q) and its limit for large n we have Pr(U,,, d pe d U,,,) = Pr(s < XB < t)

where X, - B(n,@, U(,,,) denotes the sample mth-order statistic, s = [no - 11, t = [nH + 11, and [Ii] is the integer part of A.Denoting by Z, the qth quantile of a standard normal variable, the normal approximation implies that for a symmetric 1 - a confidence interval Pr& P,

- (acne lj2Z 1 -s/2 @(Z1

-a,2)

-

@(

G

-z1

ps

d

Ofj +

o(Jn-“2z,

_a,2)

-a/2).

(10)

Equating the two confidence intervals from (9) and (10) gives an estimate for 02:4 II s; = n( O#) - 0,,,)“/4z:

-n

and

I= Z1 _a,2Jm,

(11)

where UCrnjis the mth-order statistic of r&i, . ,u*o,.5 Asymptotically the choice of Zr -a,2 does not matter, but it does affect S,’ in small samples. In this study I use three values of Z1_o,2: 1.65, 1.96, and 2.57, corresponding to 1 - a confidence intervals of 0.90, 0.95, and 0.99, respectively.

4An estimate & is provided then by A, = c?:((l/n)~“, ,x,x:)) ‘. For the censored quantile regression model an estimate S& for 0: is based only on the observations for which xi/$ i y” and 2,’ = B&((l/n)C:=, I(x{JO < y”)xixl)-‘. ‘This estimator (see Huber, 1981) employs differences between two order statistics to estimate the derivative dFU;‘(t)/dt = l/f.,(FU;‘(t)). Estimating this reciprocal density at a point using order statistic was suggested by Siddiqui (1960) and investigated by Bloch and Gastwirth (1968) Bofinger (1975), Sheather and Maritz (1983) and Hall and Sheather (1988). This literature supports an for the estimator used here is r~r’~. optimal bandwidth rate of n“‘, while the rate of the bandwidth Koenker and Bassett (1982b) provide an alternative method for taking a discrete derivative of the empirical quantile function.

308

M. Buchinsky/Journal

of Econometrics

68 (1995) 303-338

3.2. Design matrix bootstrap (DMB) estimator

This type of estimator, which was suggested initially by Efron (1979,1982) and attracted many researchers, is valid for the general case (i.e., for covariance matrices given in (3) and (7)).6 In this method an estimate for /ie is computed by

(12) where fly’, . . ,Bf’ are the B bootstrap estimates for be, for the B samples (each of size m) drawn from FnXY,the empirical joint distribution of x and y, and /Ii is some pivotal vector.’ Two alternative pivotal values are considered here: (a) /?;1= PO,,the estimate for the original Monte Carlo sample (denoted by DMBE), and (b) Bi = (l/S)?;= rflg’, the average of the bootstrap estimates (denoted by DMBA). In addttlon, the direct percentile method (referred to as DMBP) is considered. In this method the upper and lower bounds of a 1 - 5 confidence interval are taken to be - element by element - the [X/2] and [B(l - t/2)] order statistics, respectively, of the bootstrap estimates. 3.3. Error bootstrapping

(EB)

The consistency of this estimation procedure relies on the independence assumption. Instead of resampling fromf,,, it is based on separate resampling of m observations e: and XT (j = 1, . ,m) from the empirical distribution functions of ue, F,,;, (based on GO,, . . . , Con,u*eiE yi - xi/&), and of x, F,,,, respectively. The dependent variable is computed by y; = xf’/$ + e;. Three alternative computation methods, identical to those for the DMB procedures, are employed here as well. These are denoted by EBE (for /I: = Be), EBA (for the average of the bootstrap estimates), and EBP (for the percentile method). In the censored quantile regression model F,,,( .) cannot be estimated in the usual way since some of the residuals u*8iare censored. Instead, I use Kaplan and

‘jThere is little theoretical justification for the use of the bootstrap method in econometric models. The bootstrap method for quantile regression was first implemented by Buchinsky (1994) and was adopted also by Chamberlain (1994). Several authors considered bootstrapping the sample quantile (e.g., Efron, 1979, 1982; Bickel and Freedman, 1981; Singh, 1981; Lo, 1989). Hahn (1992) proved the consistency of the bootstrap estimator using the percentile method described below. Buchinsky (1992) justified the use of the bootstrap method for the special case of discrete x’s, Justification of bootstrap estimator for the mean regression is provided in Freedman (1981), while Yang (1985) considered a general class of differentiable functionals. ‘The resampling schemes for the quantile and censored quantile regression models are essentially the same. The only difference is the manner in which the bootstrap estimates are obtained.

M.

BuchinskylJournal

cf Econometrics

68 (1995)

303-338

309

Meier’s (1985) consistent estimator F,;,(t) = 1 -

n

(1 - Ij),

Uj
where 5 = nj/rj. nj is the number of uncensored observations - possibly only one - with I&, = Uj, and rj is the total number of observations for which tiei d Uj. 3.4, Sigma bootstrap estimator (SBE) This estimator also relies on the independence assumption. Here, ai in (4) and (8) is estimated directly using a bootstrap method. The estimate is given by (13) where & is the 0th quantile of r&r, . . . , lion and Gr), . . . ,Gp’ are B bootstrap estimates from B samples (each of size m) drawn from F,tB.’ 3.5. General kernel (GK) estimator Powell (1986) considered a one-side uniform kernel estimator for the general case covariance matrix. The estimates for the terms of the covariance matrix in (7) are provided by d^, = t ,$ r-l

Z(Xifie

< yO)XiXi

(14)

for EII(x’bB -CyO)xx’], and Afx =

(C,n)-’

i

I(X:bo

<

yO)Z(O< fe, d

C,)XiXi

(15)

i=l

for E[l(x’be < y”)fU,(O1x)xx’], where c, = op(l) is the kernel ‘bandwidth’.’ The present study also considers an alternative normal kernel estimator given by n

d,,

= dLL 2

Z(X$~

<

y’)exp{ - U^~,/(~C~)}~(L&,

3 0)X,X;,

(16)

i=l

*Note that by the linear programming representation of quantile regression, Q0 = 0 always. An estimator for A, is provided by /i, = B~~((l/n)C”_,xixl)~‘. For the censord quantile regression model F,, is constructed using the Kaplan-Meier procedure, and 2: = c?&,((l/n) x ^ I;=, f(x:j?o < yO)xix;)- ‘. 9This estimator can be easily modified to accommodate the quantile regression model, merely by dropping the indicator function I(xifiO < y’) from the formulas in (14) and (15).

M. BuchinskylJournal

310

of Econometrics 68 (1995) 303-338

where d, = nc,$/2. For both the uniform and normal kernels, two-side estimators are also considered wherein the indicator function in (15) is replaced with I( - 42 < &, d c,/2), the indicator function in (16) is dropped, and d, in (16) is now d, = nc,,/‘%. 3.6. Homoskedastic

kernel (HK) estimator

Under the independence assumption one can estimatef,,(O) and substitute into 0; in (5) to get an estimate for As, A, = (tI(1 [email protected]/j;(O)) (6 ’ Cf= 1 Xix;)- ‘. Powell (1986) suggested as a consistent one-side uniform kernel estimator for _L,(O),

i,(O)= c, l

ljl m$, < (i$lwB8
yO)Z(O < lie, <

CJ,

for the censored quantile regression model. lo Both one- and two-side uniform and normal kernel estimators are investigated in this study.

4. The Monte Carlo simulation

setup

4. I. ‘Population ’ sample

The Monte Carlo samples are drawn from the actual 1987 CPS sample (containing 75,578 observations) of all outgoing rotation groups of adult white males. This special situation enables one to compare small sample estimates with their ‘population’ counterparts. For this study I computed the (K x 1) coefficient vectors /[email protected] a wage equation at the 0.10, 0.25, 0.50, 0.75, and 0.90 quantiles. The log of usual weekly earnings is the dependent variable, and the independent variables consist of: (i) a constant, (ii) education, (iii) potential experience, and (iv) potential experience squared.’ ’ The five ‘population’ coefficient vectors were estimated twice. First a quantile regression was estimated, using the Barrodale and Roberts (1973) algorithm, assuming no censoring problem.” Secondly, in order to study the effect of censoring, an extreme censoring value of $750 was artificially imposed on all

“For

the quantile

regression

model the estimator

is&”

= (c.n)) ‘~~=, I(0 < lie, < c.).

r’ Education = higher grade attended - 1 - I(last grade was not completed), where indicator function. Potential experience = min(age - education - 6, age - 18). 12The BarrodaleeRoberts algorithm quantile (e.g., Koenker and D’Orey,

I

is for the median 1987).

regression,

I( .) is the

but is easily modified

for any

M. BuchinskylJournal

of Econometrics 68 (1995) 303-338

311

weekly earnings of $750 or more.i3 Censoring then becomes a severe problem, even for low quantiles. The censored quantile regression is estimated using the ILPA. 4.2. Monte

Carlo

simulations

scheme

In this definition of the ‘population’ and the ‘population parameters’ the distribution of the error term is not assumed, but rather is given its actual value. Consequently, the error term ue is not independent, in general, of the regressors x (see the next section). In order to examine the performance of the alternative methods under different assumed relationships between the error term and the regressors, two sets of simulations are carried out. In the first set, the Monte Carlo samples, each of size n, are drawn from the (x, y) pairs of the ‘CPS population’; in this case u0 depends on x. In the second set, the Monte Carlo samples are drawn from the set of x’s and t+,‘s independently, while the y’s are computed from x, uO,and the population parameters. While ug is independe,nt of x, its marginal distribution remains the same as for the first case, i.e., the empirical distribution of the ‘population errors’ r&i, . . . , tie,. Under both schemes a coefficient vector is estimated for the Monte Carlo sample, along with covariance matrices using each of the alternative procedures. I then construct 0.95 confidence intervals for each population parameter using each of the above estimated standard errors and check whether or not it contains the corresponding population parameter. This procedure is repeated N,,,, times. The numbers reported in each of the tables below give the empirical levels of the Monte Carlo simulations, i.e., the fraction of times in which the computed confidence intervals contained a particular population coefficient.14 Since the Monte Carlo repetitions are independent, the variance of an empirical level filk is

var(hk)= j+

plk t1 - plk),

(17)

mc

for every estimator I (I = 1, . . . , L) and every population coefficient fiOk(k = 1> ... 3K). Standard errors for typical empirical levels and Monte Carlo samples sizes are reported in Table 12.

I3 The actual censoring value in the CPS data set is $999, with 5,474 observations (7.2% of the total number of observations) having this value as their usual weekly earnings. 12,196 observations (16.1%) take the value $750 after the artificial censoring at that value is imposed. t4The confidence intervals are computed separately regions for the population parameter vector.

for each coefficient;

these are not confidence

M. BuchinskylJournal

312

of Econometrics 68 (1995) 303-338

4.3. Choosing o!for the order statistic estimator

To illustrate the impact of the choice of c( on the estimate ag, a sample of 500 observations was randomly drawn from the CPS sample (with a censoring value artificially set at $750). ce is then estimated for values of Z1 _a,2 ranging from 1.5 to 2.7 (corresponding to 1 - c( ranging from 0.86 to 0.99), at the 0.10,0.50, and 0.90 quantiles. The estimates of de for 8 = 0.10, 0.50, 0.90 graphed in Figs. la-lc show the sensitivity of ciOto the confidence interval level.15 At the 0.10 quantile the 8,‘s range from 1.02 to slightly over 1.22. This variation is even larger at the 0.90 quantile, where the &‘s differ by a factor of about 1.5. Even for confidence intervals of 0.95 to 0.99 (Z, _n,2 = 1.96 to 2.7) the variability in rie is quite significant, especially at the extreme quantiles. While censoring creates a problem at the 0.90 quantile, and to lesser extent at the 0.50 quantile, it causes no problem at the 0.10 quantile. Therefore, the sensitivity of the [email protected] at the 0.90 quantile cannot be solely attributed to censoring, although it is certainly magnified by it. 4.4. Choosing the number of bootstrap repetitions

When using the bootstrap technique one needs to choose: (a) the number of repetitions B and (b) the bootstrap sample size m. In the current study only the effect of the latter choice is examined, while the number of repetitions is based on a preliminary examination of the behavior of the bootstrap estimates. In the following example I use a sample of size 100 drawn from the 1987 annual CPS (with top coding value set at $7501, and I estimate the coefficient vector and a set of 150 bootstrap estimates (each of size 100). Note that with B bootstrap estimates at hand, one can also estimate the covariance matrix for /?e, using the fourth-order moments. Denote each element of the covariance matrix estimate in (12) (divided by n) by Vj, that is,

Let Ujdenote the vector of the stacked columns of the lower triangular matrix of Vj, i.e., Uj= vecl(Vj), and let u = vecl(%(/&)). Then an estimate of the covariance matrix of the covariance matrix estimate is given by

Gi

(vG(be)) =

$

,e - (Uj

U)(Uj

U)‘.

(19)

J-1

I5 Note from (11) that for s = [no - I] and t = [no + j] the numerator thresholds as G(increases, while the denominator increases continuously the ‘ratchet effect’ observed in Figs. la-lc.

of $8’ changes only at certain as a increases. This leads to

M. Buchinsky/.lournal

of Econometrics

68 (1995) 3035338

a. JO Quantllo ,24Slomo

I 22

-

1.20 -

I ooruILtLLucLuL:“‘~I~‘~~!“~‘:“‘~~’~~~:~~~ I5

lb

17

I8

IV

Siondora

2 norm01

2.1

22

2.3

dlslrtb~llon

24

25

26

25

26

values

b.30 Quantile 0.62

Sigma

0.60 0.58 -

Standard

normal

dlstrlbullon

values

o.Jellll~:1111:~~~~~:~~~~:~~~~~ 1.5

I.6

I7

I8

l.V

Slondord

Fig. 1. vg estimates

for censored

2 normal

21

2.2

23

dlnrloutlon

quantile

regression,

24

values

with sample size 500.

313

314

M. BuchinskyJJournalof Econometrics68 (1995) 303-338

Since the standard error is a simple function of the variance, one can use the delta method to show that the asymptotic standard error, se,( .), of the estimated standard error,3 ( - ), for an element in &, is given by

(20) where va( .) denotes the asymptotic variance and var( .) denotes the variance. Figs. 2a-2d depict the standard errors, along with their standard errors, as functions of the number of bootstrap repetitions for two coefficients at the 0.50 and 0.90 quantiles. Two important points are clear: (a) the bootstrap estimates stabilize quickly and have relatively small standard errors, and (b) the bootstrap estimates are smoother and stabilize faster when censoring is not a problem (Figs. 2a and 2c) than when it presents a severe problem (Figs. 2b and 2d). Based on preliminary examination of the bootstrap estimator for various sample sizes, the following rule was adopted in all the Monte Carlo experiments: 100 repetitions are carried out whenever the bootstrap sample size is less than 500 observations, and 50 repetitions for 500 or more observations.

5. Characterization 5.1.

of the heteroskedasticity

in the CPS data

General test for heteroskedasticity

When the error term ue is independent of x, the slope coefficients at different quantiles should be the same. Equality of the slope coefficients at the five estimated quantiles can be tested using the minimum distance (MD) framework. Under the null hypothesis of equality among the slope coefficients, the MD statistic has an asymptotic X2-distribution, i.e., X2 = n(fly+

Gfl”)‘Apl(fl-

x2((J -l)(k

Gf?‘)

-1))

(21)

wherep =(/I&, . . . , /I;,)’ is a stacked vector of J unrestricted parameter vectors, fi is its estimate, B” = (G’,?, ‘G)) r G’ ‘fl is the efficient MD estimator for the restricted parameter vector /3” = (/Ior, . . . , &, p2, . . , /Ik)‘, G is a suitable restriction matrix, and A, is a consistent estimator for AP, the covariance matrix of the unrestricted estimate fl. For two alternative order statistic estimates for A,, corresponding to CI= 0.05 and c( = 0.01, the x2 statistics were 2937.6 and 3198.5, respectively, well above any reasonable critical value. This clearly rejects the null hypothesis, indicating

;ia

0.022

22

22

0.027-

-

35

St. Error

46

30

62

54

-

4b

Error

70

- SI. Error

reoettttons *

78

l

04

St. Errol

.Sb 102

70

quantile

- St. Error

01 re~lllOnl

b2

l

94

IO2

bootstrap

St. Error

Sb

repression.

-

78

Coefficient, .90 Ghantile

NO

-

No. 01

54

Fig. 2. Censored

-St.

38

c. Experience

30

a. Education Coefficient, .90 Qnantile

estimates

110

110

-

0.02

-

38

2.1 Error

46

30

-St.

38

62

70

- SI. Error

-

with sample

- St. Error

52 70 NO. of rq,etfto”s

sic

+

*

75

78

Coefficient,

-

NO. 01 re*tttlonr

S4

54

coellicients.

Error

46

d. Experience

30

for difkrcnt

22

+

0.04

0.141

22

b. Education Coefficient.

94

102

94

100.

+ St. Error

a5

102

SO Quantilo

f SI. Error

.%,

50 Quantile

110

110

316

M.

BuchinskylJournal of Econometrics68 (1995) 303-338

that the distribution regressors.

of the error term ue depends significantly on the set of

5.2. Koenker-Bassett

model of heteroskedasticity

In view of this result one would like to obtain information about the degree of heteroskedasticity in the CPS data for a particular model. For this purpose I employ the multiplicative heteroskedasticity model of Koenker and Bassett (1982a) wherein y = x’/I + u = x’fl + (T(X)&,and E is i.i.d. error independent of x.16 For a(x) = ( 1 + x’y) we get y = x’s + (1 + x’y)(uO + Q&)), where ue = E - [email protected](E). The conditional quantile can then be written as Qe(~lx)

= x’P + Qe(ulx)

= x’(D + YQ~(E)) + Qe(E) = x’de,

(2-3

where 60 = fi + (y + ei)Qe(s) and e; = (l,O, . . . ,O). In this setup /?I and y1 cannot be separately identified, as is also the case with the Qej(s)‘s. I therefore set y1 = 0, Q0.50(~)= 0, and estimate the remaining coefficients using the MD framework. Let 6(p) = (Sk,, . . . ,S&)‘, ,u = (PI, ... ,Pk,~z, ... ,YkYQe,(E), ... 9Qe,(&))‘3 where Q,,50 = 0. An efficient estimator for p is obtained from

min(Bmatrix is given by

As can be clearly seen from Table 1, the estimates large, and undoubtedly significant. This strongly suggests significant linear quantile regression specified

The results

This section evaluates

I6 I wish

M. Buchinsky/Journal Table 1 Koenker-Bassett

heteroskedasticity

Coefficient

of Econometrics 68 (1995) 303-338

model

Point estimate

Standard

and standard

error

0.912 0.06 1 0.045 0.001 0.072 0.058 0.001 1.338 0.630 0.544 0.925

431.275 9.061 5.646 -0.097 - 1.025 -2.261 0.05 1 -99.61 I -46.943 39.834 12.660 Note: The coefficients

317

errors

are multiplied

f-statistic 473.16 149.62 126.89 - 100.09 - 14.22 - 38.87 37.34 - 74.46 - 74.57 73.19 78.60

by 100.

the first data set, given the heteroskedasticity in the CPS data, only the design matrix and general kernel estimators are consistent. For the second data set all estimators are consistent. The performance of each method is evaluated in terms of the departure of the empirical levels from the 0.95 nominal level. The results for the 0.25 and 0.75 quantiles are intermediate and are therefore omitted. The results at the 0.25 quantile fall between the 0.10 and 0.50 quantiles, but closer to the 0.50 quantile. Similarly, the results at the 0.75 quantile fall between the 0.90 and 0.50 quantile, and again closer to the 0.50 quantile. The values in each table are grouped by quantiles. The four lines in each group report the empirical level for the constant, the coefficient on education, and the coefficients on experience and experience squared, respectively. 6.1. Order statistic (OS) estimator The results for the order statistic estimator are reported in panels A and B of Table 2 for the quantile and censored quantile models using the original CPS data. Similarly, panels C and D report the results for the independent CPS data.’ 7 While the order statistic estimator is not consistent for the former data, its computational simplicity makes it quite attractive. Estimates of 6; based on a larger confidence interval yield larger (and closer to the 0.95 nominal level) empirical levels uniformly across all quantiles and parameter estimates. Nevertheless, the empirical levels for a particular E vary significantly both across quantiles and across the parameters tending to be smaller at the higher quantiles for all sample sizes. “The results for a = 0.05 fall between therefore omitted.

the two reported

results for a = 0.10 and c( = 0.01 and are

M. BuchinskylJournal

318

Table 2 Empirical

levels for order statistic,

Monte Carlo 100 0.10”

Panel A: QR

3000 repetitions

sample size 500

O.Olb

of Econometrics 68 (1995) 303-338

0.10”

too

1000 O.Olb

Original

0.10”

O.Olb

sample

0.10”

500 O.Olb

Panel C: QR

0.10”

1000 O.Olb

Independent

0.10”

O.Olb

sample

0.10 quantile 0.852 0.836 0.855 0.835

0.977 0.977 0.972 0.956

0.907 0.912 0.893 0.853

0.929 0.927 0.918 0.879

0.909 0.912 0.883 0.850

0.926 0.931 0.903 0.877

0.868 0.875 0.871 0.88 1

0.977 0.977 0.975 0.974

0.93 1 0.933 0.932 0.932

0.946 0.946 0.947 0.948

0.917 0.92 1 0.925 0.925

0.936 0.944 0.944 0.945

0.910 0.908 0.891 0.862

0.917 0.924 0.896 0.861

0.921 0.915 0.895 0.867

0.921 0.913 0.894 0.867

0.901 0.903 0.918 0.916

0.913 0.913 0.928 0.925

0.934 0.934 0.928 0.925

0.943 0.938 0.935 0.934

0.935 0.935 0.939 0.937

0.932 0.937 0.936 0.938

0.869 0.911 0.841 0.851

0.887 0.925 0.860 0.871

0.835 0.880 0.839 0.850

0.856 0.900 0.859 0.867

0.807 0.804 0.808 0.810

0.926 0.923 0.93 1 0.928

0.918 0.925 0.922 0.922

0.939 0.937 0.932 0.937

0.919 0.921 0.913 0.921

0.934 0.939 0.929 0.937

0.50 quantile 0.892 0.908 0.870 0.843

0.900 0.910 0.893 0.855

0.90 quantile 0.757 0.810 0.727 0.731

0.888 0.922 0.867 0.871

Panel B: CQR

Original

sample

Panel D: CQR ~ Independent

sample

0.10 quantile 0.837 0.864 0.793 0.771

0.973 0.968 0.959 0.936

0.901 0.904 0.911 0.883

0.920 0.902 0.942 0.926

0.911 0.912 0.897 0.874

0.939 0.929 0.917 0.903

0.867 0.860 0.863 0.859

0.971 0.967 0.969 0.964

0.934 0.937 0.933 0.934

0.952 0.953 0.947 0.947

0.923 0.915 0.922 0.920

0.942 0.935 0.938 0.938

0.849 0.846 0.843 0.818

0.873 0.858 0.856 0.833

0.869 0.862 0.858 0.822

0.880 0.870 0.861 0.825

0.891 0.882 0.873 0.867

0.913 0.904 0.905 0.904

0.895 0.889 0.899 0.896

0.906 0.902 0.913 0.910

0.910 0.905 0.892 0.895

0.917 0.913 0.908 0.907

0.774 0.813 0.756 0.776

0.827 0.860 0.826 0.844

0.795 0.852 0.783 0.808

0.842 0.862 0.842 0.849

0.718 0.721 0.744 0.769

0.750 0.745 0.776 0.804

0.792 0.790 0.802 0.807

0.849 0.856 0.861 0.866

0.834 0.839 0.830 0.845

0.868 0.876 0.868 0.870

0.50 quantile 0.849 0.839 0.841 0.8 13

0.868 0.869 0.882 0.846

0.90 quantile 0.727 0.773 0.679 0.737

0.798 0.826 0.758 0.812

Note: The basic population is the 1987 annual CPS for white males (75,578 observations). dependent variable is log usual earnings. The covariates are constant, education, experience, experience squared. ‘LX= 0.10, corresponds ‘x = 0.01, corresponds

to Z1 _ll,z = 1.645. to Z, _z,z = 2.576.

The and

M.

Buchinsky/.lournal

of Econometrics

68 (1995)

303-338

319

Comparison of panels A and B shows that the empirical levels for the censored quantile regression are slightly lower than for the quantile regression, mostly at the 0.90 quantile which is affected severely by censoring. In contrast, at the 0.10 quantile, which is unaffected by censoring, the empirical levels are comparable. In general, the empirical levels for the larger sample sizes are closer to the 0.95 nominal level across all quantiles, most noticeably at the 0.90 quantile. The fact that for a given LX the performance of the OS estimator is enhanced only slightly as the sample size increases, suggests that it converges rapidly to its population value. It is not the appropriate covariance matrix, however, for the original CPS data. This is verified by the results reported in panels C and D using the independent CPS data. The performance of the OS improves significantly at all quantiles, for all levels of a and for all sample sizes. Also the OS yields, in general, better results for the education and the constant coefficients than for the two experience coefficients. In summary, the OS performs reasonably well when the independence assumption is satisfied. In fact, as will be clear from the results below, it is the most reliable among the estimators that are valid only under the independence assumption, This is significant since the computation time required for this estimator is a few seconds, while the time for the bootstrap methods is rather lengthy. 6.2. Sigma bootstrapping

(SB) estimator

The performance of the SB estimator is examined for both the original and the independent CPS data, even though it is valid only for the latter. The results are reported in Table 3. The first three columns in each panel pertain to a sample size of 100 with bootstrap sample sizes of 50,100, and 500. The last three columns are for a sample size of 500, with bootstrap sample sizes of 100, 500, and 1000. The table indicates that the SB estimator does not perform well for relatively small samples in either data set. Moreover, it is very sensitive to the size of the bootstrap sample even for the independent sample. Smaller bootstrap sample sizes yield empirical levels closer to the 0.95 nominal level. The SB performance improves significantly when the sample size increases, especially for the independent sample, providing reasonable empirical levels for small bootstrap sample sizes for data sample sizes of 500 (see panels C and D). The consistently low empirical levels imply that the SB estimator yields too small standard errors. Note, however, that the empirical levels are much closer to the 0.95 level for the independent data. The performance of the SB improves for large samples because F,,( .) is estimated more accurately. The reason for the decline in the empirical levels for large bootstrap sample sizes is more difficult to explain. There are two effects working in opposite directions: (a) as m becomes larger the bootstrap estimates

M. BuchinskylJournal

320

Table 3 Empirical

levels for sigma bootstrap,

Monte Carlo

1000 repetitions

sample size

100

500

Bootstrap 50

of Econometrics 68 (1995) 303-338

100

500

sample size

100

Panel A: QR

500

100

Original

500

1000

sample

50

100

500

100

Panel C: QR - Independent

500

1000

sample

0.10 quantile 0.882 0.887 0.850 0.822

0.842 0.821 0.807 0.771

0.560 0.572 0.553 0.532

0.911 0.938 0.884 0.852

0.868 0.885 0.857 0.836

0.853 0.854 0.822 0.779

0.851 0.859 0.865 0.866

0.793. 0.786 0.802 0.800

0.546 0.554 0.551 0.562

0.941 0.935 0.931 0.924

0.906 0.898 0.888 0.868

0.867 0.869 0.849 0.842

0.748 0.746 0.688 0.664

0.924 0.926 0.900 0.862

0.920 0.902 0.874 0.833

0.888 0.863 0.854 0.829

0.915 0.920 0.906 0.896

0.883 0.882 0.884 0.866

0.764 0.743 0.746 0.737

0.938 0.937 0.936 0.938

0.912 0.904 0.917 0.915

0.896 0.885 0.903 0.900

0.512 0.569 0.503 0.490

0.852 0.893 0.835 0.830

0.798 0.856 0.801 0.810

0.787 0.832 0.768 0.762

0.825 0.824 0.862 0.864

0.764 0.766 0.803 0.805

0.561 0.569 0.594 0.586

0.910 0.920 0.933 0.928

0.881 0.878 0.902 0.902

0.855 0.855 0.872 0.872

0.50 quantile 0.901 0.891 0.857 0.837

0.865 0.873 0.827 0.810

0.90 quantile 0.792 0.808 0.768 0.757

0.721 0.765 0.724 0.727

Panel B: CQR

Original

sample

Panel D: CQR - Independent

sample

0.10 quantile 0.910 0.929 0.894 0.842

0.854 0.869 0.792 0.771

0.553 0.571 0.569 0.523

0.919 0.918 0.903 0.892

0.881 0.893 0.852 0.832

0.821 0.827 0.839 0.816

0.901 0.895 0.892 0.885

0.857 0.842 0.833 0.832

0.596 0.588 0.58 1 0.589

0.925 0.933 0.939 0.936

0.897 0.898 0.894 0.890

0.871 0.877 0.872 0.870

0.672 0.624 0.657 0.655

0.871 0.856 0.894 0.868

0.824 0.810 0.822 0.787

0.806 0.796 0.827 0.795

0.902 0.911 0.910 0.917

0.879 0.884 0.901 0.898

0.768 0.776 0.778 0.790

0.951 0.947 0.949 0.946

0.937 0.933 0.940 0.930

0.935 0.930 0.937 0.926

0.441 0.462 0.408 0.465

0.718 0.766 0.596 0.650

0.481 0.539 0.401 0.450

0.583 0.643 0.474 0.541

0.835 0.833 0.837 0.855

0.795 0.803 0.795 0.812

0.638 0.644 0.643 0.658

0.920 0.924 0.927 0.935

0.878 0.880 0.884 0.888

0.859 0.852 0.857 0.859

0.50 quantile 0.912 0.905 0.861 0.833

0.875 0.859 0.835 0.802

0.90 quantile 0.839 0.871 0.799 0.814

0.742 0.782 0.709 0.737

Note: See note to Table 2. Also, each bootstrap

estimate

is computed

based on 100 repetitions.

M. BuchinskyjJournal

of Econometrics 68 (1995) 303-338

321

are more likely to be closer to each other, consequently leading to a smaller covariance estimate, and (b) larger m tends to increase the covariance matrix estimate since it multiplies the sum in (13). Apparently, the first effect dominates. In summary, the SB estimator performs reasonably well when the independent assumption, under which it is consistent, is satisfied. For data which do not satisfy the independence restriction, it seems to perform poorly. However, the SB estimator is easy to compute - on the order of minutes, significantly less than for the other bootstrap estimators. 6.3. Design matrix bootstrap (DMB) estimator The results for the DMB estimator are summarized in Tables 4 and 5 for sample sizes of 100 and 500 observations, respectively. Panel A of each table reports the results for the quantile regression model, while panel B is for the censored quantile regression model. The first six columns in each panel report the results for the two sample sizes using the original CPS data for the DMBE, DMBA, and DMBP. Similarly, the last six columns report the results for the independent CPS data. The DMB estimators yields empirical levels very close to the 0.95 nominal level, usually within approximately one standard error. The DMBE is the most precise, while the DMBP yields the lowest empirical levels. This suggests that 100 bootstrap repetitions may be insufficient for directly constructing confidence intervals. All three versions of the DMB estimator perform equally well for the original and the independent data, and for both models. This is encouraging as one would like to guard against possible heteroskedasticity, but not at the expense of poor performance when independence is actually satisfied. Another apparent result is the robustness of the DMB estimates to changes in the relative size of bootstrap to data sample size. The empirical levels for DMBE are virtually the same for the two bootstrap sample sizes. The other two estimators yield empirical levels slightly better for the smaller bootstrap sample size. A data sample size of 500 yields even better results (see Table 5). The performance of all three DMB procedures, in both panels of Table 5, is extremely good. At all quantiles and for all coefficients the empirical levels are near the 0.95 nominal level, with the largest deviation being 0.030. All three procedures are insensitive to the bootstrap sample size. This is of signal importance, especially for the censored quantile regression, since using a small bootstrap sample size reduces significantly the computational costs without affecting the covariance matrix estimate. (It takes 10 to 20 seconds to run a quantile regression with 4 variables atid 500 observations.) In simulations with sample sizes larger than 500 observations the DMB methods perform to complete satisfaction. The empirical levels for all the DMB methods are very close to the

M. BuchinskylJournal

322

Table 4 Empirical Original

levels for design matrix

bootstrap,

sample size 100, 1000 repetitions

Independent

CPS sample

sample size 500

100 A

Panel A

68 (1995) 303-338

Monte Carlo

CPS sample

Bootstrap

E

qf Econometrics

P Quantile

100

E

500

A

P

E

A

P

E

A

P

regression

0.10 quantile 0.952 0.960 0.957 0.958

0.952 0.943 0.944 0.949

0.919 0.925 0.928 0.934

0.973 0.959 0.963 0.950

0.966 0.938 0.966 0.937

0.955 0.928 0.948 0.937

0.971 0.977 0.961 0.950

0.951 0.966 0.939 0.929

0.946 0.944 0.930 0.916

0.966 0.964 0.951 0.951

0.947 0.945 0.935 0.930

0.936 0.928 0.921 0.920

0.941 0.953 0.959 0.960

0.965 0.962 0.971 0.944

0.951 0.950 0.952 0.935

0.944 0.946 0.946 0.931

0.957 0.963 0.966 0.961

0.943 0.938 (I.951 0.941 0.954 0.951 0.950 0.944

0.963 0.964 0.952 0.950

0.950 0.954 0.943 0.941

0.944 0.934 0.941 0.933

0.941 0.965 0.905 0.929

0.953 0.955 0.947 0.943

0.929 0.940 0.943 0.942

0.909 0.922 0.907 0.922

0.963 0.955 0.948 0.943

0.949 0.939 0.930 0.927

0.941 0.925 0.929 0.920

0.944 0.949 0.940 0.939

0.929 0.921 0.924 0.929

0.906 0.905 0.902 0.904

0.50 quantile 0.972 0.962 0.967 0.969

0.956 0.943 0.964 0.951

0.90 quantile 0.964 0.970 0.950 0.949

0.946 0.956 0.917 0.940

Panel B - Censored

quantile

regression

0.10 quantile 0.961 0.961 0.931 0.942

0.932 0.952 0.909 0.922

0.934 0.937 0.908 0.912

0.931 0.914 0.891 0.909

0.912 0.891 0.892 0.891

0.875 0.850 0.865 0.860

0.958 0.957 0.961 0.968

0.935 0.940 0.941 0.950

0.936 0.942 0.935 0.938

0.967 0.960 0.975 0.966

0.956 0.947 0.962 0.953

0.942 0.932 0.946 0.945

0.908 0.932 0.972 0.947

0.941 0.932 0.956 0.945

0.906 0.924 0.947 0.925

0.893 0.903 0.927 0.902

0.972 0.966 0.970 0.968

0.959 0.957 0.958 0.958

0.953 0.947 0.957 0.953

0.960 0.951 0.959 0.968

0.951 0.940 0.948 0.959

0.943 0.926 0.941 0.949

0.924 0.934 0.891 0.914

0.959 0.968 0.967 0.972

0.924 0.942 0.925 0.940

0.891 0.915 0.895 0.916

0.982 0.980 0.986 0.989

0.951 0.953 0.956 0.967

0.931 0.923 0.935 0.952

0.968 0.964 0.970 0.979

0.950 0.941 0.948 0.958

0.921 0.905 0.926 0.946

0.50 quantile 0.952 0.945 0.971 0.964

0.931 0.946 0.971 0.956

0.90 quantile 0.968 0.969 0.972 0.962

0.954 0.965 0.914 0.923

Note: See note to Table 2. The estimators

are: E = DMBE,

A = DMBA,

P = DMBP.

M. Buchinsky/Journal

Table 5 Empirical Original

levels for design matrix

bootstrap,

Carlo

sample size 500, 1000 repetitions CPS sample

sample size

100

500 A

Panel A

Monte

323

68 (I 995) 303-338

Independent

CPS sample

Bootstrap

E

qf’ Econometrics

P Quantile

100

E

500

A

P

E

A

P

E

A

P

regression

0.10 quantile 0.942 0.940 0.955 0.953

0.926 0.930 0.940 0.937

0.922 0.937 0.946 0.930

0.956 0.944 0.946 0.950

0.940 0.946 0.930 0.933

0.936 0.925 0.930 0.933

0.954 0.964 0.948 0.941

0.946 0.954 0.938 0.932

0.947 0.949 0.934 0.927

0.941 0.950 0.942 0.940

0.928 0.937 0.931 0.925

0.929 0.933 0.930 0.919

0.938 0.951 0.929 0.946

0.956 0.947 0.953 0.951

0.939 0.927 0.958 0.944

0.921 0.906 0.944 0.944

0.963 0.955 0.945 0.953

0.956 0.947 0.940 0.948

0.947 0.950 0.940 0.945

0.934 0.946 0.946 0.960

0.930 0.938 0.937 0.950

0.919 0.934 0.939 0.948

0.930 0.944 0.931 0.934

0.950 0.963 0.946 0.943

0.944 0.940 0.935 0.937

0.929 0.938 0.933 0.937

0.945 0.942 0.952 0.948

0.934 0.933 0.942 0.939

0.930 0.926 0.942 0.943

0.946 0.932 0.949 0.946

0.927 0.917 0.937 0.934

0.931 0.915 0.934 0.924

0.50 quantile 0.966 0.969 0.940 0.947

0.967 0.969 0.945 0.945

0.90 quantile 0.948 0.953 0.941 0.948

0.938 0.939 0.937 0.945

Panel B - Censored

quantile

regression

0.10 quantile 0.962 0.942 0.959 0.950

0.941 0.942 0.955 0.941

0.937 0.941 0.947 0.942

0.91 I 0.946 0.937 0.925

0.903 0.930 0.931 0.922

0.904 0.939 0.916 0.926

0.952 0.946 0.957 0.956

0.944 0.941 0.944 0.947

0.943 0.941 0.940 0.942

0.948 0.945 0.938 0.936

0.936 0.934 0.922 0.916

0.926 0.925 0.921 0.916

0.927 0.925 0.944 0.961

0.950 0.941 0.944 0.946

0.937 0.916 0.934 0.935

0.931 0.924 0.951 0.944

0.955 0.945 0.939 0.949

0.943 0.937 0.931 0.943

0.942 0.932 0.936 0.939

0.952 0.946 0.941 0.929

0.944 0.934 0.933 0.916

0.939 0.930 0.930 0.916

0.924 0.919 0.848 0.890

0.967 0.971 0.962 0.981

0.957 0.962 0.962 0.981

0.948 0.943 0.943 0.957

0.956 0.971 0.957 0.964

0.936 0.956 0.939 0.954

0.935 0.934 0.911 0.929

0.963 0.971 0.945 0.942

0.940 0.940 0.926 0.912

0.928 0.930 0.905 0.895

0.50 quantile 0.950 0.950 0.962 0.959

0.933 0.932 0.965 0.950

0.90 quantile 0.967 0.971 0.967 0.971

0.938 0.948 0.895 0.910

Note:

See note to Table 4.

324

M. BuchinskylJournal

of Econometrics 68 (1995) 303-338

0.95 nominal level for all coefficients at all quantiles, even for bootstrap sample sizes one-tenth the original sample size. The major disadvantage of the DMB method is its computational cost, which is quite high even for relatively small bootstrap sample sizes. When the independence of x and u0 is obvious, the OS estimator might be preferable. 6.4. Error bootstrapping

(EB) estimator

The results for the EB estimator are reported in Table 6 for sample sizes of 100 and 500 observations using only the independent CPS data. Panels A and B report the results for the quantile and censored quantile regression models, respectively. For the 100 sample size two bootstrap sample sizes of 50 and 100 observations were considered. For the 500 sample size the bootstrap samples employed are of 100 and 500 observations. The results for the original CPS data are omitted since the EB is as costly to compute as the DMB, while it is not consistent. The EB estimator performs quite well at the middle quantile when the bootstrap sample size is small relative to the data sample size. When the bootstrap sample size is large relative to the data sample size, the empirical levels fall well below the 0.95 nominal level, especially at quantiles affected by censoring. The empirical levels at the 0.90 quantile where there is censoring are considerably lower than those at the 0.50 quantile. Note also that the empirical levels are not uniform across the three EB estimators. A feature common to the DMB and EB estimators is that for both the quantile and the censored quantile regression models the EBE estimator seems to perform slightly better than the EBA estimator. Both estimators perform significantly better than the EBP estimator. Apparently, more bootstrap repetitions are needed for the percentile method in order to be able to precisely compute confidence intervals for the parameters. The three EB estimators are less sensitive than the SB estimator to bootstrap sample size, but for similar reasons their performance deteriorates significantly for large bootstrap sample sizes. For large sample sizes this becomes less of a problem for the EB estimator than for the SB estimator. The EB estimator results for a sample of 1000 observations are similar to the 500 sample size and are therefore omitted. Nevertheless, it is worth noting that even for this relatively large sample size, the EB methods tend to perform better for relatively small bootstrap sample sizes. 6.5. Homoskedastic

kernel (HK) estimator

Both a uniform and a normal kernel function are considered for the homoskedastic kernel estimator. For each of these functions one- and two-side kernel estimators forf,,(O) are examined. The results are reported in Table 7 for the

M. BuchinskylJournal

Table 6 Empirical

levels for error bootstrap,

Monte Carlo Bootstrap

sample

sample

independent

Panel A

1000 repetitions

Monte Carlo

size 100

100 A

CPS data,

325

sample size 500

size

50 E

of Econometrics 68 (1995) 303-338

P Quantile

100

E

500

A

P

E

A

P

E

A

P

regression

0.10 quantile 0.924 0.926 0.927 0.931

0.921 0.925 0.924 0.927

0.907 0.915 0.910 0.913

0.890 0.878 0.896 0.898

0.884 0.873 0.890 0.892

0.872 0.872 0.872 0.882

0.944 0.943 0.940 0.945

0.939 0.941 0.937 0.942

0.935 0.933 0.924 0.936

0.891 0.912 0.896 0.908

0.882 0.906 0.890 0.902

0.872 0.900 0.898 0.898

0.938 0.940 0.928 0.932

0.926 0.914 0.920 0.923

0.923 0.912 0.917 0.922

0.913 0.894 0.916 0.914

0.949 0.954 0.938 0.939

0.946 0.950 0.933 0.937

0.943 0.938 0.920 0.933

0.935 0.931 0.918 0.923

0.927 0.920 0.916 0.915

0.935 0.924 0.930 0.915

0.920 0.926 0.932 0.922

0.888 0.864 0.883 0.897

0.881 0.862 0.880 0.891

0.870 0.864 0.878 0.886

0.967 0.977 0.974 0.983

0.959 0.973 0.973 0.983

0.955 0.959 0.959 0.970

0.927 0.938 0.932 0.946

0.91 I 0.904 0.931 0.918 0.928 0.924 0.938 0.930

0.50 quantile 0.943 0.945 0.937 0.943

0.942 0.944 0.935 0.939

0.90 quantile 0.935 0.930 0.933 0.932

0.931 0.926 0.931 0.929

Panel B

Censored

quantile

regression

0.10 quantile 0.921 0.902 0.912 0.928

0.914 0.894 0.909 0.924

0.893 0.872 0.901 0.917

0.865 0.878 0.874 0.882

0.859 0.874 0.870 0.881

0.850 0.860 0.857 0.872

0.944 0.943 0.940 0.945

0.939 0.941 0.937 0.942

0.935 0.933 0.924 0.936

0.891 0.912 0.896 0.908

0.882 0.906 0.890 0.902

0.872 0.900 0.898 0.898

0.927 0.913 0.918 0.926

0.892 0.893 0.880 0.888

0.888 0.886 0.874 0.883

0.881 0.880 0.874 0.877

0.949 0.954 0.938 0.939

0.946 0.950 0.933 0.937

0.943 0.938 0.920 0.933

0.935 0.931 0.918 0.923

0.927 0.920 0.916 0.915

0.935 0.924 0.930 0.915

0.828 0.831 0.850 0.923

0.731 0.745 0.784 0.836

0.728 0.739 0.777 0.833

0.650 0.667 0.708 0.779

0.967 0.977 0.974 0.983

0.959 0.973 0.973 0.983

0.955 0.959 0.959 0.970

0.927 0.938 0.932 0.946

0.911 0.931 0.928 0.938

0.904 0.918 0.924 0.930

0.50 quantile 0.938 0.940 0.935 0.939

0.934 0.936 0.930 0.934

0.90 quantile 0.907 0.926 0.928 0.964

0.900 0.919 0.925 0.963

Note: See note to Table 4. The marginal cumulative distribution function of the error ug is estimated using the Kaplan-Meier procedure. The estimators are: E = EBE, A = EBA, P = EBP.

M. BuchinskyJJournal

326

Table 7 Empirical levels for homoskedastic repetitions Uniform

kernel

One-side

(c,)

0.20

kernel, original

Two-side a”

1.5

Panel A - Quantile

0.20

of Econometrics

(c.)

68 (1995) 303-338

CPS data, Monte Carlo

Normal

kernel

One-side

(c.)

sample size 500, 5000

Two-side

(c.)

1.5

aa

0.20

1.5

a’

0.20

1.5

aa

regression

0.10 quantile 0.808 0.806 0.762 0.723

0.629 0.616 0.585 0.538

0.806 0.807 0.782 0.761

0.900 0.899 0.870 0.829

0.840 0.834 0.800 0.757

0.846 0.844 0.821 0.804

0.754 0.755 0.715 0.676

0.782 6.785 0.742 0.705

0.807 0.811 0.783 0.767

0.899 0.902 0.862 0.843

0.974 0.977 0.960 0.945

0.857 0.861 0.834 0.831

0.851 0.843 0.802 0.800

0.940 0.931 0.897 0.873

0.998 0.996 0.989 0.983

0.891 0.884 0.844 0.845

0.949 0.945 0.919 0.890

1.00 1.00 1.00 1.00

0.853 0.849 0.809 0.812

0.966 0.960 0.937 0.916

1.00 1.00 1.00 1.00

0.883 0.891 0.851 0.850

0.801 0.798 0.756 0.749

0.859 0.896 0.854 0.859

0.914 0.944 0.911 0.917

0.839 0.841 0.795 0.791

0.988 0.996 0.992 0.990

1.00 1.00 1.00 1.00

0.804 0.802 0.772 0.751

0.874 0.912 0.874 0.878

0.998 1.00 0.999 0.999

0.850 0.849 0.816 0.814

0.50 quantile 0.937 0.927 0.892 0.866

1.00 1.00 I.00 I.00

0.90 quantile 0.960 0.976 0.961 0.959

1.00 I.00 1.00 1.00

Panel B

Censored

quantile

regression

0. IO quantile 0.783 0.785 0.783 0.755

0.598 0.599 0.601 0.569

0.808 0.808 0.781 0.763

0.887 0.889 0.879 0.857

0.794 0.801 0.788 0.762

0.845 0.846 0.820 0.802

0.749 0.752 0.713 0.673

0.774 0.779 0.736 0.697

0.809 0.812 0.782 0.769

0.898 0.899 0.860 0.839

0.972 0.974 0.958 0.941

0.860 0.863 0.836 0.834

0.854 0.845 0.803 0.799

0.911 0.905 0.899 0.874

0.995 0.993 0.996 0.991

0.892 0.886 0.843 0.848

0.894 0.885 0.883 0.852

1.00 1.00 1.00 1.00

0.856 0.851 0.810 0.811

0.929 0.926 0.918 0.896

1.00 I.00 1.00 1.00

0.885 0.894 0.850 0.852

0.797 0.793 0.751 0.746

0.960 0.979 0.885 0.933

0.980 0.992 0.937 0.978

0.835 0.834 0.794 0.787

0.872 0.946 0.846 0.910

1.00 1.00 1.00 1.00

0.801 0.803 0.769 0.750

0.598 0.737 0.605 0.736

0.984 0.997 0.969 0.986

0.847 0.846 0.815 0.811

0.50 quantile 0.890 0.884 0.883 0.854

I.00 1.00 1.00 1.00

0.90 quantile 0.979 0.990 0.946 0.978

1.00 1.00 1.00 1.00

Note: See note to Table 2. The formula ’ Bandwidth

is selected automatically,

for the asymptotic

covariance

using the least-squares

matrix is given in Section 3.

cross-validation

method.

M.

BuchinskylJournal

Table 8 Empirical levels for homoskedastic 5000 repetitions Uniform

kernel

One-side

(c,)

0.20

1.5

kernel, independent

Two-side a”

Panel A - Quantile

0.20

of Econometrics

(c.)

68 (1995)

CPS data,

Normal

kernel

One-side

(c.)

303-338

Monte

327

Carlo

sample

Two-side

size 500,

(c.)

1.5

aa

0.20

1.5

aa

0.20

1.5

aa

regression

0.10 quantile 0.807 0.814 0.815 0.808

0.633 0.640 0.638 0.645

0.856 0.856 0.826 0.803

0.904 0.902 0.902 0.900

0.842 0.842 0.847 0.842

0.894 0.886 0.867 0.848

0.775 0.774 0.780 0.782

0.802 0.805 0.810 0.809

0.858 0.860 0.830 0.811

0.908 0.909 0.913 0.912

0.978 0.978 0.981 0.980

0.909 0.905 0.880 0.876

0.900 0.895 0.852 0.844

0.948 0.949 0.939 0.935

0.999 0.998 0.997 0.998

0.947 0.939 0.893 0.894

0.954 0.957 0.953 0.952

1.00 1.00 1.00 1.00

0.902 0.898 0.859 0.858

0.969 0.970 0.968 0.968

1.00 1.00 1.00 1.00

0.939 0.946 0.901 0.899

0.849 0.852 0.807 0.799

0.925 0.932 0.927 0.928

0.967 0.969 0.961 0.959

0.885 0.888 0.843 0.835

0.998 0.999 0.999 0.999

1.00 1.00 1.00 1.00

0.852 0.853 0.823 0.801

0.941 0.941 0.948 0.947

1.00 1.00 1.00 1.00

0.896 0.901 0.866 0.860

0.50 quantile 0.946 0.939 0.937 0.933

1.00 1.00 1.00 1.00

0.90 quantile 0.991 0.993 0.991 0.991

1.00 1.00 1.00 1.00

Panel B

Censored

quantile

regression

0. IO quantile 0.818 0.828 0.822 0.822

0.616 0.616 0.612 0.621

0.856 0.856 0.826 0.803

0.912 0.917 0.908 0.915

0.816 0.831 0.818 0.823

0.890 0.886 0.864 0.843

0.755 0.748 0.769 0.768

0.783 0.779 0.797 0.795

0.859 0.860 0.830 0.810

0.902 0.897 0.910 0.907

0.974 0.973 0.978 0.974

0.909 0.908 0.883 0.877

0.900 0.895 0.852 0.844

0.917 0.914 0.914 0.911

0.996 0.998 0.997 0.998

0.946 0.938 0.893 0.895

0.918 0.923 0.917 0.918

1.00 1.00 1.00 1.00

0.902 0.898 0.859 0.858

0.952 0.956 0.952 0.951

I.00 1.00 1.00 1.00

0.939 0.950 0.898 0.899

0.848 0.846 0.805 0.797

0.851 0.854 0.868 0.879

0.963 0.967 0.960 0.968

0.884 0.884 0.843 0.834

0.990 0.991 0.995 0.997

1.00 1.00 1.00 1.00

0.852 0.853 0.823 0.801

0.924 0.922 0.939 0.940

1.00 1.00 1.00 1.00

0.896 0.899 0.866 0.855

0.50 quantile 0.896 0.891 0.894 0.889

1.00 1.00 1.00 1.00

0.90 quantile 0.960 0.964 0.968 0.974

1.00 1.00 1.00 1.00

Note: See note to Table 7. a Bandwidth

is selected automatically,

using the least-squares

cross-validation

method

328

M. Buchinsky/Journal

of Econometrics 68 (1995) 303-338

original CPS data, and in Table 8 for the independent CPS data. The results for the one-side uniform kernel are reported in the first three columns of each table for bandwidth values of 0.20, 1.5, and also an automatically chosen bandwidth.” The next three columns report similar results for the two-side uniform kernel. The results for the normal kernel are organized similarly in the last six columns. Below a bandwidth of 0.20 the empirical levels are very low, and above 1.5 bandwidth they are always around 1.0. Table 7 shows considerable fluctuation in the empirical levels. These very between a very low level of 0.60 to the highest attainable level of 1.0, for two quantiles using the same bandwidth. Less variation can be detected for the automatic bandwidth, but its empirical levels are well below the 0.95 mark. The two-side kernels yield, in general, empirical levels closer to the 0.95 level than the one-side kernel, with the normal kernel performing better than the uniform kernel. In summary, this estimator performs quite poorly for the original CPS data. Table 8 presents the results for the HK estimator using the independent CPS data with sample size of 500 observations, for the same kernels and bandwidths as in Table 7. For samples smaller than 500 the HK estimator is very sensitive and inaccurate. The resulting empirical levels for the independent data are much better than for the original data, especially for the two-side kernels. The empirical levels are much closer to 0.95, for both the quantile and censored quantile regression models. Some deficiencies do remain though, the most important being that the empirical levels are not uniform across quantiles for any given value of the bandwidth. Data-based choices of c, yield better empirical levels, but they are still consistently lower than the 0.95 level. Furthermore, they are somewhat variable both across quantiles and across coefficients. Note that the performance of the HK estimator for the quantile and censored quantile models are comparable, even at the 0.90 quantile which is affected enormously by censoring. As for the quantile regression model, the two-side kernels perform better than the one-side kernels, with the two-side normal kernel performing the best. At a relatively large bandwidth the estimated standard errors are consistently too large. 6.6. General kernel (GK) estimator

The results for the GK estimator using one- and two-side uniform and normal kernels for the original CPS data are reported in Tables 9 and 10. The results for the independent CPS data are reported in Table 11. The order of the columns is the same as in Table 7.

‘s The automatic man, 1986).

bandwidth

is chosen using the least-squares

cross-validation

method

(e.g., Silver-

M. BuchinskylJournal

of Econometrics 68 (1995) 303-338

329

The GK estimator generally performs better than the HK estimator even for the independent data. This is a major advantage for the GK since it is computationally inexpensive, while at the same time guarding against heteroskedasticity. One may note, however, that the DMB estimator dominates the GK estimator. The results in Table 9 for the quantile regression model are not completely satisfactory. For the one-side kernel estimator, all bandwidths, including the automatic bandwidth, yield empirical levels that are too high (relative to the 0.95 nominal level) at the higher quantiles and too low at the lower quantiles. Much better results are obtained for the two-side kernel estimator. The results for the larger sample are better (see panel B) for all estimators, especially for the two-side kernel with the automatic bandwidth. The results also indicate that distinct bandwidths are required at different quantiles. The results in Table 10 for the censored quantile regression model are consistent with, but less precise than, those in Table 9. The results are similar for the 0.10 and 0.50 quantiles. At the 0.90 quantile, however, the empirical levels drop when using fixed bandwidths, but not for the automatic bandwidth. Also, some variation in the empirical level is noticeable across the parameters at any given quantile. Table 11 reports the results for the censored quantile regression models using the independent data. (The results for the quantile regression are very similar and are therefore omitted.) Clearly, the empirical levels are very sensitive to the bandwidth choices. Small bandwidths yield too low empirical levels, while large bandwidths yield too high values. The automatic bandwidth yields quite reasonable empirical levels especially for the sample of 500 observations. The overall performance of the general kernel estimator, and especially the two-side normal kernel, is reasonably good. These estimates are not as precise as the DMB estimates, but they are much less expensive to compute (taking only a few minutes).

7. On the relative magnitude of the standard errors The results presented do not explicitly indicate the magnitude of differences in the standard errors obtained for the various estimators, but they do provide enough information to extract this information. Suppose the average standard errors for a certain coefficient obtained by two competing methods are se’ and se*, and let [/$ - Z, _or,2sej,fiO+ Z, _,,zsej] be the confidence interval associated with the observed empirical probability level pj (j = 1,2). If the true standard error se0 were known, a pj confidence interval would be [fl, Z (1+p,),2seo,~0 + Zc,+,j,i2se0]. Since, however, the length of the two confidence intervals must be the same, it follows that sei/seo = Z~l+pJ,,2/Z1_a,Z.

M.

330

Table 9 Empirical

levels for general

Uniform

kernel

One-side

(c,)

0.20

Buchinsky/Journal

1.5

kernel, original

Two-side aa

of’ Econometrics

0.20

CPS data, quantile

(c,)

I.5

68 (1995)

u”

303-338

regression,

Normal

kernel

One-side

(c.)

5000 repetitions

Two-side

(c.)

0.20

1.5

,a

0.20

1.5

aa

Panel A - Monte Carlo sample size 100 0.10 quantile 0.805 0.792 0.772 0.737

0.645 0.640 0.643 0.633

0.827 0.821 0.792 0.776

0.834 0.825 0.796 0.760

0.884 0.884 0.854 0.837

0.838 0.838 0.810 0.791

0.787 0.771 0.751 0.719

0.808 0.811 0.785 0.759

0.859 0.845 0.826 0.812

0.884 0.883 0.832 0.793

0.979 0.977 0.974 0.965

0.898 0.890 0.877 0.898

0.951 0.939 0.942 0.920

0.917 0.909 0.913 0.885

0.997 0.995 0.996 0.993

0.927 0.938 0.933 0.934

0.957 0.948 0.952 0.934

1.00 1.00 1.00 1.00

0.934 0.949 0.931 0.954

0.967 0.967 0.948 0.934

1.00 1.00 1.00 1.00

0.971 0.977 0.960 0.933

0.960 0.942 0.934 0.933

0.909 0.883 0.899 0.879

0.951 0.962 0.957 0.957

0.948 0.929 0.925 0.929

0.996 0.988 0.992 0.988

1.00 1.00 1.00 1.00

0.924 0.936 0.929 0.926

0.930 0.922 0.918 0.912

1.00 1.00 1.00 1.00

0.960 0.938 0.944 0.928

0.50 quantile 0.932 0.922 0.921 0.898

1.00 1.00 1.00 1.00

0.90 quantile 0.983 0.970 0.974 0.966

1.00 1.00 1.00 1.00

Panel B

Monte Carlo

sample size 500

0.10 quantile 0.851 0.839 0.817 0.796

0.643 0.625 0.629 0.610

0.903 0.904 0.885 0.843

0.920 0.918 0.882 0.844

0.855 0.860 0.838 0.844

0.945 0.936 0.905 0.874

0.782 0.783 0.788 0.773

0.784 0.788 0.774 0.755

0.906 0.899 0.880 0.849

0.904 0.917 0.879 0.878

0.976 0.976 0.967 0.960

0.963 0.969 0.924 0.905

0.961 0.953 0.968 0.955

0.938 0.930 0.936 0.924

0.998 0.997 0.994 0.991

0.965 0.965 0.974 0.962

0.952 0.941 0.964 0.956

1.00 1.00 1.00 1.00

0.969 0.969 0.977 0.951

0.967 0.965 0.968 0.964

1.00 1.00 1.00 1.00

0.963 0.962 0.969 0.950

0.923 0.937 0.928 0.931

0.947 0.923 0.943 0.930

0.943 0.947 0.956 0.953

0.941 0.963 0.950 0.964

1.00 0.996 0.998 0.993

1.00 1.00 1.00 1.00

0.942 0.930 0.932 0.921

0.944

0.999

0.956

0.923

1.00 0.999 1.00

0.952 0.966 0.959

0.50 quantile 0.940 0.924 0.941 0.930

1.00 1.00 1.00 1.00

0.90 quantile 0.994 0.981 0.989 0.981 Note:

1.00 1.00 1.00 1.00

See note to Table

a Automatic

bandwidth

0.952 0.942

7. chosen

using the least-squares

cross-validation

method

M. BuchinskyiJournal

of’ Econometrics

Table IO Empirical levels for general kernel, original Uniform

kernel

One-side

(c.)

0.20

1.5

Panel A

Two-side an

Monte Carlo

0.20

CPS data, censored

(c,)

1.5

us

331

68 (1995) 303-338

quantile

Normal

kernel

One-side

(c,)

regression,

5000 repetitions

Two-side

(c.)

0.20

1.5

a”

0.20

1.5

(la

sample size 100

0.10 quantile 0.787 0.770 0.771 0.747

0.825 0.823 0.793 0.780

0.829 0.817 0.800 0.763

0.858 0.862 0.849 0.838

0.841 0.839 0.808 0.794

0.757 0.758 0.731 0.714

0.797 0.810 0.768 0.750

0.857 0.842 0.823 0.813

0.866 0.866 0.825 0.778

0.979 0.979 0.973 0.961

0.899 0.886 0.876 0.897

0.954 0.940 0.943 0.923

0.866 0.853 0.891 0.866

0.997 0.995 0.997 0.991

0.928 0.937 0.934 0.936

0.901 0.882 0.925 0.908

1.00 1.00 1.00 1.00

0.937 0.951 0.933 0.955

0.932 0.919 0.940 0.925

1.00 1.00 1.00 1.00

0.975 0.979 0.961 0.935

0.950 0.931 0.925 0.921

0.752 0.742 0.791 0.810

0.920 0.931 0.935 0.949

0.938 0.924 0.919 0.923

0.931 0.911 0.966 0.960

1.00 1.00 1.00 0.999

0.914 0.928 0.922 0.917

0.760 0.779 0.841 0.849

0.998 0.997 1.00 0.999

0.952 0.928 0.935 0.922

Monte Carlo

sample

0.631 0.619 0.645 0.648

0.50 quantile 0.857 0.838 0.897 0.880

1.00 1.00 1.00 1.00

0.90 quantile 0.855 0.846 0.900 0.912

1.00 1.00 1.00 1.00

Panel B

size 500

0. IO quantile 0.847 0.849 0.830 0.813

0.604 0.605 0.635 0.624

0.901 0.906 0.886 0.847

0.921 0.920 0.882 0.854

0.798 0.821 0.838 0.850

0.948 0.937 0.903 0.877

0.772 0.777 0.764 0.768

0.783 0.783 0.755 0.744

0.904 0.896 0.877 0.850

0.899 0.907 0.868 0.867

0.969 0.972 0.963 0.947

0.964 0.965 0.923 0.904

0.964 0.954 0.969 0.958

0.881 0.874 0.929 0.915

0.994 .0.966 0.992 0.964 0.997 0.975 0.994 0.964

0.900 0.871 0.948 0.933

1.00 1.00 1.00 1.00

0.972 0.971 0.979 0.952

0.928 0.921 0.958 0.950

1.00 1.00 1.00 1.00

0.967 0.964 0.970 0.952

0.913 0.926 0.919 0.919

0.793 0.809 0.758 0.781

0.887 0.926 0.789 0.858

0.897 0.914 0.939 0.953

1.00 1.00 1.00 1.00

0.932 0.922 0.925 0.912

0.616 0.7 13 0.697 0.796

0.982 0.996 0.972 0.985

0.948 0.942 0.957 0.953

0.50 quantile 0.867 0.845 0.927 0.913

1.00 1.00 1.00 1.00

0.90 quantile 0.902 0.896 0.881 0.895 Note:

1.00 1.00 1.00 1.00

See note to Table

a Automatic

bandwidth

0.931 0.958 0.944 0.958

7. chosen

using the least-squares

cross-validation

method.

M. BuchinskylJournal

332

Table 11 Empirical repetitions

levels for general

Uniform

kernel

One-side

(c,)

0.20

1.5

kernel,

Two-side aa

Panel A - Monte Carlo

0.20

of Econometrics

independent

(c,)

I.5

aa

CPS

68 (1995) 303-338

data,

censored

Normal

kernel

One-side

(c.)

quantile

regression,

Two-side

5000

(c.)

0.20

1.5

aa

0.20

1.5

aa

sample size 100

0.10 quantile 0.812 0.808 0.839 0.828

0.640 0.648 0.655 0.672

0.829 0.828 0.794 0.782

0.827 0.816 0.850 0.829

0.860 0.861 0.890 0.888

0.844 0.840 0.813 0.796

0.748 0.733 0.781 0.773

0.787 0.783 0.802 0.799

0.862 0.847 0.824 0.813

0.858 0.844 0.893 0.879

0.976 0.979 0.978 0.985

0.901 0.888 0.877 0.897

0.958 0.942 0.947 0.926

0.870 0.859 0.903 0.900

0.998 0.998 0.997 0.998

0.931 0.942 0.934 0.939

0.908 0.897 0.939 0.930

1.00 1.00 1.00 1.00

0.938 0.954 0.934 0.958

0.940 0.939 0.956 0.954

1.00 1.00 1.00 1.00

0.977 0.981 0.962 0.936

0.953 0.935 0.925 0.922

0.790 0.770 0.868 0.854

0.952 0.944 0.982 0.982

0.942 0.926 0.919 0.925

0.946 0.932 0.985 0.981

1.00 1.00 I.00 1.00

0.915 0.932 0.924 0.919

0.855 0.838 0.926 0.923

0.999 0.999 1.00 1.00

0.953 0.932 0.938 0.925

0.50 quantile 0.886 0.876 0.918 0.914

1.00 1.00 1.00 1.00

0.90 quantile 0.927 0.907 0.973 0.973

1.00 0.999 1.00 1.00

Panel B - Monte Carlo

sample size 500

0.10 quantile 0.866 0.861 0.862 0.853

0.625 0.617 0.619 0.628

0.902 0.909 0.889 0.850

0.918 0.918 0.923 0.908

0.802 0.803 0.847 0.856

0.951 0.941 0.903 0.879

0.789 0.769 0.802 0.796

0.789 0.785 0.795 0.793

0.906 0.900 0.880 0.854

0.904 0.904 0.911 0.910

0.976 0.975 0.972 0.972

0.968 0.966 0.927 0.905

0.966 0.955 0.970 0.961

0.895 0.877 0.922 0.920

0.997 0.998 0.997 0.999

0.971 0.966 0.977 0.966

0.901 0.896 0.936 0.934

1.00 1.00 1.00 1.00

0.976 0.975 0.982 0.954

0.939 0.938 0.954 0.960

1.00 1.00 1.00 1.00

0.969 0.964 0.972 0.953

0.914 0.930 0.920 0.919

0.883 0.873 0.890 0.888

0.968 0.970 0.966 0.976

0.935 0.959 0.945 0.960

0.989 0.983 0.993 0.995

1.00 1.00 1.00 1.00

0.937 0.925 0.927 0.915

0.934 0.927 0.935 0.937

1.00 1.00 1.00 1.00

0.952 0.944 0.957 0.954

0.50 quantile 0.877 0.866 0.918 0.920

1.00 1.00 1.00 1.00

0.90 q&tile 0.973 0.964 0.975 0.976

1.00 1.00 1.00 1.00

Note: See note to Table 7. a Automatic

bandwidth

chosen

using the least-squares

cross-validation

method.

M.

Table 12 Representative

BuchinskylJournal

standard

errors

Empirical

level

of Econometrics

for observed

empirical

68 (1995)

333

303-338

levels and Monte

Carlo

sample

repetitions

Repetitions

0.600

0.650

0.700

0.750

0.800

0.850

0.875

0.900

0.925

0.950

0.975

1000 2000 3000 4000 5000

0.015 0.011 0.009 0.008 0.007

0.015 0.011 0.009 0.008 0.007

0.014 0.010 0.008 0.007 0.006

0.014 0.010 0.008 0.007 0.006

0.013 0.009 0.007 0.006 0.006

0.011 0.008 0.007 0.006 0.005

0.010 0.007 0.006 0.005 0.005

0.009 0.007 0.005 0.005 0.004

0.008 0.006 0.005 0.004 0.004

0.007 0.005 0.004 0.003 0.003

0.005 0.003 0.003 0.002 0.002

Nore: The standard errors are computed number of Monte Carlo repetitions.

Table 13 Implied standard

error ratios Z ,I

Z ,I +P’)iz

P2

0.842 1.037 1.282 1.645

0.80 0.85 0.90 0.95

Note:

for observed

+p’)iz

=

p’ =

according

empirical

to: G(p)

= ,,/E,

where Nut is the

levels

1.037 0.85

1.282 0.90

1.645 0.95

1.960 0.975

23.2% 0.0%

44.6% 17.4% 0.0%

95.4% 58.6% 28.3% 0.0%

132.8% 89.0% 52.9% 19.1%

The numbers in the table are the ratio between the columns and rows.

se1 to se’ for the empirical

levels reported

in

Hence the ratio of the average standard errors for any two methods is se’ 2= se

Z(r +pijj2 -% +pz)i2

(231

As Table 13 illustrates the low empirical levels for the SB and EB methods imply that their standard errors are four to five times smaller than those for the DMB and GK methods. This disparity suggests that one should be cautious in using any of the above estimators in empirical applications. In particular, comparing alternative standard error estimates seem to be desirable. In order to evaluate whether the differences reflect actual disparities among the ‘population’ matrices or are the consequence of distinct small sample properties, the population covariance matrices at the 0.10, 0.50, and 0.90 quantiles have been computed. For the independent CPS data estimates using all estimators are computed, while for the original CPS data only the estimates for the DMB and GK estimators are computed. The results for the two data sets

334

Table 14 Population

M. BuchinskyfJournal

standard

Panel A - Uniform

errors

of Econometrics 68 (1995) 303-338

for the independent

kernel

CPS data Panel B

Homoskedastic

General

1

2

1

2.302 0.154 0.113 0.003

normal

kernel

Homoskedastic

General

2

1

2

1

2

1.958 0.130 0.107 0.003

2.447 0.165 0.119 0.003

1.818 0.122 0.089 0.002

2.229 0.149 0.110 0.002

1.827 0.123 0.101 0.003

2.331 0.157 0.119 0.003

1.139 0.076 0.056 0.001

1.072 0.072 0.063 0.002

1.239 0.084 0.064 0.002

1.113 0.075 0.055 0.001

1.160 0.078 0.057 0.001

1.098 0.072 0.064 0.002

1.159 0.078 0.065 0.002

1.248 0.084 0.06 1 0.001

2.366 0.129 0.101 0.002

1.760 0.095 0.083 0.002

1.716 0.115 0.084 0.002

1.234 0.083 0.061 0.001

2.813 0.152 0.115 0.002

1.603 0.090 0.08 1 0.002

0.10 quantile 1.982 0.133 0.097 0.002 0.50 quantile 1.105 0.074 0.054 0.001 0.90 quantile 1.508 0.101 0.074 0.002

Panel C - Other estimators

SB (4 0.10

0s (do

DMB

EB

2.0

0.10

0.01

E

A

E

A

2.383 0.160 0.117 0.003

2.392 0.160 0.118 0.003

2.366 0.159 0.116 0.003

2.883 0.185 0.146 0.004

2.397 0.169 0.119 0.003

2.337 0.165 0.117 0.003

2.308 0.165 0.116 0.003

1.143 0.077 0.056 0.001

1.054 0.071 0.052 0.001

1.060 0.07 1 0.052 0.001

1.205 0.095 0.067 0.002

1.105 0.080 0.067 0.002

1.034 0.075 0.055 0.001

1.033 0.075 0.055 0.001

1.065 0.071 0.052 0.001

1.052 0.07 1 0.052 0.001

1.151 0.077 0.057 0.001

1.568 0.099 0.065 0.002

1.460 0.087 0.065 0.001

1.182 0.086 0.063 0.001

1.180 0.086 0.063 0.001

0.10 quantile 2.23 I 0.150 0.110 0.002 0.50 quantile 1.064 0.07 1 0.052 0.001 0.90 quantile 1.314 0.088 0.065 0.001

Note: The population is described in Section 3. The numbers 1 and 2 for the kernel estimators refer to one- and two-side kernels with data-based choice for the bandwidth. m denotes the ratio between the bootstrap and sample sizes. The DMB, EB, and SB use bootstrap sample sizes of 10,000 observations and 100 repetitions. For OS estimator, CI= 0.10, 0.01 correspond to Z, _o,z = 1.645, 2.576.

M. BuchinskylJournal

Table 15 Population General

standard uniform

One-side

errors

335

of Econometrics 68 (1995) 303-338

for the original

CPS data

kernel

General

normal

kernel

DMB

Two-side

One-side

Two-side

E

A

2.360 0.148 0.131 0.003

2.249 0.157 0.125 0.003

2.43 1 0.162 0.124 0.003

2.535 0.171 0.136 0.003

2.151 0.146 0.131 0.003

1.112 0.078 0.062 0.002

1.215 0.081 0.067 0.002

1.201 0.082 0.062 0.002

1.202 0.090 0.065 0.002

1.202 0.090 0.065 0.002

1.761 0.092 0.088 0.002

2.012 0.107 0.092 0.002

1.852 0.100 0.085 0.002

2.352 0.132 0.093 0.002

1.638 0.094 0.078 0.002

0.10 quantile 2.503 0.166 0.150 0.004 0.50 quantile 1.203 0.079 0.069 0.002 0.90 quantile 2.029 0.103 0.098 0.002

Note: See note to Table

14.

are reported in Tables 14 and 15, respectively.” The bootstrap DMB, EB, and SB estimates utilize 100 bootstrap repetitions. For the first two estimates the bootstrap sample size is 10,000, while for the SB several bootstrap sample sizes are considered. Table 14 shows that the standard errors for the various methods are quite similar. In particular, the two-side normal kernel estimates (see panel B) are close to the DMB estimates (see panel C). The OS, and EB estimators yield estimates which are similar to the DMB estimate at the middle quantile, but less at the extreme quantiles. Moreover, the SB estimator is not sensitive to the bootstrap sample size, and the OS is little affected by the choice of CL Table 15 for the original CPS shows that the standard errors obtained by all three methods are similar, although the DMB estimates tend to be larger at the extreme quantiles. As expected, the estimates are in general larger than for the independent CPS data.

r9A similar exercise for the censored quantile regression reported here for the quantile regression model.

model

yielded

results

similar

to those

336

M. BuchinskylJournal

of Econometrics 68 (1995) 303-338

Comparison of the standard errors in Tables 14 and 15 indicates that differences in the performance of the various estimators can be mostly attributed to distinct small sample properties.

8.

M. BuchinskylJournal

of Econometrics 68 (1995) 303-338

337

activity working or a job; they are self-employed; and their usual earnings (usual earnings divided usual weekly must be than $1 less than Four variables used in simulations: usual earnings, education as: last attended minus minus another if last has not completed), and experience (defined min(age age - 6)). programs are in MATLAB. random number is that by MATLAB MATLAB User’s pp. 3-158). computations all tables run consecutively. seed number the first in the table was to zero.

References Barrodale, 1. and F. Roberts, 1973, An improved algorithm for discrete I, linear approximation, SIAM Journal of Numerical Analysis IO, 8399848. Bickel, P., 1973, On some analogues to linear combinations of order statistics in the linear model, Annals of Statistics 4, 597-616. Bickel, P. and D. Freedman, 1981. Some asymptotic theory of the bootstrap, Annals of Statistics 9, 119661217. Bloch. D.A. and J.L. Gastwirth, 1968, On a simple estimate of the reciprocal of the density function. Annals of Mathematical Statistics 39, 108331085. Bofinger, E., 1975, Estimation of density function using order statistics, Australian Journal of Statistics 17. 1-7. Buchinsky, M., 1991, Methodological issues in quantile regression, Ch. 1 in: The theory and practice of quantile regression, Ph.D. dissertation (Harvard University, Cambridge, MA). Buchinsky, M., 1992, Bootstrapping quantile regression model, Unpublished manuscript (Yale University, New Haven, CT). Buchinsky, M., 1994, Changes in the U.S. wage structure 196331987: Application of quantile regression, Econometrica 62, 405458. Carroll, R. and D. Ruppert, 1984, Power transformations when fitting theoretical models to data. Journal of the American Statistical Association 79, 321-328. Chamberlain, G., 1994, Quantile regression, censoring and the structure of wage, in: C. Sims and J.J. Laffont, eds., Proceedings of the sixth world congress of the Econometric Society. Barcelona, Spain (Cambridge University Press, New York, NY). Dielman, T. and R. Pfaffenberger, 1986, Bootstrapping in least absolute value regression: An application to hypothesis testing, American Statistical Association, Proceedings to the Business and Statistics Section, 6288630. Dielman, T. and R. Pfaffenberger, 1988, Bootstrapping in least absolute value regression: An application to hypothesis testing, Communications in Statistics B, Simulation and Computation 17,843-856. Efron, B., 1979, Bootstrap methods: Another look at the jackknife, Annals of Statistics 7, l-26. Efron, B., 1982, The jackknife, the bootstrap and other resampling plans (Society for Industrial and Applied Mathematics, Philadelphia, PA). Freedman, D., 1981, Bootstrapping regression models, Annals of Statistics 9, 121881228. Hahn, J., 1992, Bootstrapping quantile regression models, Unpublished manuscript (Harvard University, Cambridge. MA).

338

M. Buchinsky/Journal

of Econometrics 68 (1995) 303-338

Hall, P. and S.J. Sheather, 1988, On the distribution of a studentized quantile, Journal of the Royal Statistical Society B 50, 381-391. Huber, P., 1967, The behavior of maximum likelihood estimates under nonstandard conditions, Proceedings of the Fifth Berkeley Symposium 4,221-223. Huber, P., 1981, Robust statistics (Wiley, New York, NY). Kaplan, E. and P. Meier, 1958, Nonparametric estimation from incomplete observations, Journal of the American Statistical Association 53, 457-481. Koenker, R. and G. Bassett, 1978, Regression quantiles, Econometrica 46, 33-50. Koenker, R. and G. Bassett, 1982a, Robust tests for heteroskedasticity based on regression quantiles, Econometrica 50, 43-61. Koenker, R. and G. Bassett, 1982b, An empirical quantile function for linear models with iid errors, Journal of the American Statistical Association 77, 4077415. Koenker, R. and V. D’orey, 1987, Computing regression quantiles, Journal of the Royal Statistical Society, Applied Statistics 36, 383-393. Koenker, R. and B.J. Park, 1993, An interior point algorithm for nonlinear quantile regression, Unpublished manuscript (University of Illinois, Urbana-Champaign, IL). Lo, S.-H., 1989, On some representation of the bootstrap, Probability Theory and Related Fields 82, 411-418. Newey, W. and J. Powell, 1990, Efficient estimation of linear and type I censored regression models under conditional quantile restrictions, Econometric Theory 6, 295-317. Osborne, M.R. and G.A. Watson, 1971, On an algorithm for discrete nonlinear I, approximation, Computer Journal 14, 1844188. Portnoy, S. and R. Koenker, 1989, Adaptive L-estimation for linear models, Annals of Statistics 17, 362-381. Powell, J., 1984, Least absolute deviation estimation for the censored regression model, Journal of Econometrics 25, 3033325. Powell, J., 1986, Censored regression quantiles, Journal of Econometrics 32, 1433155. Sheather, S.J. and J.S. Maritz, 1983, An estimate of the asymptotic standard error of the sample median, Australian Journal of Statistics 25, 109-122. Siddiqui, M.M., 1960, Distribution of quantiles in samples from a bivariate population, Journal of Research of the National Bureau of Standards 64 B, 1455150. Silverman B., 1986, Density estimation for statistics and data analysis (Chapman and Hall, New York, NY). Singh, K., 1981, On the asymptotic accuracy of Efron’s bootstrap, Annals of Statistics 9, 118771195. Stangenhaus, G., 1987, Bootstrap and inference procedure for I, regression, in: Y. Dodge, ed., Statistical data analysis based on /,-norm and related methods (North-Holland, New York, NY). Yang, S.-S., 1985, On bootstrapping a class of differentiable statistical functionals with applications to I- and M-estimates, Statistica Neerlandica 39, 375-385. Womersley, R.S., 1986, Censored discrete linear I, approximation, SIAM Journal of Scientific and Statistical Computing 7, 1055122.