
Estimation and inference in partially linear models with smoothing splines

Michael G. Schimek¹

Institute for Medical Informatics, Statistics and Documentation, Karl-Franzens-University Graz, Engelgasse 13, A-8010 Graz, Austria

Abstract

A new estimation concept for partially linear models based on smoothing splines has recently been introduced in Eubank et al. (Comput. Statist. Data Anal. 29 (1998) 27). It is based on Speckman's (J. Roy. Statist. Soc. Ser. B 50 (1988) 413) approach. Here, we describe cheap direct algorithms for this approach as well as for the well-known approach of Green et al. (J. Roy. Statist. Soc. Ser. B 47 (1985) 294). The smoothing parameter can be selected via an unbiased risk criterion. This requires the estimation of the model variance and the evaluation of the hat matrix. Standard errors of the parametric coefficients can be obtained from the estimated covariance matrix. Inference is discussed for both the parametric and the nonparametric part of the model. The time demand of all algorithms is only linear. They can be executed from the statistical and graphical environment S-Plus. Under correlation between the parametric and the nonparametric part of the model, which is quite common in practice, the approach due to Green et al. (1985) is known to be asymptotically biased. Apart from the bias, correlation can cause estimation problems. Hence we study and compare the small sample performance of both approaches. For this purpose a simulation study is performed. We find that both approaches work well for reasonable signal-to-noise ratios and sample sizes, even under correlation. But for inferential purposes we suggest using the Speckman estimates. © 2000 Elsevier Science B.V. All rights reserved.

AMS classifications: primary 62G07; secondary 62E25

Keywords: Algorithms; Asymptotic bias; Correlation; Partially linear model; Semiparametric regression; Simulations; Smoothing spline; S-Plus; Unbiased risk

E-mail address: [email protected] (M.G. Schimek).

¹ Thanks go to Randy Eubank for great hospitality and stimulating conversations during the author's visit to the Department of Statistics, Texas A&M University at College Station, in February 1996. This research was supported in part under a partnership contract between Texas A&M University at College Station and Karl-Franzens-University Graz.

1. Introduction

We consider a semiparametric regression model with a predictor function consisting of a parametric linear component and a nonparametric non-linear component involving


M.G. Schimek / Journal of Statistical Planning and Inference 91 (2000) 525–540

an additional predictor variable. Many approaches based on different smoothing techniques have been proposed: Green et al. (1985), Wahba (1990), and Green and Silverman (1994) applied smoothing splines, Speckman (1988) and Robinson (1988) adopted smoothing kernels and Chen (1988) piecewise polynomials, to name the most important. Here we follow the tradition of smoothing splines, adapting a proposal of Speckman (1988). Further, we reconsider the roughness penalty approach of Green and Silverman (1994) based on Green et al. (1985).

Suppose that responses y_1, ..., y_n have been obtained at non-stochastic values t_1, ..., t_n of a predictor variable t. The response and predictor values are connected by

    y_i = u_i^T β + f(t_i) + ε_i    (1.1)

for i = 1, ..., n, where u_1, ..., u_n are known k-dimensional vectors, β is a k-dimensional unknown parameter (coefficient) vector, f ∈ C²[0, 1] is an unknown smooth function, and the ε_1, ..., ε_n are independent, zero mean random variables with a common variance σ². Further it is assumed that 0 ≤ t_1 < ··· < t_n ≤ 1 (i.e. no ties among the values of the predictor t). Eq. (1.1) in vector–matrix form is

    y = Uβ + f + ε,    (1.2)

where y = (y_1, ..., y_n)^T, U^T = [u_1, ..., u_n], f = (f(t_1), ..., f(t_n))^T and ε = (ε_1, ..., ε_n)^T. The goal is to efficiently estimate the parameter vector β, the function f, and the mean vector μ = Uβ + f.

2. Estimation based on smoothing splines

We take advantage of the fact that the cubic smoothing spline is a linear estimator. The fitted values for data z = (z_1, ..., z_n)^T are of the form

    ĝ_λ = (ĝ_λ(t_1), ..., ĝ_λ(t_n))^T = S_λ z,    (2.3)

where ĝ_λ is a natural cubic spline with knots at t_1, ..., t_n for a fixed smoothing parameter λ > 0, and S_λ a known positive-definite (symmetric) smoother matrix that depends on λ. For smoothing spline-based estimation in the partially linear model a solution can be obtained by minimizing the sum of squares equation

    SS(b, g) = Σ_{i=1}^n (y_i − u_i^T b − g(t_i))² + λ ∫_a^b [g″(t)]² dt    (2.4)

over b ∈ R^k and g ∈ C²[0, 1]. The resulting estimator is called a partial spline (see Wahba, 1990, Chapter 6). For a prespecified value of λ the corresponding estimators for f, β and μ based on Eq. (1.2) can be obtained by (subscript p denotes partial)

    β̂_p = (Ũ^T U)^{-1} Ũ^T y,    f̂_p = S_λ(y − U β̂_p)

M.G. Schimek / Journal of Statistical Planning and Inference 91 (2000) 525–540

527

and μ̂_p = f̂_p + U β̂_p = H_p y for

    H_p = S_λ + Ũ(Ũ^T U)^{-1} Ũ^T

and Ũ = (I − S_λ)U with S_λ introduced in (2.3). If Ũ is not of full rank we interpret (Ũ^T U)^{-1} as a generalized inverse.

The roughness penalty approach of Green and Silverman (1994, Chapter 4) is exactly Eq. (2.4). One of their two estimation concepts is based on the iterative solution of the normal equations. They suggest using the (Gauss–Seidel) backfitting algorithm. Although backfitting has an implementation in S-Plus, it is not straightforward to calculate a solution for partially linear models. Even more important, iterative numerical methods can be avoided totally. Below, we present an alternative to the second concept of Green and Silverman (1994), a direct method with superior numerical features.

Rice (1986) demonstrated that the partial spline estimator is generally biased for the optimal choice of λ when the components of U depend on t. This asymptotic bias can be larger than the standard error. Applying results due to Speckman (1988) this bias can be substantially reduced. The respective estimators (subscript s denotes Speckman) are

s = (U˜ U˜ )−1 U˜ (I − S )y;

fs = S (y − U s ) and s = fs + U s = Hs y for T T Hs = S + U˜ (U˜ U˜ )−1 U˜ (I − S )

and U˜ = (I − S )U T with S from (3). If U˜ is rank de cient we interpret (U˜ U˜ )−1 as a generalized inverse. As can be seen, the estimates are obtained by regression on partial residuals. While in the Speckman approach an estimator of is constructed after removing the in uence of t (i.e. the nonparametric predictor) from both the ui and y, the partial spline approach removes t-information only from the ui . In Speckman (1988) a Nadaraya–Watson kernel estimator was studied. The two estimation concepts are dierent, independent of the linear smoother used (see Schimek, 1997, p. 182f). In the following, when we mention Speckman’s approach we refer to our smoothing spline version of it.
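In matrix form both concepts are a few lines of linear algebra. The following sketch (illustrative Python/NumPy, not the paper's O(n) FORTRAN implementation; any symmetric smoother matrix S may stand in for S_λ) contrasts the two estimators:

```python
import numpy as np

def partial_spline_fit(y, U, S):
    """Partial spline estimates: t-information removed from the u_i only."""
    Ut = (np.eye(len(y)) - S) @ U               # U-tilde = (I - S)U
    beta = np.linalg.solve(Ut.T @ U, Ut.T @ y)  # (Ut^T U)^{-1} Ut^T y
    f = S @ (y - U @ beta)                      # f-hat = S(y - U beta)
    return beta, f

def speckman_fit(y, U, S):
    """Speckman-type estimates: regression on partial residuals."""
    I = np.eye(len(y))
    Ut = (I - S) @ U                            # U-tilde = (I - S)U
    # OLS of y-tilde = (I - S)y on U-tilde
    beta, *_ = np.linalg.lstsq(Ut, (I - S) @ y, rcond=None)
    f = S @ (y - U @ beta)
    return beta, f
```

With U stochastically independent of t both estimators recover β at the parametric rate; the difference between the two concepts shows up when the columns of U depend on t.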


3. Smoothing parameter choice

For the evaluation of a partially linear model based on smoothing splines we have to select a smoothing parameter. A data-driven choice for the smoothing parameter via generalized cross-validation is feasible because the first-order properties of the cross-validation function are basically not changed by the addition of a parametric term to the model (Speckman, 1988, p. 423f). Here, we prefer an unbiased risk estimator (UBR). The smoothing parameter λ can be obtained by minimizing the criterion

    R(λ) = n^{-1}(||y − μ̂||² + 2σ² tr(H_λ))

with || · || the Euclidean norm. When we adopt Speckman's approach the calculation of tr(H_s) instead of tr(H_λ) is required. In analogy, for the partial spline case we have to compute tr(H_p). tr(H_s) and tr(H_p) can be calculated in O(n) steps, hence the smoothing parameter can be obtained in linear time.

4. The algorithms and S-Plus realization

The goal is the efficient calculation of all estimators as well as the trace of S_λ, hence of H_λ. The idea is to apply De Boor's (1978) algorithm for the cubic spline fit, Hutchinson's and de Hoog's (1985) algorithm for the computation of the trace of S_λ (all in O(n) steps), and to make extensive use of Cholesky decomposition.

Now, let us first consider Speckman's estimation concept for smoothing splines. The quantity β̂_s can be calculated by first transforming y and U to ỹ = (I − S_λ)y and Ũ = (I − S_λ)U. Then an ordinary least-squares regression of ỹ on Ũ is carried out with the coefficient vector β̂_s. Cholesky decomposition is used to factorize Ũ^T Ũ as T^T T for T, an upper triangular matrix (T can be used for other computations too). An alternative would be singular value decomposition. Having β̂_s it is easy to calculate f̂_s and μ̂_s. Let us finally consider the trace of H_s,

    tr(H_s) = tr(S_λ) + tr((Ũ^T Ũ)^{-1} Ũ^T (I − S_λ)Ũ).

The trace of S_λ is evaluated as mentioned above. For the remaining trace Ũ is transformed to (I − S_λ)Ũ and then k (because of k unknown coefficients) successive regressions of the columns of (I − S_λ)Ũ on Ũ are carried out. The k × k matrix solution B is obtained from the normal equation system

    Ũ^T Ũ B = Ũ^T (I − S_λ)Ũ,

where B = {b_ij} yields the desired trace b_11 + ··· + b_kk. All these quantities necessary for Speckman's approach can be computed in linear time.

Similar considerations apply to the partial spline concept of Green and Silverman (1994). Our direct (noniterative) solution is asymptotically equivalent. An additional

complication is Ũ^T U = U^T(I − S_λ)U in the matrix system. Again this expression can be factorized by means of Cholesky decomposition. Different from Speckman's approach, β̂_p and tr(H_p) cannot be evaluated using the ordinary least-squares technique, but computation in linear time is still feasible.

All algorithms for both approaches are implemented in FORTRAN subroutines, which can be executed in the statistical and graphical environment S-Plus for Windows. This requires compilation with the Watcom FORTRAN 77 compiler, version 10.6, for compatibility reasons. A direct implementation in the S language is possible in principle. But S does not support the manipulation of large (patterned) matrix systems and would tremendously slow down the algorithms or even demand the use of less effective alternatives.² Hence, we have written an S-Plus function plm which handles input and output to a dynamically loaded FORTRAN subroutine, itself making calls to a large number of external subroutines. For the analysis of reasonably large data sets (i.e. hundreds of observations), apart from adequate processor power, sufficient memory is needed (at least 128 MB RAM).

² Even in the S-Plus system similar functionality is frequently performed externally (C or FORTRAN source codes).

5. Variance and covariance estimation

Estimates of the model variance and covariances are necessary for a number of purposes:
• Calculation of the unbiased risk criterion (UBR).
• Inference about the model parameters.
• Inference about the nonparametric function.

Here, we suggest applying an adaptation of a difference-based variance estimator due to Gasser et al. (1986). This estimator has been successfully applied to kernel and smoothing spline regression, in the latter setting even for time-dependent data (see Schimek and Schmaranz, 1994). The Gasser et al. estimation technique is based on pseudo-residuals. The estimator is defined by

    σ̃² = (y^T A^T A y) / tr(A^T A),

where the (n − 2) × n matrix A consists of the following elements: the entries of each row are all zero but the ith with a_i c_i, the (i + 1)th with −c_i, and the (i + 2)th with b_i c_i, where

    a_i = (t_{i+1} − t_i)/(t_{i+1} − t_{i−1}),    b_i = (t_i − t_{i−1})/(t_{i+1} − t_{i−1})

and

    c_i = (a_i² + b_i² + 1)^{−1/2}


for i = 2, ..., n − 1. This technique substantially removes the influence of the underlying regression function on the estimator. Further it yields a √n-consistent estimator for the model variance σ², which holds as long as β = 0. In partially linear models we have not only a nonparametric part (i.e. function f) but also a parametric part (i.e. U) with β ≠ 0. Hence we have to modify Gasser et al.'s variance estimation technique. To account for the parametric U term we introduce the following modification:

    σ̂² = (y^T A^T (I − P)A y) / tr(A^T (I − P)A)    (5.5)

with

    P = AU(U^T A^T A U)^{-1} U^T A^T.

This estimator has little bias and is √n consistent as in the standard situation. The proof is given in Eubank et al. (1998, p. 33). The obtained result does not depend on the approach taken (partial or Speckman). This is specifically important for their comparison (see simulations in Section 7).

The calculation of σ̂² can be implemented as follows: y and U are transformed to Ay and AU. The numerator of (5.5) is the residual sum of squares obtained from regressing Ay on AU. For the denominator of (5.5) we have

    tr(A^T (I − P)A) = n − 2 − tr(PAA^T)

and by repeated regression fits of AU to the columns of AA^T the trace of PAA^T can be evaluated. It is the solution B, a k × (n − 2) matrix, of

    (U^T A^T A U)B = U^T A^T A A^T.

The sum of the diagonal elements of AUB forms the trace. It is important to note that all this can be done in O(n) operations.

Let us now focus on the variance–covariance matrices for the estimators β̂_p and β̂_s. They are

    V_p = σ²(Ũ^T U)^{-1} Ũ^T Ũ (U^T Ũ)^{-1}

and

    V_s = σ²(Ũ^T Ũ)^{-1} Ũ^T (I − S_λ)² Ũ (Ũ^T Ũ)^{-1}.

For Speckman's approach the variance–covariance matrix can be computed by solving the k × n linear system

    (Ũ^T Ũ)B = Ũ^T.

Then B̃ = (I − S_λ)B^T can be evaluated to obtain V_s = σ² B̃^T B̃. For the partial spline approach the computations are analogous. In both instances we end up with O(n) procedures.
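For illustration, the estimator (5.5) can be sketched with dense matrices (Python/NumPy; the O(n) banded structure exploited above is deliberately ignored, and the row layout of A follows the Gasser et al. pseudo-residual construction):

```python
import numpy as np

def pseudo_residual_matrix(t):
    """(n-2) x n matrix A of Gasser et al. pseudo-residual weights."""
    n = len(t)
    A = np.zeros((n - 2, n))
    for j, i in enumerate(range(1, n - 1)):  # interior points t_2 .. t_{n-1}
        a = (t[i + 1] - t[i]) / (t[i + 1] - t[i - 1])
        b = (t[i] - t[i - 1]) / (t[i + 1] - t[i - 1])
        c = (a * a + b * b + 1.0) ** -0.5
        A[j, i - 1] = a * c
        A[j, i] = -c
        A[j, i + 1] = b * c
    return A

def modified_variance(y, U, t):
    """Difference-based variance estimate (5.5), corrected for the U term."""
    A = pseudo_residual_matrix(t)
    AU = A @ U
    P = AU @ np.linalg.solve(AU.T @ AU, AU.T)  # projection onto span(AU)
    M = np.eye(A.shape[0]) - P                 # I - P
    return (y @ A.T @ M @ A @ y) / np.trace(A.T @ M @ A)
```

The projection removes the A U β component of Ay exactly, so the estimate does not depend on β, in line with the √n-consistency result quoted above.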


6. Inference

For the assessment of a model it is necessary to perform tests on both the parametric and the nonparametric part. Till now inference is restricted to such models for which the following assumptions hold:

• Covariates and response are measured on continuous scales.
• Consistent estimate of the (residual) variance σ².
• No correlation between the covariates.
• Nonparametric regression estimator f̂ linear in the observed response y.
• The distribution of the response y is i.i.d.

The first point is required throughout, apart from analysis of covariance problems, where the u_i are incidence vectors. As far as the second is concerned we are well off because our estimator for σ² is based on pseudo-residuals, thus avoiding the usual problem of residual sum of squares inflation due to the unknown nonparametric bias. The requirement of no correlation between covariates, specifically between U and t, is essential for testing too, not only for estimation as pointed out earlier. For this and other reasons a simulation experiment is reported in Section 7. The second last point is always fulfilled because smoothing splines are linear smoothers. The consequence of the last two assumptions is very fruitful: Speckman (1988, p. 422) could prove that the coefficient vector β̂ is asymptotically NID without distributional assumptions on the explanatory variables.

6.1. Inference for the parametric component

Given the asymptotic normality we can apply the variance–covariance matrices for β̂_p (partial spline approach) and β̂_s (Speckman's approach) to calculate test statistics, respectively, confidence intervals for the individual coefficients of the parametric part. V_p, respectively, V_s yields the standard errors SE of the estimated coefficients β̂. Hence it is easy to perform inference based on

    t_df ≈ β̂/SE(β̂),

an approximate t-distributed test statistic with df = n − tr(S_λ) − k.

The above assumptions also justify an approximate F test (here for Speckman's approach). Speckman (1988, p. 426) proposed to use the mean-squared error

    MSE = RSS / tr((I − H_s)^T (I − H_s))

as an estimator of σ². The residual sum of squares is defined by

    RSS = n^{-1} ||(I − H_s)y||².

This estimator of σ² has a positive but asymptotically negligible bias, resulting in a conservative F statistic. For the approximate F test we further need the sum of squares


of the parametric component

    SS_par = β̂_s^T (Q^T Q)^{-1} β̂_s

with Q^T = (Ũ^T Ũ)^{-1} Ũ^T (I − S_λ). Further we define MSS_par = n^{-1} SS_par. The test statistic is now of the form

    F_{df_1, df_2} ≈ MSS_par / MSE

with degrees of freedom df_1 = k and df_2 = tr((I − H_s)^T (I − H_s)). Instead of Speckman's variance estimator we could also use our modified difference-based variance estimator. The approximate F test allows us to judge the predictive value of the parametric part.

6.2. Inference for the nonparametric component

The purpose is the formal assessment of the shape of the curve f. We want to test the hypothesis H0: E(y_i) = linear function versus the alternative H1: E(y_i) = f(t_i) (smooth function). This is of central interest because such a test allows us to decide whether a semiparametric model makes sense compared to a pure parametric model. The models under investigation form a nested structure in terms of residual sum of squares. There are several tests which can account for this decision problem.

Hastie and Tibshirani (1990, p. 66) proposed an approximate F test for a straight line fit f̂_0 versus a nonparametric fit f̂_1 which also applies to the semiparametric setting. Necessary assumptions are

• Unbiased estimator f̂_0 from ordinary least squares (OLS).
• Unbiased estimator f̂_1 from Speckman's approach (not obtained from partial splines).
• Optimal smoothing parameter λ selected.

Smoothing splines have the advantage of being unbiased for straight lines independent of the smoothing parameter choice. Together with Speckman's concept of estimation we obtain unbiased results. This leads to improved inference. Let us consider the test statistic

    F_{df_1 − df_0, n − df_1} ≈ [(Σ_{i=1}^n ε̂_i² − Σ_{i=1}^n ε̃_i²)/(df_1 − df_0)] / [Σ_{i=1}^n ε̃_i²/(n − df_1)],    (6.6)

where ε̂_i = y_i − u_i^T β̂_OLS and ε̃_i = y_i − u_i^T β̂_s − f̂(t_i). The degrees of freedom are df_0, the number of parameters in the ordinary least-squares model (i.e. k), and df_1 = tr(2H_s − H_s H_s^T) from Speckman's approach. We want to point out that Eq. (6.6) does not require the estimation of σ². An unaccounted non-linear function f would add bias (i.e. bias(t)) to our fit.
The idea of the test is to make a decision based on additional bias. Because the bias–variance relation is characteristic for a specific model one could also construct a test via a smoothing parameter selection method, in our case the UBR criterion. This direction has not been studied yet.
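To make the recipe concrete, the approximate F test can be sketched as follows (illustrative Python; a generic symmetric smoother matrix S replaces S_λ, and the H0 fit is taken as OLS on an intercept, the columns of U and a linear t term, which is our reading of the straight-line null model):

```python
import numpy as np

def approx_f_test(y, U, t, S):
    """Approximate F statistic for H0: linear in t vs. H1: smooth f(t)."""
    n = len(y)
    I = np.eye(n)
    # H0: ordinary least squares on intercept, U and a linear t term
    X0 = np.column_stack([np.ones(n), U, t])
    fit0 = X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]
    rss0 = np.sum((y - fit0) ** 2)
    # H1: Speckman fit with smoother matrix S
    Ut = (I - S) @ U
    beta = np.linalg.lstsq(Ut, (I - S) @ y, rcond=None)[0]
    f = S @ (y - U @ beta)
    rss1 = np.sum((y - U @ beta - f) ** 2)
    # degrees of freedom: parameters under H0, trace-based df under H1
    df0 = X0.shape[1]
    Hs = S + Ut @ np.linalg.solve(Ut.T @ Ut, Ut.T @ (I - S))
    df1 = np.trace(2 * Hs - Hs @ Hs.T)
    F = ((rss0 - rss1) / (df1 - df0)) / (rss1 / (n - df1))
    return F, df1 - df0, n - df1
```

A large F favours the smooth alternative; the statistic is only approximately F-distributed, in line with the caveats discussed below.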


It should be noted that the bias(t) of any partially linear model (for an optimal choice of λ) is sensitive to correlation between U and t. Hastie and Tibshirani (1990) reported adequate performance of their approximate F test, but some researchers are in doubt of small sample results (e.g. Bowman and Azzalini (1997, p. 80) propose another approximation called the pseudo-likelihood ratio test, for NID responses).

An 'exact' permutation test due to Raz (1990), although proposed in a different setting, seems to be more appropriate for a decision based on limited sample information. He reported that the test's nominal level can be maintained for nonnormal errors and samples as small as ten observations. The principal idea is to assume a null hypothesis that the pairing of (t, y) is entirely random. Then the distribution of F is calculated by the simulation of all possible pairings. No specific distributional assumptions for the data are required. Regrettably, there are neither systematic simulation results nor comparative studies.

Recently, Hong and White (1995) have put forward an alternative testing procedure. The approximately standard normal distributed test statistic is

    M = (n m̂_n/σ̂² − k_n) / √(2k_n),

where m̂_n = n^{-1} Σ_{i=1}^n ε̂_i ε̃_i with ε̂_i and ε̃_i as defined for Eq. (6.6), k_n a dimension reduction constant and σ̂² the residual variance under the H0 model. The small sample performance of the M test has not been studied yet but bootstrap approximations are feasible. Ideas which can be applied to this test are found in Hong and Cheng (1993) and in Härdle et al. (2000). A general discussion of lack-of-fit tests similar to the above mentioned can be found in a recent book by Hart (1997, Chapter 6). Other relevant papers in the context of splines are Cox et al. (1988) as well as Eubank and Spiegelman (1990).

Let us finally make a comment about confidence intervals. Our goal is a firm conclusion about the shape of the curve f, apart from being linear or not. The problem is that some curvature must always be expected in nonparametric estimates due to sampling variation. Confidence intervals would be valuable to have some guideline. However, it is not possible to construct such an interval without knowing the bias(t) (even when taking a bootstrap approach). From Eubank and Speckman (1993) we know how complicated it is to derive an approximation for this bias. Hence in practice bias adjustment is avoided and so-called variability bands are in wide use. They consist of pointwise intervals for E[f̂(t)] rather than f(t), of the form f̂(t) ± 2SE(f̂(t)), with SE(f̂(t)) the estimated standard error of the curve. Of course, careful interpretation is obligatory due to the pointwise nature of the band.
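A variability band of this form is straightforward once the linear map from y to f̂ is written out. A sketch (illustrative Python; S is any symmetric smoother matrix standing in for S_λ, and σ² is assumed to have been estimated already, e.g. by (5.5)):

```python
import numpy as np

def variability_band(y, U, S, sigma2):
    """Pointwise band f_hat +/- 2*SE(f_hat) for the nonparametric part."""
    I = np.eye(len(y))
    Ut = (I - S) @ U
    B = np.linalg.solve(Ut.T @ Ut, Ut.T @ (I - S))  # maps y to beta_s (Speckman)
    L = S @ (I - U @ B)                             # maps y to f_hat: f_hat = L y
    f_hat = L @ y
    se = np.sqrt(sigma2 * np.diag(L @ L.T))         # Var(f_hat) = sigma^2 diag(L L^T)
    return f_hat, f_hat - 2.0 * se, f_hat + 2.0 * se
```

As stressed above, such a band is centred on E[f̂(t)], not on f(t); the unadjusted bias(t) remains in it.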

7. The simulation experiment A Monte Carlo study based on the above algorithms was carried out. We compared the partial spline approach with Speckman’s approach. Our interest was to study the


small sample behaviour of the two estimation concepts and their performance under correlation between U and t (compare with collinearity in parametric regression), both with regard to numerical stability and asymptotic bias. Numerical instability concerning the calculation of Ũ can result from the smoothing spline filter operation, especially for very small smoothing parameter values of λ or (linear) correlation between the columns of U and t (Eubank, 1988, Section 5.3). From Heckman (1986) we know that the parametric component in the partially linear model can be estimated with the parametric rate of convergence. Rice (1986) made clear that this is solely true when the variables in the parametric and the variables in the nonparametric components are uncorrelated. In general, the variance of β̂ decreases at a parametric rate (i.e. ∼ cn^{-1}), whereas the bias decreases at a nonparametric rate (i.e. ∼ cn^{-γ} with γ < 1) except for special situations such as in Heckman (1986).

7.1. Outline

For the simulations we considered two examples of the semiparametric regression model which was defined in Eq. (1.1), the one is

    y_i = β_1 u_{1i} + f(t_i) + ε_i    (7.7)

and the other

    y_i = β_1 u_{1i} + β_2 u_{2i} + f(t_i) + ε_i    (7.8)

for i = 1, ..., n. The parametric component in Eq. (7.7) is one-dimensional whereas in Eq. (7.8) two-dimensional. We assumed sample sizes n = 50, 100 and 200, predictor values t_i = (i − 0.5)/n on [0, 1], and errors ε_i ∼ NID(0, 1). For the nonparametric component we selected one smooth function f(t_i) = m_n f_0(t_i) where f_0 = 4.26(exp(−3.25t_i) − 4 exp(−6.5t_i) + 3 exp(−9.75t_i)) with max|f(t)| = 9 (i.e. m_n = 9) for the whole simulation experiment.

At first, we studied a setting (G0) as in Heckman (1986) where the parametric predictor is defined as a random variable, i.e. u_{1i} ∼ NID(0, 1) in Eq. (7.7). For the rest of the study we assumed correlation between U and t for different parametric patterns. Let u_{ki} = g_k(t_i) + η_i = m_p g_0(t_i) + η_i, where g_0 is a function that connects the parametric predictor values u_i to the nonparametric predictor values t_i, and errors η_i ∼ NID(0, 1). To impose correlation between the parametric and the nonparametric component we studied the following functions:

(G1) g_0(t_i) = (t_i − 0.5)_+;
(G2) g_0(t_i) = −t_i;
(G3) g_0(t_i) = t_i − 3t_i² + 3t_i³; and
(G4) g_0(t_i) = exp(−(t_i − 0.5)²).
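For reference, the simulation design can be reproduced in a few lines (illustrative Python; the normalisation of f_0 and g_0 to maximum absolute value 9 is our reading of the scaling constants m_n = m_p = 9, and η_i denotes the N(0, 1) error added to the parametric predictor):

```python
import numpy as np

def g0(name, t):
    """Base functions (G1)-(G4) connecting the parametric predictor to t."""
    if name == "G1":
        return np.maximum(t - 0.5, 0.0)       # (t - 0.5)_+ , not in C^2
    if name == "G2":
        return -t
    if name == "G3":
        return t - 3 * t**2 + 3 * t**3
    if name == "G4":
        return np.exp(-(t - 0.5) ** 2)
    raise ValueError(name)

def make_data(n, beta, setting, rng):
    """One replication of model (7.7) under setting G0 or (G1)-(G4)."""
    t = (np.arange(1, n + 1) - 0.5) / n
    f0 = 4.26 * (np.exp(-3.25 * t) - 4 * np.exp(-6.5 * t) + 3 * np.exp(-9.75 * t))
    f = 9.0 * f0 / np.max(np.abs(f0))         # scale so max|f| = 9
    if setting == "G0":
        u = rng.normal(size=n)                # stochastically independent of t
    else:
        g = g0(setting, t)
        u = 9.0 * g / np.max(np.abs(g)) + rng.normal(size=n)
    y = beta * u + f + rng.normal(size=n)
    return t, u, y
```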

These four functions for the parametric part, together with the function held constant for the nonparametric part, are plotted in the appendix for illustration. (G1) violates the standard assumptions for optimal estimation with the cubic spline smoother used here because of a discontinuity, hence g_0 ∉ C²[0, 1]. (G2) describes the


Fig. 1. Partial spline and Speckman’s approach for n = (50; 100; 200) with (G0).

common choice of a linear relationship. (G3) is a polynomial and (G4) an exponential function. Together with (G2) they are all compatible with cubic spline smoothing because g_0 ∈ C²[0, 1]. In each case we imposed max|g_k(t)| = 9 (i.e. m_p = 9) for the parametric part. As a result, the signal-to-noise ratios of the nonparametric and the parametric component are the same. Further, we assumed values for the regression coefficient β = {0.5, 1, 1.5, 2, 2.5, 3} throughout the Monte Carlo experiment.

For each setting (see later) and the three sample sizes we generated 300 replications. Each simulation run was performed for the partial spline approach and the Speckman approach using UBR as outlined earlier. The seed for the random number generator (simulation of the errors) was changed whenever called in a simulation run (important for the evaluation of potential correlation effects). All the calculations were carried out in S-Plus for Windows on a Pentium platform under Microsoft Windows 95.

7.2. Results

The results are reported in summary statistics of the estimated regression coefficient β̂ and the ratio of the squared biases

    R = bias²(β̂_p) / bias²(β̂_s),

where subscript p denotes the partial spline approach and subscript s Speckman's approach, calculated over 300 replications. For the case of stochastic independence between U and t (see also Schimek, 1998) the estimation results for the assumed β values 0.5 and 3.0 obtained from the two approaches under three sample sizes are summarized in Figs. 1 and 2. In these two and the following figures (i.e. Figs. 3–6) we see six boxplots each, to be read from the left: the first pair represents the estimation results for n = 50 using the partial spline, respectively, the Speckman approach; the second pair does the same for n = 100 and the third pair for n = 200. The ordinate represents the scale of the regression coefficient β.
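The bookkeeping behind R is minimal. A sketch (illustrative Python; the two arrays hold the estimates β̂ collected from the replications of each approach):

```python
import numpy as np

def bias_ratio(beta_true, est_p, est_s):
    """R = bias^2(partial spline) / bias^2(Speckman) over replications."""
    bias_p = np.mean(est_p) - beta_true   # Monte Carlo bias, partial spline
    bias_s = np.mean(est_s) - beta_true   # Monte Carlo bias, Speckman
    return bias_p**2 / bias_s**2
```

Values of R near one indicate that the two approaches are about equally biased; R below one favours the partial spline, R above one the Speckman approach.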


Fig. 2. Partial spline and Speckman’s approach for n = (50; 100; 200) with (G0).

Fig. 3. Partial spline and Speckman’s approach for n = (50; 100; 200) with (G3).

Fig. 4. Partial spline and Speckman’s approach for n = (50; 100; 200) with (G3).

These as well as the other obtained results (not displayed here) for the case of stochastic independence are satisfying with respect to the true regression coefficients. For example, for β = 0.5 (3.0) the mean of β̂ is 0.5013 (3.0014) with the partial spline and 0.5010 (3.0010) with Speckman's approach for n = 100. The latter provides slightly


Fig. 5. Partial spline and Speckman’s approach for n = (50; 100; 200) with (G3) and (G4).

Fig. 6. Partial spline and Speckman’s approach for n = (50; 100; 200) with (G3) and (G4).

better results for all parameters when the sample size is small (n = 50), which is also supported by the obtained squared bias ratios R. According to Heckman (1986) we cannot expect much of a difference, at least from an asymptotic point of view. Another asymptotic effect can be clearly identified in Figs. 1 and 2 via the reduction of the range of estimates with growing sample size, independently of the estimation procedure. There is little variation in the obtained squared bias ratios R for the three sample sizes over the range of β values. The R's are quite stable around 0.9800, hence near one, which indicates that the bias is about the same for the partial spline approach and Speckman's approach.

Let us now switch to the examples with predictor variable correlation based on Eq. (7.7). The estimation of β for the function (G1) with the discontinuity is quite satisfying for the signal-to-noise ratio of 9 as assumed here. The parameter values β have almost no effect, but the sample size has some effect on the quality of the parametric estimates: Speckman's approach yields better results for n = 50, although all the R's are near 0.9900, not really favouring one of the estimation concepts. In view of all sample sizes both the partial spline and the Speckman approach work well.


Let us now discuss (G2), the linear relationship. To our surprise we obtained similar results as for (G1) (under the same signal-to-noise ratio of the parametric component). Thus, the lack of compatibility of the smoothing spline with the example (G1) compared to (G2), where the function is the expectation of the cubic spline adopted here, does not really matter. Again, Speckman's concept seems to have some advantage for n = 50. Otherwise there is little difference between the two approaches.

Based on Eq. (7.7) we finally examine (G3), a polynomial. We obtained for instance for β = 0.5 (3.0) a mean of β̂ of 0.4948 (2.9856) with the partial spline and of 0.4951 (2.9861) with Speckman's approach for n = 100. A summary of these results can be seen in the boxplots for the three sample sizes (see Figs. 3 and 4). The estimates are satisfying for all β values. The median of R for n = 50 is 0.9308, for n = 100 is 0.9658, and for n = 200 is 0.9785. There is not much of a difference between the two approaches. Speckman's concept no longer shows a better behaviour for the small sample size as was the case with (G0), (G1), and (G2).

For a signal-to-noise ratio of 1.8 instead of 9 (i.e. more error contribution to the parametric part) we obtained a higher variability of estimates β̂ for n = 50 and 100, independent of the approach taken. This is also reflected in the R's. We also considered cases of almost deterministic dependence on the predictor variable t, for instance with a signal-to-noise ratio of 90. This severe violation of the independence assumption between the parametric and the nonparametric component causes a substantial decline in the quality of estimates β̂, especially for small sample sizes. Neither the partial spline nor the Speckman approach can adequately cope with this situation.

Further, we studied a two-dimensional example (Eq. (7.8)) with (G3) and (G4) and two combinations of parameter values β_1 and β_2. We took β_1 = {0.5, 1, 1.5, 2, 2.5, 3} and β_2 = 0.5 in the one instance and β_2 = 3.0 in the other. To exemplify the results we show boxplots for the combination of β_1 = 0.5 and β_2 = 3.0. In Fig. 5, the estimates β̂_1 and in Fig. 6, the estimates β̂_2 are displayed. The results are sample size-dependent, also indicated by the squared bias ratios R: for good results a large sample size (n = 200) is required. It also seems that the estimates of β_2 = 3.0 are more stable than those of β_1. This might be due to the fact that (G4) is a simple symmetric function. In general, the estimates for the full range of parameter combinations are all satisfying. There is no favourite with respect to the two approaches nor a qualitative loss compared to the one-dimensional cases, when the respective R's are compared for n = 200.

Summing up the Monte Carlo results we can say that both approaches are numerically stable throughout all our simulations and that the asymptotic bias correction of Speckman's approach has some effect when the sample size is small, if any. Overall, the partial spline approach works equally well for the settings considered. Reasonable signal-to-noise ratios guarantee adequate estimates of the regression coefficients even under correlation between the parametric and the nonparametric components of the partially linear model. An explanation for these excellent results could be the use of cubic smoothing splines in both approaches. Splines have a better bias than kernel smoothers or local linear smoothers in the partially linear model (personal communication with Paul Speckman, 1997). But from an inferential point of view — correlation is most

M.G. Schimek / Journal of Statistical Planning and Inference 91 (2000) 525–540

539

likely in real data analysis — the asymptotically unbiased Speckman estimators is to be prefered. This is specially true for the approximate F tests without bootstraping. Speckman’s approach corrects for potentially important, but not necessarily known bias with almost no loss in variance (Speckman, 1988, p. 428).
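The two estimators compared throughout this section are easy to sketch. The following NumPy fragment is an illustrative translation, not the paper's S-Plus implementation: the paper uses O(n) band-matrix algorithms and selects λ by unbiased risk, whereas here the spline hat matrix S = (I + λK)^{-1} is formed densely in the Reinsch/Green–Silverman form and λ is simply fixed. The partial spline estimator of Green et al. (1985) solves U'(I−S)Uβ = U'(I−S)y; Speckman's (1988) estimator instead regresses the residualized response (I−S)y on the residualized covariates (I−S)U.

```python
import numpy as np

def spline_hat_matrix(t, lam):
    """Hat matrix S = (I + lam*K)^{-1} of a cubic smoothing spline at the
    ordered, distinct design points t, with penalty K = Q R^{-1} Q' in the
    Reinsch/Green-Silverman form."""
    n = len(t)
    h = np.diff(t)                          # knot spacings
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(1, n - 1):               # interior design points
        k = j - 1
        Q[j - 1, k] = 1.0 / h[j - 1]
        Q[j, k] = -1.0 / h[j - 1] - 1.0 / h[j]
        Q[j + 1, k] = 1.0 / h[j]
        R[k, k] = (h[j - 1] + h[j]) / 3.0
        if k + 1 < n - 2:
            R[k, k + 1] = R[k + 1, k] = h[j] / 6.0
    K = Q @ np.linalg.solve(R, Q.T)
    return np.linalg.inv(np.eye(n) + lam * K)

def partial_spline(y, U, S):
    """Green et al. (1985): beta = (U'(I-S)U)^{-1} U'(I-S)y,
    then g = S(y - U beta)."""
    A = np.eye(len(y)) - S
    beta = np.linalg.solve(U.T @ A @ U, U.T @ A @ y)
    return beta, S @ (y - U @ beta)

def speckman(y, U, S):
    """Speckman (1988): regress the residualized response (I-S)y on the
    residualized covariates (I-S)U, then g = S(y - U beta)."""
    A = np.eye(len(y)) - S
    Ut, yt = A @ U, A @ y
    beta = np.linalg.solve(Ut.T @ Ut, Ut.T @ yt)
    return beta, S @ (y - U @ beta)
```

On a design where U is correlated with t and the signal-to-noise ratio is moderate, both functions recover β closely, in line with the simulation results above.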

Fig. 7. The four functions (G1)–(G4) used for the parametric component.
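A Monte Carlo experiment in the spirit of the two-dimensional setting of Eq. (7.8) can be sketched as follows. The concrete choices are illustrative assumptions, not the paper's exact settings: t² and (t − 0.5)² stand in for (G3) and (G4) (the actual functions are shown in Fig. 7), a sine serves as the nonparametric component, the signal-to-noise ratio is taken as var(G)/var(noise), and λ is fixed rather than chosen by unbiased risk.

```python
import numpy as np

def spline_hat(t, lam):
    """Cubic smoothing spline hat matrix S = (I + lam*K)^{-1} with
    K = Q R^{-1} Q' (Reinsch/Green-Silverman penalty form)."""
    n, h = len(t), np.diff(t)
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(1, n - 1):
        k = j - 1
        Q[j - 1, k] = 1.0 / h[j - 1]
        Q[j, k] = -1.0 / h[j - 1] - 1.0 / h[j]
        Q[j + 1, k] = 1.0 / h[j]
        R[k, k] = (h[j - 1] + h[j]) / 3.0
        if k + 1 < n - 2:
            R[k, k + 1] = R[k + 1, k] = h[j] / 6.0
    return np.linalg.inv(np.eye(n) + lam * Q @ np.linalg.solve(R, Q.T))

def simulate(beta1=0.5, beta2=3.0, snr=9.0, n=200, reps=300, seed=2):
    """Monte Carlo means of the Speckman estimates (beta1_hat, beta2_hat).
    Each covariate is a deterministic function of t plus noise whose
    variance fixes the signal-to-noise ratio var(G)/var(noise)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n)
    G3, G4 = t**2, (t - 0.5)**2             # stand-ins for (G3), (G4)
    s3 = np.sqrt(np.var(G3) / snr)
    s4 = np.sqrt(np.var(G4) / snr)
    g = np.sin(2.0 * np.pi * t)             # nonparametric component
    A = np.eye(n) - spline_hat(t, 1e-4)     # residualizing operator I - S
    beta = np.array([beta1, beta2])
    est = []
    for _ in range(reps):
        U = np.column_stack([G3 + rng.normal(scale=s3, size=n),
                             G4 + rng.normal(scale=s4, size=n)])
        y = U @ beta + g + rng.normal(scale=0.2, size=n)
        Ut = A @ U                          # Speckman: residualize U and y
        est.append(np.linalg.solve(Ut.T @ Ut, Ut.T @ (A @ y)))
    return np.mean(est, axis=0)
```

Lowering snr toward 1.8 inflates the spread of the estimates, and pushing it toward 90 (near-deterministic dependence on t) degrades them, mirroring the behaviour reported above.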


M.G. Schimek / Journal of Statistical Planning and Inference 91 (2000) 525–540

Appendix

Here we display the four functions (G1)–(G4) for the parametric component, which were used in the simulation experiment to study correlation between the (design) variable t and the columns of U. Fig. 7 allows a direct comparison of the shapes of the functions.

References

Bowman, A.W., Azzalini, A., 1997. Applied Smoothing Techniques for Data Analysis. The Kernel Approach with S-Plus Illustrations. Clarendon Press, Oxford.
Chen, H., 1988. Convergence rates for parametric components in a partly linear model. Ann. Statist. 16, 136–146.
Cox, D., Koh, E., Wahba, G., Yandell, B., 1988. Testing the (parametric) null model hypothesis in (semiparametric) partial and generalized spline models. Ann. Statist. 16, 113–119.
De Boor, C., 1978. A Practical Guide to Splines. Springer, New York.
Eubank, R.L., 1988. Spline Smoothing and Nonparametric Regression. Marcel Dekker, New York.
Eubank, R.L., Kambour, E.L., Kim, J.T., Klipple, K., Reese, C.S., Schimek, M., 1998. Estimation in partially linear models. Comput. Statist. Data Anal. 29, 27–34.
Eubank, R.L., Speckman, P., 1993. Confidence bands in nonparametric regression. J. Amer. Statist. Assoc. 88, 1287–1301.
Eubank, R.L., Spiegelman, C., 1990. Testing the goodness-of-fit of a linear model via non-parametric regression techniques. J. Amer. Statist. Assoc. 85, 387–392.
Gasser, T., Sroka, L., Jennen-Steinmetz, C., 1986. Residual variance and residual pattern in nonlinear regression. Biometrika 73, 625–633.
Green, P., Jennison, C., Seheult, A., 1985. Analysis of field experiments by least squares smoothing. J. Roy. Statist. Soc. Ser. B 47, 299–315.
Green, P., Silverman, B.W., 1994. Nonparametric Regression and Generalized Linear Models. A Roughness Penalty Approach. Chapman & Hall, London.
Hart, J.D., 1997. Nonparametric Smoothing and Lack-of-Fit Tests. Springer, New York.
Härdle, W., Liang, H., Gao, J., 2000. Partially Linear Models. Springer, Heidelberg.
Hastie, T., Tibshirani, R.J., 1990. Generalized Additive Models. Chapman & Hall, London.
Heckman, N., 1986. Spline smoothing in a partly linear model. J. Roy. Statist. Soc. Ser. B 48, 244–248.
Hong, S.Y., Cheng, P., 1993. Bootstrap approximation of estimation for parameter in a semiparametric regression model. Sci. China Ser. A 14, 239–251.
Hong, Y., White, H., 1995. Consistent specification testing via nonparametric series regression. Econometrica 63, 1133–1159.
Hutchinson, M.F., de Hoog, F.R., 1985. Smoothing noisy data with spline functions. Numer. Math. 47, 99–106.
Raz, J., 1990. Testing for no effect when estimating a smooth function by non-parametric regression: a randomization approach. J. Amer. Statist. Assoc. 85, 132–138.
Rice, J., 1986. Convergence rates for partially splined models. Statist. Prob. Lett. 4, 203–208.
Robinson, P.M., 1988. Root-n-consistent semiparametric regression. Econometrica 56, 931–954.
Schimek, M.G., 1997. Non- and semiparametric alternatives to generalized linear models. Comput. Statist. 12, 173–191.
Schimek, M.G., 1998. Partially linear models: a new algorithm and some simulation results. In: Payne, R., Green, P. (Eds.), Proceedings in Computational Statistics 1998. Physica-Verlag, Heidelberg, pp. 443–448.
Schimek, M.G., Schmaranz, K.G., 1994. Dependent error regression smoothing: a new method and PC program. Comput. Statist. Data Anal. 17, 457–464.
Speckman, P., 1988. Kernel smoothing in partial linear models. J. Roy. Statist. Soc. Ser. B 50, 413–436.
Wahba, G., 1990. Spline Models for Observational Data. SIAM, Philadelphia, PA.