
Statistics & Probability Letters 76 (2006) 1543–1549 www.elsevier.com/locate/stapro

Multivariate partially linear models

Beatriz Pateiro-López¹, Wenceslao González-Manteiga²

Departamento de Estadística e Investigación Operativa, Facultad de Matemáticas, Universidad de Santiago de Compostela, 15706, Spain

Received 6 May 2005; received in revised form 22 February 2006; accepted 1 March 2006
Available online 18 April 2006

Abstract

Univariate partially linear regression models have been widely discussed in recent years. In this paper, we consider a multivariate partially linear regression model under independent errors, where the response variable is d-dimensional. We obtain the asymptotic bias and variance for both the parametric and the nonparametric components. Moreover, we investigate the asymptotic normality of the LS estimator of the parametric component.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Multivariate regression; Partially linear models; Kernel smoothing

1. Introduction

In this paper, we consider regression models where the regression function is the sum of a linear and a nonparametric component. Let $n, p \in \mathbb{N}$ and let $D \subset \mathbb{R}$ be a bounded domain. We assume that, for each $i \in \{1,\dots,n\}$, the design points $\boldsymbol{\zeta}_i = (x_{i1},\dots,x_{ip})' \in \mathbb{R}^p$ and $t_i \in D$ are known and nonrandom. Then, consider the model given by

$$ y_i = \boldsymbol{\zeta}_i'\boldsymbol{\beta} + f(t_i) + \varepsilon_i, \tag{1} $$

where $\boldsymbol{\beta} = (\beta_1,\dots,\beta_p)'$ is an unknown vector of parameters, $f$ is an unknown function, and the error terms $\varepsilon_i$ are assumed to be independent with mean 0 and variance $\sigma^2$.

This model has already been discussed by Engle et al. (1986) to analyze the relationship between electricity sales and temperature. Their approach is based on smoothing splines. Following this work, many authors have focused on the estimation of the parametric and the nonparametric components under different conditions on the error term. Initially, the model with independent errors was tackled (Robinson, 1988; Speckman, 1988). Next, it was extended to allow for dependence in the error term. First, the errors $\varepsilon_i$ were taken to be a class of linear processes (Gao, 1995). Next, a semiparametric regression model under long-range dependent errors was considered (Gao and Anh, 1999). For the latter case, estimation and hypothesis testing were discussed (Aneiros Pérez et al., 2004). Some applications of partially linear models have been described in the literature (see, for example, Härdle et al., 2000).

Corresponding author. Tel.: +34 981563100x13390; fax: +34 34597054.

E-mail address: [email protected] (B. Pateiro-López).
1 Supported by Ministerio de Educación y Ciencia (Grant no. AP20031217).
2 This research was supported by Grants BFM2002-03213 and PGIDTOOPXI20704PN.

0167-7152/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2006.03.016


All the models mentioned above are univariate models. In some applications, it may be of interest to work with a multidimensional response variable. For example, in finance, it is now widely accepted that working with series, such as asset returns, in a multidimensional framework leads to better results than working with separate univariate models. We may be interested in considering a multivariate partially linear model in this case.

For the multivariate case involving $d$ univariate regressions, let $\mathbf{Y} = (\mathbf{y}_1,\dots,\mathbf{y}_d)$ be an $(n \times d)$-matrix of data, $\mathbf{X} = (\mathbf{x}_1,\dots,\mathbf{x}_p) = (\boldsymbol{\zeta}_1,\dots,\boldsymbol{\zeta}_n)'$ an $(n \times p)$-matrix of known design points and $\mathbf{B} = (\boldsymbol{\beta}_1,\dots,\boldsymbol{\beta}_d)$ a $(p \times d)$-matrix of unknown coefficients. For each $r \in \{1,\dots,d\}$, let $f^r$ be an unknown function, $\mathbf{f}_r = (f^r(t_1),\dots,f^r(t_n))'$ and $\mathbf{F} = (\mathbf{f}_1,\dots,\mathbf{f}_d)$. Finally, let $\mathbf{U} = (\mathbf{u}_1,\dots,\mathbf{u}_d) = (\boldsymbol{\tau}_1,\dots,\boldsymbol{\tau}_n)'$ be an $(n \times d)$-matrix of errors. We now define the multivariate partially linear model for the $n$ observations by means of the following matrix equation:

$$ \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{F} + \mathbf{U}. \tag{2} $$

Without loss of generality, we assume that the domain of interest is $D = [0,1]$ and that, for each $r \in \{1,\dots,d\}$, $f^r$ has $\nu \ge 2$ continuous derivatives on $[0,1]$.

The organization of this paper is as follows. In Section 2 we introduce some notation. Section 3 presents the estimates for the model in the univariate case and the proposed estimates for the multivariate case. Section 4 states the main results. Mathematical proofs are given in Appendix A.

2. Notation

We first introduce some notation. We denote matrices and vectors by boldface letters $\mathbf{A}$ and $\mathbf{a}$, respectively, and scalars by italics. Let $\mathrm{vec}(\mathbf{A})$ denote the vector obtained by stacking the columns of $\mathbf{A}$. Let $\|\mathbf{A}\|$ denote the spectral norm³ of $\mathbf{A}$. Let $\mathbf{W} = (w_{n,h}(t_i,t_j))_{i,j}$, where $w_{n,h}(\cdot,\cdot)$ is a weight function and the index $h$ is the bandwidth parameter. Let $q \ge 1$; then, for each $(n \times q)$-matrix $\mathbf{A}$, we write $\tilde{\mathbf{A}} = (\mathbf{I} - \mathbf{W})\mathbf{A}$.

3. The estimates

One of the methods proposed in the literature consists of estimating $\boldsymbol{\beta}$ and $f(t)$ by means of

$$ \hat{\boldsymbol{\beta}} = (\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}'\tilde{\mathbf{y}}, \tag{3} $$

$$ \hat{f}_{n,h}(t) = \sum_{i=1}^{n} w_{n,h}(t,t_i)\,(y_i - \boldsymbol{\zeta}_i'\hat{\boldsymbol{\beta}}). \tag{4} $$

Under independent errors, it was shown in Speckman (1988) that $\hat{\boldsymbol{\beta}}$ is $n^{1/2}$-consistent and asymptotically normal. Regarding the nonparametric component, it was shown in Speckman (1988) that $\hat{f}_{n,h}(t)$ is $n^{\nu/(2\nu+1)}$-consistent for $f(t)$ (assuming that $f$ has $\nu \ge 2$ continuous derivatives on $[0,1]$).

Consider now the multivariate partially linear regression model defined by (2). Generalizing the estimators given by (3) and (4), we propose to estimate $\mathbf{B}$ and $F(t) = (f^1(t),\dots,f^d(t))'$ in (2) by means of

$$ \hat{\mathbf{B}} = (\hat{\boldsymbol{\beta}}_1,\dots,\hat{\boldsymbol{\beta}}_d) = (\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}'\tilde{\mathbf{Y}}, \tag{5} $$

$$ \hat{F}_{n,h}(t) = \left( \sum_{i=1}^{n} w_{n,h}(t,t_i)\,(y_{i1} - \boldsymbol{\zeta}_i'\hat{\boldsymbol{\beta}}_1),\ \dots,\ \sum_{i=1}^{n} w_{n,h}(t,t_i)\,(y_{id} - \boldsymbol{\zeta}_i'\hat{\boldsymbol{\beta}}_d) \right)' = (\hat{f}^1_{n,h}(t),\dots,\hat{f}^d_{n,h}(t))'. \tag{6} $$

3 The spectral norm is defined as the square root of the maximum eigenvalue of $\mathbf{A}'\mathbf{A}$, i.e., $\|\mathbf{A}\| = \sqrt{\text{maximum eigenvalue of } (\mathbf{A}'\mathbf{A})}$.
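The estimators (5) and (6) can be sketched numerically as follows. This is only an illustration under stated assumptions: the paper leaves the weight function $w_{n,h}$ generic, so the Nadaraya-Watson weights with an Epanechnikov kernel, the function names, the simulated design, and the bandwidth h = 0.1 are all choices made here, not taken from the paper.

```python
import numpy as np

def nw_weight_matrix(t, h):
    """Smoother matrix W = (w_{n,h}(t_i, t_j)); Nadaraya-Watson weights with an
    Epanechnikov kernel are an assumed choice, not prescribed by the paper."""
    u = (t[:, None] - t[None, :]) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return k / k.sum(axis=1, keepdims=True)  # each row sums to one

def fit_multivariate_plm(X, Y, t, h):
    """Return (B_hat, F_hat) following (5) and (6); F_hat is evaluated at the t_i."""
    n = len(t)
    W = nw_weight_matrix(t, h)
    I_minus_W = np.eye(n) - W
    X_tilde = I_minus_W @ X
    Y_tilde = I_minus_W @ Y
    # (5): B_hat = (X~' X~)^{-1} X~' Y~, one column per response
    B_hat = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ Y_tilde)
    # (6): smooth the partial residuals Y - X B_hat, column by column
    F_hat = W @ (Y - X @ B_hat)
    return B_hat, F_hat

# Simulated data (hypothetical): n observations, p covariates, d responses.
rng = np.random.default_rng(0)
n, p, d = 400, 2, 3
t = (np.arange(1, n + 1) - 0.5) / n
G = np.column_stack([np.sin(2 * np.pi * t), t**2])
X = G + rng.normal(size=(n, p))            # x_ik = g_k(t_i) + eta_ik, as in (7)
B = np.array([[1.0, -2.0, 0.5], [0.0, 1.5, -1.0]])
F = np.column_stack([np.cos(2 * np.pi * t), t, np.exp(t)])
U = 0.1 * rng.normal(size=(n, d))
Y = X @ B + F + U                          # model (2)
B_hat, F_hat = fit_multivariate_plm(X, Y, t, h=0.1)
```

Because the same matrix $(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}'$ multiplies every column of $\tilde{\mathbf{Y}}$, the multivariate fit amounts to d univariate fits of the form (3) that share one smoother matrix.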


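4. Asymptotic behavior

As in Gao (1995), among others, we assume that the $x_{ik}$'s and $t_i$ are related via the following regression model. Let $n, p \in \mathbb{N}$. Let $i \in \{1,\dots,n\}$, $k \in \{1,\dots,p\}$. Then,

$$ x_{ik} = g_k(t_i) + \eta_{ik}, \tag{7} $$

where the $g_k$ are unknown smooth functions and the $\eta_{ik}$ are random variables with zero mean. Let $G(t) = (g_1(t),\dots,g_p(t))'$ and $\boldsymbol{\eta} = (\eta_{ik})_{i=1,\dots,n;\,k=1,\dots,p}$, with columns $\boldsymbol{\eta}_k$. Let $i, j \in \{1,\dots,n\}$, $k, l \in \{1,\dots,p\}$, and $r, s \in \{1,\dots,d\}$. Then, we have the following assumptions:

(A1) The error vectors $\boldsymbol{\tau}_i$ are independent with mean vector $\mathbf{0}$ and matrix of variances and covariances $\boldsymbol{\Sigma} = (\sigma_{rs})$.
(A2) $n^{-1}\boldsymbol{\eta}'\boldsymbol{\eta} \to \mathbf{V}$, where $\mathbf{V} = (V_{kl})$ is positive definite.
(A3) $\mathrm{tr}(\mathbf{W}'\mathbf{W}) = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{n,h}(t_i,t_j)^2 = O(h^{-1})$.
(A4) $\|\mathbf{W}\boldsymbol{\eta}_k\|^2 = O(h^{-1}) = \|\mathbf{W}'\boldsymbol{\eta}_k\|^2$.
(A5) $\tilde{g}_k(t_i) = h^{\nu} h_1(t_i)\, g_k^{(\nu)}(t_i) + o(h^{\nu})$.
(A6) $\|(\mathbf{I} - \mathbf{W})\mathbf{f}_r\|^2 = \|\tilde{\mathbf{f}}_r\|^2 = O(n h^{2\nu})$.
(A7) $n^{-1}\boldsymbol{\eta}'\tilde{\mathbf{f}}_r = O(n^{-1/2} h^{\nu})$.
(A8) There is a probability density $p(t)$ on $[0,1]$ such that, for each continuous function $c(\cdot)$,
$$ \lim_{n\to\infty} n^{-1} \sum_{i=1}^{n} c(t_i) = \int_0^1 c(t)\,p(t)\,dt. $$
(A9) $\mathrm{tr}(\mathbf{W}) = O(h^{-1})$.
(A10) $\max_i \sum_{j=1}^{n} |W_{ij}| = O(1)$, $\max_j \sum_{i=1}^{n} |W_{ij}| = O(1)$.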

Remark 1. Assumptions (A2)–(A10) are usual in univariate partially linear models under independent errors. Assumptions (A6) and (A7) generalize the conditions for the multivariate case.

Theorem 2. Let $h \to 0$ and $nh \to \infty$ when $n \to \infty$. Then, under Assumptions (A1)–(A8), we have

$$ E(\hat{\mathbf{B}}) - \mathbf{B} = h^{2\nu}\, \mathbf{V}^{-1} \int_0^1 G^{(\nu)}(t)\, F^{(\nu)}(t)'\, h_1(t)^2\, p(t)\, dt + o(h^{2\nu}) + O(h^{\nu}(nh)^{-1/2}). $$

Furthermore, if $nh^2 \to \infty$ and $nh^{4\nu} \to 0$, then, for each $r, s \in \{1,\dots,d\}$,

$$ \mathrm{Cov}(\hat{\boldsymbol{\beta}}_r, \hat{\boldsymbol{\beta}}_s) = \sigma_{rs}\, n^{-1}\mathbf{V}^{-1} + o(n^{-1}). $$

Now, let $\hat{F}_0(t) = (\hat{f}^1_0(t),\dots,\hat{f}^d_0(t))'$ be the corresponding estimator of $F(t)$ in case $\mathbf{B}$ is known, i.e., for each $r \in \{1,\dots,d\}$, $\hat{f}^r_0(t) = \sum_{i=1}^{n} w_{n,h}(t,t_i)\,(y_{ir} - \boldsymbol{\zeta}_i'\boldsymbol{\beta}_r)$. Under fixed design,

$$ \mathrm{Cov}(\hat{f}^r_0(t), \hat{f}^s_0(t)) = \sigma_{rs} \sum_{i=1}^{n} w_{n,h}(t,t_i)^2. $$

Results about the limit behavior of $\sum_{i=1}^{n} w_{n,h}(t,t_i)^2$ can be obtained under mild conditions for the weights $\{w_{n,h}(t,t_i)\}_{i=1}^{n}$. For example, consider the Gasser–Müller weights, i.e., for each $i \in \{1,\dots,n\}$, let $t_{i-1} \le S_{i-1} \le t_i$ and

$$ w_{n,h}(t,t_i) = \int_{S_{i-1}}^{S_i} K_h(t-u)\, du. $$


We assume that (K1) $K$ has support $[-1,1]$, with $K(-1) = K(1) = 0$, and (K2) $\max_i |t_i - t_{i-1}| = O(n^{-1})$. Then,

$$ \mathrm{Cov}(\hat{f}^r_0(t), \hat{f}^s_0(t)) = \sigma_{rs}\, \frac{1}{nh} \int K^2(c)\, dc + o((nh)^{-1}). $$
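As a numerical illustration of this expansion, the sketch below computes Gasser–Müller weights at an interior point and compares $\sum_i w_{n,h}(t,t_i)^2$ with the leading term $(nh)^{-1}\int K^2$. The Epanechnikov kernel (which satisfies (K1)), the equispaced design, the cut points $S_i$ taken as design midpoints, and the helper names are assumptions made for this example only.

```python
import numpy as np

def epanechnikov_cdf(x):
    """Integral of K(u) = 0.75 (1 - u^2) from -1 to x, with x clipped to [-1, 1]."""
    x = np.clip(x, -1.0, 1.0)
    return 0.5 + 0.75 * x - 0.25 * x**3

def gasser_muller_weights(t0, t, h):
    """Weights w_{n,h}(t0, t_i) = int_{S_{i-1}}^{S_i} K_h(t0 - u) du."""
    # Assumed cut points: S_0 = 0, S_n = 1, and midpoints of the design in between.
    S = np.concatenate(([0.0], 0.5 * (t[:-1] + t[1:]), [1.0]))
    # With K_h(u) = K(u / h) / h, each integral is a difference of kernel CDF values.
    return epanechnikov_cdf((t0 - S[:-1]) / h) - epanechnikov_cdf((t0 - S[1:]) / h)

n, h = 2000, 0.1
t = (np.arange(1, n + 1) - 0.5) / n        # equispaced design, so (K2) holds
w = gasser_muller_weights(0.5, t, h)
sum_sq = np.sum(w**2)                      # Cov(f_0^r, f_0^s) / sigma_rs
leading_term = 0.6 / (n * h)               # int K^2 = 3/5 for the Epanechnikov kernel
ratio = sum_sq / leading_term              # should be close to one
```

At an interior point the weights sum to one, and the ratio approaches one as $nh$ grows, in line with the $o((nh)^{-1})$ remainder.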

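The previous example, together with some others included in Härdle (1990), justifies the use, from now on, of weights $\{w_{n,h}(t,t_i)\}_{i=1}^{n}$ such that

$$ \mathrm{Bias}(\hat{f}^r_0(t)) = O(h^{\nu}), \tag{8} $$

$$ \mathrm{Cov}(\hat{f}^r_0(t), \hat{f}^s_0(t)) = O((nh)^{-1}). \tag{9} $$

Theorem 3. Let $h \to 0$, $nh^2 \to \infty$, and $nh^{4\nu} \to 0$ when $n \to \infty$. Then, under Assumptions (A1)–(A8), we have

$$ \mathrm{Bias}(\hat{F}_{n,h}(t)) = \mathrm{Bias}(\hat{F}_0(t))[1 + o(1)] = O(h^{\nu}), $$

and, for each $r, s \in \{1,\dots,d\}$,

$$ \mathrm{Cov}(\hat{f}^r_{n,h}(t), \hat{f}^s_{n,h}(t)) = \mathrm{Cov}(\hat{f}^r_0(t), \hat{f}^s_0(t))[1 + o(1)] = O((nh)^{-1}). $$

Theorem 4. Suppose that either (i) the components of $\mathbf{X}$ are uniformly bounded or (ii) there is $\delta > 0$ such that, for each $i \in \{1,\dots,n\}$ and each $k \in \{1,\dots,p\}$, the model (7) holds with $E|\eta_{ik}|^{2+\delta} < C < \infty$. Then, under the assumptions of Theorem 3 together with (A10), we have

$$ n^{1/2}\,[\mathrm{vec}(\hat{\mathbf{B}} - E(\hat{\mathbf{B}}))] \xrightarrow{d} N_{pd}(\mathbf{0}, \boldsymbol{\Sigma} \otimes \mathbf{V}^{-1}), $$

where $\mathrm{vec}(\hat{\mathbf{B}} - E(\hat{\mathbf{B}}))$ denotes the vector obtained by stacking the columns of $(\hat{\mathbf{B}} - E(\hat{\mathbf{B}}))$. In particular, for each $r \in \{1,\dots,d\}$,

$$ n^{1/2}\,(\hat{\boldsymbol{\beta}}_r - E(\hat{\boldsymbol{\beta}}_r)) \xrightarrow{d} N_p(\mathbf{0}, \sigma_{rr}\mathbf{V}^{-1}). $$

Remark 5. All the results in this paper have been obtained assuming that the design points are fixed and nonrandom. When both $X_i = (X_{i1},\dots,X_{ip})$ and $T_i$ are i.i.d. random design points, Theorems 2–4 hold with probability one.

Appendix A. Proofs of the main results

First, we introduce the following lemma.

Lemma 6. Let $h \to 0$ and $nh \to \infty$ when $n \to \infty$. Then,
(a) $n^{-1}\mathbf{X}'\tilde{\mathbf{X}} \to \mathbf{V}$,
(b) $n^{-1}\tilde{\mathbf{X}}'\tilde{\mathbf{X}} \to \mathbf{V}$.

Proof of Lemma 6. See, for example, Speckman (1988). □

Proof of Theorem 2. By (5), we have

$$ \hat{\mathbf{B}} = (\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}'\tilde{\mathbf{Y}} = (\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}'(\mathbf{I} - \mathbf{W})(\mathbf{X}\mathbf{B} + \mathbf{F} + \mathbf{U}) = \mathbf{B} + (\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}'\tilde{\mathbf{F}} + (\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}'\tilde{\mathbf{U}}. $$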


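Now,

$$ \mathrm{Bias}(\hat{\mathbf{B}}) = E(\hat{\mathbf{B}}) - \mathbf{B} = (n^{-1}\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}\, n^{-1}\tilde{\mathbf{X}}'\tilde{\mathbf{F}}. $$

By Lemma 6, to compute $\mathrm{Bias}(\hat{\mathbf{B}})$, it suffices to consider the matrix $n^{-1}\tilde{\mathbf{X}}'\tilde{\mathbf{F}}$. Its $(k,r)$th element is

$$ \frac{1}{n}\tilde{\mathbf{x}}_k'\tilde{\mathbf{f}}_r = \frac{1}{n}\big((\tilde{\mathbf{g}}_k + (\mathbf{I} - \mathbf{W})\boldsymbol{\eta}_k)'\tilde{\mathbf{f}}_r\big) = \frac{1}{n}\big(\tilde{\mathbf{g}}_k'\tilde{\mathbf{f}}_r + \boldsymbol{\eta}_k'\tilde{\mathbf{f}}_r - (\mathbf{W}\boldsymbol{\eta}_k)'\tilde{\mathbf{f}}_r\big). $$

By Assumptions (A5) and (A8),

$$ \lim_{n\to\infty} \frac{1}{n}\tilde{\mathbf{g}}_k'\tilde{\mathbf{f}}_r = h^{2\nu} \int_0^1 h_1(t)^2\, g_k^{(\nu)}(t)\, (f^r)^{(\nu)}(t)\, p(t)\, dt + o(h^{2\nu}). $$

By Assumption (A7),

$$ \frac{1}{n}\boldsymbol{\eta}_k'\tilde{\mathbf{f}}_r = O(n^{-1/2}h^{\nu}). $$

Then, by Assumptions (A4) and (A6), we have

$$ \frac{1}{n}\,|(\mathbf{W}\boldsymbol{\eta}_k)'\tilde{\mathbf{f}}_r| \le \frac{1}{n}\,\|\mathbf{W}\boldsymbol{\eta}_k\|\,\|\tilde{\mathbf{f}}_r\| = \frac{1}{n}\,O(h^{-1/2})\,O(n^{1/2}h^{\nu}) = O((nh)^{-1/2}h^{\nu}). $$

Finally, the result for $\mathrm{Bias}(\hat{\mathbf{B}})$ follows from the last three expressions together with Lemma 6.

Now, we move to the asymptotic analysis of the covariance. For each $r, s \in \{1,\dots,d\}$, we have

$$ \begin{aligned} \mathrm{Cov}(\hat{\boldsymbol{\beta}}_r, \hat{\boldsymbol{\beta}}_s) &= \sigma_{rs}\,(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}'(\mathbf{I} - \mathbf{W})(\mathbf{I} - \mathbf{W})'\tilde{\mathbf{X}}(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1} \\ &= \sigma_{rs}\,(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1} + \sigma_{rs}\,(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}\,[-\tilde{\mathbf{X}}'\mathbf{W}\tilde{\mathbf{X}} - \tilde{\mathbf{X}}'\mathbf{W}'\tilde{\mathbf{X}} + \tilde{\mathbf{X}}'\mathbf{W}\mathbf{W}'\tilde{\mathbf{X}}]\,(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}. \end{aligned} $$

By Lemma 6, to compute $\mathrm{Cov}(\hat{\boldsymbol{\beta}}_r, \hat{\boldsymbol{\beta}}_s)$, it suffices to consider the matrices $\tilde{\mathbf{X}}'\mathbf{W}\tilde{\mathbf{X}}$, $\tilde{\mathbf{X}}'\mathbf{W}'\tilde{\mathbf{X}}$, and $\tilde{\mathbf{X}}'\mathbf{W}\mathbf{W}'\tilde{\mathbf{X}}$. Following the same arguments as in Theorem 2 of Speckman (1988), we have

$$ -\tilde{\mathbf{X}}'\mathbf{W}\tilde{\mathbf{X}} - \tilde{\mathbf{X}}'\mathbf{W}'\tilde{\mathbf{X}} + \tilde{\mathbf{X}}'\mathbf{W}\mathbf{W}'\tilde{\mathbf{X}} = o(n). $$

Hence, since $(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1} = O(n^{-1})$ by Lemma 6, $\mathrm{Cov}(\hat{\boldsymbol{\beta}}_r, \hat{\boldsymbol{\beta}}_s) = \sigma_{rs}\, n^{-1}\mathbf{V}^{-1} + o(n^{-1})$. □

Proof of Theorem 3. We only prove the expression corresponding to the dispersion matrix of $\hat{F}_{n,h}(t)$. We have

$$ \begin{aligned} \hat{f}^r_{n,h}(t) &= \sum_{i=1}^{n} w_{n,h}(t,t_i)\,(y_{ir} - \boldsymbol{\zeta}_i'\hat{\boldsymbol{\beta}}_r) \\ &= \sum_{i=1}^{n} w_{n,h}(t,t_i)\,(y_{ir} - \boldsymbol{\zeta}_i'\boldsymbol{\beta}_r) + \sum_{i=1}^{n} w_{n,h}(t,t_i)\,\boldsymbol{\zeta}_i'(\boldsymbol{\beta}_r - \hat{\boldsymbol{\beta}}_r) \\ &= \hat{f}^r_0(t) + \sum_{i=1}^{n} w_{n,h}(t,t_i)\,\boldsymbol{\zeta}_i'(\boldsymbol{\beta}_r - \hat{\boldsymbol{\beta}}_r) \\ &= \hat{f}^r_0(t) + \mathbf{w}(t)'\mathbf{X}(\boldsymbol{\beta}_r - \hat{\boldsymbol{\beta}}_r), \end{aligned} $$

where $\mathbf{w}(t) = (w_{n,h}(t,t_1),\dots,w_{n,h}(t,t_n))'$. Hence,

$$ \begin{aligned} \mathrm{Cov}(\hat{f}^r_{n,h}(t), \hat{f}^s_{n,h}(t)) = {} & \mathrm{Cov}(\hat{f}^r_0(t), \hat{f}^s_0(t)) - \mathbf{w}(t)'\mathbf{X}\,\mathrm{Cov}(\hat{\boldsymbol{\beta}}_r, \hat{f}^s_0(t)) \\ & - \mathbf{w}(t)'\mathbf{X}\,\mathrm{Cov}(\hat{\boldsymbol{\beta}}_s, \hat{f}^r_0(t)) + \mathbf{w}(t)'\mathbf{X}\,\mathrm{Cov}(\hat{\boldsymbol{\beta}}_r, \hat{\boldsymbol{\beta}}_s)\,\mathbf{X}'\mathbf{w}(t). \end{aligned} $$

Next, we want to show that

$$ \mathbf{w}(t)'\mathbf{X} \xrightarrow[n\to\infty]{} (g_1(t),\dots,g_p(t)). \tag{A.1} $$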


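Now,

$$ \mathbf{w}(t)'\mathbf{X} = \sum_{i=1}^{n} w_{n,h}(t,t_i)\,\boldsymbol{\zeta}_i' = \left( \sum_{i=1}^{n} w_{n,h}(t,t_i)\,x_{i1},\ \dots,\ \sum_{i=1}^{n} w_{n,h}(t,t_i)\,x_{ip} \right), $$

where, for each $k \in \{1,\dots,p\}$,

$$ \sum_{i=1}^{n} w_{n,h}(t,t_i)\,x_{ik} = \sum_{i=1}^{n} w_{n,h}(t,t_i)\,(g_k(t_i) + \eta_{ik}) = g_k(t) - \tilde{g}_k(t) + \mathbf{w}(t)'\boldsymbol{\eta}_k. $$

Under Assumptions (A4) and (A5), the last two terms converge to zero as $n \to \infty$. Hence, (A.1) holds.

By Theorem 2 and (A.1), we have

$$ |\mathbf{w}(t)'\mathbf{X}\,\mathrm{Cov}(\hat{\boldsymbol{\beta}}_r, \hat{\boldsymbol{\beta}}_s)\,\mathbf{X}'\mathbf{w}(t)| \le \|\mathbf{X}'\mathbf{w}(t)\|^2\, \|\mathrm{Cov}(\hat{\boldsymbol{\beta}}_r, \hat{\boldsymbol{\beta}}_s)\| = O(n^{-1}). \tag{A.2} $$

Now, by (A.1) and (9), for each $k \in \{1,\dots,p\}$,

$$ |\mathrm{Cov}(\hat{\beta}_{kr}, \hat{f}^s_0(t))| \le \mathrm{Var}(\hat{\beta}_{kr})^{1/2}\, \mathrm{Var}(\hat{f}^s_0(t))^{1/2} = O(n^{-1})^{1/2}\, O((nh)^{-1})^{1/2}. $$

Hence,

$$ |\mathbf{w}(t)'\mathbf{X}\,\mathrm{Cov}(\hat{\boldsymbol{\beta}}_r, \hat{f}^s_0(t))| \le \|\mathbf{X}'\mathbf{w}(t)\|\, \|\mathrm{Cov}(\hat{\boldsymbol{\beta}}_r, \hat{f}^s_0(t))\| = O(n^{-1/2}(nh)^{-1/2}). \tag{A.3} $$

By (A.2), (A.3), and (9),

$$ \frac{\mathbf{w}(t)'\mathbf{X}\,\mathrm{Cov}(\hat{\boldsymbol{\beta}}_r, \hat{f}^s_0(t))}{\mathrm{Cov}(\hat{f}^r_0(t), \hat{f}^s_0(t))} = o(1), \qquad \frac{\mathbf{w}(t)'\mathbf{X}\,\mathrm{Cov}(\hat{\boldsymbol{\beta}}_r, \hat{\boldsymbol{\beta}}_s)\,\mathbf{X}'\mathbf{w}(t)}{\mathrm{Cov}(\hat{f}^r_0(t), \hat{f}^s_0(t))} = o(1). $$

Hence, the result holds. □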

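Proof of Theorem 4. We have

$$ \mathrm{vec}(\hat{\mathbf{B}} - E(\hat{\mathbf{B}})) = [\mathbf{I}_d \otimes (\tilde{\mathbf{X}}'\tilde{\mathbf{X}})^{-1}]\,[\mathbf{I}_d \otimes \tilde{\mathbf{X}}'(\mathbf{I} - \mathbf{W})]\,\mathrm{vec}(\mathbf{U}). $$

By Lemma 6, we only need to show that, given $\mathbf{a} \in \mathbb{R}^{pd} \setminus \{\mathbf{0}\}$,

$$ n^{-1/2}\,\mathbf{a}'[\mathbf{I}_d \otimes \tilde{\mathbf{X}}'(\mathbf{I} - \mathbf{W})]\,\mathrm{vec}(\mathbf{U}) \xrightarrow{d} N(0, \mathbf{a}'(\boldsymbol{\Sigma} \otimes \mathbf{V})\mathbf{a}). \tag{A.4} $$

Let $\mathbf{c}' = \mathbf{a}'[\mathbf{I}_d \otimes \tilde{\mathbf{X}}'(\mathbf{I} - \mathbf{W})]$. Note that $\mathbf{c}' = (\mathbf{c}_1',\dots,\mathbf{c}_d')$ where, for each $r \in \{1,\dots,d\}$, $\mathbf{c}_r' = (c_{r1},\dots,c_{rn})$. Then,

$$ \mathbf{a}'[\mathbf{I}_d \otimes \tilde{\mathbf{X}}'(\mathbf{I} - \mathbf{W})]\,\mathrm{vec}(\mathbf{U}) = \mathbf{c}'\,\mathrm{vec}(\mathbf{U}) = \sum_{r=1}^{d} \mathbf{c}_r'\mathbf{u}_r = \sum_{i=1}^{n} \boldsymbol{\psi}_i'\boldsymbol{\tau}_i, $$

where, for each $i \in \{1,\dots,n\}$, $\boldsymbol{\psi}_i' = (c_{1i},\dots,c_{di})$. By Assumption (A1), we have a sum of independent variables $\boldsymbol{\psi}_i'\boldsymbol{\tau}_i$ with $E(\boldsymbol{\psi}_i'\boldsymbol{\tau}_i) = 0$ and $\mathrm{Var}(\boldsymbol{\psi}_i'\boldsymbol{\tau}_i) = \boldsymbol{\psi}_i'\boldsymbol{\Sigma}\boldsymbol{\psi}_i$. Hence,

$$ \mathrm{Var}\left( \sum_{i=1}^{n} \boldsymbol{\psi}_i'\boldsymbol{\tau}_i \right) = \mathbf{a}'[\boldsymbol{\Sigma} \otimes \tilde{\mathbf{X}}'(\mathbf{I} - \mathbf{W})(\mathbf{I} - \mathbf{W})'\tilde{\mathbf{X}}]\,\mathbf{a}. $$

Let $n \to \infty$. By Assumptions (A2)–(A4),

$$ n^{-1}\,\mathbf{a}'[\boldsymbol{\Sigma} \otimes \tilde{\mathbf{X}}'(\mathbf{I} - \mathbf{W})(\mathbf{I} - \mathbf{W})'\tilde{\mathbf{X}}]\,\mathbf{a} \to \mathbf{a}'[\boldsymbol{\Sigma} \otimes \mathbf{V}]\,\mathbf{a}. $$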
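The two Kronecker-product identities used so far in this proof can be checked numerically. In the sketch below, M is a hypothetical stand-in for the $(p \times n)$-matrix $\tilde{\mathbf{X}}'(\mathbf{I} - \mathbf{W})$, and the dimensions and the matrix Sigma are arbitrary illustrative choices. The check relies on the fact that, when the rows of $\mathbf{U}$ are independent with covariance $\boldsymbol{\Sigma}$, the covariance of $\mathrm{vec}(\mathbf{U})$ is $\boldsymbol{\Sigma} \otimes \mathbf{I}_n$.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(1)
n, p, d = 6, 2, 3
M = rng.normal(size=(p, n))     # hypothetical stand-in for X~'(I - W)
U = rng.normal(size=(n, d))     # error matrix with rows tau_i'
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

# Identity 1: vec(M U) = (I_d kron M) vec(U)
lhs = vec(M @ U)
rhs = np.kron(np.eye(d), M) @ vec(U)

# Identity 2: (I_d kron M)(Sigma kron I_n)(I_d kron M)' = Sigma kron (M M')
K = np.kron(np.eye(d), M)
cov_lhs = K @ np.kron(Sigma, np.eye(n)) @ K.T
cov_rhs = np.kron(Sigma, M @ M.T)
```

The second identity is the matrix form of the variance expression above: pre- and post-multiplying by $\mathbf{a}'$ and $\mathbf{a}$ gives $\mathbf{a}'[\boldsymbol{\Sigma} \otimes \mathbf{M}\mathbf{M}']\mathbf{a}$ with $\mathbf{M}\mathbf{M}' = \tilde{\mathbf{X}}'(\mathbf{I} - \mathbf{W})(\mathbf{I} - \mathbf{W})'\tilde{\mathbf{X}}$.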


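Now, if we show that $\max_{1 \le i \le n} \mathrm{Var}(\boldsymbol{\psi}_i'\boldsymbol{\tau}_i) = o(n)$, then, from the Lindeberg condition we get (A.4). Let $\mathbf{e}_r \in \mathbb{R}^d$ be the vector where all components equal 0 except for component number $r$, which is 1. Then,

$$ \begin{aligned} \max_{1 \le i \le n} |\boldsymbol{\psi}_i'\boldsymbol{\Sigma}\boldsymbol{\psi}_i| &\le \|\boldsymbol{\Sigma}\|_2 \max_{1 \le i \le n} \sum_{r=1}^{d} c_{ri}^2 \le \|\boldsymbol{\Sigma}\|_2 \max_{1 \le i \le n} d \max_{1 \le r \le d} c_{ri}^2 \\ &= d\,\|\boldsymbol{\Sigma}\|_2 \max_{1 \le r \le d} \Big( \max_{1 \le i \le n} |c_{ri}| \Big)^2 \\ &= d\,\|\boldsymbol{\Sigma}\|_2 \max_{1 \le r \le d} \|(\mathbf{e}_r' \otimes \mathbf{I}_n)\mathbf{c}\|_\infty^2 \\ &= d\,\|\boldsymbol{\Sigma}\|_2 \max_{1 \le r \le d} \|(\mathbf{e}_r' \otimes (\mathbf{I} - \mathbf{W})'(\mathbf{I} - \mathbf{W})\mathbf{X})\mathbf{a}\|_\infty^2. \end{aligned} $$

Denote $(\mathbf{e}_r' \otimes \mathbf{I}_p)\mathbf{a} \in \mathbb{R}^p$ by $\mathbf{v}_r$. We have

$$ \|(\mathbf{e}_r' \otimes (\mathbf{I} - \mathbf{W})'(\mathbf{I} - \mathbf{W})\mathbf{X})\mathbf{a}\|_\infty = \|(\mathbf{I} - \mathbf{W})'(\mathbf{I} - \mathbf{W})\mathbf{X}(\mathbf{e}_r' \otimes \mathbf{I}_p)\mathbf{a}\|_\infty \le (1 + \|\mathbf{W}'\|_\infty)(1 + \|\mathbf{W}\|_\infty)\,\|\mathbf{X}\mathbf{v}_r\|_\infty. $$

By Assumption (A10), $\|\mathbf{W}\|_\infty$ and $\|\mathbf{W}'\|_\infty$ are uniformly bounded. Following the same arguments as in Theorem 4 of Speckman (1988), we have that $\|\mathbf{X}\mathbf{v}_r\|_\infty$ is also bounded. Hence, $\|(\mathbf{e}_r' \otimes (\mathbf{I} - \mathbf{W})'(\mathbf{I} - \mathbf{W})\mathbf{X})\mathbf{a}\|_\infty = O(1)$. Hence, (A.4) holds and, by the Cramér–Wold device, we have

$$ n^{-1/2}\,[\mathbf{I}_d \otimes \tilde{\mathbf{X}}'(\mathbf{I} - \mathbf{W})]\,\mathrm{vec}(\mathbf{U}) \xrightarrow{d} N_{pd}(\mathbf{0}, \boldsymbol{\Sigma} \otimes \mathbf{V}). $$

This last convergence result, together with Lemma 6, concludes the proof of the asymptotic normality of $\hat{\mathbf{B}}$. Now, for the asymptotic distribution of $\hat{\boldsymbol{\beta}}_r$, it suffices to take into account that $\hat{\boldsymbol{\beta}}_r = (\mathbf{e}_r' \otimes \mathbf{I}_p)\,\mathrm{vec}(\hat{\mathbf{B}})$. □

References

Aneiros Pérez, G., González Manteiga, W., Vieu, P., 2004. Estimation and testing in a partial linear regression model under long-memory dependence. Bernoulli 10 (1), 49–78.
Engle, R.F., Granger, C.W.J., Rice, J., Weiss, A., 1986. Semiparametric estimates of the relation between weather and electricity sales. J. Amer. Statist. Assoc. 81 (394), 310–320.
Gao, J.T., 1995. Asymptotic theory for partly linear models. Comm. Statist. Theory Methods 24 (8), 1985–2009.
Gao, J.T., Anh, V.V., 1999. Semiparametric regression under long-range dependent errors. J. Statist. Plann. Inference 80 (1–2), 37–57.
Härdle, W., 1990. Applied Nonparametric Regression. Cambridge University Press, Cambridge.
Härdle, W., Liang, H., Gao, J., 2000. Partially Linear Models. Physica-Verlag, Würzburg.
Robinson, P.M., 1988. Root-N-consistent semiparametric regression. Econometrica 56 (4), 931–954.
Speckman, P., 1988. Kernel smoothing in partial linear models. J. Roy. Statist. Soc. Ser. B 50 (3), 413–436.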