Robust estimation for survival partially linear single-index models

Robust estimation for survival partially linear single-index models

Computational Statistics and Data Analysis 80 (2014) 140–152 Contents lists available at ScienceDirect Computational Statistics and Data Analysis jo...

493KB Sizes 0 Downloads 35 Views

Computational Statistics and Data Analysis 80 (2014) 140–152

Contents lists available at ScienceDirect

Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda

Robust estimation for survival partially linear single-index models Xiaoguang Wang ∗ , Xinyong Shi School of Mathematical Sciences, Dalian University of Technology, Dalian, China

article

info

Article history: Received 29 May 2013 Received in revised form 11 June 2014 Accepted 27 June 2014 Available online 5 July 2014 Keywords: Local linear regression M-estimation Right censoring Semiparametric models Synthetic data

abstract The partially linear single-index model is an interesting semiparametric model extended by the partially linear model and the single-index model, which supply a good balance between flexibility and parsimony. A robust estimation is proposed to fit the partially linear single-index model in case outliers may occur in the right censored response. This method provides a flexible way for modeling survival data. It is a profile M-estimation version and the estimation procedure involves transforming the censored data into synthetic data at first, then it results in fitting the common partially linear single-index models by a robust loss function. Asymptotic properties for the estimators of the linear and single-index coefficients and the optimal rate of convergence for the estimator of the nonparametric function are established. The finite sample performance of the proposed method is assessed by Monte Carlo simulation studies, and demonstrated by the analyses of PBC data and NCCTG lung cancer data. © 2014 Elsevier B.V. All rights reserved.

1. Introduction In survival analysis, difficulties arise when the traditional regression models are used to analyze survival data in practice, since survival data are often not completely observed. Let Y be the completely observed variable of interest, such as the survival time or some transformation of the survival time. In many applications, especially in biomedical studies, Y cannot be completely observed due to possibly censoring, for instance, withdrawal of patients from the study, or death from a cause unrelated to the specific disease of being studied, etc. In this paper, we focus on the right censoring mechanism rather than the left censoring although the methodology can be directly extended to that. Let C denote the censoring variable. Due to the right censoring mechanism, we only observe (V , ∆), where V = min(Y , C ) and ∆ = I (Y ≤ C ) are the observed (possibly censored) response variable and the censoring indicator, respectively. When a set of predictors are considered, the accelerated failure time (AFT) model is an attractive alternative to the commonly-used Cox proportional hazards model and it is framed as a usual linear model with the response variable as the logarithm of the failure time (Kalbfleisch and Prentice, 2001). It postulates that the effect of explanatory variables is to multiply the predicted event time by some constant and that is more appealing in many ways. Semiparametric estimation in the AFT model with an unspecified error distribution has been studied extensively in the literature for common right censored data. In particular, some approaches in this aspect have received special attention including the Buckley–James iterative method, the rank-based method, and the inverse probability weighting (IPW) method (Miller, 1976; Buckley and James, 1979; Koul et al., 1981; Ritov, 1990; Tsiatis, 1990; Ying, 1993; Zhou, 2006). Another approach to fit the right censored data is to replace the possibly unavailable response Y with surrogate values by an appropriate mapping of the observed data. The key requirement is that the mapping is approximately, what Fan and Gijbels (1994) termed, a censoring



Correspondence to: 2 Linggong Road, Ganjingzi District, Dalian 116024, China. Tel.: +86 411 84708351 8123. E-mail addresses: [email protected] (X. Wang), [email protected] (X. Shi).

http://dx.doi.org/10.1016/j.csda.2014.06.020 0167-9473/© 2014 Elsevier B.V. All rights reserved.

X. Wang, X. Shi / Computational Statistics and Data Analysis 80 (2014) 140–152

141

unbiased transformation. Also see Koul et al. (1981), Leurgans (1987), and Zhou (1992). Also see Leiva et al. (2007) and Shang et al. (2013). Robust methods are also proposed for the AFT model. Powell (1984) proposed the least absolute deviation method for the fixed censoring times. For random censoring, Ying et al. (1995) modified quantile estimating equations by assuming independence between survival and censoring times. Portnoy (2003) proposed censored quantile regression by redistributing censored data. By using the martingale structure of right censoring time, Peng and Huang (2008) developed a quantile estimating equation. Due to the analytical difficulty and computational complexity of quantile regressions, we propose an M-estimation to fit the AFT model. For more flexible situation, a nonlinear structure of covariate is employed in the AFT model, that is a semiparametric regression model. It can relax some restrictive assumptions on parametric models and is flexible enough to capture the true underlying relationships between explanatory variables and response in dealing with complex real data. One of the most popular and important semiparametric models is the partial linear single-index model, which is an important generalization of the traditional linear model and the single-index model. It is reduced to a partially linear model (Engle et al., 1986; Härdle et al., 2000), or the single-index model (Härdle et al., 1993; Ichimura, 1993) as special cases. Because of recent advances in the semiparametric regression, Jin et al. (2003, 2006) and Zou et al. (2011) developed the estimation method in the partial linear AFT model. Lu et al. (2006) and Lu and Cheng (2007) proposed estimation for the partial linear single-index model. However, the existing estimation procedures may be sensitive to outliers and their efficiency may be significantly deduced for non-normal errors. This paper focuses on the robust estimation approach for the partially linear single-index AFT model. We consider the following randomly censored survival partially linear single-index model Y = g (XT β) + ZT α + ε,

(1)

where Y is the survival time or the logarithm survival time, θ = (β, α) is a vector of unknown parameters in R with ∥β∥ = 1 (where ∥ · ∥ denotes the Euclidean norm), X ∈ Rp , Z ∈ Rq , g (·) is an unknown univariate link function, and ε is a random error with mean zero. The restriction ∥β∥ = 1 on the single-index coefficients is required for the parameter identifiability. We assume without loss of generality that the first component of β is positive, i.e., the parametric space of β is 2 = {β = (β1 , β2 , . . . , βp )T , ∥β∥ = 1, β1 > 0} for ensuring identifiability. Assume that (X, Z) and ε are independent and C is independent of (Y , X, Z). Remind V = min(Y , C ) and ∆ = I (Y ≤ C ). The observations {(Vi , Xi , Zi , ∆i ), i = 1, . . . , n} is a random sample from the population {(V , X, Z, ∆)}. To resist the potential outliers, we focus on the M-estimation (Huber, 1964) to fit the partially linear single index survival model. The basic idea of our robust fit is to replace the least square loss function with a Huber type loss function. Hence, one of the building blocks of our proposal is the robust estimation. After transforming the censored data into synthetic data unbiasedly, we fulfill the M-estimation for the semiparametric model by an iterative method. The local linear regression technique is applied to approximate the nonparametric single index function given the parametric part. Then we plug in the function estimator and compute the parametric part. These two steps are both obtained by a Huber type loss function. Via this iterative procedure, the nonparametric component and the parametric component can be estimated. The rest part of the paper is as follows. Section 2 describes our robust estimation procedure. The asymptotic properties of these estimators are studied. Section 3 provides a computation algorithm and illustrates the estimation performance with Monte Carlo simulation results. In Section 4, we evaluate the proposed method by analyzing PBC data and NCCTG lung cancer data. Section 5 ends the paper with a brief discussion. All the conditions and technical proofs of the main results are deferred to the Appendix. p+q

2. Estimation procedure and main results In this section, we propose a robust estimation procedure and establishing its asymptotic properties. The main motivation is to generalize the robust estimating method for partially linear single-index models to survival time data. We adopt the Fan and Gijbels (1994) method to transform the right censored response to the unbiased synthetic data. In order to estimate the extra nonparametric component g, we use the local linear regression technique to construct a local objective function. Then we develop the estimation for parametric parts. The new profile estimation procedure naturally combines the survival analysis and the partially linear single-index model in one unified framework, which greatly facilitates the complicated data analysis and model inferences. 2.1. Transformation of the data

¯ = 1 − G and F¯ = 1 − F are the survival functions of the right We construct the synthetic data in this subsection. Let G ¯ is censoring variable C and the survival time variable Y , respectively, where G(t ) = P (C ≤ t ) and F (t ) = P (Y ≤ t ). When G unknown, we can estimate it by the Kaplan–Meier product-limit estimator. Denote τG = inf{t : G(t ) = 1} and τF = inf{t :  ˆ (V −)) and T2 = ∞ {I [V ≥ s]/(1 − Gˆ (s−)) − F (t ) = 1}. We suppose τF ≤ τG throughout this paper. Denote T1 = V /(1 − G −∞ ˆ (·−) is the left-continuous version of the Kaplan–Meier product-limit estimator defined as I [s < 0]}ds, where 1 − G ˆ (t ) = 1−G

n  i=1,V(i) ≤t



n−i n−i+1

I [∆(i) =0]

,

142

X. Wang, X. Shi / Computational Statistics and Data Analysis 80 (2014) 140–152

V(1) ≤ V(2) ≤ · · · ≤ V(n) are the order statistics of Vi ’s and ∆(i) is the ∆i associated with V(i) , i = 1, 2, . . . , n. We construct the synthetic data as follows, VGˆ = 1ϕ1 + (1 − ∆)ϕ2 ,

(2)

where ϕ1 = (1 + λ)T2 − λT1 , ϕ2 = (1 + λ)T2 , and λ is a tuning parameter. This transformation for random censored data is introduced by Fan and Gijbels (1994), and it will be reduced to the Koul et al. (1981) transformation KSV = 1T1 and the Leurgans (1987) transformation L = T2 , respectively, given the tuning parameter λ = −1 and λ = 0, respectively. An appropriate choice of λ reduces the variability of the transformed data. The choice of λ will be used in our implementations, which is recommend by Fan and Gijbels (1994),

∞

≥ s]/(1 − Gˆ (s−)) − I [s < 0]}ds − Vi ,  {i:∆i =1} V /(1 − G ˆ (Vi −)) − ∞ {I [Vi ≥ s]/(1 − Gˆ (s−)) − I [s < 0]}ds i −∞ −∞ {I [Vi

λˆ = min

which leads us to take the largest λ such that the synthetic response ViGˆ ≥ Vi for the uncensored response. Without loss of ∞ generality, we assume that Yi ≥ 0 and Ci ≥ 0, which leads the integration in T2 starts at 0, thus T2 = 0 I [Vi ≥ s]/(1 −

ˆ (s−))ds. We replace the observed data {(Vi , Xi , Zi , ∆i ), i = 1, . . . , n} by the transformed data {(V ˆ , Xi , Zi ), i = 1, . . . , n} G iG in our estimate method. Note that E {VG |X, Z} = E {Y |X, Z} when G is known, hence the above data transformations are unbiased. 2.2. Estimation method

By the data conversion, we obtain the synthetic data or pseudo-responses, then proceed to estimate the parameters and the nonparametric function. Firstly, we develop the estimation for the nonparametric part g. Given the parametric part θ , we estimate g and g ′ using the local linear fit. In a neighborhood of a fixed point t, denote a0 = g (t ) and a1 = g ′ (t ), respectively. We can write g (XT β) ≈ g (t ) + g ′ (t )(XT β − t ) = a0 + a1 (XT β − t ). Denote K (·) as a kernel function and Kh (·) = h−1 K (·/h), where h is the bandwidth depending on the sample size n. We use one-step local M-estimator to minimize the following objective function with respect to (a0 , a1 ), n    ρ ViGˆ − a0 − a1 (XTi β − t ) − ZTi α Kh (XTi β − t ),

(3)

i=1

where ρ(u) is a suitable chosen loss function, which is the key part of the well-known M-estimation (Huber, 1964). Usually we assume ρ(u) is convex and symmetric, and its natural alternatives are ρ(u) = u2 and ρ(u) = |u|. In the later practical applications, we use the Huber loss function (Huber, 1964)

ρ(u) =

 1   u2 ,

|u| ≤ k,

2

 k|u| − 1 k2 , 2

|u| > k,

and

ψ(u) = ρ ′ (u) =



u, k · sign(u),

|u| ≤ k, | u| > k ,

(4)

where k is a tuning parameter. Note that ψ(u) is not differential, however, some smoothing versions can be found, such as Hampel et al. (2011). Write the local estimators aˆ 0 ≡ aˆ 0 (t ; h, θ) and aˆ 1 ≡ aˆ 1 (t ; h, θ) of a0 and a1 , respectively, that is gˆ (t ) = aˆ 0 and gˆ ′ (t ) = aˆ 1 . Furthermore, we proceed to the estimation of the parametric part θ . Since the parametric space of β is 2 = {β = (β1 , β2 , . . . , βp )T , ∥β∥ = 1, β1 > 0}, without loss of generality, the first component β1 can be eliminated and the parametric p  2 1/2 , β2 , . . . , βp ), pr=2 βr2 < 1}. Denote β(1) = (β2 , . . . , βp )T . This space can be rearranged to a form {((1 − r =2 βr ) reparametrization method makes β(1) an inner point and it is the key to analyzing the asymptotic properties of the estimator for β and to facilitating an efficient computational algorithm. Hence the restricted estimation of the parameter θ is equivalent to estimating the (p + q − 1)-dimensional vector θ (1) = (β(1) , α) by minimizing the following objective function without restriction Qˆ n (θ) =

n    ρ ViGˆ − gˆ (XTi β) − ZTi α .

(5)

i=1

The function (5) is the direct analogue of the ideal objective function Qn (θ) for a known g, in which it is calculated by replacing gˆ (t ) with g (t ).

X. Wang, X. Shi / Computational Statistics and Data Analysis 80 (2014) 140–152

The following notations will be used. Let χ be the σ -field generated from {X1 , . . . , Xn , Z1 , . . . , Zn }, and J = Jacobian matrix of size p × (p − 1) with

 −β

J=

(1)T

143

∂β ∂β(1)

be the

  (1) 2 / 1 − ∥β ∥ . Ip−1

Denote fXT β (x β) as the density function of random variable XT β. We choose the kernel function K (·) as a symmetric density function and let T

µj =



uj K (u)du and

νj =



uj K 2 (u)du,

j = 0, 1, 2, . . . .

Note that µj = 0 if j is odd. In what follows, we write A⊗2 = AAT for a matrix A,  X = X − E (X|XT β),  Z = Z − T T T T T   E (Z|X β), X0 = X − E (X|X β0 ), Z0 = Z − E (Z|X β0 ), hX (·) = E (X|X β = ·), hZ (·) = E (Z|X β = ·) and Sn,k =

n

i =1

(XTi β − t )k Kh (XTi β − t ), k = 1, 2. Further, hˆ X (t ) and hˆ Z (t ) are the local linear estimators of hX (t ) and hZ (t ), i.e.,   n n n n   ˆ ˆhX (t ) = bi (t ), bi (t ) and hZ (t ) = bi (t )Zi bi (t )Xi i=1

i=1

where bi (t ) = {Sn,2 (t ) − (

XTi

i=1

i=1

β − t )Sn,1 (t )} (

Kh XTi

β − t ).

Next, the score function of (5) is shown. It is worth noting that

∂ gˆ (XT β) ∂β(1)

converges to g ′ (XT β)JT (X − E (X|XT β)) and

∂ gˆ (XT β) ∂α (1)

converges to E (Z|XT β) with probability tending to one, which indicates the derivatives of the estimators with respect to β and α converge instead of the corresponding derivatives of the functions. It is reasonable since the support of gˆ depends on the parameter β(1) but not on α (Cui et al., 2011). This feature guarantees the asymptotic properties of the parametric estimators. Proposition 1. Under conditions (A1)–(A5) given in the appendix, the score function of Qˆ n (θ) can be expressed as

 Qn (θ) =

ˆ′

 ∂ Qˆ n (θ) ∂ Qˆ n (θ) , , ∂α ∂β(1)

(6)

where

∂ Qˆ n (θ) ∂β

(1)

=

n 

JT gˆ ′ (XTi β){Xi − hˆ X (XTi β)}ψ ViGˆ − gˆ (XTi β) − ZTi α ,





i =1

n   ∂ Qˆ n (θ)  {Zi − hˆ Z (XTi β)}ψ ViGˆ − gˆ (XTi β) − ZTi α . = ∂α i =1

Moreover, the population version of Qˆ n′ (θ) is Qn′ (θ) =



∂ Qn (θ) ∂ Qn (θ) , ∂α ∂β(1)

 (7)

where

∂ Qn (θ) ∂β

(1)

=

n 

JT g ′ (XTi β) Xi ψ ViGˆ − g (XTi β) − ZTi α ,





i =1

n   ∂ Qn (θ)   = Zi ψ ViGˆ − g (XTi β) − ZTi α . ∂α i =1

These two terms are obtained by replacing gˆ , gˆ ′ , hˆ X , hˆ Z with g , g ′ , hX , hZ in (6). 2.3. Asymptotic properties The following theorems state the asymptotic properties of the estimators obtained based on the objective function given in (5). The necessary conditions are given in the Appendix. (1) ˆ . If conditions (A1)–(A5) are = (βˆ , α) (1) satisfied, the bandwidth h tends to 0 such that nh/(log h) → ∞ as n → ∞, then θˆ converges in probability to the true (1) parameter θ 0 .

Theorem 1. Assume the objective function (5) has a unique solution and denote it by θˆ

(1)

144

X. Wang, X. Shi / Computational Statistics and Data Analysis 80 (2014) 140–152

Note that the objective function (5) may not be convex, hence a global minimum cannot be guaranteed. However, we should assume it has a unique solution to ensure the consistency property. This is a common requirement in nonlinear regressions. By the fact that the objective function is a difference of two convex functions, the proof for the case with a convex objective function can be extended. Furthermore, we have the asymptotically normal property. Theorem 2. If conditions (A1)–(A6) are satisfied, the bandwidth h tends to 0 such that nh8 → 0 and nh3+3/(m−1) / log n → ∞ as n → ∞, then



n(θˆ

(1)

− θ (01) ) → Np+q−1 (0, S−1 RS−1 ), d

d

where S = E {B⊗2 ψ ′ (ϵG )}, R = E {(Bψ(ϵG ))⊗2 }, B and ϵG are defined in the conditions (A6) and (A7), and ‘‘ →’’ denotes the ˆ in all terms associated with G, we convergence in distribution. If G is unknown and replaced by the Kaplan–Meier estimator G have



n(θˆ

(1)

− θ (01) ) → Np+q−1 (0, S−1 (R + Ω (τF ))S−1 ), d

where Ω (τF ) is defined in the condition (A7). Remark 1. When we choose the loss function ρ(u) = u2 , then ψ(u) = 2u, and we obtain the following result:



n(θˆ

(1)

− θ (01) ) → Np+q−1 (0, S−1 RS−1 ), d

where S = E {B⊗2 }, R = E {(BϵG )⊗2 }. If G is unknown, we have



(1)

− θ (01) ) → Np+q−1 (0, S−1 (R − 41 (τF ))S−1 ), V where 41 (τF ) is defined as following. Let ξ (G, s, V ) = [ s {1 − G(t −)}−1 dt ]I [s < V ], η1 (s) = n(θˆ

d

then

41 (τF ) =

τF



E [B{(1+λ)ξ (G,s,V )−λ1T1 }I [s
{η1 (s)}⊗2 (1 − F (s−))dG(s).

0

The asymptotic normality of θˆ follows from Theorem 2 with a simple application of the multivariate delta method, since

ˆ is βˆ 1 = the first component of β

ˆ 1 − ∥β

(1)

∥2 .

Corollary 1. Under the conditions of Theorem 2, we have



d

n(θˆ − θ 0 ) → Np+q (0, 6),

where 6 = blockdiag (J, Iq )S−1 RS−1 blockdiag (J, Iq )T . If G is unknown, we have 6 Ω (τF ))S−1 blockdiag (J, Iq )T .

=

blockdiag (J, Iq )S−1 (R +

Remark 2. It is easy to verify that the proposed estimator of β0 is asymptotically more efficient than those of Lu and Cheng (2007) when ρ(u) = u2 , according to Corollary 1 and Remark 1. Similar conclusions can be found in Cui et al. (2011) and Wang et al. (2010). Using the results given by Fan and Gijbels (1994) for univariate censored nonparametric regression, we can obtain a consistent estimator gˆ of g. Theorem 3. Under Conditions (A1)–(A3), given χ , the mean of error term is 0, and assume that h → 0 and nh → ∞, the asymptotic conditional bias and variance of gˆ are given by E [{ˆg (X β) − g (X β)} |χ] = T

T

2



1 2

h µ2 g (X β) 2

′′

T

2 +

ν0 + op (h4 + 1/(nh)), nhfXT β (xT β)

and assume that h → 0 and nh → ∞, the asymptotic conditional bias and variance of gˆ ′ are given by 3

 E [{ˆg (x β) − g (x β)} |χ ] = ′

T



T

2

h2 µ4 6µ2

+

 g (x β) + 3g (x β) ′′′

T

ν2 nh3 µ22 fXT β (xT β)

′′

T

fX′ T β (xT β)

2

fXT β (xT β)

+ op (h4 + 1/(nh3 )).

Statistical inference on θˆ is of practical importance. However, it is seen from Corollary 1 that the asymptotic covariance matrix of θˆ takes a complex form and is difficult to estimate in finite samples. For practical implementation, we adopt a

X. Wang, X. Shi / Computational Statistics and Data Analysis 80 (2014) 140–152

145

simple nonparametric 0.632 bootstrap approach through resampling the real data with replacement. The performance is shown to be satisfactory in real data analysis carried out in Section 4. 3. Numerical studies In this section, a large-scale simulation study is carried out to estimate the regression coefficients in a survival partially linear single-index model. We are interested in the behavior of the estimators for the parametric part as well as that for the nonparametric single-index function. A computational algorithm is introduced at first. Two simulation studies are conducted to test both the capacity of dealing with right censored data and the robustness. Most simulation settings are adopted from other papers. R software is used for programming. The Epanechnikov kernel function K (t ) = 3/4(1 − t 2 )2 I(|t |<1) is adopted, and the Huber constant k = 4.685 is fixed to compute the estimates throughout this section and Section 4. 3.1. Computational algorithm To obtain the synthetic data given in (2), we replace the observed data {(Vi , Xi , Zi , ∆i ), i = 1, . . . , n} by the synthetic data {(ViGˆ , Xi , Zi ), i = 1, . . . , n}. Minimizing the objective function (3), we have the nonparametric part estimator function gˆ (t ) and gˆ ′ (t ), which depends on θ implicitly. To estimate the parametric part θ , a fixed point iterative algorithm is used to solve the estimating equation (7). ∂ Q (θ) For β, rewrite the estimating function as n(1) = JT Fˆ (β) with Fˆ (β) = (Fˆ1 (β), . . . , Fˆp (β))T where ∂β

Fˆj (β) =

n 

g ′ (XTi β) Xji ψ ViGˆ − g (XTi β) − ZTi α ,





j = 1, . . . , p,

i=1

and  Xji is the jth component of  Xi . Hence we can obtain the iterative equation

β=

M Fˆ1 (β)/∥Fˆ (β)∥ + M

β+

|Fˆ1 (β)|/∥Fˆ (β)∥2 ˆ F(β), Fˆ1 (β)/∥Fˆ (β)∥ + M

where M is a chosen number such that [Fˆ1 (β)/∥Fˆ (β)∥ + M ] ̸= 0. For α, solving the estimating equation, we can also obtain the iterative equation

α = ( ZT WZ)−1 ZT W Y, where W = diag(ωi ), i = 1, . . . , n, and

   T T   ψ ViGˆ − gˆ (Xi β) − Zi α , ωi = ViGˆ − gˆ (XTi β) − ZTi α   1,

ViGˆ ̸= gˆ (XTi β) + ZTi α, ViGˆ = gˆ (XTi β) + ZTi α.

To implement the estimation easily, an iterative algorithm is summarized as following. Step 1. Transform the censored survival time into the pseudo-response ViGˆ . Regard them as complete ones to choose initial

˜ and β˜ for α and β, respectively, by the MAVE estimator with the restriction ∥β∥ = 1. estimates α Step 2. Obtain gˆ (t ) = aˆ 0 by (3) with respect to a0 and a1 . ˜ and β˜ under ∥β∥ = 1 by minimizing the function (5) with the fixed point iteration Step 3. Update α α˜ = ( ZT WZ)−1 ZT W Y, β˜ =

˜ ˜ 2 |Fˆ1 (β)|/∥ Fˆ (β)∥ ˜ β˜ + Fˆ (β). ˜ Fˆ (β)∥ ˜ +M ˜ Fˆ (β)∥ ˜ +M Fˆ1 (β)/∥ Fˆ1 (β)/∥ M

Step 4. Repeat Steps 2 and 3 until convergence. Denote the final estimate of θ by θˆ and the final estimate of g (t ) by gˆ (t ) = aˆ 0 . Bandwidth selection is always crucial in the local smoothing techniques since it governs the curvature of the fitted function. Theoretically, when the sample size is large, the optimal bandwidth could be derived by minimizing the mean squared error (MSE). However, the optimal bandwidth cannot be calculated directly due to several unknown quantities and the implementation is also computationally expensive. By the conditions of the bandwidth in Theorem 2, we can take an optimal bandwidth h = 3(log(n)/n)1/5 . Since the asymptotic covariance matrix is complicated, we alternatively present a bootstrap standard error estimation for parametric estimators in real dataset examples. We estimate the variance by the nonparametric 0.632 bootstrap in which one samples size is 0.632n from the n observations without replacement. One bootstrap estimator is obtained under a replicate. After proper scale adjustment, the sample variance of the bootstrap estimators provides an estimate of the variance ˆ and βˆ . of α

146

X. Wang, X. Shi / Computational Statistics and Data Analysis 80 (2014) 140–152 Table 1 Scenario 1 simulation results with ε ∼ N (0, 0.5). True values of α and β are shown in parenthesis. Censoring 20%

n 50

200

400

40%

50

200

400

MEAN BIAS SE MEAN BIAS SE MEAN BIAS SE MEAN BIAS SE MEAN BIAS SE MEAN BIAS SE

α1 (=0.8)

α2 (= − 1.5)

β1 (=0.447)

β2 (=0.894)

β3 (=0)

0.717 −0.083 0.430 0.795 −0.005 0.137 0.803 0.003 0.076

−1.356

0.524 0.077 0.190 0.446 −0.001 0.052 0.450 0.003 0.025

0.852 −0.043 0.292 0.895 0.001 0.026 0.893 −0.001 0.013

0.018 0.018 0.202 0.005 0.005 0.068 0.001 0.001 0.049

0.781

−1.400

0.829

−0.019

0.100 0.587 −1.467 0.033 0.195 −1.496 0.004 0.132

0.055 0.055 0.263 0.004 0.003 0.081 0.005 0.005 0.056

0.582 0.795 −0.005 0.198 0.803 0.003 0.148

0.144 0.448 −1.502 −0.002 0.128 −1.504 −0.004 0.073

0.557 0.110 0.232 0.445 −0.002 0.083 0.456 0.009 0.047

−0.066 0.421 0.895 0.001 0.042 0.890 −0.005 0.024

3.2. Monte Carlo simulations The censored partially single-index model is stated as following: Y = g (XT β) + ZT α + ε with p = dim(Z) = 2 and q = dim(X) = 3. We use explanatory variables XT = (X1 , X2 , X3 ), where X1 , X2 , X3 are independent and uniformly distributed on [0, 1], and ZT = (Z1 , Z2 ), where Z1 , Z2 are independent and follow a Bernoulli distribution with 0.5 probability of being 1. The true nonparametric function g (u) = 8 exp(u) − (3u)3 /3. Scenario 1. First, to verify the transformation ability for the right censored data, √ √ we selected ε ∼ N (0, 0.5) in our simulations, the true parameters are set as α = (0.8, −1.5)T and β = (1/ 5, 2/ 5, 0)T = (0.447, 0.894, 0)T . The censoring distribution is selected to be N (µ, 22 ). Choose different µ to ensure the desired censoring proportions 20% and 40%. Set two sample sizes, n = 50, 200 and 400. Table 1 summarizes the results over 500 simulations, including the sample mean (MEAN), bias (BIAS) and Monte Carlo standard error (SE). The Monte Carlo standard error is an estimate of the inaccuracy of Monte Carlo samples. Overall, from the results in Table 1, our method behaves very well on the estimation of regression coefficients. For the same censoring proportions, the BIAS and the SE tend to be smaller for larger sample sizes. For example, when the censoring rate is 20% and the sample size n = 50, the BIAS and the SE of α1 are −0.083 and 0.430. As the sample size n increases to 200, the BIAS and the SE of α1 are −0.005 and 0.137. As n increases to 400, the BIAS and the SE of α1 decrease to 0.003 and 0.076. Similar results are observed for the 40% censoring rate case. For the same sample size, when the censoring proportion increases, the BIAS and the SE tend to be increased. For example, when the sample size n = 200 and the censoring rate is 20%, the BIAS and the SE of α2 are −0.002 and 0.128. Increase the censoring rate to 40%, the BIAS and the SE of α2 are 0.033 and 0.195. We can conclude that when the censoring proportion is high and the sample size is small, the BIAS and the SE tend to be much larger, which suggests that large samples and lower censoring proportion would be required to reduce the BIAS and the SE. Scenario 2. To examine the robustness of our proposed method, we vary the distribution of ε from the normal distribution N (0, 0.25), to the t-distribution t (2) and the Cauchy distribution Cauchy(0, 1) to compare with the least square estimator (LS) by Lu and Cheng (2007) and the MAVE estimator by Xia and Härdle (2006) with√ our robust with the Huber √ estimator √ objective function (Huber). The true parameters are set as α = (1, 2)T and β = (1/ 3, 1/ 3, 1/ 3)T = (0.577, 0.574, 0.577)T . According to different error distributions, the censoring distributions are selected to be their corresponding distribution type. This simulation is repeated 500 times. We fix the sample size n = 300, and censoring proportions are 20% and 40%, respectively. The results are presented in Table 2, with the corresponding standard errors in parenthesis. In Table 2 we show the results of three different methods when the censoring proportion is 20% and 40%. We can observe that the Huber method outperforms the other two methods in general, especially when the normal distribution assumption of the error term is mildly or severely violated. When ε ∼ N (0, 0.25), the Huber method performs competitively with the LS method and the MAVE method. The difference among these three approaches is negligible. However, when ε ∼ t (2), especially when ε ∼ Cauchy(0, 1), both the LS method and the MAVE method present much worse results than the Huber method. It is also worth mentioning that when the error distribution is the Cauchy distribution, both the LS method and the MAVE method supply unreasonable results, and the SEs are very large. This clearly demonstrates the robustness of our

X. Wang, X. Shi / Computational Statistics and Data Analysis 80 (2014) 140–152

147

Table 2 Scenario 2 simulation results. True values of α and β are shown in parenthesis and the sample size n = 300. Censoring

ε

Method

α2 (=2)

β1 (=0.557)

β2 (=0.557)

β3 (=0.557)

20%

N (0, 0.25)

Huber LS MAVE

0.999(0.117) 0.990(0.105) 0.963(0.191)

1.998(0.090) 2.007(0.086) 1.974(0.183)

0.585(0.048) 0.576(0.016) 0.576(0.070)

0.569(0.048) 0.576(0.015) 0.581(0.069)

0.578(0.049) 0.579(0.016) 0.561(0.086)

t (2)

Huber LS MAVE

0.988(0.360) 1.163(3.367) 2.294(8.489)

1.881(0.465) 1.911(3.025) 3.047(8.337)

0.681(0.191) 0.493(0.246) 0.509(0.261)

0.490(0.320) 0.343(0.397) 0.531(0.229)

0.543(0.280) 0.274(0.593) 0.531(0.243)

Cauchy

Huber LS MAVE

1.042(0.687) 0.611(4.801) −39.34(398.9)

2.033(0.747) 1.274(5.004) 42.16(387.5)

0.715(0.229) 0.382(0.280) 0.550(0.251)

0.511(0.315) 0.245(0.411) 0.427(0.270)

0.478(0.397) 0.139(0.732) 0.561(0.258)

N (0, 0.25)

Huber LS MAVE

1.023(0.192) 0.981(0.248) 0.978(0.249)

2.055(0.173) 2.009(0.189) 2.024(0.234)

0.588(0.072) 0.561(0.083) 0.554(0.123)

0.580(0.051) 0.542(0.201) 0.544(0.136)

0.563(0.071) 0.537(0.239) 0.593(0.113)

t (2)

Huber LS MAVE

1.042(0.429) 1.021(4.306) 1.806(11.47)

2.080(0.536) 1.433(4.610) 2.829(6.635)

0.682(0.225) 0.416(0.265) 0.496(0.256)

0.521(0.295) 0.323(0.407) 0.520(0.273)

0.513(0.282) 0.281(0.646) 0.524(0.269)

Cauchy

Huber LS MAVE

0.749(1.239) 0.028(5.308) −51.26(454.8)

2.154(1.276) 0.917(5.078) 50.62(442.5)

0.880(0.255) 0.423(0.305) 0.545(0.309)

0.277(0.417) 0.210(0.413) 0.458(0.267)

0.385(0.368) 0.097(0.717) 0.494(0.293)

40%

α1 (=1)

Huber method. As the censoring proportion increases, the estimation performance deteriorates. And the other two methods have similar trends. 4. Analysis on real data In this section, the randomly censored survival partially linear single-index model is applied to two real datasets to evaluate the estimation performance. All two real data can be found in R package survival, and their censoring rates run from a low to medium level. 4.1. Primary biliary cirrhosis (PBC) data The PBC data were collected in the Mayo Clinic trial in primary biliary cirrhosis of liver conducted between 1974 and 1984 and listed in Appendix D of Fleming and Harrington (2005). In this study, 312 out of 424 patients who agreed to participate in the randomized trial are eligible for the analysis. For each patient, clinical, biochemical, serological and histological features are gathered. In all, 125 patients died before the end of follow-up. We consider the dependence of the survival time on seventeen explanatory variables: continuous variables are age (in years), bili (serum bilirubin in mg/dl), chol (serum cholesterol in mg/dl), albumin (serum albumin in g/dl), copper (urine copper in 1 g/day), alk (alkaline phosphatase in units/liter), sgot (liver enzyme in units/ml), trig (triglycerides in mg/dl), plat (platelets per cubic ml/1000), and prot (standard prothrombin time in seconds); categorical variables are trt (treatment, 1 for control and 2 for treatment), stage (histological stage of disease), sex (0 denotes male and 1 denotes female), ascites (presence of ascites, 0 denotes absence of ascites and 1 denotes presence of ascites), hepatom (presence of hepatomegaly or enlarged liver, 0 denotes absence of hepatomegaly and 1 denotes presence of hepatomegaly), spiders (blood vessel malformations in the skin, 0 denotes absence of spiders and 1 denotes presence of spiders), and edema (0 denotes no edema, 0.5 denotes untreated or successfully treated edema and 1 denotes unsuccessfully treated edema). We take the logarithm of the survival time (in days) and use the survival partially linear single-index model to study the relationship between the survival time and explanatory variables. We focus on the 276 observations without missing values and the censoring rate is 60%. All seventeen variables are considered as important explanatory variables and included in the model, and all continuous variables form the single-index. Categorical variables stage and edema are dealt with dummy variables. Table 3 shows the estimated coefficients by three methods, with corresponding standard errors in parenthesis. The nonparametric bootstrap approach is applied for the variance estimation and 100 bootstrap replicates are used. From the results, we can see that three methods have similar coefficients estimation results under the censored survival partially linear single-index model. In addition, significant variables are bili, albumin, edema(2) and stage(1) by the results according to the asymptotically normal property with respect to the significance level 0.05. This provides strong evidence that variables bili (serum bilirubin) and albumin (serum albumin) are highly significant risk factors for the times to death via a nonlinear relationship, and edema(2) (treated or untreated edema) and stage(1) (low rate of histologic progression of disease) affects the survival times linearly. Those variables with p-values larger than 0.9 seem not affect the survival times. ˆ against XT βˆ , where the Fig. 1 shows the robust estimate of the single-index function g with the scatter plot of VGˆ − ZT α crosses stand for right censored data and the circles stand for the complete data. The left tail looks like a quadratic curve, the middle part behaves linear and the right tail looks like a concave curve, hence it is reasonable to believe that the continuous risk factors form a nonlinear structure to affect the survival time. Since no outliers apparently exposed by Fig. 1 and the

148

X. Wang, X. Shi / Computational Statistics and Data Analysis 80 (2014) 140–152 Table 3 PBC data. Estimated coefficients and standard errors. Covariate

Huber

P-value

LS

MAVE

Age Bili Chol Albumin Copper Alk Sgot Trig Plat Prot Trt Sex Ascites Hepatom Spiders Edema(1) Edema(2) Stage(1) Stage(2) Stage(3)

0.033 (0.073) 0.275 (0.117) −0.024 (0.183) −0.415 (0.190) 0.130 (0.197) −0.343 (0.186) 0.029 (0.191) 0.024 (0.247) 0.117 (0.280) −0.775 (0.774) 0.002 (0.115) 0.027 (0.271) −0.645 (0.427) −0.031 (0.158) −0.146 (0.195) −0.296 (0.279) −0.901 (0.452) 0.856 (0.342) 0.284 (0.240) 0.202 (0.172)

0.651 0.019 0.896 0.029 0.509 0.065 0.879 0.923 0.676 0.317 0.986 0.921 0.131 0.844 0.454 0.289 0.046 0.012 0.237 0.240

0.039 (0.036) 0.307 (0.092) −0.003 (0.125) −0.339 (0.154) 0.034 (0.114) −0.359 (0.142) 0.072 (0.108) −0.024 (0.196) 0.142 (0.197) −0.796 (0.758) 0.020 (0.123) 0.066 (0.199) −0.618 (0.438) −0.090 (0.150) −0.189 (0.170) −0.263 (0.223) −0.868 (0.387) 0.712 (0.281) 0.296 (0.205) 0.190 (0.156)

0.028 (0.044) 0.291 (0.072) −0.035 (0.103) −0.424 (0.132) 0.158 (0.104) −0.324 (0.137) 0.032 (0.086) 0.031 (0.155) 0.102 (0.180) −0.769 (0.646) 0.024 (0.115) −0.003 (0.191) −0.548 (0.341) −0.003 (0.136) −0.150 (0.162) −0.256 (0.214) −0.853 (0.391) 0.542 (0.292) 0.223 (0.200) 0.134 (0.145)

ˆ versus XT βˆ for the PBC data. Fig. 1. The estimated curve of g with the scatter plot of VGˆ − ZT α

algorithm of the Huber estimator is more complicated than LS and MAVE, the standard errors of Huber estimator in Table 3 are relative larger. But they are of the same order overall.

4.2. NCCTG lung cancer data The data came from the survival analysis for patients with advanced lung cancer from the North Central Cancer Treatment Group (Loprinzi et al., 1994). Performance scores are used to rate how well the patient can perform usual daily activities. There are 167 available observations in this trial and the censoring rate is 28%. We study the dependence of the censoring survival time (in days) on seven explanatory variables: continuous variables are age (in years), ph.karno (Karnofsky performance score, bad = 0 to good = 100, rated by physician), pat.karno (Karnofsky performance score as rated by patient), meal.cal (calories consumed at meals), and wt.loss (weight loss in last six months); categorical variables are sex (male = 1, female = 2) and ph.ecog (ECOG performance score). The NCCTG data is analyzed by the same setting of Section 4.1. Table 4 collects the estimated coefficients with their bootstrap standard errors by three methods. Significant variables are sex and ph.ecog(2). Fig. 2 shows the estimated singleindex function g with data, and its pattern looks like nonlinear.

X. Wang, X. Shi / Computational Statistics and Data Analysis 80 (2014) 140–152

149

Table 4 NCCTG lung cancer data. Estimated coefficients and standard errors. Covariate

Huber

P-value

LS

MAVE

Age Ph.karno Pat.karno Meal.cal Wt.loss Sex Ph.ecog(1) Ph.ecog(2)

0.348(0.284) 0.896(0.657) −0.231(0.459) −0.128(0.171) −0.079(0.227) 0.374(0.152) −0.247(0.216) −0.887(0.296)

0.220 0.173 0.615 0.454 0.728 0.014 0.253 0.003

0.306(0.299) 0.922(0.656) −0.121(0.457) −0.203(0.158) −0.012(0.232) 0.384(0.164) −0.203(0.214) −0.902(0.319)

0.349(0.287) 0.894(0.691) −0.237(0.468) −0.101(0.149) −0.113(0.178) 0.392(0.140) −0.142(0.193) −0.926(0.298)

ˆ versus XT βˆ for the NCCTG data. Fig. 2. The estimated curve of g with the scatter plot of VGˆ − ZT α

5. Conclusion and discussion Analysis of censored failure time data with multiple explanatory variables poses an important practical problem. In this article, we investigate the partially linear single-index model version for the semiparametric AFT model with possible outliers. The proposed M-estimation method achieves robustness by choosing the Huber type loss function when the response variable is subject to random censorship. Although the nonparametric single-index function is involved, the profile estimation procedure facilitates the parametric estimators via the local linear regression technique, and their asymptotic properties are established. Simulation studies and two real data examples illustrate that the proposed method can effectively deal with right censored data. Several questions remain unanswered for our proposed method based on the transformed response. In particular, we assume that the censoring variable C and (X, Z, Y ) are independent, and it is desirable to assume that C is independent of Y given the explanatory variables (X, Z), as the Buckley–James and rank based estimators. In addition, many investigators may care about the case with medium to high dimensional explanatory variables. It is far more interesting to conduct the variable selection for multiple censored regression models with an appropriate penalized robust objective function or a dimension deduction method. Acknowledgments We are grateful to the editor, an associate editor and two referees for reading the paper very carefully and their thoughtful and constructive comments. The research work of Wang is supported by the National Natural Science Foundation of China grant 11101063, 61173103 and 11371077. Appendix We give all proofs of main theorems and Remark 1 in this part. Firstly, some mild conditions will be necessary to establish the asymptotic properties.

150

(A1) (A2) (A3) (A4) (A5) (A6)

X. Wang, X. Shi / Computational Statistics and Data Analysis 80 (2014) 140–152

The function ρ(·) has a bounded and continuous second derivative. The functions g (·), hX (·) = E (X|XT β = ·) and hZ (·) = E (Z|XT β = ·) have bounded and continuous second derivatives. The density function fXT β (xT β) is continuous differentiable and bounded away from 0 and ∞. E |Y |m ≤ ∞ for some m ≥ 3. The conditional variance of Y given (X, Z) is bounded  away from 0 and ∞. The kernel K (·) has a bounded support with a bounded derivative, and satisfies u2 K (u)du ̸= 0. Write

 A=

JT g ′ (XT β0 )X



Z

,

B = A − E (A|XT β0 ),

ϵG = VG − g (XT β0 ) − ZT α0 ,

and both E {B⊗2 ψ ′ (ϵG )} and E {(Bψ(ϵG ))⊗2 } are positively definite.

V

(A7) Let ξ (G, s, V ) = [

s

{1 − G(t −)}−1 dt ]I [s < V ], η(s) =

E [Bψ ′ (ϵG ){(1+λ)ξ (G,s,V )−λ1T1 }I [s
and denote

Ω (τF ) = Ω2 (τF ) + 2Ω1 (τF ) < ∞, τ τ dG(s) where Ω1 (τF ) = 0 F E {Bψ(ϵG )I [V ≥ s]}ηT (s) 1−G(s−) and Ω2 (τF ) = 0 F (η(s))⊗2 (1 − F (s−))dG(s). Theorem 1 can be proved by the same arguments in Ichimura (1993). Since the proof is straightforward, we do not present it here. Next, we present the proof of Theorem 2. Proof of Theorem 2. We demonstrate the asymptotic normality of θˆ by using a general result of Newey (1994). Denote U = XT β0 , mX (U ) = E (X|U ), mZ (U ) = E (Z|U ) and κ = g ′ (U )JT {X − mX (U )}. In addition, let

Ψ (mZ , g , κ, α, β, VG , X, Z) = ψ(VG − g − Z α) T



κ



Z − mZ (U )

.

(A.1)

For any given m∗Z , g ∗ and κ ∗ , define D(m∗Z − mZ , g ∗ − g , κ ∗ − κ, α, β, VG , X, Z) =

∂Ψ ∗ ∂Ψ ∗ ∂Ψ (m∗ − mZ ) + (g − g ) + (κ − κ), ∂ mZ Z ∂g ∂κ

where the partial derivatives are the Frechet partial derivatives. After the algebraic simplification, we have

  0 ∂Ψ T = ψ(VG − g − Z α) , ∂ mZ −1   ∂Ψ 1 = ψ(VG − g − ZT α) , ∂κ 0   ∂Ψ κ = ψ ′ (VG − g − ZT α) . ∂g Z − mZ (U ) Note that these partial derivatives are zero. Accordingly,

∥Ψ (m∗Z , g ∗ , κ ∗ , α, β, VG , X, Z) − Ψ (mZ , g , κ, α, β, VG , X, Z) − D(m∗Z − mZ , g ∗ − g , κ ∗ − κ, α, β, VG , X, Z)∥ = O(∥m∗Z − mZ ∥2 + ∥g ∗ − g ∥2 + ∥κ ∗ − κ∥2 ), where ∥ · ∥ denotes the Sobolev norm, that is, the supremum norm of the function itself, as well as its derivatives. This equation is Newey’s Assumption 5.1(i). It is also noteworthy that Assumption 5.2 holds by the expression of D(·, ·, ·, α, β, VG , X, Z). Moreover, the result ED(m∗Z − mZ , g ∗ − g , κ ∗ − κ, α, β, VG , X, Z) = 0

(A.2)

leads to Newey’s Assumption 5.3. In addition to Newey’s assumptions mentioned above, we need to verify one more assumption to employ his result. Applying similar techniques to those used in Fan and Gijbels (1996), we obtain the following equations, which hold uniformly, gˆ (u) − g (u) = op (n−1/4 ),

ˆ Z (u) − mZ (u) = op (n−1/4 ), m

gˆ ′ (u) − g ′ (u) = op (n−1/4 ),

ˆ Z (u) − mZ (u) = op (n−1/4 ). m

These results imply that κˆ − κ = op (n−1/4 ). Thus, Newey’s Assumption 5.1(ii) holds. After examining Newey’s Assumption 5.1–5.3, we apply his Lemma 5.1 and find that θˆ has the same limit distribution as the solution to the equation 0=

n  i =1

Ψ (mZ , g , κ, α, β, ViG , Xi , Zi ).

(A.3)

X. Wang, X. Shi / Computational Statistics and Data Analysis 80 (2014) 140–152

151

Furthermore, it is easy to show that the solution to (A.3) has the same limit distribution as described in the statement of Theorem 2. In detail,

 S =



2 JT E {ψ ′ (ϵG )g ′ (XT β0 ) X⊗ 0 }J

2

ZT0 } JT E {ψ ′ (ϵG )g ′ (XT β0 ) X0

XT0 }J E {ψ ′ (ϵG )g ′ (XT β0 ) Z0

2 E {ψ ′ (ϵG ) Z⊗ 0 }

= E {B⊗2 ψ ′ (ϵG )},  2 2 JT E {ψ 2 (ϵG )g ′ (XT β0 ) X⊗ 0 }J R = X T }J E {ψ 2 (ϵG )g ′ (XT β0 ) Z0

JT E {ψ 2 (ϵG )g ′ (XT β0 ) X0 ZT0 }



2 E (ψ 2 (ϵG ) Z⊗ 0 )

0

= E {(Bψ(ϵG ))⊗2 }. Hence, we complete the proof when the censoring distribution G is given.

ˆ in all terms associated with G, we obtain the following When G is unknown and replaced by the Kaplan–Meier estimator G representation, Sn1/2 (θˆ

(1)

n  (Ai − E {Ai |Ui }ψ(ϵiGˆ )) + op (1),

− θ (01) ) = n−1/2

(A.4)

i=1

where ϵiGˆ = ViGˆ − g (XTi β0 ) − ZTi α0 and Ai − E {Ai |Ui } is a vector with p + q elements. Let Bi = Ai − E {Ai |Ui } and suppose n its elements are Bi,l = Bi,l (Xi , Ui ), l = 1, 2, . . . , p + q. We consider n−1/2 i=1 Bi,l ψ(ϵiGˆ ), l = 1, 2, . . . , p + q. Noticing

 Vi

ˆ , s, Vi ) = [ Bi,l ψ(ϵiGˆ ) = Bi,l ψ(ϵiG ) + Bi,l ψ ′ (ϵiG )(ϵiGˆ − ϵiG ) + op (1) and ξ (G Lu and Cheng (2007), we get Bi,l ψ ′ (ϵiG )(ϵiGˆ − ϵiG ) = n−1

n   j=1

s

{1 − Gˆ (t −)}−1 dt ]I [s < Vi ], using the result of



ˆ , s, Vi ) − λ∆i T ˆ }I [s < Vi ] Bi,l ψ ′ (ϵiG ){(1 + λ)ξ (G 1iG 0

1 − G(sˆ−) 1 1 − G(s) Y¯n



dMj (s) ,

hence (1)

− θ (01) ) → Np+q−1 (0, S−1 (R + Ω (τF ))S−1 ), τ τ dG(s) where Ω (τF ) = Ω2 (τF ) − 2Ω1 (τF ), Ω1 (τF ) = − 0 F E {Bψ(ϵG )I [V ≥ s]}ηT (s) 1−G(s−) , Ω2 (τF ) = 0 F (η(s))⊗2 (1 − F (s−)) dG(s). Thus, we complete the proof.  n1/2 (θˆ

d

Proof of Remark 1. When we choose the loss function ρ(u) = u2 , then ψ(u) = 2u. Use the result of Theorem 2, it is obvious τ E [B{(1+λ)ξ (G,s,V )−λ1T1 }I [s
ˆ in all terms associated with G. Note that dG(s). When G is unknown, it is replaced by the Kaplan–Meier estimator G E [B(g (X T β) + Z T α)I [V ≥ s]] = 0,

thus, E{

T Mn1 Mn2

(v)} =

v



 E

n i=1

0 p

v





n 1

 Bi ϵiG

η1T (s)dMi (s)

E {B1 [V1G − (g (X1T β) + Z1T α)]}η1T (s)dM1 (s)

0

v

 = − 0



v

= −

E {B1 [V1G − (g (X1T β) + Z1T α)]I [V1 ≥ s]}η1T (s)(1 − 1ΛG (s))dΛG (s) E {B1 V1G I [V ≥ s]}η1T (s)

0

v

 = −

dG(s) 1 − G(s−)

(η1T (s))⊗2 (1 − F (s−))dG(s).

0

Together with T Cov(Mn2 (τF ), Mn2 (τF )) =

τF



(η1 (s))⊗2 (1 − F (s−))dG(s),

0

we obtain the result presented in Remark 1.



(A.5)

152

X. Wang, X. Shi / Computational Statistics and Data Analysis 80 (2014) 140–152

References Buckley, J., James, I.A.N., 1979. Linear regression with censored data. Biometrika 66 (3), 429–436. Cui, X., Härdle, W.K., Zhu, L., 2011. The EFM approach for single-index models. Ann. Statist. 39 (3), 1658–1688. Engle, R.F., Granger, C.W.J., Rice, J., Weiss, A., 1986. Semiparametric estimates of the relation between weather and electricity sales. J. Amer. Statist. Assoc. 81 (394), 310–320. Fan, J., Gijbels, I., 1994. Censored regression: local linear approximations and their applications. J. Amer. Statist. Assoc. 89 (426), 560–570. Fan, J., Gijbels, I., 1996. Local Polynomial Modelling and its Applications. Chapman & Hall, London. Fleming, T.R., Harrington, D.P., 2005. Counting Processes and Survival Analysis, second ed. Wiley. Hampel, F., Hennig, C., Ronchetti, E., 2011. A smoothing principle for the Huber and other location M-estimators. Comput. Statist. Data Anal. 55 (1), 324–337. Härdle, W., Hall, P., Ichimura, H., 1993. Optimal smoothing in single-index models. Ann. Statist. 21 (1), 157–178. Härdle, W., Liang, H., Gao, J., 2000. Partially Linear Models. Physica-Verlag, Heidelberg. Huber, P.J., 1964. Robust estimation of a location parameter. Ann. Math. Statist. 35 (1), 73–101. Ichimura, H., 1993. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J. Econometrics 58 (1–2), 71–120. Jin, Z., Lin, D.Y., Wei, L.J., Ying, Z., 2003. Rank-based inference for the accelerated failure time model. Biometrika 90 (2), 341–353. Jin, Z., Lin, D.Y., Ying, Z., 2006. On least-squares regression with censored data. Biometrika 93 (1), 147–161. Kalbfleisch, J.D., Prentice, R.L., 2001. The Statistical Analysis of Failure Time Data, second ed. Wiley, Hoboken. Koul, H., Susarla, V., Van Ryzin, J., 1981. Regression analysis with randomly right-censored data. Ann. Statist. 9 (6), 1276–1288. Leiva, V., Barros, M., Paula, G.A., Galea, M., 2007. Influence diagnostics in log-Birnbaum–Saunders regression models with censored data. Comput. Statist. Data Anal. 51 (12), 5694–5707. Leurgans, S., 1987. Linear models, random censoring and synthetic data. Biometrika 74 (2), 301–309. Loprinzi, C.L., Laurie, J.A., Wieand, H.S., Krook, J.E., Novotny, P.J., Kugler, J.W., Bartel, J., Law, M., Bateman, M., Klatt, N.E., 1994. Prospective evaluation of prognostic variables from patient-completed questionnaires, North Central Cancer Treatment Group. J. Clin. Oncol. 12 (3), 601–607. Lu, X., Chen, G., Singh, R.S., Song, P.X.-K., 2006. A class of partially linear single-index survival models. Canad. J. Statist. 34 (1), 97–112. Lu, X., Cheng, T.-L., 2007. Randomly censored partially linear single-index models. J. Multivariate Anal. 98 (10), 1895–1922. Miller, R.G., 1976. Least squares regression with censored data. Biometrika 63 (3), 449–464. Newey, W.K., 1994. The asymptotic variance of semiparametric estimators. Econometrica 62 (6), 1349–1382. Peng, L., Huang, Y., 2008. Survival analysis with quantile regression models. J. Amer. Statist. Assoc. 103 (482), 637–649. Portnoy, S., 2003. Censored regression quantiles. J. Amer. Statist. Assoc. 98 (464), 1001–1012. Powell, J.L., 1984. Least absolute deviations estimation for the censored regression model. J. Econometrics 25 (3), 303–325. Ritov, Y., 1990. Estimation in a linear regression model with censored data. Ann. Statist. 18 (1), 303–328. Shang, S., Liu, M., Zeleniuch-Jacquotte, A., Clendenen, T.V., Krogh, V., Hallmans, G., Lu, W., 2013. Partially linear single index Cox regression model in nested case-control studies. Comput. Statist. Data Anal. 67, 199–212. Tsiatis, A.A., 1990. Estimating regression parameters using linear rank tests for censored data. Ann. Statist. 18 (1), 354–372. Wang, J.-L., Xue, L., Zhu, L., Chong, Y.S., 2010. Estimation for a partial-linear single-index model. Ann. Statist. 38 (1), 246–274. Xia, Y., Härdle, W., 2006. Semi-parametric estimation of partially linear single-index models. J. Multivariate Anal. 97 (5), 1162–1184. Ying, Z., 1993. A large sample study of rank estimation for censored regression data. Ann. Statist. 21 (1), 76–99. Ying, Z., Jung, S.H., Wei, L.J., 1995. Survival analysis with median regression models. J. Amer. Statist. Assoc. 90 (429), 178–184. Zhou, M., 1992. Asymptotic normality of the synthetic data regression estimator for censored survival data. Ann. Statist. 20 (2), 1002–1021. Zhou, L., 2006. A simple censored median regression estimator. Statist. Sinica 16 (3), 1043–1058. Zou, Y., Zhang, J., Qin, G., 2011. A semiparametric accelerated failure time partial linear model and its application to breast cancer. Comput. Statist. Data Anal. 55 (3), 1479–1487.