
Contents lists available at ScienceDirect

Statistics and Probability Letters journal homepage: www.elsevier.com/locate/stapro

On doubly robust estimation for logistic partially linear models

Zhiqiang Tan

Department of Statistics, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854, United States

Abstract

Consider a logistic partially linear model, in which the logit of the mean of a binary response is related to a linear function of some covariates and a nonparametric function of other covariates. We derive simple, doubly robust estimators of coefficients in the linear component. Such estimators remain consistent if either a nuisance model is correctly specified for the nonparametric component, or another nuisance model is correctly specified for the means of the covariates of interest given other covariates and the response at a fixed value. © 2019 Elsevier B.V. All rights reserved.

Article history: Received 10 April 2019; received in revised form 2 August 2019; accepted 3 August 2019; available online 14 August 2019.

Keywords: Double robustness; Local efficiency; Logistic models; Odds ratio; Partially linear models; Semiparametric models

1. Introduction

Generalized partially linear models are a semiparametric extension of generalized linear models (McCullagh and Nelder, 1989), such that the conditional mean of a response variable Y is related to a linear function of some covariates Z and a smooth function of other covariates X. Let $\{(Y_i, Z_i, X_i) : i = 1, \ldots, n\}$ be independent and identically distributed observations from the joint distribution of (Y, Z, X). Consider the following model

$$E(Y \mid Z, X) = \Psi\{\beta^{\mathrm T} Z + g(X)\}, \tag{1}$$

where Ψ(·) is an inverse link function, β is a vector of unknown parameters, and g(·) is an unknown, smooth function. Estimation in such models has been studied in at least two approaches. In one approach, theory and methods have been developed in the case where X is low-dimensional (for example, a scalar) and kernel or spline smoothing is used to estimate g(·) at suitable rates of convergence (e.g., Speckman, 1988; Severini and Staniswalis, 1994). In another approach with X relatively high-dimensional, doubly robust methods have been proposed to obtain estimators of β which remain consistent and asymptotically normal at rate $n^{-1/2}$ if either a parametric model for g(·) or another parametric model about, for example, E(Z|X) is correctly specified (Robins and Rotnitzky, 2001; Tchetgen Tchetgen et al., 2010).

In this note, we are concerned with model (1) with a binary response Y (taking value 0 or 1) and a logistic link, hence a logistic partially linear model:

$$P(Y = 1 \mid Z, X) = \mathrm{expit}\{\beta^{\mathrm T} Z + g(X)\}, \tag{2}$$

where $\mathrm{expit}(c) = \{1 + \exp(-c)\}^{-1}$. We provide a new class of doubly robust estimators of β which remain consistent and asymptotically normal at rate $n^{-1/2}$ if either a parametric model for g(·) or a parametric model for E(Z|Y = 0, X) is correctly specified, under mild regularity conditions but without additional parametric or smoothness restriction.

E-mail address: [email protected] https://doi.org/10.1016/j.spl.2019.108577 0167-7152/© 2019 Elsevier B.V. All rights reserved.


Z. Tan / Statistics and Probability Letters 155 (2019) 108577

Previously, doubly robust estimators of β were derived in model (1) with respect to parametric models for g(·) and E(Z|X), in the case of an identity link, Ψ(c) = c, or a log link, Ψ(c) = exp(c) (Robins and Rotnitzky, 2001). For the logistic link, however, no doubly robust estimator of β can be constructed in this manner with respect to parametric models about g(·) and E(Z|X) (Tchetgen Tchetgen et al., 2010). Instead, doubly robust estimators of β in model (2) were obtained with respect to parametric models about g(·) and p(z|Y = 0, X), the conditional density of Z given Y = 0 and X (Chen, 2007; Tchetgen Tchetgen et al., 2010). By comparison, our result in general allows doubly robust estimation for β in model (2) with respect to more flexible models about the conditional mean E(Z|Y = 0, X), while leaving other aspects of the conditional density p(z|Y = 0, X) unspecified. In the special case of binary Z, our class of doubly robust estimators of β is equivalent to that in Tchetgen Tchetgen et al. (2010), but involves use of the parametric model for P(Z = 1|Y = 0, X) in a more direct manner.

We also propose two specific doubly robust estimators of β in model (2) based on efficiency considerations. The first estimator requires numerical evaluation of expectations under a model for p(z|Y = 0, X) beyond the conditional mean E(Z|Y = 0, X) unless Z is binary, but can be shown to achieve the minimum asymptotic variance among our class of doubly robust estimators when both models for g(·) and p(z|Y = 0, X) are correctly specified. Compared with the locally efficient, doubly robust estimators in Tchetgen Tchetgen et al. (2010), this estimator remains consistent if the model for p(z|Y = 0, X) is misspecified but the less restrictive model for E(Z|Y = 0, X) is correctly specified. Our second estimator is numerically and statistically simpler than our first one: it does not involve numerical integration or a parametric specification of the conditional density p(z|Y = 0, X), and can achieve an asymptotic variance similar to that of our first estimator, especially when the true value of β is close to 0.

2. Doubly robust estimation

For a semiparametric model, doubly robust estimation can often be derived by studying the orthogonal complement of the nuisance tangent space (Robins and Rotnitzky, 2001). Denote by $L_2$ the Hilbert space of dim(β) × 1 mean-zero functions q ≡ q(Y, Z, X), with the inner product defined as $E(q_1^{\mathrm T} q_2)$. Denote $\varepsilon^* = Y - \pi^*(Z, X)$, $\pi^* \equiv \pi^*(Z, X) = P(Y = 1 \mid Z, X)$, and by $\beta^*$ and $g^* \equiv g^*(X)$ the truth of β and g(X). For model (2), the orthogonal complement of the nuisance tangent space is known to be (Bickel et al., 1993; Robins and Rotnitzky, 2001)

$$\Lambda^\perp = \left\{ \varepsilon^* \left( h - \frac{E[h \pi^* (1 - \pi^*) \mid X]}{E[\pi^* (1 - \pi^*) \mid X]} \right) : h \equiv h(Z, X) \text{ unrestricted} \right\} \cap L_2. \tag{3}$$

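Our first result is a reformulation of $\Lambda^\perp$ as follows. See the Appendix for all proofs.

Proposition 1. Assume that $\pi^*(Z, X) \in (0, 1)$ almost surely. The space $\Lambda^\perp$ can be equivalently expressed as

$$\Lambda^\perp = \left\{ \varepsilon^* \left( h - \frac{E[h \pi^* \mid Y = 0, X]}{E[\pi^* \mid Y = 0, X]} \right) : h \equiv h(Z, X) \text{ unrestricted} \right\} \cap L_2 \tag{4}$$

$$\phantom{\Lambda^\perp} = \left\{ \zeta_0^* \left( u - E[u \mid Y = 0, X] \right) : u \equiv u(Z, X) \text{ unrestricted} \right\} \cap L_2, \tag{5}$$

where u ≡ u(Z, X) is a dim(β) × 1 function and

$$\zeta_0^* = \frac{\varepsilon^*}{\pi^*} = Y \frac{1 - \pi^*}{\pi^*} - (1 - Y) = Y e^{-\beta^{*\mathrm T} Z - g^*(X)} - (1 - Y).$$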

Our reformulation (5) suggests the following set of doubly robust estimating functions. Let g(X ; α ) be a parametric model for g ∗ (X ) and, independently, f (X ; γ ) be a parametric model for f ∗ (X ) ≡ E(Z |Y = 0, X ). The two functions g ∗ (X ) and E(Z |Y = 0, X ) are variation independent, because g ∗ (X ) and p(z |Y = 0, X ) are variation independent (Chen, 2007). For a dim(β ) × dim(β ) function φ (X ), define

$$r(Y, Z, X; \beta, \alpha, \gamma, \phi) = \left\{ Y e^{-\beta^{\mathrm T} Z - g(X; \alpha)} - (1 - Y) \right\} \phi(X) \{Z - f(X; \gamma)\}, \tag{6}$$

by letting u(Z, X) = φ(X)Z in (5). Then r(Y, Z, X; β, α, γ, φ) is an unbiased estimating function for $\beta^*$ if either model g(X; α) or f(X; γ) is correctly specified.

Proposition 2. For α and γ such that either $g^*(X) = g(X; \alpha)$ or $f^*(X) = f(X; \gamma)$, but not necessarily both, it holds that

$$E\{r(Y, Z, X; \beta^*, \alpha, \gamma, \phi)\} = 0,$$

provided that the above expectation exists.

Various doubly robust estimators can be constructed through (6). In general, let $\hat\alpha$ be an estimator of α, for example, the maximum likelihood estimator, which satisfies $\hat\alpha - \bar\alpha = n^{-1} \sum_{i=1}^n s_1(Y_i, Z_i, X_i; \bar\alpha, \bar\beta) + o_p(n^{-1/2})$ for some constant $(\bar\alpha, \bar\beta)$ and influence function $s_1(\cdot)$ such that $g(X; \bar\alpha) = g^*(X)$ if model g(X; α) is correctly specified. Let $\hat\gamma$ be an estimator of γ, for example, the least-squares or related estimator, which satisfies $\hat\gamma - \bar\gamma = n^{-1} \sum_{i=1}^n s_2(Y_i, Z_i, X_i; \bar\gamma) + o_p(n^{-1/2})$ for


some constant $\bar\gamma$ and influence function $s_2(\cdot)$ such that $f(X; \bar\gamma) = f^*(X)$ if model f(X; γ) is correctly specified. Define an estimator $\hat\beta(\phi)$ as a solution to

$$\frac{1}{n} \sum_{i=1}^n r(Y_i, Z_i, X_i; \beta, \hat\alpha, \hat\gamma, \phi) = 0.$$
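As a minimal sketch of how this estimating equation might be solved, the following assumes a scalar Z, a scalar β, and working functions `g`, `f`, and `phi` that have already been fitted. All names and the toy data are hypothetical and are not taken from the paper. For a binary Z the equation is linear in exp(−β), which gives a closed form that can be compared against a generic bisection root.

```python
import math

def r_mean(beta, data, g, f, phi):
    """Sample mean of estimating function (6) for scalar Z and scalar beta."""
    total = 0.0
    for y, z, x in data:
        residual = y * math.exp(-beta * z - g(x)) - (1 - y)
        total += residual * phi(x) * (z - f(x))
    return total / len(data)

def solve_beta(data, g, f, phi, lo=-10.0, hi=10.0, tol=1e-12):
    """Bisection root of r_mean in beta; assumes a sign change on [lo, hi]."""
    r_lo = r_mean(lo, data, g, f, phi)
    r_hi = r_mean(hi, data, g, f, phi)
    if r_lo * r_hi > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        r_mid = r_mean(mid, data, g, f, phi)
        if r_lo * r_mid <= 0:
            hi = mid
        else:
            lo, r_lo = mid, r_mid
    return 0.5 * (lo + hi)

def closed_form_binary_z(data, g, f, phi):
    """Binary Z only: the equation reads exp(-beta) * A - B - C = 0."""
    a = sum(math.exp(-g(x)) * phi(x) * (1 - f(x))
            for y, z, x in data if y == 1 and z == 1)
    b = sum(math.exp(-g(x)) * phi(x) * f(x)
            for y, z, x in data if y == 1 and z == 0)
    c = sum(phi(x) * (z - f(x)) for y, z, x in data if y == 0)
    return -math.log((b + c) / a)

# Hypothetical toy sample of (y, z, x) triples and working nuisance functions.
DATA = [(1, 1, 0), (1, 1, 1), (1, 0, 0), (0, 1, 0),
        (0, 0, 1), (0, 0, 0), (1, 1, 0), (0, 1, 1)]

def g_work(x):
    return 0.2 + 0.5 * x

def f_work(x):
    return 0.3 + 0.2 * x

def phi_one(x):
    return 1.0

beta_bisect = solve_beta(DATA, g_work, f_work, phi_one)
beta_closed = closed_form_binary_z(DATA, g_work, f_work, phi_one)
```

Bisection is used here only because it is easy to verify; with a vector β one would more likely use a Newton-type solver, and the closed form applies only when Z is binary and the ratio (B + C)/A is positive.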

Under suitable regularity conditions (e.g., Manski, 1988), it can be shown that if either model g(X; α) or f(X; γ) is correctly specified, then

$$\hat\beta(\phi) - \beta^* = \frac{H^{-1}}{n} \sum_{i=1}^n \left\{ r(Y_i, Z_i, X_i; \beta^*, \bar\alpha, \bar\gamma, \phi) - B_1 s_1(Y_i, Z_i, X_i; \bar\alpha, \bar\beta) - B_2 s_2(Y_i, Z_i, X_i; \bar\gamma) \right\} + o_p(n^{-1/2}), \tag{7}$$

where $H = E\{\partial r(Y, Z, X; \beta, \bar\alpha, \bar\gamma, \phi)/\partial\beta\}|_{\beta = \beta^*}$, $B_1 = E\{\partial r(Y, Z, X; \beta^*, \alpha, \bar\gamma, \phi)/\partial\alpha\}|_{\alpha = \bar\alpha}$, and $B_2 = E\{\partial r(Y, Z, X; \beta^*, \bar\alpha, \gamma, \phi)/\partial\gamma\}|_{\gamma = \bar\gamma}$. The asymptotic variance of $\hat\beta(\phi)$ can be estimated by using the sample variance of an estimated version of the influence function in (7).

We now provide several remarks. First, there is a remarkable connection between estimating function (6) and calibrated estimation studied in Tan (2017) for fitting logistic regression. Estimating function (6) can be expressed as

$$r(Y, Z, X; \beta, \alpha, \gamma, \phi) = \left\{ \frac{Y}{\pi(Z, X; \beta, \alpha)} - 1 \right\} \phi(X) \{Z - f(X; \gamma)\}, \tag{8}$$

where $\pi(Z, X; \beta, \alpha) = \mathrm{expit}\{\beta^{\mathrm T} Z + g(X; \alpha)\}$, representing the conditional probability P(Y = 1|Z, X) under the conjunction of model (2) and model g(X; α). Therefore, our doubly robust estimating function involves the product of two "residuals", $\pi^{-1}(Z, X; \beta, \alpha) Y - 1$ and $Z - f(X; \gamma)$. Similar products can also be found in previous doubly robust estimating functions for β in model (1) with the identity or log link (Robins and Rotnitzky, 2001). However, a notable feature in (8) is that the residual used from the model $P(Y = 1 \mid Z, X) = \pi(Z, X; \beta, \alpha)$ is $\pi^{-1}(Z, X; \beta, \alpha) Y - 1$, associated with the estimating equation for calibrated estimation (Tan, 2017), which in the case $g(X; \alpha) = \alpha^{\mathrm T} X$ gives

$$\frac{1}{n} \sum_{i=1}^n \left\{ \frac{Y_i}{\pi(Z_i, X_i; \beta, \alpha)} - 1 \right\} (Z_i^{\mathrm T}, X_i^{\mathrm T})^{\mathrm T} = 0.$$
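The equivalence between forms (6) and (8) can be checked numerically. The sketch below assumes scalar Z and β, and passes the values g(X; α), f(X; γ), and φ(X) in as plain numbers; the function names are illustrative only.

```python
import math

def expit(c):
    return 1.0 / (1.0 + math.exp(-c))

def r_form6(y, z, beta, g_val, f_val, phi_val):
    """Estimating function in the exponential form (6)."""
    return (y * math.exp(-beta * z - g_val) - (1 - y)) * phi_val * (z - f_val)

def r_form8(y, z, beta, g_val, f_val, phi_val):
    """Estimating function in the inverse-probability form (8)."""
    pi = expit(beta * z + g_val)
    return (y / pi - 1.0) * phi_val * (z - f_val)

def max_discrepancy():
    """Largest absolute gap between (6) and (8) over a small grid of inputs."""
    worst = 0.0
    for y in (0, 1):
        for z in (0.0, 1.0, 2.5):
            for beta in (-0.5, 0.8):
                for g_val in (-1.0, 0.3):
                    a = r_form6(y, z, beta, g_val, 0.4, 1.7)
                    b = r_form8(y, z, beta, g_val, 0.4, 1.7)
                    worst = max(worst, abs(a - b))
    return worst
```

The agreement follows from 1/π − 1 = exp(−β^T Z − g) for Y = 1, and from both forms reducing to −1 times the remaining factors for Y = 0.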

The standard residual from logistic regression is $Y - \pi(Z, X; \beta, \alpha)$, associated with the score equation for maximum likelihood estimation, which in the case $g(X; \alpha) = \alpha^{\mathrm T} X$ gives

$$\frac{1}{n} \sum_{i=1}^n \{Y_i - \pi(Z_i, X_i; \beta, \alpha)\} (Z_i^{\mathrm T}, X_i^{\mathrm T})^{\mathrm T} = 0.$$

In general, the estimating function $\{Y - \pi(Z, X; \beta, \alpha)\} \phi(X) \{Z - f(X; \gamma)\}$ is not unbiased for $\beta^*$ if model f(X; γ) is correctly specified but model g(X; α) is misspecified.

Second, our results can also be used to shed light on the class of doubly robust estimators in Tchetgen Tchetgen et al. (2010), which are briefly reviewed as follows. For model (2), the conditional distribution of (Y, Z) jointly given X can be determined as (Chen, 2007)

$$p(y, z \mid X) = c^{-1}(X)\, e^{\beta^{\mathrm T} (z - z_0) y}\, p(z \mid Y = 0, X)\, p(y \mid Z = z_0, X), \tag{9}$$

where $z_0$ is some fixed value (assumed to be 0 hereafter), $c(X) = \int e^{\beta^{\mathrm T} z y}\, p(z \mid Y = 0, X)\, p(y \mid Z = z_0, X)\, d\mu(z, y)$, and the conditional densities p(z|Y = 0, X) and p(y|Z = 0, X) are variation-independent nuisance parameters. Let $p^\dagger(y, z \mid X) = p_1^\dagger(y \mid X)\, p_2^\dagger(z \mid X)$ for some pre-specified conditional densities $p_1^\dagger(y \mid X)$ and $p_2^\dagger(z \mid X)$. By using (9), the ortho-complement of the nuisance tangent space in model (2) can be characterized as (Tchetgen Tchetgen et al., 2010)

$$\Lambda^\perp = \left\{ [d(Y, Z, X) - d^\dagger(Y, Z, X)] \frac{p^\dagger(Y, Z \mid X)}{p(Y, Z \mid X)} : d(Y, Z, X) \text{ unrestricted} \right\} \cap L_2, \tag{10}$$

where d† (Y , Z , X ) = E † (D|Z , X ) − E † (D|Y , X ) − E † (D|X ) for D ≡ d(Y , Z , X ), and E † (·|·, X ) denotes the expectation under p† (y, z |X ). It can be verified by direct calculation that the two sets on the right hand sides of (3) and (10) are equivalent to each other: each element in the right hand side of (10) can be expressed in the form of elements in the right hand side of (3), and vice versa. Let p(y|Z = 0, X ; α ) or equivalently g(X ; α ) be a parametric model for p(y|Z = 0, X ) or g ∗ (X ), and let p(z |Y = 0, X ; θ ) be a parametric model for p(z |Y = 0, X ). The estimating function based on (10) in Tchetgen Tchetgen et al. (2010) can be equivalently defined, based on (3), as

$$\tau(Y, Z, X; \beta, \alpha, \theta, h) = \{Y - \pi(Z, X; \beta, \alpha)\} \left\{ h(Z, X) - \frac{E[h \pi (1 - \pi) \mid X; \beta, \alpha, \theta]}{E[\pi (1 - \pi) \mid X; \beta, \alpha, \theta]} \right\}, \tag{11}$$


where h ≡ h(Z , X ) is a dim(β ) × 1 function associated with d(Y , Z , X ) in (10), π ≡ π (Z , X ; β, α ) = expit{β T Z + g(X ; α )} and E(·|X ; β, α, θ ) denotes the expectation under the law defined as (9), but evaluated at p(y|Z = 0, X ; α ) and p(z |Y = 0, X ; θ ). The estimating function (11) is doubly robust, i.e. unbiased for β ∗ if either model p(y|Z = 0, X ; α ) or p(z |Y = 0, X ; θ ) is correctly specified. Although (11) appears to be asymmetric in Y and Z , the double robustness of (11) follows from that of its equivalent version based on (10), as shown by exploiting the symmetry in Y and Z in Tchetgen Tchetgen et al. (2010). See also Tchetgen Tchetgen and Rotnitzky (2011) for an explicit demonstration of symmetry of (11) in Y and Z with h(Z , X ) = Z in the case of a binary Z . As an interesting implication of our reformulation (4) in Proposition 1, the estimating function (11) can be equivalently expressed as

$$\tau(Y, Z, X; \beta, \alpha, \theta, h) = \{Y - \pi(Z, X; \beta, \alpha)\} \left\{ h(Z, X) - \frac{E[h \pi \mid Y = 0, X; \theta]}{E[\pi \mid Y = 0, X; \theta]} \right\}, \tag{12}$$

which involves the expectation E(·|Y = 0, X; θ) under p(z|Y = 0, X; θ), instead of E(·|X; β, α, θ) under the law (9) evaluated at p(y|Z = 0, X; α) and p(z|Y = 0, X; θ). Therefore, (12) is computationally much simpler than (11) and its equivalent version based on (10). Moreover, the double robustness of (12) with respect to p(y|Z = 0, X; α) and p(z|Y = 0, X; θ) can be directly shown as in the Appendix, without invoking its equivalent version based on (10).

Third, we compare our doubly robust estimating functions with those in Tchetgen Tchetgen et al. (2010). For a dim(β) × 1 function u ≡ u(Z, X), consider the estimating function

$$\tau'(Y, Z, X; \beta, \alpha, \theta, u) = \left\{ \frac{Y}{\pi(Z, X; \beta, \alpha)} - 1 \right\} \{u(Z, X) - E[u \mid Y = 0, X; \theta]\}. \tag{13}$$

By our reformulation (5), the class of estimating functions τ′(Y, Z, X; β, α, θ, u) over all possible choices of u(Z, X) is equivalent to that of τ(Y, Z, X; β, α, θ, h) over all possible choices of h(Z, X) as used in Tchetgen Tchetgen et al. (2010). A subtle point is that the mapping between h(Z, X) and u(Z, X) depends on π(Z, X; β, α), but this does not affect our subsequent discussion. Similarly to (12), the estimating function (13) can be shown to be doubly robust for $\beta^*$ with respect to models p(y|Z = 0, X; α) and p(z|Y = 0, X; θ). By comparing (6) and (13), we see that our estimating function (6) corresponds to a particular choice of estimating function (13) with u(Z, X) = φ(X)Z, such that (6) depends only on a parametric model for the conditional expectation E(Z|Y = 0, X), but not the conditional density p(z|Y = 0, X). Therefore, our class of (6) is in general a strict subset of the class of (13) to achieve double robustness with respect to conditional mean models for E(Z|Y = 0, X), except when Z is binary and hence the classes of (6) and (13) are equivalent.

Fourth, there is a similar characterization of $\Lambda^\perp$ as in Proposition 1, involving expectations under p(z|Y = 1, X) instead of p(z|Y = 0, X). By symmetry, it can be shown that

$$\Lambda^\perp = \left\{ \varepsilon^* \left( h - \frac{E[h (1 - \pi^*) \mid Y = 1, X]}{E[1 - \pi^* \mid Y = 1, X]} \right) : h \equiv h(Z, X) \text{ unrestricted} \right\} \cap L_2$$

$$\phantom{\Lambda^\perp} = \left\{ \zeta_1^* \left( u - E[u \mid Y = 1, X] \right) : u \equiv u(Z, X) \text{ unrestricted} \right\} \cap L_2,$$

where u ≡ u(Z, X) is a dim(β) × 1 function and $\zeta_1^* = \varepsilon^*/(1 - \pi^*) = Y - (1 - Y) e^{\beta^{*\mathrm T} Z + g^*(X)}$. Consequently, a similar estimating function as (6) can be derived such that it is doubly robust for $\beta^*$ with respect to parametric models for $g^*(X)$ and E(Z|Y = 1, X).
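Returning to the odds ratio representation (9) used in the second remark, a small numerical check may help make the factorization concrete. The sketch below assumes binary Y and Z, a scalar β, $z_0 = 0$, and a single fixed value of X (suppressed in the notation); the function and argument names are hypothetical. It builds the joint probabilities from the two nuisance components and confirms that they reproduce the logistic model (2) with g equal to the logit of P(Y = 1|Z = 0, X).

```python
import math

def expit(c):
    return 1.0 / (1.0 + math.exp(-c))

def logit(p):
    return math.log(p / (1.0 - p))

def joint_from_odds_ratio(beta, p_z1_given_y0, p_y1_given_z0):
    """Joint pmf of (Y, Z) at a fixed X built from (9), with z0 = 0."""
    p_z = {0: 1.0 - p_z1_given_y0, 1: p_z1_given_y0}   # p(z | Y = 0, X)
    p_y = {0: 1.0 - p_y1_given_z0, 1: p_y1_given_z0}   # p(y | Z = 0, X)
    unnorm = {(y, z): math.exp(beta * z * y) * p_z[z] * p_y[y]
              for y in (0, 1) for z in (0, 1)}
    c = sum(unnorm.values())                            # normalizing constant c(X)
    return {key: value / c for key, value in unnorm.items()}

def prob_y1_given_z(joint, z):
    return joint[(1, z)] / (joint[(0, z)] + joint[(1, z)])

def prob_z1_given_y0(joint):
    return joint[(0, 1)] / (joint[(0, 0)] + joint[(0, 1)])

JOINT = joint_from_odds_ratio(beta=0.8, p_z1_given_y0=0.3, p_y1_given_z0=0.4)
```

In this construction the two inputs can be varied freely without affecting β, which illustrates the variation independence of the nuisance components noted in the text.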

3. Efficiency considerations

For our class of doubly robust estimating functions (6), we study how to choose the function φ(X) based on efficiency considerations. First, the following result gives the optimal choice of φ(X) with correctly specified models g(X; α) and f(X; γ).

Proposition 3. If both models g(X; α) and f(X; γ) are correctly specified for $g^*(X)$ and E(Z|Y = 0, X) respectively, then the optimal choice of φ(X) in minimizing the asymptotic variance of $\hat\beta(\phi)$, which admits asymptotic expansion (7), is

$$\phi_{\mathrm{opt}}(X) = E\left[ (Z - E(Z \mid Y = 0, X))^{\otimes 2} \mid Y = 0, X \right] \times E^{-1}\left[ \pi^{*-1}(Z, X) (Z - E(Z \mid Y = 0, X))^{\otimes 2} \mid Y = 0, X \right],$$

where $b^{\otimes 2} = b b^{\mathrm T}$ for a column vector b.
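For intuition, the optimal weight can be written out explicitly when Z is a binary scalar. The sketch below is an illustration under that assumption, with f(X) = P(Z = 1|Y = 0, X) and the values of g(X) and f(X) passed in as numbers; the function name is hypothetical. In this case the formula reduces to a weighted harmonic mean of π(1, X) and π(0, X), with weights 1 − f and f.

```python
import math

def expit(c):
    return 1.0 / (1.0 + math.exp(-c))

def phi_opt_binary(beta, g_val, f_val):
    """phi_opt(X) for a binary scalar Z, where f_val = P(Z = 1 | Y = 0, X)."""
    pi1 = expit(beta + g_val)   # P(Y = 1 | Z = 1, X)
    pi0 = expit(g_val)          # P(Y = 1 | Z = 0, X)
    # Conditional on Y = 0: (Z - f)^2 is (1 - f)^2 w.p. f and f^2 w.p. 1 - f.
    num = f_val * (1 - f_val) ** 2 + (1 - f_val) * f_val ** 2
    den = f_val * (1 - f_val) ** 2 / pi1 + (1 - f_val) * f_val ** 2 / pi0
    return num / den

PHI_AT_ZERO = phi_opt_binary(0.0, 0.7, 0.35)
PHI_AT_ONE = phi_opt_binary(1.0, 0.7, 0.35)
```

When β = 0 the two probabilities coincide and the weight equals expit{g(X)}, which is the value that a simplified choice of φ would target.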

From this result, it is straightforward to derive a locally-efficient-like, doubly robust estimator for $\beta^*$. Let $(\hat\beta, \hat\alpha)$ be the maximum likelihood estimator in the model $\pi(Z, X; \beta, \alpha) = \mathrm{expit}\{\beta^{\mathrm T} Z + g(X; \alpha)\}$, and $\hat\theta$ be the maximum likelihood estimator in a conditional density model p(z|Y = 0, X; θ) as in (11) but compatible with model f(X; γ) for E(Z|Y = 0, X), where θ = (γ, γ′) and γ′ is a variance parameter. Consider the estimator $\hat\beta(\hat\phi_{\mathrm{opt}})$ with

$$\hat\phi_{\mathrm{opt}}(X) = E\left[ (Z - f(X; \hat\gamma))^{\otimes 2} \mid Y = 0, X; \hat\theta \right] \times E^{-1}\left[ \pi^{-1}(Z, X; \hat\beta, \hat\alpha) (Z - f(X; \hat\gamma))^{\otimes 2} \mid Y = 0, X; \hat\theta \right].$$


Then it can be shown under suitable regularity conditions that $\hat\beta(\hat\phi_{\mathrm{opt}})$ is doubly robust, i.e. remains consistent for $\beta^*$ if either model g(X; α) or f(X; γ) is correctly specified, and achieves the minimum asymptotic variance among all estimators $\hat\beta(\phi)$ when both models g(X; α) and p(z|Y = 0, X; θ) including f(X; γ) are correctly specified.

It is interesting to compare $\hat\beta(\hat\phi_{\mathrm{opt}})$ with the locally efficient, doubly robust estimator for $\beta^*$ in Tchetgen Tchetgen et al. (2010). For a dim(β) × 1 function h(Z, X), define an estimator $\hat\beta(h)$ as a solution to $n^{-1} \sum_{i=1}^n \tau(Y_i, Z_i, X_i; \beta, \hat\alpha, \hat\theta, h) = 0$, where $(\hat\alpha, \hat\theta)$ are maximum likelihood estimators as above or, without affecting our discussion here, profile maximum likelihood estimators as in Tchetgen Tchetgen et al. (2010). Then the optimal choice of h(Z, X) in minimizing the asymptotic variance of $\hat\beta(h)$ is $h_{\mathrm{eff}}(Z, X) = \partial(\beta^{\mathrm T} Z)/\partial\beta = Z$. In fact, the estimator $\hat\beta(h_{\mathrm{eff}})$ is locally efficient, i.e. achieves the semiparametric variance bound in model (2) when both models g(X; α) and p(z|Y = 0, X; θ) are correctly specified. Unless Z is binary, this semiparametric variance bound is in general strictly smaller than the asymptotic variance achieved by $\hat\beta(\hat\phi_{\mathrm{opt}})$ when both models g(X; α) and p(z|Y = 0, X; θ) are correctly specified, because the class of estimating functions (6) is strictly a subset of the class (11), (12), or (13), as discussed in Section 2. In the case of a binary Z and hence θ = γ, the two estimators $\hat\beta(\hat\phi_{\mathrm{opt}})$ and $\hat\beta(h_{\mathrm{eff}})$ are equivalent. On the other hand, $\hat\beta(h_{\mathrm{eff}})$ is doubly robust only with respect to models g(X; α) and p(z|Y = 0, X; θ), whereas $\hat\beta(\hat\phi_{\mathrm{opt}})$ is doubly robust with respect to g(X; α) and f(X; γ) and hence remains consistent for $\beta^*$ if model p(z|Y = 0, X; θ) is misspecified but the less restrictive model f(X; γ) for E(Z|Y = 0, X) is correctly specified.

Evaluation of the function $\hat\phi_{\mathrm{opt}}(X)$ and hence the estimator $\hat\beta(\hat\phi_{\mathrm{opt}})$ in general requires cumbersome numerical integration with respect to the density $p(z \mid Y = 0, X; \hat\theta)$. For computational simplicity, consider the estimator $\hat\beta(\phi_{\mathrm{simp}})$ with scalar $\phi_{\mathrm{simp}}(X) = P(Y = 1 \mid Z = 0, X; \hat\alpha) = \mathrm{expit}\{g(X; \hat\alpha)\}$. The corresponding estimating function can be shown to become

$$r(Y, Z, X; \beta, \hat\alpha, \hat\gamma, \phi_{\mathrm{simp}}) = \frac{Y e^{-\beta^{\mathrm T} Z} - (1 - Y) e^{g(X; \hat\alpha)}}{1 + e^{g(X; \hat\alpha)}} \{Z - f(X; \hat\gamma)\}. \tag{14}$$
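The reduction to (14) can be checked numerically. The following sketch assumes scalar Z and β and takes the fitted values g(X; α̂) and f(X; γ̂) as plain numbers; the function names are illustrative. It compares three expressions: the general form (6) with φ set to expit{g}, the ratio form (14), and the product form $e^{-\beta Z Y}[Y - \mathrm{expit}\{g\}]\{Z - f\}$ that this section states as an equivalent expression.

```python
import math

def expit(c):
    return 1.0 / (1.0 + math.exp(-c))

def r_general(y, z, beta, g_val, f_val, phi_val):
    """Estimating function (6) with an arbitrary scalar weight phi."""
    return (y * math.exp(-beta * z - g_val) - (1 - y)) * phi_val * (z - f_val)

def r_simp_ratio(y, z, beta, g_val, f_val):
    """Ratio form (14), obtained with phi_simp = expit(g)."""
    top = y * math.exp(-beta * z) - (1 - y) * math.exp(g_val)
    return top / (1.0 + math.exp(g_val)) * (z - f_val)

def r_simp_product(y, z, beta, g_val, f_val):
    """Product form exp(-beta * z * y) * [y - expit(g)] * (z - f)."""
    return math.exp(-beta * z * y) * (y - expit(g_val)) * (z - f_val)

def max_gap():
    """Largest pairwise discrepancy among the three forms on a small grid."""
    worst = 0.0
    for y in (0, 1):
        for z in (0.0, 1.0, 2.5):
            for beta in (-0.5, 0.8):
                for g_val in (-1.0, 0.3):
                    a = r_general(y, z, beta, g_val, 0.4, expit(g_val))
                    b = r_simp_ratio(y, z, beta, g_val, 0.4)
                    c = r_simp_product(y, z, beta, g_val, 0.4)
                    worst = max(worst, abs(a - b), abs(b - c))
    return worst
```

The ratio and product forms avoid evaluating exp(−g) directly, which may be numerically convenient when g(X; α̂) is large and negative.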

The particular choice $\phi_{\mathrm{simp}}(X)$ can be motivated by the fact that if the true $\beta^* = 0$ then $\phi_{\mathrm{opt}}(X) = \mathrm{expit}\{g^*(X)\}$. Then $\hat\beta(\phi_{\mathrm{simp}})$ is nearly as efficient as $\hat\beta(\hat\phi_{\mathrm{opt}})$ and, by similar reasoning, also $\hat\beta(h_{\mathrm{eff}})$ whenever $\beta^*$ is close to 0. This is analogous to how the easy-to-compute estimator is related to the locally efficient estimator $\hat\beta(h_{\mathrm{eff}})$ in Tchetgen Tchetgen et al. (2010, Section 4). Moreover, the estimating function (14) can be equivalently expressed as

$$r(Y, Z, X; \beta, \hat\alpha, \hat\gamma, \phi_{\mathrm{simp}}) = e^{-\beta^{\mathrm T} Z Y} [Y - \mathrm{expit}\{g(X; \hat\alpha)\}] \{Z - f(X; \hat\gamma)\},$$

which, in the case of a binary Z, coincides with the estimating function underlying the closed-form estimator for $\beta^*$ in Tchetgen Tchetgen (2013).

4. Conclusion

We derive simple, doubly robust estimators of coefficients for the covariates in the linear component in a logistic partially linear model. Such estimators remain consistent if either a nuisance model is correctly specified for the nonparametric component of the partially linear model, or a conditional mean model is correctly specified for the covariates of interest given other covariates and the response at a fixed value. These estimators can be useful in conventional settings with a limited number of covariates. Moreover, there have been various works exploiting doubly robust estimating functions to obtain valid inferences in high-dimensional problems (e.g., Farrell, 2015; Chernozhukov et al., 2018; Tan, 2018). Our estimating functions can potentially be employed to achieve similar properties in high-dimensional settings.

Appendix

Proof of Proposition 1. First, we show that for any h ≡ h(Z, X), $E[h \pi^* (1 - \pi^*) \mid X] = P(Y = 0 \mid X)\, E[h \pi^* \mid Y = 0, X]$. This follows because $E[h \pi^* (1 - \pi^*) \mid X] = E[h \pi^* 1\{Y = 0\} \mid X] = P(Y = 0 \mid X)\, E[h \pi^* \mid Y = 0, X]$ by the law of iterated expectations and then the law of total probability. Then the set (3) is equivalent to (4). Next, the set (4) is equivalent to $\{\varepsilon^* h_c : h_c \equiv h_c(Z, X) \text{ satisfying } E[h_c \pi^* \mid Y = 0, X] = 0\} \cap L_2$, and the set (5) is equivalent to $\{\zeta_0^* u_c : u_c \equiv u_c(Z, X) \text{ satisfying } E[u_c \mid Y = 0, X] = 0\} \cap L_2$. The two sets are equivalent to each other, by letting $u_c = h_c \pi^*$, because $\varepsilon^* h_c = \zeta_0^* \pi^* h_c$. □
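The identity used at the start of this proof can be verified by direct enumeration. The sketch below assumes a discrete scalar Z at a single fixed value of X, with a hypothetical marginal distribution for Z and hypothetical values for β and g; none of these numbers come from the paper.

```python
import math

def expit(c):
    return 1.0 / (1.0 + math.exp(-c))

def both_sides(z_values, z_probs, beta, g_val, h):
    """Return (E[h pi (1 - pi) | X], P(Y = 0 | X) * E[h pi | Y = 0, X])."""
    pi = [expit(beta * z + g_val) for z in z_values]
    lhs = sum(p * h(z) * q * (1 - q) for z, p, q in zip(z_values, z_probs, pi))
    p_y0 = sum(p * (1 - q) for p, q in zip(z_probs, pi))
    # Bayes rule: p(z | Y = 0, X) is proportional to p(z | X) * (1 - pi(z, X)).
    p_z_given_y0 = [p * (1 - q) / p_y0 for p, q in zip(z_probs, pi)]
    cond_mean = sum(w * h(z) * q for z, w, q in zip(z_values, p_z_given_y0, pi))
    return lhs, p_y0 * cond_mean

def h_example(z):
    return z * z + 1.0

LHS, RHS = both_sides([0.0, 1.0, 2.0], [0.5, 0.3, 0.2], 0.7, -0.5, h_example)
```

Applying the same identity with h replaced by 1 gives the denominator, so the ratio in (3) equals the ratio in (4).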

Proof of Proposition 2. By the law of iterated expectations, we have

$$E\{r(Y, Z, X; \beta^*, \alpha, \gamma, \phi)\} = E\left[ E\left\{ Y e^{-\beta^{*\mathrm T} Z - g(X; \alpha)} - (1 - Y) \,\middle|\, Z, X \right\} \phi(X) \{Z - f(X; \gamma)\} \right]$$

$$= E\left[ (1 - Y) \left\{ e^{g^*(X) - g(X; \alpha)} - 1 \right\} \phi(X) \{Z - f(X; \gamma)\} \right].$$

This immediately shows that if either g(X ; α ) = g ∗ (X ) or f (X ; γ ) = f ∗ (X ) ≡ E(Z |Y = 0, X ), then E {r(Y , Z , X ; β ∗ , α, γ , φ )} = 0. □
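Proposition 2 can also be illustrated by exact enumeration over a small discrete population. The sketch below is a toy construction, not taken from the paper: X is binary, Z takes values in {0, 1, 2}, β is scalar, φ(X) = 1, and the misspecified working functions are simple shifts of the truth. It evaluates E{r} at the true β, and, for contrast, the expectation of the product based on the standard residual Y − π discussed in Section 2.

```python
import math

def expit(c):
    return 1.0 / (1.0 + math.exp(-c))

BETA_TRUE = 0.7
P_X = {0: 0.6, 1: 0.4}
P_Z_GIVEN_X = {0: {0: 0.5, 1: 0.3, 2: 0.2}, 1: {0: 0.2, 1: 0.3, 2: 0.5}}

def g_true(x):
    return -0.5 + 0.8 * x

def f_true(x):
    """E(Z | Y = 0, X = x), computed exactly from the population."""
    num = den = 0.0
    for z, pz in P_Z_GIVEN_X[x].items():
        w = pz * (1.0 - expit(BETA_TRUE * z + g_true(x)))
        num += w * z
        den += w
    return num / den

def g_wrong(x):
    return g_true(x) + 0.5      # deliberately misspecified

def f_wrong(x):
    return f_true(x) + 0.1      # deliberately misspecified

def r_dr(y, z, x, g, f):
    """Estimating function (6) at the true beta with phi = 1."""
    return (y * math.exp(-BETA_TRUE * z - g(x)) - (1 - y)) * (z - f(x))

def r_ml(y, z, x, g, f):
    """Product using the standard residual Y - pi instead."""
    return (y - expit(BETA_TRUE * z + g(x))) * (z - f(x))

def expectation(fn, g, f):
    """Exact expectation of fn over the joint law of (Y, Z, X)."""
    total = 0.0
    for x, px in P_X.items():
        for z, pz in P_Z_GIVEN_X[x].items():
            pi_true = expit(BETA_TRUE * z + g_true(x))
            for y, py in ((1, pi_true), (0, 1.0 - pi_true)):
                total += px * pz * py * fn(y, z, x, g, f)
    return total
```

In this toy population the expectation of (6) vanishes when only one working function is wrong, but not when both are wrong, and the product with the standard residual is biased when g is misspecified even though f is correct.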


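Proof of double robustness of (12). By the law of iterated expectations, we have

$$E\{\tau(Y, Z, X; \beta^*, \alpha, \theta, h)\} = E\left[ E\left\{ Y e^{-\beta^{*\mathrm T} Z - g(X; \alpha)} - (1 - Y) \,\middle|\, Z, X \right\} \left\{ h \pi - \pi \frac{E[h \pi \mid Y = 0, X; \theta]}{E[\pi \mid Y = 0, X; \theta]} \right\} \right]$$

$$= E\left[ (1 - Y) \left\{ e^{g^*(X) - g(X; \alpha)} - 1 \right\} \left\{ h \pi - \pi \frac{E[h \pi \mid Y = 0, X; \theta]}{E[\pi \mid Y = 0, X; \theta]} \right\} \right].$$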

This immediately shows that if either $g(X; \alpha) = g^*(X)$ or $p(z \mid Y = 0, X; \theta) = p(z \mid Y = 0, X)$, then $E\{\tau(Y, Z, X; \beta^*, \alpha, \theta, h)\} = 0$. □

Proof of Proposition 3. Suppose that both models g(X; α) and f(X; γ) are correctly specified, such that $g(X; \bar\alpha) = g^*(X)$ and $f(X; \bar\gamma) = E(Z \mid Y = 0, X)$. Then $B_1 = B_2 = 0$ by direct calculation, and hence (7) reduces to

$$\hat\beta(\phi) - \beta^* = \frac{H^{-1}}{n} \sum_{i=1}^n r(Y_i, Z_i, X_i; \beta^*, \bar\alpha, \bar\gamma, \phi) + o_p(n^{-1/2}).$$

By the proof of Proposition 2, we actually have $E\{\varrho(Y, Z, X; \beta^*) \mid X\} = 0$, where

$$\varrho(Y, Z, X; \beta) = \left\{ Y e^{-\beta^{\mathrm T} Z - g(X; \bar\alpha)} - (1 - Y) \right\} \{Z - f(X; \bar\gamma)\}.$$

Therefore, $\hat\beta(\phi)$ is asymptotically equivalent to a solution to $n^{-1} \sum_{i=1}^n \phi(X_i) \varrho(Y_i, Z_i, X_i; \beta) = 0$, which can be seen as an estimator for $\beta^*$ under the conditional moment condition $E\{\varrho(Y, Z, X; \beta^*) \mid X\} = 0$. By Chamberlain (1987), the optimal choice of φ(X) in minimizing the asymptotic variance of such an estimator is $E^{\mathrm T}\{\partial\varrho(Y, Z, X; \beta)/\partial\beta^{\mathrm T} \mid X\}|_{\beta = \beta^*}\, \mathrm{var}^{-1}\{\varrho(Y, Z, X; \beta^*) \mid X\}$, which can be simplified as $\phi_{\mathrm{opt}}(X)$ by direct calculation. □

References

Bickel, P.J., Klaassen, C.A.J., Ritov, Y., Wellner, J.A., 1993. Efficient and Adaptive Estimation for Semiparametric Models. The Johns Hopkins University Press, Baltimore.

Chamberlain, G., 1987. Asymptotic efficiency in estimation with conditional moment restrictions. J. Econometrics 34, 305–334.

Chen, H.Y., 2007. A semiparametric odds ratio model for measuring association. Biometrics 63, 413–421.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W.K., Robins, J.M., 2018. Double/debiased machine learning for treatment and structural parameters. Econom. J. 21, C1–C68.

Farrell, M.H., 2015. Robust inference on average treatment effects with possibly more covariates than observations. J. Econometrics 189, 1–23.

Manski, C.F., 1988. Analog Estimation Methods in Econometrics. Chapman & Hall, New York.

McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models, second ed. Chapman & Hall, London.

Robins, J.M., Rotnitzky, A., 2001. Comment on the Bickel and Kwon article, inference for semiparametric models: Some questions and an answer. Statist. Sinica 11, 920–936.

Severini, T.A., Staniswalis, J.G., 1994. Quasi-likelihood estimation in semiparametric models. J. Amer. Statist. Assoc. 89, 501–511.

Speckman, P., 1988. Kernel smoothing in partial linear models. J. R. Stat. Soc. Ser. B 50, 413–436.

Tan, Z., 2017. Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data. arXiv:1710.08074.

Tan, Z., 2018. Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data. arXiv:1801.09817.

Tchetgen Tchetgen, E.J., 2013. On a closed-form doubly robust estimator of the adjusted odds ratio for a binary exposure. Am. J. Epidemiol. 177, 1314–1316.

Tchetgen Tchetgen, E.J., Robins, J.M., Rotnitzky, A., 2010. On doubly robust estimation in a semiparametric odds ratio model. Biometrika 97, 171–180.

Tchetgen Tchetgen, E.J., Rotnitzky, A., 2011. Double-robust estimation of an exposure-outcome odds ratio adjusting for confounding in cohort and case-control studies. Stat. Med. 30, 335–347.