
A semiparametric hypothesis testing procedure for the ROC curve area under a density ratio model Biao Zhang∗ Department of Mathematics, The University of Toledo, Toledo, OH 43606, USA Received 27 September 2004; received in revised form 2 February 2005; accepted 4 February 2005 Available online 14 March 2005

Abstract We propose a semiparametric Wald statistic to test whether a diagnostic test is capable of discriminating between diseased and nondiseased subjects based on the receiver operating characteristic (ROC) curve area under a two-sample semiparametric density ratio model. The proposed Wald test is constructed on the basis of the maximum semiparametric likelihood estimator of the ROC curve area. The proposed test statistic has an asymptotic chi-squared distribution under the null hypothesis and an asymptotic noncentral chi-squared distribution under local alternatives to the null hypothesis. We present some results on a simulation study and on the analysis of two data sets. © 2005 Elsevier B.V. All rights reserved. Keywords: Binormal model; Chi-squared; Density ratio model; Gaussian process; Local alternative; Logistic regression model; Maximum likelihood; Power; ROC curve; Wald test

1. Introduction The accuracy of a diagnostic test with results which are binary is commonly summarized by the sensitivity and the speciﬁcity, deﬁned, respectively, as the probability that the test result is positive given that the subject is truly diseased and the probability that the test result is negative given that the subject is truly nondiseased. The receiver operating characteristic (ROC) curve generalizes notions of sensitivity and speciﬁcity from binary tests to nonbinary tests. When the test results are not simply positive or negative, but are ∗ Tel.: +1 419 530 4506; fax: +1 419 530 4720.

E-mail address: [email protected] 0167-9473/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2005.02.001


B. Zhang / Computational Statistics & Data Analysis 50 (2006) 1855 – 1876

measured on continuous or ordinal scales, the ROC curve is the most popular approach for assessing the accuracy of a continuous or ordinal-valued diagnostic test and is defined as a plot of the sensitivity versus one minus the specificity across all possible choices of threshold values. It is often useful to summarize the accuracy of a continuous or ordinal test by a single number. One summary measure associated with a ROC curve is the area under the ROC curve (AUC), which conveys important information about the ROC curve. For comprehensive reviews of recent developments in ROC curves see, for example, Begg (1991), Hsieh and Turnbull (1996), Pepe (2000a), Zhou et al. (2002), and Pepe (2003). Let $X_1, \dots, X_{n_0}$ denote independent and identically distributed test results from a nondiseased population and, independent of the $X_i$, let $Y_1, \dots, Y_{n_1}$ be independent and identically distributed test results from a diseased population. If $G$ and $F$ represent, respectively, the distribution functions of $X_1$ and $Y_1$, a closed-form expression for the ROC curve is a plot of $R(s) = 1 - F(G^{-1}(1 - s))$ against $s \in [0, 1]$. The area under the ROC curve is given by

$$\mathrm{AUC} = \int_0^1 R(s)\,ds = P(Y_1 > X_1).$$
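As a small illustration of the two quantities just defined, the following sketch computes the empirical counterpart of $P(Y_1 > X_1)$ and a few empirical ROC points. The function names and the toy data are assumptions made for this example only; they do not come from the paper.

```python
from bisect import bisect_left, bisect_right


def empirical_auc(x, y):
    """Proportion of (x_i, y_j) pairs with y_j > x_i, an estimate of P(Y > X)."""
    xs = sorted(x)
    # bisect_left(xs, v) counts the nondiseased results strictly below v.
    wins = sum(bisect_left(xs, v) for v in y)
    return wins / (len(x) * len(y))


def empirical_roc(x, y, thresholds):
    """(1 - specificity, sensitivity) for the rule 'positive if result > c'."""
    xs, ys = sorted(x), sorted(y)
    points = []
    for c in thresholds:
        fpr = 1 - bisect_right(xs, c) / len(xs)  # 1 - G_hat(c)
        tpr = 1 - bisect_right(ys, c) / len(ys)  # 1 - F_hat(c)
        points.append((fpr, tpr))
    return points


# Hypothetical toy data (not from the paper).
nondiseased = [0.2, 0.5, 0.9, 1.1]
diseased = [0.7, 1.0, 1.4, 2.0]
auc = empirical_auc(nondiseased, diseased)
roc_point = empirical_roc(nondiseased, diseased, [0.9])
```

With these toy data, 13 of the 16 pairs satisfy $y_j > x_i$, so the empirical AUC is 13/16.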

In the literature, statistical inferences for an ROC curve and its AUC are based either on a fully parametric model, fully nonparametric model, or semiparametric model. In most parametric analyses, F and G are assumed to have normal distributions, but the normality assumption is typically not so robust. Goddard and Hinberg (1990) found that the normal assumption is not a good choice for describing the distributions of blood samples assayed by some diagnostic kits for cancer patients and that one should not assume that the observed data themselves are normal; the sample means and sample variances should not be substituted directly into the estimates for the intercept and the slope in the binormal model. The alternative nonparametric analysis is to leave F and G completely unspeciﬁed and to estimate R(s) and AUC by replacing F and G by their corresponding empirical distribution functions based on diseased and nondiseased data. Hsieh and Turnbull (1996) established asymptotic theory for the nonparametric ROC-curve estimator. One disadvantage of the nonparametric analysis is that the estimated ROC curve is not smooth, especially for small sample sizes, so that the clinical interpretation of sensitivity and speciﬁcity would be quite different even for close yet different threshold values. Based on kernel estimators of F and G, Lloyd (1998) introduced a smoothed ROC curve which is a kernel-based estimator of R(s). As a compromise between the parametric and nonparametric approaches, the binormal model is a popular semiparametric approach to modeling a ROC curve. In this model, there exists some unknown monotonic transformation of the test results which simultaneously converts the G and F distributions to normal ones, although the raw test results themselves may be decidedly non-normal. Metz and Kronman (1980) studied statistical signiﬁcance tests for binormal ROC curves, whereas Metz et al. 
(1984) proposed a new approach for testing the signiﬁcance of differences between ROC curves measured from correlated data. Metz (1986) considered statistical analysis of ROC data in evaluating diagnostic performance. McClish (1990) studied how to determine a range of false-positive rates for which ROC curves may differ. Dorfman et al. (1992) discussed receiver operating characteristic rating analysis with generalization to the population of readers and cases using the jackknife method. Jiang et al. (1996) proposed a receiver operating characteristic partial area index for


highly sensitive diagnostic tests. Toledano and Gatsonis (1996) studied ordinal regression methodology for ROC curves derived from correlated data. Hsieh and Turnbull (1996) proposed a generalized least-squares procedure for estimating parameters under the binormal model. Metz et al. (1998) discussed statistical comparison of two ROC curve estimates obtained from partially paired data sets. Metz et al. (1998) proposed a rank-based method for estimating $R(s)$ by grouping the observed continuous data on the basis of the “truth-state runs” in rank-ordered test results. In exploring robust and efficient estimates of $R(s)$, Li et al. (1999) considered a semiparametric approach in which they modeled $F$ parametrically and modeled $G$ nonparametrically. Zou and Hall (2000) employed a rank-based likelihood to estimate parameters in the ROC curve. By interpreting each point on the ROC curve as being a conditional expectation of a binary random variable, Pepe (2000b) proposed to use generalized linear model methods for binary data to model the ROC curve and to estimate the underlying parameters. Lloyd (2002a, 2002b) considered fitting ROC curves using a specific nonlinear logistic regression model. Qin and Zhang (2003) considered a semiparametric procedure for estimating the ROC curve and established the large sample results for the semiparametric ROC estimator under the following two-sample semiparametric density ratio model (1.2). Let $D = 1$ or $0$ denote the disease or nondisease status. For a given test result $X = x$, the logistic regression model is

$$P(D = 1 \mid X = x) = \frac{\exp(\alpha^* + \beta^{\top} r(x))}{1 + \exp(\alpha^* + \beta^{\top} r(x))}, \tag{1.1}$$

where $\alpha^*$ is a scalar parameter, $\beta$ is a $p \times 1$ vector parameter, and $r(x)$ is a $p \times 1$ smooth vector function of $x$. Then $F(x) = P(X \le x \mid D = 1)$ and $G(x) = P(X \le x \mid D = 0)$. Let $f(x)$ and $g(x)$ be the density functions corresponding to $F(x)$ and $G(x)$.
Qin and Zhang (1997) showed that model (1.1) is equivalent to the following two-sample semiparametric density ratio model in which the density functions $f(x)$ and $g(x)$ are linked by an unknown “exponential tilt” $\exp(\alpha + \beta^{\top} r(x))$:

$$\begin{aligned} &X_1, \dots, X_{n_0} \text{ are independent with density function } g(x), \\ &Y_1, \dots, Y_{n_1} \text{ are independent with density function } f(x) = \exp(\alpha + \beta^{\top} r(x))\, g(x), \end{aligned} \tag{1.2}$$

where $\alpha = \alpha^* + \log[\{1 - P(D = 1)\}/P(D = 1)]$. Kay and Little (1987) discussed various versions of the density ratio model for some conventional distributions. In most applications, $r(x) = x$ or $r(x) = (x, x^2)^{\top}$. For $r(x) = x$, model (1.2) encompasses many common distributions including two exponential distributions with different means and two normal distributions with a common variance but different means; for $r(x) = (x, x^2)^{\top}$, model (1.2) includes two normal distributions with different means and variances. In some applications, we found that model (1.2) with $r(x) = \log x$, which contains two log normal distributions $\log N(\mu_1, \sigma^2)$ and $\log N(\mu_2, \sigma^2)$ with $\mu_1 \ne \mu_2$, provides a good fit to the observed data. For a given choice of $r(x)$, we can use the Kolmogorov–Smirnov-type statistic of Qin and Zhang (1997) to test the validity of model (1.2). If more than one form of $r(x)$ is possible for any particular situation, we would prefer simple models to complex ones that fit our data almost equally well according to the principle of parsimony.
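The claim that two normal distributions with a common variance satisfy model (1.2) with $r(x) = x$ can be checked numerically. The sketch below is only an illustration: the function names are assumptions, and the closed forms for $\alpha$ and $\beta$ are obtained by expanding the log ratio of the two normal densities.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def tilt_parameters(mu0, mu1, sigma):
    """alpha, beta with f(x) = exp(alpha + beta * x) * g(x) for
    g = N(mu0, sigma^2) and f = N(mu1, sigma^2) (common variance)."""
    beta = (mu1 - mu0) / sigma ** 2
    alpha = (mu0 ** 2 - mu1 ** 2) / (2 * sigma ** 2)
    return alpha, beta


mu0, mu1, sigma = 0.0, 1.0, 1.0
alpha, beta = tilt_parameters(mu0, mu1, sigma)

# Largest discrepancy between f(x) and the tilted g(x) over a few test points.
max_gap = max(
    abs(normal_pdf(x, mu1, sigma)
        - math.exp(alpha + beta * x) * normal_pdf(x, mu0, sigma))
    for x in [-2.0, -0.5, 0.0, 0.7, 1.5, 3.0]
)
```

For $N(0, 1)$ versus $N(1, 1)$ this gives $\alpha = -0.5$ and $\beta = 1$.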


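Under model (1.2), the ROC curve $R(s)$ and its AUC can be expressed as

$$R(s) = \int_0^s \exp\!\big(\alpha + \beta^{\top} r(G^{-1}(1 - t))\big)\,dt, \qquad \mathrm{AUC} = \int\!\!\int I(y > x) \exp(\alpha + \beta^{\top} r(y))\,dG(x)\,dG(y), \tag{1.3}$$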

where $I(A)$ for any set (event) $A$ denotes the indicator function of $A$; that is, $I(A)$ assumes the value 1 on $A$ and 0 on $A$'s complement $A^c$. Note that the representation of $R(s)$ was deduced in Qin and Zhang (2003). The AUC describes the inherent ability of a diagnostic test to discriminate between healthy and diseased individuals. Our focus of attention in this paper is to test whether a diagnostic test is capable of discriminating between diseased and nondiseased subjects based on the AUC under model (1.2). We propose a semiparametric approach to testing the null hypothesis $H_0: \mathrm{AUC} = \mathrm{AUC}_0$ versus $H_1: \mathrm{AUC} \ne \mathrm{AUC}_0$ under model (1.2) with $\mathrm{AUC}_0$ being any specific hypothesized value. We anticipate that our semiparametric hypothesis testing procedure would be more robust than a fully parametric approach and would be more powerful than a fully nonparametric approach. This paper is organized as follows. In Section 2, we propose our test statistic and establish its asymptotic null distribution under model (1.2). In Section 3, we investigate the power of the test statistic theoretically and via simulation by considering local alternatives to $H_0: \mathrm{AUC} = \mathrm{AUC}_0$ under model (1.2). In Section 4, we apply the proposed test statistic to two real examples. Proofs of the main theoretical results are provided in the appendix.

2. Construction of test statistics

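Let $\{T_1, \dots, T_n\}$ denote the combined test results $\{X_1, \dots, X_{n_0}; Y_1, \dots, Y_{n_1}\}$ with $n = n_0 + n_1$. Moreover, let $\{t_1, \dots, t_n\}$ represent the observed values of $\{T_1, \dots, T_n\}$. Following Qin and Zhang (2003), let

$$\tilde F(t) = \frac{1}{n_0} \sum_{i=1}^{n} \frac{\exp(\tilde\alpha + \tilde\beta^{\top} r(T_i))\, I(T_i \le t)}{1 + \rho \exp(\tilde\alpha + \tilde\beta^{\top} r(T_i))}, \qquad \tilde G(t) = \frac{1}{n_0} \sum_{i=1}^{n} \frac{I(T_i \le t)}{1 + \rho \exp(\tilde\alpha + \tilde\beta^{\top} r(T_i))},$$

where $\rho = n_1/n_0$ and $(\tilde\alpha, \tilde\beta)$ is the solution to the following system of score equations:

$$\begin{aligned} \frac{\partial \ell(\alpha, \beta)}{\partial \alpha} &= n_1 - \sum_{i=1}^{n} \frac{\rho \exp(\alpha + \beta^{\top} r(t_i))}{1 + \rho \exp(\alpha + \beta^{\top} r(t_i))} = 0, \\ \frac{\partial \ell(\alpha, \beta)}{\partial \beta} &= \sum_{j=1}^{n_1} r(y_j) - \sum_{i=1}^{n} \frac{\rho \exp(\alpha + \beta^{\top} r(t_i))}{1 + \rho \exp(\alpha + \beta^{\top} r(t_i))}\, r(t_i) = 0. \end{aligned} \tag{2.1}$$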


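Here $\ell(\alpha, \beta)$ stands for the semiparametric profile log likelihood function of $(\alpha, \beta)$. As shown in Qin and Zhang (1997), $(\tilde\alpha, \tilde\beta, \tilde G)$ is the maximum semiparametric likelihood estimator of $(\alpha, \beta, G)$ in the sense that $(\tilde\alpha, \tilde\beta, \tilde G)$ maximizes the following semiparametric full likelihood function of $(\alpha, \beta, G)$:

$$L(\alpha, \beta, G) = \prod_{i=1}^{n_0} dG(x_i) \prod_{j=1}^{n_1} \exp(\alpha + \beta^{\top} r(y_j))\, dG(y_j).$$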

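According to the invariance property of maximum likelihood estimation, the maximum semiparametric likelihood estimator of AUC in (1.3) under model (1.2) is given by

$$\widetilde{\mathrm{AUC}} = \int\!\!\int I(y > x) \exp(\tilde\alpha + \tilde\beta^{\top} r(y))\, d\tilde G(x)\, d\tilde G(y) = \sum_{i=1}^{n} \sum_{j=1}^{n} \tilde p_i\, \tilde p_j \exp(\tilde\alpha + \tilde\beta^{\top} r(T_j))\, I(T_j > T_i), \tag{2.2}$$

where

$$\tilde p_i = \frac{1}{n_0}\, \frac{1}{1 + \rho \exp(\tilde\alpha + \tilde\beta^{\top} r(T_i))}, \qquad i = 1, \dots, n.$$

Let

$$\hat F(t) = \frac{1}{n_1} \sum_{j=1}^{n_1} I(Y_j \le t) \qquad \text{and} \qquad \hat G(t) = \frac{1}{n_0} \sum_{i=1}^{n_0} I(X_i \le t)$$

be, respectively, the nonparametric maximum likelihood estimators of $F$ and $G$. Then the maximum nonparametric likelihood estimator of AUC in (1.3) is given by

$$\widehat{\mathrm{AUC}} = \int\!\!\int I(y > x)\, d\hat G(x)\, d\hat F(y) = \frac{1}{n_0 n_1} \sum_{i=1}^{n_0} \sum_{j=1}^{n_1} I(Y_j > X_i).$$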

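Throughout this paper, let $(\alpha_0, \beta_0)$ be the true value of $(\alpha, \beta)$ under model (1.2). Moreover, write

$$\begin{aligned} A_0(t) &= \int_{-\infty}^{t} \frac{\exp(\alpha_0 + \beta_0^{\top} r(x))}{1 + \rho \exp(\alpha_0 + \beta_0^{\top} r(x))}\, dG(x), & A_0 &= A_0(\infty), \\ A_1(t) &= \int_{-\infty}^{t} \frac{\exp(\alpha_0 + \beta_0^{\top} r(x))}{1 + \rho \exp(\alpha_0 + \beta_0^{\top} r(x))}\, r(x)\, dG(x), & A_1 &= A_1(\infty), \\ A_2 &= \int \frac{\exp(\alpha_0 + \beta_0^{\top} r(x))}{1 + \rho \exp(\alpha_0 + \beta_0^{\top} r(x))}\, r(x) [r(x)]^{\top}\, dG(x), & A &= \begin{pmatrix} A_0 & A_1^{\top} \\ A_1 & A_2 \end{pmatrix}. \end{aligned}$$

The following theorem, whose proof is given in the appendix, demonstrates that the maximum semiparametric likelihood estimator $\widetilde{\mathrm{AUC}}$ is asymptotically more efficient than the maximum nonparametric likelihood estimator $\widehat{\mathrm{AUC}}$.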


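Theorem 1. Let $0 < a < b < 1$ be given. Suppose that $F$ and $G$ have continuous positive densities $f$ and $g$, respectively, on $[G^{-1}(a) - \epsilon,\, G^{-1}(b) + \epsilon]$ for some $\epsilon > 0$. If $A$ is positive definite, then under model (1.2) we have $\sqrt{n}\,(\widetilde{\mathrm{AUC}} - \mathrm{AUC}) \xrightarrow{d} N(0, \sigma^2(\alpha_0, \beta_0, G))$ and $\sqrt{n}\,(\widehat{\mathrm{AUC}} - \mathrm{AUC}) \xrightarrow{d} N(0, \sigma^2(F, G))$, where the asymptotic variances $\sigma^2(\alpha_0, \beta_0, G)$ and $\sigma^2(F, G)$ are defined in (A.1) and (A.3) of the appendix and satisfy

$$\sigma^2(F, G) - \sigma^2(\alpha_0, \beta_0, G) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} J(s, t)\, dG(s)\, dG(t) \ge 0, \tag{2.3}$$

where

$$J(s, t) = \frac{1 + \rho}{\rho}\, \big[1 + \rho \exp(\alpha_0 + \beta_0^{\top} r(s))\big] \big[1 + \rho \exp(\alpha_0 + \beta_0^{\top} r(t))\big] \times \left[ A_0(s \wedge t) - \big(A_0(s), A_1^{\top}(s)\big)\, A^{-1} \begin{pmatrix} A_0(t) \\ A_1(t) \end{pmatrix} \right]$$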

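with $s \wedge t = \min(s, t)$.

Remark 1. The condition under which $A$ is positive definite is satisfied when $r(x) = (r_1(x), \dots, r_p(x))^{\top}$ with $r_1(x), \dots, r_p(x)$ being $p$ functionally independent functions. In particular, if $r(x) = (x, x^2, \dots, x^p)^{\top}$, then $A$ is positive definite. Since the choice $r(x) = x$ or $r(x) = (x, x^2)^{\top}$ is most frequently encountered in practice, the condition that $A$ is positive definite does not restrict the results of Theorem 1 to a large extent.

We now consider the problem of testing $H_0: \mathrm{AUC} = \mathrm{AUC}_0$ versus $H_1: \mathrm{AUC} \ne \mathrm{AUC}_0$ under the two-sample semiparametric density ratio model (1.2). When $\mathrm{AUC}_0 = 0.5$, the null hypothesis $H_0: \mathrm{AUC} = 0.5$ represents a noninformative test without discrimination ability so as to have a sensitivity $= 1 -$ specificity. The ROC curve corresponding to such a noninformative test lies along the 45° diagonal line with $R(s) = s$ and has an area of $\mathrm{AUC} = 0.5$. To formulate our semiparametric test statistics under model (1.2), let $\tilde\sigma^2 = \sigma^2(\tilde\alpha, \tilde\beta, \tilde G)$ be the maximum semiparametric likelihood estimator of $\sigma^2(\alpha_0, \beta_0, G)$ with $(\alpha_0, \beta_0, G)$ replaced by $(\tilde\alpha, \tilde\beta, \tilde G)$. Then it is seen from (A.3) and the definitions of $\tilde G$, $\tilde F$, and $\tilde p_i$ that

$$\begin{aligned} \tilde\sigma^2 = \sigma^2(\tilde\alpha, \tilde\beta, \tilde G) ={}& \frac{1 + \rho}{\rho} \sum_{i=1}^{n} \sum_{j=1}^{n} \tilde p_i \tilde p_j \big[ \tilde F(T_i \wedge T_j) - \tilde F(T_i) \tilde F(T_j) \big] \\ &+ (1 + \rho) \sum_{i=1}^{n} \sum_{j=1}^{n} \tilde p_i \tilde p_j \exp(\tilde\alpha + \tilde\beta^{\top} r(T_i)) \exp(\tilde\alpha + \tilde\beta^{\top} r(T_j)) \big[ \tilde G(T_i \wedge T_j) - \tilde G(T_i) \tilde G(T_j) \big] \\ &- \frac{1 + \rho}{\rho} \sum_{i=1}^{n} \sum_{j=1}^{n} \tilde p_i \tilde p_j \big[1 + \rho \exp(\tilde\alpha + \tilde\beta^{\top} r(T_i))\big] \big[1 + \rho \exp(\tilde\alpha + \tilde\beta^{\top} r(T_j))\big] \\ &\qquad \times \left[ \tilde A_0(T_i \wedge T_j) - \big(\tilde A_0(T_i), \tilde A_1^{\top}(T_i)\big)\, \tilde A^{-1} \begin{pmatrix} \tilde A_0(T_j) \\ \tilde A_1(T_j) \end{pmatrix} \right] \\ ={}& \frac{1 + \rho}{\rho} \sum_{i=1}^{n} \sum_{j=1}^{n} \tilde p_i \tilde p_j \big[ \tilde F(T_i \wedge T_j) - \tilde F(T_i) \tilde F(T_j) \big] \\ &+ (1 + \rho) \sum_{i=1}^{n} \sum_{j=1}^{n} \tilde p_i \tilde p_j \exp(\tilde\alpha + \tilde\beta^{\top} r(T_i)) \exp(\tilde\alpha + \tilde\beta^{\top} r(T_j)) \big[ \tilde G(T_i \wedge T_j) - \tilde G(T_i) \tilde G(T_j) \big] \\ &- \frac{1 + \rho}{\rho}\, \frac{1}{n_0^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ \tilde A_0(T_i \wedge T_j) - \big(\tilde A_0(T_i), \tilde A_1^{\top}(T_i)\big)\, \tilde A^{-1} \begin{pmatrix} \tilde A_0(T_j) \\ \tilde A_1(T_j) \end{pmatrix} \right], \end{aligned} \tag{2.4}$$

where

$$\begin{aligned} \tilde A_0(t) &= \sum_{i=1}^{n} \tilde p_i\, \frac{\exp(\tilde\alpha + \tilde\beta^{\top} r(T_i))}{1 + \rho \exp(\tilde\alpha + \tilde\beta^{\top} r(T_i))}\, I(T_i \le t), & \tilde A_0 &= \tilde A_0(\infty), \\ \tilde A_1(t) &= \sum_{i=1}^{n} \tilde p_i\, \frac{\exp(\tilde\alpha + \tilde\beta^{\top} r(T_i))}{1 + \rho \exp(\tilde\alpha + \tilde\beta^{\top} r(T_i))}\, r(T_i)\, I(T_i \le t), & \tilde A_1 &= \tilde A_1(\infty), \\ \tilde A_2 &= \sum_{i=1}^{n} \tilde p_i\, \frac{\exp(\tilde\alpha + \tilde\beta^{\top} r(T_i))}{1 + \rho \exp(\tilde\alpha + \tilde\beta^{\top} r(T_i))}\, r(T_i) [r(T_i)]^{\top}, & \tilde A &= \begin{pmatrix} \tilde A_0 & \tilde A_1^{\top} \\ \tilde A_1 & \tilde A_2 \end{pmatrix}. \end{aligned}$$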

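It can be shown that the estimated asymptotic variance $\tilde\sigma^2 = \sigma^2(\tilde\alpha, \tilde\beta, \tilde G)$ is a consistent estimator of $\sigma^2(\alpha_0, \beta_0, G)$ under model (1.2). According to Theorem 1, a semiparametric Wald-type statistic for testing $H_0: \mathrm{AUC} = \mathrm{AUC}_0$ versus $H_1: \mathrm{AUC} \ne \mathrm{AUC}_0$ under model (1.2) is given by

$$\tilde W_n = \frac{\sqrt{n}\,(\widetilde{\mathrm{AUC}} - \mathrm{AUC}_0)}{\tilde\sigma} \qquad \text{or} \qquad \tilde W_n^2 = \frac{n\,(\widetilde{\mathrm{AUC}} - \mathrm{AUC}_0)^2}{\tilde\sigma^2}. \tag{2.5}$$

Clearly, large observed values of $\tilde W_n^2$ indicate evidence against $H_0: \mathrm{AUC} = \mathrm{AUC}_0$ under model (1.2). If $\sigma^2(\alpha_0, \beta_0, G) > 0$, then it follows from Theorem 1 that $\tilde W_n^2 \xrightarrow{d} \chi_1^2$ under model (1.2). We reject the null hypothesis $H_0: \mathrm{AUC} = \mathrm{AUC}_0$ in favor of the alternative hypothesis $H_1: \mathrm{AUC} \ne \mathrm{AUC}_0$ at level $\alpha$ if $\tilde W_n^2 > \chi_1^2(1 - \alpha)$, where $\chi_1^2(1 - \alpha)$ is such that $P(\chi_1^2 \le \chi_1^2(1 - \alpha)) = 1 - \alpha$. On the other hand, according to Theorem 1, a nonparametric Wald-type statistic for testing $H_0: \mathrm{AUC} = \mathrm{AUC}_0$ versus $H_1: \mathrm{AUC} \ne \mathrm{AUC}_0$ is given by

$$\hat W_n = \frac{\sqrt{n}\,(\widehat{\mathrm{AUC}} - \mathrm{AUC}_0)}{\hat\sigma} \qquad \text{or} \qquad \hat W_n^2 = \frac{n\,(\widehat{\mathrm{AUC}} - \mathrm{AUC}_0)^2}{\hat\sigma^2},$$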

where $\hat\sigma^2 = \sigma^2(\hat F, \hat G)$ is the maximum nonparametric likelihood estimator of $\sigma^2(F, G)$ with $(F, G)$ replaced by $(\hat F, \hat G)$ and is given by

$$\hat\sigma^2 = \sigma^2(\hat F, \hat G) = \frac{n}{n_0^2 n_1} \sum_{i=1}^{n_0} \sum_{j=1}^{n_0} \big[ \hat F(X_i \wedge X_j) - \hat F(X_i) \hat F(X_j) \big] + \frac{n}{n_0 n_1^2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_1} \big[ \hat G(Y_i \wedge Y_j) - \hat G(Y_i) \hat G(Y_j) \big].$$

When $\sigma^2(F, G) > 0$, it follows from Theorem 1 that $\hat W_n^2 \xrightarrow{d} \chi_1^2$ regardless of whether or not model (1.2) is valid. Since $\widetilde{\mathrm{AUC}}$ is more efficient than $\widehat{\mathrm{AUC}}$ by Theorem 1, we anticipate that the semiparametric Wald statistic $\tilde W_n^2$ is more powerful than the nonparametric Wald statistic $\hat W_n^2$ for testing $H_0: \mathrm{AUC} = \mathrm{AUC}_0$ versus $H_1: \mathrm{AUC} \ne \mathrm{AUC}_0$ under model (1.2), as illustrated in a simulation study of the next section.
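The nonparametric Wald test described above needs only empirical distribution functions, so it is easy to sketch. The helper names below are assumptions. The p-value uses the identity $P(\chi_1^2 > w) = \operatorname{erfc}(\sqrt{w/2})$, which holds because a $\chi_1^2$ variable is the square of a standard normal variable.

```python
import math
from bisect import bisect_right


def ecdf(sample):
    """Empirical distribution function t -> (1/m) * #{sample_i <= t}."""
    s = sorted(sample)
    m = len(s)
    return lambda t: bisect_right(s, t) / m


def nonparametric_wald(x, y, auc0=0.5):
    """Return (AUC-hat, sigma-hat^2, W-hat_n^2, chi-square(1) p-value)."""
    n0, n1 = len(x), len(y)
    n = n0 + n1
    f_hat, g_hat = ecdf(y), ecdf(x)
    auc_hat = sum(1 for xi in x for yj in y if yj > xi) / (n0 * n1)
    term_f = sum(f_hat(min(a, b)) - f_hat(a) * f_hat(b) for a in x for b in x)
    term_g = sum(g_hat(min(a, b)) - g_hat(a) * g_hat(b) for a in y for b in y)
    sigma2 = n * term_f / (n0 ** 2 * n1) + n * term_g / (n0 * n1 ** 2)
    # sigma2 is zero when the samples are completely separated; the
    # statistic is undefined in that case.
    w2 = n * (auc_hat - auc0) ** 2 / sigma2
    p_value = math.erfc(math.sqrt(w2 / 2.0))
    return auc_hat, sigma2, w2, p_value


# Tiny hypothetical example that can be verified by hand.
auc_hat, sigma2, w2, p_value = nonparametric_wald([1, 3], [2, 4], auc0=0.5)
```

For the four-point example, both double sums equal 0.25, so $\hat\sigma^2 = 0.25$ and $\hat W_n^2 = 4 \times 0.25^2 / 0.25 = 1$. A sample this small only checks the arithmetic, since the chi-squared approximation is asymptotic.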

Remark 2. More generally, we can determine, under model (1.2), whether a diagnostic test has the ability of discriminating between diseased and nondiseased subjects based on the ROC curve area under a restricted range of values of one minus the specificity. For any particular values $s_1$ and $s_2$ of $1 -$ specificity, the null versus alternative hypotheses are expressed in terms of the partial area as $H_0: \mathrm{AUC}(s_1, s_2) = \mathrm{AUC}_0(s_1, s_2)$ versus $H_1: \mathrm{AUC}(s_1, s_2) \ne \mathrm{AUC}_0(s_1, s_2)$, where

$$\mathrm{AUC}(s_1, s_2) = \int_{s_1}^{s_2}\!\int_0^s \exp\!\big(\alpha + \beta^{\top} r(G^{-1}(1 - t))\big)\, dt\, ds$$

is the partial area under model (1.2) and $\mathrm{AUC}_0(s_1, s_2)$ is any specific hypothesized value of $\mathrm{AUC}(s_1, s_2)$. When $s_1 = 0$ and $s_2 = 1$, $\mathrm{AUC}(s_1, s_2)$ reduces to the total area AUC in (1.3). Similar to (2.5), a semiparametric Wald statistic for testing $H_0: \mathrm{AUC}(s_1, s_2) = \mathrm{AUC}_0(s_1, s_2)$ versus $H_1: \mathrm{AUC}(s_1, s_2) \ne \mathrm{AUC}_0(s_1, s_2)$ is given by

$$\tilde W_n^2(s_1, s_2) = \frac{n\,\big[\widetilde{\mathrm{AUC}}(s_1, s_2) - \mathrm{AUC}_0(s_1, s_2)\big]^2}{\widehat{\mathrm{Var}}\big(\widetilde{\mathrm{AUC}}(s_1, s_2)\big)},$$

where $\widetilde{\mathrm{AUC}}(s_1, s_2) = \int_{s_1}^{s_2}\!\int_0^s \exp\!\big(\tilde\alpha + \tilde\beta^{\top} r(\tilde G^{-1}(1 - t))\big)\, dt\, ds$ is the maximum semiparametric likelihood estimator of $\mathrm{AUC}(s_1, s_2)$ under model (1.2) and $\widehat{\mathrm{Var}}\big(\widetilde{\mathrm{AUC}}(s_1, s_2)\big)$ is the estimated asymptotic variance of $\widetilde{\mathrm{AUC}}(s_1, s_2)$. The explicit expression of $\widehat{\mathrm{Var}}\big(\widetilde{\mathrm{AUC}}(s_1, s_2)\big)$ can be derived in a manner similar to (2.4).

Remark 3. The proposed semiparametric Wald test procedure can be extended to the case of comparing two or more independent ROC areas under model (1.2). In the case of comparing two such ROC areas, a natural test statistic is based on the difference in maximum semiparametric likelihood estimators of AUC. Suppose that there are two types of tests, labeled A and B, that are applied to subjects. We denote the two ROC areas by $\mathrm{AUC}_A$ and $\mathrm{AUC}_B$ and their maximum semiparametric likelihood estimators under model (1.2) by $\widetilde{\mathrm{AUC}}_A$ and $\widetilde{\mathrm{AUC}}_B$. If $\widetilde{\mathrm{AUC}}_A$ and $\widetilde{\mathrm{AUC}}_B$ are derived from two independent samples, then a semiparametric Wald-type statistic for testing $H_0: \mathrm{AUC}_A = \mathrm{AUC}_B$ versus $H_1: \mathrm{AUC}_A \ne \mathrm{AUC}_B$ is given by

$$D_n = \frac{\big(\widetilde{\mathrm{AUC}}_A - \widetilde{\mathrm{AUC}}_B\big)^2}{n_A^{-1} \tilde\sigma_A^2 + n_B^{-1} \tilde\sigma_B^2},$$

where $n_A$ and $n_B$ are sample sizes and $\tilde\sigma_A^2$ and $\tilde\sigma_B^2$ are versions of $\tilde\sigma^2$ in (2.4); $(n_A, \tilde\sigma_A^2)$ and $(n_B, \tilde\sigma_B^2)$ correspond to tests A and B, respectively. The null hypothesis $H_0: \mathrm{AUC}_A = \mathrm{AUC}_B$ is tested by comparing the value of $D_n$ with a $\chi_1^2$ distribution.

Remark 4. When testing the differences between two or more ROC areas, the proposed semiparametric test approach is restricted to those estimates obtained from independently sampled data sets. On the other hand, some nonparametric and semiparametric approaches proposed previously apply also to correlated data sets; see for example Metz et al. (1984), Wieand et al. (1989), Dorfman et al. (1992), Jiang et al. (1996), Toledano and Gatsonis (1996), Zhou and Gatsonis (1996), and Metz et al. (1998).
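The two-sample comparison in Remark 3 reduces to a one-line computation once the two estimates and their variance estimates are available. The function name and the input numbers in this sketch are made up for illustration.

```python
import math


def compare_independent_aucs(auc_a, sigma2_a, n_a, auc_b, sigma2_b, n_b):
    """D_n statistic for two independent AUC estimates and its chi-square(1) p-value."""
    d_n = (auc_a - auc_b) ** 2 / (sigma2_a / n_a + sigma2_b / n_b)
    # P(chi-square with 1 df > d) = erfc(sqrt(d / 2))
    p_value = math.erfc(math.sqrt(d_n / 2.0))
    return d_n, p_value


# Hypothetical inputs: two estimates with variance estimate 0.25 each,
# both based on 100 observations.
d_n, p_value = compare_independent_aucs(0.80, 0.25, 100, 0.70, 0.25, 100)
```

With these hypothetical inputs the statistic is $0.01 / 0.005 = 2$, well below the 5% critical value 3.84 of the $\chi_1^2$ distribution.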

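3. Power considerations

In this section we investigate the power of the proposed test statistic $\tilde W_n^2$ theoretically and via simulation by considering the following local alternatives to model (1.2):

$$\begin{aligned} &X_1, \dots, X_{n_0} \text{ are independent with density } g(x), \\ &Y_1, \dots, Y_{n_1} \text{ are independent with density } f(x, \theta) = \exp(\alpha + \beta^{\top} r(x))\, h(x, \theta)\, g(x), \end{aligned} \tag{3.1}$$

where $\theta = (\theta_1, \dots, \theta_q)^{\top}$ ranges over a neighborhood of a point $\theta_0 = (\theta_{01}, \dots, \theta_{0q})^{\top}$ in $R^q$ and $h(x, \theta)$ is, for fixed $x$, a known function from $R^q$ to $R^+$ for which $h(x, \theta_0) = 1$ for each $x$ and with partial derivative $s(x, \theta) = \partial h(x, \theta)/\partial \theta$. Note that model (3.1) reduces to model (1.2) when $\theta = \theta_0$. Throughout this section, let $F(x, \theta)$ be the distribution function corresponding to $f(x, \theta)$. Write

$$\begin{aligned} V_0(t) &= \int_{-\infty}^{t} \frac{\exp(\alpha_0 + \beta_0^{\top} r(x))}{1 + \rho \exp(\alpha_0 + \beta_0^{\top} r(x))}\, s(x, \theta_0)\, dG(x), \qquad V_0 = V_0(\infty), \\ V_1 &= \int \frac{\exp(\alpha_0 + \beta_0^{\top} r(x))}{1 + \rho \exp(\alpha_0 + \beta_0^{\top} r(x))}\, s(x, \theta_0) [r(x)]^{\top}\, dG(x), \\ B_0 &= \int G(x)\, s(x, \theta_0) \exp(\alpha_0 + \beta_0^{\top} r(x))\, dG(x), \\ B(t) &= \int_{-\infty}^{t} s(x, \theta_0) \exp(\alpha_0 + \beta_0^{\top} r(x))\, dG(x), \end{aligned}$$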

V0 , m(t) = V0 (t) − (A0 (t), A1 (t)) A−1 V1 (t) =

1 1 + exp 0 + 0 r(t) m(t) − B (t),

=

∞ −∞

(t) dG(t).


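Additionally, we assume that the function $h(x, \theta)$ or its partial derivative $s(x, \theta)$ is chosen such that all the components of $V_1$ and $B_0$ are finite. Under model (3.1), the AUC is equal to

$$\mathrm{AUC}(\theta) = \int G(t)\, dF(t, \theta) = \int G(t) \exp(\alpha + \beta^{\top} r(t))\, h(t, \theta)\, dG(t) = \int G(t)\, h(t, \theta)\, dF(t). \tag{3.2}$$

When $\theta = \theta_0$, it can be shown by (1.3) and (3.2) that $\mathrm{AUC}(\theta_0) = \mathrm{AUC}$ under model (1.2). For fixed $\gamma = (\gamma_1, \dots, \gamma_q)^{\top} \in R^q$, let $\theta_n = (\theta_{n1}, \dots, \theta_{nq})^{\top} = \theta_0 + n^{-1/2} \gamma\, [1 + o(1)]$ as $n \to \infty$. Under the sequence of parameter values $(\alpha_0, \beta_0, \theta_n)$ in model (3.1), we have

$$\begin{aligned} \mathrm{AUC}(\theta_n) - \mathrm{AUC} &= \int G(t)\, h(t, \theta_n)\, dF(t) - \int G(t)\, dF(t) \\ &= \int G(t) \big[ h(t, \theta_n) - h(t, \theta_0) \big]\, dF(t) \\ &= \frac{1}{\sqrt{n}}\, \gamma^{\top} \int G(t)\, s(t, \theta_0)\, dF(t) + o(n^{-1/2}) \\ &= \frac{1}{\sqrt{n}}\, \gamma^{\top} B_0\, [1 + o(1)]. \end{aligned}$$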

3.1. Local asymptotic power We now consider the local asymptotic power of the proposed semiparametric Wald test of H0 : AUC=AUC0 with local alternatives H1 : AUCn =AUC0 +n−1/2 B0 [1+o(1)] under model (3.1),where ∈ R q . The following theorem establishes the large-sample distribution of the proposed semiparametric Wald statistic W˜ n2 for testing H0 : AUC = AUC0 under the sequence of parameter values 0 , 0 , n ,where n = 0 + n−1/2 [1 + o(1)] as n → ∞ for ﬁxed ∈ R q . d Theorem 2. Under the conditions of Theorem 1,we have W˜ n −→ N , 2 0 , 0 , G and d W˜ n2 −→ 21 2 under model(3.1) with (, , ) = 0 , 0 , 0 as n → ∞, where 2 = 2 2 /2 0 , 0 , G and 21 2 is a noncentral chi-squared random variable with one degree of freedom and noncentrality parameter 2 . The proof of Theorem 2 is given in the appendix. The asymptotic null and alternative distributions of W˜ n2 , presented in Theorems 1 and 2, can be employed to obtain critical values of the proposed semiparametric Wald test and power against various local alternatives AUCn = AUC0 by numerical integration, although explicit computation is unfortunately somewhat complicated.


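3.2. Comparative study

In the following we present a small simulation study of the finite sample performance of the proposed semiparametric Wald statistic $\tilde W_n^2$ for testing $H_0: \mathrm{AUC} = \mathrm{AUC}_0$ versus $H_1: \mathrm{AUC} \ne \mathrm{AUC}_0$ under model (1.2). In our simulation study, we first assume that $g(x)$ is the density function of a $N(\mu_0, \sigma_0^2)$ distribution and $f(x)$ is the density function of a $N(\mu_1, \sigma_1^2)$ distribution. Then model (1.2) holds with $r(x) = (x, x^2)^{\top}$, $\beta = (\beta_1, \beta_2)^{\top}$, and

$$\alpha = \log\frac{\sigma_0}{\sigma_1} + \frac{1}{2} \left( \frac{\mu_0^2}{\sigma_0^2} - \frac{\mu_1^2}{\sigma_1^2} \right), \qquad \beta_1 = \frac{\mu_1}{\sigma_1^2} - \frac{\mu_0}{\sigma_0^2}, \qquad \beta_2 = \frac{1}{2} \left( \frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2} \right).$$

The ROC curve area under this binormal model is given by $\mathrm{AUC} = \Phi\big(a/\sqrt{1 + b^2}\big)$, where $a = (\mu_1 - \mu_0)/\sigma_1$ and $b = \sigma_0/\sigma_1$. Here $\Phi(\cdot)$ is the standard normal distribution function. According to Zhou et al. (2002, pp. 122 and 140), the parametric Wald statistic $W_n^2$ for testing $H_0: \mathrm{AUC} = \mathrm{AUC}_0$ versus $H_1: \mathrm{AUC} \ne \mathrm{AUC}_0$ under the binormal model is given by

$$W_n^2 = \frac{\Big\{ \Phi\big(\hat a/\sqrt{1 + \hat b^2}\big) - \mathrm{AUC}_0 \Big\}^2}{\hat c^2\, \widehat{\mathrm{Var}}(\hat a) + \hat d^2\, \widehat{\mathrm{Var}}(\hat b) + 2 \hat c \hat d\, \widehat{\mathrm{Cov}}(\hat a, \hat b)},$$

where

$$\hat a = \frac{\bar y - \bar x}{s_y}, \qquad \hat b = \frac{s_x}{s_y}, \qquad \hat c = \frac{\exp\{-\hat a^2/[2(1 + \hat b^2)]\}}{\sqrt{2\pi (1 + \hat b^2)}}, \qquad \hat d = -\frac{\hat a \hat b \exp\{-\hat a^2/[2(1 + \hat b^2)]\}}{\sqrt{2\pi (1 + \hat b^2)^3}},$$

$$\widehat{\mathrm{Var}}(\hat a) = \frac{n_0 (\hat a^2 + 2) + 2 n_1 \hat b^2}{2 n_0 n_1}, \qquad \widehat{\mathrm{Var}}(\hat b) = \frac{(n_1 + n_0)\, \hat b^2}{2 n_0 n_1}, \qquad \widehat{\mathrm{Cov}}(\hat a, \hat b) = \frac{\hat a \hat b}{2 n_1}.$$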

Here $\bar x$ and $s_x^2$ are, respectively, the sample mean and the sample variance of $X_1, \dots, X_{n_0}$, and $\bar y$ and $s_y^2$ are, respectively, the sample mean and the sample variance of $Y_1, \dots, Y_{n_1}$. The parametric Wald statistic $W_n^2$ is seldom used in practice because it is not robust to nonnormality. Instead, the LABROC method of Metz et al. (1998) and its corresponding ROCKIT software are mostly used because of their robustness to nonnormality. The LABROC-based semiparametric Wald statistic for testing $H_0: \mathrm{AUC} = \mathrm{AUC}_0$ versus $H_1: \mathrm{AUC} \ne \mathrm{AUC}_0$ for continuous data is given by $W_M = \big(\widetilde{\mathrm{AUC}}_M - \mathrm{AUC}_0\big) \big/ \sqrt{\widehat{\mathrm{Var}}\big(\widetilde{\mathrm{AUC}}_M\big)}$ or $W_M^2 = \big(\widetilde{\mathrm{AUC}}_M - \mathrm{AUC}_0\big)^2 \big/ \widehat{\mathrm{Var}}\big(\widetilde{\mathrm{AUC}}_M\big)$, where $\widetilde{\mathrm{AUC}}_M$ is the LABROC ROC curve area estimator of AUC of Metz et al. (1998) and $\sqrt{\widehat{\mathrm{Var}}\big(\widetilde{\mathrm{AUC}}_M\big)}$ is its estimated standard error. A FORTRAN program (available at http://www-radiology.uchicago.edu) implementing the LABROC4 algorithm can be employed to calculate the LABROC ROC curve area estimator $\widetilde{\mathrm{AUC}}_M$ and its estimated standard error.
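The parametric binormal Wald statistic described above can be sketched in a few lines. The function names and the toy data are assumptions for illustration. Sample standard deviations use the usual $n - 1$ divisor, as in Python's `statistics.stdev`.

```python
import math
from statistics import mean, stdev


def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def binormal_wald(x, y, auc0=0.5):
    """Return (binormal AUC estimate, its variance estimate, Wald statistic W_n^2)."""
    n0, n1 = len(x), len(y)
    a = (mean(y) - mean(x)) / stdev(y)
    b = stdev(x) / stdev(y)
    auc = std_normal_cdf(a / math.sqrt(1.0 + b * b))
    kernel = math.exp(-a * a / (2.0 * (1.0 + b * b)))
    c = kernel / math.sqrt(2.0 * math.pi * (1.0 + b * b))
    d = -a * b * kernel / math.sqrt(2.0 * math.pi * (1.0 + b * b) ** 3)
    var_a = (n0 * (a * a + 2.0) + 2.0 * n1 * b * b) / (2.0 * n0 * n1)
    var_b = (n1 + n0) * b * b / (2.0 * n0 * n1)
    cov_ab = a * b / (2.0 * n1)
    var_auc = c * c * var_a + d * d * var_b + 2.0 * c * d * cov_ab
    w2 = (auc - auc0) ** 2 / var_auc
    return auc, var_auc, w2


# Hypothetical toy samples giving a-hat = 1 and b-hat = 1.
auc_par, var_par, w2_par = binormal_wald([0, 1, 2], [1, 2, 3], auc0=0.5)
```

With $\hat a = \hat b = 1$ the estimate is $\Phi(1/\sqrt{2}) \approx 0.760$, the same value as the $\mathrm{AUC}_0$ used in the simulation study. Samples of size three are far too small for the chi-squared approximation; they serve only as an arithmetic check.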


In our simulation study, we assume that $\mu_0 = 0$ and $\sigma_0 = \sigma_1 = 1$ so that $g(x)$ is the standard normal density function and $f(x) = g(x - \mu_1)$ is the density function of a $N(\mu_1, 1)$ distribution. Moreover, we assume that $f(x, \theta_n)$ is the density function of a $N(\mu_1, \sigma_n^2)$ distribution with $\sigma_n^2 = 1/(1 - 2 c\, n^{-1/2})$ for some fixed $c \in R$ such that $\sigma_n^2 > 0$. Then model (1.2) holds with $\alpha = -\mu_1^2/2$ and $\beta = \mu_1$, and model (3.1) holds with $h(x, \theta_n) = \exp(\theta_{n1} + \theta_{n2} x + \theta_{n3} x^2)$, where $\theta_n = (\theta_{n1}, \theta_{n2}, \theta_{n3})^{\top}$ and

$$\theta_{n1} = \frac{\mu_1^2}{2} \left( 1 - \frac{1}{\sigma_n^2} \right) - \log \sigma_n, \qquad \theta_{n2} = \mu_1 \left( \frac{1}{\sigma_n^2} - 1 \right), \qquad \theta_{n3} = \frac{1}{2} \left( 1 - \frac{1}{\sigma_n^2} \right).$$

It is easy to verify that $\theta_n = \theta_0 + n^{-1/2} \gamma \{1 + o(1)\}$ as $n \to \infty$ with $\theta_0 = (0, 0, 0)^{\top}$ and $\gamma = c\,(\mu_1^2 - 1, -2\mu_1, 1)^{\top}$. Our aim is to compare the performances of the proposed semiparametric Wald statistic $\tilde W_n^2$, the LABROC-based semiparametric Wald statistic $W_M^2$, the nonparametric Wald statistic $\hat W_n^2$, and the parametric Wald statistic $W_n^2$ by examining their powers against some local alternatives $\theta \ne \theta_0$ under model (3.1). We considered $c = 0, 2.0, 4.0$ and sample sizes of $(n_0, n_1) = (40, 60)$ and $(n_0, n_1) = (60, 40)$. Furthermore, we let $\mu_1 = 1$ be fixed so that $\alpha = -0.5$, $\beta = 1$, $a = 1$, $b = 1$, and $\mathrm{AUC}_0 = 0.760$. Note that, for $c = 0, 2.0, 4.0$, we have $\sigma_n = 1.0, 1.291, 2.236$ and $\mathrm{AUC}_n = 0.760, 0.785, 0.819$ when $n = 100$. For each pair $(n_0, n_1)$ and each value of $c$, we generated 1000 independent sets of combined random samples from the $N(\mu_0, \sigma_0^2)$, $N(\mu_1, \sigma_1^2)$, and $N(\mu_1, \sigma_n^2)$ distributions.

The simulation results are summarized in Table 1. It is seen that the achieved significance levels of $\tilde W_n^2$, $W_M^2$, $\hat W_n^2$, and $W_n^2$ are quite close to the corresponding nominal significance levels, and the powers of $\tilde W_n^2$, $W_M^2$, $\hat W_n^2$, and $W_n^2$ are getting larger as $c$ moves away from 0. Our simulation results also reveal that the powers of $\tilde W_n^2$ are greater than or equal to those of $W_M^2$ and $\hat W_n^2$ in all cases, and the powers of $W_n^2$ are slightly larger than those of $\tilde W_n^2$, $W_M^2$, and $\hat W_n^2$ when $c = 4.0$ and are comparable to those of $\tilde W_n^2$ when $c = 2.0$. In summary, our simulation study indicates that the proposed semiparametric Wald statistic $\tilde W_n^2$ is superior to the nonparametric Wald statistic $\hat W_n^2$ and is quite comparable to the LABROC-based semiparametric Wald statistic $W_M^2$ and the parametric Wald statistic $W_n^2$, in terms of their power performances. In Table 2, we report simulation results of the finite sample performance of the proposed semiparametric ROC curve area estimator $\widetilde{\mathrm{AUC}}$ under the same setup as that for the above four test statistics when $c = 0$.
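The local-alternative standard deviations quoted above for $n = 100$ follow directly from the variance formula $\sigma_n^2 = 1/(1 - 2c\,n^{-1/2})$. In the sketch below the fixed constant is called `c`. That name is an assumption for this example, because the original symbol did not survive in the extracted text.

```python
import math


def local_sigma(c, n):
    """sigma_n = sqrt(1 / (1 - 2*c/sqrt(n))); requires 2*c < sqrt(n)."""
    return math.sqrt(1.0 / (1.0 - 2.0 * c / math.sqrt(n)))


# Values of the fixed constant used in the simulation study, with n = 100.
sigma_values = [round(local_sigma(c, 100), 3) for c in (0.0, 2.0, 4.0)]
```

The rounded values agree with the standard deviations 1.0, 1.291, and 2.236 reported in the text and in Table 1.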
Our aim is to compare the biases and standard deviations of the proposed semiparametric ROC curve area estimator A UC, the LABROC ROC curve area estimator A UCM of Metz et al. (1998), the nonparametric ROC estimator curve area 2 ˆ AUC, and the parametric ROC curve area estimator AUC = a/ ˆ 1 + b . In Table 2, Bias and SD stand for, respectively, the average of 1000 biases and the sample standard deviation of 1000 estimates. Moreover, MS represents the mean square deviation of an estimate. Additionally, we use SE to represent the square root of the average of 1000 variance estimates of an estimate. Table 2 reveals that A UC produces a smaller standard deviation than A UC and a larger standard deviation than AUC. Furthermore, the standard deviation of A UC is comparable to that of A UCM . While the absolute values of the biases


Table 1
Achieved significance levels and powers of $\tilde W_n^2$, $W_M^2$, $\hat W_n^2$, and $W_n^2$ under the two-sample normal model

| $c$ | $(n_0, n_1)$ | $\sigma_n$ | $\mathrm{AUC}_n$ | Nominal level | Power $\tilde W_n^2$ | Power $W_M^2$ | Power $\hat W_n^2$ | Power $W_n^2$ |
|-----|----------|-------|-------|------|-------|-------|-------|-------|
| 0.0 | (40, 60) | 1.000 | 0.760 | 0.10 | 0.115 | 0.115 | 0.117 | 0.107 |
| 0.0 | (40, 60) | 1.000 | 0.760 | 0.05 | 0.063 | 0.066 | 0.066 | 0.057 |
| 0.0 | (40, 60) | 1.000 | 0.760 | 0.01 | 0.019 | 0.019 | 0.020 | 0.015 |
| 2.0 | (40, 60) | 1.291 | 0.785 | 0.10 | 0.148 | 0.146 | 0.140 | 0.143 |
| 2.0 | (40, 60) | 1.291 | 0.785 | 0.05 | 0.089 | 0.073 | 0.082 | 0.083 |
| 2.0 | (40, 60) | 1.291 | 0.785 | 0.01 | 0.022 | 0.014 | 0.022 | 0.021 |
| 4.0 | (40, 60) | 2.236 | 0.819 | 0.10 | 0.640 | 0.621 | 0.596 | 0.663 |
| 4.0 | (40, 60) | 2.236 | 0.819 | 0.05 | 0.494 | 0.485 | 0.459 | 0.515 |
| 4.0 | (40, 60) | 2.236 | 0.819 | 0.01 | 0.254 | 0.242 | 0.222 | 0.252 |
| 0.0 | (60, 40) | 1.000 | 0.760 | 0.10 | 0.111 | 0.100 | 0.112 | 0.105 |
| 0.0 | (60, 40) | 1.000 | 0.760 | 0.05 | 0.057 | 0.058 | 0.056 | 0.047 |
| 0.0 | (60, 40) | 1.000 | 0.760 | 0.01 | 0.015 | 0.019 | 0.014 | 0.015 |
| 2.0 | (60, 40) | 1.291 | 0.785 | 0.10 | 0.144 | 0.141 | 0.144 | 0.135 |
| 2.0 | (60, 40) | 1.291 | 0.785 | 0.05 | 0.091 | 0.077 | 0.079 | 0.081 |
| 2.0 | (60, 40) | 1.291 | 0.785 | 0.01 | 0.025 | 0.023 | 0.024 | 0.022 |
| 4.0 | (60, 40) | 2.236 | 0.819 | 0.10 | 0.509 | 0.503 | 0.473 | 0.534 |
| 4.0 | (60, 40) | 2.236 | 0.819 | 0.05 | 0.379 | 0.372 | 0.353 | 0.399 |
| 4.0 | (60, 40) | 2.236 | 0.819 | 0.01 | 0.169 | 0.159 | 0.157 | 0.177 |

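Table 2
Biases, standard deviations, standard errors, and mean square deviations of the semiparametric, LABROC, nonparametric, and parametric ROC curve area estimators under the two-sample normal model

| Estimator | $(n_0, n_1)$ | Bias | SD | SE | MS |
|-----------|----------|----------|---------|---------|---------|
| Semiparametric $\widetilde{\mathrm{AUC}}$ | (40, 60) | −0.00179 | 0.04726 | 0.04722 | 0.00224 |
| LABROC $\widetilde{\mathrm{AUC}}_M$ | (40, 60) | 0.00492 | 0.04724 | 0.04754 | 0.00226 |
| Nonparametric $\widehat{\mathrm{AUC}}$ | (40, 60) | −0.00003 | 0.04801 | 0.04830 | 0.00231 |
| Parametric (binormal) | (40, 60) | −0.00096 | 0.04650 | 0.04735 | 0.00216 |
| Semiparametric $\widetilde{\mathrm{AUC}}$ | (60, 40) | −0.00194 | 0.04714 | 0.04723 | 0.00223 |
| LABROC $\widetilde{\mathrm{AUC}}_M$ | (60, 40) | 0.00201 | 0.04773 | 0.04747 | 0.00228 |
| Nonparametric $\widehat{\mathrm{AUC}}$ | (60, 40) | −0.00031 | 0.04817 | 0.04829 | 0.00232 |
| Parametric (binormal) | (60, 40) | −0.00107 | 0.04633 | 0.04737 | 0.00215 |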

of A UC are slightly larger than those of A UC and AUC, but smaller than those of A UCM , the mean square deviations of A UC are smaller than those of A UCM and A UC but larger than that of AUC. As for the variance estimates, Table 2 indicates that the standard errors UC work well and work better than those of AUC for each value of A UC, A UCM , and A


of $(n_0, n_1)$. In addition, the ratios of estimated biases to standard errors, Bias/SE, are all within 0.11 in absolute value for each estimate and each value of $(n_0, n_1)$. According to the rule of thumb described in Efron and Tibshirani (1993, p. 128), the biases of $\widetilde{\mathrm{AUC}}$, $\widehat{\mathrm{AUC}}_M$, $\widehat{\mathrm{AUC}}$, and $\overline{\mathrm{AUC}}$ are due to sampling variation and can be ignored. In summary, our simulation study indicates that the parametric estimator $\overline{\mathrm{AUC}}$ is, as anticipated, better than the proposed semiparametric estimator $\widetilde{\mathrm{AUC}}$ and the LABROC ROC curve area estimator $\widehat{\mathrm{AUC}}_M$, which are in turn superior to the nonparametric estimator $\widehat{\mathrm{AUC}}$, in terms of mean squared deviation, and that the proposed semiparametric estimator $\widetilde{\mathrm{AUC}}$ is more efficient than the nonparametric estimator $\widehat{\mathrm{AUC}}$.

3.3. Robustness

Next, we study how well the proposed semiparametric Wald statistic $\widetilde{W}_n^2$ and the proposed semiparametric ROC curve area estimator $\widetilde{\mathrm{AUC}}$ work, under the same scenario as that in Tables 1 and 2, when the underlying data violate the normal assumption. To this end, we assume that $g(x) = e^{-x} I(x > 0)$ is the standard exponential E(1) density function, $f(x) = \{\lambda^{a}/\Gamma(a)\}\, x^{a-1} e^{-\lambda x} I(x > 0)$ is the density function of a $\Gamma(a, \lambda)$ distribution, and $f(x, \lambda_n)$ is the density function of a $\Gamma(a, \lambda_n)$ distribution with $\lambda_n = \lambda + \delta/\sqrt{n}$ for some fixed $\delta \in R$ such that $\lambda_n > 0$. Then model (1.2) holds with $r(x) = (x, \log x)^{\top}$, $\alpha = a \log \lambda - \log \Gamma(a)$, and $\beta = (1 - \lambda, a - 1)^{\top}$, and model (3.1) holds with $h(x, \theta_n) = \exp(\theta_{n1} + \theta_{n2} x)$, where $\theta_n = (\theta_{n1}, \theta_{n2})^{\top}$, $\theta_{n1} = a \log(\lambda_n/\lambda)$, and $\theta_{n2} = \lambda + 1 - \lambda_n$. It is easy to verify that $\theta_n = \theta_0 + n^{-1/2} \gamma \{1 + o(1)\}$ as $n \to \infty$ with $\theta_0 = (0, 1)^{\top}$ and $\gamma = \delta\,(a \lambda^{-1}, -1)^{\top}$.

For $a = 2$, $\lambda = 1$, each value of $\delta = 0, -1.5, -3.0$, and each pair $(n_0, n_1) = (40, 60), (60, 40)$, we generated 1000 independent sets of combined random samples from the E(1), $\Gamma(a, \lambda)$, and $\Gamma(a, \lambda_n)$ distributions. For $\delta = 0, -1.5, -3.0$, we have $\lambda_n = 1.0, 0.85, 0.7$ and $\mathrm{AUC}_n = 0.750, 0.789, 0.831$ when $n = 100$. The simulation results are reported in Table 3 for $\widetilde{W}_n^2$, $W_M^2$, $\widehat{W}_n^2$, $\overline{W}_n^2$ and in Table 4 for $\widetilde{\mathrm{AUC}}$, $\widehat{\mathrm{AUC}}_M$, $\widehat{\mathrm{AUC}}$, $\overline{\mathrm{AUC}}$.

It is seen from Table 3 that the achieved significance levels of $\widetilde{W}_n^2$, $W_M^2$, $\widehat{W}_n^2$ are closer to their corresponding nominal significance levels than those of $\overline{W}_n^2$ in most cases, and the powers of $\widetilde{W}_n^2$, $W_M^2$, $\widehat{W}_n^2$ are all significantly larger than those of $\overline{W}_n^2$. Furthermore, the powers of $\widetilde{W}_n^2$ are larger than those of $\widehat{W}_n^2$ except for a single case with $(n_0, n_1) = (60, 40)$, $\delta = -1.5$, and nominal level equal to 0.10, and are comparable to those of $W_M^2$. Table 4 shows that the biases and mean square deviations of the parametric ROC curve area estimator $\overline{\mathrm{AUC}}$ are both significantly larger than those of the proposed semiparametric ROC curve area estimator $\widetilde{\mathrm{AUC}}$, the LABROC ROC curve area estimator $\widehat{\mathrm{AUC}}_M$, and the nonparametric ROC curve area estimator $\widehat{\mathrm{AUC}}$. Furthermore, $\widetilde{\mathrm{AUC}}$ has smaller standard deviations and mean square deviations than $\widehat{\mathrm{AUC}}_M$, $\widehat{\mathrm{AUC}}$, and $\overline{\mathrm{AUC}}$. Moreover, the ratios of estimated biases to standard errors, Bias/SE, are all within 0.07 in absolute value for $\widetilde{\mathrm{AUC}}$, $\widehat{\mathrm{AUC}}_M$, $\widehat{\mathrm{AUC}}$, whereas |Bias|/SE = 0.60 for $\overline{\mathrm{AUC}}$ when $(n_0, n_1) = (40, 60)$ and |Bias|/SE = 0.62


Table 3
Achieved significance levels and powers of $\widetilde{W}_n^2$, $W_M^2$, $\widehat{W}_n^2$, and $\overline{W}_n^2$ under the two-sample exponential-gamma model

                                                  Power
δ       (n0, n1)   λ_n    AUC_n   Nominal level   $\widetilde{W}_n^2$   $W_M^2$   $\widehat{W}_n^2$   $\overline{W}_n^2$
 0.00   (40, 60)   1.00   0.750   0.10            0.113                 0.113     0.107               0.150
 0.00   (40, 60)   1.00   0.750   0.05            0.062                 0.061     0.057               0.085
 0.00   (40, 60)   1.00   0.750   0.01            0.024                 0.029     0.020               0.021
−1.50   (40, 60)   0.85   0.789   0.10            0.286                 0.284     0.285               0.125
−1.50   (40, 60)   0.85   0.789   0.05            0.214                 0.199     0.207               0.076
−1.50   (40, 60)   0.85   0.789   0.01            0.097                 0.094     0.089               0.025
−3.00   (40, 60)   0.70   0.831   0.10            0.647                 0.645     0.644               0.298
−3.00   (40, 60)   0.70   0.831   0.05            0.545                 0.529     0.532               0.210
−3.00   (40, 60)   0.70   0.831   0.01            0.344                 0.353     0.336               0.091
 0.00   (60, 40)   1.00   0.750   0.10            0.119                 0.120     0.113               0.122
 0.00   (60, 40)   1.00   0.750   0.05            0.056                 0.071     0.059               0.066
 0.00   (60, 40)   1.00   0.750   0.01            0.017                 0.024     0.017               0.014
−1.50   (60, 40)   0.85   0.789   0.10            0.271                 0.267     0.275               0.086
−1.50   (60, 40)   0.85   0.789   0.05            0.194                 0.186     0.190               0.043
−1.50   (60, 40)   0.85   0.789   0.01            0.084                 0.105     0.082               0.016
−3.00   (60, 40)   0.70   0.831   0.10            0.629                 0.648     0.623               0.224
−3.00   (60, 40)   0.70   0.831   0.05            0.528                 0.549     0.514               0.140
−3.00   (60, 40)   0.70   0.831   0.01            0.328                 0.349     0.320               0.051

Table 4
Biases, standard deviations, standard errors, and mean square deviations of $\widetilde{\mathrm{AUC}}$, $\widehat{\mathrm{AUC}}_M$, $\widehat{\mathrm{AUC}}$, and $\overline{\mathrm{AUC}}$ under the two-sample exponential-gamma model

Estimator                    (n0, n1)   Bias       SD        SE        MS
$\widetilde{\mathrm{AUC}}$   (40, 60)   −0.00039   0.04957   0.04828   0.00246
$\widehat{\mathrm{AUC}}_M$   (40, 60)    0.00307   0.04989   0.04870   0.00250
$\widehat{\mathrm{AUC}}$     (40, 60)    0.00068   0.05010   0.04977   0.00251
$\overline{\mathrm{AUC}}$    (40, 60)   −0.02927   0.05026   0.04875   0.00338
$\widetilde{\mathrm{AUC}}$   (60, 40)   −0.00254   0.04833   0.04712   0.00234
$\widehat{\mathrm{AUC}}_M$   (60, 40)    0.00299   0.04896   0.04752   0.00241
$\widehat{\mathrm{AUC}}$     (60, 40)   −0.00125   0.04893   0.04847   0.00240
$\overline{\mathrm{AUC}}$    (60, 40)   −0.03253   0.04810   0.05238   0.00337

for $\overline{\mathrm{AUC}}$ when $(n_0, n_1) = (60, 40)$. Thus, the bias of $\overline{\mathrm{AUC}}$ is systematic rather than due to sampling variation. In conclusion, our simulation study indicates that the proposed semiparametric procedure is more efficient than the nonparametric procedure, is more robust than the normal-based parametric procedure, and is comparable to the LABROC-based semiparametric procedure.
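The exponential-gamma setup of Section 3.3 can be checked numerically. The Python sketch below is an illustration only, not code from the paper: the function names are hypothetical, and it assumes the gamma density is parameterized by a shape and a rate, with the local rate equal to rate + delta / sqrt(n). It relies on the identity P(X < Y) = 1 − E[exp(−Y)] for X ~ E(1), which gives the ROC curve area in closed form, and it also checks the density ratio form of model (1.2) with r(x) = (x, log x) at one point.

```python
import math

def gamma_pdf(x, shape, rate):
    """Density of a Gamma(shape, rate) distribution at x > 0."""
    return rate**shape / math.gamma(shape) * x**(shape - 1) * math.exp(-rate * x)

def density_ratio_params(shape, rate):
    """Intercept and slopes of log{f(x)/g(x)} for r(x) = (x, log x),
    where g is the E(1) density and f is the Gamma(shape, rate) density."""
    alpha = shape * math.log(rate) - math.lgamma(shape)
    beta = (1.0 - rate, shape - 1.0)
    return alpha, beta

def auc_exp_gamma(shape, rate):
    """P(X < Y) for X ~ E(1), Y ~ Gamma(shape, rate): 1 - E[exp(-Y)]."""
    return 1.0 - (rate / (1.0 + rate)) ** shape

def local_rate(rate, delta, n):
    """Assumed local alternative: rate + delta / sqrt(n)."""
    return rate + delta / math.sqrt(n)

shape, rate, n = 2.0, 1.0, 100
for delta in (0.0, -1.5, -3.0):
    lam_n = local_rate(rate, delta, n)
    print(delta, round(lam_n, 2), round(auc_exp_gamma(shape, lam_n), 4))

# |Bias|/SE for the parametric estimator, using the Table 4 entries.
for bias, se in ((-0.02927, 0.04875), (-0.03253, 0.05238)):
    print(round(abs(bias) / se, 2))
```

With shape 2, rate 1, and n = 100, the closed form agrees with the reported values 0.750, 0.789, and 0.831 to within about 0.001, and the two ratios reproduce the reported 0.60 and 0.62.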

4. Examples

In this section, we consider the problem of testing $H_0: \mathrm{AUC} = \mathrm{AUC}_0$ versus $H_1: \mathrm{AUC} \neq \mathrm{AUC}_0$ under the density ratio model (1.2) by applying the proposed semiparametric Wald test to two real data sets.

Example 1. Wieand et al. (1989) reported a case-control study in which sera from $n_0 = 51$ control patients with pancreatitis and $n_1 = 90$ case patients with pancreatic cancer were assayed at the Mayo Clinic with a cancer antigen (CA-125) and with a carbohydrate antigen (CA19-9). Both the CA-125 and CA19-9 antigens are measured on a continuous positive scale. The raw data are available at http://www.fhcrc.org/labs/pepe/book/data/wiedat2b.dta. On the basis of the test results associated only with CA-125, we consider the problem of whether the antigen CA-125 has any discrimination ability by testing $H_0: \mathrm{AUC} = 0.5$ versus $H_1: \mathrm{AUC} \neq 0.5$ under model (1.2). For $r(x) = (\log x, (\log x)^2)^{\top}$, the system of score equations in (2.1) yields $\tilde{\alpha} = -4.33700$ and $\tilde{\beta} = (2.06145, -0.18325)^{\top}$. Furthermore, the Kolmogorov–Smirnov-type statistic of Qin and Zhang (1997) for testing model (1.2) with $r(x) = (\log x, (\log x)^2)^{\top}$ is found to equal 0.77997, with the observed P-value equal to 0.316 based on 1000 bootstrap replications. Thus, the semiparametric density ratio model (1.2) with $r(x) = (\log x, (\log x)^2)^{\top}$ provides a good fit to the observed data. If we denote $w(x) = \exp\{-4.33700 + 2.06145 \log x - 0.18325 (\log x)^2\}$, then under model (1.2) with $r(x) = (\log x, (\log x)^2)^{\top}$ and according to (2.2) and (2.4), the maximum semiparametric likelihood estimate of AUC is calculated as $\widetilde{\mathrm{AUC}} = 0.69216$, and its estimated asymptotic variance and standard error are respectively equal to $\tilde{\sigma}^2/n = 0.00204$ and $\tilde{\sigma}/\sqrt{n} = \sqrt{0.00204} = 0.04516$. It follows from (2.5) that the proposed semiparametric Wald statistic for testing the null hypothesis $H_0: \mathrm{AUC} = 0.5$ versus the alternative hypothesis $H_1: \mathrm{AUC} \neq 0.5$ is $\widetilde{W}_n = \sqrt{n}\,(\widetilde{\mathrm{AUC}} - 0.5)/\tilde{\sigma} = (0.69216 - 0.5)/0.04516 = 4.25500$, or $\widetilde{W}_n^2 = 18.10504$, with P-value equal to 0.00002, which is statistically significant. Since the maximum semiparametric likelihood estimate of AUC is $\widetilde{\mathrm{AUC}} = 0.69216$, we conclude that the antigen CA-125 has some ability to discriminate between patients with and without pancreatic cancer.

Example 2. Venkatraman and Begg (1996) discussed a melanoma data set in which the definitive diagnosis of a pigmented lesion suspected of being a melanoma involves a biopsy. There are two systems that can be used to evaluate suspicious lesions. One is the clinical score system given by doctors, and the other is the dermoscope. The raw data are listed in Table 4 of Venkatraman and Begg (1996, pp. 842 and 843) with $n_0 = 51$, $n_1 = 21$, and $n = n_0 + n_1 = 72$. Based on the data from the clinical score system only, we consider the problem of testing whether the clinical score system has any discrimination ability at all under model (1.2). For model (1.2) with $r(x) = x$, we have $(\tilde{\alpha}, \tilde{\beta}) = (0.88725, 1.00015)$, and the Kolmogorov–Smirnov-type statistic of Qin and Zhang (1997) for testing model (1.2) equals 0.28030, with the observed P-value equal to 0.430 based on 1000 bootstrap replications. Thus, the semiparametric density ratio model (1.2) with $r(x) = x$ fits the melanoma data well. Under model (1.2) with $r(x) = x$, the maximum semiparametric likelihood estimate of AUC is, according to (2.2) and (2.4), equal to $\widetilde{\mathrm{AUC}} = 0.88923$, with its estimated asymptotic variance and standard error equal to $\tilde{\sigma}^2/n = 0.00159$ and $\tilde{\sigma}/\sqrt{n} = \sqrt{0.00159} = 0.03982$. The proposed semiparametric Wald statistic for testing the null hypothesis $H_0: \mathrm{AUC} = 0.5$ versus the alternative hypothesis $H_1: \mathrm{AUC} \neq 0.5$ is equal to $\widetilde{W}_n = \sqrt{n}\,(\widetilde{\mathrm{AUC}} - 0.5)/\tilde{\sigma} = (0.88923 - 0.5)/0.03982 = 9.77580$, or $\widetilde{W}_n^2 = 95.56631$, with P-value < 0.00001. Consequently, there is strong evidence that the clinical score system is capable of discriminating between diseased and nondiseased subjects.
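The arithmetic behind the two Wald tests above can be reproduced from the reported point estimates and variances. The following Python sketch is only an illustration: `wald_test` is a hypothetical helper, not part of the paper, and it takes the estimate and its standard error as given rather than computing them from the raw data. The P-value uses the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable, so P(chi-squared > w^2) = erfc(|w| / sqrt(2)).

```python
import math

def wald_test(auc_hat, se, auc0=0.5):
    """Return (W, W^2, p-value) for H0: AUC = auc0.

    The p-value is the upper tail of a chi-squared distribution with
    one degree of freedom evaluated at W^2.
    """
    w = (auc_hat - auc0) / se
    p = math.erfc(abs(w) / math.sqrt(2.0))
    return w, w * w, p

# Reported estimates and estimated asymptotic variances (sigma^2 / n).
examples = (("CA-125", 0.69216, 0.00204),
            ("clinical score", 0.88923, 0.00159))

for label, auc_hat, var in examples:
    w, w2, p = wald_test(auc_hat, math.sqrt(var))
    print(f"{label}: W = {w:.3f}, W^2 = {w2:.2f}, p = {p:.2g}")
```

Because the inputs are the published values rounded to five decimals, the statistics match the reported 4.25500 and 9.77580 only approximately; the conclusions are unchanged.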

Acknowledgements

The author wishes to thank two referees for providing several important references and a number of constructive and helpful comments and suggestions that have greatly improved the original submission. This research was supported in part by a grant from the National Security Agency.

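Appendix A.

A.1. Proofs

Proof of Theorem 1. According to Example 6.1.2 of Lehmann (1999, p. 374), we have $\sqrt{n}\,(\widehat{\mathrm{AUC}} - \mathrm{AUC}) \xrightarrow{d} N(0, \sigma_0^2(F, G))$, where

$$\sigma_0^2(F, G) = \frac{1+\rho}{\rho}\,\sigma_{01}^2 + (1+\rho)\,\sigma_{10}^2 \tag{A.1}$$

with

$$\begin{aligned}
\sigma_{01}^2 &= \mathrm{Cov}[I(X_1 < Y_1), I(X_2 < Y_1)] = P(X_1 < Y_1, X_2 < Y_1) - [P(X_1 < Y_1)]^2,\\
\sigma_{10}^2 &= \mathrm{Cov}[I(X_1 < Y_1), I(X_1 < Y_2)] = P(X_1 < Y_1, X_1 < Y_2) - [P(X_1 < Y_1)]^2.
\end{aligned} \tag{A.2}$$

Furthermore, it follows from Theorem 2 of Qin and Zhang (2003) that $\sqrt{n}\,(\widetilde{\mathrm{AUC}} - \mathrm{AUC}) \xrightarrow{d} N(0, \sigma^2(\alpha_0, \beta_0, G))$, where

$$\begin{aligned}
\sigma^2(\alpha_0, \beta_0, G) ={}& \frac{1+\rho}{\rho} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [F(s \wedge t) - F(s)F(t)]\, dG(s)\, dG(t)\\
&+ (1+\rho) \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [G(s \wedge t) - G(s)G(t)]\, dF(s)\, dF(t)\\
&- \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} J(s, t)\, dG(s)\, dG(t).
\end{aligned} \tag{A.3}$$

The distribution function $F$ does not occur as an argument in the notation $\sigma^2(\alpha_0, \beta_0, G)$ because, under model (1.2), $F$ itself depends on $\alpha$, $\beta$, and $G$. Since

$$\begin{aligned}
P(X_1 < Y_1) &= \int_{-\infty}^{\infty} G(y)\, dF(y) = \int_{-\infty}^{\infty} [1 - F(x)]\, dG(x),\\
P(X_1 < Y_1, X_1 < Y_2) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} P(X_1 < x, X_1 < y)\, dF(x)\, dF(y)\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} G(x \wedge y)\, dF(x)\, dF(y),\\
P(X_1 < Y_1, X_2 < Y_1) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} P(x < Y_1, y < Y_1)\, dG(x)\, dG(y)\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [1 - F(y) - F(x) + F(x \wedge y)]\, dG(x)\, dG(y),
\end{aligned} \tag{A.4}$$

it follows from (A.2) and (A.4) that

$$\begin{aligned}
\sigma_{01}^2 &= P(X_1 < Y_1, X_2 < Y_1) - [P(X_1 < Y_1)]^2\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [1 - F(y) - F(x) + F(x \wedge y)]\, dG(x)\, dG(y) - \int_{-\infty}^{\infty} [1 - F(x)]\, dG(x) \int_{-\infty}^{\infty} [1 - F(y)]\, dG(y)\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [F(x \wedge y) - F(x)F(y)]\, dG(x)\, dG(y),\\
\sigma_{10}^2 &= P(X_1 < Y_1, X_1 < Y_2) - [P(X_1 < Y_1)]^2\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} G(x \wedge y)\, dF(x)\, dF(y) - \int_{-\infty}^{\infty} G(x)\, dF(x) \int_{-\infty}^{\infty} G(y)\, dF(y)\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [G(x \wedge y) - G(x)G(y)]\, dF(x)\, dF(y).
\end{aligned} \tag{A.5}$$

Combining (A.1), (A.3), and (A.5) yields

$$\begin{aligned}
\sigma^2(\alpha_0, \beta_0, G) &= \frac{1+\rho}{\rho}\,\sigma_{01}^2 + (1+\rho)\,\sigma_{10}^2 - \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} J(s, t)\, dG(s)\, dG(t)\\
&= \sigma_0^2(F, G) - \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} J(s, t)\, dG(s)\, dG(t).
\end{aligned} \tag{A.6}$$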

According to Theorem 2.1 of Zhang (2000),

$$\rho^2 J(s, t) = \rho(1+\rho)\,[1 + \rho\exp(\alpha_0 + \beta_0^{\top} r(s))]\,[1 + \rho\exp(\alpha_0 + \beta_0^{\top} r(t))] \left\{ A_0(s \wedge t) - (A_0(s), A_1^{\top}(s))\, A^{-1} \begin{pmatrix} A_0(t) \\ A_1(t) \end{pmatrix} \right\}$$

is the asymptotic covariance function of the process $Z_n(t) = \sqrt{n}\,[1 + \rho\exp(\alpha_0 + \beta_0^{\top} r(t))]\,[\widetilde{G}(t) - \widehat{G}(t)]$. Consequently, it follows from the Continuous Mapping Theorem (Billingsley, 1968, p. 30) that as $n \to \infty$,

$$\int_0^1 Z_n(G^{-1}(s))\, ds \to N(0, \sigma_Z^2(\alpha_0, \beta_0, G))$$

in distribution, where

$$0 \le \sigma_Z^2(\alpha_0, \beta_0, G) = \int_0^1\!\!\int_0^1 \rho^2 J(G^{-1}(s), G^{-1}(t))\, ds\, dt = \rho^2 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} J(s, t)\, dG(s)\, dG(t).$$

This, along with (A.6), implies that $\sigma^2(\alpha_0, \beta_0, G) \le \sigma_0^2(F, G)$, thus establishing (2.3). The proof is complete.

Proof of Theorem 2. It can be shown as in the proof of Theorem 1 of Qin and Zhang (2003) that under model (3.1) with $(\alpha, \beta, \theta) = (\alpha_0, \beta_0, \theta_n)$, one can write

$$\widetilde{R}(s) - R(s) = Q_n(G^{-1}(1 - s)) + o_p(n^{-1/2}), \qquad s \in [1 - b, 1 - a], \tag{A.7}$$

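where $Q_n(t) = -\{U_1(t) - \exp(\alpha_0 + \beta_0^{\top} r(t))\, U_2(t) - \rho^{-1} U_3(t)\}$ with

$$U_1(t) = \widehat{F}(t) - F(t), \qquad U_2(t) = H_1(t) - H_2(t) - G(t), \qquad U_3(t) = H_1(t) - \widehat{G}(t) - H_2(t),$$

and

$$H_1(t) = \frac{1}{n_0} \sum_{i=1}^{n} \frac{I(T_i \le t)}{1 + \rho\exp(\alpha_0 + \beta_0^{\top} r(T_i))}, \qquad H_2(t) = \frac{1+\rho}{n}\,(A_0(t), A_1^{\top}(t))\, A^{-1} \begin{pmatrix} \partial \ell(\alpha_0, \beta_0)/\partial\alpha \\ \partial \ell(\alpha_0, \beta_0)/\partial\beta \end{pmatrix}.$$

Furthermore, it can be shown after some algebra that under model (3.1) with $(\alpha, \beta, \theta) = (\alpha_0, \beta_0, \theta_n)$, we have

$$\begin{aligned}
E[U_1(t)] &= E[\widehat{F}(t)] - F(t) = \int I(y \le t)\,[h(y, \theta_n) - h(y, \theta_0)]\, dF(y)\\
&= \frac{1}{\sqrt{n}} \int_{-\infty}^{t} s(y, \theta_0)\, dF(y) + o(n^{-1/2}) = \frac{1}{\sqrt{n}}\, B(t)\,[1 + o(1)],\\
E[\sqrt{n}\, U_2(t)] &= E[\sqrt{n}\, U_3(t)] = E\{\sqrt{n}\,[H_1(t) - G(t)]\} - E[\sqrt{n}\, H_2(t)]\\
&= \int_{-\infty}^{t} \frac{\rho\exp(\alpha_0 + \beta_0^{\top} r(x))}{1 + \rho\exp(\alpha_0 + \beta_0^{\top} r(x))}\, s(x, \theta_0)\, dG(x) - (A_0(t), A_1^{\top}(t))\, A^{-1} \begin{pmatrix} V_0 \\ V_1 \end{pmatrix} + o(1)\\
&= V_0(t) - (A_0(t), A_1^{\top}(t))\, A^{-1} \begin{pmatrix} V_0 \\ V_1 \end{pmatrix} + o(1) = m(t) + o(1),\\
E[\sqrt{n}\, Q_n(t)] &= -\sqrt{n}\, E\{U_1(t) - \exp(\alpha_0 + \beta_0^{\top} r(t))\, U_2(t) - \rho^{-1} U_3(t)\}\\
&= -\{B(t) - \exp(\alpha_0 + \beta_0^{\top} r(t))\, m(t) - \rho^{-1} m(t)\} + o(1)\\
&= [\exp(\alpha_0 + \beta_0^{\top} r(t)) + \rho^{-1}]\, m(t) - B(t) + o(1)\\
&= \frac{1 + \rho\exp(\alpha_0 + \beta_0^{\top} r(t))}{\rho} \left\{ V_0(t) - (A_0(t), A_1^{\top}(t))\, A^{-1} \begin{pmatrix} V_0 \\ V_1 \end{pmatrix} \right\} - B(t) + o(1)\\
&= \mu(t) + o(1).
\end{aligned} \tag{A.8}$$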

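Moreover, it can be shown as in the proof of Theorem 1 of Qin and Zhang (2003) that under model (3.1) with $(\alpha, \beta, \theta) = (\alpha_0, \beta_0, \theta_n)$, as $n \to \infty$, we have

$$\mathrm{Cov}(\sqrt{n}\, Q_n(s), \sqrt{n}\, Q_n(t)) = K(s, t) + o(1), \tag{A.9}$$

where

$$\begin{aligned}
K(s, t) ={}& \frac{1+\rho}{\rho}\,[F(s \wedge t) - F(s)F(t)] + (1+\rho)\exp(\alpha_0 + \beta_0^{\top} r(s))\exp(\alpha_0 + \beta_0^{\top} r(t))\,[G(s \wedge t) - G(s)G(t)]\\
&- \frac{1+\rho}{\rho}\,[1 + \rho\exp(\alpha_0 + \beta_0^{\top} r(s))]\,[1 + \rho\exp(\alpha_0 + \beta_0^{\top} r(t))] \left\{ A_0(s \wedge t) - (A_0(s), A_1^{\top}(s))\, A^{-1} \begin{pmatrix} A_0(t) \\ A_1(t) \end{pmatrix} \right\}.
\end{aligned}$$

(A.8) and (A.9), along with the central limit theorem for sample means and the Cramér–Wold device, imply that the finite-dimensional distributions of $\sqrt{n}\, Q_n$ converge weakly to those of $W$ under model (3.1) with $(\alpha, \beta, \theta) = (\alpha_0, \beta_0, \theta_n)$, where $W$ is a Gaussian process with mean function $E[W(t)] = \mu(t)$ and covariance function $\mathrm{Cov}[W(s), W(t)] = K(s, t)$. By employing the tightness criteria in Billingsley (1968, Chapter 3), we can show that the process $\{\sqrt{n}\, Q_n(t), -\infty \le t \le \infty\}$ is tight in $D[-\infty, \infty]$. As a result, we have $\sqrt{n}\, Q_n(G^{-1}) \xrightarrow{D} W(G^{-1})$ in $D[1 - b, 1 - a]$ by (A.7). It now follows from the Continuous Mapping Theorem (Billingsley, 1968, p. 30) that

$$\sqrt{n}\,(\widetilde{\mathrm{AUC}} - \mathrm{AUC}_0) = \int_0^1 \sqrt{n}\,[\widetilde{R}(s) - R(s)]\, ds \xrightarrow{d} \int_0^1 W(G^{-1}(1 - s))\, ds \sim N(\mu, \sigma^2(\alpha_0, \beta_0, G)),$$

where $\sigma^2(\alpha_0, \beta_0, G)$ is given by (A.3). This implies that $\widetilde{W}_n \xrightarrow{d} N(\mu/\sigma(\alpha_0, \beta_0, G), 1)$ under model (3.1) with $(\alpha, \beta, \theta) = (\alpha_0, \beta_0, \theta_n)$ as $n \to \infty$. Since it can be shown that $\tilde{\sigma}^2 = \sigma^2(\tilde{\alpha}, \tilde{\beta}, \widetilde{G})$ is a consistent estimator of $\sigma^2(\alpha_0, \beta_0, G)$ under $(\alpha_0, \beta_0, \theta_n)$ as $n \to \infty$, it follows from Slutsky's theorem and the well-known results on the distribution of quadratic forms of normal random variables that $\widetilde{W}_n^2 = n\,(\widetilde{\mathrm{AUC}} - \mathrm{AUC}_0)^2/\sigma^2(\tilde{\alpha}, \tilde{\beta}, \widetilde{G}) \xrightarrow{d} \chi_1^2(\mu^2/\sigma^2(\alpha_0, \beta_0, G))$ under model (3.1) with $(\alpha, \beta, \theta) = (\alpha_0, \beta_0, \theta_n)$ as $n \to \infty$, thus establishing Theorem 2. The proof is complete.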
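The variance decomposition (A.1)–(A.2) used in the proof of Theorem 1 can be illustrated with plug-in sample quantities. The Python sketch below is an assumption-laden illustration, not the paper's estimator: `mann_whitney_auc` is a hypothetical name, ties are ignored (continuous test results are assumed), and it takes ρ = n1/n0 so that σ_0²/n reduces to σ_10²/n0 + σ_01²/n1. Each σ term is estimated by the sample variance of the corresponding placement values.

```python
def mann_whitney_auc(x, y):
    """Nonparametric AUC estimate with a plug-in variance estimate.

    x: test results of nondiseased subjects (sample from G)
    y: test results of diseased subjects (sample from F)
    Returns (auc, s10, s01, var), where var estimates Var(AUC-hat).
    """
    n0, n1 = len(x), len(y)
    # Placement values: the fraction of the other sample on the right side.
    v10 = [sum(xi < yj for yj in y) / n1 for xi in x]  # one value per X_i
    v01 = [sum(xi < yj for xi in x) / n0 for yj in y]  # one value per Y_j
    auc = sum(v10) / n0
    # Sample variances estimate sigma_10^2 (shared X) and sigma_01^2 (shared Y).
    s10 = sum((v - auc) ** 2 for v in v10) / (n0 - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n1 - 1)
    var = s10 / n0 + s01 / n1
    return auc, s10, s01, var

auc, s10, s01, var = mann_whitney_auc([1.0, 2.0, 3.0], [2.5, 4.0])
print(auc, s10, s01, var)
```

A semiparametric variance estimate would subtract a nonnegative correction corresponding to the last term of (A.3); that step requires the fitted density ratio model and is not attempted here.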

References

Begg, C.B., 1991. Advances in statistical methodology for diagnostic medicine in the 1980's. Statist. Med. 10, 1887–1895.
Billingsley, P., 1968. Convergence of Probability Measures. Wiley, New York.
Dorfman, D.D., Berbaum, K.S., Metz, C.E., 1992. Receiver operating characteristic rating analysis: generalization to the population of readers and cases with the jack-knife method. Invest. Radiol. 27, 723–731.
Efron, B., Tibshirani, R., 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.
Goddard, M.J., Hinberg, I., 1990. Receiver operating characteristic (ROC) curves and non-normal data: an empirical study. Statist. Med. 9, 325–337.
Hsieh, F., Turnbull, B.W., 1996. Nonparametric estimation of the receiver operating characteristic curve. Ann. Statist. 24, 25–40.
Jiang, Y., Metz, C.E., Nishikawa, R.M., 1996. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 201, 745–750.
Kay, R., Little, S., 1987. Transformations of the explanatory variables in the logistic regression model for binary data. Biometrika 74, 495–501.
Lehmann, E.L., 1999. Elements of Large-Sample Theory. Springer, New York.
Li, G., Tiwari, R.C., Wells, M.T., 1999. Semiparametric inference for a quantile comparison function with applications to receiver operating characteristic curves. Biometrika 86, 487–502.
Lloyd, C.J., 1998. Using smoothed receiver operating characteristic curves to summarize and compare diagnostic systems. J. Am. Statist. Assoc. 93, 1356–1364.
Lloyd, C.J., 2002a. Semi-parametric estimation of ROC curves based on binomial regression modelling. Aust. New Zeal. J. Statist. 44, 193–204.
Lloyd, C.J., 2002b. Estimation of a convex ROC curve. Statist. Prob. Lett. 59, 99–111.
McClish, D.K., 1990. Determining a range of false-positive rates for which ROC curves differ. Med. Decis. Making 10, 283–287.
Metz, C.E., 1986. Statistical analysis of ROC data in evaluating diagnostic performance. In: Herbert, D., Myers, R. (Eds.), Multiple Regression Analysis: Applications in the Health Sciences. American Institute of Physics, New York, pp. 365–384.
Metz, C.E., Kronman, H.B., 1980. Statistical significance tests for binormal ROC curves. J. Math. Psychol. 22, 218–243.
Metz, C.E., Wang, P., Kronman, H.B., 1984. A new approach for testing the significance of differences between ROC curves measured from correlated data. In: Deconinck, F. (Ed.), Information Processing in Medical Imaging. Nijhoff, The Hague, The Netherlands, pp. 432–445.
Metz, C.E., Herman, B.A., Roe, C.A., 1998. Statistical comparison of two ROC curve estimates obtained from partially-paired datasets. Med. Decis. Making 18, 110–121.
Metz, C.E., Herman, B.A., Shen, J.H., 1998. Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously distributed data. Statist. Med. 17, 1033–1053.
Pepe, M.S., 2000a. Receiver operating characteristic methodology. J. Am. Statist. Assoc. 95, 308–311.
Pepe, M.S., 2000b. An interpretation for ROC curve and inference using GLM procedures. Biometrics 56, 352–359.
Pepe, M.S., 2003. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, New York.
Qin, J., Zhang, B., 1997. A goodness-of-fit test for logistic regression models based on case-control data. Biometrika 84, 609–618.
Qin, J., Zhang, B., 2003. Using logistic regression procedures for estimating receiver operating characteristic curves. Biometrika 90, 585–596.
Toledano, A.Y., Gatsonis, C., 1996. Ordinal regression methodology for ROC curves derived from correlated data. Statist. Med. 15, 1807–1826.
Venkatraman, E.S., Begg, C.B., 1996. A distribution free procedure for comparing receiver operating characteristic curves from a paired experiment. Biometrika 83, 835–848.
Wieand, S., Gail, M.H., James, B.R., James, K.L., 1989. A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika 76, 585–592.
Zhang, B., 2000. Quantile estimation under a two-sample semi-parametric model. Bernoulli 6, 491–511.
Zhou, X.H., Gatsonis, C.A., 1996. A simple method for comparing correlated ROC curves using incomplete data. Statist. Med. 15, 1687–1693.
Zhou, X.H., McClish, D.K., Obuchowski, N.A., 2002. Statistical Methods in Diagnostic Medicine. Wiley, New York.
Zou, K.H., Hall, W.J., 2000. Two transformation models for estimating an ROC curve derived from continuous data. J. Appl. Statist. 27, 621–631.