Estimating the covariance matrix of bivariate medians
Statistics & Probability Letters 12 (1991) 305-309, North-Holland, October 1991

J.S. Maritz
Institute for Biostatistics, Medical Research Council, Tygerberg, South Africa 7505

Received May 1990
Revised December 1990

Abstract: Methods of estimating the covariance matrix of the marginal medians in a bivariate sample are given. The study was motivated by the analysis of a data set which involved the comparison of four bivariate medians. Brief details of the application are given.

Keywords: bivariate medians, bootstrap.

1. Introduction

We consider two random variables X and Y whose joint continuous distribution function is F(x, y). The marginal distribution functions and densities are $F_X(x)$, $F_Y(y)$, $f_X(x)$, $f_Y(y)$. The median of X is $\xi$, defined by $F_X(\xi) = \tfrac{1}{2}$, and the median of Y is $\eta$, defined similarly. Suppose that we observe n independent realizations $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, of (X, Y). We estimate $\xi$ and $\eta$ by the sample medians $\hat{X}$ and $\hat{Y}$.

The question of estimating the variance of a sample median has been studied in some depth; see for example Maritz and Jarrett (1978). A new element here is estimating the covariance of the two sample medians $\hat{X}$ and $\hat{Y}$. There are at least three approaches that one might consider. They are:
(a) obtaining a formula for the covariance in terms of the joint distribution function of X and Y, and then replacing F(x, y) by its empirical version; this is equivalent to the method described by Maritz and Jarrett (1978);
(b) making use of the large sample approximate formula for the covariance of $\hat{X}$ and $\hat{Y}$;
(c) the resampling bootstrap method.

We shall give some details of these methods, and illustrate their application by considering a data set requiring the estimation of four bivariate medians.
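As a concrete illustration of approach (c), the following Python sketch estimates the covariance matrix of the two sample medians by straightforward resampling of the observation pairs. It is only a sketch: the function name, the default number of bootstrap replications and the use of NumPy are our own choices, not part of the paper.

```python
import numpy as np

def bootstrap_median_cov(x, y, n_boot=2000, seed=None):
    """Resampling bootstrap estimate of Cov(median(X), median(Y)).

    The pairs (x[i], y[i]) are resampled together, so the dependence
    between the two coordinates is preserved in each bootstrap sample.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    meds = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # draw n pairs with replacement
        meds[b, 0] = np.median(x[idx])
        meds[b, 1] = np.median(y[idx])
    return np.cov(meds, rowvar=False)         # 2 x 2 covariance of the medians

# Example: the five pairs used later in Example 1(a).
print(bootstrap_median_cov([1, 2, 3, 4, 5], [3, 1, 4, 5, 2], seed=1))
```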

2. The exact covariance of the medians in terms of F(x, y)

David (1981, p. 23) gives a formula for the joint distribution function of bivariate order statistics. Let $H(x, y) = P(X_{(r)} \leq x,\ Y_{(s)} \leq y)$ denote the joint distribution function of the order statistics $X_{(r)}$ and $Y_{(s)}$.


Then

$$H(x, y) = \sum_{i=r}^{n} \sum_{j=s}^{n} \sum_{k=t}^{u} \frac{n!}{k!\,(i-k)!\,(j-k)!\,(n-i-j+k)!}\, [F(x, y)]^{k}\, [F_X(x) - F(x, y)]^{i-k}\, [F_Y(y) - F(x, y)]^{j-k}\, [1 - F_X(x) - F_Y(y) + F(x, y)]^{n-i-j+k}, \qquad (1)$$

where $t = \max(0,\ i + j - n)$ and $u = \min(i, j)$.

By substituting $F_n$, the empirical joint distribution function, for F, as is done in the univariate case, an estimate of the joint distribution of the two sample medians can be obtained. Using this estimated distribution it is then possible to estimate the desired covariance. This is essentially a bootstrap procedure, but owing to the availability of formula (1) it can be implemented without actual resampling. However, tabulation of weights does not seem practical since the weights will depend on the realized configuration of the order statistics. We shall illustrate this in Example 1 below. Calculation of the estimated distribution function $\hat{H}(x, y)$ is simpler for odd n than for even n. It is convenient to consider the two cases separately.

Odd sample size: n = 2m + 1. The estimated joint distribution of $\hat{X}$ and $\hat{Y}$ is discrete with positive probability mass at a subset of all points $(x_i, y_j)$, $i, j = 1, 2, \ldots, n$. Let (x, y) be one of these points, and write $x^+$ for $x + \epsilon$, etc., where $\epsilon$ is small. Then, to estimate $H(x^-, y^-)$ we need $F_n(x^-, y^-)$, $F_n(x^-, +\infty)$ and $F_n(+\infty, y^-)$. These are obtained simply by counting the numbers of observations in the appropriate quadrants. Substituting in (1) gives the estimated $H(x^-, y^-)$, i.e. $\hat{H}(x^-, y^-)$. This substitution constitutes the bulk of the computation, and is easily programmed. Four such calculations are required to obtain the estimate of the probability mass at (x, y), which is given by

$$\hat{H}(x^+, y^+) - \hat{H}(x^-, y^+) - \hat{H}(x^+, y^-) + \hat{H}(x^-, y^-).$$

The estimate of the product moment $E(\hat{X}\hat{Y})$ can be written as $Q_1 + Q_2$, where $Q_1$ is a weighted sum of products of the realized observation pairs and $Q_2$ is a weighted sum of products of certain pairings of the X and Y order statistics. Both $Q_1$ and $Q_2$ generally depend on the sample configuration, making tabulation of weights impractical.

Even sample size: n = 2m. When n is even $\hat{X}$ is conventionally defined as $\hat{X} = \tfrac{1}{2}(X_{(m)} + X_{(m+1)})$, and $\hat{Y}$ is defined similarly. Therefore $4\operatorname{Cov}(\hat{X}, \hat{Y})$ is given by

$$\operatorname{Cov}(X_{(m)}, Y_{(m)}) + \operatorname{Cov}(X_{(m+1)}, Y_{(m)}) + \operatorname{Cov}(X_{(m)}, Y_{(m+1)}) + \operatorname{Cov}(X_{(m+1)}, Y_{(m+1)}).$$

Every covariance in this expression can be estimated by the same method as that used to estimate $\operatorname{Cov}(X_{(m+1)}, Y_{(m+1)})$ in the case n = 2m + 1. The variances of $\hat{X}$ and $\hat{Y}$ can be estimated as described in Maritz and Jarrett (1978).
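This computation is easy to program. The sketch below (Python; the function names H_hat and exact_median_cov_odd are ours, not the paper's) evaluates formula (1) with the empirical distribution functions for odd n, accumulates the probability mass at each pair of order statistics, and returns the estimated covariance of the two sample medians. For the data of Example 1(a) below, the masses found inside the double loop, multiplied by 3125, should reproduce the entries of Table 1.

```python
import numpy as np
from math import factorial

def H_hat(x, y, xs, ys, r, s, strict_x, strict_y):
    """Estimate H(x, y) = P(X_(r) <= x, Y_(s) <= y) from formula (1), with the
    empirical distribution functions substituted for F, F_X and F_Y.
    strict_x / strict_y select the limits x^- (strict) or x^+ (non-strict)."""
    n = len(xs)
    in_x = xs < x if strict_x else xs <= x
    in_y = ys < y if strict_y else ys <= y
    F = np.mean(in_x & in_y)                  # F_n(x, y)
    FX, FY = np.mean(in_x), np.mean(in_y)     # F_n(x, +inf), F_n(+inf, y)
    p10, p01, p00 = FX - F, FY - F, 1.0 - FX - FY + F
    total = 0.0
    for i in range(r, n + 1):
        for j in range(s, n + 1):
            for k in range(max(0, i + j - n), min(i, j) + 1):
                coef = factorial(n) / (factorial(k) * factorial(i - k)
                                       * factorial(j - k) * factorial(n - i - j + k))
                total += coef * F**k * p10**(i - k) * p01**(j - k) * p00**(n - i - j + k)
    return total

def exact_median_cov_odd(xs, ys):
    """Estimated covariance of the two sample medians for odd n, via formula (1).
    Assumes no ties among the observations."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    n = len(xs)
    assert n % 2 == 1, "this sketch handles the odd-sample-size case only"
    r = s = (n + 1) // 2                      # rank of the median
    EX = EY = EXY = 0.0
    for x in np.sort(xs):                     # grid of X order statistics
        for y in np.sort(ys):                 # grid of Y order statistics
            mass = (H_hat(x, y, xs, ys, r, s, False, False)
                    - H_hat(x, y, xs, ys, r, s, True, False)
                    - H_hat(x, y, xs, ys, r, s, False, True)
                    + H_hat(x, y, xs, ys, r, s, True, True))
            EX, EY, EXY = EX + mass * x, EY + mass * y, EXY + mass * x * y
    return EXY - EX * EY

# Example 1(a): estimated covariance of the two sample medians.
print(exact_median_cov_odd([1, 2, 3, 4, 5], [3, 1, 4, 5, 2]))
```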


Table 1
Points of positive probability mass of $\hat{H}(x, y)$ for Example 1(a); the last column gives 3125 times the mass P

x    y    3125P
1    3    181
2    1    181
2    2    210
2    3    420
3    2    210
3    3    330
3    4    601
4    2    210
4    3    210
4    4    210
4    5    181
5    2    181

Table 2
Points of positive probability mass of $\hat{H}(x, y)$ for Example 1(b); the last column gives 3125 times the mass P

x    y    3125P
1    2    181
2    2    210
2    3    601
3    1    181
3    2    420
3    3    540
4    4    811
5    5    181

Example 1. (a) Let the n = 5 observations $(x_i, y_i)$ be (1, 3), (2, 1), (3, 4), (4, 5), (5, 2). The points of positive mass of $\hat{H}(x, y)$ are listed in Table 1, where the probability mass is denoted P. In this case

$$3125 Q_1 = 181 x_{(1)}y_{(3)} + 181 x_{(2)}y_{(1)} + 601 x_{(3)}y_{(4)} + 181 x_{(4)}y_{(5)} + 181 x_{(5)}y_{(2)}$$

and

$$3125 Q_2 = 210 x_{(2)}y_{(2)} + 420 x_{(2)}y_{(3)} + 210 x_{(3)}y_{(2)} + 330 x_{(3)}y_{(3)} + 210 x_{(4)}y_{(2)} + 210 x_{(4)}y_{(3)} + 210 x_{(4)}y_{(4)}.$$

(b) Let the n = 5 observations $(x_i, y_i)$ be (1, 2), (2, 3), (3, 1), (4, 4), (5, 5). Then the positive mass points of $\hat{H}$ are given in Table 2. In this case

$$3125 Q_1 = 181 x_{(1)}y_{(2)} + 601 x_{(2)}y_{(3)} + 181 x_{(3)}y_{(1)} + 811 x_{(4)}y_{(4)} + 181 x_{(5)}y_{(5)}$$

and

$$3125 Q_2 = 210 x_{(2)}y_{(2)} + 420 x_{(3)}y_{(2)} + 540 x_{(3)}y_{(3)}.$$

It will be noted that $Q_1$ and $Q_2$ in cases (a) and (b) involve different cross-products of order statistics and different weights. Also, in both (a) and (b) the marginal estimated distributions of the sample medians place masses

$$\tfrac{1}{3125}(181, 811, 1141, 811, 181)$$

at the five order statistics. These are exactly the weights given in Maritz and Jarrett (1978).
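These marginal weights are the binomial (resampling) weights of the univariate case and are easy to recompute; the short check below is our own illustration for n = 5.

```python
from math import comb

n = 5                        # sample size in Example 1
m = (n + 1) // 2             # rank of the median, here 3

def G(i):
    """P(at least m of n resampled values lie at or below the i-th order statistic)."""
    p = i / n
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

weights = [G(i) - G(i - 1) for i in range(1, n + 1)]
print([round(w * 3125) for w in weights])    # [181, 811, 1141, 811, 181]
```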

3. Large sample approximate covariance of the medians

For large n we can use the following approximate covariance matrix of the two sample medians:

$$\frac{1}{n}\begin{pmatrix} \dfrac{1}{4 f_X^2(\xi)} & \dfrac{\Pi_{00} - \frac{1}{4}}{f_X(\xi)\, f_Y(\eta)} \\[2ex] \dfrac{\Pi_{00} - \frac{1}{4}}{f_X(\xi)\, f_Y(\eta)} & \dfrac{1}{4 f_Y^2(\eta)} \end{pmatrix}, \qquad (2)$$

where $\Pi_{00} = P(X \leq \xi,\ Y \leq \eta)$; see for example Maritz (1981, p. 224).


To estimate the elements of this covariance matrix we need estimates of $\Pi_{00}$ and the densities $f_X(\xi)$ and $f_Y(\eta)$. In a continuous bivariate distribution $P(X < \xi, Y < \eta) = P(X > \xi, Y > \eta)$ and $P(X < \xi, Y > \eta) = P(X > \xi, Y < \eta)$.
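A plug-in estimate of (2) might be computed as sketched below. This is only a sketch under our own assumptions: the paper does not say how the densities are to be estimated, so a Gaussian kernel estimate with a rule-of-thumb bandwidth (in the spirit of Silverman, 1986) is used here, and $\Pi_{00}$ is estimated by averaging the proportions of observations in the two 'concordant' quadrants, which is one way of exploiting the symmetry relations just noted.

```python
import numpy as np

def kde_at(v, t):
    """Gaussian kernel density estimate of the density of v at the point t.
    The rule-of-thumb bandwidth is an assumption; the paper does not fix a method."""
    v = np.asarray(v, float)
    n = len(v)
    spread = min(np.std(v, ddof=1),
                 (np.percentile(v, 75) - np.percentile(v, 25)) / 1.349)
    h = 0.9 * spread * n ** (-0.2)
    return np.mean(np.exp(-0.5 * ((t - v) / h) ** 2)) / (h * np.sqrt(2 * np.pi))

def large_sample_median_cov(x, y):
    """Plug-in estimate of the approximate covariance matrix (2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xi, eta = np.median(x), np.median(y)
    fx, fy = kde_at(x, xi), kde_at(y, eta)
    # Pi_00 = P(X <= xi, Y <= eta); by the symmetry noted above it can equally be
    # estimated from the opposite quadrant, so the two proportions are averaged.
    p_lower = np.mean((x <= xi) & (y <= eta))
    p_upper = np.mean((x > xi) & (y > eta))
    pi00 = 0.5 * (p_lower + p_upper)
    off = (pi00 - 0.25) / (fx * fy)
    return np.array([[1.0 / (4 * fx ** 2), off],
                     [off, 1.0 / (4 * fy ** 2)]]) / n
```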
4. An application

Four groups of mice were studied in a 2 x 2 factorial experiment involving the factors D (levels: 0 = no drug, 1 = drug) and R (levels: 0 = no irradiation, 1 = irradiation). On each mouse measurements of the size of an induced tumour were made at regular intervals, so that there was a sequence of (size = y, time = t) observations for each mouse. For each mouse a regression curve $y = b_0 + b_1 t + b_2 t^2$ was fitted by the method of least squares. Since the main concern was with the effect of treatment on tumour growth rate, the coefficients of interest were $b_1$ and $b_2$; hence the data for each mouse were reduced to the two numbers $b_1$ and $b_2$. In the notation of the previous sections, $X = b_1$ and $Y = b_2$. There were 12 mice in each of the four groups corresponding to the four factor combinations. Table 3 shows the $(b_1, b_2)$ values for the group D = 0, R = 0.

The data in Table 3, and those for the other three groups, indicate that the distributions, especially of the $b_2$ values, are rather long tailed. Therefore the use of robust measures of location, such as medians, seemed desirable for comparing the groups. Table 4 gives the observed medians and the estimated exact covariance matrices of the sample medians for the four groups.


Table 3
$(b_1, b_2)$ values for the 12 mice in the group D = 0, R = 0

b1       b2        b1       b2        b1      b2
12.24    -2.32     11.47    -9.23     4.60    29.40
6.49     89.96     17.38    -22.56    0.07    25.21
16.88    -4.24     -3.03    49.43     7.52    1.56
10.67    14.03     10.95    0.34      9.68    -1.59

Table 4
Observed medians and estimated exact covariance matrices of the sample medians for the four groups

D    R    $\hat{X}$    $\hat{Y}$    var($\hat{X}$)    var($\hat{Y}$)    cov($\hat{X}$, $\hat{Y}$)
0    0    10.18        0.95         3.328             80.434            -12.379
1    0    9.86         1.20         1.582             12.350            -2.233
0    1    10.62        -12.40       0.843             18.910            -2.651
1    1    6.42         -7.03        2.362             66.679            -12.122

To illustrate the use of these results consider testing the main effect of the factor R. A measure of main effect R is given by the vector contrast

$$b = (10.62 + 6.42 - 10.18 - 9.86,\ \ -12.40 - 7.03 - 0.95 - 1.20) = (-3.00, -21.58).$$

The estimated covariance matrix of this contrast is obtained by summing the elements of the four matrices in Table 4, giving

$$\hat{V}(b) = \begin{pmatrix} 8.115 & -29.384 \\ -29.384 & 178.374 \end{pmatrix}.$$

These results can be used to calculate a statistic $Q = b^{\mathrm{T}} [\hat{V}(b)]^{-1} b$ for testing the null hypothesis of 'no main effect R'; its distribution is approximately $\chi^2_2$ under the null hypothesis. The observed value of Q is 15.74.
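The arithmetic of this test is easily checked. The sketch below (our own illustration in Python) rebuilds the contrast and its estimated covariance matrix from the entries of Table 4 and evaluates Q; small discrepancies from the quoted value are due to rounding of the tabulated entries.

```python
import numpy as np

# Rows of Table 4: (D, R, median b1, median b2, var1, var2, cov)
table4 = [
    (0, 0, 10.18,   0.95, 3.328, 80.434, -12.379),
    (1, 0,  9.86,   1.20, 1.582, 12.350,  -2.233),
    (0, 1, 10.62, -12.40, 0.843, 18.910,  -2.651),
    (1, 1,  6.42,  -7.03, 2.362, 66.679, -12.122),
]

# Main-effect-of-R contrast: (R = 1 groups) minus (R = 0 groups).
b = np.zeros(2)
V = np.zeros((2, 2))
for D, R, m1, m2, v1, v2, c in table4:
    sign = 1.0 if R == 1 else -1.0
    b += sign * np.array([m1, m2])
    V += np.array([[v1, c], [c, v2]])   # the +/- signs disappear in the variance

Q = float(b @ np.linalg.solve(V, b))
print(b)             # approximately [-3.00, -21.58]
print(V)             # close to the summed matrix quoted above
print(round(Q, 2))   # about 15.73; the text quotes 15.74 (rounding)
```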

References

David, H.A. (1981), Order Statistics, 2nd ed. (Wiley, New York).
Maritz, J.S. (1981), Distribution-Free Statistical Methods (Chapman and Hall, London).
Maritz, J.S. and R.G. Jarrett (1978), A note on estimating the variance of the sample median, J. Amer. Statist. Assoc. 73, 194-196.
Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis (Chapman and Hall, London).
