- Email: [email protected]

Contents lists available at ScienceDirect

Journal of Statistical Planning and Inference journal homepage: w w w . e l s e v i e r . c o m / l o c a t e / j s p i

Rank covariance matrix estimation of a partially known covariance matrix

Kristi Kuljus (a,*), Dietrich von Rosen (b)

(a) Department of Mathematics, Uppsala University, P.O. Box 480, 751 06 Uppsala, Sweden
(b) Department of Biometry and Engineering, Swedish University of Agricultural Sciences, P.O. Box 7032, 750 07 Uppsala, Sweden

ARTICLE INFO

Article history: Received 10 February 2006; Accepted 21 November 2007; Available online 24 March 2008

Keywords: Multivariate ranks; Rank covariance matrix; Marginal rank covariance matrix; Elliptical distributions; Affine equivariance

ABSTRACT

Classical multivariate methods are often based on the sample covariance matrix, which is very sensitive to outlying observations. One alternative to the covariance matrix is the affine equivariant rank covariance matrix (RCM) that has been studied in Visuri et al. [2003. Affine equivariant multivariate rank methods. J. Statist. Plann. Inference 114, 161--185]. In this article we assume that the covariance matrix is partially known and study how to estimate the corresponding RCM. We use the properties that the RCM is affine equivariant and that the RCM is proportional to the inverse of the regular covariance matrix, and hence reduce the problem of estimating the original RCM to estimating marginal rank covariance matrices. This is a great computational advantage when the dimension of the original data vector is large. © 2008 Elsevier B.V. All rights reserved.

1. Introduction

In classical multivariate analysis it is assumed that the underlying distribution is multivariate normal. In this case the sample mean and the sample covariance matrix are sufficient statistics and UMVU estimators, and standard multivariate methods are based on the sample covariance matrix. However, the sample covariance matrix is very sensitive to outlying observations, which has led to a search for more robust methods. Moreover, the assumption of multivariate normality is restrictive, because it does not allow the data to come from a distribution with lighter or heavier tails. Simple simulation studies show how sensitive the sample covariance matrix is to observations from the tails when the data come from a multivariate distribution with heavier tails, e.g., a $t$-distribution. This again confirms that alternatives to the standard sample covariance matrix are needed.

In this article we assume that the underlying distribution belongs to the class of elliptical distributions, so that we have a semiparametric model. The class of elliptical distributions is an extension of the normal distribution and includes distributions with both lighter and heavier tails than the normal distribution. For the definition and properties of elliptical distributions, see Fang and Zhang (1990) and Fang et al. (1990).

In the univariate case robust statistics are often based on ordering the data and are defined via rank statistics. In the multivariate case the concept of ordering becomes much more complicated. Since there is no obvious method for complete ordering, different restricted ordering or subordering principles are used. For the issue of multivariate ordering principles, see the classical paper by Barnett (1976). Since there is no unique way of ordering multivariate data, the definition of rank can also be extended to the multivariate case in several ways.
Puri and Sen (1985) give the simplest definition: they define multivariate ranks through the usual univariate ranks for each marginal sample, i.e., via coordinatewise ranks. In this article we consider the affine equivariant ranks based on the Oja (1983) median. For the definition of these multivariate sample ranks R1 , . . . , Rn and the corresponding


Corresponding author. Tel.: +46 18 471 33 89; fax: +46 18 471 32 01. E-mail address: [email protected] (K. Kuljus).

0378-3758/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2007.11.015


K. Kuljus, D. von Rosen / Journal of Statistical Planning and Inference 138 (2008) 3667 -- 3673

population rank function $R(x)$, $x \sim F$, where $F$ denotes the distribution function of $x$, see Visuri et al. (2003). The sample and population rank covariance matrices are then defined as

$$\hat{D} = \operatorname{ave}_i\{R_i R_i'\} \quad \text{and} \quad D = \mathrm{E}[R(x)R(x)'],$$

respectively. The rank covariance matrix (RCM) based on these ranks has two nice properties: it is proportional to the inverse of the regular covariance matrix and it is equivariant with respect to affine transformations (Visuri et al., 2003). It follows that the RCM can be used in many applications instead of the regular covariance matrix. Moreover, the RCM is more robust than the usual covariance matrix (Visuri et al., 2003; Ollila et al., 2004). Visuri et al. (2003) study the use of the RCM in classical multivariate analysis methods, e.g., multivariate analysis of variance, principal component analysis, multivariate regression analysis and canonical correlation analysis. However, Visuri et al. (2003) do not make any assumptions about the structure of the underlying covariance matrix. In this paper we assume that there are some zero blocks in the covariance matrix.

Example. Consider three groups of variables. The first group contains variables that describe the productivity of grain crops. The second group consists of climatic factors, and the third group contains variables related to fertilization (application of mineral and organic fertilizers). Climatic factors and fertilization variables do not affect each other, but both groups are correlated with the productivity of crops.

Therefore, we study the problem where a random vector is partitioned into three subvectors and the covariance matrix of two of the three subvectors is a null matrix, i.e., these two subvectors are uncorrelated. We are interested in estimating the corresponding RCM in this case. Zero blocks in the covariance matrix mean a restriction of the parameter space. Knowing that some of the parameters are equal to zero gives us additional information about the parameter space, and we should use this information when estimating the rest of the parameters. In Section 2 we study how zero blocks in a covariance matrix affect the corresponding RCM.
To estimate this ``structured'' RCM, we will use the affine equivariance property of the RCM and find a linear transformation such that the RCM of the transformed data becomes block-diagonal. Section 3 deals with how to estimate a block-diagonal RCM using the corresponding marginal vectors and marginal rank covariance matrices. In Section 4 we discuss the problem of estimating the proportionality constant that connects an RCM and the inverse of the corresponding covariance matrix.

2. RCM corresponding to a partially known covariance matrix

Suppose we have a random vector $y = (y_1', y_2', y_3')'$, consisting of three subvectors $y_1$, $y_2$ and $y_3$, from a family of elliptical distributions. Let the mean and covariance matrix of this random vector be

$$\mu = (\mu_1', \mu_2', \mu_3')' \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\ \Sigma_{21} & \Sigma_{22} & \Sigma_{23} \\ \Sigma_{31} & \Sigma_{32} & \Sigma_{33} \end{pmatrix}, \qquad (1)$$

respectively, where $\Sigma$ is positive definite and symmetric. Suppose that $\Sigma_{23} = 0$ (and $\Sigma_{32} = 0$), i.e., the subvectors $y_2$ and $y_3$ are uncorrelated. We denote the corresponding rank covariance matrix by

$$D = \begin{pmatrix} D_{11} & D_{12} & D_{13} \\ D_{21} & D_{22} & D_{23} \\ D_{31} & D_{32} & D_{33} \end{pmatrix}.$$

The class of elliptical distributions belongs to the location-scale families of distributions, which are obtained from reflection- and permutation-invariant random vectors via affine transformations (see Visuri et al., 2003). For a distribution from a location-scale family the RCM is proportional to the inverse of the covariance matrix (Visuri et al., 2003):

$$D = \lambda |\Sigma| \Sigma^{-1}. \qquad (2)$$

The constant $\lambda$ depends not only on the type of distribution but also on its dimension. Expressions for calculating this constant for the multivariate normal and $t$-distributions are given in Ollila et al. (2004). For example, for the normal distribution the constant equals 0.356, 1.920 and 33.958 for dimensions $k = 2, 4$ and $6$, respectively. Using Eq. (2) and the assumption that $\Sigma_{23} = 0$, we obtain the following result with the help of the formula for the inverse of a partitioned matrix.

Theorem 2.1. Suppose $y = (y_1', y_2', y_3')'$ is elliptically distributed with mean $\mu$ and covariance matrix $\Sigma$. Suppose $\Sigma_{23} = 0$. Then the following condition holds for the RCM of $y$:

$$D_{23} = D_{21} D_{11}^{-1} D_{13}.$$
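Theorem 2.1 can be checked numerically. The sketch below is not from the paper: it builds a hypothetical positive definite covariance matrix with $\Sigma_{23} = 0$, forms $D$ through relation (2) using the constant 1.920 quoted above for the 4-dimensional normal distribution, and verifies the stated identity.

```python
import numpy as np

# Hypothetical positive definite 4x4 covariance matrix, partitioned into
# subvectors of sizes (2, 1, 1), with Sigma_23 = Sigma_32 = 0.
Sigma = np.array([
    [4.0, 1.0, 0.5, 0.3],
    [1.0, 3.0, 0.4, 0.2],
    [0.5, 0.4, 2.0, 0.0],   # Sigma_23 = 0
    [0.3, 0.2, 0.0, 1.0],   # Sigma_32 = 0
])
lam = 1.920  # lambda for the 4-dimensional normal distribution (Ollila et al., 2004)

# Relation (2): the RCM is proportional to the inverse covariance matrix.
D = lam * np.linalg.det(Sigma) * np.linalg.inv(Sigma)

# Partition D according to the block sizes (2, 1, 1).
D11, D13 = D[:2, :2], D[:2, 3:]
D21, D23 = D[2:3, :2], D[2:3, 3:]

# Theorem 2.1: D_23 = D_21 D_11^{-1} D_13.
assert np.allclose(D23, D21 @ np.linalg.inv(D11) @ D13)
```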


So our problem becomes how to estimate $D$ when we know that $D_{23} = D_{21} D_{11}^{-1} D_{13}$ (and $D_{32} = D_{31} D_{11}^{-1} D_{12}$). This relationship is additional information about the RCM and should be taken into account when estimating $D$. To solve this problem, we will use the affine equivariance property of the RCM. The affine equivariance property means that if $D^*$ is the RCM of the transformed random vector

$$y^* = Ay + b,$$

where $A$ is nonsingular, then $D^* = |A|^2 (A^{-1})' D A^{-1}$. We begin by transforming the random vector $y$. Let the transformation matrix be

$$A = \begin{pmatrix} I & D_{11}^{-1} D_{12} & D_{11}^{-1} D_{13} \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix}. \qquad (3)$$

Then $|A| = 1$ and

$$A^{-1} = \begin{pmatrix} I & -D_{11}^{-1} D_{12} & -D_{11}^{-1} D_{13} \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix}.$$

We choose a transformation matrix of this form because the RCM of the transformed data then becomes block-diagonal. When an RCM is block-diagonal, the subranks of the corresponding components are uncorrelated. The RCM of the transformed vector is

$$D^* = (A^{-1})' D A^{-1} = \begin{pmatrix} D_{11} & 0 & 0 \\ 0 & D_{22} - D_{21} D_{11}^{-1} D_{12} & 0 \\ 0 & 0 & D_{33} - D_{31} D_{11}^{-1} D_{13} \end{pmatrix},$$

since $|A|^2 = 1$. Here $D^*_{23} = 0$, because $D_{23} = D_{21} D_{11}^{-1} D_{13}$. Thus, we have proved the following result.

Theorem 2.2. Let $y$ be elliptically distributed with the mean and covariance matrix given in (1). Let $A$ be defined as in (3). Then the RCM of the transformed vector

$$Ay = [(y_1 + D_{11}^{-1} D_{12} y_2 + D_{11}^{-1} D_{13} y_3)', y_2', y_3']'$$

equals

$$D^* = \begin{pmatrix} D_{11} & 0 & 0 \\ 0 & D_{22} - D_{21} D_{11}^{-1} D_{12} & 0 \\ 0 & 0 & D_{33} - D_{31} D_{11}^{-1} D_{13} \end{pmatrix}. \qquad (4)$$

It is interesting to note that the covariance matrix of the transformed vector is equal to

$$\Sigma^* = A \Sigma A' = \begin{pmatrix} \Sigma_{11} + D_{11}^{-1} D_{12} \Sigma_{21} + D_{11}^{-1} D_{13} \Sigma_{31} & 0 & 0 \\ 0 & \Sigma_{22} & 0 \\ 0 & 0 & \Sigma_{33} \end{pmatrix}. \qquad (5)$$
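As an illustrative check (not part of the original text), one can verify with a hypothetical covariance matrix that the transformation $A$ from (3) makes both the RCM and the covariance matrix of the transformed vector block-diagonal:

```python
import numpy as np

# Hypothetical covariance matrix with subvector sizes (2, 1, 1) and Sigma_23 = 0.
Sigma = np.array([
    [4.0, 1.0, 0.5, 0.3],
    [1.0, 3.0, 0.4, 0.2],
    [0.5, 0.4, 2.0, 0.0],
    [0.3, 0.2, 0.0, 1.0],
])
lam = 1.920  # illustrative lambda (4-dimensional normal distribution)
D = lam * np.linalg.det(Sigma) * np.linalg.inv(Sigma)

D11, D12, D13 = D[:2, :2], D[:2, 2:3], D[:2, 3:]
D11_inv = np.linalg.inv(D11)

# Transformation matrix A from (3).
A = np.eye(4)
A[:2, 2:3] = D11_inv @ D12
A[:2, 3:] = D11_inv @ D13

# Theorem 2.2: D* = |A|^2 (A^{-1})' D A^{-1} is block-diagonal.
A_inv = np.linalg.inv(A)
D_star = np.linalg.det(A) ** 2 * A_inv.T @ D @ A_inv
assert np.allclose(D_star[:2, 2:], 0)       # (1,2) and (1,3) blocks vanish
assert np.allclose(D_star[2:3, 3:], 0)      # (2,3) block vanishes

# Relation (5): Sigma* = A Sigma A' is block-diagonal as well.
Sigma_star = A @ Sigma @ A.T
assert np.allclose(Sigma_star[:2, 2:], 0)
assert np.allclose(Sigma_star[2:3, 3:], 0)
```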

Because of relationship (2), we can thus write

$$D^* = \lambda |\Sigma^*| (\Sigma^*)^{-1} \qquad (6)$$

for some constant $\lambda$, which depends on the distribution and the dimension of $y$.

3. Estimation of a block-diagonal RCM

In this section we study the problem of estimating a block-diagonal RCM. If a random vector $(y_1', y_2', y_3')'$ has a multivariate normal distribution and its covariance matrix is block-diagonal, then we know that the components $y_1$, $y_2$, $y_3$ are independent. In this case the marginal sample vectors are used to estimate $\Sigma_{11}$, $\Sigma_{22}$ and $\Sigma_{33}$, respectively. For other elliptical distributions a block-diagonal covariance matrix does not imply independence of the components. Suppose anyway that we want to use the marginal sample vectors to estimate the blocks $D_{11}$, $D_{22} - D_{21} D_{11}^{-1} D_{12}$ and $D_{33} - D_{31} D_{11}^{-1} D_{13}$ in the transformed RCM. The circumstance that we can use the marginal sample vectors and the corresponding marginal rank covariance matrices in the


estimation process is very important. To compute $p$-dimensional sample ranks based on the Oja median, one needs to calculate a large number of determinants of $(p+1) \times (p+1)$ matrices. This means that computation of sample RCMs in high dimensions is computer-intensive, and it is always an advantage to reduce the dimension of the data vectors when working with rank covariance matrices. Via Theorem 2.2 we have reduced our problem to estimating the block-diagonal RCM $D^*$ in (4). When we have estimated the diagonal blocks $D_{11}$, $D_{22} - D_{21} D_{11}^{-1} D_{12}$ and $D_{33} - D_{31} D_{11}^{-1} D_{13}$ of $D^*$, and $D_{11}^{-1} D_{12}$ and $D_{11}^{-1} D_{13}$, we can estimate all the blocks in the original RCM $D$. From (6) it is easy to see how the diagonal blocks of $D^*$ and $\Sigma^*$ are related:

$$D_{11} = \lambda |\Sigma_{22}|\, |\Sigma_{33}|\, |\Sigma^*_{11}| (\Sigma^*_{11})^{-1},$$
$$D_{22} - D_{21} D_{11}^{-1} D_{12} = \lambda |\Sigma^*_{11}|\, |\Sigma_{33}|\, |\Sigma_{22}|\, \Sigma_{22}^{-1},$$
$$D_{33} - D_{31} D_{11}^{-1} D_{13} = \lambda |\Sigma^*_{11}|\, |\Sigma_{22}|\, |\Sigma_{33}|\, \Sigma_{33}^{-1}, \qquad (7)$$

where $\Sigma^*_{11} = \Sigma_{11} + D_{11}^{-1} D_{12} \Sigma_{21} + D_{11}^{-1} D_{13} \Sigma_{31}$.
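A numerical sanity check of the three relations in (7), using a hypothetical covariance matrix with $\Sigma_{23} = 0$ and an illustrative value of $\lambda$, might look as follows:

```python
import numpy as np

# Hypothetical covariance matrix with subvector sizes (2, 1, 1) and Sigma_23 = 0.
Sigma = np.array([
    [4.0, 1.0, 0.5, 0.3],
    [1.0, 3.0, 0.4, 0.2],
    [0.5, 0.4, 2.0, 0.0],
    [0.3, 0.2, 0.0, 1.0],
])
lam = 1.920  # illustrative lambda (4-dimensional normal distribution)
D = lam * np.linalg.det(Sigma) * np.linalg.inv(Sigma)

D11, D12, D13 = D[:2, :2], D[:2, 2:3], D[:2, 3:]
D21, D22 = D[2:3, :2], D[2:3, 2:3]
D31, D33 = D[3:, :2], D[3:, 3:]
C2, C3 = np.linalg.inv(D11) @ D12, np.linalg.inv(D11) @ D13

S11, S21, S31 = Sigma[:2, :2], Sigma[2:3, :2], Sigma[3:, :2]
S22, S33 = Sigma[2:3, 2:3], Sigma[3:, 3:]
S11s = S11 + C2 @ S21 + C3 @ S31          # Sigma*_11 from (5)
det = np.linalg.det

# The three relations in (7):
assert np.allclose(D11, lam * det(S22) * det(S33) * det(S11s) * np.linalg.inv(S11s))
assert np.allclose(D22 - D21 @ np.linalg.inv(D11) @ D12,
                   lam * det(S11s) * det(S33) * det(S22) * np.linalg.inv(S22))
assert np.allclose(D33 - D31 @ np.linalg.inv(D11) @ D13,
                   lam * det(S11s) * det(S22) * det(S33) * np.linalg.inv(S33))
```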

Theorem 3.1. Let the marginal vectors $y_1 + D_{11}^{-1}D_{12}y_2 + D_{11}^{-1}D_{13}y_3$, $y_2$ and $y_3$ be vectors of dimensions $p_1$, $p_2$ and $p_3$, respectively. Denote the population rank covariance matrices of these random vectors by $M_1$, $M_2$ and $M_3$, correspondingly. Then the following equations hold:

$$M_1 = \lambda_1 |\Sigma_{11} + D_{11}^{-1}D_{12}\Sigma_{21} + D_{11}^{-1}D_{13}\Sigma_{31}|\, (\Sigma_{11} + D_{11}^{-1}D_{12}\Sigma_{21} + D_{11}^{-1}D_{13}\Sigma_{31})^{-1},$$
$$M_2 = \lambda_2 |\Sigma_{22}|\, \Sigma_{22}^{-1}, \qquad M_3 = \lambda_3 |\Sigma_{33}|\, \Sigma_{33}^{-1}. \qquad (8)$$

Proof. Since elliptical distributions are invariant under the group of affine transformations (Fang et al., 1990), it follows that $Ay$ is elliptical with the covariance matrix given in (5). It is also known that a marginal vector of an elliptically distributed vector is elliptically distributed, although the marginal vector does not necessarily belong to the same family of elliptical distributions as the original vector (Kano, 1994). It follows that the marginal vectors $y_1 + D_{11}^{-1}D_{12}y_2 + D_{11}^{-1}D_{13}y_3$, $y_2$ and $y_3$ are elliptically distributed with the covariance matrices $\Sigma_{11} + D_{11}^{-1}D_{12}\Sigma_{21} + D_{11}^{-1}D_{13}\Sigma_{31}$, $\Sigma_{22}$ and $\Sigma_{33}$, respectively. The equations in (8) now follow from relation (2).

Notice that if the dimensions $p_1$, $p_2$ and $p_3$ are not the same, then the constants $\lambda_1$, $\lambda_2$ and $\lambda_3$ differ. We will discuss estimation of these constants in the following section. Meanwhile, let us assume that $\lambda$, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are known. By expressing the blocks $D_{11}$, $D_{22} - D_{21}D_{11}^{-1}D_{12}$ and $D_{33} - D_{31}D_{11}^{-1}D_{13}$ through the marginal rank covariance matrices $M_1$, $M_2$ and $M_3$, we obtain the following result.

Theorem 3.2. The block matrices $D_{11}$, $D_{22} - D_{21}D_{11}^{-1}D_{12}$ and $D_{33} - D_{31}D_{11}^{-1}D_{13}$ of the block-diagonal matrix $D^*$ in (4) can be expressed in terms of the marginal rank covariance matrices $M_1$, $M_2$ and $M_3$ as follows:

$$D_{11} = \frac{\lambda}{\lambda_1} \left(\frac{|M_2|}{\lambda_2^{p_2}}\right)^{1/(p_2-1)} \left(\frac{|M_3|}{\lambda_3^{p_3}}\right)^{1/(p_3-1)} M_1,$$
$$D_{22} - D_{21}D_{11}^{-1}D_{12} = \frac{\lambda}{\lambda_2} \left(\frac{|M_1|}{\lambda_1^{p_1}}\right)^{1/(p_1-1)} \left(\frac{|M_3|}{\lambda_3^{p_3}}\right)^{1/(p_3-1)} M_2,$$
$$D_{33} - D_{31}D_{11}^{-1}D_{13} = \frac{\lambda}{\lambda_3} \left(\frac{|M_1|}{\lambda_1^{p_1}}\right)^{1/(p_1-1)} \left(\frac{|M_2|}{\lambda_2^{p_2}}\right)^{1/(p_2-1)} M_3. \qquad (9)$$

Proof. From the equations in (8) it follows that

$$|\Sigma_{11} + D_{11}^{-1}D_{12}\Sigma_{21} + D_{11}^{-1}D_{13}\Sigma_{31}| = \left(\frac{|M_1|}{\lambda_1^{p_1}}\right)^{1/(p_1-1)}, \quad |\Sigma_{22}| = \left(\frac{|M_2|}{\lambda_2^{p_2}}\right)^{1/(p_2-1)}, \quad |\Sigma_{33}| = \left(\frac{|M_3|}{\lambda_3^{p_3}}\right)^{1/(p_3-1)}, \qquad (10)$$


and that

$$(\Sigma_{11} + D_{11}^{-1}D_{12}\Sigma_{21} + D_{11}^{-1}D_{13}\Sigma_{31})^{-1} = \lambda_1^{-1} |\Sigma_{11} + D_{11}^{-1}D_{12}\Sigma_{21} + D_{11}^{-1}D_{13}\Sigma_{31}|^{-1} M_1,$$
$$\Sigma_{22}^{-1} = \lambda_2^{-1} |\Sigma_{22}|^{-1} M_2, \qquad \Sigma_{33}^{-1} = \lambda_3^{-1} |\Sigma_{33}|^{-1} M_3. \qquad (11)$$

The equations in (9) now follow by substituting the expressions in (10) and (11) into (7).

It is important to stress that Theorem 3.2 gives us a tool for presenting a block-diagonal RCM via the respective marginal RCMs, which are of smaller size. From (9) it follows that to estimate $D_{11}$, $D_{22} - D_{21}D_{11}^{-1}D_{12}$ and $D_{33} - D_{31}D_{11}^{-1}D_{13}$, we need to estimate the marginal rank covariance matrices $M_1$, $M_2$ and $M_3$. There are C programs (see Visuri et al., 2003) for calculating the sample rank vectors and the sample RCM, so estimating $M_1$, $M_2$ and $M_3$ is not a problem once we have observation vectors from $y_1 + D_{11}^{-1}D_{12}y_2 + D_{11}^{-1}D_{13}y_3$, $y_2$ and $y_3$. But $M_1$ is the RCM of $y_1 + D_{11}^{-1}D_{12}y_2 + D_{11}^{-1}D_{13}y_3$, which is a function of $D_{11}^{-1}D_{12}$ and $D_{11}^{-1}D_{13}$. Hence, to estimate $M_1$, we first need to estimate the matrices $D_{11}^{-1}D_{12}$ and $D_{11}^{-1}D_{13}$. Since $D^*$ in (4) is block-diagonal, the subranks of the components are uncorrelated. This means that we can obtain estimates of $D_{11}^{-1}D_{12}$ and $D_{11}^{-1}D_{13}$ by solving a regression problem. Set $B_2 = -D_{11}^{-1}D_{12}$ and $B_3 = -D_{11}^{-1}D_{13}$. When we have $n$ independent copies of the random vector $y$, then $y_{1i} - B_2 y_{2i} - B_3 y_{3i}$, $i = 1, \ldots, n$, are nothing but error vectors. So we have to solve a multivariate regression problem to estimate $M_1$. When we have estimates of $D_{11}^{-1}D_{12}$, $D_{11}^{-1}D_{13}$, $D_{11}$, $D_{22} - D_{21}D_{11}^{-1}D_{12}$ and $D_{33} - D_{31}D_{11}^{-1}D_{13}$, we can obtain the rest of the estimates.

The process of estimating a structured RCM can be summarized in the following steps:

(1) estimate $D_{11}^{-1}D_{12}$ and $D_{11}^{-1}D_{13}$ in $y_1 + D_{11}^{-1}D_{12}y_2 + D_{11}^{-1}D_{13}y_3$ by solving a multivariate regression problem;
(2) then estimate the marginal rank covariance matrices $M_1$, $M_2$ and $M_3$;
(3) thereafter estimate $D_{11}$, $D^*_{22} = D_{22} - D_{21}D_{11}^{-1}D_{12}$ and $D^*_{33} = D_{33} - D_{31}D_{11}^{-1}D_{13}$ using the formulas in (9);
(4) the rest of the estimates can be obtained as follows:

$$\hat{D}_{12} = \hat{D}_{11}\, \widehat{D_{11}^{-1}D_{12}}, \qquad \hat{D}_{13} = \hat{D}_{11}\, \widehat{D_{11}^{-1}D_{13}},$$
$$\hat{D}_{22} = \hat{D}^*_{22} + \hat{D}_{21}\hat{D}_{11}^{-1}\hat{D}_{12}, \qquad \hat{D}_{33} = \hat{D}^*_{33} + \hat{D}_{31}\hat{D}_{11}^{-1}\hat{D}_{13}, \qquad \hat{D}_{23} = \hat{D}_{21}\hat{D}_{11}^{-1}\hat{D}_{13}.$$

Finally, we have estimated all the blocks in the original RCM $D$.

4. Estimating the constants $\lambda_1$, $\lambda_2$ and $\lambda_3$

We can see from the formulas in (9) that $D_{11}$, $D_{22} - D_{21}D_{11}^{-1}D_{12}$ and $D_{33} - D_{31}D_{11}^{-1}D_{13}$ have one common proportionality constant $\lambda$, so we can write

$$D_{11} = \lambda K_1, \qquad D_{22} - D_{21}D_{11}^{-1}D_{12} = \lambda K_2, \qquad D_{33} - D_{31}D_{11}^{-1}D_{13} = \lambda K_3,$$

where the matrices $K_1$, $K_2$ and $K_3$ are functions of $M_1$, $M_2$, $M_3$ and $\lambda_1$, $\lambda_2$, $\lambda_3$, see (9). We can express the blocks of the original RCM through $\lambda$, $K_1$, $K_2$, $K_3$, $D_{11}^{-1}D_{12}$ and $D_{11}^{-1}D_{13}$:

$$D_{12} = \lambda K_1 (D_{11}^{-1}D_{12}), \qquad D_{13} = \lambda K_1 (D_{11}^{-1}D_{13}),$$
$$D_{22} = \lambda[K_2 + (D_{11}^{-1}D_{12})' K_1 (D_{11}^{-1}D_{12})], \qquad D_{33} = \lambda[K_3 + (D_{11}^{-1}D_{13})' K_1 (D_{11}^{-1}D_{13})],$$
$$D_{23} = \lambda (D_{11}^{-1}D_{12})' K_1 (D_{11}^{-1}D_{13}). \qquad (12)$$
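The five relations in (12) can be verified numerically; the sketch below is not from the paper and uses a hypothetical covariance matrix with $\Sigma_{23} = 0$ and an illustrative value of $\lambda$:

```python
import numpy as np

# Hypothetical covariance with subvector sizes (2, 1, 1) and Sigma_23 = 0.
Sigma = np.array([
    [5.0, 1.2, 0.6, 0.4],
    [1.2, 4.0, 0.5, 0.3],
    [0.6, 0.5, 3.0, 0.0],
    [0.4, 0.3, 0.0, 2.0],
])
lam = 1.920  # illustrative value of lambda
D = lam * np.linalg.det(Sigma) * np.linalg.inv(Sigma)

D11, D12, D13 = D[:2, :2], D[:2, 2:3], D[:2, 3:]
D21, D22, D23 = D[2:3, :2], D[2:3, 2:3], D[2:3, 3:]
D31, D33 = D[3:, :2], D[3:, 3:]
D11_inv = np.linalg.inv(D11)
C2, C3 = D11_inv @ D12, D11_inv @ D13

# K1, K2, K3 as defined above.
K1 = D11 / lam
K2 = (D22 - D21 @ D11_inv @ D12) / lam
K3 = (D33 - D31 @ D11_inv @ D13) / lam

# The five relations in (12):
assert np.allclose(D12, lam * K1 @ C2)
assert np.allclose(D13, lam * K1 @ C3)
assert np.allclose(D22, lam * (K2 + C2.T @ K1 @ C2))
assert np.allclose(D33, lam * (K3 + C3.T @ K1 @ C3))
assert np.allclose(D23, lam * C2.T @ K1 @ C3)
```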


It follows that in applications that require an estimate of the RCM only up to a proportionality constant, it is enough to estimate $K_1$, $K_2$, $K_3$, $D_{11}^{-1}D_{12}$ and $D_{11}^{-1}D_{13}$, and one need not be concerned about estimating the constant $\lambda$. Principal component analysis and canonical correlation analysis are two examples of applications where it is enough to estimate the RCM up to a proportionality constant. However, since $K_1$, $K_2$ and $K_3$ are functions of $M_1$, $M_2$, $M_3$ and $\lambda_1$, $\lambda_2$, $\lambda_3$, we still need to estimate the constants $\lambda_1$, $\lambda_2$ and $\lambda_3$, which connect the marginal rank covariance matrices $M_1$, $M_2$ and $M_3$ with the corresponding covariance matrices.

Estimating $\lambda_1$, $\lambda_2$ and $\lambda_3$ is a difficult problem. One advantage we have, compared to the situation of estimating $\lambda$, is that for the marginal covariance matrices there are no restrictions on the parameter space like the zero blocks we had for the covariance matrix of $y$. We can estimate $\lambda_1$, $\lambda_2$ and $\lambda_3$ using the marginal sample vectors, which is also an advantage, because estimating the affine equivariant RCM is computer-intensive.

The constant $\lambda$ in $D = \lambda|\Sigma|\Sigma^{-1}$ can be expressed through the radius $r$ of the corresponding spherical distribution. If a random vector $x$ follows a spherically symmetric distribution $F_0$, it can be represented as $x = ru$, where $r = \|x\|$ and $u = x/\|x\|$, and $r$ and $u$ are independent. The population rank function is then given by

$$R_{F_0}(x) = q_{F_0}(r)\, u \qquad (13)$$

for some function $q_{F_0}(r)$, see Möttönen et al. (1998). The expressions for $q_{F_0}(r)$ for the spherical multivariate normal and $t$-distributions can be found in Möttönen et al. (1998) and Ollila et al. (2004).

Lemma 4.1. Let $x : p \times 1$ follow a spherical distribution $F_0$. Then the constant $\lambda$ in (2) is given by

$$\lambda = \frac{p^{p-2}\, \mathrm{E}\{q_{F_0}^2(r)\}}{(\mathrm{E}r^2)^{p-1}}.$$

Proof. Since $x = ru$, $\operatorname{Cov} x = (\mathrm{E}r^2/p)\, I_p$. On the one hand, from (2) it follows that

$$D_{F_0} = \lambda \frac{(\mathrm{E}r^2)^{p-1}}{p^{p-1}}\, I_p.$$

On the other hand, (13) gives that

$$D_{F_0} = \frac{\mathrm{E}\{q_{F_0}^2(r)\}}{p}\, I_p,$$

thus the result follows.

We can see that $\lambda$ is a function of the radius of the spherical distribution corresponding to our elliptical distribution. The constants $\lambda_1$, $\lambda_2$ and $\lambda_3$ are accordingly functions of the radii of the respective marginal spherical distributions. One possible way to define an estimator of $\lambda$ is to use relationship (2), which gives

$$\hat{\lambda} = \frac{|\hat{D}|^{1/p}}{|\hat{\Sigma}|^{(p-1)/p}},$$

where $\hat{D}$ and $\hat{\Sigma}$ are $p \times p$ matrices. Hence, the estimates of $\lambda_1$, $\lambda_2$ and $\lambda_3$ can be obtained via the marginal covariance matrices and the corresponding RCMs as follows:

$$\hat{\lambda}_1 = \frac{|\hat{M}_1|^{1/p_1}}{|\hat{\Sigma}^*_{11}|^{(p_1-1)/p_1}}, \qquad \hat{\lambda}_2 = \frac{|\hat{M}_2|^{1/p_2}}{|\hat{\Sigma}_{22}|^{(p_2-1)/p_2}}, \qquad \hat{\lambda}_3 = \frac{|\hat{M}_3|^{1/p_3}}{|\hat{\Sigma}_{33}|^{(p_3-1)/p_3}},$$

where $\Sigma^*_{11} = \Sigma_{11} + D_{11}^{-1}D_{12}\Sigma_{21} + D_{11}^{-1}D_{13}\Sigma_{31}$, see (5). The drawback of these estimators is that they involve estimates of covariance matrices. In that case we would have to combine the estimates of the marginal RCMs and of the covariance matrices when estimating the original RCM up to the proportionality constant $\lambda$.
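Since $|D| = \lambda^p|\Sigma|^{p-1}$ under relation (2), the estimator $\hat{\lambda}$ recovers $\lambda$ exactly when the exact $D$ and $\Sigma$ are plugged in. A small numerical illustration (with a hypothetical $\Sigma$ and an arbitrary $\lambda$, not from the paper) is:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical positive definite covariance matrix of dimension p = 3.
p = 3
B = rng.standard_normal((p, p))
Sigma = B @ B.T + p * np.eye(p)
lam = 0.5  # arbitrary illustrative value of the proportionality constant

# If D = lambda |Sigma| Sigma^{-1} exactly, the estimator recovers lambda.
D = lam * np.linalg.det(Sigma) * np.linalg.inv(Sigma)
lam_hat = np.linalg.det(D) ** (1 / p) / np.linalg.det(Sigma) ** ((p - 1) / p)
assert np.isclose(lam_hat, lam)
```

In practice $\hat{D}$ and $\hat{\Sigma}$ are sample estimates, so $\hat{\lambda}$ only approximates $\lambda$; the check above confirms the algebraic identity behind the estimator.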

5. Summary

In this article we have studied the problem of estimating the affine equivariant rank covariance matrix based on the Oja median when the corresponding covariance matrix is partially known. Specifically, we have assumed that the covariance matrix contains


zero blocks, i.e., two of the subvectors of the observation vector are uncorrelated. We have presented one way of estimating the rank covariance matrix when this additional knowledge about the covariance matrix is taken into account. An advantage of the proposed estimators is that the marginal rank covariance matrices can be used in the estimation process. When the size of the data vector is large, reducing the dimension is very useful, because calculating rank covariance matrices is computer-intensive. One drawback of our estimators is that they involve the constants $\lambda$, $\lambda_1$, $\lambda_2$ and $\lambda_3$, which characterize the relationship between the rank covariance matrices and the respective covariance matrices. The problem of estimating these constants needs to be studied further, as do the properties and numerical behavior of our estimators.

Acknowledgment

The support of the Swedish Research Council (Contract VR 621-2005-3186) is gratefully acknowledged.

References

Barnett, V., 1976. The ordering of multivariate data. J. Roy. Statist. Soc. A 139, 318--355.
Fang, K.-T., Zhang, Y.-T., 1990. Generalized Multivariate Analysis. Science Press and Springer.
Fang, K.-T., Kotz, S., Ng, K.W., 1990. Symmetric Multivariate and Related Distributions. Chapman & Hall, London.
Kano, Y., 1994. Consistency property of elliptical probability density functions. J. Multivariate Anal. 51, 139--147.
Möttönen, J., Hettmansperger, T.P., Oja, H., Tienari, J., 1998. On the efficiency of affine invariant multivariate rank tests. J. Multivariate Anal. 66, 118--132.
Oja, H., 1983. Descriptive statistics for multivariate distributions. Statist. Probab. Lett. 1, 327--332.
Ollila, E., Croux, C., Oja, H., 2004. Influence function and asymptotic efficiency of the affine equivariant rank covariance matrix. Statist. Sinica 14, 297--316.
Puri, M.L., Sen, P.K., 1985. Nonparametric Methods in General Linear Models. Wiley, New York.
Visuri, S., Ollila, E., Koivunen, V., Möttönen, J., Oja, H., 2003. Affine equivariant multivariate rank methods. J. Statist. Plann. Inference 114, 161--185.