# Estimation of the covariance matrix in multivariate partially linear models


Journal of Multivariate Analysis 123 (2014) 380–385

Marcin Przystalski

The Research Center for Cultivar Testing, Słupia Wielka 63-022, Poland

Article history: Received 21 January 2013; available online 27 September 2013.

AMS 2010 subject classifications: 62H12, 62G20.

Keywords: Multivariate partially linear models; Estimation; Covariance matrix.

Abstract: Multivariate partially linear models are generalizations of univariate partially linear models. In the literature, some estimators of treatment effects and nonparametric components have been proposed. In this note, the estimator of the covariance matrix in multivariate partially linear models is derived and some of its properties are given. © 2013 Elsevier Inc. All rights reserved.

## 1. Introduction

In recent years, univariate semiparametric models have been studied extensively (see e.g. [1,6,5,7,8,13] and the references therein) and have found various practical applications, e.g., in agriculture and econometrics. These models combine nonparametric flexibility with a parametric approach. In some situations, however, it is necessary to use a multivariate model instead of univariate models. For example, in finance it is now widely accepted that, in modeling asset returns, a multidimensional approach may lead to better results than the univariate approach. In such cases one may consider a multivariate partially linear model.

Let $Y = (y_1, \ldots, y_d)$ be an $n \times d$ matrix of observations. Then the multivariate partially linear model can be written as

$$Y = XB + F + U, \tag{1}$$

where $X = (x_1, \ldots, x_p) = (\zeta_1, \ldots, \zeta_n)'$ is an $n \times p$ design matrix and $B = (\beta_1, \ldots, \beta_d)$ is a $p \times d$ matrix of unknown parameters. For each $r \in \{1, \ldots, d\}$, let $f_r$ be an unknown function, and let $f_r = (f_r(t_1), \ldots, f_r(t_n))'$, where for each $i \in \{1, \ldots, n\}$, $t_i \in D$ is known and nonrandom, with $D \subset \mathbb{R}$ a bounded domain, and $F = (f_1, \ldots, f_d)$. Finally, let $U = (u_1, \ldots, u_d) = (\tau_1, \ldots, \tau_n)'$ be an $n \times d$ matrix of errors. It will be assumed that $n \ge p + d$. Without loss of generality, assume that $D = [0, 1]$ and that for each $r \in \{1, \ldots, d\}$, $f_r$ has $\nu \ge 2$ continuous derivatives on $[0, 1]$.

Pateiro-López and González-Manteiga [12] have described estimators of $B$ and $F$ which generalize the estimators of Speckman [13] to the multivariate case, and have studied their asymptotic behavior. In this note, the estimator of the covariance matrix in multivariate partially linear models is derived and some of its properties are given.
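To make model (1) concrete, the following minimal sketch draws one data set from it. Everything specific here is an assumption made for illustration and not taken from the note: the function name `simulate_mplm`, the equally spaced design points, the standard normal covariates (that is, $g_k \equiv 0$ in the regression of $x_{ik}$ on $t_i$), and the example values of $B$, $f_r$ and $\Sigma$. Plain Python lists are used to keep the example free of dependencies.

```python
import math
import random

def simulate_mplm(n, B, funcs, chol, seed=0):
    """Draw (t, X, Y) from Y = XB + F + U, rows of U ~ N_d(0, L L')."""
    rng = random.Random(seed)
    p, d = len(B), len(B[0])
    # Fixed, nonrandom design points in [0, 1] (illustrative choice).
    t = [(i + 0.5) / n for i in range(n)]
    # Covariates: pure noise here, i.e. g_k = 0 (a simplifying assumption).
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    Y = []
    for i in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # u = L z has covariance L L' when L is lower triangular.
        u = [sum(chol[r][s] * z[s] for s in range(r + 1)) for r in range(d)]
        row = [sum(X[i][k] * B[k][r] for k in range(p)) + funcs[r](t[i]) + u[r]
               for r in range(d)]
        Y.append(row)
    return t, X, Y

# Example with p = 2 regressors and d = 2 responses (n >= p + d holds).
B = [[1.0, -0.5], [2.0, 0.3]]
funcs = [lambda s: math.sin(2 * math.pi * s), lambda s: s ** 2]
chol = [[1.0, 0.0], [0.5, 1.0]]   # Sigma = L L' = [[1, 0.5], [0.5, 1.25]]
t, X, Y = simulate_mplm(20, B, funcs, chol)
```

Each row of `Y` is one $d$-variate observation; the smooth part enters only through $t_i$, the parametric part only through the row $\zeta_i'$ of $X$.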


## 2. Notation and assumptions

In this section we introduce some notation which will be used throughout the note. By $\operatorname{vec} A$ we denote the vector obtained by stacking the columns of $A$, and by $K_{dd}$ a $d^2 \times d^2$ commutation matrix (see e.g. ). Let $S = (S_{n,h}(t_i, t_j))_{i,j}$, where $S_{n,h}(\cdot, \cdot)$ is a weight function depending on the bandwidth parameter $h$. For an $n \times q$ ($q \ge 1$) matrix $A$ we write $\tilde{A} = (I - S)A$.

Let us assume, as in , that the $x_{ik}$'s and $t_i$ are related by the following regression model. Let $n, p \in \mathbb{N}$, $i \in \{1, \ldots, n\}$ and $k \in \{1, \ldots, p\}$; then

$$x_{ik} = g_k(t_i) + \eta_{ik}, \tag{2}$$

where the $g_k$ are unknown smooth functions and the $\eta_{ik}$ are random variables with mean zero. Let $G(t) = (g_1(t), \ldots, g_p(t))'$ and $\eta = (\eta_{ik})_{i=1,\ldots,n;\,k=1,\ldots,p}$. Let $i, j \in \{1, \ldots, n\}$, $k, l \in \{1, \ldots, p\}$ and $r, s \in \{1, \ldots, d\}$. Throughout the note we will assume that:

(A1) The error vectors $\tau_i$ are independent and normally distributed with mean vector $0$ and covariance matrix $\Sigma = (\sigma_{rs})$.

(A2) $n^{-1}\eta'\eta \to V$, where $V = (v_{ij})$ is positive definite.

(A3) $\operatorname{tr}(S'S) = \sum_{i=1}^{n}\sum_{j=1}^{n} S_{ij}^2 = O(h^{-1})$.

(A4) $\|S\eta_k\|^2 = O(h^{-1}) = \|S'\eta_k\|^2$.

(A5) $\tilde{g}_k(t_i) = h^{\nu} h_1(t_i)\, g_k^{(\nu)}(t_i) + o(h^{\nu})$.

(A6) $\|(I - S) f_r\|^2 = \|\tilde{f}_r\|^2 = O(n h^{2\nu})$.

(A7) $n^{-1}\eta'\tilde{f}_r = O(n^{-1/2} h^{\nu})$.

(A8) There is a probability density function $p(t)$ on $[0, 1]$ such that for each continuous function $c(t)$

$$\lim_{n \to \infty} n^{-1}\sum_{i=1}^{n} c(t_i) = \int_0^1 c(t)\,p(t)\,dt.$$

(A9) $\operatorname{tr}(S) = O(h^{-1})$.

(A10) $\max_i \sum_{j=1}^{n} |S_{ij}| = O(1)$, $\max_j \sum_{i=1}^{n} |S_{ij}| = O(1)$.

In the proofs of the main results we will need the following lemmas.

Lemma 2.1 (See Lemma 1 in ). Let $h \to 0$ and $nh \to \infty$ as $n \to \infty$. Then

(i) $\frac{1}{n}\tilde{X}'\tilde{X} \to V$,

(ii) $\frac{1}{n}X'\tilde{X} \to V$.
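The note keeps the weight function $S_{n,h}$ generic. As one concrete instance, the sketch below assumes Nadaraya–Watson weights with a Gaussian kernel on equally spaced design points (an assumption made here for illustration, not a choice stated in the note) and evaluates the quantities that appear in (A3), (A9) and (A10). The names `nw_smoother`, `trace_S`, `trace_StS`, `max_row`, `max_col` and `trace_R` are hypothetical.

```python
import math

def nw_smoother(t, h):
    """Nadaraya-Watson weights S_ij = K((t_i - t_j)/h) / sum_l K((t_i - t_l)/h)."""
    def kern(u):
        # Gaussian kernel, up to a constant that cancels in the ratio.
        return math.exp(-0.5 * u * u)
    S = []
    for ti in t:
        w = [kern((ti - tj) / h) for tj in t]
        total = sum(w)
        S.append([wj / total for wj in w])
    return S

n, h = 50, 0.1
t = [(i + 0.5) / n for i in range(n)]
S = nw_smoother(t, h)

# tr(S), the quantity bounded in (A9).
trace_S = sum(S[i][i] for i in range(n))
# tr(S'S) = sum of squared entries, the quantity bounded in (A3).
trace_StS = sum(S[i][j] ** 2 for i in range(n) for j in range(n))
# Maximal absolute row and column sums, the quantities bounded in (A10).
max_row = max(sum(abs(v) for v in row) for row in S)
max_col = max(sum(abs(S[i][j]) for i in range(n)) for j in range(n))
# tr(I - S' - S + S'S) = n - 2 tr(S) + tr(S'S).
trace_R = n - 2.0 * trace_S + trace_StS
```

For these weights every row of `S` sums to one by construction, so the row part of (A10) holds trivially; the remaining quantities can be tabulated over a grid of `h` values to see how they scale with the bandwidth.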

Lemma 2.2. Let the $p \times 1$ random vectors $z_i$ be independently distributed, each as $N_p(\mu_i, \Sigma)$, for $i = 1, \ldots, n$. Let

$$Z = (z_1, \ldots, z_n)' \quad\text{and}\quad M = (\mu_1, \ldots, \mu_n)'.$$

Consider the matrix quadratic form $S_A = Z'AZ$, where the $n \times n$ matrix $A$ is nonrandom (not necessarily symmetric). Then (see [10, p. 253])

$$E(S_A) = M'AM + (\operatorname{tr} A)\,\Sigma,$$

and the dispersion matrix (see Theorem 1 in [11]) is

$$\begin{aligned}
D(\operatorname{vec} S_A) ={}& \operatorname{tr}(A'A)\,(\Sigma \otimes \Sigma) + \operatorname{tr}(A^2)\,K_{pp}(\Sigma \otimes \Sigma) + M'A'AM \otimes \Sigma + \Sigma \otimes M'A'AM \\
&+ K_{pp}\,(M'A^2M \otimes \Sigma) + K_{pp}\,(\Sigma \otimes M'A^2M)'.
\end{aligned}$$

## 3. Preliminary results

To derive the estimators of $B$ and $\Sigma$ in model (1) we will use the profile likelihood approach (see e.g. ). From assumption (A1), according to Definition 3.3.1 in , we have that $(\tau_1, \ldots, \tau_n) \sim N_{d,n}(0, I_n \otimes \Sigma)$; thus $U \sim N_{n,d}(0, \Sigma \otimes I_n)$. This implies that $Y \sim N_{n,d}(XB + F, \Sigma \otimes I_n)$. By Definition 3.3.1 in , the likelihood function is given by

$$L(B, F, \Sigma; Y) = (2\pi)^{-\frac{nd}{2}}\,|\Sigma|^{-\frac{n}{2}}\operatorname{etr}\left\{-\frac{1}{2}\Sigma^{-1}(Y - XB - F)'(Y - XB - F)\right\}. \tag{3}$$


For a given $B$, $F$ is estimated by a linear fit, with estimator $\hat{F} = S(Y - XB)$, where $S$ (see e.g. ) is a smoothing matrix. Substituting this into (3) and writing $\tilde{X} = (I_n - S)X$ and $\tilde{Y} = (I_n - S)Y$, the log-profile likelihood function is given by

$$l(B, \hat{F}, \Sigma; Y) = -\frac{nd}{2}\ln(2\pi) + \frac{n}{2}\ln|\Sigma|^{-1} - \frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}A\right), \tag{4}$$

where $A = (\tilde{Y} - \tilde{X}B)'(\tilde{Y} - \tilde{X}B)$.

Proposition 3.1. Under assumption (A1), the profile likelihood estimators of $B$, $F$ and $\Sigma$ are given by

$$\hat{B} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{Y}, \qquad \hat{F} = S(Y - X\hat{B}), \qquad \hat{\Sigma} = \frac{1}{n}(\tilde{Y} - \tilde{X}\hat{B})'(\tilde{Y} - \tilde{X}\hat{B}).$$

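Proof. Differentiating (4) with respect to $B$ and $\Sigma^{-1}$ (see ), we obtain the following system of equations:

$$\frac{\partial\, l(B, \hat{F}, \Sigma; Y)}{\partial B} = \tilde{X}'(\tilde{Y} - \tilde{X}B)\,\Sigma^{-1} = 0,$$

$$\frac{\partial\, l(B, \hat{F}, \Sigma; Y)}{\partial \Sigma^{-1}} = \frac{n}{2}\left(2\Sigma - \operatorname{diag}\Sigma\right) - \frac{1}{2}\left(2A - \operatorname{diag}A\right) = 0.$$

Solving the above system of equations we get the assertion.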
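The estimators of Proposition 3.1 can be computed directly from their closed forms. The sketch below is one way to do so under stated assumptions: the smoother `S` is a Nadaraya–Watson matrix with a Gaussian kernel, the data are simulated, and the helper names (`matmul`, `transpose`, `solve`, `profile_estimators`) are hypothetical. Plain lists and a small Gauss–Jordan solver are used only to keep the example dependency-free; a numerical linear algebra library would normally be preferred.

```python
import math
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def profile_estimators(X, Y, S):
    """Return (B_hat, F_hat, Sigma_hat) from the formulas of Proposition 3.1."""
    n = len(Y)
    Xt = sub(X, matmul(S, X))                 # X-tilde = (I - S) X
    Yt = sub(Y, matmul(S, Y))                 # Y-tilde = (I - S) Y
    XtT = transpose(Xt)
    B_hat = solve(matmul(XtT, Xt), matmul(XtT, Yt))
    F_hat = matmul(S, sub(Y, matmul(X, B_hat)))
    E = sub(Yt, matmul(Xt, B_hat))            # residuals Y-tilde - X-tilde B_hat
    Sigma_hat = [[v / n for v in row] for row in matmul(transpose(E), E)]
    return B_hat, F_hat, Sigma_hat

# Illustrative data (all values below are assumptions for the example).
rng = random.Random(1)
n, h = 40, 0.15
t = [(i + 0.5) / n for i in range(n)]
W = [[math.exp(-0.5 * ((ti - tj) / h) ** 2) for tj in t] for ti in t]
S = [[w / sum(row) for w in row] for row in W]
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
B = [[1.0, -0.5], [2.0, 0.3]]
Y = [[sum(X[i][k] * B[k][r] for k in range(2))
      + (r + 1) * math.sin(2 * math.pi * t[i]) + rng.gauss(0, 0.1)
      for r in range(2)] for i in range(n)]
B_hat, F_hat, Sigma_hat = profile_estimators(X, Y, S)
```

Note that $\hat{\Sigma}$ divides by $n$ rather than by a degrees-of-freedom correction, which is consistent with the factor $(n - p)/n$ appearing in its expectation in the next section.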



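## 4. Main results

Theorem 4.1. Let $h \to 0$ and $nh \to \infty$ as $n \to \infty$. Then, under assumptions (A1)–(A8), we have

$$\begin{aligned}
E(\hat{\Sigma}) ={}& \left(\frac{n - p}{n} + O(n^{-1}) + O\big((nh)^{-1}\big)\right)\Sigma + O(h^{2\nu}) \\
&- \left[h^{\nu}\int_0^1 G^{(\nu)}(t)\,F^{(\nu)}(t)'\,h_1(t)\,p(t)\,dt + o(h^{\nu}) + O\big(h^{\nu}(nh)^{-1/2}\big)\right]' \\
&\times \left[h^{\nu}\,V^{-1}\int_0^1 G^{(\nu)}(t)\,F^{(\nu)}(t)'\,h_1(t)\,p(t)\,dt + o(h^{\nu}) + O\big(h^{\nu}(nh)^{-1/2}\big)\right].
\end{aligned}$$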

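Proof. From Proposition 3.1 we have that

$$\hat{\Sigma} = \frac{1}{n}(\tilde{Y} - \tilde{X}\hat{B})'(\tilde{Y} - \tilde{X}\hat{B}) = \frac{1}{n}\tilde{Y}'\tilde{Y} - \frac{1}{n}\hat{B}'\tilde{X}'\tilde{X}\hat{B} = \frac{1}{n}Y'RY - \frac{1}{n}Y'QY,$$

where $R = (I_n - S)'(I_n - S)$ and $Q = (I_n - S)'\tilde{X}(\tilde{X}'\tilde{X})^{-1}\tilde{X}'(I_n - S)$. Assumption (A1) implies that $Y \sim N_{n,d}(XB + F, \Sigma \otimes I_n)$. Thus, by Lemma 2.2,

$$E(Y'RY) = (XB + F)'R(XB + F) + (\operatorname{tr} R)\,\Sigma = (\tilde{X}B + \tilde{F})'(\tilde{X}B + \tilde{F}) + \operatorname{tr}(I_n - S' - S + S'S)\,\Sigma.$$

By (A3) and (A9), $\operatorname{tr}(I_n - S' - S + S'S) = n + O(h^{-1})$; thus

$$E(Y'RY) = \big(n + O(h^{-1})\big)\Sigma + (\tilde{X}B + \tilde{F})'(\tilde{X}B + \tilde{F}).$$

Next, by Lemma 2.2,

$$E(Y'QY) = (\operatorname{tr} Q)\,\Sigma + (\tilde{X}B + \tilde{F})'\tilde{X}(\tilde{X}'\tilde{X})^{-1}\tilde{X}'(\tilde{X}B + \tilde{F}).$$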


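By Lemma 2.1, we obtain that

$$\begin{aligned}
\operatorname{tr} Q &= \operatorname{tr}\left\{(I_n - S)'\tilde{X}(\tilde{X}'\tilde{X})^{-1}\tilde{X}'(I_n - S)\right\} = \operatorname{tr}\left\{(\tilde{X}'\tilde{X})^{-1}\tilde{X}'(I_n - S)(I_n - S)'\tilde{X}\right\} \\
&= \operatorname{tr}\left\{(\tilde{X}'\tilde{X})^{-1}\left(\tilde{X}'\tilde{X} - \tilde{X}'S\tilde{X} + \tilde{X}'SS'\tilde{X} - \tilde{X}'S'\tilde{X}\right)\right\} \\
&= p + \operatorname{tr}\left\{V^{-1}\,\frac{1}{n}\left(-\tilde{X}'S\tilde{X} + \tilde{X}'SS'\tilde{X} - \tilde{X}'S'\tilde{X}\right)\right\}.
\end{aligned}$$

Following the same arguments as in the proof of Theorem 2 in , it can be shown that $-\tilde{X}'S\tilde{X} + \tilde{X}'SS'\tilde{X} - \tilde{X}'S'\tilde{X} = o(n)$; hence

$$\operatorname{tr} Q = p + \frac{1}{n}\,O(n) = p + O(1).$$

Thus, after some algebra,

$$\begin{aligned}
E(\hat{\Sigma}) &= \left(\frac{n - p}{n} + O\big((nh)^{-1}\big) + O(n^{-1})\right)\Sigma + \frac{1}{n}\tilde{F}'\tilde{F} - \frac{1}{n}\tilde{F}'\tilde{X}\left(\frac{1}{n}\tilde{X}'\tilde{X}\right)^{-1}\frac{1}{n}\tilde{X}'\tilde{F} \\
&= \left(\frac{n - p}{n} + O\big((nh)^{-1}\big) + O(n^{-1})\right)\Sigma + \frac{1}{n}\tilde{F}'\tilde{F} - \left(\frac{1}{n}\tilde{X}'\tilde{F}\right)'\operatorname{Bias}(\hat{B}).
\end{aligned}$$

By assumptions (A4)–(A8), following the same arguments as in , we have that

$$\left(\frac{1}{n}\tilde{X}'\tilde{F}\right)' = \left[h^{\nu}\int_0^1 G^{(\nu)}(t)\,F^{(\nu)}(t)'\,h_1(t)\,p(t)\,dt + o(h^{\nu}) + O\big(h^{\nu}(nh)^{-1/2}\big)\right]'.$$

By Theorem 2 in  we get the estimate for $\left(\frac{1}{n}\tilde{X}'\tilde{F}\right)'\operatorname{Bias}(\hat{B})$. To complete the proof, it suffices to consider the term $\frac{1}{n}\tilde{F}'\tilde{F}$. By the Cauchy–Schwarz inequality and assumption (A6), the $rs$-th element satisfies

$$\left|\frac{1}{n}\tilde{f}_r'\tilde{f}_s\right| \le \frac{1}{n}\,\|\tilde{f}_r\|\,\|\tilde{f}_s\| = \frac{1}{n}\,O(nh^{2\nu}) = O(h^{2\nu}).$$

The theorem is proved.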



Corollary 4.2. Under the assumptions of Theorem 4.1 and the usual bandwidth assumption $h \sim n^{-1/(2\nu+1)}$, $\hat{\Sigma}$ is an asymptotically unbiased estimator of $\Sigma$.
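A short check of the orders involved may help here. It uses only the terms that appear in Theorem 4.1; the value $\nu = 2$ in the last line is purely illustrative.

$$\begin{aligned}
h \sim n^{-1/(2\nu+1)} \;&\Rightarrow\; nh \sim n^{2\nu/(2\nu+1)} \to \infty, \\
(nh)^{-1} &\sim n^{-2\nu/(2\nu+1)} \to 0, \qquad h^{2\nu} \sim n^{-2\nu/(2\nu+1)} \to 0, \qquad \frac{n - p}{n} \to 1, \\
\nu = 2: \quad & h \sim n^{-1/5}, \qquad (nh)^{-1} \sim n^{-4/5}, \qquad h^{4} \sim n^{-4/5}.
\end{aligned}$$

Both bracketed factors in Theorem 4.1 also tend to zero, because $h \to 0$ and $nh \to \infty$, so every term other than $\Sigma$ vanishes in the limit and $E(\hat{\Sigma}) \to \Sigma$.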

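Proposition 4.3. The dispersion matrix of $\hat{\Sigma}$ is

$$D(\operatorname{vec}\hat{\Sigma}) = \frac{1}{n^2}\left(I_{d^2} + K_{dd}\right)\left[\operatorname{tr}(C^2)\,(\Sigma \otimes \Sigma) + M'C^2M \otimes \Sigma + \Sigma \otimes M'C^2M\right],$$

where $M = XB + F$, $C = (I_n - S)'\left(I_n - \tilde{X}(\tilde{X}'\tilde{X})^{-1}\tilde{X}'\right)(I_n - S)$ and $K_{dd}$ is a $d^2 \times d^2$ commutation matrix.

Proof. From Proposition 3.1 we have

$$\hat{\Sigma} = \frac{1}{n}(\tilde{Y} - \tilde{X}\hat{B})'(\tilde{Y} - \tilde{X}\hat{B}) = \frac{1}{n}Y'CY.$$

By (A1), $Y \sim N_{n,d}(M, \Sigma \otimes I_n)$, where $M = XB + F$. Since $C$ is symmetric, applying Lemma 2.2 to the quadratic form $\frac{1}{n}Y'CY$ gives the assertion.

Theorem 4.4. Under assumptions (A1)–(A10), we have that

$$\hat{\Sigma} \xrightarrow{d} W_d\left(n - p, \frac{1}{n}\Sigma, \Delta\right)$$

as $n \to \infty$, where $\Delta = \frac{1}{n}M'GM$, $G = I - \frac{1}{n}\tilde{X}V^{-1}\tilde{X}'$ and $M = XB + F$.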

Proof. We have that

$$\hat{\Sigma} = \frac{1}{n}Y'KY + \frac{1}{n}Y'DY,$$

where $K = I - \tilde{X}(\tilde{X}'\tilde{X})^{-1}\tilde{X}'$ and $D = -S'K - KS + S'KS$.


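From (A1), we have that $Y \sim N_{n,d}(M, \Sigma \otimes I_n)$. Because the matrix $K$ is symmetric and idempotent, by Comment I in  it follows that

$$\frac{1}{n}Y'KY \sim W_d\left(n - p, \frac{1}{n}\Sigma, \Delta\right), \tag{5}$$

and by Lemma 2.1 we have

$$I_n - \tilde{X}(\tilde{X}'\tilde{X})^{-1}\tilde{X}' = I_n - \frac{1}{n}\tilde{X}\left(\frac{1}{n}\tilde{X}'\tilde{X}\right)^{-1}\tilde{X}' = G.$$

Hence,

$$\Delta = \frac{1}{n}M'KM = \frac{1}{n}M'GM.$$

Because $Y \sim N_{n,d}(M, \Sigma \otimes I_n)$, for arbitrary $a \in \mathbb{R}^{nd} \setminus \{0\}$ we have that

$$P\left(|a'\operatorname{vec} Y| > \sqrt{n}\,\varepsilon\right) \le \frac{a'D(\operatorname{vec} Y)\,a}{n\varepsilon^2} = \frac{a'\left(\Sigma \otimes I_n + (\operatorname{vec} M)(\operatorname{vec} M)'\right)a}{n\varepsilon^2} \to 0$$

as $n \to \infty$, and $\frac{1}{\sqrt{n}}\operatorname{vec} Y \xrightarrow{p} 0$ as $n \to \infty$. Thus, by the continuous mapping theorem [15] we obtain that

$$\frac{1}{n}Y'DY \xrightarrow{p} 0 \tag{6}$$

as $n \to \infty$. Finally, by (5) and (6) we get the assertion.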
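The split used in this proof rests on the algebraic identity $(I - S)'K(I - S) = K - S'K - KS + S'KS$, together with $K$ being a symmetric idempotent matrix of trace $n - p$. The sketch below checks these facts numerically for one randomly generated design. The Nadaraya–Watson smoother with a Gaussian kernel, the sizes `n`, `p`, `h`, and all helper names are assumptions made for illustration only.

```python
import math
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B, sign=1.0):
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

n, p, h = 12, 2, 0.2
rng = random.Random(0)
t = [(i + 0.5) / n for i in range(n)]
W = [[math.exp(-0.5 * ((ti - tj) / h) ** 2) for tj in t] for ti in t]
S = [[w / sum(row) for w in row] for row in W]
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]

ImS = add(I, S, -1.0)                              # I - S
Xt = matmul(ImS, X)                                # X-tilde
XtT = transpose(Xt)
P = matmul(Xt, solve(matmul(XtT, Xt), XtT))        # projection onto col(X-tilde)
K = add(I, P, -1.0)                                # K = I - P
St = transpose(S)
C = matmul(transpose(ImS), matmul(K, ImS))         # (I - S)' K (I - S)
D = add(add(matmul(St, matmul(K, S)), matmul(St, K), -1.0), matmul(K, S), -1.0)

gap_decomposition = max_abs_diff(C, add(K, D))     # C versus K + D
gap_idempotent = max_abs_diff(matmul(K, K), K)     # K K versus K
trace_K = sum(K[i][i] for i in range(n))           # should be close to n - p
```

The trace of `K` is what supplies the $n - p$ degrees of freedom of the Wishart distribution in (5), while `D` collects every term that involves the smoother and is shown in the proof to be asymptotically negligible.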



## 5. Discussion

In this note, we derived the estimator of the covariance matrix in the multivariate partially linear model and studied some of its properties. In particular, we showed that this estimator has an asymptotic noncentral Wishart distribution. This fact can be used in developing new test procedures concerning the covariance matrix.

All results presented in this note were obtained under the assumption that the error vectors are independent and identically normally distributed. In the case of partially linear models, the properties of the estimators of the unknown parameters of interest are studied under various assumptions concerning the error terms, e.g., independence , weak dependence (see e.g. [1,2]) or long memory dependence . In this note, we used assumptions similar to the ones used in . We only modified assumption (A1) in : to the original assumption we added that the error vectors are normally distributed. Compared with the assumptions used in the literature, the modified assumption seems rather strong and can be seen as a drawback. However, from the multivariate analysis perspective this assumption seems more natural, because the most commonly used tests in multivariate analysis (e.g., tests concerning the covariance matrix) were developed for the case when errors are normally or elliptically distributed (see e.g. [4,14]).

Despite this, the results obtained in this note may serve as a tool for obtaining new testing procedures in the multivariate partially linear model, e.g., for testing the validity of the model. For the univariate partially linear model, Aneiros-Pérez et al. [1] described procedures for testing hypotheses related to unknown treatment parameters and an unknown smooth function, respectively. Recently, Fan et al. [6] constructed, in the partially linear errors-in-variables model, two different empirical log-likelihood ratio tests for the parameter of interest, for the cases when the error variances are known and unknown. Based on the obtained results, one can try to generalize the test procedures proposed by Aneiros-Pérez et al. [1]. In particular, it would be very interesting to test the hypotheses $H_{0,F}: F = F_0$ or $H_{0,F}: F_0 = 0$. This problem seems to be theoretically demanding. We suspect that, to find the solution, one should use one of the matrix norms to generalize the test statistic proposed in . However, further research is needed to solve this problem.

We can also use the estimator of the covariance matrix to build test procedures which validate the assumption about the covariance matrix. In multivariate analysis, a common problem is testing the hypothesis that the covariance matrix has a specific value, i.e., $H_0: \Sigma = \Sigma_0$. For the multivariate linear model, this problem has been studied by Korin [9] (see also [4,14]). Using the obtained estimator of the covariance matrix, we will construct a test procedure for the hypothesis $H_0$ and study its power. We plan to explore this in future work.

In recent years functional data analysis has become an important tool for modeling data where the unit of observation is a curve or, in general, a function. For this reason, partial linear ideas have been adapted to the situation when functional data are observed. The estimation of unknown parameters of interest and bandwidth selection in semi-functional partial linear models, under some additional assumptions, has been studied in e.g. [2,3].


However, in most experiments and other scientific studies, researchers observe more than one variable. It is expected that observing more variables provides more information than observing only one, and sometimes the scientist's interest extends to studying the relation between the observed variables. For this reason, it would be interesting to adapt model (1) to the semi-functional setting and study, under some additional assumptions, the asymptotic properties of the estimators of $B$, $F$ or $\Sigma$. To the best of our knowledge, there is no article in the literature in which the semi-functional partially linear model has been considered in a multivariate setting. Based on such estimators one could construct, in the multivariate semi-functional partially linear model, procedures for testing hypotheses related to $B$ and $F$, in particular a procedure for testing the hypothesis $H_0: LBN = 0$, where $L_{w \times p}$ is a known matrix of full row rank, $w \le p$, and $N_{d \times q}$ is a known matrix of full column rank, $q \le d$. Such a test would be very interesting for practitioners, e.g., for researchers working in mass spectrometry, proteomics or metabolomics. However, we suspect that, in practice, the preparation of multivariate functional data and/or the constructed tests would be computationally intensive and time consuming. We plan to explore those problems.

## Acknowledgments

I am grateful to the anonymous referees for very constructive and helpful comments.

## References

[1] G. Aneiros-Pérez, W. González-Manteiga, Ph. Vieu, Estimation and testing in partial linear regression model under long memory dependence, Bernoulli 10 (2004) 49–78.

[2] G. Aneiros-Pérez, Ph. Vieu, Automatic estimation procedure in partial linear model with functional data, Statist. Papers 52 (2011) 751–771.

[3] G. Aneiros-Pérez, Ph. Vieu, Nonparametric time series prediction: a semi-functional partial linear modeling, J. Multivariate Anal. 99 (2008) 834–857.

[4] M. Bilodeau, D. Brenner, Theory of Multivariate Statistics, Springer-Verlag, New York, 1999.

[5] J. Fan, T. Huang, Profile likelihood inferences on semiparametric varying-coefficient partially linear models, Bernoulli 11 (2005) 1031–1057.

[6] G.-L. Fan, H.-Y. Liang, J.-F. Wang, Empirical likelihood for heteroscedastic partially linear errors-in-variables model with α-mixing errors, Statist. Papers 54 (2013) 85–112.

[7] W. Härdle, H. Liang, J. Gao, Partially Linear Models, Physica-Verlag, Würzburg, 2000.

[8] W. Härdle, M. Müller, S. Sperlich, A. Werwatz, Nonparametric and Semiparametric Models, Springer-Verlag, Berlin, Heidelberg, 2004.

[9] B.P. Korin, On the distribution of a statistic used for testing a covariance matrix, Biometrika 55 (1968) 171–178.

[10] J.R. Magnus, H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, Wiley, New York, 1999.

[11] H. Neudecker, On the dispersion matrix of matrix quadratic form connected with noncentral Wishart distribution, Linear Algebra Appl. 70 (1985) 257–262.

[12] B. Pateiro-López, W. González-Manteiga, Multivariate partially linear models, Statist. Probab. Lett. 76 (2006) 1543–1549.

[13] P. Speckman, Kernel smoothing in partial linear models, J. R. Stat. Soc. Ser. B 50 (1988) 413–436.

[14] N.H. Timm, Applied Multivariate Analysis, Springer-Verlag, New York, 2002.

[15] A. van der Vaart, Asymptotic Statistics, Cambridge University Press, Cambridge, 1998.