- Email: [email protected]

Contents lists available at ScienceDirect

Journal of Computational and Applied Mathematics journal homepage: www.elsevier.com/locate/cam

Tensor ranks for the inversion of tensor-product binomials Eugene Tyrtyshnikov Institute of Numerical Mathematics, Russian Academy of Sciences, Gubkin Street, 8, Moscow 119333, Russian Federation

article

info

Article history: Received 8 February 2008 Dedicated to the memory of Gene Golub

abstract The main result reads: if a nonsingular matrix A of order n = pq is a tensor-product binomial with two factors then the tensor rank of A−1 is bounded from above by min{p, q}. √ The estimate is sharp, and in the worst case it amounts to n. © 2010 Elsevier B.V. All rights reserved.

MSC: 15A12 65F10 65F15 Keywords: Tensor ranks Kronecker product Low-rank matrices Inverse matrices Multilevel matrices Toeplitz matrices

1. Introduction Tensor decompositions are becoming a subject accumulating research interests in several important fields. Among them one should mention complexity theory for polynomial and matrix computations [1,2], data analysis supported by the Kruskal uniqueness property and Tucker reduction [3,2] and numerous multidimensional applications (see [4]). Despite an ever-increasing interest and several significant results (for some bibliography see [3,5,2,6,4]), many natural questions about seemingly ‘‘simple’’ cases are still not answered. In this paper we present some nontrivial tensor rank estimates for the inverse matrices. Consider a matrix A of order n = pq and its tensor-product decomposition A=

r X

Us ⊗ Vs

(1)

s=1

with minimal possible number of terms. In this case we call r a tensor rank of A, and write r = tRank(A). Matrices Us and Vs are of order p and q, respectively, and U ⊗ V means the Kronecker (tensor) product of matrices U and V . By definition, if U = [uij ] is of order p and V = [vkl ] is of order q, then U ⊗ V is a matrix of order n = pq with the following block structure: " # u11 V . . . u1p V ... ... . U ⊗ V = ... up1 V . . . upp V E-mail address: [email protected] 0377-0427/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.cam.2010.02.006

E. Tyrtyshnikov / Journal of Computational and Applied Mathematics 234 (2010) 3170–3174

3171

Of course, the tensor rank of A is calculated for some fixed p and q whose product is the order of A, so maybe a better way of notation would be tRank(A) = tRankp,q (A), but here we still skip p and q for the sake of brevity. It is obvious that a pair of indices (i, k) with 1 ≤ i ≤ p, 1 ≤ k ≤ q naturally points to a row in A (and in U ⊗ V ), and similarly, another pair (j, l) indicates a column in A. It follows that

(A)(i,k),(j,l) =

r X (Us )ij (Vs )kl .

(2)

s=1

This suggests reshaping A into a matrix B = reshape(A) in the following way:

(B)(i,j),(l,l) = (A)(i,k),(j,l) ,

(3)

the sizes of A and B being (pq) × (pq) and p2 × q2 , respectively. Then, Eqs. (2) and (3) show that B is the sum of r columnby-row matrices. Consequently (see [7]), tRank(A) = rank(B).

(4)

We are interested in sharp estimates for the tensor rank of A−1 , provided that A is nonsingular. A trivial bound emanating from (4) reads tRank(A−1 ) ≤ n, and does not reveal any dependence on r, or on p and q. The results presented below assume that the entries of all matrices are complex numbers or belong to a subfield of complex numbers; whether they are valid for finite fields is an open question. 2. Main result Let n = pq; the tensor decompositions are considered for fixed p and q. Assume that A = U ⊗ V is nonsingular. Then both U and V are nonsingular and, as is easy to check,

(U ⊗ V )−1 = U −1 ⊗ V −1 . Therefore, if tRank(A) = 1, then tRank(A−1 ) = 1. All the cases with tRank(A) ≥ 2 are significantly more intricate. Theorem 2.1. Let a matrix A of order n = pq be nonsingular, and assume that tRank(A) = 2. Then tRank(A−1 ) ≤ min{p, q}. Proof. Under the premises of the theorem we can write A = U1 ⊗ V1 + U2 ⊗ V2 . Let us assume first that all the matrices U1 , V1 , U2 , V2 are nonsingular. Moreover, assume that all the eigenvalues of U1 U2−1 and V2 V1−1 are simple. Then both matrices can be diagonalized by similarity transformations: U1 U2−1 = X ΛX −1 ,

Λ = diag{λ1 , . . . , λp },

,

M = diag{µ1 , . . . , µq }.

−1

V2 V1

= Y MY

−1

Hence, A (U2−1 ⊗ V1−1 ) = (X ⊗ Y ) Z (X −1 ⊗ Y −1 ), where Z = Λ ⊗ I + I ⊗ M.

3172

E. Tyrtyshnikov / Journal of Computational and Applied Mathematics 234 (2010) 3170–3174

In these circumstances, it is easy to deduce that tRank(A−1 ) = tRank(Z −1 ) = rank(reshape(Z −1 )). To better understand the structure of reshape(Z −1 ), take p = 2 and q = 3. Then

Z

−1

(λ + µ )−1 1 1 0 0 = 0

0

(λ1 + µ2 )

0 0

reshape(Z

−1

0

(λ1 + µ3 )−1

0 0 0

0 0 0

(λ1 + µ1 )−1 0 0

)=

0 0

−1

(λ2 + µ1 )−1

0 0 0 0

0 0 0 0

0 0 0 0

0 0 0

(λ2 + µ1 )−1

0 0

(λ2 + µ2 )−1

0 0 0 0

0 0 0

0

0 0

(λ2 + µ2 )−1

0 0

(λ1 + µ2 )−1

0 0 0

0 0 0 0 0

0 0 0 0

(λ1 + µ3 )

,

(λ2 + µ3 )−1 −1

0 0

.

(λ2 + µ3 )−1

Evidently, in this case, rank reshape(Z −1 ) = 2 = min{p, q}.

However, for arbitrary values of p and q, the reshaped inverse to Z has a similar structure: all the entries are equal to zero except those forming a p × q submatrix H of the form

(λ1 + µ1 )−1 ... H = (λp + µ1 )−1

... ... ...

(λ1 + µq )−1 . ... (λp + µq )−1

(5)

Thus, we obtain tRank(A−1 ) ≤ min{p, q} under certain restrictions imposed on U1 , V1 , U2 , V2 . If these restrictions are not fulfilled, then, given an arbitrary ε > 0, we can approximate the factors by matrices U1 (ε), V1 (ε), U2 (ε), V2 (ε) so that they satisfy the restrictions and lie in the ε -vicinity of the corresponding factors (say, in the spectral norm). Thus, A ≈ A(ε) = U1 (ε) ⊗ V1 (ε) + U2 (ε) ⊗ V2 (ε), and for all sufficiently small ε > 0 we have tRank(Z −1 (ε)) = rank H (ε) ≤ min{p, q}. It is well known that the limit matrix for a sequence of matrices whose rank does not exceed some bound cannot have rank greater than this bound. Hence, rank H ≤ min{p, q}. So we complete the proof by transition to the limit with ε → 0.

Corollary 2.1. Let A be a matrix of order n = pq and tRank(A) = 2. If A is nonsingular, then tRank(A−1 ) ≤

√

n.

Our proof highlights that the estimate of Theorem 2.1 is sharp. Indeed, any square submatrix in H is nonsingular. It follows that rank H = min{p, q}, so we have tRank(A−1 ) = min{p, q} so long as the above restrictions on the factors hold true.

E. Tyrtyshnikov / Journal of Computational and Applied Mathematics 234 (2010) 3170–3174

3173

3. An extension to more factors Assume that n = p1 . . . pm , and consider a tensor-product binomial with m factors (m-adic case), A = B1 ⊗ B2 ⊗ · · · ⊗ Bm + C1 ⊗ C2 ⊗ · · · ⊗ Cm , where Bs and Cs are of order ps , 1 ≤ s ≤ m. As previously, the minimal number of m-factor tensor-product binomials in representations of a matrix is called a tensor rank or, to stress the number of factors, an m-adic tensor rank of this matrix. We will use the same notation, tRank(A). If tRank(A) = 2, then Theorem 2.1 suggests a sharp estimate on tRank(A−1 ) in the case m = 2. By similar means, we can try to produce an estimate for m > 2. As before, assume additionally that matrices Bs , Cs and Bs Cs−1 have only simple eigenvalues. Denote by Ds the diagonal eigenvalue matrix obtained from Bs Cs−1 by the similarity transformation. As is readily seen, tRank(A−1 ) = tRank((I + D1 ⊗ · · · ⊗ Dm )−1 ). The diagonal matrix in the right-hand side can be viewed as an m-dimensional array (tensor) with mode sizes p1 , . . . , pm . The tensor rank of such an array cannot exceed the minimum among p1 . . . pm /ps (see [2]). Since it is equal to tRank(A−1 ), we have established the following result. Theorem 3.1. Let tensor ranks be defined for tensor-product representations with m factors of orders p1 , . . . , pm , and assume that A is a nonsingular matrix of order n = p1 . . . pm and tRank(A) = 2. Then, for any ε > 0, there exists a nonsingular matrix Aε with the following properties:

kA − Aε k2 ≤ ε, tRank(Aε ) = 2, 1 tRank(A− ε ) ≤ min n/ps . 1≤s≤m

In all cases 1 (m−1)/m tRank(A− . ε )≤n

It is well known that the m-adic case with m ≥ 3 is essentially different from the case m = 2 [3,2,6]. For example, when m ≥ 3 then a convergent sequence of matrices, each of tensor rank r, may have a limit matrix whose tensor rank is greater than r. As a corollary, in the case m ≥ 3, when pursuing the same scheme of proof as for m = 2, we are generally not allowed to perform a transition to the limit. 4. Discussion We should emphasize that the estimate tRank(A−1 ) ≤ n(m−1)/m

(6)

is valid in the case m = 2 and still under question for arbitrary nonsingular m-adic binomials with m ≥ 3. In the latter case, we maintain the same estimate only under some additional assumptions (which are fulfilled, anyway, with probability 1) or only for some approximations to the original matrix. A principal observation is that the trivial upper bound tRank(A−1 ) ≤ n is never reached, at least in the case m = 2. Some questions, naturally in addition to the considered ones, seem to be open. For instance, we are not aware of any nontrivial estimate on the rank of the inverse to a matrix of the form A = U1 ⊗ V1 + U2 ⊗ V2 + U3 ⊗ V3 . In a certain respect, our estimates may look quite pessimistic even in the case m = 2: with O(n) parameters entirely defining A we need O(n3/2 ) parameters in a tensor representation of A−1 . A major remark on this account is that tensor approximations to A−1 may exist involving much fewer parameters: for many interesting classes of matrices the number of the ε -approximation parameters turns out to behave as O(n logα n logβ ε −1 ) with some positive constants α and β [8,9]. Some refined estimates are available for more special and practically important cases [10]. Moreover, various classes of structured matrices can benefit from a complementary tensor structure: a good justifying example is recently given by two-level Toeplitz matrices [11–13]. The same type of remark concerns the Hilbert matrices H of the form (5) that appeared in our proof of Theorem 2.1. Let p = q and 0 < δ ≤ λi , µj ≤ 1. In this case H can be approximated by a matrix of rank r = r (ε, δ) with entry-wise accuracy ε > 0 so that r = O(log ε −1 log δ −1 ), and this estimate on r is asymptotically sharp [14]. Approximations of low tensor rank provide a ‘‘breaking complexity’’ tool. In applications, we are especially interested in rank estimates for some approximations rather than for the exact data. This makes the difference between the cases m = 2 and m ≥ 3 less important, at least from the practical point of view.

3174

E. Tyrtyshnikov / Journal of Computational and Applied Mathematics 234 (2010) 3170–3174

Acknowledgements The author was supported by RFBR grant 08-01-00115, RFBR/DFG grant 09-01-91332 and Priority Research Programme of Department of Mathematical Sciences of Russian Academy of Sciences. References [1] D. Bini, V. Pan, Matrix and Polynomial Computations, Birkhauser, Boston, 1994. [2] J.B. Kruskal, Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, Linear Algebra Appl. 18 (1977) 95–138. [3] L. De Lathauwer, B. De Moor, J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl. 21 (4) (2000) 1253–1278. [4] W. Hackbusch, B.N. Khoromskij, E.E. Tyrtyshnikov, Hierarchical Kronecker tensor-product approximations, J. Numer. Math. 13 (2005) 119–156. [5] T.G. Kolda, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl. 23 (1) (2001) 243–255. [6] L.-H. Lim, V. de Silva, Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM J. Matrix Anal. Appl. 30 (3) (2008) 1084–1127. [7] C.F. Van Loan, N.P. Pitsianis, Approximation with Kronecker products, in: NATO Adv. Sci. Ser E Appl. Sci., vol. 232, Kluwer, Dordrecht, 1993, pp. 293–314. [8] E.E. Tyrtyshnikov, Tensor approximations of matrices generated by asymptotically smooth functions, Sb. Math. 194 (5–6) (2003) 941–954. [9] E.E. Tyrtyshnikov, Kronecker-product approximations for some function-related matrices, Linear Algebra Appl. 379 (2004) 423–437. [10] W. Hackbusch, B.N. Khoromskij, Low-rank Kronecker product approximation to multi-dimensional of nonlocal operators. Part I. Separable approximation of multi-variate functions, Computing 76 (2006) 177–202. [11] J.M. Ford, E.E. Tyrtyshnikov, Combining Kronecker product approximation with discrete wavelet transforms to solve dense, function-related systems, SIAM J. Sci. Comput. 25 (3) (2003) 961–981. [12] J. Kamm, J.G. Nagy, Optimal Kronecker product approximations of block Toeplitz matrices, SIAM J. Matrix Anal. Appl. 22 (1) (2000) 155–172. [13] V. Olshevsky, I. Oseledets, E. Tyrtyshnikov, Tensor properties of multilevel Toeplitz and related matrices, Linear Algebra Appl. 412 (2006) 1–21. [14] I.V. Oseledets, Lower estimates for separable approximations of the Hilbert kernel, Sb. Math. 198 (3) (2006) 137–144.