Signal Processing 17 (1989) 31-38
Elsevier Science Publishers B.V.

APPLICATION OF THE CONJUGATE GRADIENT AND STEEPEST DESCENT FOR COMPUTING THE EIGENVALUES OF AN OPERATOR*

Tapan K. SARKAR and Xiaopu YANG
Department of Electrical Engineering, Syracuse University, Syracuse, NY 13244-1240, USA

Received 18 February 1987
Revised 24 August 1987 and 12 April 1988
Abstract. This paper describes the method of steepest descent and the method of conjugate gradient for iteratively finding the first few large/small eigenvalues and eigenvectors of a Hermitian operator. Both methods have been applied to the computation of the prolate spheroidal functions. Since the methods are iterative, they are expected to yield accurate solutions for the first few large/small eigenvalues, particularly when the condition number (the ratio of the largest to the smallest eigenvalue) is large.

Keywords. Hermitian operators, conjugate gradient, steepest descent.
1. Introduction
Many direct methods exist for computing the eigenvalues and the eigenvectors of a Hermitian matrix. A good summary is available in the book by Golub and Van Loan [2]. However, the direct methods become inefficient when only a few eigenvalues and the corresponding eigenvectors are desired. For most signal processing applications, one needs only the smallest eigenvalue and the eigenvector corresponding to it. Hence it is computationally inefficient to solve for all the eigenvalues and the eigenvectors when only one of them is desired [1]. Also, if there is a large dynamic range for the eigenvalues (i.e., the condition number is large), then the roundoff and truncation errors incurred in the solution procedure of direct methods may introduce errors in the solution. In this paper, two iterative methods are presented for computing the first few large/small eigenvalues and the eigenvectors of a Hermitian operator. Since the methods are iterative, the solutions can be obtained to a prespecified degree of accuracy. The two iterative methods described in the paper are the method of steepest descent and

* This work has been supported in part by the Office of Naval Research under contract N00014-85-C-0598.
the method of conjugate gradient. The method of steepest descent is simpler, whereas the method of conjugate gradient is more complex. However, it has been our experience that the conjugate gradient method is much faster. Both methods are applied to the computation of the prolate spheroidal functions as a solution to the eigenvalue problem.
2. Description of the methods when A is positive definite

Let us consider the following eigenvalue equation

A X = λ X, (1)

where A is a linear integro-differential operator which is self-adjoint, λ is the eigenvalue, and X is the corresponding eigenvector. In order to find the minimum/maximum eigenvalue of the operator equation (1) we select a functional F(X), defined as

F(X) = (AX; X), subject to (X; X) = 1, (2)

where the inner product in (2) is defined as

(C; D) = ∫ C(z) D̄(z) dz, (3)

where the overbar denotes the complex conjugate. The norm can be defined from (3) as

‖C‖² = (C; C) = ∫ |C(z)|² dz. (4)

An operator A is defined as Hermitian if

(AX; Y) = (X; AY),

and is called positive definite provided

(AX; X) > 0 for all X ≠ 0.

For a non-Hermitian operator

(AX; Y) = (X; A*Y),

where A* is the adjoint operator. If A is a complex matrix, then A* would be the conjugate transpose of A.

From now on we consider maximization of the functional F(X); the procedures for maximization and for minimization of the functional are identical, except that a sign in one of the scalar parameters is changed. We will re-emphasize that point when we arrive at that formula. The objective then is to maximize the Rayleigh quotient F(X) given by (2). Equivalently, the constrained maximization problem in (2) can be rewritten as an unconstrained maximization problem,

F(X) = (AX; X)/(X; X). (5)

Let Y denote the normalized version of X as

Y = X/(X; X)^(1/2), (6)

then the maximum of the functional F(X) is the maximum eigenvalue, as [observe that F(X) = F(Y) for any X]

[F(X)]_max = [F(Y)]_max = [(AY; Y)]_max = [(AX; X)/(X; X)]_max = λ_max, (7)

so that the maximum of F(X) or F(Y) is indeed the maximum eigenvalue.
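The chain of equalities in (7) is easy to check numerically. Below is a minimal sketch in Python with NumPy; the 6 x 6 symmetric test matrix is an illustrative assumption, not one from the paper. It confirms that the Rayleigh quotient F(X) of (5) never exceeds the largest eigenvalue and that F(X) = F(Y) for the normalized Y of (6).

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B + B.T                      # illustrative Hermitian (real symmetric) operator

def F(A, x):
    """Rayleigh quotient F(X) = (AX; X)/(X; X) of eq. (5)."""
    return (x @ A @ x) / (x @ x)

lam_max = np.linalg.eigvalsh(A)[-1]          # eigvalsh returns ascending order

# F(X) <= lambda_max for every X, with equality at the top eigenvector.
samples = [F(A, rng.standard_normal(6)) for _ in range(1000)]

# F is scale invariant: F(X) = F(Y) for Y = X/(X; X)^(1/2), eq. (6).
x = rng.standard_normal(6)
y = x / np.sqrt(x @ x)
```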
To maximize F(Y), we start with an initial guess X₀ and then normalize it, i.e.,

Y₀ = X₀/(X₀; X₀)^(1/2). (8)

Next, we calculate the approximate maximum eigenvalue from

λ₀ = F[Y₀] = (AY₀; Y₀) (9)

and the residual

R₀ = λ₀ Y₀ − A Y₀, (10)

which is the negative of the derivative of the functional F(Y). Next the initial search direction P₀ is chosen as

P₀ = R₀. (11)

The search direction denotes the vector along which the functional F(Y) will be maximized. We develop the successive approximation by

X_{k+1} = Y_k + t_k P_k, (12)
where t_k is chosen such that F[X_{k+1}] is maximized along the direction P_k, starting from Y_k. To obtain t_k, we write

F[X_{k+1}] = (AX_{k+1}; X_{k+1})/(X_{k+1}; X_{k+1}) (13)

= [λ_k + t_k ā_k + t̄_k a_k + t_k t̄_k b_k] / [1 + t_k c̄_k + t̄_k c_k + t_k t̄_k d_k], (14)

where

λ_k = (AY_k; Y_k), (15)
a_k = (AP_k; Y_k), (16)
b_k = (AP_k; P_k), (17)
c_k = (P_k; Y_k), (18)
d_k = (P_k; P_k), (19)

and take the derivative of F[X_{k+1}] with respect to t_k and set it equal to zero. Since t_k is a complex quantity, we can write

t_k = p_k + j q_k, (20)

and the proper value of t_k is obtained from the solution of the coupled equations obtained by setting

∂F[X_{k+1}]/∂p_k = 0, (21)
∂F[X_{k+1}]/∂q_k = 0. (22)

From the simultaneous equations (21) and (22) it is seen that

(ā_k − c̄_k λ_k) + t_k (b_k − λ_k d_k + ā_k c_k − c̄_k a_k) + t_k² (b_k c̄_k − ā_k d_k) = 0. (23)

This results in

t_k = {−(b_k − λ_k d_k + ā_k c_k − c̄_k a_k) ± [(b_k − λ_k d_k + ā_k c_k − c̄_k a_k)² − 4(b_k c̄_k − ā_k d_k)(ā_k − c̄_k λ_k)]^(1/2)} / [2(b_k c̄_k − ā_k d_k)]. (24)

The question now is what sign to take in front of the square root. We next show that when one desires the largest eigenvalue the minus sign should be chosen, and when one desires the smallest eigenvalue the plus sign should be chosen. The plus and minus signs have to do with the real part of the complex number resulting from the square-root operation. Now if we observe the difference

F[X_{k+1}] − F[Y_k] = [t_k (ā_k − λ_k c̄_k) + t̄_k (a_k − λ_k c_k) + t_k t̄_k (b_k − λ_k d_k)] / (X_{k+1}; X_{k+1}), (25)

and if we assume that (24) can be rewritten as

2(b_k c̄_k − ā_k d_k) t_k + (b_k − λ_k d_k + ā_k c_k − c̄_k a_k) = ±[p_k + j q_k], (26)

where it is assumed that

p_k > 0 (27)

and q_k is any real number, then it can be shown from (25) that

F[X_{k+1}] − F[Y_k] > 0 (28)

if the minus sign is selected in (26) and (24), and

F[X_{k+1}] − F[Y_k] < 0 (29)

if the plus sign is selected in (26) and (24). Therefore, for the solution of the larger eigenvalues the minus sign should be selected and for the smaller eigenvalues the plus sign should be selected in (26). Once t_k has been obtained, the updated solution X_{k+1} is normalized so that

Y_{k+1} = X_{k+1}/(X_{k+1}; X_{k+1})^(1/2), (30)

and from (30) an approximation to the maximum eigenvalue λ_{k+1} is computed as

λ_{k+1} = (AY_{k+1}; Y_{k+1}). (31)

If the convergence criterion

(λ_{k+1} − λ_k)/λ_k ≤ 10⁻³ (32)

is not satisfied, then the iteration is continued by selecting a new direction P_{k+1}. The basic difference between the method of steepest descent and the method of conjugate gradient is how the new search direction P_{k+1} is selected.

2.1. Method of steepest descent

For this case, the search directions are chosen as the residual, i.e.,

P_{k+1} = R_{k+1} = λ_{k+1} Y_{k+1} − A Y_{k+1}. (33)

With this particular choice of P_{k+1}, certain simplifications can be made in equations (16)-(19). For example,

a_k = (AP_k; Y_k) = (R_k; AY_k) = (R_k; λ_k Y_k − R_k) = −(R_k; R_k), (34)

b_k = (AR_k; R_k). (35)

Since

(R_k; Y_k) = (λ_k Y_k − AY_k; Y_k) = 0, (36)

we also have

c_k = (R_k; Y_k) = 0, (37)
d_k = (R_k; R_k) = −a_k. (38)

Therefore

t_k = {−(b_k − λ_k d_k) ± [(b_k − λ_k d_k)² + 4 d_k³]^(1/2)} / (2 d_k²). (39)

In summary, for the method of steepest descent, the algorithm takes the following form. Start with an initial guess X₀; then

Y_k = X_k/(X_k; X_k)^(1/2), (40)
λ_k = (AY_k; Y_k), (41)
R_k = λ_k Y_k − A Y_k, (42)
g_k = (AR_k; R_k) − λ_k (R_k; R_k), (43)
h_k = ‖R_k‖⁴, (44)
t_k = [−g_k ± (g_k² + 4 h_k^(3/2))^(1/2)] / (2 h_k), (45)
X_{k+1} = Y_k + t_k R_k, (46)
Y_{k+1} = X_{k+1}/(X_{k+1}; X_{k+1})^(1/2). (47)

In this case, the method is similar to that of [4], where it was first developed for real matrices.

2.2. Method of conjugate gradient

For the conjugate gradient method, the next search directions are selected as

P_{k+1} = R_{k+1} + l_k P_k, (48)

where

l_k = −(R_{k+1}; R_{k+1}/(‖R_k‖ ‖X_{k+1}‖) − T_k) / (P_k; R_{k+1}/(‖R_k‖ ‖X_{k+1}‖) − T_k) (49)

and

T_k = R_k/(R_k; R_k)^(1/2). (50)

This recursive choice of the search directions results in a faster rate of convergence. In summary, for the conjugate gradient method the algorithm takes the following form. We start with an initial guess X₀; then

Y_k = X_k/(X_k; X_k)^(1/2), (51)
λ_k = (AY_k; Y_k), (52)
R_k = λ_k Y_k − A Y_k, (53)
P₀ = R₀, (54)
a_k = (AP_k; Y_k), (55)
b_k = (AP_k; P_k), (56)
c_k = (P_k; Y_k), (57)
d_k = (P_k; P_k), (58)
t_k = [−f_k ± (f_k² − 4 g_k h_k)^(1/2)] / (2 h_k), (59)

where

f_k = b_k − λ_k d_k + ā_k c_k − c̄_k a_k, (60)
h_k = b_k c̄_k − ā_k d_k, (61)
g_k = ā_k − c̄_k λ_k, (62)

and

X_{k+1} = Y_k + t_k P_k, (63)
P_{k+1} = R_{k+1} + l_k P_k, (64)
l_k = −(R_{k+1}; R_{k+1}/(‖R_k‖ ‖X_{k+1}‖) − T_k) / (P_k; R_{k+1}/(‖R_k‖ ‖X_{k+1}‖) − T_k), (65)
T_{k+1} = R_{k+1}/(R_{k+1}; R_{k+1})^(1/2), (66)
Y_{k+1} = X_{k+1}/(X_{k+1}; X_{k+1})^(1/2). (67)
3. Development of the methods when A is arbitrary

For an arbitrary operator A, the functional to be minimized is no longer the same F(X) as defined in (2), but takes on a different form J_X defined as

J_X = (AX; AX), (68)

subject to

(X; X) = 1. (69)

Observe that J_X now represents the eigenvalues of A*A. It is important to note that the matrix in (68) can even be rectangular. Now in many signal processing problems, A represents the data matrix and X is the unknown weight vector, to be selected in such a way that it is the eigenvector corresponding to the smallest eigenvalue of the data covariance matrix A*A. For this problem, the method of steepest descent and the method of conjugate gradient can again be applied for the computation of the eigenvalues and eigenvectors of A*A, and not of A. The main advantage of the two techniques is that the eigenvalues and the eigenvectors of A*A can be obtained without explicitly forming A*A, as this would be very time-consuming. The direct product A*A requires N³ operations, whereas evaluation of A*AY requires only 2N² operations if the evaluation is made as A*(AY). Also, there would be no way to correct the roundoff errors introduced in forming the direct product A*A. In the present algorithms all computations are made only with A and never with the accumulated product A*A. This is particularly important from a computational viewpoint if A is highly ill-conditioned.

3.1. Method of steepest descent

For the method of steepest descent, we start with an initial guess X₀ for the eigenvector of A*A. We then normalize it,

Y_k = X_k/(X_k; X_k)^(1/2), (70)

and compute

λ_k = (AY_k; AY_k), (71)
R_k = λ_k Y_k − A*A Y_k, (72)
t_k = [−g_k ± (g_k² + 4 h_k^(3/2))^(1/2)] / (2 h_k), (73)
g_k = (AR_k; AR_k) − λ_k ‖R_k‖², (74)
h_k = ‖R_k‖⁴, (75)
X_{k+1} = Y_k + t_k R_k, (76)
Y_{k+1} = X_{k+1}/(X_{k+1}; X_{k+1})^(1/2). (77)

Again, the plus or minus sign in (73) is chosen depending on whether the smaller or the larger eigenvalues of A*A are desired.

3.2. Method of conjugate gradient

For the conjugate gradient method, one starts with the initial guess X₀ and then computes

Y_k = X_k/(X_k; X_k)^(1/2), (78)
λ_k = (AY_k; AY_k), (79)
R_k = λ_k Y_k − A*A Y_k, (80)
P₀ = R₀, (81)
a_k = (AP_k; AY_k), (82)
b_k = (AP_k; AP_k) = ‖AP_k‖², (83)
c_k = (P_k; Y_k), (84)
d_k = (P_k; P_k) = ‖P_k‖², (85)
t_k = [−f_k ± (f_k² − 4 h_k g_k)^(1/2)] / (2 h_k), (86)
f_k = b_k − λ_k d_k + ā_k c_k − c̄_k a_k, (87)
h_k = b_k c̄_k − ā_k d_k, (88)
g_k = ā_k − c̄_k λ_k, (89)
X_{k+1} = Y_k + t_k P_k, (90)
P_{k+1} = R_{k+1} + l_k P_k, (91)
l_k = −(R_{k+1}; R_{k+1}/(‖R_k‖ ‖X_{k+1}‖) − T_k) / (P_k; R_{k+1}/(‖R_k‖ ‖X_{k+1}‖) − T_k), (92)
T_{k+1} = R_{k+1}/(R_{k+1}; R_{k+1})^(1/2). (93)

4. Computation of prolate spheroidal functions

Both the method of steepest descent and the method of conjugate gradient are applied to the computation of the prolate spheroidal functions. The prolate spheroidal functions are a good test case because of the tremendous dynamic range of the eigenvalues. The prolate spheroidal functions are the eigenfunctions of the following integral equation [7]:

∫_{−1}^{+1} [sin c(t − x) / (π(t − x))] ψ(x) dx = λ ψ(t), (94)

where c is a parameter, λ is the eigenvalue, and ψ(x) are the prolate spheroidal functions. If we restrict our attention to |t| ≤ 1, then the operator

A X(t) ≜ ∫_{−1}^{+1} [sin c(t − u) / (π(t − u))] X(u) du (95)

is self-adjoint and positive definite, since

(AX; X) = ∫_{−1}^{+1} X̄(t) dt ∫_{−1}^{+1} [sin c(t − u) / (π(t − u))] X(u) du
        = ∫_{−1}^{+1} X̄(t) dt (1/2π) ∫_{−∞}^{+∞} X(ω) P_c(ω) e^{jωt} dω, (96)

where

P_c(ω) = 1 for |ω| ≤ c, and 0 otherwise. (97)

Hence

(AX; X) = (1/2π) ∫_{−∞}^{+∞} |X(ω)|² P_c(ω) dω > 0. (98)

Therefore we can utilize the equations developed in Section 2 for the computation of the prolate spheroidal functions. For computational purposes, (94) can be rewritten as

λ ψ(c, t) = Σ_{i=1}^{N} w_i [sin c(t − t_i) / (π(t − t_i))] ψ(c, t_i). (99)

In (99) the integral has been replaced by a summation. This is numerically achieved by the Gauss-Legendre quadrature formula [5], i.e.,

∫_{−1}^{+1} f(x) dx ≈ Σ_{i=1}^{N} w_i f(x_i). (100)

Here we have utilized a double-precision 256-point (N = 256) Gauss-Legendre formula in the evaluation of (95). Out of all numerical integration routines, the Gauss-Legendre formula has the least truncation error for a given number of samples of the function. The w_i in (100) are the weights of the Gauss-Legendre formula, and the x_i are the abscissa points where the function f(x_i) is evaluated. If we now define our eigenvector X as

X^T = [x_1, x_2, ..., x_N], (101)

where

x_i = √(w_i) ψ(c, t_i), i = 1, 2, ..., N, (102)

then our computed eigenvector X is a scaled version of the eigenvector ψ that needs to be computed. The operator equation (94) is now transformed to a matrix equation AX = λX, where the elements a_ij of A, a symmetric N × N matrix, are given by

a_ij = √(w_i w_j) [sin c(t_i − t_j) / (π(t_i − t_j))]. (103)

The values t_i are the quadrature abscissas given by the Gauss-Legendre quadrature formula.
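The discretization (99)-(103) is straightforward to reproduce. Below is a sketch in Python with NumPy, using `numpy.polynomial.legendre.leggauss` for the Gauss-Legendre nodes and weights and a smaller N = 64 than the paper's 256; a dense eigensolver stands in for the iterative methods here, purely to cross-check the matrix construction.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def prolate_matrix(c, N):
    """Symmetric matrix of eq. (103):
    a_ij = sqrt(w_i w_j) sin c(t_i - t_j) / (pi (t_i - t_j))."""
    t, w = leggauss(N)                       # abscissas t_i and weights w_i on [-1, 1]
    dt = t[:, None] - t[None, :]
    kernel = np.empty_like(dt)
    off = dt != 0
    kernel[off] = np.sin(c * dt[off]) / (np.pi * dt[off])
    kernel[~off] = c / np.pi                 # limit of sin(c x)/(pi x) as x -> 0
    return np.sqrt(w[:, None] * w[None, :]) * kernel

A = prolate_matrix(c=1.0, N=64)
eigs = np.sort(np.linalg.eigvalsh(A))[::-1]  # eigenvalues in descending order
```

For c = 1.0 the leading eigenvalues agree with the c = 1.0 column of Tables 1 and 2, e.g. λ₀ ≈ 5.7258 × 10⁻¹ and λ₁ ≈ 6.2791 × 10⁻².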
5. Numerical results

There are various direct methods, such as EIGRF in the IMSL library [3], to solve for the eigenvalues and eigenvectors of a real symmetric matrix. We deal with a matrix with real eigenvectors and real eigenvalues, even though the expressions derived in this paper are valid for Hermitian matrices. It has been observed by Panayirci and Tugbay [6] that such direct methods often fail to provide accurate computational values when large dynamic ranges exist for the eigenvalues. In their case, the direct methods failed to provide accurate eigenvalues. This is the reason we apply iterative methods. The problem with direct methods is that roundoff and truncation errors propagate with the computation, whereas in an iterative method, roundoff and truncation errors are limited only to the last stage of the iteration.

We now apply the method of steepest descent and the conjugate gradient to find the first few eigenvalues of the prolate spheroidal functions. For all the numerical computations, the initial starting value was [1, −1, 1, −1, ...]^T. In Tables 1 and 2 the eigenvalues that have been computed by the method of steepest descent and by the conjugate gradient are given. The values have been computed for five different values of c, as defined in (94). The first column n describes the order of the eigenvalues for a particular value of c, p denotes the magnitude of the (negative) exponent, and m denotes the number of iterations required to obtain the eigenvalue up to eight significant digits of accuracy. For example, for c = 0.5, the fifth eigenvalue λ₄ is 7.2712... × 10⁻¹¹. This eigenvalue has been
Table 1
Steepest descent: values of λ_n(c) = L_n(c) × 10^(−p); m = number of iterations

     c = 0.5               c = 1.0              c = 2.0              c = 4.0              c = 8.0
n    L            p   m    L            p  m    L            p  m    L            p  m    L            p  m
0    3.09689557    1  2    5.72581781   1  3    8.80559922   1  3    9.95885478   1  4    9.99543350   1  60
1    8.58107375    3  3    6.27912741   2  3    3.55640624   1  4    9.12107424   1  4
2    3.92745344    5  2    1.23747933   3  2    3.58676877   2  3    5.19054838   1  3
3    7.21138704    8  2    9.20097705   6  3    1.15011812   3  2    1.10210986   1  4
4    7.27127396 a 11  2    3.7192856    8  2    3.81276001 a 4  6    8.82778435 a 3  2

a The value is close to that shown in table 1 of [2].

Table 2
Conjugate gradient: values of λ_n(c) = L_n(c) × 10^(−p); m = number of iterations

     c = 0.5               c = 1.0              c = 2.0              c = 4.0              c = 8.0
n    L            p   m    L            p  m    L            p  m    L            p  m    L            p  m
0    3.0989557     1  2    5.72581      1  3    8.80559922   1  3    9.95885443   1  4    9.99151428   1  60
1    8.58107375    3  3    6.27912739   2  4    3.52058662   1  4    9.12107360   1  4
2    3.91745344    5  2    1.23747933   3  2    3.58676877   2  3    5.19054838   1  3
3    7.21132597 a  8  3    9.20097701   6  3                         1.10210986   1  4
4    7.27127391 a 11  2    3.71785657   8  2                         8.82772934 a 3  2

a The value is close to that shown in table 1 of [2].
obtained after two iterations. We start with an initial guess of [1, −1, ...] and find λ₀ in two iterations. Then the iteration is restarted with an initial guess which is orthogonal to the eigenvector of λ₀, and in another two iterations we obtain λ₁ as 8.58 × 10⁻³. To find λ₂, the initial guess was taken orthogonal to the eigenvectors computed for both λ₀ and λ₁, as eigenvectors corresponding to different eigenvalues are orthogonal. This procedure has been repeated for both methods. Therefore the iterative methods are computationally accurate and efficient if only a few large/small eigenvalues and eigenvectors are desired. Also, in an iterative method one can obtain the eigenvectors with a prespecified degree of accuracy.
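The restart strategy described above, orthogonalizing each new initial guess against the eigenvectors already found, can be sketched as follows. The helper names are hypothetical; plain power iteration stands in for the paper's steepest-descent/conjugate-gradient solver, and the explicit deflation of A is an extra numerical safeguard for this simple stand-in, not part of the paper's procedure.

```python
import numpy as np

def largest_eig(A, x0, iters=3000):
    """Stand-in extreme-eigenpair solver (plain power iteration).
    In the paper, the steepest-descent or conjugate-gradient
    iteration of Section 2 plays this role."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ A @ x, x

def first_k_eigs(A, k):
    """Largest k eigenpairs via restarts, each initial guess kept
    orthogonal to the eigenvectors already found."""
    n = A.shape[0]
    found = []
    for _ in range(k):
        x0 = np.array([(-1.0) ** i for i in range(n)])   # [1, -1, 1, -1, ...]
        for _, v in found:                 # orthogonalize the restart guess
            x0 = x0 - (v @ x0) * v
        # Deflate A explicitly so roundoff cannot re-introduce old modes
        # into this simple stand-in solver.
        Ad = A - sum(lam * np.outer(v, v) for lam, v in found)
        found.append(largest_eig(Ad, x0))
    return found

# Illustrative symmetric test matrix with known eigenvalues 5, 4, 3, 2, 1.
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ np.diag([5.0, 4.0, 3.0, 2.0, 1.0]) @ Q.T
pairs = first_k_eigs(A, 3)
```

The same restart scheme works unchanged with the smallest-eigenvalue variant of the iterations (plus sign in (45) or (73)).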
6. Conclusion

The method of steepest descent and the conjugate gradient are used to compute the eigenvalues and eigenvectors of a Hermitian operator. When applied to the computation of the prolate spheroidal functions, both methods turn out to be quite accurate and efficient. The method of steepest descent is much simpler than the method of conjugate gradient. However, the reward for the additional work in the conjugate gradient method is a faster rate of convergence, although this point has not been illustrated in this paper. The greatest asset of an iterative method over a direct method is that the solution can be obtained with a prespecified degree of accuracy. In this paper it has been shown that the iterative methods are also efficient when only a few large/small eigenvalues of the operator are desired.
References

[1] H. Chen, T.K. Sarkar, J. Brule and S.A. Dianat, "Adaptive spectral estimation by the conjugate gradient method", IEEE Trans. Acoust. Speech Signal Process., Vol. 34, April 1986, pp. 272-284.
[2] G.H. Golub and C.F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, 1983.
[3] International Mathematical and Statistical Libraries, Inc., Houston, Texas, 1984.
[4] L.V. Kantorovich, "Functional analysis and applied mathematics", National Bureau of Standards Report, NBS project 1101-10-5100, NBS report 1509, March 1952 (first published in Usp. Mat. Nauk, Vol. III, No. 6, pp. 89-185, 1948).
[5] V.I. Krylov, Approximate Calculation of Integrals, translated by A.H. Stroud, The Macmillan Company, New York, 1962.
[6] E. Panayirci and N. Tugbay, "Optimum design of finite duration Nyquist signals", Signal Processing, Vol. 7, 1984, pp. 57-64.
[7] D. Slepian and H.O. Pollak, "Prolate spheroidal wave functions, Fourier analysis and uncertainty - I", Bell Syst. Tech. J., Vol. 40, No. 1, January 1961, pp. 43-63.