- Email: [email protected]

ELSEVIER

Modified explicitly restarted Lanczos algorithm G.E Zhang z Max-Planck-lnstimt fiir Physik komplexer Systeme. Noethnitzer Str. 38. D-O1187 Dresden. Germany

Received 21 October 1997

Abstract We propose a modified explicitly restarted Lanczos algorithm to calculate the eigenvalues and eigenvectors of large sparse matrices. There are at least four interesting features of this algorithm: ( I ) it can calculate multiplicities of matrices; (2) the convergence is rapid, especially for those hardly converged eigenstates. Thus the threshold can be set high, up to 10-t~; (3) the implementation is extremely simple; (4) the required memory space is small. This new algorithm has been extensively tested and used in extended Hubbard chains. © 1998 Elsevier Science B.V. PACS: 02.60.Dc; 02.70.Rw; 02.70.-c; 71.10.F Keywords: Modified Lanczos method, Tests in Hubbard model

1. Introduction One of the current principal thrusts of computational physics and chemistry is to calculate the eigenvectors and eigenvalues of very large matrices using reliable methods. There are many traditional algorithms to do it. The Lanczos method [ 1,21 is one of the commonly used methods and is well-suited to calculate the extreme eigenvalues and eigenvectors. Traditionally this method cannot treat muitiplici ty (for a review, see Cullum and Willoughby [3] ), but a modified Lanczos, such as the block Lanczos algorithm, at least in principle, can calculate all the eigenvalues and eigenvectors. In practice, however, the efficiency of the method diminishes due to numerical instabilities. As Weikert noted [4], as the iteration proceeds, the Lanczos vectors tend to lose their orthogonality and the algorithm starts to produce either copies of already converged i Present address: Max-Planck-lnstitutfiJr Mikrostrukturphysik, Weinherg2, D-06120 Halle. Germany.E-mail: [email protected]

eigenvalues or some spurious states or both. Another drawback of the traditional Lanczos is that the convergence is slow. This will become more serious in the block Lanczos algorithm. Another popular method is the Davidson algorithm [5], which has been widely used in quantum chemistry [6]. The algorithm can calculate several eigenvalues and eigenvectors simultaneously and has no difficulty to compute multiplicities. It builds a large Krylov matrix with the dimension m (here m should not be less than k if the kth eigenstate is desired). To have a fast convergence, a sufficient large Krylov matrix is needed 2 . In real calculations, the Davidson algorithm converges much faster than the Lanczos algorithm but the used memory is increased. Two implicit assumptions that underlie the success of 2 Normally,the largerthe Krylovmatrix's dimensionis, the faster the convergenceis. However, due to the limitations in computer memory,the rnatrix'sdimensioncannot he very large. This is why the tolal numberof eigenstates is generally I~s than 50 in the Davidson algorithm..see Ref. 151.

0010-4655/98/$19.00 (~) 1998 Elsevier Science B.V. All rights reserved. PII S0010-4655 (97) 00136-7

28

G.P. Zhang/Computer Physics Communicatimzs 109 (1998) 27-33

the Davidson algorithm are: ( I ) one has a sufficient large memory space available to accommodate those working vectors3 ; (2) the user's desired physical or chemical quantities are not memory intensive. Due to the memory limitation, in real applications, such as configuration interaction, a typical threshold around 10-4-10 -6 can be reached [6] although naturally the convergence depends on the difference between two successive levels. Traditionally, the Lanczos and Davidson methods calculate the eigenvalues and eigenvectors of large matrices independently. Carefully comparing those two methods, one may find that there is a significant difference. In the Davidson algorithm, the number of the basis states {qj}, which form the Krylov space for the original Hamiltonian, is not fixed. One or more basis states which are calculated from the residuals, after their orthonormalization to the previous states, will be added to the original basis set. Thus the dimension is increased step by step. Once the dimension exceeds a certain large dimension, then one reduces the dimension to the original dimension by reconstructing initial bases. It should be noted that the change of the dimension plays an important role in the convergence of eigenstates as one will see below. Although there are several different versions of the Lanczos scheme, a commonly used one is as follows: Suppose H to be a real symmetric matrix. Hereafter all the matrices discussed are assumed to be real symmetric, but nothing in our algorithm relies on this assumption. The algorithm starts with an initial (normalized) vector ql, which is practically randomly selected in order to ensure a nonzero projection on each eigenvector of H. Then the Krylov subspace is spanned by K r = { q j , H q l . . . . . H r q l }. The algorithm builds an orthonormal basis {ql, q2 . . . . . q,}, leading to a tridiagonal matrix T *~> through the three-term iteration T/.i+lqi~ I = Hqi - Ti.iqi - Ti,i.-iqi-i ,

( I)

where Ti.i = q i t H q i and T/.i+l = T/*~l.i = q i ÷ j t H q i . Diagonalize T (r) and select the relevant eigenval-.es of 3 If one has an efficient I/O on his machine, one can also store all the eigenvectors and auxiliary vectors on the disk. Then the overall performance will partly depend on one specific computer. For one certain dimension,high-lyingeigenstates mostlyconverge slower than those low-lyingeigenstates since for high-lyingstates lh¢ effective auxiliary space becomes small.

T (r) as approximations to eigenvalues of the given ma-

trix H. To have a good accuracy, one always uses the above eigenvector of T (") as an initial trial eigenvector for the next iteration. Thus this is an explicitly restarted Lanczos (ERL) method. The method updates the initial Lanczos vector through an iterative scheme. One feature for this method is that the dimension r of T <') can be any number starting from 2, but r is fixed practically. As we will show below, this fixed dimension renders the Lanczos method many drawbacks, such as a poor convergence. The purpose of this paper is to find an optimized path to accelerate the convergence. There are several restraints on our intplementations. ( i ) The algorithm accelerates the convergence, but should not increase memory requirements. (2) The algorithm should not build a big Krylov space. (3) Numerically it should be stable ar:,d precise. (4) The algorithm should be easily implemented. With those restraints in mind, we modify the above ERL method by properly adopting the Davidson algorithm's idea. Thus we call this method a modified explicitly restarted Lanczos (MERL) method. Our study shows that one is able to obtain an optimized path if one properly changes the Krylov dimensior Numerical experiments have shown that ( ! ) the MERL method can calculate multiplicities of matrices; (2) the convergent threshold is high, up to 10-14; (3) the convergence is rapid, especially lor those hardly converged'eigenstates; (4) only a very small memory space is needed for constructing the Krylov space. We emphasize here that the MERL scheme has been extensively tested and used in Ref. [7], where more than 1000 large matrices have been diagonalized. The efficiency of this new method has been demonstrated clearly there. It is our brief that any new methods should be carefully and extensively tested before their publications. This article is arranged as follows. In Section 2 we describe technical details of the MERL algorithm. Section 3 is devoted to numerical tests and discussions, while a conclusion is presented in Section 4.

2. Modified explicitly restarted Lanczos method Our modified algorithm is as follows. Note we tbrm the next vector qi÷~ by

G.P Zhang/Cot,puter Physics Communications 109 (1998) 27-33

qi+l = Hqi -

~

Hj.iqj,

(2)

j= 1,2,...,i

rather than use the three-term iteration ( 1). Here Hj.i = qJHqi. The reason is that the three-term iteration only ensures the orthogonality in infinite precision arithmetic (for more details, see Ref. [ 3 ] ) . The following ql does not necessarily refer to the lowest eigenstate. (A) Choose an orthonormalized trial vector ql, and lbrm a standard Krylov space Kr{ql, Hql . . . . . Hrql } with the dimension r initially chosen as three or tour. (B) Build an orthonormal basis {ql,q2 . . . . . qr} by the Gram-Schmidt orthonormalization scheme several times (typically two or three times) to ensure the overlap between themselves and all the previously converged eigenvectors smaller than 10-15-10 -16 . (C) Form T/.j = q~Hqj, and diagonalize it to get eigenvectors { z ( I ) , z ( 2 ) . . . . . z ( r ) } and approximate eigenvalues {e( 1 ), e(2) . . . . . e ( r ) }. (D) Construct a new ~l from { z ( l ) , z ( 2 ) . . . . . z ( r ) } [5]. (E) Iterate ( A ) - ( D ) a few times (typically four or five times) to get a rough convergence. (F) If the convergence is below a desired threshold, use ~t to build a new Krylov space but with the dimension increased to r -I- I. (G) Check whether r is larger than one tolerant dimension, for instance, six. If so, truncate the dimension r to the initial dimension rather than increase r. (H) Iterate to ( A ) . From Step ( A ) to (H), one may notice that the dimension of the Krylov space is restricted as small as six. This is a big :advantage as one does not need to build a large matrix which consumes a lot of CPU time. Meanwhile the memory requirement becomes small. Step (B) ensures a good orthogonality. Although there are many schemes which claim no need of orthogonalization, we think that for a real scientific calculation such as [7], the reliability is most important, especially if one has no idea about matrices beforehand. With a good orthogonality, the MERL algorithm can calculate multiplicities of eigenvalues with the threshold high up to lO-14, and avoid spurious

eigenvalues. However, one must sacrifice some CPU ;.ime, but actually this ;.~quickly compensated by a fast convergence. Steps ( C ) - ( E ) are the traditional steps. Numerical experiments show that for one certain dimension, the rapid convergence mostly occurs at the first few steps of the Lanczos iteration and then slows down almost exponentially. This observation leads to Step ( F ) . After primary sweeps over ( A ) - ( E ) , the dimension of the Kryiov matrix is increased by one. Alternatively, one can increase the dimension by two or more. With the variation of the dimension, the algorithm avoids being trapped in a slowly convergent path and ensures that one is always in a fast convergent channel. Alternatively, one may also decrease rather than increase the dimension of the Krylov space (as we do below). The essence of it is to jump out of those slow convergent trapping areas. In Step (E) we suggest that one iterates typically five times, but one may also set a tolerance as to when one sh:~uid increase or decrease the dimension. Step (G) is devised to avoid building a too large Lanczos matrix in order to reduce the total CPU time and the memory requirement.

3. Numerical tests and discussions Numerical tests have been performed in onedimensional extended Hubbard chains. The matrices are given by

-- - t ~ ( e L . j , . , r

+ h.c.)

i.~'r

+u ~,,,T,,. + v Z " " + ' i

(3)

i

by choosing different U and V. U and V are the onsite and inter-site electron interactions, respectively. All the operators have their common meanings.

3.1. Multiplicities of matrices We would like to demonstrate the capability of computing multiplicities. A conventional single vector Lanczos scheme cannot treat multiplicities efficiently. We have tested many matrices with high degeneracies. Here we give an example in a Hubbard chain

3o

G.P. Zhang/Computer Physics Communications 109 (1998) 27-33

with length N = 4, and Hubbard U/t = 4, V/t = 0 with an open boundary condition. The size of the whole Hilbert space is 44 = 256. Here we have not employed any symmetry to biock-diagonalize th" Hamiltonian. Carefully comparing the eigenvalucs obtained by the MERL scheme with those exact data, we find that the degenerate levels are correctly reproduced. For example, levels 4 - 6 are triplets. The eigenvalues (in units of t) are -2.23606797749979, -2.23606797749978, -2.23606797749978. Compared with the results from an exact diagonalization, -2.23606797749979, only the last digit is found to be different. High levels, such as levels I11-116, are six-fold degenerate and the exact eigenvalues are 3.38196601125011 while the eigenvalues obtained by the MERL scheme are also 3.38196601125011. We have not found any spurious eigenvalues during the calculation.

3.2. Convergent tests for the MERL algorithm To converge hardly-converged levels is an interesting and challenging problem. In the following, we firstly provide several comparisons with the ERL algorithm to demonstrate the efficiency of the MERL scheme while comparisons with the Davidson algorithm are presented at the end of this subsection. We note that ( l ) for those easily-converged levels, our new method does not show a big improvement over the ERL scheme though it really accelerates the convergence; (2) all the initial trial eigenvectors are chosen randomly, but are exactly the same for the MERL and ERL algorithms; (3) for the ERL scheme, the Krylov dimension is fixed to 6 while for the MERL algorithm the dimension is changed from 6 to 2. Thus one iteration over Steps ( A ) - ( D ) for the MERL scheme will not take a longer time than that for the ERL scheme. First we. calculate an eight-site Hubbard chain with U/t = 4, l~rt = 2. The half-filling case and open boundary condilion are used. If one is restricted to the subspace of Sz = O, the Hamiltonian dimension is 4900. In Fig. l, we illustrate how the convergence develLr~ for a level with the eigenvalue 8.326. From the figure, one may notice that for the first 20 iterations, th~ MERL and ERL algorithms converge roughly at the same speed 4 . After 24 iterations, a step-function4 Though

the ERL scheme converges slightly faster, the total

8.386

8.366 uJ

8.346

8.326 0

10

20

.... 30 40 50 60 Iterations

70

Fig. 1. Convergenttests for the MERL and ERL algorithms. The dimensionof the matrix is 4900. like jump occurs for the MERL scheme due to the change of the Krylov dimension from 6 to 2 5. Thus the algorithm leaves the slow convergent channel for a fast channel. After another 24 iterations, the algorithm goes to another fast convergent channel. For the ERL algorithm, the convergence is fast at the first 20 steps, but then almost exponentially slows down. This is quite common in the Lanczos scheme. There is no sharp jump. The algorithm is trapped in a slow convergent channel. For those easily-converged levels, this may not be a serious problem, but for those hardlyconverged levels, this becomes a major hindrance. We provide another example below. We use the same parameters as before but set V/t = O. We calculate a eigenstate with the eigenvalue -2.68301733 by both the MERL and ERL algorithms. In Fig. 2a, we plot the results up to 200 iterations. For the ERL scheme, the convergence becomes slower after about 150 iterations. It takes about 500 iterations to reach the energy error A = E - Eexact = 10 -7 (see Fig. 2b). From A = 10 -7 to 10 -s, one needs another 500 iterations. To reach the precision up to 10 -14, the total iterations are over 1500. However, if we use the MERL scheme, to reach the same precision, the number of the total iterations is less than 200, which is about seven times faster than that in the ERL scheme. The approximate eigenvalue at the 50th iteration is already close to that at the 200th iteration by the ERL cPu spent is roughly the same due to the fact that the CPU time needed per iteration is longer in the ERL scheme. 5 We changethe Krylovdimensionback to the originaldimension after a few iterations (typically four or five). Such big dimension change occurs per 4x6 iterationsas we haveexplainedin Section2.

G.P. Zhang/Computer Physics Communications 109 (1998) 27-33

31

6e-06 -2.677 4e-06 "------~ -2.681

50

150

200 Oe 00

/

+

, 2e-07

100

(b)

scheme. Such sharp convergence is significant not only in the first few iterations, but also in the later iterations 6. Fig. 3 shows some later iterations. The convergent path still behaves like a step function (see Fig. 3a). A steep j u m p occurs when the Krylov dimension changes. We also display a few very last iterations in Fig. 3b, where a step-function-like convergence is again obvious. Such nice feature has greatly helped us to accelerate the calculations in Ref. [7], where we have diagonalized more than 1000 large matrices. The matrix dimension is nearly 3 x 104 and the number of the nonzero elements is nearly 2 x 106. The threshold has been set high up to 10 -t4. Here we present two examples to demonstrate the efficiency of the M E R L scheme for larger matrices. We choose U / t = 2, V/t = 1.04, N = I0 7. The periodic boundary condition is employed. The matrix dimension is 27106. The number of nonzero elements is 1912306. We consider a level with the eigenvalue °This rapid convergence in la~er iterations is very important in real calculations. However, traditionally it is difficult to have such nice properties. In our new scheme, we enjoy this rapid convergence both at the beginning and end of iterations. 7 We use the density matrix renormalization group to truncate the Hilhert space by keeping 160 states. The actual V,., where the phase transition from spin density wave to charge density wave occurs in the extended Hubbard model, is slightly larger than U/2. We choose V/t = 1.04 for U/t = 2. For details, see 171.

60

i

80

6e-o9

1000 1500 2000 Iterations Fig. 2. Numerical tests of convergenees in the MERL and ERL schemes. (a) A step-function-like rapid convergenceis achieved in the MERL scheme. (b) A slow convergence for the ERL scheme for the later iterations can be noted.

(a)

100

120

8e-09

~

0e+00 500

MERq

L_

2e-06 -2.685 - 0 4e-07

I--

I--

MERq

(b)

4e-09 ~' 2e-09

~

0e+00 120

, 1,$0

160 180 200 Iterations Fig. 3. The same matrix is used as that in Fig. 2. but more detailed information about the convergenceis shown for the MERL scheme. (a) For iterations from 60 to 120. the convergencebehaves like a step function. (b) For some very last ilemtions, the convergence still behaves like a step function. of 1.7042687. The results are plotted in Fig. 4. One can see that after the first variation of the Krylov dimension, the convergent path sharply changes from the line above !.711 to the line which is close to the final expected value. After nearly 80 iterations, the error A comes down to 10 - N . For the ERL scheme, the convergence is very slow. After a first few iterations, the convergent path is trapped in a slow convergent channel. The number of the total iterations is over 600 for the threshold of 10 -~4. Fig. 5 shows the results for another large matrix, where U/t = 6, V/t = 2, N = 10. The matrix dimension is 28040. The number of nonzero elements is 2018692. From the figure one can see that after the second variation of the Krylov dimension, the error A is sharply reduced from 10 -6 to 10 -9 . We also illustrate several last iterations in the inset, where one can also find a very steep convergence. The number of the total iterations is about 120. For the ERL scheme, however, the convergence is very poor. From the 20th to 160th iteration, the change of A is rather tiny, which indicates a slow convergence ahead. Finally we present comparisons with the Davidson algorithm. The MERL scheme is very similar to the Davidson algorithm in the sense that the Krylov di-

32

G.P Zhang/Computer Physics Commnnk'ations 109 (1998) 27-33 8e-04 i

6e-04

,d

"-.

1.707

4e-04

I

Daviclson I

0

20

40 60 Iterations

80

0e+00 0

--~-

--

4e-06

,e.,0 ;o !0,20,0 4'0

80

120

100 200 Iterations

300

Fig. 6. A similar steep convergence in the Davidson method is noted. The Krylov matrix dimension is 30. In the MERL scheme. the largest Krylov dimension is 6. The matrix dimension is 4900. eigenstates than the Davidson algorithm. For the case that the K r y l o v dimension is six, the Davidson method can calculate at most six eigenstates. ( 2 ) For a similar fast convergence, the M E R L scheme uses a ,v,uch

8e-06

I-

"~

100

Fig. 4. For a larger matrix with the dimension of 27106, and 1912306 nonzero e"ments, the MERE scheme converges much faster than the EI~L algorithm.

<

-.

"-. 2e-04

0e+00 0

I- MEFIL

~'~

-.

1.711

1.703

'

160

Iterations Fig. 5. For a matrix with the dimension of 28040 and 2108692

nonzero elements, a faster convergence is observed for the MERE algorithm while the ERE scheme is quickly trapped in a slow convergent channel after first a few iterations, that: for the very last iterations, the MERE algorithm still shows a robust rapid convergence. mension is changed after a few iterations. Thus, the convergent behavior is very similar to each other. In Fig. 6, we illustrate a typical example. The matrix dimension is 4900. We consider one eigenstate with the eigenvalue - 2 . 8 7 5 9 7 . The Krylov dimension for the Davidson scheme is 30. From Fig. 6, one can see that the Davidson algorithm converges like a step function too. The M E R L scheme actually absorbs this nice feature, but uses a much smaller Krylov space, which needs a very small memory space for those auxiliary vectors. This is a big advantage for real calculations, especially for those calculations where the computer memory is a major hindrance. In comparison to the Davidson scheme, there are several new features in the M E R L algorithm. ( 1 ) With the above small Krylov dimension, the M E R L method is able to calculate more

smaller memory space than the Davidson method uses. The convergence for the Davidson algorithm greatly relies on the Krylov dimension. For a small Krylov dimension such as six, the convergence is very slow. We present an example here but use a larger Krylov dimension for the Davidson method while for the M E R L scheme, the Krylov dimension is exactly the same as ~ f o r e . The matrix dimension is 10092. The number of the nonzero elements is 503206. The M E R L algorithm takes 1206 seconds s to converge 15 eigenstates. To obtain 15 eigenstates by the Davidson scheme, we enlarge the Krylov dimension up to 20. It takes 305 I seconds to converge 15 eigenstates, which is nearly 3 times slower than the M E R L scheme. If we further enlarge the Krylov dimension up to 30, 1453 seconds are needed to converge those eigenstates, which is compatible to the time needed in the M E R L scheme. If ,,le had no memory limitation and were able to furtaer increase the Krylov dimension, then tl'e perforrlance of the Davidson algorithm would be better than that of the M E R L scheme. However, normally the available memory space is finite, which inhibits one from building a large matrix. Technically the main reason to use those iterative diagonalization schemes STests were performed on our Dec alpha machine with the clockspeed 600 MHz.

G.P. Zlumg/Computer Physics Communications 109 (1998) 27-33

is to overcome the memory limitation. Those methods would lose their advantages if the memory requirement is still too large 2 . For the above two large matrices in the last two paragraphs, to have a reasonable fast convergence, the necessary auxiliary vectors in the Davidson algorithm immediately exceed our memory limit 9. Thus the M E R L scheme can achieve a similar fast convergence as the Davidson method, but uses a much smaller memory space. This is ideal in many real calculations.

Krylov space, the M E R L scheme normally converges faster. Numerical tests of this modified algorithm have been presented in one-dimensional Hubbard model.

Acknowledgements The author thanks ProL P. Fulde for helpful discussions. He especially thanks Dr. W. Stephan for critical reading the manuscript and many fruitful suggestions. He is grateful to ProL E.R. Davidson to help him with several numerical tests.

4. Conclusion In conclusion, we propose a modified explicitly restarted Lanczos algorithm to calculate the eigenvalues and eigenvectors of large sparse matrices. The performance is better than that of the traditional ERL method while the modification is simple. The convergence is improved greatly, especially for those hardly converged eigenstates. Compared to the Davidson method, with the same small Krylov matrix dimension, the M E R L scheme is able to calculate more eigenstates than the Davidson algorithm. For a similar fast convergence, the memory requirement in the M E R L scheme is much smaller than that in the Davidson method. This is very desirable in real calculations. In some cases where one is not allowed to build a large

9 In Ref. 17 I, we have to save the density matrices for all the sites which already occupy a large memory space. This strongly appeals to an efficient diagonalization scheme with a very small memory requirement.

References I I I C. Lanczos, J. Res. Natl. Bur. Stand. 45 (1950) 255. 121 B.N. Parlett, The Symmetric Eigenvalu¢ Problem (PrenticeHall, Englewood Cliffs, NJ, 1980). 131 J.K. Cullum, R.A. Willoughby,Lanczos Algorithms for Large Symmetric Eigenvalne Computations (Birklg.-iuser, Boston, 1985). [4] H.-G. Weikert, H.-D. Meyer, L.S. Cederbaum. F. Tarantelli, J. Chem. Phys. 104 (1996) 7122. [51 E.R. Davidson, J. Comput. Phys. 17 (1975) 87; E.R. Davidson, Comput. Phys. 7 (1993) 519. [61 C.W. Murry, S.C. Racine. E.R. Davidson, J. Comput. Phys. 103 (1992) 382. 171 G.P. Zhang. Phys. Rev. B 56 (1997) 9189.