A linear algebra biography

A linear algebra biography

Linear Algebra and its Applications 473 (2015) 3–13 Contents lists available at ScienceDirect Linear Algebra and its Applications www.elsevier.com/l...

1MB Sizes 4 Downloads 63 Views

Linear Algebra and its Applications 473 (2015) 3–13

Contents lists available at ScienceDirect

Linear Algebra and its Applications www.elsevier.com/locate/laa


A linear algebra biography Ingram Olkin Stanford University, United States

a r t i c l e

i n f o

Article history: Received 19 December 2014 Accepted 19 December 2014 Available online 24 February 2015 Submitted by P. Semrl MSC: 15-00 01A05 01A70

a b s t r a c t In my talk at the LAA meeting in honor of Hans Schneider, I gave a brief biography of my introduction to linear algebra and my interaction with some of the linear algebraists at that time. It was suggested that I write this up, and thus the following. 2015 Published by Elsevier Inc.

Keywords: Matrix theory Singular value decomposition Eigenvalues Biographies

I entered CCNY (College of the City of New York and later CUNY, City University of New York) in 1941. The war was on and math majors were being recruited by the US Army Air Force to become meteorologists. I enlisted and was sent to MIT. The Army permitted me to sit in on MIT math classes. (Two of the instructors were Norman Levinson and Robert Cameron.) After graduation I was stationed at different airports where I served as the weather officer. I was discharged in 1946 and returned to CCNY, where I took courses with Emil Post (of Post Logic) and Bennington Gill, and graduated with a B.S. degree in 1947.

E-mail address: [email protected] http://dx.doi.org/10.1016/j.laa.2015.01.007 0024-3795/2015 Published by Elsevier Inc.


I. Olkin / Linear Algebra and its Applications 473 (2015) 3–13

My next move was to the newly formed department of statistics at Columbia. The faculty consisted of Abraham Wald, Jacob Wolfowitz, Theodore Anderson, and Howard Levene. My linear algebra connection was in a basic course taught by Howard Levi that was based on Maxime Bôcher’s 1907 book, Introduction to Higher Algebra [4]. There is a very interesting review of this book by Robert Thompson (1984), seventy-five years later. I can’t remember whether Paul Halmos’ Finite-Dimensional Vector Spaces of 1942 [7] was required or just recommended. Harold Hotelling, well known in both economics and statistics, was at Columbia from 1931 to 1946, when he left for Chapel Hill to chair a new department of theoretical statistics. He is the originator of principal components (1933), canonical correlations (1936), and a number of important results in multivariate analysis and matrix calculations (1943). He suggested using powers of matrices to more readily estimate the largest eigenvalue. These papers were important not only for the theory but because they set the tone for mathematical statistics. The methods proposed by Hotelling are currently central in the data mining and machine learning literature. I wanted to study with Hotelling, so applied for admission, was awarded a Rockefeller fellowship, and arrived at UNC in 1948. My first encounter with linear algebra was a course by Alfred Brauer, who was working on Gershgorin disc inequalities and ovals of Cassini. Emilie Haynsworth was in that class and continued her studies with him. Later on Tom Markham was also a student of Brauer; I am sure there were others, but I did not meet them. Brauer was of the old school, totally absorbed with mathematics. He was gentle with his students and they were attracted to him. My wife and I were invited to his house where we met his wife Hilda. After his death I visited her whenever I was in Chapel Hill. Alfred occasionally talked about his brother Richard, who was on the Harvard faculty and very well known. I also met Richard’s son George, who is also a mathematician, at the University of Minnesota. Brauer introduced me to E.T. Browne, another linear algebra faculty member. Browne was not well at the time, so I did not interact much with him. He published several papers from 1928 to 1930 and also a book on linear algebra. One of the papers was on the polar decomposition, another on the interlacing property of eigenvalues; he also showed that maxi Rλi (A) ≤ maxi λi ((A +A∗ )/2) and that |λ1 (A)| ≤ σ1 (A). Here λ1 is the eigenvalue with largest modulus, and σ1 is the largest singular value. Both of these inequalities were later generalized to majorization inequalities. In addition to Hotelling, Pao-Lu Hsu was on the faculty. There were three Chinese mathematicians, all born in the same year, who changed the course of mathematics in China: Pao-Lu Hsu in probability and statistics, Lo-Keng Hua in functional analysis, and Shiing-Shen Chern in geometry. (For a review of Hsu’s contributions see [5].) In 1949 Hsu was on sabbatical leave in China and decided to stay there. He wanted to help China build its mathematical and statistical base. As a result I couldn’t take his course. Hotelling suggested that I study notes taken by students in the previous year and then give a lecture on what I had learned. Hsu’s proofs were elegant examples of the use of linear algebra in statistics. My thesis in 1951 came out of these lectures and was on

I. Olkin / Linear Algebra and its Applications 473 (2015) 3–13


the distribution of the component matrices in matrix factorizations. A key step in the derivation was to obtain the Jacobian of the matrix factorization. I digress for a moment to mention several such factorizations. In what follows all matrices are real. The singular value decomposition is perhaps the oldest factorization, and its use in computations has made it the most famous. The following factorization arose in statistics in 1937 and is called the QR factorization in computer science literature. Suppose X : p × n, p ≤ n, is of rank p. Then X = T G,


where T : p ×p is lower triangular and G : ρ ×n satisfies GG = Ip . The elements of T are called rectangular coordinates in an early paper by Mahalanobis, Bose and Roy [9]. This factorization is also contained in [3]. If the elements of X are independent and standard normally distributed random variables then the lower off-diagonal elements of T are also normally distributed, and the squared diagonal elements have chi-square distributions. Furthermore, if S = T T  , then S has a standard Wishart distribution. A second factorization is a simultaneous reduction of two positive definite matrices. If A and B are p × ρ positive definite matrices, then there exists a nonsingular matrix W such that A = W Ds2 W  ,

B = W Dc2 W  ,


where Ds = diag(s1 , . . . , sp ), Dc = diag(c1 , . . . , cp ), s2i + c2i = 1, and φi = s2i /c2i , i = 1, . . . , n, are the roots of the determinantal equation |A − φB| = 0. Factorization (2) is in the spirit of the eigenvalue decomposition for a single symmetric matrix. Along the lines of the singular-value decomposition, if X : p × n, Y : p × m, p ≤ n, are of rank p, then there exists a nonsingular matrix W , and matrices G : p × n, H : p × m, GG = Ip , HH  = Ip , such that X = W Ds G,

Y = W Dc H,

where φ1 , . . . , φp (defined above) are the roots of the determinantal equation |XX  − φY Y  | = 0. For simplicity of exposition I have assumed full rank, but this condition can be weakened. Hotelling attracted a faculty of stars: Herbert Robbins (of Courant and Robbins), Wassily Hoeffding, Raj Chandra Bose (one of the Euler spoilers; disproved Euler’s conjecture on the existence of two mutually orthogonal Latin squares of order 4n +2), S.N. Roy (Roy and Hotelling were my thesis advisors), William Madow, William Cochran, George Nicholson. He also attracted many visitors, one being Sir Ronald A. Fisher, perhaps the most influential statistician of the last century, and another, P.C. Mahalanobis, then head of the Indian Statistical Institute. Robbins used to invite a few students to his house after a colloquium. I was fortunate to be one of three students invited when John von Neumann gave a lecture. In the


I. Olkin / Linear Algebra and its Applications 473 (2015) 3–13

conversation at dinner he was asked how he decides what to work on. The answer was: The problem should be important, the solution feasible, and it should be enjoyable. His paper [13] on unitarily invariant norms is a gem, and satisfies all three conditions. Linear algebra became central to my research in multivariate analysis. The sources of this connection are described in a paper on the interface between statistics and linear algebra [11]. At Stanford which I joined in 1961, Charles (Karl) Loewner taught a course on matrix analysis. His notes provide a masterful exposition of the ordering of matrices

I. Olkin / Linear Algebra and its Applications 473 (2015) 3–13


which is now known as the Loewner order. For a symmetric (Hermitian) matrix A define A  0 to mean that A is positive definite. For two symmetric (Hermitian) matrices A and B, the Loewner order A  B means that A − B  0. Let A and B have ordered eigenvalues α1 ≥ · · · ≥ αn and β1 ≥ · · · ≥ βn , respectively. A useful result is that if A  B then αi ≥ βi , i = 1, . . . , n, which in turn implies that the respective vectors of eigenvalues α = (α1 , . . . , αn ) and β = (β1 , . . . , βn ) are ordered by weak majorization. (See [10] for more details on majorization.) In the late 1950s and early 1960s Sam Karlin was developing the theory of total positivity (TP) which he later published as a book [8]. Two TP results became central to a number of statistical problems. The first, due to I.J. Schoenberg (for a history of his work, see [10]), is the convolution theorem that if f (x, θ) is totally positive of order m, TP(m), and g(θ, y) is totally positive of order n, TP(n), then  h(x, y) =

f (x, θ)g(θ, y) dμ(θ)


is TP(min(m, n)). This result permits one to show that many noncentral distributions have the TP property. The second result is the variation diminishing property that if f (y) has at most j sign changes, j ≤ m − 1, and k(x, y) is TP(m), then the number of sign changes of  g(x) =

k(x, y)f (y) dμ(y)


is at most j. In addition to Loewner and Karlin in the Math Department, Ted Anderson and Charles Stein were in the Statistics Department. Anderson’s classic book on multivariate analysis had an important influence on the development of that subject, and on me. At an IMS meeting in 1958 Hotelling made a comment about Ted’s book: “I wish it a speedy obsolescence,” by which he meant that the field would progress so rapidly that the material in the book would have to be updated. Indeed, Ted’s book went into a second edition in 1984 and a third in 2003 [2]. Many multivariate procedures are now embedded in data mining methods. Gene Golub joined George Forsythe in the newly founded Computer Science Department. Gene started a numerical linear algebra seminar which served as an international meeting ground. As is well known, Gene had many visitors: Alexander Ostrowski, Jim Wilkinson, Fritz Bauer, Miroslav Fiedler, Leslie Fox, Welvil Kahn, and many others. The Wilkinsons were frequent visitors, and I visited them in England as well. Jim was a wine connoisseur, and stored wine under the sofa which was better temperature-controlled. It was always comical to see him take a bottle from under the sofa. Compound matrices play an important role in total positivity. There are two versions of compound matrices, one multiplicative and the other additive. The book by Aitken [1] is a gem that provides all the necessary tools for the multiplicative version. The kth


I. Olkin / Linear Algebra and its Applications 473 (2015) 3–13

multiplicative compound A(k) of an m × m, 1 ≤ k ≤ m, matrix A with real eigenvalues   m λ1 , . . . , λm is an m k × k matrix whose eigenvalues are all the products λi1 λi2 . . . λik of all k-tuples 1 ≤ i1 < i2 < · · · < ik ≤ m. Wielandt defined an additive compound   m Δk (A) of an m × m matrix A with real eigenvalues λ1 , . . . , λm as an m k × k matrix

I. Olkin / Linear Algebra and its Applications 473 (2015) 3–13


whose eigenvalues are the sums λi1 + · · · + λik of all k-tuples 1 ≤ i1 < · · · < ik ≤ m. (For more details see Chapter 10 in [10].) I asked Wielandt whether he defined the additive compound. He wrote me that he checked his notes and those of Issai Schur and believes that he was the originator. The key point in the use of compounds is that some results for the largest eigenvalue of a matrix A provides a direct extension using A(k) to the product of the k largest eigenvalues. Stanford had a policy that the chair of the Ph.D. orals committee was a member of another department. I often served as chair for theses in numerical analysis, and if my memory is correct that is where I met some of the next generation of numerical analysts: Cleve Moler, Dianne O’Leary, and Beresford Parlett. In addition to computer science, there was a newly formed Department of Operations Research (OR) with Gerald Lieberman as founding chair. Within a short period he created a great department. OR is a broad area. The group with whom I interacted in the area of linear algebra included Dick Cottle, George Dantzig, Curtis Eaves, and Pete Veinott. Cottle, Golub, and I taught a class in which each of us taught our specialized area: mathematical programming, numerical linear algebra, and matrix inequalities, respectively. In an interview [6] I noted that Dick, Gene, and I started to meet on Saturdays to discuss a book project incorporating our respective areas of interest. But the three of us were among the world’s heaviest travelers, and absences became more frequent, so the project ended with a table of contents. It is now too late, but I still believe that such a book would be a good read. Other linear algebraists (in the wide sense) whom I met over the years were: A.C. Aitken (his small 1939 book is packed with interesting material), Leonid Mirsky (I visited him in England and we talked about majorization inequalities), D.S. Mitrinovic (I had a nice visit with him in Belgrade; he was working on analytic inequalities), Marvin Marcus (we intersected on a number of occasions), Robert Thompson (I learned a lot about eigenvalue inequalities from him), Alston Householder (in addition to the Gatlinburg conferences I used to visit the Oak Ridge National Labs and would meet with him), Ed Beckenbach and Richard Bellman (he wanted me to write a book on matrix theory and gave me a recipe: write one page per day and you will have a book in a year. If that were only true for me!), Ky Fan (I think he didn’t get the credit he deserved; some of his work done early on was very elegant.), John Todd and Olga Taussky-Todd (they often visited Stanford), and Alan Hoffman (Alan is special because he knew inequalities, matrix analysis, combinatorics, and jokes which were not always at the level of his mathematics). Another source of much collegial interaction was the International Workshop on Matrices and Statistics (IWMS). The first series of conferences was held in 1990 in Tampere, Finland; over the past 20 years George Styan and Simo Puntanen have been instrumental in keeping these conferences active. (Ted Anderson was the honoree at the 17th IWMS held in Tomar, Portugal, in July of 2008.) These conferences brought together many linear algebraists whose primary area was statistics. My own contribution was a talk entitled “Interface between statistics and linear algebra” [11].


I. Olkin / Linear Algebra and its Applications 473 (2015) 3–13

LAA played an important role in fostering the connection between linear algebra and statistics. I am pleased to have been a co-editor (together with C.R. Rao and

I. Olkin / Linear Algebra and its Applications 473 (2015) 3–13


George Styan) of the first Special Issue on Linear Algebra and Statistics (volumes 67, 70, and 82). Subsequently there have been a total of ten special issues in this series (edited by Simo Puntanen, George Styan, and Hans Joachim Werner). This series served as a catalyst to bring together those working in both camps, linear algebra and statistics. Before closing I do need to acknowledge my indebtedness to my long-time collaborator, Albert W. Marshall. We met by accident: he was a post-doc at Stanford from the University of Washington, and I was on sabbatical leave from Michigan State University. I had just published a paper with John Pratt on a multivariate Chebychev inequality [12], and his thesis was also on a multivariate Chebychev inequality. Our offices were next door to one another (and to George Forsythe before the department of computer science was formed). We talked about inequalities, and over the next 50 years we published two books and over 25 papers. We seemed to complement each other, and the joke by our colleagues at the Boeing Scientific Research Labs where Al worked at one time was that if you wanted complicated integrations you asked Al, and if you wanted complicated differentiations you asked me. How more complementary can you get? I am grateful that chance took such a favorable turn for us to be together. Reminiscing about all of these colleagues, who except for a few are only memories, provided me an opportunity to acknowledge their positive influence on my linear algebra life. Acknowledgments I am indebted to Simo Puntanen for his substantive comments and many historical suggestions and to Richard Cottle, Alan Hoffman, Leslie Hogben, and Roger Horn for their comments on an earlier draft. They all helped to improve this paper. Appendix The Collyer brothers became famous for their compulsive collecting. I am not in their league, but I have saved class notes from courses that I took as a student or auditor. Normally lectures on the first day of classes deal with administrative details. The following are several excerpts from first-day lectures that are substantive. They provide some insight into the personalities of the individuals. In each case there was no chit-chat: each moved directly into the subject. Alfred Brauer, University of North Carolina, 09/24/1948: a course in number theory The key to the theory of numbers is an understanding of the properties of integers. For example, consider the old problem of obtaining a right triangle with integral sides. The usual examples are 3, 4, 5 or perhaps 5, 12, 13. What we are seeking is a solution to x2 + y 2 = z 2 . A scheme which will give such triangles is: take any numbers m, n, let x = m2 − n2 , y = 2mn, z = m2 + n2 . (Note: (m2 − n2 )2 + (m2 n2 ) = (m2 + n2 )2 .) The question remains whether such a system will give all the possible triangles. The answer is in the affirmative, and the proof will be given later.


I. Olkin / Linear Algebra and its Applications 473 (2015) 3–13

Irving Kaplansky, University of Chicago, 1955: rings of operators Even a casual inspection of the literature on rings of operators reveals that the early part of the subject is highly algebraic — say 90 percent. In this course I have set out to make it 100 percent. Such a project was obviously in von Neumann’s mind when he invented continuous geometry. But continuous geometries have two disadvantages: they apply only to finite rings of operators, and it takes quite a while to verify their axioms. A more general lattice project might well be worthwhile. In the meantime, Baer rings are offered as a workable vehicle for an algebraic study of rings of operators. Raj Chandra Bose, University of North Carolina, 1949-51: Graeco-Latin squares Two Latin squares may be said to be orthogonal to each other if, when they are superimposed, every letter of the one square occurs once and only once with every letter of the other. Such a pair of squares (one square being written with Greek letters) may be called a Graeco-Latin square. When p − 1 mutually orthogonal squares of side p exist, then by their superposition we get what may be called a completely orthogonalised or hyper-Graeco-Latin square. The work of Fisher and Yates has shown that such squares are of fundamental importance in experimental design. It is easy to see that for a prime number p, a p-sided hyper-Graeco-Latin square exists. Recently Yates has shown that hyper-Graeco-Latin squares exist also for the cases p = 4, 8, 9. Professor Fisher, during his recent visit to India, in a seminar held under the auspices of the Indian Statistical Institute, made the surmise that it should be possible to construct a completely orthogonalised hyper-Graeco-Latin square for every value of p, which is a prime or a power of a prime. It is the object of this paper to prove that this surmise is correct, by using the properties of Galois fields. It is hoped that the properties of Galois fields, and the finite geometries connected with them, will prove useful in many problems of experimental design and the author hopes to pursue this matter in subsequent papers. Saunders MacLane, University of Chicago, 1955: point-set topology Open sets in metric spaces Topology is the study of continuous transformations of one “space” into another. This requires a suitable notion of a topological space. They are constructed as very general types of objects on which a continuous transformation may be defined. An illustration is provided by the metric spaces M , which are sets of points x, y with a real-valued distance function ρ(x, y) with the properties ρ(x, y) ≥ 0,

and ρ(x, y) = 0 if and only if x = y;

ρ(x, y) = ρ(y, x);

(1.1) (1.2)

ρ(x, z) ≤ ρ(x, y) + ρ(y, z).

(Triangle axiom.)


A transformation f of a metric space M into a second such space N is called continuous if for each positive real and each point x ∈ M there exists a δ > 0 such that ρ(x, y) < δ


  ρ f (x), f (y) < .


I. Olkin / Linear Algebra and its Applications 473 (2015) 3–13


References [1] A. Aitken, Determinants and Matrices, Oliver and Boyd, Edinburgh, 1939. [2] T.W. Anderson, An Introduction to Multivariate Statistical Analysis, 3rd ed., Wiley Series in Probability and Statistics, Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, 2003. [3] M.S. Bartlett, The vector representation of a sample, Math. Proc. Cambridge Philos. Soc. 30 (1934) 327–340, http://dx.doi.org/10.1017/S0305004100012512. [4] M. Bôcher, Introduction to Higher Algebra, The MacMillan Company, New York, 1907. [5] D. Chen, I. Olkin, Pao-Lu Hsu (Xu, Bao-lu): The grandparent of probability and statistics in China, Statist. Sci. 27 (2012) 434–445, http://dx.doi.org/10.1214/12-STS387. [6] A conversation with Ingram Olkin, in: Contributions to Probability and Statistics: Essays in Honor of Ingram Olkin, L.J. Gleser, M.D. Perlman, S.J. Press, A.R. Sampson (Eds.), Springer-Verlag, New York, 1989. [7] P.R. Halmos, Finite-Dimensional Vector Spaces, Annals of Mathematics Studies, vol. 7, Princeton University Press, Princeton, NJ, 1942. [8] S. Karlin, Total Positivity, vol. I, Stanford University Press, Stanford, CA, 1968. [9] P.C. Mahalanobis, R.C. Bose, S.N. Roy, Normalisation of statistical variates and the use of rectana 3 (1937) 1–40. gular co-ordinates in the theory of sampling distributions, Sankhy¯ [10] A.W. Marshall, I. Olkin, B.C. Arnold, Inequalities: Theory of Majorization and Its Applications, 2nd ed., Springer Series in Statistics, Springer, New York, 2011. [11] I. Olkin, Interface between statistics and linear algebra, in: Matrix Theory and Applications, Phoenix, AZ, 1989, in: Proc. Sympos. Appl. Math., vol. 40, Amer. Math. Soc., Providence, RI, 1990, pp. 233–256. [12] I. Olkin, J.W. Pratt, A multivariate Tchebycheff inequality, Ann. Math. Statist. 29 (1958) 226–234. [13] J.von Neumann, Some matrix-inequalities and metrization of matrix-space, Tomsk Univ. Rev. 1 (1937) 286–300.