Least-squares estimation of enzyme parameters

Least-squares estimation of enzyme parameters

Cornput. /Sol. Med. Vol. 21, No. 6. pp. 459-464. Printed m Great Britain 1991 0 001~4825/91 s3.00+ .a0 1991 Pergamon Press plc LEAST-SQUARES ESTIMA...

411KB Sizes 0 Downloads 66 Views

Cornput. /Sol. Med. Vol. 21, No. 6. pp. 459-464. Printed m Great Britain

1991 0

001~4825/91 s3.00+ .a0 1991 Pergamon Press plc

LEAST-SQUARES ESTIMATION OF ENZYME PARAMETERS M. E. JONES and K. TARANSKY The School of Medicine, The Fhnders University of South Australia, Bedford Park, South Australia 5042 (Received 19 October 1990; in revised form 22 March 1991; received for publication 14 August 1991) Abstract-The estimation of the enzyme parameters K, and V,,,,, from initial velocity data, or of analogous parameters in binding or transport experiments may be accomplished by transformation of the data, or by a direct weighted least-squares fit. Although the latter makes better use of the data, the method is complex and may be sensitive to initial parameter estimates. We develop a method which reduces the problem to finding the zero of a continuous function of a single variable. Non-linear least squares

Parameter estimation

1. INTRODUCTION Of the mathematical models used in quantitative biology, the rectangular hyperbola is probably the most common. It models enzyme reactions, the binding of ligand to receptor, and simple uptake mechanisms. Transformation methods are useful for the visual assessment of data, but there is increasing consensus that parameter estimation and the statistical testing of hypotheses about these parameters should be based on a weighted least-squares estimation of the parameters [l, 2, and references therein], The problem falls within the field of non-linear optimization, and a discussion of the methodology often involves the Marquardt-Levenberg algorithm and the Cholesky decompositon of matrices. This is unfortunate, not only because these topics fall outside the immediate interest of the practising biologist, but because they represent a very general approach which is inappropriate for the particular problem of the rectangular hyperbola. We offer an approach suitable for the least-squares estimation of K,,, and V,,,,, and the parameters in similar models. In the statistical analysis of data based on non-linear models, both point estimation and hypothesis testing are complicated by the non-linearity. This non-linearity introduces the necessity of using iterative techniques to estimate the parameters. Such techniques are time-consuming and, when carried out in more than one dimension (i.e. when more than one parameter are being estimated), are not guaranteed to converge on a solution. The present communication was prompted, in part, by an ‘off-the-shelf’ program for estimating Km and V,,,,, which entered an infinite loop of calculations, oscillating about the solution. When parameters have been estimated iteratively, confidence regions for those parameters may be estimated in several ways. In general, an assumption is made that the model is linear in the region of the estimated parameters and the confidence regions will therefore be ellipses defined by the variance and covariance of the parameters. A more accurate approach is to avoid approximations and to derive the probability distributions directly by repeated simulation. Such techniques are very time-consuming and require an algorithm which rarely, if ever, fails to converge. A central tenet of non-linear optimization is that no single technique is best for all problems, and that for any given problem there are data sets that will foil most 459

M. E. JONESand K. TARANSKY

460

techniques. We offer a technique that exploits the simple structure of the rectangular hyperbola. In outline, the method is as follows. We define a function G(k) of the variable k in such a way that G(k) is continuous and must be zero when k is the required weighted leastsquares estimate which we denote R,,,. Because G(k) is a continuous function of k we have only to find a value a for which G(a) is positive, and another, 6 such that G(b) is negative to be assured that G is zero somewhere on the interval (a, b). Although finding the value &, on (a, b) such that G(&J =0 requires an iterative search, such a one-dimensional search is computationally trivial. This is one of the few contexts in which an iterative search can be guaranteed to converge on a solution. Once & is found, the estimate r’,,, is defined algebraically; it requires no iteration. Accordingly, the commonly cited drawbacks of nonlinear multi-dimensional search algorithms such as Marquardt-Levenberg are avoided. There is a need for an initial parameter estimate, and sensitivity of convergence to the accuracy of that initial estimate. Because iteration is confined to finding the zero of a function of one variable, convergence to that zero can be guaranteed once the zero is bracketed. 2. STANDARD

FORMULATION

OF THE

PROBLEM

To provide a concrete example we develop the problem in terms of an experiment designed to evaluate the parameters V,,, and & of an enzyme using initial velocity measurements. Then the standard model relating velocity, V, to substrate concentration [S] is (see, for instance, [3]),

hlax[s1

v=Km+[s]’ in which V,,,,, and K,,, are the parameters to be estimated. The data consists of II pairs (Si, ui), i= 1,2, . . . , n of measurement of velocity Uiwith substrate concentration Si. The aim is to find estimates v,,,,, and &, which, in some sense, minimize the residual differences between the predicted and observed velocities. The weighted sums of squares of residuals, Q,, to be minimized is Q,(k,

V) =

2

Wi

i=l

where an unweighted least-squares estimator has Wi= 1 for all i. Various weightings, including an approximation to minimum chi-square, can be accommodated by updating the values of Uiiteratively, and this is the basis of the iteratively reweighted least squares (IRLS) approach. The emphasis of this communication is in avoiding unnecessary iteration and approximation, however, and we do not advocate IRLS for minimum chisquare. For minimum chi-square the function Q, to be minimized is

(1) a

is

when

readings

smaller

as

often

in 3. The defined,

the

algorithms a estimate

is

general root this

A i.e.

G(k) k = G(k)

Enzyme parameters

461

defined and continuous for k 3 0. Accordingly, G(k) is evaluated in steps from zero, until G(0) and G(k,) have opposite sign. Then a value k,,, exists on the interval [0, kh] such that G(&,J = 0, and k,,, can be found iteratively by any one-dimensional search algorithm on this interval. Binary chop and secant methods (see, for instance [4]) are both adequate. Then ?,,,,, is defined algebraically, given k,,,, and its estimation requires no iteration. It remains to define the function G(k) and the formula for V,,,,. 3.1. Minimum chi-square Write A=&;

and B=Z(vT/sJ.

Then G(k) is defined

and if G(k,,,) = 0, then v,,,,, is given by

(3)

3.2. Weighted least-squares Let

(4) (5) (6) D=c

,y;;;,

.

(7)

G(k) =AD - BC,

(8)

If,,,,, = DIB.

(9

I

Then,

and when G(k) = 0,

4. DISCUSSION This paper does not advocate new estimators of K,,, or V,,,,. It is based on the premise that the direct fitting of untransformed data is desirable and is becoming routine practice. Biologists are increasingly computer-literate, and are capable of writing programs which sum a number of terms, and which find the zeros of the function G(k) by the binary

M. E. JONESand K. TARANSKY

462

chop algorithm. These elementary procedures suffice to obtain weighted least-squares estimates. The function G can be inspected for multiple roots, indicating local minima, and the values of the sum of squares examined to distinguish minima from maxima. Unless the algorithm is used within a Monte-Carlo simulation, it is probably unnecessary to automate the finding of the root of the function G(k). A trial-and-error approach is quite adequate to determine the root to any desired accuracy. A few lines of Pascal code to accomplish this is offered in the Appendix. 5. SUMMARY Models of enzyme, binding, and transport mechanisms often reduce mathematically to the rectangular hyperbola. Estimation of the parameters in these models involves either transformation of the data, or a direct non-linear least-squares fit. The latter method makes more efficient use of the data, but if, as is currently the practice, it is posed as a non-linear optimization process in two dimensions it suffers from four drawbacks. The programming is complex, the algorithm may not converge, it may be sensitive to initial parameter estimates, and it is difficult to distinguish between local and global minima. By reducing the problem to one of finding the roots of a continuous function in the one variable K,,,, the difficulties attendant upon a direct weighted least-squares fit are largely overcome. Acknowledgements-This work was supported by the Clive and Vera Ramaciotti Foundations of Australia, and by the National Health and Medical Research Council of Australia. Dr Jones holds a Visiting Fellowship to the National Centre for Epidemiology and Population Health, The Australian National University, Canberra ACT.

REFERENCES 1. J. A. Zivin and D. R. Waud, How to analyse binding, enzyme and uptake data; the simplest case, a single phase, tife Sci. 30, 1407-1422 (1982). 2. E. Biirgisser, Radioligand-receptor binding studies; what’s wrong with Scatchard analysis? Trends Pharmacol. Sci. 5, 142-144 (1984). 3. A. L. Lehninger, Principles of Biochemistry, pp. 213-215. Worth Publishers, New York (1982). 4. W. H. Press. B. P. Flannerv. S. A. Teukolskv and W. T. Vetterlina. Numerical recioies. The Art of . Scientific Cor&ting. Cambridge University Press, U.K. (1986). 5. W. A. Wood and I. C. Gunsalus, Serine and threonine deaminases of Echerichia coli: activators for a cellfree enzyme, /. Biol. Chem. 181, 171-182 (1949).

About the Author-MICHAEL

E. JONESreceived degrees in Medicine and Mathematics from the University of Adelaide in 1970, and a Ph.D. in Theoretical Biology from the University of Chicago in 1975. He is currently senior lecturer in Anatomy at The Flinders University of South Australia. Dr Jones is a member of the Society for Mathematical Biology and a Fellow of the Faculty of Anaesthetists of the Royal College of Surgeons of England.

About the Author-KAREN

L. TARANSKYreceived the B.Sc. degree with Honours from The University of Adelaide in 1985. She worked on a biomathematics software project at The Flinders University of South Australia before joining Cruikshank Technology where she is now a Project Manager developing software for industrial applications.

APPENDIX Source code The following code represents the crux of a procedure to find the root of G(k) determining &, and p,,. It assumes that the experimental values u, and si are contained in the real arrays V[I] and S[I] respectively, and that NumPoints, an integer variable, is the number of readings. The code is written for minimum chi-square. To convert to weighted least-squares, the six lines coding for the real variables A, B, C, D, G and V,,, will be defined by equations (4)-(9) respectively. Strictly speaking, a more efficient code for minimum chi-square

Enzyme parameters

463

would take advantage of A and B being independent of k, but we forgo that advantage in order to maintain a similar subroutine stucture for both types of estimate. The essential source code is, k:=O.O; write(’ k= 0.0 ‘); { start with k=O ) Repeat A:=O; B:=O; C:=O; D:=O; For I:= 1 to NumPoints do Begin A:=A

+Sqr(V[I]);

B:= B+ Sqr(V[I])lS[I]; c:=c+s[I]/(k+S[I]); D:= D + S[I]ISqr(k + S[I]);

End G:=B*C-(A+k*B)*D; V:= Sqrr(BID);

Writeln(’ G = ‘,G:8:6,’ V,,, = ‘,1/‘:8:3); Writeln(’ New Guess at K,,,? (negative to exit) ‘); Read(k); Until k
Wood and Gunsalus [5] examined the deaminase of Escherichio coli. They give L-threonine (substrate) concentration in mg per ml, and measuredpM of keto acids formed during a ten minute incubation. Their data for (substrate, velocity) are (0.05, 0.29), (0.15, 0.45), (0.25, l.OO), (0.5, 1.25), (1.00, 1.55) and (5.00, 1.60). Using the algorithm for the &weighted least-squares fit (minimum chi-square values are in square brackets) an initial value a =0.05 gives G(0.05) = - 15.2162 I - 56.40791. At the other extreme of the experimental ranae for substrate concentratan we ‘have’ b = 5.0 which gives G(5.0) = 0.0059 [2.9140]. As these values of G are of opposite sign, there must be a zero of G in the interval (0.05,5.0). A search locates the root at G(0.2.502) = 0.0 [G(0.2723) = 0.01. Accordingly the unweighted least-squares estimate of K,,, is 0.2502 [and minimum chi-square estimate is 0.27231. The estimate of V,,, is 1.791 [1.837]. Values given here are quoted beyond the appropriate number of significant figures to assist program debugging. Theory

For notational convenience write

Minimum chi-square. For minimum chi-square we have, from 1,

u,+$ u:/&(k).

Q,(k V)=Vzf,(k)-2z

At the required minimum the slope of Q,(k, V) will be zero in all directions, giving rise to the normal equations X2ldk = 0 and aQlaV= 0, which give

V* c

aJ(k)/ak=

c

(a~(k)lak)ufl[f,(k)]*,

(11)

M. E. JONESand K. TA~ZANSKY

464

which can be recast in terms of an equation independent

of V,

(12) Then with &,, determined as the root of this, V,,, is obtained from 10 or 11 which will give identical answers for VIll-. If we now exploit the properties off,(k) = s,/(s, + k) we note,

c c

oflf(k)=x

(af7ak)o:/f(k)2= i

u:+kx -c

of/s,,

Vfl& i

So that with A and B as defined previously, equations (2) and (3) are restatements of (lo)-(12). Weighted least-squares. Development analogous to the above gives the normal equations

from which equations (8) and (9) are immediate.