Appendices


Appendix A

Theory of Globule-Coil Transitions in Homopolymers

Here we will consider the simplest but fundamental model of the globule-coil transition in a homopolymer chain (Grosberg and Khokhlov 1994; Lifshitz et al. 1978). In this model, many (N) equal monomers are connected by the chain, like beads. We will compare it with the "cloud of monomers," that is, with an aggregate of the same N monomers unconnected by the chain: this is the simplest model, used since van der Waals' time (in the 19th century) to describe the gas–liquid transition (Fig. A.1). We will see that the polymer has no analog of the separation into "liquid" and "gas" phases typical of the "cloud." Preliminary remark. I will assume that the monomers attract each other (otherwise it is too dull: no dense phase forms, and that's all), and that the strength of this attraction is temperature-independent, that is, purely energetic in nature. The latter is often incorrect, since the monomers are floating in a solvent (recall the hydrophobic effect: it increases with temperature); but this assumption allows us to find and consider all the basic scenarios of phase transitions in the simplest way. Later we can complicate the problem by adding the temperature dependence of the interactions and show at what temperature each of these scenarios can take place. To begin with, I should like to give you the general idea and to explain in words why the "cloud" can have two phases and the polymer cannot, that is, why the polymer has to condense gradually. Then the same will be shown more rigorously in formulas. What is the difference between the life of a monomer in the polymer and in the cloud? In the polymer it cannot go far away from its chain neighbor, while in the cloud it can go as far from any other monomer as the walls of the vessel permit. Why do monomers form a dense phase, either liquid (if they are not connected by the chain) or globule (if they are)?
Because they attract each other, and at low temperature the attraction energy overcomes any tendency of the monomers to scatter and gain entropy: at low temperature, the role of entropy is small (recall: in the free energy F = E − TS the energy E is compared with the entropy multiplied by the temperature). It overcomes always but for one case: when scattering leads to a virtually infinite entropy gain, that is, when the cloud of monomers is placed in a vast vessel. And just here the monomers have to choose between two



FIG. A.1 Schematic dependence of the monomer's energy ΔE, entropy ΔS, and free energy ΔF = ΔE − TΔS on the density ρ for two systems: the "cloud of monomers" and the "polymer chain." The dependence of ΔF on ρ is shown for various temperatures T. A scheme of the dependence of the density ρ on the monomer attraction energy (expressed in kT units) is shown in the auxiliary plot for each system (at the bottom). The plots refer to attracting monomers (ε < 0) and to a low (and constant) pressure: at very high, "supercritical" pressure the loose (ρ < 0.5) states are absent.

phases: either to form a drop (low energy but low entropy as well) or to stay in a rarefied gas (negligible energy but a very high entropy). The states of intermediate density (e.g., 10% or 50% of that of the drop) have a higher free energy (their entropy is not as large, while their energy is not as low) and are therefore unstable. Thus, the stable state of the "cloud" at low


temperature can be either dense or very rarefied, but not of an intermediate density. In other words, the "cloud" of monomers can exhibit an abrupt evaporation of the dense phase when the temperature rises. This is what happens in the "cloud." And what about the polymer? Here the situation is quite different, because a monomer cannot go far away from its chain neighbor even in the most diffuse coil. This means that the coil entropy cannot exceed a certain limit, and thus the entropy is not able (at low temperatures) to compete with the energy gained by compaction of the polymer. Thus, the polymer has no alternative: at low temperature its monomers cannot scatter, and it has to be a dense globule only, which means that (unlike the "cloud") it cannot jump from the dense to the rarefied phase. Therefore, a change of conditions can cause only a gradual swelling (or compression) of the polymer globule rather than its abrupt "evaporation." Now, let us repeat the above in the language of equations. Let us compare a "cloud" of N monomers confined within the volume V with a polymer of the same N monomers in the same volume V. Assuming that the monomers (of both the cloud and the polymer) are distributed over the volume more or less uniformly, let us add one monomer to each system and estimate the dependence of the free energy of the added monomer (i.e., the chemical potential of the system) on the system's density. The chemical potential is appropriate for the analysis of the coexistence of phases. The point is that molecules pass from a phase with a higher chemical potential to a phase with a lower chemical potential: this decreases the total free energy, tending towards equilibrium. At equilibrium, the chemical potentials of the molecules in both phases are equal. This means that an abrupt transition of one phase into another is possible only when the same chemical potential value refers to both phases, that is, to both densities.
On the contrary, if the chemical potential only increases (or only decreases) monotonically with increasing density of the system, an abrupt phase transition is impossible. It should be noted that phase stability requires a chemical potential that increases with increasing density of the system: then the penetration of each new particle into this phase becomes more and more difficult. This is the stable state: like a spring, the system increasingly resists the increasing impact. In the opposite case (if the chemical potential drops with increasing density), the system is unstable and does not resist occasional fluctuations that increase its density. To summarize: the particular state of a system is determined by the particular external conditions (i.e., by the given volume or the given pressure), but the prerequisite for the possibility of a phase transition is the presence of two ascending (with density) branches of the chemical potential with approximately equal chemical potential values; these two branches must be separated by a region where the chemical potential falls with increasing density of the system. In exploring the chemical potential, let us first consider the energy of the added monomer, or rather, that part of the energy which depends on the system's density, that is, on the interactions of non-covalently bound monomers.


In both cases, the energy of the system ("cloud" or "polymer") having the same density changes by the same value when a monomer is added. The change occurs because of the monomer's interactions and is equal to

ΔE = ερ,   (A.1)

where ρ = Nω/V is the system's density, that is, the fraction of its volume V occupied by the N monomers having the volume ω each, and ε is the energy of interactions of this monomer at ρ = 1, that is, in the most dense system (ε < 0, since we have assumed that the monomers attract each other). Thus, the added monomer's energy behaves alike in both systems; but the added monomer's entropy behaves very differently in the polymer as compared with the "cloud." The "cloud" leaves the volume V/N at the monomer's disposal. However, since the volume Nω is already occupied by the monomers, the free volume at the monomer's disposal in the "cloud" is

V_1^mono = (V − Nω)/N = (V/N)(1 − ρ) = (ω/ρ)(1 − ρ).   (A.2)

Notice that this volume is infinitely large when the density ρ tends to zero. At the same time, the free volume at the monomer's disposal in the polymer is

V_1^poly = Ω(1 − ρ);   (A.3)

here, the volume Ω is limited by the monomer's binding to the previous chain link, and the multiplier (1 − ρ) shows, as above, that the other monomers occupy the fraction ρ of the volume. Then, the entropy change resulting from the addition of a monomer to the "cloud" is

ΔS_mono = k ln V_1^mono = k ln[(ω/ρ)(1 − ρ)],   (A.4)

while the entropy change resulting from the addition of a monomer to the polymer is

ΔS_poly = k ln V_1^poly = k ln[Ω(1 − ρ)].   (A.5)

As functions of the density ρ, ΔS_mono and ΔS_poly behave alike (see Fig. A.1) only as ρ → 1 (where the term ln(1 − ρ) dominates in ΔS, so that both ΔS_mono and ΔS_poly decrease as ρ → 1), but their behavior is completely different as ρ → 0: ΔS_poly remains finite, while ΔS_mono grows infinitely because of the term ω/ρ. Thus (see Fig. A.1), at low temperatures T, the chemical potential ΔF = ΔE − TΔS of the "cloud" has two ascending branches, that is, two potentially stable regions divided by the first-order phase transition (rarefied gas at low ρ, owing to the −TΔS term, and liquid at high ρ, owing to the ΔE term). And at high temperatures the "cloud" has only one ascending branch of the ΔF value, that is, only one stable state (gas, owing to the −TΔS term).
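The contrast just described can be probed with a small numerical sketch (the function names and the values k = 1, ω = Ω = 1, ε = −1, T = 0.2 are illustrative choices of ours, not from the text): counting the ascending branches of the chemical potential ΔF(ρ) = ΔE − TΔS reproduces the two-branch "cloud" at low temperature and the single-branch polymer.

```python
import math

def dF_cloud(rho, T, eps=-1.0, omega=1.0):
    # ΔF = ΔE − TΔS with ΔE = ε·ρ, eq. (A.1), and ΔS = k·ln[(ω/ρ)(1 − ρ)], eq. (A.4); k = 1
    return eps * rho - T * math.log((omega / rho) * (1.0 - rho))

def dF_polymer(rho, T, eps=-1.0, Omega=1.0):
    # ΔS = k·ln[Ω(1 − ρ)], eq. (A.5), stays finite as ρ → 0
    return eps * rho - T * math.log(Omega * (1.0 - rho))

def ascending_branches(f, T, n=2000):
    """Count the maximal intervals of density on which f(ρ, T) increases."""
    rhos = [(i + 0.5) / n for i in range(n - 1)]        # densities strictly inside (0, 1)
    rising = [f(rhos[i + 1], T) > f(rhos[i], T) for i in range(len(rhos) - 1)]
    branches, prev = 0, False
    for r in rising:                                    # count falling→rising switches
        if r and not prev:
            branches += 1
        prev = r
    return branches

print(ascending_branches(dF_cloud, 0.2))    # low T: two ascending branches (gas and liquid)
print(ascending_branches(dF_polymer, 0.2))  # the polymer: a single ascending branch
print(ascending_branches(dF_cloud, 1.0))    # high T: the "cloud" keeps only the gas branch
```

At T = 0.2 the "cloud" curve rises, falls, and rises again (two potentially stable regions), while the polymer curve has a single ascending branch at any temperature, just as argued above.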


At the same time, the polymer entropy does not grow infinitely even as ρ → 0, and here only one ascending branch of the ΔF value exists at all temperatures. At extremely low temperatures, or rather, at large −ε/kT values, the origin of this branch corresponds to ρ ≈ 1, that is, to a dense globular state. The temperature increase (or rather, the decrease in the −ε/kT value) shifts the origin of the ascending branch towards lower densities ρ, and finally the origin comes to the value ρ = 0 and stays there permanently (Fig. A.1). As you see, the chemical potential of the globule never has two ascending branches. This means that the globule's expansion does not include a phase separation. And, since there is no phase separation, there is no first-order phase transition. It can be shown (this has been done by Lifshitz et al. 1978 and de Gennes 1979) that the second-order globule-to-coil phase transition occurs at ρ ≈ 0 (or rather, at the coil-state density ρ ~ N^(−1/2)), but this is beyond the scope of these lectures.

REFERENCES

de Gennes, P.-G., 1979. Scaling Concepts in Polymer Physics. Part A. Ithaca and London: Cornell University Press.
Grosberg, A.Yu., Khokhlov, A.R., 1994. Statistical Physics of Macromolecules. NY: American Institute of Physics (Chapters 1, 2, 3, 7).
Lifshitz, I.M., Grosberg, A.Yu., Khokhlov, A.R., 1978. Some problems of the statistical physics of polymer chains with volume interactions. Rev. Mod. Phys. 50, 683–713.

Appendix B

Theory of Helix-Coil Transitions in Homopolymers

Here we will consider the simplest but fundamental model of the helix-coil transition in a homopolymer chain. Each monomer of this chain can be in one of two states: either "coil" or "helix." A scheme presents one of the possible arrangements of helices in a short polypeptide chain as a path through a set of states. The free energy of the coil of any length is zero by definition, and thus its statistical weight is 1. At the same time, the free energy of formation of an n-link helix from the coil is

ΔF_α = f_init + n·f_el;

(B.1)

the statistical weight of this helix is

exp(−ΔF_α/kT) = exp(−f_init/kT)·[exp(−f_el/kT)]^n = σ·s^n

(B.2)

(here we used the designations σ = exp(−f_init/kT) and s = exp(−f_el/kT)). Strictly speaking, it should be n ≥ 3 for α-helices (with their hydrogen bonds of the type 0–4, 1–5, etc.), but, given that α-helices are usually rather long, we, for simplicity, consider helices with all n ≥ 1 (Schulz and Schirmer, 1979; Zimm and Bragg, 1959); for the more general case, assuming that the minimal length of the helix may differ from 1, see Birshtein and Ptitsyn (1966) and Flory (1969). Calculation of the average helicity at given σ and s values is based on the Kramers–Wannier (1941) solution of the one-dimensional Ising (1925) problem. We introduce two-component vectors Q_i = (C_i, H_i), where C_i is the partition function of the chain of i links, the last of which is in the "coil" (c) state,


and H_i is the partition function of the chain, the last link of which is in the "helix" (h) state. In particular,

Q_1 = (1, σs),   (B.3)
Q_2 = (1 + σs, σs + σs²), etc.   (B.3a)

The value Z_i = (C_i, H_i)·(1; 1) ≡ C_i + H_i is the partition function of all 2^i conformations of the chain of i links. It is easy to see that the vectors Q_{i−1} and Q_i (with i > 1) are connected by the recursion

(C_{i−1}, H_{i−1})·(1  σs; 1  s) = (C_i, H_i).   (B.4)

Thus, the partition function of all 2^m possible conformations of the chain of m links is

Z_m = (1, σs)·(1  σs; 1  s)^{m−1}·(1; 1).   (B.5)

To calculate this partition function for any m, we have to find the eigenvalues λ_1, λ_2 of the matrix U = (1  σs; 1  s). To do this, we use the well-known equation

Det(U − λI) = (1 − λ)(s − λ) − σs = 0   (B.6)

and obtain

λ_1 = 1 + (s − 1)/2 + √[((s − 1)/2)² + σs];
λ_2 = 1 + (s − 1)/2 − √[((s − 1)/2)² + σs].   (B.7)

Then we calculate the eigenvectors A_1, A_2 of the matrix U corresponding to the eigenvalues λ_1, λ_2. The eigenvectors, which have an arbitrary length, can be taken in the form A_1 = (A_1; 1), A_2 = (A_2; 1). They are found from the well-known equations (U − λ_k·I)·(A_k; 1) = 0 (for k = 1, 2), that is, from A_k·1 + (s − λ_k)·1 = 0 (for k = 1, 2). Thus, we obtain

A_1 = (λ_1 − s; 1) = (1 − λ_2; 1)  and  A_2 = (λ_2 − s; 1) = (1 − λ_1; 1)

(here we used the fact that λ_1 + λ_2 = 1 + s, so that λ_1 − s = 1 − λ_2 and λ_2 − s = 1 − λ_1).


The eigenvectors A_1, A_2 form a matrix A = (1−λ_2  1−λ_1; 1  1), which can diagonalize the matrix U by the transformation A^{−1}·U·A = (λ_1  0; 0  λ_2); here

A^{−1} = [1/(λ_1 − λ_2)]·(1  λ_1−1; −1  1−λ_2).

Now we can take the matrix U in the form U = A·(λ_1  0; 0  λ_2)·A^{−1}; since U² = A·(λ_1  0; 0  λ_2)²·A^{−1}, …, U^n = A·(λ_1^n  0; 0  λ_2^n)·A^{−1}, …, we have

Z_m = (1, σs)·(1  σs; 1  s)^{m−1}·(1; 1) = (1, σs)·A·(λ_1^{m−1}  0; 0  λ_2^{m−1})·A^{−1}·(1; 1)
    = [1/(λ_1 − λ_2)]·(1 − λ_2 + σs, 1 − λ_1 + σs)·(λ_1^{m−1}  0; 0  λ_2^{m−1})·(λ_1; −λ_2)
    = [λ_1^{m+1}·(1 − λ_2) + λ_2^{m+1}·(λ_1 − 1)]/(λ_1 − λ_2).   (B.8)

(here we used the fact that σs = (λ_1 − 1)(1 − λ_2)). It is seen that the magnitude of Z_m increases as ≈λ_1^{m+1} (provided λ_2 < λ_1) when m → ∞. Since the partition function Z_m consists of summands which include the multiplier s as many times as there are links in the helical state in the corresponding summand, the value

θ_m = (1/m)·(s/Z_m)·(∂Z_m/∂s) ≡ (1/m)·(∂ln Z_m/∂ln s)   (B.9)

is the average degree of helicity of the chain.
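The closed form (B.8) can be checked against the direct recursion (B.4); a minimal Python sketch (the function names are ours):

```python
import math

def Z_recursive(m, sigma, s):
    # the recursion (B.4): C_i = C_{i-1} + H_{i-1},  H_i = σs·C_{i-1} + s·H_{i-1}
    C, H = 1.0, sigma * s                  # Q_1 = (1, σs), eq. (B.3)
    for _ in range(m - 1):
        C, H = C + H, sigma * s * C + s * H
    return C + H                           # Z_m = C_m + H_m

def Z_eigenvalues(m, sigma, s):
    # the closed form (B.8) via the eigenvalues (B.7)
    root = math.sqrt(((s - 1) / 2) ** 2 + sigma * s)
    l1, l2 = 1 + (s - 1) / 2 + root, 1 + (s - 1) / 2 - root
    return (l1 ** (m + 1) * (1 - l2) + l2 ** (m + 1) * (l1 - 1)) / (l1 - l2)

for m, sigma, s in [(1, 0.01, 1.2), (5, 0.001, 0.9), (50, 0.01, 1.0)]:
    assert abs(Z_recursive(m, sigma, s) - Z_eigenvalues(m, sigma, s)) \
           < 1e-9 * Z_eigenvalues(m, sigma, s)
```

The two routes agree to machine precision for any chain length, helix-initiation parameter σ, and elongation parameter s.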

It is easy to calculate from (B.8) and (B.9) that

θ_m = [(λ_1 − 1)/(λ_1 − λ_2)]·[1 − (2/m)·λ_2/(λ_1 − λ_2)]
      + (1/m)·{[2 − m(s − 1)]/(λ_1 − λ_2)}·[((λ_1 − 1)/(1 − λ_2))·(λ_2/λ_1)^{m+1}]/[1 + ((λ_1 − 1)/(1 − λ_2))·(λ_2/λ_1)^{m+1}].   (B.10)

When (λ_2/λ_1)^m ≪ 1, that is, for a sufficiently long chain, θ_m depends on the number m of the chain links as

θ_{m≫1} ≈ [(λ_1 − 1)/(λ_1 − λ_2)]·[1 − (2/m)·λ_2/(λ_1 − λ_2)]   (B.11)

(in particular, θ_{m→∞} = (λ_1 − 1)/(λ_1 − λ_2) = ½·[1 + (s − 1)/√((s − 1)² + 4σs)]).

For s = 1 and σ ≪ 1, we obtain

θ_{m≫1} ≈ ½·(1 − 1/(m·√σ)).   (B.12)

This formula allows us to find the value of σ from the dependence of θ_m on m. Finally, we can find the average number of helical regions in a very long chain. Since Z_{m≫1} ≈ (λ_1)^m, and since the partition function Z_m consists of summands which include the multiplier σ as many times as there are helical (as well as coil!) regions in the corresponding summand, the mean number of helical (and coil!) regions in a very long chain is

N_m = m·(∂ln λ_1/∂ln σ) = m·2σs/[(s + 1)·√((s − 1)² + 4σs) + (s − 1)² + 4σs].   (B.13)

Therefore, with s = 1, we obtain

N_m(s = 1)/m = √σ/[2·(1 + √σ)].   (B.14)

Consequently, with s = 1, a helical (and a coil) region includes 1/√σ + 1 chain links on the average.

REFERENCES

Birshtein, T.M., Ptitsyn, O.B., 1966. Conformations of Macromolecules. NY: Interscience Publishers (Chapters 4, 9).
Flory, P.J., 1969. Statistical Mechanics of Chain Molecules. NY: Interscience Publishers (Chapters 3, 7).
Ising, E., 1925. Beitrag zur Theorie des Ferromagnetismus. Zeitschr. Phys. 31, 253–258.
Kramers, H.A., Wannier, G.H., 1941. Statistics of the two-dimensional ferromagnet. Part I. Phys. Rev. 60, 252–262.
Schulz, G.E., Schirmer, R.H., 1979 & 2013. Principles of Protein Structure. Appendix. NY: Springer.
Zimm, B.H., Bragg, J.K., 1959. Theory of the phase transition between helix and random coil in polypeptide chains. J. Chem. Phys. 31, 526–535.

Appendix C

Statistical Physics of One-Dimensional Systems and Dynamic Programming

STATISTICAL PHYSICS OF ONE-DIMENSIONAL SYSTEMS: EQUILIBRIUM DISTRIBUTION

Appendix B considered the helix-coil transition. The calculations were made for a homopolymer, and we were interested in the average degree of its helicity and some other average characteristics of the whole chain. Now we shall consider a mathematical apparatus that can calculate the average degree of helicity for any link of a heteropolymer with an arbitrary sequence (Finkelstein, 1977), and then generalize the results (Aho et al., 1976; Angel and Bellman, 1972; Finkelstein and Roytberg, 1993) to solve a variety of problems in the statistical physics of inhomogeneous one-dimensional systems. Among these, we will touch on the problem of searching for the structure corresponding to the minimum energy of the chain (which is called the "dynamic programming problem"). Let us consider the helix-coil transition. Then each link i of the chain can take only two conformations (later we will get rid of this "only two" limitation): q_i = q^(1) or q_i = q^(2), where q^(1) is "coil" and q^(2) is "helix." It is easy to see that, in the helix-coil transition problem, the energy of a given conformation q_1, …, q_m of an m-link chain has the form

E(q_1, …, q_m) = Φ_1(q_1) + Σ_{i=2}^{m} [Ψ_i(q_{i−1}, q_i) + Φ_i(q_i)]   (C.1)

where

Φ_i(q^(1)) ≡ 0,   Φ_i(q^(2)) = (f_init)_i + (f_el)_i,

while

Ψ_i(q^(1), q^(1)) ≡ 0,   Ψ_i(q^(1), q^(2)) ≡ 0,   Ψ_i(q^(2), q^(1)) ≡ 0,   Ψ_i(q^(2), q^(2)) = −(f_init)_i.

One can see that the values Φ_1(q_1) form an array

Φ_1(q_1 = q^(1)) = 0,
Φ_1(q_1 = q^(2)) = (f_init)_1 + (f_el)_1,

whose components correspond to the logarithms of the components of the vector (1, σs), and the values Ψ_i(q_{i−1}, q_i) + Φ_i(q_i) form an array

Ψ_i(q^(1), q^(1)) + Φ_i(q^(1)) = 0,   Ψ_i(q^(1), q^(2)) + Φ_i(q^(2)) = (f_init)_i + (f_el)_i,
Ψ_i(q^(2), q^(1)) + Φ_i(q^(1)) = 0,   Ψ_i(q^(2), q^(2)) + Φ_i(q^(2)) = (f_el)_i,

whose components correspond to the logarithms of the components of the matrix (1  σs; 1  s) (provided we use the same designations as in Appendix B: σ = exp(−f_init/kT) and s = exp(−f_el/kT)). In the homopolymer problem considered in Appendix B, the values f_init and f_el (and therefore σ and s) were the same for all units, but in the more general heteropolymer problem both (f_init)_i and (f_el)_i depend on the link i (on its chemical type and, possibly, on the chemical types of the surrounding links as well). The partition function of the whole m-link chain is

Z_m = Σ_{q_1} … Σ_{q_m} exp[−E(q_1, …, q_m)/kT]
    = Σ_{q_1} … Σ_{q_m} exp[−Φ_1(q_1)/kT]·Π_{i=2}^{m} exp[−(Ψ_i(q_{i−1}, q_i) + Φ_i(q_i))/kT].   (C.2)

It consists of 2^m terms, but it can be computed not only by their sequential addition (which would take time proportional to 2^m); it can be computed recurrently, in time proportional to m:

Z_m = (1, σ_1 s_1)·Π_{i=2}^{m} (1  σ_i s_i; 1  s_i)·(1; 1);

here σ_i = exp(−(f_init)_i/kT) and s_i = exp(−(f_el)_i/kT).
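The recurrent product above can be checked against brute-force summation over all 2^m conformations; a sketch with hypothetical per-link free energies (the function names and numerical values are ours):

```python
import math

def partition_function(f_init, f_el, kT=1.0):
    """Z_m for a heteropolymer helix-coil chain, computed recurrently from
    per-link free energies (f_init)_i and (f_el)_i (lists of equal length m)."""
    sig = [math.exp(-f / kT) for f in f_init]   # σ_i = exp(−(f_init)_i/kT)
    s = [math.exp(-f / kT) for f in f_el]       # s_i = exp(−(f_el)_i/kT)
    C, H = 1.0, sig[0] * s[0]                   # vector Q_1 = (1, σ_1·s_1)
    for i in range(1, len(s)):
        # multiply (C, H) by the link matrix (1  σ_i s_i; 1  s_i)
        C, H = C + H, sig[i] * s[i] * C + s[i] * H
    return C + H

def brute_force(f_init, f_el, kT=1.0):
    """Direct summation over all 2^m conformations (feasible for short chains)."""
    m, Z = len(f_el), 0.0
    for conf in range(2 ** m):
        q = [(conf >> i) & 1 for i in range(m)]   # 1 = helix, 0 = coil
        E = 0.0
        for i in range(m):
            if q[i]:
                E += f_el[i]
                if i == 0 or not q[i - 1]:
                    E += f_init[i]                # a new helical region starts here
        Z += math.exp(-E / kT)
    return Z

f_init = [2.0, 2.1, 1.9, 2.2, 2.0]   # hypothetical initiation energies
f_el = [-0.1, 0.2, -0.3, 0.1, 0.0]   # hypothetical elongation energies
assert abs(partition_function(f_init, f_el) - brute_force(f_init, f_el)) < 1e-9
```

The recurrent computation visits each link once (time ∝ m), while the brute force grows as 2^m; both give the same Z_m.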


To make this computation efficient, one can use two-component vectors Q_i = [Q_i^(1), Q_i^(2)] (i = 1, …, m), where the value Q_i^(1) is the partition function of the chain embracing links 1 … i, the link i being in the state q^(1) = "coil", and Q_i^(2) is the partition function of the same chain, the link i being in the state q^(2) = "helix". Thus,

Q_1 = [exp(−Φ_1(q_1^(1))/kT), exp(−Φ_1(q_1^(2))/kT)],   (C.3)

while the vectors Q_{i−1} and Q_i (1 < i ≤ m) are connected recursively:

Q_i^(1) = Σ_q Q_{i−1}^(q)·exp[−(Ψ_i(q, q^(1)) + Φ_i(q^(1)))/kT],
Q_i^(2) = Σ_q Q_{i−1}^(q)·exp[−(Ψ_i(q, q^(2)) + Φ_i(q^(2)))/kT],   (C.4)

so that

Z_m = Σ_q Q_m^(q).   (C.5)

Now, one can introduce two-component vectors R_i = [R_i^(1), R_i^(2)] (i = m, …, 1). Here

R_m = (1, 1),   (C.6)

!

while vectors Rm1 ,…, R1 are computed recursively: h i X ð1Þ ðqÞ ð1Þ exp  Ψ q , q + Φ ð q Þ =kT  Ri , Ri1 ¼ i i q h i X ð2Þ ðqÞ ð2Þ exp  Ψ q , q + Φ ð q Þ =kT  Ri , Ri1 ¼ i i q ðqÞ

(C.7)

ðqÞ

One can see that the value Qi  Ri is statistical weight of the sum of all terms of the partition function Zm, where a link i (i ¼ 1, …, m) is in the conformation q (q ¼ q(1), q(2)), so that the value ðqÞ

ðqÞ

wi ðqi Þ ¼ Qi  Ri =Zm

(C.8)

is a probability to find, in the state of thermodynamic equilibrium, the link i in the conformation q. The above formulas concern the simplest case, when each link i of the chain can take only two conformations, qi ¼ qð1Þ or qi ¼ qð2Þ . However, it is easy to generalize them to o the case when each link can take ki n

conformations, or states (qi ¼ qð1Þ , …, qðki Þ ; ki, can be different for different

links) (Finkelstein, 1977; Finkelstein and Roytberg, 1993). For example, in the “floating logs” model presented in Fig. 22.10, we used (Ptitsyn and Finkelstein, 1983) the following nine states:

q^(1) = l: coil (loop);
q^(2) = β↓: β-strand, side chain "down";
q^(3) = β↑: β-strand, side chain "up";
q^(4) = α_N: α-helix N-end;
q^(5) = α↓: α-helix, side chain "down";
q^(6) = α→: α-helix, side chain "to the right";
q^(7) = α↑: α-helix, side chain "up";
q^(8) = α•: α-helix, side chain "to the center";
q^(9) = α_C: α-helix C-end.

A matrix of the possible (+) and forbidden (0) transitions between these states (rows: the state q^(k)_{i−1} of link i−1; columns: the state q^(k)_i of link i) looks as follows:

        l    β↓   β↑   α_N  α↓   α→   α↑   α•   α_C
l       +    +    +    +    0    0    0    0    0
β↓      +    0    +    +    0    0    0    0    0
β↑      +    +    0    +    0    0    0    0    0
α_N     0    0    0    0    +    +    +    +    0
α↓      0    0    0    0    0    +    0    0    +
α→      0    0    0    0    0    0    +    0    +
α↑      0    0    0    0    0    0    0    +    +
α•      0    0    0    0    +    0    0    0    +
α_C     +    +    +    0    0    0    0    0    0

I will allow myself not to present the sequence-dependent values corresponding to the various possible (+) transitions… The above set of states (and of transitions between them) looks rather complicated, but many more states (hundreds!) were used in the program "ALB" (Finkelstein, 1983) to describe various secondary structures, including antiparallel and parallel β-hairpins and β-sheets composed of such hairpins. The vectors Q_i, R_i (i = 1, …, m) now have k_i components;

Q_1 = [exp(−Φ_1(q_1^(1))/kT), …, exp(−Φ_1(q_1^(k_1))/kT)],   (C.3a)

and the components of the vectors Q_i (i = 2, …, m) are recursively computed as

Q_i^{(q_i^(k))} = Σ_{q_{i−1}} Q_{i−1}^{(q_{i−1})}·exp[−(Ψ_i(q_{i−1}, q_i^(k)) + Φ_i(q_i^(k)))/kT]   (C.4a)

(where q_i^(k) = q_i^(1), …, q_i^(k_i) and q_{i−1} = q_{i−1}^(1), …, q_{i−1}^(k_{i−1})), while

R_m = (1, …, 1),   (C.6a)

and the components of the vectors R_{i−1} (i = m, …, 2) are recursively computed as

R_{i−1}^{(q_{i−1}^(k))} = Σ_{q_i} exp[−(Ψ_i(q_{i−1}^(k), q_i) + Φ_i(q_i))/kT]·R_i^{(q_i)}   (C.7a)

(where q_{i−1}^(k) = q_{i−1}^(1), …, q_{i−1}^(k_{i−1}) and q_i = q_i^(1), …, q_i^(k_i)).

DYNAMIC PROGRAMMING: SEARCH FOR THE ENERGY MINIMUM

Let us now turn to the problem of finding the structure of the chain corresponding to the minimum of its energy, that is, to the problem of dynamic programming (Aho et al., 1976; Angel and Bellman, 1972; Bellman, 2003). The minimum of the chain energy is defined as

E_min = min_{q_1} … min_{q_m} [Φ_1(q_1) + Σ_{i=2}^{m} (Ψ_i(q_{i−1}, q_i) + Φ_i(q_i))].   (C.9)

It, too, can be computed without exhaustive enumeration of all 2^m compared (standing in [ ]) terms, that is, recurrently, in time proportional to m. To do this, one, as above, has to introduce k_i-component vectors Q_i, R_i (i = 1, …, m; q_i = q^(1), …, q^(k_i); k_i can be different for different links) (Bellman, 2003; Finkelstein and Roytberg, 1993). Here

Q_1 = [Φ_1(q_1^(1)), …, Φ_1(q_1^(k_1))],   (C.10)

and the components of the vectors Q_i (i = 2, …, m) are recursively computed as

Q_i^{(q_i^(k))} = min_{q_{i−1}} [Q_{i−1}^{(q_{i−1})} + Ψ_i(q_{i−1}, q_i^(k)) + Φ_i(q_i^(k))]   (C.11)

(where q_i^(k) = q_i^(1), …, q_i^(k_i) and q_{i−1} = q_{i−1}^(1), …, q_{i−1}^(k_{i−1})). At each (i = 2, …, m) iteration step, one remembers an auxiliary "control" vector (Bellman, 2003)

R_i = [R_i(q_i^(1)), …, R_i(q_i^(k_i))],   (C.12)

where R_i(q_i^(k)) = q_{i−1}^{optK} is that q_{i−1} coordinate (or one, actually arbitrarily taken, see below, of those q_{i−1} coordinates) which minimizes Q_i(q_i^(k)) in (C.11). As a result of the iterations (C.11), one obtains the energy minimum

E_min = min_{q_m} {Q_m(q_m)}   (C.13)

and the coordinate q_m^{optK} that corresponds to that q_m which minimizes (C.13). Now, having q_m^{optK} and the remembered set of control vectors R_m, R_{m−1}, …, R_2, one can find the whole optimal (corresponding to the energy minimum E_min) chain conformation C^opt = (q_1^opt, …, q_m^opt):

q_m^opt = q_m^{optK};  q_{m−1}^opt = R_m(q_m^opt), …, q_1^opt = R_2(q_2^opt).   (C.14)
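The recursion (C.10)-(C.14) can be sketched as follows (our function name; the toy energies are hypothetical):

```python
def minimize_energy(Phi, Psi):
    """E_min and one optimal conformation by (C.9)-(C.14): forward minimization
    (C.11) with remembered "control" vectors (C.12), then backtracking (C.14).
    Phi[i][q] and Psi[i][p][q] hold the energies Φ_i and Ψ_i (Psi[0] is unused)."""
    m, k = len(Phi), len(Phi[0])
    Q = list(Phi[0])                                   # (C.10)
    control = []                                       # the control vectors R_i, (C.12)
    for i in range(1, m):
        newQ, back = [], []
        for q in range(k):                             # (C.11)
            best_p = min(range(k), key=lambda p: Q[p] + Psi[i][p][q])
            newQ.append(Q[best_p] + Psi[i][best_p][q] + Phi[i][q])
            back.append(best_p)
        Q = newQ
        control.append(back)
    E_min = min(Q)                                     # (C.13)
    q = Q.index(E_min)
    conf = [q]
    for back in reversed(control):                     # (C.14): walk back along R_i
        q = back[q]
        conf.append(q)
    conf.reverse()
    return E_min, conf

# a toy 3-link, 2-state chain with arbitrary (hypothetical) energies
Phi = [[0.0, 1.0], [0.0, -2.0], [0.0, 0.5]]
Psi = [None, [[0.0, 1.5], [0.0, -0.5]], [[0.0, 1.5], [0.0, -0.5]]]
E_min, conf = minimize_energy(Phi, Psi)
print(E_min, conf)   # -1.5 [1, 1, 0]
```

In this toy example two conformations, (1, 1, 0) and (1, 1, 1), share the minimum energy −1.5; as the Note below says, the algorithm "randomly" (here: deterministically, by tie-breaking order) returns just one of them.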

Note. There may be not one but several optimal (corresponding to the same E_min) conformations of the chain. Then the choice of some of the R_i(q_i^(k)) = q_{i−1}^{optK} and/or of q_m^{optK} is ambiguous: the above-described Bellman (2003) algorithm "randomly" finds only one of these conformations. This usually meets engineering requirements but may be insufficient for natural-science problems. Generalizations of the Bellman dynamic programming algorithm to this case can be found in Byers and Waterman (1984) and Finkelstein and Roytberg (1993). It can be seen that the algorithm for solving the problems of one-dimensional systems in statistical physics corresponds to dynamic programming with the replacements:

Statistical physics              Dynamic programming
values exp(−Φ_i/kT)              values Φ_i
values exp(−Ψ_i/kT)              values Ψ_i
operation "add"                  operation "minimize"
operation "multiply"             operation "add"

For other generalizations of the dynamic programming method suitable for solving various problems, see Finkelstein and Roytberg (1993).

DYNAMIC PROGRAMMING AND SEARCH FOR OPTIMAL ALIGNMENT OF SEQUENCES

In molecular biology, the dynamic programming method is most often used when searching for homology of amino acid or nucleotide sequences (Needleman and Wunsch, 1970; Smith and Waterman, 1981). One compares two sequences, U = a_1, a_2, …, a_M and V = b_1, b_2, …, b_N, where a_s and b_t are amino acid residues (or nucleotides). An alignment of the sequences means the following: in the first pair of the alignment, a_{s1} is set in correspondence with b_{t1}; in the second pair, a_{s2} is set in correspondence with b_{t2}; …; in the last pair of the alignment, a_{sm} is set in correspondence with b_{tm} (where 1 ≤ s_1 < s_2 < … < s_m ≤ M and 1 ≤ t_1 < t_2 < … < t_m ≤ N). Two presentations of such an alignment are shown in the accompanying scheme.


Correspondence of links a and b has a "weight" Φ(a, b). Transition from the correspondence a_s, b_t to the next correspondence a_{s′}, b_{t′} has its weight Ψ(s′ − s, t′ − t); usually, this weight has the form

Ψ(s′ − s, t′ − t) = Δ·θ(s′ − s) + Σ_{J=s+1}^{s′−1} D(a_J) + Δ·θ(t′ − t) + Σ_{J=t+1}^{t′−1} D(b_J),   (C.15)

where θ(d) = 0 at d = 1, θ(d) = 1 at d > 1 (d ≤ 0 is not possible, because s < s′ and t < t′, see above), and D is the weight of deleting the corresponding link from the alignment. The weight W of the alignment (a_{s1}, b_{t1}; a_{s2}, b_{t2}; …; a_{sm}, b_{tm}) is the sum of the weights of all correspondences and transitions in this alignment:

W(a_{s1}, b_{t1}; …; a_{sm}, b_{tm}) = Φ(a_{s1}, b_{t1}) + Σ_{i=2}^{m} [Ψ(s_i − s_{i−1}, t_i − t_{i−1}) + Φ(a_{si}, b_{ti})].   (C.16)

The best alignment is the one that has the maximal weight W. For maximization of W one can use the above-given formalism of dynamic programming, with the correspondences (s_i, t_i) playing the role of the q_i, and with formulas (C.11), (C.13) replaced by formulas of the type

Q(s′, t′) = max_{1 ≤ s < s′, 1 ≤ t < t′} [Q(s, t) + Ψ(s′ − s, t′ − t)] + Φ(a_{s′}, b_{t′}).   (C.13a)

The working time of this algorithm is proportional to (M·N)². A more complicated but faster version of such an algorithm (its working time is proportional to M·N) can be found in Smith and Waterman (1981). Without going into details, we note only that this fast algorithm uses not only the values Q(s, t), corresponding to the best alignment of fragments 1…s and 1…t of U and V, but also the auxiliary values Q_U(s, t), Q_V(s, t), Q_UV(s, t): Q_U(s, t) corresponds to the best alignment of fragments 1…s and 1…t′ < t (links t′+1 … t being deleted from the alignment); Q_V(s, t) corresponds to the best alignment of fragments 1…s′ < s and 1…t (links s′+1 … s being deleted); Q_UV(s, t) corresponds to the best alignment of fragments 1…s′ < s and 1…t′ < t (links s′+1 … s and t′+1 … t being deleted from the alignment).
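The straightforward O((M·N)²) maximization of (C.13a) can be sketched as follows. The function name, the gap-opening weight Δ, the match/mismatch weights Φ, and the use of a uniform deletion weight D in place of the link-dependent D(a_J) of (C.15) are all illustrative simplifications of ours:

```python
def align_score(U, V, Phi, delta=-1.0, D=-0.5):
    """Weight of the best alignment by recursion (C.13a).  Q[s][t] is the best
    weight of an alignment whose LAST pair is (s, t); the answer is the maximum
    over all possible end pairs.  Straightforward O((M·N)^2) version."""
    def psi(ds, dt):                     # the transition weight, eq. (C.15)
        gap = (ds - 1) + (dt - 1)        # number of deleted links
        return ((delta if ds > 1 else 0.0) +
                (delta if dt > 1 else 0.0) + D * gap)
    M, N = len(U), len(V)
    Q = [[0.0] * (N + 1) for _ in range(M + 1)]
    best = None
    for s in range(1, M + 1):
        for t in range(1, N + 1):
            q = Phi(U[s - 1], V[t - 1])  # an alignment may start with the pair (s, t)
            for sp in range(1, s):
                for tp in range(1, t):
                    q = max(q, Q[sp][tp] + psi(s - sp, t - tp) + Phi(U[s - 1], V[t - 1]))
            Q[s][t] = q
            best = q if best is None else max(best, q)
    return best

match = lambda a, b: 2.0 if a == b else -1.0   # hypothetical pair weights Φ(a, b)
print(align_score("AXB", "AB", match))          # 2.5: pairs (A,A), (B,B), one deleted X
```

In the example, aligning "AXB" with "AB" pairs the two A's and the two B's and pays Δ + D = −1.5 for opening a gap and deleting X, giving 2 + 2 − 1.5 = 2.5.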

REFERENCES

Aho, A., Hopcroft, J., Ullman, J., 1976. The Design and Analysis of Computer Algorithms. Reading, MA: Addison-Wesley.
Angel, E., Bellman, R., 1972. Dynamic Programming of a Partial Differential Equation. NY: Academic Press (Chapter 3).
Bellman, R., 2003. Dynamic Programming. Mineola, NY: Dover Publications Inc (Chapters 1–3, 11).
Birshtein, T.M., Ptitsyn, O.B., 1966. Conformations of Macromolecules. NY: Interscience Publishers (Chapters 4, 9).
Byers, T., Waterman, M.S., 1984. Determining all optimal and near optimal solutions when solving shortest path problems by dynamic programming. Oper. Res. 32, 1381–1384.
Finkelstein, A.V., 1977. Theory of protein molecule self-organisation. III. A calculating method for the probabilities of the secondary structure formation in an unfolded protein chain. Biopolymers 16, 525–529.
Finkelstein, A.V., 1983. Program ALB for polypeptide and protein secondary structure calculation and prediction. Submitted to Brookhaven Protein Data Bank, Upton, NY, USA, and to EMBL Database, Heidelberg, FRG.
Finkelstein, A.V., Roytberg, M.A., 1993. Computation of biopolymers: A general approach to different problems. BioSystems 30, 1–19.
Needleman, S.B., Wunsch, C.D., 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453.
Ptitsyn, O.B., Finkelstein, A.V., 1983. Theory of protein secondary structure and algorithm of its prediction. Biopolymers 22, 15–25.
Smith, T.F., Waterman, M.S., 1981. Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.

Appendix D

Random Energy Model and Energy Gap

In theoretical physics studies of proteins, a so-called Random Energy Model, or simply REM, is widely used. It describes the statistical properties of energy spectra of complex systems in the simplest form. This model was originally proposed by Derrida (1980, 1981) for "disordered" physical systems such as spin glasses, and it was introduced into the physics of proteins by Bryngelson and Wolynes (1987, 1990). Its applicability to proteins was justified by Shakhnovich and Gutin (1989). According to this model, the energy of one heteropolymer fold is statistically independent of that of another. This assumption looks strange, but Shakhnovich and Gutin (1989) showed that it is valid for a dense globule, where a transition from one chain fold to another requires alteration of many intra-chain contacts, provided the variety of chemical units (chain monomers) is large. The energy of each fold of a heteropolymer is determined by the primary structure of its chain and the system of contacts between the chain links. According to the random energy model, each chain fold i (which belongs to the M ≫ 1 possible globular folds) takes the energy E with the probability

P_i(E) = (2πσ_i²)^{−1/2} exp[−(E − ε_i)²/2σ_i²],          (D.1)

where ε_i is the average (over all primary structures) energy of the fold i, and σ_i is the average (over all primary structures) deviation of E from ε_i. In the simplest version of the random energy model it is assumed that all ε_i = 0 and all σ_i = σ (ie, they are equal). As a result, the expected density of the energy spectrum is

w(E) = Σ_{i=1}^{M} P_i(E) = [M/(2πσ²)^{1/2}]·exp(−E²/2σ²).          (D.2)

In this model, one can determine the energy E_C, typical of the lower edge of the energy spectrum, and the temperature T_C, characteristic of this edge.
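As a quick sanity check of Eqs. (D.1) and (D.2): since each P_i(E) is a normalized Gaussian, the spectral density w(E) must integrate to the total number of folds M. A minimal numerical sketch (M and σ are merely illustrative values):

```python
import math

M = 1000        # number of folds (illustrative)
sigma = 1.0     # energy-spread parameter of the REM

def w(E):
    # Expected density of the energy spectrum, Eq. (D.2)
    return M / math.sqrt(2 * math.pi * sigma**2) * math.exp(-E**2 / (2 * sigma**2))

# Integrate w(E) over (effectively) the whole spectrum by the trapezoidal rule;
# the total number of states must come out equal to M.
a, b, n = -10 * sigma, 10 * sigma, 20000
h = (b - a) / n
total = 0.5 * (w(a) + w(b)) * h + sum(w(a + i * h) for i in range(1, n)) * h

print(round(total))  # → 1000
```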

The quantity

p_{E₀} = ∫_{−∞}^{E₀} P(E) dE

is the "accumulated" probability that the energy E of one fold is below the level E₀. The probability that the energies of all M folds lie above E₀ is (1 − p_{E₀})^M. If p_{E₀} ≪ 1 (after all, we are interested in the lower edge of the spectrum), then (1 − p_{E₀})^M ≈ exp(−Mp_{E₀}). Hence, when Mp_{E₀} = ln 2, the chance that all chain folds simultaneously have energies greater than E₀ is ½. Thus, the characteristic value of E_C can be found from the condition p_{E_C} = (ln 2)/M. The value p_{E_C} is very small when M ≫ 1. Therefore,

p_{E_C} ≈ ∫_{−∞}^{E_C} P(E) dE = (2πσ²)^{−1/2} ∫_{−∞}^{E_C} exp(−E²/2σ²) dE

        = (2πσ²)^{−1/2} ∫_{−∞}^{E_C} exp{−[(E − E_C) + E_C]²/2σ²} dE

        = (2πσ²)^{−1/2} ∫_{−∞}^{0} exp[−(x + E_C)²/2σ²] dx

        = (2πσ²)^{−1/2} exp(−E_C²/2σ²) ∫_{−∞}^{0} exp(−x·E_C/σ² − x²/2σ²) dx

(where x = E − E_C). It is obvious that, because of the small p_{E_C} value, E_C < 0 and E_C²/σ² ≫ 1 (ie, E_C/σ ≪ −1). Therefore, in the range of x where the integrand (in the last expression) is not small [ie, where |x|·(−E_C)/σ² ≲ 1, and hence x²/σ² ≲ (σ/E_C)² ≪ 1], the value of x²/2σ² can be neglected, and the integral is equal to σ²/(−E_C). As a result, we obtain

p_{E_C} ≈ (2πE_C²/σ²)^{−1/2} exp(−E_C²/2σ²),          (D.3)

from where

(2πE_C²/σ²)^{−1/2} exp(−E_C²/2σ²) = (ln 2)/M.          (D.4)

If M is very large, the most significant factors in this equation are exp(−E_C²/2σ²) and M, so that we get

E_C ≈ −σ(2 ln M)^{1/2}.          (D.5)
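Eq. (D.5) can be illustrated by direct simulation: draw M statistically independent Gaussian fold energies, as the REM assumes, and compare the lowest of them with −σ(2 ln M)^{1/2}. A minimal sketch with illustrative parameters (for finite M the empirical lower edge lies somewhat above the asymptotic estimate):

```python
import math
import random

random.seed(0)

M = 100_000      # number of folds (must be >> 1)
sigma = 1.0      # energy-spread parameter of the REM

# Draw M statistically independent fold energies, each Gaussian with
# mean 0 and standard deviation sigma (Eq. D.1 with eps_i = 0).
energies = [random.gauss(0.0, sigma) for _ in range(M)]

E_min = min(energies)                      # empirical lower edge of the spectrum
E_C = -sigma * math.sqrt(2 * math.log(M))  # asymptotic estimate, Eq. (D.5)

print(f"empirical lower edge: {E_min:.3f}")
print(f"Eq. (D.5) estimate:   {E_C:.3f}")
```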

This is the energy typical of the lower edge of the energy spectrum of a random heteropolymer (the middle of the spectrum, on the assumption that ε_i = 0, corresponds to zero energy, while its upper edge corresponds to +|E_C|). Near the edge E_C, as seen from the preceding calculations, the probability p_E can be presented as p_{E_C}·exp[(E − E_C)·(−E_C/σ²)]. This means that the "accumulated density" of the energy spectrum,

m_E = Mp_E = (ln 2)·exp[(E − E_C)·(−E_C/σ²)],

is already large at a distance of σ²/(−E_C) from the lower edge of the spectrum; this distance is small in comparison with the half-width |E_C| of the spectrum. This allows using Eq. (D.2) in the region |E| ≤ |E_C| to calculate the entropy of the system: S(E) = k_B·ln[w(E)]. Since S(E) must turn to 0 at the edge |E| = |E_C|,

S(E) = k_B·[ln M − E²/2σ²] = k_B·(E_C² − E²)/2σ².          (D.6)

So, within the dense spectrum (at |E| ≤ |E_C|) the spectrum's temperature is

T(E) = (dS/dE)^{−1} = −σ²/(k_B E),          (D.7)

while

T_C = −σ²/(k_B E_C) = σ·(2 ln M)^{−1/2}/k_B          (D.8)

is the temperature corresponding to the lower edge of the energy spectrum (see Fig. 18.07), that is, the temperature of "freezing" of the most stable folds of the considered random heteropolymer (also called the "glass transition temperature"). These folds have energies close to E_C. The free energy of a random heteropolymer at temperature T(E) ≥ T_C is

F(E) = E − T(E)·S(E) = E + (E_C² − E²)/2E.          (D.9)
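Eqs. (D.6)–(D.9) can be cross-checked numerically: the definition T = (dS/dE)^{−1} must hold, F must equal E − T·S, and F must reduce to E at the edge where the entropy vanishes. A minimal sketch (units with k_B = 1 and an illustrative M; not part of the derivation):

```python
import math

k_B = 1.0        # work in units where Boltzmann's constant is 1
sigma = 1.0
M = 1e6
E_C = -sigma * math.sqrt(2 * math.log(M))   # lower spectrum edge, Eq. (D.5)

def S(E):   # entropy of the dense spectrum, Eq. (D.6)
    return k_B * (E_C**2 - E**2) / (2 * sigma**2)

def T(E):   # spectrum temperature, Eq. (D.7)
    return -sigma**2 / (k_B * E)

def F(E):   # free energy, Eq. (D.9)
    return E + (E_C**2 - E**2) / (2 * E)

E = 0.5 * E_C                    # a point inside the dense spectrum
h = 1e-6
dS_dE = (S(E + h) - S(E - h)) / (2 * h)   # numerical derivative of S

err_T = abs(dS_dE - 1 / T(E))             # ~0: T = (dS/dE)^(-1) holds
err_F = abs(F(E) - (E - T(E) * S(E)))     # ~0: F = E - T*S
err_edge = abs(F(E_C) - E_C)              # ~0: S(E_C) = 0, so F(E_C) = E_C
print(err_T, err_F, err_edge)
```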

"ENERGY GAP" IN THE RANDOM ENERGY MODEL

Now let us address a "protein-like" heteropolymer in which one of the chain folds (the "native fold") is selected so that only its energy E_C − |ΔE| lies below E_C, while the rest of the energy spectrum (see Fig. 18.07) is arranged like that of a random heteropolymer (Buchler and Goldstein, 1999; Finkelstein et al., 1995b). Thus, the energy of the native fold is E_N = E_C − |ΔE|, and its entropy is 0 (because this fold is single); therefore, its free energy is equal to E_N at any temperature. Thermodynamic equilibrium of the native and non-native folds occurs at such an energy E of the non-native folds that corresponds to F(E) = E_N, that is,

E/2 + E_C²/(2E) = E_N.          (D.10)

The solution of the resulting equation E² − 2E·E_N + E_C² = 0 is

E₁ = E_N + (E_N² − E_C²)^{1/2}          (D.11)

(the second root, E₂ = E_N − [E_N² − E_C²]^{1/2}, has no physical sense, since it is less than E_N, and at these energies there are no folds at all). Consequently, the heat of melting is

ΔH = E₁ − E_N = (E_N² − E_C²)^{1/2} = |ΔE|^{1/2}·[|ΔE| + 2|E_C|]^{1/2},          (D.12)

the melting temperature is

T_M = T(E₁) = −σ²/(k_B E₁) = (σ²/k_B)/[|E_C| + |ΔE| − ΔH],          (D.13)

and the heat capacity jump at melting [from C_N = dE_N/dT = 0 to C_denat = (dE/dT)|_{E=E₁} = ((dT/dE)|_{E=E₁})^{−1}] is ΔC_P = ((dT/dE)|_{E=E₁})^{−1} = k_B E₁²/σ²; or, considering the fact that T_M = T(E₁) = −σ²/(k_B E₁), it is

ΔC_P = σ²/(k_B T_M²).          (D.14)

The last three equations relate all three parameters |ΔE|, σ, |E_C| of the random energy model to the three main parameters observed in protein melting (ΔH, T_M, ΔC_P), while the ratio (2 ln M)^{1/2} = |E_C|/σ [see Eq. (18.3.2)] allows evaluating M, the number of folds of the protein chain. If |ΔE| ≪ |E_C|, Eqs. (D.12) and (D.13) take a very simple form:

ΔH ≈ |2ΔE/E_C|^{1/2}·|E_C| = (2|ΔE|·|E_C|)^{1/2},          (D.15)

T_M ≈ (σ²/k_B)/{|E_C| − ΔH} = T_C/{1 − ΔH/|E_C|}.          (D.16)

Hence:

σ² = k_B T_M² ΔC_P,          (D.17)

|E_C| = ΔH + ΔC_P T_M,          (D.18)

|ΔE| = ½ΔH/[1 + ΔC_P T_M/ΔH].          (D.19)
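To make Eqs. (D.17)–(D.19) concrete, here is a minimal sketch that converts a set of assumed melting parameters (T_M, ΔH, ΔC_P are round illustrative numbers for a small protein, not measurements from the text) into the model parameters σ, |E_C|, |ΔE| and the fold-count estimate ln M, and then checks that the full Eq. (D.12) approximately reproduces ΔH:

```python
import math

# Illustrative "experimental" melting parameters (assumed, not from the text)
k_B = 0.001987   # kcal/(mol*K)
T_M = 330.0      # melting temperature, K
dH  = 100.0      # heat of melting, kcal/mol
dCp = 1.5        # heat capacity jump, kcal/(mol*K)

sigma2  = k_B * T_M**2 * dCp                 # Eq. (D.17): sigma^2
E_C_abs = dH + dCp * T_M                     # Eq. (D.18): |E_C|
dE_abs  = 0.5 * dH / (1 + dCp * T_M / dH)    # Eq. (D.19): |dE|
ln_M    = E_C_abs**2 / (2 * sigma2)          # from |E_C| = sigma*(2 ln M)^(1/2)

# Closure test: the recovered parameters should reproduce dH via Eq. (D.12)
dH_model = math.sqrt(dE_abs * (dE_abs + 2 * E_C_abs))

print(f"sigma = {math.sqrt(sigma2):.1f} kcal/mol, |E_C| = {E_C_abs:.0f}, "
      f"|dE| = {dE_abs:.1f}, ln M = {ln_M:.0f}")
print(f"dH reproduced via Eq. (D.12): {dH_model:.1f} kcal/mol")
```

Because Eq. (D.19) was obtained in the |ΔE| ≪ |E_C| limit, the reproduced ΔH agrees with the input only approximately.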

Addendum: The question arises whether we can use these equations to obtain the parameters M, σ, |ΔE| of the random energy model corresponding to a protein whose main thermodynamic parameters (T_M, ΔH, ΔC_P) are known from experiment. Answer: formally, yes. But, unfortunately, it cannot be done without introducing further amendments, since the experimental thermodynamic parameters depend not only on the reconstruction of chain folds (which is taken into account in the random energy model), but also on temperature-caused changes in hydrophobic and electrostatic interactions, which are not included in this model yet contribute much to the measured values (in particular, to the value of ΔC_P appearing in each of Eqs. D.17–D.19). Let us now determine the probability that one of the M chain folds has its energy below E_C − |ΔE| (at a relatively small |ΔE|), while the energies of the remaining folds are above E_C. The probability that one given fold has its energy below E_C − |ΔE| (where the "energy gap" |ΔE| is much less than the spectrum half-width |E_C|) is

p_{E_C − |ΔE|} ≈ p_{E_C}·exp[−|ΔE|·(−E_C/σ²)] = p_{E_C}·exp[−|ΔE|/k_B T_C].


Thus, the probability that one of the M folds has an energy below E_C − |ΔE|, while the energies of the remaining folds are above E_C, is

M·p_{E_C − |ΔE|}·(1 − p_{E_C})^{M−1} = M·p_{E_C}·exp[−|ΔE|/k_B T_C]·(1 − p_{E_C})^{M−1}

    ≈ (ln 2)·½·exp[−|ΔE|/k_B T_C] ~ exp[−|ΔE|/k_B T_C].          (D.20)

In conclusion: if the probability P_i(E_C, 0, σ) corresponds to the case ε_i = 0 and σ_i = σ, then, for the case of ε_i ≠ 0 and σ_i = σ + Δσ_i for fold i (it is assumed that ε_i and Δσ_i are small and |E_C| ≫ σ), we get:

P_i(E_C, ε_i, σ + Δσ_i) = [2π(σ + Δσ_i)²]^{−1/2}·exp[−(E_C − ε_i)²/2(σ + Δσ_i)²]

    ≈ P_i(E_C, 0, σ)·exp[−ε_i/k_B T_C + σ·Δσ_i/(k_B T_C)²].          (D.21)

This "Boltzmann-like" distribution (Finkelstein et al., 1995a; Gutin et al., 1992) is applicable to statistics of protein architectures (see Lecture 16).
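The "Boltzmann-like" factor exp[−ε_i/k_B T_C] of Eq. (D.21) is easy to verify against the exact Gaussian ratio. A minimal sketch (units with k_B = 1; σ, M, and ε_i are illustrative values):

```python
import math

sigma = 1.0
M = 1e6
E_C = -sigma * math.sqrt(2 * math.log(M))   # spectrum edge, Eq. (D.5)
kT_C = sigma**2 / abs(E_C)                  # k_B*T_C from Eq. (D.8), with k_B = 1

def P(E, eps, s):
    # Gaussian fold-energy distribution, Eq. (D.1)
    return math.exp(-(E - eps)**2 / (2 * s**2)) / math.sqrt(2 * math.pi * s**2)

eps = 0.05 * sigma     # a small shift of the fold's mean energy
exact = P(E_C, eps, sigma) / P(E_C, 0.0, sigma)   # exact ratio at the edge
approx = math.exp(-eps / kT_C)                    # Boltzmann-like factor, Eq. (D.21)

print(f"exact ratio:           {exact:.4f}")
print(f"Boltzmann-like factor: {approx:.4f}")
```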

REFERENCES

Bryngelson, J.D., Wolynes, P.G., 1987. Spin glasses and the statistical mechanics of protein folding. Proc. Natl. Acad. Sci. USA 84, 7524–7528.
Bryngelson, J.D., Wolynes, P.G., 1990. A simple statistical field theory of heteropolymer collapse with application to protein folding. Biopolymers 30, 177–188.
Buchler, N.E.G., Goldstein, R.A., 1999. Universal correlation between energy gap and foldability for the random energy model and lattice proteins. J. Chem. Phys. 111, 6599–6609.
Derrida, B., 1980. Random energy model: limit of a family of disordered models. Phys. Rev. Lett. 45, 79–82.
Derrida, B., 1981. The random energy model, an exactly solvable model of disordered systems. Phys. Rev. B 24, 2613–2626.
Finkelstein, A.V., Badretdinov, A.Ya., Gutin, A.M., 1995a. Why do protein architectures have a Boltzmann-like statistics? Proteins 23, 142–150.
Finkelstein, A.V., Gutin, A.M., Badretdinov, A.Ya., 1995b. Perfect temperature for protein structure prediction and folding. Proteins 23, 151–162.
Gutin, A.M., Badretdinov, A.Ya., Finkelstein, A.V., 1992. Why is the statistics of protein structures Boltzmann-like? Mol. Biol. (in Russian) 26, 118–127.
Shakhnovich, E.I., Gutin, A.M., 1989. Formation of unique structure in polypeptide chains. Theoretical investigation with the aid of a replica approach. Biophys. Chem. 34, 187–199.

Appendix E

How to Use Stereo Drawings

A protein is a three-dimensional (3D) object, and a two-dimensional (2D) image gives little understanding of its spatial structure. Fortunately, we can use the fact that each eye sees an object from a slightly different angle, and draw a "stereo-pair," that is, a couple of pictures, one of which shows the object as it is seen by the right eye, and the other, by the left. In the stereo illustrations, we will use cross stereo-pairs, where the image to be seen by the right eye is placed on the left, and the image to be seen by the left eye is placed on the right. Besides the cross-stereo images, people often use direct stereo images (where the image to be seen by the right eye is on the right and the image to be seen by the left eye is on the left). These look more "natural," but their stereo effect is very sensitive to the distance between the eyes, and one usually needs special stereoscopic glasses to get a stereo effect from direct stereo images (only some exceptional people, such as Alexey Murzin, can do without stereoscopes). The cross-stereo images require some practice, but thereafter no stereoscopic glasses are needed, and one can enjoy stereo pictures not only in books, but also on a computer screen or on a large screen, where stereoscopic glasses are useless. Therefore, I will teach you how to use cross-stereo illustrations. To get the stereo effect, cross your eyes and focus your right eye on the left part of the stereo-pair and your left eye on the right part, as shown in the diagram:



You have to look at the stereo drawing directly, keeping it at a distance of 40–50 cm. At the beginning of training, you can use a cardboard mask with a hole:

Put it at a distance of 15–20 cm from your face; it will allow your right eye to see only the left half of the stereo-pair, and your left eye to see only the right half. The 15–20 cm is an approximate estimate, since the optimal position of the mask depends on the distance between your eyes and the distance to the stereo drawing. To find the position optimal for you, move the mask back and forth and rotate your head a little; try to see only the left half of the stereo-pair with your right eye (closing your left one), then only the right half with your left eye (closing your right one), and then open both eyes and enjoy the stereo effect. After 20–30 min of training you will be able to see the 3D image of the object protruding from the page.


Below is the object that I have chosen for training. It is a curved rod. Three projections of this rod are shown first:

and then the rod is presented as a stereo-pair.

Training in "stereovision" can be considered complete when you reliably see the 3D image of the curved rod even without the help of the mask. By the way, the experience gained will enable you to see cross stereo-pairs not only in a book or journal, but on the big screen as well.