Structure of the Bacterial RNA Polymerase Promoter Specificity σ Subunit

Molecular Cell, Vol. 9, 527–539, March, 2002, Copyright 2002 by Cell Press

Elizabeth A. Campbell,1 Oriana Muzzin,1 Mark Chlenov,1 Jing L. Sun,1 C. Anders Olson,1 Oren Weinman,1 Michelle L. Trester-Zedlitz,2 and Seth A. Darst1,3
1 Laboratory of Molecular Biophysics
2 Laboratory of Mass Spectrometry and Gaseous Ion Chemistry
The Rockefeller University
1230 York Avenue
New York, New York 10021

Summary

The σ subunit is the key regulator of bacterial transcription. Proteolysis of Thermus aquaticus σA, which occurred in situ during crystallization, reveals three domains, σ2, σ3, and σ4, connected by flexible linkers. Crystal structures of each domain were determined, as well as of σ4 complexed with –35 element DNA. Exposed surfaces of each domain are important for RNA polymerase binding. Universally conserved residues important for –10 element recognition and melting lie on one face of σ2, while residues important for extended –10 recognition lie on σ3. Genetic studies correctly predicted that a helix-turn-helix motif in σ4 recognizes the –35 element but not the details of the protein-DNA interactions. Positive control mutants in σ4 cluster in two regions, positioned to interact with activators bound just upstream or downstream of the –35 element.

Introduction

Promoter-specific transcription initiation in bacteria requires the σ factor, which binds to the ~400 kDa catalytic core RNAP (subunit composition α2ββ′ω) to form the holoenzyme (Burgess et al., 1969; Travers and Burgess, 1969). One primary σ factor (σA) directs the bulk of transcription during exponential growth. Alternative σ factors direct transcription of specific regulons during unusual physiological or developmental conditions (reviewed in Gross et al., 1992; Helmann and Chamberlin, 1988). The primary and most of the alternative σ factors comprise a homologous family (Gribskov and Burgess, 1986; Stragier et al., 1985) with four regions of highly conserved amino acid sequence (see Figure 1A and supplemental data at http://www.molecule.org/cgi/content/full/9/3/527/DC1; reviewed in Lonetto et al., 1992). The central role of σ in transcription initiation is underlined by its diverse functions. Mutants that cause defects in σ binding to core RNAP indicate that the σ-core interface is extensive and involves many regions of σ.
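Figure 1A and the text repeatedly describe positions as "universally conserved" or rank them by percent identity across an alignment of 53 group 1 σ's. As a minimal sketch of how such a per-position score can be computed, the Python below scores each column of a multiple sequence alignment by the fraction of sequences carrying its most common residue. The alignment rows are invented variants for illustration (not the supplemental alignment), the function names are hypothetical, and counting gaps as mismatches is an assumption; the paper does not state its exact identity definition.

```python
from collections import Counter


def column_identity(alignment):
    """For each alignment column, return the fraction of sequences that
    carry the most common non-gap residue (gaps count as mismatches)."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        _, top_count = Counter(residues).most_common(1)[0]
        scores.append(top_count / n_seqs)
    return scores


def universally_conserved(alignment):
    """0-based column indices where every sequence has the same residue."""
    return [i for i, s in enumerate(column_identity(alignment)) if s == 1.0]


# Toy alignment: hypothetical variants of a short loop sequence.
toy = [
    "QARTIRIP",
    "QARTIRIP",
    "QSRTVRIP",
    "QARTIR-P",
]
print(column_identity(toy))
print(universally_conserved(toy))
```

A column scoring 1.0 corresponds to what the text calls universally conserved; lower scores correspond to the shorter, cooler-colored histogram bars in Figure 1A.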
3 Correspondence: [email protected]

The σ factors direct the complex process of transcription initiation by first locating the promoter through sequence-specific recognition of two hexamers of consensus DNA sequence: the Pribnow box or –10 element, centered at about –10 with respect to the transcription start site (+1), and the –35 element (reviewed in Gross et al., 1998). The σ factors then play a major role in melting the DNA to form the open complex (reviewed in Darst et al., 1997). Finally, σ serves as a target for transcription activators that bind to DNA sites overlapping the –35 element (reviewed in Gross et al., 1998). Limited proteolysis studies have led to a consistent view of σ factor structure as a series of compact domains connected by flexible linkers (Campbell and Darst, 2000; Chen and Helmann, 1995; Lowe and Malcolm, 1976; Severinova et al., 1996). This type of architecture lends itself to conformational changes, and biochemical and biophysical probes indicate that σ undergoes substantial conformational changes during the initiation process (Callaci et al., 1999). A highly protease-resistant domain from the primary σ of E. coli (σ70) comprised most of conserved regions 1.2–2.4 (Severinova et al., 1996). The crystal structure of this domain was determined to 2.6 Å resolution (Malhotra et al., 1996). Neither the structure of an intact σ factor nor structures of the other σ domains have been obtained. Here, we present X-ray crystal structures of two Thermus aquaticus (Taq) σA fragments that comprise three distinct structural domains. The three domains, σ2, σ3, and σ4, consist of conserved regions 1.2–2.4, 3.0–3.1, and 4.1–4.2, respectively, and thus include the most highly conserved regions of the σ factor family (see Figure 1A and supplemental data at http://www.molecule.org/cgi/content/full/9/3/527/DC1; Gruber and Bryant, 1997; Lonetto et al., 1992). Limited proteolysis indicates that these fragments likely constitute the structured domains of σ, the rest of the polypeptide being the protease-sensitive N terminus (region 1.1) and a protease-sensitive region linking conserved regions 3.1 and 4.1, comprising mostly region 3.2.
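The domain organization described above can be summarized as a residue-range lookup. This Python sketch uses Taq σA numbering; the boundaries are approximations assembled from statements elsewhere in this paper (fragment c, T92–L332, spans σ2 and σ3; the σ2/σ3 loop is Taq269–276, QARTIRIP; fragment e3, S366–E438, is σ4; the nonconserved insert is Taq126–197). The exact edges, and the helper name `locate`, are assumptions rather than values reported by the authors.

```python
# Approximate domain map of Taq sigma-A (438 residues), Taq numbering,
# 1-based and inclusive. Domain edges are assumptions inferred from the text.
DOMAINS = [
    (1, 91, "N terminus (region 1.1, protease sensitive)"),
    (92, 268, "sigma2 (regions 1.2-2.4)"),
    (269, 276, "sigma2/sigma3 linker (QARTIRIP)"),
    (277, 332, "sigma3 (regions 3.0-3.1)"),
    (333, 365, "linker (mostly region 3.2, protease sensitive)"),
    (366, 438, "sigma4 (regions 4.1-4.2)"),
]
NONCONSERVED_INSERT = (126, 197)  # Taq insert between regions 1.2 and 2.1

# The ranges should tile the whole chain with no gaps or overlaps.
assert DOMAINS[0][0] == 1 and DOMAINS[-1][1] == 438
assert all(prev[1] + 1 == nxt[0] for prev, nxt in zip(DOMAINS, DOMAINS[1:]))


def locate(residue):
    """Return the (approximate) domain label for a Taq sigma-A residue."""
    if not 1 <= residue <= 438:
        raise ValueError("Taq sigma-A residues are numbered 1-438")
    for start, end, label in DOMAINS:
        if start <= residue <= end:
            lo, hi = NONCONSERVED_INSERT
            return label + ", nonconserved insert" if lo <= residue <= hi else label
    raise AssertionError("unreachable: DOMAINS tiles residues 1-438")


for res in (150, 237, 281, 409):
    print(res, locate(res))
```

For example, Arg237 (region 2.2/2.3) falls in σ2, His278 and Glu281 (region 3.0) in σ3, and Arg409 (region 4.2) in σ4.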
In addition, we present the crystal structure of the σ4 domain complexed with –35 element DNA. This represents the initial specific RNAP-promoter interaction to occur in transcription initiation (Eichenberger et al., 1997), which then persists throughout open complex formation and into the first stages of transcript synthesis (Mecsas et al., 1991; Schickor et al., 1990).

Results and Discussion

In Situ Proteolysis, Fragment Identification, and Structure Determination

Crystal trials with purified Taq σA (Minakhin et al., 2001) yielded crystals under a variety of conditions but only after at least 1 month. Examination of the protein present in the drops from which crystals grew indicated in situ degradation by unknown contaminating proteases into five major, stable fragments (fragments a–e, Figure 1B). The proteolytic profile was the same in every drop examined, regardless of the conditions. N-terminal sequencing and mass spectrometry were used to determine the N and C termini of the fragments in the drops and from the crystals (Table 1 and Figure 1A). Two of the crystallized fragments, c and e3, were subcloned and crystallized under conditions similar to those of the original proteolytic fragments, and the structures were solved to 2.6 and 1.8 Å resolution (Table 2). Fragment c comprises σ conserved regions 1.2–3.1. Fragment e3 comprises σ conserved regions 4.1–4.2. The structures of the two σ fragments comprise three distinct structural domains labeled σ2, σ3, and σ4 (Figure 1C), since they contain σ conserved regions 2, 3.1, and 4, respectively. The proteolytic profile of Taq σA is consistent with, but expands upon, previous analyses (Figure 1A). Regions sensitive to proteolysis, and therefore likely to be relatively unstructured, exposed loops, were just N-terminal of region 1.2, the junction between 2.4 and 3.0, sites within 3.2, and the junction between 3.2 and 4.1. These same regions were observed to be sensitive to hydroxyl-radical cleavage in E. coli σ70 (Nagai and Shimamoto, 1997). In previous analyses (Severinova et al., 1996) as well as this one, the 90 or so residues N-terminal of region 1.2, containing region 1.1 (Lonetto et al., 1992), were completely degraded, suggesting a lack of stable structure. A fluorescence anisotropy assay demonstrated that σ4 possessed weak double-stranded DNA binding activity that was specific for the –35 element consensus sequence, TTGACA (data not shown). We therefore set up crystal trials of σ4 with double-stranded DNA oligonucleotides, obtained crystals with an 11-mer oligonucleotide, and solved the structure to 2.4 Å resolution (Table 2).

Figure 1. Conserved Regions of the σ70 Family, In Situ Proteolysis, and Domain Structures
(A) The thick bar represents the 438 aa Taq σA primary sequence with amino acid numbering below. Evolutionarily conserved regions are labeled and color coded (Lonetto et al., 1992) but with region 2.5 (Barne et al., 1997) renamed 3.0. The histogram immediately above the bar represents the level of sequence identity within the conserved regions of 53 group 1 σ's (see supplemental data at http://www.molecule.org/cgi/content/full/9/3/527/DC1; Gruber and Bryant, 1997) as follows: 100% sequence identity, tall red bar; 20%, small blue bar; intermediate levels, orange, light green, and light blue bars. The thin, horizontal black bars represent limited proteolysis results. The trypsin-resistant fragment of E. coli σ70 (Severinova et al., 1996; Malhotra et al., 1996) is illustrated on top (it includes a 175 aa insert between conserved regions 1.2 and 2.1 compared with Taq σA). Below, the in situ proteolysis results of Taq σA in crystallization drops are schematically illustrated. The crystal structures of fragments c and e3 (highlighted with pink shading) were determined.
(B) Taq σA (49.8 kDa) was incubated for several months in hanging drop crystallization trials and then analyzed by SDS-PAGE and Coomassie blue staining. Lane 1: protein directly from the supernatant of a centrifuged crystallization drop. The major, stable proteolytic fragments are labeled a–e. Lane 2: hexagonal crystals (crystal I) were analyzed, indicating that they contain predominantly fragment c. Lane 3: thin, hexagonal rod crystals (crystal II) were analyzed, indicating that they contain fragment e. The band labeled fragment e actually contained three polypeptides with slightly different N termini (e1, e2, and e3; Table 1); only e3 was found in the washed, dissolved crystals (Table 1).
(C) Backbone ribbons of Taq σA proteolytic fragments c and e3. The conserved regions of the σ70 family (Barne et al., 1997; Lonetto et al., 1992) are color coded as in Figure 1A. Depth cueing causes parts of the structure far from the viewer to fade into the white background. The two fragments fold into three distinct structural domains, labeled σ2, σ3, and σ4.
(D) Schematic drawing showing the knot formed by the σ2 and σ3 domains of two noncrystallographically related σ2-3 molecules (green and yellow). The yellow molecule corresponds to the orientation of the σ2-3 molecule in Figure 1C.

Table 1. Identification of Taq σA Proteolytic Fragments

Fragmentᵃ   N-Terminal Sequence   Observed Massᶜ (Da)   Extent                          Calculated Mass (Da)
a           92-TSDPVRQY           30,605.6 ± 3.2        Taq-T92–A360 (Ec-T94–A535)      30,605.0
b           92-TSDPVRQY           29,121.1 ± 4.3        Taq-T92–D346 (Ec-T94–D521)      29,124.4
cᵇ          92-TSDPVRQY           27,568.4 ± 6.9        Taq-T92–L332 (Ec-T94–L505)      27,569.8
d           92-TSDPVRQY           ~22,000 ± 1,000       Taq-T92–I283 (Ec-T94–I260)      22,020.0
e1          74%: 361-AQSLLSEE     9214.6 ± 3.4          Taq-A361–E438 (Ec-T536–D613)    9211.6
e2          13%: 363-SLLSEELE     9016.4 ± 3.9          Taq-S363–E438 (Ec-E538–D613)    9012.4
e3ᵇ         13%: 366-SEELEKAL     8704.6                Taq-S366–E438 (Ec-R541–D613)    8699.0

ᵃ See Figure 1.
ᵇ Crystallized fragments (Figure 1).
ᶜ Determined by MALDI-TOF mass spectrometry, except for fragment d, which was estimated from the SDS gel (Figure 1).
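Table 1 assigns each fragment's extent by matching its MALDI-TOF mass to a mass calculated from the sequence. As a minimal sketch of that comparison, the Python below takes the observed and calculated values exactly as listed in Table 1 and reports the deviation in Da and in parts per million. It does not recompute masses from sequence, the helper name is hypothetical, and fragment d is omitted because its mass was only estimated from the gel.

```python
# (fragment, observed mass in Da, reported uncertainty in Da or None,
#  calculated mass in Da), transcribed from Table 1.
TABLE1 = [
    ("a", 30605.6, 3.2, 30605.0),
    ("b", 29121.1, 4.3, 29124.4),
    ("c", 27568.4, 6.9, 27569.8),
    ("e1", 9214.6, 3.4, 9211.6),
    ("e2", 9016.4, 3.9, 9012.4),
    ("e3", 8704.6, None, 8699.0),
]


def mass_deviation(observed, calculated):
    """Return (observed - calculated) in Da and in parts per million."""
    delta = observed - calculated
    return delta, delta / calculated * 1e6


for name, obs, err, calc in TABLE1:
    delta, ppm = mass_deviation(obs, calc)
    unc = f"± {err}" if err is not None else "(no uncertainty listed)"
    print(f"{name:>2}: observed {obs} {unc}, calculated {calc}, "
          f"deviation {delta:+.1f} Da ({ppm:+.0f} ppm)")
```

Expressing the deviation in ppm puts the large fragments (~30 kDa) and the small σ4 fragments (~9 kDa) on a common scale.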

Structure and Function of σ2-3

Limited proteolysis of E. coli σ70 (Severinova et al., 1996) and Taq σA (Figure 1A) indicates that σ2 and σ3 are linked by an exposed, flexible loop. This is strikingly confirmed by crystal packing, since two neighboring molecules related by noncrystallographic symmetry are linked by a topological "knot" through the loop connecting σ2 and σ3 (Figure 1D). This could only occur if the domains were very flexibly linked, allowing them to wrap around each other during crystallization. Thus, we conclude that the orientation of σ2 and σ3 with respect to each other in the crystal structure (Figure 1C, left) is not relevant to σ factor function but is determined by crystal packing interactions. Interestingly, the sequence of the connecting loop is very highly conserved (Taq269 [Ec446]-QARTIRIP, where the bold, underlined residues are universally conserved in an alignment of 53 group 1 σ's; see supplemental data at http://www.molecule.org/cgi/content/full/9/3/527/DC1; Gruber and Bryant, 1997), suggesting that this loop plays an important role in σ factor function.

The fold of σ2 (excluding the nonconserved insert between regions 1.2 and 2.1, gray in Figure 1C) is nearly identical to the structure of an E. coli σ70 fragment (Malhotra et al., 1996) extending from the middle of region 1.2 to near the end of 2.4 (root-mean-square deviation of 1.05 Å between 83 α carbon pairs in the conserved regions). The differences lie at the ends: the present structure is extended at the N terminus by a 90° kink and an α helix (comprising the N-terminal half of region 1.2) that packs against the back of the bundle of conserved-region helices, and at the C terminus by the flexible loop described above and another compactly folded domain, σ3. The fold of the nonconserved insert between regions 1.2 and 2.1 (Taq126–197) is not related to the much larger insert in E. coli σ70 (Ec128–374; see supplemental data at http://www.molecule.org/cgi/content/full/9/3/527/DC1 for a sequence alignment relating Taq σA and E. coli σ70 numbering).

Table 2. Crystallographic Analysis

Diffraction Data

Data Set          Wavelength  Resolution  No. of Reflections  Completeness (%)   I/σ(I)             Rsym (%)           No. of  Phasing
                  (Å)         (Å)         (tot./unique)       (tot./last shell)  (tot./last shell)  (tot./last shell)  Sites   Power
σ2–3ᵃ
  SeMet (λ1)ᵇ     0.9790      25–3.2      67,681/29,704       90.9/89.4          10.2/2.4           8.2/38.7           6       1.31 (ano)
  SeMet (λ2)ᵇ     0.9792      25–3.2      68,501/30,358       91.3/89.6          12.5/3.2           5.7/26.4           6       1.49 (ano)
  SeMet (λ3)ᵇ     0.9648      25–3.2      50,680/23,279       76.1/73.3          10.3/2.2           7.5/38.5           6       1.23 (ano)
  Nativeᵇ,ᵉ       1.1         30–2.9      97,790/23,536       97.1/96.5          24.9/6.2           3.7/14.7           6       0.50 (iso)
σ4 L386Mᶜ
  SeMet (λ1)ᵇ     0.9790      25–2.35     21,899/5,094        99.1/99.6          21.3/5.6           6.1/29.5           3       1.50 (ano)
  SeMet (λ2)ᵇ     0.9792      25–2.35     16,415/5,009        97.3/83            16.9/4.1           6.2/29.8           3       1.30 (ano)
  SeMet (λ3)ᵇ     0.9648      25–2.35     17,024/5,224        96.4/76.2          15.5/3.0           7.4/40.0           3       1.21 (ano)
  Nativeᵈ,ᵉ       1.1         30–1.8      49,941/5,999        98/99.3            30/6.7             4.7/23.7           3       0.72 (iso)
σ4 L386M/DNAᶠ
  SeMet (λ1)ᵃ,ᵉ   0.9790      25–2.4      33,285/16,118       96.7/79.7          16.1/2.9           4.5/25.3           4       1.36 (ano)
  SeMet (λ2)ᵃ     0.9792      25–2.4      30,902/16,046       96.8/83.3          14.6/2.6           4.9/27             4       1.32 (ano)
  SeMet (λ3)ᵃ     0.9648      25–2.4      25,640/15,714       94.1/74.5          16.1/2.8           3.8/21.8           4       1.26 (ano)
  Nativeᵃ         0.979       25–2.5      29,108/7,801        97.7/97.7          20.9/5.3           5.2/19.7           4       1.14 (iso)

Refinement

                          σ2–3        σ4          σ4-DNA
Resolution (Å)            30–2.9      30–1.8      25–2.4
No. of solvent molecules  0           39          66
Rcryst/Rfree (%)          23.0/27.1   21.8/25.2   25.5/29.3
Rmsd bond lengths (Å)     0.00787     0.00508     0.00696
Rmsd bond angles (°)      1.286       1.066       1.300

ᵃ Crystal space group, P6₅; unit cell, a = 104.19 Å, c = 169.18 Å; Mol./AU, 2; figure of merit (25–4.5 Å resolution), 0.474.
ᵇ Data collected at the National Synchrotron Light Source, Brookhaven, NY, Beamline X9A.
ᶜ Crystal space group, P3₂21; unit cell, a = 39.80 Å, c = 66.81 Å; Mol./AU, 1; figure of merit (30–2.5 Å resolution), 0.606.
ᵈ Data collected at the National Synchrotron Light Source, Brookhaven, NY, Beamline X25.
ᵉ Data set used for refinement.
ᶠ Crystal space group, P2₁2₁2₁; unit cell, a = 36.18 Å, b = 62.65 Å, c = 94.81 Å; Mol./AU, 2 protein, 1 ds-DNA; figure of merit (25–2.5 Å resolution), 0.470.
Rsym = Σ|I − ⟨I⟩|/ΣI, where I is the observed intensity and ⟨I⟩ is the average intensity obtained from multiple observations of symmetry-related reflections.
Rcryst = Σ||Fobserved| − |Fcalculated||/Σ|Fobserved|; Rfree = Rcryst calculated using 5% random data omitted from the refinement.
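The footnotes to Table 2 define the two agreement statistics reported for each data set and model: Rsym, which measures the spread among symmetry-related measurements of the same reflection, and Rcryst/Rfree, which measure how well the refined model's structure-factor amplitudes match the observed ones. The Python below is a minimal sketch of those footnote formulas on tiny made-up numbers; the function names and data layout are hypothetical, and real data-reduction and refinement programs add scaling, weighting, and resolution binning that are omitted here.

```python
import random
from statistics import mean


def r_sym(observations):
    """Rsym = sum|I - <I>| / sum I.

    observations maps each unique reflection (e.g. its reduced hkl) to the
    intensities measured for its symmetry-related copies.
    """
    num = den = 0.0
    for intensities in observations.values():
        avg = mean(intensities)
        num += sum(abs(i - avg) for i in intensities)
        den += sum(intensities)
    return num / den


def r_factor(f_obs, f_calc):
    """R = sum||Fobs| - |Fcalc|| / sum|Fobs| over paired amplitudes."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def split_free_set(indices, fraction=0.05, seed=0):
    """Randomly set aside a fraction of reflections as the Rfree test set."""
    indices = list(indices)
    rng = random.Random(seed)
    n_free = max(1, round(fraction * len(indices)))
    free = set(rng.sample(indices, n_free))
    work = [i for i in indices if i not in free]
    return work, sorted(free)


# Made-up example data.
toy_obs = {(1, 0, 0): [100.0, 110.0], (0, 1, 1): [50.0, 50.0, 56.0]}
print(f"Rsym = {100 * r_sym(toy_obs):.1f}%")
print(f"R = {100 * r_factor([100.0, 80.0, 60.0], [90.0, 85.0, 60.0]):.2f}%")
work, free = split_free_set(range(1000))
print(len(work), "working reflections,", len(free), "free reflections")
```

Rcryst would be evaluated with `r_factor` over the working reflections and Rfree over the 5% held out, which the refinement never sees; a large gap between the two is the usual sign of overfitting.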
Region 2 is the most highly conserved region of the σ70 family (see Figure 1A and supplemental data at http://www.molecule.org/cgi/content/full/9/3/527/DC1; Lonetto et al., 1992; Gruber and Bryant, 1997), and residues in region 2 are implicated in critical functions of σ, including core RNAP binding and –10 element recognition and melting. The σ-core RNAP interface is extensive and involves several conserved regions of σ (Joo et al., 1997, 1998; Sharp et al., 1999), but available evidence suggests that the primary interface involves the exposed, polar surface of the amphipathic region 2.2 helix (Figure 2A, orange α carbons) and a coiled-coil-like motif of the RNAP β′ subunit (Arthur et al., 2000; Young et al., 2001). In the structure of σ2, the polar surface of the region 2.2 helix is partially occluded by a loop within the nonconserved insert between regions 1.2 and 2.1. This loop appears to be flexible, as it has very high B factors, and its position appears to be influenced by crystal packing interactions. Thus, we presume that in the complex of σ2 with core RNAP, the loop is repositioned to expose the core RNAP binding determinants in region 2.2. Region 2.4 contains allele-specific suppressors of promoter mutations in the –10 element, strongly implicating these residues in base-specific interactions with the –10 element (Figure 2A, green α carbons; Daniels et al., 1990; Kenney et al., 1989; Siegele et al., 1989; Tatti et al., 1991; Waldburger et al., 1990; Zuber et al., 1989). Highly conserved aromatic and basic residues in region 2.3 play a role in promoter melting (Figure 2A, cyan α carbons; deHaseth and Helmann, 1995; Jones and Moran, 1992; Juang and Helmann, 1994, 1995). Universally conserved basic residues in regions 2.2 and 2.3 (Taq-Arg237 and Taq-Lys241, corresponding to Ec-Lys414 and Ec-Lys418, blue α carbons in Figure 2A) appear to be critical for DNA binding, probably in a non-sequence-specific manner (Tomsic et al., 2001).

The fold of σ3 is a compact, three-helix domain. An N-terminal, amphipathic α helix consists of a conserved sequence element named region 2.5 (Barne et al., 1997). Genetic studies indicate that two residues on the exposed face of this helix interact with the "extended –10" promoter motif (Figure 2B; Bown et al., 1997; Keilty and Rosenberg, 1987). Based on the structure, we have renamed this conserved region 3.0, since it occurs in a structural domain with region 3.1 rather than with regions 2.1–2.4. Packed against the hydrophobic face of the 3.0 helix is region 3.1, which consists of a helix-turn-helix (HTH) motif found in many DNA binding proteins (Pabo and Sauer, 1992). There is no evidence that this region of σ factors participates in DNA binding, and substitutions in this region cause defects in core RNAP binding (Figure 2B; Hernandez and Cashel, 1995; Joo et al., 1998; Sharp et al., 1999). That this motif indeed serves as a core RNAP binding interface is shown in the crystal structures of Taq RNAP holoenzyme (K. S. Murakami, S. Masuda, and S.A.D., submitted) and Taq RNAP holoenzyme bound to a promoter fragment (K. S. Murakami, S. Masuda, E.A.C., O.M., and S.A.D., submitted).

The primary σ's, such as Taq σA, direct transcription through interactions of σ region 2 with the –10 element (consensus sequence TATAAT) and of σ region 4 with the –35 element (consensus TTGACA). However, σ region 4 is dispensable at promoters containing an "extended –10" element (TGnTATAAT; Bown et al., 1997; Keilty and Rosenberg, 1987; Kumar et al., 1993). Apparently, interactions between residues in region 3.0 and the upstream "TG" motif substitute for region 4 interactions with the –35 element.
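The promoter elements discussed above (the –35 hexamer TTGACA, the –10 hexamer TATAAT, and the extended –10 motif TGnTATAAT) can be located in a sequence with a simple consensus scan. The Python below is a hypothetical sketch, not a method used in this paper: it slides a –35/–10 pair along the nontemplate strand at a fixed 17 bp spacing, scores each placement by mismatches to consensus, flags a TG at –15/–14, and reports coordinates in the promoter numbering this paper adopts for the σ4-DNA structure (+1 is the start site, there is no position 0, and the start site lies 7 bp downstream of the –10 element). Real promoters tolerate variable spacing and many mismatches, so mismatch counting is only a crude proxy for promoter strength; all names here are illustrative.

```python
MINUS35 = "TTGACA"   # -35 consensus
MINUS10 = "TATAAT"   # -10 consensus
SPACER = 17          # bp between the hexamers (optimal spacing, assumed fixed)
START_GAP = 7        # +1 is the 7th nt downstream of the last -10 base


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def to_promoter_coord(index, plus1_index):
    """Map a 0-based sequence index to promoter numbering (+1 = start, no 0)."""
    offset = index - plus1_index
    return offset + 1 if offset >= 0 else offset


def best_promoter(seq):
    """Return the best-scoring -35/-10 placement in seq, or None if too short."""
    seq = seq.upper()
    best = None
    for i35 in range(len(seq)):
        i10 = i35 + len(MINUS35) + SPACER            # first base of -10 hexamer
        plus1 = i10 + len(MINUS10) - 1 + START_GAP   # index of the start site
        if plus1 >= len(seq):
            break
        score = (mismatches(seq[i35:i35 + 6], MINUS35)
                 + mismatches(seq[i10:i10 + 6], MINUS10))
        if best is None or score < best["mismatches"]:
            best = {
                "mismatches": score,
                "minus35_span": (to_promoter_coord(i35, plus1),
                                 to_promoter_coord(i35 + 5, plus1)),
                "minus10_span": (to_promoter_coord(i10, plus1),
                                 to_promoter_coord(i10 + 5, plus1)),
                # extended -10: T at -15 and G at -14 (the "TGn" of TGnTATAAT)
                "extended_minus10": seq[i10 - 3:i10 - 1] == "TG",
            }
    return best


# Synthetic sequence (not a real promoter); +1 is the lone A before "CGCG".
demo = ("GCGC" + "TTGACA" + "CCGCAGGCGCACCCTGC" + "TATAAT"
        + "GCGCGC" + "A" + "CGCG")
print(best_promoter(demo))
```

On this synthetic sequence the scan places the –35 hexamer at –35 to –30 and the –10 hexamer at –12 to –7, matching the coordinates used for the crystallized –35 element DNA.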
Remarkably, the crystallized fragment σ2-3 is sufficient for transcription from an extended –10 promoter (Figure 2C, lane 14). Earlier, Severinova et al. (1996) showed that a fragment of E. coli σ70 comprising regions 1.2–2.4 was not sufficient for transcription from an extended –10 promoter. Thus, we conclude that σ regions 1.2–3.1 contain all that is necessary and sufficient for the basic σ factor functions of directing RNAP holoenzyme to a specific DNA sequence, melting the double-stranded DNA around the transcription start site, and initiating the synthesis of an RNA chain. The activity of σ2-3 was relatively weak compared to that of full-length σA (Figure 2C), but control experiments confirmed that the observed transcription was not due to contaminating σ activities. The weak activity of σ2-3 was not due to trace amounts of full-length Taq σA that could have been introduced with the endogenous Taq core RNAP preparation, since Taq core RNAP alone showed no activity (Figure 2C, lane 10). The activity was also not due to contaminating E. coli σ70 that could have been introduced with the recombinant Taq σ2-3 prepared from E. coli, since even a large molar excess of σ70 showed no activity when combined with Taq core RNAP (Figure 2C, lane 11). Interestingly, the activity of σ2-3 could be boosted to levels comparable with full-length Taq σA by increasing the concentration of the initiating dinucleotide (Figure 2D), suggesting that the absence of σ regions 3.2–4.2 substantially increased the apparent Km for (that is, decreased the apparent affinity for) the initiating substrate. Since crosslinking results place region 3.2 near the initiating nucleotide (Severinov et al., 1994), we suggest that this effect is mediated by region 3.2. This could occur either directly, if residues in region 3.2 interact with the initiating substrate nucleotide, or indirectly, if region 3.2 allosterically influences the initiating nucleotide binding site of the RNAP β subunit (Naryshkina et al., 2001).

Figure 2. Structure and Function of σ2-3
(A) Two views of the α carbon backbone of σ2, shown as a worm and color coded as in Figure 1C. The nonconserved insert (gray) between conserved regions 1.2 and 2.1 is partially transparent so as not to obscure features behind it. Marked with spheres and labeled are the α carbon positions of residues shown to be important for critical functions of σ2: orange, core RNAP binding (Joo et al., 1997; Sharp et al., 1999); green, –10 element recognition (Daniels et al., 1990; Kenney et al., 1989; Siegele et al., 1989; Tatti et al., 1991; Waldburger et al., 1990; Zuber et al., 1989); cyan, universally conserved aromatic residues important for open complex formation (deHaseth and Helmann, 1995; Helmann and Chamberlin, 1988; Jones and Moran, 1992; Juang and Helmann, 1994, 1995); blue, universally conserved basic residues critical for DNA binding (Tomsic et al., 2001). The view on the left is similar to the view of Figure 1C; the view on the right reveals the exposed face on top containing all of the amino acid positions important for DNA interactions and the exposed face of the region 2.2 helix (orange backbone) containing core RNAP mutants. Amino acid labeling is according to Taq σA numbering. Corresponding E. coli σ70 numbering is (Taq/E. coli): L207/L384, V210/V387, L225/L402, D226/D403, Q229/Q406, E230/E407, N232/N409, I236/M413, R237/K414, K241/K418, F248/Y425, Y253/Y430, W256/W433, W257/W434, Q260/Q437, N263/T440, and R264/R441.
(B) The α carbon backbone of σ3, shown as a worm and color coded as in Figure 1C. Marked with spheres and labeled are the α carbon positions of residues important for σ3 functions: orange, core RNAP binding (Hernandez and Cashel, 1995; Joo et al., 1998; Sharp et al., 1999); magenta, extended –10 recognition (Barne et al., 1997). Amino acid labeling is according to Taq σA numbering. Corresponding E. coli σ70 numbering is (Taq/E. coli): H278/H455, E281/E458, M310/M487, S331/S506, and P329/P504.
(C) Abortive initiation assays (McClure et al., 1978) on a –10/–35 promoter (T7 A1) and an extended –10 promoter (gal P). Transcription complexes were formed with E. coli (lanes 1, 2, 8, and 9) or Taq (lanes 3–7 and 10–14) core RNAP, the indicated σ factor or Taq σA fragment, 0.1 mM initiator dinucleotide CpA, and promoter DNA. Transcription was initiated by the addition of [α-32P]UTP. The reaction product CpApU, radiolabeled through the incorporated [α-32P]UMP, was separated by denaturing polyacrylamide gel electrophoresis and visualized by autoradiography.
(D) Effect of initiator dinucleotide (CpA) concentration on abortive initiation from gal P. Transcription complexes were formed with Taq core RNAP, the indicated Taq σA fragment, promoter DNA, and the indicated concentration of CpA and assayed as above. The relative activities determined from the quantitated bands, normalized to 100 for full-length Taq σA (lane 2), are shown below the gel.

Figure 3. Superposition of σ4 Structures, Crystallization DNA, and Overview of the σ4-DNA Complex
(A) The α carbon backbones of the three independently determined σ4 structures, shown as worms (blue, σ4; orange, σ4-DNA molecule A; yellow, σ4-DNA molecule B), aligned over the structural core (Taq376–424, corresponding to Ec551–599).
(B) Synthetic 11-mer oligonucleotides used for crystallization, with the –35 element denoted by yellow shading.
(C) The contents of two asymmetric units from the σ4-DNA crystal are shown. The σ4 molecules are shown as α carbon backbone worms. On the right, σ4 molecule A (specifically bound to the –35 element DNA) is orange, and molecule B, which does not make specific interactions with the DNA, is yellow. The DNA nontemplate strand is light green, and the template strand is dark green. The bases of the –35 element are yellow. The sequence of the –35 element is denoted for the nontemplate strand. On the left, the asymmetric unit related by crystallographic symmetry is shaded gray. The path of the DNA helix axis, calculated using CURVES (Lavery and Sklenar, 1988), is denoted by a pink line.

Structure and Function of σ4

The fold of σ4 includes an HTH motif (Pabo and Sauer, 1992) comprising conserved region 4.2 (brown in Figure 1C). The overall fold most closely resembles members of the FixJ family of bacterial transcription factors, such as the Bacillus transcription factor GerE (1FSE [Ducros et al., 2001]; DALI Z-score of 5.7 [Holm and Sander, 1996]) and the DNA binding domain of NarL (1A04 [Baikalov et al., 1996]; DALI Z-score of 4.9), as predicted from sequence analysis (Kahn and Ditta, 1991; Lonetto et al., 1992, 1998). The structure of σ4 was determined independently three times (Figure 3A), since two copies (A and B) occurred in the asymmetric unit of the σ4-DNA crystal and a single copy (C) occurred in the asymmetric unit of the σ4 crystal. In the σ4-DNA crystal, one copy (molecule A) was bound specifically to the –35 element DNA (Figure 3B), while the other copy (molecule B) was incorporated into the crystal lattice mainly through protein-protein interactions without significant interactions with the DNA (Figure 3C). The structural core, which consisted of three compactly folded α helices, was essentially identical between the three copies of σ4; the root-mean-square deviation of α carbon positions from residues Taq376–424 (Ec551–599), with respect to molecule A (orange in Figure 3A), was 0.33 Å for molecule B (yellow) and 0.62 Å for molecule C (blue). The N-terminal (residues Taq366–375, corresponding to Ec541–550) and C-terminal (residues Taq425–438, corresponding to Ec600–613) segments varied substantially between the molecules (Figure 3A). Most of the C-terminal segment in molecule C (after residue Taq428, corresponding to Ec603) was disordered. These structural differences were attributed to crystal packing interactions and not to differences in protein-DNA interactions, since the interactions with nucleic acids all occurred within the structural core.

In the σ4-DNA crystal, symmetry-related DNA double helices packed head-to-tail to form a pseudo-continuous double helix with C-C mismatches at the junctions (Figure 3C). Throughout this manuscript, the numbering system for the DNA described in Figure 3B will be used, where negative numbers denote base pairs (bp) upstream of the transcription start site (denoted +1), assuming an optimal promoter with a start site 7 bp downstream of the –10 element and a 17 bp spacer between the –10 and –35 elements (Gaal et al., 2001). An unprimed number indicates the nontemplate (or top) strand of the DNA, while primes indicate the template (bottom) strand. In this system, the double-stranded portion of the oligonucleotide used for crystallization extends from –27 to –36 (Figure 3B). The –35 hexamer extends from –30 to –35 (shaded yellow in Figure 3B). Protein-DNA interactions, which occur exclusively from the major groove, extend from –30 to –38, spanning the entire –35 element as well as upstream DNA, including an interaction with the phosphate backbone of the upstream symmetry-related DNA molecule (numbered consecutively as if it were a continuous double helix; Figure 4). Over this range, the path of the DNA helix axis bends about 36° around the HTH recognition helix inserted deep in the major groove (Figure 3C). The bending of the DNA may be important for the proper orientation of transcription activators that bind upstream of the –35 element (see below).

The protein anchors itself on the DNA through extensive interactions with the phosphate backbone on the nontemplate strand from –35 to –38 and the template strand from –31′ to –33′ (Figure 4). Ethylation of many of these phosphates (–35 to –38, and –31′, marked with red dots in Figure 4A) interferes with RNAP binding to the promoter (Siebenlist and Gilbert, 1980; Siebenlist et al., 1980). The interactions with the phosphate backbone of the nontemplate strand are direct, while many of the template strand interactions are water mediated (Figure 4). The backbone phosphate interactions occur through protein side chains from both conserved regions 4.1 and 4.2, as well as from peptide –NH groups of Taq-L398 (Ec573) and Taq-E399 (Ec574) at the beginning of the first helix of the HTH motif.

After region 2, region 4 is the most highly conserved region of the σ70 family (Figure 1A; Lonetto et al., 1992). The conclusion from genetic studies (Gardella et al., 1989; Kenney and Moran, 1991; Siegele et al., 1989) that residues in region 4.2 determine sequence-specific interactions with the –35 element is borne out by the structure; protein interactions with the DNA bases occur exclusively through side chains from region 4.2 and, more specifically, side chains emanating from the HTH motif recognition helix. However, not all inferences regarding the details of base recognition by specific residues are supported by the structure. Siegele et al. (1989) described a substitution of Cys for a universally conserved Arg at position Taq409 (Ec584) in region 4.2 that decreased expression from the wild-type lac promoter and from most of a panel of lac promoter mutants.
Two mutant promoters, differing from the wild-type promoter only at the –31 bp (–31C→T or –31C→G), had significantly increased expression with the mutant protein compared to wild-type, leading to the proposal that Taq-Arg409 (Ec-Arg584) interacts with the CG base pair at –31. Indeed, Taq-Arg409 (Ec-Arg584) donates two hydrogen bonds to the O6 and N7 acceptors of –31′G (Figure 4), explaining the effect of the Taq-Arg409Cys (Ec-Arg584Cys) substitution on expression from the wild-type promoter and explaining the strong, deleterious effect of N7 methylation at this position by dimethyl sulfate on promoter binding by RNAP (Siebenlist and Gilbert, 1980; Siebenlist et al., 1980). RNAP also strongly protects this position against methylation once bound (Johnsrud, 1978; Ross et al., 2001; Siebenlist and Gilbert, 1980; Siebenlist et al., 1980). The increased expression from the mutant promoters observed with the Taq-Arg409Cys (Ec-Arg584Cys) substitution is not explained by the structure. Gardella et al. (1989) described a substitution of His for a universally conserved Arg at position Taq413 (Ec588) in region 4.2 that increased expression from mutant promoters (relative to the wild-type promoter) having a nonconsensus bp at –33 (–33G→A or –33G→C), but not from other mutant promoters. This led to the proposal that Taq-Arg413 (Ec-Arg588) interacts with the GC bp at –33. Kenney and Moran (1991) subsequently confirmed this observation using Bacillus subtilis σA and also showed that the mutant σ had decreased expression from the wild-type promoter relative to wild-type σA. However, while the crystal structure shows that Taq-Arg413 (Ec-Arg588) makes water-mediated interactions with the phosphate backbone at –32′ and –33′ and makes van der Waals contact with the edge of the –32′T ring, it does not interact with the –33 bp (Figure 4). Taq-Arg413 (Ec-Arg588) does appear to be key in positioning another universally conserved residue, Taq-Glu410 (Ec-Glu585), which in turn interacts directly with N4 of –33′C and makes a water-mediated hydrogen bond with N7 of –34′A. Thus, the effects of substitutions at Taq-Arg413 (Ec-Arg588) appear to be indirect and likely result from effects on Taq-Glu410 (Ec-Glu585). Modeling suggests that a His residue at position Taq413 (Ec588) could potentially interact with hydrogen bond acceptors of the T or G nucleotides present at –33′ in the mutant promoters, which would explain the increased expression from these promoters by the Taq-Arg413His (Ec-Arg588His) mutant. This prediction needs to be confirmed, however, by structural studies of the mutant protein complexed with mutant promoters. In addition to the key interactions of Taq-Arg409 (Ec-Arg584) and Taq-Glu410 (Ec-Glu585), Taq-Gln414 (Ec-Gln589) makes a hydrogen bond with O4 of –35T. Several side chains make van der Waals interactions with the –35 element nucleotides that may also contribute to sequence specificity. The most important of these appear to be the contact between the alkyl chain of Taq-Arg411 (Ec-Arg586) and the C5-methyl of –35T and the close contact between Taq-Arg409 (Ec-Arg584) and the C5-methyl of –30′T.

Examination of the molecular surface of σ4 in complex with –35 element DNA (Figure 5) reveals interesting properties. Overall, the protein is C shaped, with a concave pocket facing downstream toward the rest of the RNAP (Figure 5A).
The inside surface of this concave pocket consists almost entirely of hydrophobic residues (Figure 5A, yellow and orange). Amino acid substitutions in region 4 that cause defects in core RNAP binding have been identified (Joo et al., 1998; Sharp et al., 1999). Most of these substitutions lie in or around the edge of the hydrophobic pocket (Figure 5A). The pocket and its hydrophobic surface are also distinct from the DNA binding interface (Figure 5A, blue). We therefore suggest that σ4 latches onto the core RNAP through this hydrophobic pocket, the dimensions of which would accommodate an α helix.

Activation of transcription is a major regulatory strategy in bacteria. Most bacterial transcription activators bind to sites overlapping or just upstream of promoters and activate transcription through direct contacts with the RNAP. Although activators can target any RNAP subunit, most contact either α or σ. In many cases, the precise location of the DNA operator to which the activator binds determines the targeted RNAP subunit. When the activator binds at or near the −35 element, σ4 is frequently a target (Ishihama, 1993). Positive control mutants are mutations in σ that selectively disrupt the function of an activator without affecting basal transcription. On the basis of such mutants, at least ten activators are thought to contact σ (Gross et al., 1998). These activation-specific mutations cluster in two regions of σ4 (Figure 5B).

The first cluster is exemplified by mutants in the first helix of the HTH motif that disrupt activation by PhoB (Kim et al., 1995). It is exposed on the downstream face of the protein above the core binding pocket, next to but not overlapping the DNA binding interface (Figure 5B). PhoB appears to function by substituting for σ4-DNA contacts: promoters activated by PhoB lack a recognizable −35 element and instead contain a PhoB binding site. The PhoB dimer recognizes two half-sites located at −23 to −40 (Makino et al., 1996).

The second cluster is exemplified by mutants that disrupt activation by CAP (at class II promoters; Zhou et al., 1994), FNR, or λcI. It occurs in the C-terminal part of the HTH recognition helix (Kuldell and Hochschild, 1994; Li et al., 1994; Lonetto et al., 1998). These residues are all positively charged Arg or Lys residues or polar His residues. They face the upstream part of the promoter. They are therefore well positioned to interact with the downstream subunit of the dimeric activators, all of which bind in the major groove of the DNA to recognition sequences located at −34 to −38 (Figure 5B).

Figure 4. Protein-DNA Contacts
(A) Schematic representation of σ4-DNA contacts, all from the major groove. The nontemplate strand is light green, and the template strand is dark green. The bases of the −35 element are yellow. The upstream DNA from a symmetry-related DNA molecule is gray. Colored boxes denote σ4 residues making DNA contact (tan, region 4.1; brown, region 4.2). Connecting solid lines indicate hydrogen bonds (<3.2 Å) or salt bridges (<4.0 Å). Magenta lines (from Taq-L398 and E399, corresponding to Ec-L573 and E574) denote main chain –NH contacts. Thick solid lines (from Taq-R379 and R409, corresponding to Ec-R554 and R584) indicate two hydrogen bonds from the same residue. Intervening water molecules are shown as pink circles. Dashed blue lines indicate potential van der Waals (hydrophobic) contacts (<4.0 Å). Bridging phosphates marked with red dots (−35 to −38, and −31′) interfere with RNAP binding when ethylated (Siebenlist and Gilbert, 1980; Siebenlist et al., 1980). Dimethyl sulfate methylation of N7 on −31′G (underlined in red) strongly inhibits RNAP binding (Siebenlist and Gilbert, 1980; Siebenlist et al., 1980). Binding of RNAP also strongly protects this position against methylation (Johnsrud, 1978; Ross et al., 2001; Siebenlist and Gilbert, 1980; Siebenlist et al., 1980). Amino acid labeling is according to Taq σA numbering. Corresponding E. coli σ70 numbering is (Taq/E. coli): R379/R554, R387/R562, L398/L573, E399/E574, T408/T583, R409/R584, E410/E585, R411/R586, R413/R588, Q414/Q589, and K418/K593.
(B) Stereo view showing σ4-DNA interactions in the major groove of the −35 element DNA. The α carbon backbone of σ4 is shown as a worm, with region 4.1 colored tan, region 4.2 colored brown, and the rest colored gray. Side chains and main chain nitrogens that contact the DNA are shown (as illustrated schematically in Figure 4A). Potential hydrogen bonds (<3.2 Å) are shown as gray dashed lines. Carbon atoms of the side chains are colored as the backbone, nitrogen atoms are blue, and oxygen atoms are red. Water molecules mediating protein-DNA contacts are shown in pink. For the DNA, the nontemplate strand is light green, and the template strand is dark green. The bases of the −35 element are yellow. DNA atoms that contact the protein are colored blue (nitrogens), red (oxygens), or cyan (van der Waals contacts). DNA from the upstream, symmetry-related molecule is shaded gray. Amino acid labeling is according to Taq σA numbering. Corresponding E. coli σ70 numbering is listed in the legend for Figure 2A.

Figure 5. Surface Properties of σ4
(A) Orthogonal views of σ4-DNA, showing surfaces involved in DNA binding and core RNAP binding. The α carbon backbone (at 50% scale) is shown in the same orientation to the lower left of each surface diagram. The DNA nontemplate strand is light green, and the template strand is dark green. The −35 element bases are yellow. The protein surface is gray except for the following color coding, as denoted in the color wheel (left):
- residues in the DNA binding interface (<4.0 Å; Figure 4): blue, green, or magenta;
- hydrophobic residues: yellow, green, or orange;
- core RNAP binding mutants (Sharp et al., 1999): red, orange, or magenta.
The core RNAP binding mutants are also labeled. The concave pocket coated with hydrophobic residues is indicated. Amino acid labeling is according to Taq σA numbering. Corresponding E. coli σ70 numbering is (Taq/E. coli): R387/R562, K388/F563, L390/I565, and L423/598.
(B) Same views of σ4-DNA as in Figure 5A, showing the two clusters of positive control mutants. The DNA is color coded as in Figure 5A. The exception is the DNA backbone where the downstream subunits of the dimeric activators catabolite activator protein (CAP), FNR, and phage λcI bind in the major groove, which is colored red. The protein surface is gray. Selected positive control mutants are color coded as follows: blue, basic residues; red, acidic; cyan, polar; yellow, hydrophobic. The positive control mutants are also labeled. The lowercase black letters indicate which activator function is disrupted by the particular mutant: c, CAP (Lonetto et al., 1998); f, FNR (Lonetto et al., 1998); l, λcI (Kuldell and Hochschild, 1994; Li et al., 1994); p, PhoB (Kim et al., 1995). Amino acid labeling is according to Taq σA numbering. Corresponding E. coli σ70 numbering is (Taq/E. coli): E395/D570, H396/Y571, T397/T572, L398/L573, E400/E575, A403/K578, E416/E591, K418/K593, R421/R596, K422/K597, K424/R599, H426/H600, and R429/R603.

In summary, we analyzed the proteolytic fragments remaining after in situ proteolysis that occurred during crystallization trials of Taq σA. From this analysis, we determined that σ factors consist of three structured domains connected
by flexible linkers. We solved crystal structures of the three domains, as well as of the σ4 domain in complex with −35 element DNA. We have shown that only two of these domains (σ2–σ3) are necessary and sufficient for promoter-specific initiation at a specialized promoter. These studies provide a more complete framework for designing experiments that probe the key role of the σ factor in controlling transcription initiation in bacteria.

Experimental Procedures

Full-Length Taq σA Purification, N-terminal Sequencing, and Mass Spectrometry
The gene encoding Taq σA (Minakhin et al., 2001) was PCR subcloned into the NdeI/Bpu1102I sites of the pET15b expression vector (Novagen), and the resulting plasmid was transformed into E. coli BL21(DE3)pLysS cells. Transformants were grown at 37°C in LB medium supplemented with ampicillin (100 μg/ml) and chloramphenicol (34 μg/ml) to an A600 of 0.4–0.6. Expression was induced by adding 1 mM isopropyl-β-D-thiogalactopyranoside (IPTG) for 4 hr. Cells were then harvested by centrifugation and lysed in a continuous flow French press, and the lysate was clarified by centrifugation. Taq σA was purified by Ni2+-affinity chromatography (HiTrap chelating cartridge, Amersham-Pharmacia Biotech) and cleaved with thrombin (100 μg thrombin/100 mg protein) overnight at 4°C. The cleaved protein was reapplied to the Ni2+-affinity column, and the flowthrough was collected. The protein was further purified by cation exchange chromatography (HiTrap SP-Sepharose HP cartridge, Amersham-Pharmacia Biotech) and gel filtration (Superdex 200, Amersham-Pharmacia Biotech). Pure Taq σA was concentrated to 12 mg/ml by centrifugal filtration (Millipore) and exchanged into 10 mM Tris-HCl (pH 8.0), 0.15 M NaCl, and 0.1 mM EDTA (buffer A). The crystals analyzed in Figure 1B (lanes 2 and 3) were obtained by vapor diffusion. Drops were set up by mixing 1 μl of protein solution (12 mg/ml) with 1 μl of 0.1 M MES (pH 6.0), 2.4 M (NH4)2SO4, and 20 mM MgCl2, and were incubated at 22.5°C.
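The drop setup above implies some simple concentration arithmetic. This minimal Python sketch computes the drop composition immediately after mixing. It also computes an idealized end point, which assumes that only water leaves the drop and that the drop equilibrates when its precipitant concentration matches the reservoir's. The helper names are hypothetical. The sketch also assumes the reservoir is the same 2.4 M (NH4)2SO4 solution used in the drop and ignores the buffer and MgCl2.

```python
from __future__ import annotations


def mix_drop(protein_mg_ml: float, protein_ul: float,
             precipitant_m: float, reservoir_ul: float) -> tuple[float, float, float]:
    """Protein (mg/ml), precipitant (M), and volume (µl) right after mixing."""
    volume = protein_ul + reservoir_ul
    return (protein_mg_ml * protein_ul / volume,
            precipitant_m * reservoir_ul / volume,
            volume)


def idealized_equilibrium(protein_mg_ml: float, precipitant_m: float,
                          volume_ul: float, reservoir_m: float) -> tuple[float, float]:
    """Assume only water leaves the drop until its precipitant matches the reservoir."""
    shrink = reservoir_m / precipitant_m  # concentration factor
    return protein_mg_ml * shrink, volume_ul / shrink


# Full-length Taq σA drop: 1 µl of 12 mg/ml protein + 1 µl of 2.4 M (NH4)2SO4 solution.
p0, s0, v0 = mix_drop(12.0, 1.0, 2.4, 1.0)
p_eq, v_eq = idealized_equilibrium(p0, s0, v0, 2.4)
print(p0, s0, v0)   # 6.0 1.2 2.0
print(p_eq, v_eq)   # 12.0 1.0
```

Under these assumptions, a 1:1 drop starts at half the protein and precipitant concentrations and ends near the original protein concentration. Real drops rarely reach this ideal end point exactly.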
Two crystal forms appeared only after several months, sometimes in the same drops. SDS-PAGE analysis of the washed, dissolved crystals indicated that one form contained primarily proteolytic fragment c (Figure 1B, lane 2), while the other contained fragment e (lane 3). N-terminal sequences of the proteolytic fragments were determined by the Protein Chemistry Laboratory of the University of Texas Medical Branch (Galveston, TX). For matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS), α-cyano-4-hydroxycinnamic acid matrix was prepared in two forms: a saturated solution in a 3:1:2 (v/v/v) mixture of formic acid/water/isopropanol, and a saturated solution in a 2:1 (v/v) mixture of 0.1% TFA/acetonitrile. Protein samples (0.9 mg/ml) were diluted 1:5 in both matrix solutions and then spotted (0.5 μl) onto an "ultrathin layered" gold surface (Cadene and Chait, 2000). As soon as the crystalline film appeared homogeneous, excess liquid was removed by vacuum aspiration. The spot was then washed for a few seconds with 2–4 μl of cold 0.1% aqueous TFA. Proteins were analyzed within 10 min of dilution in matrix solution to prevent adventitious formylation by formic acid. All spectra were acquired using a Voyager-DE STR MALDI-TOF mass spectrometer (PE Biosystems) operating in linear, delayed extraction mode. Spectra from 200 individual laser shots were averaged (using a 4-ns data channel width) with software provided by the manufacturer. The spectra were smoothed, calibrated, and analyzed using the program M-over-Z (http://proteometrics.com, http://prowl.rockefeller.edu).

Purification, Crystallization, and Structure Determination of Taq σA Fragments

σ2-3
The σ2-3 fragment (Table 1) was subcloned into the NdeI/Bpu1102I sites of the pET21a expression vector (Novagen) and transformed into E. coli BL21(DE3) cells. Cells were induced and lysed as described for σA. After centrifugation, the lysate was heat treated at 65°C for 45 min and then recentrifuged.
The sample was further purified by (NH4)2SO4 precipitation (50% w/v), gel filtration (Superdex 75, Amersham-Pharmacia Biotech), and cation exchange chromatography (SP-Sepharose). Pure σ2-3 was then concentrated in buffer A to 10 mg/ml by centrifugal filtration. Crystals of recombinant σ2-3 were obtained by vapor diffusion with 2 μl of protein solution (10 mg/ml) and 2 μl of crystallization solution (0.1 M MOPS [pH 7.0], 2 M [NH4]2SO4, and 20 mM MgCl2). Hexagonal crystals (0.4 × 0.3 × 0.3 mm) grew after 1 week at 22.5°C. Selenomethionyl protein was prepared for MAD analysis by suppression of methionine biosynthesis (Doublie, 1997). For cryocrystallography, crystals were first soaked in crystallization solution adjusted to 2.1 M (NH4)2SO4. They were then dunked in 0.1 M MOPS (pH 7.0), 1.3 M sodium citrate, and 20 mM MgCl2 and incubated for 2 min. Finally, they were flash frozen in a vial of liquid ethane at liquid nitrogen temperature. MAD data were collected at three wavelengths corresponding to the peak, the inflection point, and one remote value of the X-ray absorption spectrum (λ1, λ2, and λ3, respectively). The data were processed using DENZO and SCALEPACK (Otwinowski and Minor, 1997). Using the anomalous signal from SeMet(λ1), six of a possible eight selenium sites in the asymmetric unit were located using SnB (Weeks and Miller, 1999). No additional sites were located using difference Fourier techniques. Phases were calculated using MLPHARE (Otwinowski, 1991), with SeMet(λ1) treated as the reference. The best electron density map (4.5 Å resolution) was obtained by combining the anomalous signals from SeMet(λ1), SeMet(λ2), and SeMet(λ3) with the isomorphous signal from Native (with negative occupancies for the Se sites). Although some helices were discernible in this map, it was in general uninterpretable. Density modification and phase extension to 2.9 Å resolution using DM (Cowtan, 1994) yielded an excellent map. This allowed an initial poly-alanine model of the two molecules in the asymmetric unit to be built using O (Jones et al., 1991).
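MAD wavelengths such as the peak, inflection, and remote values above are chosen around the selenium K absorption edge, and are quoted either as energies or as wavelengths. This minimal Python sketch converts between the two using λ = hc/E. The λ1–λ3 values used for these data sets are not given in this section. The edge energy below (about 12.658 keV) is an assumption based on the approximate tabulated Se K-edge, not a value from this work.

```python
# h*c expressed in keV·Å (CODATA value, about 12.3984).
HC_KEV_A = 12.398419843320026


def energy_to_wavelength(energy_kev: float) -> float:
    """X-ray photon energy (keV) -> wavelength (Å)."""
    return HC_KEV_A / energy_kev


def wavelength_to_energy(wavelength_a: float) -> float:
    """X-ray wavelength (Å) -> photon energy (keV)."""
    return HC_KEV_A / wavelength_a


# Approximate tabulated Se K absorption edge (keV); an assumption, not a
# wavelength reported for the σ2-3 data sets.
SE_K_EDGE_KEV = 12.658

print(f"{energy_to_wavelength(SE_K_EDGE_KEV):.4f}")  # 0.9795
```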
The map was improved through iterative cycles of three steps: refinement against the Native amplitudes and SIGMAA-weighted phase combination using CNS (Adams et al., 1997); density modification with 2-fold noncrystallographic symmetry (NCS) averaging using DM; and model building using O. Tight NCS restraints between the two σ2-3 monomers were initially incorporated and were eventually removed. The final model contains residues 93–332 of both molecules in the asymmetric unit, plus two sulfate ions. PROCHECK (Laskowski et al., 1993) revealed one residue in an unfavorable (φ, ψ) region and an overall G factor of 0.4. This residue lies in the poorly ordered loop in the nonconserved insert between conserved regions 1.2 and 2.1.

σ4
The σ4 fragment was subcloned into the NdeI/BamHI sites of the pET21a expression vector (Novagen), and the resulting plasmid was transformed into E. coli BL21(DE3) cells. Cells were induced, lysed, and heat treated as described for σ2-3. The sample was further purified by (NH4)2SO4 precipitation (60% w/v), heparin-affinity chromatography (heparin-Sepharose, Amersham-Pharmacia), and cation exchange chromatography (SP-Sepharose). The purified σ4 was concentrated to 6 mg/ml in buffer A by centrifugal filtration, flash frozen, and stored at −80°C. The native σ4 fragment contained only one Met residue. We therefore introduced a Met substitution to increase the likelihood that a MAD experiment with selenomethionyl-σ4 would succeed. The choice of position was based on an extensive alignment of group 1 and group 2 σ factors (Gruber and Bryant, 1997). We mutated Taq-Leu386 to Met (Ec-Met561) because, other than Leu, Met was the most frequent residue at this position. The point mutant σ4L386M was generated using the QuikChange Site-Directed Mutagenesis Kit (Stratagene). Selenomethionyl-σ4L386M was purified in the same way as σ4.
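Residue positions in this paper are quoted in both Taq σA and E. coli σ70 numbering (for example, Taq-Leu386 corresponds to Ec-Met561). A fixed offset is tempting, but it is not safe. Most pairs in the Figure 4 and Figure 5 legends differ by 175 (e.g., R409/R584), whereas H426/H600 and R429/R603 differ by 174. This minimal Python sketch therefore builds a lookup table from the Figure 4 legend pairs. It then rewrites a Taq-numbered substitution in E. coli numbering and rejects a residue letter that does not match the table. The function names and the single-letter substitution format are illustrative assumptions.

```python
from __future__ import annotations

import re

# Taq/E. coli residue pairs as listed in the Figure 4 legend.
FIG4_PAIRS = ("R379/R554, R387/R562, L398/L573, E399/E574, T408/T583, "
              "R409/R584, E410/E585, R411/R586, R413/R588, Q414/Q589, K418/K593")

PAIR_RE = re.compile(r"([A-Z])(\d+)/([A-Z])(\d+)")
SUBSTITUTION_RE = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_pairs(text: str) -> dict[int, tuple[str, str, int]]:
    """Map Taq position -> (Taq residue, E. coli residue, E. coli position)."""
    return {int(tn): (ta, ea, int(en)) for ta, tn, ea, en in PAIR_RE.findall(text)}


def taq_to_ecoli(substitution: str, table: dict[int, tuple[str, str, int]]) -> str:
    """Rewrite a Taq-numbered substitution such as 'R409C' in E. coli numbering."""
    m = SUBSTITUTION_RE.fullmatch(substitution)
    if m is None:
        raise ValueError(f"not a single-residue substitution: {substitution!r}")
    wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
    if pos not in table:
        raise ValueError(f"no Taq/E. coli pair listed for Taq position {pos}")
    taq_res, ec_res, ec_pos = table[pos]
    if taq_res != wt:
        # Catches typos in which the position does not match the stated residue.
        raise ValueError(f"Taq {pos} is {taq_res}, not {wt}")
    return f"{ec_res}{ec_pos}{new}"


table = parse_pairs(FIG4_PAIRS)
print(taq_to_ecoli("R409C", table))  # R584C
print(taq_to_ecoli("R413H", table))  # R588H
```

The table could be extended with the Figure 5 pairs. A position missing from the table raises an error rather than silently applying an offset.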
Crystals of recombinant σ4L386M were obtained by vapor diffusion with 2 μl of protein solution (6 mg/ml) and 2 μl of crystallization solution (0.1 M MES [pH 6.0], 2.4 M [NH4]2SO4, and 20 mM MgCl2). Hexagonal, rod-shaped crystals (0.4 × 0.15 × 0.15 mm) grew after 1 week at 22.5°C. These crystals diffracted well beyond 1.5 Å resolution, but data collection and refinement are currently limited to 1.8 Å (Table 2). Crystals of selenomethionyl-σ4L386M grew under the same conditions but were much smaller (0.05 × 0.02 × 0.02 mm). Nevertheless, MAD data to 2.35 Å resolution were obtained (Table 2). For cryocrystallography, crystals were first soaked in solution A (crystallization solution with 2.5 M [NH4]2SO4). They were then transferred successively into increasing concentrations of solution B (0.1 M MES [pH 6.0], 6 M sodium formate, and 20 mM MgCl2) in steps of 5% (95% A:5% B, 90% A:10% B, and so on, up to 0% A:100% B), with 15 min incubation between transfers. After the final soak in 100% B, the crystals were flash frozen in a vial of liquid ethane at liquid nitrogen temperature. Using the anomalous signal from the data set SeMet(λ1), all three possible selenium sites in the asymmetric unit were located using SnB (Weeks and Miller, 1999). Phases were calculated as described for σ2-3. The initial 2.5 Å resolution electron density map was excellent. Density modification and phase extension to 1.8 Å resolution using the Native amplitudes (Table 2) with DM (Cowtan, 1994) gave a slight improvement. A model (residues 373–428, complete with side chains) was built automatically using ARP/wARP 5.1 (Perrakis et al., 1999). The map was improved by iterative refinement against the Native amplitudes and SIGMAA-weighted phase combination with CNS (Adams et al., 1997), together with model building. The final model contains residues 368–428 of σ4, along with 39 water molecules; a few residues at the N terminus and ten residues at the C terminus were disordered. PROCHECK (Laskowski et al., 1993) revealed no residues in disallowed (φ, ψ) regions and an overall G factor of 0.2.

σ4-DNA
Lyophilized, tritylated, single-stranded oligonucleotides (Oligos Etc.) were detritylated and purified as described (Aggarwal, 1990). Dried oligonucleotides were dissolved in 10 mM TEAB (pH 8.5) to a concentration of 2 mM. Equimolar amounts of complementary oligonucleotides were annealed by heating to 90°C for 5 min and then cooling to 22°C at a rate of 0.01°C/s. Annealed oligonucleotides were dried in a SpeedVac (Savant) and stored at −80°C. Selenomethionyl-σ4L386M and double-stranded DNA were mixed at a ratio of 1:2.7, with a final duplex DNA concentration of 1 mM. Samples were incubated at 25°C for 30 min and used in vapor diffusion crystallization trials.
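The σ4 and σ4-DNA protocols above imply some simple timing and step arithmetic. These are the duration of the controlled annealing ramp and the composition of each cryoprotectant transfer. This minimal Python sketch computes both. The helper names are hypothetical. The sketch assumes a linear ramp at a constant rate and ignores the 5 min hold at 90°C and the initial soak in solution A.

```python
from __future__ import annotations


def ramp_duration_s(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Time (s) for a linear temperature ramp between two set points."""
    if rate_c_per_s <= 0:
        raise ValueError("rate must be positive")
    return abs(start_c - end_c) / rate_c_per_s


def transfer_series(step_pct: int = 5) -> list[tuple[int, int]]:
    """(% solution A, % solution B) for each successive cryoprotectant transfer."""
    if step_pct <= 0 or 100 % step_pct:
        raise ValueError("step must be a positive divisor of 100")
    return [(100 - b, b) for b in range(step_pct, 101, step_pct)]


# Annealing: 90 °C -> 22 °C at 0.01 °C/s.
cool_s = ramp_duration_s(90.0, 22.0, 0.01)
print(round(cool_s), round(cool_s / 60, 1))  # 6800 113.3

# σ4 crystals: 5% steps from 95% A:5% B up to 0% A:100% B.
series = transfer_series(5)
print(len(series), series[0], series[-1])  # 20 (95, 5) (0, 100)
```

At 0.01°C/s, the cooling ramp alone takes close to 2 hr, and the 5% stepping scheme amounts to 20 transfers into solution B mixtures.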
In the optimum crystallization conditions, 1–3 μl of protein/DNA solution was mixed with 1 μl of crystallization solution (50 mM MES [pH 5.5], 4%–7% PEG 4000, and 80 mM magnesium acetate) and incubated over the same solution at 22°C. Thin, plate-shaped crystals (0.3 × 0.3 × 0.05 mm) grew in 1–2 weeks. For cryocrystallography, crystals were first soaked in stabilization solution (crystallization solution with 8% PEG 4000). They were then transferred successively into stabilization solution containing 5%, 10%, 15%, and finally 20% glycerol (v/v), with 15 min incubation between transfers. After the final soak, the crystals were flash frozen in a vial of liquid ethane at liquid nitrogen temperature. Using the anomalous signal from the data set SeMet(λ1), three of a possible six selenium sites in the asymmetric unit were located using SnB (Weeks and Miller, 1999). Phases were calculated as described for σ2-3. Although some helices were discernible in the 2.5 Å resolution map, it was in general uninterpretable. Density modification and phase extension to 2.4 Å resolution using SOLOMON (Abrahams and Leslie, 1996) yielded an excellent map. This allowed an initial model containing most of molecule A and the DNA to be built. The map was improved by iterative refinement against the SeMet(λ1) amplitudes and SIGMAA-weighted phase combination with CNS (Adams et al., 1997), together with model building. The final model contains residues 366–438 of molecule A, residues 371–438 of molecule B, all 22 nt of the DNA, and 66 water molecules. PROCHECK (Laskowski et al., 1993) revealed no residues in disallowed (φ, ψ) regions and an overall G factor of 0.2.

Abortive Initiation Transcription Assay
Reactions were performed in 10 μl of standard transcription buffer containing 20 mM Tris-HCl (pH 8.0), 50 mM NaCl, and 5 mM MgCl2.
Core RNAP (final concentration 0.1 μM) was incubated with 0.5 μM σ (or its derivatives) for 15 min at 45°C. CpA dinucleotide (0.1 mM or as indicated) and 0.1 μM T7 A1 or gal P promoter fragment were then added, followed by incubation for another 10 min at 45°C. Reactions were initiated by the addition of [α-32P]UTP (NEN), allowed to proceed for 15 min at 45°C, and stopped by the addition of 10 μl of gel loading buffer containing 8 M urea in TBE. Reaction products were separated on a 23% polyacrylamide gel in 7 M urea and visualized using a PhosphorImager (Molecular Dynamics).

Acknowledgments
We are deeply indebted to K.R. Rajashankar at the National Synchrotron Light Source for support at beamline X9A, and to M. Becker and L. Berman for support at beamline X25. We thank K. Murakami for advice and for allowing us to cite unpublished results. We also thank J.B. Bonanno, A. Mustaev, S. Nair, and M. Young for help, advice, and invaluable discussions, and B.T. Chait for support. Figures 1C, 3C, and 4B were made using RIBBONS (Carson, 1991). Figures 2A, 2B, 3A, and 5 were made using GRASP (Nicholls et al., 1991). E.C. was supported by a National Research Service Award (NIH GM20470). M.T.-Z. was supported by a Burroughs Wellcome predoctoral fellowship. This work was supported by a National Center for Research Resources Grant (RR00862 to B.T. Chait) and NIH grant GM53759 to S.A.D.

Received October 3, 2001; revised January 11, 2002.

References
Abrahams, J.P., and Leslie, A.G.W. (1996). Methods used in the structure determination of bovine mitochondrial F1 ATPase. Acta Crystallogr. D52, 30–42.
Adams, P.D., Pannu, N.S., Read, R.J., and Brunger, A.T. (1997). Cross-validated maximum likelihood enhances crystallographic simulated annealing refinement. Proc. Natl. Acad. Sci. USA 94, 5018–5023.
Aggarwal, A.K. (1990). Crystallization of DNA binding proteins with oligodeoxynucleotides. Methods 1, 83–90.
Arthur, T.M., Anthony, L.C., and Burgess, R.R. (2000). Mutational analysis of β′ 260–309, a sigma 70 binding site located on Escherichia coli core RNA polymerase. J. Biol. Chem. 275, 23113–23119.
Baikalov, L., Schroder, L., Kaczor-Grzeskowiak, M., Grzeskowiak, K., Gunsalus, R., and Dickerson, R.E. (1996). Structure of the Escherichia coli response regulator NarL. Biochemistry 35, 11053–11061.
Barne, K.A., Bown, J.A., Busby, S.J.W., and Minchin, S.D. (1997). Region 2.5 of the Escherichia coli RNA polymerase σ70 subunit is responsible for the recognition of the 'extended −10' motif at promoters. EMBO J. 16, 4034–4040.
Bown, J.A., Barne, K.A., Minchin, S.D., and Busby, S.J.W. (1997). Extended −10 promoters. In Nucleic Acids and Molecular Biology. Mechanisms of Transcription, F. Eckstein and D.M.J. Lilley, eds. (New York: Springer), pp. 41–52.
Burgess, R.R., Travers, A.A., Dunn, J.J., and Bautz, E.K.F. (1969). Factor stimulating transcription by RNA polymerase. Nature 221, 43–44.
Cadene, M., and Chait, B.T. (2000). A robust, detergent-friendly method for mass spectrometric analysis of integral membrane proteins. Anal. Chem. 72, 5655–5658.
Callaci, S., Heyduk, E., and Heyduk, T. (1999). Core RNA polymerase from E. coli induces a major change in the domain arrangement of the σ70 subunit. Mol. Cell 3, 229–238.
Campbell, E.A., and Darst, S.A. (2000). The anti-σ factor SpoIIAB forms a 2:1 complex with σF, contacting multiple conserved regions of the σ factor. J. Mol. Biol. 300, 17–28.
Carson, M. (1991). RIBBONS 2.0. J. Appl. Crystallogr. 24, 958–961.
Chen, Y.F., and Helmann, J.D. (1995). The Bacillus subtilis flagellar regulatory protein σD: overproduction, domain analysis and DNA-binding properties. J. Mol. Biol. 249, 743–753.
Cowtan, K. (1994). Dm-density modification package. ESF/CCP4 Newsletter 31, 34–38.
Daniels, D., Zuber, P., and Losick, R. (1990). Two amino acids in an RNA polymerase σ factor involved in the recognition of adjacent base pairs in the −10 region of a cognate promoter. Proc. Natl. Acad. Sci. USA 87, 8075–8079.
Darst, S.A., Roberts, J.W., Malhotra, A., Marr, M., Severinov, K., and Severinova, E. (1997). Pribnow box recognition and melting by Escherichia coli RNA polymerase. In Nucleic Acids & Molecular Biology, F. Eckstein and D.M.J. Lilley, eds. (London: Springer), pp. 27–40.
deHaseth, P.L., and Helmann, J.D. (1995). Open complex formation by Escherichia coli RNA polymerase: the mechanism of polymerase-induced strand separation of double helical DNA. Mol. Microbiol. 16, 817–824.
Doublie, S. (1997). Preparation of selenomethionyl proteins for phase determination. Methods Enzymol. 276, 523–530.
Ducros, V.M., Lewis, R.J., Verma, C.S., Dodson, E.J., Leonard, G., Turkenburg, J.P., Murshudov, G.N., Wilkinson, A.J., and Brannigan, J.A. (2001). Crystal structure of GerE, the ultimate transcriptional regulator of spore formation in Bacillus subtilis. J. Mol. Biol. 306, 759–771.
Eichenberger, P., Dethiollaz, S., Buc, H., and Geiselmann, J. (1997). Structural kinetics of transcription activation at the malT promoter of Escherichia coli by UV laser footprinting. Proc. Natl. Acad. Sci. USA 94, 9022–9027.
Gaal, T., Ross, W., Estrem, S.T., Nguyen, L.H., Burgess, R.R., and Gourse, R.L. (2001). Promoter recognition and discrimination by EσS RNAP. Mol. Microbiol. 42, 939–954.
Gardella, T., Moyle, T., and Susskind, M.M. (1989). A mutant Escherichia coli sigma 70 subunit of RNA polymerase with altered promoter specificity. J. Mol. Biol. 206, 579–590.
Gribskov, M., and Burgess, R.R. (1986). Sigma factors from E. coli, B. subtilis, phage SPO1, and phage T4 are homologous proteins. Nucleic Acids Res. 14, 6745–6763.
Gross, C.A., Lonetto, M., and Losick, R. (1992). Bacterial sigma factors. In Transcriptional Regulation, K. Yamamoto and S. McKnight, eds. (Cold Spring Harbor, NY: Cold Spring Harbor Laboratory).
Gross, C.A., Chan, C., Dombroski, A., Gruber, T., Sharp, M., Tupy, J., and Young, B. (1998). The functional and regulatory roles of sigma factors in transcription. Cold Spring Harb. Symp. Quant. Biol. 63, 141–155.
Gruber, T.M., and Bryant, D.A. (1997). Molecular systematic studies of eubacteria, using sigma70-type sigma factors of group 1 and group 2. J. Bacteriol. 179, 1734–1747.
Helmann, J.D., and Chamberlin, M.J. (1988). Structure and function of bacterial sigma factors. Annu. Rev. Biochem. 57, 839–872.
Hernandez, V.J., and Cashel, M. (1995). Changes in conserved region 3 of Escherichia coli sigma70 mediate ppGpp-dependent functions in vivo. J. Mol. Biol. 252, 536–549.
Holm, L., and Sander, C. (1996). Mapping the protein universe. Science 273, 595–602.
Ishihama, A. (1993). Protein-protein communication within the transcription apparatus. J. Bacteriol. 175, 2483–2489.
Johnsrud, L. (1978). Contacts between Escherichia coli RNA polymerase and a lac operon promoter. Proc. Natl. Acad. Sci. USA 75, 5314–5318.
Jones, C.H., and Moran, C.P., Jr. (1992). Mutant σ factor blocks transition between promoter binding and initiation of transcription. Proc. Natl. Acad. Sci. USA 89, 1958–1962.
Jones, T.A., Zou, J.-Y., Cowan, S., and Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. A47, 110–119.
Joo, D.M., Ng, N., and Calendar, R. (1997). A sigma32 mutant with a single amino acid change in the highly conserved region 2.2 exhibits reduced core RNA polymerase affinity. Proc. Natl. Acad. Sci. USA 94, 4907–4912.
Joo, D.M., Nolte, A., Calendar, R., Zhou, Y.N., and Jin, D.J. (1998). Multiple regions on the Escherichia coli heat shock transcription factor sigma32 determine core RNA polymerase binding specificity. J. Bacteriol. 180, 1095–1200.
Juang, Y.-L., and Helmann, J.D. (1994). A promoter melting region in the primary sigma factor of Bacillus subtilis: identification of functionally important aromatic amino acids. J. Mol. Biol. 235, 1470–1488.
Juang, Y.-L., and Helmann, J.D. (1995). Pathway of promoter melting by Bacillus subtilis RNA polymerase at a stable RNA promoter: effects of temperature, δ protein, and σ factor mutations. Biochemistry 34, 8465–8473.
Kahn, D., and Ditta, G. (1991). Modular structure of FixJ: homology of the transcriptional activator domain with the −35 binding domain of sigma factors. Mol. Microbiol. 5, 987–997.
Keilty, S., and Rosenberg, M. (1987). Constitutive function of a positively regulated promoter reveals new sequences essential for activity. J. Biol. Chem. 262, 6389–6395.
Kenney, T.J., and Moran, C.P., Jr. (1991). Genetic evidence for interaction of sigmaA with two promoters in Bacillus subtilis. J. Bacteriol. 173, 3282–3290.
Kenney, T.J., York, K., Youngman, P., and Moran, C.P., Jr. (1989). Genetic evidence that RNA polymerase associated with σA factor uses a sporulation-specific promoter in Bacillus subtilis. Proc. Natl. Acad. Sci. USA 86, 9109–9113.
Kim, S.-K., Makino, K., Amemura, M., Nakata, A., and Shinagawa, H. (1995). Mutational analysis of the role of the first helix of region 4.2 of the sigma70 subunit of Escherichia coli RNA polymerase in transcriptional activation by activator protein PhoB. Mol. Gen. Genet. 248, 1–8.
Kuldell, N., and Hochschild, A. (1994). Amino acid substitutions in the −35 recognition motif of sigma 70 that result in defects in phage lambda repressor-stimulated transcription. J. Bacteriol. 176, 2991–2998.
Kumar, A., Malloch, R.A., Fujita, N., Smillie, D.A., Ishihama, A., and Hayward, R.S. (1993). The minus 35-recognition region of Escherichia coli sigma 70 is inessential for initiation of transcription at an "extended minus 10" promoter. J. Mol. Biol. 232, 406–418.
Laskowski, R.A., MacArthur, M.W., Moss, D.S., and Thornton, J.M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26, 283–291.
Lavery, R., and Sklenar, H. (1988). The definition of generalized helicoidal parameters and of axis curvature for irregular nucleic acids. J. Biomol. Struct. Dyn. 6, 63–91.
Li, M., Moyle, H., and Susskind, M.M. (1994). Target of the transcriptional activation function of phage lambda cI protein. Science 263, 75–77.
Lonetto, M., Gribskov, M., and Gross, C.A. (1992). The σ70 family: sequence conservation and evolutionary relationships. J. Bacteriol. 174, 3843–3849.
Lonetto, M.A., Rhodius, V., Lamberg, K., Kiley, P., Busby, S., and Gross, C. (1998). Identification of a contact site for different transcription activators in region 4 of the Escherichia coli RNA polymerase sigma70 subunit. J. Mol. Biol. 284, 1353–1365.
Lowe, P.A., and Malcolm, A.D.B. (1976). Structural properties of Escherichia coli RNA polymerase subunits. Eur. J. Biochem. 64, 177–188.
Makino, K., Amemura, M., Kawamoto, T., Kimura, S., Shinagawa, H., Nakata, A., and Suzuki, M. (1996). DNA binding of PhoB and its interaction with RNA polymerase. J. Mol. Biol. 259, 15–26.
Malhotra, A., Severinova, E., and Darst, S.A. (1996). Crystal structure of a σ70 subunit fragment from Escherichia coli RNA polymerase. Cell 87, 127–136.
McClure, W.R., Cech, C.L., and Johnston, D.E. (1978). A steady state assay for the RNA polymerase initiation reaction. J. Biol. Chem. 253, 8941–8948.
Mecsas, J., Cowing, D.W., and Gross, C.A. (1991). Development of RNA polymerase-promoter contacts during open complex formation. J. Mol. Biol. 220, 585–597.
Minakhin, L., Nechaev, S., Campbell, E.A., and Severinov, K. (2001). Recombinant Thermus aquaticus RNA polymerase, a new tool for structure-based analysis of transcription. J. Bacteriol. 183, 71–76.
Nagai, H., and Shimamoto, N. (1997). Regions of the Escherichia coli primary sigma factor sigma70 that are involved in interaction with RNA polymerase core enzyme. Genes Cells 2, 725–734.
Naryshkina, T., Mustaev, A., Darst, S.A., and Severinov, K. (2001). The β′ subunit of Escherichia coli RNA polymerase is not required for interaction with initiating nucleotide but is necessary for interaction with rifampicin. J. Biol. Chem. 276, 13308–13313.
Nicholls, A., Sharp, K.A., and Honig, B. (1991). Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins 11, 281–296.
Otwinowski, Z. (1991). Maximum likelihood refinement of heavy-atom parameters in isomorphous replacement and anomalous scattering. In Proceedings of the CCP4 Study Weekend, W. Wolf, P.R. Evans, and A.G.W. Leslie, eds. (Warrington, UK: SERC Daresbury Laboratory), pp. 80–86.

Otwinowski, Z., and Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307–326.
Pabo, C.O., and Sauer, R.T. (1992). Transcription factors: structural families and principles of DNA recognition. Annu. Rev. Biochem. 61, 1053–1095.
Perrakis, A., Morris, R., and Lamzin, V.S. (1999). Automated protein model building combined with iterative structure refinement. Nat. Struct. Biol. 6, 458–463.
Roberts, C.W., and Roberts, J.W. (1996). Base-specific recognition of the nontemplate strand of promoter DNA by E. coli RNA polymerase. Cell 86, 495–501.
Ross, W., Ernst, A., and Gourse, R.L. (2001). Fine structure of E. coli RNA polymerase-promoter interactions: alpha subunit binding to the UP element minor groove. Genes Dev. 15, 491–506.
Schickor, P., Metzger, W., Wladyslaw, W., Lederer, H., and Heumann, H. (1990). Topography of intermediates in transcription initiation of E. coli. EMBO J. 9, 2215–2220.
Severinov, K., Fenyö, D., Severinova, E., Mustaev, A., Chait, B.T., Goldfarb, A., and Darst, S.A. (1994). The sigma subunit conserved region 3 is part of “5′-face” of active center of Escherichia coli RNA polymerase. J. Biol. Chem. 269, 20826–20828.
Severinova, E., Severinov, K., Fenyö, D., Marr, M., Brody, E.N., Roberts, J.W., Chait, B.T., and Darst, S.A. (1996). Domain organization of the Escherichia coli RNA polymerase σ70 subunit. J. Mol. Biol. 263, 637–647.
Sharp, M.M., Chan, C.L., Lu, C.Z., Marr, M.T., Nechaev, S., Merritt, E.W., Severinov, K., Roberts, J.W., and Gross, C.A. (1999). The interface of sigma with core RNA polymerase is extensive, conserved, and functionally specialized. Genes Dev. 13, 3015–3026.
Siebenlist, U., and Gilbert, W. (1980). Contacts between Escherichia coli RNA polymerase and an early promoter of phage T7. Proc. Natl. Acad. Sci. USA 77, 122–126.
Siebenlist, U., Simpson, R.B., and Gilbert, W. (1980). E. coli RNA polymerase interacts homologously with two different promoters. Cell 20, 269–281.
Siegele, D.A., Hu, J.C., Walter, W.A., and Gross, C.A. (1989). Altered promoter recognition by mutant forms of the sigma 70 subunit of Escherichia coli RNA polymerase. J. Mol. Biol. 206, 591–603.
Stragier, P., Parsot, C., and Bouvier, J. (1985). Two functional domains conserved in major and alternate bacterial sigma factors. FEBS Lett. 187, 11–15.
Tatti, K.M., Jones, C.H., and Moran, C.P., Jr. (1991). Genetic evidence for interaction of sigma E with the spoIIID promoter in Bacillus subtilis. J. Bacteriol. 173, 7828–7833.
Tomsic, M., Tsujikawa, L., Panaghie, G., Wang, Y., Azok, J., and deHaseth, P.L. (2001). Different roles for basic and aromatic amino acids in conserved region 2 of Escherichia coli sigma70 in the nucleation and maintenance of the single-stranded DNA bubble in open RNA polymerase-promoter complexes. J. Biol. Chem. 276, 31891–31896.
Travers, A.A., and Burgess, R.R. (1969). Cyclic re-use of the RNA polymerase sigma factor. Nature 222, 537–540.
Waldburger, C., Gardella, T., Wong, R., and Susskind, M.M. (1990). Changes in conserved region 2 of Escherichia coli sigma 70 affecting promoter recognition. J. Mol. Biol. 215, 267–276.
Weeks, C.M., and Miller, R. (1999). The design and implementation of SnB v2.0. J. Appl. Crystallogr. 32, 120–124.
Young, B.A., Anthony, L.C., Gruber, T.M., Arthur, T.M., Heyduk, E., Lu, C.Z., Sharp, M.M., Heyduk, T., Burgess, R.R., and Gross, C.A. (2001). A coiled-coil from the RNA polymerase beta′ subunit allosterically induces selective nontemplate strand binding by sigma(70). Cell 105, 935–944.
Zhou, Y., Pendergrast, P.S., Bell, A., Williams, R., Busby, S., and Ebright, R.H. (1994). The functional subunit of a dimeric transcription activator protein depends on promoter architecture. EMBO J. 13, 4549–4557.
Zuber, P., Healy, J., Carter, H.L., III, Cutting, S., Moran, C.P., Jr., and Losick, R. (1989). Mutation changing the specificity of an RNA polymerase sigma factor. J. Mol. Biol. 206, 605–614.

Accession Numbers

Structure coordinates are available from the Protein Data Bank (σ2–σ3, PDB ID 1KU2; σ4, PDB ID 1KU3; σ4/DNA, PDB ID 1KU7).