Gene 275 (2001) 241–252 www.elsevier.com/locate/gene
Construction of a mini-intein fusion system to allow both direct monitoring of soluble protein expression and rapid purification of target proteins Aihua Zhang a, Sandra M. Gonzalez b, Eric J. Cantor a, Shaorong Chong a,*
a New England Biolabs, Inc., 32 Tozer Road, Beverly, MA 01915, USA
b Department of Chemical Sciences, University of Chihuahua, Chihuahua, Mexico
Received 25 June 2001; received in revised form 31 July 2001; accepted 8 August 2001 Received by A. Bernardi
Abstract Affinity purification of recombinant proteins has been facilitated by fusion to a modified protein splicing element (intein). The fusion protein expression can be further improved by fusion to a mini-intein, i.e. an intein that lacks an endonuclease domain. We synthesized three mini-inteins using overlapping oligonucleotides to incorporate Escherichia coli optimized codons and allow convenient insertion of an affinity tag between the intein (predicted) N- and C-terminal fragments. After examining the splicing and cleavage activities of the synthesized mini-inteins, we chose the mini-intein most efficient in thiol-induced N-terminal cleavage for constructing a novel intein fusion system. In this system, green fluorescent protein (GFP) was fused to the C-terminus of the affinity-tagged mini-intein whose N-terminus was fused to a target protein. The design of the system allowed easy monitoring of soluble fusion protein expression by following GFP fluorescence, and rapid purification of the target protein through the intein-mediated cleavage reaction. A total of 17 target proteins were tested in this intein-GFP fusion system. Our data demonstrated that the fluorescence of the induced cells could be used to measure soluble expression of the intein fusion proteins and efficient intein cleavage activity. The final yield of the target proteins exhibited a linear relationship with whole cell fluorescence. The intein-GFP system may provide a simple route for monitoring real-time soluble protein expression, predicting final product yields, and screening the expression of a large number of recombinant proteins for rapid purification in high throughput applications. © 2001 Elsevier Science B.V. All rights reserved. Keywords: Protein splicing; Cleavage; Green fluorescent protein
Abbreviations: bFGF, basic fibroblast growth factor; CBD, chitin-binding domain; DTT, 1,4-dithiothreitol; GFP, green fluorescent protein; GST, glutathione S-transferase; IPTG, isopropyl β-D-thiogalactopyranoside; MBP, maltose-binding protein; NEB, New England Biolabs, Inc; PCR, polymerase chain reaction; PGK, phosphoglycerate kinase; Pho, Pyrococcus horikoshii OT3; PNK, polynucleotide kinase; Sce, Saccharomyces cerevisiae; SDS-PAGE, sodium dodecyl sulfate-polyacrylamide gel electrophoresis; Ssp, Synechocystis sp.
* Corresponding author. Tel.: +1-978-927-5054, ext. 324; fax: +1-978-921-1350. E-mail address: [email protected]
1. Introduction Affinity purification of recombinant proteins has been greatly facilitated by the introduction of a protein splicing element (termed intein; Perler et al., 1994) as a fusion partner with an affinity tag and a target protein (Xu et al., 2000). The intein, capable of catalyzing peptide bond cleavage at one of its termini, separates the target protein from the fusion tag without the need for cleavage by a protease normally employed by other affinity purification systems (LaVallie and McCoy, 1995). Inteins, originally discovered as intervening sequences embedded in a protein precursor, catalyze their own excision and concomitantly ligate the flanking sequences in a posttranslational process termed protein splicing (Hirata et al., 1990; Kane et al., 1990). Protein splicing involves both peptide bond cleavage and ligation at the splicing junctions. By appropriate amino acid substitution(s), an intein can be modified to catalyze only cleavage at either or both of its termini (Xu et al., 2000). More than 100 inteins have been identified in various organisms (Perler, 2000). The 50 kDa intein of the 69 kDa vacuolar membrane ATPase subunit from Saccharomyces cerevisiae (Sce VMA intein) was the first intein to be discovered (Hirata et al., 1990; Kane et al., 1990) and was also the first intein used in an intein-mediated affinity protein purification system (Chong et al., 1997). In this system, the modified Sce VMA intein was fused between a target protein at its N-terminus and a chitin-binding domain (CBD) from Bacillus circulans (Watanabe et al.,
0378-1119/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0378-1119(01)00663-1
A. Zhang et al. / Gene 275 (2001) 241–252
1994) at its C-terminus. The CBD served as an affinity tag to allow binding of the fusion protein to a chitin resin. After addition of a thiol reagent, e.g. dithiothreitol (DTT), the intein was induced to undergo peptide bond cleavage at its N-terminus, which released the target protein from the rest of the fusion protein still immobilized on the resin and resulted in a single column purification of the target protein (Fig. 1) (Chong et al., 1997). Since the establishment of the first intein purification system, many advances have been made in several other intein fusion systems (Xu et al., 2000). For instance, the target protein can be fused to the C-terminus of the modified intein allowing the N-terminal sequence of the fusion protein to be modified for optimal protein expression (Chong et al., 1998a). The intein catalyzed cleavage reaction can be induced not only by addition of thiols but also by changing pH and/or temperature (Evans et al., 1999; Mathys et al., 1999; Southworth et al., 1999). The Sce VMA intein (50 kDa) has been replaced by ‘mini-inteins’, which due to their smaller molecular weights (20–30 kDa) can potentially increase protein expression and the final yield. Mini-inteins are inteins that consist of only the splicing domain and lack the endonuclease domain. A ‘full-length’ intein, such as the Sce VMA intein, contains an endonuclease domain between the N-terminal and C-terminal fragments of the splicing domain (Duan et al., 1997). Most mini-inteins examined to date are naturally occurring inteins (e.g. the mini-intein
from Mycobacterium xenopi gyrase A (Mxe Gyr A intein)) (Telenti et al., 1997). However, some mini-inteins have been engineered by deletion of the endonuclease domain from a ‘full-length’ intein (e.g. in the case of the Sce VMA intein (Chong and Xu, 1997) and the intein from the Synechocystis sp. dnaB gene (Ssp DnaB intein) (Wu et al., 1998b)). A mini-intein from the Ssp dnaE gene (Ssp DnaE intein) is a natural split intein whose N-terminal and C-terminal fragments were found in two separate coding regions (Wu et al., 1998a). The Ssp DnaE intein has been shown to catalyze trans-splicing when expressed both in its natural host and in a foreign protein context (Wu et al., 1998a; Evans et al., 2000; Martin et al., 2001). All inteins, ‘full-length’ inteins or mini-inteins, follow a similar protein splicing pathway (Xu and Perler, 1996; Chong and Xu, 1997; Southworth et al., 2000). Though intein fusion systems have simplified affinity purification, both target protein and intein have to be expressed in a soluble, correctly folded form in order for intein-mediated purification to be effective. Fusion proteins produced in Escherichia coli (E. coli) sometimes misfold resulting in very low expression or inclusion bodies. The amount of work necessary to optimize expression of fusion proteins can be tremendous especially if a large number of recombinant proteins are to be expressed and purified in the intein fusion systems. Thus, a simple ‘reporter’ system that can be used to indicate soluble expression of recombinant
Fig. 1. (A) Schematic illustration of the intein-mediated affinity protein purification. A target protein is fused to a ‘CBD-tagged’ intein which is capable of inducible self-cleavage. The CBD allows the fusion protein to bind to chitin resin. The intein is then induced to cleave the target protein from the rest of the fusion protein still immobilized on the column. This results in a single step affinity purification of a target protein without further treatment with a protease. (B) GFP as a C-terminal domain fused to a modified intein to facilitate protein purification. A C-terminal GFP domain ‘reports’ the solubility and folding status of the upstream protein domains before cell lysis and electrophoretic analyses. A soluble and correctly folded intein fusion protein is more likely to result in effective intein-mediated purification.
proteins before cell lysis and electrophoretic analyses is highly desirable. Green fluorescent protein (GFP) is such a reporter protein (Prasher et al., 1992; Chalfie et al., 1994). Formation of the chromophore of GFP depends on the correct folding of the protein and the GFP fluorescence can be easily detected in living cells (Cody et al., 1993; Reid and Flynn, 1997; Cha et al., 2000). It has been shown that by fusion of a C-terminal GFP moiety to a target protein, the fluorescence of cells expressing the GFP fusion is directly related to the soluble expression of the target protein when expressed alone (Waldo et al., 1999). To take advantage of the ability of GFP to ‘report’ soluble protein expression and the self-cleavage activity of the intein to allow rapid purification of a target protein free of any fusion tag, we constructed a fusion system in which a mini-intein was fused between a C-terminal GFP domain and an N-terminal target protein. To select an appropriate mini-intein for the system, we characterized three mini-inteins (the Ssp DnaE intein (Wu et al., 1998a), a mini-intein from Pyrococcus horikoshii OT3 RadA protein (Pho RadA intein) and a mini-intein from Pyrococcus horikoshii DNA polymerase II (Pho PolII intein) (Kawarabayasi et al., 1998)) for their ability to catalyze protein splicing and cleavage. The genes for the mini-inteins were synthesized using overlapping oligonucleotides to incorporate optimized codons for E. coli expression and introduce convenient restriction sites for insertion of the CBD as an affinity tag. The modified Ssp DnaE intein containing the CBD insertion was used in the intein-GFP fusion system because it catalyzed the most efficient thiol-induced N-terminal cleavage of the three mini-inteins. A total of 17 target proteins were expressed and purified in the intein-GFP fusion system. 
The data showed that the GFP fluorescence could be used to indicate soluble expression of the intein fusion protein and efficient intein-catalyzed cleavage. The fluorescence of induced cells was closely correlated to the final protein yields of the intein-mediated purification.
2. Materials and methods 2.1. Construction of synthetic mini-intein genes Protein sequences of the Ssp DnaE intein, the Pho PolII intein and the Pho RadA intein were back-translated into DNA sequences with E. coli codon usage using the GCG sequence analysis package (Devereux et al., 1984; Womble, 2000). In the case of the Ssp DnaE intein, ten overlapping oligonucleotides (30–60 mer) (five forward and five reverse, Fig. 2) were used to synthesize the N-terminal fragment (123 residues) and an additional five overlapping oligonucleotides were used for the C-terminal fragment (36 residues) (Fig. 2). Unless otherwise specified, all enzymes and reagents were from New England Biolabs (NEB, Beverly, MA). The overlapping oligonucleotides (250 pmol each) were mixed and phosphorylated in a reaction mixture (60 µl) containing T4 polynucleotide kinase (PNK) (50 units), 50 mM Tris–HCl (pH 7.5), 10 mM MgCl2, 1 mM ATP and 10 mM DTT at 37°C for 30 min. The mixture was heated at 95°C for 4 min and allowed to cool down to room temperature over a 2 h period. The annealed oligonucleotides for the synthesis of the N-terminal fragment were extended to fill in the gaps on the double-stranded DNA in a reaction mixture (120 µl) containing Klenow DNA polymerase (5 units), T4 DNA ligase (200 units), 400 µM dNTP, 50 mM Tris–HCl (pH 7.5), 10 mM MgCl2, 1 mM ATP and 10 mM DTT. The reaction mixture was incubated at 25°C for 20 min and the ligation product was purified using QIAquick PCR purification kit (QIAGEN Inc., Valencia, CA). An aliquot of the purified reaction mixture (5 µl) was used as the template for the following PCR reaction. PCR mixtures (100 µl) contained Thermopol buffer, 4 mM MgSO4, 400 µM of each dNTP, 1 µM each of the forward oligonucleotide 1 (Fw1) and reverse oligonucleotide 5 (Rv5), and 1.0 units of Vent DNA polymerase. Amplification was carried out using a GeneMate thermal cycler at 95°C for 1 min, 55°C for 1 min and 72°C for 1 min for 25 cycles. The PCR products were gel-purified and after double digestion with XhoI and NheI were ligated to the pMY(B)T4 vector (Chong et al., 1998a) to yield pMEn(B)T4. The annealed oligonucleotides for the synthesis of the C-terminal fragment generated the double-stranded DNA without gaps and were directly cloned into pMEn(B)T4 after digestion with NheI and AgeI to yield pMET4. Next, the PCR-amplified CBD gene was inserted in the Ssp DnaE gene in pMET4 between the NheI and BamHI sites to give pME(B)T4, following a similar protocol described previously (Chong et al., 1998a) (Fig. 3). Similar procedures were used for the synthesis of the Pho PolII intein and the Pho RadA intein genes. 
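The back-translation step described above can be illustrated with a minimal sketch. It assumes one fixed, commonly preferred E. coli codon per amino acid; the codon table below is an illustrative choice and is not the table used by the GCG package, and the peptide in the usage line is hypothetical.

```python
# Illustrative sketch: back-translate a protein sequence with one fixed codon
# per residue. The codon choices are an assumption for demonstration only,
# not the codon table used by the GCG package cited in the text.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
# Reverse lookup, used only to check that the round trip is lossless.
CODON_TO_RESIDUE = {codon: aa for aa, codon in PREFERRED_CODON.items()}


def back_translate(protein: str) -> str:
    """Return a DNA sequence that encodes `protein` with the fixed codon table."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None


def translate(dna: str) -> str:
    """Translate DNA produced by back_translate back into protein."""
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TO_RESIDUE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    peptide = "MKLG"  # hypothetical peptide, not an intein sequence
    print(back_translate(peptide))
```

A real design would also screen the result for unwanted restriction sites and then split it into overlapping forward and reverse oligonucleotides, which this sketch does not attempt.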
Based on the sequence homology with the naturally split Ssp DnaE intein, the N-terminal fragment sequences of the Pho inteins were predicted to be the first 129 residues for the Pho PolII intein and the first 139 residues for the Pho RadA intein. The C-terminal fragment sequences were predicted to be the last 24 residues for the Pho PolII intein and 29 residues for the Pho RadA intein. The inteins and the CBD were cloned into pMY(B)T4 at the same sites as those for the Ssp DnaE intein, i.e. the N-terminal fragment between the XhoI and NheI sites, the C-terminal fragment between the BamHI and AgeI sites and insertion of the CBD between the NheI and BamHI sites. Amino acid substitutions at splicing junctions and addition of extein residues (five residues for each extein except for the Pho RadA mini-intein which had seven N-extein residues) were made by using different oligonucleotides containing the mutations for the synthesis of the mini-inteins. The synthesized mini-inteins and their mutants were verified by DNA sequencing. 2.2. Protein splicing and cleavage of mini-inteins Cloning the synthetic mini-intein genes into pMY(B)T4 to replace the modified Sce VMA intein (Y(B)) yielded the
Fig. 2. Synthesis of the Ssp DnaE mini-intein gene with overlapping oligonucleotides containing codons optimized for E. coli expression. The sequences of overlapping forward (Fw) and reverse (Rv) oligonucleotides are labeled above and below the synthetic Ssp DnaE mini-intein sequence. The oligonucleotides Fw1, Fw6, Rv5 and Rv8 contain extra sequences in their 5′ region which incorporate restriction sites to facilitate cloning (see Section 2.1 for details).
mini-inteins that were fused between the E. coli maltose-binding protein (MBP) and T4 DNA ligase. The fusion proteins were expressed in E. coli and purified following the same protocol as described previously (Chong et al., 1998b). Protein splicing and cleavage of the wild-type and mutant mini-inteins were examined by SDS-PAGE of the purified protein products following essentially the same procedures as those for the MYK and MYT4 fusion proteins (Chong et al., 1998b). 2.3. Thiol-induced N-terminal cleavage of the modified Ssp DnaE intein The modified Ssp DnaE intein (E(B)), containing the CBD insertion and the C-terminal asparagine-to-alanine substitution, was cloned into the XhoI and AgeI sites in pMYK(X−1) (Chong et al., 1998b) to yield pME(B)K(X−1). The pME(B)K(X−1) vector allowed the modified Ssp DnaE intein to be flanked by each of the 20 amino acid residues at the −1 position. Further replacing the MBP gene (between NdeI and XhoI sites) with the T4 DNA ligase gene (T4) and the T4 PNK gene (between AgeI and PstI sites) with a short linker containing an in-frame stop codon yielded pT4E(B)(X−1) vectors. pT4E(B)(X−1) vectors were expressed in E. coli and T4 DNA ligase was purified on a chitin column following previously described procedures (Chong et al., 1997). The final yields of T4 DNA ligase from each pT4E(B)(X−1) construct were used to estimate DTT-induced N-terminal cleavage efficiency. 2.4. Construction of the intein-GFP fusion vector The gene for GFP was amplified by PCR from pRDW13 (a gift of Dr Richard Whitaker, NEB) containing a copy of a GFP gene variant suitable for soluble expression in E. coli (Crameri et al., 1996). The forward primer, 5′-GGT GGT ACC GGT AGT AAA GGA GAA GAA CTT TTC ACT GGA GTT-3′, contains an AgeI site (ACC GGT), and the reverse primer, 5′-GGT GGT CTG CAG TCA TTT GTA GAG CTC ATC CAT GCC ATG TGT-3′, contains a PstI site (CTG CAG). PCR mixtures (100 µl) contained Vent DNA polymerase buffer (NEB), 4 mM MgSO4, 400 µM of each dNTP, 1 µM of each primer, 50 ng pRDW13 DNA and 1.0 units of Vent DNA polymerase (NEB). Amplification was carried out at 95°C for 1 min, 55°C for 1 min and 72°C for 2 min for 25 cycles. The PCR products
Fig. 3. Construction of the intein-GFP fusion vectors for expression and purification of target proteins. Black box, maltose-binding protein (MBP); gray box, the Ssp DnaE mini-intein; striped box, T4 DNA ligase; dotted box, T4 PNK.
were gel-purified and after double-digestion with AgeI and PstI were directly ligated to a gel-purified AgeI/PstI-double-digested pT4E(B) vector (containing Gly at the −1 position and the T7 promoter) to yield pT4EG1. Using pT4EG1 as a template, site-directed mutagenesis was performed to introduce a silent mutation in the GFP gene to eliminate an internal NdeI site, yielding pT4EG2. The unique NdeI and XhoI sites (flanking the T4 DNA ligase gene) in pT4EG2 were used to clone all target genes (Fig. 3). Alternatively, a multiple cloning site was inserted into the NdeI and XhoI sites of pT4EG2, yielding pTEG vector (data not shown). 2.5. Cloning of target genes Genes for the target proteins MBP, T4 gene 32 product, T4 DNA ligase, T4 uvsX protein, T4 polymerase accessory protein, T4 PNK, HhaI methylase (M-Hha), human caspase-3, yeast phosphoglycerate kinase (PGK), Cre recombinase, human basic FGF and mouse IL-4 were amplified by PCR from plasmids constructed previously in our laboratory. The α-galactoseaminidase gene was obtained from Dr Ellen Guthrie (NEB), the genes for restriction endonucleases BamHI and XhoI were obtained from Drs Bill Jack and Shuanyong Xu (NEB) and the human α-1-antitrypsin gene was obtained from Dr Charles Cooney (Massachusetts Institute of Technology). The forward and reverse primers incorporated NdeI and XhoI sites, respectively. The amplified fragment was digested by NdeI and XhoI and ligated directly to NdeI/XhoI-digested pT4EG2. 2.6. Fusion protein expression and purification The fusion proteins were expressed in E. coli strain ER2566 (NEB) and purified following the same protocol as described previously (Chong et al., 1997). The fusion proteins containing BamHI and XhoI were expressed in the strain that contained a plasmid expressing the BamHI and XhoI methylases, respectively. All samples were analyzed by SDS-PAGE using 12% Tris–glycine gels (Novex, San Diego, CA). Protein concentrations were estimated by the method of Bradford (1976). 2.7. 
Fluorescence measurements The fluorescence of whole cells from the cultures after IPTG induction (OD600 ranging from 2.7 to 4.2) and of the soluble cell lysate (supernatant) was measured using a Perkin-Elmer LS50B Luminescence Spectrometer (excitation, 395 nm; emission, 510 nm; each with 2.5 nm bandwidth). As a control, fluorescence measurements were taken from cells containing pT4E(B) without the GFP fusion that were grown and induced under the same conditions, and the fluorescence values were subtracted from those of cells containing the GFP fusion. We also measured the fluorescence of the induced cultures after a serial dilution and found that the fluorescence was proportional to the OD600 value within the range of 0.4–4.2 (data not shown). The fluorescence values
of different target proteins were thus normalized at an OD600 value of 1.0 (see Table 3).
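The background subtraction and OD600 normalization described in Section 2.7 can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes each reading is first scaled to an OD600 of 1.0 and the control is then subtracted, which is equivalent to the order in the text when both cultures have the same OD600. The function name and the example readings are hypothetical.

```python
# Minimal sketch of the normalization in Section 2.7. Assumption: fluorescence
# is proportional to OD600 within 0.4-4.2 (as stated in the text), so each
# reading is scaled to OD600 = 1.0 before the no-GFP control is subtracted.
OD_LINEAR_RANGE = (0.4, 4.2)


def normalized_fluorescence(sample_f: float, sample_od: float,
                            control_f: float, control_od: float) -> float:
    """Return background-corrected fluorescence per OD600 unit."""
    low, high = OD_LINEAR_RANGE
    for od in (sample_od, control_od):
        if not low <= od <= high:
            raise ValueError(f"OD600 {od} is outside the linear range {low}-{high}")
    return sample_f / sample_od - control_f / control_od


if __name__ == "__main__":
    # Hypothetical readings in arbitrary fluorescence units.
    print(normalized_fluorescence(350.0, 3.5, 20.0, 4.0))
```

Rejecting readings outside 0.4–4.2 keeps the calculation within the range where the text reports proportionality.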
3. Results 3.1. Protein splicing/cleavage of the synthetic mini-inteins Examination of protein splicing and cleavage of the synthetic mini-inteins followed similar strategies used for the previous studies of other inteins in which the mini-inteins were fused between two extein domains. The fusion protein, ME(B)T4, contained the N-extein domain MBP which allowed in vitro purification and analysis of any protein splicing/cleavage products that contained the MBP moiety. For instance, protein splicing of ME(B)T4 produced MT4 and E(B), whereas N-terminal cleavage of ME(B)T4 produced M and E(B)T4 and C-terminal cleavage produced ME(B) and T4. The splicing/cleavage products, M, M(E) and MT4, were purified on an amylose column and identified on SDS-PAGE (data not shown). Protein splicing/cleavage activities of three synthetic mini-inteins are summarized in Table 1. Insertion of the CBD inside the mini-inteins facilitated analysis of splicing/cleavage products on a chitin column and allowed both termini of the mini-intein to accept different extein domains. The effect of extein residues on protein splicing/cleavage was examined by adding native extein residues to both termini of the mini-inteins. Previous studies have shown that the naturally split Ssp DnaE mini-intein can catalyze efficient cis-splicing in the presence of several extein residues (Evans et al., 2000). Our data (Table 1) indicate that protein splicing of the synthetic Ssp DnaE mini-intein is efficient (>90%) regardless of the presence of the extein residues (five residues at each splice junction). Protein splicing of the Pho PolII and RadA mini-inteins has not been previously examined. Here our data (Table 1) showed that the Pho RadA mini-intein was capable of efficient splicing only in the presence of seven N-extein and five C-extein residues whereas the Pho PolII mini-intein could not splice with or without the extein residues. 
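The products expected from each reaction of a three-part precursor, as listed above for ME(B)T4, can be enumerated with a short sketch. The function name is hypothetical and the labels are simple string concatenations of the domain names used in the text.

```python
# Sketch: enumerate the products of splicing and of cleavage at either junction
# for a precursor of the form N-extein / intein / C-extein. Labels are formed by
# concatenating the domain names, following the notation used in the text.
def intein_products(n_extein: str, intein: str, c_extein: str) -> dict:
    """Map each reaction to the pair of products it releases."""
    return {
        # Splicing excises the intein and ligates the two exteins.
        "splicing": (n_extein + c_extein, intein),
        # N-terminal cleavage frees the N-extein only.
        "N-terminal cleavage": (n_extein, intein + c_extein),
        # C-terminal cleavage frees the C-extein only.
        "C-terminal cleavage": (n_extein + intein, c_extein),
    }


if __name__ == "__main__":
    for reaction, products in intein_products("M", "E(B)", "T4").items():
        print(f"{reaction}: {' + '.join(products)}")
```

Because MBP is the N-extein, every product whose label starts with M is one that can be recovered on an amylose column.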
The thiol-inducible N-terminal cleavage activity was examined using the mutant mini-inteins in which the C-terminal asparagines had been changed to alanine. Again, the Ssp DnaE mini-intein allowed efficient thiol-induced N-terminal cleavage both in the presence and in the absence of extein residues (Table 1, Fig. 4). The Pho RadA mini-intein could be induced by thiols to cleave at its N-terminus only in the presence of extein residues whereas the Pho PolII mini-intein was not capable of the thiol-inducible cleavage reaction. The C-terminal cleavage activities of the mini-inteins were examined by changing the penultimate histidine to glutamine and the first C-extein residue to alanine, similar to our previous study of the Sce VMA intein (Chong et al., 1998a), or by changing the first cysteine residue to alanine as in the case of the Ssp DnaB intein (Mathys et al., 1999). All three mini-inteins containing the aforementioned mutations exhibited no in vivo or in vitro (thiol-inducible) cleavage activity at the C-termini (Table 1). However, the wild-type Pho PolII mini-intein flanked by extein residues catalyzed efficient (>90%) C-terminal cleavage in vivo (data not shown). Unlike other inteins, the Pho PolII mini-intein has a glutamine as the C-terminal residue. The above result suggests that the C-terminal cleavage of the Pho PolII mini-intein might be mediated by a mechanism similar to the succinimide formation of the C-terminal asparagine residue in other inteins (Chong et al., 1996; Xu and Perler, 1996).
Table 1 Protein splicing and splice junction cleavage of three synthesized mini-inteins a
Mini-intein | Extein residues b | In vivo splicing | DTT-induced N-terminal cleavage | C-terminal cleavage
Ssp DnaE | Yes | + | + | ND c
Ssp DnaE | No | + | + | −
Pho PolII | Yes | − | − | − d
Pho PolII | No | − | − | −
Pho RadA | Yes | + | + | −
Pho RadA | No | − | − | −
a Protein splicing and cleavage were examined by SDS-PAGE analyses of the fusion precursors and splicing/cleavage products in both crude cell extracts and amylose purified proteins. +, more than 90% of the precursor underwent in vivo splicing, DTT-induced N-terminal cleavage or C-terminal cleavage; −, less than 10% of the precursor underwent splicing/cleavage. See Section 3.1 for details.
b Five native extein residues at each splice junction except for the Pho RadA mini-intein which had seven N-extein residues.
c ND, not determined.
d C-terminal cleavage (>90%) was only observed in vivo in the wild-type Pho PolII mini-intein.
3.2. Efficient thiol-inducible cleavage at the N-terminus of the Ssp DnaE mini-intein Efficient thiol-inducible N-terminal cleavage was observed in both the Ssp DnaE and the Pho RadA mini-inteins (Table 1). However, the Pho RadA mini-intein required the presence of native extein residues, which limited its use for purification of different proteins. The
Ssp DnaE mini-intein, on the other hand, could catalyze efficient thiol-induced N-terminal cleavage without native extein residues. The N-terminal cleavage activity was thus examined in more detail using the ME(B)K fusion context. The fusion protein was purified on amylose resin and the DTT-induced N-terminal cleavage reactions were conducted in vitro at different temperatures (Fig. 4). As shown in Fig. 4, the fusion protein ME(B)K, with glycine as the −1 residue, underwent minimal in vivo cleavage and was very stable at different temperatures in the absence of DTT. After incubation with DTT, the fusion protein was cleaved efficiently at all three temperatures (Fig. 4). The effect of the −1 residue on the DTT-induced N-terminal cleavage was examined in pT4E(B)(X−1) in which each of the 20 amino acid residues was placed at the −1 position. The fusion proteins were purified on chitin resin and T4 DNA ligase as a target protein was eluted after the DTT-induced cleavage reaction. Since all fusion proteins were expressed at similar levels and without significant in vivo cleavage (except aspartate and valine as the −1 residues),
Fig. 4. SDS-PAGE gel showing DTT-induced N-terminal cleavage of the modified Ssp DnaE mini-intein. The fusion protein, ME(B)K, was expressed and purified on an amylose column. Lane 1, molecular weight marker (molecular weights (kDa) shown on the left); lane 2, supernatant; lane 3, flowthrough; lane 4, purified fusion protein after elution; lanes 5–7, the fusion protein was incubated at different temperatures in the absence of DTT; lanes 8–10, the fusion protein was incubated with 50 mM DTT at different temperatures; M(E)K, the fusion protein of MBP (M, black box), CBD-tagged intein (E(B), gray box) and T4 PNK (K, dotted box). The arrow indicates the cleavage site.
Table 2 Effect of the −1 residue on the thiol-induced N-terminal cleavage of the modified Ssp DnaE mini-intein
Residues at the −1 position | Estimated cleavage efficiency (%) a
Gly, Gln, Ala | 80–100
Tyr, Leu, Met, Ser, His, Phe | 50–79
Ile, Asn, Cys, Thr, Lys, Arg | 20–49
Trp, Glu, Pro | 1–19
Asp, Val | Not determined due to significant in vivo cleavage
a The cleavage reaction was conducted in the presence of 50 mM DTT at 4°C for 16 h. The cleavage efficiencies for most residues were higher at 23°C (data not shown).
the DTT-induced cleavage efficiencies were estimated by the final yields of T4 DNA ligase. As shown in Table 2, most of the 20 amino acid residues allowed efficient DTT-induced N-terminal cleavage. Aspartate and valine residues at the −1 position resulted in more than 80% in vivo (N-terminal) cleavage (data not shown). The data suggest that the synthetic Ssp DnaE mini-intein, when fused to different target proteins at its N-terminus, could catalyze efficient thiol-induced cleavage. Based on the above observations, the synthetic Ssp DnaE mini-intein was chosen for construction of the intein-GFP vector. 3.3. Expression and purification of target proteins in the intein-GFP fusion vector The intein-GFP fusion vector contained the modified synthetic Ssp DnaE mini-intein gene fused at the C-terminus to GFP and at the N-terminus to a target gene. In pT4EG, the
target gene (T4 DNA ligase) was cloned between the NdeI and XhoI sites. All other target genes were also cloned into the NdeI and XhoI sites that used ATG of the NdeI site as the translational start and the same additional residues (leucine-glutamate-glycine) placed between the target genes and the mini-intein. In all cases, glycine was the −1 residue and the cleavage reactions were conducted at 23°C to ensure efficient N-terminal cleavage. A total of 17 target genes were cloned into the intein-GFP fusion vector (Table 3) and the fusion proteins exhibited different expression profiles. The first group of target proteins, including MBP, T4 DNA ligase, T4 gene 32 product, BamHI, Cre recombinase, α-galactoseaminidase, glutathione S-transferase (GST), T4 uvsX protein, yeast PGK and XhoI, were expressed at high levels and were soluble as fusion proteins (Table 3). For instance, the fusion proteins for T4 DNA ligase and Cre recombinase were observed as major components in almost equal amounts in both whole cell lysate and supernatant (Fig. 5, lanes 1 and 2). The second group of target proteins included M-Hha, T4 PNK, human caspase-3, T4 polymerase accessory protein and mouse IL-4. These proteins were expressed at high levels as judged by SDS-PAGE analysis of the whole cell lysate, but the fusion proteins were not completely soluble as indicated by decreased amounts of the fusion proteins in the supernatant (Fig. 5, lanes 1 and 2 of M-Hha, PNK and caspase-3). In the whole cell lysate of PNK and caspase-3 (Fig. 5, lane 1), a band migrating at the molecular weight of the purified PNK or caspase-3 was observed, suggesting that in vivo cleavage of the fusion protein occurred. The fusion protein containing mouse IL-4 was almost completely insoluble (data not shown) and no detectable mouse IL-4 protein was obtained after purification (Table 3). The third group of
Table 3 A list of target proteins expressed and purified in the intein-GFP vectors
Target proteins | Molecular weight (kDa) | Whole cells fluorescence per OD600 a | Supernatant fluorescence per OD600 a | Final yields (mg/l) | Final yields (mg/l) per molecular weight
MBP | 42 | 109.6 ± 6.0 | 1678 ± 4 | 30.0 | 0.71
T4 DNA ligase | 55 | 44.9 ± 2.3 | 457 ± 21 | 16.5 | 0.30
T4 gene 32 product | 34 | 60.6 ± 4.1 | 823 ± 15 | 14.1 | 0.42
BamHI | 25 | 52.0 ± 4.1 | 696 ± 6 | 8.8 | 0.36
Cre recombinase | 39 | 21.7 ± 2.8 | 204 ± 8 | 7.6 | 0.20
α-galactoseaminidase | 50 | 23.0 ± 2.1 | 149 ± 5 | 7.2 | 0.14
GST | 27 | 34.5 ± 1.5 | 464 ± 12 | 6.8 | 0.25
T4 uvsX protein | 44 | 30.1 ± 2.8 | 194 ± 7 | 6.3 | 0.14
Yeast PGK | 45 | 21.4 ± 0.9 | 122 ± 3 | 5.2 | 0.12
HhaI methylase | 40 | 15.9 ± 1.1 | 175 ± 6 | 4.6 | 0.12
XhoI | 27 | 18.2 ± 2.7 | 111 ± 3 | 3.6 | 0.13
T4 PNK | 35 | 8.6 ± 0.3 | 21 ± 2 | 1.9 | 0.055
Human bFGF | 17 | 8.6 ± 0.7 | 59 ± 4 | 1.0 | 0.060
Human α-1-antitrypsin | 44 | 1.7 ± 1.5 | 11 ± 4 | 0.7 | 0.016
Human caspase-3 | 29 | 4.5 ± 0.3 | 6 ± 1 | 0.5 | 0.017
T4 polymerase accessory protein | 36 | 6.7 ± 1.3 | 13 ± 1 | 0.3 | 0.001
Mouse IL4 | 14 | 4.4 ± 0.3 | 14 ± 1 | ND b | ND
a The fluorescence was determined as the average fluorescence of three samples taken from the whole cells or supernatants. A standard deviation was derived from these three determinations.
b ND, not determined.
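The relationship between whole cell fluorescence and yield per molecular weight can be checked directly against the Table 3 values. The sketch below is an illustration, not the authors' analysis: it assumes an ordinary least-squares line and a Pearson correlation coefficient, recomputes yield per molecular weight from the yield and molecular weight columns, and omits mouse IL4 because its yield was not determined.

```python
import math

# (target protein, molecular weight in kDa,
#  whole-cell fluorescence per OD600, final yield in mg/l), copied from Table 3.
# Mouse IL4 is omitted because its final yield was not determined.
TABLE3 = [
    ("MBP", 42, 109.6, 30.0),
    ("T4 DNA ligase", 55, 44.9, 16.5),
    ("T4 gene 32 product", 34, 60.6, 14.1),
    ("BamHI", 25, 52.0, 8.8),
    ("Cre recombinase", 39, 21.7, 7.6),
    ("alpha-galactoseaminidase", 50, 23.0, 7.2),
    ("GST", 27, 34.5, 6.8),
    ("T4 uvsX protein", 44, 30.1, 6.3),
    ("Yeast PGK", 45, 21.4, 5.2),
    ("HhaI methylase", 40, 15.9, 4.6),
    ("XhoI", 27, 18.2, 3.6),
    ("T4 PNK", 35, 8.6, 1.9),
    ("Human bFGF", 17, 8.6, 1.0),
    ("Human alpha-1-antitrypsin", 44, 1.7, 0.7),
    ("Human caspase-3", 29, 4.5, 0.5),
    ("T4 polymerase accessory protein", 36, 6.7, 0.3),
]


def pearson_and_fit(xs, ys):
    """Return (Pearson r, slope, intercept) of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return sxy / math.sqrt(sxx * syy), slope, mean_y - slope * mean_x


fluorescence = [row[2] for row in TABLE3]
yield_per_mw = [row[3] / row[1] for row in TABLE3]  # mg/l per kDa
r, slope, intercept = pearson_and_fit(fluorescence, yield_per_mw)

if __name__ == "__main__":
    print(f"r = {r:.3f}, slope = {slope:.4f}, intercept = {intercept:.4f}")
```

Dividing the yield by molecular weight puts the targets on a roughly molar footing, which is presumably why the paper normalizes before comparing with a fluorescence signal that reports one GFP per fusion molecule.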
Fig. 5. SDS-PAGE gel showing the expression and purification of several target proteins in the intein-GFP fusion vector. Lane 1, whole cell lysate; lane 2, supernatant; lane 3, purified target protein after intein cleavage and elution. Target proteins include T4 DNA ligase (T4 ligase), Cre recombinase (Cre), HhaI methylase (M-Hha), T4 PNK and human caspase-3 (Caspase-3). For T4 ligase, Cre and M-Hha, the arrows indicate the positions of the expressed precursor proteins. For PNK and Caspase-3, the upper arrows indicate the position of the expressed precursor proteins and the lower arrows indicate the position of in vivo cleaved target proteins. Molecular weights (kDa) are shown on the left.
target proteins, including human basic FGF and α-1-antitrypsin, was poorly expressed and the solubility of the fusion proteins was not evident from the SDS-PAGE analysis (data not shown).
The fusion proteins were purified on chitin resin and target proteins were eluted after DTT-induced cleavage. To ensure efficient cleavage, incubation with DTT was conducted at 23°C for 16 h. This resulted in >95% cleavage efficiency for the highly expressed target proteins mentioned above. Incubation with DTT at 4°C resulted in lower cleavage efficiencies for some but not all target proteins (data not shown). The fusion proteins containing T4 PNK, human caspase-3, T4 polymerase accessory protein, human basic FGF and α-1-antitrypsin were cleaved at approximately 50–70% efficiency (data not shown). All target proteins were purified with >90% purity (Fig. 5, lane 3). The data indicated that fusion of a C-terminal GFP and insertion of the CBD into the synthetic Ssp DnaE mini-intein have
Fig. 6. Correlation of soluble expression of fusion proteins and the final yields of target proteins with the fluorescence of whole cells.
no significant effect on the thiol-induced cleavage activity of the mini-intein and that the modified mini-intein could be used effectively to purify recombinant proteins.
3.4. Correlation of fluorescence with protein expression, fusion protein solubility and target protein final yields
The fluorescence of both whole cells and supernatant was measured and is listed alongside the final yields of the target proteins in Table 3. Highly expressed proteins, such as MBP and T4 DNA ligase, resulted in high fluorescence whereas poorly expressed proteins, such as human basic FGF and α-1-antitrypsin, had low fluorescence. Target proteins such as T4 PNK, human caspase-3, T4 polymerase accessory protein and mouse IL4, though expressed at relatively high levels in the whole cell lysate, had low fluorescence. For these targets, most of the fusion protein was insoluble. The whole cell fluorescence correlated positively with the fluorescence of the supernatant (a direct measurement of the fluorescence of the soluble fusion protein), suggesting that the whole cell fluorescence reflects the expression level of soluble fusion proteins (Fig. 6A). After normalization by molecular weight, the final yields of the target proteins exhibited a linear relationship with the whole cell fluorescence (Fig. 6B).
4. Discussion
Traditional affinity purification of recombinant proteins often includes extra steps to remove the affinity tag from the target protein by proteolytic cleavage. Introducing a self-cleavable intein as a fusion partner eliminates the protease treatment, making affinity purification a single-step process that yields pure, tag-free protein. Several intein systems have previously been developed (Xu et al., 2000). A target protein can be fused to either the N-terminus or C-terminus of an intein. The intein-mediated cleavage can be induced by addition of a thiol reagent or by changes in pH and/or temperature.
The focus of the study has been (1) improving the expression of the intein fusion protein and (2) increasing the efficiency of the intein-mediated cleavage. Mini-inteins, due to their smaller molecular weights, can potentially help to improve the expression. However, the cleavage efficiency of mini-inteins is affected significantly by target protein sequences and/or residues adjacent to the cleavage site (Mathys et al., 1999; Southworth et al., 1999). In this study, we examined the splicing/cleavage activities of three mini-inteins, which may provide certain advantages over other mini-inteins. All three mini-inteins were synthesized using E. coli optimized codon usage. Insertion of an affinity tag inside the mini-intein sequence leaves open the options for utilizing cleavage activity at both termini of the mini-intein. The thiol-induced N-terminal cleavage of the synthetic Ssp DnaE mini-intein was efficient with a broad range of residues at the −1 position (Table 2). In fact, half-times of the N-terminal cleavage reaction for most of the 20 amino acid residues were less than 2 h at 23°C (data not shown). Similar cleavage efficiency was also observed in the trans-cleavage reaction of the Ssp DnaE intein (Martin et al., 2001). Possibly due to a combination of efficient cleavage and characteristics (e.g. optimized codon usage, lack of an endonuclease domain) of the synthetic Ssp DnaE mini-intein, target proteins purified in the intein-GFP system generally resulted in a higher yield than that in the Sce VMA intein system (Chong et al., 1997, 1998a), even though the fusion partners in both systems have similar molecular weights (Ssp DnaE intein + CBD + GFP: 50 kDa; Sce VMA intein + CBD: 55 kDa). For instance, T4 DNA ligase, T4 gene 32 product and BamHI were produced in higher yields in the intein-GFP system (16.5, 14.1 and 8.8 mg/l culture, respectively, Table 3) than in the Sce VMA intein system (8.4, 6.0 (Chong et al., 1998a) and 1.0 mg/l culture (Chong et al., 1997), respectively). Direct comparison of the intein-GFP system with other mini-intein systems (Mathys et al., 1999; Southworth et al., 1999) is not yet possible. The synthetic mini-inteins will serve as a starting material for further intein engineering, e.g. modification of intein catalytic residues to allow thiol-induced cleavage at the C-terminus of the Ssp DnaE mini-intein, control of the C-terminal cleavage activity of the Pho PolII mini-intein (Table 1), etc. As different inteins often favor different splice junction sequences/residues for efficient splicing/cleavage, characterization of more mini-inteins may enhance the application of intein-mediated purification for a variety of target proteins.
The thiol-induced cleavage at the intein N-terminus is mediated by the intein-catalyzed N-S acyl rearrangement reaction (Chong et al., 1996, 1998b; Xu and Perler, 1996). This requires correct folding of both the target protein and the intein to create a favorable conformation at the splice junctions.
Misfolded target protein/intein would inhibit the N-S acyl rearrangement reaction, thereby decreasing the thiol-induced cleavage efficiency, and/or expose the splice junctions to hydrolysis, resulting in in vivo cleavage. Misfolded proteins produced in E. coli often form insoluble aggregates (inclusion bodies). Only soluble proteins can be effectively purified by the intein fusion systems. Misfolded or poorly expressed target proteins often resulted in poor yields after intein-mediated purification. With previous intein systems (Chong et al., 1998a), protein expression and folding could only be examined after completion of the purification and electrophoretic analysis.
The intein-GFP system combines the activities of the intein as a self-cleavable domain for protein purification and GFP as an indicator for protein folding. It was designed to monitor soluble expression of the fusion protein before cell lysis and electrophoretic analysis and to allow subsequent purification of the target protein without separately cloning/expressing the target gene or proteolytically removing the GFP tag. In the intein-GFP fusion protein, GFP was fused as a C-terminal domain to the mini-intein-CBD-target protein fusion and as shown in a previous study, a C-terminal GFP
domain could serve as an effective ‘reporter’ for correct folding of upstream domains (Waldo et al., 1999). The data in this study indicate that highly soluble fusion proteins gave higher fluorescence than those that were less soluble or insoluble. Soluble fusion proteins were more likely to produce correctly folded and therefore active target proteins. For instance, purified T4 DNA ligase, Cre recombinase, BamHI and XhoI were found to be fully active (data not shown). Fusion proteins that showed low fluorescence (e.g. T4 PNK, human caspase-3, T4 polymerase accessory protein, human basic FGF and α-1-antitrypsin) also resulted in lower cleavage efficiency and/or in vivo cleavage (Fig. 5, lane 1 for PNK and caspase-3), suggesting that the C-terminal GFP domain could also serve as an effective indicator of intein folding and activity.
The intein-GFP fusion system is versatile and has many potential applications. Using GFP fluorescence as a convenient ‘reporter’, one can screen for culture conditions, strain/vector variants, mutations, etc. that improve soluble expression of a particular fusion protein. Once an optimized condition is found, the target protein can be directly purified on a single column. The expression of human caspase-3 in the intein-GFP vector resulted in mostly insoluble fusion protein (Table 3). Using error-prone PCR to mutate the caspase-3 gene, we have isolated several mutant forms of caspase-3 that resulted in the expression of soluble fusion proteins and up to a ten-fold increase in final yields (Zhang, unpublished data). Alternatively, the system can be used to screen for soluble expression of a large number of recombinant proteins for rapid purification in high throughput experiments. The intein-GFP system may allow us to predict final yields of target proteins directly from whole cell fluorescence before going through the actual purification process.
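As an illustration of how such a prediction might work, the sketch below fits an ordinary least-squares line to the whole cell fluorescence and the molecular-weight-normalized yields listed in Table 3, then uses the fitted line to estimate the yield of a new target. This is not the authors' analysis and does not reproduce Fig. 6B; the helper names `least_squares` and `predict_yield_mg_per_l` are hypothetical, and the fit rests on only 16 proteins measured under the conditions of this study.

```python
# Whole cell fluorescence per OD600 and reported yield per molecular weight,
# transcribed from Table 3 (mouse IL4 omitted because its yield is "ND").
FLUORESCENCE = [109.6, 44.9, 60.6, 52.0, 21.7, 23.0, 34.5, 30.1,
                21.4, 15.9, 18.2, 8.6, 8.6, 1.7, 4.5, 6.7]
YIELD_PER_MW = [0.71, 0.30, 0.42, 0.36, 0.20, 0.14, 0.25, 0.14,
                0.12, 0.12, 0.13, 0.055, 0.060, 0.016, 0.017, 0.001]


def least_squares(xs, ys):
    """Return (slope, intercept, pearson_r) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


def predict_yield_mg_per_l(fluorescence, mw_kda, slope, intercept):
    """Hypothetical helper: estimated final yield (mg/l) for a new target.

    The fitted line predicts yield per molecular weight, so the result is
    multiplied by the target's molecular weight; negative predictions are
    clamped to zero.
    """
    return max(0.0, slope * fluorescence + intercept) * mw_kda


slope, intercept, r = least_squares(FLUORESCENCE, YIELD_PER_MW)
print(f"slope={slope:.5f}, intercept={intercept:.4f}, r={r:.3f}")
```

A call such as `predict_yield_mg_per_l(40.0, 30, slope, intercept)` would then give a rough estimate for a 30 kDa target whose culture reads 40 fluorescence units per OD600, subject to the caveats about in vivo cleavage, resin binding and cleavage efficiency discussed in the text.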
As whole cell fluorescence reflects expression of soluble fusion proteins, a direct correlation between whole cell fluorescence and final yields depends on three factors: (1) minimal in vivo cleavage of the fusion protein; (2) efficient binding of the fusion protein to affinity resin; and (3) high intein-mediated cleavage efficiency. For highly expressed and soluble target proteins (e.g. MBP, T4 DNA ligase, etc., Table 3), we estimated that more than 90% of the fusion protein was bound to the chitin resin with no significant in vivo cleavage and more than 95% of the bound protein was cleaved after incubation with DTT at 23°C (data not shown). In the case of the insoluble or poorly expressed fusion proteins (e.g. T4 PNK and human caspase-3), we observed in vivo cleavage, inefficient binding to chitin resin and low DTT-induced cleavage efficiency. These observations are consistent with the data shown in Fig. 6 and Table 3, in which a linear correlation between whole cell fluorescence and final yields (normalized by molecular weights) is more evident at high fluorescence levels than at low fluorescence levels. Expression of correctly folded, soluble fusion protein led to minimal in vivo cleavage, efficient binding to affinity resin and high intein-mediated cleavage efficiency (Fig. 5). It is apparent
that more target proteins should be tested in the intein-GFP system in order to refine the correlation between the whole cell fluorescence and final protein yields.
Acknowledgements
We would like to thank Drs Richard Roberts, Lise Raleigh, Francine Perler, Ming-Qun Xu and Tom Evans for valuable discussions and reading of the manuscript, Drs Richard Whitaker, Bill Jack, Ellen Guthrie, Shuangyong Xu and Charles Cooney for providing some of the target genes, and Dr Donald G. Comb for encouragement. This work was supported by NIH grant GM 57734 and New England Biolabs.
References
Bradford, M.M., 1976. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 72, 248–254.
Cha, H.J., Wu, C.F., Valdes, J.J., Rao, G., Bentley, W.E., 2000. Observations of green fluorescent protein as a fusion partner in genetically engineered Escherichia coli: monitoring protein expression and solubility. Biotechnol. Bioeng. 67, 565–574.
Chalfie, M., Tu, Y., Euskirchen, G., Ward, W.W., Prasher, D.C., 1994. Green fluorescent protein as a marker for gene expression. Science 263, 802–805.
Chong, S., Xu, M.Q., 1997. Protein splicing of the Saccharomyces cerevisiae VMA intein without the endonuclease motifs. J. Biol. Chem. 272, 15587–15590.
Chong, S., Shao, Y., Paulus, H., Benner, J., Perler, F.B., Xu, M.Q., 1996. Protein splicing involving the Saccharomyces cerevisiae VMA intein. The steps in the splicing pathway, side reactions leading to protein cleavage, and establishment of an in vitro splicing system. J. Biol. Chem. 271, 22159–22168.
Chong, S., Mersha, F.B., Comb, D.G., Scott, M.E., Landry, D., Vence, L.M., Perler, F.B., Benner, J., Kucera, R.B., Hirvonen, C.A., Pelletier, J.J., Paulus, H., Xu, M.Q., 1997. Single-column purification of free recombinant proteins using a self-cleavable affinity tag derived from a protein splicing element. Gene 192, 271–281.
Chong, S., Montello, G.E., Zhang, A., Cantor, E.J., Liao, W., Xu, M.Q., Benner, J., 1998a. Utilizing the C-terminal cleavage activity of a protein splicing element to purify recombinant proteins in a single chromatographic step. Nucleic Acids Res. 26, 5109–5115.
Chong, S., Williams, K.S., Wotkowicz, C., Xu, M.Q., 1998b. Modulation of protein splicing of the Saccharomyces cerevisiae vacuolar membrane ATPase intein. J. Biol. Chem. 273, 10567–10577.
Cody, C.W., Prasher, D.C., Westler, W.M., Prendergast, F.G., Ward, W.W., 1993. Chemical structure of the hexapeptide chromophore of the Aequorea green-fluorescent protein. Biochemistry 32, 1212–1218.
Crameri, A., Whitehorn, E.A., Tate, E., Stemmer, W.P., 1996. Improved green fluorescent protein by molecular evolution using DNA shuffling. Nat. Biotechnol. 14, 315–319.
Devereux, J., Haeberli, P., Smithies, O., 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12, 387–395.
Duan, X., Gimble, F.S., Quiocho, F.A., 1997. Crystal structure of PI-SceI, a homing endonuclease with protein splicing activity. Cell 89, 555–564.
Evans Jr., T.C., Benner, J., Xu, M.Q., 1999. The in vitro ligation of bacterially expressed proteins using an intein from Methanobacterium thermoautotrophicum. J. Biol. Chem. 274, 3923–3926.
Evans Jr., T.C., Martin, D., Kolly, R., Panne, D., Sun, L., Ghosh, I., Chen, L., Benner, J., Liu, X.Q., Xu, M.Q., 2000. Protein trans-splicing and cyclization by a naturally split intein from the dnaE gene of Synechocystis species PCC6803. J. Biol. Chem. 275, 9091–9094.
Hirata, R., Ohsumi, Y., Nakano, A., Kawasaki, H., Suzuki, K., Anraku, Y., 1990. Molecular structure of a gene, VMA1, encoding the catalytic subunit of H(+)-translocating adenosine triphosphatase from vacuolar membranes of Saccharomyces cerevisiae. J. Biol. Chem. 265, 6726–6733.
Kane, P.M., Yamashiro, C.T., Wolczyk, D.F., Neff, N., Goebl, M., Stevens, T.H., 1990. Protein splicing converts the yeast TFP1 gene product to the 69-kD subunit of the vacuolar H(+)-adenosine triphosphatase. Science 250, 651–657.
Kawarabayasi, Y., et al., 1998. Complete sequence and gene organization of the genome of a hyper-thermophilic archaebacterium, Pyrococcus horikoshii OT3 (supplement). DNA Res. 5, 147–155.
LaVallie, E.R., McCoy, J.M., 1995. Gene fusion expression systems in Escherichia coli. Curr. Opin. Biotechnol. 6, 501–506.
Martin, D.D., Xu, M.Q., Evans, T.C., 2001. Characterization of a naturally occurring trans-splicing intein from Synechocystis sp. PCC6803. Biochemistry 40, 1393–1402.
Mathys, S., Evans, T.C., Chute, I.C., Wu, H., Chong, S., Benner, J., Liu, X.Q., Xu, M.Q., 1999. Characterization of a self-splicing mini-intein and its conversion into autocatalytic N- and C-terminal cleavage elements: facile production of protein building blocks for protein ligation. Gene 231, 1–13.
Perler, F.B., 2000. InBase, the intein database. Nucleic Acids Res. 28, 344–345.
Perler, F.B., Davis, E.O., Dean, G.E., Gimble, F.S., Jack, W.E., Neff, N., Noren, C.J., Thorner, J., Belfort, M., 1994. Protein splicing elements: inteins and exteins – a definition of terms and recommended nomenclature. Nucleic Acids Res. 22, 1125–1127.
Prasher, D.C., Eckenrode, V.K., Ward, W.W., Prendergast, F.G., Cormier, M.J., 1992. Primary structure of the Aequorea victoria green-fluorescent protein. Gene 111, 229–233.
Reid, B.G., Flynn, G.C., 1997. Chromophore formation in green fluorescent protein. Biochemistry 36, 6786–6791.
Southworth, M.W., Amaya, K., Evans, T.C., Xu, M.Q., Perler, F.B., 1999. Purification of proteins fused to either the amino or carboxy terminus of the Mycobacterium xenopi gyrase A intein. Biotechniques 27, 110–120.
Southworth, M.W., Benner, J., Perler, F.B., 2000. An alternative protein splicing mechanism for inteins lacking an N-terminal nucleophile. EMBO J. 19, 5019–5026.
Telenti, A., Southworth, M., Alcaide, F., Daugelat, S., Jacobs Jr., W.R., Perler, F.B., 1997. The Mycobacterium xenopi GyrA protein splicing element: characterization of a minimal intein. J. Bacteriol. 179, 6378–6382.
Waldo, G.S., Standish, B.M., Berendzen, J., Terwilliger, T.C., 1999. Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17, 691–695.
Watanabe, T., Ito, Y., Yamada, T., Hashimoto, M., Sekine, S., Tanaka, H., 1994. The roles of the C-terminal domain and type III domains of chitinase A1 from Bacillus circulans WL-12 in chitin degradation. J. Bacteriol. 176, 4465–4472.
Womble, D.D., 2000. GCG: the Wisconsin Package of sequence analysis programs. Methods Mol. Biol. 132, 3–22.
Wu, H., Hu, Z., Liu, X.Q., 1998a. Protein trans-splicing by a split intein encoded in a split DnaE gene of Synechocystis sp. PCC6803. Proc. Natl. Acad. Sci. USA 95, 9226–9231.
Wu, H., Xu, M.Q., Liu, X.Q., 1998b. Protein trans-splicing and functional mini-inteins of a cyanobacterial dnaB intein. Biochim. Biophys. Acta 1387, 422–432.
Xu, M.Q., Perler, F.B., 1996. The mechanism of protein splicing and its modulation by mutation. EMBO J. 15, 5146–5153.
Xu, M.Q., Paulus, H., Chong, S., 2000. Fusions to self-splicing inteins for protein purification. Methods Enzymol. 326, 376–418.