VIROLOGY 153, 1-11 (1986)
The Herpes Simplex Virus Type 2 Equivalent of the Herpes Simplex Virus Type 1 US7 Gene and Its Flanking Sequences
T. C. HODGMAN¹ AND A. C. MINSON
Division of Virology, Department of Pathology, University of Cambridge, United Kingdom
Received January 28, 1986; accepted April 15, 1986
Nucleotide sequencing studies (D. J. McGeoch, A. Dolan, S. Donald, and F. Rixon, 1985, J. Mol. Biol. 181, 1-14) have indicated that herpes simplex virus type 1 (HSV-1) has a coding sequence, referred to as US7, between the genes for the glycoproteins D and E (gD and gE). Northern blot analysis and nucleotide sequencing have been carried out to show that the type 2 virus (HSV-2) has an equivalent to the US7 gene. A comparison with the HSV-1 sequence has revealed some surprising similarities and differences. At the nucleotide level, HSV-2 has inserted a large sequence into the gE promoter, retained a large palindrome present in the coding sequence but not some tandem repeats, and deleted a region beside those repeats. At the amino acid level, the putative transmembrane sequence has been remarkably well conserved, and hydrophobic moment analysis indicates that it could be interacting with polar species within the plane of the membrane. Immediately after the deletion in the HSV-2 sequence, there is an N-glycosylation signal, and HSV-2 has one more such signal than HSV-1. The longest conserved sequence at the nucleotide level codes for a region of polypeptide that is strongly predicted to fold into α-helix. Implications of these analyses for the structure and possible function of these molecules are discussed. © 1986 Academic Press, Inc.
¹ To whom reprint requests should be addressed. Present address: MRC Laboratory of Molecular Biology, Hills Road, Cambridge, U.K.

The membrane glycoproteins of herpes simplex virus (HSV) play a leading role in many aspects of the pathology and immunology of the virus. However, their total number and precise function are not known. Spear (1984) describes five HSV glycoproteins (gB, gC, gD, gE, and gG), but additional glycoproteins have recently been described (Buckmaster et al., 1984; Richman et al., 1986), and sequencing studies of the short unique region of HSV-1 have revealed three open-reading frames which probably code for previously undefined glycoproteins (McGeoch et al., 1985). One of these, referred to as US7, was unusual in having tandem repeats within its coding region. Its messenger has been mapped by three groups (Ikura et al., 1983; Watson et al., 1983; Rixon and McGeoch, 1985) to between the glycoprotein D (gD) and glycoprotein E (gE) genes at the map position 0.919-0.927 (see Fig. 1). It is 1.6 kb long and is the smallest member of a nested set of transcripts which also includes the US5 and gD messengers. The product of the gene has not been formally identified. However, in vitro translation of mRNAs, selected by restriction fragments including the US7 gene, produced a 50-kDa species as well as gD (Lee et al., 1982). The predicted size of 41 kDa is not inconsistent with this, because other HSV proteins have been shown during electrophoresis to migrate more slowly than expected (McKnight, 1980; Watson et al., 1982). Lasky and Dowbenko (1984) sequenced the gD gene of HSV-2 and discovered the start of an open-reading frame analogous to the HSV-1 US7 gene. This communication presents transcript mapping data and the nucleotide sequence of HSV-2 strain 333 between the termination codon of gD and the initiation codon of gE. The data are consistent with the notion that HSV-2 possesses a nested set of transcripts like
0042-6822/86 $3.00 Copyright © 1986 by Academic Press, Inc. All rights of reproduction in any form reserved.
HODGMAN AND MINSON

FIG. 1. The transcripts in the region of the gD-1 and gE-1 genes have been drawn in their respective map positions with the equivalent HSV-2 regions aligned beneath. The restriction map of the HSV-2 Us has BamHI (B), BglII (Bg), EcoRI (E), and HindIII (H) sites marked, and also the positions of the Hind-l and Eco-n fragments. The long open-reading frames identified from the sequence, along with the transcript groups detected by the BamHI fragments, have been shown.
that of HSV-1. The long open-reading frame codes for a polypeptide of almost 40 kDa and is henceforth referred to as p40k. Comparison of the type 1 and 2 nucleotide sequences reveals some noteworthy similarities and differences. The likely implications for the structure and function of US7, the US7 and gE promoters, and the interaction of HSV genomic and polypeptide functions are discussed.

MATERIALS AND METHODS
HSV-2 strains 25766 and 333 were propagated in BHK-21 cells grown in Glasgow-modified Eagle's medium. The HSV-2 HindIII h and l fragments cloned in the HindIII site of pAT153 (henceforth referred to as pATHind-h and pATHind-l) were a gift from J. B. Clements, and the HSV-2 BglII l fragment was cloned in the BglII site of plasmid pKC7 in this laboratory (Robbins and Minson, unpublished work). Plasmid preparation by SDS lysis and CsCl density-gradient centrifugation, and DNA restriction, were carried out as described by Maniatis et al. (1982). Restriction fragments were purified from agarose gels by the method of Vogelstein and Gillespie (1979), and nick translation was performed by that of Rigby et al. (1977). The preparation, electrophoresis, transfer to nitrocellulose, and detection by nick-translated DNA of infected-cell mRNA (from 8 hr postinfection) were carried out as described by Inglis and Darby (1981).

The Eco-n fragment of HSV-2(333) was chosen for nucleotide sequencing because it contained the complete gD-2 sequence, which was required for other purposes, was predicted to include the 5' end of the gE-2 gene, and was convenient to purify from the plasmid pKC7Bgl-l. The nucleotide sequence was determined by the dideoxy chain-termination method using the mp8 and mp9 vectors (Sanger et al., 1977, 1980; Biggin et al., 1983; Vieira and Messing, 1982). AluI and HaeIII fragments were cloned into the SmaI site and Sau3A fragments into the BamHI site. The sequence of the gD-2 gene (from strain G) has already been published (Watson, 1983; Lasky and Dowbenko, 1984) and is not presented here, because very few differences were found. A substantial part of the gE-2 sequence was found to lie outside the fragment under study, so that region has also been omitted from the sequence shown.

The gel readings were assembled using an IBM 3081 computer. All software was written locally (Hodgman, unpublished work), with the exception of the programs for codon usage analysis of open-reading frames (Staden and McLachlan, 1982) and RNA folding (Jacobson et al., 1984).
THE HSV-2 US7 GENE

RESULTS
Northern Blot Analyses

A selection of plasmids and BamHI fragments, containing HSV-2 sequences, were used to identify the approximate positions and sizes of transcripts produced by HSV-2 Us. The map positions of the fragments used are shown in Fig. 1, and the resultant autoradiographs in Fig. 2. pATHind-l illuminated many species (Fig. 2a, lane 3), only a few of which have been studied further.

Bam-c' has identified three transcripts B, D, and F (Fig. 2b, lanes 5 and 6), the smallest of which is similar in size to the HSV-1 US7 mRNA (namely 1.6 kb). This F transcript was not detected by Bam-l or Bam-d', and so probably lies entirely within Bam-c'. Nucleotide sequencing has shown that the gD-2 coding region crosses the BamHI site joining Bam-c' and Bam-l, so the latter should illuminate the gD-2 mRNA less intensely than Hind-l on a northern blot. From analysis of intertypic recombinants and other work (Hope et al., 1982; Para et al., 1982; Lee et al., 1982), the gE-2 gene can be expected to lie somewhere in Bam-d' and its mRNA should also be detected by Hind-l. All the probes used detected the D transcripts, and the relative intensities of D and E change between pATHind-l and Bam-l (Fig. 2b, lanes 1 and 4). Since Watson et al. (1983) have identified the gD-1 and gE-1 mRNAs as being very similar in size (at slightly less than 3 kb), the D1 and D2 transcripts seem to be the likely candidates for the gD-2 and gE-2 messengers, respectively.
[Fig. 2 panel labels: lanes 1-8 (panel b); Late mRNA (3 kb); M (1.4 kb).]

FIG. 2. To obtain high resolution, and to size some of the transcripts, 0.5-µg aliquots of infected-cell RNA were subjected to electrophoresis on a vertical, formaldehyde 1.5% agarose gel (panel a). The blot was cut into strips and probed with (1) pATHind-h, (2) pBR322, and (3) pATHind-l. The bands in lane 3 have been labeled to assist in further discussion. Some transcripts migrated close to each other, so their general position on the gel has been denoted by a capital letter, and then specifically by a numerical suffix. Hind-h maps to the 0.29-0.40 region of the genome and was used to provide size markers. Only the thymidine kinase and 3-kb late mRNAs (Sharp et al., 1983; Holland et al., 1984) could be identified from previous experiments, using smaller cloned fragments from around the TK gene. In panel b, the transcripts detected by different fragments were examined. Lanes 1, 4, 5, and 7 contained infected-cell RNA, and 2, 3, 6, and 8 contained uninfected-cell RNA. Lanes 1 and 2 were probed with pATHind-l, 3 and 4 with Bam-l, 5 and 6 with Bam-c', and 7 and 8 with a 1.2-kbp fragment containing the HindIII-BamHI subfragments of Bam-d' and pAT153 (from a BamHI digest of pATHind-l).
The Nucleotide Sequence

The gel readings used to assemble the sequence under discussion are shown in Fig. 3, and Fig. 4 shows an alignment of HSV-2(333) with HSV-1(17) (McGeoch et al., 1985). Except for numerous small insertions and deletions, the sequence around the US7 "TATA box" and transcriptional start point has been fairly well conserved. The type 2 sequence has two small sequences inserted, one after gD termination and the other in the putative p40k transcript prior to the initiation codon. Such divergence is not surprising since these areas seem unlikely to be under heavy selective pressure. The presumptive p40k TATA box is CATAAA, which is similar to others found in HSV-1 Us. From the first putative start codon of the HSV-2 sequence, an open-reading frame of 1112 bases was found which shows substantial homology at both the nucleotide and protein levels to that of US7. Two polyadenylation signals are found 3' of this open-reading frame. Codon usage analysis (Staden and McLachlan, 1982) using the gD sequence as reference indicates that both the US7 and p40k open-reading frames have >99% probability of being the coding sequence.

A large palindrome centered around the A residue at position 408 has been conserved, but the tandem repeats have not, though an equivalent region remains. This region has a very biased base composition of 66% C + 7% G in HSV-1, and 57% C + 22% G in HSV-2. Immediately after these repeats the HSV-2 sequence has a large deletion. Only 8 out of 25 codons remain and these show poor homology. The repeats and deletion constitute a major region of divergence between the HSV types, another region being at the C-terminal end of the sequence. It seems very significant that collinearity is restored at the final N-glycosylation signal.

The Polypeptide Sequence

The putative polypeptide generated from this nucleotide sequence has a calculated size of 39,546 Da. Its codon usage and amino acid composition (Table 1) show the
FIG. 3. The gel readings used to assemble the complete sequence have been drawn in their respective positions and orientations. Above them is a map of the genome showing the positions of the gD-2 and p40k termination codons, the initiation codons of p40k and gE-2, the polyadenylation signals (AATAAA), the BamHI site marking the boundary between Bam-c' and Bam-d', and a PstI and an XhoI site. The sequence of HSV-2(25766) has been determined as far as the XhoI site (Bell, Buckmaster, and Minson, unpublished data), and portions were used to confirm the 333 sequence for which gel readings were only available for one strand. In the portions marked, the 25766 and 333 sequences were identical except for a single base substitution outside the open-reading frame.
same G/C bias in the third base of the triplets and a high content of proline, alanine, and leucine that is seen in other HSV genes (Murchie and McGeoch, 1982; Watson et al., 1982). It has a hydrophobic N-terminal sequence characteristic of a signal peptide, though the putative consensus sequence (Perlman and Halvorson, 1983; von Heijne, 1984) does not clearly indicate where this signal peptide will be cleaved. Of the possible places, cleavage after valine-23 seems most likely. This is in agreement with McGeoch's recent findings from computer analysis of many known signal peptides (McGeoch, 1985).

A 28-residue hydrophobic sequence, the putative transmembrane sequence, has been conserved to a remarkable degree: only 3 residues have changed. This is unlikely to be due to some constraint on the nucleotide sequence, since 10 of the 28 codons have silent third-base changes, which is similar to the level of divergence found between the gD-1 and gD-2 genes. The hydrophobic moment (Eisenberg et al., 1984) across this region, assuming it adopts either an α or a 3₁₀ helix, is sufficiently large to suggest that it is interacting in the membrane with other polar species. This is due to the N-terminal polar side chains which appear at every third residue, namely T--Q--Q--.

p40k has conserved the three N-glycosylation signals of US7, and has an extra one, revealing an apparent clustering of glycosylation sites in the primary sequence. The longest stretch of conserved nucleotide sequence codes for the oligopeptide AAMARLGAEL, which lies on the cytoplasmic side of the membrane and, together with the preceding six residues, is very strongly predicted by the Robson algorithm (Garnier et al., 1978) to fold into α-helix. The predictions across the US7 repeats, including the region before the N-glycosylation signal, and across its HSV-2 equivalent reveal these regions to contain β-turns and random coil with high probability.
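The hydrophobic moment argument can be illustrated numerically. The following Python sketch computes the mean hydrophobic moment of a peptide for a chosen angular spacing between successive side chains (100° for an α-helix, 120° for a 3₁₀ helix), in the sense of Eisenberg et al. (1984). It is not the program used in this study. The function name, the hydrophobicity values (the normalized consensus scale, quoted from memory and to be checked against the published tables before any serious use), the 11-residue window, and the example peptide (the HSV-2 transmembrane segment as it can be read from Fig. 4) are all assumptions made for illustration.

```python
import cmath
import math

# Normalized consensus hydrophobicity scale (assumed values; verify before use).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def mean_hydrophobic_moment(peptide, angle_deg, scale=EISENBERG):
    """Return |sum_n H_n * exp(i * n * delta)| / N for the peptide."""
    delta = math.radians(angle_deg)
    total = sum(scale[res] * cmath.exp(1j * n * delta)
                for n, res in enumerate(peptide))
    return abs(total) / len(peptide)


# Assumed transcription of the putative p40k transmembrane segment (Fig. 4).
TM_HSV2 = "LTVAQVIQIAIPASIIAFVFLGSCICFI"

# Compare the two helix geometries over the N-terminal 11-residue window.
for label, angle in (("alpha helix", 100.0), ("3-10 helix", 120.0)):
    print(label, round(mean_hydrophobic_moment(TM_HSV2[:11], angle), 3))
```

A peptide whose polar residues recur exactly every third position places all of them on one face of a 3₁₀ helix, which is why a T--Q--Q-- spacing is expected to give a sizeable moment at 120°.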
Elsewhere in these molecules, no ordered secondary structure could be confidently predicted because of the high proline content and uncertainty regarding the effects of glycosylation and other posttranslational processing such as sulphation (Hope et al., 1982).

The Intergenic Sequence

Turning to the region between the US7 and gE coding sequences, two functions have been found in HSV-1: transcription termination of the US5, US6 (gD-1), and US7 messengers, and initiation of gE mRNA transcription. The HSV-2 sequence has two polyadenylation signals in this region, only the first of which is found in type 1. The region between the first polyA signal and mRNA termination has a striking 79% conservation in type 2, suggesting that this whole region is functionally significant. Immediately following the point of HSV-1 mRNA termination the HSV-2 sequence has the BamHI site which marks the boundary between the Bam-c' and Bam-d' fragments. The HSV-2 sequence has an unconserved direct repeat of seven nucleotides (GTTATTT) flanking the first polyA signal. Such a feature has not been observed in other mRNA termination sequences, and so is of questionable significance. Twenty-four residues 3' of the polyA signal, the 333 sequence has the consensus YGTGTT found beyond many polyA signals of eukaryotic genes (Gil and Proudfoot, 1984; McLauchlan et al., 1985). However, it has not been conserved, though a CGTGT sequence, two residues upstream, has. Upstream of the gE TATA box, HSV-2 has a large purine-rich insertion that contains the second polyA signal, though the YGTGTT consensus is not present at the appropriate distance.

DISCUSSION
Transcription and the Nucleotide Sequence

For HSV-1, the entire nucleotide sequence and transcript map of Us have been determined and putative coding regions assigned (McGeoch et al., 1985). The purpose of this study has been to examine part of HSV-2 Us to see if the genetic content is the same, and to discover what may be
[Fig. 4 sequence alignment: HSV-1(17) US7 (top pair of lines) and HSV-2(333) p40k (bottom pair of lines); annotations on the figure: 'TATA', mRNA start, polyA, termination, BamHI, Met.]

FIG. 4. In the above alignment, the top pair of lines are the HSV-1(17) nucleotide and US7 sequences (from McGeoch et al., 1985), and the bottom pair are those for HSV-2(333). There are 120 bases per line. They extend from the end of the gD coding sequence to the initiation codon of gE. The points of mRNA initiation and termination, putative "TATA" boxes, and polyadenylation signals (polyA) in HSV-1 have been marked, along with the conserved palindromes and tandem repeats (both un-
TABLE 1
CODON USAGE OF THE

F  TTT   ?     S  TCT   1     Y  TAT   4     C  TGT   2
F  TTC   ?     S  TCC  12     Y  TAC   7     C  TGC   7
L  TTA   ?     S  TCA   2     *  TAA   0     *  TGA   0
L  TTG   ?     S  TCG  12     *  TAG   1     W  TGG   2

L  CTT   3     P  CCT   3     H  CAT   2     R  CGT   2
L  CTC   3     P  CCC  26     H  CAC   9     R  CGC  15
L  CTA   2     P  CCA   4     Q  CAA   1     R  CGA   6
L  CTG  22     P  CCG   9     Q  CAG   9     R  CGG   5

I  ATT   1     T  ACT   0     N  AAT   1     S  AGT   2
I  ATC   9     T  ACC  10     N  AAC   ?     S  AGC   7
I  ATA   2     T  ACA   5     K  AAA   1     R  AGA   3
M  ATG   3     T  ACG  12     K  AAG   0     R  AGG   2

V  GTT   5     A  GCT   1     D  GAT   2     G  GGT   3
V  GTC  12     A  GCC  22     D  GAC   7     G  GGC  15
V  GTA   5     A  GCA   2     E  GAA   3     G  GGA   3
V  GTG  10     A  GCG  11     E  GAG  10     G  GGG  10
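The third-base G/C bias can be checked directly from the legible counts of Table 1. The sketch below is illustrative only: the dictionary layout and the function name are assumptions, and the TTN box (whose counts are missing) and the AAN box (whose AAC count is illegible) are left out, so the result describes 348 of the codons rather than the whole open-reading frame.

```python
# Codon counts from Table 1, keyed by the first two bases of each codon box.
# Each tuple gives the counts for third base T, C, A, G in that order.
# The TTN and AAN boxes are omitted because their counts are incomplete.
TABLE_1 = {
    "TC": (1, 12, 2, 12), "TA": (4, 7, 0, 1), "TG": (2, 7, 0, 2),
    "CT": (3, 3, 2, 22), "CC": (3, 26, 4, 9), "CA": (2, 9, 1, 9),
    "CG": (2, 15, 6, 5), "AT": (1, 9, 2, 3), "AC": (0, 10, 5, 12),
    "AG": (2, 7, 3, 2), "GT": (5, 12, 5, 10), "GC": (1, 22, 2, 11),
    "GA": (2, 7, 3, 10), "GG": (3, 15, 3, 10),
}


def third_base_gc(table):
    """Return (codons ending in G or C, total codons) for the given boxes."""
    gc = sum(c + g for (_t, c, _a, g) in table.values())
    total = sum(sum(box) for box in table.values())
    return gc, total


gc, total = third_base_gc(TABLE_1)
print(f"{gc}/{total} codons end in G or C ({100 * gc / total:.0f}%)")
```

On these counts 279 of 348 codons, about 80%, end in G or C.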
learned from a comparison of the type 1 and 2 sequences. The transcript mapping experiments of Fig. 2 showed that a 1.6-kb transcript was detected by Bam-c' but not Bam-l or Bam-d', while all three fragments identified D transcripts. From the sequence data, it is clear that the gD-2 mRNA crosses the left-hand BamHI site of Bam-c' and from its size (2.55 kb plus the polyA tail) should finish somewhere near the right-hand site, where the first polyA signal is seen. Thus, the gD-2 and gE-2 messengers can fairly confidently be assigned to the D1 and D2 transcripts. Bam-c' and Bam-l also detected the B transcript. Thus it is tempting to suggest that the B, D, and F transcripts form a nested set analogous to the US5, 6, and 7 mRNAs of HSV-1. The type 2 equivalents of US6 and 7 are conserved, and incomplete sequence data (not shown) suggest that HSV-2 has an analogous open-reading frame to HSV-1 US5. However, one transcript could be the degradation product of another and transcript B does appear to be much larger than expected.

The US7 and p40k upstream sequences show little of note, other than that the 73 residues flanking the initiation point of US7 mRNA transcription are collinear with a distinct 77% homology. Transcriptional control of the p40k gene is likely to be identical to that of US7, though the changed TATA box could perhaps result in a different point of initiation (Breathnach and Chambon, 1981). The region at the 3' end of the mRNAs has been well conserved, suggesting that this sequence is involved somehow in the control of termination. What significance such sequences have, including the YGTGTT consensus, is not known. The failure of HSV-1 to conserve the HSV-2 GTTATTT direct repeat casts doubt over it having
derlined). For the type 2 nucleotide sequence, the palindrome has been overlined, and the polyA signals, direct repeats (GTTATTT), and BamHI site underlined. At the amino acid level, the N-glycosylation signals and putative transmembrane sequence have been marked. Within the coding sequence, gaps have been introduced which correspond to exact numbers of codons. Outside, the alignment was arranged to bring recognizable transcriptional control signals together.
any functional significance, especially since other HSV termination sequences do not possess such a feature. The position of the BamHI site of HSV-2 with respect to the point of mRNA termination (in HSV-1) indicates that the type 2 1.6-kb mRNA probably does lie entirely in Bam-c', as suggested by the northern blots above. The HSV-2 sequence also has a second polyA signal, which is part of a large insertion not found in type 1. Comparative studies of gE transcription and translation have not been published. However, the kinetics of gD-2 and gE-2 synthesis appear to be similar (Balachandran et al., 1982), and both promoters are purine rich. Everett (1984) has shown that some parts of the gD-1 promoter can be increased in size without a major change to its transcriptional activity. Establishing whether this is the case for gE will require further detailed study. The second polyA signal could be involved in differential mRNA processing, as has been suggested in adenovirus (Nevins and Chen-Kiang, 1981).

Within the coding sequence, a large conserved palindrome can be seen. Any function for this at the level of DNA is unclear. The RNA folding program of Zuker (Jacobson et al., 1984) predicted that the RNA of both HSV types would form stable hairpin loops in this region (Fig. 5). These could be involved in either mRNA recognition or messenger stability, though these possibilities would be difficult to test and, even then, of unknown significance.

Implications from the Amino Acid Sequence
The US7 gene product has not been formally identified, but Lee et al. (1982) found a 50-kDa polypeptide produced by in vitro translation of mRNA selected using a DNA fragment including the US7 gene. The data presented here show that HSV-2 has conserved this open-reading frame. The sequence bears all the hallmarks of a membrane-bound glycoprotein, and the analogous product of the varicella-zoster virus has been identified as such (Davison et al., 1985).
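One of the hallmarks referred to above, the N-glycosylation signal, is the tripeptide Asn-X-Ser/Thr in which X is any residue except proline. As a rough illustration of how such signals can be located in a predicted polypeptide, the following Python sketch scans a sequence with a regular expression. The function name and the test peptide are hypothetical and are not taken from the US7 or p40k sequences.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The lookahead lets
# overlapping signals (for example NNSS) be reported separately.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glycosylation_sites(protein):
    """Return 1-based positions of the Asn in each N-X-S/T signal."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical peptide: NAS and NGS qualify, NPT does not.
print(n_glycosylation_sites("MNASLFNPTQNGSA"))  # prints [2, 11]
```

Applied to two aligned polypeptides, the same scan would show which signals are shared and which are present in only one of them.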
FIG. 5. The hairpin loops predicted for the palindromes in the US7 and p40k mRNAs: (a) HSV-1 US7; (b) HSV-2 p40k.
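The stems drawn in Fig. 5 depend on the two arms of the palindrome being complementary when read in opposite directions. A minimal Python sketch of that check is given below; it simply counts complementary base pairs outward from a chosen center and makes no attempt at the energy calculations of an RNA folding program. The function name and the test sequence are hypothetical.

```python
# Watson-Crick partners for DNA; substitute U for T to work with RNA.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def arm_length(seq, center):
    """Count complementary pairs seq[center-k] / seq[center+k], k = 1, 2, ..."""
    k = 0
    while (center - k - 1 >= 0 and center + k + 1 < len(seq)
           and PAIR.get(seq[center - k - 1]) == seq[center + k + 1]):
        k += 1
    return k


# Hypothetical 11-base palindrome centered on an A residue (index 5).
print(arm_length("GGGCCAGGCCC", 5))  # prints 5
```

A real hairpin needs an unpaired loop of several bases, so the innermost pairs counted in this way would not all form in practice.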
The degree of conservation of the putative transmembrane sequence, and the arginine-rich region immediately C-terminal to it, is remarkable. A sequence involved solely in membrane anchoring has no strong constraints at the amino acid level except to maintain its hydrophobicity; thus distinct divergence can be expected (consider gD; Lasky and Dowbenko, 1984). The N-terminal half of the region has an uncommonly large hydrophobic moment which arises from the occurrence of polar side chains on every third residue. This motif is also seen in gB-1 (S--S--S) and US4 of HSV-1 (T--H--S). Of all the membrane protein entries in the NBRF and Doolittle protein sequence data bases, only the E. coli Lac permease shows any similarity with the sequence TLISVFT. These features suggest that the transmembrane sequences of US7 and p40k interact with some other polar species in the membrane. No other parts of the molecule are sufficiently hydrophobic to represent transmembrane sequence, so the backbone seems unlikely to pass through the membrane more than once and interact with itself.
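The conservation described here can be expressed as a simple count. The sketch below compares the two 28-residue transmembrane segments position by position. The sequences are transcribed from the Fig. 4 alignment with obvious character-recognition damage repaired, so they should be treated as an assumption rather than as authoritative data; the function name is likewise hypothetical.

```python
def differences(a, b):
    """Return (1-based position, residue in a, residue in b) at mismatches."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Assumed transcriptions of the putative transmembrane segments (Fig. 4).
TM_US7 = "LTVTQIIQIAIPASIIALVFLGSCICFI"   # HSV-1 US7
TM_P40K = "LTVAQVIQIAIPASIIAFVFLGSCICFI"  # HSV-2 p40k

changed = differences(TM_US7, TM_P40K)
identity = 100 * (len(TM_US7) - len(changed)) / len(TM_US7)
print(changed, f"{identity:.0f}% identical")
```

With these transcriptions the comparison reports three substitutions (positions 4, 6, and 18), in line with the three changed residues noted in the Results.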
The size of the cytoplasmic tail suggests that it is involved in more than simply membrane anchoring. The conservation of the predicted α-helix not only between HSV types but also in the analogous open-reading frame of the varicella-zoster virus (Davison, 1983) adds weight to this proposition. Thus the molecule seems to have at least two functional halves, one on each side of the membrane. The tight conservation of the transmembrane sequence could indicate that the two halves interact in a unified fashion, for example as some kind of signal receptor. None of the membrane proteins in the protein sequence data bases bears any identifiable homology or similar structural prediction profile to p40k or US7.

Possible Interactions between Genomic and Polypeptide Functions

The tandem repeats of US7, and their equivalent in p40k, have a codon usage which departs from that expected. This probably indicates some function for the nucleotide sequence in addition to protein coding. Other such tandem repeats have been associated with DNA recombination, where copy number variation could arise from unequal crossover (Davison and Wilkie, 1981). Although the number of these repeats in US7 seems to vary between HSV-1 strains, it does not vary between different clones of the same strain (Minson and Bell, unpublished work), and it therefore seems unlikely that the variation in the number of repeats arises during cloning. The biased base composition results in a region rich in proline, serine, and threonine which is relatively hydrophilic and possesses little or no ordered secondary structure. Thus, whatever the function of the tandem repeats at the nucleotide level, the virus seems to cope with the copy number variation by producing a region of random coil whose length is irrelevant to function. The nucleotide repeats are clearly nonessential to HSV-2, which has a less biased base composition, perhaps because it is returning to the standard for HSV-2 DNA. However, certain amino acid motifs have been conserved, including the PAP repeat and the TTTS sequence.

In the past, other workers have commented that the HSV genome's high G + C content is due to one or more factors acting upon the nucleotide sequence, rather than the amino acid sequence governing codon composition (Murchie and McGeoch, 1982). If a region of polypeptide becomes nonessential for the molecule's structure or function, then it would be expected to diverge faster and be free to adopt a random coil structure. The drive to increase genomic G + C content would fill such areas with proline, glycine, alanine, and arginine residues which would tend to fold into random coil. This situation seems to have arisen in the case of the major divergent region in p40k and the divergent region of gD (the 13 residues prior to the membrane-spanning portion), which also has a biased G + C content in the nucleotide sequence. However, the C-terminal region of p40k and US7, and the divergent areas of the thymidine kinase gene, do not have base compositions very different from that of the respective complete genes. So these observations in p40k and gD are not applicable generally, and any significance they have will only become apparent after considerable study of other HSV genes.

ACKNOWLEDGMENTS

TCH thanks the Science and Engineering Research Council for a postgraduate training award. This work was supported by a Medical Research Council Programme Grant.
REFERENCES

BALACHANDRAN, N., HARNISH, D., RAWLS, W. E., and BACCHETTI, S. (1982). Glycoproteins of herpes simplex virus type 2 as defined by monoclonal antibodies. J. Virol. 44, 344-355.
BIGGIN, M. D., GIBSON, T. J., and HONG, G. F. (1983). Buffer gradient gels and 35S label as an aid to rapid DNA sequence determination. Proc. Natl. Acad. Sci. USA 80, 3963-3965.
BREATHNACH, R., and CHAMBON, P. (1981). Organization and expression of eucaryotic split genes coding for proteins. Annu. Rev. Biochem. 50, 349-383.
BUCKMASTER, E. A., GOMPELS, U., and MINSON, A. (1984). Characterisation and physical mapping of an HSV-1 glycoprotein of approximately 115 × 10³ molecular weight. Virology 139, 408-413.
DAVISON, A. J. (1983). DNA sequence of the Us component of the varicella-zoster genome. EMBO J. 2, 2203-2209.
DAVISON, A. J., WATERS, D. J., and EDSON, C. M. (1985). Identification of the products of a varicella-zoster virus glycoprotein gene. J. Gen. Virol. 66, 2237-2242.
DAVISON, A. J., and WILKIE, N. M. (1981). Nucleotide sequences of the joint between the L and S segments of herpes simplex virus types 1 and 2. J. Gen. Virol. 55, 315-331.
EISENBERG, D., SCHWARZ, E., KOMAROMY, M., and WALL, R. (1984). Analysis of membrane protein sequences with the hydrophobic moment plot. J. Mol. Biol. 179, 125-142.
EVERETT, R. D. (1984). A detailed analysis of an HSV-1 early promoter: Sequences involved in trans-activation by viral immediate-early gene products are not early gene specific. Nucl. Acids Res. 12, 3037-3056.
LEE, G. T.-Y., PARA, M. F., and SPEAR, P. G. (1982). Location of the structural genes for gD and gE, and other polypeptides in the S component of herpes simplex virus type 1 DNA. J. Virol. 43, 41-49.
McGEOCH, D. J. (1985). On the predictive recognition of signal peptide sequences. Virus Res. 3, 271-286.
McGEOCH, D. J., DOLAN, A., DONALD, S., and RIXON, F. (1985). Sequence determination and genetic content of the short unique region in the genome of herpes simplex virus type 1. J. Mol. Biol. 181, 1-14.
McKNIGHT, S. L. (1980). The nucleic acid sequence and transcript map of the herpes simplex thymidine kinase gene. Nucl. Acids Res. 8, 5949-5964.
McLAUCHLAN, J., GAFFNEY, D., WHITTON, J. L., and CLEMENTS, J. B. (1985). The consensus sequence YGTGTTYY located downstream from the AATAAA signal is required for efficient formation of mRNA 3' termini. Nucl. Acids Res. 13, 1347-1368.
MANIATIS, T., FRITSCH, E. F., and SAMBROOK, J. (1982).
"Molecular Cloning A laboratory manual ." CSH Laboratory, New York . MURCHIE, M . J., and McGEocH, D . J. (1982) . DNA sequence analysis of an immediate early gene region of the HSV type 1 genome (map coordinates 0 .950 .978). J. Gem Viral 62,1-15.
GARNCER, J ., OSGUTHORPE, D . J ., and ROBSON, B. (1978) . Analysis of the accuracy and implication of simple methods for predicting the secondary structure of globular proteins. J. Md Bid 120, 97-120. Ga, A ., and PRouDFooT, N. J . (1984). A sequence
NEVINS, J. R., and CHEN-KIANG, S . (1981) . Processing of adenovirus nuclear RNA to mRNA . Adv. Virus
downstream of AAUAAA is required for rabbit dglobin mRNA 3' end formation . Nature (London) 312,473-474 . HOLLAND, L. E ., SANDRI-GOLDIN, R. M., GOLDIN, A. L., GLORtoso, J . C ., and LEVINE, M . (1984) . Transcriptional and genetic analyses of the herpes simplex virus type 1 genome : coordinates 0 .29 to 0 .45. J. Vird
coprotein (gE) of herpes simplex virus types 1 and 2 and tentative mapping of the viral gene for this glycoprotein . J. ViroL 41,137-144 . PERLMAN, D., and HALVORSON, 110 . (1983). A putative signal peptidase recognition site and sequence in eukaryotic and prokaryotic signal peptides . J. MoL
49,947-959. HOPE, R. G., PALFREYMAN, J ., Suit, M ., and MARSDEN, H . S. (1982) . Sulphated glycoproteins induced by herpes simplex virus . J. Gen. Virol. 68, 399-415 . IKURA, K., BETZ, J. L ., SADLER, J. R., and PIZER, L. I. (1983) . RNA transcribed from a 3 .6 kilobase Smal fragment of the short unique region of the herpes simplex virus type 1 genome. J Vird 48, 460-471. INGLIS, M . M ., and DARBY, G . K . (1981) . Adenovirus late sequences linked to herpes simplex virus thymidine kinase may be introduced into eukaryotic cells and transcribed. Nnc Acids Res . 9, 5569-5585. JACOBSON, A. B ., GOOD, L., SIMONETTI, J., and ZUKER, M. (1984) . Some simple computational methods to improve the folding of large RNAs . Nuc Acids Res. 12,45-52. LASKY, L . A ., and DOWBENKO, D . J. (1984). DNA sequence analysis of the type-common glycoprotein D genes of herpes simplex virus types 1 and 2. DNA 3,23-29 .
Res . 26,1-35 . PARA, M . F ., GOLDSTEIN, L., and SPEAR, P. G . (1982). Similarities and differences in the Fc-binding gly-
Bid. 167,391-409. RICHMAN, D. D ., BUCKMASTER, A ., BELL, S ., HODGMAN, C ., and MINSON, A . C . (1986) . Identification of a new glycoprotein of herpes simplex virus type 1 and genetic mapping of the gene that codes for it . J Viral. 51,647-655 . RIGBY, P . W. J ., DIECKMANN, M ., RHODES, C., and BERG, P . (1977) . Labelling deoxyribonucleic acid to high specific activity in vitro by nick translation with DNA polymerase . I. J Mot Bid 113, 237-251 . RIxoN, F., and McGEOCH, D . J. (1985) . Detailed analysis of the mRNAs mapping in the short unique region of herpes simplex virus type . 1 . Nun Acids Res. 13,953-973. SANGER, F ., NICKLEN, S ., and CoULSON, A . R. (1977) . DNA sequencingwith chain-terminating inhibitors . Proc NatL Aced Sci. USA 74,5463-5467 . SANGER, F ., COULSON, A . R ., BARRELL, B. G., SMITH, A. J. H ., and ROE, B. A . (1980). Cloning in singlestranded bacteriophage as an aid to rapid DNA sequencing. J. Mot Bid 143,161-178.
THE HSV-2 US7 GENE
SHARP, J . A ., WAGNER, M . J ., and SUMMERS, W . C. (1983). Transcription of herpes simplex virus genes
and analytical purification of DNA from agarose . Proc. Nat).. Aced Sci. USA 76, 615-619 .
in vivo: Overlap of a late promoter with the 3' end of the early thymidine kinase gene . J ViroL 45, 1017 .
VON HErmE, G . (1984). How signal sequences maintain
SPEAR, P. G. (1984). Glycoproteins specified by herpes simplex virus . In "The Herpesviruses" (B . Roizman, ed.). Plenum, New York . STADEN, R., and McLACHLAN, A. D . (1982) . Codon preference and its use in identifying protein coding regions in long DNA sequences . Nuc. Acids Res. 10, 141-156. VIEIRA, J ., and MESSING, J . (1982) . A new pair of M13 vectors for selecting either strand of double digested restriction fragments . Gene 19, 269-276 . VOGELSTEIN, B ., and GR,LESPm, D . (1979). Preparative
cleavage specificity. J. Mot Bid 173,213-251 . WATSON, R . J. (1983) . DNA sequence of the herpes simplex virus type 2 glycoprotein D gene . Gene 26, 307-312 . WATSON, R. J ., COLBERG-POLEY, A . M ., MARCUS-SERURA, C . J ., CARTER, B . J ., and ENQUIST, L . W . (1983) . Characterisation of the herpes simplex virus type 1 glycoprotein D mRNA and expression of the protein in Xenopus oocytes . Nua Acids Res. It, 15071522 . WATSON, R. J ., WErs, J. H., SALSTROM, J . S ., and ENQUIST, L. W . (1982) . Herpes simplex virus type-1 glycoprotein D gene : Nucleotide sequence and expression in Escherichiacolt Science 218,381-384 .