J. Mol. Biol. (1985) 181, 153-160
Structure and Genomic Organization of the Rat Aldolase B Gene
Ken-ichi Tsutsumi¹, Tsunehiro Mukai², Reiko Tsutsumi¹, Soh Hidaka¹, Yuji Arai², Katsuji Hori² and Kiichi Ishikawa¹†
¹Department of Biochemistry, Yamagata University School of Medicine, Zaoh-iida, Yamagata 990-23, Japan
²Department of Biochemistry, Saga Medical School, Nabeshima, Saga 840-01, Japan
(Received 27 August 1984, and in revised form 26 September 1984)
The structure of the chromosomal gene encoding rat aldolase isozyme B has been elucidated by sequence analysis of cloned genomic DNA. This gene comprises about 14 × 10³ base-pairs of DNA and is separated into nine exons by eight intervening sequences. A presumed transcription-initiation site was assigned by S1 nuclease protection mapping, and T-A-T-A and C-C-A-A-T boxes were found 25 and 126 base-pairs, respectively, upstream from this initiation site. There are three characteristic sequences of 100 to 200 base-pairs within the 870 base-pair region flanking the 5′ side of the gene. These sequences are flanked on either side by direct repeats and terminate with an A-rich stretch of nucleotides. One of them has block homology with a region in an “ID sequence”, which is reported to be an element for tissue-specific gene regulation and differentiation. The other two resemble, at the level of sequence organization, a class of dispersed repeat, the “Alu family”. These features suggest that these regions are involved in gene regulation and also imply evolutionary events such as duplication or insertion. Comparison of this gene sequence with the rabbit aldolase A complementary DNA sequence revealed some bias in the frequency of nucleotide replacement among the exons, suggesting selective evolutionary conservation of particular exons encoding functional domains. Comparison with the human aldolase B complementary DNA sequence revealed no such tendency; the homology between the two sequences was very high (about 89%), and nucleotide replacements were randomly distributed throughout the protein-coding region.
1. Introduction
The glycolytic enzyme fructose-1,6-bisphosphate aldolase (aldolase; EC 4.1.2.13) is a tetrameric protein composed of a specific combination of different subunits: A (muscle type), B (liver type) and C (brain type) (Penhoet et al., 1966).
These isozymes have been extensively characterized with respect to their tissue-specific distribution, the changes in their concentrations during development or carcinogenesis, and their enzymological characteristics (Horecker et al., 1972; Schapira et al., 1975; Lebherz, 1975). The genes encoding these subunits are not identical and are assumed to be located separately on the chromosomes, although they may have closely related structures (Penhoet et al., 1967; Benfield et al., 1979; Lai, 1975). These genes are, therefore, thought to have originated by duplication of a common ancestral gene during evolution. The expression of these genes, however, shows multiple patterns and seems to be regulated independently but, in some instances, in a manner showing mutual influence. For example, (1) the three subunits are not all expressed simultaneously within the same cell or tissue; usually one or two of the three types are expressed in the “housekeeping” state, although in some fetal and hepatoma cells all three isozymes are expressed (Lebherz & Rutter, 1969; Penhoet et al., 1966; Lebherz, 1975); and (2) an increase in the concentration of a particular type during development or carcinogenesis is often accompanied by a decrease in the level of another pre-existing type; e.g., the levels of the A and B types in the liver change reciprocally during development or hepatocarcinogenesis (Schapira et al., 1963; Matsushima et al., 1968; Gracy et al., 1970; Ikehara et al., 1970; Schapira et al., 1975; Numazaki et al., 1981).
The study of these closely related aldolase genes is of much interest with respect to whether the controls of their expression reflect their specific structures, especially in the regions regulating transcription, and whether the expression of these genes is regulated in different ways. Previously, we and others isolated several complementary DNA clones for rat liver aldolase B and muscle aldolase A (Tsutsumi et al., 1983; Simon et
† Author to whom correspondence should be sent.
2. Materials and Methods
(a) Materials
Restriction endonucleases were obtained from Takara Shuzo (Kyoto, Japan) and Bethesda Research Laboratories. Escherichia coli DNA polymerase I, bacterial alkaline phosphatase and polynucleotide kinase were from Takara Shuzo. S1 nuclease was from Boehringer Mannheim. [γ-³²P]ATP (3000 Ci/mmol) and other isotopes were from Amersham.
(b) Screening of the rat gene library
Charon 4A libraries containing partial EcoRI or HaeIII digests of rat (Sprague-Dawley) genomic DNA were kindly provided by Drs T. D. Sargent, R. B. Wallace, L. L. Jagodzinsky and J. Bonner. The phage were inoculated into E. coli strain DP50SupF (a gift from Dr Y. Fujii-Kuriyama) and screened by in situ plaque hybridization as described by Benton & Davis (1977), with nick-translated aldolase B complementary DNA (Tsutsumi et al., 1983; K. Tsutsumi et al., 1984) as a probe. DNA fragments of the aldolase B gene were subcloned into the EcoRI site of the plasmid pBR322 for subsequent structural analyses.
(c) S1 nuclease mapping
S1 nuclease protection mapping was performed essentially as described by Berk & Sharp (1977). The DNA fragment was labeled with ³²P at the 5′ end and dissolved in 0.3 M-NaOH, and its strands were separated on a polyacrylamide/7 M-urea gel. The anti-coding strand was hybridized with liver poly(A)+ RNA. When the hybridizable region was expected to be short, hybridization was performed at 30°C for 3 h in 0.9 M-NaCl containing 0.09 M-sodium citrate. The resulting hybrid was digested with S1 nuclease (400 units/ml) at 35°C for 30 min, and the remaining DNA was separated on a polyacrylamide/7 M-urea sequencing gel (Maxam & Gilbert, 1977). Bands were detected by autoradiography.
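The coordinate arithmetic behind reading an mRNA 5′ end off an S1 protection gel can be sketched as follows. This is a minimal illustration, not the authors' procedure: the helper names are hypothetical, and it assumes a coordinate system numbered 5′→3′ along the coding strand, with the ³²P label of the anti-coding-strand probe at position `label_pos` and each protected-fragment length counting the labelled nucleotide itself.

```python
from collections.abc import Iterable


def mrna_5prime_ends(label_pos: int, protected_lengths: Iterable[int]) -> list[int]:
    """Convert S1-protected fragment lengths into coding-strand coordinates.

    Assumptions (illustrative, not from the paper):
      * coordinates increase 5'->3' along the coding (mRNA-sense) strand;
      * the probe is the anti-coding strand, 5'-end-labelled at `label_pos`,
        so it runs upstream (toward lower coordinates) from the label;
      * a protected fragment of n nucleotides includes the labelled residue.
    The mRNA 5' end then lies n - 1 nucleotides upstream of the label.
    """
    return [label_pos - (n - 1) for n in protected_lengths]


def strongest_band(intensity_by_length: dict[int, float]) -> int:
    """Return the fragment length with the most intense band (ties: shortest)."""
    return min(intensity_by_length, key=lambda n: (-intensity_by_length[n], n))


if __name__ == "__main__":
    # Hypothetical label position and band intensities, for illustration only.
    label = 200
    bands = {28: 0.3, 29: 0.5, 30: 1.0, 31: 0.4}
    print(mrna_5prime_ends(label, sorted(bands)))               # [173, 172, 171, 170]
    print(mrna_5prime_ends(label, [strongest_band(bands)]))     # [171]
```

A cluster of bands, as reported in the Results for 28 to 31 nucleotides, therefore maps to a cluster of adjacent candidate start sites; choosing among them still needs the sequence ladder run alongside.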
Complete or partial restriction enzyme digests were analyzed by electrophoresis on agarose or polyacrylamide gels. Sequences were determined by the procedure of Maxam & Gilbert (1977). [γ-³²P]ATP and [α-³²P]dideoxy-ATP were used for terminal labeling of the DNA fragments.
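Reading a Maxam & Gilbert ladder comes down to deciding, at each fragment length, which of the four base-specific lanes (G, A>C, T+C and C, the lane set named in the Figure 2 legend) carries a strong band. The sketch below is a simplified illustration with hypothetical function names: it assumes each rung has already been reduced to the set of lanes showing a strong band (weak cross-reactions ignored) and that fragments are 5′-end-labelled, so reading from the shortest fragment upward gives the sequence 5′→3′ from the label.

```python
def call_base(strong_lanes: set[str]) -> str:
    """Assign one ladder position from the lanes that show a strong band.

    Lane chemistry assumed here (Maxam & Gilbert):
      G    -> cleaves at G
      A>C  -> cleaves strongly at A, weakly at C (weak bands ignored)
      T+C  -> cleaves at T and at C
      C    -> cleaves at C only
    """
    if "G" in strong_lanes:
        return "G"
    if "C" in strong_lanes:      # a C also appears in T+C, so test C first
        return "C"
    if "T+C" in strong_lanes:    # T+C without C must be T
        return "T"
    if "A>C" in strong_lanes:    # a strong A>C band on its own means A
        return "A"
    return "N"                   # no interpretable band at this rung


def read_ladder(rungs: list[set[str]]) -> str:
    """Read rungs ordered from the shortest fragment to the longest."""
    return "".join(call_base(rung) for rung in rungs)


if __name__ == "__main__":
    ladder = [{"G"}, {"A>C"}, {"C", "T+C"}, {"T+C"}]
    print(read_ladder(ladder))  # GACT
```

For 3′-end-labelled fragments, such as those labelled with [α-³²P]dideoxy-ATP, the same rungs read bottom-up give the sequence 3′→5′, so the string would need reversing.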
3. Results and Discussion
(a) Isolation and restriction mapping of the aldolase B gene
Previously we reported the isolation and sequence of complementary DNA clones for rat aldolase B, the nucleotide sequence corresponding to more than 90% of the entire mRNA sequence (K. Tsutsumi et al., 1984). Using these complementary DNAs as probes in plaque hybridization, we isolated several genomic clones from rat EcoRI and HaeIII gene libraries (Fig. 1). One of them, RAB-16, was the first identified using complementary DNA as a probe. All the other clones were isolated in the same way using the insert DNA fragment in RAB-16, except clone RAB-6, which was identified using the DNA fragment in RAB-10 as a probe. The insert DNA in RAB-6 extends furthest in the 5′ direction of the gene and contains the transcription-initiation site, as described below. The restriction map of the aldolase B gene, deduced by analysis of cloned DNA fragments, is shown in Figure 1. Southern blot analysis (Southern, 1975) of EcoRI fragments of rat total genomic DNA, examined using either the above cloned DNA or complementary DNA as a probe, gave bands of essentially the same size as those estimated for the cloned genomic DNA (data not shown). These findings provide evidence for a single chromosomal locus of the aldolase B gene; the restriction map in Figure 1 should therefore reflect the correct genomic organization of the rat aldolase B gene. Exons in the gene were first roughly located by Southern blot hybridization of various restriction fragments of the cloned DNA using either nick-translated complementary DNA or 5′-³²P-labeled, partially purified aldolase B mRNA (Tsutsumi & Ishikawa, 1981) as a probe, and then determined exactly by sequence analysis with reference to the complementary DNA sequence determined previously (Tsutsumi et al., 1983; K. Tsutsumi et al., 1984). In this way, we located nine exons, as shown in Figure 1.
(b) Transcription-initiation site of the aldolase B gene
For clarification of the detailed exon-intron structure of the gene, the nucleotide sequences around the predicted exons were determined and compared with the complementary DNA sequence. For determination of the 5′ boundary of the gene, however, we had to use S1 nuclease protection mapping, since our cloned complementary DNA lacks part of the cap
[Figure 1 image: (a) scale, 0 to 14 kb; (b) exon-intron map of the gene; (c) mRNA map with UAG at nucleotide 1174 and the poly(A) addition site at 1560-1561; (d) genomic clones]
Figure 1. Organization of the rat aldolase B gene. (a) Scale in 10³ base-pairs for (b). (b) The localizations of exons (I to IX, filled boxes) and introns (1 to 8) were determined by Southern blot hybridization analysis with complementary DNA as a probe, and by sequence analysis as described in the text. The cleavage sites of several restriction endonucleases used routinely are shown: E, EcoRI; H, HindIII; K, KpnI; P, PstI; X, XhoI; B, BamHI. (c) Schematic representation of aldolase B mRNA. The total length, the positions of AUG and UAG and the splicing points are indicated as nucleotide residue numbers from the transcription-initiation site. I to IX correspond to the exons, as in (b). Numbers in parentheses indicate lengths of exons in nucleotides. (d) Three genomic clones, RAB-6, RAB-10 and RAB-16. The exons are indicated by filled boxes and are numbered as in (b). The positions of EcoRI cleavage sites are indicated as in (b).
site. For this purpose, an HhaI-AluI fragment (161 base-pairs) containing part of the first exon was isolated from the 1.4 × 10³ base-pair fragment of the EcoRI digest of RAB-10 DNA (subcloned in pBR322) and labeled with ³²P at its 5′ end. The strands of the labeled DNA were separated, and the anti-coding strand (complementary to the mRNA sequence) was hybridized with liver poly(A)+ RNA and then subjected to S1 nuclease digestion (Fig. 2). The protected fragment was applied to a polyacrylamide gel together with the same [³²P]DNA fragment processed for sequence determination. Several bands, corresponding to fragments 28 to 31 nucleotides long, were detected on the gel. These corresponded to 5′ G-G-A-T 3′ in the sequence ladder, the most intense band corresponding to T. This point, an A in the coding strand of the gene, is probably the transcription-initiation site (cap site) of aldolase B mRNA, since transcription of eukaryotic mRNAs often begins with a purine, especially A (Breathnach & Chambon, 1981). The first exon, starting from this A, contains a region that can hybridize with the 3′ end of rat 18 S ribosomal RNA (Chan et al., 1984) (Fig. 3). At 25 base-pairs upstream from the presumed cap site there is the sequence 5′ T-A-T-A-A-A-A-A 3′, which is homologous to the eukaryotic promoter element for transcription (T-A-T-A box) (Goldberg, 1979). The sequence 5′ C-C-A-A-T 3′, which is conserved in many other genes (C-C-A-A-T box) (Efstratiadis et al., 1980), was also found at position -126 relative to the cap site.
(c) 5′ Flanking sequence
A flanking sequence of 870 base-pairs on the 5′ side of the gene was also determined (Fig. 3). In this region there are scarcely any characteristic features of sequence organization, such as the repeated or tandemly arranged sequences frequently seen in the viral enhancer element (Benoist & Chambon, 1981; Banerji et al., 1981; Moreau et al., 1981; Gruss et al., 1981). However, unlike other regions, there are three A-rich sequences (at positions -728 to -707, -435 to -414 and -62 to -41). These sequences show more than 70% homology with each other, and each ends with 5′ C-C-A-T-C-A-C-A 3′ or an equivalent sequence. In addition, sequences homologous to those immediately following these A-rich blocks were found about 100 to 200 base-pairs upstream (indicated by horizontal arrows in Figs 3 and 4). Thus these A-rich blocks are located at the ends of sequences that are flanked on either side by direct repeats. These structural features suggest a possible relation of these A-rich blocks to a certain type of repeated sequence, the Alu family, that
[Figure 2 image: (a) restriction map of the probe region, scale bar 200 bp; (b) S1 protection gel run alongside the sequencing ladder]
Figure 2. Location of the 5′ end of the aldolase B gene. (a) Restriction fragment used for S1 nuclease protection mapping of the mRNA. A, Hh and E indicate cleavage sites of AluI, HhaI and EcoRI, respectively. The strand-separated HhaI-AluI fragment, labeled at the 5′ end with ³²P, used for the experiment is indicated. (b) The S1-resistant DNA fragment (S1) was subjected to electrophoresis on a 10% (w/v) polyacrylamide sequencing gel. Amounts of S1 nuclease used (units/ml) are shown above the lanes. A DNA sequencing ladder prepared from the same fragment was used as a size marker. From left to right: G, A > C, T+C and C degradation products prepared by the method of Maxam & Gilbert (1977). Arrows indicate the bands of fragments protected against S1 nuclease treatment. bp, base-pairs.
[Figure 3 image: nucleotide sequence of the rat aldolase B gene from position -800, with the deduced amino acid sequence under each exon and the intron positions marked]
Figure 3. Nucleotide sequence of the rat aldolase B gene. The sequence of all coding regions and parts of the flanking sequences is shown, from left to right in the 5′ to 3′ direction. The presumed transcription-initiation site and poly(A) addition site are indicated by vertical arrows. The C-C-A-A-T box, T-A-T-A box, putative ribosomal binding site, initiation codon (ATG), termination codon (TAG) and A-A-T-A-A-A signal are all underlined with heavy lines. The broken lines and horizontal arrows in the 5′ flanking region indicate, respectively, the A-rich sequences and direct repeats discussed in the text. Lengths of intron sequences are shown in parentheses. Hyphens are omitted from the sequence in all Figures for clarity.
[Figure 4 image: alignment of sequences (a) and (b), with positions -810 to -670 marked above the aldolase B sequence]
Figure 4. Comparison of the sequence in the 5′ flanking region of the aldolase B gene with the ID sequence. (a) The sequence from position -816 to -653. (b) The ID sequence in the second intron of the rat growth hormone gene (Sutcliffe et al., 1982; Barta et al., 1981). The underlined region in (b) indicates the 82 base-pair ID consensus sequence. Homologous residues in the 2 sequences are indicated by asterisks. Horizontal arrows indicate direct repeats. Hyphens between nucleotides are omitted for clarity, but are used to indicate deletions made in either strand to achieve a better fit of homologous regions.
is dispersed throughout mammalian genomes (Jelinek & Schmid, 1982). The Alu families are thought to have been dispersed by duplication of unique DNA sequences at some target site on chromosomal DNA. One of these regions, at about position -770 (Fig. 4), shows some homology with the “ID sequence”, which was first identified in brain-specific complementary DNA and has been proposed to specify tissue-specific gene expression (Sutcliffe et al., 1982). This ID sequence was also found in the growth hormone gene (Sutcliffe et al., 1982; Barta et al., 1981). These sequences in the aldolase B gene might have an enhancer-like function (Moreau et al., 1981; Banerji et al., 1981). Alternatively, the presence of Alu family-like or ID sequence-like sequences in the aldolase gene may indicate the duplication or insertion of some DNA sequence near these regions during evolution; if so, the rearrangement of the gene structure immediately adjacent to the T-A-T-A box may have influenced or altered the system controlling gene expression. These possibilities are interesting in relation to how isozyme genes derived from a common ancestral gene acquired tissue-specific expression. These points require further examination.
(d) Organization of exon and intron structure
The aldolase B gene consists of nine exons and eight introns. The sequence of all the exons and parts of their flanking regions is shown in Figure 3. The aldolase B gene is about 14 × 10³ base-pairs long from the transcription-initiation site to the poly(A) addition site. The lengths of the exons, numbered in order in the transcriptional 5′ to 3′ direction, are: exon I, 71; II, 122; III, 212; IV, 55; V, 161; VI, 84; VII, 175; VIII, 200; IX, 480 or 481 base-pairs. The size of exon IX cannot be determined until it is known whether the first A in the poly(A) tail is transcribed. The protein-coding region is split into eight exons (exons II to IX), all of which are identical to the corresponding regions in the complementary DNA sequence. The exon-intron boundary sequences in the gene all conform to the G-T/A-G rule, each intron beginning with 5′ G-T and ending with A-G 3′ (Breathnach & Chambon, 1981). The lengths of the introns are as follows (in 10³ base-pairs): intron 1, 4.7; 2, 1.0; 3, 0.8; 4, 1.4; 5, 1.1; 6, 1.1; 7, 0.4; 8, 1.2. From the structural organization of the gene described above, the complete mRNA sequence can be constructed (Fig. 1). The total length from the cap site to the poly(A) addition site was deduced to be 1560 or 1561 nucleotides. The 5′ non-coding region from the cap site is 81 nucleotides long and is located in exons I and II. The 3′ non-coding region, 387 or 388 nucleotides long excluding the poly(A) tail, is entirely in exon IX. The protein-coding sequence is 1095 nucleotides long and is split into eight exons, from the initiator ATG codon in exon II to the terminator TAG codon in exon IX.
(e) Sequence around the poly(A) addition site
The last exon, which is the largest (480 or 481 base-pairs), contains all of the 3′ non-coding region and part of the coding sequence for the C-terminal amino acids. A consensus sequence for polyadenylation, A-A-T-A-A-A (Proudfoot & Brownlee, 1976), is found 21 base-pairs upstream from the poly(A) addition site. Two interesting features of the nucleotide sequence are observed around the poly(A) addition site. One is a dyad symmetry that would allow formation of a stem-loop structure with the poly(A) addition site in the loop (Fig. 3). The other is the presence of sequences complementary to regions within the human small nuclear RNA U4 (U4 snRNA) (Fig. 5). Berget (1984) recently reported that U4 snRNA may mediate the polyadenylation process in mRNA synthesis. This snRNA has regions complementary to A-A-T-A-A-A and to a second consensus element, C-A-Y-T-G (Benoist et al., 1980), that surround the poly(A) addition site. Both of these consensus elements are present in the aldolase B gene, and there are also two sequences that may
Figure 5. Nucleotide sequence around the poly(A) addition site and possible mode of hybridization with human U4 small nuclear RNA. Horizontal arrows facing each other indicate dyad symmetry, and arrows with broken lines indicate direct repeats. The poly(A) addition site is indicated by vertical arrows. The heavy line indicates the A-A-T-A-A-A signal. Nucleotide residue numbers of the U4 snRNA from the cap site are shown.
correspond to the latter element: T-A-C-T-G and C-A-C-T-G, at 6 or 7 base-pairs and 14 or 15 base-pairs, respectively, downstream from the poly(A) addition site. As shown in Figure 5, these regions could hybridize exactly with the corresponding region in the U4 snRNA sequence. The transcription-termination site in the aldolase B gene is unknown and, of course, it is unlikely that the human and rat U4 snRNA sequences are identical. However, as Berget suggested, these features imply that the two consensus elements described above in the initial transcript from the gene are recognized by U4 snRNA for cutting and polyadenylation at the specific site.
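The two checks described in this section, an upstream A-A-T-A-A-A hexamer and downstream copies of the C-A-Y-T-G element (Y = C or T), amount to pattern searches on the coding strand around a known poly(A) addition site. The sketch below is illustrative only: the helper names, the 0-based `site` index and the way gaps are counted (from the 3′ end of the hexamer to the site, and from the site to the first base of each downstream motif) are assumptions, and the toy sequence is invented rather than taken from the aldolase B gene.

```python
import re

POLYA_SIGNAL = re.compile(r"(?=AATAAA)")   # lookahead so overlapping hits are found
CAYTG = re.compile(r"(?=(CA[CT]TG))")      # C-A-Y-T-G, with Y = C or T


def upstream_signal_gaps(seq: str, site: int) -> list[int]:
    """Nucleotides between the 3' end of each AATAAA and the poly(A) site.

    `seq` is coding-strand DNA written 5'->3'; `site` is the 0-based index
    of the poly(A) addition site. Only hexamers ending at or before it count.
    """
    return [site - (m.start() + 6)
            for m in POLYA_SIGNAL.finditer(seq)
            if m.start() + 6 <= site]


def downstream_elements(seq: str, site: int) -> list[tuple[int, str]]:
    """Offsets from `site`, and sequences, of C-A-Y-T-G motifs at or after it."""
    return [(m.start(1) - site, m.group(1)) for m in CAYTG.finditer(seq, site)]


if __name__ == "__main__":
    # Toy sequence: AATAAA, then 21 T's, then the site (index 29) and a CACTG.
    toy = "CCAATAAA" + "T" * 21 + "ACACTGTT"
    print(upstream_signal_gaps(toy, 29))   # [21]
    print(downstream_elements(toy, 29))    # [(1, 'CACTG')]
```

A strict C-A-Y-T-G pattern like this one would report the C-A-C-T-G copy in the aldolase B sequence but not T-A-C-T-G, so matching the second element in practice would need a looser pattern or a mismatch allowance.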
(f) Sequence comparison of the exons with human aldolase B complementary DNA and with rabbit aldolase A complementary DNA
Recently, Rottmann et al. (1984) reported the entire human aldolase B sequence deduced from complementary DNA and genomic clones. The amino acid sequence of rat aldolase B shows about 95% homology with its human counterpart. Detailed comparison showed about 89% homology between the two protein-coding nucleotide sequences (Table 1). The 175 nucleotides in exon VII, which encode the active-site lysine and its surrounding region, showed about 90% homology with the corresponding region in the human sequence. However, there was no apparent bias in the frequency of nucleotide replacement among the eight exons (II to IX) in the rat aldolase gene. The protein-coding sequences in these exons all have similar homologies (?% to 93%) to the corresponding regions in the human sequence, although the coding region in exon VIII has the lowest homology (85%). The 3′ non-coding region, which is located entirely in the last exon (IX), has no significant homology with the human complementary DNA sequence except for two conserved regions, each about 50 nucleotides long (Besmond et al., 1983; Rottmann et al., 1984). Between the protein-coding sequences of rat aldolase B and rabbit aldolase A (Tolan et al., 1984), nucleotide replacement is less evenly distributed among the exons: the sequences in exons II, V, VI and VII show relatively higher conservation (74% to ?% homology) than those in other regions (60% to ?%). This tendency is more noticeable in the corresponding amino acid sequences (Table 1). Exons V and VII encode domains containing the substrate-interaction site and the active site, respectively. These features may reflect the selective conservation of functional domains in aldolases, and may also be related to the differences in the enzymatic properties of the two aldolases, such as the substrate specificities,
Table 1
Comparison of the nucleotide sequence of the protein-coding region in the rat aldolase B gene with other aldolase complementary DNA sequences†
[Table body not recoverable from the source. Column headings: homology (%) between the rat aldolase B gene and human aldolase B or rabbit aldolase A complementary DNA, at the nucleotide and amino acid levels; comments on protein structure‡]
† The nucleotide sequences of human aldolase B complementary DNA and rabbit aldolase A complementary DNA are taken from Rottmann et al. (1984) and Tolan et al. (1984), respectively.
‡ Data from Lai (1975), Hartman & Brown (1976) and Patthy et al. (1979).
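The per-exon comparison summarized in Table 1 needs two steps: working out how much of each exon is protein-coding, and then scoring identity within each of those segments. The sketch below derives the coding portion of each exon from the exon lengths, the 81-nucleotide 5′ non-coding region and the 1095-nucleotide coding sequence given in section (d), then computes percent identity per segment. The function names are hypothetical, and it assumes the two coding sequences being compared are already aligned without gaps; a real cross-species comparison would need a proper alignment first.

```python
def coding_segments(exon_lengths: list[int], utr5: int, cds_len: int) -> list[int]:
    """Number of protein-coding nucleotides in each exon.

    Uses 0-based, half-open mRNA coordinates: the coding sequence
    (initiator ATG through the stop codon) occupies [utr5, utr5 + cds_len).
    """
    cds_start, cds_end = utr5, utr5 + cds_len
    segments, pos = [], 0
    for n in exon_lengths:
        overlap = min(pos + n, cds_end) - max(pos, cds_start)
        segments.append(max(0, overlap))
        pos += n
    return segments


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two gap-free aligned segments."""
    if len(a) != len(b):
        raise ValueError("segments must be aligned to equal length")
    if not a:
        raise ValueError("empty segment")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def per_exon_identity(cds_a: str, cds_b: str, segments: list[int]) -> list[float]:
    """Identity for each exon's coding segment; non-coding exons are skipped."""
    if sum(segments) != len(cds_a):
        raise ValueError("segments do not cover the coding sequence")
    scores, start = [], 0
    for n in segments:
        if n:
            scores.append(percent_identity(cds_a[start:start + n],
                                           cds_b[start:start + n]))
            start += n
    return scores


if __name__ == "__main__":
    # Exon lengths I to IX from section (d), taking exon IX as 480 nucleotides.
    exons = [71, 122, 212, 55, 161, 84, 175, 200, 480]
    print(coding_segments(exons, utr5=81, cds_len=1095))
    # [0, 112, 212, 55, 161, 84, 175, 200, 96]
    print(sum(exons))  # 1560
```

With exon IX taken as 480 nucleotides, the exon lengths sum to 1560, matching the mRNA length deduced in section (d), and the coding segments sum to the stated 1095 nucleotides.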
We are grateful to Drs T. D. Sargent, R. B. Wallace and J. Bonner for a generous gift of the rat EcoRI gene library, and to Drs L. L. Jagodzinsky and J. Bonner for the HaeIII gene library. We also thank Drs Y. Mishima, Y. Fujii-Kuriyama, M. Muramatsu and Y. Nabeshima for valuable suggestions and preliminary R-loop analysis, Dr T. Tanaka for computer analysis and Mrs M. Seki for typing the manuscript. This work was supported in part by Grants-in-aid from the Ministry of Education, Science and Culture of Japan.
References
Banerji, J., Rusconi, S. & Schaffner, W. (1981). Cell, 27, 299-308.
Barta, A., Richards, R. I., Baxter, J. D. & Shine, J. (1981). Proc. Nat. Acad. Sci., U.S.A. 78, 4867-4871.
Benfield, P. A., Forcina, B. G., Gibbons, I. & Perham, R. N. (1979). Biochem. J. 183, 429-444.
Benoist, C. & Chambon, P. (1981). Nature (London), 290, 304-310.
Benoist, C., O’Hare, K., Breathnach, R. & Chambon, P. (1980). Nucl. Acids Res. 8, 127-142.
Benton, W. D. & Davis, R. W. (1977). Science, 196, 180-182.
Berget, S. M. (1984). Nature (London), 309, 179-182.
Berk, A. J. & Sharp, P. A. (1977). Cell, 12, 721-732.
Besmond, C., Dreyfus, J.-C., Gregori, C., Frain, M., Zakin, M. M., Trepat, J. S. & Kahn, A. (1983). Biochem. Biophys. Res. Commun. 117, 601-609.
Breathnach, R. & Chambon, P. (1981). Annu. Rev. Biochem. 50, 349-383.
Chan, Y.-L., Gutell, R., Noller, H. F. & Wool, I. G. (1984). J. Biol. Chem. 259, 224-230.
Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O’Connell, C., Spritz, R. A., DeRiel, J. K., Forget, B. G., Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C. & Proudfoot, N. J. (1980). Cell, 21, 653-668.
Goldberg, M. L. (1979). Ph.D. thesis, Stanford University, Palo Alto, California.
Gracy, R. W., Lacko, A. G., Brox, L. W., Adelman, R. C. & Horecker, B. L. (1970). Arch. Biochem. Biophys. 136, 480-490.
Gruss, P., Dhar, R. & Khoury, G. (1981). Proc. Nat. Acad. Sci., U.S.A. 78, 943-947.
Hartman, F. C. & Brown, J. B. (1976). J. Biol. Chem. 251, 3057-3062.
Horecker, B. L., Tsolas, O. & Lai, C. Y. (1972). In The Enzymes (Boyer, P. D., ed.), vol. 7, pp. 213-358. Academic Press, New York.
Ikehara, T., Endo, H. & Okada, Y. (1970). Arch. Biochem. Biophys. 136, 491-497.
Jelinek, W. R. & Schmid, C. W. (1982). Annu. Rev. Biochem. 51, 813-844.
Matsushima, T., Kawabe, S., Shibuya, M. & Sugimura, T. (1968). Biochem. Biophys. Res. Commun. 30, 565-570.
Moreau, P., Hen, R., Wasylyk, B., Everett, R., Gaub, M. P. & Chambon, P. (1981). Nucl. Acids Res. 9, 6047-6068.
Mukai, T., Joh, K., Miyahara, H., Sakakibara, M., Arai, Y. & Hori, K. (1984). Biochem. Biophys. Res. Commun. 119, 575-581.
Numazaki, M., Tsutsumi, K., Tsutsumi, R. & Ishikawa, K. (1984). Eur. J. Biochem. 142, 165-170.
Patthy, L., Varadi, A., Thesz, J. & Kovacs, K. (1979). Eur. J. Biochem. 99, 309-313.
Penhoet, E. E., Rajkumar, T. R. & Rutter, W. J. (1966). Proc. Nat. Acad. Sci., U.S.A. 56, 1275-1282.
Penhoet, E. E., Kochman, M., Valentine, R. C. & Rutter, W. J. (1967). Biochemistry, 6, 2940-2949.
Proudfoot, N. J. & Brownlee, G. G. (1976). Nature (London), 263, 211-214.
Rottmann, W. H., Tolan, D. R. & Penhoet, E. E. (1984). Proc. Nat. Acad. Sci., U.S.A. 81, 2738-2742.
Schapira, F., Dreyfus, J. C. & Schapira, G. (1963). Nature (London), 200, 995-996.
Schapira, F., Hatzfeld, A. & Weber, A. (1975). In Isozymes (Markert, C. L., ed.), vol. 3, pp. 987-1003. Academic Press, New York.
Simon, M.-P., Besmond, C., Cottreau, D., Weber, A., Chaumet-Riffaud, P., Dreyfus, J.-C., Trepat, J. S., Marie, J. & Kahn, A. (1983). J. Biol. Chem. 258, 14576-14584.
Southern, E. (1975). J. Mol. Biol. 98, 503-517.
Sutcliffe, J. G., Milner, R. J., Bloom, F. E. & Lerner, R. A. (1982). Proc. Nat. Acad. Sci., U.S.A. 79, 4942-4946.
Tolan, D. R., Amsden, A. B., Putney, S. D., Urdea, M. S. & Penhoet, E. E. (1984). J. Biol. Chem. 259, 1127-1131.
Tsutsumi, K. & Ishikawa, K. (1981). Biochem. Biophys. Res. Commun. 100, ?-412.
Tsutsumi, K., Mukai, T., Hidaka, S., Miyahara, H., Tsutsumi, R., Tanaka, T., Hori, K. & Ishikawa, K. (1983). J. Biol. Chem. 258, 6537-6542.
Tsutsumi, K., Mukai, T., Tsutsumi, R., Mori, M., Daimon, M., Tanaka, T., Yatsuki, H., Hori, K. & Ishikawa, K. (1984). J. Biol. Chem. in the press.
Tsutsumi, R., Tsutsumi, K., Numazaki, M. & Ishikawa, K. (1984). Eur. J. Biochem. 142, 161-164.
Edited by P. Chambon