;ID   ATMU5       DNA   ; ATH   ; 4895 BP
;XX
;DE   ATMU5 is a molecular fossil of an autonomous DNA transposon.
;XX
;AC   AP002060
;XX
;DT   01-FEB-2001 (Rel. 6.1, Created)
;DT   01-FEB-2001 (Rel. 6.1, Last updated, Version 1)
;XX
;KW   autonomous DNA transposon; MUDR superfamily; TIR;
;KW   transposase; DNA-binding protein; ATMU5.
;XX
;OS   Arabidopsis thaliana
;XX
;OC   Arabidopsis thaliana
;OC   Eukaryota; Plantae; Embryobionta; Magnoliophyta; Magnoliopsida;
;OC   Dilleniidae; Capparales; Brassicaceae.
;XX
;RN   [1]  (bases 1 to 4895)
;RA   Kapitonov,V. and Jurka,J.
;RL   Direct submission (January 2001)
;XX
;CC   ATMU5 is a fossilized copy of an autonomous DNA transposon from
;CC   the MUDR superfamily.
;CC   It is flanked by a 10-bp target site duplication.
;CC   ATMU5 has ~310-bp terminal inverted repeats (94% identity) and 
;CC   encodes two proteins, ATMU5p1 and ATMU5p2. 
;CC   Conservation of the proteins and TIRs indicates that
;CC   the transposon was active recently. One copy of a non-autonomous
;CC   derivative of ATMU5 is present in the genome (AC079674
;CC   25687-25111). The non-autonomous element is 95% identical to
;CC   ATMU5 and results from a deletion of the ATMU5 internal portion.
;CC   ATMU5 is 74% identical to ATMU3 and ATMU4.
;CC   ATMU5p1 is a 684-aa transposase encoded by 4 exons (359-1486, 
;CC   1588-1839, 1926-2591, 2657-2665).
;CC   ATMU5p1:
;CC   MGGVISTIGSGESESVVDRGVEVAVELESVVGDGVEVPPVEAPVGIELVARGEDCVPSRH
;CC   RKRRKIRVGDEDEAEIDPDPDYEDDCAVHGDEDCDAYTDAVGGDDDATGGEGNDDEAVGG
;CC   GDDATGCGGNDDEALGYDDDATSDGGNDDDDGGVEEDANLNLAEEFPEFAGVEEDCSDDD
;CC   PPDDVWEEDKIPDPFSSDDEDESRTREVHGHRDAANEEVLLELKKTYNTPDDFKLAVLRY
;CC   SLKTRYDIKLFRSEARIVAAKCSYVDKDGVQCPWKVYCSYEKTMHKMQIRTYVNNHICVR
;CC   SGHSKMLKRSSIAYLFEERLRVNPKLTKYEMAAEILREYNLEVTPDQCAKAKTKVVKARN
;CC   ASHEAHFARVWDYQAEAQKDSWIKTCRPFIGLDGAFLKWDIKGHLLAAVGRDGDNKIVPI
;CC   AWAVVEIENDDNWDWFLRHLSASLGLCQMANIAILSDKQSGLVKAIHTILPHAEHRQCSK
;CC   HIMDNWKRDSHDLELQRLFWKIARSYTVEEFNNHMMELQQYNRQAYESLQLTSPVTWSRA
;CC   FFRIGTCCNDNLNNLSESFNRTIRQARRKPILDMLEDIRRQCMVRSAKRFLIAEKLQTRF
;CC   TKRAHEEIEKMIVGLRQCERYMARENLHEIYVNNVSYFVDMEQKTCDCRKCQMVGIPCIH
;CC   ATCVIIGKKEKVEDYVSDYYTKDE
;CC   ATMU5p2 is a putative 258-aa DNA-binding protein encoded by
;CC   3 exons on the complementary strand (4513-4361, 4294-4211,
;CC   3488-2949).
;CC   ATMU5p2:
;CC   MSLEISSVGEEGSGNRGFPSKCRCGRDVVIYTSSSKKNPGRPYFRCPTYQDDHLFKWVED
;CC   CVYEEVVDAIPRISIIDSEVAVLVMVHLLVFVLHDHHLMLFVLVLHNHDLQPFVLVLRDH
;CC   DLQPLVLVLRDHDLQHLVLVLRDHDLQHLVLVIQDHDLQHLVLVIQDHDLQHLVLVIQDH
;CC   DLQHLVLVIQDHDLQHLVLVIQDQPVILDEIVVALVIQDKIVIALVIQDKIEIQTLLVLK
;CC   LKKRVRNTCWIVKKESHK
;CC   ATMU5p2 includes a conserved motif (aa positions 26-63)
;CC   CHCGLEVVIYTSASKSNPGRPFFRCPTKQDDHLFKWVE present at the C-terminus
;CC   of DNA topoisomerase III in human, mouse and Drosophila, and
;CC   at the C-terminus of polyproteins encoded by ORF3 in banana
;CC   streak and sugarcane bacilliform retroid viruses.
;XX
;DR   Positions 28694   33589   Accession No AP002060   GenBank (rel. 119.0)
;XX
;SQ   Sequence 4895 BP; 1539 A; 882 C; 1118 G; 1356 T; 0 other;
ATMU5
ggaaaaaatgttaattaatacaccaattttcaaaaagtggtcatttaaaccataaactccatatatggcc
aaataaaacttagtaaaagcgttgaccggccattttatacatctcaaaacgttgaccaagccaaaaaccc
tgcgacgttagcagtcgttaacagagtcgataacaggcgttagcaggcgttaattaatccgtttaaacac
aaaacgacgtcgttttgtgtttaatcgaaaccaacaaaattcccaattcgattctcaatccctaaaatcc
ccaattccaattctcatgccctaaaattcccaaattcgaaaaatcttcaaatcgtcttcccaaatgacga
aaattaggatgggaggcgtgatatctacgattggatcaggggaatcagagagtgttgtggatcgtggagt
tgaagtggcagtcgaattagagagtgtcgtcggagacggagttgaagttccaccggttgaagctccagtc
ggaatagagcttgttgccagaggagaagattgtgttccaagccggcatcgaaaacgaagaaagataagag
tgggggacgaagacgaggctgaaattgatcccgaccccgactacgaagatgattgcgcagttcatggtga
tgaagactgtgatgcttatactgatgcagtaggtggtgacgatgatgcaaccggcggtgaaggtaatgat
gatgaagcagtaggtggtggtgatgatgcaaccggttgtggaggtaacgatgatgaagctttaggttatg
atgatgatgcaacaagcgatggaggtaacgatgatgatgacggaggtgttgaagaagatgctaacctcaa
ccttgcagaagaattcccagagtttgcgggagtagaagaagattgtagtgatgatgatccaccggatgat
gtatgggaggaagataaaattcccgatccgttttcatctgatgacgaagatgagagcagaaccagagaag
tacatggtcatagagatgctgcgaatgaagaggttttgctagaattgaagaaaacttacaacactcccga
tgacttcaagcttgcagtcttgaggtactcattgaagacaaggtacgacattaaactgtttagatcagaa
gctaggattgttgctgcaaagtgtagctatgttgataaggatggtgttcaatgtccgtggaaagtgtatt
gttcatatgagaagactatgcataagatgcaaataagaacttatgtgaataaccatatatgtgtgaggtc
tgggcattcgaagatgttgaagcggtcttccattgcttatttgtttgaggaaaggttgcgagtgaatcca
aagcttacaaaatatgagatggctgctgagatattgagggaatacaacttggaagtgactccggaccaat
gtgctaaagcaaagacgaaagtggtgaaagcaagaaatgctagccatgaagcccattttgcaagagtctg
ggattaccaagcagaggtaataaagcagaatccatggactgagtttgagatagagacaactgcaggggct
gtcattggagccaaacagaggttttttggttatacatttgttttaaggctcaaaaggattcatggataaa
aacgtgtagacctttcataggacttgatggagcttttctgaaatgggatattaaaggccatcttctagct
gcagttggaagagatggagataacaagattgttcctattgcttgggctgtcgttgagatagaaaatgatg
acaactgggactggttcttaagacatctatctgcaagtttggggctttgtcaaatggctaatatagctat
cctctctgataaacaatctgttagtttttttaactttgtgtctttgttaaattggtttgttgttttgttt
actgatttttggtcatgttttatgtcatattgcaggggctagtcaaagctatccatacaatacttccaca
tgctgagcatcgacaatgttcaaaacacatcatggataattggaaaagggacagccatgatttagagctc
cagcgtctgttttggaagatagcacgaagctacacggttgaggagtttaataatcatatgatggagctac
agcagtacaatcgtcaagcttatgaatctctgcagcttactagtccagtgacatggtcaagagcattttt
cagaataggtacatgttgcaatgacaacctcaataatttgagtgagtcttttaataggactattaggcag
gcaaggagaaagccaatattggatatgctggaggatatccggaggcaatgcatggtcagaagtgcaaaga
gattcctaattgctgaaaaattgcagacaaggttcacaaagagagctcatgaagagattgaaaaaatgat
tgttgggcttagacagtgcgagagatacatggccagagaaaacttgcatgagatatatgtgaataatgtt
agctattttgttgatatggagcagaaaacttgtgattgcaggaagtgtcagatggttgggatcccttgta
ttcatgcgacttgtgtgataatcggaaaaaaagaaaaggtggaagactacgtgagtgactattacacaaa
ggtaaggtggcgagaaacttacttgaaaggtataaggcctgtccaagggatgcctttatggtgtaggacg
aataggttgcctgtattaccaccaccatggagaagaggcaatgctgggaggccaagtaactatgctagga
ggaaaggtagaaatgaagctgcagctccttcaaatccaaacaagatgtctcgggaaaagagaatcatgac
atgctctaactgccatcaagaagggcacaacaagcaaaggtgcaacattccaactgtcttgaatccaaca
aagagaccaagaggtcgtcctaggaaaaatcaggtttgtgactctcatttatgactctcctttttaacaa
tccaacattcatttatgactctcctttttaacaatccaacatgtgtttctgactctcttttttaatttaa
gaaccagaagtgtttggatctcaatcttatcttggatcacaagagcaatcacaatcttatcttggatcac
aagagcaaccacaatctcatcaaggatcacaggctgatcttgaatcacaaggaccaagtgctgcaggtcg
tggtcttggatcacaaggaccaagtgctgcaggtcgtggtcttggatcacaaggaccaagtgctgcaggt
cgtggtcttggatcacaaggaccaagtgctgcaggtcgtggtcttggatcacaagaaccaagtgctgcag
gtcgtggtcgcggagcacaaggaccaagtgctgcaggtcgtggtcgcgtagcacaaggaccaagggctgc
aggtcgtggtcgcggagcacaaggacaaagggctgcaggtcgtggttgtggagcacaaggacaaacagca
tcaggtggtggtcgtggagcacaaagacaagcaggtggaccatcacaaggaccgcaaccttatagaagac
atggttcagcacaagcagaagcagaaccacaaggacttcgtcgatttgaatcatggtttgagtgttcaga
caatcttagtcacaagtttctttttgttattgtctgcagcttagtcataagtttctttttggtggttgaa
cttgtctttatgacttagcttagtatcgatttcattttgttgttttgacttgtcttggctttgtgacatt
agtgttataatgtatgaatcgtttttaatgtatgaactttgtttaggattattaatcattgtgaaacaaa
acaagaattgcattaccaatcatggtttcattacaaacttagataaaagccatctaaccaaacacaaaca
ttagtaaccaagaacaaaacaagcattgcattaccatagtaccatttcattacaaacttagataaaagcc
atctaaccaaacacaaacactagtaaccaaacacacatcactagtaaacacaaaccatctaaccaaacac
aaacgctagtaacctctcttcaatttcctcatttttggtcttgtagaacattatgatcacaattgcaatg
aaaattatacaaaggcacaccaagcacactttggtcatcgatttccatctcttcaattctcttttgctcc
acatttcatcttctttcaattcttggatcaaaccctttaattcagcaatctcgagcaacctccgatttgg
cattgtttacctcgctgtcaatgatagagattctcggtattgcatctacaacctcttcatacacacaatc
ttcaacccatttaaacaaatggtcctgtaattccatcgacaattatttacttatgaccaaaaacaataaa
acaaaatcaaatctacttacatcttgataggttggacatcgaaagtaaggtcttcccggattcttcttcg
agcttgatgtatagatgacgacatctctaccacaacgacactttgaaggaaacccacggttcccacttcc
ttcttccccaacactcgagatttctaaactcatatctgcagaaaatggctatgaaattggagttttcgag
ttcaaacgaacaaaaatggagaggcttttgaagattttcgaatttgggaattttagggcatgagaattga
aattggggattttagggattgagaatcgaattgggaattttgttggtttcgattaaacacaaaacgacgt
cgttttgtgtttaaacggattaattaacgcctgttatcgactctgttaacgactgctaacgtcgcagggt
ttttgggcggatcaacgtttttgaatgtataaaatggccggtcaacgcttttactaagttttatttgacc
atatatggaattcatggtttaaatggccactttttgaaagttggtgtattaaacaacatttttcc1