;ID   ATMU3       DNA   ; ATH   ; 4853 BP
;XX
;DE   ATMU3, an autonomous DNA transposon - a consensus sequence.
;XX
;AC   .
;XX
;DT   18-SEP-2000 (Rel. 5.9, Created)
;DT   18-SEP-2000 (Rel. 5.9, Last updated, Version 1)
;XX
;KW   autonomous DNA transposon; MUDR superfamily; TIR;
;KW   transposase; DNA-binding protein; ATMU3.
;XX
;OS   consensus
;XX
;OC   Arabidopsis thaliana
;OC   Eukaryota; Plantae; Embryobionta; Magnoliophyta; Magnoliopsida;
;OC   Dilleniidae; Capparales; Brassicaceae.
;XX
;RN   [1]  (bases 1 to 4853)
;RA   Kapitonov,V. and Jurka,J.
;RL   Direct submission (September 2000)
;XX
;CC   ATMU3 is an autonomous DNA transposon from the MUDR superfamily.
;CC   Its copies are flanked by 10-bp target site duplications.
;CC   There are 5-10 copies of ATMU3 in the genome; they are ~99%
;CC   identical to the consensus sequence. The consensus sequence
;CC   was reconstructed from 4 copies; therefore, it may adequately
;CC   represent the features of the active ATMU3 transposon.
;CC   ATMU3 has ~280-bp terminal inverted repeats and encodes
;CC   two proteins, ATMU3p1 and ATMU3p2.
;CC   ATMU3p1 is a 949-aa transposase encoded by 5 exons (390-2198, 
;CC   2293-2471, 2511-2696, 2912-3269 and 3338-3655): 
;CC   MRIGSPMLHESENLVGDGVEAPPRGERDEIGEEVGGETNVVSESCGGEDGAQDAAKATPE
;CC   GEADRIGEEARGQANVVSNVDVADVAEVRSPLRRSKRRQIRLEEEAHDAVEPMPEGEANQ
;CC   IGEEARGQSVDVADATAVCSPLRRSKRRQNRLEEEADVAVEESTRREIGQEEEEALGETS
;CC   CGGEEEAHDEAQESGVAVADAVEVRAPLRRSKRRRIRDEEEEDLEAEVPALDEDDDCAVQ
;CC   GDEDCDEVDDAVDTNREEDDTAGLGVEEDGNLDMERDFPEANGEEEASDNDSGDDIWDED
;CC   KIPDPLSSDDEDDDRVEAARNDLGDPEILLALEKTYNSPEDFKLALLMSSLKTRYDIKLY
;CC   NSEAMVVAAKCVYVSDEGVECPWRVRCSYEKRKHKMQIRTYYNEHTCVRSGHSKMLKVSS
;CC   IGFLFEERLRVNPKLTKHEMVAEILREYKLEVTPDQCAKAKTKVLRARRASHDSHFARIW
;CC   DYQAEVLLRNPGTEFNIETVAGAVIGSKQRFYRLYICFQAQRESWKQTCRPVIGIDGAFL
;CC   KWDIKGHLLAAVGRDGDNRIVRIAWSVVEIENDDNWDWFLRQLSTSLGLCEMTDLAIISD
;CC   KQSGLVKAIHTILPQAEHRQCSKHIMDNWKRDNHDIELQRLFWKIARSYTVEEFNNYQAD
;CC   LKSPITWSRAFFRTGTCCNDNLNNLSESFNRTIRQARRKPLLDLLEDIRRQCMVRTAKRF
;CC   IIADREKKKVESYVNDYYTRNRWRETYFRGIRPVQGMPLWGRLNRLPVLPPPWRRGNAGR
;CC   PSNYARRKGRNEVASSSNPNKMSREKRIMTCSNCLQEGHNKKSCKNATVLSPPKRPRGRP
;CC   RINEEPQGYVEGSDGHDNGSQGQGNVLQGQENVSQGQNNGSQGQNNGSQTQSQRGRGRGT
;CC   QRQRGTTRGAQRQRGRGRGTSQVSEQPQGEAQPQGLAGLAPWFECSRGT
;CC   ATMU3p2 is a 154-aa putative DNA-binding protein encoded on
;CC   the complementary strand (exons 4530-4333, 4279-4046 and 4028-3996):
;CC   MSCNSRNSSGESGGCNSGMISNAEAGGFKSRGFPVKCKCGLEVVMFTSSTAKNPGRPFFR
;CC   CKSCEDLEMELQDHLFKWVEECMYEEVVDALPKISSIDNEIINAKAEVAVEIANLKELMI
;CC   ELKEDGMWSKREIQRWKKMTKVCLCDCNCNINVL
;CC   ATMU3p2 includes a conserved motif (aa positions 37-80),
;CC   CKCGLEVVMFTSSTAKNPGRPFFRCKSCEDLEMELQDHLFKWVE, present at the
;CC   C-terminus of DNA topoisomerase III in human, mouse and Drosophila,
;CC   and at the C-terminus of polyproteins encoded by ORF3 in banana
;CC   streak and sugarcane bacilliform retroid viruses.
;CC   The genome contains ~100 highly diverged copies of ATMU3p2,
;CC   encoded by different families of ATMU-like DNA transposons.
;XX
;DR   [1] (Consensus)
;XX
;SQ   Sequence 4853 BP; 1557 A; 877 C; 1108 G; 1311 T; 0 other;
ATMU3
gggaaaaatgttatttaatacctcaacttacaaaaaatggccaaattaaccgtgaactcgtgaaatggcc
gttttaactctcaacaaaaagttgacttctgttttaactttcaagtttgcgttgactcggcctaattaac
caccgttaaaaatccttctaacagcgtaattgacagccgttttagtccgttaagcatctgttactatagt
cttacgacgtcgttttcgtgctaaagagaaatcaaaatcgagaatagaaattctcaaaacaaaatcaatt
accctaaacccaaatcgaaacctaatcctgcccccaaaatcaaaatcgaaaccctaattgcttcaattcg
ttttctgaaatgccattagttagaaagaatggcgtcgttatgaggattggatctccaatgcttcacgaat
cagagaatttggtgggagatggagttgaagcaccaccgcgaggagaaagagatgaaattggagaagaagt
tgggggtgaaacgaatgtggtatctgagtcttgtggcggagaagacggagcacaagatgcagctaaagca
acgccggaaggagaagcagatcgaattggagaagaagctcgaggtcaagcgaatgtagtatcaaatgttg
atgtggcagatgtggccgaagttcgttctccacttaggcgaagtaaacgaaggcaaatcagattggaaga
agaagctcacgatgcagttgaaccaatgccggaaggagaagcaaatcaaattggagaagaagctcgaggt
cagtccgttgatgtggcagatgcaaccgcagtttgttctccacttaggcgaagtaaacgaaggcaaaata
gattggaagaggaagctgatgttgcagttgaagaatcaactcgtcgtgaaattggacaagaggaagagga
agctttaggtgaaacgtcttgtggcggagaagaggaagctcacgatgaagctcaagaatcaggcgttgct
gtggcagatgcagtcgaagttcgtgctccacttagacgaagtaaacgaaggagaatcagagatgaagaag
aggaggatttagaggctgaagttcctgctcttgatgaagacgatgactgtgcagtccaaggagatgaaga
ctgcgatgaggttgatgatgcagtagatactaatagagaagaagatgatactgctggattaggtgtcgaa
gaagatggtaacttagacatggaaagagattttccagaggctaatggagaagaagaagctagtgacaatg
acagcggagatgatatatgggatgaagacaagattccagatcctttgtcctctgacgatgaagatgatga
tagagtagaggcagctcgaaatgatcttggtgatcctgagattttactagcattggagaagacttataac
tctcctgaagatttcaagcttgctcttttgatgtcttccctaaagacaaggtatgacattaaactttata
attctgaagctatggttgttgctgctaagtgtgtgtatgttagtgatgagggtgttgaatgtccgtggag
agtccgttgctcttatgagaagagaaaacataagatgcaaatacgaacttattacaatgagcatacttgt
gtgaggtcaggacattcgaagatgttaaaggtgtcatctattgggtttttgtttgaagaaaggttgagag
tgaatccaaaactcactaaacatgagatggttgctgagatcttaagagaatacaagttggaagtgactcc
agaccaatgtgctaaggcaaagacaaaagttttgagagctagacgtgctagtcatgattctcattttgct
aggatatgggattatcaagcagaggtgttattgcggaatccggggacagagttcaacatagagacagttg
caggagcagtgattggaagcaagcagagattttaccggttatatatttgttttcaagctcaaagggagtc
atggaaacaaacttgcagacccgtaatagggatagatggagcttttctgaaatgggacataaaaggacat
ctattagccgcagttggaagagatggtgacaatcggattgtccgtattgcttggtctgtagtcgagatag
aaaatgatgacaattgggactggttcttgagacagctctctacaagcttggggctatgcgaaatgactga
tctggcaatcatttcagataaacaatctgttagtctctattctataagattcccttcatatatctactgt
aatttgagatagacaatcatactaaacttgtgttttttttgttgttttgcagggtttagtcaaggctatc
cataccattcttccgcaagccgagcatcgacaatgttcaaaacacatcatggataattggaaaagggaca
accacgacattgagctacaacgtctattttggaagatagcacgcagctacaccgtagaagagttcaataa
ttaccaggcagacttaaaaaggtacaatatccaagcctacacgtctctccaacttactagtccgattaca
tggtctagagcattctttagaaccggtacatgttgcaacgacaatctcaacaatctgagtgagtcattca
atagaaccattagacaagctaggcggaaaccactgttagatcttctagaggatattaggaggcaatgcat
ggttaggacagccaaaaggtttatcattgctgacaggtgcaaaacaaagtacacaccaagagctcatgct
gagattgagaagatgattgctggggtccagaatacacagagatacatgtccagggataatttgcatgaaa
tctatgtcaatggagttggctactttgttgatatggacttaaagacatgcggctgcaggaaatggcaaat
ggttgggatcccatgtgttcatgcaacatgtgtgataatagggaaaaaaagaaggttgagagctatgtga
acgactactacacaagaaataggtggcgagaaacatatttccgtggtattaggcctgtccaagggatgcc
tttgtggggtcgattgaataggctgcctgtcttgccaccaccatggagaagaggcaatgccggaaggcca
agcaattatgcaagaaggaaaggaagaaatgaagttgcctcttcctcaaatccgaacaaaatgtcaaggg
aaaagaggatcatgacatgctctaactgcttgcaagaagggcacaacaagaaatcatgcaaaaatgctac
tgttttaagtccaccaaagagaccaagaggtcgaccaaggataaatgaggtttgtatatctttcatttct
attttcaaaaattctgtttcaaaaactgattgtaatgtttgtattaggaaccacaagggtatgtagaagg
atcagatggacatgataatggctcacaagggcagggtaatgtgttacaagggcaggaaaatgtgtcacaa
gggcagaacaatggctcacaagggcagaacaatgggtcacaaacacaaagccaaagaggaagaggtcgtg
gaacacaaagacaaaggggaacaactcgtggagcacaaagacagaggggaagaggtcgtggaacatcaca
agtgtctgaacaaccacaaggagaagcacaaccgcaaggacttgctggacttgcaccatggtttgaatgt
tctcgtggaacatgatatgctagtctcatgtttgtttttgttgtttgaacttgtctttatgacatatgtt
tattctcggtttgtttttgttgtttgaacttgtctttatgacataatctaagtcttggttacttattgtt
gtttgcacttgtctctctatatgattagcttagtctcagttttaagaagttgaccttttctttcaaatga
aattcattaccattaccaaagctacatgtcattacatagtaaaagcataaccaaacacaaactaaacaac
aaagacatcacattcaaactaagctcccattcgattacacactaataaccaagaacaaatttctggtttt
ttttcttatagaacattgatattacaattgcaatcacactaaaacaaaggcacaccaagcacactttcgt
catctttttccatctctgaatttctcttttgctccacattccatcttctttcaactctatcattagctcc
tttaagtttgcaatttcaacagcaacctcagcttttgcattgattatctcgttatcaatgctggagattt
tgggtagagcatctacaacctcttcatacatacactcctccacccatttgaacaaatgatcctgcaattc
catctcaagcttttaccgaccaaacacaaaaaacatcacattctcatatcaaatctacttacatcttcac
agcttttgcaccgaaagaaaggccttccagggttcttagccgtgctcgatgtaaacatgacgacttcgag
cccacatttacacttaacaggaaacccacggctcttgaaacctccagcttcagcattcgaaatcattcct
gaattgcagcctccactttcaccactcgaattccttgagttgcagctcataatgcagatttgatcgaaaa
atttgagagagaatgatttttagggtttgggttttgatttggggatttttcagaggtttcgtcgactttg
aagtctgttgatagcacaaaacgtcgtcgtatcaccttaacggatgttctgtaacggctgtcaattacgc
tgttagaaggatttttaacggtggttaattaggccgagtcaacgcaaacttaaaggttaaaacagaagtc
aactttttgttgagagttaaaacggccatttcacgagttcacggttaatttggccattttttgtaagttg
aggtattaaataacatttttccc1