;ID ATHILA0_I DNA ; ATH ; 10046 BP
;XX
;DE ATHILA0_I is an internal portion of the ATHILA0 endogenous
;DE retrovirus - a consensus sequence.
;XX
;AC .
;XX
;DT 16-JAN-2001 (Rel. 6.0, Created)
;DT 16-JAN-2001 (Rel. 6.0, Last updated, Version 1)
;XX
;KW Gypsy-like endogenous retrovirus; ATHILA0p1; ATHILA0p2;
;KW gag, pol, ATHILA3_LTR; ATHILA0_I.
;XX
;OS consensus
;XX
;OC Arabidopsis thaliana
;OC Eukaryotae; mitochondrial eukaryotes; Viridiplantae;
;OC Charophyta/Embryophyta group; Embryophyta; Magnoliophyta;
;OC Magnoliopsida; Capparales; Brassicaceae; Arabidopsis.
;XX
;RN [1] (bases 1 to 10046)
;RA Kapitonov,V.V. and Jurka,J.
;RL Direct submission (January 2001)
;XX
;CC ATHILA0_I is an internal portion of the ATHILA0 endogenous
;CC retrovirus. There are several copies of ATHILA0 in the genome;
;CC they are ~95% identical to the consensus sequence. Long terminal
;CC repeats from ATHILA0 are 94% identical to ATHILA3_LTR.
;CC ATHILA0 has generated 5-bp target site duplications.
;CC ATHILA0 encodes the gypsy-like pol polyprotein, which
;CC includes protease, reverse transcriptase, RNase H and
;CC integrase.
;CC ATHILA0_I encodes two proteins, ATHILA0p1 and ATHILA0p2.
;CC ATHILA0p1 (1894 aa, position 145-5829): ;CC MQTRSKGSAHLLPFRDRIDRIARELQETKAKAACDQQRPAAMDQQNRPVDVQDPPNVDQP ;CC RNIGAGDAPRNHHQRQGIVPPPVQNNNFEIKSGLISMIQGNKFHGLPMEDPLDHLDSFDR ;CC LCGLTKINGVTEDMFKLRLFPFSLGDKAHHWEKTLPPDSITSWDDCKKAFLAKFFSNART ;CC ARLRNEISGFTQKNNETFCEAWERFKSYTTQCPHHGFKKASLLSTLYRGALPKIRMLLDT ;CC ASNGNFLNKDVAEGWELVENLAQSDGCYNEDYDRSMRGTGGSEDKQSKDIKALNEKLDKL ;CC LLAQQKQIHYITDEEHFQMQEGGNDQTEELCYIQNQGGFNKGYNNYKPNPNLSYRSTNVA ;CC NPQDQVYPPQQNQSKPFVQYNQGYVPKQQFNGGYQQQNPPPGFTQQPQQAPAAQDPDTKR ;CC MLQQIIQGQTTGALVLEKRLAEINSKVDCSYNELRSKYEDLTSKMTYMERQAVSNTSSTY ;CC TGPHPGKAIQNSKEYAHAVTLRSGRKLINNQPTEKITEDSEVQEGEDQHQNEVQTDEPIK ;CC LDQPSDSLDPLLDRAKPTFEERKAAAAEKNKEFAPPPFKPTMPFPRRFKKELIEKYKTLF ;CC DKQLKEIELRMPLMDAFMLIPHSHKYLKDLIMERTKEVQGMVVIGHECSAIIQKNIIPRK ;CC LGDPGSFTLPCSVGPLSFSKCLCDLGASISLMPLSVARRLGFSKYKPCGIQLILADRSVR ;CC IPHGVLEDLPVKVGSIDIPTDFVILEMDEEPKDPLILGRPFLATAGAIIDVKKGKIDLNL ;CC GKEMKMTFDINKAMKKPTINGQVFWIEEMDRLADELLEELTEGDHLASALTNDGEEGYLH ;CC LETQGYKEYLDAHIPMEGPEEFEELIVPSEEAVSGCTMSLIAEKTNSTEMLDHGGENISS ;CC DDWSELKAPKVDLKPLPKGLRYAFLGPNDTYPVIINDGLSDEQVNQLLNELRKYRRAIGY ;CC SLADIKGISPSLCTHRIHLENESYTSIEPQRRLNPNLKDVVKKEILKLLDADIIYPISDS ;CC TWVSPVHCVPKKGGMTVVKNDLDELIPTRTITGHRMCIDYRKLNAATRKDHFPLPFIDQM ;CC LERLANHVYYCFLDGYSGFFQIAIHPNDQEKTTFTCPYGTFAYKRMSFGLCNAPGTFQRS ;CC MTSIFSDFIEEIMEVFMDDFSVYGSSFSSCLLNLCRVLERCEETNLVLNWEKCHFMVQEG ;CC IVLGHKISGKGIEVDKAKIDVMIQLQPPKTVKDIRSFLGHAGFYRRFIKDFSKIARPLTR ;CC LLCKETEFNFDEDCLKAFHLIKEALVSAPIVQAPNWDHPFEIMCDASDYAVGAVLGQKID ;CC GKLHVIYYASRTMDEAQTRYATTEKELLAVVFAFEKFRSYLVGSKVKVYTDHAALRHIYA ;CC KKETKPRLLRWILLLQEFDMEIIDKKGVENGVADHLSRMRIEDSVPIDDTMPEEQLMFYD ;CC LVNKSFDTKDMLEEAYAVEEEKLPWYADLVNYLVSGMIPQSLDAYKKKKFFRDIHHYYWD ;CC EPYLYKKGSDGLFRRCVSEGEVQGILGHCHGSTYGGHFATFKTAQKILQAGLWWPTMFKD ;CC TQEFIAKCDPCQRVGNISRRNEMPQNPILEVEVFDVWGIDFMGPFNPPSYGNAYILVAVD ;CC YVSKWVEAIAAPTNDHKVVLKMFKSIIFPRYGIPKVVISDGGSHFINKVFEGLLRKHGVK ;CC HKVATPYHPQTSGQVEVSNRQIKAILTKVVGVSRRDWSAKLDETLWAYRTAYKTPIGRTP ;CC FQMLYGKSCHLPVEVEYKAIWATKLLNLEIKGAQEKRAVDLHELEEIRLEAYESSKIYKE ;CC 
RTKAFHDKKISPKDFNIGDQVLLFNSRLKLFPGKLKSRWSGPFTIKEVLPYGAITLTKEG
;CC SSEFTVNGQRVKRYMADCPIPEGTTVDLQEPINA
;CC ATHILA0p1 includes putative gag, protease,
;CC reverse transcriptase and integrase.
;CC ATHILA0p2 (637 aa, position 6481-8394):
;CC MSSNSNESSMDADFNVDEAESWSTRPQREMEEYRRFSQHAAKVLARDRRRAEIARGKRAM
;CC AEERSLVDEDLGGDEDYVPEITPRATKSLMKKTKLSPDGYYELLAAHEFHGTRYPHSETM
;CC NELGITEDVEYLFEKSGLLGLMTNPHSAYKTEALQFLASLEVELFQGLSSHEAREEGLGY
;CC ITFAVYGKDYVLAIKTLEDMFGFPRGTDVKPKFKKEELSDLWVTIGDDAQFSSSRAKSNA
;CC IRNPCIRYFQKAMANVLYAREKTGPINNGEIELLDIALKDILVYTKNKVPMKGDTNDASP
;CC SMRLLNHLCGFRKWALANKHKRTISIGGVITPILMACGVPLQSTPFAPRWIDIPHLRWAL
;CC FIEHQSHEGMHILKFQHRTEMDARLLLPNQELTTITVRGNIDFNPPNEELFFMERAPPTR
;CC EAPDNEERVESEEGAENEEGEEMDWENYNASRFHFEEHKPPPRVSKSLTVAHKNIGSMSA
;CC WNKFQDKMLEKCGKAIAAIQAVLSCSSSGATMVRENRPEEVVSRRHRVSPSRQSAYEQRE
;CC VSRPQAPARHSSHEHREQKRRRKTRIVRPRSKDLLMSSRRSLDQDTRRDIEQSVEQDPWV
;CC QNEQSVEQNQWDQDGQYSGMNWDAYHGINEQEPTSDN
;CC Presumably, ATHILA0-like retroelements have been acting as
;CC autonomous retroelements. ATHILA3 shares 94% identical
;CC LTRs with ATHILA0. Putatively, ATHILA3 is a non-autonomous
;CC derivative of an ATHILA0-like element. Their internal sequences,
;CC ATHILA0_I and ATHILA3_I, are 91% and 95% identical over 1700-bp
;CC 3'- and 2600-bp 5'-flanks, respectively. The remaining portion
;CC of ATHILA3_I encodes "non-retroviral" protein sequences related
;CC to those present in the previously reported ATHILAs.
;CC A close relative of ATHILA0 is present in Vicia faba (GenBank
;CC GI: 2522228; 69% protein identity over 400 aa).
;XX ;DR [1] (Consensus) ;XX ;SQ Sequence 10046 BP; 3173 A; 2053 C; 2203 G; 2617 T; 0 other; ATHILA0_I atttggcgccgttgccaattcattgttgcattgttacattcaagatatcagaaacttttaagatcaagtt cttttacatttatcaagttactaactcatattttcgtctgcttgtttttgtggtataggtactaatcttt gtgcatgcaaactcgatccaaaggttctgcacacctactaccattcagagacagaattgacagaatagct cgtgagttacaagaaaccaaagcaaaggcagcctgtgatcagcaaagaccagctgctatggatcaacaga acagaccagttgatgttcaagacccacctaatgttgatcaaccaagaaacattggtgctggtgatgcccc aaggaatcatcaccaaagacaagggatagtgcctccaccagttcagaacaacaactttgagatcaagagt ggtctcatctcaatgatccaaggaaacaagtttcatggtctacctatggaagaccccctggaccatcttg acagctttgataggctctgtggccttaccaagatcaatggtgtcactgaagatatgtttaagctcagact atttcccttctctttgggagacaaggcacaccactgggagaagactctgcccccagactccatcacctca tgggatgattgtaagaaagcttttcttgccaagttcttctctaatgctcgcaccgctagattgaggaacg agatctcaggcttcacccagaaaaacaatgaaactttctgtgaagcttgggagaggtttaaaagctacac cactcagtgcccccatcacggtttcaagaaggcttcattattgagcacactctacagaggagctttacca aagatcagaatgctactcgacactgcctccaatggaaacttcctgaacaaggatgtagcagaaggatggg agttggtcgaaaatctagcacaatctgatgggtgctacaatgaagactatgatcgctcaatgagaggaac tggaggaagtgaggacaaacagagcaaggatatcaaggctctgaatgaaaagttagacaagctgttgctg gctcagcagaagcagatacactacatcactgatgaagagcacttccaaatgcaagaaggggggaatgatc aaactgaagagctgtgctacatccagaaccaaggagggttcaacaagggctacaacaactacaagcccaa cccaaacctctcctacagaagcactaatgtagctaacccccaagatcaagtgtatccaccacaacagaat cagtctaagccatttgttcagtacaaccaaggttatgtccctaaacaacagtttaatggaggataccaac agcagaatccaccaccagggttcactcaacaaccacaacaagccccagcagctcaagatcctgacacaaa acgaatgcttcaacaaatcatccagggtcaaactactggagctctagttctggaaaaacgattggctgag attaatagtaaagttgactgttcctacaatgagttaagaagcaaatatgaggatctcacatctaaaatga catacatggaacgtcaagctgtttccaatacctcctcaacgtatacagggccacatccaggaaaagccat tcaaaattccaaggaatatgcacacgcagttacactccgtagtggaaggaaattgatcaacaaccagcca acagaaaagatcactgaggacagtgaagttcaagaaggggaggaccagcatcaaaacgaagttcaaactg atgaaccaattaagcttgatcagccttcagactcactcgaccctctactcgatcgagcaaagccaacttt 
tgaggaaagaaaagctgcggctgcagaaaaaaataaagaatttgctccgccacccttcaaaccgactatg cctttcccaagaagattcaagaaggaattaatagaaaaatacaaaaccctttttgataagcagctaaagg agattgagctaagaatgccgttaatggatgctttcatgctcattccacactcccacaagtacctcaaaga tctgattatggaaagaaccaaggaagtgcagggaatggtggtaataggccatgaatgcagcgctatcatc cagaaaaatataataccaagaaagttgggagatcctggatccttcaccctaccttgttcagtaggaccat tatctttcagtaaatgcctatgtgatttgggtgcctcaatcagcctcatgcccctatctgtagccagaag attgggttttagtaaatacaagccctgcggtatccaactgatattagctgacagatcagtcagaatacct catggagttctcgaagacctgcctgttaaagtcggatcaatagacatccctactgatttcgtaatactgg agatggatgaggagccaaaagacccattgatcctaggaagaccattcctagctactgcaggagctattat cgacgtcaagaagggaaaaatcgacctaaacctggggaaggagatgaaaatgaccttcgacattaacaaa gctatgaagaaaccaacaatcaatggacaagtcttttggatcgaagaaatggatagattggctgacgaat tactggaagaacttacagaaggagatcacctagcaagtgccttaaccaacgatggagaagaagggtacct acacttagaaacccaagggtacaaagagtatcttgacgcccacataccaatggaaggaccagaagagttt gaggaattgattgttccctcagaagaagcagtatcaggatgcaccatgagcctaatcgctgaaaaaacaa actctactgagatgctcgaccatggaggagagaatataagttcagatgactggtcagaactcaaagctcc aaaggttgaccttaagcctcttccaaaaggtctgaggtacgcgtttctcggtccaaatgatacttatccc gtcattattaatgatggactaagtgatgaacaagtgaaccagttgttgaatgagcttagaaagtatagga gggcaattgggtattctttagctgatattaaaggaatttcacctagtttatgtactcataggatccatct tgaaaatgaatcatatactagtattgaaccacagaggagattaaatccaaatttgaaagatgtggttaaa aaggaaattcttaaattgcttgatgctgatataatctacccaatttctgacagtacatgggtctctcctg tgcactgtgtacccaagaagggaggaatgactgtagttaaaaatgatttagatgagctcattcccactag aactataacaggacatagaatgtgcattgattacaggaaactaaatgctgcaactagaaaggaccatttt cccttgccattcattgatcaaatgttagaaaggctagctaaccatgtttattactgttttcttgatggct atagtggtttctttcaaattgcaattcaccctaatgatcaagaaaaaaccactttcacctgcccctatgg aacctttgcttataagagaatgtcttttggtttatgtaatgctcctggaactttccagagaagcatgact tcgatattctcggattttattgaggagataatggaggtattcatggacgatttttcagtctatggatcct ctttctcctcatgtttgttaaatttgtgcagggttctagaaaggtgtgaggaaaccaatttggtactcaa ctgggagaagtgtcacttcatggttcaggaaggaattgtgctagggcataagatttctggtaaaggaatt 
gaagtcgacaaggcaaaaatcgatgtcatgattcagctgcaaccccctaagacagttaaggatatcagga gcttcctaggccatgcaggattttacaggaggttcataaaagatttttcaaagattgctcgacccctcac tagattgctgtgcaaggaaactgagttcaactttgatgaagattgtctcaaagctttccatttgataaag gaagcattggtgtctgcccctatcgttcaagcacccaactgggaccacccatttgagatcatgtgtgacg catccgattatgcagttggagctgtcctaggccaaaagattgatggcaaacttcacgtcatctactacgc gagtagaacaatggacgaagctcaaacaagatatgcaaccacagaaaaggagctattagctgtggttttc gcctttgagaaatttagaagctatttggttggctccaaagtgaaggtctacacagaccatgcagcactaa ggcacatctatgccaagaaggaaaccaagccaagacttctaaggtggatactgttgcttcaagagtttga catggaaatcattgacaagaaaggtgttgagaatggagttgcagaccatctctccaggatgagaattgaa gattcagtcccgatagatgacactatgcccgaagaacagcttatgttctacgaccttgttaacaaaagct tcgacacaaaggacatgctggaagaagcatatgcagttgaagaagaaaagttgccctggtatgcagattt agtcaattatttggtaagtggtatgatcccccagagtttggatgcatataaaaaaaagaagttctttaga gacatccaccattactattgggatgagccgtacttgtataaaaaggggagtgatgggctattcaggaggt gtgtctctgaaggagaagttcaaggtatactgggacattgccatggatccacctatggagggcatttcgc aaccttcaagactgcccagaaaattttgcaagcaggtctgtggtggcccacaatgtttaaggatactcag gaattcatagcgaaatgcgatccatgccaaagagttgggaacatatccagaaggaatgagatgccacaga atccaattcttgaggtagaagtttttgatgtgtggggaatagacttcatgggtccgttcaaccctccctc atatggaaatgcttatatcctggtagcagtggattacgtctccaagtgggtggaagcaatagcagcacca accaacgatcacaaagtagtcctgaaaatgtttaagtccattatttttcccagatatggtattccgaagg tagtgataagtgatggaggctcacatttcataaacaaagtctttgaaggtctattgagaaaacatggggt taagcataaggttgccactccctatcaccctcagaccagtggccaagtggaagtatcaaacagacagatc aaagctatactcacaaaagttgttggtgtctcaagaagggattggtcagctaaacttgatgagactttat gggcctatagaaccgcatacaaaacgcctattgggagaaccccattccagatgctatatgggaaatcttg tcatctaccagttgaagtggaatataaagccatttgggcaaccaaacttttgaatttggaaatcaaggga gctcaagaaaaaagagcagtcgacctgcatgaactggaagagattaggctggaagcatatgagagttcaa aaatctacaaagaaagaacaaaagcctttcatgacaaaaagatctcaccaaaagatttcaacattggaga ccaagttctgcttttcaactccagactcaagttgtttccaggaaagctaaaatccaggtggtctgggcct ttcaccatcaaagaggtactgccatatggggcaattactcttactaaggagggaagttctgaatttactg 
tgaatgggcagagggtcaagcgttatatggcagattgccctattcctgaaggaactacagttgacctcca ggaacccatcaatgcctaaaatagtaagaagtctagcttaagactttaaactagctcacttgggaggaaa tcccatgcctatctctgtacatatttaattttgatattttgatatgctttttagtgtttttggaattcag gaaataaatcaggatttgtagagctgtgaactctgtgatcaagaaatatctgcaacagaagcaattctac tctatcaattggtcgagtaaaattgcatcaccattttactctatcaattggtcgagtatagtgactaaac gaccaagcatggtgatcgagtggcccacttgtctccccatcattctattactcggtggacaaagccttta cccatcatgctcaatcactatttggtcaacacccttatcccaacaaagaaagcataggagaccatcctcc tccactcattcaaatcgaaacctaaagataagatatcactctctctcactttcttcttttgagcgcacct caccatcatctctctcacttgtttcctgcttcacaccatccccaaaatttcagccttcagttttcttcaa aaatcacttgcccccccccctgtttcactcgatcaaaactacaagtgttttctttgtttgagactcacct actaagctttagttttgatttctgagagtttagcttcaccatgagttcaaacagcaacgaatcctcaatg gacgccgatttcaacgtggatgaggctgaatcttggtctactaggcctcagagggagatggaagagtata ggcggttcagtcaacacgccgccaaggtcctagctcgtgatagaagaagagcggaaattgcaagaggtaa gagagctatggcggaagagagaagcctcgtggatgaagaccttggtggggatgaagactatgtccctgaa ataactccaagagccactaaatccttgatgaagaagactaagctatcaccggatggatactatgagcttt tggcggctcacgaattccatggcactcgatacccccactccgaaaccatgaatgagcttggcataacaga ggatgtcgagtacctctttgaaaagagtggtcttttgggtctcatgaccaaccctcactcagcctacaaa acagaagcgcttcagtttcttgcttcgctggaagtagagttgtttcaggggctatctagccatgaagcaa gagaagaagggcttgggtatatcacattcgcagtctatggaaaagactatgttttggctatcaaaactct tgaggatatgttcggattccctagagggactgatgttaaacccaagttcaagaaagaggagctttcggat ctttgggtaacaataggtgatgacgcccaattctcgtcctcaagggccaagagcaacgcaatccgaaacc cctgcattcggtacttccagaaagcaatggctaacgtcctctacgcaagggagaagaccggaccgattaa caatggcgagatagagctgctggatattgccctaaaagatatcctagtctacaccaagaacaaagtgcct atgaagggtgacacaaacgatgcttcaccatcaatgcgtttattgaatcatctttgtggtttcaggaaat gggcactagcgaataagcacaagagaaccatttctataggaggggtgataacacctatactaatggcttg tggagtgccacttcagtcgactccctttgcccctagatggatcgacattcctcacctgagatgggctttg tttattgagcatcagagtcatgaggggatgcatattctcaagtttcagcatagaacggaaatggatgcca ggcttctcctacccaaccaagagctcacaaccatcacagtgagaggtaacatcgatttcaacccgcctaa 
tgaagaactcttcttcatggaaagagcacccccgacgagagaagctccagacaatgaagaaagggttgag agtgaagaaggagctgagaatgaagaaggagaggagatggattgggagaactacaatgcgagtaggttcc attttgaggaacacaaaccaccaccaagagtgagtaagagtctgactgttgcgcataagaatattggttc gatgagtgcttggaataagtttcaagacaagatgttggaaaaatgtggcaaggccatagccgcaattcag gctgtactgagctgttcatcctcaggcgccaccatggtgagagaaaaccgccctgaagaagtggtttcaa gaaggcacagagtttcaccatcaaggcaatctgcctatgagcaacgagaggtaagtcgcccccaagctcc tgcaagacattcatctcatgagcaccgcgagcagaaaagaaggagaaaaacaagaattgttcggccaagg agcaaggaccttcttatgtccagcagaagatcactcgaccaagatactcgacgtgatatcgagcagagtg tcgagcaggatccgtgggttcagaacgagcagagtgtcgagcaaaatcagtgggatcaagatgggcagta ctcgggaatgaattgggatgcttaccatggcatcaacgagcaggaacccacctctgacaactgaggtaac accacctcaaccattgtatataccatagcatctttttatttttctttgtgtgttttgtgctttggtcttg aattctctttcagatttttattacacaagggactgtgtaatttaagtttgggggagggttcaagacatgt ttaacattgtgttctcttttcttatttcaaattttgcatcatctaaggcatagaaaaaaccataaaaatt tgaaaaattttcagaaaattatttcacaaaaaaagagtgtcatgtagattgcatatcattttaggattgt atatagagtgtttgcatttaggattgctcatttgcataggggataatgatgatttcaaaccgataagcat gttttgattcaatgaagtaagttcaatgcccaggttgttagttctctgttgcataaccgattgtatctct gaagtaagaccgcaccctgctttgaatttgtaacttgactatactactttgatcgaaactcgcttatctg atgccattccctatctattagaacctgaactgaatttgcaattatcatgtctatgcattgtttgtgaact catggctaccatatacatacttggattatcttattcacttcaccaccctctgttaatccaagtagctgtc tctcatttttagagcagtttcccatacccttagcctagccttatttcaagccaaagatcatttgtgtgtg attgtgaggtctttttcgattgagcttggtataacgtgttaggtatgagccgacaagagcagtctttcat gtagacctagcccacgctttttgggttagctaggactaggtggagacttgtatttgggattgggaatgtg tatggaatgaatgggaatgagaatgagaatgaaaatgaaaagaaaaaaaaaaaaaagagagtaagagttt agagttctaagggggatacaaaagtgttaagtctagtaaagggtttggaattcaaagaagttaggcattg tgattcaaagaaaagggtaaaacgccctatgtgctaagattagaagaaaagccttaagtttgtcaaagta taaaaccaaacccgctagactcttaaatcgtttcaaagagaaccccttagaacttgaaccaaaaagagag aaagaatgagaaaaagggtagagttaggacatctatgggactgagatcatcttctgagctatcatactat atcattgggtagatgggagtctgttcttgtatgtgtccattggcttttacctttagcattctactgaagc 
tcaatcttttctctgagagtcccctgttactcgaccaaaccattagagggaccatttttgtctcttagcc taagcccgaaaccaagtgagttcataaccattgcattgcttgatccactgttcgtgcttaatgaatgtta aagggaattggttgatgtgaatgcttgaatagattaaggcaagttgttaggttgcattgtgacgagtatg gctaacgtttttaagtaagggtctatcatcttgcaacctagaattagctacctggacattgagcttaccc gttttatatgcatgcttcggtttttgaatccccaccttcaaacctctccttcaacttaagattttttgtt tgcttgagggcaagcaagagataagtttgggggagt