;ID ATLANTYS2_I DNA ; ATH ; 9311 BP ;XX ;DE ATLANTYS2_I is an internal portion of the ATLANTYS2 endogenous ;DE retrovirus - a consensus sequence. ;XX ;AC . ;XX ;DT 26-FEB-2001 (Rel. 6.1, Created) ;DT 26-FEB-2001 (Rel. 6.1, Last updated, Version 1) ;XX ;KW Gypsy-like endogenous retrovirus; ATLANTYS superfamily; gag; RT; ;KW RNase H; integrase; ATLANTYS2_LTR; ATLANTYS2_I. ;XX ;OS consensus ;XX ;OC Arabidopsis thaliana ;OC Eukaryotae; mitochondrial eukaryotes; Viridiplantae; ;OC Charophyta/Embryophyta group; Embryophyta; Magnoliophyta; ;OC Magnoliopsida; Capparales; Brassicaceae; Arabidopsis. ;XX ;RN [1] (bases 1 to 9311) ;RA Kapitonov,V.V. and Jurka,J. ;RL Direct submission (February 2001) ;XX ;CC ATLANTYS2_I is an internal portion of the ATLANTYS2 endogenous ;CC retrovirus. There are several copies of ATLANTYS2_I in the genome; ;CC they are ~97% identical to the consensus sequence. Long terminal ;CC repeats from ATLANTYS2 are deposited in Repbase Update as ;CC ATLANTYS2_LTR. ATLANTYS2 has generated 5-bp target site duplications. ;CC ATLANTYS2_I encodes a 1867-aa polyprotein, ATLANTYS2p1, composed ;CC of gag, protease, reverse transcriptase, RNase H and integrase ;CC domains. 
;CC ATLANTYS2p1 (CDS 73-2388 and 2476-5763, predicted by FGENESH): ;CC MVNNQIPGNTVDAEGNPIPPIQTDVPEAAAPATLAELRSMMAQLQQKVNDQEQANRSLAQ ;CC QLEAATSQGQIRTTRFGARHLQDRRAAADLNPTRLVFHTPGNTTRPVRRTAPEIGRDRTE ;CC PAILGNRETNRTERNEPQLPPPRAEVAEADQIGVSDDEDSEENIRWAEEYAREQEISAIK ;CC LSLAKAENEMKLVRSQMHNAVSSAPNIDRILEESHNTPFTHRISNAIISDPGKLRIEYFN ;CC GSSDPKGHLKSFIISVARAKFRPEERDAGLCHLFVEHLKGPALDWFSRLEGNSVDSFQEL ;CC STLFLKQYSVLIDPGTSDADLWSLSQQPNEPLRDFLAKFRSTLAKVEGINDVAALSALKK ;CC ALWYKSEFRKELNLSKPLTIRDALHRASDYVSHEEEMELLAKRHEPSKQTPRIDKSQPSA ;CC PNHKKGAQGGTFVHHEGRNFSGAHNYQADTPRGEAARGRGRGRGRGRGRESYTWTKDQPA ;CC GNEQEYCELHKSYGHHTSRCRSLGAKLAAKFLAGEIGGGLTIEDLEAEKGKTEQVNAVAN ;CC PEQAAPAANPEGPKRGRGNREADDDEPEAARGRIFTILGDSAFCQDTAASIKAYQRKADA ;CC NRNWARPFNGPNDEVTFHESDTNGLDRPHNDPLVITLTIGDFNVERVLVDTGSTLDIIFL ;CC TTLREMKIDMTQIVPTPRPVLGFSGETTMTLGTIKLPVRAKGVTKIVDFSVTDQPTVYNA ;CC IIGTPWLNQFRAVASTYHLCLKFPTSDGVKTIWGNQKNARICFMAAHKLRNPLKIEEARE ;CC STTPTPDPVILICLDDEKPERCVEIGGDLGEELTAELTAFLKENVNTFAWSPEDLPGVSV ;CC DIVSHELNIDPTFKPIKQKRRKLGRERAEAVKAEVEKLLRIGSITEAKYPDWIANPVVVK ;CC KKNGKWRVCVDFTDLNKACPKDSFPLPHIDRLVESTSGNKLLSFMDAFAGYNQIMMNPED ;CC QEKTAFYTEQGIFCYRVMPFGLKNAGATYQRFVNKIFALQIGKTMEVYIDDMLVKSMAEK ;CC DHISHLRECFKQLNLYNVKLNPAKCRFGVRSGEFLGYLVTHRGIEANPKQIEALLGMASP ;CC QNKREVQRLTGRVAALNRFISRSTDKCLAFYDVLRGNKKFEWTTRCEEAFQELKKYLATP ;CC PILAKPVIGEPLYLYVAVSDTAVSGVLVREDRGEQKPIFYVSQTFTGAESRYPQMEKLAL ;CC AVVMSARKLRPYFQSHSIIVMGSMPLRAILHSPSQSGRLAKWAIELSEYDIEYRNKTCAK ;CC SQVLADFIVELPTKEARENPLDTTWLLHVDGSSSKQGSGVGIRLTSPTGEVLEQSFRLNF ;CC EATNNVAEYEALVAGLNLARGLKIGKIRAFCDSQLVANQFNGEYTARDEKMEAYLIHVQN ;CC LAKNFDEFELTRIPRGENTSADALAALASTSDPSLRRVIPVEFIEKPSIELGEEEHVLPI ;CC QISADQDDPDDCSSEWMEPIISYISEGKLPSDKWKARKLKAQAARFVLVDEKLYKWRLSG ;CC PLMTCVEGEAICKIMKEIHGGSCGNHSGGRALAIKIKRHGFFWPTMIKDCENFSKRCKKC ;CC QRHAPTIHQPAELLSSIASPYPFMRWSMDIIGPMHPSKQKKLVLVLTDYFSKWIEAESYA ;CC SIKDAQVENFVWKHILCRHGIPYEIVTDNGSQFISTRFQGFCDKWGIRLSKSTPRYPQGN ;CC GQAEAANKTILDGLKKRLDAKKGSWSDELEGVLWSHRTTPRRATGETPFALVYGTECIIP ;CC 
AEMIVPSLRRSLSPENTPDNTQRLLDELDLIDERRDSALVRIQNYQNETARHYNSNVRQR ;CC RFHEGDRVLRKVFQNTAEPNAGKLGTNWEGPYLISKVIRPGVYELADLSGKAVPRSWNAM ;CC HLRKYYN ;CC ATLANTYS2_I also encodes a second protein, ATLANTYS2p2, located ;CC in the opposite orientation, in the region usually occupied by ;CC the env gene in typical retroviruses. ;CC ATLANTYS2p2 (781 aa, CDS 8837-7874, 7797-7130, 7061-6348): ;CC MSSSQSPSTPSASLVDSSDSNHPDDLPPIYKRRSVWTSSEEDAVSSSNAPEQTTPFTARE ;CC DTNADIARELDLPDDPEPPLVRRSFAPMADEAGTSNWQDVPEPFMPTVKIEDFLYFGPNE ;CC TEDILRLNEQKAFEKAEKKKRKKNKKVIMPDPPGSTLCTERSLSDLRARFGLGAVTLRVP ;CC SPDERADNPPAGFYTLYEGFFYGCFLWLPIPRLVLEYVTSYQIALSQITMRSLRHLLGIL ;CC IRSYESETEITLAHLRNFLEIRRVPKSEVDRYYISPAKGKKIIDGFPSKDEPYTDHFFFV ;CC AIEDAVHEDLLGTVLTRWGILERTLKFLEPIPDDFLSAFHALSARKCDWLKHFSRERVER ;CC ALRLLHGVSCPTSSESSDHRTQFFVDMQSTKLTLREVYAKKKEDKERRLAEEKRLVDAGL ;CC ISPRAAPEATQDGNVIPDAAAPVDAAPAEAQEAEPSAAAPEAVVALPASDKAAGKRVRVD ;CC DESSKKKKKKKKTSGSEAEKVLPIFEDRIASANLLGGCVGPLLPPPDTLLESRKYAETAS ;CC HFLRAVASMNRMVHSYDSAMRSNMEVAGKLAEAESRIQAAEREKNEALSEAAAAKLEREE ;CC VERMAFVNKENAIKMAEQNLKANSEIVRLKRMLSEARGLRDSEVARAIQTTRREVSETFI ;CC AKIKTAEHKVSLLDEVNDRFMYLSQARANAQLIEALEGGGVLEREKEQVDEWLKDFADAE ;CC VNLNRFIAELKDELKAPAPEPAPLSPGGHRSVESLADEAGVTDQSGSLLPAEDNRPSEDL ;CC D ;CC There is 48% identity between ATLANTYS2p1 and ATLANTYS1p1. ;CC ATLANTYS2p2 and ATLANTYS2p1 are only 19% identical to each ;CC other. 
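The protein lengths stated for ATLANTYS2p1 (1867 aa) and ATLANTYS2p2 (781 aa) follow from their CDS coordinates by simple arithmetic. The Python sketch below is illustrative only. Its function names are hypothetical and are not part of Repbase or FGENESH. It sums 1-based, inclusive exon lengths and treats a start coordinate larger than the end as an exon read in the opposite orientation. It also assumes that the quoted coordinates include the terminal stop codon, which is what makes the totals match the stated lengths.

```python
# Hypothetical helpers: derive protein length from CDS exon coordinates.
# Coordinates are 1-based and inclusive; start > end marks an exon read
# in the opposite (reverse-complement) orientation.


def exon_length(start: int, end: int) -> int:
    """Length in bp of one inclusive exon, in either orientation."""
    return abs(end - start) + 1


def protein_length(exons: list[tuple[int, int]], has_stop: bool = True) -> int:
    """Amino acids encoded by the joined exons.

    Assumes the joined CDS is a whole number of codons and, when has_stop
    is True, that its last codon is an untranslated stop codon.
    """
    total = sum(exon_length(s, e) for s, e in exons)
    if total % 3 != 0:
        raise ValueError(f"CDS length {total} is not a multiple of 3")
    codons = total // 3
    return codons - 1 if has_stop else codons


# Coordinates copied from the CC lines of this entry.
P1_EXONS = [(73, 2388), (2476, 5763)]                  # same orientation
P2_EXONS = [(8837, 7874), (7797, 7130), (7061, 6348)]  # opposite orientation

print(protein_length(P1_EXONS))  # 1867  (2316 + 3288 = 5604 bp = 1868 codons)
print(protein_length(P2_EXONS))  # 781   (964 + 668 + 714 = 2346 bp = 782 codons)
```

The divisibility check flags coordinates that would break the reading frame. That is a quick sanity test when exon boundaries are predicted rather than experimentally mapped.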
;XX ;DR [1] (Consensus) ;XX ;SQ Sequence 9311 BP; 2721 A; 2474 C; 2203 G; 1909 T; 4 other; ATLANTYS2_I atttggcgctagaaggaggggacttgagatttctcttactcccggaacacagaaccaaccacccaattca caatggtcaacaatcaaatccccggtaacacagttgatgcagaggggaacccaatccctccaatccagac agacgttcctgaagccgctgctcccgcgaccctagcggaactaagaagtatgatggctcaacttcagcag aaggtgaacgatcaagaacaggcaaatcgatccttggcgcaacaactcgaagcagctacctcccaaggac agatcaggactactcgtttcggcgcgaggcatcttcaggatcgacgagcagcagcagatctcaaccccac acggctcgtgttccacacgcctggcaatactacaaggcccgtccgccgaaccgcaccggaaatcggaaga gaccgaaccgagccagcgattttgggaaatcgggaaacgaatcgaacagaaagaaacgaaccgcagctcc ctcctccccgagcagaagttgccgaggccgatcagatcggggtctcggacgatgaagattcagaagagaa cattaggtgggctgaagaatacgccagagaacaggaaataagcgccatcaagctctccctagccaaggca gaaaacgagatgaagctcgtgagatcccaaatgcataacgcagtctcctcggccccgaacatcgaccgca ttctggaagagtcccacaacacaccgttcacacacaggatctccaacgcgataatctcagatccaggaaa actaagaatcgagtacttcaacggatcttccgacccgaaaggacacttgaagtcattcatcatctccgtg gcccgagccaaattcagaccagaagaaagagacgccggtctctgtcacctgttcgtcgagcacttgaaag ggccagctctggattggttctcgagactcgaaggaaattctgtggacagttttcaggagctatcgacact cttcctgaagcaatattcggtgctaatcgatcccggcacatcagacgccgacctgtggtcactatctcag cagcctaatgagccacttcgagacttcctcgcaaaattccgatctaccctagccaaagtcgaaggaatca acgacgtagcggctctctctgctctgaagaaagcactgtggtacaaatccgaatttcgaaaggaattaaa tttgtccaaaccactgacaatccgagacgccttgcaccgagcctcggattacgtatcccatgaagaagaa atggaactactagccaaaagacacgaaccgtccaagcaaacgcctcgcatcgataaatcccaacccagtg ctccgaatcacaaaaagggtgctcaaggcgggacattcgttcaccatgaaggacgaaatttctccggagc ccataattaccaggctgatacaccccgaggcgaagccgcccgaggccgaggacgaggccgcggtcgagga cgcggtcgagaatcctacacttggacaaaggatcaacccgcaggaaacgagcaggaatattgcgagttgc ataagagttacggccatcatacttccagatgtcgtagcctcggagcaaagttggcagcaaaattcctagc cggagaaatcggtggaggtttgaccatcgaagacttagaagcggaaaaaggtaaaaccgagcaggtcaac gctgtggccaatcccgagcaggcagcccccgcggcgaaccccgaaggacccaaaagaggccgaggtaatc gcgaagcagacgacgatgagccagaagctgctcggggaaggatcttcacaattttaggggattcggcttt 
ctgtcaagacacggcggcatcaatcaaggcttatcaaaggaaggccgacgcgaatcgtaactgggcgcgg ccatttaatgggccaaatgacgaagtaacctttcacgaaagcgataccaacggtttagaccgtccgcaca acgatcctttagtcattacactgaccatcggtgatttcaacgtcgaacgagtcctagtcgacacgggaag cacactggacatcatttttcttacaactctgcgagaaatgaagatcgacatgacgcaaatcgtaccaact ccacgacctgtgctcggattctctggggaaaccactatgactctcgggaccatcaaattaccagtccgag ccaaaggggtaacaaaaatcgtcgatttctctgttaccgaccagccgaccgtgtacaacgcgattatcgg cacaccatggttaaatcaattccgagctgtcgcctcgacgtatcatctctgcctgaaatttcccacaagc gacggcgtgaaaaccatctggggaaatcagaaaaatgctcgcatctgcttcatggcagcacacaagctca ggaaccccgtcactgaatcgacggccgacgcgaatcataagaaggccaagcttggccgagctgaagagaa atcaatttccgagcagttatagcagctaaagatcgaggaggctcgggaatctacaacaccaactcccgat ccggtaatcttaatctgccttgacgacgaaaagcccgagcgatgcgtagaaatcggcggagatctgggag aagaactaacagctgaactcaccgccttcctcaaagaaaacgtcaatacattcgcctggtccccagaaga tttgcccggagtaagtgttgacatcgtatcgcacgagctcaacatcgacccgactttcaaacccatcaag cagaagaggagaaaattgggtcgggagcgagcagaagccgtgaaagccgaggtagagaaattattgagga tcggatccatcaccgaggcgaaatatcccgattggatcgcgaacccggtcgtagtaaaaaagaaaaacgg caaatggagagtctgcgtagatttcacagaccttaacaaagcctgcccgaaagacagcttcccattacca cacatcgatcgcctcgtagaatcaacttctggaaacaagctactgtcattcatggacgctttcgctggtt acaaccagatcatgatgaaccccgaagatcaagaaaaaaccgctttctacacagaacaaggcatcttttg ttaccgagtgatgcccttcggactcaagaacgccggggcaacctatcaacgcttcgtcaacaaaatcttc gcattacagatcgggaagacaatggaagtttacatcgacgacatgttggtgaaatccatggcagagaaag atcacatatcccatttacgcgaatgtttcaagcagcttaacctctacaacgtcaaactcaatcctgcaaa gtgccgcttcggagtaagatccggcgagttcctcgggtacctagtcacgcaccgcggcatcgaggcaaat ccgaagcaaatcgaggcattgttgggaatggcgtcacctcagaacaagcgagaagtgcagcgcctaaccg gaagagttgcggcccttaaccgtttcatctctcgctcaaccgacaaatgcttggccttttacgatgtgct tcggggaaacaaaaagttcgaatggacgacccgatgcgaagaagcttttcaggaactcaagaagtacctg gcaactccacccatcctcgcaaaacccgtaatcggagaaccactatacttgtatgttgccgtatcggata ctgcagtcagcggagtgttagtccgagaagacagaggcgagcagaaaccgattttttacgtctcgcagac tttcaccggcgcggaatctcgctatccgcaaatggaaaaacttgctttagcagtcgtaatgtcggctcgg 
aagctgcgaccctactttcaatcccattccatcatagtaatgggatccatgccactccgcgccatcttac acagtccaagccaatcaggacgtctggctaaatgggcaatcgagctcagcgaatacgacatcgagtatcg gaacaaaacatgtgcaaaatcgcaggtcctagccgattttatcgtcgaactgcccaccaaggaggcccgg gaaaacccactcgacacaacttggcttctacacgtagacggctcgtcatcaaagcaaggctcgggtgtag gcatccgcctcacctcgccaacaggagaggtcctcgagcagtcattcagattaaacttcgaagctaccaa caatgtggccgagtacgaagcgctcgttgccggacttaatctagctcggggactaaagataggaaaaatc cgagctttttgcgattctcagctcgtcgcgaatcaattcaacggagaatacacagctcgggacgaaaaga tggaagcctacctgattcatgttcaaaatctagcgaagaatttcgacgaattcgagttgacaaggattcc acgaggagaaaatacatcggctgacgccctagctgctctagcctcgacatctgacccgagcctgagaaga gtcatcccagtggaattcattgagaagccaagtattgagctcggcgaagaagaacacgtcctcccaatac aaatcagcgcggatcaagacgacccagatgactgcagctcagaatggatggaacccatcataagctatat atccgaagggaaattgccctcggacaaatggaaagctcggaaactcaaagctcaggctgcacgtttcgtt ctagtagatgaaaaactttacaagtggcgattatccggacccttgatgacatgcgtggaaggagaagcga tttgcaagatcatgaaggaaattcacggtggctcgtgcggaaatcattccgggggaagggctttagccat taaaataaaacgccacggattcttctggccgacaatgatcaaagactgcgaaaatttttcaaaacgatgc aaaaaatgtcaaaggcacgcgccaacaatccatcagccagccgagctcttgtcatcaatcgcctcgccat atccattcatgcgatggtcaatggatataattggacctatgcatccctcgaagcaaaaaaagttagtcct cgtcctgaccgactatttctctaagtggatagaagccgaatcttacgccagcataaaggacgctcaagtc gagaacttcgtgtggaaacatatcctatgtcgccacgggataccttatgagattgtcacggataacggct cgcagtttatatcaacccgcttccaaggcttctgtgataaatggggaattcgacttagcaagtcaacacc acgatatccccaaggaaacggccaagccgaagccgctaacaaaacaatcctcgacggattgaagaaacgg ctcgatgctaaaaagggctcgtggtccgacgaactcgaaggtgtactttggtcgcatcggacaactcctc gccgagccacaggagaaacccctttcgccttagtctacggaacggaatgcataattccagccgagatgat agtgccgagcctacgacggagtctatcccccgagaacacccctgataacactcaaaggctcctcgacgaa ctcgatctgatcgatgaacgaagagattcagccctggttcgcatacaaaattatcagaatgaaacggctc gtcattacaactcaaatgttcggcaacgaagattccacgaaggagatcgggtcctccgaaaagttttcca gaacactgccgaaccgaacgctggaaagctcgggacgaactgggaaggaccatacttaatttctaaagtc atccgacccggagtgtatgagctcgctgacttaagcggcaaagccgttccaagatcatggaacgcaatgc 
acctaaggaaatactacaactaaatccgaggtgactaaacttgaactacgaggtggcttgatccctgaaa agggtacgtaggcagctcgtcttcggacgagttcagctacccccccattaaaaaaggggggagtgggtcc gtatattcatactcccatttttattatcttgtagattttcgaaccgaaacatgaataaaaattctttgca actttttattcggctaatacgatgagcgacggctcggagtatccattacgcctattcggctacagcgcgc tatacaaatacgaggtgaaatctatcagattatttctgataagaaattttcatcttcagaaacccggtta tacttttaacaccggtatcccagctcgttcctaaacgctggtcgggaagtgaatacgatcgggtcgcaac cgaatcatattagamaataaaacggttcggatttttagaatcccaccaaawattttggatttcaaaaaak attggataaagccatccaggcacaggttctacaaatcattggataaagccatccaggcataagtactata accaaacaaaaagaaaaaaaaaaaaaaacaaatcccgcctcgggtccctagtcgagatcttcagatgggc gattatcctcggcaggaaggagagatccggattgatcagtaactcccgcctcgtcggcaagagactcgac cgatctgtgaccacccggactcagaggagcgggctcgggagctggagccttgagttcatctttcagctcg gcgatgaagcgattaagattaacctcagcatcggcgaaatctttcaaccactcatcgacctgctccttct ctctctccagaactccgccaccttcgagcgcctcgatcagctgcgcattggctcgcgcctgagacaagta catgaatcggtcattgacctcatcaaggagcgacactttgtgctcggcggtctttattttggcaatgaag gtctcggaaacctcccgcctcgtcgtctgaatggcccgagccacctcgctatcacgaagccctctcgcct cggataacatccgcttgagacggacgatctcagaattcgccttgagattctgctcggccatcttaatggc attctctttgttcacaaaggccatcctctcgacctcctccctttccagtttcgctgcggcagcttcggag agtgcctcatttttctctcgctcggcggcctgaatccgagactccgcctcggctaacttaccagccacct ccatgttgctccgcatagccgaatcatacgaatgtaccatccggttcatcgaagcgacagcctggaaaag agaaaagttagatttacaatgacaacaccaagacaagttataagtaacaaagcacatacccgcagaaagt gagatgccgtctcggcatacttccgagattccagaagagtatctggaggaggaagcaagggaccaacgca tcccccgagcagattggccgaggcaatacgatcttcaaagatcgggagtactttctccgcctcggagccg gatgtcttctttttcttcttcttcttctttgacgattcatcgtcaacccgaacgcgcttacccgccgctt tgtcactcgcaggtaacgccacaaccgcctcgggcgcagcagcagacggctcggcttcctgagcttcagc gggcgccgcgtcaaccggsgcggcagcatcgggaataacattcccatcttgggtcgcctcaggagcggcc cgcggcgagatcaatcccgcatcaacaaggcgtttctcctccgccaatcgcctctccttatcctctttct tcttggcgtagacctcccgcaaagtaagcttcgtcgattgcatatctacgaaaaattgagtacggtggtc ggaagattccgaactagtaggacaagaaacgccgtgaagaagacgaagcgcacgttcaactcgctcccga 
gagaaatgcttcagccaatcgcacttccgagccgacaacgcgtgaaaagccgaaagaaaatcgtctggaa tcggctcgagaaacttgagagtgcgttctgcaaaaggaaaactgcaattagaaaacgaaactaacatcgc atcgggtacaaataaatttctaactacaactacccaaaattccccatctcgtcagaaccgtcccgaggag atcttcatgaacagcatcttcgatggctacgaagaagaagtgatcggtatacggctcgtccttgctcggg aacccatcaataattttctttcccttagcgggggaaatgtaataccgatccacttcggattttggcaccc gccggatctcgagaaaatttctcagatgagcaagcgttatctccgtttcagactcgtaactccgaatcaa aatcccgagcaaatgtctcaaagatcgcatcgtgatctgggaaagggcaatctgatatgatgtcacatac tccaggaccagcctcgggatcggcagccacaagaagcaaccataaaagaacccctcatacaaggtataga aacccgcagggggattgtcggctcgctcgtcagggctcggcacacgcaaggttacagcgccaagaccaaa tcgagccctaaggtccgagagagaccgctcggtgcacaatgttgagccgggaggatctggcattatcacc tttttgttcttcttcctcttcttcttctcagccttttcgaaagccttctgctcgttaagacgcaagatat cttctgtctcgttcgggccgaagtaaagaaaatcctcgatcttaaccgtcggcatgaagggctcgggcac atcttgccaattggatgtcccggcttcgtcagccatcggggcaaaagacctccgaacgagaggcggctcg ggatcatcgggcaaatccaattcccgagcgatatcggcattggtatcttcccgagccgtgaagggagtcg tctgctcgggggcattcgaagaagatacagcatcctcttcagaagacgtccaaactgatctccgtttgta gatcgggggaagatcatctgggtgatttgagtcgctagaatcgacaagagaagcgctcggggtcgacgga gactgtgaagaactcattctcggctcgggacgatagacgagaagggaaacaaccagaaaacaagaatttc tatcccggaaaaatcaaaaataacaaaaagaacgacggagcttcaccggaaaagaaaaaacgatagaaat atcaagatcagagagaaggaagcaaaccttgtttattttcttcgaaaacttcaaatgaagtcttgaagga gtggctcgccgtatatatagagttttcaaaggcgcgtcgggtccgcagcaaacattcaaaacgcgttcaa gcgcgtgggctcggcagaaggaaaagatccgcgtgatcctcgggaataagcgagaaacgtttccactcct caagaatagtcggcgacacgcgacagaagtttgaccgtccgagaagatgaagcgacagaactagaagacg aacagtcggctcgggaatccattcatgaagctcatcgtctctctgccaagtcaatgagctggggggcaaa c1
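The SQ line reports the base composition of the 9311-bp consensus and counts 4 symbols as "other". These appear to be IUPAC ambiguity codes; for example, m, w, k and s occur in the sequence. The minimal Python sketch below shows such a tally. It assumes that every letter other than A, C, G and T is lumped into "other" and that whitespace and digits are layout rather than sequence. The function name is hypothetical.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, int]:
    """Tally A, C, G and T; lump every other letter into 'other'.

    Hypothetical helper for illustration. Case is folded, and anything that
    is not a letter (spaces, newlines, position numbers) is ignored.
    """
    counts = Counter(ch for ch in seq.upper() if ch.isalpha())
    result = {base: counts.get(base, 0) for base in "ACGT"}
    result["other"] = sum(n for ch, n in counts.items() if ch not in "ACGT")
    result["total"] = sum(counts.values())
    return result


# A short fragment of the consensus that contains the ambiguity code 'm'.
print(base_composition("attagamaataa"))
# {'A': 7, 'C': 0, 'G': 1, 'T': 3, 'other': 1, 'total': 12}

# The counts on the SQ line add up to the stated sequence length.
assert 2721 + 2474 + 2203 + 1909 + 4 == 9311
```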