Repbase Reports

2002, Volume 2, Issue 3
March 31, 2002
Copyright © 2001-2016 - Genetic Information Research Institute
ISSN# 1534-830X
Page 1

G4_DM

G4_DM is a non-LTR retrotransposon - a consensus sequence.

Submitted:
31-Mar-2002
Accepted:
31-Mar-2002
Key Words:
Non-LTR retrotransposon; ORF1; ORF2; DNA binding protein; AP endonuclease; reverse transcriptase; RNase H; JOCKEY clade; G4_DM
Source:
consensus
Organism:
Drosophila melanogaster
Taxonomy:
Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta; Pterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea; Drosophilidae; Drosophila
[1] Authors:
Kapitonov,V.V. and Jurka,J.
Title:
G4_DM, an ancient family of non-LTR retrotransposons from the Jockey clade.
Journal:
Repbase Reports 2:(3) p. 1 (2002)
Abstract:
G4_DM belongs to the JOCKEY clade of non-LTR retrotransposons. Its copies have a poly(A) 3' tail and are flanked by ~10-bp direct repeats generated upon integration into the genome. G4_DM forms a separate family of retrotransposons that were recently (several million years ago) active in the Drosophila melanogaster genome. There are ~30 copies of G4_DM in the sequenced genome; they are ~5% divergent from the consensus sequence and ~10% divergent from each other. The consensus sequence, which is an approximation of the G4_DM element that was active ~3 Myr ago, contains two ORFs (positions 2-1078 and 1143-3725) that encode the 359-aa G4p1 and 861-aa G4p2 proteins, respectively.

G4p1 is a putative DNA/RNA binding protein. Its ~200-aa N-terminal and ~100-aa C-terminal portions are missing because of a lack of sequence data.

G4p1:
EVQRKKNSLDNSSSTSANKFALLSDGLPDKTGNKYNKNEDLEMVNEDSATDSAKPPPIILSDVNDISEML
AYLNSKIKRELFYYKTQRYGHVRVMVKSIEEFRKLVKTLNNDCVQYHTYQLKDDRAFRVVIKNLHFSTNL
DEIKSDEESKGHVVRNISNLKSRATKTPLNMFYVDIEPNNKNRDNVKHIGNAIVNIEPPRKNNEIVQCYR
CQEFGHTKSYCTKTYRCVKSSSRHPSNICPKNTEQPAKCANCYEEHAASYKGCRIYQELLSKKISYQSKI
PEXQXRPEXKXFRNPAKFAPPNKPTYTQQSNDYQSYAQIAAGNSKTNTSLERIEQLLEKQSELTNNLLNM
IMLLVNLCK

G4p2 is composed of the endonuclease, reverse transcriptase and RNase H domains. Its N-terminus, about 50 aa, is missing because of a lack of sequence data.

G4p2:
FIKTNEIDIMLISETHFTSKPYIMVVGYDIIRADHRSFXLDLLIRRLKLDGLKFQIMDSIRENAMQAATV
TIKCMHADVSVTAIYLPPRFALKEADFKNFFQKLGPQFILRGDFNDKHPWWGSRLTNPKGSELYKCIVNN
SITTFSTGKPTYWPTNSRKIPNLKDFVAYFGIPESHMRIMESFDLSSDHSLIIVTYSTVAHILTKPYKVI
SANTDINAFKSYLETDKIDHAVELLTEQDKVSYICTKLPARNSQSNQLYLSAEIRQQIQHKRNLRKRWQE
TLYPADKRSYNKAASDLKKLLSTLRNESLAEYLRNLDPHSCNHEHNLWRATKYLKRPAKRNTVVRNCNGE
WCRSDDEQAKAFAQHLHSVFQPNDIDNPQTEREVDNFLESPCQMSLPIRKISINEVSSEIKWLNSKKAPG
SDKIDGITLKILPPKCVRFLTFIFNAMLRVDHFPSQWKCAEIIMILKPNKAENEVTSYRPISLLSIFSKV
FEKILLKRMLPILDEFAIIPEHQFGFRRGHGTPEQCHRIINEILSAFESKKYCTATFLDVQQAFDRVWHD
GLLYKIKKWLPAPYFLLLKSYLTNRHFYVQQKNEYSPLHFIKAGVPQGSVLGPVLYTLYTADMPVTNTCT
VATYADDTAILATSSSKEEASQLLQAELRLIESWFLLWKIKVNALKSAQITFALRRGDCPEVSFNGSAIP
QSNCIKYLGLHLDRRLTWKNHIKAKRQQLNQKSLKMTWLLGRKSATTLENKVRLYKAILKPVWTYGIQLW
GTASNSNIEILQRYQSKILRQIVNAPFYISNASIHKDLGIPYVKEEIAKHSKKYIDRLRTHENNLALSLV
NNNNNVRRLKRFHVLDLPDRY
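The ORF coordinates and protein lengths quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the Repbase entry. It assumes the positions are 1-based and inclusive, and that each quoted span contains only coding codons with no stop codon, which fits the statement that both proteins are incomplete. The names `ORFS` and `orf_length_aa` are hypothetical.

```python
# ORF coordinates as quoted in the abstract.
# Assumption: positions are 1-based and inclusive.
ORFS = {
    "G4p1": (2, 1078),
    "G4p2": (1143, 3725),
}


def orf_length_aa(start: int, end: int) -> int:
    """Return the number of codons in a 1-based, inclusive nucleotide span."""
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError(f"span {start}-{end} is not a whole number of codons")
    return nucleotides // 3


lengths = {name: orf_length_aa(*span) for name, span in ORFS.items()}
print(lengths)  # {'G4p1': 359, 'G4p2': 861}
```

Under these assumptions the spans cover 1077 and 2583 nucleotides, which correspond to the 359-aa and 861-aa lengths given for G4p1 and G4p2.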
Derived:
[1] (Consensus)
