TE classification
Historically, eukaryotic transposable elements (TEs) have been classified into two classes: Class I and Class II. Class I comprises retrotransposons, which transpose through an RNA intermediate. Class II comprises DNA transposons, which do not use RNA as a transposition intermediate. In other words, Class I includes all transposons encoding reverse transcriptase and their non-autonomous derivatives, while Class II includes all other autonomous transposons, which lack reverse transcriptase, and their non-autonomous derivatives. It is also worth noting that the genomes of prokaryotes (bacteria and archaea) do not contain any Class I transposons.
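The two-class rule above can be written as a small decision function. This is only an illustrative sketch: the function name `te_class` and its parameters are hypothetical and are not part of Repbase or any existing tool.

```python
from typing import Optional


def te_class(encodes_rt: bool, parent_encodes_rt: Optional[bool] = None) -> str:
    """Assign Class I or Class II following the rule stated in the text.

    encodes_rt: whether the element itself encodes reverse transcriptase.
    parent_encodes_rt: for a non-autonomous derivative, whether the
        autonomous element it derives from encodes reverse transcriptase;
        None when the element is treated as autonomous.
    (Both parameters are hypothetical names chosen for this sketch.)
    """
    if encodes_rt:
        # Any transposon encoding reverse transcriptase is Class I.
        return "Class I"
    if parent_encodes_rt is None:
        # Autonomous transposon lacking reverse transcriptase.
        return "Class II"
    # Non-autonomous derivatives follow the class of their autonomous source.
    return "Class I" if parent_encodes_rt else "Class II"
```

For example, a SINE does not encode reverse transcriptase itself but derives from a non-LTR retrotransposon, so `te_class(False, parent_encodes_rt=True)` places it in Class I.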
Class I transposon (retrotransposon)
Class I is subdivided into two large categories distinguished by the presence of long terminal repeats (LTRs): LTR retrotransposons and non-LTR retrotransposons. Recent studies have revealed other groups of eukaryotic retrotransposons that differ from these two in their transposition mechanism and/or the phylogeny of their reverse transcriptase. They are DIRS retrotransposons (also called tyrosine recombinase-encoding retrotransposons, or YR retrotransposons) and Penelope-like retrotransposons (Penelope-like elements, PLE). To avoid excessive subdivision, however, the classification implemented in Repbase places DIRS retrotransposons within LTR retrotransposons and Penelope-like retrotransposons within non-LTR retrotransposons.
An LTR retrotransposon carries LTRs at both ends, with protein-coding regions between them. The encoded proteins include several catalytic domains (protease, reverse transcriptase, RNase H and integrase) as well as the structural proteins Gag and Env. LTR retrotransposons mobilize through reverse transcription, catalyzed by reverse transcriptase, with their own mRNA as the template. The cDNA is generated as extrachromosomal DNA and is then integrated into the genome by integrase. The integrase of LTR retrotransposons shows similarity to the transposases of some DNA transposons, especially those of the Ginger1 and Ginger2 superfamilies, indicating a composite origin of LTR retrotransposons. LTR retrotransposons are subdivided into 5 superfamilies: Copia, Gypsy, BEL, DIRS and endogenous retroviruses (ERV). ERVs are retroviruses that have abandoned their extracellular life stage and replicate in germ cells. ERVs are divided into 5 groups: ERV1, ERV2, ERV3, ERV4 and endogenous lentivirus (ELV). Following the classification of infectious (exogenous) retroviruses, which are classified into 8 genera, ERVs could be split into more groups. ERV1 corresponds to two retroviral genera, Gammaretrovirus and Epsilonretrovirus, and ERV2 corresponds to Alpharetrovirus and Betaretrovirus. ERV3 and ERV4 have no corresponding infectious retrovirus group. The International Committee on Taxonomy of Viruses (ICTV) has classified some LTR retrotransposons as viruses: family Pseudoviridae for Copia, and family Metaviridae for Gypsy and BEL.
Based on the phylogeny of reverse transcriptase, LTR retrotransposons are related not only to retroviruses but also to two other groups of viruses, hepadnaviruses and caulimoviruses, both of which are occasionally integrated into host genomes. Plant caulimoviruses are often integrated into the genome, and Repbase has a category for them (Integrated Virus - Caulimoviridae).
Non-LTR retrotransposons lack LTRs and usually carry a poly A tract or simple repeats at their 3’-terminus. Non-LTR retrotransposons encode one of three types of endonuclease: restriction-endonuclease-like (RLE), apurinic-endonuclease-like (APE), or GIY-YIG endonuclease. Dualen is an exception that encodes both RLE and APE. The endonuclease nicks one strand of the target DNA, and reverse transcriptase initiates reverse transcription using the exposed 3’ end as a primer and the mRNA of the non-LTR retrotransposon as a template. This mechanism is called Target-Primed Reverse Transcription (TPRT). TPRT is also the integration mechanism of group II self-splicing introns, which likewise encode reverse transcriptase. Group II introns are absent from eukaryotic nuclear genomes, and thus Repbase has no group II intron entries.
Non-LTR retrotransposons are classified into groups (CRE, R2, Dualen/RandI, Ambal, L1, RTE, I and CR1), which are further divided into many clades. The "clade" as a unit of classification was first proposed by Malik and Eickbush (1999), and more than 30 clades have now been proposed, which makes the classification of non-LTR retrotransposons complicated. GIRI provides a simple classification tool, RTclass1, which is based on a neighbor-joining tree and reference non-LTR retrotransposons. As of December 2016, Repbase contains 32 clades in its classification (CRE, NeSL, R4, R2, Hero, RandI/Dualen, L1, Proto1, Tx1, Proto2, RTE, RTEX, RTETP, I, Nimb, Ingi, Vingi, Tad1, Loa, R1, Outcast, Jockey, CR1, L2, L2A, L2B, Kiri, Rex1, Crack, Daphne, Ambal, Penelope).
Non-autonomous non-LTR retrotransposons show composite structures and are called short interspersed elements (SINEs), by analogy with long interspersed elements (LINEs), a synonym for autonomous non-LTR retrotransposons. In Repbase, SINEs are classified into 5 groups based on the origin of their 5’ part: SINE1 for 7SL RNA, SINE2 for tRNA, SINE3 for 5S rRNA, SINEU for U1 or U2 snRNA, and SINE4 for unknown origin. Another way of classifying SINEs is based on the similarity of their central regions. The groups CORE-SINE, V-SINE, Deu-SINE (or Nin-SINE), Ceph-SINE and Meta-SINE have been proposed, but Repbase does not use this classification because it conflicts with the classification based on the origin of the 5’ regions. Some entries carry these terms in their keyword section.
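The 5’-origin scheme for SINEs amounts to a lookup table. The sketch below encodes only the mapping stated in the text; the names `SINE_GROUP_BY_ORIGIN` and `sine_group`, and the exact origin strings used as keys, are hypothetical choices for illustration and do not reflect how Repbase stores its entries.

```python
from typing import Optional

# Origin of the 5' part of a SINE -> Repbase SINE group, as listed in the text.
# The key strings are assumptions made for this sketch.
SINE_GROUP_BY_ORIGIN = {
    "7SL RNA": "SINE1",
    "tRNA": "SINE2",
    "5S rRNA": "SINE3",
    "U1 snRNA": "SINEU",
    "U2 snRNA": "SINEU",
}


def sine_group(five_prime_origin: Optional[str]) -> str:
    """Return the SINE group for a given 5' origin.

    None stands for "origin unknown", which the text assigns to SINE4.
    An origin string that the table does not cover raises ValueError
    rather than being guessed.
    """
    if five_prime_origin is None:
        return "SINE4"
    try:
        return SINE_GROUP_BY_ORIGIN[five_prime_origin]
    except KeyError:
        raise ValueError(f"origin not covered by this sketch: {five_prime_origin!r}")
```

Raising an error for unlisted origins, instead of defaulting to SINE4, keeps "unknown origin" distinct from "origin this table does not know about".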
Class II transposons (DNA transposons)
Repbase contains 23 superfamilies of Class II as of December 2016. Among them, Helitron, Polinton and Crypton show features distinct from other DNA transposons. 18 superfamilies (Mariner/Tc1, hAT, MuDR, EnSpm/CACTA, piggyBac, P, Merlin, Harbinger, Transib, Polinton, Kolobok, ISL2EU, Sola, Zator, Ginger1, Ginger2/TDD, IS3EU and Dada) encode a D-D-D/E-type integrase/transposase that catalyzes integration. This integrase shares features of its catalytic core with the integrase of LTR retrotransposons, and the Ginger1, Ginger2/TDD and Polinton superfamilies in particular likely share an origin with LTR retrotransposons. The catalytic cores of Academ, Novosib and Zisupton are less well characterized, and it remains possible that they encode a protein unrelated to the D-D-D/E-type integrase. Note that because the sequences are extremely divergent apart from the catalytic residues, the presence of a conserved D-D-D/E core does not guarantee a common origin; these enzymes may have evolved independently. The integrase has an RNase H-like fold in its tertiary structure.
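As a bookkeeping aid, the superfamily names mentioned in this paragraph can be grouped and counted. The grouping below is a sketch based only on the text: the variable names are hypothetical, and it assumes that the 18 D-D-D/E superfamilies, the three less characterized ones, and Helitron and Crypton together make up the stated total of 23.

```python
# Superfamilies encoding a D-D-D/E-type integrase/transposase (from the text).
DDE_SUPERFAMILIES = [
    "Mariner/Tc1", "hAT", "MuDR", "EnSpm/CACTA", "piggyBac", "P",
    "Merlin", "Harbinger", "Transib", "Polinton", "Kolobok", "ISL2EU",
    "Sola", "Zator", "Ginger1", "Ginger2/TDD", "IS3EU", "Dada",
]

# Superfamilies whose catalytic cores are less well characterized.
LESS_CHARACTERIZED = ["Academ", "Novosib", "Zisupton"]

# Superfamilies named in the text as distinct and not in the D-D-D/E list.
# (Polinton is also called distinct, but it is already counted above.)
OTHER_MECHANISM = ["Helitron", "Crypton"]

# Assumption of this sketch: these three lists cover every Class II superfamily.
ALL_CLASS_II = DDE_SUPERFAMILIES + LESS_CHARACTERIZED + OTHER_MECHANISM

# Use a set so that a name listed twice by mistake would not inflate the count.
class_ii_count = len(set(ALL_CLASS_II))
print(class_ii_count)
```

Under that assumption the count matches the 23 superfamilies reported for December 2016.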
Notably, the scope and substructure of certain superfamilies are not fixed, and new groups may emerge. For example, Harbinger and ISL2EU were previously shown to be similar to known prokaryotic DNA transposons of the IS5 family. Recently, three other sibling groups were recognized: Spy, Nuwa and Pangu. These three names do not yet appear in Repbase, but in the future they could be combined with the other two groups under one chosen name, such as Harbinger.
EnSpm/CACTA and Transib share some residues between the second conserved D and E. Two superfamilies, Mirage and Chapaev, which were previously present in the classification in Repbase, are now integrated into EnSpm/CACTA based on their similarity.
MuDR, P, hAT, Kolobok, and Dada share the motif C/DxxH between the second conserved D and E. Rehavkus, previously present in Repbase as a superfamily, was integrated into MuDR. It is reported that MuDR shows similarity to the prokaryotic IS256 family.
Mariner/Tc1 and Zator are related to the prokaryotic IS630 family. Merlin is related to the prokaryotic IS1016 family. The integrases of Ginger1, Ginger2/TDD and Polinton, as well as those of LTR retrotransposons, are related to those of the prokaryotic IS3/IS481 family. Sola is subdivided into three groups: Sola1, Sola2 and Sola3.
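The relationships to prokaryotic insertion sequence (IS) families mentioned in the preceding paragraphs can be collected into one table. This is a sketch for orientation only: the dictionary and function names are hypothetical, and the table records only the relationships stated in the text, not a complete survey.

```python
from typing import Optional

# Eukaryotic superfamily -> related prokaryotic IS family, as stated in the text.
RELATED_IS_FAMILY = {
    "Harbinger": "IS5",
    "ISL2EU": "IS5",
    "MuDR": "IS256",
    "Mariner/Tc1": "IS630",
    "Zator": "IS630",
    "Merlin": "IS1016",
    "Ginger1": "IS3/IS481",
    "Ginger2/TDD": "IS3/IS481",
    "Polinton": "IS3/IS481",
}

# The text writes this superfamily both ways, so accept either spelling.
ALIASES = {"Tc1/Mariner": "Mariner/Tc1"}


def related_is_family(superfamily: str) -> Optional[str]:
    """Return the related prokaryotic IS family, or None if the text names none."""
    name = ALIASES.get(superfamily, superfamily)
    return RELATED_IS_FAMILY.get(name)
```

A return value of `None` means only that no relationship is mentioned here, for example for hAT; it does not assert that none exists.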
Crypton encodes a tyrosine recombinase, an enzyme also found in some prokaryotic DNA transposons and in DIRS retrotransposons. Crypton is subdivided into several groups (CryptonA, CryptonF, CryptonI, CryptonS and CryptonV), which may or may not share a common ancestor in eukaryotes; they may have evolved independently from prokaryotic DNA transposons.
Helitron encodes a protein that includes a helicase and an HUH nuclease, and its transposition mechanism was recently characterized experimentally. Helitron can be subdivided into two groups, Helitron1 and Helitron2, though Repbase has not yet implemented this classification.
Polinton, also called Maverick, is expected to transpose in a similar way to other DNA transposons, but it likely generates extrachromosomal DNA and replicates itself using its encoded DNA polymerase. Polinton was recently proposed to be a genome-integrated endogenous virus, and its viral form is designated Polintovirus.
Repbase also contains other types of repeats, such as Satellite Repeats (SAT) and Microsatellites (MSAT), Multicopy genes (rRNA, tRNA, snRNA), Integrated Virus (DNA Virus and Caulimoviridae) and uncharacterized repeat sequences (Repeats). Please note that Repbase entries for multicopy genes are just representatives and do not always correspond to a specific functional copy or an obvious pseudogenized copy.
Kenji K. Kojima, Ph. D.