
Help/Information for CENSOR

The CENSOR web server aligns query sequences against a reference collection of repeats and then "censors" the homologous portions. Censoring means replacing the aligned portions of the query sequences with masking symbols (N for nucleotides, X for amino acids). The server also automatically classifies all known repeats and adds the classification to the report. The server returns a single page composed of the following sections:

  1. Map of repeats and its graphical representation.
  2. The censored query sequences, with an "N" ("X") replacing each base of the removed repeats.
  3. The local alignment results.
  4. The fragments that were censored out, i.e. fragments homologous to one of the repeats from the reference collection.
  5. Annotation portion of all detected repeats.
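
The censoring step described above can be illustrated with a minimal sketch. This is not the server's implementation; the function name `censor` and the use of 1-based, inclusive coordinates are assumptions made for illustration only.

```python
def censor(sequence, intervals, symbol="N"):
    """Replace each (start, end) interval of `sequence` with `symbol`.

    Coordinates are assumed to be 1-based and inclusive (an assumption
    for this sketch, not something stated by the CENSOR help page).
    Use symbol="N" for nucleotides and symbol="X" for amino acids.
    """
    chars = list(sequence)
    for start, end in intervals:
        if start < 1 or end > len(chars) or start > end:
            raise ValueError(f"invalid interval: {start}-{end}")
        for i in range(start - 1, end):
            chars[i] = symbol
    return "".join(chars)


if __name__ == "__main__":
    # Mask positions 3-5 of a toy nucleotide sequence.
    print(censor("ACGTACGTAC", [(3, 5)]))
    # Mask positions 2-3 of a toy protein sequence.
    print(censor("MKVLA", [(2, 3)], symbol="X"))
```

The masked sequence keeps its original length, so coordinates in the map of repeats still refer to the same positions after censoring.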

Format of the Map of Repeats

Name  From   To     Name         From  To    Class      Dir  Sim     Pos   Score
N48   21018  21158  ZMCOPIA1_I   33    170   LTR/Copia  c    0.7154  0.72  208
N48   21651  21715  ALFARE1_I    883   943   LTR        c    0.7656  0.77  250
N48   21966  22622  DIASPORA_I   2619  3290  LTR/Gypsy  c    0.6854  0.69  1523
N48   23152  23200  ATCOPIA35_I  2607  2659  LTR/Copia  c    0.8235  0.82  224
N48   23391  24003  DIASPORA_I   1130  1737  LTR/Gypsy  c    0.6672  0.67  1355
  • Name
    The Name columns contain the locus names of the submitted query sequences (first column) and of the matching Repbase library sequences (fourth column). Repbase names are hyperlinked to their sequences.
  • From/To
    The From/To columns give the start and end positions of the aligned fragment on the corresponding sequence: columns 2-3 refer to the query, columns 5-6 to the Repbase sequence.
  • Dir
    The Dir column indicates the orientation ('d' for direct, 'c' for complementary) of the repeat fragment described in columns 4-6.
  • Sim
    The Sim column contains the similarity between the two aligned fragments. Similarity is calculated as:
    Sim = match_count / ( alignment_length - query_gap_length - subject_gap_length + gap_count)
    • match_count - the number of matching base positions in the alignment;
    • alignment_length - the length of the alignment, i.e. the number of matches + the number of mismatches + the total length of gaps;
    • query_gap_length - the total length of alignment gaps on the submitted query sequence;
    • subject_gap_length - the total length of alignment gaps on the Repbase library sequence;
    • gap_count - the number of uninterrupted alignment gaps of any length on either the query or the subject sequence. From a biological point of view, one indel, which corresponds to one uninterrupted alignment gap, reflects a single evolutionary event and should therefore affect the similarity value in the same way regardless of its length.
  • Pos
    The Pos column is roughly the ratio of positives to alignment length. It is calculated in the same way as similarity, with positive_count in place of match_count. Positives are pairs of aligned symbols (bases or amino acids) that receive a positive score in the scoring matrix. This information is particularly useful for estimating the quality of protein alignments.
  • Mm:Ts
    The Mm:Ts column is the ratio of mismatches to transitions in a nucleotide alignment. The closer this number is to 1, the more likely it is that the mutations are evolutionary.
  • Score
    This column contains the alignment score reported by BLAST.
  • Class
    This is the class/subclass of the repeat as specified in its annotation. The hierarchy of repeat classes is given below.

    • Transposable Element
      • DNA transposon
        • Mariner/Tc1
        • hAT
        • MuDR
        • EnSpm/CACTA
        • piggyBac
        • P
        • Merlin
        • Harbinger
        • Transib
        • Novosib
        • Helitron
        • Polinton
        • Kolobok
        • ISL2EU
        • Crypton
          • CryptonA
          • CryptonF
          • CryptonI
          • CryptonS
          • CryptonV
        • Sola
          • Sola1
          • Sola2
          • Sola3
        • Zator
        • Ginger1
        • Ginger2/TDD
        • Academ
        • Zisupton
        • IS3EU
        • Dada
        • IS481EU
        • Replitron
      • LTR Retrotransposon
        • Gypsy
        • Copia
        • BEL
        • DIRS
      • Endogenous Retrovirus
        • ERV1
        • ERV2
        • ERV3
        • Lentivirus
        • ERV4
        • Lokiretrovirus
        • Spumaretrovirus
      • Non-LTR Retrotransposon
        • SINE
          • SINE1/7SL
          • SINE2/tRNA
          • SINE3/5S
          • SINE4
          • SINEU/snRNA
        • CRE
        • NeSL
        • R4
        • R2
        • L1
        • RTE
        • I
        • Jockey
        • CR1
        • Rex1
        • RandI
        • Penelope
          • Penelope/Poseidon
          • Neptune
          • Nematis
          • Athena
          • Coprina
          • Hydra
          • Naiad/Chlamys
        • Tx1
        • RTEX
        • Crack
        • Nimb
        • Proto1
        • Proto2
        • RTETP
        • Hero
        • L2
        • Tad1
        • Loa
        • Ingi
        • Outcast
        • R1
        • Daphne
        • L2A
        • L2B
        • Ambal
        • Vingi
        • Kiri
    • Simple Repeat
      • Satellite
        • SAT
        • MSAT
    • Multicopy gene
      • rRNA
      • tRNA
      • snRNA
    • Integrated Virus
      • DNA Virus
      • Caulimoviridae
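
The column definitions above can be sketched in code. The example below parses one row of the map and computes Sim from a gapped pairwise alignment, following the formula given for the Sim column. It is an illustrative sketch, not CENSOR's own code: the names `MapRow`, `parse_map_row` and `similarity` are hypothetical, the row is assumed to be whitespace-delimited with exactly the 11 columns of the example, the Class value is assumed to split into class and subclass at the first "/", and "-" is assumed to be the gap character.

```python
import re
from typing import NamedTuple, Optional


class MapRow(NamedTuple):
    query: str
    q_from: int
    q_to: int
    repeat: str
    r_from: int
    r_to: int
    repeat_class: str
    subclass: Optional[str]
    direction: str  # 'd' = direct, 'c' = complementary
    sim: float
    pos: float
    score: int


def parse_map_row(line: str) -> MapRow:
    """Parse one whitespace-delimited row with the 11 example columns."""
    f = line.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 columns, got {len(f)}")
    # Assumption: class and subclass are separated by the first '/'.
    cls, _, sub = f[6].partition("/")
    return MapRow(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), int(f[5]),
                  cls, sub or None, f[7], float(f[8]), float(f[9]), int(f[10]))


def similarity(query_aln: str, subject_aln: str) -> float:
    """Sim = match_count / (alignment_length - query_gap_length
                            - subject_gap_length + gap_count)."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    match_count = sum(q == s and q != "-"
                      for q, s in zip(query_aln, subject_aln))
    alignment_length = len(query_aln)
    query_gap_length = query_aln.count("-")
    subject_gap_length = subject_aln.count("-")
    # Each uninterrupted run of '-' counts as one gap, whatever its length.
    gap_count = (len(re.findall(r"-+", query_aln))
                 + len(re.findall(r"-+", subject_aln)))
    return match_count / (alignment_length - query_gap_length
                          - subject_gap_length + gap_count)


if __name__ == "__main__":
    row = parse_map_row(
        "N48 21018 21158 ZMCOPIA1_I 33 170 LTR/Copia c 0.7154 0.72 208")
    print(row.repeat, row.repeat_class, row.subclass, row.direction)
    # 7 matches, length 10, one query gap of length 2: 7 / (10 - 2 - 0 + 1)
    print(round(similarity("ACGT--ACGT", "ACGTTTACCT"), 4))
```

In the toy alignment, the two-base gap contributes 1 to the denominator rather than 2, which is how the formula treats an indel as a single event regardless of its length.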

Graphical Representation of Repeat Mapping

For convenience, CENSOR maps the detected repeats graphically, color-coding the different types of repeats. Both the PNG images on the CENSOR report page and the SVG (Scalable Vector Graphics) images are interactive: in each, every color-coded bar is hyperlinked to the corresponding Repbase sequence. The SVG version additionally offers high-quality scalable graphics. Not all web browsers support SVG by default; to open SVG graphics in such a browser, you need to have the corresponding plugin installed on your computer. Among the systems tested, the SVG version works best on the Windows/Internet Explorer platform.


Genetic Information Research Institute and its employees are not liable for any and all losses or damages (real or perceived) associated directly or indirectly with the application of this server and/or software.


The CENSOR server is powered by WU-BLAST
Gish, W. (1996-2005) http://blast.wustl.edu

This work was supported by grant DE-FG03-95ER62139 from the U.S. Department of Energy, Human Genome Program and by grant P41LM06252 from the National Institutes of Health.

© 2001-2024 - Genetic Information Research Institute