Help/Information for CENSOR
The CENSOR web server allows users to have query sequences aligned against a reference collection of repeats. The homologous portions are then "censored". Censoring means replacing the aligned portions with masking symbols (N for nucleotides, X for amino acids) in the query sequences. The server automatically classifies all known repeats and adds the classification to the report. The server returns a single page composed of the following sections:
- Map of repeats and its graphical representation.
- The censored query sequences, with an "N" (or an "X" for amino acid sequences) replacing each residue of the masked repeats.
- The local alignment results.
- The fragments that were censored out, i.e. fragments homologous to one of the repeats from the reference collection.
- Annotation portion of all detected repeats.
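The censoring step described at the top of this page can be illustrated with a minimal Python sketch. It is not the server's implementation: the function name `censor`, the 1-based inclusive coordinate convention, and the example sequences are assumptions made only for illustration.

```python
def censor(sequence, intervals, mask="N"):
    """Return `sequence` with every listed interval replaced by `mask` symbols.

    `intervals` holds (start, end) pairs, assumed here to be 1-based and
    inclusive. Pass mask="X" for amino acid sequences.
    """
    residues = list(sequence)
    for start, end in intervals:
        if start < 1 or end > len(residues) or start > end:
            raise ValueError(f"invalid interval: {start}-{end}")
        # Convert the 1-based inclusive interval to 0-based indices.
        for i in range(start - 1, end):
            residues[i] = mask
    return "".join(residues)


# Hypothetical query with two repeat fragments at positions 3-5 and 8-9.
masked = censor("ACGTACGTAC", [(3, 5), (8, 9)])
print(masked)
```

The masked sequence keeps its original length, so coordinates reported for the query remain valid after censoring.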
Format of the Map of Repeats
The Name columns contain the locus names of the submitted query sequences (first column) and of the Repbase library sequences (fourth column). Repbase names are hyperlinked to their sequences.
The From and To columns give the start and end positions of the fragment on the corresponding sequence.
The Dir column gives the orientation of the repeat fragment described in columns 4-6: 'd' for direct, 'c' for complementary.
The Sim column contains the similarity between the two aligned fragments. Similarity is calculated as:
Sim = match_count / ( alignment_length - query_gap_length - subject_gap_length + gap_count)
- match_count - number of matching base positions in the alignment;
- alignment_length - length of the alignment, i.e. the number of matches + the number of mismatches + the total length of gaps;
- query_gap_length - total length of alignment gaps on the submitted query sequence;
- subject_gap_length - total length of alignment gaps on the Repbase library sequence;
- gap_count - number of uninterrupted alignment gaps of any length on either the query or the subject sequence. From a biological point of view, one indel, which corresponds to one uninterrupted alignment gap, reflects a single evolutionary event and should affect the similarity value in the same way regardless of its length. The denominator therefore equals the number of matches + the number of mismatches + gap_count.
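As a worked illustration of the Sim formula, the following Python sketch computes each term from a pair of aligned strings. It is only a sketch under stated assumptions: the function name `similarity`, the use of '-' as the gap character, and the example alignment are hypothetical and do not come from the server. A query gap directly followed by a subject gap is counted here as two separate gaps, which is also an assumption.

```python
def similarity(query, subject):
    """Compute Sim for two equal-length aligned strings ('-' marks a gap)."""
    if len(query) != len(subject) or not query:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    match_count = 0
    query_gap_length = 0
    subject_gap_length = 0
    gap_count = 0
    previous = None  # which sequence carried a gap in the previous column
    for q, s in zip(query, subject):
        if q == "-" and s == "-":
            raise ValueError("a column cannot be a gap on both sequences")
        if q == "-":
            current = "query"
            query_gap_length += 1
        elif s == "-":
            current = "subject"
            subject_gap_length += 1
        else:
            current = None
            if q.upper() == s.upper():
                match_count += 1
        # A new uninterrupted gap starts when the gap state changes.
        if current is not None and current != previous:
            gap_count += 1
        previous = current
    alignment_length = len(query)
    denominator = (alignment_length - query_gap_length
                   - subject_gap_length + gap_count)
    return match_count / denominator


# Hypothetical alignment: 7 matches, 1 mismatch, one 2-base gap on the
# query and one 1-base gap on the subject.
sim = similarity("ACGT--ACGTA", "ACGTTTAC-TG")
print(round(sim, 4))
```

In this example alignment_length is 11, query_gap_length is 2, subject_gap_length is 1 and gap_count is 2, so the denominator is 11 - 2 - 1 + 2 = 10 and Sim is 7 / 10 = 0.7. The 2-base gap contributes to the denominator exactly as much as the 1-base gap does.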
The Pos column is roughly the ratio of positives to alignment length. It is calculated in the same way as Sim, with positive_count in place of match_count. Positives are aligned pairs of residues that receive a positive score in the alignment matrix. This information is particularly useful for estimating the quality of protein alignments.
The Mm:Ts column is the ratio of mismatches to transitions in a nucleotide alignment. The closer this number is to 1, the more likely it is that the mutations are evolutionary.
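The Mm:Ts ratio can be illustrated with a short Python sketch. The function name `mismatch_transition_ratio` and the example alignment are hypothetical. The sketch assumes that gap columns are skipped, that any pair of differing symbols counts as a mismatch, and that `None` is returned when there are no transitions; how the server treats these cases is not stated here.

```python
# Transitions are purine-purine (A<->G) or pyrimidine-pyrimidine (C<->T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def mismatch_transition_ratio(query, subject):
    """Return mismatches / transitions for two aligned nucleotide strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    mismatches = 0
    transitions = 0
    for q, s in zip(query.upper(), subject.upper()):
        if q == "-" or s == "-" or q == s:
            continue  # skip gap columns and matching columns
        mismatches += 1
        if (q, s) in TRANSITIONS:
            transitions += 1
    if transitions == 0:
        return None  # ratio undefined (assumption: reported as missing)
    return mismatches / transitions


# Hypothetical alignment with two transitions (A/G, C/T) and one
# transversion (T/A): 3 mismatches / 2 transitions.
ratio = mismatch_transition_ratio("ACGTACGT", "GCGTATGA")
print(ratio)
```

A ratio of exactly 1 means that every mismatch in the alignment is a transition.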
The Score column contains the alignment score obtained from BLAST.
The Class column gives the class/subclass of the repeat, as specified in the repeat annotation. The hierarchy of repeat classes is provided below.
- Transposable Element
  - DNA transposon
  - LTR Retrotransposon
  - Endogenous Retrovirus
  - Non-LTR Retrotransposon
- Simple Repeat
- Multicopy gene
- Integrated Virus
  - DNA Virus
Graphical Representation of Repeat Mapping
For your convenience, CENSOR maps detected repeats graphically, color-coding the different types of repeats. Both the PNG images on the CENSOR report page and the SVG (Scalable Vector Graphics) images are interactive: each color-coded bar is hyperlinked to the corresponding Repbase sequence. The SVG version additionally provides high-quality scalable graphics. Not all web browsers support SVG by default; to open SVG graphics you need to have the corresponding plugin installed on your computer. Among all tested systems, the SVG version works best on the Windows/Internet Explorer platform.
Genetic Information Research Institute and its employees are not liable for any and all losses or damages (real or perceived) associated directly or indirectly with the application of this server and/or software.
Censor server is powered by WU-BLAST
Gish, W. (1996-2005) http://blast.wustl.edu
This work was supported by grant DE-FG03-95ER62139 from the U.S. Department of Energy, Human Genome Program and by grant P41LM06252 from the National Institutes of Health.