Help/Information for CENSOR

The CENSOR web server allows users to have query sequences aligned against a reference collection of repeats. The homologous portions are then "censored". Censoring means replacing the aligned portions with masking symbols (N for nucleotides, X for amino acids) in the query sequences. The server automatically classifies all known repeats and adds the classification to the report. The server returns a single page composed of the following sections:

Map of repeats and its graphical representation.
The censored query sequences, with an "N" ("X") replacing each base of the removed repeats.
The local alignment results.
The fragments that were censored out, i.e. fragments homologous to one of the repeats from the reference collection.
Annotation portion of all detected repeats.
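
The censoring step described above can be illustrated with a minimal Python sketch. It is not the server's implementation. The function name `censor` and the convention that intervals are 1-based and inclusive are assumptions made for this example only.

```python
def censor(sequence, intervals, mask="N"):
    """Return `sequence` with every listed interval replaced by `mask`.

    Assumption for this sketch: intervals are (start, end) pairs,
    1-based and inclusive. Use mask="X" for amino acid sequences.
    """
    chars = list(sequence)
    for start, end in intervals:
        # Convert the 1-based inclusive interval to a 0-based range and
        # clip it to the sequence length.
        for i in range(max(start - 1, 0), min(end, len(chars))):
            chars[i] = mask
    return "".join(chars)


# Hypothetical query with two homologous fragments at 3-5 and 8-9.
masked = censor("ACGTACGTAC", [(3, 5), (8, 9)])
print(masked)  # ACNNNCGNNC
```

The masked sequence keeps its original length, so coordinates reported elsewhere in the report still apply to it.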

Format of the Map of Repeats

Name	From	To	Name	From	To	Class	Dir	Sim	Pos	Score
N48	21018	21158	ZMCOPIA1_I	33	170	LTR/Copia	c	0.7154	0.72	208
N48	21651	21715	ALFARE1_I	883	943	LTR	c	0.7656	0.77	250
N48	21966	22622	DIASPORA_I	2619	3290	LTR/Gypsy	c	0.6854	0.69	1523
N48	23152	23200	ATCOPIA35_I	2607	2659	LTR/Copia	c	0.8235	0.82	224
N48	23391	24003	DIASPORA_I	1130	1737	LTR/Gypsy	c	0.6672	0.67	1355
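
For readers who want to post-process the map, the following Python sketch parses one row of the table above. It assumes the eleven fields are separated by whitespace and contain no embedded spaces. The names `RepeatHit` and `parse_map_line` are hypothetical and not part of CENSOR.

```python
from typing import NamedTuple


class RepeatHit(NamedTuple):
    """One row of the map of repeats (field names chosen for this sketch)."""
    query: str
    query_from: int
    query_to: int
    repeat: str
    repeat_from: int
    repeat_to: int
    repeat_class: str
    direction: str
    sim: float
    pos: float
    score: int


def parse_map_line(line):
    """Parse one whitespace-separated row; assumes exactly 11 fields."""
    f = line.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}")
    return RepeatHit(f[0], int(f[1]), int(f[2]),
                     f[3], int(f[4]), int(f[5]),
                     f[6], f[7], float(f[8]), float(f[9]), int(f[10]))


# First data row of the sample table.
hit = parse_map_line(
    "N48\t21018\t21158\tZMCOPIA1_I\t33\t170\tLTR/Copia\tc\t0.7154\t0.72\t208"
)
print(hit.repeat, hit.repeat_class, hit.direction)
```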

Name
Column Name contains locus names of submitted query sequences (first column) and Repbase library sequences (fourth column). Repbase names are hyperlinked to their sequences.
From/To
Columns From and To contain the start and end positions of the aligned fragment on the corresponding sequence.
Dir
Values in column Dir indicate the orientation ('d' for direct, 'c' for complementary) of the repeat fragment described in columns 4-6.
Sim
Column Sim contains the similarity between the two aligned fragments. Similarity is calculated as:
Sim = match_count / ( alignment_length - query_gap_length - subject_gap_length + gap_count)
where:
- match_count - number of matching base positions in the alignment;
- alignment_length - length of the alignment, which is the number of matches + number of mismatches + total length of gaps;
- query_gap_length - total length of alignment gaps on the submitted query sequence;
- subject_gap_length - total length of alignment gaps on the Repbase library sequence;
- gap_count - number of uninterrupted alignment gaps of any length on either the query or the subject sequence. From a biological point of view, one indel, which corresponds to one uninterrupted alignment gap, reflects a single evolutionary event and should affect the similarity value equally, regardless of its length.
Pos
Column Pos is roughly the ratio of positives to alignment length. This ratio is calculated the same way as we calculate similarity, with positive_count instead of match_count. Positives are pairs of bases that produce positive scores in the alignment matrix. This information is particularly useful for estimating the quality of protein alignments.
Mm:Ts
Column Mm:Ts is the ratio of mismatches to transitions in a nucleotide alignment. The closer this number is to 1, the more likely it is that the mutations are evolutionary.
Score
This column contains the alignment score obtained from BLAST.
Class
This is the class/subclass of the repeat as specified in the repeat annotation. The hierarchy of classes of repeats is provided below.
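
The Sim formula given in the column descriptions can be written out as a short Python sketch. The function name `similarity` and the example counts are hypothetical and chosen only to show how the gap terms enter the denominator. Pos would be computed the same way with positive_count in place of match_count.

```python
def similarity(match_count, alignment_length,
               query_gap_length, subject_gap_length, gap_count):
    """Sim = match_count / (alignment_length - query_gap_length
                            - subject_gap_length + gap_count)"""
    denominator = (alignment_length - query_gap_length
                   - subject_gap_length + gap_count)
    if denominator <= 0:
        raise ValueError("alignment has no scorable positions")
    return match_count / denominator


# Hypothetical alignment: 80 matches, 15 mismatches, and one 5-base gap
# on the query, so alignment_length = 80 + 15 + 5 = 100.
sim = similarity(match_count=80, alignment_length=100,
                 query_gap_length=5, subject_gap_length=0, gap_count=1)
print(round(sim, 4))  # 80 / (100 - 5 - 0 + 1) = 80 / 96 -> 0.8333
```

In this example the 5-base gap adds 1 to the denominator rather than 5, so it is penalized as a single event whatever its length.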

Graphical Representation of Repeat Mapping

For your convenience, CENSOR maps detected repeats graphically, color-coding the different types of repeats. Both the PNG images on the CENSOR report page and the SVG (Scalable Vector Graphics) images are interactive. Not all web browsers support SVG by default; to open SVG graphics you need to have the corresponding plugin installed on your computer. Besides offering high-quality scalable graphics, the SVG version, like the PNG version, has each color-coded bar hyperlinked to the corresponding Repbase sequence. Among all tested systems, the SVG version works best on the Windows/Internet Explorer platform.

DISCLAIMER

Genetic Information Research Institute and its employees are not liable for any and all losses or damages (real or perceived) associated directly or indirectly with the application of this server and/or software.

ACKNOWLEDGEMENT

Censor server is powered by WU-BLAST
Gish, W. (1996-2005) http://blast.wustl.edu

This work was supported by grant DE-FG03-95ER62139 from the U.S. Department of Energy, Human Genome Program and by grant P41LM06252 from the National Institutes of Health.