About Alignment Characters

Last modified on 29 August 2008.

The alignments provided at the CRW Site use certain special characters to indicate structural annotation. These characters are defined here.

We also address the issues of DIVIDER lines and other rare anomalies in the alignments.

Character Case:

Case	Description
lowercase	Lesser confidence in alignment of that sequence (DEFAULT for newly-aligned sequences).
UPPERCASE	Greater confidence in alignment of that sequence.
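
The case convention above can be checked programmatically. The following is a minimal sketch, not part of the CRW tooling; the function name `alignment_confidence` and the "mixed" and "unknown" labels are assumptions introduced here for illustration.

```python
def alignment_confidence(seq: str) -> str:
    """Classify an aligned sequence by the case of its letters."""
    # Only letters carry case; dashes, periods, and other marks are skipped.
    letters = [c for c in seq if c.isalpha()]
    if not letters:
        return "unknown"   # no nucleotide letters to inspect
    if all(c.isupper() for c in letters):
        return "greater"   # UPPERCASE: greater confidence in the alignment
    if all(c.islower() for c in letters):
        return "lesser"    # lowercase: default for newly aligned sequences
    return "mixed"         # assumption: mixed case is reported, not resolved
```

For example, `alignment_confidence("ACGU--ACG")` returns `"greater"`.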

Structural Notation Characters:

Char(s)	Char Name(s)	Description	Analysis
-	DASH	Deletion (relative to at least one other sequence in the alignment).
. ~	PERIOD, TILDE	Region not sequenced; thus, presence or absence of any nucleotide is uncertain. Typically appear at the ends of sequences.	Not equivalent to dash.
|	PIPE	Discontinuity in helix. Used to indicate both irregularities within helices and boundaries between helices.	Treat as a dash for analysis.
( )	PARENTHESES	Enclose hairpin loops. Some "hairpin loops" may contain complex structure elements, such as pseudoknots.	Treat as a dash for analysis.
< > [ ]	ANGLE BRACKETS, SQUARE BRACKETS	Enclose variable regions, with respect to the reference sequence for the molecule. Generally, the regions aligned in square brackets are more conserved and better aligned than those in angle brackets.	Treat as a dash for analysis.
//	DOUBLE SLASH	Marks a break in the backbone (i.e., the two nucleotides separated by the "//" are not covalently connected).	Treat as a dash for analysis.
+	PLUS SIGN	Marks the insertion position of an intron sequence.	Treat as a dash for analysis.
=	EQUALS SIGN	Internal use only (a workaround for an AE2 feature).	Treat as a dash for analysis.
*	ASTERISK	Deprecated internal use character.	Treat as a dash for analysis.
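
The "Treat as a dash for analysis" guidance in the table can be applied with a simple character translation. This is a minimal sketch under stated assumptions: the function name `normalize_for_analysis` is hypothetical, and each "/" of a double slash is replaced by its own dash so that the sequence keeps its length and column positions. Periods and tildes are left alone because the table says they are not equivalent to a dash.

```python
# Characters the table says to treat as a dash for analysis.
# "//" is handled one character at a time (assumption), so the
# number of alignment columns does not change.
DASH_EQUIVALENTS = "|()<>[]/+=*"
_TO_DASH = str.maketrans(DASH_EQUIVALENTS, "-" * len(DASH_EQUIVALENTS))


def normalize_for_analysis(seq: str) -> str:
    """Map structural notation characters to '-'; keep '.', '~', and bases."""
    return seq.translate(_TO_DASH)
```

For example, `normalize_for_analysis("GG(AAA)CC")` returns `"GG-AAA-CC"`.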

The IUPAC-IUB Ambiguity Codes for Nucleotides:

Reference: Cornish-Bowden A. (1985). Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. *Nucleic Acids Research* 13:3021-3030.
Ratio	1:1					1:2						1:3				1:4
Symbol	G	A	C	U	T	R	Y	M	K	S	W	H	B	D	V	N
Meaning	G	A	C	U or T	T or U	G or A (puRine)	T, U or C (pYrimidine)	A or C (aMino)	G, T or U (Keto)	G or C (Strong)	A, T or U (Weak)	A, C, T or U	G, T, U or C	G, A, T or U	G, C or A	G, A, T, U or C
Complement (symbol for the excluded bases)	[H]	[B]	[D]	[V]	[V]	[Y]	[R]	[K]	[M]	[W]	[S]	[G]	[A]	[C]	[U or T]	n/a
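
The table can be expressed as a lookup from each symbol to the set of bases it stands for. Judging from its values, the Complement row appears to list the symbol for the bases a code excludes (for example, G pairs with H, which means "A, C, T or U"), rather than the base-pairing complement. The sketch below encodes the Symbol and Meaning rows and recomputes that row; the names `IUPAC`, `ALL_BASES`, and `set_complement` are assumptions made for this illustration, and T is treated as an alias for U.

```python
# Symbol -> set of bases, taken from the Meaning row (T is folded into U).
IUPAC = {
    "G": frozenset("G"), "A": frozenset("A"), "C": frozenset("C"),
    "U": frozenset("U"), "T": frozenset("U"),
    "R": frozenset("GA"), "Y": frozenset("UC"), "M": frozenset("AC"),
    "K": frozenset("GU"), "S": frozenset("GC"), "W": frozenset("AU"),
    "H": frozenset("ACU"), "B": frozenset("GUC"),
    "D": frozenset("GAU"), "V": frozenset("GCA"),
    "N": frozenset("GAUC"),
}
ALL_BASES = frozenset("GAUC")


def set_complement(symbol: str):
    """Return the symbol for every base NOT covered by `symbol`, or None for N."""
    remaining = ALL_BASES - IUPAC[symbol.upper()]
    if not remaining:
        return None  # N covers all four bases, so nothing is left over
    for candidate, bases in IUPAC.items():
        # Skip the T alias so that a lone U/T is reported as "U".
        if candidate != "T" and bases == remaining:
            return candidate
    return None
```

For example, `set_complement("G")` returns `"H"` and `set_complement("R")` returns `"Y"`, matching the table.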

The AE2 alignment editor (used by the CRW Project) disallows nucleotide edits by default, but it does not prevent digits from being inserted into a sequence. Users should treat any digits that appear as dashes, or discard the affected sequence if converting the digits to dashes is not possible.
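
Stray digits can be found and replaced with a regular expression. This is a minimal sketch; the helper names `has_digits` and `digits_to_dashes` are assumptions, not part of AE2 or any CRW tool.

```python
import re

_DIGIT = re.compile(r"[0-9]")


def has_digits(seq: str) -> bool:
    """Report whether a sequence contains any stray digit."""
    return _DIGIT.search(seq) is not None


def digits_to_dashes(seq: str) -> str:
    """Replace each digit with one dash, preserving the sequence length."""
    return _DIGIT.sub("-", seq)
```

For example, `digits_to_dashes("AC12GU")` returns `"AC--GU"`. A pipeline that cannot apply the replacement could instead use `has_digits` to decide which sequences to discard.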

DIVIDER Sequences:

Sequences that are marked as DIVIDERs are intended to assist with visual inspection of alignments by grouping related sets of sequences together. For rRNA alignments, the DIVIDERs typically delimit taxonomic groups (using the NCBI Taxonomy). For intron alignments, we subdivide by both subgroup and intron position. These DIVIDER sequences should not be used for analysis.
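
Excluding DIVIDER sequences before analysis is a simple filter. How a DIVIDER is marked depends on the alignment file format, which this page does not specify, so the sketch below assumes that records are (name, sequence) pairs and that the word "DIVIDER" appears in the record name. The function name `drop_dividers` and the `marker` parameter are hypothetical.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (sequence name, aligned sequence)


def drop_dividers(records: Iterable[Record], marker: str = "DIVIDER") -> List[Record]:
    """Keep only records whose name does not contain the divider marker."""
    # Assumption: the marker is part of the record name; matching ignores case.
    wanted = marker.upper()
    return [(name, seq) for name, seq in records if wanted not in name.upper()]
```

If the alignments mark DIVIDERs in some other field, the same filter applies with the test moved to that field.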

Everything Else:

Despite our best efforts, we do not catch every odd entry. If you encounter any other anomalies in a sequence from these alignments, please contact us, using a Subject: line of "Alignment Issues." Your feedback is greatly appreciated and will help us keep the alignments and this page useful to the scientific community.

Comparative RNA Web Site and Project The Gutell Lab