CRW Site RDBMS Help

Last modified on 06 June 2014.

Item

Description

Fields of defined information (abbreviation):

Organism
Phylogeny
Common Name
Cell Location (L)
RNA Type (RT)
RNA Class (RC)
Sequence Length (Seq.Length, Size)
% Complete (Cmp)
Accession Number (Acc.Number, AccNum)
Intron Number (IN)
Intron Position (IP)
Exon (EX)
Open Reading Frame (ORF, O)
Secondary Structure Diagrams (Sec. Structures, StrDiags)
Comment
Results/Page
Color Display

Organisms are most commonly named here with the binomial system of nomenclature, "Genus species." Some organism names contain subspecies, strain, or operon information.

Examples:

Homo sapiens
Escherichia coli
Chlorella ellipsoidea (strain IAM C-87)
Campylobacter sputorum subsp. sputorum
Treponema pallidum (rRNA A)

Phylogenetic Classification, m

The organism names available here are classified into the three primary phylogenetic domains [Woese C.R., Kandler O., and Wheelis M.L. (1990). Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci USA 1990 87:4576-4579].

This phylogenetic classification scheme is maintained by the National Center for Biotechnology Information (NCBI), and can be found at their Taxonomy Browser.

Common Name

The common name database is maintained by the National Center for Biotechnology Information (NCBI) and can be found at their Taxonomy Browser. Upper-case names are common names specific to that "Genus species", while lower-case names apply to a broader group such as a genus, family, or order.

Selected common name groupings have been predefined and made available via the "Animals," "Fungi&Plants," and "Protists" buttons.

L: Cell Location

Cellular location of the RNA:

(C) Chloroplast
(Y) Cyanelle
(M) Mitochondrion
(N) Nucleus
(V) Virus

RT: RNA Type

Currently, this database actively maintains two RNA types: Introns (primarily group I introns) and Ribosomal RNA (primarily 16S and 23S rRNA). Transfer RNA (tRNA) data will be added in the future.

I = Intron
R = ribosomal RNA (rRNA)
T = transfer RNA (tRNA)

RC: RNA Class

rRNA Classes:

16S = 16S or 16S-like rRNA (INCLUDING mitochondrial 12S rRNA)
23S = 23S or 23S-like rRNA (INCLUDING mitochondrial 16S rRNA)
5S = 5S rRNA

tRNA Classes:

trnA = alanine tRNA
trnC = cysteine tRNA
trnD = aspartic acid tRNA
trnE = glutamic acid tRNA
trnF = phenylalanine tRNA
trnG = glycine tRNA
trnH = histidine tRNA
trnI = isoleucine tRNA
trnK = lysine tRNA
trnM = methionine tRNA
trnN = asparagine tRNA
trnP = proline tRNA
trnQ = glutamine tRNA
trnR = arginine tRNA
trnT = threonine tRNA
trnV = valine tRNA
trnW = tryptophan tRNA
trnY = tyrosine tRNA

The introns available here are divided into groups I and II. The group I introns are further divided into several subgroups, as initially defined by Michel and Westhof [Michel F. and Westhof E. (1990). Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis. Journal of Molecular Biology 216:585-610.]

Intron Classes:

I = Group I intron, unknown subgroup
- IA = Group I intron, A subgroup
  - subgroups IA1, IA2, IA3
- IB = Group I intron, B subgroup
  - subgroups IB1, IB2, IB3, IB4
- IC = Group I intron, C subgroup
  - subgroups IC1, IC2, IC3
- ID = Group I intron, D subgroup
- IE = Group I intron, E subgroup
II = Group II intron, unknown subgroup
- subgroups IIA, IIB
Unclassified = unclassified introns

Ex: Exon (the gene in which the intron is located)

Genes and their abbreviations:

R = rRNA
- 16S = 16S rRNA
- 23S = 23S rRNA
- 5S = 5S rRNA
T = tRNA
- trnA = tRNA-Alanine
- trnC = tRNA-Cysteine
- trnD = tRNA-Aspartate
- trnE = tRNA-Glutamate
- trnF = tRNA-Phenylalanine
- trnG = tRNA-Glycine
- trnH = tRNA-Histidine
- trnI = tRNA-Isoleucine
- trnK = tRNA-Lysine
- trnL = tRNA-Leucine
- trnM = tRNA-Methionine
- trnN = tRNA-Asparagine
- trnP = tRNA-Proline
- trnQ = tRNA-Glutamine
- trnR = tRNA-Arginine
- trnS = tRNA-Serine
- trnT = tRNA-Threonine
- trnV = tRNA-Valine
- trnW = tRNA-Tryptophan
- trnY = tRNA-Tyrosine
Proteins
- A6 = ATPase 6
- A9 = ATPase 9
- CB = COB or CYTB (cytochrome b)
- ClP = clpP (protease)
- DP = DNA Polymerase
- L16 = rpl16 (ribosomal protein L16)
- L2 = rpl2 (ribosomal protein L2)
- LtrB = relaxase
- ND1 = ND1 (NADH dehydrogenase subunit 1)
- ND2 = ND2 (NADH dehydrogenase subunit 2)
- ND3 = ND3 (NADH dehydrogenase subunit 3)
- ND4 = ND4 (NADH dehydrogenase subunit 4)
- ND4L = ND4L (NADH dehydrogenase subunit 4L)
- ND5 = ND5 (NADH dehydrogenase subunit 5)
- ND7 = ND7 (NADH dehydrogenase subunit 7)
- nrdB = nrdB (ribonucleoside diphosphate reductase B)
- nrdD = nrdD (ribonucleoside triphosphate reductase D)
- OX1 = cytochrome oxidase subunit 1
- OX2 = cytochrome oxidase subunit 2
- OX3 = cytochrome oxidase subunit 3
- pB = petB (apocytochrome b6)
- pD = petD
- psaB = psaB (chlorophyll a apoprotein A2)
- psbA = psbA (chloroplast psbA, photosystem II 32 kD protein)
- psbC = psbC (chlorophyll a binding protein)
- rbcL = ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit
- RPC1 = rpoC1 (RNA polymerase C1)
- S10 = rps10 (ribosomal protein S10)
- S12 = rps12 (ribosomal protein S12)
- S14 = rps14 (ribosomal protein S14)
- S16 = rps16 (ribosomal protein S16)
- td = thymidylate synthetase
- terL = terL (bacteriophage LL-H terminase, large subunit)
- YCF3 = ycf3

IN: Intron Number

This number is assigned to an intron to easily distinguish between multiple introns that occur within the same gene. If the gene contains only one intron, the assigned number is 1 (one).

IP: Intron Position

This number indicates the intron's position in a 16S or a 23S rRNA molecule (the value refers to the nucleotide number in the corresponding Escherichia coli molecule). This number is helpful for comparing the occurrences of introns across sequences.

Size: Sequence Length

Length of the sequence (in nucleotides). [NOTE: only sequences that are more than 90% complete and publicly available are currently included in this RDBMS.]

Cmp: % Complete

Completeness of the sequence (as a percentage). [NOTE: only sequences that are more than 90% complete and publicly available are currently included in this RDBMS.]

O: Open Reading Frame

Indicates the presence or absence of an Open Reading Frame of at least 500 nucleotides in an intron sequence. More details about the designations are available.

Y = yes
N = no
U = unknown

AccNum: Accession Number

GenBank Accession Number [NCBI-National Center for Biotechnology Information]. Some sequences are assigned multiple accession numbers. By default, the RDBMS output displays only a single accession number for a sequence. If more accession numbers exist, an "m" appears next to the visible accession number. Click on the "m" to see all of the accession numbers associated with that row. Click on the "s" to return to a single visible accession number.

Only View Records w/Sec. Struct. Diagrams

Search exclusively for sequence records with Secondary Structure diagrams.

NOTE: Our secondary structures are available in several formats: PostScript, PDF, and a simple text file containing the sequence and base pairing information. The structure diagrams can be obtained by either browsing the database or using the Data Retrieval Page to download many files at once.

StrDiag: Secondary Structure Diagrams

RDBMS Display of Secondary Structure Diagram Information:

The RDBMS presents an abbreviated form of each secondary structure's full file name to conserve screen space. The following abbreviations are used:

Diagram Version: the meaning of the letter is dependent upon the molecule, as follows:

CURRENT = current model version.
Internal = internal version of the diagram that has not been released.
Old = older diagram version that needs to be updated.
===== = not applicable.

Version Letter   5S rRNA   16S rRNA   23S rRNA   Group I Intron   Group II Intron
a                Old       Old        Old        Old              CURRENT
b                CURRENT   Old        Old        CURRENT          =====
c                =====     Internal   =====      =====            =====
d                =====     CURRENT    CURRENT    =====            =====

Molecule:

5: 5S rRNA.
16: 16S rRNA.
23.5: 23S rRNA, 5' half (235 in file names).
23.3: 23S rRNA, 3' half (233 in file names).
I1: Group I intron.
I2: Group II intron.

Legend for Secondary Structure Diagrams:

Base Pair Symbols:
- connecting line: canonical pairs (A:U, U:A, C:G, and G:C).
- small closed circle: G:U and U:G pairs.
- large open circle: A:G and G:A pairs.
- large closed circle: all other non-canonical pairs.
Base Pair Symbol Colors:
- red: strongest correlations.
- green: good correlations.
- black: weak but significant correlations.
- gray: invariant pairs.
- blue: pair not detected using the collective scoring method.
For Reference structure diagrams only: every 10th nucleotide is marked with a tick mark, and every 50th nucleotide is numbered.
Secondary structure diagrams were drawn with the program XRNA, which was developed by B. Weiser and H. Noller at the University of California, Santa Cruz.

More About Secondary Structure File Names:

Filenames for the structure diagrams have the following syntax:

[version].[molecule].[phylogenetic domain OR organelle].[organism].[intron class].[exon].[intron position (rRNA) OR number (other)].[file type]

The intron-specific fields ([intron class], [exon], and [intron position (rRNA) OR number (other)]) appear only in intron file names.

The "intron position (rRNA) OR number (other)" field is defined based upon the exon:

intron position: refers to the rRNA position (E. coli numbering) after which the intron occurs.
intron number: indicates multiple introns within a single exon (non-rRNA). Introns are numbered i1, i2, ..., in.

File name examples:

b.233.m.P.anserina.ps is version "b" 23S rRNA (3' half) mitochondrial structure from Podospora anserina, in PostScript format.
a.I1.e.T.usneae.C1.SSU.1512.bpseq is a version "a" Group IC1 intron structure from position 1512 of the nuclear 16S rRNA of Trebouxia usneae, in formatted text.
a.I1.y.C.paradoxa.C3.tLEU.pdf is a version "a" Group IC3 intron structure from the Cyanophora paradoxa cyanelle leucine tRNA gene, in PDF format.
a.I1.m.S.cerevisiae.B1.OX1.i4.ps is a version "a" Group IB1 intron structure from the Saccharomyces cerevisiae cytochrome oxidase subunit 1 gene in PostScript format; the intron described is the fourth intron in this cytochrome oxidase subunit 1 gene.
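The file name syntax above can be split into its fields programmatically. The following Python function is a minimal sketch, not part of the CRW Site: the function name and dictionary keys are hypothetical, and it assumes (based only on the four examples above) that the organism always occupies two dot-separated tokens, such as "P.anserina".

```python
def parse_structure_filename(name):
    """Split a structure diagram file name into its documented fields.

    Assumption (not stated in the help text): the organism always occupies
    two dot-separated tokens, e.g. "P.anserina".
    """
    parts = name.split(".")
    if len(parts) < 6:
        raise ValueError(f"too few fields in file name: {name!r}")

    # Intron-only fields sit between the organism and the file type.
    intron_fields = parts[5:-1]
    if len(intron_fields) > 3:
        raise ValueError(f"too many fields in file name: {name!r}")
    # Pad so that missing intron fields are reported as None.
    intron_fields += [None] * (3 - len(intron_fields))

    return {
        "version": parts[0],
        "molecule": parts[1],
        "domain_or_organelle": parts[2],
        "organism": ".".join(parts[3:5]),
        "intron_class": intron_fields[0],
        "exon": intron_fields[1],
        "position_or_number": intron_fields[2],
        "file_type": parts[-1],
    }


if __name__ == "__main__":
    for example in ("b.233.m.P.anserina.ps",
                    "a.I1.m.S.cerevisiae.B1.OX1.i4.ps"):
        print(parse_structure_filename(example))
```

For rRNA file names the three intron fields come back as None, which mirrors the rule that those fields appear only in intron file names.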

For a more detailed list, see "Codes used in Structure File Names."

(.PS): PostScript

These secondary structure files are in PostScript format. PostScript files can be viewed with several different PostScript previewers (such as GhostScript). A GhostScript FAQ is available. A general description of PostScript is available in the document "A First Guide to PostScript".

(.PDF): Portable Document Format

Graphics files in this format can be viewed with the program Adobe Reader.

(.BPSEQ): BasePair/SEQuence Information

Basepair and sequence information is presented in a text-based format. Each row presents information for one nucleotide in the sequence; the first field is the position number, the second field is the nucleotide at that position, and the third field contains the position number of its basepair partner (or 0 when unpaired). All secondary and tertiary basepairs are represented in this file, but base triples are not.
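As an illustration of the three-field layout described above, here is a minimal Python sketch of a BPSEQ reader. The function name is hypothetical, the toy hairpin is invented for the example, and two things are assumed rather than documented here: fields are separated by whitespace, and any line that does not consist of exactly "number, nucleotide, number" (for example a header line) can be skipped.

```python
def parse_bpseq(text):
    """Return (sequence, partners) from BPSEQ-formatted text.

    partners[i - 1] is the 1-based position paired with position i,
    or 0 when position i is unpaired.
    """
    sequence = []
    partners = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: non-data lines (e.g. headers) do not match this shape.
        if len(fields) != 3 or not fields[0].isdigit() or not fields[2].isdigit():
            continue
        position, base, partner = int(fields[0]), fields[1], int(fields[2])
        if position != len(sequence) + 1:
            raise ValueError(f"positions are not consecutive at {position}")
        sequence.append(base)
        partners.append(partner)

    # Each pair must be listed from both sides (i -> j and j -> i).
    for i, j in enumerate(partners, start=1):
        if j and (j > len(partners) or partners[j - 1] != i):
            raise ValueError(f"inconsistent pairing at position {i}")
    return "".join(sequence), partners


if __name__ == "__main__":
    toy = "\n".join([
        "Filename: toy.bpseq",  # invented header line, skipped by the parser
        "1 G 9", "2 G 8", "3 G 7",
        "4 A 0", "5 A 0", "6 A 0",
        "7 C 3", "8 C 2", "9 C 1",
    ])
    print(parse_bpseq(toy))
```

The symmetry check relies on the statement above that base triples are not represented, so every paired position has exactly one partner.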

For 23S rRNA structures, a single BPSEQ file that contains all of the pairing information is available. Both the "235" and "233" links point to this single file.

Secondary structure diagrams were interactively generated with the X-Windows-based graphics program "XRNA", developed by B. Weiser and H. Noller, University of California at Santa Cruz.

(.ALDEN): ALDEN secondary structure format

The Alden format (named for its developer, Matthew Alden) is a representation of RNA secondary structure. It groups each position in a sequence into a structural element segment, relates those segments to their parent structural elements, and shows the connectivity between structural elements. Each nucleotide in the sequence maps to one and only one structural element segment.

The current version of the Alden format presents each structural element segment, in 5' to 3' order, on one line containing 53 comma-delimited values (many of which are empty; this is a sparse format). See "Fields" (below) for details.

The example provided below is from Escherichia coli 5S rRNA (GenBank Accession Number V00336). It contains examples of six of the eight types of structural element (it lacks HELIX-KNOT and FREE).

Structure Element Types (8):

HELIX (2 segments): a set of contiguous basepairs.
HELIX-KNOT (2 segments): a helix that forms a pseudoknot in the secondary structure.
HAIRPIN (1 segment): a loop that caps the end of a single helix.
BULGE (1 segment): a loop that connects two helices, without a corresponding loop on the other side of the helices.
INTERNAL (2 segments): a pair of loops that connects two helices, with one loop on each side of the helices.
MULTISTEM (3 or more segments*): a set of loop segments that connects three (or more) helices into a single junction.
FREE (1 segment): an unpaired segment that typically links two domains.
TAIL (1 segment): an unpaired segment that contains one end of the molecule.

Fields:

1. First nucleotide (number) of the segment.
2. Last nucleotide (number) of the segment.
3. Type of structural element.
4. Unique ID of structural element.
5. ID of the structural element preceding the current structural element. (N.B.: this is based upon the first segment of the structural element. N.B.: the first structural element has no preceding element.)
6-37. Pairs of nucleotide numbers that indicate the first and last nucleotide numbers of each segment in the structural element.
38-53. ID of each structural element adjacent to the current structural element.
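The field layout above maps directly onto a small parser. The Python sketch below is illustrative only: the function name and dictionary keys are hypothetical, the sample line is invented (it is not taken from the Escherichia coli 5S rRNA example), and element IDs are kept as plain strings because their exact form is not specified here.

```python
def parse_alden_line(line):
    """Parse one Alden-format line (53 comma-delimited values)."""
    values = [v.strip() for v in line.rstrip("\n").split(",")]
    if len(values) != 53:
        raise ValueError(f"expected 53 values, found {len(values)}")

    # Fields 6-37: up to 16 (first, last) nucleotide pairs, one per segment.
    segments = []
    for k in range(5, 37, 2):
        first, last = values[k], values[k + 1]
        if first and last:
            segments.append((int(first), int(last)))

    # Fields 38-53: IDs of adjacent structural elements (empty slots dropped).
    adjacent_ids = [v for v in values[37:53] if v]

    return {
        "first": int(values[0]),
        "last": int(values[1]),
        "type": values[2],
        "id": values[3],
        "preceding_id": values[4] or None,  # the first element has none
        "segments": segments,
        "adjacent_ids": adjacent_ids,
    }


if __name__ == "__main__":
    # Invented hairpin record: 7 filled values, 30 empty segment slots,
    # 1 adjacent ID, and 15 empty adjacency slots (53 values in total).
    sample = ",".join(["35", "47", "HAIRPIN", "H3", "H2", "35", "47"]
                      + [""] * 30 + ["H2"] + [""] * 15)
    print(parse_alden_line(sample))
```

Because the format is sparse, the sketch drops empty slots instead of returning 16 fixed-length entries for the segment and adjacency fields.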

Example (click image for full-size version):

(.RNAML): RNA Markup Language

RNAML is a markup language syntax designed for the exchange of RNA sequence and structure information. The official documentation for RNAML is located at the RNAML website.

The (.NOPBPSEQ) variant format excludes pseudoknot helices (to support programs which cannot accommodate pseudoknots).

(.CT): Connect Table

(.NOPCT): NO Pseudoknots CT

The Connect Table format was designed to relate sequence and base pair information as input to structure-drawing programs. This text-based format presents the nucleotide number, nucleotide, 5' neighbor, 3' neighbor, pairing partner number (0 if unpaired), and original sequence nucleotide number (same as the nucleotide number for all files that include only one sequence). The official documentation for the CT format is in the Mfold manual.
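To make the six columns concrete, the Python sketch below builds CT-style rows from a sequence and a list of pairing partners. The function name is hypothetical, and two details are assumptions rather than statements from this help page: the 3' neighbor of the last nucleotide is written as 0, and the header line that CT files normally carry is omitted.

```python
def ct_rows(sequence, partners):
    """Build the six CT columns for a single sequence.

    partners[i - 1] is the 1-based pairing partner of position i (0 if unpaired).
    """
    n = len(sequence)
    if len(partners) != n:
        raise ValueError("sequence and partners must have the same length")

    rows = []
    for i, (base, partner) in enumerate(zip(sequence, partners), start=1):
        five_prime = i - 1                   # 0 for the first nucleotide
        three_prime = i + 1 if i < n else 0  # assumption: 0 marks the 3' end
        # The last column repeats i because the file holds only one sequence.
        rows.append((i, base, five_prime, three_prime, partner, i))
    return rows


if __name__ == "__main__":
    # Invented toy hairpin: positions 1-3 pair with positions 9-7.
    for row in ct_rows("GGGAAACCC", [9, 8, 7, 0, 0, 0, 3, 2, 1]):
        print("\t".join(str(value) for value in row))
```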

The (.NOPCT) variant format excludes pseudoknot helices (to support programs which cannot accommodate pseudoknots).

(.BRACKET): Vienna Dot-BRACKET Notation

Dot-Bracket Notation represents pseudoknot-free structures as a string of parentheses (open and close for the 5' and 3' nucleotides of a base pair, respectively) and periods (unpaired nucleotides). A formal description of Dot-Bracket Notation is located at the Vienna RNA Webservers Help Page.
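The mapping from base pairs to parentheses and periods can be sketched in a few lines of Python. The function name is hypothetical and the input is a list of 1-based pairing partners (0 for unpaired), the same information carried by the third BPSEQ field. The sketch rejects crossing pairs, since plain dot-bracket strings cannot express pseudoknots.

```python
def to_dot_bracket(partners):
    """Convert 1-based pairing partners (0 = unpaired) to dot-bracket."""
    symbols = []
    open_positions = []  # stack of 5' positions still waiting for a partner
    for i, j in enumerate(partners, start=1):
        if j == 0:
            symbols.append(".")
        elif j > i:
            # i is the 5' nucleotide of the pair.
            open_positions.append(i)
            symbols.append("(")
        else:
            # i is the 3' nucleotide; it must close the most recent open pair.
            if not open_positions or open_positions.pop() != j:
                raise ValueError(f"pair {j}:{i} crosses another pair (pseudoknot)")
            symbols.append(")")
    if open_positions:
        raise ValueError("unclosed base pairs remain")
    return "".join(symbols)


if __name__ == "__main__":
    # Invented toy hairpin: positions 1-3 pair with positions 9-7.
    print(to_dot_bracket([9, 8, 7, 0, 0, 0, 3, 2, 1]))
```

The stack check is what enforces the pseudoknot-free requirement: a closing nucleotide whose partner is not the most recently opened position indicates crossing pairs.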

Results / Page

Number of sequence and/or secondary structure diagram records per page. The larger the number, the greater the time required to create and display the HTML page.

All Data Fields:

SEARCH ALL: Search for words or characters that occur in any attribute field.

Search Value (Examples)

Here is where the user specifies what they are searching for. Since this is a relational database, specifying certain attributes will limit the values for other attributes. A few examples of searches are listed here.

Find all "Escherichia coli" records.
- In the Organism search value, type "E", then click on the "V" button to the right. "Escherichia coli" and many other organism names appear in the right frame values window. Click on "Escherichia coli". This name appears in the organism white box. Click the submit button.
- NOTE: These entries are case-sensitive; thus, "escherichia" doesn't work.
- The resulting information table shows the records that satisfy this search. The number of records, search strategy, and sort order are listed in the frame at the bottom of the page.
- Clicking on a field name (e.g. AccNum) results in information about that field in the help browser window.
- Selecting an accession number (e.g. M87049) results in a new window with that GenBank entry.
- Initially only one GenBank (GB) number is shown. To the right of the accession number is an "m" for those records that have more than one GB entry. Clicking on the "m" results in a complete list of GB entries for that record. Clicking on the "s" returns the screen to a single GB entry.
- Available Secondary Structure diagrams are listed in the "StrDiag" field. The PostScript, PDF, or sequence and basepair information is displayed in a new window.
- Access to a detailed phylogenetic classification is provided at the right for each record.
- NOTE: the "edit query" provided in our web page should be used to return to the "Search RNA Information" page from the "List Records" page (do not use the browser back button while using this RNA Web DB).

Search for introns that occur at position 2449 in 23S rRNA.
- RNA Type = intron RNA (click off Ribosomal RNA)
- Exon=23S (click on rRNA, then V in Exon field, click on 23S in values frame)
- Intron Position In rRNA=2449 (V, obtain list of 23S intron positions in the values frame, click on 2449)

Search for all Mitochondrial group I introns and rRNAs, with a minimal amount of paging of the resulting tables.
- Cell Location = Mitochondrion
- Results / Page = 400

S: Sort Order

Sorts the output by each attribute (a, b, c, ... -> z, 1, 2, 3, ... -> $). The user can specify the order in which the attributes are sorted. By default, "Phylogenetic Classification" is sorted first, followed by "Organism", "Cell Location", and then "RNA Class". For your convenience, the default sort order is displayed on the buttons in the Sort Order column. This order can be changed by clicking on the buttons in the Sort Order column. The first button clicked is sorted first, the second button clicked is sorted second, and so forth. The "Sort Reset" button restores the original default sort order.

R: Reverse Sort Order

The order of a sort for each sortable attribute can be inverted (e.g. z,y,x, ... -> a instead of a,b,c, ... -> z) by clicking on the appropriate button.
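The combination of a user-chosen attribute order and per-attribute reversal behaves like a multi-key sort. The Python sketch below shows one way to reproduce that behavior on downloaded records; it is not the site's implementation. The function name, the record keys, and the sample records are all hypothetical, and the sketch uses Python's default string ordering, not the letters-before-digits collation described above.

```python
def sort_records(records, sort_keys, reversed_keys=()):
    """Sort records by several attributes, first key having highest priority.

    Attributes listed in reversed_keys are sorted in descending order.
    """
    result = list(records)
    # Python's sort is stable, so sorting by the lowest-priority key first
    # and the highest-priority key last yields a correct multi-key sort.
    for key in reversed(sort_keys):
        result.sort(key=lambda record: record[key], reverse=key in reversed_keys)
    return result


if __name__ == "__main__":
    records = [
        {"organism": "Homo sapiens", "cell_location": "N", "rna_class": "23S"},
        {"organism": "Homo sapiens", "cell_location": "M", "rna_class": "16S"},
        {"organism": "Chlorella ellipsoidea", "cell_location": "C", "rna_class": "16S"},
    ]
    ordered = sort_records(records, ["organism", "cell_location"],
                           reversed_keys={"cell_location"})
    for record in ordered:
        print(record["organism"], record["cell_location"], record["rna_class"])
```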

V: Values

Shows possible values for that attribute in the right frame window. A specific value is inserted into the search value box when that value in the right frame is clicked.

Comment:

May include other information about the sequence which does not fall into another category.

Color Coding:

Enabling this option causes the output of your search to display all entries for one organism with the same background color.


Comparative RNA Web Site and Project The Gutell Lab