Last modified on 26 January 2016.

Literature Reference:

Doshi K.J., et al.
Unpublished (2007).


Publicly-Available Materials:


Documentation Excerpt:

The Comparative Analysis Toolkit (CAT) provides an engine for:

  • Automatically aligning RNA sequences.
  • Evaluating the quality of an RNA sequence alignment.
  • Creating subalignments (using specific selection criteria).
  • Sorting and annotating RNA sequence alignments via phylogenetic relationships.
  • Creating secondary structure diagrams.
  • Calculating base-pair frequencies and nucleotide frequencies for sequences in an RNA sequence alignment.
  • Calculating consensus sequences for RNA sequence alignments.
  • Calculating identity between aligned sequences in an RNA sequence alignment.
  • Searching an RNA sequence alignment using FASTA.
  • Searching an RNA sequence alignment using identity to an aligned sequence.
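The nucleotide-frequency and consensus calculations listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CAT's implementation: the gap symbols (`-`, `.`, `~`), the helper names `column_frequencies` and `consensus`, and the simple majority-rule threshold are all hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical gap symbols; real alignment files may use others.
GAP_CHARS = set("-.~")


def column_frequencies(rows):
    """Return one Counter of nucleotide counts per alignment column, ignoring gaps."""
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("all aligned rows must have the same length")
    return [
        Counter(r[col].upper() for r in rows if r[col] not in GAP_CHARS)
        for col in range(length)
    ]


def consensus(rows, threshold=0.5):
    """Majority-rule consensus (an illustrative rule, not necessarily CAT's).

    A column gets its most common nucleotide if that nucleotide occurs in more
    than `threshold` of all rows, 'N' otherwise, and '-' if it holds only gaps.
    """
    out = []
    for counts in column_frequencies(rows):
        if not counts:
            out.append("-")
            continue
        base, n = counts.most_common(1)[0]
        out.append(base if n / len(rows) > threshold else "N")
    return "".join(out)


aln = ["GCAU-G", "GCAUUG", "GGAU-C"]
print(column_frequencies(aln)[1])  # nucleotide counts for the second column
print(consensus(aln))              # GCAUNG
```

In this sketch the threshold is measured against all rows, including rows with a gap in that column; a tool could instead count only non-gapped rows, which would change the result for sparsely populated columns such as the fifth one here.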

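Identity between aligned sequences, and searching an alignment by identity to an aligned reference sequence, can be illustrated the same way. The definitions in this sketch are assumptions: overlap is taken as the number of columns where both rows hold a nucleotide, and identity as the fraction of those overlapping columns with matching nucleotides. The function names `identity_and_overlap` and `rows_meeting` are hypothetical, and CAT may define or normalize these values differently.

```python
# Hypothetical gap symbols; real alignment files may use others.
GAP_CHARS = set("-.~")


def identity_and_overlap(a, b):
    """Return (identity, overlap) for two rows of the same alignment.

    Assumed definitions for this sketch:
      overlap  = columns where both rows hold a nucleotide
      identity = fraction of overlapping columns whose nucleotides match
    """
    if len(a) != len(b):
        raise ValueError("rows must come from the same alignment")
    overlap = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # a gap in either row: this column does not overlap
        overlap += 1
        if x == y:
            matches += 1
    identity = matches / overlap if overlap else 0.0
    return identity, overlap


def rows_meeting(reference, rows, min_identity, min_overlap=0):
    """Names of rows whose identity and overlap to `reference` meet both cutoffs."""
    hits = []
    for name, seq in rows.items():
        ident, ov = identity_and_overlap(reference, seq)
        if ident >= min_identity and ov >= min_overlap:
            hits.append(name)
    return hits


ref = "GCAU-GCA"
rows = {"s1": "GCAUUGCA", "s2": "GGAU-CC-"}
print(identity_and_overlap(ref, rows["s1"]))      # (1.0, 7)
print(rows_meeting(ref, rows, min_identity=0.9))  # ['s1']
```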

Commands

Section Command Short Description
Application Command
cat Command-line options when launching the CAT application.
Basic Commands
config Load or change CAT configuration options.
batch Execute simple script files containing sets of CAT commands.
exit Exit CAT.
history View a history of commands that have been executed by a user within a given CAT session.
cmdTimer Toggle execution timing for individual commands.
alias View aliases currently registered with a given instance of the CAT application.
Alignment Manipulation Commands
renameRows Rename sequences in an alignment using the format: NCBITaxID.CellLocation.Genus.Species.Ordinal.
clearBuffer Clear temporary alignment results produced by the autoalign command from memory.
changeCurAln Change the "current" alignment. The "current" alignment is the default alignment on which many other commands operate. When CAT is first launched, the first alignment loaded is set as current by default (see the loadAlignment command).
swapRows Swap a temporary result for a given row in the "current" alignment (e.g., an autoalign command result) with the actual contents of the same row in the "current" alignment. Note: This command is not fully tested and should not be relied upon.
loadAlignment Load an alignment into memory.
closeAlignments Close alignment(s) already loaded in memory.
saveAlignment Save an entire alignment already loaded in memory to a specified file, or create a subalignment from an alignment already loaded in memory.
listAlignments List the names and sizes of all alignments loaded in memory.
listSequences List the names and sizes of all sequences in the "current" alignment.
viewAlignment View an entire alignment or selected rows from an alignment at the command line. Note: This command works best when the CAT application is launched in a command terminal without automatic line wrapping.
Alignment Search and Selection Commands
selectRowsWithLim
  • Select specific rows from an alignment with different criteria:
    • cell location
    • taxonomy
    • identity (to a reference sequence)
    • complete (as annotated in the CRWDB)
  • Create subalignments with specified rows (AE2 or FASTA format).
  • Sort sections of an alignment or a whole alignment according to taxonomy.
  • Insert taxonomy divider rows within the alignment.
searchAlignments Perform a FASTA search with a given sequence over multiple alignments loaded in memory. Searches can encompass the entire sequence, or fragments defined by specific selection criteria.
Alignment Generation Commands
autoalign Align multiple sequences using an already aligned sequence as a template.
fullAlignment Have CAT automatically select a set of potential templates from the aligned sequences (using FASTA) and then autoalign a specified sequence against each potential template.
findQueries Have CAT select a set of unaligned sequences (using FASTA) and then autoalign each unaligned sequence against a specified template sequence.
Analysis Commands
evaluate Check the accuracy of the alignment of a given sequence or set of sequences using sequence-based and structure-based criteria. Percent complete and length can be calculated for the evaluated sequence(s), and the CRWDB updated if desired.
evalRunner A wrapper around the evaluate command. Its primary purposes are to:
  • Provide a simplified syntax for evaluating a set of aligned sequences against a single sequence in the alignment.
  • Directly update the CRWDB with computed values for Percent complete and Sequence Length when evaluation results meet specific thresholds.


Users should prefer the evalRunner command unless they need specific features of the evaluate command that are not available in evalRunner.
consensus Calculate the consensus for either all or a selected subset of sequences from the "current" alignment. The consensus can be calculated across either all or a specified subset of columns from the "current" alignment.
identity Calculate the pairwise identity and overlap between sequences in a given alignment that are already aligned. This command can also search an alignment for sequences that have a given identity and/or overlap to a specified reference sequence.
calcBPFreqs Compute the base-pair frequency data in an alignment, across a given row set, given a reference set of secondary structure base-pairings.
calcNTFreqs Compute the nucleotide frequencies for all columns in an alignment, across a given row set.
calcInDels Count insertion/deletion (indel) events for a specified sequence relative to a given reference sequence.
seqComps Compute the nucleotide compositions of sequences in an alignment.
checkAln Check the accuracy of the alignment of a sequence computed by the autoalign command, assuming the given sequence is already correctly aligned in the alignment. Note: This command is useful for regression testing new parameters for the autoalign command.
Structure Manipulation/Generation Commands
loadStructData Map secondary structure pairings for a specific row in an alignment.
templateDiagram Use this command to create a new secondary structure diagram (XRNA format) using an existing diagram as a template.
projectPairings Use this command to create new secondary structure pairing sets by projecting an existing set of secondary structure base pairs across a set of sequences. The pairing sets are output in BPSEQ, CT, RNAml, Bracket and Alden formats.
GenBank Commands
checkAccGB Use this command to check whether a sequence in an alignment also exists in GenBank. The check uses the NCBI Accession Number stored in the CRWDB for the sequence. If the sequence exists in GenBank, the command can update the Available field in the CRWDB. Note: This command uses NCBI eUtils and Web Services to query GenBank remotely, in contrast to the rest of the commands in this section, which require a local copy of GenBank.
checkGBHits Use this command to cross-check against the CRWDB a set of sequences identified by searching a copy of GenBank that is co-located on the same server from which CAT is invoked.
getGBEntries Use this command to retrieve GenBank entries by locus from a copy of GenBank that is co-located on the same server from which CAT is invoked.
manageGenbank Use this command to manipulate a copy of GenBank that is co-located on the same server from which CAT is invoked.
searchGenbank Use this command to search, using FASTA, a copy of GenBank that is co-located on the same server from which CAT is invoked.
alnGBSearch Use this command to create a new AE2-formatted alignment from a set of GenBank entries obtained with the searchGenbank command.
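The calcBPFreqs command (under Analysis Commands) computes base-pair frequencies across a row set, given a reference set of secondary structure base-pairings. A rough Python sketch of that idea follows. It assumes the reference pairings are supplied as a simple dot-bracket string over alignment columns (loosely modeled on the Bracket format that projectPairings can emit, whose exact conventions may differ), and the function names and gap symbols are hypothetical. It returns raw counts per base-pair type; dividing each count by the total for that pairing gives frequencies.

```python
from collections import Counter

# Hypothetical gap symbols; real alignment files may use others.
GAP_CHARS = set("-.~")


def pairs_from_bracket(structure):
    """Convert a dot-bracket string over alignment columns to sorted (i, j) pairs."""
    stack, pairs = [], []
    for k, ch in enumerate(structure):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {k}")
            pairs.append((stack.pop(), k))
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return sorted(pairs)


def bp_frequencies(rows, pairs):
    """Count base-pair types (e.g. 'GC', 'AU') at each reference pairing.

    Rows with a gap in either paired column are skipped for that pairing.
    Returns {(i, j): Counter({...})}.
    """
    result = {}
    for i, j in pairs:
        counts = Counter()
        for r in rows:
            x, y = r[i].upper(), r[j].upper()
            if x in GAP_CHARS or y in GAP_CHARS:
                continue
            counts[x + y] += 1
        result[(i, j)] = counts
    return result


aln_rows = ["GGAAACC", "GCAAAGC", "AG-AACU", "G-AAACC"]
ref_structure = "((...))"
freqs = bp_frequencies(aln_rows, pairs_from_bracket(ref_structure))
for pair, counts in freqs.items():
    total = sum(counts.values())
    print(pair, {bp: n / total for bp, n in counts.items()})
```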