RNA Parse

Primitive Operations and General Terms Used to Describe RNA Secondary Structure

James F. Lynn May 2009

RNA, as a single-stranded, flexible, linear molecule, is able to fold back on itself to form a variety of interesting and functional secondary and tertiary shapes. Many of these RNA configurations are exceedingly difficult to describe mathematically. This is an introduction to the terms and nomenclature used to represent all possible configurations, both real and hypothetical, that a linear sequence of RNA may form through interactions with itself only.


RNA is composed of four nucleotides, adenine, guanine, cytosine and uracil, denoted A, G, C and U respectively. For purposes of this discussion, RNA can generally be thought of as a single, linear string of these molecules that essentially mirrors one strand of its two-stranded DNA counterpart and is read from the “5'” position to the “3'” position. Standard notation for a 10-nucleotide RNA segment may thus be represented as follows:


Due to certain molecular properties of the individual nucleotides, the somewhat flexible RNA strand often folds back on itself to form secondary and tertiary structures according to the following rules: A tends to bond with U, and G with C. There are exceptions to these rules, as the folded strand may force additional bonding when one part of the molecule is brought into close proximity with another part. These additional rules will be ignored herein to keep the discussion simple.
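The simple bonding rules can be written down as a small lookup. This is only a sketch: the names `PAIRS` and `can_pair` are illustrative and not part of any existing tool, and the exceptions mentioned above are deliberately left out.

```python
# Simple bonding rules from the text: A bonds with U, and G bonds with C.
# Each rule is stored in both directions, giving four ordered pairs.
# Exceptions forced by folding are intentionally not modelled here.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_pair(x: str, y: str) -> bool:
    """Return True if nucleotides x and y may bond under the simple rules."""
    return (x.upper(), y.upper()) in PAIRS


if __name__ == "__main__":
    print(can_pair("A", "U"))  # a pair allowed by the rules
    print(can_pair("A", "G"))  # a pair not allowed by the rules
```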





Hence, with few exceptions, these 4 rules of bonding act to guide the RNA strand into a variety of secondary and tertiary shapes that interact in the context of both themselves and their biological environment (the living cell).

The simplest secondary RNA structure is the stem-loop, in which the linear strand is folded back on itself. As shown here, a single RNA strand may fold back on itself to form a complex system of stems and loops, all according to the simple rules of nucleotide-to-nucleotide bonding previously mentioned.

Of greater complexity, the loop portion of a stem-loop may bond with nucleotides outside of itself to form the RNA pseudoknot, which is not only interesting in terms of biological activity but is also of great interest to those who seek to describe it mathematically.

The “dot-bracket” notation is commonly used to help describe RNA secondary structure, where bracketed nucleotides denote “base pairs” and a dot or colon denotes unbound nucleotides.

In theory, if the loops of the pseudoknot above are flexible enough, they could also interact with one another to form a third bond:

Fig2. Pseudoknot.  AAAAGGGGUUUUCCCC  (((([[[[))))]]]]
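A dot-bracket string such as the one in Fig2 can be turned into a list of base pairs by keeping one stack per bracket type, so that the crossing "()" and "[]" stems of a pseudoknot do not interfere with each other. The following is a minimal sketch under that assumption; the function name `parse_dot_bracket` is illustrative, and the "{}" bracket is not handled here.

```python
# Bracket types handled by this sketch; dots and colons mark unbound bases.
OPENERS = {"(": ")", "[": "]"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}
UNPAIRED = {".", ":"}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return sorted (i, j) index pairs (0-based) of bonded positions."""
    stacks = {open_: [] for open_ in OPENERS}  # one stack per bracket type
    pairs = []
    for i, ch in enumerate(structure):
        if ch in OPENERS:
            stacks[ch].append(i)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            pairs.append((stack.pop(), i))
        elif ch not in UNPAIRED:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    sequence = "AAAAGGGGUUUUCCCC"
    structure = "(((([[[[))))]]]]"
    for i, j in parse_dot_bracket(structure):
        print(i, sequence[i], "-", sequence[j], j)
```

Run on the Fig2 example, every "()" pair joins an A to a U and every "[]" pair joins a G to a C, in agreement with the bonding rules.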

Fig3. Unusual RNA “crossing loop pseudoknot” where the bases c have interacted with one another as tandem complements.

Other basic variations of the stem-loop include the “bulge”, which may appear anywhere along a stem where complement bases are not present and is denoted by a colon:

Fig4. Stem-loop with bulge.  ((((((((:::::::::::))))):)))
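Since the loop and the bulge in Fig4 are both written with colons, they can only be told apart by their neighbours. The sketch below assumes a simplified rule: an unbound run sitting between "(" and ")" is treated as the loop, and a run with the same bracket on both sides is treated as a bulge. The function name `classify_unpaired_runs` is illustrative, only "()" brackets are considered, and more complex cases (internal loops, dangling ends) are simply labelled "other".

```python
import re


def classify_unpaired_runs(structure: str) -> list[tuple[str, int, int]]:
    """Return (kind, start, length) for each run of unbound nucleotides."""
    results = []
    for match in re.finditer(r"[.:]+", structure):
        start, end = match.span()
        left = structure[start - 1] if start > 0 else ""
        right = structure[end] if end < len(structure) else ""
        if left == "(" and right == ")":
            kind = "loop"    # the strand turns back on itself here
        elif left == right and left in ("(", ")"):
            kind = "bulge"   # unbound bases interrupting one side of a stem
        else:
            kind = "other"   # not classified by this simplified rule
        results.append((kind, start, end - start))
    return results


if __name__ == "__main__":
    for kind, start, length in classify_unpaired_runs("((((((((:::::::::::))))):)))"):
        print(kind, start, length)
```

For the Fig4 string this reports one "loop" run followed by a single-nucleotide "bulge" run.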

A type of hypothetical structure whose absence from the literature is conspicuous is the non-palindromic tandem-complement loop. Up until now, all RNA secondary structures described have consisted of at least one palindromic-complement base pair that causes the nucleotide chain to fold into at least one loop. This depends on the RNA sequence containing >0 complement bases, and usually “several” complement pairs, as one or two complement pairs possess insufficient bonding energy to stay bonded.

In order to distinguish between a normal stem-loop and a linear or tandem complement, the usual dot-bracket notation is not sufficient, as it cannot describe RNA structures that contain tandem-complement areas of bonding. Thus, I've introduced an additional bracket of “{}” to denote such (as illustrated in figure 3). We have included a database of these structures on our website for further investigation.

Fig5. Tandem complement forming a theoretical loop.
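One way to read the “{}” bracket is sketched below. It rests on an assumption that the text does not spell out: that in a tandem complement the first “{” bonds with the first “}”, the second with the second, and so on (first-in, first-out), whereas an ordinary stem pairs the last opened bracket with the first closed one. The function name `pair_tandem_braces` and the example string are hypothetical.

```python
from collections import deque


def pair_tandem_braces(structure: str) -> list[tuple[int, int]]:
    """Pair '{' with '}' in first-in, first-out order (assumed reading)."""
    waiting = deque()  # positions of '{' not yet bonded, oldest first
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "{":
            waiting.append(i)
        elif ch == "}":
            if not waiting:
                raise ValueError(f"unmatched '}}' at position {i}")
            pairs.append((waiting.popleft(), i))
    if waiting:
        raise ValueError(f"unmatched '{{' at position {waiting[0]}")
    return pairs


if __name__ == "__main__":
    # Hypothetical example: four tandem pairs separated by four unbound bases.
    print(pair_tandem_braces("{{{{....}}}}"))
```

Under this reading every bonded pair in the example is the same distance apart along the strand, which is what sets a tandem complement apart from the nested pairs of a normal stem.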