James F. Lynn RNAparse/www.RNAparse.com 2012
Linear – Regular Expressions (RE) – Type 3 Grammars
Not much needs to be said about regular expressions other than that they are a powerful adjunct to context-free and context-sensitive grammars. Written as a production, an RE may look like this:
S::= 'ATGC' //match ATGC
S::= '[AT]TGC' //match ATGC, TTGC
Nested – Context-Free Grammars (CFG) – Type 2 Grammars/Type 2 Grammars with Affix Operator
An RNA stem loop (((......)))
S::= a
a::= 'A' [b] 'T' | 'T' [b] 'A' | 'G' [b] 'C' | 'C' [b] 'G' // add as many productions as needed
b::= 'A' [c] 'T' | 'T' [c] 'A' | 'G' [c] 'C' | 'C' [c] 'G'
c::= loop
loop::= '[ATGC]{6}' //any 6 nts.
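As a rough illustration only (this is not RNAparse's implementation), the nested productions above can be mimicked by a small recursive recognizer in Python. The names PAIRS, LOOP and match_stem_loop are hypothetical. The optional brackets around b and c are ignored so that every level must pair, and the loop is assumed to mean any six nucleotides.

```python
import re

# Base pairs as written in the productions (T is used in place of U).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Assumption: the loop is any six nucleotides.
LOOP = re.compile(r"[ATGC]{6}")


def match_stem_loop(seq: str, depth: int = 2) -> bool:
    """Return True if seq is `depth` nested base pairs wrapped around a loop.

    Each recursive call plays the role of one production (a, b, ...):
    it consumes one base on the left, its partner on the right, and hands
    the inside to the next level. At depth 0 only the loop remains.
    """
    if depth == 0:
        return LOOP.fullmatch(seq) is not None
    if len(seq) < 2 or PAIRS.get(seq[0]) != seq[-1]:
        return False
    return match_stem_loop(seq[1:-1], depth - 1)


print(match_stem_loop("GAAAAAAATC"))  # G-C and A-T enclose the loop AAAAAA
print(match_stem_loop("GAAAAAAAGC"))  # the inner pair A-G is not allowed
```

Every extra base pair costs one more level of recursion here, just as it costs one more production in the grammar.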
A more efficient way of describing stem-loop structures is by means of an affix operator, rho.
S::= a
a::= $x("") // lambda
(
('T' rho(x<="A" x))| // 'T' followed by x, where 'A' is affixed to x, i.e. base pairing.
('A' rho(x<="T" x))|
('G' rho(x<="C" x))|
('C' rho(x<="G" x))
)<3> // number of iterations to take
loop x
loop::= '[ATGC]{6}' //any RE here
Crossing Structures – Context Sensitive Grammars (CSG) – Type 1 Grammars
H-type Pseudoknot ((( [[[ ))) ]]]
S::= a
$y("") x y
This method also works for the following: simple complement repeats and tandem complement repeats:
Reverse Complement Repeat vs Simple Repeat:
Segment 1 can form a stem with segment 2. Segment 2 can form a stem with segment 3. Segments 3 and 4 can also form a stem.
// matches ATGCATATGCAT, AAGCTTAAGCTT, AAATTTAAATTT...
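One way to read the three example strings above is that each is a six-nucleotide unit immediately followed by an identical copy. Under that assumption, a backreference in Python's re module recognizes the same strings. The name SIMPLE_REPEAT and the fixed unit length of six are illustrative choices, not part of the original grammar.

```python
import re

# A 6-nt unit is captured in group 1, then the same text is required
# again through the backreference \1.
SIMPLE_REPEAT = re.compile(r"([ATGC]{6})\1")

for seq in ("ATGCATATGCAT", "AAGCTTAAGCTT", "AAATTTAAATTT", "ATGCATATGCAA"):
    # The first three strings match; the last differs in its final base.
    print(seq, SIMPLE_REPEAT.fullmatch(seq) is not None)
```

A backreference takes the pattern beyond a strictly regular (Type 3) language, which is roughly the kind of extra memory that an affix variable supplies in a grammar.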
Either of these is also easily handled by our system.
nn-a-loop1-b-loop2-a'-loop3-c-loop4-b'-loop5-c'-nnn
'[AGCT]{2}' //any 2 nts. //end
On close inspection, the second figure above contains 3 tandem complement base pairs, each one with crossing properties (e.g. AGT-TCA), but these can be handled similarly (grammar not shown).
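The affix idea described earlier (accumulate the partner of each base while a segment is read, then require the accumulated string later in the sequence) can be imitated in Python for a crossing layout of the form a-b-a'-b'. This is a sketch under stated assumptions, not the RNAparse grammar: the function names are hypothetical, every segment has the same fixed length, there are no loops between segments, and a' is taken to be the reverse complement of a.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def affix_partner(segment: str) -> str:
    """Build the pairing partner of a segment one base at a time.

    Mirrors x <= "A" x: the complement of each base read is prefixed
    to x, so x ends up as the reverse complement of the segment.
    """
    x = ""
    for base in segment:
        x = COMPLEMENT[base] + x
    return x


def is_crossing(seq: str, stem: int = 3) -> bool:
    """Check the crossing layout a b a' b' with fixed-length segments."""
    if len(seq) != 4 * stem or any(base not in COMPLEMENT for base in seq):
        return False
    a, b, a2, b2 = (seq[i:i + stem] for i in range(0, 4 * stem, stem))
    return a2 == affix_partner(a) and b2 == affix_partner(b)


print(affix_partner("TGC"))         # GCA
print(is_crossing("TGCAAGGCACTT"))  # True: TGC AAG GCA CTT
```

Two independent accumulated strings are what allow the a and b pairings to interleave. For the tandem complement case (AGT-TCA), the partner would presumably be built by appending each complement rather than prefixing it.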
James F. Lynn RNAparse/www.RNAparse.com March 2012
While amino acid interactions (proteins) are more complex than nucleotide (RNA) interactions, the underlying principles can still be based on similar context-free grammatical production predicates. The secondary shapes of proteins are topologically equivalent to the stem-loop and pseudoknot structures found in RNA.
The figure above represents a toy protein where the helix is held in its shape at 12 points, lettered from a to l. Projected onto a flat topology map (below), the crossing properties of each connection become clearer: a and c connect across b, and so forth. The second set of letters shows how the points of connection are rewritten to represent a point and its complement (the complement is denoted by a prime mark). Thus: a connects to a' across b, c to c' across b' and d...
If we continue adding amino acids to our toy protein to the point at which the chain may fold back upon the helix and interact with it, the topology becomes more complex, with an increase in crossing interactions. Nonetheless, the topology is described in the same way.
or [a b a' c b' d e c' f g e' g f' h g' h' g' d']
We can begin to see a few very long-distance amino acid interactions that cross other interactions. Describing such a shape grammatically turns out to be remarkably easy if we rewrite each connection point as having a complement point: a-a', b-b', c-c'... where points a = a,
b = b,
The algorithm for defining the protein shape describes just two things: a point and its complement, and the placement of each point and its complement:
To describe the second, more complex helix:
let a be the complement of a', such that a b a' c b' d e c' f g e' g f' h g' h' g' d'
A very simple grammar for the second helix is expressed in a single predicate, where "rho <=" represents an affix operator that builds a complement from its corresponding point.
my_predicate::=
//end
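The point-and-complement labelling can also be checked mechanically. The following Python sketch is an illustration, not the author's algorithm: it takes a short hypothetical label sequence, pairs each point with its primed complement, and reports which connections cross. The function names and the sample sequence are assumptions made for the example, and each label is assumed to occur only once.

```python
from itertools import combinations


def pair_points(labels):
    """Map each point name to (position of one end, position of the other)."""
    points, complements = {}, {}
    for i, label in enumerate(labels):
        if label.endswith("'"):
            complements[label[:-1]] = i  # primed label: the complement
        else:
            points[label] = i
    return {
        name: tuple(sorted((points[name], complements[name])))
        for name in points
        if name in complements
    }


def crossing_pairs(labels):
    """List the pairs of connections whose spans interleave."""
    pairs = pair_points(labels)
    crossings = []
    for (n1, (s1, e1)), (n2, (s2, e2)) in combinations(sorted(pairs.items()), 2):
        # Two connections cross when exactly one end of the second
        # lies between the two ends of the first.
        if s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1:
            crossings.append((n1, n2))
    return crossings


labels = ["a", "b", "a'", "c", "b'", "c'"]  # hypothetical six-point example
print(crossing_pairs(labels))  # [('a', 'b'), ('b', 'c')]
```

In this small example a connects to a' across b, and b connects to b' across c, while a and c do not cross each other.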
Contact
jlynn@acsalaska.net
If you wish further information, source code, or built .exe's, contact me. Copyright 2012. All rights reserved save by permission of James F. Lynn.