CMSC 435 Algorithm Analysis & Design
RNA Folding
(No sharp turns) The ends of each pair are separated by at least 4 intervening bases. If (bi,bj) ϵ S, then j – i > 4
Only bases A and T or G and C will pair with one another.
S is a matching. No base appears in more than 1 pair.
(The no crossing condition) If (bi,bj) and (bk,bh) are two pairs in S, then we cannot have i < k < j < h.
Then any pair of bases that satisfy the above constraints may or may not be paired.
If we let OPT (i, j) represent the maximum number of pairs in the sub-string from bi to bj then the optimal folding will either leave bj unpaired, or pair it with one of the bases bt, where t lies in the permissible range between i and j - 4
OPT(i, j) = max[OPT(i, j – 1), max(i < t < j - 4)(1 + OPT(i, t -1) + OPT(t + 1, j – 1))]
Construct 2 tables of size n+1 by n+1 – one to hold the number of base pairs and one to hold the identity of the paired bases.
Let b = <b1, b2, b3, ...bn> be a sequence of bases in an RNA molecule – denoted by the characters A, C, G, T where A and T can pair with each other and C and G can form pairs.
Algorithm RNAfold (p) {
// initialize the forbidden pairs to 0;
for(int i = 1; i <= n – 5; i++)
for (int j = 0; j < i + 4; j++) {
M[i][j] = 0;
P[i][j] = 0;
}
}
for (int k = 5; k < n; k++) //examine all allowable separations
for (i = 1; i <= n – k; i++) {
j = i + k;
int max = 0, pair;
for (int t = i; t < j – k; t++) {
if (b[t] &&b[j] in the same set) //the two bases can be paired
int temp = 1 + M[i][t-1] + M[t+1][j-1];
if (temp > max) {
max = temp; pair = t;
}
} // end if in same set
if (max > M[i][j-1]) {
M[i][j] = max;
P[i][j] = pair;
} // end if
else {
M[i][j] = M[i][j-1];
P[i][j] = 0;
} // end else
} //end for t
} // end for i
} //end for k
} // end algorithm