Optimal Alignment of Two Strings

 

public Cell[][] Alignment(char[] X, char[] Y) {

    Cell[][] table = new Cell[X.length+1][ Y.length+1];

    for (int i = 0, i <= X.length; i++) {

        table[i][0].cost = i * deltaX;

        table[i][0].dir = "|";

    }

    for (int j = 1; j <= Y.length; j++) {

        table[0][j].cost = j * deltaY;

        table[0][j].dir = "¬";

    }

    for (i = 1; i <= X.length; i++)

        for (j = 1; j <= Y.length; j++) {

            temp1 = alphax[i],y[j] + table[i-1][j-1];

            temp2 = deltaX + table[i-1][j];

            temp3 = deltaY + table[i][j-1];

            if (temp1 < temp2 && temp1 < temp3) {

                table[i][j].cost = temp1;

                table[i][j].dir = "\";

            }

            else if (temp2 <= temp3) {

                table[i][j].cost = temp2;

                table[i][j].dir = "|";

            }

            else {

                table[i][j].cost = temp3;

                table[i][j].dir = "¬";

            }

        }

    return table;

}

 

Let

            deltaX = deltaY = 2

           

                                    0  if x[i] = y[j]

            alphax[i],y[j] =     1  if x[i] and y[j] both vowels or both consonants

3        if one is a vowel and the other a consonant

 

Align the strings

            X = algorithm

            Y = altruism

 

            Construct the table of dimension 10 x 9

            Let l = left, u = up, d = diagonal – directions for retrieving the optimal alignment from the table

 

           

 

0

a

1

l

2

t

3

r

4

u

5

i

6

s

7

m

8

 

0

 

0

2

l

4

l

6

l

 

8

l

10

l

12

l

14

l

16

l

 

a  1

 

2

u

 

0

d

2

l

4

l

6

l

8

l

10

l

12

l

14

l

 

l  2

 

4

u

2

u

0

d

2

l

4

l

6

l

8

l

10

l

12

l

 

g  3

 

6

u

4

u

2

u

1

d

3

l

5

l

7

l

9

l

11

l

 

o  4

 

8

u

6

u

4

u

3

u

4

d

4

d

6

l

8

l

10

l

 

r  5

 

10

u

8

u

6

u

5

u

3

d

5

l

7

l

7

d

9

l

 

i  6

 

12

u

10

u

8

u

7

u

5

u

4

d

5

d

7

l

9

l

 

t  7

 

14

u

12

u

10

u

9

u

7

u

6

u

7

u

6

d

8

l

 

h  8

 

16

u

14

u

12

u

11

u

9

u

8

u

9

u

8

u

7

d

 

m  9

 

18

u

16

u

14

u

13

u

11

u

10

u

11

u

10

u

8

d

 

 

                        a l g o r - i t  h m

                        a l t  - r u i s -  m

 

The Alignment Algorithm is Q(nm) where n = length of Y and m the length of X.  The table also requires Q(nm) space which for long strings is a bigger concern than the Q(nm) processing cost.  We can see that to make each table entry we only need to examine entries in the same column and the column immediately to the left.  We can revise the above algorithm so that the table only maintains two columns.

 

public int SpaceEfficientAlignment(char[] X, char[] Y) {

    int [][] table = new int[X.length+1][2];

    int count = 0;

    for (int i = 0; i <= Y.length; i++)

        table[i][0] = ++count * deltaY;

    for (i  = 1; i <= Y.length; i++) {

        table[0,1] = j * deltaX;

        for (int j = 1; j <= X.length; j++) {

           table[i][1] = min(alphax[i],y[j] + table[i-1][0], deltaY + table[i-1][1], deltaX + table[i][0];

        }

        for (int k = 0; k <= X.length; k++)

            table[k][0] = table[k][1];

    }

}

 

This table has a space requirement that is Q(m), but we have lost half of the information contained in the full table; namely the instructions for how to perform the alignment.  We need to see if there is a way of reducing the space requirements of the algorithm while maintaining the cost and the alignment information.

 

Notice that this problem can be converted into a graph problem if we construct a graph whose vertices are the (i,j) pairs of indices and whose edges are the costs deltaX, deltaY, and alphax[i],y[j] to reach adjacent vertex pairs.  The alignment problem is thus reduced to the problem of finding the minimum cost path between (0,0) and (m,n).  The graph is an (m + 1) x (n + 1) mesh.

 

DynqmicProgrammingMesh

The minimum cost path must go through one of the vertices on the line y = n/2 (assume without loss of generality that n is a power of 2).  The minimum cost path from (0,0) to (m, n) must go through one of the vertices (k, n/2) where k = 1..m.

The min-cost path is therefore mink(min((0,0)..(k,n/2)) + min((k,n/2)..(k,n)).  This leads us to the following divide and conquer algorithm:

 

DivideAndConquerAlignment(X,Y) {

    Let m be the number of symbols in X

    Let n be the number of symbols in Y

    Let P be a list of vertices on the path from (0,0) to (m,n); //P is of max dim m + n

    If m <= 2 or n <= 2 then

            Compute optimal alignment using Alignment(X,Y);

    int [] B = SpaceEfficientAlignment(X, Y[1:n/2]);

    int [] D  =  BackwardSpaceEfficientAlignment(X,Y[n/2+1:n]);

    int q, temp = infinity;

    for (int k = 1; k <= m; k++)

        if (B[k] + D[k] < temp) {

            q = k;

            temp = B[k] + D[k];

        }

    Add (q, n/2) to list P

    DivideAndConquerAlignment(X[1:q], Y[1:n/2]);

    DivideAndConquerAlignment(X[q+1:m], Y[n/2+1:n]);

    return P

}

 

We can write the BackwardSpaceEfficientAlgorithm  analogously to (Forward) SpaceEfficientAlgorithm.

 

Let g(i, j) = min[alphax[i+1],y[j+1] + g(i + 1, j + 1), deltaY + g(i + 1, j), deltaX + g(i, j +1)]  for i < m. j < n

 

Start with g(m, n) = 0 and work your way to the left

 

public int BackwardSpaceEfficientAlignment(char[] X, char[] Y) {

    int [][] table = new int[X.length+1][2];

    int count = 0;

    for (int i = Y.length; i >= 0; i--)

        table[i][0] = (m – i) * deltaY;

    for (i  = Y.length; i >= 1; i--) {

        table[m][1] = ++count * deltaX;

        for (int j = X.length - 1;  j >= 0; j--) {

           table[i][1] = min(alphax[i+1],y[j+1] + table[i+1][0], deltaY + table[i+1][1], deltaX + table[i][0];

        }

        for (int k= 0; k <= X.length; k++)

            table[k][0] = table[k][1];

    }

}