CMSC 435  Algorithm Anhalysis & Design

 

What Problems can be solved on the computer?  The Class NP-Complete

 

When we answer the question, "what problems can be solved on the computer?", we will identify three categories of problems:

  1. Those problems for which there is a polynomial time-algorithm that will produce a solution to all instances of the problem of input size n with a worst-case running time that is O(nk) for some constant k.
  2. Those problems for which there is no known polynomial-time algorithm for finding a solution to all instances of the problem, but for which there are algorithms that will either find approximate solutions in polynomial-time, or will find a solution to an instance of the problem in polynomial-time with a probability p (find solutions some of the time).
  3. Those problems that are provably intractable (for which there can not be a polynomial-time algorithm for finding a solution.)

 

All problems in category 1, are in a class of problems P -- the class of problems for which there is a polynomial-time solution.

 

A second major class of problems, the class to which most of the problems in our second catregory belong, is the class NP -- the class of problems for which there is a polynomial-time algorithm for verifying that a "certificate", containing an instance of the problem and its solution, is correct.  The class P is a sub-class of NP (since any problem that can be solved in polynomial time can also be verified in polynomial-time, if only by solving the problem and comparing solutions), but NP also contains many problems that can be verified in polynomial-time but for which there is no known polynomial-time solution.

 

Decision Problems

When we speak of these classes of problems, we speak only of decision problems -- problems for which there is a yes or no answer such as does this graph G contain a Hamiltonian cycle?  This is not a major limitation, since other problems, such as optimization problems, can be recast as decision problems.  The traveling salesperson problem -- the problem of finding a minimum cost tour for the salesperson to visit all of the cities in his or her district once and only once -- can be recast as a decision problem if we ask whether a tour of length less than some value k exists.  Even problems like sorting can be cast as a decision problem if we ask if a list L of size n is sorted, perform the operation, and answer yes.

 

The Hamiltonian-cycle problem provides a good illustration of a problem in NP.  There is no known polynomial-time algorithm for deciding whether a Hamiltonian cycle of any graph G exists, but given a certificate, there is a polynomial-time algorithm for verifying the certificate.  A Hamiltonian cycle exists in a graph if  there is a path in which every vertex is visited once and only once and which returns to the starting vertex at the end.  The verification algorithm would need the graph in the form of an adjacency matrix, the number of vertices in the graph, and the sequence of vertices in the cycle.  The algorithm would only need to check that every vertex is visited once and only once -- which it could do in linear time, that the cycle begins and ends at the same vetex, and between each pair of vertices in the sequence there is an edge indicated in the adjacency matrix.  The verification can be completed in time O(n) and a verdict of yes or no rendered.

 

Reductions

Suppose that we have a decision problem A that we would like solve in polynomial time,  suppose also that we have a second decision problem B for which we already have a polynomial-time algorithm for returning a verdict of "yes" or "no", then an algorithm for deciding problem A would exist if we had a polynomial-time algorithm for converting every instance of problem A into an instance of problem B and used the existing algorithm to decide that instance of B.

 

The procedure for converting instances A into instances of B is called a reduction, and it must have the following chracteristics:

  1. The transformation takes polynomial-time
  2. The answers are the same -- the answer to a is "yes" if and only if the answer to b is "yes".

 

The algorithm for deciding A in polynomial-time is illustrated in the figure above;

  1. Use the polynomial-time algorithm to convert an instance a of problem A into an instance b of B
  2. Run the polynomial-time algorithm to decide b
  3. Use the decision for b as the answer for a

The two parts in sequence run in polynomial time, so the algorithm for deciding A will also be polynomial-time.

 

While it is not necessarily common nor efficient to construct algorithms for solving problems in class P in this manner, such a construction is indeed feasible and will prove essential for constructing a subclass of NP called NP-complete.

 

Formal Language Notation

 

Formal language notation will be useful for describing decision problems that are coded into a language to be solved on a computing machine.  Here are a number of definitions:

 

An alphabet S is a finite set of symbols.

A language L over S is any set of strings made from the symbols in S.

 

For instance,  if S = {0,1) then L = {10, 11, 101, 111, 1011, 1101,....} is the language of the prime numbers in their binary representation. 

If S = {a,b,c,.....,z} then L = {pop, mom, dad, radar, ...} is the language of palindromes in English vocabulary.

 

The language of all strings over S is denoted by S*.  The empty string is denoted by e, and the empty language by f.

 

From the point of view of Language theory, the instances for any decision problem Q is the set S* where S = {0,1}.  Since Q is characterized by those problem instances that produce a 1 (yes) answer, we can view the decision problem Q as a language L over S = {0,1} where

 

            L = {x Î S* : Q(x) = 1}

 

which reads "the language of strings x contained in S* such that the answer to the decision problem Q for instance x is yes".

Since languages are subsets of S*, one can perform set operations such as union, intersection, and complement on languages.  The complement of a language L is given by:

            _

            L = S*  - L

 

As an example, let us consider the language L defined over S = {a, b, c, ..., z, (space)} as the set of strings where no two consecutive consonants or vowels may occur.  L = {a, b, ab, aba, abab, ba, babe, abecidab, ....}  This language can be decided in time O(n), where n is the length of the string, by an algorithm that implements the following finite state machine. 

 

The complement of this language defined over S is all of the strings in S* that are not in L.  This language can also be decided in polynomial time on the following finite state machine.

 

The fact that both L and its complement are both decided by polynomial-time algorithms illustrates (but does not prove) an important theorem -- that if a language L is in complexity class P, then so is its complement.   Note also that to accept a language L, the algorithm only has to recognize strings that are in L, but to decide the language (as is done here), the algorithm has to recognize and be able to accept or reject all strings in S*.

 

If we are given two languages L1 and L2 and there is a polynomial-time reduction algorithm that converts any string in L2 to a string in L1 we say that L2 is polynomially reducible to L1 and write L2 £P L1.  (The notation can be a bit confusing, but it reads that a problem in L2 is no "harder" than a problem in L1.

 

Classes of decision problems

 

We have already defined complexity classes P and NP as classes of decision problems that can be respectively solved and have solutions verified in polynomial-time.  We have seen that all problems in P are also in NP (P is a subset of NP --

P Í NP).  We have also claimed that for each language in P its complement is also in P.  Thus P must be a subset of both NP and the class of problems co-NP -- the complement of languages that are in NP.  We have also seen that there are problems like the Towers of Hanoi that can neither be solved or verified in polynomial time, and thus lie outside of the class NP.  If we were to construct a set diagram of these complexity classes, the most general description that we could portray would appear as in the figure below.

 

All of the problems in both NP and co-NP are polynomial in the amount of storage they require, and P will be in

NP Ç co-NP since it contains both a language L and its complement.  (There are additional complexity classes that are not included in this diagram.) 

 

The class NP-Complete

 

The question, "Is P = NP?" is the fundamental open question of computer science, and has important ramifications in natural science as well.  Computational complexity was first studied in the 1950's, but since then no one has been able to prove or disprove this assertion.  We have, however, developed  a mechanism for answering this question.  The mechanism involves the complexity class called NP-Complete. 

 

A language L Í {0,1}* is NP-Complete if

  1. L Î NP.
  2. L' £P L for every L' Î NP.   { every language in NP is polynomially reducible to L}

 

The "trick" in finding languages in NP-Complete is to find a first language to add to this class, then to add additional languages by showing that the first language (or any of the subsequent additions) are polynomially reducible to the candidate.

 

The current picture of the relationship between these sets of languages is depicted in the set diagram on the left below, but if any language in NP-Complete is found to have a polynomial-time algorithm which decides every instance of the language, then the diagram on the right pertains and every language in NP is also in P.

 

The reason this is the case can be seen from the diagram shown earlier that ilustrated the use of reductions in constructing a polynomial-time algorithm for solving a problem A.

 

By definition, every language in NP is polynomially reducible to each language in NP-Complete.  Therefore if any problem in NP-Complete is found to have a polynomial-time solution, it can be used as the target language B with the polynomial-time algorithm for deciding yes or no that is shown in the above picture, and we have constructed an algorithm for deciding all of NP in polynomial time.

 

A First NP-Complete Language

 

The first language to belong to the class NP-Complete was found by Cook to be CNF-SAT.

 

Given a set of n booolean variables x1, x2, ..., xn, and a set of m clauses consisting of literals (a variable or its negation) of these variables in conjunctive normal form: (for example)

 

            F = (x1 V ~x3 V x7 V ~x9) /\ (~x2 V ~x3 V ~x7 V x9 V x21) /\ ..../\ (~x1 V x3 V ~x7 V x24)

 

Is there an assignment of true or false to each of the xi such that each of the clauses is simultaneously satisfied and F evaluates to true?  In general one would have to test all 2n possible ways in which these values could be assigned in order to ensure that no solution exists in those situations in which the formula cannot be satisfied.  However if one were provided with a certificate containing a possible assignment for each of these n variables, it could easily be verified in polynomial-time by simple substitution.  CNF-SAT is therefore in NP.

 

To prove that all languages in NP are polynomially reducible to CNF-SAT, we recognize that all languages in NP must be verified on a computing machine, which we will take without loss of generality to be a Turing Machine.  The Turing Machine can be described in terms of its (finite number of) states, its set of symbols that it may read or write, and the movement of its tape. The verification process on a Turing Machine was reduced by Cook into a CNF expression.  Thus every language in NP was reduced to CNF-SAT and NP-Complete had its first member.

 

3-SAT is NP-Complete

 

3-CNF-SAT is very similar to CNF-SAT except that each of the clauses must have exactly 3 variables.  If this seems a silly special case of the more general, it has value for two reasons:

  1. 2-CNF-SAT  is in P, so 3-SAT marks a special "phase transition" between what is doable in polynomial-time and what is not.
  2. 3-SAT will be useful in providing a reduction to other problems such as the Clique problem so that they may be included into NP-Complete.

 

The reduction of SAT to 3-SAT proceeds as follows:

 

            Reduce each clase in SAT to a set of clauses in 3-SAT.

            Assume that such a clause contains r literals say x1, x2,  .. , xr

            Add r-3 new variables  z1, ..., zr-3

            From the single clause in SAT form the set of r-2 clauses

                        (x1 x2 z1), (x3 ~z1 z2), (x4 ~z2 z3), ... , (xr-2 ~zr-5 zr-4), (xr-1 xr ~zr-4)

            where the conjunction symbol V between each literal is implicit

 

Now we have to show that whenever the clause in SAT is true, there is an assignment such that all of the clauses in 3-SAT evaluate to true, and whenever SAT is false, there is no assignment of the zj such that all of the r-2 clauses are satisfied.

 

First show that if the clause in SAT is true, all of the clauses in 3-SAT can be satisfied.  For the clause in SAT to be true at least on of the literals (say xk) must be true.  Then if we take zi to be true for i £ k-2 and zi to be false for i > k each clause will be satisfied. (for example)  let the clause in SAT be (x1 x2 x3 x4 x5 x6)  and assume x4 evaluates to true.  Then z1 and z2 are true and z3 would be false.  Then each of the clauses in 3-SAT

 

                        (x1 x2 z1), (x3 ~z1 z2), (x4 ~z2 z3), (x5 x6 ~z3)

 

evaluate to          (F F T),    (F F T),    (T F F),     (F F T)

 

Next suppose that all of the literals in the clause in SAT evaluated to false, then there is no assignment of the zj that will make all of the clauses in 3-SAT true.   In order for each of the clauses in the above example to be true, we would have to have

 

                        z1 = true,          z2 = true,          z3 = true,         and      ~z3 = true,

 

but this leads to a contradition.

 

Hence, 3-CNF-SAT is in NP-Complete.

 

Other NP-Complete Problems

 

Now that we have added a second problem to the class of NP-Complete, we are in a position to use this new problem to show that it is polynomially reducible to other problems in NP and thus add new problems to the class of NP-Complete.  There are two reasons why we might want to increase the number of problems in this class.  Firstly, the more problems in NP-Complete, the more candidates for finding a polynomial-time algorithm to one of them that would resolve the open question of P = NP.  Secondly, and perhaps more pertinantly, it increases our understanding of the relationship between problems in the various complexity classes.  The chart below indicates the sequence of reductions that are used to add problems to NP-Complete.

 

 

The most straightforward of these reductions is the one showing that HAM-CYCLE £P TSP. 

 

            TSP = {<G, M, k> :     G = (V, E) is a complete, undirected graph,

                                                M is a function from V x V ->  Z  (the cost matrix that maps vertex pairs into an edge cost)

                                                k Î Z, and G has a TSP tour of cost at most k}

 

We must first show that TSP Î NP.  This is easy.  Given a cost matrix representation of the graph with the cost (length) associated with each of the edges and a certificate that lists the sequence of vertices in the cycle, we can verify in polynomial-time that

1.      each vertex (except the start vertex) is visited once and only once,

2.      the start and end vertices are the same,

3.      the total cost of the edges linking each pair of vertices is less than the value K.

Next we show that HAM-CYCLE £P TSP (HAM-CYCLE is polynomially reducible to TSP.  Given a graph G = (V, E) is an instance of HAM-CYCLE, we construct a graph G' from G as follows,

 

            for each pair of vertices i, j        if there is and edge e(i,j) in G then e'(i,j) in G' has cost 0

                                                            else e'(i,j) has cost 1

 

The two graphs G and G' are depicted below. 

The reduction from HAM-CYCLE to TSP is done in a time O(|E'|) which is O(|V|2) and clearly polynomial in the number of vertices of graph G.  Now we can easily see that if and only if there is a TSP tour with k = 0 in G' will there be a HAM-CYCLE in G.