Algorithms and Structural Complexity Theory continued
Preview of Things to Come
We will (try to) see the following:
- Some problems in NP are harder than others because we can use the
hard problems to solve the easier problems.
- It turns out some problems are as hard as all of the other
problems in NP, i.e., a solution to one of those problems leads to a
solution to any other problem in NP. These are the NP-complete problems.
- We'll see some examples of NP-complete problems, like Travelling
Salesman.
- We'll see implications of the existence of NP-complete problems.
For example, if there is a polynomial time solution to the Travelling
Salesman problem (or any other NP-complete problem), then P=NP.
On the other hand, if we can prove there is no such solution, then
P!=NP.
- If P!=NP, then there are some "NP-incomplete" problems that are
neither NP-complete nor polynomial time solvable.
- P and NP are part of a larger class called PSPACE, the problems
that can be solved using polynomial space (but as much time as you want).
- Another class, also in PSPACE, is co-NP. These are the complements
of all the problems (sets) in NP. For example, since COMPOSITE-NUMBER
is in NP, PRIMALITY (deciding primality) is in co-NP.
- No one has proven whether NP=co-NP.
- It turns out that co-P, the complements of problems solvable in
polynomial time, is equal to P, i.e., P is closed under complementation.
- That means that, if we can show NP != co-NP, then we know P != NP
because otherwise NP would be closed under complementation just like P.
- An alternate definition of NP, using something called a nondeterministic
machine, is something you might see in another class or in a book.
The definitions given in the last lecture and the nondeterministic definition
are equivalent.
Polynomial Reducibility
We say a decision problem D' is polynomial time reducible
to a decision problem D if there exists a polynomial time computable
function f(x) from instances of D' to instances
of D such that x is in D' if and only if f(x)
is in D. We write "D' <p D"
to mean D' is polynomial time reducible to D.
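So, if D' <p D, and we can solve D
in time t(m) on instances of size m (with some algorithm), then we can solve
an instance x of D' of size n by computing f(x) and solving that instead.
If f takes time at most p(n) for a polynomial p, then f(x) has size at most
p(n), so (for nondecreasing t) the total time is at most p(n) + t(p(n)),
which is polynomial whenever t is. Let's look at an example: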
Problem: HAMILTONIAN-CYCLE
Instance: A graph G = (V, E)
Question: Does G contain a Hamiltonian cycle? That is,
is there a path (cycle) going from one vertex of G, through all the
other vertices of G exactly once, ending up at the same vertex?
First of all, is this problem in NP? Yes; a certificate for it would be
the list of vertices, in order, that make up the cycle. We can check this
easily in time polynomial in the size of the graph.
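Here is a minimal sketch of such a certificate check. The function name and the graph representation (a list of distinct vertices plus a list of undirected edges) are assumptions made for illustration; they are not taken from the lecture.

```python
def verify_hamiltonian_cycle(vertices, edges, cycle):
    """Return True if `cycle` is a Hamiltonian cycle of the undirected graph.

    `vertices` is assumed to be a list of distinct vertices and `edges`
    a list of (u, v) pairs; both names are illustrative only.
    """
    # A simple graph needs at least 3 vertices to contain a cycle.
    if len(vertices) < 3:
        return False
    # The certificate must list every vertex exactly once.
    if len(cycle) != len(vertices) or set(cycle) != set(vertices):
        return False
    # Store edges as unordered pairs so (u, v) and (v, u) both match.
    edge_set = {frozenset(e) for e in edges}
    n = len(cycle)
    # Every consecutive pair, including last back to first, must be an edge.
    return all(
        frozenset((cycle[i], cycle[(i + 1) % n])) in edge_set
        for i in range(n)
    )
```

The check touches each vertex and each edge a constant number of times, so it runs in time polynomial (in fact close to linear) in the size of the graph.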
Now recall the problem TSP, as presented in the last lecture. We can use
TSP to solve HAMILTONIAN-CYCLE: weight every edge of G with 1, let k (the
length of the tour we are looking for) be large enough that it rules
nothing out (k = |V| suffices, since a tour through |V| vertices uses
exactly |V| edges of weight 1), and see if there is a TSP tour through
the graph. If there isn't (i.e., there was just no way to visit every
vertex without going through some vertex twice, or the graph was not
connected), then there is no Hamiltonian cycle. If there is a TSP tour,
then that tour is a Hamiltonian cycle. All of this conversion of the
HAMILTONIAN-CYCLE instance into the TSP instance can be done in linear,
thus polynomial, time. So we say HAMILTONIAN-CYCLE <p TSP.
TSP is "harder" than HAMILTONIAN-CYCLE. Note that a certificate for the
TSP instance can be converted into a certificate for the HAMILTONIAN-CYCLE
instance in polynomial time. Note also that, if TSP answers "no," there
is no certificate (thus no short proof) that the instance isn't in
HAMILTONIAN-CYCLE.
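The reduction can be sketched in a few lines. This is only an illustration under assumptions: the TSP instance format here (a weight table on the edges of G plus a bound k, with a tour required to visit every vertex exactly once along listed edges) is a guess at the previous lecture's definition, k is set to |V|, and tsp_decision is a brute-force, exponential-time stand-in for whatever TSP solver one might have. All function names are hypothetical.

```python
from itertools import permutations


def reduce_hc_to_tsp(vertices, edges):
    """Map a HAMILTONIAN-CYCLE instance to an (assumed) TSP instance."""
    # Weight every edge of G with 1; this is a single pass over the edges.
    weights = {frozenset(e): 1 for e in edges}
    # Any tour uses exactly |V| unit-weight edges, so k = |V| rules nothing out.
    k = len(vertices)
    return vertices, weights, k


def tsp_decision(vertices, weights, k):
    """Brute-force stand-in for a TSP solver: is there a tour of length <= k?"""
    if len(vertices) < 3:
        return False
    first, rest = vertices[0], vertices[1:]
    # Fix the starting vertex and try every ordering of the others.
    for order in permutations(rest):
        tour = (first,) + tuple(order)
        total = 0
        for i in range(len(tour)):
            edge = frozenset((tour[i], tour[(i + 1) % len(tour)]))
            if edge not in weights:
                break  # this ordering uses a non-edge, so it is not a tour
            total += weights[edge]
        else:
            if total <= k:
                return True
    return False


def has_hamiltonian_cycle(vertices, edges):
    """Solve HAMILTONIAN-CYCLE by reducing to TSP and asking the TSP solver."""
    return tsp_decision(*reduce_hc_to_tsp(vertices, edges))
```

Only reduce_hc_to_tsp is the reduction itself, and it is the only part that has to run in polynomial time; the solver it hands the instance to can be as slow as it likes.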
It should be noted that two problems can be the same hardness, i.e.,
D <p D' and
D' <p D can both be true at the same time.
Also, two problems may not be related at all, so that
neither
D <p D' nor
D' <p D is true (so <p
is not a total order). For convenience and added confusion, I write
"harder" when I should write "at least as hard as."
Some problems are harder than all of the problems in NP. One example
is the problem HALTING we saw last lecture; it asks whether a C program
will ever reach exit(). To solve an instance of any problem D in NP, all we
have to do is code up a C program
that decides whether our instance is in D, calling exit()
if yes, going into an infinite loop if no. This program can be mechanically
constructed in polynomial time (even though running it might take much
longer), then all we have to do is solve HALTING (good
luck with that part).
So D <p HALTING for all problems D in NP.
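The construction just described (turn an instance into a program that halts exactly when the answer is yes) can be sketched as follows. Python source text stands in for the C program and raising SystemExit stands in for calling exit(); the function name and the convention that the decider is supplied as source defining decide(x) are assumptions for illustration.

```python
def build_halting_instance(decider_source, instance):
    """Return the source of a program that halts iff decide(instance) is true.

    `decider_source` is assumed to be Python source text that defines a
    function decide(x). Building the program is just string concatenation,
    so it takes time linear in the input, however slow decide itself is.
    """
    return (
        decider_source
        + "\n"
        + f"if decide({instance!r}):\n"
        + "    raise SystemExit(0)  # answer is yes: halt\n"
        + "while True:  # answer is no: loop forever\n"
        + "    pass\n"
    )
```

Note that the reduction never runs the decider; it only writes the program down. Deciding whether that program halts is the part handed to HALTING.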
Definition: A problem D is called NP-hard
if, for all problems D' in NP, D' <p D.
So an NP-hard problem is something harder than anything in NP.
Definition: A problem D is called NP-complete
if
- It is NP-hard.
- It is in NP.
So an NP-complete problem would in some sense be the hardest problem in
NP. Our definition of <p allows problems to be equally
hard, so we could have many equally hard NP-complete problems that are all
harder than all of the other (allegedly easier) problems in NP.
It follows that, if D' <p D and D'
is an NP-complete problem, and D is in NP, then D must
also be NP-complete. So, if we can show just one problem NP-complete,
we have a tool to find more NP-complete problems.
Can we show just one NP-complete problem? Yes. Consider the following
decision problem:
Problem: CIRCUIT-SATISFIABILITY
Instance: An acyclic (i.e., no cycles), directed graph G
whose nodes are logic functions: AND, OR, or NOT, or logical variables.
The graph represents a combinatorial logic circuit with n
inputs and 1 output.
Question: Is there any assignment to the n input variables
that will cause the output to become True?
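To make the problem concrete, here is a small sketch of a circuit evaluator and a brute-force satisfiability check. The representation (a dictionary from node names to tuples such as ("VAR",), ("NOT", a), ("AND", a, b), ("OR", a, b)) and the function names are assumptions chosen for illustration.

```python
from itertools import product


def evaluate(circuit, output, assignment):
    """Evaluate the node `output` of an acyclic circuit under `assignment`."""
    cache = {}

    def value(node):
        # Memoize so shared sub-circuits are evaluated only once.
        if node not in cache:
            kind, args = circuit[node][0], circuit[node][1:]
            if kind == "VAR":
                cache[node] = assignment[node]
            elif kind == "NOT":
                cache[node] = not value(args[0])
            elif kind == "AND":
                cache[node] = all(value(a) for a in args)
            elif kind == "OR":
                cache[node] = any(value(a) for a in args)
            else:
                raise ValueError(f"unknown gate type: {kind}")
        return cache[node]

    return value(output)


def circuit_satisfiable(circuit, output):
    """Try all 2^n input assignments; exponential in the number of inputs."""
    inputs = [name for name, gate in circuit.items() if gate[0] == "VAR"]
    for bits in product([False, True], repeat=len(inputs)):
        if evaluate(circuit, output, dict(zip(inputs, bits))):
            return True
    return False
```

Evaluating the circuit on one assignment is fast (polynomial in the size of the circuit); it is the search over all 2^n assignments that blows up.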
This problem was shown, in the early 1970s, to be NP-complete in a
proof by Cook that became known as Cook's Theorem. It is somewhat involved,
but the basic idea is this: any polynomial time certificate system
can be transformed into a polynomial sized logic circuit (that's sort of
what a computer does, anyway). The circuit encodes the certificate verification
algorithm run on a particular instance. The inputs to the circuit are the
certificate, and the output is True if the certificate verifies
the instance, False otherwise. If there is an assignment that causes
the circuit to output True (a satisfying assignment), then this
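assignment is a certificate verifying the instance. Since every problem
in NP has a polynomial time proof system, this technique works for all
of them, so CIRCUIT-SATISFIABILITY can be used to solve any problem
in NP and is thus NP-hard. It is also in NP itself (a satisfying assignment
is a certificate, checked by evaluating the circuit), so it is NP-complete.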
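The question the circuit answers has the shape "is there some setting of the certificate bits that makes the verifier say yes?" The sketch below shows that shape directly, without building any circuit, so it is not Cook's construction; it uses COMPOSITE-NUMBER with a claimed divisor as the certificate. The function names and the bit encoding of the certificate are assumptions for illustration.

```python
from itertools import product


def verify_composite(n, cert_bits):
    """Verifier: the certificate encodes a claimed divisor d with 1 < d < n."""
    d = 0
    for bit in cert_bits:
        d = d * 2 + int(bit)  # read the bits as a binary number
    return 1 < d < n and n % d == 0


def exists_certificate(verify, instance, num_bits):
    """Search every assignment to the certificate bits (exponential time)."""
    return any(
        verify(instance, bits)
        for bits in product([False, True], repeat=num_bits)
    )
```

For example, exists_certificate(verify_composite, 15, 4) searches all 4-bit certificates and finds the divisor 3. A polynomial time algorithm for CIRCUIT-SATISFIABILITY would replace this exponential search.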
From this foundation, we can build up a library of NP-complete
problems that can be used to solve CIRCUIT-SATISFIABILITY or other
previously proven NP-complete problems. It turns out that there are
thousands of them.
Some fun facts about NP-complete problems:
- No one has ever come up with an efficient (i.e., polynomial time) algorithm
for any of them.
- If anyone ever does, then that solution can be used to solve all of them in
polynomial time through polynomial reducibility (and solve all the other
problems in NP as well, like COMPOSITE-NUMBER), and then P would be
the same as NP.
- If anyone can ever prove that just one NP-complete
problem can't be solved in polynomial time, then immediately all
NP-complete problems are immune from polynomial time solutions and
P != NP.
- No one has ever been able to show that any non-trivial problem in NP
isn't NP-complete; to do so would show that some problem in NP isn't
polynomial time reducible to some other, and that would mean
some problems are in NP but not in P, so P!=NP.
- Even if P=NP, there's no guarantee that the solution to the TSP would be
practical; it might take, say, O(n^100) time.
- There are good algorithms that can solve many instances of NP-complete
problems in polynomial time. However, for some instances (and it usually
turns out that these are the important instances), the problem still blows
up to exponential time.
- Some problems, like GRAPH-ISOMORPHISM, are conjectured to be in a
class called NP-incomplete, problems in NP that aren't NP-complete but that
aren't in P, either. If P!=NP, then it's been shown that these problems must
exist; if P=NP, then these problems can't exist.
- Lots of research has focused on relativized worlds, i.e.,
worlds where algorithms are allowed to consult an oracle
capable of solving problems for them. A TSP oracle, for example, would
be able to solve an instance of the TSP problem in constant time. A
P oracle would be able to solve any problem in P in constant time.
It has been shown that, relative to a "random" oracle (the precise
meaning of the term random oracle is beyond the scope of this class),
P != NP with probability 1. However, there are relativized worlds
where P = NP, and recently other classes of problems have turned out to
be equal to each other even though they are unequal relative to a random
oracle with probability 1 (namely, PSPACE and IP, IP being problems
with "interactive proofs").
- It has been said that most computer scientists now believe that P != NP,
although it may be closer to the truth to say that most conjecture that
P != NP but secretly hope that P = NP (since that would make life a lot more
interesting!)
- Kurt Gödel, the famous mathematician, believed that
P = NP (before anyone called them P and NP).
- What would really be bad would be if someone proved that P = NP is
undecidable, i.e., that there exists no proof either way. That would mean
that maybe there is a polynomial time algorithm for an NP-complete problem
(i.e., P = NP), but no one could ever prove it. Some things in math
have turned out to be undecidable (like the continuum hypothesis, i.e.,
that there is nothing more infinite than the integers but less infinite
than the real numbers).