Schur Number Five

This page provides access to results, proofs, and tools presented in the AAAI 2018 paper Schur Number Five by Marijn J.H. Heule. A preprint is available on arXiv.

Introduction for the general public

The Schur Number Five challenge is a typical example of Ramsey Theory. The challenge originates from the early 20th century, when Issai Schur studied the question whether one can avoid a monochromatic solution of the equation a + b = c when coloring the natural numbers 1, 2, 3, ... with finitely many colors. He proved in 1917 that the answer is No.

The question that remains unanswered is how long can one avoid a monochromatic solution of the equation a + b = c. The answer is 4 when using two colors, say red and blue: If one colors the numbers 1 and 4 red and the numbers 2 and 3 blue, then there is no monochromatic solution. However, it is impossible to colors the numbers 1 to 5 with two colors without producing a monochromatic solution: If we color 1 with red, we have to color 2 with blue due to 1 + 1 = 2. This forces us to color 4 with red because of 2 + 2 = 4. After this, 3 must become blue due to 1 + 3 = 4. But then, no matter if we color 5 with red or blue, we end up with a monochromatic solution of 1 + 4 = 5 or 2 + 3 = 5.

What about more than two colors? In a similar fashion, one can manually determine that the answer is 13 when using three colors. This is a nice exercise. However, already when using four colors, the problem becomes extremely hard: It took 50 years before Golomb and Baumert (1965) were able to determine that in this case the answer is 44. The question for five colors remained open for a century. We solved this problem, known as Schur Number Five, and the answer is 160.

More precisely we prove: 1) Every five-coloring of the numbers 1, ..., 161 contains a monochromatic solution a + b = c. 2) There exists a five-coloring of 1, ..., 160, without a monochromatic solution of a + b = c. Part 2) is relatively easy. In fact we computed that there are exactly 2,447,113,088 different such five-colorings. Part 1) is the real hard thing -- every number from 1, ..., 161 gets one of five possible colors, so altogether there are 5161 possible colorings, which all in a sense need to be considered, and need to be excluded. What is 5161? It is approximately 3.42 * 10112, that is, a number with 112 decimal places. The number of particles in the universe is at most 10100, so smaller than the number of possible five-colorings.

Now let's perform real brute-force, running through all the possibilities, one after another: Assume that we can check 100,000 colorings per second. That means it would take 10100 years to finish the computation. So a year of computation for each particle in the universe.

Fortunately there comes SAT solving to the rescue, which actually is really good with such tasks -- it can solve some such task and even more monstrous tasks. Our ``brute-reasoning'' approach solved the problem and resulted into a 2 petabytes proof, thereby crushing the previous record of the largest math proof ever. Though we must emphasise that this is in no way guaranteed, and possibly it will take aeons! SAT solving uses propositional logic, in the special form of CNF (conjunctive normal form). Fortunately, in this case it is easy to represent our problem in this form.

Extreme Certificates

Encoding

SAT encoding of partitioning {1, ..., n} into k sets such that no set contains a monochromatic solution of the equation a + b = c.

  • Schur-encode, generates propositional formulas encode the existence of Schur number certificates. Schur-encode has two mandatory paramemters k referring to the number of colors and n referring to the size of the certificate. The formula is satisfiable if and only if there exist a certificate S(k,n).
  • Schur_160_5.cnf (541 KB, {1,...,160} and 5 colors)
  • Schur_161_5.cnf (542 KB, {1,...,161} and 5 colors)

Variants

Apart from classical Schur number problems, we also studied to variants: Palindromic Schur number problems and Modular Schur number problems. Our results can be used to compute all extreme certifcates of these variants as well.

Symmetry Breaking

The propositional formulas that encode Schur number problems have a color symmetry. This symmetry must be broken to ensure that the SAT solver will not explore isomorphic parts for the search space. The formulas after symmetry breaking are:

Backdoors

An assignment is a backdoor of a formula if the formula under that assignment can be solved in polynomial time given a certain algorithm. We computed a set of 1616 backdoors for Schur_160_5_SBP.cnf, Palin_160_5_SBP.cnf, and Modular_160_5_SBP.cnf to efficiently compute the number of extreme certificates.
Any #SAT solver can be used to compute the number of solutions. We used sharpSAT.

Proof Parts

Proof of unsatisfiability

The proof of unsatisfiability is a certificate that the ``original formula'', Schur_161_5.cnf has no satisfying assignments, thereby showing that S(5) < 161. This proof is over 2 petabytes in size, which requires partitioning it into parts. On a high level there are three parts: the re-ecoding proof, the implication proof, and the tautology proof. Below, a short description is provided of the three proof parts, which can be merged by concatenation.

Re-encoding proof (part I)

The original formula is re-encoded in order to make it easier to solve. The re-encoding consists of breaking the color symemtry and is expressed in the DRAT file re-encode.drat (34 MB). The re-encoded formula is Schur_161_5_SBP.cnf (542 KB) with SBP denoting "with Symmetry Breaking Predicates".

Implication proof (part II)

The implication proof contains for each cube in the partition the smallest clause falsfied by the cube. Adding the appropriate DIMACS header (p cnf 800 10330615) to the implication proof results in an unsatisfiable formula implied-10330615.cnf (950 MB) which is the same as the concatenation of the 10330615-shuf-a* files (discussed below under Cubes), but with all signs flipped and a re-odering of the lines.

Tautology proof (part III)

The final part of the proof validates that the cube partition covers the entire search space. The cube partition can be viewed as a binary search-tree with the cubes as leaf nodes of this tree. For each internal node of this tree, the tautology proof tautology-10330615.drat (2.6 GB) contains a resolvent of its two children. The proof terminates with the empty clause.

Cubes

Top Level Partition

Our partitioning of the re-encoded formula uses two levels. The first level consists of a 10330615 cubes, which are split into five files of 208 MB each: 10330615-shuf-aa, 10330615-shuf-ab, 10330615-shuf-ac, 10330615-shuf-ad, and 10330615-shuf-ae. These files are expressed in the inccnf format.

Second Level Partition

The second level cube files can be reproduced by running the look-ahead solver march_cu (and will soon be available on TACC as well).

  • First compile march_cu, simply using make. In order to compute a cube file for a DIMACS files input.cnf, run: march_cu input.cnf -o output.cubes
In order to produce the compressed DRAT proof files, the following tool is required:
  • A modified version of glucose 3.0, called iGlucose, which accepts inccnf files as input. Run the core version of glucose using the following command line options: glucose input.inccnf -certified -certified-output=proof.drat -vbyte -verb=0 -stop-at-sat -var-decay=0.95

Log files

The five result files results-10330615-aa, results-10330615-ab, results-10330615-ac, results-10330615-ad, and results-10330615-ae (243 MB each) show the solving and validating the second level subproblems using five colors and the range {1..160}. Only 961 out of the 10330615 cubes are SATISFIABLE. The columns show the following information:
  1. file postfix;
  2. cube number;
  3. cube length;
  4. cube (split) time;
  5. number of cubes (second level);
  6. number of conflicts;
  7. conquer time;
  8. conquer result;
  9. size of compressed DRAT proof;
  10. size of compressed LRAT proof;
  11. proof conversion time;
  12. proof validation time; and
  13. validation result.

Proof Example

Consider the problem of avoiding monochromatic solutions of the equation a + b = c with a < b < c, while coloring the natural numbers with two colors, say red and blue. This pattern cannot be avoided indefinitely. The smallest counter-example involves coloring the numbers {1, ..., 9}. This problem can be encoded into SAT using 9 Boolean variables x_i. Assigning x_i to true means putting number i in the red partition, while assigning x_i to false means putting number i in the blue partition. The DIMACS encoding of this problem is shown below.

p cnf 9 32
  1  2  3 0     -1 -2 -3 0
  1  3  4 0     -1 -3 -4 0
  1  4  5 0     -1 -4 -5 0
  2  3  5 0     -2 -3 -5 0
  1  5  6 0     -1 -5 -6 0
  2  4  6 0     -2 -4 -6 0
  1  6  7 0     -1 -6 -7 0
  2  5  7 0     -2 -5 -7 0
  3  4  7 0     -3 -4 -7 0
  1  7  8 0     -1 -7 -8 0
  2  6  8 0     -2 -6 -8 0
  3  5  8 0     -3 -5 -8 0
  1  8  9 0     -1 -8 -9 0
  2  7  9 0     -2 -7 -9 0
  3  6  9 0     -3 -6 -9 0
  4  5  9 0     -4 -5 -9 0

Since this problem has 9 Boolean variables, there are 512 possible partitions. However, a refutation proof, i.e., a proof that provides a counteraxmple of all patitiones only consists of four patterns (more than a factor of 100 less). Those patterns are shown below as a DRAT proof, the format used to represent the proof of the Pythagorean Triples problem.

 1 4 0
   1 0
   4 0
     0