MDELite Constraint Programs
The following database has an m-n association between tables student and course. Association normalization creates a third table takes that pairs the identifier of each student with the identifier of each course that student takes.
dbase(sc,[course,student,takes]).
table(course,[cid,"name"]).
table(student,[sid,"name"]).
table(takes,[tid,cid,sid]).
There are many unique structural constraints to be verified in this database:
- No two students have the same name.
- Every takes tuple has a non-null student ID and a non-null course ID.
- Every takes tuple requires its student ID to correspond to a tuple in the student table; same for course ID.
- No takes (cid,sid) pairs are duplicated.
But also some that are application-specific:
- The compilers and database courses must have a minimum enrollment of 2 students.
- Every student should take one course.
The examples in this document illustrate how each of these constraints can be evaluated.
Remember, Java Streams are stateless, so one has to create an operation that updates a stateful object which retains the list of errors that have been encountered. This is an ErrorReport object, whose usage is:
ErrorReport er = new ErrorReport();
... er.add("this is error1");
... er.add("this is error2");
er.printReportEH(System.out); // throw exception if errors are reported
Table of Contents
- Writing an MDELite Constraint Program
- Checking Unique Tuple IDs
- Checking Field Value Uniqueness Constraints
- Checking Non-Null Fields
- Checking Existence Constraints
- No Field-value Combinations are Duplicate
- Multi-Table Constraints
- Multi-Table Constraints -- Joins
1: Writing an MDELite Constraint Program
All MDELite constraint-checking programs has the following preamble and structure:
import PrologDB.*;
public class DocConstraintExamplesTest {
public static void main(String... args) {
// Step 1: standard command-line processing
MDLUtilities.Marquee4Conform mark = new MDLUtilities.Marquee4Conform(DocConstraintExamplesTest.class, ".sc.pl", args);
String inputFileName = mark.getInputFileName();
String AppName = mark.getAppName();
// Step 2: open database to be validated + get needed tables
DB db = DB.read(inputFileName);
Table course = db.getTableEH("course");
Table student = db.getTableEH("student");
Table takes = db.getTableEH("takes");
ErrorReport er = new ErrorReport();
// Step 3: now perform database checks
//...
// Step 4: finish by reporting collected errors
er.printEH(System.out);
}
Step 1 reads the command-line arguments. The Marquee4Conform does the work with the following arguments
- the Java class object, which is in this case DocConstraintExamplesTest.class.
- the pattern of the databases to process, in this case ".school.pl" is the database suffix. (So accepted database file names are "X.school.pl").
- the command line arguments, which usually is args.
Next, two values are extracted from the command-lines:
- inputFileName -- the name of the database file, typically of the form "blah/blah/Blah/X.school.pl"
- AppName -- if the above is the input file, then AppName = "X"
Step 2 opens the database (inputFileName), extracts the tables needed, and creates an error reporter to collect discovered errors.
Step 3 evaluates constraints. The examples below show code fragments that implement common constraints.
The last statement of a constraint-checking program is one of the following:
- er.printReport(ps) -- print the error report to PrintStream ps; do not terminate program if errors are found
or more typically:
- er.printReportEH(ps) -- like printReport(ps) above, except that a RuntimeException is thrown if errors are found, halting any computation in which this constraint program is run.
2: Checking Unique Tuple IDs
This is a test that is generally unneeded, given that tuple IDs are generated. uniqueID a method whose sole purpose is to report duplicate values in the table id column, by definition the first column of the table. Here's how course table ids are checked:
// Step 1: simplest way uses MDELite built-in support
course.uniqueId(er);
// Step 2: a more general way -- that works for any column
String column = course.getIDName();
course.project(column)
.duplicates()
.error(er,"course table has multiple tuples with id='%s'",t->t.getId());
3: Checking Field Value Uniqueness Constraints
Another common constraint is that a column should have unique values for each tuple in a table, such as no two students have the same name (and that names cannot be null). Again, a Unique report object is used for this task.
// Step 1: MDELite built-in support
student.isUnique("name", er);
// Step 2: more general way
student.project("name")
.duplicates()
.error(er,"multiple tuples in student have name='%s'",t->t.get("name"));
4: Checking Non-Null Fields
Every takes tuple has a non-null student ID and a non-null course ID. A single method, isNotNull, can be used to verify that particular fields of a table are non-null.
// Step 1: MDELite built-in support
takes.isNotNull("cid",er);
takes.isNotNull("sid",er);
// Step 2: manual way
Predicate cidIsNull = t -> t.isNull("cid");
takes.select(cidIsNull).error(er,"takes(%s...) has cid=null",t->t.getId());
takes.ifThen(t->t.isNull("sid"), "sid=null", er); // shorter way
5: Checking Existence Constraints
An existence constraint asserts a tuple-id value in one table corresponds to an existing tuple in another. Example: the student ID in each takes tuple corresponds to a tuple in the student table. Existence is checked by the exists method of MDELite.
// Step 1: MDELite built-in support
takes.isLegit("cid", course, "cid", er);
// Step 2: Manual: compute a table of offending tuples
takes.select(t->t.isNotNull("cid")).leftAntiSemiJoin("cid", course, "cid")
.error(er, "takes(%s,..) has invalid cid value (%s)", t->t.getId(),t->t.get("cid"));
// Step 3: really low-level
takes.select(t -> t.isNotNull("cid") && (!course.exists("cid", t.get("cid"))))
.forEach(t -> {
er.add("takes(%s,..) has invalid cid value (%s)",
t.get("tid"), t.get("cid"));
});
6: No Field-Value Combinations are Duplicate
This is a generalization of unique field values, except now we are interested in combinations of fields. Here is how Constraint 4 could be written:
// Step 1: create table with only cid and sid columns
takes.project("cid","sid")
.duplicates()
.error(er,"tuples in takes with duplicate (%s,%s) values",t->t.get("cid"),t->t.get("sid"));
7: Multi-Table Constraints
Constraint 5 counts the number of takes (really student) tuples enrolled in each database and compiler course and checks that at least 2 students are enrolled. This is the first use of the tuple.map() method which converts a tuple, 'this', into a subtable of existing tuples to which it is related. The number of tuples in this subtable is counted and tested whether or not it is < 2.
// check constraint
course.select(t -> t.is("name", "compilers") || t.is("name", "databases"))
.select(t -> t.join("cid", takes, "cid").count() < 2)
.error(er,"%s course does not have enough students", t->t.get("name"));
8: Multi-Table Constraints -- Joins
Constraint 6 joins tables student, takes, and courses to evaluate a required condition that every student is enrolled in one course. This constraint is complicated enough to have multiple reasonable statements. One is:
// check constraint
student.forEach(s -> {
Table shortTakes = s.rightSemiJoin("sid", takes, "sid");
Table joined = shortTakes.join("cid", course, "cid");
if (joined.count() == 0) {
er.add("student %s is not taking a course", s.get("name"));
}
});
Another is:
Table student = db.getTableEH("student");
Table takes = db.getTableEH("takes");
Table course = db.getTableEH("course");
Table tc = takes.join("cid", course, "cid");
student.select(s -> s.rightSemiJoin("sid", tc, "takes.sid").count() == 0)
.error(er,"student %s is not taking a course", s->s.get("name"));
And a third is:
Table student = db.getTableEH("student");
Table takes = db.getTableEH("takes");
student.select(s -> s.rightSemiJoin("sid", takes, "sid").count() == 0)
.error(er,"student %s is not taking a course", s->s.get("name"));