Programs that Read and Write MDELite Databases
Table of Contents
- Writing an M2M Transformation
- Printing a Database Schema
- Constructing Schemas
- Creating a Database
- Reading a Database
- Reading a Table
- Printing a Database
- Reading Tuples of a Table and its Subtables
- Filtering Tuples from Tables
- Group By and Aggregate Queries
- Taking Cross Products of Schemas
- Table Joins
- Self Joins
- Reading and Writing Individual Prolog and CSV Tables
- Collecting Streamed Tuples into a Table
- Copying Data from a Tuple of one Schema to Another
- Sorting Tables
This document illustrates programs that read and write MDELite databases using the dog-owner database below:
:::test/DML/PrologDB/TestData/dogOwner.do.pl:::
1: Writing an M2M Transformation
All MDELite Model-to-Model (or database-to-database) transformations have the following prelude and structure:
---tmplate---
Step 1 executes boilerplate code to process the command line.
This particular M2M program translates a ".do.pl" database to a
".pet.pl" database. That is, to invoke the above program you
would type:
> java DocExamplesTest X.do.pl
or
> java DocExamplesTest X.do.pl output.txt
The first variant produces the pet database file X.pet.pl; that is, given X.do.pl as input, the translated database is named X.pet.pl. The second variant ignores the standard output file name (X.pet.pl) and redirects output to the file output.txt. Other points worth noting are the variables:
- inputFileName -- the name of the input database (X.do.pl)
- outputFileName -- the name of the output database (X.pet.pl) or whatever is specified on the command line (e.g. output.txt)
- appName -- if "X.do.pl" is the input database, appName = "X".
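As a sketch of what Step 1's boilerplate computes, here is the name derivation in plain Java. The class and method names below are hypothetical illustrations, not part of MDELite:

```java
// Hypothetical sketch (not the MDELite boilerplate itself) of how the
// variables above can be derived from the command line.
class NameDerivation {
    // "X.do.pl" -> "X"
    static String appName(String inputFileName) {
        return inputFileName.substring(0, inputFileName.indexOf('.'));
    }
    // default output "X.pet.pl", unless a second argument overrides it
    static String outputFileName(String[] args) {
        return args.length > 1 ? args[1] : appName(args[0]) + ".pet.pl";
    }
    public static void main(String[] args) {
        System.out.println(appName("X.do.pl"));                        // X
        System.out.println(outputFileName(new String[]{ "X.do.pl" })); // X.pet.pl
    }
}
```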
Step 2 reads the input database and extracts its tables.
Step 3 reads the schema for the pet database; in this case it is the file "pet.schema.pl", which is either a local file or a file hardwired to a particular directory. It is also possible to supply the contents of a schema as a multi-line String inside the program itself. Use:
String defOfSchema = "dbase(a,[b,c]).\ntable(b,[id,x,y]).\ntable(c,[z]).\nsubtable(b,[c]).";
DBSchema sch = DBSchema.read("a",defOfSchema);
Also part of Step 3 is to create an empty pet database whose tuples are to be entered by the next step.
Step 4 translates a ".do.pl" database into a ".pet.pl" database. This is where all the heavy lifting takes place.
Step 5 writes the produced database to its output file.
The following examples illustrate how to write M2M translations.
2: Printing a Database Schema
This program takes an ooschema declaration and flattens it into a db schema declaration. The distinction between ooschemas and schemas is that in a schema the attributes of supertables are propagated to each of their subtables, recursively. The program prints a database schema one table at a time; printing the subtable declarations of the database concludes the program.
---printSchema---
Here is the output of this program:
:::test/DML/PrologDB/Correct/printSchema.txt:::
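The flattening the program performs can be sketched in plain Java. This is an illustration of the idea, not the MDELite implementation; the table and column names are made up:

```java
import java.util.*;

// Recursively propagate a table's attributes to all of its subtables.
class Flatten {
    static Map<String, List<String>> flatten(Map<String, List<String>> attrs,
                                             Map<String, List<String>> subtables,
                                             String root) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        propagate(attrs, subtables, root, new ArrayList<>(), out);
        return out;
    }
    private static void propagate(Map<String, List<String>> attrs,
                                  Map<String, List<String>> subtables,
                                  String table, List<String> inherited,
                                  Map<String, List<String>> out) {
        List<String> all = new ArrayList<>(inherited);     // supertable columns first
        all.addAll(attrs.getOrDefault(table, List.of()));
        out.put(table, all);
        for (String sub : subtables.getOrDefault(table, List.of()))
            propagate(attrs, subtables, sub, all, out);    // recurse into subtables
    }
    public static void main(String[] args) {
        Map<String, List<String>> attrs = Map.of("pet", List.of("pid", "name"),
                                                 "dog", List.of("breed"));
        System.out.println(Flatten.flatten(attrs, Map.of("pet", List.of("dog")), "pet"));
    }
}
```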
3: Constructing Schemas
There are two ways in which database schemas can be programmatically constructed. The trick is adding table identifiers ONLY to the tables that are not subtables. (Why? Because attributes of super-tables are inherited by all subtables. If a subtable is given its own identifier, it will also inherit the identifier of its supertable, leading to an id-replication mess.)
The first program shows the creation of a schema where you, the programmer, know which tables are root (non-subtable) tables. In this program, an identifier 'cid' is added to the root table, 'crewman', and in Step 4 this identifier is propagated to all of its subtables:
---schemaBuild---
The second program builds the same schema, but calls the method addIdentifiersToAllNonSubtables() before "finishing" the schema. This is likely to be the more common way to programmatically assemble schemas:
---schemaBuildnew---
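The effect of addIdentifiersToAllNonSubtables() can be sketched in plain Java; the id-naming convention below (table name + "id") is an assumption for illustration only, not what MDELite necessarily does:

```java
import java.util.*;

// Add an identifier column only to tables that are not subtables;
// subtables inherit it when the schema is flattened.
class AddIds {
    static Map<String, List<String>> addIds(Map<String, List<String>> attrs,
                                            Map<String, List<String>> subtables) {
        Set<String> subs = new HashSet<>();
        for (List<String> kids : subtables.values()) subs.addAll(kids);
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : attrs.entrySet()) {
            List<String> cols = new ArrayList<>(e.getValue());
            if (!subs.contains(e.getKey()))
                cols.add(0, e.getKey() + "id");   // only root tables get an id column
            out.put(e.getKey(), cols);
        }
        return out;
    }
    public static void main(String[] args) {
        System.out.println(AddIds.addIds(Map.of("crewman", List.of("name"),
                                                "officer", List.of("rank")),
                                         Map.of("crewman", List.of("officer"))));
    }
}
```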
4: Creating a Database
This program creates the enterprise database, which has 3 tuples, one tuple per table. An ooschema for the database is created and then converted into a true schema (shown above). At this point, a database instance of the schema can be created, one table and one tuple at a time. Different ways of creating a tuple are shown.
---DBBuild---
Here is the output of this program:
:::test/DML/PrologDB/Correct/dbBuild.txt:::
5: Reading a Database
This program reads a database given its file and prints the database using available utilities, and also the harder way: by extracting its individual tables and printing their tuples.
---DBread---
Here is the output of this program:
:::test/DML/PrologDB/Correct/dbread.txt:::
6: Reading a Table
This program reads the dog-owner database, which has a table of dogs. Using Java Streams, the stream of all aussie (Australian Shepherd) tuples is created and then printed. A similar stream is reconstituted (remember: once you use a Java stream, it's done and has to be recreated), filtered further, and printed.
---TableRetrieve---
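The "recreate it" rule is plain JDK behavior, not anything MDELite-specific: a Java Stream can be consumed only once. A minimal demonstration:

```java
import java.util.List;
import java.util.stream.Stream;

// A consumed stream cannot be reused; build a fresh one instead.
class OneShot {
    // returns true if reusing a consumed stream throws IllegalStateException
    static boolean reuseFails(List<String> breeds) {
        Stream<String> s = breeds.stream().filter(b -> b.equals("aussie"));
        s.count();                 // consumes the stream
        try {
            s.count();             // second use is illegal
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
    // the right way: reconstitute the stream, then filter further
    static long recreateAndCount(List<String> breeds) {
        return breeds.stream().filter(b -> b.equals("aussie")).count();
    }
    public static void main(String[] args) {
        List<String> dogs = List.of("aussie", "beagle", "aussie");
        System.out.println(reuseFails(dogs));
        System.out.println(recreateAndCount(dogs));
    }
}
```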
7: Printing a Database
There are two ways of printing a database: compact (the default) or pretty print. Here is code that shows both:
---prettyprint---
8: Reading Tuples of a Table and its Subtables
This program reads the tuples of the table pet and all of its subtables. If you want to retrieve tuples of only that table (and not its subtables), use the getLocalTuples() method. A couple of queries are illustrated.
---InheritanceRetrieve---
9: Filtering Tuples from Tables
Here are several ways to filter tuples from tables. The first uses a loop with an if-statement. The second creates a stream of tuples that are filtered and printed. The third creates a table of only the filtered tuples, which is then printed. The only difference between the latter two is that stream implementations process one tuple at a time as opposed to one table at a time.
---filter---
Here is the output of this program:
:::test/DML/PrologDB/Correct/ftest.txt:::
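The filtering styles can be mimicked with plain JDK collections and streams. In this sketch tuples are modeled as maps, because the actual MDELite Table API lives in the program above, not here:

```java
import java.util.*;
import java.util.stream.*;

// Equivalent ways to filter "tuples" (maps) by breed.
class Filters {
    // style 1: a loop with an if-statement
    static List<Map<String, String>> loopFilter(List<Map<String, String>> table) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> tup : table)
            if ("aussie".equals(tup.get("breed"))) out.add(tup);
        return out;
    }
    // styles 2 and 3: a stream of filtered tuples, collected into a new "table"
    static List<Map<String, String>> streamFilter(List<Map<String, String>> table) {
        return table.stream()
                    .filter(tup -> "aussie".equals(tup.get("breed")))
                    .collect(Collectors.toList());
    }
    public static void main(String[] args) {
        List<Map<String, String>> dogs =
            List.of(Map.of("name", "rex", "breed", "aussie"),
                    Map.of("name", "fido", "breed", "beagle"));
        System.out.println(loopFilter(dogs).equals(streamFilter(dogs)));  // true
    }
}
```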
9: Group By and Aggregate Queries
A
table T is "grouped by"
a column
(or computed column) producing a stream
of tables. Each table of this stream is really a subtable of the
original table. Each subtable contains all tuples in T
that have the same column
value. The example below groups a dog table by column breed. Each subtable is aggregated to produce a single
tuple. These tuples are then collected into a table of such
tuples and then printed. The next step aggregates the
aggregation table, which yields a table with a single tuple. The
tuple is extracted (by the toTuple()
method), and the field "size" is returned and is compared to the number
of tuples in the original dog table. (Yes, this is a bit baroque,
but it does a good job of illustrating group by and aggregation).
---GroupBy---
The output of this test is:
:::test/DML/PrologDB/Correct/groupby.txt:::
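The same pipeline can be mimicked with JDK collectors. This is an analogy to the MDELite operations, not their API: group dogs by breed, reduce each group to a count, then aggregate the counts to recover the size of the original table:

```java
import java.util.*;
import java.util.stream.*;

// Group-by + aggregate, JDK style: breed -> count, then sum the counts.
class GroupBySketch {
    static Map<String, Long> countsByBreed(List<String> breeds) {
        return breeds.stream()
                     .collect(Collectors.groupingBy(b -> b, Collectors.counting()));
    }
    // "aggregate the aggregation": the total of all group counts
    static long totalSize(Map<String, Long> counts) {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }
    public static void main(String[] args) {
        List<String> breeds = List.of("aussie", "aussie", "beagle");
        Map<String, Long> counts = countsByBreed(breeds);
        System.out.println(counts.get("aussie"));                // 2
        System.out.println(totalSize(counts) == breeds.size());  // true
    }
}
```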
Note: an aggregate operation takes as input:
- the name of the aggregation table (same as the name of the table schema) that it is to produce
- a list of reducers which produce aggregation columns in the output table
Each reducer takes as input:
- a column designator of the aggregation column(s) in which it is to place its aggregated result(s). Use "s" if the aggregation value is unquoted, otherwise "'s'". In both cases, the aggregation column name is "s" -- passing the quotes tells MDELite the column type.
- any other information needed to compute its aggregate value.
Examples of reducers and their inputs (they're all similar) and outputs:
- GroupValue takes a column designator and the name of the group-by column. This reducer produces the group-by value of a subtable.
- Count takes a column designator. This reducer returns the number of tuples in a subtable.
- IntAgg takes a column designator and an integer aggregation function; it returns the aggregation result of that subtable.
In general, group-by-aggregates can be nested arbitrarily deep. Here's a second group-by-aggregate example: it groups the dog table by owner and joins with other tables to count the number of dogs owned by each owner.
---GroupBy2---
The output of this test is:
:::test/DML/PrologDB/Correct/groupby2.txt:::
11: Taking Cross Products of Schemas
Table joins take cross products of schemas automatically. Occasionally you might need to do this yourself. The following program illustrates how cross products of schemas are taken and printed.
---crossSchema---
Here is the output of this program:
:::test/DML/PrologDB/Correct/cross.txt:::
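In plain-Java terms, the column list of a schema cross product is the concatenation of both schemas' columns, each renamed table.column. A sketch of the idea (not MDELite code):

```java
import java.util.*;

// Cross product of two schemas' column lists, with table-qualified names
// so that like-named columns stay distinguishable.
class CrossSchema {
    static List<String> cross(String t1, List<String> c1,
                              String t2, List<String> c2) {
        List<String> out = new ArrayList<>();
        for (String c : c1) out.add(t1 + "." + c);
        for (String c : c2) out.add(t2 + "." + c);
        return out;
    }
    public static void main(String[] args) {
        System.out.println(cross("dog", List.of("did", "name"),
                                 "when", List.of("did", "year")));
    }
}
```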
12: Table Joins
This program reads a dog-owner database; finds the dog, when, and owner tables; and joins the dog table with the when table over the join predicate dog.did = when.did (i.e., the did columns of both tuples must have the same value) to produce a new table dogXwhen.
Note: the attributes of the joined relation are renamed. What used to be attribute T in the dog table is now renamed dog.T, and the same for all other attributes, so that one can always distinguish attributes with similar names. Continuing, this table is printed, then joined with the owner table over the predicate when.oid = owner.oid, and the resulting table is printed out.
---TableJoins---
Here is the output of this program:
:::test/DML/PrologDB/Correct/join.txt:::
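The join just described can be sketched as a nested-loop equijoin over maps, including the table-qualified renaming. This is plain Java for illustration, not the MDELite join API:

```java
import java.util.*;

// Nested-loop equijoin: pair every tuple of t1 with every tuple of t2
// whose join columns agree, renaming attributes table.column.
class JoinSketch {
    static List<Map<String, String>> join(String n1, List<Map<String, String>> t1,
                                          String n2, List<Map<String, String>> t2,
                                          String c1, String c2) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> a : t1)
            for (Map<String, String> b : t2)
                if (Objects.equals(a.get(c1), b.get(c2))) {
                    Map<String, String> joined = new LinkedHashMap<>();
                    a.forEach((k, v) -> joined.put(n1 + "." + k, v));
                    b.forEach((k, v) -> joined.put(n2 + "." + k, v));
                    out.add(joined);
                }
        return out;
    }
    public static void main(String[] args) {
        List<Map<String, String>> dog = List.of(Map.of("did", "1", "name", "rex"));
        List<Map<String, String>> when = List.of(Map.of("did", "1", "year", "2020"));
        System.out.println(join("dog", dog, "when", when, "did", "did"));
    }
}
```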
13: Self Joins
Here is a program that joins the dog table with itself over dog identifiers (did). The key to self-joins is that a copy of the table must be made for each self-join. If a dog table is to be joined with itself, the dog table is really joined with a copy; if it is to be joined with itself twice, the dog table and two distinct copies of it must be joined. Now, this particular example of joining a dog table with itself isn't particularly useful, but when you have a table that encodes (parent, child) relationships, joining such tables multiple times allows you to relate parents to grandchildren, and so on.
---SelfJoins---
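The (parent, child) use case reads like this in plain Java; note the copy that, as the text says, every self-join requires. An illustrative sketch, not MDELite code:

```java
import java.util.*;

// Self-join of a (parent, child) table with a copy of itself:
// emit (a, c) whenever a is a parent of some b and b is a parent of c.
class SelfJoin {
    static List<List<String>> grandparents(List<List<String>> parentChild) {
        List<List<String>> copy = new ArrayList<>(parentChild); // the required copy
        List<List<String>> out = new ArrayList<>();
        for (List<String> a : parentChild)
            for (List<String> b : copy)
                if (a.get(1).equals(b.get(0)))            // a.child == b.parent
                    out.add(List.of(a.get(0), b.get(1))); // (grandparent, grandchild)
        return out;
    }
    public static void main(String[] args) {
        System.out.println(grandparents(List.of(List.of("al", "bo"),
                                                List.of("bo", "cy"))));  // [[al, cy]]
    }
}
```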
14: Reading and Writing Individual Prolog and CSV Tables
You can read and write individual tables, either in the standard Prolog format or as a CSV (comma separated value) file. The name of the file (whether or not it ends in ".csv") determines whether the file is read/written as a CSV file.
---csvfile---
Here is the simple.csv file -- all CSV tables are assumed to have quoted attributes (and hence quoted values):
:::test/DML/PrologDB/TestData/simple.csv:::
And here is its Prolog-formatted output:
:::test/DML/PrologDB/Correct/simple.pl:::
And what it looks like written out:
:::test/DML/PrologDB/TestData/simple.csv:::
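A minimal reader/writer for this quoted CSV form might look as follows. It is a sketch that assumes values contain no embedded commas or quotes; it is not the MDELite implementation:

```java
import java.util.*;

// Parse and emit one line of fully-quoted CSV.
class QuotedCsv {
    static List<String> parseLine(String line) {
        List<String> out = new ArrayList<>();
        for (String field : line.split(","))
            out.add(field.trim().replaceAll("^\"|\"$", ""));  // strip the quotes
        return out;
    }
    static String writeLine(List<String> values) {
        StringJoiner sj = new StringJoiner(",");
        for (String v : values) sj.add("\"" + v + "\"");      // re-quote every value
        return sj.toString();
    }
    public static void main(String[] args) {
        System.out.println(parseLine("\"a\",\"b\""));  // [a, b]
        System.out.println(writeLine(List.of("a", "b")));
    }
}
```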
15: Collecting Streamed Tuples into a Table
This program shows how a stream of tuples can be collected into a table; the collected table is then compared with the original (and found to be the same).
---toTable---
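The collect-and-compare round trip looks like this with JDK streams, modeling a table as a list of tuple-maps (a sketch of the idea, not the MDELite toTable() API):

```java
import java.util.*;
import java.util.stream.*;

// Stream a "table" and collect it back; the result equals the original.
class CollectSketch {
    static List<Map<String, String>> collect(Stream<Map<String, String>> tuples) {
        return tuples.collect(Collectors.toList());
    }
    public static void main(String[] args) {
        List<Map<String, String>> table = List.of(Map.of("name", "rex"),
                                                  Map.of("name", "fido"));
        System.out.println(collect(table.stream()).equals(table));  // true
    }
}
```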
16: Copying Data from a Tuple of one Schema to Another
A common task is copying data from a tuple of one schema to a tuple of another. It's not hard, but it is tedious. Look at the first step of the code below -- it is a series of assignment statements. The second step defines a ColumnCorrespondence between the fields of the left schema and fields or computations in the right. Not all fields of the left schema need be considered/updated in a correspondence. Also, once a correspondence is defined, it can be subsequently changed by adding and deleting column correspondences.
---corresp---
As an aside, the manual way has a debug option that tests whether a tuple is complete -- meaning that the tuple has its full complement of column-value pairs. In this example, all longhorn tuples do. A RuntimeException is thrown if a tuple is incomplete. This is a way to debug a program -- once you're convinced you have all columns, you can remove the isComplete test.
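One way to picture a column correspondence in plain Java is a map from target column to a function of the source tuple; applying it fills exactly the listed columns. The names and shape here are hypothetical, not the MDELite ColumnCorrespondence API:

```java
import java.util.*;
import java.util.function.*;

// Apply a correspondence: each target column gets a value computed from
// the source tuple; columns not listed are simply not produced.
class Correspondence {
    static Map<String, String> apply(
            Map<String, Function<Map<String, String>, String>> corr,
            Map<String, String> source) {
        Map<String, String> target = new LinkedHashMap<>();
        corr.forEach((col, f) -> target.put(col, f.apply(source)));
        return target;
    }
    public static void main(String[] args) {
        Map<String, Function<Map<String, String>, String>> corr = new LinkedHashMap<>();
        corr.put("name", s -> s.get("name"));          // straight copy
        corr.put("tag", s -> "dog:" + s.get("name"));  // computed column
        System.out.println(apply(corr, Map.of("name", "rex")));
    }
}
```

Because the correspondence is just a map, "adding and deleting column correspondences" is a put or remove on it.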
17: Sorting Tables
Sorting is obvious. Let T be a table with column "key". What does T.sort("key") produce? Clearly a sorted table. But do we want a sorted copy of T to be output, or do we want T itself to be sorted? That's the purpose of sort(table,boolean): if the boolean is true, the input table is updated by being sorted; if false, a sorted copy of T is produced.
---sort---
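The copy-versus-in-place distinction behind the boolean can be demonstrated with plain lists (an illustration, not the MDELite signature):

```java
import java.util.*;

// true: sort the input itself; false: leave it alone and return a sorted copy.
class SortSketch {
    static List<String> sort(List<String> table, boolean inPlace) {
        if (inPlace) {
            Collections.sort(table);                 // the input table is updated
            return table;
        }
        List<String> copy = new ArrayList<>(table);  // the input table is untouched
        Collections.sort(copy);
        return copy;
    }
    public static void main(String[] args) {
        List<String> t = new ArrayList<>(List.of("b", "a"));
        System.out.println(sort(t, false));  // [a, b]
        System.out.println(t);               // [b, a]
    }
}
```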