The first assignment involves implementing a compiler that takes a simple C-like language called Bali as input and generates code for a stack machine named SaM; both are described below. The compiler will use a recursive-descent parser. The assignment is intended to help you better understand recursive-descent parsing and get a feel for what it means to implement a compiler.
To learn more about the SaM stack machine and parsing, please refer to the following lecture material:
Create a handwritten recursive-descent parser and SaM code generator for the Bali language. For lexical analysis, you will use the SamTokenizer. The compiler should take a file containing a Bali program as input and produce another file containing the SaM program that executes the Bali program.
The following is the grammar specification of the Bali language. In the grammar specification:
Literals are written as plain lower-case words (for example, int). These literals are keywords (reserved words) and cannot be used as identifiers for variables or methods.
Quoted symbols are terminals. For example, in BLOCK -> '{' STMT* '}', the symbols '{' and '}' are terminals.
PROGRAM -> METH_DECL*
METH_DECL -> TYPE ID '(' FORMALS? ')' BODY
FORMALS -> TYPE ID (',' TYPE ID)*
TYPE -> int
BODY -> '{' VAR_DECL* STMT* '}'
VAR_DECL -> TYPE ID ('=' EXP)? (',' ID ('=' EXP)?)* ';'
STMT -> ASSIGN ';'
| return EXP ';'
| if '(' EXP ')' STMT else STMT
| while '(' EXP ')' STMT
| break ';'
| BLOCK
| ';'
BLOCK -> '{' STMT* '}'
ASSIGN -> LOCATION '=' EXP
LOCATION -> ID
METHOD -> ID
EXP -> LOCATION
| LITERAL
| METHOD '(' ACTUALS? ')'
| '('EXP '+' EXP')'
| '('EXP '-' EXP')'
| '('EXP '*' EXP')'
| '('EXP '/' EXP')'
| '('EXP '&' EXP')'
| '('EXP '|' EXP')'
| '('EXP '<' EXP')'
| '('EXP '>' EXP')'
| '('EXP '=' EXP')'
| '(''-' EXP')'
| '(''!' EXP')'
| '(' EXP ')'
ACTUALS -> EXP (',' EXP)*
LITERAL -> INT | true | false
INT -> [0-9]+
ID -> [a-zA-Z] ( [a-zA-Z] | [0-9] | '_' )*
Summary:
int is the only type in this language.
true and false have the values 1 and 0 respectively. For expressions used in conditions, any non-zero value is true and the value zero is false.
You can use the template below to get started. The template shows various ways the SaM tokenizer can be used and contains methods to help you get started on the project, along with many TODOs that you need to implement. It starts by visiting all the methods in the program using the getMethod() function, which contains logic to accept a valid method declaration. The getExp() method accepts a valid expression and is mostly left blank. In this method, the implementation should ensure that the following invariant is always maintained: the result of every expression is present on the top of the stack.
NOTE: The template is provided to give you an initial idea for your implementation; you are free to modify it or to develop in your own style.
package assignment1;

import edu.cornell.cs.sam.io.SamTokenizer;
import edu.cornell.cs.sam.io.Tokenizer;
import edu.cornell.cs.sam.io.Tokenizer.TokenType;

public class BaliCompiler {

    static String compiler(String fileName) {
        // Returns SaM code for the program in the given file.
        try {
            SamTokenizer f = new SamTokenizer(fileName);
            String pgm = getProgram(f);
            return pgm;
        } catch (Exception e) {
            System.out.println("Fatal error: could not compile program");
            return "STOP\n";
        }
    }

    static String getProgram(SamTokenizer f) {
        try {
            String pgm = "";
            // PROGRAM -> METH_DECL*: keep reading methods until end of file.
            while (f.peekAtKind() != TokenType.EOF)
                pgm += getMethod(f);
            return pgm;
        } catch (Exception e) {
            System.out.println("Fatal error: could not compile program");
            return "STOP\n";
        }
    }

    static String getMethod(SamTokenizer f) {
        // TODO: add code to convert a method declaration to SaM code.
        // Since the only data type is int, you can safely check for int
        // in the tokenizer.
        // TODO: add appropriate exception handlers to generate useful error messages.
        f.check("int"); // must match at the beginning
        String methodName = f.getWord();
        f.check("("); // must be an opening parenthesis
        String formals = getFormals(f);
        f.check(")"); // must be a closing parenthesis
        // You would need to read in the formals, if any,
        // and then make calls to getDeclarations and getStatements.
        return null;
    }

    static String getExp(SamTokenizer f) {
        // TODO: implement this.
        switch (f.peekAtKind()) {
            case INTEGER: // E -> integer
                return "PUSHIMM " + f.getInt() + "\n";
            case OPERATOR: {
                // Not implemented yet: falls through to the default case.
            }
            default:
                return "ERROR\n";
        }
    }

    static String getFormals(SamTokenizer f) {
        // TODO: implement this.
        return null;
    }
}
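To make the stack invariant concrete, here is a minimal sketch of recursive-descent code generation for the integer and parenthesized-operator cases of EXP. It is an illustration only: the class ExpSketch and its character-level scanner are hypothetical stand-ins so that the example runs without the SaM jar, and a real solution would read tokens from SamTokenizer instead. Translating unary '-' as PUSHIMM -1 followed by TIMES is one possible choice, not a prescribed one. Identifiers, true/false, method calls, and '!' are left out.

```java
import java.util.Map;

// Hypothetical stand-alone sketch; it does not use SamTokenizer.
class ExpSketch {
    // Binary operators of the Bali grammar mapped to SaM instructions.
    private static final Map<Character, String> BINARY = Map.of(
        '+', "ADD", '-', "SUB", '*', "TIMES", '/', "DIV",
        '&', "AND", '|', "OR", '<', "LESS", '>', "GREATER", '=', "EQUAL");

    private final String src;
    private int pos = 0;

    ExpSketch(String src) {
        this.src = src.replaceAll("\\s+", ""); // whitespace is insignificant here
    }

    // Compiles one complete expression and rejects leftover input.
    static String compile(String src) {
        ExpSketch p = new ExpSketch(src);
        String code = p.exp();
        if (p.pos != p.src.length())
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        return code;
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c)
            throw new IllegalArgumentException("expected '" + c + "' at position " + pos);
        pos++;
    }

    // EXP -> INT | '(' '-' EXP ')' | '(' EXP op EXP ')' | '(' EXP ')'
    // Invariant: the returned code leaves exactly one new value on the stack.
    private String exp() {
        if (Character.isDigit(peek())) {
            int start = pos;
            while (Character.isDigit(peek())) pos++;
            return "PUSHIMM " + src.substring(start, pos) + "\n";
        }
        expect('(');
        String code;
        if (peek() == '-') {            // unary minus: negate by multiplying by -1
            pos++;
            code = exp() + "PUSHIMM -1\nTIMES\n";
        } else {
            code = exp();               // left operand, or the whole inner expression
            String op = BINARY.get(peek());
            if (op != null) {           // binary case: right operand, then the operator
                pos++;
                code += exp() + op + "\n";
            }
        }
        expect(')');
        return code;
    }

    public static void main(String[] args) {
        System.out.print(compile("((1 + 2) * (-3))"));
    }
}
```

For instance, ExpSketch.compile("(1 + 2)") yields PUSHIMM 1, PUSHIMM 2, and ADD, one instruction per line. Because every sub-expression leaves exactly one value on the stack, the binary instruction always finds its two operands on top.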
Here are some additional assertions regarding the grammar and the language.
There will always be a main method in the input program. The program starts executing from the main method. The main method does not take any arguments.
There will always be a return statement at the end of a method in the input program.
Comments in the input program are automatically handled by the SamTokenizer. The tokenizer discards the characters between // and the end of the line (including the // itself), so you do not need to worry about them.
There is no overloading of methods.
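Since method names are unique and a zero-argument main is expected, one way to keep track of declarations is a small program-wide table. The sketch below is only a suggestion: the class MethodTable and its method names are hypothetical and not part of the template, and the main check is optional because the handout guarantees that valid inputs contain main.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: records each declared method and its number of formals.
class MethodTable {
    private final Map<String, Integer> arity = new HashMap<>();

    // Called once per METH_DECL; a second declaration of the same name is an error.
    void declare(String name, int formals) {
        if (arity.putIfAbsent(name, formals) != null)
            throw new IllegalStateException("method '" + name + "' is defined more than once");
    }

    // Called after the whole program has been parsed.
    void checkMain() {
        Integer formals = arity.get("main");
        if (formals == null)
            throw new IllegalStateException("program has no main method");
        if (formals != 0)
            throw new IllegalStateException("main must not take arguments");
    }
}
```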
Methods can be defined either before or after the calls that refer to them. You will need a symbol table for each method. One approach is to write a separate symbol-table class (backed by a hash table or any other structure). A symbol-table object would be created inside your getMethod() method and initialized by the getDeclarations() call. Once initialized, it would be passed to almost all the other parsing methods invoked from getMethod(), so that each rule has the information it needs. Each method has its own symbol table.
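A per-method symbol table along these lines could map each variable name to an offset from the frame base register. The sketch below is an assumption-laden illustration, not the required design: the class SymbolTable is hypothetical, and the frame layout (formals at negative offsets with the last formal at -1, locals starting at offset 2) is just one convention that you would need to match with your own call sequence.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-method symbol table: variable name -> offset from the FBR.
class SymbolTable {
    private final Map<String, Integer> offsets = new HashMap<>();
    // Assumed layout: offsets 0 and 1 hold bookkeeping slots, locals start at 2.
    private int nextLocal = 2;

    // Registers the formals in declaration order; the last one gets offset -1.
    void addFormals(List<String> names) {
        int n = names.size();
        for (int i = 0; i < n; i++) define(names.get(i), i - n);
    }

    // Registers one local variable from a VAR_DECL.
    void addLocal(String name) {
        define(name, nextLocal++);
    }

    int lookup(String name) {
        Integer offset = offsets.get(name);
        if (offset == null)
            throw new IllegalStateException("undeclared variable '" + name + "'");
        return offset;
    }

    // SaM code that pushes the variable's value (EXP -> LOCATION).
    String load(String name) {
        return "PUSHOFF " + lookup(name) + "\n";
    }

    // SaM code that pops the top of the stack into the variable (ASSIGN).
    String store(String name) {
        return "STOREOFF " + lookup(name) + "\n";
    }

    private void define(String name, int offset) {
        if (offsets.putIfAbsent(name, offset) != null)
            throw new IllegalStateException("duplicate declaration of '" + name + "'");
    }
}
```

Keeping the offset arithmetic in one class means the parsing methods only ever ask for load(name) or store(name), so a change of frame layout touches a single place.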
A break statement must be lexically nested within one or more loops; when it is executed, it terminates the execution of the innermost loop in which it is nested. Make sure your compiler handles illegal break statements.
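One way to implement this rule is to keep a stack of the loops that are currently being parsed: a break jumps to the end label of the innermost one, and an empty stack means the break is illegal. The following is a rough sketch under that assumption. The class LoopContext, its method names, and the label names (loopBody, loopTest, loopEnd) are all hypothetical, and the loop layout shown is only one of several that work.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper shared by the statement-parsing methods.
class LoopContext {
    private final Deque<Integer> open = new ArrayDeque<>(); // ids of enclosing loops
    private int counter = 0;                                // keeps labels unique

    // Call before parsing the body of a while statement.
    int enterLoop() {
        open.push(counter);
        return counter++;
    }

    // Call after the body has been parsed.
    void exitLoop() {
        open.pop();
    }

    // STMT -> break ';' : jump past the innermost enclosing loop.
    String breakStmt() {
        if (open.isEmpty())
            throw new IllegalStateException("break is not inside a loop");
        return "JUMP loopEnd" + open.peek() + "\n";
    }

    // Assembles a loop from already generated condition and body code.
    // JUMPC jumps when the condition value is non-zero, which matches
    // the rule that any non-zero value counts as true.
    static String whileCode(int id, String cond, String body) {
        return "JUMP loopTest" + id + "\n"
             + "loopBody" + id + ":\n" + body
             + "loopTest" + id + ":\n" + cond
             + "JUMPC loopBody" + id + "\n"
             + "loopEnd" + id + ":\n";
    }
}
```

A parser would call enterLoop(), parse the body (where any break calls breakStmt()), call exitLoop(), and then combine the pieces with whileCode. Sharing one LoopContext across the whole program keeps the generated labels distinct between methods.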
If a program does not satisfy the grammar above or the textual description of the language, your compiler should print a short, informative error message and/or exit with a non-zero exit status. If you have any questions, you can post on Piazza.
Make sure that your compiler is in the Java class assignment1.BaliCompiler. Your compiler should take two command-line arguments. The first argument is an input file containing a Bali program. The second argument is an output file that will contain your generated SaM code.
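The command-line handling described above could look roughly like the sketch below. It is not the required structure: the class DriverSketch and its run method are hypothetical, and compile is a placeholder that stands in for your real entry point (for example, the compiler method of the template) so that the example is self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical driver; in a submission this logic would live in assignment1.BaliCompiler.
class DriverSketch {
    // Placeholder for the real compiler: it only checks that the input is readable.
    static String compile(String inputFile) throws IOException {
        Files.readAllBytes(Paths.get(inputFile)); // throws if the file is missing
        return "PUSHIMM 0\nSTOP\n";
    }

    // Returns the process exit status: 0 on success, non-zero on any failure.
    static int run(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: <input.bali> <output.sam>");
            return 1;
        }
        try {
            String sam = compile(args[0]);
            Files.write(Paths.get(args[1]), sam.getBytes(StandardCharsets.UTF_8));
            return 0;
        } catch (Exception e) {
            System.err.println("Compilation failed: " + e.getMessage());
            return 1;
        }
    }

    public static void main(String[] args) {
        System.exit(run(args));
    }
}
```

Returning the status from run and calling System.exit only in main keeps the error path easy to test while still giving the caller a non-zero exit status on failure.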
Tips:
Eclipse and IntelliJ IDEA are popular IDEs for Java programming, and you can use either for your project. We recommend attaching the SaM source in addition to the jar file so that you can easily navigate the SaM library.
Submit (to Canvas) the following:
The following sequence of commands will be used to evaluate your submission on both public and private testcases.
Compiling the Bali program:
java -jar compiler.jar test1.bali output.sam
The above command should read the Bali program in the test1.bali file, generate the SaM program, and write it to the output.sam file.
Running the SaM program:
java -cp SaM-2.6.2.jar edu.cornell.cs.sam.ui.SamText output.sam
The above command reads the SaM program in the output.sam file and executes it in the SaM interpreter. The output of the program, i.e. its exit status, will be displayed on the terminal. From the SaM interpreter's perspective, the exit status is the element on the top of the stack when the interpreter executes the STOP instruction. From the Bali program's perspective, the exit status is the return value of the main() method. The exit status is used to evaluate the correctness of your compiler. An example output of this command is shown below:
Program assembled.
Program loaded. Executing.
==========================
Exit Status: 30
If there is an error in the SaM program, an error message will be displayed instead of the exit status.
Important:
Each testcase is assigned a difficulty level: easy, medium, or hard. Points are assigned to each testcase based on its difficulty level: 3, 5, and 7 points for easy, medium, and hard testcases respectively. The points assigned to each public testcase are stated at the top of the testcase in public_testcases.zip. There are additional hidden testcases.