Assignment: | Write a 3-address-code to C translator |
Instructor: | Kathryn S. McKinley |
Assigned: | Wednesday, August 26, 2009 |
Due: | Friday, September 11, 2009, 5PM |
Objective
The goal of this project is to write a translator from 3-address format to C. The purpose of this assignment is to
Project Description
We provide 'csc', which compiles a subset of C (CS for C-Subset) to 3-address format. Your will write a translator to convert from this 3-address format back to C. Note that the generated C code will not resemble the original C code.
The rest of this document describes the c-subset compiler, the 3-address intermediate format, the application binary interface (ABI), the implementation requirements, and the grading criteria.
Getting Started On Lab1
"tar xvfz cs380c_lab1.tgz".This command will produce the following directory structure
cs380c_lab1 src examples lab1The src directory contains the C-subset compiler source file and a make script. The examples directory contains 10 example c files, and two bash scripts (check.sh, check-one.sh) for checking your lab1 implementation (see 5 below).
cd src; ./make.sh // your shell should be bashThe command will produce a binary file for the compiler "csc" in the src directory.
cd ../examples; ../src/csc loop.c;This command will print out the 3-Address Intermediate Format code to your screen.
loop.3addr //the 3-address format file generated by csc loop.3addr.c //the c file generated by your converter loop.3addr.bin //the binary file generated by gcc compiling loop.3addr.c loop.3addr.txt //the file after executing loop.3addr.bin loop.gcc.bin //the binary file after gcc compiles loop.c loop.gcc.txt //the file after executing loop.gcc.binIf the two .txt files are same, it means your converter works correctly for this c file. The output message to your screen should be:
loop.c compiling loop.c loop.c: In function 'main': loop.c:10: warning: return type of 'main' is not 'int' 4a39f3634793c530783afa3ff0b2bd7b loop.gcc.txt 4a39f3634793c530783afa3ff0b2bd7b loop.3addr.txtThe last two lines show the md5sum results of the two .txt file. They should be same.
Our High-Level Language (a subset of C)
This section describes the high-level language C-subset language and the csc compiler. We first give a rough description, some examples, and then a formal description. The language includes:
Note: gcc must correctly compile the output C file of your translator. However, the C output itself is not restricted to C-subset. For example, you may use "goto" in your C output to simplify your implementation.
The 3-Address Intermediate Format
The csc compiler outputs 3-address format. Before taking a look at the specification of the 3-address format, let us look at a simple program and its corresponding representation.
long global_array[24]; struct Point { long x; long y; } p; void initialize_globals() { long i; i = 0; while(i < 24) { global_array[i] = i; } p.x = 13; p.y = 7; } void simple_function(long a, long b) { long local_array[3]; local_array[0] = a; local_array[1] = b; local_array[2] = a + b; if (global_array[2] > a) { global_array[3] = b + p.y; } p.x = local_array[a % 3]; } instr 1: nop instr 2: enter 8 instr 3: move 0 i#-8 instr 4: cmplt i#-8 24 instr 5: blbc (4) [11] instr 6: mul i#-8 8 instr 7: add global_array_base#32576 GP instr 8: add (7) (6) instr 9: store i#-8 (8) instr 10: br [4] instr 11: add p_base#32560 GP instr 12: add (11) x_offset#0 instr 13: store 13 (12) instr 14: add p_base#32560 GP instr 15: add (14) y_offset#8 instr 16: store 7 (15) instr 17: ret 0 instr 18: enter 24 instr 19: mul 0 8 instr 20: add local_array_base#-24 FP instr 21: add (20) (19) instr 22: store a#24 (21) instr 23: mul 1 8 instr 24: add local_array_base#-24 FP instr 25: add (24) (23) instr 26: store b#16 (25) instr 27: mul 2 8 instr 28: add local_array_base#-24 FP instr 29: add (28) (27) instr 30: add a#24 b#16 instr 31: store (30) (29) instr 32: mul 2 8 instr 33: add global_array_base#32576 GP instr 34: add (33) (32) instr 35: load (34) instr 36: cmple (35) a#24 instr 37: blbs (36) [46] instr 38: mul 3 8 instr 39: add global_array_base#32576 GP instr 40: add (39) (38) instr 41: add p_base#32560 GP instr 42: add (41) y_offset#8 instr 43: load (42) instr 44: add b#16 (43) instr 45: store (44) (40) instr 46: add p_base#32560 GP instr 47: add (46) x_offset#0 instr 48: mod a#24 3 instr 49: mul (48) 8 instr 50: add local_array_base#-24 FP instr 51: add (50) (49) instr 52: load (51) instr 53: store (52) (47) instr 54: ret 16 instr 55: nop
The 3-address format uses a simple RISC instruction set, and assumes an infinite number of virtual registers (we discuss the stack frame layout below in the Function ABI section). The operands to these instructions may be one of the following:
The instruction set contains the following instructions:
instr 42: add (41) y_offset#8stands for:
inter_42: r42 = r42 + 8
instr 1: nop
instr 37: blbs (36) [46]means:
instr_37: if (r36 != 0) goto instr_46;
instr 14: load (13) instr 15: move i#-8 j#-16 instr 16: store (15) (11)means:
instr_14: r14 = *(r13) instr_15: r15 = j = i instr_16: *(r11) = r15
instr 45: write x#-64 instr 46: wrlmeans:
instr_45: WriteLong(x); //defined as printf(" %lld", x) instr_46: WriteLine(); //defined as printf("\n");
instr 60: param (59) instr 62: param from#40 instr 63: call [23]means:
instr_63: call function_23(r59, from);
void function(long p1, long p2, long p3) { long x; long y; }is compiled into
instr 1: nop instr 2: enter 16 instr 3: ret 24 instr 4: nopwhich denotes 16 bytes of storage for local variables x and y and then 24 bytes for the formal parameters p1, p2 and p3 that must be popped on a return.
void main() { long i; }is compiled into
instr 1: nop instr 2: entrypc instr 3: enter 8 instr 4: ret 0 instr 5: nop
Function Call ABI
For our 3-address representation, we have used the following Application Binary Interface (ABI), which specifies the addresses/offsets for the global variables, local variables, and formal parameters.
We layout structures and arrays out from lower to higher addresses. Since our only basic datatype occupies 8 bytes, all integers, structs and arrays are 8-byte aligned.
We layout global variables starting at address 32768 and downwards towards zero. This organization implies that the maximum space for global variables is 32768. In your implementation, it would suffice to declare 32768 bytes of storage to hold all globals. In the above example, the global variables are global_array[24] and p. Their starting addresses, represented as offsets from GP are
global_array_base = 32576 = (32768 - (24*8)), and p_base = 32560 = (32768 - (24*8) - 16),respectively.
The stack grows from higher to lower addresses. The stack frame for a function looks like:
Low +---------------------+ addr. | | +---------------------+ -32 | ... | +---------------------+ <---+ -24 | 3rd local | | +---------------------+ | -16 | 2nd local | | +---------------------+ | - 8 | 1st local | | +---------------------+ | Stack FP ----> | FP | | frame +---------------------+ | 8 | LNK | | +---------------------+ | 16 | 2nd param | | +---------------------+ | 24 | 1st param | | +---------------------+ <---+ | | +---------------------+ previous frame. High | | addr. +---------------------+
Accordingly, in the first example, local_array, and the formal parameters a, and b, have the following offsets from FP
local_array_base: -24 a: 24 b: 16
In your generated C code, you may use the same ABI, or you can define your own. For instance, you may choose to explicitly manage a stack to handle function calls, or use C's call-return mechanism to directly handle function calls in the 3-address code.
EBNF Grammar Specification
We provide a partial EBNF grammar specification to help you write C-subset programs and understand the 3-address format output generated by csc.
The EBNF grammar for this language is given below.
Factor = Designator | Number | "(" Expression ")". Term = Factor {("*" | "/" | "%") Factor}. SimpleExpr = ["+" | "-"] Term {("+" | "-") Term}. EqualityExpr = SimpleExpr [("<" | "<=" | ">" | ">=") SimpleExpr]. Expression = EqualityExpr [("==" | "!=") EqualityExpr]. ConstExpression = Expression. FieldList = VariableDeclaration {VariableDeclaration}. StructType = "struct" Ident ["{" FieldList "}"]. Type = Ident | StructType. IdentArray = Ident {"[" ConstExpression "]"}. IdentList = IdentArray {"," IdentArray}. VariableDeclaration = Type IdentList ";". ConstantDeclaration = "const" Type Ident "=" ConstExpression ";". Designator = Ident {("." Ident) | ("[" Expression "]")}. Assignment = Designator "=" Expression ";". ExpList = Expression {"," Expression}. ProcedureCall = Ident "(" [ExpList] ")" ";". IfStatement = "if" "(" Expression ")" "{" StatementSequence "}" ["else" "{" StatementSequence "}"]. WhileStatement = "while" "(" Expression ")" "{" StatementSequence "}". Statement = [Assignment | ProcedureCall | IfStatement | WhileStatement]. StatementSequence = {Statement}. FPSection = Type IdentArray. FormalParameters = FPSection {"," FPSection}. ProcedureHeading = Ident "(" [FormalParameters] ")". ProcedureBody = {ConstDeclaration | VariableDeclaration} StatementSequence. ProcedureDeclaration = "void" ProcedureHeading "{" ProcedureBody "}". Program = {ConstantDeclaration | VariableDeclaration} ProcedureDeclaration {ProcedureDeclaration}.Number and Ident are terminal symbols. Program is the start symbol.
Implementation and Turning in Your Assignment
You are free to implement your translator in the programming language of your choice. The translator "must" compile (if necessary) and run on UTCS linux machines.
You should test your translator with the example programs we provide. In addition, please provide at least three non-trivial source programs for testing.
Your assignment should contain the following.
Turn in your assignment by running the following commands on a UTCS Linux machine.
$ # Go the parent directory of the lab1 directory. $ tgz lab1 lab1/ $ turnin --submit hadi cs380c-lab1 lab1.tgz $ turnin --list hadi cs380c-lab1
Grading Rubric
These assignments are derived from similar ones developed by Keshav Pingali. Martin Burtscher developed the csc compiler that the assignments use. Thank you Keshav and Martin.