Laboratory Assignment 2
CS 340d
Unique Number: 50960
Spring, 2019

Given: March 13, 2019
Due: April 22, 2019

This laboratory concerns developing and observing properties about memory allocation and a memory garbage collector (GC).

Many computer algorithms use memory in a temporary manner, and then return it for later use by other algorithms. Likely, the most familiar form of memory reuse is a stack, such as the one used to implement function call and return in typical C-language programs. When a function returns, the memory storage associated with the called function is ``returned'' and may be reused by other algorithms. To manage persistent but dynamic storage structures, many languages provide automated memory allocation and collection mechanisms. Other than stack-based allocation, the C Language does not provide storage allocation primitives. However, some automated storage allocation/collection mechanisms are provided by C++ and Java. In this laboratory, we will explore such allocation and collection mechanisms.

Systems that provide automatic collection of unreachable storage usually provide a ``garbage collector'' that recycles storage for repeated use. So, what is garbage collection? Entire books have been written on garbage collection; for example, see "The Garbage Collection Handbook" by Richard Jones, Antony Hosking, and Eliot Moss, CRC Press, ISBN-13: 978-1-4200-8279-1. For Java users, the automated collection (of some kinds) of storage is implemented with a garbage collector. In dynamic languages, like Lisp, storage is reclaimed and reused automatically.

Our investigation of this topic will help you be a better user and programmer; hopefully, this laboratory will make clear how ``automatic'' storage allocation and collection operates. We will also use the results of this project to create a formal verification tool in Lab #3.

In this laboratory, we will implement a copying-style garbage collector.
As a part of this laboratory, students are tasked with specifying invariants that will help assure the correctness of their implementations.

General Comment

Before we describe this laboratory assignment, we describe our philosophy for all of our laboratory assignments and for many of our homework assignments.

We expect our programs to implement their requirements with mathematical precision, but programs are generally specified with natural language. To this point in your education, most programming assignments have included some description of the program you should write, and you have then been expected to interpret that description and produce a result. It requires tremendous care and precision to write a precise description of any computation in a natural language -- it is certainly beyond our ability to write completely precise, natural-language specifications.

We would like to write mathematical specifications, but that would require us to learn mathematics for most of the semester. For the community of software developers, this approach would be extremely valuable where it can be deployed, but it is not yet a mature discipline. Even so, we will sometimes refer to programs that can be specified formally. And, time permitting, we may redo this laboratory with better technology later this semester.

Laboratory Requirements

This laboratory involves implementing a garbage collector. For the interested student, there is an opportunity to convert their garbage collector into a real-time garbage collector. Your garbage collector should be based on the algorithm found in Henry Baker's 1978 CACM article: "List Processing in Real-Time on a Serial Computer", CACM, 21(4):280-294, doi: 10.1145/359460.359470.

This laboratory requires you to implement the CONS, CAR, CDR, RPLACA, RPLACD, EQ, and ATOM primitives; in turn, these primitives can be used to implement a Lisp system or a BDD package.
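
As a rough orientation for the copying collector named above, the following is a minimal sketch of a stop-the-world, two-semispace copy in the style of Cheney's algorithm. It is not Baker's incremental algorithm, which interleaves the copying work with allocation; it is only a simpler relative. Every name here (word, TAG_FWD, roots, gc_init, gc_collect, gc_demo) and the tag assignments are assumptions made for illustration, and the fixed-size explicit root array stands in for whatever root-tracking scheme your implementation uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical layout: low 4 bits hold a tag, upper 60 bits a payload. */
typedef uint64_t word;
enum { TAG_INT, TAG_PAIR, TAG_NIL, TAG_FWD };   /* TAG_FWD marks a moved cell */
#define TAG(w)      ((unsigned)((w) & 0xF))
#define PAYLOAD(w)  ((size_t)((w) >> 4))
#define MAKE(t, p)  (((word)(p) << 4) | (word)(t))
#define NIL         MAKE(TAG_NIL, 0)
#define ROOT_MAX    16

typedef struct { word car, cdr; } cell;

static cell  *heap, *spare;         /* active semispace and copy target */
static size_t capacity, next_free;  /* cells per semispace, bump pointer */
static word   roots[ROOT_MAX];      /* explicit root set for this sketch */
static size_t nroots;

void gc_init(size_t cells) {
    free(heap);
    free(spare);
    capacity = cells;
    next_free = 0;
    nroots = 0;
    heap  = malloc(cells * sizeof *heap);
    spare = malloc(cells * sizeof *spare);
    if (!heap || !spare) { perror("malloc"); exit(EXIT_FAILURE); }
}

/* Copy one pair into the spare space unless it has already been moved. */
static word forward(word w) {
    if (TAG(w) != TAG_PAIR) return w;            /* atoms live in the word */
    cell *old = &heap[PAYLOAD(w)];
    if (TAG(old->car) == TAG_FWD)                /* forwarding address found */
        return MAKE(TAG_PAIR, PAYLOAD(old->car));
    size_t to = next_free++;
    spare[to] = *old;
    old->car = MAKE(TAG_FWD, to);                /* leave forwarding address */
    return MAKE(TAG_PAIR, to);
}

/* Stop-the-world collection: copy everything reachable from the roots. */
void gc_collect(void) {
    next_free = 0;
    for (size_t i = 0; i < nroots; i++) roots[i] = forward(roots[i]);
    for (size_t scan = 0; scan < next_free; scan++) {   /* breadth-first scan */
        spare[scan].car = forward(spare[scan].car);
        spare[scan].cdr = forward(spare[scan].cdr);
    }
    cell *tmp = heap; heap = spare; spare = tmp;        /* flip the semispaces */
}

word cons(word a, word d) {
    if (next_free == capacity) {
        assert(nroots + 2 <= ROOT_MAX);
        roots[nroots++] = a;                     /* keep the arguments alive */
        roots[nroots++] = d;
        gc_collect();
        d = roots[--nroots];
        a = roots[--nroots];
        if (next_free == capacity) { fputs("out of cells\n", stderr); exit(EXIT_FAILURE); }
    }
    heap[next_free] = (cell){ a, d };
    return MAKE(TAG_PAIR, next_free++);
}

word car(word p) { assert(TAG(p) == TAG_PAIR); return heap[PAYLOAD(p)].car; }
word cdr(word p) { assert(TAG(p) == TAG_PAIR); return heap[PAYLOAD(p)].cdr; }

/* Build the list (1 2 3), churn through garbage, then sum the survivors. */
long gc_demo(void) {
    gc_init(8);
    roots[nroots++] = NIL;                       /* roots[0] holds the live list */
    for (int i = 3; i >= 1; i--) roots[0] = cons(MAKE(TAG_INT, i), roots[0]);
    for (int i = 0; i < 1000; i++) cons(MAKE(TAG_INT, i), NIL);   /* garbage */
    long sum = 0;
    for (word p = roots[0]; TAG(p) == TAG_PAIR; p = cdr(p)) sum += (long)PAYLOAD(car(p));
    return sum;
}
```

With only 8 cells per semispace, gc_demo forces many collections, yet the three rooted cells survive each one and the function returns 6. Immediately after a collection, next_free counts the reachable cells and capacity - next_free counts the free ones, which is one simple place to start thinking about the space invariant.
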
You are encouraged to use Baker's article (mentioned above) as a guide to implementing your storage management system.

What do these seven functions do?

  CONS( x, y )    -- Forms a single object (a pair) from objects x and y
  CAR( p )        -- When p references a pair x, y, it returns x
  CDR( p )        -- When p references a pair x, y, it returns y
  RPLACA( p, a )  -- When p references a pair x, y, it replaces x with a
  RPLACD( p, b )  -- When p references a pair x, y, it replaces y with b
  EQ( r, s )      -- Do r and s reference the same object?
  ATOM( k )       -- Does k reference a non-pair?

Your program should accept a command-line argument that specifies the maximum number of CONS cells that can be stored by your allocation and garbage-collection system; your system should allow for at least 100,000,000 CONS cells.

The only atoms that your system needs to implement are 60-bit integers and the constants T and NIL. (Note: a Lisp system would implement additional types of atoms, such as unbounded integers, rationals, symbols, strings, etc.) We specify 60-bit integers as this leaves four bits for type information. The constants T and NIL can be implemented using type information alone. Some familiarity with Lisp may help provide context.

Using the functions above, your implementation should also provide:

  EQUAL( x, y )   -- Returns T when x and y are identical objects; otherwise, NIL

You should be very careful when implementing your project. You should include an invariant that ensures that the total amount of reachable space plus the total amount of free space equals the amount of space available for CONS cells. This invariant can be more subtle to define than you might think.

On April 1st (uh oh), we will release some list-processing functions (e.g., list append, list reverse, ...) that you will implement as C functions that make reference to the subroutines you implement for the seven functions (loosely) defined above.

On April 8th, we will release some tougher tests for your allocator and GC system.
These tests will stress the integrity of your allocator and garbage collector.

Laboratory Documentation

Finally, for the writing component, you need to include in your solution program a 130-line to 150-line description of your implementation. This description should be included as a C-language comment that begins with a line containing only "/*" and ends with a line containing only " */", and it should be written in the (approximate) format of a typical Linux manual entry. Remember, this class carries a writing flag, and this kind of summary will be required for all of the class laboratory assignments.

Grading

Your laboratory will be graded with the following weights:

  70% - A correctly functioning allocator and garbage collector; your
        implementation must implement all seven primitives. As noted
        above, we will release list-processing functions that must be
        implemented. And, your program is expected to operate correctly
        on our stress tests.

  30% - Written description of your solution.

Extra Credit

  25% - A correctly functioning, real-time (RT) garbage collector. This
        incremental RT GC needs to work as seamlessly as the copying
        collector you have already implemented.

Be careful with what you write. We will grade the functioning of your allocator and garbage collector on test sets with millions of requests. And we will read your documentation carefully, looking for problems (grammar, spelling, run-on sentences, tense agreement, manual entry formatting, etc.) -- errors will lower your grade.
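
Returning to the atom representation in the Laboratory Requirements: one way to picture "60-bit integers plus four bits of type information" is a 64-bit word whose low four bits are a tag. The sketch below is only an illustration under that assumption. The type name obj, the tag values, and the helper names (make_fixnum, fixnum_value, tag_of, eq, atom) are hypothetical and are not prescribed by this assignment. It also assumes a two's-complement machine, so that converting the word back to a signed value behaves as expected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagged word: bits 0-3 are the type tag, bits 4-63 the payload. */
typedef uint64_t obj;
enum { TAG_FIXNUM = 0, TAG_PAIR = 1, TAG_NIL = 2, TAG_T = 3 };

/* Range of a signed 60-bit integer. */
#define FIXNUM_MAX ((INT64_C(1) << 59) - 1)
#define FIXNUM_MIN (-(INT64_C(1) << 59))

/* T and NIL carry no payload: the tag alone identifies them. */
static const obj NIL = TAG_NIL;
static const obj T   = TAG_T;

static unsigned tag_of(obj o) { return (unsigned)(o & 0xF); }

/* Pack a signed 60-bit integer into the upper bits of a word. */
static obj make_fixnum(int64_t n) {
    assert(n >= FIXNUM_MIN && n <= FIXNUM_MAX);
    return ((uint64_t)n << 4) | TAG_FIXNUM;
}

/* Recover the integer. The tag is 0, so the word is a multiple of 16 and
   signed division by 16 restores the value, including its sign. */
static int64_t fixnum_value(obj o) {
    assert(tag_of(o) == TAG_FIXNUM);
    return (int64_t)o / 16;
}

/* EQ: atoms are stored in the word itself and pairs are identified by their
   payload, so comparing words compares object identity. */
static obj eq(obj r, obj s) { return r == s ? T : NIL; }

/* ATOM: anything that is not tagged as a pair. */
static obj atom(obj k) { return tag_of(k) != TAG_PAIR ? T : NIL; }
```

Because integers, T, and NIL all fit in the word itself, only pairs consume CONS cells in this layout, which keeps the accounting of reachable and free space confined to pairs.
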