Project Guidelines
The project in this course is to do some research to advance the state
of the art in computer architecture. Projects should be done in two-person
teams. Talk to me if you want to work alone or in a different sized group.
The projects will be graded in the same way that I referee conference
papers -- by what you discover in doing the project, how coherently you
present your results, and how well you put your work in perspective with other
research. The goal to shoot for is a conference paper like those published
at ASPLOS or ISCA. Since time is limited, however, that goal is hard to
reach, and I will reward those who aim high even if they do not completely
succeed. The key is ensuring that some aspects of your work are completely
done; it is hard to grade a project where the simulator did not quite work.
In keeping with the quantitative nature of computer architecture, projects
should be designed with a rigorous experimental focus. Build your project
around a falsifiable hypothesis. You should measure some aspect
of a real system or simulate some architectural approach. For example,
you could examine a modest extension to a paper in ASPLOS or ISCA or simply
revalidate the result of some paper with your own simulator.
The bulk of the project will be organized around a project talk and
paper like a traditional conference. In addition, there will be a few intermediate
checkpoints to encourage you to get started early and to provide feedback.
Topic interests (Jan 28)
Turn in your name, e-mail address, and a list of 2 or 3 project
areas in which you are most interested in working. I will create a list of
what everyone in the class is interested in and distribute it. You can
use this list to find other people who are interested in similar
topics and with whom you might want to work on the project. If you
have already formed a team, your team can turn in a single topic
interest list with both names on it.
Topic selection (Feb 6)
Turn in the names of the people in your group and the name of your project
with a 1 paragraph description of the hypothesis you plan to test and the
general approach you plan to take. To get the ball rolling, below is a
list of possible topics. You should also look at recent ASPLOS, ISCA, IEEE
Transactions on Computers, ACM Transactions on Computer Systems, IEEE Computer,
IEEE Micro, and Microprocessor Report for ideas. I also like topics that
apply your own research strengths to architecture. For example, if you
are a database student, you could study the interaction between a database
and an architecture.
For many of these studies, keep in mind technology trends. How can you
use 10M, 100M, 1G transistors to improve performance? How do we get good
performance when a memory access costs 1000 instruction times? How do we
deal with new Web, Object Oriented, or Multimedia workloads? How do we
get good I/O performance when disks are getting slower and slower relative
to CPUs?
- Select a paper that interests you from a recent ASPLOS or ISCA
proceedings. Construct a simulator that will allow you to reproduce
their main results and validate your simulator using their workload or
a similar one. Are there any major assumptions the authors didn't
mention in the paper? Use your simulator to evaluate their technique
under a new workload or improve their technique and quantify your
improvements.
- One of the key problems in computer architecture is that it is often more
difficult to improve latency than bandwidth. Prefetching is one
technique that can hide latency. Here are some possible prefetching
topics:
- Quantify the limits of history-based prefetching. Prediction by
partial matching (originally developed for text compression) has been
shown to provide optimal prediction of future values based on past
values. Using PPM, what are the limits of memory or disk prefetching?
What input information (e.g., last k instruction addresses, last j
data addresses, distance between last k addresses, last value loaded,
...) best predicts future fetches? What is the best trade-off between
state used to store history information and prefetch performance?
Contact Mike for some code that implements PPM to help you get started.
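To make the flavor of the approach concrete, here is a minimal sketch (in Python, and not the code Mike provides) of an order-k PPM-style predictor over address deltas; the order and the table organization are illustrative choices only:

```python
from collections import defaultdict, Counter

class PPMPrefetcher:
    """Order-k prediction-by-partial-matching over address deltas.

    For each context of the last j deltas (j = k down to 1), count
    which delta followed it; predict with the longest context seen
    before, falling back to shorter ones -- the core PPM idea.
    """
    def __init__(self, order=3):
        self.order = order
        self.tables = [defaultdict(Counter) for _ in range(order)]
        self.history = []       # most recent deltas, at most `order` of them
        self.last_addr = None

    def access(self, addr):
        """Record an access; return the predicted next address, or None."""
        if self.last_addr is not None:
            delta = addr - self.last_addr
            # update every context length with the observed outcome
            for j in range(1, min(self.order, len(self.history)) + 1):
                ctx = tuple(self.history[-j:])
                self.tables[j - 1][ctx][delta] += 1
            self.history.append(delta)
            self.history = self.history[-self.order:]
        self.last_addr = addr
        # predict using the longest context with recorded history
        for j in range(min(self.order, len(self.history)), 0, -1):
            counts = self.tables[j - 1].get(tuple(self.history[-j:]))
            if counts:
                best_delta, _ = counts.most_common(1)[0]
                return addr + best_delta
        return None
```

On a simple stride-8 stream (0, 8, 16, 24, ...) the predictor locks onto the stride after two deltas; the interesting experimental questions above are what inputs and how much history state it takes to do this for real memory and disk traces.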
- Add a small amount of hardware to DRAM memory chips to exploit
DRAM internal bandwidths to avoid DRAM latencies. Evaluate the
performance benefits that can be gained and the costs of modifying the
hardware.
- Examine hybrid, 2-level, and "optimal" compression-based
branch prediction schemes under a range of memory or disk workloads.
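As a reference point for the "2-level" family, here is a minimal gshare-style sketch in Python; the table size, counter width, and initialization are arbitrary illustrative choices, not a description of any particular machine:

```python
class GsharePredictor:
    """A simple 2-level ("gshare") branch predictor sketch.

    XORs the branch PC with a global history register to index a
    table of 2-bit saturating counters.
    """
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.counters = [1] * (1 << bits)   # start weakly not-taken
        self.history = 0                    # global taken/not-taken history

    def predict(self, pc):
        """True means predict taken."""
        return self.counters[(pc ^ self.history) & self.mask] >= 2

    def update(self, pc, taken):
        """Train on the actual outcome and shift it into the history."""
        i = (pc ^ self.history) & self.mask
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

A hybrid scheme would run two such predictors (e.g., gshare plus a per-branch bimodal table) and use a chooser table to pick between them; the compression-based schemes in the bullet above replace the counter table with a PPM-style model.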
- Algorithm designers for parallel machines face a dilemma. If their
algorithms ignore important architectural details, the resulting
algorithms may be theoretically pleasing but practically
useless. Conversely, if their algorithms account for too many
architectural details, the algorithm descriptions may be overly
complex and the algorithms may not be portable across different
parallel machines. The QSM (Queued Shared Memory) model, developed by
Professor Ramachandran et al., attempts to resolve this dilemma by providing
a shared memory abstraction that accounts for memory bandwidth and
contention, but which hides details such as memory banks, memory
latency, and message overhead. The argument for ignoring these details
is that they can generally be resolved in standard,
architecture-dependent ways. Thus, in this model the algorithm
designer can develop simple, architecture neutral algorithms that may
be mechanically translated into implementations that will run well on
particular architectures. In previous work, it has been demonstrated
that QSM algorithms work well on traditional parallel machines such as
the Cray C90, Cray J90, and MasPar MP-1. One project is to determine
if the QSM is a suitable model for designing algorithms to run on a
network of workstations. Here are some ideas on how
to proceed.
Other QSM possibilities:
- Examine the architectural details
that QSM hides and, for a number of important algorithms, analytically
determine for what combination of (problem size, number of nodes,
processor speed, network overhead, ...) these assumptions may be
safely ignored by the algorithm designer and when these assumptions
must be included to get the "right" algorithm. How do these parameters
compare with practical problem sizes and existing architectures? For
one of these algorithms, implement an example that illustrates where
the QSM model breaks down and compare its measured performance to that
of the analytical model.
- Implement a run-time system for automatically translating from a
QSM algorithm description to a program that has good performance on
some parallel architecture. For example, this system might pipeline
network messages to hide latency, group network messages to hide
overhead, efficiently multiplex many "virtual processors" onto a
smaller number of physical processors, .... (This project is almost
certainly far too challenging to pull off in a semester, but is there a
small piece of the problem you can attack to make progress toward this
eventual goal?)
- In some current machines, the 2nd or 3rd level cache is larger than
what the TLB can map. Surprisingly, you may have to trap to the operating
system just to access memory that is already cached. How much does this
limit cache performance? How
can we avoid these traps? Larger pages? A multi-level TLB?
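To make the mismatch concrete, here is a back-of-the-envelope sketch; the 128-entry TLB and 4 KB page size are illustrative assumptions, not figures from any particular machine:

```python
def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory the TLB can map without taking a miss."""
    return entries * page_size

# Illustrative numbers: a 128-entry TLB with 4 KB pages maps only 512 KB,
# so a 1 MB L2 cache can hold lines whose page translations are not in
# the TLB -- a cache hit can still cost a TLB miss.
reach = tlb_reach(128, 4 * 1024)
l2_size = 1 * 1024 * 1024
print(reach, reach < l2_size)
```

Larger pages or a multi-level TLB both attack the same quantity: they increase the product above until it covers the cache.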
- Are LFSs good for RAIDs? Are RAIDs good for LFSs?
Log-structured file systems have various advantages, but can behave
poorly in some corner cases (e.g. random writes to a nearly-full
never-idle disk). As a result, some have suggested changes to LFS
and some have suggested abandoning the idea
completely. All of these studies, however, have looked at
single-node disk systems. In RAIDs the cost of these solutions may be
much higher (since they increase the number of small writes in the
system). On the other hand, RAIDs may make the desired LFS segment
size larger than single disks, which may make LFS less attractive.
- Compare a hardware implementation of a Java virtual machine to software
emulation on a fast processor. Given the checkered history of hardware
instruction sets targeted to specific high-level languages, why does Sun
think this approach will work for Java? Are they right?
- Over the past 2 decades, memory sizes have increased by a factor of
1000, and page sizes by only a factor of 2-4. Should page sizes be dramatically
larger, or are a few large "superpages" sufficient to offset
this trend in most cases?
- Compare 4 visions for future architectures -- traditional (big caches,
super-pipelined/super-scalar, speculative execution, ...), IRAM (combine
processor and RAM on one chip), single-chip multiprocessors, and multi-threading
with fast context switches to tolerate memory latencies.
- Evaluate load-value locality and load value prediction as a way to
reduce memory latency and bandwidth requirements. (Load-value prediction
is like branch prediction -- rather than wait to see what the answer is,
the processor guesses what the answer will be and proceeds assuming that
that guess is true.)
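A minimal sketch of the idea, assuming a simple last-value scheme with 2-bit confidence counters (one common variant, not the only one):

```python
class LoadValuePredictor:
    """Last-value load-value predictor with 2-bit confidence counters.

    Indexed by the load's PC; predicts the value the same load returned
    last time, and only offers a prediction once the confidence counter
    is high -- a hedge against paying mispredict-recovery cost.
    """
    def __init__(self):
        self.table = {}   # pc -> [last_value, confidence 0..3]

    def predict(self, pc):
        """Return a predicted value, or None if not confident."""
        entry = self.table.get(pc)
        if entry and entry[1] >= 2:
            return entry[0]
        return None

    def update(self, pc, actual):
        """Train with the value the load actually returned."""
        entry = self.table.setdefault(pc, [actual, 0])
        if entry[0] == actual:
            entry[1] = min(entry[1] + 1, 3)   # value repeated: more confident
        else:
            entry[0] = actual                 # value changed: restart
            entry[1] = 0
```

An evaluation would drive this with a load trace and report how often loads are predictable, how often the predictor is confident, and how often a confident prediction is wrong.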
- Extend Transparent Informed Prefetching (Patterson et al (SOSP95))
for page-level prefetching/caching to balance cache-line hardware prefetching
vs. hardware caching.
- Cooperative caching uses fast networks to access remote memory in
lieu of disk accesses. One drawback is that a user's data may be stored
on multiple machines, potentially opening security holes
(eavesdropping, modification). Encryption and digital signatures may
solve the problem, but could slow down the system. Evaluate the
performance impact of adding encryption and digital signatures to
cooperatively cached data and project this performance into the future
as processor speeds improve.
- As memory latencies increase, cache miss times could approach 1000
cycles or more. This is nearly the same ratio of memory access times
as was seen for early VM paging systems. As miss times grow this
extreme, is it time to give control of cache replacement to the
software?
- Your good idea here...
Project proposal (Feb 27)
Proposals should include (1) a crisp statement of the hypothesis
that you will test, (2) a description of your topic, (3) a statement
of why you think the topic is important, (4) a description of the methods
you will use to evaluate your ideas, and (5) references to at least three
papers you have obtained with a critique of their approaches as they
relate to your work.
Proposals should not exceed 2 pages in length.
Project checkpoint (April 10)
In 2 pages or less, summarize your progress. Describe any initial results.
Describe any changes in your project's scope or direction now that you
know more about the topic. List the major milestones you have completed
and the milestones that you must complete to successfully finish your study.
Project presentations (May 4 - May 8)
We will divide the last couple of lectures into 20-minute-ish conference-style
talks. We will probably have to schedule some additional class time to
complete the talks. All group members should deliver part of the talk.
The talk should give highlights of the final report, including the problem,
motivation, results, conclusion, and possible future work. Time limits
will be enforced so that everyone can present. Please practice your talk to
polish it and check its timing. Have a plan for what slides to skip
if you get behind. I will provide more advice on the talks later in the
semester.
Written report (May 11)
The written reports should follow the same outline you would follow
for a conference paper, and they should be 20 or fewer pages in length
(double-spaced; shorter if single-spaced).
I'll give more suggestions and details later in the semester.