Project Guidelines
The project in this course is to do some research to advance the state
of the art in computer architecture. Projects should be done in two-person
teams. Talk to me if you want to work alone or in a different sized group.
The projects will be graded in the same way that I referee conference
papers -- by what you discover in doing the project, how coherently you
present your results, and how well you put your work in perspective with other
research. The goal to shoot for is a conference paper like those published
at ASPLOS or ISCA. Since time is limited, however, that goal is hard to
reach, and I will reward those who aim high even if they do not completely
succeed. The key is ensuring that some aspects of your work are completely
done; it is hard to grade a project where the simulator did not quite work.
In keeping with the quantitative nature of computer architecture, projects
should be designed with a rigorous experimental focus. Build your project
around a falsifiable hypothesis. You should measure some aspect
of a real system or simulate some architectural approach. For example,
you could examine a modest extension to a paper in ASPLOS or ISCA or simply
revalidate the result of some paper with your own simulator.
The bulk of the project will be organized around a project talk and
paper like a traditional conference. In addition, there will be a few intermediate
checkpoints to encourage you to get started early and to provide feedback.
Topic interests (Jan 28)
Turn in your name, e-mail address, and a list of 2 or 3 project
areas in which you are most interested in working. I will create a list of
what everyone in the class is interested in and distribute it. You can
use this list to find other people who are interested in similar
topics and with whom you might want to work on the project. If you
have already formed a team, your team can turn in a single topic
interest list with both names on it.
Topic selection (Feb 6)
Turn in the names of the people in your group and the name of your project
with a 1 paragraph description of the hypothesis you plan to test and the
general approach you plan to take. To get the ball rolling, below is a
list of possible topics. You should also look at recent ASPLOS, ISCA, IEEE
Transactions on Computers, ACM Transactions on Computer Systems, IEEE Computer,
IEEE Micro, and Microprocessor Report for ideas. I also like topics that
apply your own research strengths to architecture. For example, if you
are a database student, you could study the interaction between a database
and an architecture.
For many of these studies, keep in mind technology trends. How can you
use 10M, 100M, 1G transistors to improve performance? How do we get good
performance when a memory access costs 1000 instruction times? How do we
deal with new Web, Object Oriented, or Multimedia workloads? How do we
get good I/O performance when disks are getting slower and slower relative
to CPUs?
- Select a paper that interests you from a recent ASPLOS or ISCA
proceedings. Construct a simulator that will allow you to reproduce
their main results and validate your simulator using their workload or
a similar one. Are there any major assumptions the authors didn't
mention in the paper? Use your simulator to evaluate their technique
under a new workload or improve their technique and quantify your
improvements.
- One of the key problems in computer architecture is that it is often more
difficult to improve latency than bandwidth. Prefetching is one
technique that can hide latency. Here are some possible prefetching
topics:
- Quantify the limits of history-based prefetching. Prediction by
partial matching (originally developed for text compression) has been
shown to provide optimal prediction of future values based on past
values. Using PPM, what are the limits of memory or disk prefetching?
What input information (e.g., last k instruction addresses, last j
data addresses, distance between last k addresses, last value loaded,
...) best predicts future fetches? What is the best trade-off between
state used to store history information and prefetch performance?
Contact Mike for some code that implements PPM to help you get started.
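To make the flavor of the approach concrete, here is a minimal sketch (in Python, and not the code Mike provides) of an order-k PPM-style predictor over address deltas; the order and the table organization are illustrative choices only:

```python
from collections import defaultdict, Counter

class PPMPrefetcher:
    """Order-k prediction-by-partial-matching over address deltas.

    For each context of the last j deltas (j = k down to 1), count
    which delta followed it; predict with the longest context seen
    before, falling back to shorter ones -- the core PPM idea.
    """
    def __init__(self, order=3):
        self.order = order
        self.tables = [defaultdict(Counter) for _ in range(order)]
        self.history = []       # most recent deltas, at most `order` of them
        self.last_addr = None

    def access(self, addr):
        """Record an access; return the predicted next address, or None."""
        if self.last_addr is not None:
            delta = addr - self.last_addr
            # update every context length with the observed outcome
            for j in range(1, min(self.order, len(self.history)) + 1):
                ctx = tuple(self.history[-j:])
                self.tables[j - 1][ctx][delta] += 1
            self.history.append(delta)
            self.history = self.history[-self.order:]
        self.last_addr = addr
        # predict using the longest context with recorded history
        for j in range(min(self.order, len(self.history)), 0, -1):
            counts = self.tables[j - 1].get(tuple(self.history[-j:]))
            if counts:
                best_delta, _ = counts.most_common(1)[0]
                return addr + best_delta
        return None
```

On a simple stride-8 stream (0, 8, 16, 24, ...) the predictor locks onto the stride after two deltas; the interesting experimental questions above are what inputs and how much history state it takes to do this for real memory and disk traces.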
- Add a small amount of hardware to DRAM memory chips to exploit
DRAM internal bandwidths to avoid DRAM latencies. Evaluate the
performance benefits that can be gained and the costs of modifying the
hardware.
- Examine hybrid, 2-level, and "optimal" compression-based
branch prediction schemes under a range of memory or disk workloads.
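As a reference point for the "2-level" family, here is a minimal gshare-style sketch in Python; the table size, counter width, and initialization are arbitrary illustrative choices, not a description of any particular machine:

```python
class GsharePredictor:
    """A simple 2-level ("gshare") branch predictor sketch.

    XORs the branch PC with a global history register to index a
    table of 2-bit saturating counters.
    """
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.counters = [1] * (1 << bits)   # start weakly not-taken
        self.history = 0                    # global taken/not-taken history

    def predict(self, pc):
        """True means predict taken."""
        return self.counters[(pc ^ self.history) & self.mask] >= 2

    def update(self, pc, taken):
        """Train on the actual outcome and shift it into the history."""
        i = (pc ^ self.history) & self.mask
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

A hybrid scheme would run two such predictors (e.g., gshare plus a per-branch bimodal table) and use a chooser table to pick between them; the compression-based schemes in the bullet above replace the counter table with a PPM-style model.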
- Algorithm designers for parallel machines face a dilemma. If their
algorithms ignore important architectural details, the resulting
algorithms may be theoretically pleasing but practically
useless. Conversely, if their algorithms account for too many
architectural details, the algorithm descriptions may be overly
complex and the algorithms may not be portable across different
parallel machines. The QSM (Queued Shared Memory) model, developed by
Professor Ramachandran et al., attempts to resolve this dilemma by providing
a shared memory abstraction that accounts for memory bandwidth and
contention, but which hides details such as memory banks, memory
latency, and message overhead. The argument for ignoring these details
is that they can generally be resolved in standard,
architecture-dependent ways. Thus, in this model the algorithm
designer can develop simple, architecture neutral algorithms that may
be mechanically translated into implementations that will run well on
particular architectures. In previous work, it has been demonstrated
that QSM algorithms work well on traditional parallel machines such as
the Cray C90, Cray J90, and MasPar MP-1. One project is to determine
if the QSM is a suitable model for designing algorithms to run on a
network of workstations. Here are some ideas on how
to proceed.
Other QSM possibilities:
- Examine the architectural details
that QSM hides and, for a number of important algorithms, analytically
determine for what combination of (problem size, number of nodes,
processor speed, network overhead, ...) these assumptions may be
safely ignored by the algorithm designer and when these assumptions
must be included to get the "right" algorithm. How do these parameters
compare with practical problem sizes and existing architectures? For
one of these algorithms, implement an example that illustrates where
the QSM model breaks down and compare its measured performance to that
of the analytical model.
- Implement a run-time system for automatically translating from a
QSM algorithm description to a program that has good performance on
some parallel architecture. For example, this system might pipeline
network messages to hide latency, group network messages to hide
overhead, efficiently multiplex many "virtual processors" onto a
smaller number of physical processors, .... (This project is almost
certainly far too challenging to pull off in a semester, but is there a
small piece of the problem you can attack to make progress toward this
eventual goal?)
- In some current machines, the 2nd or 3rd level cache is larger than
what the TLB can map. Surprisingly, you may have to trap to the operating
system just to access memory that is already cached. How much does this
limit cache performance? How
can we avoid these traps? Larger pages? A multi-level TLB?
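To make the mismatch concrete, here is a back-of-the-envelope sketch; the 128-entry TLB and 4 KB page size are illustrative assumptions, not figures from any particular machine:

```python
def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory the TLB can map without taking a miss."""
    return entries * page_size

# Illustrative numbers: a 128-entry TLB with 4 KB pages maps only 512 KB,
# so a 1 MB L2 cache can hold lines whose page translations are not in
# the TLB -- a cache hit can still cost a TLB miss.
reach = tlb_reach(128, 4 * 1024)
l2_size = 1 * 1024 * 1024
print(reach, reach < l2_size)
```

Larger pages or a multi-level TLB both attack the same quantity: they increase the product above until it covers the cache.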
- Are LFSs good for RAIDs? Are RAIDs good for LFSs?
Log-structured file systems have various advantages, but can behave
poorly in some corner cases (e.g. random writes to a nearly-full
never-idle disk). As a result, some have suggested changes to LFS
and some have suggested abandoning the idea
completely. All of these studies, however, have looked at
single-node disk systems. In RAIDs the cost of these solutions may be
much higher (since they increase the number of small writes in the
system). On the other hand, RAIDs may make the desired LFS segment
size larger than single disks, which may make LFS less attractive.
- Compare a hardware implementation of a Java virtual machine to software
emulation on a fast processor. Given the checkered history of hardware
instruction sets targeted to specific high-level languages, why does Sun
think this approach will work for Java? Are they right?
- Over the past 2 decades, memory sizes have increased by a factor of
1000, and page sizes by only a factor of 2-4. Should page sizes be dramatically
larger, or are a few large "superpages" sufficient to offset
this trend in most cases?
- Compare 4 visions for future architectures -- traditional (big caches,
super-pipelined/super-scalar, speculative execution, ...), IRAM (combine
processor and RAM on one chip), single-chip multiprocessors, and multi-threading
with fast context switches to tolerate memory latencies.
- Evaluate load-value locality and load value prediction as a way to
reduce memory latency and bandwidth requirements. (Load-value prediction
is like branch prediction -- rather than wait to see what the answer is,
the processor guesses what the answer will be and proceeds assuming that
that guess is true.)
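A minimal sketch of the idea, assuming a simple last-value scheme with 2-bit confidence counters (one common variant, not the only one):

```python
class LoadValuePredictor:
    """Last-value load-value predictor with 2-bit confidence counters.

    Indexed by the load's PC; predicts the value the same load returned
    last time, and only offers a prediction once the confidence counter
    is high -- a hedge against paying mispredict-recovery cost.
    """
    def __init__(self):
        self.table = {}   # pc -> [last_value, confidence 0..3]

    def predict(self, pc):
        """Return a predicted value, or None if not confident."""
        entry = self.table.get(pc)
        if entry and entry[1] >= 2:
            return entry[0]
        return None

    def update(self, pc, actual):
        """Train with the value the load actually returned."""
        entry = self.table.setdefault(pc, [actual, 0])
        if entry[0] == actual:
            entry[1] = min(entry[1] + 1, 3)   # value repeated: more confident
        else:
            entry[0] = actual                 # value changed: restart
            entry[1] = 0
```

An evaluation would drive this with a load trace and report how often loads are predictable, how often the predictor is confident, and how often a confident prediction is wrong.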
- Extend Transparent Informed Prefetching (Patterson et al (SOSP95))
for page-level prefetching/caching to balance cache-line hardware prefetching
vs. hardware caching.
- Cooperative caching uses fast networks to access remote memory in
lieu of disk accesses. One drawback is that a user's data may be stored
on multiple machines, potentially opening security holes
(eavesdropping, modification). Encryption and digital signatures may
solve the problem, but could slow down the system. Evaluate the
performance impact of adding encryption and digital signatures to
cooperatively cached data and project this performance into the future
as processor speeds improve.
- As memory latencies increase, cache miss times could approach 1000
cycles or more. This is nearly the same ratio of memory access times
as was seen for early VM paging systems. As miss times grow this
extreme, is it time to give control of cache replacement to the
software?
- Your good idea here...
Project proposal (Feb 27)
Proposals should include (1) a crisp statement of the hypothesis
that you will test, (2) a description of your topic, (3) a statement
of why you think the topic is important, (4) a description of the methods
you will use to evaluate your ideas, and (5) references to at least three
papers you have obtained with a critique of their approaches as they
relate to your work.
Proposals should not exceed 2 pages in length.
Project checkpoint (April 10)
In 2 pages or less, summarize your progress. Describe any initial results.
Describe any changes in your project's scope or direction now that you
know more about the topic. List the major milestones you have completed
and the milestones that you must complete to successfully finish your study.
Project presentations (May 4 - May 8)
We will divide the last couple of lectures into 20-minute-ish conference-style
talks. We will probably have to schedule some additional class time to
complete the talks. All group members should deliver part of the talk.
The talk should give highlights of the final report, including the problem,
motivation, results, conclusion, and possible future work. Time limits
will be enforced so that everyone can present. Please practice your talk to
polish it and check its timing. Have a plan for what slides to skip
if you get behind. I will provide more advice on the talks later in the
semester.
Written report (May 11)
The written reports should follow the same outline you would follow
for a conference paper, and they should be 20 or fewer pages in length
(double-spaced; shorter if single-spaced).
I'll give more suggestions and details later in the semester.