The goal of this assignment is to understand the differences between the native host, a container and a VM by measuring the performance of certain programs in these different environments and trying to understand what influences the end-to-end performance.
We are interested in doing experimental computer science. We will follow the scientific method, which Wikipedia tells me has been around for a long time. But knowing about science in the abstract is relatively easy; actually doing good science is difficult both to learn and to execute.
Let's start with reproducibility. You will write a report for this lab, and in your report you will include details about your system. Think about what it would take to recreate your results. I won't spell out exactly what information you should include, but please include everything relevant while not bogging down your reader. You should report things like the kernel version for your host and guest system. If you used CloudLab, include details about the hardware of the machine type you used.
Your report should answer every question in this lab and should do so in a way that is clearly labeled.
I have a major pet peeve with excessive digits of precision. Your measurements are usually counts. If you average three counts, don't give me six decimal places of precision even if six digits is the default output format for floats in the language you are using. Decide how many digits are meaningful and then report that many. Also, make your decimal points line up when that makes sense. For example, if you report a mean and a standard deviation, make the decimal places always align so you can see easily if the standard deviation is less than a tenth of the mean (which is a good sign for reproducibility).
I would use C or C++, but you can use whatever programming tools you want. One thing I want you to do, both for this class and for real life, is always check the return code of every single system call you ever make. I know it sounds a bit pedantic, but start the habit now and you will have a happier programming life. For almost every system call, all that means is checking whether the return code is less than zero and, if so, calling `perror`. When system calls don't work, you really want to know about it early; trust me on this point.
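The pattern looks like this (a minimal sketch; the file path is made up):

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* hypothetical file; the check-and-perror pattern is what matters */
    int fd = open("/tmp/example", O_RDONLY);
    if (fd < 0) {          /* check every system call's return code */
        perror("open");    /* prints the reason, e.g. "open: No such file or directory" */
        exit(1);
    }
    return 0;
}
```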
You may need `sudo` to launch a container.
Call `getrusage` at the end of your program and print out the fields. Pay particular attention to `utime`, `stime`, `maxrss`, `minflt`, `majflt`, `inblock`, `oublock`, and voluntary and involuntary context switches. Note that the sum of `utime` and `stime` may not be equal to the elapsed time reported by `time`. If you see that, you will want to measure user/system time a different way.
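A sketch of what that might look like (the field names come from `struct rusage`; on Linux, `ru_maxrss` is reported in kilobytes):

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print the rusage fields we care about; call this at the end of main(). */
static void print_rusage(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) < 0) {
        perror("getrusage");
        return;
    }
    printf("utime:   %ld.%06ld s\n", (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
    printf("stime:   %ld.%06ld s\n", (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    printf("maxrss:  %ld KB\n", ru.ru_maxrss);
    printf("minflt:  %ld  majflt: %ld\n", ru.ru_minflt, ru.ru_majflt);
    printf("inblock: %ld  oublock: %ld\n", ru.ru_inblock, ru.ru_oublock);
    printf("nvcsw:   %ld  nivcsw: %ld\n", ru.ru_nvcsw, ru.ru_nivcsw);
}
```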
For example, to launch a container with two CPUs and 2 GB of memory:

```
docker run -it --cpus="2" --memory="2g" ubuntu:22.04 /bin/bash
```
Your first task will be to write a program that mmaps a 1GB region (either file-backed or anonymous) and writes the first byte of each page (chosen in a random order) exactly once. To access each page of a region exactly once in a random order, you might want to generate a random permutation. Here is an example that takes an array and shuffles it based on Fisher-Yates shuffle:
```c
#include <stdint.h>
#include <stdlib.h>

/* Fisher-Yates shuffle: uniformly permutes array[0..n-1] in place. */
void shuffle(uint64_t *array, size_t n)
{
    if (n > 1) {
        for (size_t i = 0; i < n - 1; i++) {
            size_t j = i + rand() / (RAND_MAX / (n - i) + 1);
            uint64_t t = array[j];
            array[j] = array[i];
            array[i] = t;
        }
    }
}
```
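To show how the shuffle fits in, here is a sketch of the anonymous private case (timing and `getrusage` reporting omitted; the 4096-byte page size is an assumption, so use `sysconf(_SC_PAGESIZE)` in your real program):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define REGION_SIZE (1ULL << 30)          /* 1 GB */
#define PAGE_SIZE   4096ULL               /* assumed; query sysconf in real code */

void shuffle(uint64_t *array, size_t n);  /* the Fisher-Yates shuffle above */

int main(void) {
    size_t npages = REGION_SIZE / PAGE_SIZE;
    uint64_t *order = malloc(npages * sizeof(*order));
    if (order == NULL) { perror("malloc"); exit(1); }
    for (size_t i = 0; i < npages; i++)
        order[i] = i;
    shuffle(order, npages);               /* random visiting order */

    char *region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                        MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (region == MAP_FAILED) { perror("mmap"); exit(1); }

    /* the measured section: touch the first byte of each page exactly once */
    for (size_t i = 0; i < npages; i++)
        region[order[i] * PAGE_SIZE] = 1;

    free(order);
    return 0;
}
```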
We want to produce deterministic results. You should bind your program to a specific core. Also, for this experiment, we want you to make sure that the entire file is cached in the system's page cache before each run of the experiment. You can write a simple program that sequentially reads the entire file several times to load it into the page cache. Before each experiment, you should use `fincore` to make sure that the entire file is cached in the page cache. The point here is to make sure that the standard deviation of your results is small. Your results are not deterministic if they vary dramatically from experiment to experiment.
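One way to do the binding (a sketch using `sched_setaffinity`; pinning to core 0 is an arbitrary choice):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Pin the calling process to a single CPU so runs are comparable. */
static void pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) < 0) {
        perror("sched_setaffinity");
        exit(1);
    }
}
```

You can get the same effect from the shell with `taskset -c 0 ./your_program`.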
First, let's do the experiment on your host machine:
Use `df -hT` to check the types of filesystems on your host machine. If you cannot find an existing one, create one using `fdisk` and `mkfs`.
Whatever filesystem you use here, you have to use it in all the following experiments.
Call `msync` with the `MS_SYNC` flag at the end of your program to make sure that all the file changes are written to the disk.
For the file-backed case, try different mmap flags, including `MAP_PRIVATE` and `MAP_SHARED`.
Record the elapsed time for your program to finish.
Then repeat the experiment with an anonymous region using `MAP_ANONYMOUS`. Also, try different mmap flags for it, including `MAP_PRIVATE` and `MAP_SHARED`.
Record the elapsed time for your program to finish.
Next, let's do the same thing in a container under two settings:
Use the `--mount` option or the `-v` option to expose the same file used in your previous experiments to your container, and make sure that your program is accessing that file (for reproducibility).
For the second setting, mount an overlayfs:

```
cd path/to/your/filesystem
mkdir lower upper work merged
truncate -s 1g lower/file-1g
sudo mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merged
```

Now what do the `lower`, `upper`, `work` and `merged` directories look like, respectively? Again, use the `--mount` option or the `-v` option to expose the `merged` directory to your container.
Note that if you are using the `-v` option, you must specify a target for it. Otherwise, the overlayfs won't work.
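For example, a launch command might look like this, where `/path/to/your/filesystem/merged` is a placeholder for your mount point and `/mnt/merged` is the target inside the container:

```
docker run -it --cpus="2" --memory="2g" \
  -v /path/to/your/filesystem/merged:/mnt/merged \
  ubuntu:22.04 /bin/bash
```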
For file-backed cases, including file-backed private and file-backed shared, we want you to report the amount of time your program consumes on its first and second run. Before you start measuring your program, make sure that `lower/file-1g` is cached in the page cache and the `upper` directory is empty (I refer to this state as the initial state of your overlayfs). Then launch your container, run the same experiment using `merged/file-1g` twice, and record the elapsed time for each run.
Now what do the `lower`, `upper`, `work` and `merged` directories look like, respectively?
To restore the initial state of your overlayfs (so you can record the elapsed time for a first run), `umount` the `merged` directory, delete all the folders, and redo the above instructions to mount a new overlayfs.
Now, let's do the same thing in a VM under two different settings:
Use `lscpu` to check if your machine supports EPT, and use `cat /sys/module/kvm_intel/parameters/ept` to make sure that EPT is enabled for your VM.
To disable EPT:

```
# make sure that all your VMs are killed
sudo rmmod kvm_intel          # use kvm_amd for AMD machines
sudo modprobe kvm_intel ept=0 # use npt=0 for AMD machines
# relaunch your VMs
```

You can specify `ept=1` (for Intel) or `npt=1` (for AMD) to enable EPT again.
Summarize your results in the table below. Please also include the standard deviation of your results.
| | file-backed private | file-backed shared | anonymous private | anonymous shared |
| --- | --- | --- | --- | --- |
| Native host | | | | |
| Container using host FS | | | | |
| Container using overlayfs | first run: <br> second run: | first run: <br> second run: | | |
| VM with EPT | | | | |
| VM without EPT | | | | |
What differences do you observe between `MAP_PRIVATE` and `MAP_SHARED`? Explain the differences.
If the file-backed private case in a VM is slow, can you explain why? How could you improve the performance?
Summarize the results of the direct file I/O experiment (described below) in this table, again including standard deviations:

| | sequential read | sequential write | random read | random write |
| --- | --- | --- | --- | --- |
| Native host | | | | |
| Container using host FS | | | | |
| Container using overlayfs | first run: <br> second run: | first run: <br> second run: | first run: <br> second run: | first run: <br> second run: |
| VM with EPT | | | | |
We want you to measure the performance of direct file I/O, including random read/write and sequential read/write.
Write a program that opens the same file used in the previous sections using `O_DIRECT`.
Construct an `offset_array` and pass it to the function below. Here, `IO_SIZE` is a macro which defines the size of each I/O request; we use 4096 bytes as the I/O size. `offset_array` stores the offset of each I/O request. For sequential read/write, `offset_array` should look like {0, 4096, 8192, 12288, ..., FILE_SIZE - 4096}. `n` is the length of the `offset_array`. For random read/write, generate a random permutation of the sequential `offset_array` and pass it to the function below. If the `opt_read` flag is true, we read from the file; if it is false, we write to the file.
In the case of the container using overlayfs, just like in the last experiment, we want you to report the amount of time your program consumes on its first and second run. Again, before you start measuring your program, make sure that `lower/file-1g` is cached and the `upper` directory is empty. You can use the same instructions as in the last experiment to mount an overlayfs and restore its initial state.
```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define IO_SIZE 4096

void do_file_io(int fd, char *buf, uint64_t *offset_array, size_t n, int opt_read)
{
    for (size_t i = 0; i < n; i++) {
        /* position the file pointer at the next offset */
        if (lseek(fd, offset_array[i], SEEK_SET) == (off_t)-1) {
            perror("lseek");
            exit(-1);
        }
        ssize_t ret = opt_read ? read(fd, buf, IO_SIZE)
                               : write(fd, buf, IO_SIZE);
        if (ret == -1) {
            perror("read/write");
            exit(-1);
        }
    }
}
```
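One caveat worth knowing: on Linux, `O_DIRECT` generally requires the user buffer, the I/O size, and the file offsets to be aligned to the device's logical block size. Here is a sketch of a compatible open-and-allocate step, assuming 4096-byte alignment is sufficient for your device:

```c
#define _GNU_SOURCE   /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

#define IO_SIZE 4096

/* Open path for direct I/O and hand back a suitably aligned buffer. */
int open_direct(const char *path, char **buf_out) {
    /* posix_memalign returns a buffer aligned for O_DIRECT transfers */
    if (posix_memalign((void **)buf_out, IO_SIZE, IO_SIZE) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        exit(-1);
    }
    int fd = open(path, O_RDWR | O_DIRECT);
    if (fd == -1) {
        perror("open");
        exit(-1);
    }
    return fd;
}
```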
Once again we want you to compare the performance measurements and explain the differences. The tools linked above might be helpful for understanding the measured performance. Please answer the following questions in your report: What does `O_DIRECT` do?
This is the final part of your lab.
In this experiment we want you to understand the functionality of swap. You will use the same program written in the first experiment, "Measuring mmap", for this experiment. More specifically, we want you to consider the anonymous private case of your program.
Please answer the following questions:
For containers:
--memory="500m"
in the Docker command.
Then Run your program.
Does it finish successfully? Explain why.
--memory-swap="1.5g"
to your docker command (Don't remove the --memory="500m"
flag).
Also, please make sure that swap is enabled on your host machine.
You can use free -m
to check if it is enabled.
Run your program again.
Does it finish successfully? Explain why.
For VMs:
The first section should include everything the reader needs to reproduce all your results. As always, report your experimental platform. Describe the software you are using, such as the version of your kernel, your VM images, and your docker images.
The second section should include the results of your first experiment and your answers and explanations to the corresponding questions. Use the table we specified above to report your results. Please specify the units of your measurements. Your report should answer every question and should do so in a way that is clearly labeled. Your explanation should include how you used the tools to help you understand the differences in performance. Don't just include your hypothesis! Use tools to measure your programs and support your hypothesis.
The third section should include the results of your second experiment and your answers and explanations to the corresponding questions. The requirements are the same as the second section.
The final section should include your answers and explanations to the questions in the third experiment. Please also include the output of your program.
Please report how much time you spent on the lab.
Configure your VM with at least two virtual CPUs, but first confirm that your host system has at least two CPUs.
Check for `perf` availability on your host system before checking/installing it in the guest. If you run `perf list` on the command line, it will tell you what counters are supported by your combination of hardware, OS, and perf tools.
I'm not sure if it is necessary, but if you get a lot of variation in your results for the experiments that follow, you might want to disable CPU frequency scaling on your system. I would do this in the BIOS, but you can also try user-level tools like the one linked below, which allow you to set the frequency directly (or perhaps the "-g performance" option would work, I'm not sure). Here is a tool: https://manpages.ubuntu.com/manpages/hardy/man1/cpufreq-selector.1.html
Your report should be a PDF file submitted to Canvas.
Please include how much time you spent on the lab.
Your code will have to run with many different configurations. Consider using `getopt`, or maybe you would prefer a configuration file, but I find command line options superior for this sort of task as they are more explicit and more easily scripted.
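A minimal `getopt` skeleton might look like this; the option letters and variables are made up for illustration:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv) {
    int opt;
    int shared = 0, file_backed = 0;
    const char *path = NULL;

    /* -s: use MAP_SHARED; -f FILE: file-backed mapping of FILE */
    while ((opt = getopt(argc, argv, "sf:")) != -1) {
        switch (opt) {
        case 's': shared = 1; break;
        case 'f': file_backed = 1; path = optarg; break;
        default:
            fprintf(stderr, "usage: %s [-s] [-f FILE]\n", argv[0]);
            exit(1);
        }
    }
    printf("shared=%d file_backed=%d path=%s\n",
           shared, file_backed, path ? path : "(none)");
    return 0;
}
```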