Welcome to the first CS 439 project! Like all other projects in this course, you will be working in C. The program you will write in this assignment is a simple Unix shell. The shell is the basic text interface of the Unix world, and you will use shells (though not the one you write here!) for most of the remaining projects in this course.
While building this shell, you should gain a much better understanding of how real Unix shells work. You will also gain practice working with the OS API which you will implement in later projects.
A shell is an interactive command-line interpreter (CLI) that runs programs on the behalf of its user. Your shell will behave much like the ones on the CS lab machines: it will repeatedly print a prompt, wait for input from the user, process the input, and then take the appropriate actions.
The input provided by the user takes the form of a command line, which is a sequence of ASCII words separated by whitespace. The first word in the command line is either the name of a built-in command or the pathname of an executable file. The remaining words are command-line arguments. If the first word is a built-in command, the shell immediately executes the command in the current process. Otherwise, the word is assumed to be the pathname of an executable program. In this case, the shell forks a child process and then loads and runs the program in the context of the child. The child processes created as a result of interpreting a single command line are known collectively as a job. A job can consist of multiple child processes.
Here are two example commands:
echo hello world
/bin/ls -al
In the first command, the program is echo, and the arguments are hello and world. In the second command, the program is /bin/ls and the argument is -al.
Shells will usually execute the appropriate actions by using the fork-exec method to execute another program. However, in some cases, a command cannot be executed by another process. These commands, known as built-in commands, are instead executed directly by the shell.
Shells also offer other convenience features. For example, the user may want to capture the output of a program in a file for later analysis. The user may also want to be able to run several child processes at the same time. You will implement all of these features in this project.
At the end of this project, you will have a simple, but fully-functioning Unix shell.
Filenames, code, and terminal output (generally, anything you might expect to see in the terminal when working on this project) will be in teletype. Usually, if it's prefaced by unix>, this is something you should type in your regular shell, while things prefaced by utcsh> should be given as input to your project.
We provide starter code for this project. Get it from the class web page either by downloading it in your browser or by running this command from the command line:
unix> wget https://www.cs.utexas.edu/%7Eans/classes/cs439/projects/shell_project/shell_project.tar.gz
Put the file shell_project.tar.gz in the directory in which you plan to do your work. Make sure the directory is protected, so that no one outside your group has access. Then do the following:
Files:
Makefile              # Compiles your shell program and runs the tests
README.shell          # Used for submission
shell_design.txt      # Design questions for Project 0
programming_log.shell # Used as directed by guidelines

# Files for Part 0
fib.c                 # Implement fibonacci here

# Files for Part 1
handle.c              # Implementation Needed
mykill.c              # Implementation Needed

# Files for Part 2/3
argprinter.c          # A test program which can be used to debug execv issues
util.c                # Instructor-provided utility functions
util.h                # The header file for util.c
utcsh.c               # Implement your shell here
tests                 # A directory of tests for your shell
examples              # A directory with example shell scripts in it
Open the Makefile and familiarize yourself with the provided targets.
Fill out the requested information in README.shell, where applicable.
Read over the questions in the design document.
Every time you work, log your time in the Pair Programming Log, according to the Pair Programming Guidelines.
Read this entire handout and consider the overall design of your shell before writing any code.
In this phase of the project, you will learn about the fork() and exec() system calls that you will use in the rest of the project.
Sections 5.4 and 5.6 of OSTEP may be helpful to read before starting this project. You may also wish to consult the class resources (e.g. on C programming and shell usage) before starting.
Update fib.c so that if invoked on the command line with some integer argument n, where n is less than or equal to 13, it recursively computes the nth Fibonacci number.
Example executions:
unix> fib 0
0
unix> fib 3
2
unix> fib 10
55
The trick is that the program must be written recursively, and each recursive call must be made by a new process (i.e. by calling fork() followed by doFib()).
The parent must wait for the child to complete, and the child must pass the result of its computation to its parent.
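The handout does not say how the child should hand its result back to the parent. One common option, assumed in the sketch below, is the child's exit status. An exit status carries only 8 bits, which is consistent with the limit on n: the 13th Fibonacci number is 233, while the 14th (377) no longer fits. The names fib_forked() and fib_in_child() are hypothetical and are not the starter code's doFib().

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int fib_forked(int n);

/* Compute fib(n) in a freshly forked child and return the value the
 * child reports through its exit status. */
static int fib_in_child(int n)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {
        /* Child: the result becomes the exit status (only 8 bits survive). */
        exit(fib_forked(n));
    }

    /* Parent: block until this particular child has finished. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        exit(EXIT_FAILURE);
    }
    if (!WIFEXITED(status)) {
        fprintf(stderr, "child did not exit normally\n");
        exit(EXIT_FAILURE);
    }
    return WEXITSTATUS(status);
}

/* Recursive Fibonacci in which every recursive call runs in a new process. */
static int fib_forked(int n)
{
    if (n < 2) {
        return n;               /* base cases: fib(0) = 0, fib(1) = 1 */
    }
    return fib_in_child(n - 1) + fib_in_child(n - 2);
}
```

One weakness of this sketch is that a failing child exits with EXIT_FAILURE, which the parent cannot tell apart from a legitimate result of 1. A real solution may want a different error channel.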
Note that these restrictions only apply to fib. Modification of existing functions' signatures is allowed in later parts of this project.
In this phase of the project, you will learn about the signal system call and its use and behavior.
You may find it helpful to read the man pages for sigaction, kill, and nanosleep, as well as the implementation of Signal() in sig_util.c.
For further reading on signals and handling them, consult section 8.5 from Bryant & O'Hallaron.
Write a program in handle.c that first uses the getpid() system call to find its process ID, then prints that ID, and finally loops continuously, printing “Still here\n” once every second. Set up a signal handler so that if you hit ^c (ctrl-c), the program prints “Nice try.\n” to the screen and continues to loop.
Note: The printf() function is technically unsafe to use in a signal handler. A safer way to print the message is to call:
ssize_t bytes;
const int STDOUT = 1;
bytes = write(STDOUT, "Nice try.\n", 10);
if(bytes != 10)
    exit(-999);
Note: You should use the nanosleep() library call rather than the sleep() call so that you can maintain your 1-second interval between “Still here” messages no matter how quickly the user hits ^c. Additionally, most solutions that use alarm() are incorrect.
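The two notes above can be combined into a rough sketch: a handler installed with sigaction() that prints with write(), and a sleep helper that restarts nanosleep() with the remaining time whenever a signal interrupts it. The names on_sigint(), install_sigint_handler(), sleep_fully(), and sigint_seen are hypothetical, and the course's Signal() wrapper is deliberately not used here because its interface is not shown in this handout.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Set by the handler so the rest of the program can observe the signal. */
static volatile sig_atomic_t sigint_seen = 0;

/* Handler: uses only write(), which is async-signal-safe. */
static void on_sigint(int signo)
{
    static const char msg[] = "Nice try.\n";
    int saved_errno = errno;    /* write() may clobber errno */
    (void)signo;
    sigint_seen = 1;
    ssize_t n = write(STDOUT_FILENO, msg, sizeof msg - 1);
    (void)n;    /* the handout's snippet exits on a short write; omitted here */
    errno = saved_errno;
}

/* Install on_sigint() for SIGINT; returns 0 on success, -1 on error. */
static int install_sigint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL);
}

/* Sleep for the whole interval, resuming after each interruption. */
static int sleep_fully(time_t sec, long nsec)
{
    struct timespec req = { .tv_sec = sec, .tv_nsec = nsec };
    struct timespec rem;
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR) {
            return -1;          /* a real error, not an interruption */
        }
        req = rem;              /* continue with the time still owed */
    }
    return 0;
}
```

A loop that calls sleep_fully(1, 0) and then prints “Still here” keeps roughly a one-second cadence even if ^c arrives mid-sleep, because the interrupted sleep resumes instead of restarting from a full second.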
You can terminate this program using kill -9. For example, if the process ID is 4321:
unix> kill -9 4321
Since handle has control of your current terminal window, you'll need to execute the kill command from another window on the same machine.
Update the program from Part 1.2 to catch the SIGUSR1 signal, print “exiting”, and exit with status equal to 1.
Now write a program mykill.c that takes a process ID as an argument and that sends the SIGUSR1 signal to the specified process ID.
For example:
unix> ./handle
4321
Still here
Still here
Still here
unix> ./mykill 4321
exiting
unix>
unix>

You should follow this formatting exactly. Do not add "pid: " or any other embellishments. Doing so may break our grading scripts. The stub code is provided for you. Note that, once again, your code should execute properly when compiled with the appropriate targets in the Makefile.
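The core of mykill can be sketched as parsing the argument and calling kill(). The helper name send_usr1_to() is hypothetical, and the provided stub may be organized differently. The sketch rejects zero and negative values on purpose: kill() treats those as "a process group" or "every process you may signal", which is almost never what a typo should do.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>

/* Parse pid_text as a positive decimal process ID and send it SIGUSR1.
 * Returns 0 on success, -1 on bad input or if kill() fails. */
static int send_usr1_to(const char *pid_text)
{
    char *end = NULL;
    errno = 0;
    long value = strtol(pid_text, &end, 10);

    /* Reject overflow, empty input, trailing junk, and non-positive IDs. */
    if (errno != 0 || end == pid_text || *end != '\0' || value <= 0) {
        return -1;
    }
    return kill((pid_t)value, SIGUSR1);
}
```

In a main() function this would typically be called as send_usr1_to(argv[1]) after checking that argc is 2.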
In this part of the assignment, you will start building the basic framework of your shell. At the end of this section, you should have a basic functioning shell framework, which you will extend and upgrade in future sections.
Your basic shell will be called utcsh [1].
Note: Parts 1 and 2 will walk you through a recommended implementation order for the shell. You are not required to implement everything in this order; however, you must implement all functionality in both parts to receive full credit.
Remember to read this entire document before implementing anything!
No additional external reading is required, though if you have not worked with command line interfaces before, you may wish to read Ubuntu's command line tutorial and review Julia Evans's (@b0rk) comic explaining PATH.
You may also find it helpful to read the man pages for strtok, strcmp, and execv, though this will not be required until later.
The core of any shell is the REPL, or the read-evaluate-print loop. This is a loop that does the following three actions repeatedly:
Implement a REPL in the main() function in utcsh.c. Print utcsh> at the start of the line, then read the user's input.
For now, the only command your REPL should respond to is the built-in command exit, which will cause the shell to exit by calling exit(0). You should also call exit if you fail to read a line of input. For reading lines of input, you should use getline(). Run man getline to learn more about this function. Make sure you are using the provided Makefile target to compile.
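A minimal sketch of such a loop follows, under some assumptions made only so the example can be exercised in isolation: it takes its input and prompt streams as parameters, and it returns to the caller where the real shell would call exit(0). The name repl_sketch() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Prompt, read a line, and stop on "exit" or when no line can be read.
 * Returns how many other lines were seen before stopping. */
static size_t repl_sketch(FILE *in, FILE *prompt_out)
{
    char *line = NULL;      /* getline() allocates and grows this buffer */
    size_t cap = 0;
    size_t handled = 0;

    for (;;) {
        fputs("utcsh> ", prompt_out);
        fflush(prompt_out);

        ssize_t len = getline(&line, &cap, in);
        if (len < 0) {
            break;                      /* end of input or read error */
        }
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';       /* drop the trailing newline */
        }
        if (strcmp(line, "exit") == 0) {
            break;                      /* the real shell calls exit(0) here */
        }
        handled++;                      /* a real shell would evaluate the line */
    }

    free(line);                         /* getline()'s buffer is ours to free */
    return handled;
}
```

Reusing one buffer across iterations, as above, lets getline() grow it as needed and leaves a single free() at the end.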
Recall from the introduction that a command consists of ASCII words separated by whitespace. Implement some way to split a command so that you can recover these words, e.g. counting from zero, you should be able to tell that the 4th word of "path a b c d e" is "d" (with "path" as the 0th word).
We recommend that you use strtok() for this. Read man strtok very carefully: this function can lead to hours of debugging if you're not careful.
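As a rough illustration of the strtok() approach, the sketch below splits a line in place into a NULL-terminated array of words. The name split_words() and its capacity parameter are hypothetical and are not part of the starter code. Two of the pitfalls the man page describes are visible here: strtok() overwrites delimiters in the buffer it is given, so the line must be writable, and it keeps hidden state between calls.

```c
#include <stddef.h>
#include <string.h>

/* Split `line` in place on whitespace. Stores pointers to the words in
 * argv, followed by a NULL terminator. Returns the number of words, or -1
 * if argv (with `capacity` slots) cannot hold them plus the terminator. */
static int split_words(char *line, char *argv[], size_t capacity)
{
    const char *delims = " \t\n";
    size_t count = 0;

    if (capacity == 0) {
        return -1;              /* no room even for the NULL terminator */
    }
    for (char *word = strtok(line, delims); word != NULL;
         word = strtok(NULL, delims)) {
        if (count + 1 >= capacity) {
            return -1;          /* this word plus the NULL would not fit */
        }
        argv[count++] = word;   /* points into `line`; no copy is made */
    }
    argv[count] = NULL;         /* same layout that execv() expects */
    return (int)count;
}
```

Because the stored pointers refer to the original buffer, that buffer has to stay alive and unmodified for as long as the words are in use.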
Expand your shell's ability to process built-in commands by adding two new built-in commands:
cd: cd always takes exactly one argument. utcsh should call the chdir() system call with the user-supplied argument.
path: the path command takes zero or more arguments, with each argument separated by whitespace from the others. A typical usage might look like this:
utcsh> path /bin /usr/bin
This command will be used in Part 2.5--for now, just worry about being able to separate the arguments of this command without crashing.
In the previous section, we said that cd must have exactly one argument. What happens if the user enters multiple arguments? Your shell declares that an error has occurred.
Whenever an error occurs, your shell should print the error message on stderr and continue. The only time your shell should exit in response to an error is described in Part 2.6. [2]
An example snippet for how to print the error is given below. This snippet may not meet all requirements for this project--check it carefully and modify if needed:
char emsg[30] = "An error has occurred\n";
int nbytes_written = write(STDERR_FILENO, emsg, strlen(emsg));
if(nbytes_written != strlen(emsg)){
    exit(2); // Shouldn't really happen -- if it does, error is unrecoverable
}
The built-in commands you have implemented should error under the following conditions:
It is never acceptable to crash, segfault, or otherwise break the shell in response to bad user input. Your shell must always exit gracefully, i.e. by calling exit() or returning from main().
If the command given is not one of the three built-in commands, it should be treated as the path to an external executable program.
For these external commands, execute the program using the fork-and-exec method discussed in class. Here are some hints to help you out:
For the child process: The child process must execute the given command by using the execv() call. You may not call system() to run a command. Remember that if execv() returns, there was an error (usually caused by incorrect arguments or the file not existing).
For the parent process: The parent should use wait() or waitpid() to wait on the child. Note that the parent does not care about what happens to the child. As long as fork() succeeds, the parent considers the process launch to have been a success.
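The two hints above can be sketched as a single function. The name run_external() is hypothetical, and the child's exit code of 127 after a failed execv() is a convention borrowed from common shells, not something this handout specifies. The function returns the child's exit status only so the example can be checked; as noted above, the shell itself does not need it.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with arguments argv (NULL-terminated) in a child process.
 * Returns the child's exit status, or -1 if fork()/waitpid() failed or
 * the child did not exit normally. */
static int run_external(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                      /* could not create a child */
    }
    if (pid == 0) {
        /* Child: replace this process image with the requested program. */
        execv(argv[0], argv);
        /* execv() only returns on failure. */
        exit(127);
    }

    /* Parent: wait for this specific child to finish. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that argv[0] is passed twice: once as the path to execute and once as the first element of the argument array, which is what the new program sees as its own argv[0].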
The execv() syscall is tricky to get right. Read man execv to learn about some of the common pitfalls. However, if you are still confused, here are some additional hints:
It can be tedious to have to type in commands one at a time. One common solution for this is to create a script by putting a related sequence of commands into a file and using the shell to run that file.
Implement a script system: if utcsh is invoked with one argument, instead of reading commands from stdin, it assumes that its argument is a filename and attempts to read commands one at a time from that file instead of from stdin.
You can find example scripts in the examples/ directory. say_hello.utcsh is the most basic script and consists of a bunch of external commands. There is also the more advanced say_hello_path.utcsh, which relies on the path feature (which you will implement in 3.1).
There are two other changes to utcsh when operating in script mode:
Note that until you finish this section, you will not be able to run the automated test suite. Once you have finished this section, you may check Section 4 for details on running the tests.
At this point, you have a basic shell that can run both built-in and external commands, both from a script and from stdin (keyboard input)--for example, you should be able to run the say_hello.utcsh script in the examples directory.
When you implemented external program execution, you assumed that the 0-th argument was the path to an executable file. Unfortunately, this is annoying for users, because nobody wants to type /usr/local/bin/ls every time they want to run the ls command.
The solution to this is a PATH: a set of user-specified directories to search for external programs. When the shell is given a command it does not recognize, it looks for this program in its PATH.
Note that, for the rest of this document, "path" will refer to a string with slashes in it which is used to locate a file, while PATH will be used to refer to a list of paths used to search for binary files. [3]
If the program you're given is not an absolute path, i.e. a path which starts from /, you should search for your program in each directory in the PATH. For example, if your PATH is "/bin" "/usr/bin", you would search for /bin/ls and /usr/bin/ls, executing the first one you found (and returning an error if neither exists). You can check that the file exists and is executable using the functions we provide in the skeleton code. If the file does not exist, or it is not executable, this is an error.
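The search described above might look roughly like the following. This sketch uses the standard access() call with X_OK as a stand-in for the skeleton's own helper functions, whose names are not shown in this handout. The name resolve_program() and the way the PATH is passed in (an array plus a count) are also assumptions for illustration; the skeleton's shell_paths variable may be laid out differently.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Find an executable for `name`. Absolute names are checked as-is;
 * anything else is tried in each of the `ndirs` directories in order.
 * On success, writes the full path to `out` and returns 0; else -1. */
static int resolve_program(const char *name, const char *const dirs[],
                           size_t ndirs, char *out, size_t out_size)
{
    if (name[0] == '/') {
        /* Absolute path: the PATH is not consulted at all. */
        if (strlen(name) >= out_size || access(name, X_OK) != 0) {
            return -1;
        }
        strcpy(out, name);
        return 0;
    }

    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(out, out_size, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= out_size) {
            continue;           /* candidate does not fit; try the next one */
        }
        if (access(out, X_OK) == 0) {
            return 0;           /* first match wins */
        }
    }
    return -1;                  /* not found in any PATH entry */
}
```

One caveat: access() with X_OK also succeeds for directories that can be searched, so a stricter check would additionally confirm that the candidate is a regular file.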
The user can set the PATH with the path command. Each argument to the path corresponds to an entry in the shell's PATH. The path command completely overwrites the existing PATH--it does not append entries. If the PATH is empty because the user executed a path command with no arguments, utcsh cannot execute any external programs unless the full path to the program is provided.
A variable for the PATH is already provided for you in the skeleton code, called shell_paths. You can manipulate this variable directly, or by using the helper functions in util.c/util.h.
Reminder: the shell itself does not implement ls or any other program--it simply looks them up in the path and executes them.
Many times, a shell user prefers to send the output of a program to a file rather than to the screen. Usually, a shell provides this nice feature with the > character. This is called redirection of output. Your shell should include this feature.
For example, if a user types ls -al /tmp > output, nothing should be printed to the screen. Instead, the standard output and standard error of the program should be rerouted to the file output.
If the output file already exists, you should overwrite and truncate it. Look through the flags in man 2 open to find out how to do this.
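One way to do this, sketched below under assumptions, is to open the file with O_WRONLY | O_CREAT | O_TRUNC in the child and use dup2() to point both standard output and standard error at it before calling execv(). The names redirect_output() and run_redirected(), the file mode 0644, and the exit codes 126 and 127 are choices made for this example only.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Point stdout and stderr of the calling process at `path`, creating the
 * file if needed and truncating it if it exists. Returns 0 or -1. */
static int redirect_output(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return -1;
    }
    if (dup2(fd, STDOUT_FILENO) < 0 || dup2(fd, STDERR_FILENO) < 0) {
        close(fd);
        return -1;
    }
    if (fd > STDERR_FILENO) {
        close(fd);              /* the duplicates keep the file open */
    }
    return 0;
}

/* Fork, redirect in the child only, exec, and wait.
 * Returns the child's exit status, or -1 on failure. */
static int run_redirected(char *const argv[], const char *path)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        if (redirect_output(path) != 0) {
            exit(126);          /* could not set up the redirection */
        }
        execv(argv[0], argv);
        exit(127);              /* execv() returned, so it failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Doing the redirection after fork() matters: the shell's own stdout and stderr stay connected to the terminal, so the next prompt still appears on screen.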
Here are some rules about the redirection operator:
Your shell will allow the user to launch concurrent commands. Remember: when two things are concurrent, they appear to execute at the same time whether they actually run simultaneously or not (logical parallelism). In utcsh, this is accomplished with the ampersand operator:
utcsh> cmd1 & cmd2 & cmd3 args1
Instead of running cmd1, waiting for it to finish, and then running cmd2, your shell should run cmd1, cmd2, and cmd3 (with whatever args were passed) before waiting for any of them to complete.
Then, once all processes have been started, you must use wait() or waitpid() to make sure that all processes have completed before moving on.
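The start-everything-then-wait pattern can be sketched as two separate loops. The name run_all() and its argument layout (an array of NULL-terminated argument vectors) are hypothetical, and redirection is left out to keep the example short.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start every command in cmds before waiting for any of them.
 * Returns the number of child processes that were reaped. */
static int run_all(char *const *cmds[], size_t ncmds)
{
    int started = 0;

    /* Phase 1: launch all children without waiting in between. */
    for (size_t i = 0; i < ncmds; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            execv(cmds[i][0], cmds[i]);
            exit(127);          /* execv() failed in this child */
        }
        if (pid > 0) {
            started++;
        }
    }

    /* Phase 2: reap children in whatever order they finish. */
    int reaped = 0;
    while (reaped < started && wait(NULL) > 0) {
        reaped++;
    }
    return reaped;
}
```

Putting the wait inside the first loop would quietly turn this back into sequential execution, which is the most common mistake with this pattern.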
Each individual command may optionally have its own redirection, e.g.
utcsh> cmd1 > file1 & cmd2 arg1 arg2 > file2 & cmd3 > file3
Note that we can now have multiple commands on a single line. For obvious reasons, we shall hereafter refer to a line of input as a command line. Each command line may have one or more commands in it.
Unlike the redirection operator, the ampersand operator might not have spaces around it. For example cmd1 arg1&cmd2 > file2 is a valid command line, and requests the execution of two commands. In addition, some or all of the commands on either side of the ampersand may be blank. This means that, for example, &&&&&&& is a valid command line.
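A sketch of splitting a command line on ampersands follows. It uses strtok_r() rather than strtok() so that each command can later be split into words with strtok() without the two scans interfering with each other's hidden state. The name split_commands() is hypothetical, and silently ignoring commands beyond the array's capacity is a simplification made for this example; a real shell would probably report an error instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Split `line` in place on '&'. Blank commands are skipped, so "&&&"
 * yields zero commands. Stores up to `capacity` command strings in cmds
 * and returns how many were stored. */
static size_t split_commands(char *line, char *cmds[], size_t capacity)
{
    size_t count = 0;
    char *save = NULL;          /* strtok_r() keeps its state here */

    for (char *part = strtok_r(line, "&", &save);
         part != NULL && count < capacity;
         part = strtok_r(NULL, "&", &save)) {
        part += strspn(part, " \t\n");  /* skip leading whitespace */
        if (*part != '\0') {
            cmds[count++] = part;       /* non-blank command */
        }
    }
    return count;
}
```

Trailing whitespace is left on each command, which is harmless if the next step splits on whitespace anyway.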
As you process these commands, there are a number of special cases to consider. In doing so, you may assume the following:
A real concern in any text processing program in C is how much memory to allocate for text handling. In order to simplify your shell implementation, you are allowed to limit the size of your inputs according to the macros defined in util.h. When interpreting these limits, remember that a command line may consist of multiple commands.
If the input violates these limits, you may print the error message and continue processing. Do not crash the shell if these limits are violated.
It is possible to write the shell in a way that these limits are not needed, but it is slightly more challenging.
Think carefully about how you design your tokenization routines. Right now, you only have to deal with one command. In Part 2, you're going to deal with multiple commands, possibly each with their own redirects, and each of which can error independently of the others. Make sure your design can grow to accommodate this.
A good basic design is to allocate an array of char*, then use strtok to fill it up one element at a time. At the end of this procedure, array[0] should be the 0-th argument, array[1] should be the 1st, and so on. You should then store this information in a way that allows multiple copies (i.e. not in a global structure, which tends to be a bad idea anyways).
Be extremely careful about doing a == b or a = b when a and b are char*. This likely does not do what you think it does. In order to do the operations, look into strcmp(), strcpy(), and strncpy() in string.h.
We are providing the rough number of lines of code used in the reference solution as a rough hint for you, so you can see how much work is needed for each function. These numbers have been rounded to the nearest multiple of 10.
Function | Lines of Code |
---|---|
tokenize_command_line | 50 lines |
parse_command | 60 lines |
eval | 60 lines |
try_exec_builtin | 60 lines |
exec_external_cmd | 30 lines |
main | 50 lines |
To help you check your work, we've provided a small test suite, along with some tools to help you run it.
Each test in the test suite will check three things from your shell:
If any of these differ from the correct values, the test suite will print an error and tell you what part of the output was wrong, along with commands you can run to see the difference.
In order to make this easier on you, we've included some helper rules in the Makefile to let you run tests easily.
No test should run for more than 10 seconds without either passing or failing. If your test runs for longer than this, you likely have an infinite loop in your code.
In general, you should not look directly at the test files themselves unless you want to modify the tests. If you want to run the command that the test runs, use make describe.
Your shell code will be tested and graded with the output of make utcsh or make (the two rules are equivalent in the provided Makefile). To aid you in debugging, two additional rules have been created in the makefile:
make debug will create a binary which is not heavily-optimized and has more debugging information than the default build. If you want to feed your program into a debugger like gdb, valgrind, or rr, you should use this rule to generate it.
make asan will create a binary with sanitizers. Think of these as extra error checking code that the compiler adds to the program for you. When you run a program that has been compiled with sanitizers, the binary itself will warn you about memory leaks, invalid pointer dereferences, and other such issues.
We do not enable the sanitizers by default because they can turn an otherwise-correct program into an incorrect one, e.g. if your program is correct except for a small memory leak, the sanitized binary will still exit with an error.
You may use these rules to quickly generate programs for debugging, but keep in mind that your grade will be based on the binary generated by make utcsh.
As part of this project, you will submit a design document, where you will describe your design to us. Please note that this document is a set of questions that you will answer and is not free form. Your group will submit one design document.
You must work in two-person teams on this project. Failure to do so will result in a 0 for the project. Once you have contacted your assigned partner, do the following:
You must follow the pair programming guidelines set forth for this class.
Please see the Grading Criteria to understand how failure to follow the pair programming guidelines OR fill out the README.shell will affect your grade.
You must follow the guidelines laid out in the C Style Guide or you will lose points. This includes selecting reasonable names for your files and variables.
This project will be graded on the UTCS public linux machines. We will not assist you in setting up other environments, and you must test and do final debugging on the UTCS public linux machines. The statement "It worked on my machine" will not be considered in the grading process.
The execution of your solution shell will be evaluated using the test cases that are included in your project directory. To receive credit for the test cases, your shell should pass the provided test case, as determined by make clean && make utcsh && make check.
Your code must compile without any additions or adjustments, or you will receive a 0 for the test cases portion of your grade.
Do not use _exit() for this assignment---use exit() instead.
You are encouraged to not use linux.cs.utexas.edu for development. Instead, please find another option using the department's list of public UNIX hosts.
You are encouraged to reuse your own code that you might have developed in previous courses to handle things such as queues, sorting, etc. You are also encouraged to use code provided by a public library such as the GNU library.
You may not look at the written work of any student other than your partner. This includes, for example, looking at another student's screen to help them debug or looking at another student's print-out. See the syllabus for additional details.
If you find that the problem is under specified, please make reasonable assumptions and document them in the README.shell file. Any clarifications or revisions to the assignment will be posted to EdStem.
After you finish your code, use make turnin to create a compressed tarball named turnin.tar for submission. It may be a good idea to unpack this tarball into a clean directory on a UTCS linux system to make sure it still compiles. You should then upload the file to the Project 0 Test Cases assignment on Canvas. Make sure you have included the necessary information in the README.shell and placed your pair programming log in the project directory. Submitting the wrong tar file will result in a grade of 0. In particular, make sure you do not turn in the original skeleton tar file.
Once you have completed your design document, please submit it to the Project 0 Design and Documentation assignment in Canvas. Make sure you have included your name, CS login, and UT EID in the design document.
The purpose of the design document is to explain and defend your design to us. Its grade will reflect both your answers to the questions and the correctness and completeness of the implementation of your design. It is possible to receive partial credit for speculating on the design of portions you do not implement, but your grade will be reduced due to the lack of implementation.
Code will be evaluated based on its correctness, clarity, and elegance according to the Grading Criteria and detailed Project 0 rubric. Strive for simplicity. Think before you code.
The most important factor in grading your code design and documentation will be code inspection and evaluation of the descriptions in the write-ups. Remember, if your code does not follow the standards, it is wrong. If your code is not clear and easy to understand, it is wrong.
Project adapted from one used in OSTEP. Many thanks to the Drs. Arpaci-Dusseau for permission to use their work.
[1]: This is both an homage to the UTCS department and a play on the name of the popular tcsh shell.
[2]: Note that we check the return value of the write call in spite of the fact that all we can do if it's wrong is exit. This is good programming practice, and you should be sure to always check the return codes of any system or library call that you make.
[3]: Sometimes you hear PATH referred to as "the path," but in most real-world contexts, you will need to deduce which one is meant from context.