LLVM is both a great piece of software to learn and a complicated one to install and get started with. It has hundreds of tools and options that can easily overwhelm a beginner. This document aims to address that by providing an easy way to get started with LLVM. We give the minimal instructions required to install LLVM, write your first pass, and run it. After following this document, you should be able to take on the more complicated documentation found online.
Many people have realized that installing LLVM is hard and have written great tutorials that you can find online. Two of the most important (and very descriptive) ones are from the LLVM website itself. Links to both of them are below. These two documents cover installation thoroughly, but that thoroughness is also the problem: most of the detail is unnecessary for a beginner. You can skip these documents for now and come back to them later. My recommendation is to complete the tutorial on this page, explore LLVM yourself for a bit by writing and modifying passes, and then read the two documents.
The instructions in this document have been tested on Ubuntu 18.04. The installation steps are largely the same on other platforms, but the commands for installing individual packages differ on macOS and Windows. Please look online for support.
sudo apt install cmake
sudo apt install ninja-build
sudo apt install clang
LLVM releases new versions quite often. Functionality can differ greatly between major versions, and newer releases reportedly break backward compatibility. To make sure everyone is on the same LLVM version and to avoid any trouble during grading, we will use the latest stable release, which is 10.0.1.
cd ~
git clone https://github.com/llvm/llvm-project.git llvm
cd llvm
git checkout -b cs380c llvmorg-10.0.1
The repository is around 1.27 GB in size. Before cloning, make sure you have good network connectivity. After the above commands, you will be on the cs380c branch.
LLVM uses CMake to generate a build system. You can specify the build system you want and the specified build system is later used to build the LLVM source files. Some build systems of interest are:
You can choose whatever build system you prefer on your machine. On Ubuntu, I prefer Eclipse with Ninja as the build system. Building LLVM takes two steps: (a) generate the build system using CMake, and (b) use Ninja to build the huge LLVM codebase. Use the following commands:
cd ~/llvm/
mkdir build
cd build
cmake -G Eclipse\ CDT4\ -\ Ninja -DLLVM_TARGETS_TO_BUILD=host ../llvm/
ninja
On my laptop (i7 CPU, 16 GB RAM), the build took 28 minutes. If you are using a laptop, make sure it is plugged in and that you have a novel to read before starting the build. After the initial build, incremental builds using ninja are very fast (usually a few seconds).
The object files and the binaries are placed in the ~/llvm/build directory. LLVM supports a wide range of target architectures such as x86, ARM, Sparc, and MIPS. Building all the targets takes a really long time. Since you will be executing the compiled programs only on your machine, you are only interested in your machine’s architecture. Hence, you can ask LLVM to support only the architecture of your host machine by specifying the -DLLVM_TARGETS_TO_BUILD=host flag. The other target architectures are then not built, which drastically reduces the build time.
The binaries for the LLVM tools are placed in ~/llvm/build/bin. These tools are used to run the LLVM passes. To make the tools easy to run, it is a good idea to add the path to the binaries to the bash PATH variable so that you can invoke them from any directory. To do so, add the following line at the end of the ~/.bashrc file.
export PATH=$PATH:~/llvm/build/bin/
Restart the terminal for the change to take effect.
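To confirm that the PATH change took effect, you can ask the shell where it finds the LLVM tools. The snippet below is a minimal sketch: where_is is a hypothetical helper (not an LLVM tool), and it assumes the ninja build finished and produced opt, llvm-as, and llvm-dis.

```shell
# where_is is a hypothetical helper, not part of LLVM.
# For each name, print where the shell resolves it, or report that it is missing.
where_is() {
  for tool in "$@"; do
    if location=$(command -v "$tool"); then
      echo "$tool -> $location"
    else
      echo "$tool: not found on PATH"
    fi
  done
}

# Tools that the ninja step is assumed to have built into ~/llvm/build/bin.
where_is opt llvm-as llvm-dis
```

Each tool should resolve to a path inside your llvm/build/bin directory. If a tool is reported as not found, re-check the export line in ~/.bashrc and open a new terminal.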
IDEs are very useful in navigating large codebases. While writing your pass, you will need to refer to different classes and different methods available in these classes. It would be useful to use an IDE for these tasks. Since we asked CMake to create the Eclipse with Ninja build system, along with the ninja build files it also generated Eclipse project files that we can use to open the codebase in Eclipse. To import the LLVM project in Eclipse, do the following:
~/llvm/build as the root directory.
IMPORTANT: To make grading easy, your LLVM pass should be named using YOUR EID in upper case. In this document, I will use SG12345 as the EID. Please make sure you do not use any name other than your EID in upper case.
To write a new LLVM pass, you need to create the source files and tell CMake to build them along with LLVM. If done right, a shared object file (.so file) will be created for your pass. Follow the steps below.
Step 4.1 Create a directory to host your LLVM pass
Most LLVM transformation passes live in ~/llvm/llvm/lib/Transforms/. We will create our first pass there. Create a directory to host the source files.
mkdir ~/llvm/llvm/lib/Transforms/SG12345
Step 4.2 Update the CMake file in the Transforms directory
We need to tell CMake that we have created a new pass in a new directory. To do so, add the following line at the end of the ~/llvm/llvm/lib/Transforms/CMakeLists.txt file.
add_subdirectory(SG12345)
Step 4.3 Create a CMake file for your LLVM pass
Now we need to tell CMake how to compile our pass and what its name should be. First, create a CMake file.
cd ~/llvm/llvm/lib/Transforms/SG12345
touch CMakeLists.txt
Add the following to the CMakeLists.txt file:
add_llvm_library(LLVMSG12345 MODULE
  SG12345.cpp

  PLUGIN_TOOL
  opt
  )
The command above says that the pass has a single source file, SG12345.cpp. If you add more source files, list their names here. It also says that the .so file should be named LLVMSG12345. The other details are not relevant for now.
Step 4.4 CPP file for your LLVM pass
touch SG12345.cpp
You can use our reference SG12345.cpp file here. The implementation of the assignments will start from this file.
Step 4.5 Compiling your first pass
Once you have implemented your pass, it is time to compile.
cd ~/llvm/build
ninja
A .so file will be created for your pass and placed in the ~/llvm/build/lib/ directory. Your pass is now ready to be executed on some test programs.
To run your LLVM pass, you need some test programs. LLVM passes operate on an intermediate representation (IR). Hence, the test programs need to be converted from their high-level language to LLVM IR. Your pass can then be run on the LLVM IR of the test program.
Step 5.1 Creating test program
cd ~/llvm/
mkdir testcases
cd testcases
touch test1.c
Use the following simple program as test1.c.
#include <stdio.h>

int main() {
    printf("hello world\n");
    return 0;
}

void hello() {
    return;
}
Step 5.2 Generating LLVM IR
clang -emit-llvm -S test1.c
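The LLVM IR for test1.c is written to test1.ll. A .ll file holds the human-readable, textual form of the IR. The same IR can also be stored in a compact binary encoding called LLVM bitcode (.bc files); "bytecode" is an older name for it. The LLVM tools accept both forms.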
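The textual .ll file can be converted to LLVM's binary bitcode format (.bc) and back with the llvm-as and llvm-dis tools from ~/llvm/build/bin. Here is a minimal sketch, assuming those tools are on your PATH. The helper name ir_roundtrip and the .rt.ll suffix are made up for illustration.

```shell
# ir_roundtrip is a hypothetical helper; llvm-as and llvm-dis are real LLVM tools.
# It assembles a textual .ll file into bitcode, then disassembles it back to text.
ir_roundtrip() {
  base="${1%.ll}"                           # strip the .ll extension
  llvm-as "$1" -o "$base.bc" &&             # text IR (.ll) -> binary bitcode (.bc)
    llvm-dis "$base.bc" -o "$base.rt.ll"    # bitcode (.bc) -> text IR (.rt.ll)
}

# Usage, from ~/llvm/testcases after generating test1.ll:
#   ir_roundtrip test1.ll
```

Comparing test1.ll with the regenerated test1.rt.ll is a quick way to convince yourself that both files describe the same program.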
Step 5.3 Invoking your LLVM pass
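The opt program is used to run your LLVM pass on the LLVM IR. The following command loads the shared object of the LLVM pass and invokes the pass. If our reference LLVM pass is used, this command will print the names of the functions in the test program to the terminal. Note that the standard output of opt, which carries the resulting bitcode, is redirected to /dev/null, so a pass must print to standard error for its messages to be visible.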
opt -load ../build/lib/LLVMSG12345.so -SG12345 < test1.ll > /dev/null
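Once you start testing on more programs, it can be handy to wrap Steps 5.2 and 5.3 in one shell function. The sketch below is only a convenience, not part of LLVM. The names run_pass and PLUGIN are hypothetical, and it assumes the pass was built as LLVMSG12345.so under ~/llvm/build/lib and registered under the name SG12345, as in the command above.

```shell
# run_pass and PLUGIN are hypothetical names chosen for this sketch.
# PLUGIN is assumed to point at the shared object built in Step 4.5.
PLUGIN="$HOME/llvm/build/lib/LLVMSG12345.so"

run_pass() {
  # Fail early with a readable message if the pass has not been built yet.
  if [ ! -f "$PLUGIN" ]; then
    echo "plugin not found: $PLUGIN (did the ninja build in Step 4.5 succeed?)" >&2
    return 1
  fi
  ir="${1%.c}.ll"                                       # test1.c -> test1.ll
  clang -emit-llvm -S "$1" -o "$ir" &&                  # Step 5.2: C source -> LLVM IR
    opt -load "$PLUGIN" -SG12345 < "$ir" > /dev/null    # Step 5.3: run the pass
}

# Usage, from ~/llvm/testcases:
#   run_pass test1.c
```

Replace SG12345 with your own EID in both the plugin path and the pass flag.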
That’s it. Now you can go back and modify your pass, test on more complicated programs, read more documentation or just give up on compilers altogether. The choice is yours and the world is your playground! :)