Getting started with LLVM

LLVM is a great piece of software to learn, but it is also complicated to install and get started with. It has hundreds of tools and options that can easily overwhelm a beginner. This document aims to address this by providing an easy way to get started with LLVM: we give the minimal instructions required to install LLVM, write your first pass, and run it. After following this document, you should be able to take on the more complicated documentation found online.

Resources

A lot of people have realized that installing LLVM is hard and have written great tutorials that you can find online. Two of the most important (and very detailed) ones are from the LLVM website itself; both are listed below. Although these two documents describe the installation thoroughly, that is also their problem: most of those details are unnecessary for a beginner. You can skip them for now and come back to them later. My recommendation is to complete the tutorial on the current webpage, explore LLVM yourself for a bit by writing and modifying passes, and then read the two documents.

  1. Getting started with the LLVM system
  2. Writing an LLVM Pass

Note about the host environment

The instructions in this document have been tested on Ubuntu 18.04. Although the installation steps are largely the same on other platforms, the commands for installing individual packages differ on macOS and Windows. Please look online for platform-specific help.

Step 0. Required software

  1. CMake
    sudo apt install cmake
    [Tested with cmake version 3.10.2]
  2. Ninja
    sudo apt install ninja-build
    [Tested with ninja version 1.8.2]
  3. Clang
    sudo apt install clang
    [Tested with clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)]
  4. Eclipse with C++ development support (optional; used in Step 3)
    Install from the official website.
    [Tested with Eclipse 2019-06 (4.12.0)]
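A quick way to confirm that the command-line tools above are installed is a small shell function like the following. check_tools is a hypothetical helper, not part of any package; Eclipse is a GUI application, so it is not checked.

```shell
# check_tools (hypothetical helper): prints the first line of each tool's
# --version output, or MISSING. Returns non-zero if any tool is not on PATH.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%-6s %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
    else
      printf '%-6s MISSING\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_tools cmake ninja clang
```

Compare the printed versions with the tested versions listed above.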

Step 1. Cloning the LLVM repository

LLVM releases new versions quite often. Major versions can differ significantly from one another, and their APIs often change in ways that break backward compatibility. To make sure everyone is on the same LLVM version and to avoid any trouble during grading, we will use the latest stable release at the time of writing, 10.0.1.

cd ~
git clone https://github.com/llvm/llvm-project.git llvm
cd llvm
git checkout -b cs380c llvmorg-10.0.1

The repository is around 1.27 GB, so make sure you have a good network connection before cloning. After the above commands, you will be on the cs380c branch, which starts at the llvmorg-10.0.1 release tag.
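Since everyone must be on the same version for grading, it is worth double-checking the checkout right after cloning, before starting the long build. The sketch below assumes the repository is at ~/llvm as above; verify_checkout is a hypothetical helper name. Once you commit your own work on the branch, HEAD moves past the tag and this check will (correctly) no longer report an exact match.

```shell
# verify_checkout (hypothetical helper): succeeds only if the repository is on
# the cs380c branch and HEAD is exactly the llvmorg-10.0.1 tag.
# Usage: verify_checkout [repo-dir]   (default: ~/llvm)
verify_checkout() {
  local repo="${1:-$HOME/llvm}" branch tag
  branch=$(git -C "$repo" rev-parse --abbrev-ref HEAD) || return 1
  # --exact-match fails if HEAD is not exactly at a tag (e.g. after a commit).
  tag=$(git -C "$repo" describe --tags --exact-match HEAD 2>/dev/null) || tag="(none)"
  echo "branch: $branch, tag: $tag"
  [ "$branch" = "cs380c" ] && [ "$tag" = "llvmorg-10.0.1" ]
}
```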

Step 2. Building LLVM

LLVM uses CMake to generate the input files for a build system of your choice (CMake calls these generators); that build system then compiles the LLVM source files. Running cmake --help lists the generators available on your machine.

You can choose whichever build system you prefer. On Ubuntu, I prefer the Eclipse CDT4 - Ninja generator, which produces Ninja build files together with an Eclipse project (used in Step 3). Building LLVM takes two steps: a) generate the build system using CMake, and b) use Ninja to build the huge LLVM codebase. Use the following commands:

cd ~/llvm/
mkdir build
cd build
cmake -G Eclipse\ CDT4\ -\ Ninja -DLLVM_TARGETS_TO_BUILD=host ../llvm/
ninja

On my laptop (an i7 CPU with 16 GB of RAM), the build took 28 minutes. If you are using a laptop, make sure it is plugged in and have a novel to read before starting the build. After the initial build, incremental builds with ninja are very fast (usually a few seconds).

The object files and binaries are placed in the ~/llvm/build directory. LLVM supports a wide range of target architectures, such as x86, ARM, SPARC, and MIPS, and building all of them takes a really long time. Since you will only execute the compiled programs on your own machine, you only need your machine’s architecture. The -DLLVM_TARGETS_TO_BUILD=host flag tells LLVM to support just the host architecture, so the other targets are not built and the build time drops drastically.

The binaries for the LLVM tools, such as opt, are placed in ~/llvm/build/bin. These tools are used to run the LLVM passes. To invoke them from any directory, add that path to your PATH variable by appending the following line to the end of ~/.bashrc:
export PATH=$PATH:~/llvm/build/bin/
Restart the terminal (or run source ~/.bashrc) for the change to take effect.
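If you repeat the step above, the line ends up in ~/.bashrc several times. A small helper (add_line_once, a hypothetical name) avoids that by appending a line only if the file does not already contain it.

```shell
# add_line_once (hypothetical helper): appends LINE to FILE unless FILE
# already contains exactly that line.
add_line_once() {
  local line="$1" file="$2"
  touch "$file"
  # If the file does not end with a newline, add one so lines are not glued.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $PATH and ~ unexpanded in ~/.bashrc):
#   add_line_once 'export PATH=$PATH:~/llvm/build/bin/' ~/.bashrc
```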

Step 3. (Optional) Using Eclipse IDE for development

IDEs are very useful for navigating large codebases. While writing your pass, you will need to look up different classes and the methods they provide, and an IDE makes this much easier. Because we asked CMake for the Eclipse CDT4 - Ninja generator, it generated Eclipse project files alongside the Ninja build files, so you can open the codebase in Eclipse directly. To import the LLVM project into Eclipse, do the following:

  1. Open Eclipse
  2. File -> Import -> General -> Existing Projects into Workspace
  3. Choose ~/llvm/build as the root directory.

Step 4. Writing your first pass

IMPORTANT: To make grading easy, your LLVM pass must be named using YOUR EID in upper case. In this document, I will use SG12345 as the EID. Do not use any name other than your EID in upper case.

To write a new LLVM pass, you need to create the source files and tell CMake to build them along with LLVM. If everything is set up correctly, a shared object file (.so file) will be created for your pass. Follow the steps below.

  1. Create a directory to host your LLVM pass
  2. Update the CMake file in the Transforms directory
  3. Create a CMake file for your LLVM pass
  4. Create the C++ file for your LLVM pass
  5. Compiling your first pass

Step 4.1 Create a directory to host your LLVM pass

Most LLVM transformation passes live in ~/llvm/llvm/lib/Transforms/, so we will create our first pass there. Create a directory to host our source files:
mkdir ~/llvm/llvm/lib/Transforms/SG12345

Step 4.2 Update the CMake file in the Transforms directory

We need to tell CMake that we have created a new pass in a new directory. To do so, append the following line to the end of ~/llvm/llvm/lib/Transforms/CMakeLists.txt:

add_subdirectory(SG12345)

Step 4.3 Create a CMake file for your LLVM pass

Now we need to tell CMake how to compile our pass and what its name should be. First, create a CMake file.

cd ~/llvm/llvm/lib/Transforms/SG12345
touch CMakeLists.txt

Add the following to the CMakeLists.txt file:

add_llvm_library(LLVMSG12345 MODULE
  SG12345.cpp

  PLUGIN_TOOL
  opt
)

This tells CMake that the pass consists of a single source file, SG12345.cpp; if you add more source files, list them here as well. It also names the library LLVMSG12345, so the resulting shared object is LLVMSG12345.so. The other details are not relevant for now.
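Steps 4.1 to 4.3 can also be done in one go. The sketch below is a hypothetical helper, scaffold_pass, that upper-cases the EID you pass in, creates the pass directory, appends add_subdirectory(...) to the Transforms CMakeLists.txt only if it is not already there, and writes the CMakeLists.txt shown above. It assumes the LLVM source tree is at ~/llvm/llvm unless you give another path.

```shell
# scaffold_pass (hypothetical helper): automates Steps 4.1-4.3.
# Usage: scaffold_pass sg12345 [llvm-source-dir]   (default: ~/llvm/llvm)
scaffold_pass() {
  local eid src transforms
  # Pass names must be the EID in upper case.
  eid=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  src="${2:-$HOME/llvm/llvm}"
  transforms="$src/lib/Transforms"
  if [ -z "$eid" ] || [ ! -f "$transforms/CMakeLists.txt" ]; then
    echo "usage: scaffold_pass EID [llvm-source-dir]" >&2
    return 1
  fi

  # Step 4.1: a directory for the pass.
  mkdir -p "$transforms/$eid"

  # Step 4.2: register it in lib/Transforms/CMakeLists.txt (only once).
  grep -qxF "add_subdirectory($eid)" "$transforms/CMakeLists.txt" ||
    printf 'add_subdirectory(%s)\n' "$eid" >> "$transforms/CMakeLists.txt"

  # Step 4.3: the pass's own CMake file (overwritten on every run).
  cat > "$transforms/$eid/CMakeLists.txt" <<EOF
add_llvm_library(LLVM$eid MODULE
  $eid.cpp

  PLUGIN_TOOL
  opt
)
EOF
}
```

For example, scaffold_pass sg12345 sets up ~/llvm/llvm/lib/Transforms/SG12345. Running it twice does not duplicate the add_subdirectory line, but it does overwrite the pass's own CMakeLists.txt, so do not re-run it after you have listed extra source files there.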

Step 4.4 Create the C++ file for your LLVM pass

touch SG12345.cpp
Fill this file with the contents of our reference SG12345.cpp file. The implementation of the assignments will start from this file.
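If you want to check that your setup builds before you start from the reference file, a minimal stand-in can be generated as below. This is a sketch modeled on the Hello pass that ships with LLVM (llvm/lib/Transforms/Hello/Hello.cpp), not our reference SG12345.cpp. It assumes the legacy pass manager, which is what LLVM 10's opt uses with -load, registers the pass under the name SG12345 so that opt's -SG12345 flag selects it, and prints each function's name to standard error.

```shell
# Run from ~/llvm/llvm/lib/Transforms/SG12345. Writes a minimal stand-in
# pass (a sketch modeled on LLVM's Hello pass, not the course reference).
cat > SG12345.cpp <<'EOF'
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

namespace {
// A function pass: opt calls runOnFunction once per function with a body.
struct SG12345 : public FunctionPass {
  static char ID;  // its address identifies the pass to LLVM
  SG12345() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    errs() << F.getName() << "\n";  // stderr, so it survives "> /dev/null"
    return false;                   // the IR was not modified
  }
};
} // namespace

char SG12345::ID = 0;
// Makes the pass available to opt as -SG12345.
static RegisterPass<SG12345> X("SG12345", "Print function names",
                               false /* CFG only */, false /* analysis */);
EOF
```

Replace it with the reference file when you start the assignments.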

Step 4.5 Compiling your first pass

Once you have implemented your pass, it is time to compile.

cd ~/llvm/build
ninja

ninja builds a .so file for your pass (LLVMSG12345.so) and places it in the ~/llvm/build/lib/ directory. Your pass is now ready to be executed on some test programs.
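Rebuilding all of LLVM with a plain ninja works, but while iterating on your pass you can build just its target: CMake names the target after the first argument of add_llvm_library, so ninja LLVMSG12345 should be enough. The hypothetical check_pass_built helper below then confirms that the shared object ended up where Step 5.3 expects it.

```shell
# Build only the pass plugin instead of all of LLVM:
#   ninja -C ~/llvm/build LLVMSG12345

# check_pass_built (hypothetical helper): prints the plugin path if it exists.
# Usage: check_pass_built SG12345 [build-dir]   (default: ~/llvm/build)
check_pass_built() {
  local eid="$1" build="${2:-$HOME/llvm/build}"
  local so="$build/lib/LLVM$eid.so"
  if [ -f "$so" ]; then
    echo "$so"
  else
    echo "not found: $so (did ninja finish without errors?)" >&2
    return 1
  fi
}
```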

Step 5. Running your first pass on the test input

To run your LLVM pass, you need some test programs. LLVM passes operate on an intermediate representation (IR). Hence, the test programs need to be converted from their high-level language to LLVM IR. Your pass can then be run on the LLVM IR of the test program.

Step 5.1 Creating a test program

cd ~/llvm/
mkdir testcases
cd testcases
touch test1.c

Use the following simple program as test1.c.

#include <stdio.h>

int main() {
    printf("hello world\n");
    return 0;
}

void hello() {
    return;
}

Step 5.2 Generating LLVM IR

clang -emit-llvm -S test1.c
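The LLVM IR for test1.c is written to test1.ll. The -S flag makes clang emit the IR in its human-readable textual form (LLVM assembly, .ll files); with -c instead of -S, clang writes the binary encoding of the same IR, called LLVM bitcode (.bc files). opt accepts either form.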
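To see both encodings of the same IR, the hypothetical emit_ir helper below writes the textual .ll file and the binary .bc file for a C source; llvm-dis, built alongside opt in ~/llvm/build/bin, turns bitcode back into text.

```shell
# emit_ir (hypothetical helper): writes NAME.ll (text) and NAME.bc (bitcode).
# Usage: emit_ir test1.c
#        llvm-dis test1.bc -o -    # print the bitcode back as text
emit_ir() {
  local c="$1" base="${1%.c}"
  clang -emit-llvm -S "$c" -o "$base.ll" &&  # textual IR, as in Step 5.2
    clang -emit-llvm -c "$c" -o "$base.bc"   # same IR as binary bitcode
}
```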

Step 5.3 Invoking your LLVM pass

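The opt program is used to run your LLVM pass on the LLVM IR. The following command loads the shared object containing your pass (-load) and runs the pass registered under the name SG12345 (-SG12345). If our reference LLVM pass is used, this command prints the names of the functions in the test program to the terminal. opt writes the resulting module to standard output as bitcode, which is why that output is discarded with > /dev/null; a pass's messages must therefore go to standard error (LLVM's errs() stream) to stay visible.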
opt -load ../build/lib/LLVMSG12345.so -SG12345 < test1.ll > /dev/null
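Once ~/llvm/testcases holds several programs, a loop can regenerate the IR for each .c file and run your pass on it with the same opt command as above. run_pass_on_all is a hypothetical helper; it assumes clang and opt are on your PATH (Step 2) and that the build directory is ~/llvm/build unless you pass another one.

```shell
# run_pass_on_all (hypothetical helper): for every .c file in DIR, regenerate
# the .ll file and run the pass on it, as in Steps 5.2 and 5.3.
# Usage: run_pass_on_all SG12345 [testcases-dir] [build-dir]
run_pass_on_all() {
  local eid="$1" dir="${2:-$HOME/llvm/testcases}" build="${3:-$HOME/llvm/build}"
  local c ll
  for c in "$dir"/*.c; do
    [ -e "$c" ] || continue   # no .c files: the glob stays unexpanded
    ll="${c%.c}.ll"
    echo "== $c"
    clang -emit-llvm -S "$c" -o "$ll" || return 1
    # opt's bitcode output is discarded; anything the pass prints to
    # stderr still reaches the terminal.
    opt -load "$build/lib/LLVM$eid.so" "-$eid" < "$ll" > /dev/null || return 1
  done
}
```

For example, run_pass_on_all SG12345 prints a header line for each test program, followed by whatever your pass reports for it.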

That’s it. Now you can go back and modify your pass, test on more complicated programs, read more documentation or just give up on compilers altogether. The choice is yours and the world is your playground! :)