
Section 4.2 OpenMP

Unit 4.2.1 Of cores and threads

From Wikipedia: “A multi-core processor is a single computing component with two or more independent processing units called cores, which read and execute program instructions.” The idea is: What will give you your solution faster than using one core? Use multiple cores!

A thread is a thread of execution.

  • It is a stream of program instructions. These instructions may all execute on a single core, be spread over multiple cores, or migrate from core to core.

  • Multiple threads can execute on different cores or on the same core.

Unit 4.2.2 Basics

A place to read up on OpenMP is again, where else, Wikipedia: https://en.wikipedia.org/wiki/OpenMP. OpenMP is a standardized API (Application Programming Interface) for creating multiple threads of execution from a single program. It consists of

  • Directives to the compiler (pragmas):

    #pragma omp parallel
    

  • A library of routines:

    omp_get_max_threads()
    omp_get_num_threads()
    

    that can, for example, be used to inquire about the execution environment.

  • Environment parameters that can be set:

    export OMP_NUM_THREADS=4
    

Unit 4.2.3 Hello World!

We will illustrate some of the basics of OpenMP via the old standby, the "Hello World!" program.

In Week4/C/ look at the contents of file HelloWorld.c. Compile it with the command

gcc -o HelloWorld.x HelloWorld.c

and execute it with

./HelloWorld.x
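
For reference, here is a sketch of the kind of program involved (the actual file in Week4/C/ may differ in its details; the OpenMP directive is assumed to already be present, which matches the behavior described later and is simply ignored when the program is compiled without OpenMP support):

#include <stdio.h>

int main(int argc, char *argv[])
{
  /* Sketch; the actual HelloWorld.c may differ.  Without OpenMP support
     the pragma below is ignored and a single message is printed.       */
  #pragma omp parallel
  printf( "Hello World!\n" );

  return 0;
}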

Copy the file HelloWorld.c to HelloWorld1.c. Modify it to add the OpenMP header file:

#include "omp.h"

Compile it with the command

gcc -o HelloWorld1.x HelloWorld1.c

and execute it with

./HelloWorld1.x

What do you notice?

Next, on the command line, set the environment parameter

export OMP_NUM_THREADS=4

and execute again with

./HelloWorld1.x

What do you notice?

Finally, recompile and execute with

gcc -o HelloWorld1.x HelloWorld1.c -fopenmp
./HelloWorld1.x

Pay attention to the -fopenmp option, which tells the compiler to process the OpenMP directives and to link the OpenMP runtime library. What do you notice?

You are now running identical programs on four threads, each of which is printing out a message. The problem is that they don't collaborate on a useful computation. We'll get to that later.

Next, we introduce three routines with which we can extract information about the environment in which the program executes and information about a specific thread of execution:

  • omp_get_max_threads() returns the maximum number of threads that are available for computation.

  • omp_get_num_threads() returns the number of threads in the current team: the total number of threads that are available may be broken up into teams that perform separate tasks.

  • omp_get_thread_num() returns the index that uniquely identifies the thread that calls this function, among the threads in the current team. This index ranges from 0 to one less than the number returned by omp_get_num_threads(). In other words, the numbering of the threads starts at zero.

Copy the file HelloWorld1.c to HelloWorld2.c. Modify the body of the main routine to

[Listing: Week4/HelloWorld2Snippet.c]
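
The snippet itself is in the repository; the sketch below shows the kind of body intended here, calling the three routines just introduced (variable names and the message format are assumptions):

/* Sketch; the actual HelloWorld2Snippet.c may differ. */
int maxthreads = omp_get_max_threads();
int nthreads   = omp_get_num_threads();
int tid        = omp_get_thread_num();

printf( "Hello World! from thread %d of %d (max threads = %d)\n",
        tid, nthreads, maxthreads );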

Compile it and execute it:

gcc -o HelloWorld2.x HelloWorld2.c -fopenmp
./HelloWorld2.x

What do you notice?

In the last exercise, there are four threads available for execution (since OMP_NUM_THREADS equals \(4\)), but only one thread is executing (the team has only one thread). The index of that one thread is \(0\text{.}\)

Copy the file HelloWorld2.c to HelloWorld3.c. Add the compiler directive

#pragma omp parallel

immediately before the print statement. Compile it and execute it:

gcc -o HelloWorld3.x HelloWorld3.c -fopenmp
./HelloWorld3.x

What do you notice?
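
Notice that, placed this way, the directive applies only to the single printf statement that follows it: the calls to the inquiry routines still execute before the parallel region, where the team has just one thread. A sketch, continuing the assumed body from the previous exercise:

/* Sketch; the actual HelloWorld3.c may differ. */
int maxthreads = omp_get_max_threads();
int nthreads   = omp_get_num_threads();   /* called by one thread, so nthreads is 1 */
int tid        = omp_get_thread_num();    /* and tid is 0                           */

#pragma omp parallel
printf( "Hello World! from thread %d of %d (max threads = %d)\n",
        tid, nthreads, maxthreads );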

Next, replace the entire body of main with

[Listing: Week4/HelloWorld3Snippet.c]
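
Again, the actual snippet is in the repository; here is a sketch of the kind of body intended, with the parallel region delimited by braces so that the inquiry routines are called inside it by every thread in the team (details are assumptions):

/* Sketch; the actual HelloWorld3Snippet.c may differ. */
int maxthreads = omp_get_max_threads();

#pragma omp parallel
{
  int nthreads = omp_get_num_threads();
  int tid      = omp_get_thread_num();

  printf( "Hello World! from thread %d of %d (max threads = %d)\n",
          tid, nthreads, maxthreads );
}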

Compile and execute again. What do you notice?

This last exercise illustrates the difference among the total set of threads that is available, the team of threads of which a given thread is a member, and the index of that thread within the team. It also illustrates that you need to be careful about the scope of a “parallel section.”

Often, what work a specific thread performs is determined by its index within a team.
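
For example, a team of threads can partition the iterations of a loop among themselves based on each thread's index. Below is a minimal sketch (not taken from the course materials; compile with -fopenmp):

#include <stdio.h>
#include "omp.h"

#define N 16

int main(int argc, char *argv[])
{
  double x[ N ];

  #pragma omp parallel
  {
    int nthreads = omp_get_num_threads();
    int tid      = omp_get_thread_num();

    /* Thread tid computes entries i = tid, tid+nthreads, tid+2*nthreads, ... */
    for ( int i = tid; i < N; i += nthreads )
      x[ i ] = 2.0 * i;
  }

  for ( int i = 0; i < N; i++ )
    printf( "x[ %2d ] = %f\n", i, x[ i ] );

  return 0;
}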