Section 4.2 OpenMP
Unit 4.2.1 Of cores and threads
From Wikipedia: “A multi-core processor is a single computing component with two or more independent processing units called cores, which read and execute program instructions.” The idea is: What will give you your solution faster than using one core? Use multiple cores!
A thread is a thread of execution: a stream of program instructions.
These instructions may all execute on a single core, or the thread may migrate from core to core as it runs.
Multiple threads can execute on different cores or on the same core.
Unit 4.2.2 Basics
A place to read up on OpenMP is again, where else, Wikipedia: https://en.wikipedia.org/wiki/OpenMP.
OpenMP is a standardized API (Application Programming Interface) for creating multiple threads of execution from a single program. It consists of
Directives to the compiler (pragmas):
#pragma omp parallel
A library of routines:
omp_get_max_threads()
omp_get_num_threads()
that can, for example, be used to inquire about the execution environment.
Environment parameters that can be set:
export OMP_NUM_THREADS=4
Unit 4.2.3 Hello World!
¶We will illustrate some of the basics of OpenMP via the old standby, the "Hello World!" program.
In Week4/C/ look at the contents of file HelloWorld.c. Compile it with the command
gcc -o HelloWorld.x HelloWorld.c
and execute it with
./HelloWorld.x
Homework 4.2.2
Copy the file HelloWorld.c to HelloWorld1.c. Modify it to add the OpenMP header file:
#include "omp.h"
Compile it with the command
gcc -o HelloWorld1.x HelloWorld1.c
and execute it with
./HelloWorld1.x
What do you notice?
Next, on the command line, set the environment parameter
export OMP_NUM_THREADS=4
and execute again with
./HelloWorld1.x
What do you notice?
Finally, recompile and execute with
gcc -o HelloWorld1.x HelloWorld1.c -fopenmp
./HelloWorld1.x
Pay attention to the -fopenmp flag, which tells the compiler to honor OpenMP directives and links the OpenMP library. What do you notice?
You are now running identical programs on four threads, each of which is printing out a message. The problem is that they don't collaborate on a useful computation. We'll get to that later.
Next, we introduce three routines with which we can extract information about the environment in which the program executes and information about a specific thread of execution:
omp_get_max_threads() returns the maximum number of threads that are available for computation.
omp_get_num_threads() returns the number of threads in the current team. The total number of threads that are available may be broken up into teams that perform separate tasks.
omp_get_thread_num() returns the index that uniquely identifies the thread that calls this function, among the threads in the current team. This index ranges from 0 to one less than the number returned by omp_get_num_threads(). In other words, the numbering of the threads starts at zero.
Homework 4.2.3
Copy the file HelloWorld1.c to HelloWorld2.c. Modify the body of the main routine to
\lstinputlisting{Week4/HelloWorld2Snippet.c}
Compile it and execute it:
gcc -o HelloWorld2.x HelloWorld2.c -fopenmp
./HelloWorld2.x
What do you notice?
In the last exercise, there are four threads available for execution (since OMP_NUM_THREADS equals \(4\)), but only one thread is executing (the team has only one thread). The index of that one thread is \(0\), since the numbering of threads starts at zero.
Homework 4.2.4
Copy the file HelloWorld2.c to HelloWorld3.c. Add the compiler directive
#pragma omp parallel
immediately before the print statement. Compile it and execute it:
gcc -o HelloWorld3.x HelloWorld3.c -fopenmp
./HelloWorld3.x
What do you notice?
Next, replace the entire body of main with
\lstinputlisting{Week4/HelloWorld3Snippet.c}
Compile and execute again. What do you notice?
This last exercise illustrates the difference between the total set of threads that is available and the team of threads that a given thread is a member of, and the index of the thread within that team. It also illustrates that you need to be careful about the scope of a "parallel section."
Often, the work that a specific thread performs is determined by its index within a team.