Lesson 3: An Introduction to Barriers
Hold up!!! I need that tool! Often, before a processor can start a task, it must be certain that it has the correct data. For instance, if a processor is about to add A=3 and B=4, but another processor changes B to 1 at the same time, the resulting sum may be wrong. Or, if a processor is supposed to add up all of the values given to it by the other processors, it would be nice to know that it has actually received all of the new values. In both of these scenarios, a barrier of some sort would be beneficial. In shmem, there are a few types of barriers, each useful for different occasions.
Barriers/Synchronization across all or a subset of PEs:
- shmem_barrier_all
- shmem_barrier
- shmem_sync_all
- shmem_sync
- shmem_quiet
- shmem_wait_until
1. shmem_barrier_all()
- "registers the arrival of a PE at a barrier and blocks the PE until all other PEs arrive at the barrier" (documentation)
- KEY: blocks the PE until ALL local updates and remote memory updates are complete
- essentially, it is a means to synchronize all of the PEs and to ensure all data stores are done
2. shmem_barrier(int PE_start, int logPE_stride, int PE_size, long *pSync)
- This barrier does the same thing as shmem_barrier_all, but only for a subset of PEs.
- Example of usage: Suppose you only want the odd PEs (1,3,5,7,9) to be synced with each other and the even
PEs not equal to 0 (2,4,6,8) to be synced with each other. Then, for 10 PEs,
PE_start_odd = 1
PE_start_even = 2
logPE_stride_odd = logPE_stride_even = log_2(2) = 1 //striding by 2
PE_size_odd = 10/2 = 5
PE_size_even = (10-2)/2 + 10%2 = 4
- pSync may seem a bit strange. Essentially, it is a symmetric work array that must be
initialized to the same values on every PE; its role is to keep track of the sync status of the PEs.
Since we want two different barriers to be able to run at once, we will create two
of these:
- pSync_odd
- pSync_even
//barrier.c
#include <stdio.h>
#include <shmem.h>

int main(void)
{
    //creating pSync arrays for both groups (static, so they are symmetric)
    static long pSync_odd[SHMEM_BARRIER_SYNC_SIZE];
    static long pSync_even[SHMEM_BARRIER_SYNC_SIZE];

    //initializing pSync arrays
    for (int i = 0; i < SHMEM_BARRIER_SYNC_SIZE; i++){
        pSync_odd[i] = SHMEM_SYNC_VALUE;
        pSync_even[i] = SHMEM_SYNC_VALUE;
    }

    shmem_init();
    int my_pe = shmem_my_pe();
    int num_pes = shmem_n_pes();

    //symmetric space
    static int num_of_scalpel_injuries;
    static int num_of_ice_cream_scoop_injuries;

    //if odd:
    if (my_pe%2 != 0){
        num_of_scalpel_injuries = my_pe-1;
        //sync all odd to make sure everyone has written down their answer
        //to the number of scalpel injuries survey
        shmem_barrier(1, 1, (num_pes/2), pSync_odd);
        //neighbor to the right, wrapping 9 -> 1 (assumes 10 PEs)
        int num_of_neighbor = shmem_int_g(&num_of_scalpel_injuries, (my_pe+2)%10);
        printf("%d: I personally had only %d scalpel injuries in my line of "
               "work whereas my odd neighbor had %d.\n",
               my_pe, num_of_scalpel_injuries, num_of_neighbor);
    }
    //if even and not 0
    else if (my_pe != 0){
        num_of_ice_cream_scoop_injuries = my_pe;
        //sync all even to make sure everyone has written down their answer
        //to the number of ice cream scoop injuries survey
        //(uses pSync_even so the two barriers do not share a work array)
        shmem_barrier(2, 1, ((num_pes-2)/2 + num_pes % 2), pSync_even);
        int num_of_neighbor;
        if (my_pe == 2){
            //neighbor to the left, wrapping 2 -> 8 (assumes 10 PEs)
            num_of_neighbor = shmem_int_g(&num_of_ice_cream_scoop_injuries, 8);
        } else {
            num_of_neighbor = shmem_int_g(&num_of_ice_cream_scoop_injuries, my_pe-2);
        }
        printf("%d: Yay! I am not the only one who hurts themselves with an "
               "ice cream scoop! My neighbor has done it %d times.\n",
               my_pe, num_of_neighbor);
    }

    shmem_finalize();
    return 0;
}
To compile:
oshcc barrier.c -o barrier
To run:
oshrun -np 10 barrier
The above code splits the 10 PEs into the two groups described earlier: (1,3,5,7,9) and (2,4,6,8). Suppose the odd group wants to know how many scalpel injuries their neighbor (to the right) had. Now, let's be honest, we all accidentally cut our lab partners in Biology class.
The even group, on the other hand, is a bit more civilized: they just want to know if their neighbor (to the left, excluding 0) has ever hurt themselves with an ice cream scoop (I may have just done this . . .).
To note:
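- Despite the different sizes of the two groups (4 vs. 5), each pSync array still has the same default size, SHMEM_BARRIER_SYNC_SIZE.
- The pSync initialization can be done either before or after shmem_init, as long as it is complete on every PE in the group before any of them enters the barrier.
- shmem_int_g requires its source to be a symmetric variable, which is why the injury counters are declared static.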
3. shmem_sync_all()
- syncs all the PEs without ensuring that remote memory updates are complete.
- KEY: does not ensure remote memory updates are complete. I know this was just stated above, but it is crucial to both note and understand. You might be wondering, why would I ever want to use it then? Well, because it does not wait for remote memory updates to complete, the synchronization will not take as long, and the PEs can then continue on with their tasks.
4. shmem_sync(int PE_start, int logPE_stride, int PE_size, long *pSync)
- Does the same thing as shmem_sync_all, but for a subset of PEs, just as shmem_barrier is the subset version of shmem_barrier_all. It takes the same arguments as shmem_barrier, so see the shmem_barrier example for the arguments in play.
- Args:
- arg0 - PE_start: the PE number to start with
- arg1 - logPE_stride: log_2(stride). That is, if you are striding the PEs by 2, logPE_stride=1, if by 4, logPE_stride=2, etc.
- arg2 - PE_size: the number of PEs that will be in the synced group
- arg3 - *pSync: a pointer to a symmetric array of size SHMEM_BARRIER_SYNC_SIZE whose elements are all initialized to the default value SHMEM_SYNC_VALUE
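To double-check a choice of PE_start, logPE_stride, and PE_size before handing it to shmem_sync or shmem_barrier, the group can be listed in plain C. The helpers below are hypothetical (they are not part of shmem) and only do the arithmetic described in the argument list above.

```c
#include <stdio.h>

/* PE number of the i-th member (counting from 0) of the group. */
int active_set_member(int PE_start, int logPE_stride, int i)
{
    int stride = 1 << logPE_stride;   /* 2^logPE_stride */
    return PE_start + i * stride;
}

/* Returns 1 if pe belongs to the group, 0 otherwise. */
int in_active_set(int pe, int PE_start, int logPE_stride, int PE_size)
{
    int stride = 1 << logPE_stride;
    int offset = pe - PE_start;
    if (offset < 0 || offset % stride != 0) {
        return 0;
    }
    return (offset / stride) < PE_size;
}

/* Prints every PE in the group on one line. */
void print_active_set(int PE_start, int logPE_stride, int PE_size)
{
    for (int i = 0; i < PE_size; i++) {
        printf("%d ", active_set_member(PE_start, logPE_stride, i));
    }
    printf("\n");
}
```

Calling print_active_set(1, 1, 5) lists the odd group 1 3 5 7 9, and print_active_set(2, 1, 4) lists the even group 2 4 6 8, matching the two groups from the shmem_barrier example.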
