SHMEMmer Wanna Be

My Journey To Understanding and Using SHMEM:

Lesson 4: Combining Tools

List of commands learned so far.
With our new stash of tools lets have a bit of fun and combine a few before adding more tools to our arsenal.

For those of you who are internally geeks and love pi day, we are going to venture approximating pi via monte carlo.
Monte Carlo - a fancy term that just means sample over and over and over again until you get a stable approximation.
There are two common methods of approximating pi with Monte Carlo:
  1. approximate the area in a quarter of a circle, the area under the curve of \(\sqrt{1-x^2}\)
  2. approximate the function: \[f(x) = 1-x^2\]


We are going to focus on the first method. The equation of a circle with radius of 1 is
\[x^2 + y^2 = 1\]
So, if we sample a number randomly (x,y) from {[0,1], [0,1]}, then the probability of it being in the quarter of the circle is:
\[ \frac{\text{area of quarter of the circle}}{\text{area of square}} = \frac{\pi/4}{1} = \pi/4 \]
So, if we take a ton of samples from the set {[0,1], [0,1]}, then the above can be approximated by
\[ \frac{\text{# of samples in the quarter circle}}{\text{total # of samples}} \]

Approximating pi with one PE:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <inttypes.h>

#define NUM 100000

double circle(double x){
	return 1 - pow(x,2);
}

int main(){

    static uint32_t count = 0;
    double f_x;

    //seed the randomizer
    srand(time(0));
    for (int i = 0; i < NUM; i++){
        //generate point(x,y) in first quadrant
        //note: double is a 64 bit floating point
        double x = (double) rand() / (double) RAND_MAX;
        double y = (double) rand() / (double) RAND_MAX;
        f_x = circle(x);
        if (pow(y,2) <= f_x){
            count += 1;
        }
    }
    
    printf("count total: %d\n", count);
    printf("est of pi: %f\n", (double)count/((double)NUM )*4);
    return 0;
}

But! The cool thing about monte carlo approximations, is that the process is embarassingly parallel. There is no point of just having one PE do all the work and get tired out while other PEs are nearby.

Approximating pi with N PEs:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <inttypes.h>

#include <shmem.h>

#define NUM 100000

double circle(double x){
	return 1 - pow(x,2);
}

int main(){
    int me, npes;
    shmem_init();
    me = shmem_my_pe();
    npes = shmem_n_pes();

    static uint32_t count = 0;
    double f_x;

    //seed the randomizer
    srand(time(0)+me);
    for (int i = 0; i < NUM; i++){
        //generate point(x,y) in first quadrant
        //note: double is a 64 bit floating point
        double x = (double) rand() / (double) RAND_MAX;
        double y = (double) rand() / (double) RAND_MAX;
        f_x = circle(x);
        if (pow(y,2) <= f_x){
            count += 1;
        }
    }

    printf("%d: count %d\n", me, count);
    shmem_barrier_all(); //making sure all of the counts have been done
    if (me == 0){
        for (int pe = 1; pe < npes; pe++){
            uint32_t toadd = shmem_int_g(&count, pe);
            printf("to add: %d\n", toadd);
            count += toadd;
        }
        printf("%d: count total: %d\n", me, count);
        printf("%d: ratio: %f\n", me, (double)count/((double)NUM * npes));
        printf("%d: est of pi: %f\n", me, (double)count/((double)NUM * npes)*4);
    }
    return 0;
}


Hold on you might say, there has to be an easier way than just grabbing every single count from all the PEs?
Well, there is!!! And that is the atomic_add.

(See the next lesson)