## Introduction to Parallelism

- Processes do not share memory and can reside on the same or different computers.
- Threads share memory and reside within a single process on one computer.
- MPI is an example of multiprocess programming, whereas OpenMP is an example of multithreaded programming.
- Algorithms can have both parallelisable and non-parallelisable sections.
- There are two major parallelisation paradigms: data parallelism and message passing.
- MPI implements the message passing paradigm, and OpenMP implements data parallelism.
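To make the distinction concrete, here is a minimal sketch of the data parallelism paradigm using OpenMP: several threads inside one process update different elements of the same shared array. This is an illustrative example rather than code from the lesson (the file name and compiler flags are assumptions); the message passing side is sketched in the MPI sections that follow.

```c
/* Data parallelism with OpenMP: threads within one process share memory,
 * so they can all write into different parts of the same array directly.
 * Compile with OpenMP support, e.g. gcc -fopenmp example.c (illustrative). */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int squares[8];

    /* The loop iterations are divided between the threads of this process */
    #pragma omp parallel for
    for (int i = 0; i < 8; i++) {
        squares[i] = i * i;
    }

    /* Back in serial code: every thread wrote into the same shared array */
    printf("computed with up to %d threads\n", omp_get_max_threads());
    for (int i = 0; i < 8; i++) {
        printf("squares[%d] = %d\n", i, squares[i]);
    }
    return 0;
}
```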
## Introduction to the Message Passing Interface

- The MPI standards define the syntax and semantics of a library of routines used for message passing.
- By default, the order in which operations are run between parallel MPI processes is arbitrary.
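As a minimal sketch of what an MPI program looks like (assuming C and the usual mpicc/mpirun tooling), the program below starts MPI, asks for its rank and the size of the communicator, and prints one line per rank. Because the ranks run independently, the order of the printed lines is arbitrary and typically changes from run to run.

```c
/* A minimal MPI program: each rank prints its identity.
 * Compile with mpicc and run with e.g. mpirun -n 4 ./hello (names assumed). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                         /* shut down MPI */
    return 0;
}
```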
## Communicating Data in MPI

- Data is sent between ranks using “messages”.
- Messages can either block the program or be sent/received asynchronously.
- Knowing the exact amount and type of data you are sending is required.
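The receiving side also has to supply a count and a datatype. When the receiver does not know the message size in advance, one standard MPI approach (not necessarily covered in this lesson) is to probe the incoming message first. The sketch below assumes two ranks and an illustrative five-integer payload.

```c
/* Sketch: discovering the size of an incoming message before receiving it.
 * Rank 0 sends some ints to rank 1; run with at least two ranks. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int payload[5] = {1, 2, 3, 4, 5};
        MPI_Send(payload, 5, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        int count;

        /* Inspect the message envelope without actually receiving it */
        MPI_Probe(0, 0, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_INT, &count);

        int *buffer = malloc(count * sizeof(int));
        MPI_Recv(buffer, count, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d ints\n", count);
        free(buffer);
    }

    MPI_Finalize();
    return 0;
}
```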
## Point-to-Point Communication

- Use MPI_Send and MPI_Recv to send and receive data between ranks.
- Using MPI_Ssend will always block the sending rank until the message is received.
- Using MPI_Send may block the sending rank until the message is received, depending on whether the message is buffered and whether the buffer is available for reuse.
- Using MPI_Recv will always block the receiving rank until the message is received.
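A sketch of these blocking calls in action, assuming exactly two ranks: the calls are ordered so that every send has a matching receive ready. If both ranks called MPI_Ssend first, each would block waiting for the other to receive, and the program would deadlock.

```c
/* Blocking exchange of one int between ranks 0 and 1 (assumes two ranks).
 * The calls are ordered so each send meets a posted receive; if both ranks
 * called MPI_Ssend first, both would block forever (deadlock). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, mine, theirs = -1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    mine = (rank + 1) * 100;

    if (rank == 0) {
        MPI_Ssend(&mine, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&theirs, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 0 received %d\n", theirs);
    } else if (rank == 1) {
        MPI_Recv(&theirs, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Ssend(&mine, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        printf("Rank 1 received %d\n", theirs);
    }

    MPI_Finalize();
    return 0;
}
```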
## Collective Communication
## Non-blocking Communication

- Non-blocking communication often leads to performance improvements compared to blocking communication.
- However, it is usually more difficult to use non-blocking communication correctly.
- Most blocking communication operations have a non-blocking variant.
- We have to wait for a non-blocking communication to finish, using MPI_Wait() (or check it with MPI_Test()), before reusing its buffers, otherwise we will encounter strange behaviour.
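The same kind of two-rank exchange with the non-blocking variants, as a sketch: both ranks post their send and receive immediately, could overlap other work with the communication, and then complete the requests with MPI_Waitall() (the array form of MPI_Wait()) before touching the buffers.

```c
/* Non-blocking exchange of one int between ranks 0 and 1 (assumes two ranks,
 * extra ranks simply do nothing). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, mine, theirs, other;
    MPI_Request requests[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank < 2) {
        other = 1 - rank;          /* the partner rank */
        mine = (rank + 1) * 100;

        MPI_Isend(&mine, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &requests[0]);
        MPI_Irecv(&theirs, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &requests[1]);

        /* ... unrelated work could overlap with the communication here ... */

        MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);
        printf("Rank %d received %d\n", rank, theirs);
    }

    MPI_Finalize();
    return 0;
}
```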
## Advanced Communication Techniques

- Communicating complex, heterogeneous or non-contiguous data structures in MPI requires a bit more work.
- Any data being transferred should be a single contiguous block of memory.
- By defining derived datatypes, we can easily send data which is not contiguous.
- The functions MPI_Pack and MPI_Unpack can be used to manually create a contiguous memory block of data.
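As an illustrative sketch (not necessarily the example used in the lesson), a derived datatype built with MPI_Type_vector can describe one column of a row-major 4x4 matrix, which is not contiguous in memory, and send it in a single message; two ranks are assumed.

```c
/* Sending a non-contiguous matrix column with a derived datatype.
 * MPI_Type_vector describes 4 blocks of 1 int, 4 ints apart in memory.
 * Run with two ranks (assumption). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    int matrix[4][4] = {{0}};
    int column[4];
    MPI_Datatype column_type;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 4 blocks, 1 element per block, stride of 4 elements between blocks */
    MPI_Type_vector(4, 1, 4, MPI_INT, &column_type);
    MPI_Type_commit(&column_type);

    if (rank == 0) {
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                matrix[i][j] = i * 4 + j;

        /* Send the second column (elements 1, 5, 9, 13) in one message */
        MPI_Send(&matrix[0][1], 1, column_type, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Receive it into a plain contiguous array of 4 ints */
        MPI_Recv(column, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received column: %d %d %d %d\n",
               column[0], column[1], column[2], column[3]);
    }

    MPI_Type_free(&column_type);
    MPI_Finalize();
    return 0;
}
```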
## Common Communication Patterns

- There are many ways to communicate data, and the pattern we choose needs careful thought.
- It’s better to use collective operations than to implement similar behaviour yourself.
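For example, summing one value from every rank is a very common pattern; the sketch below uses the MPI_Reduce collective instead of a hand-written loop of sends and receives to rank 0.

```c
/* Summing a per-rank value onto rank 0 with a single collective call. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, local, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    local = rank + 1;   /* some per-rank partial result */

    /* Every rank calls the collective; the sum ends up on rank 0 */
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("Sum over %d ranks = %d\n", size, total);
    }

    MPI_Finalize();
    return 0;
}
```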
## Porting Serial Code to MPI

- Start from a working serial code.
- Write a parallel implementation for each function or parallel region.
- Connect the parallel regions with a minimal amount of communication.
- Continuously compare the developing parallel code with the working serial code.
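A sketch of what this can look like for a simple serial loop that accumulates a sum: each rank processes its own slice of the iterations, and a single reduction is the only communication. N, f() and the chunking scheme are illustrative assumptions rather than code from the lesson; printing the known serial answer alongside the parallel result makes the comparison with the serial code easy.

```c
/* Parallelising "for (i = 0; i < N; i++) sum += f(i);" by splitting the
 * iteration range across ranks and reducing the partial sums at the end. */
#include <stdio.h>
#include <mpi.h>

#define N 1000

static double f(int i) { return (double)i; }   /* placeholder for real work */

int main(int argc, char **argv)
{
    int rank, size;
    double local_sum = 0.0, total_sum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Divide the iteration space; the last rank picks up any remainder */
    int chunk = N / size;
    int start = rank * chunk;
    int end   = (rank == size - 1) ? N : start + chunk;

    for (int i = start; i < end; i++) {
        local_sum += f(i);
    }

    /* Minimal communication: one reduction to combine the partial results */
    MPI_Reduce(&local_sum, &total_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("total = %f (serial answer: %f)\n",
               total_sum, (double)(N - 1) * N / 2);
    }

    MPI_Finalize();
    return 0;
}
```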
## Optimising MPI Applications

- We can use Amdahl’s Law to identify the theoretical limit on the speedup that parallelisation can achieve.
- Strong scaling is defined as how the solution time varies with the number of processors for a fixed total problem size.
- We can use Gustafson’s Law to calculate the relative speedup when the problem size grows with the number of processors (both laws are stated below).
- Weak scaling is defined as how the solution time varies with the number of processors for a fixed problem size per processor.
- Use a profiler to understand a code’s performance issues before optimising it.
- Test the code after optimisation to make sure its functional behaviour is still correct.
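For reference, the usual statements of the two laws mentioned above, where p is the parallelisable fraction of the work and N is the number of processors (the notation here is an assumption, not necessarily the lesson's):

$$ S_{\mathrm{Amdahl}}(N) = \frac{1}{(1 - p) + p/N}, \qquad S_{\mathrm{Gustafson}}(N) = (1 - p) + pN $$

As N grows, the Amdahl speedup is capped at 1/(1 - p), whereas the Gustafson speedup keeps growing because the problem size is assumed to grow with N.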
## Survey
{:auto_ids}
key word 1
: explanation 1
key word 2
: explanation 2