# teaMPI: A team-based PMPI wrapper for MPI #

### What is this repository for? ###

This wrapper adds replication and team-based execution to MPI using MPI's PMPI profiling interface.  

The library redefines the weak symbols declared by MPI, so replication is transparent to the application.  

Replication is achieved by splitting `MPI_COMM_WORLD` into a fixed number of "team" communicators.  
Each team then exchanges messages completely independently of the other teams.  

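The split can be pictured as giving every world rank a team index (the `color` argument of `MPI_Comm_split`) and a rank inside that team (the `key` argument). The sketch below is only an illustration under stated assumptions: teams occupy contiguous, equally sized blocks of world ranks, and `TeamPlacement` and `place_rank` are hypothetical names that are not part of teaMPI. The mapping teaMPI actually uses may differ. Under these assumptions, with 8 world ranks and 2 teams, world rank 5 lands in team 1 as rank 1.

```C++
#include <cassert>

// Hypothetical helper, not part of teaMPI's API.
struct TeamPlacement {
    int team;       // would serve as the color passed to MPI_Comm_split
    int team_rank;  // would serve as the key passed to MPI_Comm_split
};

// Assumption: world_size is a multiple of num_teams and each team owns a
// contiguous block of world ranks.
inline TeamPlacement place_rank(int world_rank, int world_size, int num_teams) {
    assert(num_teams > 0 && world_size % num_teams == 0);
    assert(world_rank >= 0 && world_rank < world_size);
    const int team_size = world_size / num_teams;
    return TeamPlacement{world_rank / team_size, world_rank % team_size};
}
```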
Unlike other similar libraries, the replicas are free to proceed completely asynchronously from one another.
They communicate progress and error information via a "heartbeat". A heartbeat is inserted into user code via
an `MPI_Sendrecv(...,MPI_COMM_SELF)` call. At each heartbeat, the replicas mark this point on a timeline and
compare their progress against the other replicas. If a buffer is provided to the heartbeat, a hash of its contents
is also compared with the other replicas.

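To make the buffer comparison concrete, here is a minimal sketch of hashing a buffer and checking two replicas' hashes against each other. This README does not say which hash teaMPI uses, so 64-bit FNV-1a is swapped in purely for illustration, and `hash_buffer` and `replicas_agree` are hypothetical names. Note that a byte-wise hash like this treats results that differ only in floating-point rounding as different.

```C++
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: 64-bit FNV-1a over the raw bytes of a buffer.
// Assumption: teaMPI's real hash function may be a different one.
inline std::uint64_t hash_buffer(const void* data, std::size_t num_bytes) {
    assert(data != nullptr || num_bytes == 0);
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::uint64_t hash = 0xcbf29ce484222325ULL;  // FNV-1a offset basis
    for (std::size_t i = 0; i < num_bytes; ++i) {
        hash ^= bytes[i];
        hash *= 0x100000001b3ULL;  // FNV-1a prime
    }
    return hash;
}

// Two replicas agree at a heartbeat if their buffer hashes match.
inline bool replicas_agree(std::uint64_t my_hash, std::uint64_t other_hash) {
    return my_hash == other_hash;
}
```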
### How do I get set up? ###
To build the library:  
1. Run `make` in the `lib` directory  
2. Set the number of teams with the `TEAMS` environment variable (default: 2)  

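For readers curious how a setting like `TEAMS` can be picked up, the following sketch reads the variable and falls back to the documented default of 2. It is not teaMPI's own parsing code, which this README does not show; `read_num_teams` is a hypothetical name, and falling back to 2 on an invalid value is an assumption.

```C++
#include <cassert>
#include <cstdlib>
#include <limits>

// Hypothetical helper: returns the team count from the TEAMS environment
// variable, or 2 (the documented default) if it is unset.
// Assumption: invalid or non-positive values also fall back to 2.
inline int read_num_teams() {
    int teams = 2;
    if (const char* value = std::getenv("TEAMS")) {
        char* end = nullptr;
        const long parsed = std::strtol(value, &end, 10);
        const bool is_number = (end != value) && (*end == '\0');
        if (is_number && parsed >= 1 &&
            parsed <= std::numeric_limits<int>::max()) {
            teams = static_cast<int>(parsed);
        }
    }
    assert(teams >= 1);
    return teams;
}
```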
To use the provided example miniapps:  
1. Run `make` in the `applications` folder  
2. Run each application in the `bin` folder with the required command-line parameters (documented in each application's folder)  

To use with an existing application:  
1. Link with `-ltmpi -L"path to teaMPI"`   
2. Add "path to teaMPI" to `LD_LIBRARY_PATH`   

### Example Heartbeat Usage ###
This example mirrors the structure of many scientific applications. In each loop iteration, the two `MPI_Sendrecv` calls act as heartbeats.
The first starts the timer for this rank and the second stops it. Additionally, the second heartbeat passes
the data buffer for comparison with the other teams. Only a hash of the data is sent.

At the end of the application, the heartbeat times are written to CSV files.
  
  
  
```C++
double data[SIZE];
for (int t = 0; t < NUM_TRIALS; t++)
{
    MPI_Barrier(MPI_COMM_WORLD);

    // Start Heartbeat
    MPI_Sendrecv(MPI_IN_PLACE, 0, MPI_BYTE, MPI_PROC_NULL, 1, MPI_IN_PLACE, 0, 
        MPI_BYTE, MPI_PROC_NULL, 0, MPI_COMM_SELF, MPI_STATUS_IGNORE);

    for (int i = 0; i < NUM_COMPUTATIONS; i++) {
        // Arbitrary computation on data
    }

    // End Heartbeat and compare data
    MPI_Sendrecv(data, SIZE, MPI_DOUBLE, MPI_PROC_NULL, -1, MPI_IN_PLACE, 0, 
        MPI_BYTE, MPI_PROC_NULL, 0, MPI_COMM_SELF, MPI_STATUS_IGNORE);

    MPI_Barrier(MPI_COMM_WORLD);
}
```

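The start/stop pairing and the CSV output can be pictured with the small sketch below. It is an illustration only: `HeartbeatLog` is a hypothetical class, the rule that a positive tag opens an interval and a negative tag closes it is an assumption modelled on the send tags `1` and `-1` in the example, and the CSV column layout is invented here because this README does not specify it. Timestamps are passed in explicitly so the sketch stays free of MPI; in a real run they could come from `MPI_Wtime()`.

```C++
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of teaMPI: pairs start and end heartbeats.
class HeartbeatLog {
public:
    // Assumption: a positive tag opens an interval and a negative tag
    // closes it, mirroring the send tags 1 and -1 in the example.
    void beat(int tag, double now_seconds) {
        assert(tag != 0);
        if (tag > 0) {
            start_ = now_seconds;
            running_ = true;
        } else if (running_) {
            durations_.push_back(now_seconds - start_);
            running_ = false;
        }
    }

    // One row per completed interval; the column layout is an assumption.
    std::string to_csv() const {
        std::ostringstream out;
        out << "heartbeat,seconds\n";
        for (std::size_t i = 0; i < durations_.size(); ++i) {
            out << i << ',' << durations_[i] << '\n';
        }
        return out.str();
    }

private:
    double start_ = 0.0;
    bool running_ = false;
    std::vector<double> durations_;
};
```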
### What if I want to communicate between teams myself? ###
To call the original MPI function, prefix its name with a `P`. For example, `MPI_Send` becomes `PMPI_Send`.

If you wish to map between original rank numbers, find out how many teams there are, perform a global barrier, etc., take a look at the `Rank.h` file.
It has access to all the data internal to teaMPI. To use the functions declared in it, simply `#include "Rank.h"` and add the path to `teampi/lib` to the compilation `-I` flags.  


### Who do I talk to? ###
Ben Hazelwood (benjamin.hazelwood@durham.ac.uk)
Tobias Weinzierl (tobias.weinzierl@durham.ac.uk)
Philipp Samfass (samfass@in.tum.de)