\teaMPI\ is an open-source library written in C++. It plugs into MPI via
MPI's profiling interface (PMPI) and additionally provides an interface for
advanced task-based parallelism on distributed-memory architectures.
Its research vision reads as follows:
\begin{itemize}
\item {\bf Intelligent task-based load balancing}. Applications can hand over
tasks to \teaMPI. These tasks have to be ready, i.e.~without incoming
dependencies, and both their input and their output have to be serialisable.
It is then up to \teaMPI\ to decide whether a task is injected into the local
runtime or temporarily moved to another rank, where it is computed before its
result is sent back.
\item {\bf MPI idle time load metrics}. \teaMPI\ can plug into some MPI
calls, hidden from the application, and measure how long each of these calls
idles, i.e.~waits for incoming messages. It provides lightweight
synchronisation mechanisms between MPI ranks such that the ranks can globally
identify which ranks are very busy and which ones tend to wait for MPI
messages. Such data can be used to guide load balancing. Within \teaMPI, it
steers how the task-based load balancing moves tasks between ranks.
\item {\bf Black-box replication}. By hijacking MPI calls, \teaMPI\ can split
up the global number $N$ of ranks into $T$ teams of equal size. Each team
assumes that there are only $N/T$ ranks in the system and thus runs completely
independently of the other teams. This splitting is completely hidden from the
application. \teaMPI\ however provides a heartbeat mechanism which detects
whether one team becomes slower and slower. This can be used as a guideline for
resiliency, assuming that ranks which will eventually fail first manifest this
through a deterioration of their team's speed.
\item {\bf Replication task sharing}. When running with teams, \teaMPI\ can
identify tasks that are replicated in different teams. Every time the library
detects that a task which has been handed over to \teaMPI\ and is replicated on
another team has been computed, it can take the task's outcome, copy it over to
the other team, and cancel the task's execution there. This massively reduces
the overhead of resiliency via replication (teams).
\item {\bf Smart progression and smart caching}. If a SmartNIC (Mellanox
BlueField) is available to a node, \teaMPI\ can run a dedicated helper code on
the SmartNIC which polls MPI all the time. For dedicated MPI messages (tasks),
it can hijack MPI: If a rank A sends data to a rank B, the MPI send is actually
deployed to the SmartNIC, which fetches the data directly from A's memory
(RDMA). The data is in turn cached at rank B, from where it is deployed
directly into B's memory if B has already issued a non-blocking receive.
Otherwise, it remains available on the SmartNIC, where we cache it until it is
requested.
\item {\bf Smart sniff}. Realtime-guided load balancing in HPC typically
suffers from the fact that the load balancing is unable to distinguish load
imbalance from network congestion. As a result, we can construct situations
where a congested network suggests to the load balancing that ranks are idle,
and the load balancing consequently starts to move data around. This puts even
more stress on the network, and we enter a feedback cycle. With SmartNICs, we
can deploy heartbeats to the network device and distinguish network problems
from load imbalance, eventually enabling smarter timing-based load balancing.
\item {\bf Smart balancing}. With SmartNICs, \teaMPI\ can outsource its
task-based load balancing completely to the network. All distribution
decisions and data movements are handled by the network card rather than
the main CPU.
\item {\bf Smart replication}. With SmartNICs, \teaMPI\ can outsource its
replication functionality, including the distribution and replication of
tasks, to the network.
\end{itemize}
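The interplay of intelligent task-based load balancing and MPI idle time load metrics can be illustrated with a small C++ sketch. It assumes that every rank already knows the accumulated MPI idle time of all ranks (for example after some exchange step). The names \texttt{IdleStatistics} and \texttt{pickOffloadTarget} are hypothetical and not part of \teaMPI's interface. The heuristic used here (the rank that idles least hands ready tasks to the rank that idles most, provided the difference exceeds a threshold) is only one plausible choice, not necessarily the one \teaMPI\ implements.

\begin{lstlisting}[language=C++]
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not part of teaMPI): accumulates how long a rank
// waited inside MPI calls and how much wall-clock time elapsed overall.
struct IdleStatistics {
  double idleSeconds = 0.0;
  double totalSeconds = 0.0;

  void addWait(double seconds) { idleSeconds += seconds; }
  void addElapsed(double seconds) { totalSeconds += seconds; }

  // Fraction of the elapsed time spent idling in MPI (0 if nothing measured).
  double idleFraction() const {
    return totalSeconds > 0.0 ? idleSeconds / totalSeconds : 0.0;
  }
};

// Hypothetical heuristic: the rank with the smallest idle time is the busiest
// one and hands ready tasks to the rank with the largest idle time.
// Returns that target if myRank is the busiest rank and the idle-time gap
// reaches threshold; otherwise std::nullopt (keep all tasks local).
std::optional<int> pickOffloadTarget(const std::vector<double>& idleTimes,
                                     int myRank, double threshold) {
  assert(!idleTimes.empty());
  std::size_t busiest = 0;
  std::size_t idlest = 0;
  for (std::size_t r = 1; r < idleTimes.size(); ++r) {
    if (idleTimes[r] < idleTimes[busiest]) busiest = r;
    if (idleTimes[r] > idleTimes[idlest]) idlest = r;
  }
  if (static_cast<int>(busiest) != myRank) return std::nullopt;
  if (idleTimes[idlest] - idleTimes[busiest] < threshold) return std::nullopt;
  return static_cast<int>(idlest);
}
\end{lstlisting}

For idle times of $0.1$, $2.0$, $0.5$ and $3.0$ seconds on ranks 0 to 3 and a threshold of $0.5$ seconds, rank 0 would offload to rank 3, while all other ranks keep their tasks. The threshold prevents tasks from being moved around for marginal imbalances.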
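For the black-box replication, the rank bookkeeping can be sketched as follows. The sketch assumes that each team is formed by a contiguous block of $N/T$ ranks; the actual assignment used by \teaMPI\ may differ, and \texttt{TeamLayout} is a hypothetical name. In an MPI code, the team index and the team-local rank computed here could serve as the \texttt{color} and \texttt{key} arguments of \texttt{MPI\_Comm\_split}, and an interception layer could then substitute the resulting team communicator whenever the application uses \texttt{MPI\_COMM\_WORLD}.

\begin{lstlisting}[language=C++]
#include <cassert>

// Hypothetical sketch (not teaMPI's code): N ranks are split into T teams of
// equal size, assuming that each team is a contiguous block of N/T ranks.
class TeamLayout {
 public:
  TeamLayout(int numRanks, int numTeams)
      : numRanks_(numRanks), numTeams_(numTeams) {
    // Teams must have equal size, so T has to divide N.
    assert(numTeams > 0 && numRanks % numTeams == 0);
  }

  // Number of ranks each team believes to exist (N/T).
  int teamSize() const { return numRanks_ / numTeams_; }

  // Team to which a global rank belongs.
  int teamOf(int rank) const { return rank / teamSize(); }

  // Rank as seen by the application inside its team (0 .. N/T-1).
  int teamRank(int rank) const { return rank % teamSize(); }

  // Inverse mapping: global rank of localRank within team.
  // Ranks with the same team-local rank in different teams are replicas.
  int globalRank(int team, int localRank) const {
    return team * teamSize() + localRank;
  }

 private:
  int numRanks_;
  int numTeams_;
};
\end{lstlisting}

With $N=8$ and $T=2$, rank 5 belongs to team 1 and acts as rank 1 within its team; its replica in team 0 is global rank 1.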
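The heartbeat mechanism of the black-box replication can be sketched in C++ as well. The sketch assumes that each team records a timestamp whenever it reaches a heartbeat (e.g.~the end of a time step) and that all timestamps refer to a common time base. The names \texttt{lagAt} and \texttt{isDeteriorating} are hypothetical, and the criterion (a team's lag behind the fastest team grows at each of the last heartbeats and finally exceeds a threshold) is merely one way to formalise ``slower and slower''.

\begin{lstlisting}[language=C++]
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// heartbeats[t][k]: time at which team t reached heartbeat k
// (hypothetical layout; all teams are assumed to share one time base).
// Returns how far team lags behind the fastest team at heartbeat k.
double lagAt(const std::vector<std::vector<double>>& heartbeats,
             std::size_t team, std::size_t k) {
  double fastest = heartbeats[0][k];
  for (const auto& h : heartbeats) fastest = std::min(fastest, h[k]);
  return heartbeats[team][k] - fastest;
}

// Flags team as "slower and slower" if its lag grows strictly over the last
// window heartbeats and the latest lag exceeds threshold (in seconds).
bool isDeteriorating(const std::vector<std::vector<double>>& heartbeats,
                     std::size_t team, std::size_t window, double threshold) {
  assert(!heartbeats.empty() && team < heartbeats.size() && window >= 2);
  // Only heartbeats that every team has already reached can be compared.
  std::size_t n = heartbeats[0].size();
  for (const auto& h : heartbeats) n = std::min(n, h.size());
  if (n < window) return false;
  for (std::size_t k = n - window + 1; k < n; ++k) {
    if (lagAt(heartbeats, team, k) <= lagAt(heartbeats, team, k - 1)) {
      return false;
    }
  }
  return lagAt(heartbeats, team, n - 1) > threshold;
}
\end{lstlisting}

If team 0 reaches its heartbeats at $1.0$, $2.0$, $3.0$ and $4.0$ seconds and team 1 at $1.0$, $2.1$, $3.3$ and $4.6$ seconds, the lag of team 1 grows from $0.1$ over $0.3$ to $0.6$ seconds. With a window of three heartbeats and a threshold of $0.5$ seconds, team 1 is flagged, whereas team 0 is not.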
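The bookkeeping behind replication task sharing might look as follows. The sketch assumes that replicated tasks carry an identifier that is identical on all teams and that task outcomes are already serialised into a vector of doubles. \texttt{SharedOutcomes}, \texttt{runOrReuse} and \texttt{TaskId} are hypothetical names, and the actual transfer of outcomes between teams (via MPI) is omitted.

\begin{lstlisting}[language=C++]
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Assumption: a replicated task has the same ID on every team.
using TaskId = std::uint64_t;
// Assumption: a task outcome is serialised into a vector of doubles.
using Result = std::vector<double>;

// Per-team store of outcomes that other teams have already computed.
class SharedOutcomes {
 public:
  // Called when another team reports the outcome of task id.
  void receive(TaskId id, Result outcome) {
    outcomes_[id] = std::move(outcome);
  }

  // Removes and returns the outcome of task id if a replica delivered it.
  std::optional<Result> take(TaskId id) {
    auto it = outcomes_.find(id);
    if (it == outcomes_.end()) return std::nullopt;
    Result r = std::move(it->second);
    outcomes_.erase(it);
    return r;
  }

 private:
  std::unordered_map<TaskId, Result> outcomes_;
};

// Runs a task unless another team already delivered its outcome; in that
// case the local execution is skipped (cancelled) and the copy is used.
template <typename Compute>
Result runOrReuse(SharedOutcomes& shared, TaskId id, Compute compute,
                  bool& computedLocally) {
  if (std::optional<Result> reused = shared.take(id)) {
    computedLocally = false;
    return std::move(*reused);
  }
  computedLocally = true;
  return compute();
}
\end{lstlisting}

If a team receives the outcome of a task before it starts executing that task, \texttt{runOrReuse} returns the received outcome and skips the computation; otherwise, the task is computed locally as usual.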
\section*{History and literature}
\teaMPI\ started as an MScR project by Benjamin Hazelwood under the
supervision of Tobias Weinzierl.
It has since been significantly extended
and rewritten by Philipp Samfass as part of his PhD thesis.
The two core papers describing the research behind the library are:
\begin{itemize}
\item Philipp Samfass, Tobias Weinzierl, Benjamin Hazelwood, Michael
Bader: \emph{TeaMPI -- Replication-based Resilience without the (Performance)
Pain}. ISC 2020. \url{}
\item Philipp Samfass, Tobias Weinzierl, Dominic E. Charrier, Michael Bader:
\emph{Lightweight Task Offloading Exploiting MPI Wait Times for Parallel
Adaptive Mesh Refinement}. CPE 2020, in press.
\end{itemize}
\section*{Dependencies and prerequisites}
\teaMPI's core is plain C++17 code.
Around this core, however, we use a whole set of tools:
\begin{itemize}
\item GNU autotools (automake) to set up the build system (required).
\item A C++17-compatible C++ compiler (required).
\item MPI 3, including MPI's multithreaded support and non-blocking collectives.
\item Intel's Threading Building Blocks (TBB) or OpenMP 4.5 (required).
\item Doxygen, if you want to create HTML pages or PDFs of the in-code
documentation.
\end{itemize}
\section*{Who should read this document}
This guidebook is written for users of \teaMPI\ and for people who want to
extend it.
The text is thus organised into three parts:
First, we briefly describe how to build, install and use \teaMPI.
Second, we describe the vision and rationale behind the software as well as its
application scenarios.
Third, we describe implementation specifics.
Philipp Samfass,
Tobias Weinzierl
\chapter{Using \teaMPI}
\chapter{Setting up a developer version of \teaMPI}
\part{Building, installing and using \teaMPI}
\part{Use cases}
% \part{Large-scale \Peano\ applications}
% \newpage
% \input{60_exahype}
% \newpage
\usepackage[top=3cm,bottom=3cm,left=3cm,right=3cm,headsep=10pt,a4paper]{geometry} % Page margins
%\usepackage{enumitem} % Customize lists
% \setlist{nolistsep} % Reduce spacing between bullet points and numbered lists
% \usepackage{booktabs} % Required for nicer horizontal rules in tables
% \usepackage{marvosym}
\usepackage{xcolor} % Required for specifying colors by name
\definecolor{black}{rgb}{0.1,0.1,0.1} % Redefine black as a very dark grey
\definecolor{grey}{rgb}{0.7,0.7,0.9} % Define a light (bluish) grey
\definecolor{darkgrey}{rgb}{0.1,0.1,0.1} % Define the dark grey color (same value as black)
\definecolor{ocre}{RGB}{243,102,25} % Define the orange color used for highlighting throughout the book
\definecolor{green}{RGB}{25,243,102} % Define the green color
% \definecolor{aureolin}{rgb}{0.99, 0.93, 0.0} % Define the yellow color
% \definecolor{airforceblue}{rgb}{0.36, 0.54, 0.66} % Define the blue color
\parskip 1.5ex % paragraph spacing
\newenvironment{remark}{\par\vspace{10pt}\small % Vertical white space above the remark and smaller font size
\leftmargin=35pt % Indentation on the left
\rightmargin=25pt}\item\ignorespaces % Indentation on the right
width=1pt,circle,fill=green!25,font=\sffamily\bfseries,inner sep=2pt,outer
sep=0pt] at (-15pt,0pt){\textcolor{black}{R}};\end{tikzpicture}} % Black R in a green-filled circle
\advance\baselineskip -1pt}{\end{list}\vskip5pt} % Tighter line spacing and white space after remark
\leftmargin=35pt % Indentation on the left
\rightmargin=25pt}\item\ignorespaces % Indentation on the right
width=1pt,circle,fill=ocre!25,font=\sffamily\bfseries,inner sep=2pt,outer sep=0pt] at (-15pt,0pt){\textcolor{ocre}{D}};\end{tikzpicture}} % Orange D in a circle
\advance\baselineskip -1pt}{\end{list}\vskip5pt} % Tighter line spacing and white space after remark
\hypersetup{hidelinks,backref=true,pagebackref=true,hyperindex=true,colorlinks=false,breaklinks=true,urlcolor=ocre,bookmarks=true,bookmarksopen=false,pdftitle={teaMPI Documentation},pdfauthor={Tobias Weinzierl},pdfsubject={teaMPI Documentation}}
% basicstyle=\tiny,
% basicstyle=\Small,
framexleftmargin=1mm, framextopmargin=1mm, frame=shadowbox,
teaMPI Documentation
Philipp Samfass, Dr.~rer.~nat.~habil.~Tobias Weinzierl