\chapter{Running \exahype\ on some supercomputers}
\label{sec:apx-supercomputers}

In this section, we collect some remarks and experiences on how to use
\exahype\ on particular supercomputers.

\section{Hamilton (Durham's local supercomputer)}
\label{section:supercomputers:Hamilton}

We have successfully tested \exahype\ with the following modules on Hamilton 7:

\begin{code}
module load intel/xe_2017.2
module load intelmpi/intel/2017.2
module load gcc
\end{code}

\noindent
Given \exahype's size, it is reasonable to use \texttt{/ddn/data/username} as
working directory instead of your home directory.
SLURM is used as the batch system, and an appropriate SLURM script resembles

\begin{code}
#!/bin/bash
#SBATCH --job-name="ExaHyPE"
#SBATCH -o ExaHyPE.%A.out
#SBATCH -e ExaHyPE.%A.err
#SBATCH -t 01:00:00
#SBATCH --exclusive
#SBATCH -p par7.q
#SBATCH --nodes=24
#SBATCH --cpus-per-task=6
#SBATCH --mail-user=tobias.weinzierl@durham.ac.uk
#SBATCH --mail-type=ALL

source /etc/profile.d/modules.sh

module load intel/xe_2017.2
module load intelmpi/intel/2017.2
module load gcc

# Select the MPI fabric explicitly (bash syntax).
# csh/tcsh users write instead: setenv I_MPI_FABRICS "tmi"
export I_MPI_FABRICS="tmi"

mpirun ./ExaHyPE-Euler EulerFlow.exahype
\end{code}

\noindent
For the Euler equations (five unknowns) on the unit square with polynomial order
$p=3$, $h=0.001$ is a reasonable start grid as it yields a tree of depth 8.

Hamilton relies on Omnipath.
Unfortunately, the default fabric configuration of Intel MPI seems not to work
properly for \exahype\ once the problem sizes become big.
You have to tell MPI explicitly which driver/fabric to use.
Otherwise, your code might deadlock.
One version that seems to work is \texttt{dapl}, chosen by

\begin{code}
export I_MPI_FABRICS="dapl"
\end{code}

\noindent
While \texttt{dapl} seems to be very robust, we found it slightly slower than
\texttt{tmi} as used in the script above.
Furthermore, it needs significantly more memory per MPI rank.
Therefore, we typically use \texttt{tmi}, which however has to be set
explicitly via \texttt{export} on Hamilton.

One of the big selling points of Omnipath is that it is well-suited for small
message sizes.
Compared to other (Infiniband-based) systems, it thus seems to be wise to
reduce the message sizes in your \exahype\ specification file.
Notably, we often get improved performance once we start to decrease
\texttt{buffer-size}.


\section{SuperMUC (Munich's petascale machine)}
\label{section:supercomputers:SuperMUC}

There are very few pitfalls on SuperMUC; they mainly arise from the
interplay of IBM's MPI with Intel TBBs as well as new compiler versions.
Please load a recent GCC version (by default, the Intel compiler picks up a
version that is too old) as well as TBBs manually before you compile
\begin{code}
module load gcc/4.9
module load tbb
\end{code}

\noindent
and remember to do so in your job scripts, too:
\begin{code}
#!/bin/bash
#@ job_type = parallel
##@ job_type = MPICH
#@ class = micro
#@ node = 1
#@ tasks_per_node = 1
#@ island_count = 1
#@ wall_clock_limit = 24:00:00
#@ energy_policy_tag = ExaHyPE_rulez
#@ minimize_time_to_solution = yes
#@ job_name = LRZ-test
#@ network.MPI = sn_all,not_shared,us
#@ output = LRZ-test.out
#@ error = LRZ-test.err
#@ notification=complete
#@ notify_user=tobias.weinzierl@durham.ac.uk
#@ queue
. /etc/profile
. /etc/profile.d/modules.sh
module load gcc/4.9
module load tbb
\end{code}

\noindent
If you use Intel's TBB in combination with \texttt{poe} or MPI, please ensure
that you set
\begin{code}
export OMP_NUM_THREADS=28
export MP_TASK_AFFINITY=core:28
\end{code}

\noindent
manually, explicitly and correctly before you launch your application.
If you forget to do so, \exahype's TBB launches the correct number of TBB
threads as specified in your \exahype\ specification file, but it pins all of
these threads to one single core.
You will get at most a speedup of two (from the core plus its hyperthread) in
this case\footnote{Thanks to Nicolay Hammer from LRZ for identifying this issue.}.
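
\noindent
A quick way to check whether the pinning took effect is to inspect which cores
the operating system allows a process to use.
The following lines are only a minimal sketch and not part of \exahype; they
assume a Linux compute node that exposes \texttt{/proc/self/status}.
Run them through the same launcher and with the same environment as the actual
executable:
\begin{code}
# Print the cores this process may run on (Linux only).
# A single core id instead of a range such as 0-27 suggests
# that all threads are going to share one core.
grep Cpus_allowed_list /proc/self/status
\end{code}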


%\section{Tornado KNL (RSC group prototype)}


\section{Archer's KNL partition (EPCC supercomputer)}
\label{section:supercomputers:archer}

Archer's default Java version does not meet \exahype's requirements, and the
Java configuration does not provide the toolkit with enough heap memory
(cf.~Section \ref{section:appendix-toolkit:troubleshooting}).
Furthermore, we haven't used the Cray tools yet but stick to the Intel
environment, where we have to load a suitable GCC version manually:

\begin{code}
module load java/jdk1.8.0_51
module swap PrgEnv-cray PrgEnv-intel
module load gcc
\end{code}

\noindent
To accommodate the toolkit, we use the modified Java invocation:
\begin{code}
java -XX:MaxHeapSize=512m -jar Toolkit/dist/ExaHyPE.jar
\end{code}

\noindent
For shared memory support, we encountered three issues:
\begin{enumerate}
  \item We should use EPCC's compiler macro \texttt{CC} instead of invoking
  the compilers manually.
  \item The module does not initialise the \texttt{TBB\_SHLIB} variable that we
  use in our script. So we have to set it manually.
  \item The default compiler behaviour links all libraries statically into the
  executable. However, the TBB libs are not available in a static variant.
  To change this behaviour, we had to instruct the linker explicitly to link
  against the shared (dynamic) library variants.
\end{enumerate}

\noindent
Overall, these three settings fix the behaviour (the backslash inside the
quotes only continues the line; the path must not contain any whitespace):
\begin{code}
export EXAHYPE_CC=CC
export TBB_SHLIB="-L/opt/intel/compilers_and_\
libraries_2017.0.098/linux/tbb/lib/intel64/gcc4.7 -ltbb"
export CRAYPE_LINK_TYPE=dynamic
\end{code}

\noindent
Similar to SuperMUC, we observe that a plain launch of executables through
\texttt{aprun} {\em does not allow the codes to exploit shared memory
parallelism}.
We explicitly have to unlock the cores for the scripts in the run command
through
\begin{code}
aprun -n ... -d coresPerTask ... -cc depth
\end{code}
where \texttt{-cc} configures the pinning.
According to the Archer documentation, this configuration still does not enable
hyperthreading.
If hyperthreading is required, we have to append \texttt{-j 4} to the
invocation, too.


\section{RWTH Aachen Cluster}

We have successfully tested \exahype\ on RWTH Aachen's RZ clusters using MUST.
Here, it is important to switch the GCC implementation before you compile, as
GCC is by default version 4.8.5, which does not fully implement C++11.

\begin{code}
module load UNITE must

#module unload gcc
module unload openmpi
module switch intel gcc/5
module load intel openmpi

export SHAREDMEM=none
export COMPILER=manual
export EXAHYPE_CC="mpiCC -std=c++11 -g3"
export COMPILER_CFLAGS="\$FLAGS_FAST"
\end{code}

\noindent
The above setup uses the compiler variant \texttt{manual} as RWTH has
installed MUST such that \texttt{mustrun} automatically throws the executable
onto the right cluster.
To create a binary that is compatible with this cluster, the flags from
\texttt{FLAGS\_FAST} are to be used.


\section{CoolMUC 3}

LRZ's KNL system CoolMUC 3 uses Omnipath as well.
Therefore, ensure that you set the MPI fabric properly as soon as you use more
than one node.
Otherwise, \exahype\ will deadlock:

\begin{code}
export I_MPI_FABRICS="tmi"
\end{code}


\section{Hazelhen (Cray)}

Cray may configure the Intel compiler to link in all libraries statically, but
TBB by default is not built statically. So add the following to
\texttt{TBB\_SHLIB}
\begin{code}
-dynamic -ltbb
\end{code}
i.e.~before the link command.


\section{Frankfurt machines}

For the machines
\begin{itemize}
  \item Generic Ubuntu laptop
  \item Iboga
  \item LOEWE
  \item FUCHS
  \item Hamilton
  \item SuperMUC
\end{itemize}
please see the configuration settings in the \texttt{ClusterConfigs} directory
in the \texttt{Miscellaneous} directory of the main repository.