Symbolic flux calculations reduce speed significantly
I just completed some Likwid performance measurements for the generic and optimised kernels. For the optimised kernels, I tested a variant using "symbolic variables" in the flux and eigenvalue computation (optimised) and another variant using classic array indexing (optimised-nonsymbolic). The Likwid files are suffixed by ".likwid.csv". I further attached the measured Peano adapter times; these files are suffixed by ".csv".
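To make the difference between the two optimised variants concrete, here is a minimal Python sketch. It assumes the conserved-variable layout [rho, j_x, j_y, j_z, E] and an ideal gas with gamma = 1.4. The `Variables` class is a hypothetical stand-in for the "symbolic variables" and is not the accessor code actually used by the optimised kernels. Both flux functions compute the same Euler flux in direction `d`; `eigenvalues_raw` returns the flux-Jacobian eigenvalues u-c, u, u, u, u+c.

```python
import math

GAMMA = 1.4  # assumed ratio of specific heats (ideal gas)


class Variables:
    """Hypothetical named view onto a flat state [rho, j_x, j_y, j_z, E]."""

    def __init__(self, q):
        self.q = q  # shares storage with q, no copy is made

    def rho(self):
        return self.q[0]

    def j(self, d):
        return self.q[1 + d]

    def E(self):
        return self.q[4]

    def set_rho(self, value):
        self.q[0] = value

    def set_j(self, d, value):
        self.q[1 + d] = value

    def set_E(self, value):
        self.q[4] = value


def pressure(rho, jx, jy, jz, E):
    """Ideal-gas pressure from the conserved variables."""
    return (GAMMA - 1.0) * (E - 0.5 * (jx * jx + jy * jy + jz * jz) / rho)


def flux_symbolic(Q, d, F):
    """Euler flux in direction d, written through the named view."""
    v, f = Variables(Q), Variables(F)
    p = pressure(v.rho(), v.j(0), v.j(1), v.j(2), v.E())
    u = v.j(d) / v.rho()
    f.set_rho(v.j(d))
    for k in range(3):
        f.set_j(k, v.j(k) * u + (p if k == d else 0.0))
    f.set_E(u * (v.E() + p))


def flux_raw(Q, d, F):
    """The same flux with plain index arithmetic."""
    p = pressure(Q[0], Q[1], Q[2], Q[3], Q[4])
    u = Q[1 + d] / Q[0]
    F[0] = Q[1 + d]
    F[1] = Q[1] * u
    F[2] = Q[2] * u
    F[3] = Q[3] * u
    F[1 + d] += p  # pressure only enters the momentum component along d
    F[4] = u * (Q[4] + p)


def eigenvalues_raw(Q, d):
    """Eigenvalues u-c, u, u, u, u+c of the flux Jacobian in direction d."""
    p = pressure(Q[0], Q[1], Q[2], Q[3], Q[4])
    u = Q[1 + d] / Q[0]
    c = math.sqrt(GAMMA * p / Q[0])  # speed of sound
    return [u - c, u, u, u, u + c]
```

The sketch only shows what the two variants compute. The speed gap measured here presumably stems from how the C++ compiler handles the accessor layer, which a Python example cannot reproduce.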
Setup
- Compressible Euler equations (Euler_Flow)
- pure ADER-DG scheme (no limiter)
- polynomial orders p=3,5,7,9; regular 27^3 grid (3D)
- TBB threads=1,12,24
- Intel icpc17 (USE_IPO=on)
- nonfused (3 algorithmic phases) vs. fused (a single pipelined algorithmic phase) ADER-DG implementation
- no predictor reruns occurred for the fused implementation
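Why reruns matter for the fused scheme can be illustrated with a toy model that counts grid sweeps only. It assumes (this is not taken from the measurements) that the nonfused scheme traverses the grid once per phase, that the fused scheme needs one pipelined sweep per time step, and that a predictor rerun costs one extra sweep whenever the admissible time step size drops below the estimate the predictor was computed with. The function names are hypothetical, and the model ignores the kernel work itself.

```python
def run_nonfused(admissible_dts):
    """Sweeps for the nonfused scheme (toy model): predictor, Riemann
    and corrector phases each traverse the grid once per time step."""
    return 3 * len(admissible_dts)


def run_fused(admissible_dts, dt_estimate):
    """Sweeps and predictor reruns for the fused scheme (toy model).

    admissible_dts[n] is the admissible time step size known after the
    pipelined sweep of step n; that sweep's predictor used dt_used.
    """
    sweeps = 0
    reruns = 0
    dt_used = dt_estimate
    for dt_new in admissible_dts:
        sweeps += 1  # one pipelined sweep: Riemann, corrector, predictor
        if dt_new < dt_used:
            # The predictor was computed with a too large time step
            # size, so it is recomputed with dt_new in an extra sweep.
            sweeps += 1
            reruns += 1
        dt_used = dt_new
    return sweeps, reruns
```

For example, `run_fused([0.1, 0.1, 0.09, 0.1], 0.1)` returns `(5, 1)`, compared with 12 sweeps from `run_nonfused`. With a constant admissible time step size, as for a linear PDE, the fused scheme never reruns.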
Preliminary Results
- Optimised kernels are faster than the generic ones (I kind of expected this 😉).
- Raw array access (optimised-nonsymbolic) is significantly faster than using the "symbolic variables" (optimised).
- The fused scheme pays off as long as the number of reruns is low; this is very interesting for linear PDEs, which do not trigger reruns.
Files
Euler_ADERDG-no-output-generic.csv
Euler_ADERDG-no-output-generic.likwid.csv
Euler_ADERDG-no-output-optimised.csv
Euler_ADERDG-no-output-optimised-nonsymbolic.csv