ExaHyPE-Engine issues
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues (2017-10-11T11:50:37+02:00)

Issue #188: Global Reduction of Time Stepping Data
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/188 (2017-10-11T11:50:37+02:00, Ghost User)

I currently do the following to reduce and broadcast global values, such as the minimum time step size:
* I broadcast time step data from master to worker, all the way down to the "lowest" worker.
* I reduce time step data from worker to master, all the way up to the global master rank.

~~I could use a simple MPI_Reduce and a simple MPI_Gather to perform the above steps.~~
Postponed.

Issue #191: Pass std::vectors to patchwise functions
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/191 (2017-10-13T20:55:05+02:00, Ghost User)

- It's safer: the size is available, for example.
- Signatures are clearer. User kernels still get the raw pointers.
- We can just do vector.data() to pass a pointer to the existing kernels.

Issue #175: Symbolic flux calculations reduce speed significantly
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/175 (2019-03-21T10:56:35+01:00, Ghost User)

I just completed some Likwid performance measurements for the generic and optimised kernels.
For the optimised kernels, I tested a variant using "symbolic variables" in the flux and eigenvalue computation and another variant using classic array indexing (optimised-nonsymbolic).
The Likwid measurement files are suffixed with ".likwid.csv".
I further attached the measured Peano adapter times; these files are suffixed with ".csv".
Setup
--------
* Compressible Euler equations (Euler_Flow)
* pure ADER-DG scheme (no limiter)
* polynomial orders p=3,5,7,9; regular 27^3 grid (3D)
* TBB threads=1,12,24
* Intel icpc17 (USE_IPO=on)
* nonfused (3 algorithmic phases) vs. fused (a single pipelined algorithmic phase) ADER-DG implementation
* no predictor reruns occurred for the fused implementation
Preliminary Results
----------------------------
* Optimised kernels are faster than the generic ones (I kind of expected this 😉)
* Raw array access (optimised-nonsymbolic) is significantly faster than using the "symbolic variables" (optimised).
* The fused scheme pays off as long as the number of reruns is low; this is very interesting for linear PDEs (no reruns occur there).
Files
-----
[Euler_ADERDG-no-output-generic.csv](/uploads/f671c93a33e70c9549853f1f51518c9d/Euler_ADERDG-no-output-generic.csv)
[Euler_ADERDG-no-output-generic.likwid.csv](/uploads/0d52ad214f0b1d6cfae4d658a9997cb5/Euler_ADERDG-no-output-generic.likwid.csv)
[Euler_ADERDG-no-output-optimised.csv](/uploads/422288cdc222b1b250b3aad9e2ad73a1/Euler_ADERDG-no-output-optimised.csv)
[Euler_ADERDG-no-output-optimised-nonsymbolic.csv](/uploads/0f84240941230d4bc9f06c76b154396b/Euler_ADERDG-no-output-optimised-nonsymbolic.csv)
[Euler_ADERDG-no-output-optimised.likwid.csv](/uploads/135bd71c668ca0c4c4ee5b7988ea2f17/Euler_ADERDG-no-output-optimised.likwid.csv)
[Euler_ADERDG-no-output-optimised-nonsymbolic.likwid.csv](/uploads/7606eefe559499a1f6de62b5f33905d0/Euler_ADERDG-no-output-optimised-nonsymbolic.likwid.csv)