ExaHyPE issues (https://gitlab.lrz.de/groups/exahype/-/issues)

## Issue #221: Deeply check the public repository for CCZ4
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/221

It is at https://github.com/exahype/exahype and we don't want to have the CCZ4 system in the code or in any old commits. Make sure of this by inspecting the code.

Cannot do it now since the repo is 70 MB in size and I'm on a train with bad wifi.

## Issue #222: Fix the M/h computation in CCZ4/Writers/TimingStatisticsWriter.cpph
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/222

From a mail to Luke:
>
I just noticed there is something wrong in the M/h determination:
>
As explained in my last mail to Tobias and you, I just divide these
two numbers. However, the time in the first column of stdout measures
the time since program start.
>
In ExaHyPE, the grid setup sometimes takes a considerable amount of time --
like 10 minutes. If you measure the M/h straight after the first
timesteps after these 10 minutes, you of course get totally wrong
numbers. However, if you measure after 1000 minutes of runtime, the 10
minutes of grid setup do not change the result much.
>
It is not hard to subtract the time the grid setup needs in order to
improve the correctness of the number. You can do this either by hand
(just look up when the first time step started) or we can do it in
code (CCZ4/Writers/TimingStatisticsWriter.cpph).
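A minimal sketch of the in-code variant (assuming M/h here is a cumulative update counter divided by elapsed wall-clock hours; the function below is illustrative and not the actual TimingStatisticsWriter API):

```c++
#include <chrono>

// Sketch only: measure the rate against the time since the first time step started,
// so that a long grid-setup phase does not skew the M/h figure.
double updatesPerHour(double cumulativeUpdates,
                      std::chrono::steady_clock::time_point firstTimestepStart) {
  const double evolvedSeconds = std::chrono::duration<double>(
      std::chrono::steady_clock::now() - firstTimestepStart).count();
  return cumulativeUpdates / (evolvedSeconds / 3600.0);
}
```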
## Issue #223: Implement new space-time predictor without (a variable number of) Picard loops
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/223

The task is relatively easy: diff Dumbser's Fortran prototype (in the repository) and carry the changes over into the generic kernels.
From Dumbser, 11 March 2018, 11:20:
> It would of course be great if you could translate my Fortran code to C.
> If there are problems,
> we will do another Skype session.
> The format of the loops and of the required computations is essentially the same as
> for all other computations in the space-time predictor, i.e. a lot can be reused.
> The only difference is that the time index no longer has to be carried along;
> you only work in space. The only kernel that has to be changed is the
> space-time predictor (in 2D and 3D).
>
> I would only implement the second and third order initial guess, see
> the code in SpaceTimePredictor.f90 under
>
> #ifdef SECOND_ORDER_INITIAL_GUESS
>
> #ifdef THIRD_ORDER_INITIAL_GUESS
>
> This week I have concentrated on scaling and on the comparison Runge-Kutta DG /
> ADER-DG, i.e. I have not worked on the 2D FO-fCCZ4 any further. I first want to
> get the GRMHD paper off my desk.

## Issue #224: Numerical details on how to evolve CCZ4 with FD
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/224

Collection of what Dumbser wrote in various e-mails, in order to proceed with the Runge-Kutta finite differencing code (Cactus/Antelope/Okapi):
### Dumbser, 3 April 2018, 11:03: FO-CCZ4 with finite differencing works
> since Sven has reported some difficulties with the implementation of FO-CCZ4 in the Einstein toolkit last week, and since I wanted to understand the potential problems in depth, I have simply written my own finite difference code for the Einstein equations, based on central finite differences in space and Runge-Kutta time integration. According to Sven's and Elias' description, this is exactly what you are also doing in Cactus, right? The implementation is straightforward, since FD schemes are extremely simple.
>
>
>
> To do my tests, I have just copy-pasted my Fortran subroutine PDEFusedSrcNCP into the finite difference code, and I then insert FD point values of Q and central FD approximations for the first spatial derivatives. To save CPU time, I have done all computations in 1D so far.
>
>
>
> Please find attached the results that I have obtained for the Gauge wave with amplitude A=0.01 and A=0.1 until a final time of t=1000. I have used sixth order central FD in space and a classical third order Runge-Kutta scheme in time. For FO-CCZ4, I have set all damping coefficients to zero (kappa1=kappa2=kappa3=0), and I use c=0 with e=2. Zero shift (sk=0) and harmonic lapse. CFL number based on e is set to CFL=0.5 at the moment. Now the important points:
>
>
>
> 1. Runge-Kutta O3 is for now preferable over Runge-Kutta O4, since it is intrinsically dissipative. The reason is that the fourth order time derivative term in the Taylor series on the left hand side remains with RK3, while it cancels with RK4; when moving the term q_tttt to the right hand side and applying the Cauchy-Kowalevsky procedure, it becomes a fourth order spatial derivative term with negative sign, which is good for stability (second spatial derivatives must have positive sign on the right hand side, fourth spatial derivatives must have negative sign for stability; this is easy to check via Fourier series and the dispersion relation of the PDE).
>
>
>
> 2. The RK3 alone is not enough to stabilize the scheme for the larger amplitude A=0.1 of the Gauge Wave, but it is sufficient for A=0.01. I therefore explicitly needed to subtract a term of the type - dt/dx*u_xxxx, which is essentially a fourth order Kreiss-Oliger-type dissipation with appropriately chosen viscosity coefficient.
>
>
>
> I will now replace the Kreiss-Oliger dissipation which I do not like with the numerical dissipation that you would have obtained with a Rusanov flux in a fourth order accurate finite volume scheme. In the end, the dissipation operator will again be written as a finite difference, but there I know at least exactly what is going on and we will exactly know the amount of dissipation to be put (it can only be a function of the largest eigenvalue). So there will be NO parameter to be tuned. I will keep you updated on this.
>
>
>
> From my own results I can conclude that everything is working as expected, i.e. you must have at least one bug in your implementation of FO-CCZ4 in Cactus; or you have run the tests with the wrong parameters (please use CFL=0.5 for the moment, Kreiss-Oliger dissipation with a viscosity factor so that you get -dt/dx*u_xxxx on the right hand side in the end, please set kappa1=kappa2=kappa3=0 and set e=2 and c=0 in FO-CCZ4). If your code still does not run, I can send you my finite difference Fortran code to help you with the debugging.
>
>
>
> While running and implementing my FD code for FO-CCZ4 I have also been working on the vectorization of FO-CCZ4 in ADER-DG. The interesting news: the finite difference code requires more than 20 microseconds per FD point update, and ADER-DG with the good new initial guess for the space-time polynomial and proper vectorization also needs about 20 microseconds per DOF update for FO-CCZ4, i.e. ADER-DG is indeed becoming competitive with FD, who would have ever believed this last year :-) On the new 512bit vector machines of RSC in Moscow, we expect the PDE to run even twice as fast, since the vector registers are twice as large as the current state of the art. We are aiming at a time per DOF update of about 10 microseconds. I will keep you informed.
### Dumbser, 4 April 2018, 10:51:
>
However, my latest experiments show that you can also use RK4 together with a finite-volume type dissipative operator, which is very simple to
implement and which does not require any parameters to be tuned. It will just replace the Kreiss-Oliger dissipation. And by the way: in this setting,
the scheme can be run with CFL=0.9, which is what we want. I will send around more details later.
### Dumbser, 4 April 2018, 18:23:
> there is again good news from the finite-difference FO-CCZ4 front. Instead of your classical Kreiss-Oliger dissipation, I suggest using the following dissipation operator, which should simply be "added" to the time derivatives of
>
> all quantities on the right hand side, i.e.:
>
>
> ```
> dudt(:,i,j,k) = dudt(:,i,j,k) - 1.0/dx(1)* 3.0/256.0* smax * ( -15.0*u(:,i+1,j,k)-u(:,i+3,j,k)-15.0*u(:,i-1,j,k)+6.0*u(:,i+2,j,k)+20.0*u(:,i,j,k)+6.0*u(:,i-2,j,k)-u(:,i-3,j,k) )
> ```
>
>
> where dudt(:,i,j,k) is the time derivative of the discrete solution computed by the existing Fortran function PDEFusedSrcNCP and smax is the maximum eigenvalue in absolute value. This operator derives from
>
>
> ```
> - 1/dx(1)*( fp - fm ),
> ```
>
>
> where the dissipative flux fp is defined as
>
>
> ```
> fp = - 1/2 * smax * ( uR - uL ),
> ```
>
>
> and uR and uL are the central high order polynomial reconstructions of u evaluated at the cell interface x_i+1/2. The flux fm is the same, but on the left interface x_i-1/2.
>
>
>
> Please find attached the new results for the Gauge Wave with A=0.1 amplitude. Everything looks fine, i.e., the ADM constraints as well as the waveform at the final time. Note that this simulation was now run with the fourth order
> Runge-Kutta scheme in time and using a CFL number of CFL=0.9 based on the maximum eigenvalue.
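For reference, a C++ sketch of the suggested operator (assuming `nVar` variables stored contiguously per grid point, a 1D sweep along x, and ignoring the three boundary layers; the function name and data layout are illustrative, not existing engine code):

```c++
// Adds the Rusanov-derived dissipation term described above to the time derivative.
void addDissipation(double* dudt, const double* u,
                    int nVar, int nx, double dx, double smax) {
  auto U = [&](int v, int i) { return u[i * nVar + v]; };  // value of variable v at point i
  for (int i = 3; i < nx - 3; ++i) {
    for (int v = 0; v < nVar; ++v) {
      const double stencil =
          -        U(v, i + 3) + 6.0 * U(v, i + 2) - 15.0 * U(v, i + 1)
          + 20.0 * U(v, i)
          - 15.0 * U(v, i - 1) + 6.0 * U(v, i - 2) -        U(v, i - 3);
      dudt[i * nVar + v] -= 1.0 / dx * 3.0 / 256.0 * smax * stencil;
    }
  }
}
```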
## Issue #225: Provide vectorized user functions in the optimized kernels
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/225

This is something Jean-Matthieu should do. Then we can immediately test a couple of PDEs, such as Euler, GRMHD or CCZ4. Dumbser also shared his vectorized code somewhere.

## Issue #227: Reflection for the parameter system
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/227

Currently, using runtime parameters requires parsing them somewhere, leading to code like
```c++
void GeometricBallLimiting::readParameters(const mexa::mexafile& para) {
radius = para["radius"].as_double();
std::string where = para["where"].as_string();
toLower(where);
if(where == "inside") limit_inside = true;
else if(where == "outside") limit_inside = false;
else {
logError("readParameters()", "Valid values for where are 'inside' and 'outside'. Keeping default.");
}
logInfo("readParameters()", "Limiting " << (limit_inside ? "within" : "outside of") << " a ball with radius=" << radius);
}
```
associated with a structure where the parameters are stored,
```c++
struct GeometricBallLimiting : public LimitingCriterionCode {
double radius; ///< Radius of the ball (default -1 = no limiting)
bool limit_inside; ///< Whether to limit inside or outside (default inside)
GeometricBallLimiting() : radius(-1), limit_inside(true) {}
bool isPhysicallyAdmissible(IS_PHYSICALLY_ADMISSIBLE_SIGNATURE) const override;
void readParameters(const mexa::mexafile& parameters) override;
};
```
This is a lot of overhead and quite redundant.
In Cactus, the user can declare parameters *including their description/meaning and valid values* in a dedicated language; they are then made available as a structure by the glue code, and all the parsing is abstracted away. Example of a Cactus parameter file (CCL file):
```
real eta "Damping coefficient for the Gamma Driver" STEERABLE=always
{
0:* :: "should be 1-2/M"
}0.2
KEYWORD evol_type "Which set of equations to evolve"
{
"BSSN" :: "traditional BSSN"
"Z4c" :: "Z4c"
"CCZ4" :: "(Covariant) and conformal Z4"
"FOCCZ4" :: "First order formulation of the CCZ4"
}"Z4c"
boolean include_theta_source "Only FO-CCZ4: set to false to remove the algebraic source terms of the type -2*Theta" STEERABLE=always
{
} yes
```
In the code, one then just has something like
```c++
struct parameters {
double eta;
std::string evol_type;
  bool include_theta_source;
};
```
which is already filled nicely with values.
While I certainly don't want to write such glue code for ExaHyPE, with minimal effort we could get something much better than the current MEXA system. In fact, it would be nice to use *OOP reflection* to automatically register class attributes for common data types (int/double/bool/string). That means I would be fine with writing
```c++
struct GeometricBallLimiting {
  enum class Where { inside, outside };
  double radius;
  Where limit_at;
  REGISTER_PARAM(radius, DEFAULT(-1), "Radius of the ball");
  REGISTER_PARAM(limit_at, DEFAULT(Where::inside), "Where to limit");
};
```
In fact, something like this should be possible with some macro magic.
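As a rough illustration of the direction (a sketch only, not the MEXA or Cactus API; the names ParamEntry/ParamRegistry are made up and only double members are handled), a macro could store a setter lambda plus metadata per member during construction:

```c++
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// One registered parameter: name, help text, and a setter that parses a string.
struct ParamEntry {
  std::string name, description;
  std::function<void(const std::string&)> setFromString;
};

struct ParamRegistry {
  std::vector<ParamEntry> entries;
  void set(const std::string& name, const std::string& value) {
    for (auto& e : entries)
      if (e.name == name) { e.setFromString(value); return; }
    std::cerr << "unknown parameter: " << name << "\n";
  }
};

// Registers the member in the enclosing class' registry. The dummy bool member only
// exists so that registration runs as part of construction.
#define REGISTER_PARAM(member, description)                               \
  bool _registered_##member = [this] {                                    \
    registry.entries.push_back(                                           \
        {#member, description,                                            \
         [this](const std::string& v) { member = std::stod(v); }});       \
    return true;                                                          \
  }()

struct GeometricBallLimiting {
  ParamRegistry registry;
  double radius = -1;                        // default documented in one place
  REGISTER_PARAM(radius, "Radius of the ball");
};

int main() {
  GeometricBallLimiting limiting;
  limiting.registry.set("radius", "0.5");    // e.g. driven by the specfile parser
  std::cout << "radius = " << limiting.radius << "\n";
}
```

A real version would additionally dispatch on the member type (int/double/bool/string/enum) and record defaults and valid values, but the registration itself stays a one-liner per parameter.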
## Issue #157: Plotter variables are not constants
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/157

Actually, the ExaHyPE runtime treats the number of writtenUnknowns per plotter as a runtime variable, i.e.
```
plot hdf5::flash ConservedWriter
variables = 19
time = 0.0
repeat = 0.00001
output = ./hdf5-flash/conserved
end plot
```
one can change `variables = x` without recompiling. However, the toolkit wants this to be a constant:
```
plot hdf5::flash ConservedWriter
variables const = 19
time = 0.0
repeat = 0.00001
output = ./hdf5-flash/conserved
end plot
```
otherwise it says `ERROR: eu.exahype.parser.ParserException: [71,17] expecting: 'const'`
This should not happen, i.e. the toolkit grammar should also accept the statement without `const`.
As always, there is presumably no case in which somebody wants to do this **except benchmarking plotter file formats, which is exactly what I'm doing now**. The typical code generated by the toolkit is not aware of a non-constexpr number of variables, but all the `ExaHyPE/exahype/plotters/` code actually treats the number as a runtime constant, and I see no reason to artificially introduce something constexpr here.

## Issue #153: Constants in specfiles
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/153

I have so many constants that a specfile syntax like
```
solver Limiting-ADER-DG GRMHDSolver
variables const = rho:1,vel:3,E:1,B:3,psi:1,lapse:1,shift:3,gij:6
order const = 4
maximum-mesh-size = 6.0009
maximum-mesh-depth = 0
time-stepping = global
kernel const = generic::fluxes::nonlinear
language const = C
constants = initial_data:tovstar,boundary_x_lower:reflective,boundary_y_lower:reflective,boundary_z_upper:outgoing,tovstar-mass:1.234,tovstar-rl-ratio:2.345
```
would make sense. Tobias thinks in such a case a user would do something like
```
solver Limiting-ADER-DG GRMHDSolver
variables const = rho:1,vel:3,E:1,B:3,psi:1,lapse:1,shift:3,gij:6
order const = 4
maximum-mesh-size = 6.0009
maximum-mesh-depth = 0
time-stepping = global
kernel const = generic::fluxes::nonlinear
language const = C
constants = configuration-file:foobar.txt
```
and outsource the configuration to a language the user likes. However, this breaks the idea of a single specfile for a single run. Therefore, I see two options:
## Add a DATA section after the specfile
This is what many scripting languages do, for instance Perl ([the DATA syntax in perl](https://stackoverflow.com/questions/13463509/the-data-syntax-in-perl)). The idea is just that the parsers ignore whatever comes after the end of the specfile (i.e. the line containing `end exahype-project`). Users could dump any content there in their favourite language. I would vote for this as it is super easy to implement and allows file concatenation and flexibility.
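A minimal sketch of how little this needs (not the actual `Parser.cpp`; the struct and function names are made up):

```c++
#include <fstream>
#include <string>

struct SpecfileContent {
  std::string specfile;  // what the ExaHyPE parser sees
  std::string userData;  // free-form DATA section, in any language the user likes
};

SpecfileContent splitSpecfile(const std::string& filename) {
  std::ifstream in(filename);
  SpecfileContent result;
  std::string line;
  bool inData = false;
  while (std::getline(in, line)) {
    (inData ? result.userData : result.specfile) += line + "\n";
    if (!inData && line.find("end exahype-project") != std::string::npos)
      inData = true;  // everything that follows is ignored by the grammar
  }
  return result;
}
```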
## Allow user constant section
We could also just allow users to do something like
```
solver Limiting-ADER-DG GRMHDSolver
variables const = rho:1,vel:3,E:1,B:3,psi:1,lapse:1,shift:3,gij:6
order const = 4
maximum-mesh-size = 6.0009
maximum-mesh-depth = 0
time-stepping = global
kernel const = generic::fluxes::nonlinear
language const = C
constants = parameters:appended
limiter-kernel const = generic::musclhancock
limiter-language const = C
dmp-observables = 2
dmp-relaxation-parameter = 1e-2
dmp-difference-scaling = 1e-3
steps-till-cured = 0
simulation-parameters
foo = bar
baz = bar
blo = bar
blu = bar
etc.
end simulation-parameters
plot vtk::Cartesian::vertices::limited::binary ConservedWriter
variables const = 19
time = 0.0
repeat = 0.00166667
output = ./vtk-output/conserved
end plot
...
```
This would go well with the specfile syntax. In order to implement it, we need
* such a section, with arbitrary key-value pairs, added to the grammar, so the toolkit does not complain
* support in `Parser.cpp` (which is not too hard to add)

## Issue #152: Dynamic plotter registration
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/152

Users shall be able to add, comment out or remove plotters at runtime without recompiling. The plotter ordering should not be fixed.
This is not too hard to achieve; it just requires changes in `KernelCalls.cpp`, namely a generated plotter registration function (semi-pseudocode):
```c++
Writer* kernels::getNamedWriter(std::string name, Solver& solver) {
  if(name == "ConservedWriter") return new GRMHD::ConservedWriter(*static_cast<exahype::solvers::LimitingADERDGSolver*>(&solver));
  if(name == "SomeOtherWriter") return new GRMHD::SomeOtherWriter(*static_cast<exahype::solvers::ADERDGSolver*>(&solver));
  if(name == ...)
    ...
  // failure: don't know this plotter type
  return nullptr;
}
```
We can then replace the currently generated section
```c++
exahype::plotters::RegisteredPlotters.push_back( new exahype::plotters::Plotter(0,0,parser,new GRMHD::ConservedWriter( *static_cast<exahype::solvers::LimitingADERDGSolver*>(exahype::solvers::RegisteredSolvers[0])) ));
exahype::plotters::RegisteredPlotters.push_back( new exahype::plotters::Plotter(0,1,parser,new GRMHD::ConservedWriter( *static_cast<exahype::solvers::LimitingADERDGSolver*>(exahype::solvers::RegisteredSolvers[0])) ));
exahype::plotters::RegisteredPlotters.push_back( new exahype::plotters::Plotter(0,2,parser,new GRMHD::ConservedWriter( *static_cast<exahype::solvers::LimitingADERDGSolver*>(exahype::solvers::RegisteredSolvers[0])) ));
exahype::plotters::RegisteredPlotters.push_back( new exahype::plotters::Plotter(0,3,parser,new GRMHD::ConservedWriter( *static_cast<exahype::solvers::LimitingADERDGSolver*>(exahype::solvers::RegisteredSolvers[0])) ));
exahype::plotters::RegisteredPlotters.push_back( new exahype::plotters::Plotter(0,4,parser,new GRMHD::IntegralsWriter( *static_cast<exahype::solvers::LimitingADERDGSolver*>(exahype::solvers::RegisteredSolvers[0])) ));
```
with a non-generated section (pseudocode)
```c++
for(int solvernum : Parser->getSolvers()) {
	int plotternum = 0;
	for(const std::string& plottername : Parser->getPlotterNamesForSolver(solvernum)) {
		exahype::plotters::RegisteredPlotters.push_back(new exahype::plotters::Plotter(solvernum, plotternum++, parser, getNamedWriter(plottername, *exahype::solvers::RegisteredSolvers[solvernum])));
	}
}
```
This is something I can do on my own -- if the Parser provides the necessary data...

## Issue #92: Multi-solvers / Parameter Studies / Sensitivity Analysis
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/92

### Progress:
* Multi-solver infrastructure is implemented for all solvers (ADER-DG,Godunov FV,Limiting ADER-DG).
Further tests and more debugging are necessary for MPI and TBB - especially with respect to the limiting ADER-DG solver.
### Issues/Features
* 30/12/16: Parameter studies require using the same solver with different initial conditions and/or source terms.
The problem and discretisation (order,variables,plotters,...) of the solver does not change in these studies.
I plan to enable such studies with an annotation {<studyNumber>}, e.g., "solver SolverType MySolver{10}", which is interpreted by the Toolkit:
it then creates a constructor that passes the number of the study (0 to 9 in the above example) and
further adds the required number of solvers to the registry.
The current Plotter infrastructure does not support this idea yet because of its dependence on exahype::Parser.
* ~~30/12/16: Limiting-ADER-DG: Limiter domain seems to be required to often while the actual limiter domain does not change per solver.~~ Fixed.
* ~~30/12/16: MPI crashes for multiple ADER-DG/Limiting ADER-DG solvers (seg-fault).~~ Fixed.

## Issue #151: Single-node MPI strong scaling differences between 2d and 3d
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/151

I am investigating the strong scaling behaviour of the 2d and 3d versions of ExaHyPE.
While the 2d version shows reasonable scalability, the 3d version does not.
* Experiments are performed on a single-node of SuperMUC Phase 2.
* All plotters are turned off
* To exclude interconnect effects, all experiments are performed on a single-node.
In my experiments, I switch the master-worker communication (M/W)
on or off as well as the neighbour communication (N).
## Only Peano communication (M/W=off,N=off)
ranks | adapter name | iterations | total CPU time [t]=s | average CPU time [t]=s | total user time [t]=s | average user time [t]=s |
----------|-----------------------|-------------------|-----------------------------------|---------------------------------------|----------------------------------|---------------------------------------|
2 | ADERDGTimeStep | 29 | 37.07 | 1.27828 | 316.768 | 10.923 |
3 | ADERDGTimeStep | 29 | 36.25 | 1.25 | 305.752 | 10.5432 |
12 | ADERDGTimeStep | 29 | 27.04 | 0.932414 | 206.142 | 7.10834 |
28 | ADERDGTimeStep | 29 | 10.27 | 0.354138 | 24.4012 | 0.841421 |
## M/W=on, N=off
ranks | adapter name | iterations | total CPU time [t]=s | average CPU time [t]=s | total user time [t]=s | average user time [t]=s |
----------|-----------------------|-------------------|-----------------------------------|---------------------------------------|----------------------------------|---------------------------------------|
2 |ADERDGTimeStep | 29 | 37.58 | 1.29586 | 316.044 | 10.8981 |
3 | ADERDGTimeStep | 29 | 36.3 | 1.25172 | 306.977 | 10.5854 |
12 | ADERDGTimeStep | 29 | 27.21 | 0.938276 | 207.078 | 7.14064 |
28 | ADERDGTimeStep | 29 | 10.27 | 0.354138 | 24.5317 | 0.845921 |
## M/W=off, N=on
ranks | adapter name | iterations | total CPU time [t]=s | average CPU time [t]=s | total user time [t]=s | average user time [t]=s |
----------|-----------------------|-------------------|-----------------------------------|---------------------------------------|----------------------------------|---------------------------------------|
2 | ADERDGTimeStep | 29 | 39.52 | 1.36276 | 337.709 | 11.6451 |
3 | ADERDGTimeStep | 29 | 99.04 | 3.41517 | 995.378 | |
12 | ADERDGTimeStep | 29 | 121.76 | 4.19862 | 1106.45 | 38.1534 |
28 | ADERDGTimeStep | 29 | 18.24 | 0.628966 | 105.858 | 3.65027 |
## M/W=on, N=on
Slightly worse than M/W=off,N=on.
# Insights:
* Rank 3 and 12 performance are load balancing issues.
* For the 28-rank run, the LB only deploys 10 ranks. This is actually a well-balanced setup for 10 ranks. (If we set 10 ranks, we have a load balancing issue again.)

## Issue #69: Do memory precision tests
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/69

Tobias writes:
>
I need your help (yes, of course quickly, because it is supposed to go into the presentation, but you probably guessed that anyway ;-). The new ExaHyPE version from last night can work with number representations below the IEEE double standard. That is expensive (first runs suggest a factor of 5 in runtime, but that could also be down to the chip, i.e. I have to test the Russian machine), but it does reduce the memory footprint of a simulation somewhat. I already get 10% for the simple Euler 2d with order 3 - but I hope it is significantly more pronounced for other setups. The feature is switched on by setting
>
double-compression = 0.000001
>
to something between 0 and 1. You can forget about the other flag, spawn-double-compression-as-background-thread, that is my construction site. My question/request now is: do you have a meaningful benchmark from astrophysics at hand, and can you try
double-compression = 0.0
double-compression = 0.0001
double-compression = 0.000001
double-compression = 0.000000000001
>
with it and tell me whether values greater than 0 qualitatively degrade the solution? I have no idea what you typically look at (arrival times/amplitudes/...?), so I need a qualified eye there. If you also keep an eye on what it reports as memoryUsage (it now does this automatically when compiling without MPI), that would help me a lot to get an assessment.
>
Addendum: an optimisation section would then look like this:
```
optimisation
fuse-algorithmic-steps = on
fuse-algorithmic-steps-factor = 0.99
timestep-batch-factor = 0.0
skip-reduction-in-batched-time-steps = on
disable-amr-if-grid-has-been-stationary-in-previous-iteration = off
double-compression = 0.000001
spawn-double-compression-as-background-thread = off
end optimisation
```
@svenk: Do it.

## Issue #58: Reduce memory footprint
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/58

# Open issues
* We have consecutive heap indices for volume data and face data. We thus need to store one index for each
and can get the others by incrementing the index up to the bound we know of as developers.
We distinguish between cell and face data since we have helper cells that do not allocate cell data but face data.
* We can remove prediction and volumeFlux fields completely if we perform also the time integration
in the volume integral and the boundary extrapolation routines.
This would further make it easier to switch between global and local time stepping.
Here, we would just load a different kernel for the boundary extrapolation and allocate space-time face data
if the user switches local time stepping on.
* Allocate all the temporary arrays like the rhs, lQi_old, etc. only once per thread and not dynamically
  during the kernel calls. **(This could be done easily now in each solver!)** See the sketch after this list.
* Create one "big" ADERDGTimeStep function in kernels/solver. This might help the compiler and is more cache-friendly.
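A minimal sketch of the once-per-thread allocation idea from the list above (the buffer names follow the kernel convention, but the sizing and interface are illustrative, not the actual solver code):

```c++
#include <cstddef>
#include <vector>

// Scratch buffers for the space-time predictor; the sizes are fixed per solver.
struct PredictorScratch {
  std::vector<double> lQi, rhs;
  explicit PredictorScratch(std::size_t spaceTimeDofs)
      : lQi(spaceTimeDofs), rhs(spaceTimeDofs) {}
};

// Each thread gets its own instance, constructed on that thread's first kernel call
// and reused afterwards -- no dynamic allocation inside the kernels anymore.
// (The size argument is only honoured on the first call per thread, which is fine
//  when it is a compile-time constant per solver.)
PredictorScratch& getScratch(std::size_t spaceTimeDofs) {
  static thread_local PredictorScratch scratch(spaceTimeDofs);
  return scratch;
}
```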
# Done
I think the following was originally Tobias' idea:
It is not necessary to store temporary data on the heap for every cell description.
We need to analyse which ADER-DG fields are temporary and which
need to be stored persistently on the heap.
From my point of view, the following variables are temporary:
* spaceTimePredictor
* predictor
* spaceTimeVolumeFlux (includes sources)
* volumeFlux (includes sources)
The spacetime fields have a massive memory footprint.
They scale with (N+1)^{d+1} and d*(N+1)^{d+1}.
I thus propose that we assign each thread its own spaceTimePredictor spaceTimeVolumeFlux, predictor, and volume flux fields
and remove the fields from the heap cell descriptions.
This would reduce the memory footprint of the ADER-DG method dramatically (and might further lead to more cache-friendly code ?).
In a second step, we should kick out the volumeFlux field completely, don't do the time integration of the spaceTimeVolumeFlux,
and directly perform the volume integral with the spaceTimeVolumeFlux.
Implementation details
* Allocate arrays in the Prediction mapping for each thread

## Issue #148: LimitingADERDGSolver also performs limiter status spreading for user refinement
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/148

Potential optimisation:
* LimitingADERDGSolver also performs limiter status spreading for user refinement.
  This can potentially be turned off. Requires an enum return type instead of a bool.

## Issue #57: Finite volumes solver / Limiter
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/57

# Treating NaNs in the update vector
If NaNs occur in the update vector, a rollback
using the update vector is not possible anymore.
We thus need to introduce a "previousSolution" field
to the ADER-DG solution.
We might be able to get rid of the "update" field if we directly
add a weighted (by dt and quad weight) update to the "solution" field
from the volume integral and surface integral evaluation.
# Status of the implementation of the limiting ADER-DG scheme (unordered notes)
* The limiter workflow works now for uniform meshes with Intel's TBBs.
[sod-shock-tube_limiting-aderdg-P_3_Godunov-N_7_dmp-only.avi](/uploads/b71d24c7b82f805c604f31c1b322fcae/sod-shock-tube_limiting-aderdg-P_3_Godunov-N_7_dmp-only.avi)
* Found and fixed some more bugs. Now everything looks fine. The limiter has a local characteristic and only
requires a rollback/reallocation of memory in every fifth time step (49 out of 255) in the experiment
shown [sod-shock-tube_limiting-aderdg-P_3_Godunov-N_7_dmp-only.avi](/uploads/1e065c7c5c2a52aefb551ef0a03f0ac5/sod-shock-tube_limiting-aderdg-P_3_Godunov-N_7_dmp-only.avi), where we ran an experiment with P=3 order ADER-DG approx, \delta_0=1e-2, \epsilon=1e-3.
As long as the limiter domain does not change, we do not need to reallocate new memory and do a
recomputation of certain cells in our novel implementation.
* Find another simulation using a 9-th order ADER-DG approximation and parameters \delta_0=10^-4 , \epsilon=10^-3 here:
[sod-shock-tube_limiting-aderdg-P_9_Godunov-N_19_dmp_pad.avi](/uploads/fb466322f7df5339790aac4afca9ce6c/sod-shock-tube_limiting-aderdg-P_9_Godunov-N_19_dmp_pad.avi)
  ~~We observe a strange x velocity in this benchmark. It can't be seen in the video. The profile of the shock tube differs significantly
  in the vicinity of the contact discontinuity.~~ This is actually the momentum density. Everything is fine.
* Explosion problem with P=5 and delta_0=1e-4 and epsilon=1e-3:
[explosion_limiting-aderdg-P_5_Godunov-N_11_dmp_pad.avi](/uploads/4ace50cdcbfcbc1706432d0cc180fcea/explosion_limiting-aderdg-P_5_Godunov-N_11_dmp_pad.avi). Here the limiter domain must be adjusted after most of the time steps
* I performed some further optimisations of the DMP evaluation. I found that we can easily perform a
"loop" fusion of the min and max computation and the DMP.
* Implemented the physical admissibility detection (PAD) now as well and use it to
detect non-physical oscillations in the initial conditions. Here, I found that we can directly
pass it the solution min and max we computed as part of the DMP calculation. No need
to loop over the whole degrees of freedom and the Lobatto nodes again.
* ~~There is still an issue with the discrete maximum principle which detects too many
cells to be troubled.~~ Was resolved by tuning the DMP parameters.
* AMR and Limiting need to be combined. This is in principle "a simple" wiring of FV patches
along the ADER-DG tree. We further need spatial averaging and interpolation operators.
Then, we can follow the AMR timestepping implementation of the ADER-DG scheme.
# Writing a Godunov type first order method
Due to the issues I encountered while [rewriting](#issueswithrewritingthefinitevolumessolver) the
pseudo-TVD donor cell type finite volumes solver, Tobias and I decided
to write a simple first order Godunov type finite volumes solver that only exchanges volume averages/fluxes
between direct (face) neighbours.
Replacement of this simple method by a more complex one will be tackled in later stages of the project -- if necessary.
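For illustration, a minimal sketch of such a first-order Godunov-type update in 1D with a Rusanov flux (Burgers' equation stands in for the actual PDE; the ghost values correspond to the volume averages exchanged with the direct face neighbours; this is not the ExaHyPE kernel):

```c++
#include <algorithm>
#include <cmath>
#include <vector>

double flux(double q)          { return 0.5 * q * q; }   // e.g. Burgers' equation
double maxEigenvalue(double q) { return std::fabs(q); }

// Rusanov (local Lax-Friedrichs) flux between two neighbouring volume averages.
double rusanovFlux(double qL, double qR) {
  const double smax = std::max(maxEigenvalue(qL), maxEigenvalue(qR));
  return 0.5 * (flux(qL) + flux(qR)) - 0.5 * smax * (qR - qL);
}

// One explicit Euler step on a patch; qLeft/qRight are the neighbours' averages.
void update(std::vector<double>& q, double qLeft, double qRight, double dt, double dx) {
  std::vector<double> qNew(q);
  for (std::size_t i = 0; i < q.size(); ++i) {
    const double qm = (i == 0)            ? qLeft  : q[i - 1];
    const double qp = (i == q.size() - 1) ? qRight : q[i + 1];
    qNew[i] = q[i] - dt / dx * (rusanovFlux(q[i], qp) - rusanovFlux(qm, q[i]));
  }
  q.swap(qNew);
}
```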
# Issues with higher order finite volumes solvers
* Higher-order methods have a reconstruction stencil which is larger than the
Riemann solver stencil (simple star).
* Reconstruction and time evolution of boundary extrapolated values at one face of a patch can only be performed
after all volume averages from the direct neighbour and corner neighbour patches are available.
I have identified the following phases of the FVM solver now:
* Gather: Just get the neighbour values (arithmetic intensity = 0). This will move into the Merging mapping.
* Spatial reconstruction: Compute spatial slopes in the cells or something similar. This will move into the SolutionUpdate mapping.
* Temporal evolution of boundary-extrapolated values: This will move into the SolutionUpdate mapping.
* Riemann solve: This will move into the SolutionUpdate mapping.
* Solution update; This will move into SolutionUpdate mapping.
* Extrapolation of volume averages: This will send layers of the volume averages to the neighbours.
We send layers instead of a single layer in order to only have one data exchange instead of two.
With the above strategy, we can merge FVM solvers into the current framework. The difference to the ADER-DG solver
is that the computational intensity of the neighbour merging operation is zero.
We will further need to consider neighbour data exchange over edges (3-d) and corners in our
merging scheme. Neighbour data exchange over edges in 3-d will require additional flags.
Exchange over corners does not require additional flags.
# Issues with rewriting the finite volumes solver
I tried to decompose our pseudo-TVD donor cell type finite volumes
solver into a Riemann solve and a solution update part,
and ran into the following issues.
## Open issues with the donor cell type pseudo-TVD finite volumes solver implementation:
* The current solver uses updated solution values in the solution update.
We have to distinguish between old values and new values or
have to introduce an update.
* We need to consider two layers of the neighbour to compute all extrapolated boundary
values (wLx,wRx,wLy,wRy). Currently only one layer is considered. This means
that the boundary extrapolated values and thus the face fluxes at the outermost faces
are computed wrongly.
* To tackle the above problem, we either need to exchange a single cell of
the diagonal neighbours or we need to rely on two data exchanges.
Tobias told me there exists however a trick to circumvent this:
Reconstructing the diagonal neighbours' contributions with values from
the direct neighbours.
## Limitations of the donor cell type pseudo-TVD finite volumes solver:
* The solver is not TVD in multiple dimensions.
* We do neither perform corner correction nor (dimensional) operator splitting. We should
thus observe loss of mass conservation, large dispersion errors, and a reduced CFL stability limit.
The reduced CFL stability limit was taken into account in our implementation.
This is also a problem of the multi-dim. Godunov method in the unsplit form.
## Follow up: Low-order Euler-DG patch with direct coupling to ADER-DG method?
## Dissemination
For the following benchmarks I always used copy boundary conditions which are equal to
outflow/inflow boundary conditions as long as everything flows out of the domain.
* Euler (Compressible Hydrodynamics)
* Explosion Problem with 27^2 grid, P=3, and N=2*3+1 FV subgrid:
![lim-aderdg_explosion](/uploads/6459f9b4a775a482cdfec24e129c56ae/lim-aderdg_explosion.png)
[lim-aderdg_explosion.avi](/uploads/6a38fa00a6bf41fe3aefbee380a03396/lim-aderdg_explosion.avi)
* Sod Shock Tube with 27^2 grid, P=3 and N=2*3+1 FV subgrid:
![lim-aderdg_euler_sod-shock-tube](/uploads/eb2a5e57ca8f37665c2dffe48dbf1e22/lim-aderdg_euler_sod-shock-tube.png)
[lim-aderdg_euler_sod-shock-tube.avi](/uploads/61dfedf7db4ead996d7fd6b9796ba5c0/lim-aderdg_euler_sod-shock-tube.avi)
![hires_antialias](/uploads/26843a0f50e52ded919cfc12b777360b/draft2_hires_antialias.png)
* SRMHD (Special Relativistic Magnetohydrodynamics; basically: Special Relativistic Euler+Maxwell)
* Blast Wave setup as in 10.1016/j.cpc.2014.03.018 with 27^2 grid, P=5, and N=2*5+1 FV subgrid:
![lim-aderg_mhd__blast-wave](/uploads/feb026167c7c2d882c132f3b775c37ac/lim-aderg_mhd__blast-wave.png)
[lim-aderdg_mhd_blast-wave.avi](/uploads/4d715c47a03bd3850d3c0398e39da58c/lim-aderdg_mhd_blast-wave.avi)
* Rotor setup as in http://adsabs.harvard.edu/abs/2004MSAIS...4...36D with 9^2 grid, P=9, and N=2*9+1 FV subgrid:
![lim-aderdg_mhd_rotor_9x9_P9](/uploads/b35e7dc2dc137068e7416a1952d02a1a/lim-aderdg_mhd_rotor_9x9_P9.png)
[lim-aderdg_mhd_rotor_9x9_P9.avi](/uploads/c5ee8ff29f18a7201be66a1a26b664dc/lim-aderdg_mhd_rotor_9x9_P9.avi)

## Issue #37: Precision issues
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/37

I observed that we can only apply the restriction and prolongation operations
a certain number of times until the projected constant values in my test are not equal anymore
to the reference values (with respect to a tolerance of 1e-12).
For more information see the TODOs in the 2D and 3D definitions of
GenericEulerKernelTest::testFaceUnknownsProjection(),
and
GenericEulerKernelTest::testVolumeUnknownsProjection().

## Issue #143: New template-less pure virtual API
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/143

TODO: Implement this new API.
![usersolver_layout](/uploads/b314b4b3a6196966cec01c80918fb0c9/usersolver_layout.png)
Tobias favours it. I don't know how it goes with the optimized kernels. Small patch for the picture from Tobias:
```
/* NEW: */
kernels::aderdg::generic::c::spaceTimePredictorLinear(BasisSolverAPI& solver, other parameters ... );
```
In textual form: [Sketch.h](/uploads/8bcfc8c2920e7532f73e1dac5f8f5a07/Sketch.h)
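A rough sketch of the direction (the callback names below are illustrative assumptions, not taken from Sketch.h or the actual engine):

```c++
// Sketch only: a template-less, pure virtual solver API that the generic kernels
// call back into, instead of being templated on a concrete solver class.
class BasisSolverAPI {
public:
  virtual ~BasisSolverAPI() = default;
  virtual void flux(const double* Q, double** F) = 0;
  virtual void eigenvalues(const double* Q, int direction, double* lambda) = 0;
  virtual void algebraicSource(const double* Q, double* S) = 0;
};

namespace kernels { namespace aderdg { namespace generic { namespace c {
  // The kernel receives the solver by reference, as in the patch above.
  void spaceTimePredictorLinear(BasisSolverAPI& solver /*, other parameters ... */);
}}}}
```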

## Issue #142: ADERDG, inverseDX
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/142

* Problem: most kernels use 1/dx instead of dx. This is a slow operation that could easily be optimized away.
* Solution:
  - Peano implements a cellDescription.getInverseSize()
  - the ADERDGSolver uses it
  - the ADER-DG kernels are adapted
* Already done:
- Optimized kernel adapted, generate the inverseDx in the ADERDGSolver (code isolated with preprocessor)
@di25cox

## Issue #140: Intel compiler bug on certain large C++ files ("exahype/repositories/Repository...")
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/140

With Intel 17 (Intel 15/16 apparently not affected), i.e. `COMPILER=Intel, SHAREDMEM=TBB, MODE=Debug, DISTRIBUTEDMEM=None`, there is a bug when compiling files like
./ExaHyPE/exahype/repositories/RepositoryExplicitGridTemplateInstantiation4LimiterStatusMergingAndSpreadingMPI.o
and
./ExaHyPE/exahype/repositories/RepositoryExplicitGridTemplateInstantiation4GridErasing.o
and similar. This is known to Tobias and should be reported to Nicolay Hammer at LRZ or similar.

## Issue #136: 2D / 3D build check is not performed correctly
https://gitlab.lrz.de/exahype/ExaHyPE-Engine/-/issues/136

We have a startup check in the code that makes sure that the specfile has the same `const` build constants as, for instance, the ADER-DG polynomial order. However, this check fails to verify that, for instance, a 2D build is also run with a 2D grid specification. Thus we get strange errors as in https://gitlab.lrz.de/exahype/ExaHyPE-Engine/commit/10b1261b34e2e239f9a4fe1e2c42f0e38d85a207 :
```
0.0459649 error Invalid simulation end-time: notoken
(file:/home/koeppel/numrel/exahype/Engine-ExaHyPE/./ExaHyPE/exahype/Parser.cpp,line:377)
```
And this happens not at startup but only after one time step.
There should instead be a check with a clear error message right at the beginning.
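A minimal sketch of such a startup check (assuming the usual `DIMENSIONS` compile-time constant; the function and the way the specfile dimension is obtained are illustrative, not the actual `Parser.cpp` code):

```c++
#include <cstdlib>
#include <iostream>

#ifndef DIMENSIONS
#define DIMENSIONS 2   // normally set by the Peano/ExaHyPE build
#endif

// Abort with a descriptive message if the compiled dimension and the specfile disagree.
void checkDimensionsAgainstSpecfile(int specfileDimensions) {
  if (specfileDimensions != DIMENSIONS) {
    std::cerr << "This executable was built for " << DIMENSIONS
              << "D, but the specfile describes a " << specfileDimensions
              << "D domain. Rebuild with the matching dimension or fix the specfile."
              << std::endl;
    std::exit(EXIT_FAILURE);
  }
}
```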