Faster mesh refinement

Problem 1 (Costly operation is performed while no parallelism is available)

Peano's shared-memory parallelism is based on identifying regular subgrids in the tree. If a new cell is introduced to the tree, it might not yet be identified as part of a regular subgrid, and the operations performed on the cell are thus not parallelised.

Solution: If imposing initial conditions or evaluating a refinement criterion is too expensive, we could let the user choose to perform it as a background task. This might lead to more mesh setup iterations, but potentially to a better exploitation of the available cores. It should also help to hide MPI communication during the mesh setup.
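
The following is a minimal sketch of the background-task idea, written in Python for brevity rather than in Peano's C++. All names (evaluate_refinement_criterion, setup_iteration, run_setup, the cell dictionaries) are hypothetical and not part of Peano or ExaHyPE, and a thread pool merely stands in for a background task system. A sweep never evaluates the criterion inline: it spawns a job for each new cell and harvests results in later sweeps, which is why the number of sweeps grows.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_refinement_criterion(cell):
    # Placeholder for an expensive user criterion (assumption):
    # refine cells whose value exceeds a threshold.
    return cell["value"] > 0.5

def setup_iteration(cells, pending, pool):
    """One mesh setup sweep: spawn jobs for new cells, harvest finished ones."""
    refined = []
    for cell in cells:
        cid = cell["id"]
        if cell["state"] == "new":
            # Do not evaluate inline; hand the work to a background task.
            pending[cid] = pool.submit(evaluate_refinement_criterion, cell)
            cell["state"] = "pending"
        elif cell["state"] == "pending" and pending[cid].done():
            if pending.pop(cid).result():
                refined.append(cid)
            cell["state"] = "done"
    return refined

def run_setup(cells, workers=4):
    """Repeat sweeps until every cell has a refinement decision."""
    pending, refined, sweeps = {}, [], 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while any(c["state"] != "done" for c in cells):
            refined += setup_iteration(cells, pending, pool)
            sweeps += 1
    return sorted(refined), sweeps
```

In this sketch a cell needs at least two sweeps before its decision is known, which mirrors the trade-off stated above: more iterations in exchange for work that can run on otherwise idle cores.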

Problem 2 (Overall concurrency)

ExaHyPE's regular shared-memory parallelism during the mesh setup is currently further limited as multiple cells might write to the same vertex in order to set refinement events.

Solution: We should be able to solve this by inverting the control. Instead of the cells writing to the vertex, the vertex would check in touchVertexLastTime whether any adjacent cell has set a refinement event, and refine if that is the case. This would increase the concurrency of the enterCell operations.
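
The inversion can be sketched as follows, in Python rather than C++. The classes Cell and Vertex and the adjacent_cells list are assumptions made for illustration only; how a vertex would actually get at the data of its adjacent cells in Peano is left open here. The function names only mirror the events enterCell and touchVertexLastTime mentioned above.

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    wants_refinement: bool = False

@dataclass
class Vertex:
    adjacent_cells: list = field(default_factory=list)
    refine: bool = False

def enter_cell(cell, criterion_says_refine):
    # The cell writes only to its own data, so enter_cell calls on
    # different cells never write to the same memory location.
    cell.wants_refinement = criterion_says_refine

def touch_vertex_last_time(vertex):
    # Inverted control: the vertex reads the flags of its adjacent cells
    # instead of the cells writing a refinement event to the vertex.
    vertex.refine = any(c.wants_refinement for c in vertex.adjacent_cells)
```

With this layout the only shared writes disappear from the cell event; the vertex event performs reads on cells and a single write to its own flag.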

Problem 3 (Memory)

ExaHyPE's initial mesh setup is started by a single rank, and more and more ranks are added gradually. To prevent any rank from running out of memory during the initial mesh setup, it might make sense to allocate memory only temporarily, impose initial conditions, evaluate the refinement criterion, and then free the memory again (better: recycle it). After the initial mesh setup, we would then allocate memory on all ranks and impose the initial conditions again.
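
A rough sketch of this two-phase scheme, again in Python and with purely hypothetical names (initial_condition, refinement_criterion, setup_pass, final_allocation) and a made-up criterion. During the setup pass one scratch buffer is recycled for every cell and only the refinement decision is kept; persistent storage is allocated once the mesh is fixed.

```python
def initial_condition(cell_id, buffer):
    # Hypothetical initial condition: fill the given buffer in place.
    for i in range(len(buffer)):
        buffer[i] = cell_id + 0.1 * i

def refinement_criterion(buffer):
    # Hypothetical criterion: refine where the largest value exceeds 2.
    return max(buffer) > 2.0

def setup_pass(cell_ids, dofs_per_cell):
    """Mesh setup phase: evaluate the criterion without keeping cell data."""
    scratch = [0.0] * dofs_per_cell  # allocated once, recycled for each cell
    decisions = {}
    for cid in cell_ids:
        initial_condition(cid, scratch)
        decisions[cid] = refinement_criterion(scratch)
    return decisions  # no per-cell solution data survives this pass

def final_allocation(cell_ids, dofs_per_cell):
    """After the mesh is fixed: persistent storage plus initial conditions."""
    data = {}
    for cid in cell_ids:
        data[cid] = [0.0] * dofs_per_cell
        initial_condition(cid, data[cid])
    return data
```

The price is that the initial conditions are imposed twice, once for the refinement decision and once for the final data.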

Problem 4 (Load Balancing)

The load balancing currently counts only the number of cells. It does not take the different cell types in ExaHyPE's grid into account. The helper cell types Descendant and Ancestor have far less work to do than the compute cells of type Cell.
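
One way to account for the cell types would be a weighted cell count. The sketch below (Python, for illustration) uses the type names Cell, Descendant and Ancestor from above; the weights themselves are placeholders and would have to be measured, and weighted_load is a hypothetical helper, not an existing load-balancing interface.

```python
# Relative cost per cell type. The numbers are placeholders (assumption),
# not measured values.
WEIGHTS = {"Cell": 1.0, "Descendant": 0.25, "Ancestor": 0.25}

def weighted_load(cell_types, weights=WEIGHTS):
    """Sum the per-type weights instead of counting cells."""
    return sum(weights[t] for t in cell_types)

# Two ranks with the same number of cells but different work.
rank_a = ["Cell"] * 4
rank_b = ["Cell"] * 2 + ["Descendant", "Ancestor"]
```

Both ranks hold four cells, so a plain cell count considers them balanced, while the weighted load of rank_b is lower because half of its cells are helper cells.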