# ExaHyPE Convergence Studies framework
At [Misc/ConvergenceAnalysis](https://gitlab.lrz.de/exahype/ExaHyPE-Engine/tree/master/Miscellaneous/ConvergenceAnalysis) there is some Python code to manage convergence studies. This wiki page documents how to use it. It is helpful to read this page before you start using the scripts, just to get an idea of the overall setup.
## Overview about the simulation starter script (Python)
The starter script allows you to start (or submit, if you use a queueing system in between) a number of different ExaHyPE simulations in parallel. It also sets up the simulation directories for you by means of an intermediate _runner_ script, [RunScripts/runTemplatedSpecfile.sh](https://gitlab.lrz.de/exahype/ExaHyPE-Engine/blob/master/Miscellaneous/RunScripts/runTemplatedSpecfile.sh). This shell script does the specfile templating and calls the binary.

If you have a look into the `runShuVortex.py` program, you see that it basically computes a number of parameter combinations and then sets lots of environment variables. All of them are passed to `runTemplatedSpecfile.sh`. One of them is the environment variable `QRUN`, which can be used to insert a batch queue command such as `srun` (SLURM) or `llrun` (LoadLeveler) to distribute the convergence tests on a cluster. This is convenient as soon as your tests get large. Otherwise, the runs are just started as individual processes on the current system. Either way, `runShuVortex.py` uses `subprocess.Popen` to run all programs in the background, so all parameter combinations run at the same time.
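To make the pattern described above more concrete, here is a minimal sketch of how parameter combinations can be turned into environments and started in the background with `subprocess.Popen`. It is not the actual content of `runShuVortex.py`: the variable names `EXAHYPE_ORDER` and `EXAHYPE_MESHSIZE` are made up for illustration (only `QRUN` is taken from the text), and a trivial Python one-liner stands in for `runTemplatedSpecfile.sh` so that the sketch runs anywhere.

```python
import itertools
import os
import subprocess
import sys


def build_environments(orders, meshsizes, qrun=""):
    """Return one environment dict per (order, meshsize) combination."""
    envs = []
    for p, h in itertools.product(orders, meshsizes):
        env = dict(os.environ)
        # Hypothetical variable names; the real script may use different ones.
        env["EXAHYPE_ORDER"] = str(p)
        env["EXAHYPE_MESHSIZE"] = repr(h)
        # QRUN is mentioned in the text: e.g. "srun" on SLURM, empty to run locally.
        env["QRUN"] = qrun
        envs.append(env)
    return envs


def launch_all(command, envs):
    """Start one background process per environment and return the handles."""
    return [subprocess.Popen(command, env=env) for env in envs]


if __name__ == "__main__":
    envs = build_environments(orders=[2, 3], meshsizes=[0.3, 0.1])
    # Stand-in for the runner script: just print the order it was given.
    runner = [sys.executable, "-c",
              "import os; print(os.environ['EXAHYPE_ORDER'])"]
    procs = launch_all(runner, envs)
    # All runs execute concurrently; wait() only collects the exit codes.
    print([p.wait() for p in procs])
```

Because every process is started before any of them is waited for, the number of simultaneous runs equals the number of parameter combinations, which is why a queueing command in `QRUN` becomes attractive for larger studies.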
## Compiling for all polynomial orders
The infrastructure presented here does not compile ExaHyPE for you, and in particular not for different ADER-DG polynomial orders. While you can use the same binary to run different mesh resolutions, the polynomial order is unfortunately a compile-time constant, so you need as many binaries as there are polynomial orders you want to test.
## Watching progress
The convenient [showSimulationProgress.sh](https://gitlab.lrz.de/exahype/ExaHyPE-Engine/blob/master/Miscellaneous/ConvergenceAnalysis/showSimulationProgress.sh) script lets you see what your simulations are doing. For instance, I call
```
$ export SIMBASE="AlfenWave3D/simulations/"
$ ./showSimulationProgress.sh
SimulationName NumReductionFiles EachRedLength NumLinesLog FinishedStatus LastTimeStep Walltime
p2-meshsize0.00452674897119 28 0 42 FAILED None started 77.92system
p2-meshsize0.0135802469136 28 3 756 FAILED step 136 t_min =0.114497
p2-meshsize0.0407407407407 28 17 3506 FAILED step 676 t_min =1.55366
p2-meshsize0.122222222222 28 105 8440 FAILED step 1656 t_min =10.3281
p2-meshsize0.366666666667 28 121 2712 FINISHED step 534 t_min =12.0001 0.17system
p3-meshsize0.00452674897119 28 0 42 FAILED None started 210.19system
p3-meshsize0.0135802469136 28 2 271 FAILED step 40 t_min =0.0198056
p3-meshsize0.0407407407407 28 14 4558 FAILED step 878 t_min =1.27636
p3-meshsize0.122222222222 28 121 15937 FINISHED step 3012 t_min =12.0004 11.08system
p3-meshsize0.366666666667 28 121 4558 FINISHED step 902 t_min =12.0098 0.25system
p4-meshsize0.00452674897119 28 0 42 FAILED None started 63.94system
p4-meshsize0.0135802469136 28 0 42 FAILED None started 181.59system
p4-meshsize0.0407407407407 28 7 2611 FAILED step 504 t_min =0.516747
p4-meshsize0.122222222222 28 50 9037 FAILED step 1745 t_min =4.80483
p4-meshsize0.366666666667 28 121 6587 FINISHED step 1302 t_min =12.0037 0.34system
p5-meshsize0.00452674897119 28 0 42 FAILED None started 30.15system
p5-meshsize0.0135802469136 28 0 42 FAILED None started 27.11system
p5-meshsize0.0407407407407 28 3 1399 FAILED step 263 t_min =0.175834
p5-meshsize0.122222222222 28 94 23346 FAILED step 4600 t_min =9.21398
p5-meshsize0.366666666667 28 121 10153 FINISHED step 1997 t_min =12.0012 8.21system
p6-meshsize0.0135802469136 28 0 42 FAILED None started 98.98system
p6-meshsize0.0407407407407 28 2 651 FAILED step 115 t_min =0.0649305
p6-meshsize0.122222222222 28 27 8369 FAILED step 1593 t_min =2.55488
p6-meshsize0.366666666667 28 121 12014 FINISHED step 2366 t_min =12.0005 17.31system
p7-meshsize0.0135802469136 28 0 42 FAILED None started 43.14system
p7-meshsize0.0407407407407 28 2 336 FAILED step 53 t_min =0.0236267
p7-meshsize0.122222222222 28 15 5219 FAILED step 1007 t_min =1.32551
p7-meshsize0.366666666667 28 121 16017 FINISHED step 3078 t_min =12.001 14.39system
p8-meshsize0.0407407407407 28 2 159 FAILED step 18 t_min =0.00534956
p8-meshsize0.122222222222 28 11 5735 FAILED step 1122 t_min =1.0002
p8-meshsize0.366666666667 28 121 22687 FINISHED step 4489 t_min =12.0011 15.88system
p9-meshsize0.0407407407407 28 0 64 FAILED None started
p9-meshsize0.122222222222 28 7 4298 FAILED step 838 t_min =0.560361
p9-meshsize0.366666666667 28 121 30277 FINISHED step 5986 t_min =12.0016 37.42system
```
The overview tells me that no test is running any more. It also gives me a CSV table (separated by `\t`) which is suitable for further processing.
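As an illustration of such further processing, the following sketch reads a tab-separated overview with Python's `csv` module and lists the simulations with a given status. It assumes that the real output has a header line and exactly one tab between fields; the sample text below is a hand-made excerpt with only five of the columns, not actual script output.

```python
import csv
import io


def read_progress_table(text):
    """Parse the tab-separated overview into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def names_with_status(rows, status):
    """Return the simulation names whose FinishedStatus equals `status`."""
    return [row["SimulationName"] for row in rows
            if row["FinishedStatus"] == status]


# Hand-made sample with a subset of the columns shown in the listing above.
sample = (
    "SimulationName\tNumReductionFiles\tEachRedLength\tNumLinesLog\tFinishedStatus\n"
    "p2-meshsize0.122222222222\t28\t105\t8440\tFAILED\n"
    "p2-meshsize0.366666666667\t28\t121\t2712\tFINISHED\n"
)

rows = read_progress_table(sample)
print(names_with_status(rows, "FINISHED"))  # prints ['p2-meshsize0.366666666667']
```

In practice you would pass the content of a saved overview file to `read_progress_table` instead of the inline sample.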
## Generating a report
After or during your runs, you can use the [finish-convergence-table.py](https://gitlab.lrz.de/exahype/ExaHyPE-Engine/blob/master/Miscellaneous/ConvergenceAnalysis/reporting/finish-convergence-table.py) command to generate convergence tables from your simulations. Note that it currently uses the `SIMBASE` environment variable to determine where your simulations are stored. You will probably have to play around with this a bit.

As always, dealing with the paths is quite messy. What I typically do is softlink everything together, i.e. in the folder `misc/ConvergenceAnalysis/`:
```
AlfenWave3D/
├── MHD_AlfenWave3DConvergence.exahype
├── reporting -> ../reporting/
├── runAlfenWave3D.py
├── simulations -> /path/to/a/large/scratch/disk/exahype/simulations/MHDCPP_AlfenWave3D_LegendreConvergence
└── simulations.txt
```
`simulations.txt` is initially generated by `../showSimulationProgress.sh | tee simulations.txt`. You can delete lines by hand to hide them from the reporting tool. As you see, I also keep the actual simulations on a scratch disk as they can get quite large (50 GB here).

In such a setup, I just call `SIMBASE="simulations/" ./reporting/finish-convergence-table.py` and get output similar to
```
Using all p orders
Applying prefix 'simulations/' on each simulation name
Will work with 34 simulations:
0. simulations/p2-meshsize0.00452674897119
1. simulations/p2-meshsize0.0135802469136
2. simulations/p2-meshsize0.0407407407407
3. simulations/p2-meshsize0.122222222222
4. simulations/p2-meshsize0.366666666667
5. simulations/p3-meshsize0.00452674897119
6. simulations/p3-meshsize0.0135802469136
7. simulations/p3-meshsize0.0407407407407
8. simulations/p3-meshsize0.122222222222
9. simulations/p3-meshsize0.366666666667
10. simulations/p4-meshsize0.00452674897119
11. simulations/p4-meshsize0.0135802469136
12. simulations/p4-meshsize0.0407407407407
13. simulations/p4-meshsize0.122222222222
14. simulations/p4-meshsize0.366666666667
15. simulations/p5-meshsize0.00452674897119
16. simulations/p5-meshsize0.0135802469136
17. simulations/p5-meshsize0.0407407407407
18. simulations/p5-meshsize0.122222222222
19. simulations/p5-meshsize0.366666666667
20. simulations/p6-meshsize0.0135802469136
21. simulations/p6-meshsize0.0407407407407
22. simulations/p6-meshsize0.122222222222
23. simulations/p6-meshsize0.366666666667
24. simulations/p7-meshsize0.0135802469136
25. simulations/p7-meshsize0.0407407407407
26. simulations/p7-meshsize0.122222222222
27. simulations/p7-meshsize0.366666666667
28. simulations/p8-meshsize0.0407407407407
29. simulations/p8-meshsize0.122222222222
30. simulations/p8-meshsize0.366666666667
31. simulations/p9-meshsize0.0407407407407
32. simulations/p9-meshsize0.122222222222
33. simulations/p9-meshsize0.366666666667
For the following number of cells, we have this number of simlations:
nCells
3 120
9 240
27 240
81 240
243 240
dtype: int64
So we can compare up to 120 simulations for convergence analysis
Did not find a target row for rowindex=960 and row[nSmaller]=3.0
Did not find a target row for rowindex=961 and row[nSmaller]=3.0
....
Did not find a target row for rowindex=1078 and row[nSmaller]=3.0
Did not find a target row for rowindex=1079 and row[nSmaller]=3.0
Computing convergence tables...
Comptuted this convergence table for the individual reductions
(as l1norm, infnorm=max, etc.)
pOrder nCells nSmaller plotindex time l1norm l2norm max ol1norm ol2norm omax
0 2 243 81 1 0.000000 5.935810e-15 6.859539e-15 1.949363e-17 5.563183 5.445540 1.324645
1 2 243 81 2 0.113159 8.769101e-04 1.515260e-03 7.498691e-06 -4.307702 -4.065516 -7.543604
2 2 243 81 3 0.203506 1.260693e-03 2.129472e-03 1.158534e-05 -3.502515 -3.485054 -7.113232
3 2 243 81 4 0.316721 1.729339e-03 2.717190e-03 1.548186e-05 -3.054514 -3.113887 -6.979140
4 2 243 81 5 0.407395 2.135272e-03 3.205626e-03 1.624925e-05 -2.797949 -2.888795 -6.714172
5 2 243 81 6 0.520524 2.600140e-03 3.690549e-03 1.571305e-05 -2.652873 -2.733840 -6.348429
6 2 243 81 7 0.610364 2.904012e-03 3.952305e-03 1.793602e-05 -2.528211 -2.586664 -6.191396
7 2 243 81 8 0.700212 3.128929e-03 4.159902e-03 2.009075e-05 -2.423108 -2.460692 -6.037988
8 2 243 81 9 0.812600 3.285496e-03 4.334478e-03 2.152800e-05 -2.337630 -2.364027 -5.916524
9 2 243 81 10 0.902466 3.334288e-03 4.440350e-03 2.171277e-05 -2.245383 -2.271061 -5.762643
10 2 243 81 11 1.015415 3.352039e-03 4.554054e-03 2.079752e-05 -2.167977 -2.200194 -5.584308
11 2 243 81 12 1.106436 3.426979e-03 4.668185e-03 2.007723e-05 -2.123705 -2.145614 -5.451924
12 2 243 81 13 1.219761 3.638330e-03 4.867186e-03 2.041572e-05 -2.122487 -2.116238 -5.385107
13 2 243 81 14 1.310206 3.850491e-03 5.088447e-03 2.387204e-05 -2.123862 -2.096613 -5.474315
14 2 243 81 15 1.400829 4.108181e-03 5.362513e-03 2.634123e-05 -2.136524 -2.090681 -5.476362
15 2 243 81 16 1.512964 4.468311e-03 5.763021e-03 2.768166e-05 -2.174506 -2.111839 -5.445373
16 2 243 81 17 1.602857 4.749672e-03 6.089651e-03 2.892119e-05 -2.194068 -2.120132 -5.411790
17 2 243 81 18 1.715849 5.062576e-03 6.487604e-03 2.768224e-05 -2.224052 -2.143836 -5.318092
18 2 243 81 19 1.806547 5.316989e-03 6.777938e-03 3.313686e-05 -2.251057 -2.159539 -5.445738
19 2 243 81 20 1.919804 5.606125e-03 7.073194e-03 3.913272e-05 -2.292125 -2.184210 -5.583435
20 2 243 81 21 2.009998 5.793040e-03 7.266357e-03 4.203198e-05 -2.323061 -2.202253 -5.671021
21 2 243 81 22 2.100461 5.974524e-03 7.428400e-03 4.284762e-05 -2.354664 -2.218698 -5.668471
22 2 243 81 23 2.213446 6.145398e-03 7.625884e-03 4.075708e-05 -2.373849 -2.233139 -5.549591
23 2 243 81 24 2.304054 6.198621e-03 7.806177e-03 4.405571e-05 -2.366718 -2.238317 -5.588768
24 2 243 81 25 2.417349 6.312932e-03 8.083243e-03 4.480552e-05 -2.361322 -2.253834 -5.552686
25 2 243 81 26 2.507843 6.476530e-03 8.332133e-03 4.282694e-05 -2.376002 -2.274678 -5.518626
26 2 243 81 27 2.620821 6.799000e-03 8.668525e-03 4.293392e-05 -2.356810 -2.276919 -5.548072
27 2 243 81 28 2.711360 7.051210e-03 8.937031e-03 4.496222e-05 -2.392513 -2.309219 -5.684857
28 2 243 81 29 2.801547 7.263387e-03 9.193089e-03 4.557691e-05 -2.063632 -1.996257 -5.619965
29 2 243 81 30 2.913845 7.454898e-03 9.480219e-03 4.806180e-05 -2.057752 -1.973569 -5.296870
... ... ... ... ... ... ... ... ... ... ... ...
1050 3 9 3 91 9.000255 3.409400e-04 4.843725e-04 1.323807e-07 0.000000 0.000000 0.000000
1051 3 9 3 92 9.100516 3.425519e-04 4.866032e-04 1.315677e-07 0.000000 0.000000 0.000000
1052 3 9 3 93 9.200773 3.402395e-04 4.862375e-04 1.173909e-07 0.000000 0.000000 0.000000
1053 3 9 3 94 9.301009 3.366218e-04 4.839696e-04 1.148353e-07 0.000000 0.000000 0.000000
1054 3 9 3 95 9.401256 3.343641e-04 4.815920e-04 1.253258e-07 0.000000 0.000000 0.000000
1055 3 9 3 96 9.501481 3.355014e-04 4.809585e-04 1.261497e-07 0.000000 0.000000 0.000000
1056 3 9 3 97 9.601758 3.398562e-04 4.839315e-04 1.259638e-07 0.000000 0.000000 0.000000
1057 3 9 3 98 9.701991 3.432496e-04 4.866591e-04 1.287844e-07 0.000000 0.000000 0.000000
1058 3 9 3 99 9.800237 3.432600e-04 4.881150e-04 1.252259e-07 0.000000 0.000000 0.000000
1059 3 9 3 100 9.900453 3.397293e-04 4.860228e-04 1.069149e-07 0.000000 0.000000 0.000000
1060 3 9 3 101 10.000690 3.371396e-04 4.842733e-04 1.143260e-07 0.000000 0.000000 0.000000
1061 3 9 3 102 10.100930 3.356367e-04 4.820951e-04 1.262916e-07 0.000000 0.000000 0.000000
1062 3 9 3 103 10.201200 3.386493e-04 4.835371e-04 1.298948e-07 0.000000 0.000000 0.000000
1063 3 9 3 104 10.301430 3.426307e-04 4.861500e-04 1.317317e-07 0.000000 0.000000 0.000000
1064 3 9 3 105 10.401690 3.443840e-04 4.883702e-04 1.315583e-07 0.000000 0.000000 0.000000
1065 3 9 3 106 10.501950 3.423637e-04 4.882526e-04 1.180119e-07 0.000000 0.000000 0.000000
1066 3 9 3 107 10.600170 3.387176e-04 4.860060e-04 1.145525e-07 0.000000 0.000000 0.000000
1067 3 9 3 108 10.700420 3.365541e-04 4.840547e-04 1.247856e-07 0.000000 0.000000 0.000000
1068 3 9 3 109 10.800640 3.371060e-04 4.830214e-04 1.259031e-07 0.000000 0.000000 0.000000
1069 3 9 3 110 10.900930 3.411756e-04 4.856226e-04 1.273240e-07 0.000000 0.000000 0.000000
1070 3 9 3 111 11.001140 3.445510e-04 4.880775e-04 1.304516e-07 0.000000 0.000000 0.000000
1071 3 9 3 112 11.101400 3.443442e-04 4.892281e-04 1.272134e-07 0.000000 0.000000 0.000000
1072 3 9 3 113 11.201620 3.408293e-04 4.872521e-04 1.093249e-07 0.000000 0.000000 0.000000
1073 3 9 3 114 11.301860 3.379666e-04 4.854567e-04 1.150771e-07 0.000000 0.000000 0.000000
1074 3 9 3 115 11.400090 3.364830e-04 4.834092e-04 1.271547e-07 0.000000 0.000000 0.000000
1075 3 9 3 116 11.500350 3.392393e-04 4.845533e-04 1.300543e-07 0.000000 0.000000 0.000000
1076 3 9 3 117 11.600600 3.432829e-04 4.871475e-04 1.314490e-07 0.000000 0.000000 0.000000
1077 3 9 3 118 11.700870 3.451572e-04 4.890965e-04 1.317974e-07 0.000000 0.000000 0.000000
1078 3 9 3 119 11.801130 3.434541e-04 4.892414e-04 1.190691e-07 0.000000 0.000000 0.000000
1079 3 9 3 120 11.901330 3.394892e-04 4.867112e-04 1.138752e-07 0.000000 0.000000 0.000000

[1080 rows x 11 columns]
Doing plots
Plots are for porders= [2, 3, 4, 5, 6, 7, 8, 9] and ncells= [3, 9, 27, 81, 242]

Wrote report to simulations/generated-report.html.
Wrote report to simulations/evolution.html.
Convergence test results:
Convergence factor is 11.45
Convergence test is FAILED
```
The script generated two HTML files, which you can also open locally in your web browser by copying the full path into the address bar, e.g. `file:///home/yourname/exahype/Miscellaneous/ConvergenceTests/AlfenWave3D/simulations/generated-report.html`. You will also notice that in my example the convergence test _failed_, as we got a wrong convergence order of `11.45` while we expect something like `1`.

You can run this convergence reporting program as many times as you want; it always redoes the whole computation with all data that are accessible. Thus you can also get a kind of incremental output while the simulations are still running.
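For readers who want to cross-check a convergence order by hand, here is a minimal sketch of the usual textbook estimate from the errors on two resolutions. It is not taken from `finish-convergence-table.py`, which may use a different formula or sign convention; the function name and the synthetic errors are made up for illustration.

```python
import math


def convergence_order(err_coarse, err_fine, n_coarse, n_fine):
    """Observed order from the errors on two grids with n_coarse < n_fine cells per dimension."""
    return math.log(err_coarse / err_fine) / math.log(n_fine / n_coarse)


# Synthetic errors that scale exactly like h^3 with h = 1/n.
e27 = (1.0 / 27) ** 3
e81 = (1.0 / 81) ** 3
print(round(convergence_order(e27, e81, 27, 81), 6))  # prints 3.0
```

With the threefold refinement used in the runs above (3, 9, 27, 81, 243 cells), the denominator is always `log(3)`.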
## Setting up a workflow
We can chain the Python starter scripts together with the reporting in order to start one single program and get one single return value which tells us either "_Yes, we have convergence_" or "_No, we don't have convergence_". As both the starter and the reporting are written in Python, this gluing is simple. However, it is not finished yet.
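Since this glue does not exist yet, the following is only a sketch of one possible shape for it, not the planned implementation. It assumes the reporting tool prints a verdict line starting with `Convergence test is`, as in the output shown earlier; the wording of a passing verdict is not documented here, so anything other than `FAILED` is treated as a pass. The function names are hypothetical, and trivial Python one-liners stand in for the real starter and reporting commands.

```python
import subprocess
import sys


def convergence_passed(report_output):
    """Return True unless the verdict line is missing or says FAILED."""
    for line in report_output.splitlines():
        if line.strip().startswith("Convergence test is"):
            return "FAILED" not in line
    return False  # no verdict line found: treat as failure


def run_workflow(starter_cmd, report_cmd):
    """Run the starter, then the reporting tool; return 0 on convergence, 1 otherwise."""
    subprocess.run(starter_cmd, check=True)
    report = subprocess.run(report_cmd, check=True,
                            capture_output=True, text=True)
    return 0 if convergence_passed(report.stdout) else 1


if __name__ == "__main__":
    # Stand-ins for the real starter and reporting scripts.
    starter = [sys.executable, "-c", "pass"]
    report = [sys.executable, "-c", "print('Convergence test is FAILED')"]
    print(run_workflow(starter, report))  # prints 1
```

One caveat: because the starter launches its runs in the background, a real glue script would also have to wait until all simulations have finished before calling the reporting step.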