Sven Köppel · d2bb3d8a
--- a/Convergence-Studies.md
+++ b/Convergence-Studies.md
 # ExaHyPE Convergence Studies framework

-At [Misc/ConvergenceAnalysis](https://gitlab.lrz.de/exahype/ExaHyPE-Engine/tree/master/Miscellaneous/ConvergenceAnalysis) we do have some Python code to manage convergence studies. This wiki page shall document how to use this.
+At [Misc/ConvergenceAnalysis](https://gitlab.lrz.de/exahype/ExaHyPE-Engine/tree/master/Miscellaneous/ConvergenceAnalysis) there is some Python code to manage convergence studies. This wiki page documents how to use it. It is helpful to read it before using the scripts, just to get an idea of the overall setup.

 ## Overview of the simulation starter script (Python)

@@ -32,6 +32,8 @@ optional arguments:

 This allows you to start (or submit, if you use a queueing system in between) a number of different ExaHyPE simulations in parallel. It also does the simulation directory setup for you by making use of an intermediate _runner_ script, [RunScripts/runTemplatedSpecfile.sh](https://gitlab.lrz.de/exahype/ExaHyPE-Engine/blob/master/Miscellaneous/RunScripts/runTemplatedSpecfile.sh). This shell script does the specfile templating and calls the binary.
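
The templating step can be pictured with the following Python sketch. It assumes placeholders of the form `${NAME}` that are filled from environment variables; the real runTemplatedSpecfile.sh is a shell script and may use a different placeholder syntax. The helpers `render_specfile` and `write_specfile` are hypothetical names for illustration only.

```python
import string
from pathlib import Path


def render_specfile(template_text, variables):
    """Substitute ${NAME} placeholders; raises KeyError if one is missing."""
    return string.Template(template_text).substitute(variables)


def write_specfile(template_path, simulation_dir, variables):
    """Create the simulation directory and store the rendered specfile in it."""
    simulation_dir = Path(simulation_dir)
    simulation_dir.mkdir(parents=True, exist_ok=True)
    template_path = Path(template_path)
    target = simulation_dir / template_path.name
    target.write_text(render_specfile(template_path.read_text(), variables))
    return target
```

For example, `write_specfile("MHD_AlfenWave3DConvergence.exahype", "simulations/p3-meshsize0.122222222222", os.environ)` would place the rendered specfile in that run's directory. Using `substitute` instead of `safe_substitute` makes a forgotten variable fail loudly rather than leaving an unfilled placeholder in the specfile.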

+If you have a look into the `runShuVortex.py` program, you see that it basically computes a number of parameter combinations and then sets lots of environment variables. All of them are passed to `runTemplatedSpecfile.sh`. One of them is the environment variable `QRUN`, which can be used to insert a batch queue command such as `srun` (SLURM) or `llrun` (LoadLeveler) to distribute the convergence tests on a cluster. This is convenient as soon as your tests get large. Otherwise, they are just started as individual processes on the current system. In either case, the `runShuVortex.py` program uses `subprocess.Popen` to run all programs in the background, so all parameter combinations run at the same time.
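
A minimal sketch of this pattern, assuming the runner reads its parameters from environment variables. Only `QRUN` is taken from the text above; `ORDER` and `MESHSIZE` are hypothetical variable names, and `build_environments`/`launch_all` are illustrative helpers, not functions from `runShuVortex.py`.

```python
import itertools
import os
import subprocess


def build_environments(p_orders, mesh_sizes, qrun=""):
    """Return one environment per (polynomial order, mesh size) combination."""
    envs = []
    for order, mesh_size in itertools.product(p_orders, mesh_sizes):
        env = dict(os.environ)
        # ORDER and MESHSIZE are hypothetical names; check runShuVortex.py
        # for the variables runTemplatedSpecfile.sh really expects.
        env["ORDER"] = str(order)
        env["MESHSIZE"] = str(mesh_size)
        env["QRUN"] = qrun  # e.g. "srun" on SLURM, empty to run locally
        envs.append(env)
    return envs


def launch_all(command, envs):
    """Start one background process per environment, then wait for all."""
    processes = [subprocess.Popen(command, env=env) for env in envs]
    return [process.wait() for process in processes]
```

A call like `launch_all(["./runTemplatedSpecfile.sh"], build_environments([2, 3, 4], [0.366666666667, 0.122222222222]))` would start six runs at once (the real runner probably needs further variables, such as the binary and specfile). All processes are started before the first `wait()`, which is what makes them run concurrently.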
+
 ## Compiling for all polynomial orders

 The infrastructure presented here does not compile ExaHyPE for you, in particular not for the different ADERDG polynomial orders. While you can use the same binary to run different mesh resolutions, the polynomial order is unfortunately a compile-time constant, so you need one binary per polynomial order you want to test.
@@ -58,9 +60,202 @@ After having these binaries, you are ready to invoke the respective convergence

 ## Watching progress

-There is the convenient [showSimulationProgress.sh](https://gitlab.lrz.de/exahype/ExaHyPE-Engine/blob/master/Miscellaneous/ConvergenceAnalysis/showSimulationProgress.sh) which allows you look what your simulations are doing.
+There is the convenient [showSimulationProgress.sh](https://gitlab.lrz.de/exahype/ExaHyPE-Engine/blob/master/Miscellaneous/ConvergenceAnalysis/showSimulationProgress.sh) script which allows you to look at what your simulations are doing. For instance, I call
+
+```
+$ export SIMBASE="AlfenWave3D/simulations/"
+$ ./showSimulationProgress.sh 
+SimulationName	NumReductionFiles	EachRedLength	NumLinesLog	FinishedStatus	LastTimeStep	Walltime
+p2-meshsize0.00452674897119             	 28	 0	 42	 FAILED	 None started	 77.92system
+p2-meshsize0.0135802469136              	 28	 3	 756	 FAILED	 step 136 t_min =0.114497	 
+p2-meshsize0.0407407407407              	 28	 17	 3506	 FAILED	 step 676 t_min =1.55366	 
+p2-meshsize0.122222222222               	 28	 105	 8440	 FAILED	 step 1656 t_min =10.3281	 
+p2-meshsize0.366666666667               	 28	 121	 2712	 FINISHED	 step 534 t_min =12.0001	 0.17system
+p3-meshsize0.00452674897119             	 28	 0	 42	 FAILED	 None started	 210.19system
+p3-meshsize0.0135802469136              	 28	 2	 271	 FAILED	 step 40 t_min =0.0198056	 
+p3-meshsize0.0407407407407              	 28	 14	 4558	 FAILED	 step 878 t_min =1.27636	 
+p3-meshsize0.122222222222               	 28	 121	 15937	 FINISHED	 step 3012 t_min =12.0004	 11.08system
+p3-meshsize0.366666666667               	 28	 121	 4558	 FINISHED	 step 902 t_min =12.0098	 0.25system
+p4-meshsize0.00452674897119             	 28	 0	 42	 FAILED	 None started	 63.94system
+p4-meshsize0.0135802469136              	 28	 0	 42	 FAILED	 None started	 181.59system
+p4-meshsize0.0407407407407              	 28	 7	 2611	 FAILED	 step 504 t_min =0.516747	 
+p4-meshsize0.122222222222               	 28	 50	 9037	 FAILED	 step 1745 t_min =4.80483	 
+p4-meshsize0.366666666667               	 28	 121	 6587	 FINISHED	 step 1302 t_min =12.0037	 0.34system
+p5-meshsize0.00452674897119             	 28	 0	 42	 FAILED	 None started	 30.15system
+p5-meshsize0.0135802469136              	 28	 0	 42	 FAILED	 None started	 27.11system
+p5-meshsize0.0407407407407              	 28	 3	 1399	 FAILED	 step 263 t_min =0.175834	 
+p5-meshsize0.122222222222               	 28	 94	 23346	 FAILED	 step 4600 t_min =9.21398	 
+p5-meshsize0.366666666667               	 28	 121	 10153	 FINISHED	 step 1997 t_min =12.0012	 8.21system
+p6-meshsize0.0135802469136              	 28	 0	 42	 FAILED	 None started	 98.98system
+p6-meshsize0.0407407407407              	 28	 2	 651	 FAILED	 step 115 t_min =0.0649305	 
+p6-meshsize0.122222222222               	 28	 27	 8369	 FAILED	 step 1593 t_min =2.55488	 
+p6-meshsize0.366666666667               	 28	 121	 12014	 FINISHED	 step 2366 t_min =12.0005	 17.31system
+p7-meshsize0.0135802469136              	 28	 0	 42	 FAILED	 None started	 43.14system
+p7-meshsize0.0407407407407              	 28	 2	 336	 FAILED	 step 53 t_min =0.0236267	 
+p7-meshsize0.122222222222               	 28	 15	 5219	 FAILED	 step 1007 t_min =1.32551	 
+p7-meshsize0.366666666667               	 28	 121	 16017	 FINISHED	 step 3078 t_min =12.001	 14.39system
+p8-meshsize0.0407407407407              	 28	 2	 159	 FAILED	 step 18 t_min =0.00534956	 
+p8-meshsize0.122222222222               	 28	 11	 5735	 FAILED	 step 1122 t_min =1.0002	 
+p8-meshsize0.366666666667               	 28	 121	 22687	 FINISHED	 step 4489 t_min =12.0011	 15.88system
+p9-meshsize0.0407407407407              	 28	 0	 64	 FAILED	 None started	 
+p9-meshsize0.122222222222               	 28	 7	 4298	 FAILED	 step 838 t_min =0.560361	 
+p9-meshsize0.366666666667               	 28	 121	 30277	 FINISHED	 step 5986 t_min =12.0016	 37.42system
+```
+
+The overview tells me that no test is running anymore (every simulation is either `FINISHED` or `FAILED`) and gives me a CSV-style table (separated by `\t`) which is suitable for further processing.
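
Since the table is tab-separated, it can be processed with Python's `csv` module. The following sketch assumes the column names from the header above and treats any status other than `FINISHED` or `FAILED` as a run still in progress (an assumption, since no running simulation appears in this output). `parse_progress` and `summarize` are hypothetical helper names.

```python
import csv
import io
from collections import Counter


def parse_progress(text):
    """Parse the tab-separated output of showSimulationProgress.sh into dicts."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = [name.strip() for name in next(reader)]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        # Strip the padding spaces around each field.
        rows.append({name: value.strip() for name, value in zip(header, fields)})
    return rows


def summarize(rows):
    """Count statuses and list simulations that are neither FINISHED nor FAILED."""
    counts = Counter(row["FinishedStatus"] for row in rows)
    still_running = [row["SimulationName"] for row in rows
                     if row["FinishedStatus"] not in ("FINISHED", "FAILED")]
    return counts, still_running
```

Feeding it the captured output of `./showSimulationProgress.sh` gives, for the run above, only `FINISHED` and `FAILED` counts and an empty `still_running` list.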

 ## Generating a report

 After or during your runs, you can use the [finish-convergence-table.py](https://gitlab.lrz.de/exahype/ExaHyPE-Engine/blob/master/Miscellaneous/ConvergenceAnalysis/reporting/finish-convergence-table.py) command to generate convergence tables from your simulations. Note that it currently uses the `SIMBASE` environment variable to determine where your simulations are stored. You will probably have to play around a bit with this.

+As always, dealing with the paths is quite messy. What I typically do is symlink everything together, i.e. in the folder `Miscellaneous/ConvergenceAnalysis/` I have:
+
+```
+AlfenWave3D/
+├── MHD_AlfenWave3DConvergence.exahype
+├── reporting -> ../reporting/
+├── runAlfenWave3D.py
+├── simulations -> /path/to/a/large/scratch/disk/exahype/simulations/MHDCPP_AlfenWave3D_LegendreConvergence
+└── simulations.txt
+```
+
+`simulations.txt` is initially generated by `../showSimulationProgress.sh | tee simulations.txt`. You can delete lines by hand to hide those simulations from the reporting tool. As you can see, I also keep the actual simulations on a scratch disk, as they can get quite large (50 GB in this case).
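
Instead of deleting lines by hand, a small filter could keep only the runs you want. This sketch assumes `simulations.txt` has the same tab-separated layout as the progress table, including the `FinishedStatus` column; `filter_listing` is a hypothetical helper.

```python
def filter_listing(text, status_column="FinishedStatus", keep=("FINISHED",)):
    """Return the listing with its header and only the rows whose status is in `keep`."""
    lines = text.splitlines()
    header = lines[0]
    columns = [name.strip() for name in header.split("\t")]
    index = columns.index(status_column)
    kept = [header]
    for line in lines[1:]:
        fields = line.split("\t")
        # Ignore blank or truncated lines that lack the status column.
        if len(fields) > index and fields[index].strip() in keep:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

With `pathlib.Path`, `path.write_text(filter_listing(path.read_text()))` rewrites the file in place; keep a copy of the original listing if you might want the failed runs back later.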
+
+In such a setup, I just call `SIMBASE="simulations/" ./reporting/finish-convergence-table.py` and get output similar to the following:
+
+```
+Using all p orders
+Applying prefix 'simulations/' on each simulation name
+Will work with 34 simulations: 
+ 0. simulations/p2-meshsize0.00452674897119
+ 1. simulations/p2-meshsize0.0135802469136
+ 2. simulations/p2-meshsize0.0407407407407
+ 3. simulations/p2-meshsize0.122222222222
+ 4. simulations/p2-meshsize0.366666666667
+ 5. simulations/p3-meshsize0.00452674897119
+ 6. simulations/p3-meshsize0.0135802469136
+ 7. simulations/p3-meshsize0.0407407407407
+ 8. simulations/p3-meshsize0.122222222222
+ 9. simulations/p3-meshsize0.366666666667
+ 10. simulations/p4-meshsize0.00452674897119
+ 11. simulations/p4-meshsize0.0135802469136
+ 12. simulations/p4-meshsize0.0407407407407
+ 13. simulations/p4-meshsize0.122222222222
+ 14. simulations/p4-meshsize0.366666666667
+ 15. simulations/p5-meshsize0.00452674897119
+ 16. simulations/p5-meshsize0.0135802469136
+ 17. simulations/p5-meshsize0.0407407407407
+ 18. simulations/p5-meshsize0.122222222222
+ 19. simulations/p5-meshsize0.366666666667
+ 20. simulations/p6-meshsize0.0135802469136
+ 21. simulations/p6-meshsize0.0407407407407
+ 22. simulations/p6-meshsize0.122222222222
+ 23. simulations/p6-meshsize0.366666666667
+ 24. simulations/p7-meshsize0.0135802469136
+ 25. simulations/p7-meshsize0.0407407407407
+ 26. simulations/p7-meshsize0.122222222222
+ 27. simulations/p7-meshsize0.366666666667
+ 28. simulations/p8-meshsize0.0407407407407
+ 29. simulations/p8-meshsize0.122222222222
+ 30. simulations/p8-meshsize0.366666666667
+ 31. simulations/p9-meshsize0.0407407407407
+ 32. simulations/p9-meshsize0.122222222222
+ 33. simulations/p9-meshsize0.366666666667
+For the following number of cells, we have this number of simlations:
+nCells
+3      120
+9      240
+27     240
+81     240
+243    240
+dtype: int64
+So we can compare up to 120 simulations for convergence analysis
+Did not find a target row for rowindex=960 and row[nSmaller]=3.0
+Did not find a target row for rowindex=961 and row[nSmaller]=3.0
+....
+Did not find a target row for rowindex=1078 and row[nSmaller]=3.0
+Did not find a target row for rowindex=1079 and row[nSmaller]=3.0
+Computing convergence tables...
+Comptuted this convergence table for the individual reductions
+(as l1norm, infnorm=max, etc.)
+      pOrder  nCells  nSmaller  plotindex       time        l1norm        l2norm           max   ol1norm   ol2norm      omax
+0          2     243        81          1   0.000000  5.935810e-15  6.859539e-15  1.949363e-17  5.563183  5.445540  1.324645
+1          2     243        81          2   0.113159  8.769101e-04  1.515260e-03  7.498691e-06 -4.307702 -4.065516 -7.543604
+2          2     243        81          3   0.203506  1.260693e-03  2.129472e-03  1.158534e-05 -3.502515 -3.485054 -7.113232
+3          2     243        81          4   0.316721  1.729339e-03  2.717190e-03  1.548186e-05 -3.054514 -3.113887 -6.979140
+4          2     243        81          5   0.407395  2.135272e-03  3.205626e-03  1.624925e-05 -2.797949 -2.888795 -6.714172
+5          2     243        81          6   0.520524  2.600140e-03  3.690549e-03  1.571305e-05 -2.652873 -2.733840 -6.348429
+6          2     243        81          7   0.610364  2.904012e-03  3.952305e-03  1.793602e-05 -2.528211 -2.586664 -6.191396
+7          2     243        81          8   0.700212  3.128929e-03  4.159902e-03  2.009075e-05 -2.423108 -2.460692 -6.037988
+8          2     243        81          9   0.812600  3.285496e-03  4.334478e-03  2.152800e-05 -2.337630 -2.364027 -5.916524
+9          2     243        81         10   0.902466  3.334288e-03  4.440350e-03  2.171277e-05 -2.245383 -2.271061 -5.762643
+10         2     243        81         11   1.015415  3.352039e-03  4.554054e-03  2.079752e-05 -2.167977 -2.200194 -5.584308
+11         2     243        81         12   1.106436  3.426979e-03  4.668185e-03  2.007723e-05 -2.123705 -2.145614 -5.451924
+12         2     243        81         13   1.219761  3.638330e-03  4.867186e-03  2.041572e-05 -2.122487 -2.116238 -5.385107
+13         2     243        81         14   1.310206  3.850491e-03  5.088447e-03  2.387204e-05 -2.123862 -2.096613 -5.474315
+14         2     243        81         15   1.400829  4.108181e-03  5.362513e-03  2.634123e-05 -2.136524 -2.090681 -5.476362
+15         2     243        81         16   1.512964  4.468311e-03  5.763021e-03  2.768166e-05 -2.174506 -2.111839 -5.445373
+16         2     243        81         17   1.602857  4.749672e-03  6.089651e-03  2.892119e-05 -2.194068 -2.120132 -5.411790
+17         2     243        81         18   1.715849  5.062576e-03  6.487604e-03  2.768224e-05 -2.224052 -2.143836 -5.318092
+18         2     243        81         19   1.806547  5.316989e-03  6.777938e-03  3.313686e-05 -2.251057 -2.159539 -5.445738
+19         2     243        81         20   1.919804  5.606125e-03  7.073194e-03  3.913272e-05 -2.292125 -2.184210 -5.583435
+20         2     243        81         21   2.009998  5.793040e-03  7.266357e-03  4.203198e-05 -2.323061 -2.202253 -5.671021
+21         2     243        81         22   2.100461  5.974524e-03  7.428400e-03  4.284762e-05 -2.354664 -2.218698 -5.668471
+22         2     243        81         23   2.213446  6.145398e-03  7.625884e-03  4.075708e-05 -2.373849 -2.233139 -5.549591
+23         2     243        81         24   2.304054  6.198621e-03  7.806177e-03  4.405571e-05 -2.366718 -2.238317 -5.588768
+24         2     243        81         25   2.417349  6.312932e-03  8.083243e-03  4.480552e-05 -2.361322 -2.253834 -5.552686
+25         2     243        81         26   2.507843  6.476530e-03  8.332133e-03  4.282694e-05 -2.376002 -2.274678 -5.518626
+26         2     243        81         27   2.620821  6.799000e-03  8.668525e-03  4.293392e-05 -2.356810 -2.276919 -5.548072
+27         2     243        81         28   2.711360  7.051210e-03  8.937031e-03  4.496222e-05 -2.392513 -2.309219 -5.684857
+28         2     243        81         29   2.801547  7.263387e-03  9.193089e-03  4.557691e-05 -2.063632 -1.996257 -5.619965
+29         2     243        81         30   2.913845  7.454898e-03  9.480219e-03  4.806180e-05 -2.057752 -1.973569 -5.296870
+...      ...     ...       ...        ...        ...           ...           ...           ...       ...       ...       ...
+1050       3       9         3         91   9.000255  3.409400e-04  4.843725e-04  1.323807e-07  0.000000  0.000000  0.000000
+1051       3       9         3         92   9.100516  3.425519e-04  4.866032e-04  1.315677e-07  0.000000  0.000000  0.000000
+1052       3       9         3         93   9.200773  3.402395e-04  4.862375e-04  1.173909e-07  0.000000  0.000000  0.000000
+1053       3       9         3         94   9.301009  3.366218e-04  4.839696e-04  1.148353e-07  0.000000  0.000000  0.000000
+1054       3       9         3         95   9.401256  3.343641e-04  4.815920e-04  1.253258e-07  0.000000  0.000000  0.000000
+1055       3       9         3         96   9.501481  3.355014e-04  4.809585e-04  1.261497e-07  0.000000  0.000000  0.000000
+1056       3       9         3         97   9.601758  3.398562e-04  4.839315e-04  1.259638e-07  0.000000  0.000000  0.000000
+1057       3       9         3         98   9.701991  3.432496e-04  4.866591e-04  1.287844e-07  0.000000  0.000000  0.000000
+1058       3       9         3         99   9.800237  3.432600e-04  4.881150e-04  1.252259e-07  0.000000  0.000000  0.000000
+1059       3       9         3        100   9.900453  3.397293e-04  4.860228e-04  1.069149e-07  0.000000  0.000000  0.000000
+1060       3       9         3        101  10.000690  3.371396e-04  4.842733e-04  1.143260e-07  0.000000  0.000000  0.000000
+1061       3       9         3        102  10.100930  3.356367e-04  4.820951e-04  1.262916e-07  0.000000  0.000000  0.000000
+1062       3       9         3        103  10.201200  3.386493e-04  4.835371e-04  1.298948e-07  0.000000  0.000000  0.000000
+1063       3       9         3        104  10.301430  3.426307e-04  4.861500e-04  1.317317e-07  0.000000  0.000000  0.000000
+1064       3       9         3        105  10.401690  3.443840e-04  4.883702e-04  1.315583e-07  0.000000  0.000000  0.000000
+1065       3       9         3        106  10.501950  3.423637e-04  4.882526e-04  1.180119e-07  0.000000  0.000000  0.000000
+1066       3       9         3        107  10.600170  3.387176e-04  4.860060e-04  1.145525e-07  0.000000  0.000000  0.000000
+1067       3       9         3        108  10.700420  3.365541e-04  4.840547e-04  1.247856e-07  0.000000  0.000000  0.000000
+1068       3       9         3        109  10.800640  3.371060e-04  4.830214e-04  1.259031e-07  0.000000  0.000000  0.000000
+1069       3       9         3        110  10.900930  3.411756e-04  4.856226e-04  1.273240e-07  0.000000  0.000000  0.000000
+1070       3       9         3        111  11.001140  3.445510e-04  4.880775e-04  1.304516e-07  0.000000  0.000000  0.000000
+1071       3       9         3        112  11.101400  3.443442e-04  4.892281e-04  1.272134e-07  0.000000  0.000000  0.000000
+1072       3       9         3        113  11.201620  3.408293e-04  4.872521e-04  1.093249e-07  0.000000  0.000000  0.000000
+1073       3       9         3        114  11.301860  3.379666e-04  4.854567e-04  1.150771e-07  0.000000  0.000000  0.000000
+1074       3       9         3        115  11.400090  3.364830e-04  4.834092e-04  1.271547e-07  0.000000  0.000000  0.000000
+1075       3       9         3        116  11.500350  3.392393e-04  4.845533e-04  1.300543e-07  0.000000  0.000000  0.000000
+1076       3       9         3        117  11.600600  3.432829e-04  4.871475e-04  1.314490e-07  0.000000  0.000000  0.000000
+1077       3       9         3        118  11.700870  3.451572e-04  4.890965e-04  1.317974e-07  0.000000  0.000000  0.000000
+1078       3       9         3        119  11.801130  3.434541e-04  4.892414e-04  1.190691e-07  0.000000  0.000000  0.000000
+1079       3       9         3        120  11.901330  3.394892e-04  4.867112e-04  1.138752e-07  0.000000  0.000000  0.000000
+
+[1080 rows x 11 columns]
+Doing plots
+Plots are for porders= [2, 3, 4, 5, 6, 7, 8, 9]  and ncells= [3, 9, 27, 81, 242]
+
+Wrote report to simulations/generated-report.html.
+Wrote report to simulations/evolution.html.
+Convergence test results:
+Convergence factor is 11.45
+Convergence test is FAILED
+```
+
+The script generated two HTML files, which you can also open locally in your web browser by just copying the full path into the address bar, e.g. `file:///home/yourname/exahype/Miscellaneous/ConvergenceAnalysis/AlfenWave3D/simulations/generated-report.html`. You also notice that in my example the convergence test _failed_, as the script reports a wrong convergence factor of `11.45` while we expect something like `1`.
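
How the convergence factor above is computed is not described here. For orientation, the usual experimental order of convergence between two resolutions can be computed as in this generic sketch; it is not necessarily what finish-convergence-table.py does, and `observed_order` is a hypothetical name.

```python
import math


def observed_order(err_coarse, err_fine, n_coarse, n_fine):
    """Experimental order of convergence between two grids.

    Assumes err ~ C * h**p with h proportional to 1/n (cells per dimension),
    so p = log(err_coarse / err_fine) / log(n_fine / n_coarse).
    """
    return math.log(err_coarse / err_fine) / math.log(n_fine / n_coarse)
```

With the `nCells` values used here (3, 9, 27, ...), consecutive resolutions differ by a factor of 3, so an error reduction by a factor of 27 between two of them corresponds to order 3, since 27 = 3³.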
+
+You can run this convergence reporting program as many times as you want; it always redoes the whole computation with all the data available at that moment. Thus you can also get a kind of incremental output while the simulations are still running.
+
+## Setting up a workflow
+
+We can chain the Python starter scripts together with the reporting in order to run one single program and get one single return value which tells us either "_Yes, we have convergence_" or "_No, we don't have convergence_". As both the starter and the reporting are written in Python, this gluing is simple. However, it is not finished yet.
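
A sketch of how such gluing could look, assuming that the starter returns only after all runs have ended and that the reporting prints a line `Convergence test is ...` as in the output above. Only the `FAILED` wording is known from that output; `run_workflow` and `convergence_passed` are hypothetical names.

```python
import os
import re
import subprocess


def convergence_passed(report_output):
    """Return True/False from the 'Convergence test is ...' line, or None if absent."""
    match = re.search(r"Convergence test is (\w+)", report_output)
    if match is None:
        return None
    # Only "FAILED" is known from the sample output; treating every
    # other word as success is an assumption.
    return match.group(1) != "FAILED"


def run_workflow(starter_cmd, report_cmd, simbase):
    """Run the starter, then the reporting, and map the verdict to an exit code."""
    # Assumes the starter blocks until all simulations have ended.
    subprocess.run(starter_cmd, check=True)
    env = dict(os.environ, SIMBASE=simbase)
    report = subprocess.run(report_cmd, env=env, capture_output=True,
                            text=True, check=True)
    print(report.stdout)
    # A missing verdict line (None) is treated as failure.
    return 0 if convergence_passed(report.stdout) else 1
```

Calling `sys.exit(run_workflow(["./runAlfenWave3D.py"], ["./reporting/finish-convergence-table.py"], "simulations/"))` would then yield a single exit code (0 for convergence, 1 otherwise) that shell scripts or a CI job can check.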
\ No newline at end of file