[Design decision][CodeGenerator] Optimizing unused feature
@di25cox @svenk @gi26det
I'm currently rewriting the SpaceTimePredictor Code Generator and have the following issue:
The source terms for the SpaceTimePredictor and the VolumeIntegral (in memory: the 4th block of tempFluxUnknowns = lFhi and tempSpaceTimeFluxUnknowns[0] = lFi in 3D, tempSpaceTimeFluxUnknowns[1] = gradQ, and tempStateSizedVector = BgradQ) are entirely unused in Euler and were therefore not implemented at all in the optimized kernel.
I could implement them the way the generic kernel does (always active), but that would be highly inefficient: many tensor products and function calls would be executed for nothing. The generic kernel should also be optimized so that this feature can be disabled altogether when it is not required.
So the user will have to choose explicitly whether to activate the source term (at least for the optimized kernel; the generic ones can later be adapted the same way). I see 3 efficient ways of doing this:
- Specify it in the spec file so that the Code Generator can generate code without the source term when it is not required
- Use preprocessor directives: tell the user to modify the local makefile (or a new compile-configuration file) to enable/disable the source term, and guard the source-term code with preprocessor branches
- Use templates and template metaprogramming to generate two or more versions of the code, then let the user choose which one to use by setting a boolean that is read in MySolver_generated to select the correct implementation
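For comparison with solution 3, solution 2 could look roughly like the sketch below. It is a minimal illustration, not the actual kernel: the macro name `USE_SOURCE_TERM` and the function `predictorStep` are hypothetical, and the build would define the macro (e.g. `-DUSE_SOURCE_TERM`) in the local makefile or compile-configuration file.

```cpp
// Hypothetical macro: defined by the build (e.g. -DUSE_SOURCE_TERM) only when
// the application needs a source term. Left undefined here, as for Euler.
// #define USE_SOURCE_TERM

// Hypothetical, simplified stand-in for one update of the predictor:
// the flux contribution is always added, the source term only if compiled in.
double predictorStep(double flux, double source) {
  double update = flux;
#ifdef USE_SOURCE_TERM
  update += source;  // source-term code exists only in the "with source" build
#else
  (void)source;      // avoid an unused-parameter warning in the no-source build
#endif
  return update;
}
```

The cost is zero at runtime, but switching the source term on or off requires a recompilation, and the setting applies to every solver in that build.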
Example of solution 3:
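#include <cstdio>

// useSource is a compile-time constant, so each instantiation keeps only one branch.
template <bool useSource>
void test() {
  if (useSource) {
    std::printf("Source term code used\n");
  } else {
    std::printf("No source term code\n");
  }
}

// [...] in MySolver_generated: the runtime flag selects the instantiation
if (useSourceLocalBoolean) {
  test<true>();
} else {
  test<false>();
}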
The advantage is that the branching is resolved at compile time, with 2 versions of the code generated, and the remaining runtime branch is very cheap (evaluated once per cell per timestep). The drawback is that the code becomes more complex.
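To make the cost argument concrete, here is a minimal sketch of solution 3 with a loop inside the kernel. The signatures and the `lSi` source buffer are hypothetical simplifications (the real SpaceTimePredictor uses the temp buffers listed above); only the dispatch pattern is the point. The source branch sits inside the per-degree-of-freedom loop, but because `useSource` is a template parameter the compiler drops it (and the source work) from the `<false>` instance, while the runtime flag is tested only once per call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical simplified kernel templated on the source-term switch.
template <bool useSource>
void spaceTimePredictorSketch(std::vector<double>& lQi,
                              const std::vector<double>& lFi,
                              const std::vector<double>& lSi) {
  for (std::size_t i = 0; i < lQi.size(); ++i) {
    lQi[i] += lFi[i];  // flux contribution, always computed
    // Compile-time constant condition: removed entirely when useSource == false.
    if (useSource) {
      lQi[i] += lSi[i];  // source-term contribution
    }
  }
}

// Hypothetical wrapper as it might appear in MySolver_generated:
// the runtime boolean is evaluated once per cell per timestep.
void spaceTimePredictor(bool useSourceLocalBoolean, std::vector<double>& lQi,
                        const std::vector<double>& lFi,
                        const std::vector<double>& lSi) {
  if (useSourceLocalBoolean) {
    spaceTimePredictorSketch<true>(lQi, lFi, lSi);
  } else {
    spaceTimePredictorSketch<false>(lQi, lFi, lSi);
  }
}
```

With C++17, `if constexpr (useSource)` would make the compile-time elimination explicit, and the `<false>` branch would not even need to compile.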
I would prefer solution 1 or 2, but we can discuss it as it is not an urgent matter (I'm implementing it without the source term first).