Algorithm to sample from TruncatedNormalDistribution
Summary
Currently, the (default) algorithm used to sample from the truncated normal distribution is likely incorrect (implemented in TruncatedNormalDistribution.java):
```java
@Override
public double sample() {
    for (int i = 0; i < maxIterations; i++) {
        double sample = super.sample();
        if (sample >= min && sample <= max) {
            return sample;
        }
    }
    throw new IllegalArgumentException(
        "Max iteration count reached on sampling for truncated distribution. Parameters bound and min are not suitable.");
}
```
So the logic is simple: re-sample while the current sample is out of bounds, and throw an exception if no valid sample is found within `maxIterations` attempts.
Does anybody know whether there is any theory behind doing it this way, or whether it is a hack-ish implementation? It is certainly not ideal, as it requires the `maxIterations` parameter, which a "true" truncated distribution definitely does not need. I mark it as a bug because the class name suggests something else -- even though the sampled distribution is probably not far off, it could be a potential source of error, especially when doing uncertainty quantification (Valentina asked me to get the proper distribution into Vadere).
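To illustrate how the exception depends on the bounds: each attempt is accepted with probability p = Phi(max) - Phi(min), so the chance of exhausting all attempts is (1 - p)^maxIterations. A small example with assumed numbers (p = 0.01 for very tight bounds and a cap of 100 iterations -- both hypothetical, not the actual Vadere defaults):

```java
// Hypothetical numbers: how often does rejection sampling hit the iteration cap?
public class RejectionFailure {
    public static void main(String[] args) {
        double p = 0.01;          // assumed acceptance probability Phi(max) - Phi(min) for tight bounds
        int maxIterations = 100;  // assumed iteration cap
        double failure = Math.pow(1.0 - p, maxIterations);
        // For these numbers, roughly a third of all sample() calls would throw.
        System.out.println(failure);
    }
}
```

For wide bounds p is close to 1 and the failure probability is negligible, which is presumably why the problem rarely shows up in practice.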
In case there is no guarantee that this algorithm mimics the truncated distribution without error (and the fact that it can throw an IllegalArgumentException depending on random draws is already a bug...), in my opinion it is better to change this class than to create a new "TheRealTruncatedNormalDistribution.java". This way, we eliminate the potential source of error and also avoid a performance drawback: if the bounds are very tight, a lot of re-sampling is required, and with an exact method we can drop the `maxIterations` parameter entirely.
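The exact method meant here is inverse-transform sampling: draw u uniformly on [Phi(min), Phi(max)] and return Phi^{-1}(u), which needs no re-sampling at all. A minimal self-contained sketch of the idea (pure JDK; it approximates Phi with the Abramowitz and Stegun erf formula and inverts it by bisection -- a real implementation would instead call Commons Math's `NormalDistribution.inverseCumulativeProbability`, and the class and parameter names here are made up for illustration):

```java
import java.util.Random;

// Sketch: exact truncated normal sampling via the inverse-CDF method.
public class TruncatedNormalSketch {
    private final double mean, sd, min, max, cdfMin, cdfMax;
    private final Random rng;

    public TruncatedNormalSketch(double mean, double sd, double min, double max, long seed) {
        this.mean = mean; this.sd = sd; this.min = min; this.max = max;
        this.rng = new Random(seed);
        this.cdfMin = cdf(min);
        this.cdfMax = cdf(max);
    }

    // Normal CDF via the Abramowitz & Stegun erf approximation (abs. error ~1.5e-7).
    private double cdf(double x) {
        double z = (x - mean) / sd;
        return 0.5 * (1.0 + erf(z / Math.sqrt(2.0)));
    }

    private static double erf(double x) {
        double sign = Math.signum(x);
        x = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double y = 1.0 - (((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t) * Math.exp(-x * x);
        return sign * y;
    }

    // Map a uniform draw into [cdf(min), cdf(max)], then invert the CDF by bisection.
    // No re-sampling, no iteration cap, no exception.
    public double sample() {
        double p = cdfMin + rng.nextDouble() * (cdfMax - cdfMin);
        double lo = min, hi = max;
        for (int i = 0; i < 80; i++) {
            double mid = 0.5 * (lo + hi);
            if (cdf(mid) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        TruncatedNormalSketch d = new TruncatedNormalSketch(0.0, 1.0, -1.0, 2.0, 42L);
        for (int i = 0; i < 10000; i++) {
            double s = d.sample();
            if (s < -1.0 || s > 2.0) throw new AssertionError("out of bounds: " + s);
        }
        System.out.println("all samples within bounds");
    }
}
```

Every call returns a valid sample by construction, so the cost is constant regardless of how tight the bounds are.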
@hm-mgoedel @hm-kleinmei @BZoennchen @stsc
Because this is a widely used piece of code, I want to ask: are there any objections?
Here is a Java implementation which uses only Apache Commons Math as a dependency: