add tests for mean, stddev and 95% confidence interval; fix typo

14 jobs for metrics in 21 minutes and 49 seconds (queued for 5 seconds)