Top Twelve Tip #3

Objectives drive which statistics to use

What is the correct numerical summary to use, mean or median? Which type of hypothesis test should be used, parametric or nonparametric? The answer should depend solely on your objectives.

If your goal is to estimate the total mass of an LNAPL or other contaminant in the groundwater at a site, multiply each location's concentration by the volume it represents and sum those contributions to obtain the total for the site. For multiple measurements at a sampling location, the mean is the appropriate statistic to use to summarize the location. The mean is a standardized total for that location: multiply it by the number of measurements and you recover their sum. When you sum a series of values to produce a total, as when the interest is in the mass, volume, or cumulative exposure, the mean is the appropriate summary statistic to use. Concentration data are typically right-skewed, so the median falls below the mean, and summing a series of medians will underestimate the total amount.
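To make the difference concrete, here is a minimal sketch with made-up numbers. The well names, concentrations, and volumes are hypothetical, and treating each location's mass as concentration times a single representative volume is a simplifying assumption, not a site-specific method.

```python
from statistics import mean, median

# Hypothetical repeated concentration measurements (mg/L) at three locations.
# Each location has one high value, as is common with right-skewed data.
samples = {
    "MW-1": [2.0, 3.0, 4.0, 31.0],
    "MW-2": [1.0, 1.0, 2.0, 16.0],
    "MW-3": [5.0, 6.0, 7.0, 42.0],
}

# Hypothetical volume of groundwater (L) that each location represents.
volume_l = {"MW-1": 1000.0, "MW-2": 1500.0, "MW-3": 800.0}


def total_mass(summary):
    """Sum concentration * volume over locations, using `summary` per location."""
    return sum(summary(samples[loc]) * volume_l[loc] for loc in samples)


mass_from_means = total_mass(mean)      # mg
mass_from_medians = total_mass(median)  # mg

print(f"Total mass using means:   {mass_from_means:.0f} mg")
print(f"Total mass using medians: {mass_from_medians:.0f} mg")
```

With these invented values the median-based total comes out well under half of the mean-based total, because the median ignores the size of the high measurement at each location.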

If your goal is to express the typical concentrations seen at ten wells in the region, the median is a better choice than the mean. A median is resistant to the effect of unusual values. When nine of the ten wells have low concentrations but one is much higher, the median is relatively unaffected by the one high value and looks much like the concentrations in the other nine. The mean would be pulled up toward the high value, perhaps being higher than all of the other nine observations. When the interest is in a representative value, the median is the appropriate summary statistic.
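The ten-well situation can be sketched in a few lines. The concentrations below are invented for illustration only.

```python
from statistics import mean, median

# Hypothetical concentrations (ug/L) at ten wells: nine low, one much higher.
wells = [1.2, 1.5, 1.1, 1.8, 1.4, 1.6, 1.3, 1.7, 1.9, 45.0]

typical_median = median(wells)
typical_mean = mean(wells)

# How many wells fall below the mean?
wells_below_mean = sum(c < typical_mean for c in wells)

print(f"Median: {typical_median:.2f}")
print(f"Mean:   {typical_mean:.2f}")
print(f"Wells below the mean: {wells_below_mean} of {len(wells)}")
```

Here the median sits in the middle of the nine low wells, while the mean is higher than every one of them.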

If your interest is in testing differences between groups, consider what you are planning to test. The question “does one group generally have higher values than the second?” is a frequency question -- do higher values occur more frequently in one group? Nonparametric methods directly test differences in frequencies -- they are computed using ranks (percentiles). Do not decide which type of test to use based on whether data follow a normal distribution. This pre-test is decades out of date. Nonparametric tests work well on data that follow a normal distribution. The newer permutation tests will test differences in means without requiring that data follow a normal distribution. Decide which type of test to use based solely on the objectives of your study. If you are interested in totals, mass, etc., test differences in group means using a permutation test. If you are interested in whether one group exhibits higher values than another, test differences in percentiles using a nonparametric test.
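The two kinds of question can be sketched side by side. This is an illustrative sketch using only the Python standard library, with invented data. The function names `permutation_test_means` and `prob_a_exceeds_b` are hypothetical helpers written for this example, not part of any package. The second function computes only the rank-based quantity behind the Wilcoxon rank-sum (Mann-Whitney) test, not its p-value.

```python
import random


def permutation_test_means(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose mean difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(a)
    observed = sum(a) / n_a - sum(b) / len(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        # Small tolerance guards against floating-point round-off on ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


def prob_a_exceeds_b(a, b):
    """Share of (a, b) pairs in which the value from a is higher (ties count half).

    This equals the Mann-Whitney U statistic divided by len(a) * len(b).
    """
    wins = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return wins / (len(a) * len(b))


# Hypothetical concentrations in two groups of wells.
group_a = [3.1, 4.7, 5.2, 6.0, 7.4, 8.8, 12.5, 30.0]
group_b = [1.0, 1.4, 2.2, 2.9, 3.5, 4.1, 4.9, 5.5]

mean_diff, p_value = permutation_test_means(group_a, group_b)
frequency = prob_a_exceeds_b(group_a, group_b)

print(f"Difference in means (A - B): {mean_diff:.3f}, permutation p = {p_value:.4f}")
print(f"Share of pairs where A is higher than B: {frequency:.3f}")
```

The first result addresses the totals question (do the means differ?), and the second addresses the frequency question (how often is a value from group A higher than one from group B?). In practice you would likely use an established implementation rather than hand-rolled code. SciPy, for example, provides `scipy.stats.permutation_test` and `scipy.stats.mannwhitneyu`.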