Top Twelve Tip #12
“An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question.” -- attributed to John Tukey.


Specific capacity, a standardized measure of the yield of water from wells, was reported for hundreds of wells across the Appalachian region of the US in a USGS report of the late 1980s. Is there a difference in specific capacity among the four major rock types? The figure below connects each group’s mean with a straight line.

TT12-fig1

The wrong question: assuming each group’s data follow a normal distribution, do the mean specific capacities differ among the groups? This can be answered with Analysis of Variance (ANOVA), which produces the same p-value, 0.06, every time it is run. Conclusion: there is not sufficient evidence to say that the group means differ. However, we know that at least three of these groups do not follow a normal distribution, so there is likely a loss of power (p-values too high). The answer is precise, in that the same p-value appears each time the test is run, but it is incorrect.

The right question: do the mean specific capacities differ? (See tip #5 on whether the mean is the best measure of center, but we will use it here.) To answer this regardless of the shapes of data within groups, run a permutation test. Permutation tests scramble the group assignments of the data, producing arrangements that are all equally likely when the null hypothesis of no group difference is true. Between 1000 and 10,000 scrambles are usually made; for each one, the group means and the resulting F statistic are calculated. Below is a histogram of 999 such F statistics, representing the null hypothesis. The one result from our data, F = 2.512, is the dashed line. The proportion of scrambled results that equal or exceed the dashed line (the proportional area of the bars at or above the line) is the p-value for the permutation test, here 0.039. The means are declared different, as the p-value is below 0.05. No assumption of data shape was required. Because the scrambles are random, the p-value varies slightly from run to run: in several trials we have obtained F statistics at or above 2.512 in 3.6% to 4.0% of the cases. We’ve sacrificed exactness in order to obtain correctness. The means do differ, and the permutation test, unlike ANOVA, is able to see that.

TTT12 HistPossibleFvalues
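
The scrambling procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the results in this tip: the function names `f_statistic` and `permutation_f_test` are hypothetical, the example data are made up (they are not the USGS specific capacity data), and the p-value counts the observed arrangement as one of the possible arrangements, which is an assumption about the convention used.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_f_test(groups, n_perm=999, seed=None):
    """Return (observed F, permutation p-value) for a difference in group means."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    count = 0
    for _ in range(n_perm):
        # Scramble the group assignments: shuffle the pooled data, then
        # cut it back into groups of the original sizes.
        rng.shuffle(pooled)
        scrambled, start = [], 0
        for size in sizes:
            scrambled.append(pooled[start:start + size])
            start += size
        if f_statistic(scrambled) >= observed:
            count += 1
    # Assumed convention: the observed arrangement counts as one of the
    # equally likely arrangements, hence the +1 in numerator and denominator.
    return observed, (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up, right-skewed data for four hypothetical groups.
    example = [
        [0.2, 0.5, 0.9, 1.4, 6.0],
        [0.1, 0.3, 0.4, 0.8, 2.5],
        [0.6, 1.1, 2.0, 3.5, 12.0],
        [0.2, 0.4, 0.7, 1.0, 1.9],
    ]
    f_obs, p_value = permutation_f_test(example, n_perm=999)
    print(f"observed F = {f_obs:.3f}, permutation p-value = {p_value:.3f}")
```

Running the script several times without a seed gives slightly different p-values, which is the loss of exactness noted above; passing a fixed `seed` makes a run reproducible.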