Top Twelve Tip #4
The hardest thing for a human being to do

Hypothesis tests have come under fire in recent years. A few journals have recommended that the result of a test, called the "p-value", should not be included in their articles. This is a bit like burning books: knowledge is dangerous to some people. What a p-value does is encourage a human to do the hardest thing possible for us: make a decision.

Hypothesis tests all work in a similar fashion, and they shouldn’t be as confusing as they often seem. The “null hypothesis” is the ‘no signal’ situation: no difference between groups, no correlation, no trend, and so on. In the figure below, it is represented by the blue histogram of possible test results. These are the results possible when there is no difference in the mean concentration between four groups. The vertical dashed line is the test statistic from our data set of concentrations at four specific sites. Note that our test statistic of 2.360 is at the upper end of the histogram values. It is possible, but not likely, that our test statistic could result from a situation where there actually is no difference in the mean concentrations at the four sites. Possible, but not likely. How likely? That is the p-value: the probability, when ‘no signal’ is true, of getting a test statistic at least as extreme as ours. As the p-value gets smaller, the ‘no signal’ situation is less and less believable. In this case the p-value is 0.042, a 4.2% chance.
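The idea of a histogram of 'no signal' results can be sketched with a permutation test. This is only an illustration, not the test behind the figure: the concentrations are made up, the site data and function names are hypothetical, and the one-way ANOVA F statistic is an assumed choice of test statistic. Shuffling the site labels builds the 'no signal' distribution, and the p-value is the share of shuffled results at least as large as the observed one.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Estimate the p-value by shuffling values across groups ('no signal')."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            as_extreme += 1
    # Counting the observed data as one permutation keeps the estimate above 0.
    return observed, (as_extreme + 1) / (n_perm + 1)


# Made-up concentrations at four hypothetical sites (illustration only).
sites = [
    [4.1, 5.0, 3.8, 4.6, 5.2],
    [4.9, 5.6, 5.1, 6.0, 4.7],
    [3.9, 4.4, 4.0, 5.1, 4.3],
    [5.5, 6.1, 4.8, 5.9, 6.3],
]
observed, p_value = permutation_p_value(sites)
print(f"test statistic = {observed:.3f}, p-value = {p_value:.3f}")
```

The shuffled statistics play the role of the blue histogram, and `observed` plays the role of the dashed line.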

Based on the p-value from the data, the scientist decides that 'no difference' is sufficiently unlikely, and rejects it -- the data indicate a difference is present. The groups likely have different mean concentrations. How small is a small enough p-value to declare there is a difference? When p is less than "alpha", which is the probability of falsely declaring there is a difference, a signal, when in fact there is none in the field. Alpha is the rate of false positives. Historically alpha has been set at 0.05, or 5 percent. Alpha doesn’t have to be 0.05; that value is only tradition, and it can be reset by the scientist or by regulation. But it must be set by someone before the data are collected. Otherwise, we’ll never make that decision. The p-value is the summary of the data's signal strength (smaller is stronger). Alpha is the tool to enable a human to do what is hardest for us -- make a decision!
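A small simulation can show why alpha is the rate of false positives. This is a sketch under simplifying assumptions, not the four-site analysis above: it uses a two-sided z-test on a single sample with known standard deviation, and the function names are hypothetical. Every simulated data set is drawn with 'no signal', so every rejection is a false positive.

```python
import math
import random


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0, with known sigma (assumed)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))


def decide(p_value, alpha):
    """Alpha is fixed in advance; the decision is a simple comparison."""
    if p_value < alpha:
        return "reject 'no signal'"
    return "do not reject 'no signal'"


def false_positive_rate(alpha, n_trials=5000, n=20, seed=1):
    """Share of 'no signal' data sets that are wrongly declared a signal."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true mean is 0
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / n_trials


alpha = 0.05
print(decide(0.042, alpha))        # the p-value quoted in the text
print(false_positive_rate(alpha))  # should land close to alpha
```

With the same p-value of 0.042, a stricter alpha of 0.01 would lead to the opposite decision, which is why alpha has to be fixed before looking at the data.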