For more information on p-values, see vol.7.
We have explained that “the smaller the p-value, the more confidently we can conclude that there is an effect (difference) in the population.” However, can we also say that the smaller the p-value, the greater the effect?
For example, assume that the new drug Y is effective in the population.
(The significance level is set at 0.05.)
Then, if new drug Y is judged to be effective in a comparison with existing drug X at p<0.05, and new drug Z is judged to be effective at p<0.01, is it accurate to say that new drug Z is more effective than new drug Y?
In this article, we will explain the difference between a p-value and an effect.
Hypothesis testing depends on the ability to reject the null hypothesis
The first thing to do in a statistical hypothesis test is to formulate a null and an alternative hypothesis.
Suppose we want to prove that a new drug is superior to the existing treatment in terms of efficacy. The test then attempts to reject the null hypothesis that the new drug’s effectiveness is equal to that of the existing drug.
If the null hypothesis is rejected, we adopt the alternative hypothesis; note, however, that the test does not evaluate the probability that the alternative hypothesis is true.
If the null hypothesis cannot be rejected, the conclusion is withheld. In other words, the correct interpretation is that “there was no significant difference” or “we could not tell whether there was an effect.” This is called the “asymmetry of hypothesis testing.”
“No significant difference” cannot be interpreted as “there was no difference in efficacy between the existing drug and the new drug.” Failing to reject the null hypothesis only means that the data gave no indication of a difference; the probability of a false negative (beta error), that is, of overlooking a real difference, may still be high.
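For intuition, here is a minimal simulation sketch (the sample size, effect, and variability below are hypothetical, not taken from this article): when a trial is small, a real effect in the population is frequently missed, which is exactly why “no significant difference” cannot be read as “no difference.”

```python
# Minimal sketch with hypothetical numbers: even when a real difference exists
# in the population, a small trial often fails to reach p < 0.05, so
# "not significant" does not mean "no difference".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_trials = 2000      # number of simulated small trials
n_per_group = 10     # subjects per arm (deliberately small)
true_effect = 0.5    # assumed true mean difference in temperature decrease

misses = 0
for _ in range(n_trials):
    existing = rng.normal(loc=1.0, scale=1.0, size=n_per_group)
    new_drug = rng.normal(loc=1.0 + true_effect, scale=1.0, size=n_per_group)
    _, p = stats.ttest_ind(new_drug, existing)
    if p >= 0.05:
        misses += 1  # a real effect, but "no significant difference"

print(f"Share of small trials that miss the real effect (beta error): {misses / n_trials:.0%}")
```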
Is a smaller p-value an indicator supporting effectiveness?
To state the conclusion first: a smaller p-value does not mean a larger effect. In other words, we cannot conclude that the effect was greater simply because the p-value was smaller.
To compare p-values, let us look at the antipyretic effect of each drug summarized in the table below.
[Table] Comparison of the antipyretic effect of each drug
New drug Y lowers body temperature the most of the three drugs, with a mean decrease in body temperature of 2.2, compared with existing drug X and new drug Z.
Moreover, new drug Z also significantly lowers body temperature compared to existing drug X. The p-value obtained from the test of existing drug X and new drug Y was 0.041, and that of existing drug X and new drug Z was 0.009 (Figure).
If we were to select a drug based on the smaller p-value, new drug Z would appear to be the more effective choice (the better result).
In other words, selecting a drug based solely on the size of the p-value can lead to an inaccurate decision: choosing the drug with the smaller effect.
[Figure] Comparison of mean decrease in body temperature between new and existing drugs
The p-value is a measure of the strength of the evidence against the null hypothesis, not a measure of the size of the effect. A p-value or statistical significance on its own does not indicate the magnitude or importance of the observed effect.
In fact, if new drug Y had been administered to more subjects, its p-value might well have been smaller than 0.041.
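To make this concrete, here is a small sketch using hypothetical summary statistics (not the data from this article): the comparison with the larger mean difference but fewer subjects ends up with the larger p-value, while the comparison with the smaller difference but many more subjects ends up with the smaller one.

```python
# Sketch with hypothetical summary statistics: "Y" has the larger effect but a
# small trial; "Z" has a smaller effect but a much larger trial. The larger
# trial yields the smaller p-value despite the smaller effect.
from scipy import stats

# Y vs control: large effect (mean difference = 1.0), only 10 subjects per arm
_, p_y = stats.ttest_ind_from_stats(mean1=2.0, std1=1.0, nobs1=10,
                                    mean2=1.0, std2=1.0, nobs2=10)

# Z vs control: small effect (mean difference = 0.3), 200 subjects per arm
_, p_z = stats.ttest_ind_from_stats(mean1=1.3, std1=1.0, nobs1=200,
                                    mean2=1.0, std2=1.0, nobs2=200)

print(f"Y vs control: effect = 1.0, p = {p_y:.3f}")  # roughly 0.04
print(f"Z vs control: effect = 0.3, p = {p_z:.3f}")  # roughly 0.003
```

The point is that the p-value mixes together the effect size and the sample size, so it cannot be read as a measure of the effect alone.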
A very small p-value does not prove anything by itself; what matters more is that several well-designed follow-up studies consistently show the same result, even if each p-value is only around 0.05.
Since the p-value is only a rough guide, its approximate range is often indicated with asterisks, as in the following convention:
* 0.01≤p<0.05
** 0.001≤p<0.01
*** p<0.001
Also, p≥0.05 is sometimes written as n.s. (not significant).
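For reference, a small helper function (hypothetical, simply mirroring the convention above) can map a p-value to this notation:

```python
# Hypothetical helper mirroring the asterisk convention described above.
def significance_label(p: float) -> str:
    """Return the conventional asterisk label for a p-value."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."  # not significant

print(significance_label(0.041))  # "*"
print(significance_label(0.009))  # "**"
```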
Thus, there are limits to the information a p-value provides. When reading a clinical trial article, it is important to check not only the p-value but also whether the difference in effect on the primary endpoint is clinically meaningful.
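As a closing sketch (with hypothetical numbers, not the article’s data), reporting the mean difference with its confidence interval and a standardized effect size such as Cohen’s d conveys exactly the kind of information about effect magnitude that a p-value alone does not:

```python
# Sketch with hypothetical summary statistics: report the effect itself
# (mean difference with a 95% confidence interval, plus Cohen's d)
# alongside the p-value, not the p-value alone.
import math
from scipy import stats

mean_new, sd_new, n_new = 2.2, 1.0, 30   # hypothetical new-drug arm
mean_old, sd_old, n_old = 1.4, 1.0, 30   # hypothetical existing-drug arm

diff = mean_new - mean_old
pooled_sd = math.sqrt(((n_new - 1) * sd_new**2 + (n_old - 1) * sd_old**2)
                      / (n_new + n_old - 2))
se = pooled_sd * math.sqrt(1 / n_new + 1 / n_old)
df = n_new + n_old - 2

# 95% confidence interval for the mean difference
t_crit = stats.t.ppf(0.975, df)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

cohens_d = diff / pooled_sd  # standardized effect size

print(f"Mean difference: {diff:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"Cohen's d: {cohens_d:.2f}")
```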