In research, statistical significance indicates whether an observed result would be unlikely if the null hypothesis were true, judged against a pre-specified level of acceptable uncertainty. Breaking apart a study design helps clarify the concept.[1][2][3][4][5][6][7]
When creating a study, the researcher has to start with a hypothesis; that is, they must have some idea of what they think the outcome may be. We will use the example of a new medication to lower blood pressure. The researcher hypothesizes that the new medication lowers systolic blood pressure by at least 10 mmHg compared to not taking the new medication. The hypothesis can then be stated, "Taking the new medication will lower systolic blood pressure by at least 10 mmHg compared to not taking the medication." In science, researchers can never prove a statement outright, as there are infinite alternative explanations for why the outcome may have occurred. They can only try to disprove a specific hypothesis. The researcher must therefore formulate a statement they can disprove on the way to concluding that the new medication lowers systolic blood pressure. The hypothesis to be disproven is the null hypothesis, typically the inverse of the research hypothesis. Thus, the null hypothesis for our researcher would be "Taking the new medication will not lower systolic blood pressure by at least 10 mmHg compared to not taking the new medication." The researcher now has the null hypothesis for the research and must next specify the significance level, or level of acceptable uncertainty.
Even when disproving a hypothesis, the researcher will not be 100% certain of the outcome. The researcher must therefore settle on a level of confidence, expressed through the significance level. The significance level is denoted by the Greek letter alpha and is the probability of incorrectly rejecting the null hypothesis that the researcher is willing to accept. Our researcher wants to be correct about the outcome 95% of the time, which means being willing to be incorrect 5% of the time. Probabilities are stated as decimals, with 1.0 meaning certain (100%) and 0 meaning impossible (0%). Alpha is the decimal expression of how often the researcher is willing to be wrong, so for the current example alpha is 0.05. We now have the level of uncertainty the researcher is willing to accept: an alpha (significance level) of 0.05, or a 5% chance of rejecting a null hypothesis that is actually true.
Now the researcher can perform their research. In the example, the researcher would give some individuals the new medication and other individuals no medication. The researcher then looks at how blood pressure changes after receiving the new medication and performs a statistical analysis of the results to obtain a p-value (probability value). Numerous tests used in research can provide a p-value, and it is imperative to use the correct one. If the researchers use the wrong test, the p-value will not be accurate, and this result can mislead the researcher. The p-value is best described as the probability of obtaining results at least as extreme as those observed if the null hypothesis were true; it is not the probability that the null hypothesis itself is true. In our example, the researcher found that blood pressures did tend to decrease after taking the new medication. The researcher then enlisted a statistician to perform the correct analysis and arrived at a p-value of 0.02 for the decrease in blood pressure in those taking the new medication versus those not taking it. Our researcher now has the three pieces of information required to assess statistical significance: the null hypothesis, the significance level, and the p-value.
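As a minimal sketch of how a p-value might be obtained for this kind of comparison, the Python below runs a one-sided permutation test. The choice of a permutation test, the function name `permutation_p_value`, the 10 mmHg `margin` handling, and the blood pressure values are all assumptions made for illustration; they are not the method or data of any actual study, and a statistician may well select a different test.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, margin=0.0,
                        n_permutations=10_000, seed=0):
    """One-sided permutation p-value for H0: mean(treated) - mean(control) <= margin.

    Assumes the treatment adds a constant shift, so subtracting the margin
    from the treated group turns the boundary of H0 into "no difference".
    """
    rng = random.Random(seed)
    shifted = [x - margin for x in treated]
    observed = mean(shifted) - mean(control)
    pooled = shifted + list(control)
    n_treated = len(shifted)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels, as if the treatment had no extra effect.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical drops in systolic blood pressure (mmHg); not real study data.
treated_drop = [18, 22, 15, 25, 19, 21, 17, 24, 20, 23]
control_drop = [2, 5, -1, 4, 3, 0, 6, 1, 2, 4]

p_value = permutation_p_value(treated_drop, control_drop, margin=10.0)
print(f"p-value: {p_value:.4f}")
```

The returned value estimates how often a difference at least as large as the observed one would arise if the medication lowered blood pressure by only the margin, which matches the definition of the p-value as a statement about the data under the null hypothesis.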
The researcher can finally assess the statistical significance regarding the new medication. A study result is stated to be statistically significant if the p-value of the data analysis is less than the prespecified alpha (significance level). In our example, the p-value is 0.02 which is less than the pre-specified alpha of 0.05, so the researcher concludes there is statistical significance for the study.
What does this mean? The p-value of 0.02 implies that, if the null hypothesis were true, there would be only a 2% chance of obtaining a result at least as extreme as the one observed. It does not mean there is a 2% chance that the null hypothesis is correct. Remember that the null hypothesis states that the new medication does not lower systolic blood pressure by at least 10 mmHg compared to not taking it. The researcher pre-specified an alpha of 0.05, meaning they would reject the null hypothesis only if data this extreme would occur less than 5% of the time under the null hypothesis. As the p-value of 0.02 is less than the alpha of 0.05, the researcher rejects the null hypothesis because there is statistical significance. By rejecting the null hypothesis, the researcher accepts the alternative hypothesis: that there is a difference of at least 10 mmHg in systolic blood pressure when taking the new medication.
If the researcher had pre-specified an alpha of 0.01, implying they wanted to be 99% sure the new medication lowered the blood pressure by at least 10 mmHg, then the p-value of 0.02 would be greater than the pre-specified alpha of 0.01. The researcher would conclude the study did not reach statistical significance, as the p-value is equal to or greater than the pre-specified alpha. The researcher would then not be able to reject the null hypothesis.
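The comparison in the two scenarios above can be sketched in a few lines of Python. The function name `is_statistically_significant` is a hypothetical helper introduced only for illustration; the p-value and alpha values are the ones used in the example.

```python
def is_statistically_significant(p_value: float, alpha: float) -> bool:
    """Return True when the p-value is strictly less than the pre-specified alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return p_value < alpha


p_value = 0.02  # p-value from the blood pressure example
for alpha in (0.05, 0.01):
    if is_statistically_significant(p_value, alpha):
        decision = "reject the null hypothesis"
    else:
        decision = "fail to reject the null hypothesis"
    print(f"alpha = {alpha}: {decision}")
```

The strict inequality matters: a p-value exactly equal to alpha does not count as statistically significant under the rule used here.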
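A study is statistically significant if the p-value is less than the pre-specified alpha. Stated succinctly: a p-value less than alpha is a statistically significant result, and a p-value greater than or equal to alpha is not a statistically significant result.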
There are a few issues of concern when looking at statistical significance. These issues include choosing the alpha, statistical analysis method, and clinical significance.
Many current research articles specify an alpha of 0.05 for their significance level. It cannot be stated strongly enough that there is nothing special, mathematical, or certain about picking an alpha of 0.05. Historically, those who introduced the convention concluded that for many applications an alpha of 0.05, or a one in 20 chance of being incorrect, was good enough. It is imperative for the researcher to consider what their significance level should truly be for the research question they are asking. Many times a smaller alpha, say 0.01, may be more appropriate.
When creating a study, the alpha, or significance level, should be specified before any intervention or collection of data. It is easy for a researcher to "see what the data shows" and then pick an alpha that gives a statistically significant result. Such approaches compromise the data and results, as the researcher is more likely to be lax in selecting the significance level in order to obtain a result that looks statistically significant.
A second important issue is selecting the correct statistical analysis method. There are numerous methods for obtaining a p-value. The method chosen depends on the type of data, number of data points, and the question being asked. It is important to consider these questions during the study design so the statistical analysis can be correctly identified before the research. The statistical analysis method can then help determine how to collect the data correctly as well as the number of data points needed. If the wrong statistical method is used, the results may be meaningless as an incorrect p-value would be calculated.
There is a key distinction in statistical significance versus clinical significance. Statistical significance determines if there is mathematical significance to the analysis of the results. Clinical significance means the difference is important to the patient and the clinician. In our study, the statistical significance would be present as the p-value was less than the pre-specified alpha. The clinical significance would be the 10 mmHg drop in systolic blood pressure.[6]
Two studies can have similar statistical significance but differ vastly in clinical significance. Let us use a hypothetical example of two new chemotherapy agents for treating cancer. Drug A was found to increase survival by at least ten years with a p-value of 0.01 and an alpha for the study of 0.05. Thus, this study has statistical significance (p-value less than alpha) and clinical significance (increased survival by ten years). A second chemotherapy agent, Drug B, was found to increase survival by at least 10 minutes with a p-value of 0.01 and an alpha for the study of 0.05. The study for Drug B also found statistical significance (p-value less than alpha) but no clinical significance (a 10-minute increase in life expectancy is not clinically significant). In a separate study, those taking Drug A lived on average eight years after starting the medication versus only two more years on average for those not taking Drug A, with a p-value of 0.08 and an alpha for this second study of 0.05. In this second study of Drug A, there is no statistical significance (p-value greater than or equal to alpha), even though the observed six-year difference would be clinically important.
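The gap between the two kinds of significance can be illustrated with a small Python sketch. It assumes a two-sided z-test with a known common standard deviation, which is a simplification chosen for illustration; the effect size, standard deviation, sample size, and the 10 mmHg threshold for clinical importance are hypothetical values, not results from any study.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_p_value(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means, assuming a known common SD."""
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


ALPHA = 0.05
MIN_CLINICALLY_IMPORTANT_DROP = 10.0  # mmHg; hypothetical threshold

# A very small average drop measured in a very large hypothetical trial.
small_effect = 0.5  # mmHg
p_value = two_sample_z_p_value(small_effect, sd=10.0, n_per_group=10_000)

statistically_significant = p_value < ALPHA
clinically_significant = small_effect >= MIN_CLINICALLY_IMPORTANT_DROP

print(f"p-value: {p_value:.5f}")
print(f"statistically significant: {statistically_significant}")
print(f"clinically significant: {clinically_significant}")
```

With enough participants, even a trivially small difference can produce a p-value below alpha, which is why the size of the effect has to be judged separately from the p-value.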
Even when a result is statistically significant, it may not be correct. A researcher may thoughtfully design a study with a confidence level of 95% (thus the alpha would be 0.05) and obtain a p-value of 0.04. As the p-value of 0.04 is less than the alpha, the study would be considered to have statistical significance. By choosing an alpha of 0.05, however, the researcher accepted a 5% chance of rejecting the null hypothesis when it is actually true (a false-positive result). It is therefore possible that, although the results are statistically significant, the conclusion is nonetheless incorrect.
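The meaning of alpha as a false-positive rate can be checked with a short simulation, sketched here in Python. It assumes many hypothetical studies in which the null hypothesis is true by construction (both groups are drawn from the same normal distribution) and uses a two-sided z-test with a known standard deviation of 1; the function name, sample sizes, and seed are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_positive_rate(alpha=0.05, n_per_group=30, n_studies=2000, seed=1):
    """Fraction of simulated null studies that reach statistical significance."""
    rng = random.Random(seed)
    standard_normal = NormalDist()
    standard_error = sqrt(2.0 / n_per_group)  # known SD of 1 in both groups
    false_positives = 0
    for _ in range(n_studies):
        # Both groups come from the same distribution, so the null is true.
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        mean_diff = sum(group_a) / n_per_group - sum(group_b) / n_per_group
        z = mean_diff / standard_error
        p_value = 2.0 * (1.0 - standard_normal.cdf(abs(z)))
        if p_value < alpha:
            false_positives += 1
    return false_positives / n_studies


rate = false_positive_rate()
print(f"Share of null studies with p < 0.05: {rate:.3f}")
```

The share of "significant" results should land close to alpha, illustrating that a statistically significant finding can still arise when there is no real effect.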
[1] Hayat MJ. Understanding statistical significance. Nursing research. 2010 May-Jun. [PubMed PMID: 20445438]
[2] Mondal H, Mondal S. Statistical Significance is Prerequisite in Study. Journal of clinical and diagnostic research : JCDR. 2017 Sep. [PubMed PMID: 29207700]
[3] Heston TF, King JM. Predictive power of statistical significance. World journal of methodology. 2017 Dec 26. [PubMed PMID: 29354483]
[4] Nakagawa S, Cuthill IC. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biological reviews of the Cambridge Philosophical Society. 2007 Nov. [PubMed PMID: 17944619]
[5] Haig BD. Tests of Statistical Significance Made Sound. Educational and psychological measurement. 2017 Jun. [PubMed PMID: 29795925]
[6] Jiménez-Paneque R. The questioned p value: clinical, practical and statistical significance. Medwave. 2016 Sep 9. [PubMed PMID: 27636600]
[7] Mariani AW, Pêgo-Fernandes PM. Statistical significance and clinical significance. Sao Paulo medical journal = Revista paulista de medicina. 2014. [PubMed PMID: 24714985]