A basic understanding of statistical concepts is necessary to effectively evaluate existing literature. Statistical results do not, however, allow one to determine the clinical applicability of published findings. Statistical results can be used to make inferences about the probability of an event among a given population. Careful interpretation by the clinician is required to determine the value of the data as it applies to an individual patient or group of patients.
Good research studies will provide a clear, testable hypothesis, or prediction, about what they expect to find in the relationships being tested. The hypothesis will be grounded in the empirical literature, based on clinical observations or expertise, and should be innovative in its tests of a novel relationship or confirmation of a prior study. There are at minimum two hypotheses in any study: (1) the null hypothesis assumes there is no difference or that there is no effect, and (2) the experimental or alternative hypothesis predicts an event or outcome will occur. Often the null hypothesis is not stated or is assumed. Hypotheses are tested by examining relationships between independent variables, or those thought to have some effect, and dependent variables, or those thought to be moved or affected by the independent variable. These also are called predictor and outcome variables, respectively.
Statistics are used to test a study’s alternative, or experimental, hypothesis. Statistical models are fitted based on the nature, type, and other characteristics of the dataset. Data typically involve levels of measurement, and these determine the types of statistical models that can be applied to test a hypothesis. Nominal data are variables containing two or more categories without underlying order or value; examples include indicators of group membership, such as male or female. Ordinal data are nominal data that include an order or rank but have undefined spacing between groups or levels, such as faculty rank or educational level. Interval data are ordinal data with clearly defined spacing between intervals but no absolute zero point. An example of interval data is temperature measured in degrees Celsius or Fahrenheit: the magnitude of the difference between intervals is consistent and measurable (one degree), but zero does not indicate the absence of temperature. Ratio data are interval data that include an absolute zero, such as the amount of student loan debt. Nominal and ordinal data are categorical, in that entities are divided into distinct groups, whereas interval and ratio data are considered continuous, in that each observation receives a distinct score along a scale.
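As a minimal illustration (not drawn from any particular statistical package), the four levels of measurement described above might be tagged programmatically before a model is chosen; all variable names below are hypothetical examples:

```python
# Hypothetical study variables mapped to their levels of measurement.
levels = {
    "sex": "nominal",            # categories without order or value
    "faculty_rank": "ordinal",   # ordered, but spacing between levels undefined
    "temperature_f": "interval", # ordered, even spacing, no absolute zero
    "loan_debt": "ratio",        # even spacing with an absolute zero
}

def data_class(level):
    """Nominal and ordinal data are categorical; interval and ratio are continuous."""
    return "categorical" if level in ("nominal", "ordinal") else "continuous"

for variable, level in levels.items():
    print(f"{variable}: {level} -> {data_class(level)}")
```

This kind of classification matters in practice because it narrows the set of appropriate models (e.g., chi-square tests for categorical outcomes versus t-tests or regression for continuous ones).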
It is up to the researcher to apply statistical models appropriately when testing hypotheses. Several approaches can be used to analyze the same dataset, and the choice depends heavily on how the researcher’s hypothesis is worded. A variety of statistical software packages are available, some free and others requiring annual license fees. Nearly all packages require the user to have a basic understanding of the types of data and the appropriate statistical models for each type. More sophisticated packages require the user to write in the program’s proprietary coding language to perform hypothesis tests; these languages can take considerable time to learn, and errors can easily slip past the untrained eye.
It is strongly recommended that unfamiliar users consult a statistical analyst when designing and running statistical models. Biostatistician consultations can occur at any time during a study, but earlier consultations are wise: they help prevent the accidental introduction of bias into study data and help ensure that data collection methods will be accurate and adequate for testing the study’s hypotheses.
Statistical Significance
The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. By convention, if the p-value is less than .05, the result is considered statistically significant, and the null hypothesis is rejected in favor of the experimental hypothesis. Another way to think about this cutoff: a difference with p less than .05 would arise by chance alone less than 5% of the time if there were truly no difference. (Note that the p-value is not the probability that the null hypothesis is true.) However, when interpreting statistical results, the p-value alone is not enough. Significant does not always equate to important; very small, potentially unimportant effects can turn out to be statistically significant.
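The logic of a p-value can be sketched with a permutation test, which directly estimates how often a group difference at least as large as the observed one would arise by chance under the null hypothesis. The data below are made up purely for illustration:

```python
import random
import statistics

# Illustrative sketch only: a permutation test estimates the probability of
# observing a group difference at least this large by chance alone, assuming
# the null hypothesis (no true difference). All data here are invented.
random.seed(1)
treatment = [142, 138, 150, 145, 139, 148]  # hypothetical measurements
control = [135, 133, 140, 131, 137, 134]

observed = statistics.mean(treatment) - statistics.mean(control)
pooled = treatment + control
n = len(treatment)

extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)  # relabel groups at random under the null
    diff = statistics.mean(pooled[:n]) - statistics.mean(pooled[n:])
    if abs(diff) >= abs(observed):  # at least as extreme, two-sided
        extreme += 1

p_value = extreme / trials
print(f"observed difference = {observed:.2f}, p-value estimate = {p_value:.4f}")
```

A small p-value here means that random relabeling of the groups rarely reproduces a difference as large as the one observed, which is exactly the sense in which the result is "unlikely by chance."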
To evaluate the clinical relevance or importance of a significant result, one must be certain to consider the size of the effect. Effect measures are standardized to allow application across different scales of measurement. The following are some of the more common ways effect sizes can be estimated:
One common measure of effect is the correlation coefficient, r. The square of a correlation coefficient indicates the proportion of variance explained by the relationship tested. By convention, r=.10 is considered a small effect, explaining 1% of the total variance; r=.30 a medium effect, explaining 9%; and r=.50 a large effect, explaining 25% of the variance and holding greater clinical relevance. Similarly, confidence intervals offer a way to judge the strength or magnitude of observed effects. A 95% confidence interval indicates a range of plausible values around a point estimate (e.g., a mean or odds ratio); if the study were repeated many times, approximately 95% of intervals constructed in this way would contain the true population value. Confidence intervals also provide information about precision: narrower intervals suggest greater precision, whereas wider intervals suggest greater variability. It has been recommended that, at a minimum, studies report effect estimates and confidence intervals to allow for appropriate interpretation of their results.
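As a sketch with made-up paired data, r, r², and an approximate 95% confidence interval for r can be computed as follows. The interval uses the Fisher z-transformation, a standard large-sample approximation:

```python
import math

# Hypothetical paired observations, invented for illustration.
x = [2, 4, 5, 7, 9, 11, 13, 14]
y = [1, 3, 6, 6, 10, 10, 14, 13]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Pearson's r: covariance divided by the product of standard deviations.
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
sx = math.sqrt(sum((a - mx) ** 2 for a in x))
sy = math.sqrt(sum((b - my) ** 2 for b in y))
r = cov / (sx * sy)

r_squared = r ** 2  # proportion of variance explained

# Fisher z-transform: z = atanh(r), with standard error 1/sqrt(n - 3).
z = math.atanh(r)
se = 1 / math.sqrt(n - 3)
lo = math.tanh(z - 1.96 * se)
hi = math.tanh(z + 1.96 * se)

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Note how the width of the interval shrinks as n grows (the standard error falls with 1/sqrt(n - 3)), which mirrors the point above that narrower intervals reflect greater precision.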
It is also important to note that although a study may be designed and statistically tested in a way that suggests causal inference could be drawn (e.g., longitudinal observations of change over time), only studies that employ a randomized and/or controlled design permit causal conclusions to be made from their results.