Introduction
A basic understanding of statistical concepts is necessary to evaluate existing literature effectively. Statistical results do not, however, allow one to determine the clinical applicability of published findings. Statistical results can be used to make inferences about the probability of an event among a given population. Careful interpretation by the clinician is required to determine the value of the data as it applies to an individual or group of patients.[1] Good research studies provide a clear, testable hypothesis, or prediction, about what they expect to find in the relationships being tested.[2] The hypothesis is grounded in the empirical literature and based on clinical observations or expertise. It should be innovative in testing a novel relationship or confirming a prior study. There are at minimum 2 hypotheses in any study:
(1) The null hypothesis assumes there is no difference or no effect, and (2) the experimental or alternative hypothesis predicts an event or outcome. Often, the null hypothesis is not stated explicitly but is assumed. Hypotheses are tested by examining relationships between independent variables, or those thought to have some effect, and dependent variables, or those thought to be changed or affected by the independent variable. These are also called predictor and outcome variables, respectively.
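As a minimal sketch of how a null hypothesis can be tested, the Python example below runs an exact permutation test on the difference in group means. The function name, the group labels, and the numeric values are hypothetical and chosen only for illustration; they do not come from any study. Group membership plays the role of the independent (predictor) variable, and the measured value plays the role of the dependent (outcome) variable.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test for a difference in means.

    Under the null hypothesis, group labels are interchangeable, so every
    way of splitting the pooled values into two groups is equally likely.
    The p-value is the share of splits whose absolute mean difference is
    at least as large as the one actually observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))

    extreme = 0
    total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical outcome values for two groups (illustration only).
control = [120, 125, 130, 128, 122]
treated = [110, 115, 112, 118, 109]

p_value = exact_permutation_test(control, treated)
print(f"p = {p_value:.4f}")
```

A small p-value indicates that a difference this large would be unusual if the null hypothesis were true. It does not by itself establish that the difference is clinically meaningful.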
Statistics are used to test a study’s alternative or experimental hypothesis. Statistical models are fitted based on the dataset's nature, type, and other characteristics. Data are classified by level of measurement, which determines the type of statistical models that can be applied to test a hypothesis.[3] Nominal data are variables containing 2 or more categories without underlying order or value. Examples of nominal data include indicators of group membership, such as male or female. Ordinal data are nominal data that include an order or rank but have undefined spacing between groups or levels, such as faculty ranking or educational level. Interval data are ordinal data with clearly defined spacing between the intervals and no absolute zero point. An example of interval data is temperature measured in degrees Celsius or Fahrenheit, as the magnitude of the difference between intervals is consistent and measurable (one degree). Ratio data are interval data that include an absolute zero, such as the amount of student loan debt. Nominal and ordinal data are categorical, where entities are divided into distinct groups, whereas interval and ratio data are considered continuous, giving each observation a distinct score.[4]
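The four measurement levels can be summarized as a small data structure. The Python sketch below is only an illustration of the definitions above: the class name, field names, and the choice of the Celsius scale for the temperature example are assumptions made for this example, not part of any standard library or statistical package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementLevel:
    """Properties that distinguish the four levels of measurement."""
    name: str
    ordered: bool         # categories have a meaningful rank
    equal_spacing: bool   # distance between adjacent values is defined
    absolute_zero: bool   # zero means a true absence of the quantity

    @property
    def kind(self) -> str:
        # Interval and ratio data are treated as continuous;
        # nominal and ordinal data are treated as categorical.
        return "continuous" if self.equal_spacing else "categorical"


NOMINAL = MeasurementLevel("nominal", False, False, False)
ORDINAL = MeasurementLevel("ordinal", True, False, False)
INTERVAL = MeasurementLevel("interval", True, True, False)
RATIO = MeasurementLevel("ratio", True, True, True)

# Example variables taken from the text (Celsius is an assumed scale).
VARIABLES = {
    "sex": NOMINAL,
    "educational level": ORDINAL,
    "temperature in degrees Celsius": INTERVAL,
    "student loan debt": RATIO,
}

for variable, level in VARIABLES.items():
    print(f"{variable}: {level.name} ({level.kind})")
```

Each level adds one property to the one before it, which is why a model suited to ratio data is generally not appropriate for nominal or ordinal data.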
It is up to the researcher to appropriately apply statistical models when testing hypotheses. Several approaches can be used to analyze the same dataset, and how this is accomplished depends heavily on the wording of the researcher’s hypothesis.[5] Various statistical software packages can be used to analyze data; some are available for free, while others charge annual license fees. Nearly all packages require the user to understand the types of data and the appropriate application of statistical models for each type. More sophisticated packages require the user to use the program’s proprietary coding language to perform hypothesis tests. These languages can take considerable time to learn, and errors can easily slip past the untrained eye. It is strongly recommended that unfamiliar users consult a statistical analyst when designing and running statistical models. Biostatistician consultations can occur at any time during a study, but earlier consultation helps prevent the accidental introduction of bias into study data and helps ensure that data collection methods are accurate and adequate to allow for tests of hypotheses.