T Test

Raoul Wadhwa; Raghavendra Marappa-Ganeshan

Definition/Introduction

William Sealy Gosset first described the t-test in 1908, publishing his article under the pseudonym 'Student' while working for a brewery.[1] In simple terms, the statistic behind a Student's t-test is a ratio: the difference between the means of 2 groups divided by a measure of the variability of the data. The larger this ratio is in absolute value, the harder it is to attribute the difference to chance alone.
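As a rough illustration of this ratio, one common form of the 2-sample statistic is shown below. The article does not give a formula, so the choice of the unequal-variance (Welch) form and the symbols are assumptions made here: the sample means are \bar{x}_1 and \bar{x}_2, the sample variances are s_1^2 and s_2^2, and the group sizes are n_1 and n_2.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```

The numerator is the observed difference between the means, and the denominator is the estimated standard error of that difference.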

Issues of Concern

Selecting appropriate statistical tests is a critical step in conducting research.[2] Physicians, particularly physician-scientists, should be aware of 3 forms of Student's t-test: (1) the 1-sample t-test, (2) the 2-sample t-test, and (3) the 2-sample paired t-test. The 1-sample t-test evaluates a single list of numbers to test the hypothesis that a statistic of that set, usually the mean, is equal to a chosen value, for instance, that the mean of the set of numbers is equal to zero. For example, consider the following question: what is the average serum sodium concentration in adults? Currently, 140 mEq/L is the approximate center of a reference range of 135 to 145 mEq/L; thus, the null hypothesis is that the average serum sodium concentration in adults equals 140 mEq/L.

If you believe these numbers are wrong (the alternative hypothesis) or want to test the original hypothesis, you could collect blood from a set of subjects, measure the sodium concentration in each sample, and then take the mean of this set. If the mean is 140.1 mEq/L, you probably do not have convincing evidence that the above numbers are faulty (since 140 and 140.1 are fairly close). Thus, you would fail to reject the null hypothesis. However, if your sample has a mean of 70 mEq/L, this could be preliminary evidence against the null hypothesis (assuming rigorous methodology), and you could end up rejecting it. The decision would be trickier if the sample's mean were 134 or 150 mEq/L. The t-test reduces subjective influence in such cases by weighing the observed difference against the variability and size of the sample.

Before testing a hypothesis, researchers should choose the alpha and beta values of the test. Loosely, alpha is the accepted probability of a false-positive result (eg, the true mean serum sodium concentration is 140 mEq/L, but the t-test rejects the original hypothesis in favor of your new hypothesis). Beta is the accepted probability of a false-negative result (eg, the true mean serum sodium concentration is 200 mEq/L, but the t-test fails to reject the old hypothesis). Methods of alpha and beta selection are outside this topic's scope.
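The serum sodium example can be sketched in a few lines of Python. This is a minimal illustration, not a validated analysis: the measurements are invented, the function name one_sample_t is hypothetical, and alpha is assumed to be 0.05 (2-sided), for which a standard t table gives a critical value of 2.262 at 9 degrees of freedom.

```python
import math
import statistics


def one_sample_t(values, null_mean):
    """Return (t statistic, degrees of freedom) for a 1-sample t-test."""
    n = len(values)
    sample_mean = statistics.mean(values)
    sample_sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - null_mean) / standard_error, n - 1


# Invented serum sodium measurements (mEq/L) from 10 adults; not real data.
sodium = [139, 142, 138, 141, 143, 140, 137, 144, 141, 139]

# Null hypothesis from the text: the mean equals 140 mEq/L.
t_stat, df = one_sample_t(sodium, null_mean=140)

# Assumed alpha = 0.05 (2-sided); critical value for df = 9 from a t table.
T_CRITICAL = 2.262
reject_null = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, df = {df}, reject null: {reject_null}")
```

With these invented values the sample mean is 140.4 mEq/L, the statistic falls well inside the critical value, and the null hypothesis is not rejected.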

While the 1-sample t-test allows you to test a statistic of a single set of numbers against a specific numeric value, the 2-sample t-test compares the value of a statistic between 2 groups. In this case, a research question could be: do children and adults have the same mean serum sodium concentration? Testing this hypothesis would require sampling 2 groups, a group of adults and a group of children, and comparing the mean serum sodium concentrations between these 2 groups in a manner analogous to the 1-sample t-test described above.

The paired t-test is used in scenarios where measurements from the 2 groups are linked to one another. In the example above concerning the mean serum sodium concentration of children and adults, the implicit assumption was that all the measurements would be completed at 1 point in time in a set of children and a distinct set of adults. However, it would also be possible to measure serum sodium concentrations in a set of children, wait until they are adults, and then measure their serum sodium concentrations again. Here, each adult's sodium concentration corresponds to exactly 1 child's sodium concentration. A paired 2-sample t-test can be used to capture this dependence between the measurements in the 2 groups.
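The difference between the unpaired and paired designs can be sketched as follows. The data are invented, the function names welch_t and paired_t are hypothetical, and the unpaired version uses the unequal-variance (Welch) form as an assumption, since the article does not specify which variant it has in mind.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """t statistic for 2 independent groups (unequal-variance form)."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


def paired_t(first, second):
    """t statistic for paired data: a 1-sample test on the differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    differences = [s - f for f, s in zip(first, second)]
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return statistics.mean(differences) / standard_error


# Invented serum sodium values (mEq/L); not real data.
children = [138, 140, 137, 141, 139, 140]
adults = [140, 142, 139, 142, 141, 143]

# Unpaired: children and adults are treated as distinct sets of people.
t_unpaired = welch_t(children, adults)

# Paired: adults[i] is treated as the same person as children[i], years later.
t_paired = paired_t(children, adults)

print(f"unpaired t = {t_unpaired:.3f}, paired t = {t_paired:.3f}")
```

In this made-up data set every subject's value rises by a similar amount, so the paired statistic is much larger in magnitude than the unpaired one: pairing removes the between-subject variability that the unpaired test has to treat as noise.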

These variations of Student's t-test use observed or collected data to calculate a test statistic, which can then be used to calculate a p-value. Often misinterpreted, the p-value is the probability of collecting data at least as extreme as the data observed in the study, assuming that the null hypothesis is true.[3] Concrete examples, such as the serum sodium questions above, illustrate this concept best. Usually, a threshold value is set before the study (equal to the alpha mentioned above); if the resulting p-value is below the preset threshold, the data are considered sufficient evidence to reject the null hypothesis.
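To make the link between a test statistic and its p-value concrete, the sketch below approximates a 2-sided p-value by numerically integrating the density of Student's t distribution. The function names, the example statistic of 2.5 with 9 degrees of freedom, and the alpha of 0.05 are all assumptions for illustration; statistical software would normally be used for this step.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coefficient = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coefficient * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=10_000):
    """Approximate P(|T| >= |t_stat|) under the null hypothesis (Simpson's rule)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    if steps % 2:  # Simpson's rule needs an even number of intervals
        steps += 1
    width = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * width, df)
    central_area = 2 * (total * width / 3)  # area between -|t| and +|t|
    return 1 - central_area


ALPHA = 0.05  # assumed threshold, chosen before looking at the data
p_value = two_sided_p_value(2.5, df=9)  # hypothetical statistic and df
reject_null = p_value < ALPHA

print(f"p = {p_value:.4f}, reject null: {reject_null}")
```

The p-value is the area in both tails of the distribution beyond the observed statistic, which matches the "at least as extreme" wording of the definition.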

In the above scenarios, before using any form of the t-test, one must ensure that the assumptions for the test have been met. This article does not list or explain these assumptions in detail. Please follow the guidance of a trained statistician when designing research studies and conducting data analysis.

Clinical Significance

Given the rate of research progress, disease management (medical or surgical) continuously evolves. To follow the framework of evidence-based medicine, physicians must be able to read and critically evaluate primary literature.[4][5] The ability to do this successfully requires at least a basic foundation of knowledge in statistics, including common biases (eg, nonresponse bias), standard study designs (eg, randomized controlled trials), and common statistical pitfalls researchers face (eg, statistically significant results that are not clinically significant).[6][7] Understanding Student's t-test is a good starting point for clinicians building this foundation.

References

[1]

Drummond GB, Tom BD. Statistics, probability, significance, likelihood: words mean what we define them to mean. The Journal of physiology. 2011 Aug 15:589(Pt 16):3901-4. doi: 10.1113/jphysiol.2011.215103. [PubMed PMID: 21844004]

[2]

Beath A, Jones MP. Guided by the research design: choosing the right statistical test. The Medical journal of Australia. 2018 Mar 5:208(4):163-165 [PubMed PMID: 29490219]

[3]

Andrade C. The P Value and Statistical Significance: Misunderstandings, Explanations, Challenges, and Alternatives. Indian journal of psychological medicine. 2019 May-Jun:41(3):210-215. doi: 10.4103/IJPSYM.IJPSYM_193_19. [PubMed PMID: 31142921]

Level 3 (low-level) evidence

[4]

Ioannidis JP. Why Most Clinical Research Is Not Useful. PLoS medicine. 2016 Jun [PubMed PMID: 27328301]

[5]

Ioannidis JP. Why most published research findings are false. PLoS medicine. 2005 Aug:2(8):e124 [PubMed PMID: 16060722]

[6]

Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature. 2019 Mar:567(7748):305-307. doi: 10.1038/d41586-019-00857-9. [PubMed PMID: 30894741]

[7]

Lang T. Twenty statistical errors even you can find in biomedical research articles. Croatian medical journal. 2004 Aug:45(4):361-70 [PubMed PMID: 15311405]