McNemar And Mann-Whitney U Tests

Joshua Henrina Sundjaja; Rijen Shrestha; Kewal Krishan

Authors: Joshua Henrina Sundjaja; Rijen Shrestha. Editor: Kewal Krishan. Updated: 7/17/2023 8:48:00 PM

Definition/Introduction

All good research is based on a meticulous and well-designed question in the form of a hypothesis. To test this hypothesis, one must conduct an experiment under strict guidelines to obtain robust results. The results are then tested statistically to examine their significance and to conclude whether a new treatment, diagnostic modality, or biomarker is a better alternative to current practice. Thus, statistical tests are an important component of research, especially in medicine.

Historically, statistical testing has been a grueling and labor-intensive process. Thanks to modern computing, statistical analysis can now be performed with many commercially available programs, such as SPSS (Statistical Package for the Social Sciences) or Stata.

Conventionally, statistical tests are divided into two major groups: parametric and non-parametric. Parametric tests assume that the data follow a normal (Gaussian) distribution. When the data are not normally distributed, non-parametric tests are used instead. For continuous variables, many non-parametric tests have parametric counterparts. These pairs are the Mann-Whitney U test and the independent t-test, the Wilcoxon signed-rank test and the paired t-test, the Kruskal-Wallis test and one-way analysis of variance (ANOVA), and the Spearman rank correlation coefficient and the Pearson product-moment correlation coefficient.[1]
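As a minimal illustration, the pairings above can be expressed as a lookup in Python. The dictionary and the function name `suggest_test` are hypothetical. The sketch assumes that normality has already been assessed, for example by inspecting plots or using a formal test, and it does not perform that step itself.

```python
# Parametric tests and their non-parametric analogues, as paired in the text.
NONPARAMETRIC_ANALOGUE = {
    "independent t-test": "Mann-Whitney U test",
    "paired t-test": "Wilcoxon signed-rank test",
    "one-way ANOVA": "Kruskal-Wallis test",
    "Pearson product-moment correlation": "Spearman rank correlation",
}


def suggest_test(parametric_test: str, data_normal: bool) -> str:
    """Keep the parametric test if the data are normal; otherwise use its analogue."""
    if parametric_test not in NONPARAMETRIC_ANALOGUE:
        raise KeyError(f"No analogue listed for {parametric_test!r}")
    return parametric_test if data_normal else NONPARAMETRIC_ANALOGUE[parametric_test]
```

For example, `suggest_test("independent t-test", data_normal=False)` returns "Mann-Whitney U test".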

McNemar test

For nominal variables presented as a 2 x 2 table, three statistical tests can be used. The first is Fisher's exact test, which requires binary data and unpaired samples. The second is the McNemar test, which also requires binary data but applies to paired samples. The third is the chi-squared test, which requires a sample size of more than 60 subjects and an expected count of more than five in each cell. The chi-squared test can also be used for contingency tables larger than 2 x 2, e.g., 3 x 3, 4 x 4, and so on.[2]
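The selection rules above can be sketched in Python as follows. The function name `choose_2x2_test` is hypothetical, and the table is assumed to be laid out as [[a, b], [c, d]]. The chi-squared criterion is read in the conventional way, as applying to expected counts (row total x column total / n) rather than observed counts.

```python
def choose_2x2_test(table, paired):
    """Suggest a test for a 2 x 2 table of counts laid out as [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    if n == 0:
        raise ValueError("The table contains no observations.")
    # Paired binary data (e.g., the same subjects measured twice): McNemar test.
    if paired:
        return "McNemar test"
    # Expected count for each cell under independence: row total * column total / n.
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    expected = [r * k / n for r in row_totals for k in col_totals]
    # Unpaired data: chi-squared test only if n > 60 and every expected count > 5.
    if n > 60 and min(expected) > 5:
        return "Chi-squared test"
    # Otherwise fall back to Fisher's exact test.
    return "Fisher's exact test"
```

For example, the unpaired table [[30, 20], [15, 35]] has n = 100 and a smallest expected count of 22.5, so it would be routed to the chi-squared test.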

The McNemar test is a non-parametric test used to analyze paired nominal data. It is applied to a 2 x 2 contingency table and tests the marginal homogeneity of two dichotomous variables. The test requires one dichotomous nominal (dependent) variable and one independent variable with two related (dependent) groups. An example is the same subjects assessed before and after an intervention. The two categories of the dependent variable must be mutually exclusive, i.e., a subject cannot fall into more than one category. The McNemar test requires at least ten discordant pairs. The formula for calculating the chi-squared value for the McNemar test appears in Image 1. In it, b and c are the counts of discordant pairs, for example the false-positive and false-negative counts when a new diagnostic test is compared with a reference test.
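For reference, the statistic in Image 1 is usually written as follows. The table layout (first test in the rows, second test in the columns) is an assumption for illustration. In it, a and d are the concordant pairs and b and c are the discordant pairs. The statistic is compared against a chi-squared distribution with one degree of freedom. The worked numbers (b = 15, c = 5) are hypothetical.

```latex
\begin{array}{c|cc}
 & \text{Test 2 positive} & \text{Test 2 negative} \\ \hline
\text{Test 1 positive} & a & b \\
\text{Test 1 negative} & c & d
\end{array}
\qquad
\chi^2 = \frac{(b - c)^2}{b + c}, \quad \text{df} = 1

\text{Example: } b = 15,\ c = 5 \;\Rightarrow\;
\chi^2 = \frac{(15 - 5)^2}{15 + 5} = \frac{100}{20} = 5.0 > 3.84 = \chi^2_{1,\,0.05}
```

In this hypothetical example the null hypothesis would be rejected at the 5% level (p ≈ 0.025).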

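If the chi-squared value is significant, the null hypothesis of marginal homogeneity is rejected. This means there is a significant difference between the marginal proportions of the two tests. Whether the newer treatment, diagnostic modality, or biomarker is the better alternative to current practice then depends on the direction of that difference, i.e., on which discordant count (b or c) is larger.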

If the number of discordant pairs (b + c) is small (<25), the chi-squared approximation used by the McNemar test becomes unreliable, even when the total sample size is large. Thus, in studies with a small sample size where the number of discordant pairs is less than 25, the exact binomial test can be used instead. Edwards' continuity correction is another option.[3] However, the exact test is conservative: in studies with few subjects it produces unnecessarily large p-values and has poor power.

For this reason, alternative approaches have been evaluated. The McNemar mid-p test considerably improves power compared with the exact test, without violating the nominal significance level. Furthermore, if small but frequent violations of the nominal level are acceptable, the McNemar asymptotic test is the most powerful option for this purpose.[4]
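The variants discussed above can be sketched in Python using only the standard library. The function name `mcnemar` is hypothetical, and the sketch makes the following assumptions:

- The asymptotic p-value uses the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable.
- The exact test treats b as Binomial(b + c, 0.5) under the null hypothesis.
- The mid-p value is taken as twice the one-sided mid-p value, P(X < m) + 0.5 P(X = m). This is a common definition, but it should be checked against reference [4] before relying on it.

```python
from math import comb, erfc, sqrt


def mcnemar(b: int, c: int) -> dict:
    """McNemar test variants computed from the two discordant counts b and c."""
    n = b + c
    if n == 0:
        raise ValueError("No discordant pairs: the McNemar test is undefined.")

    # Asymptotic test (no continuity correction): chi2 = (b - c)^2 / (b + c), 1 df.
    chi2 = (b - c) ** 2 / n
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_asymptotic = erfc(sqrt(chi2 / 2))

    # Edwards' continuity correction: (|b - c| - 1)^2 / (b + c), clamped so b == c gives 0.
    chi2_cc = max(abs(b - c) - 1, 0) ** 2 / n
    p_cc = erfc(sqrt(chi2_cc / 2))

    # Exact conditional test: under H0, X = b ~ Binomial(n, 0.5) given b + c = n.
    m = min(b, c)
    denom = 2 ** n
    p_le = sum(comb(n, k) for k in range(m + 1)) / denom  # P(X <= m)
    p_eq = comb(n, m) / denom  # P(X = m)
    p_exact = min(1.0, 2 * p_le)

    # Mid-p: twice the one-sided mid-p value, P(X < m) + 0.5 * P(X = m).
    p_mid = min(1.0, 2 * (p_le - 0.5 * p_eq))

    return {"chi2": chi2, "p_asymptotic": p_asymptotic, "chi2_cc": chi2_cc,
            "p_cc": p_cc, "p_exact": p_exact, "p_mid": p_mid}
```

With the hypothetical counts b = 15 and c = 5, the sketch gives the following p-values:

- about 0.025 (asymptotic)
- about 0.044 (Edwards' correction)
- about 0.041 (exact)
- about 0.027 (mid-p)

This illustrates that the mid-p value is smaller than the exact value.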

Mann-Whitney U test

The Mann-Whitney U test, also known as the Wilcoxon rank-sum test, is the non-parametric analog of the parametric independent (Student's) t-test. It compares two independent groups using the ranks of the observations rather than their raw values, so it does not require the data to be normally distributed. It is therefore useful for numerical/continuous (and ordinal) variables. For example, researchers may want to compare age or height (continuous variables) between two groups when the data are not normally distributed. In that case, the Mann-Whitney U test can be used.
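A minimal sketch of the test in Python follows. It assumes a two-sided alternative and a large-sample normal approximation, with a correction for ties and no continuity correction. The function name `mann_whitney_u` is hypothetical. For small samples, exact p-values (as offered by statistical packages) are preferable to this approximation. U is computed directly as the number of pairs in which an observation from the first group exceeds one from the second, with ties counted as one-half.

```python
from collections import Counter
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using a normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("Both groups need at least one observation.")
    # U for the first group: pairs (xi, yj) with xi > yj, ties counted as 0.5.
    u1 = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    u2 = n1 * n2 - u1  # U for the second group; u1 + u2 == n1 * n2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Tie-corrected null variance: n1*n2/12 * [(n + 1) - sum(t^3 - t) / (n(n - 1))].
    tie_sum = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    if var_u <= 0:
        # All observations tied: no evidence of a difference.
        return {"u1": u1, "u2": u2, "z": 0.0, "p": 1.0}
    z = (u1 - mean_u) / sqrt(var_u)
    p = erfc(abs(z) / sqrt(2))  # two-sided normal p-value
    return {"u1": u1, "u2": u2, "z": z, "p": p}
```

For completely separated samples such as [1, 2, 3] and [4, 5, 6], U for the first group is 0. U for the second group is 9, the largest possible value (n1 x n2).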

Issues of Concern

There are several issues of concern regarding the use of the McNemar test and the Mann-Whitney U test, explained as follows:

1. The McNemar test compares paired categorical data. However, it cannot be used to measure agreement, because it compares only the overall (marginal) proportions. For example, a researcher may compare the results of subjects examined by two different examiners. Even if the proportion of subjects who pass is the same for both examiners, this cannot be taken as evidence of agreement.[5]

2. The Mann-Whitney U test is commonly used to compare the medians of two groups that are not normally distributed. However, researchers often forget its assumption that the data are independent random samples from two distinct populations with the same shape (distribution). Thus, when conducting this test, the spread and shape of the data should be described in addition to the p-value, as they may reveal clinically significant and relevant findings.[6] Skewness and heterogeneity of variance should also be investigated. If they are present, the Welch U test should be used instead.[7]
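A minimal sketch of the Welch U test in Python follows. It assumes the test is computed as it is commonly described:

1. Pool both groups.
2. Replace the values by their ranks, with ties receiving the average rank.
3. Apply Welch's unequal-variance t-test to the ranks.

The function names `midranks` and `welch_u` are hypothetical. The sketch returns only the t statistic and the Welch-Satterthwaite degrees of freedom. The p-value would then be obtained from a t distribution, for example with a statistical package.

```python
from statistics import mean, variance


def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def welch_u(x, y):
    """Welch's unequal-variance t statistic and df, applied to pooled ranks."""
    if len(x) < 2 or len(y) < 2:
        raise ValueError("Each group needs at least two observations.")
    r = midranks(list(x) + list(y))
    rx, ry = r[:len(x)], r[len(x):]
    # Squared standard errors of the mean rank in each group (sample variances).
    vx, vy = variance(rx) / len(rx), variance(ry) / len(ry)
    if vx + vy == 0:
        raise ValueError("Ranks do not vary within groups; statistic undefined.")
    t = (mean(rx) - mean(ry)) / (vx + vy) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(rx) - 1) + vy ** 2 / (len(ry) - 1))
    return t, df
```

Ranking before applying Welch's test keeps the robustness of a rank-based method. It also avoids the equal-variance assumption that the standard Mann-Whitney U test implicitly relies on.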

Clinical Significance

Basic statistical knowledge is imperative for every researcher working in the life sciences. Research findings must be examined using fundamentally correct statistical analysis to maintain the external validity of studies, which is important if the results are to be extrapolated to the general population. Unfortunately, statistical errors are not uncommon. One study assessed statistical errors and methodological pitfalls in dissertations required for a Medical Doctorate (MD) degree at the National Cancer Institute, Cairo University. It found that statistical tests were appropriate in only 13 out of a total of 62 studies (24.5%).[8] Therefore, a biostatistician or public health expert should be available to assist researchers and students.

Nursing, Allied Health, and Interprofessional Team Interventions

Research findings should have excellent external validity, i.e., the results should be generalizable to the general population. Equally important is the robustness of the methods used, especially the statistical analysis. Biostatistics should be an integral part of the curriculum at all levels of university education. A biostatistician or public health expert can be instrumental in choosing the appropriate study design and analytical tools and in judging the usefulness of the results. Including a statistician in the research team also fosters a multidisciplinary approach and can lead to better research outcomes.

Media

Image 1: Formula for calculating the chi-squared statistic for the McNemar test. Contributed from the Public Domain.

References

[1] Winters R, Winters A, Amedee RG. Statistics: a brief overview. Ochsner journal. 2010 Fall:10(3):213-6 [PubMed PMID: 21603381]

[2] du Prel JB, Röhrig B, Hommel G, Blettner M. Choosing statistical tests: part 12 of a series on evaluation of scientific publications. Deutsches Arzteblatt international. 2010 May:107(19):343-8. doi: 10.3238/arztebl.2010.0343. Epub 2010 May 14 [PubMed PMID: 20532129]

[3] Hazra A, Gogtay N. Biostatistics Series Module 4: Comparing Groups - Categorical Variables. Indian journal of dermatology. 2016 Jul-Aug:61(4):385-92. doi: 10.4103/0019-5154.185700 [PubMed PMID: 27512183]

[4] Fagerland MW, Lydersen S, Laake P. The McNemar test for binary matched-pairs data: mid-p and asymptotic are better than exact conditional. BMC medical research methodology. 2013 Jul 13:13:91. doi: 10.1186/1471-2288-13-91. Epub 2013 Jul 13 [PubMed PMID: 23848987]

[5] Ranganathan P, Pramesh CS, Aggarwal R. Common pitfalls in statistical analysis: Measures of agreement. Perspectives in clinical research. 2017 Oct-Dec:8(4):187-191. doi: 10.4103/picr.PICR_123_17 [PubMed PMID: 29109937]

[6] Hart A. Mann-Whitney test is not just a test of medians: differences in spread can be important. BMJ (Clinical research ed.). 2001 Aug 18:323(7309):391-3 [PubMed PMID: 11509435]

[7] Fagerland MW, Sandvik L. The Wilcoxon-Mann-Whitney test under scrutiny. Statistics in medicine. 2009 May 1:28(10):1487-97. doi: 10.1002/sim.3561 [PubMed PMID: 19247980]

[8] Allam RM, Noaman MK, Moneer MM, Elattar IA. Assessment of Statistical Methodologies and Pitfalls of Dissertations Carried Out at National Cancer Institute, Cairo University. Asian Pacific journal of cancer prevention : APJCP. 2017 Jan 1:18(1):231-237 [PubMed PMID: 28240524]