Definition/Introduction
The median is the middle value in a set of numbers and is equivalent to the 50th percentile. In other words, the median is the midpoint of a set of numbers, with half of the values less than the median and half above it.[1][2][3][4]
A short example will help clarify this point. In the set of 7 numbers (8, 6, 9, 5, 8, 23, 4), the median is found by first sorting the numbers from lowest to highest (in ascending order). The sorted set becomes (4, 5, 6, 8, 8, 9, 23). The median is the middle number. As there are 7 numbers in the set, the fourth value, '8', is the median. Notably, the median differs from the mean.
- The mean is the "average" of the values in the set, calculated as follows: (4+5+6+8+8+9+23)/7 = 63/7 = 9
What should be done if the set contains an even number of values? In this case, the median is the average of the 2 middle numbers. To illustrate this principle, consider a second set of numbers (7, 3, 10, 2, 9, 2, 1, 4). Arranged in ascending order, the set becomes (1, 2, 2, 3, 4, 7, 9, 10). As this set has 8 numbers, the 2 middle values are the fourth and fifth values, 3 and 4.
- The median is the average of these 2 numbers, which is calculated as follows: (3+4)/2 = 3.5
Notably, the median of 3.5 is different from the mean, or average, of the set of numbers, which is 4.75.
- This is calculated as follows: (1+2+2+3+4+7+9+10)/8 = 38/8 = 4.75
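The two worked examples above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names `first_set` and `second_set` are illustrative and do not come from the source.

```python
from statistics import mean, median

# The two example sets from the text, left unsorted on purpose:
# median() sorts the values internally before picking the middle.
first_set = [8, 6, 9, 5, 8, 23, 4]
second_set = [7, 3, 10, 2, 9, 2, 1, 4]

for label, values in (("first", first_set), ("second", second_set)):
    print(label, "sorted:", sorted(values))
    print(label, "median:", median(values), "mean:", mean(values))
```

In both sets the median and the mean come out different, which is the point the examples are making.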
Issues of Concern
Consider a set of numbers containing a total of n members. After sorting the set (in either ascending or descending order), the median can be calculated as follows:[1][2]
- For sets with an odd number of members, where n is odd:
- Median = Value of the ([n+1]/2)th item in the sorted set.
- For sets with an even number of members, where n is even:
- Median = (Value of the [n/2]th item + value of the [n/2 + 1]th item)/2 in the sorted set.
The simple sets from above can be used to work through examples for both cases.
- For a set where n is odd, we can consider the first set (8, 6, 9, 5, 8, 23, 4).
- After sorting, the set becomes (4, 5, 6, 8, 8, 9, 23).
- In this set, n = 7.
- The median is calculated as follows:
- Position of the median = ([n+1]/2)th item in the sorted set = ([7+1]/2)th item = (8/2)th item = 4th item.
- Thus, the median is the value of the 4th item in the sorted set, which is 8.
- For a set where n is even, we can consider the second set (7, 3, 10, 2, 9, 2, 1, 4).
- After sorting, the set becomes (1, 2, 2, 3, 4, 7, 9, 10).
- In this set, n = 8.
- The median is calculated as follows:
- Median = (Value of the [n/2]th item + value of the [n/2 + 1]th item)/2 in the sorted set = (Value of the [8/2]th item + value of the [8/2 + 1]th item)/2 = (Value of the 4th item + value of the 5th item)/2 = (3 + 4)/2 = 7/2.
- Thus, the median is 3.5.
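The positional rule and the two worked examples above can be sketched in code. The function name `median_by_position` and its inner helper `item` are hypothetical, chosen for illustration; the sketch assumes a non-empty collection of numbers and keeps the 1-based item positions used in the text.

```python
def median_by_position(values):
    """Median via the positional rule, using 1-based item positions."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty set is undefined")

    def item(position):
        # Convert the 1-based position used in the text to a 0-based index.
        return ordered[position - 1]

    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th item.
        return item((n + 1) // 2)
    # Even n: average of the (n / 2)th and (n / 2 + 1)th items.
    return (item(n // 2) + item(n // 2 + 1)) / 2


print(median_by_position([8, 6, 9, 5, 8, 23, 4]))     # odd n = 7, prints 8
print(median_by_position([7, 3, 10, 2, 9, 2, 1, 4]))  # even n = 8, prints 3.5
```

Sorting in descending order instead would select the same middle item or items, so the result would not change.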
The most common mistake is failing to distinguish between the mean, median, and mode. The mean, or "average," of a set of numbers is calculated by adding all the numbers and then dividing by the number of items in the set.[1]
The median, which is the focus of this discussion, is calculated as described above.
The mode is the most frequent value in a set or the value that appears the most often. In the dataset (4, 5, 6, 8, 8, 9, 23), the number 8 occurs twice, which is more frequent than any other number, making 8 the mode. A dataset can have more than 1 mode or no mode at all. The dataset (2, 3, 3, 5, 5, 7, 7) has 3 modes—3, 5, and 7. Each of these numbers appears twice, which is more frequent than 2, which occurs only once. This dataset is considered trimodal.
If a dataset has 2 modes, it is called bimodal. If a dataset has more than 3 modes, it is considered multimodal. A dataset can also have no mode: in the dataset (6, 7, 8, 9), no number appears more frequently than the others, so there is no mode.
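The mode examples above can be reproduced with a small counting sketch. The function name `modes` is hypothetical, and the rule that a dataset with no repeated value has no mode is the convention used in this text, written here as an explicit assumption.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Convention assumed from the text: if nothing repeats, there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([4, 5, 6, 8, 8, 9, 23]))  # one mode
print(modes([2, 3, 3, 5, 5, 7, 7]))   # three modes (trimodal)
print(modes([6, 7, 8, 9]))            # no mode under this convention
```

Other conventions exist. For example, Python's `statistics.multimode` returns every value when all values are equally frequent, so it would report four modes for (6, 7, 8, 9) rather than none.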
Median, mode, and mean are similar when the dataset follows a relatively normal (bell-curve) distribution but can differ significantly in other types of distributions (please refer to the "Clinical Significance" section for examples).
Clinical Significance
The median is frequently reported in the scientific literature for a good reason. In science and medicine, researchers often focus on the time until a specific event occurs, such as the decay time of a radioisotope or the survival time of a patient with a particular cancer. Starting with 11 radioisotopes with decay times of (1, 1, 1, 2, 4, 5, 5, 5, 6, 11, 41) seconds, one can calculate both the median and the mean within 1 minute. The median, which is the middle value of the set, is 5 seconds, while the mean is 7.45 seconds.
If the dataset (1, 1, 1, 2, 4, 5, 5, 5, 6, 11, 41) represents survival in years for patients with a specific cancer, the median might be more useful. The median can be determined once the sixth patient has died, as this indicates that half of the patients have died and half are still alive. In contrast, calculating the mean requires knowing the total survival time for all patients, which means waiting until the 41-year mark for the final patient to die. Waiting for the event to occur for all patients to calculate the mean becomes a lengthy process, making the median a more practical and timely measure.
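The timing argument above can be illustrated with a minimal sketch. The function name `median_once_known` is hypothetical. The sketch assumes that all patients are followed from the same starting point and that none are lost to follow-up, so every time not yet observed is at least as large as the largest time observed so far.

```python
def median_once_known(event_times, n_total):
    """Return the median survival time if enough events have occurred, else None.

    Assumes every subject is followed from the same start and none are lost
    to follow-up, so unobserved times are all >= the largest observed time.
    """
    observed = sorted(event_times)
    if n_total % 2 == 1:
        needed = (n_total + 1) // 2  # the single middle event
        if len(observed) >= needed:
            return observed[needed - 1]
    else:
        needed = n_total // 2 + 1    # both middle events
        if len(observed) >= needed:
            return (observed[needed - 2] + observed[needed - 1]) / 2
    return None


# 11 patients in total; only the deaths observed so far are passed in.
print(median_once_known([1, 1, 1, 2, 4], 11))     # five deaths: prints None
print(median_once_known([1, 1, 1, 2, 4, 5], 11))  # sixth death: prints 5
```

The mean has no comparable shortcut, because every one of the 11 survival times enters the sum.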
A second notable advantage of the median over the mean is its resistance to outliers or extreme values. For instance, in the dataset (1, 1, 1, 2, 4, 5, 5, 5, 6, 11, 41), the value 41 significantly skews the mean, which is 7.45. By removing this outlier, the revised dataset (1, 1, 1, 2, 4, 5, 5, 5, 6, 11) yields a median of 4.5 and a mean of 4.1. The median decreased by only 10% (from 5 to 4.5), whereas the mean dropped by 45% (from 7.45 to 4.1) after removing the outlier. Thus, the median may be a better indicator of the realistic center of the data compared to the mean in skewed datasets. Skewed data includes any dataset with outliers or values at either the extremely high or extremely low end of the dataset.
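The outlier comparison above can be verified with a few lines of code. This is a sketch using Python's standard `statistics` module; the helper `percent_drop` and the variable names are illustrative and not from the source.

```python
from statistics import mean, median

with_outlier = [1, 1, 1, 2, 4, 5, 5, 5, 6, 11, 41]
without_outlier = with_outlier[:-1]  # drop the final value, 41


def percent_drop(before, after):
    """Percentage decrease from `before` to `after`."""
    return (before - after) / before * 100


median_drop = percent_drop(median(with_outlier), median(without_outlier))
mean_drop = percent_drop(mean(with_outlier), mean(without_outlier))

print(f"median: {median(with_outlier)} -> {median(without_outlier)} "
      f"({median_drop:.0f}% drop)")
print(f"mean: {mean(with_outlier):.2f} -> {mean(without_outlier):.2f} "
      f"({mean_drop:.0f}% drop)")
```

Removing a single value moves the mean far more than the median, which matches the 10% and 45% figures quoted in the text.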
Considering the dataset (1, 1, 1, 2, 4, 5, 5, 5, 6, 11, 41) for survival years of cancer patients, it may be more realistic to report a median survival of 5 years, indicating that half of the patients did not survive beyond 5 years. In contrast, reporting a mean survival of 7.45 years suggests that, on average, the 11 patients lived 7.45 years. Notably, only 2 patients lived longer than this mean. While the 41-year survival might be an appealing outlier, it may not provide a realistic picture of the overall data.
Nursing, Allied Health, and Interprofessional Team Interventions
Although the median provides a measure of the middle of the data, it may not always be the most appropriate metric. In some cases, the mean may be the more desirable measure, as statistical analysis with the mean can be more robust and simpler to compute.
In cases where neither the median nor the mean adequately represents the middle of the data, alternative measures may be needed. Both the median and mean are most effective when the data exhibits a single prominent peak (unimodal distribution). However, in datasets with multiple prominent peaks (such as bimodal, trimodal, or multimodal distributions), the median may not offer the most accurate representation.
Let us imagine a curve resembling 2 hills on either side of a valley, with a bump on the left side of the data for small values and a second bump on the right side for large values (a bimodal dataset). Because the median is the middle value, it may be located in the valley, where few or no data points lie. In this case, the median may not provide the most useful information, such as the locations of the peaks and data clusters.
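A small numerical sketch can make the two-hill picture concrete. The dataset below is hypothetical, invented purely for illustration, with one cluster of small values and one cluster of large values.

```python
from statistics import mean, median

# Hypothetical bimodal data: one cluster of small values, one of large values.
low_cluster = [1, 2, 2, 2, 3]
high_cluster = [18, 19, 19, 19, 20]
data = low_cluster + high_cluster

center = median(data)  # average of the two middle values, 3 and 18
print("median:", center)
print("mean:", mean(data))

# No observation lies anywhere near the median: it sits in the valley.
print("values within 5 of the median:", [v for v in data if abs(v - center) <= 5])
```

In this made-up example both the median and the mean land between the clusters, so neither describes where the observations actually concentrate; reporting the location of each peak would be more informative.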