>>Return to Tell Me About Statistics!
In Vol. 1, we explained that error bars express the degree of variability (error) in data, and that ± standard deviation, ± standard error (SE), percentiles, and the 95% confidence interval (CI) each serve a different purpose.
In this article, we will explain “percentile” in detail.
In what cases are percentiles used?
Standard deviation is used to indicate variation in the obtained data, standard error (SE) to indicate variation in estimates such as the mean, and the CI to indicate the reliability of the mean.
However, some clinical laboratory data, such as urinary albumin and total cholesterol levels, may include extremely high values (outliers). Data with such outliers do not fit a normal distribution well, so they are summarized as the range from the 25th to the 75th percentile (the interquartile range) rather than as mean ± standard deviation or mean ± standard error (SE).
What is a percentile?
The Mth percentile is the value located M% of the way through the data, counting from the smallest. The 50th percentile (median) is the value in the middle when the data are arranged in ascending (or descending) order. For example, with nine laboratory values, the 5th value is the 50th percentile.
【Median for an odd number of data points】
Calculation formula: (number of data points + 1) × 0.5
From Table 1, (9 + 1) × 0.5 = 10 × 0.5 = 5
The fifth value, 26, is the median.
[Table 1] Median with odd numbers of data points
【Median for an even number of data points】
Calculation formula: (number of data points + 1) × 0.5
From Table 2, (10 + 1) × 0.5 = 11 × 0.5 = 5.5
The 5.5th value is the average of the 5th (26) and 6th (28) values, so the median is 27.
[Table 2] Median with an even number of data points
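The two median calculations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name median_by_rank and both datasets are hypothetical. The values were chosen so that the 5th value is 26 and the 6th is 28, as quoted in the text; they are not the actual contents of Tables 1 and 2.

```python
def median_by_rank(values):
    """Median via the rank formula (number of data points + 1) x 0.5."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    rank = (n + 1) * 0.5          # 1-based rank; ends in .5 when n is even
    whole = int(rank)             # integer part of the rank
    frac = rank - whole           # 0.0 for odd n, 0.5 for even n
    if frac == 0:
        return data[whole - 1]    # odd n: the middle value itself
    lower, upper = data[whole - 1], data[whole]
    return lower + (upper - lower) * frac  # even n: halfway between neighbors


# Hypothetical data (NOT the article's tables), chosen to match the quoted values.
odd_data = [20, 22, 24, 25, 26, 28, 30, 33, 35]        # 9 values
even_data = [20, 22, 24, 25, 26, 28, 30, 33, 35, 40]   # 10 values

print(median_by_rank(odd_data))
print(median_by_rank(even_data))
```

With these assumed values, the odd case returns the 5th value and the even case returns the midpoint of the 5th and 6th values, mirroring the hand calculation.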
Next, we calculate the 25th percentile (lower quartile) for the data in Table 3.
Calculation formula: (number of data points + 1) × 0.25
(10 + 1) × 0.25 = 11 × 0.25 = 2.75
The second value is 22.
The third value is 24.
The 2.75th value therefore lies between 22 and 24.
The fractional part of the rank is 0.75.
Second value + (third value − second value) × fractional part = 22 + 2 × 0.75 = 23.5
So, the 25th percentile (lower quartile) is 23.5.
Using the same formula with 0.75 in place of 0.25, the 75th percentile (upper quartile) is 33.5.
[Table 3] How to find the 25th percentile
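The interpolation steps above generalize to any percentile. The following Python sketch is an illustration, not a definitive implementation: the function name percentile_by_rank is hypothetical, and the ten values are assumed data chosen so that the 2nd and 3rd values are 22 and 24 and the quartiles come out as quoted in the text. They are not the actual contents of Table 3. Note that other software may use a different percentile convention and give slightly different results.

```python
def percentile_by_rank(values, fraction):
    """Percentile via rank = (number of data points + 1) x fraction.

    fraction is given as in the text, e.g. 0.25 for the 25th percentile.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    rank = (n + 1) * fraction     # 1-based rank, possibly fractional
    if rank <= 1:
        return data[0]            # rank falls before the first value
    if rank >= n:
        return data[-1]           # rank falls after the last value
    whole = int(rank)             # e.g. 2 for rank 2.75
    frac = rank - whole           # e.g. 0.75 for rank 2.75
    lower, upper = data[whole - 1], data[whole]
    return lower + (upper - lower) * frac


# Hypothetical data (NOT the article's Table 3), chosen to match the quoted values.
data = [20, 22, 24, 25, 26, 28, 30, 33, 35, 40]

lower_quartile = percentile_by_rank(data, 0.25)   # 22 + 2 x 0.75
upper_quartile = percentile_by_rank(data, 0.75)
print(lower_quartile, upper_quartile)
```

Calling the same function with 0.5 reproduces the median rule, since (number of data points + 1) × 0.5 is just a special case of the formula.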
Interquartile range
The interquartile range is the difference “upper quartile” − “lower quartile.”
Looking at the interquartile ranges of data A and B in Figure 1, we can see that data A has a wider interquartile range than data B.
The interquartile range measures the variability of the data, and like the standard deviation, the larger the value, the greater the variability.
[Figure 1] Comparison of interquartile ranges of the two datasets
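To make the comparison concrete, here is a small Python sketch that computes the interquartile range of two datasets. Datasets A and B are made-up values for illustration and are not the data behind Figure 1. The sketch uses statistics.quantiles from the Python standard library, whose default "exclusive" method places quartiles with the same (number of data points + 1) rule used in this article.

```python
import statistics


def interquartile_range(values):
    """Return (lower quartile, upper quartile, IQR) for the given values."""
    # n=4 asks for the three quartile cut points; "exclusive" is the default
    # method and corresponds to rank = (number of data points + 1) x fraction.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q1, q3, q3 - q1


# Hypothetical datasets (NOT the data shown in Figure 1).
data_a = [20, 22, 24, 25, 26, 28, 30, 33, 35, 40]   # widely spread
data_b = [24, 25, 26, 26, 27, 27, 28, 28, 29, 30]   # tightly clustered

q1_a, q3_a, iqr_a = interquartile_range(data_a)
q1_b, q3_b, iqr_b = interquartile_range(data_b)

print("A:", q1_a, q3_a, iqr_a)
print("B:", q1_b, q3_b, iqr_b)
```

With these assumed values, dataset A has the larger interquartile range, which corresponds to its greater variability.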
Example of box-and-whisker plot
For data with outliers, a “box-and-whisker plot” is often used to show the variability of the data. The maximum, 75th percentile (upper quartile), 50th percentile (median), 25th percentile (lower quartile), and minimum values are represented as a “box” and “whiskers” (Figure 2).
When reading a paper, check carefully whether the ends of the “whiskers” are the maximum and minimum values or the 90th and 10th percentiles.
[Figure 2] How to read a box-and-whisker plot
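The difference between the two whisker conventions is easy to see in numbers. The Python sketch below is an illustration under stated assumptions: the function name box_plot_summary, its whiskers parameter, and the data (which include one deliberately extreme value) are all hypothetical. It uses statistics.quantiles from the standard library with its default "exclusive" method.

```python
import statistics


def box_plot_summary(values, whiskers="minmax"):
    """Return the five values a box-and-whisker plot would draw.

    whiskers="minmax":      whisker ends are the minimum and maximum
    whiskers="percentiles": whisker ends are the 10th and 90th percentiles
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)   # quartile cut points
    if whiskers == "minmax":
        low, high = data[0], data[-1]
    elif whiskers == "percentiles":
        deciles = statistics.quantiles(data, n=10)     # 10th, 20th, ..., 90th
        low, high = deciles[0], deciles[-1]
    else:
        raise ValueError("whiskers must be 'minmax' or 'percentiles'")
    return {
        "whisker_low": low,
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "whisker_high": high,
    }


# Hypothetical laboratory values with one outlier (90).
data = [20, 22, 24, 25, 26, 28, 30, 33, 35, 90]

minmax = box_plot_summary(data, whiskers="minmax")
trimmed = box_plot_summary(data, whiskers="percentiles")
print(minmax)
print(trimmed)
```

The box (quartiles and median) is the same in both cases. Only the whisker ends change, which is why a figure legend that does not state the convention can be misread.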