>>Return to Tell Me About Statistics!

Within a group of data, a value that is extremely large (or small) compared with the others is called an “outlier.” Outliers and abnormal values, which are caused by measurement errors, recording errors, and so on, are conceptually different; in practice, however, the two may be indistinguishable.

There are two methods for finding outliers, depending on the distribution of the data:

・When the distribution is not normal, or when you do not know whether it is: use a boxplot

・When the distribution is normal: use the Smirnov-Grubbs test

How to find outliers using a boxplot

As shown in the figure, adding the upper and lower inner boundary points to the usual five summary values (minimum, 1st quartile, median, 3rd quartile, maximum) gives a boxplot that displays seven summary values.

[Figure]

The data that fall outside the range of the upper and lower inner boundary points are considered outliers.

How to find the upper and lower inner boundary points

(1) Calculate the upper and lower points.

Interquartile range = 3rd quartile − 1st quartile

Upper point = 3rd quartile + 1.5 × interquartile range

Lower point = 1st quartile − 1.5 × interquartile range
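The calculation in step (1) can be sketched in Python as follows. This is a minimal illustration, assuming quartiles are computed with the standard library's `statistics.quantiles` using its `"inclusive"` method; textbooks and software use several different quartile conventions, so the upper and lower points may differ slightly from values calculated by hand. The function name `upper_lower_points` is hypothetical.

```python
import statistics


def upper_lower_points(data):
    """Return (lower point, upper point) for a list of numbers.

    Assumption: quartiles come from statistics.quantiles(method="inclusive");
    other quartile definitions can give slightly different results.
    """
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1              # interquartile range = 3rd quartile - 1st quartile
    upper = q3 + 1.5 * iqr     # upper point
    lower = q1 - 1.5 * iqr     # lower point
    return lower, upper


# Example data chosen for illustration only.
data = [2, 3, 3, 4, 5, 5, 6, 7, 30]
lower, upper = upper_lower_points(data)
print(lower, upper)  # -1.5 10.5
```

Here the 1st quartile is 3, the 3rd quartile is 6, and the interquartile range is 3, so the points are 3 − 4.5 and 6 + 4.5.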

(2) The upper inner boundary point is determined as follows:

Does the maximum value lie within the range between the lower and upper points?

・Yes → The upper inner boundary point is the maximum value.

・No → The upper inner boundary point is the upper point.

(3) The lower inner boundary point is determined as follows:

Does the minimum value lie within the range between the lower and upper points?

・Yes → The lower inner boundary point is the minimum value.

・No → The lower inner boundary point is the lower point.
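Steps (1) to (3) can be combined into a short Python sketch that also lists the seven summary values and flags outliers. It follows the rules exactly as stated above: if the maximum (minimum) value lies within the range between the lower and upper points, it becomes the upper (lower) inner boundary point; otherwise the upper (lower) point itself is used. The quartile method (`statistics.quantiles` with `method="inclusive"`) and the function name `boxplot_summary` are assumptions made for illustration.

```python
import statistics


def boxplot_summary(data):
    """Return (seven summary values, outliers) for a list of numbers.

    Hypothetical helper. Assumption: quartiles come from
    statistics.quantiles(method="inclusive").
    """
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    # Step (1): upper and lower points
    upper_point = q3 + 1.5 * iqr
    lower_point = q1 - 1.5 * iqr
    minimum, maximum = values[0], values[-1]

    # Step (2): does the maximum lie between the lower and upper points?
    if lower_point <= maximum <= upper_point:
        upper_inner = maximum
    else:
        upper_inner = upper_point

    # Step (3): does the minimum lie between the lower and upper points?
    if lower_point <= minimum <= upper_point:
        lower_inner = minimum
    else:
        lower_inner = lower_point

    # Values outside the inner boundary points are treated as outliers.
    outliers = [x for x in values if x < lower_inner or x > upper_inner]

    summary = {
        "minimum": minimum,
        "lower inner boundary point": lower_inner,
        "1st quartile": q1,
        "median": median,
        "3rd quartile": q3,
        "upper inner boundary point": upper_inner,
        "maximum": maximum,
    }
    return summary, outliers


summary, outliers = boxplot_summary([2, 3, 3, 4, 5, 5, 6, 7, 30])
print(outliers)                                # [30]
print(summary["upper inner boundary point"])   # 10.5
```

With this rule the upper inner boundary point is the upper point itself (10.5) rather than the largest value inside it (7); many plotting tools instead end the whisker at the most extreme data value within the upper and lower points, so drawn boxplots can differ slightly.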

*Because it is not easy to distinguish between outliers and abnormal values, the circumstances and causes of any extreme values should be investigated carefully.

Drawing a boxplot makes the upper and lower inner boundary points easier to understand. It is not as hard as it may seem; simply ask, “Do the minimum and maximum values lie within the range between the lower and upper points?”

Points to note regarding outliers

An outlier is an extremely large or small value in data belonging to a population. Abnormal values, by contrast, are outliers whose causes are known, such as measurement or recording errors.

Even if extreme values exist in the data, they are not necessarily outliers.
