253 - Explain how the following techniques can be used to examine outliers: tabulation, histograms, box plots, correlation analysis, scatter plots, local statistics

Concepts

  • [AM7-6] Outliers
    An outlier is an unexpected value that differs significantly from the other observations. The definition of an outlier is not absolute: the concept becomes precise only once appropriate criteria have been selected for a concrete statistical analysis. When examining outliers, it is important to determine whether the value is erroneous or is an unusual but correct observation. If the data come from a sample survey, a further assessment is needed, namely whether the outlier is representative or not. The box plot is a useful graphical display for examining outliers. Using the median and the lower and upper quartiles, it identifies extreme values in the tails of the distribution. A value beyond an inner fence on either side (conventionally 1.5 times the interquartile range beyond the quartiles) is considered a mild outlier; a value beyond an outer fence (conventionally 3 times the interquartile range) is considered an extreme outlier. Histograms also reveal outliers, which appear as isolated bars separated from the main body of the distribution. However, a histogram depends on how the classes are defined, so the same data can produce different histograms. If a histogram suggests possible outliers, further graphical and quantitative checks are required. Outliers can also be examined through correlation analysis between two datasets (Pearson correlation coefficient, Spearman rank correlation coefficient…): because the Pearson coefficient is sensitive to extreme values while the rank-based Spearman coefficient is more robust, a marked difference between the two can point to influential outliers. A scatter plot reveals the basic relationship between two variables, such as a linear trend, as a pattern of points; an outlier appears as a point that deviates from this pattern. Finally, outliers can be examined with local statistics such as the local outlier factor, which is based on the concept of local density: points with a substantially lower density than their neighbours are considered outliers.
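
The box-plot rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `tukey_fences`, the sample values, and the choice of the "inclusive" quartile method are assumptions made for the example, and the multipliers 1.5 and 3 are the conventional values for the inner and outer fences. Other quartile definitions give slightly different fences.

```python
from statistics import quantiles


def tukey_fences(values, inner_k=1.5, outer_k=3.0):
    """Classify values as mild or extreme outliers using box-plot fences.

    inner_k and outer_k are the conventional multipliers of the
    interquartile range (IQR); they are parameters, not fixed rules.
    """
    # Lower quartile, median, upper quartile ("inclusive" method is an
    # assumption; it interpolates linearly within the sample).
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    inner = (q1 - inner_k * iqr, q3 + inner_k * iqr)
    outer = (q1 - outer_k * iqr, q3 + outer_k * iqr)

    mild, extreme = [], []
    for v in values:
        if v < outer[0] or v > outer[1]:
            extreme.append(v)  # beyond an outer fence
        elif v < inner[0] or v > inner[1]:
            mild.append(v)     # between an inner and an outer fence

    return {"inner": inner, "outer": outer, "mild": mild, "extreme": extreme}


# Hypothetical sample, invented for illustration only.
data = [10, 12, 12, 13, 14, 15, 15, 16, 17, 24, 40]
result = tukey_fences(data)
print(result["inner"], result["mild"], result["extreme"])
```

For this sample the quartiles are 12.5 and 16.5, so the inner fences lie at 6.5 and 22.5 and the outer fences at 0.5 and 28.5: the value 24 is flagged as a mild outlier and 40 as an extreme one. Whether either value is an error or a correct but unusual observation still has to be judged from the context of the data.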