Image classification results in a raster file in which the individual raster elements are labelled by class. As image classification is based on samples of the classes, the actual quality of the classification result should be checked. This is usually done by a sampling approach in which a number of raster elements of the output are selected and, for each of them, the classification result is compared with the “true world class”. The comparison is done by creating an error matrix from which different accuracy measures can be calculated. The true world class is preferably derived from field observations. Sometimes, sources for which higher accuracy can be assumed, such as aerial photos, are used as a reference.
Various sampling schemes have been proposed for selecting pixels to test. Choices to be made relate to the design of the sampling strategy, the number of samples required, and the area of the samples. Recommended sampling strategies in the context of land cover data are simple random sampling or stratified random sampling. The number of samples may be related to two factors in accuracy assessment: (1) the number of samples that must be taken in order to reject a data set as being inaccurate; or (2) the number of samples required to determine the true accuracy, within some error bounds, of a data set. Sampling theory is used to determine the number of samples required. The number of samples must be traded off against the area covered by a sample unit. A sample unit can be a point but it could also be an area of some size; it can be a single raster element but may also include surrounding raster elements. Among other considerations, the “optimal” sample-area size depends on the heterogeneity of the class.
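As a rough illustration of option (2), the sketch below estimates how many sample units are needed to pin down an accuracy value within a chosen error bound. It relies on the normal approximation to the binomial distribution, which is a common textbook choice and an assumption here, not a method prescribed by the text. The function name `samples_for_accuracy`, its parameters and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def samples_for_accuracy(expected_accuracy, half_width, confidence=0.95):
    """Approximate number of sample units needed to estimate accuracy.

    Uses n = z^2 * p * (1 - p) / E^2 (normal approximation to the binomial).
    Assumes simple random sampling and independent sample units.
    """
    if not 0.0 < expected_accuracy < 1.0:
        raise ValueError("expected_accuracy must lie strictly between 0 and 1")
    if half_width <= 0.0:
        raise ValueError("half_width must be positive")
    # Two-sided critical value of the standard normal distribution,
    # roughly 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    n = z ** 2 * expected_accuracy * (1.0 - expected_accuracy) / half_width ** 2
    return math.ceil(n)


# Hypothetical inputs: expected accuracy of 85%, error bound of +/- 5 points.
print(samples_for_accuracy(0.85, 0.05))
```

Setting `expected_accuracy` to 0.5 gives the most conservative (largest) sample size under this approximation, which can be useful when no prior estimate of the accuracy is available.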
Table 1: The error matrix with derived errors and accuracy expressed as percentages. A, B, C and D refer to the reference classes; a, b, c and d refer to the classes in the classification result. Overall accuracy is 53%.
Once sampling for validation has been carried out and the data collected, an error matrix, sometimes also called a confusion matrix or a contingency matrix, can be established (Table 1). In the table, four classes (A, B, C, D) are listed. A total of 163 samples were collected. The table shows that, for example, 53 cases of A were found in the real world (‘reference’), while the classification result yielded 61 cases of a; in 35 cases they agree.
The first and most commonly cited measure of mapping accuracy is the overall accuracy, or proportion correctly classified (PCC). Overall accuracy is the number of correctly classified pixels (i.e. the sum of the diagonal cells in the error matrix) divided by the total number of pixels checked. In the table above, the overall accuracy is (35 + 11 + 38 + 2)∕163 = 53%. The overall accuracy yields one value for the result as a whole.
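The arithmetic above can be reproduced in a few lines. This is a minimal sketch that uses only the diagonal counts and the sample total quoted in the text; the function and variable names are hypothetical.

```python
def overall_accuracy(diagonal_counts, total_samples):
    """Proportion correctly classified: sum of the diagonal over all samples."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    return sum(diagonal_counts) / total_samples


# Diagonal cells (A/a, B/b, C/c, D/d) and sample total quoted in the text.
correct_per_class = [35, 11, 38, 2]
n_samples = 163

pcc = overall_accuracy(correct_per_class, n_samples)
print(f"Overall accuracy: {pcc:.1%}")
```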
Most other measures derived from the error matrix are calculated per class. Error of omission refers to those sample points that are omitted in the interpretation result. Consider class A, for which 53 samples were taken. Eighteen out of the 53 samples were interpreted as b, c or d. This results in an error of omission of 18∕53 = 34%. Error of omission starts from the reference data and therefore relates to the columns in the error matrix. The error of commission starts from the interpretation result and therefore relates to the rows in the error matrix; it refers to samples that were incorrectly assigned to a class. Consider class d: only two of the 21 samples (10%) are correctly labelled, which gives an error of commission of 19∕21 = 90%. Errors of commission and omission are also referred to as Type I and Type II errors, respectively.
Producer accuracy is the complement of omission error, while user accuracy is the complement of commission error. Producer accuracy is the probability that a sample of a certain reference class has indeed been labelled as that class; for class A it is 35∕53 = 66%. Similarly, user accuracy is the probability that a sampled point labelled as a certain class on the map is indeed that class in the real world; for class d it is 2∕21 = 10%.
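The per-class measures can be sketched as follows, with rows holding the classification result and columns holding the reference classes, as described in the text. Note the assumption: the off-diagonal entries of `ERROR_MATRIX` are illustrative values chosen only to be consistent with the totals quoted in the text (53 reference cases of A, 61 cases classified as a, 21 classified as d, 163 samples overall); they are not taken from Table 1. The function name and dictionary keys are likewise hypothetical.

```python
CLASSES = ["A", "B", "C", "D"]

# Rows: classification result (a, b, c, d); columns: reference (A, B, C, D).
# ASSUMPTION: off-diagonal values are illustrative, not copied from Table 1.
ERROR_MATRIX = [
    [35, 14, 11, 1],
    [4, 11, 3, 0],
    [12, 9, 38, 4],
    [2, 5, 12, 2],
]


def per_class_measures(matrix):
    """Return omission/commission errors and producer/user accuracy per class."""
    r = len(matrix)
    row_totals = [sum(row) for row in matrix]  # classified as each class
    col_totals = [sum(matrix[i][j] for i in range(r)) for j in range(r)]  # reference
    measures = []
    for k in range(r):
        correct = matrix[k][k]
        # Producer accuracy starts from the reference data (column total).
        producer = correct / col_totals[k] if col_totals[k] else float("nan")
        # User accuracy starts from the classification result (row total).
        user = correct / row_totals[k] if row_totals[k] else float("nan")
        measures.append({
            "producer_accuracy": producer,
            "omission_error": 1.0 - producer,
            "user_accuracy": user,
            "commission_error": 1.0 - user,
        })
    return measures


for name, m in zip(CLASSES, per_class_measures(ERROR_MATRIX)):
    print(name, {key: f"{value:.0%}" for key, value in m.items()})
```

Classes with an empty row or column yield `nan` instead of raising an error, since no accuracy can be stated for a class that was never sampled or never assigned.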
Another widely used measure of map accuracy derived from the error matrix is the kappa or κ coefficient. Let $x_{ij}$ denote the element of the error matrix in row $i$ and column $j$, $r$ the number of classes and $N$ the total sum of all elements of the error matrix. The kappa coefficient is then computed as:
$$\kappa = \frac{N \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+}\, x_{+i}}{N^{2} - \sum_{i=1}^{r} x_{i+}\, x_{+i}}$$
where $x_{i+} = \sum_{j=1}^{r} x_{ij}$ and $x_{+i} = \sum_{j=1}^{r} x_{ji}$ are the sums of all elements in row $i$ and column $i$, respectively.
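A minimal sketch of the kappa computation follows. It uses the standard form of the coefficient: N times the sum of the diagonal, minus the sum over classes of row total times column total, divided by N squared minus that same sum. Note the assumption: the off-diagonal entries of the example matrix are illustrative values consistent with the totals quoted in the text, not values taken from Table 1, so the printed kappa should be read as an example only. The function name is hypothetical.

```python
def kappa_coefficient(matrix):
    """Kappa coefficient of a square error matrix given as a list of rows."""
    r = len(matrix)
    if any(len(row) != r for row in matrix):
        raise ValueError("error matrix must be square")
    n = sum(sum(row) for row in matrix)                # N: all samples
    diagonal = sum(matrix[i][i] for i in range(r))     # sum of x_ii
    row_totals = [sum(row) for row in matrix]          # x_i+
    col_totals = [sum(matrix[i][j] for i in range(r)) for j in range(r)]  # x_+i
    # Agreement expected by chance, from the products of the marginal totals.
    chance = sum(row_totals[i] * col_totals[i] for i in range(r))
    if n * n == chance:
        raise ValueError("kappa is undefined when all samples fall in one class")
    return (n * diagonal - chance) / (n * n - chance)


# Rows: classification result (a..d); columns: reference (A..D).
# ASSUMPTION: off-diagonal values are illustrative, not copied from Table 1.
example_matrix = [
    [35, 14, 11, 1],
    [4, 11, 3, 0],
    [12, 9, 38, 4],
    [2, 5, 12, 2],
]

kappa = kappa_coefficient(example_matrix)
print(f"kappa = {kappa:.2f}")
```

A perfectly diagonal matrix gives a kappa of 1, while a matrix whose agreement is no better than what the marginal totals would produce by chance gives a value near 0.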
The kappa coefficient takes into account the fact that even assigning labels at random will result in a certain degree of accuracy. Kappa statistics, based on the kappa coefficient, can be applied to test whether two data sets, e.g. classification results, have different levels of accuracy. This type of testing is used to evaluate different remote sensing (RS) data or methods for the generation of spatial data.