1924 - Reclassify (group) a nominal attribute domain to fewer, broader classes

Concepts

  • [AM14-3] Classification and transformation of attribute measurement levels
    Classification is a technique for purposely removing detail from an input data set in the hope of revealing important patterns (of spatial distribution). In the process, we produce an output data set, so that the input set can be left intact. This output set is produced by assigning a characteristic value to each element in the input set, which is usually a collection of spatial features that could be raster cells or points, lines or polygons. If the number of characteristic values in the output set is small in comparison to the size of the input set, we have classified the input set.

    The input data set may, itself, have been the result of a classification. In such cases we refer to the output data set as a reclassification. For example, we may have a soil map that shows different soil type units and we would like to show the suitability of units for a specific crop. In this case, it is better to assign to the soil units an attribute of suitability for the crop. Since different soil types may have the same crop suitability, a classification may merge soil units of different type into the same category of crop suitability.

    In classification of vector data, there are two possible results. In the first, the input features may become the output features in a new data layer, with an additional category assigned. In other words, nothing changes with respect to the spatial extents of the original features. Figure a of Examples illustrates this first type of output. A second type of output is obtained when adjacent features of the same category are merged into one bigger feature. Such a post-processing function is called spatial merging, aggregation or dissolving. An illustration of this second type is found in Figure b of Examples. Observe that this type of merging is only an option in vector data, as merging cells in an output raster on the basis of a classification makes little sense.
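The soil example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup table SUITABILITY, the attribute names, the soil units and the adjacency list are all hypothetical, and the merge step works on feature ids only (no geometry), so it shows which features would be dissolved together rather than producing the merged shapes.

```python
from collections import defaultdict

# Hypothetical classification table: soil type -> crop suitability class.
SUITABILITY = {
    "clay": "low",
    "peat": "low",
    "loam": "high",
    "sandy loam": "high",
    "sand": "moderate",
}

def reclassify(features, table, attribute="soil_type", target="suitability"):
    """Return new records with an added class attribute; the input is left intact."""
    output = []
    for feature in features:
        record = dict(feature)  # copy, so the input data set is not modified
        record[target] = table[feature[attribute]]  # KeyError flags an unmapped type
        output.append(record)
    return output

def dissolve_groups(features, adjacency, target="suitability"):
    """Group the ids of adjacent features that share a class (merge step, ids only)."""
    parent = {f["id"]: f["id"] for f in features}
    cls = {f["id"]: f[target] for f in features}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in adjacency:
        if cls[a] == cls[b]:  # only neighbours of the same category are merged
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for fid in parent:
        groups[find(fid)].append(fid)
    return sorted((cls[root], sorted(ids)) for root, ids in groups.items())

# Hypothetical soil units and their shared boundaries (pairs of feature ids).
soil_units = [
    {"id": 1, "soil_type": "clay"},
    {"id": 2, "soil_type": "peat"},
    {"id": 3, "soil_type": "loam"},
    {"id": 4, "soil_type": "sandy loam"},
    {"id": 5, "soil_type": "clay"},
]
neighbours = [(1, 2), (2, 3), (3, 4), (4, 5)]

classified = reclassify(soil_units, SUITABILITY)   # first type of output
merged = dissolve_groups(classified, neighbours)   # second type of output
print(merged)
```

Here units 1 and 2 (clay, peat) fall into one "low" group and units 3 and 4 into one "high" group, while unit 5 stays separate because it does not touch another "low" unit. In a real GIS the grouping would be followed by a geometric union of each group.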
    Vector data classification can be performed on point sets, line sets or polygon sets; the optional merge phase only makes sense for lines and polygons. User-controlled classifications require a classification table or user interaction. GIS software can also perform automatic classification, in which the user specifies only the number of classes in the output data set and the system determines the class break points automatically. The two main techniques for determining break points are the equal interval technique and the equal frequency technique.

    Equal interval technique: The minimum and maximum values vmin and vmax of the classification parameter are determined, and the (constant) interval size for each category is calculated as (vmax - vmin) / n, where n is the number of classes chosen by the user. This classification is useful in that it reveals the distribution pattern, as it determines the number of features in each category.

    Equal frequency technique: This technique is also known as quantile classification. The objective is to create categories with roughly equal numbers of features per category. The total number of features is determined first; then, based on the required number of categories, the number of features per category is calculated. The class break points are then determined by counting off the features in order of classification parameter value.
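The two break-point techniques can be compared with a minimal sketch. The function names and the sample values are hypothetical, breaks are treated as upper class limits, and the sketch assumes at least n input values and ignores ties; GIS packages may handle these details differently.

```python
from collections import Counter

def equal_interval_breaks(values, n):
    """Upper class limits for n classes of constant width (vmax - vmin) / n."""
    vmin, vmax = min(values), max(values)
    width = (vmax - vmin) / n
    return [vmin + width * k for k in range(1, n + 1)]

def equal_frequency_breaks(values, n):
    """Upper class limits found by counting off the values in sorted order."""
    ordered = sorted(values)
    total = len(ordered)
    # Position of the last value in class k when counting off total / n per class.
    return [ordered[(total * k) // n - 1] for k in range(1, n + 1)]

def classify(value, breaks):
    """Return the 0-based index of the first class whose upper limit covers value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1

# Hypothetical attribute values with one outlier (32).
values = [2, 4, 5, 7, 9, 10, 12, 14, 32]

ei = equal_interval_breaks(values, 3)    # [12.0, 22.0, 32.0]
ef = equal_frequency_breaks(values, 3)   # [5, 10, 32]

ei_counts = Counter(classify(v, ei) for v in values)
ef_counts = Counter(classify(v, ef) for v in values)
print(ei, dict(ei_counts))
print(ef, dict(ef_counts))
```

With these values the equal interval classes hold 7, 1 and 1 features, which exposes the skew in the data, whereas the equal frequency classes hold 3 features each by construction.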