[IP3-5] Image segmentation

The term image segmentation denotes the process of algorithmically grouping neighbouring pixels that are similar. What sounds rather straightforward is in fact a great computational challenge – some even call it an ill-posed problem – because there is a high degree of ambiguity in this process. The two attributes in the general definition provided above, i.e. neighbouring and similar, evoke the principles of regionalisation as a fundamental concept in geography. Regionalisation is the bottom-up approach of aggregating adjacent elements to form a larger unit. (Conversely, this can be understood in a top-down manner as subdividing a larger whole into smaller homogeneous units.) This follows the general notion of hierarchical organisation according to general systems theory (GST). The organisation of a state into smaller administrative units is a good example of a hierarchical structure; the composition of the human body from organs, cells, etc. is another. In image analysis such regions are commonly referred to as image regions, originating from the concept of “photomorphic regions”, literally meaning regions formed on images – originally by human interpreters through manual delineation. Today, advanced pixel-grouping algorithms aim to delineate homogeneous regions in an image automatically. As those regions are usually assumed to match real-world objects, it is often stated in the literature that image segmentation generates image objects. By deriving general heuristics from their properties (colour, size, shape, orientation, etc.), we can label these objects according to a given semantic scheme. This procedure of object delineation and classification using object features and relations is a fundamental principle in object-based image analysis (OBIA).
Due to the effect of spatial autocorrelation (the tendency of neighbouring pixels to be similar irrespective of scale or geographical location), pixel grouping is ambiguous and by no means trivial, but not arbitrary either. Intuitively, image regions are those quasi-homogeneous areas that we perceive as landscape units in a specific scene (a lake, a forest patch, a single tree, a building, a residential area). According to hierarchy theory, we can expect to find multiple scales even within a single image, depending on the level of detail we are interested in. Whether or not a specific grouping of pixels is considered valid, e.g. because it corresponds to a real-world object, can hardly be answered unambiguously; rather, it needs to be judged by experts in the respective application domain. That is why the term ‘meaningful objects’ is often found in the literature. Image segmentation is a sub-field of computer vision and aims to apply computer algorithms to generate image regions (a.k.a. tokens) within digital image analysis. There are several strategies for performing image segmentation, all resting on the following general principles: (1) regions do not overlap; (2) regions are (relatively) homogeneous; (3) regions are (relatively) different from neighbouring regions; (4) regions are fairly equally sized (belong to one scale domain) but can be built at several hierarchical scales. General strategies include (1) edge-based segmentation and (2) region-based segmentation, with multi-scale segmentation as a specific case. Also referred to as spatial classification, emphasising the constraint of spatial contiguity, image segmentation aggregates neighbouring pixels but – in contrast to statistical clustering techniques – does not provide a unique set of classes (either semantic or statistical) in the feature space.
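The region-based strategy mentioned above can be illustrated with a minimal region-growing sketch. This is a simplified, hypothetical implementation for illustration only (real segmentation software uses far more sophisticated homogeneity criteria): starting from a seed pixel, neighbouring pixels are merged into the region as long as their value stays within a similarity threshold of the seed, which directly expresses the two defining attributes of segmentation, neighbouring and similar.

```python
from collections import deque

def region_growing(image, threshold):
    """Label connected regions of similar pixels by region growing.

    A pixel joins a region when its grey value differs from the
    region's seed value by at most `threshold` (4-connectivity).
    Returns a label map of the same shape as `image`; every pixel
    receives exactly one label, so regions do not overlap.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for sr in range(rows):
        for sc in range(cols):
            if labels[sr][sc]:          # pixel already belongs to a region
                continue
            next_label += 1
            seed = image[sr][sc]
            labels[sr][sc] = next_label
            queue = deque([(sr, sc)])
            while queue:                # breadth-first growth from the seed
                r, c = queue.popleft()
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not labels[nr][nc]
                            and abs(image[nr][nc] - seed) <= threshold):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels

# A tiny 4x4 "image": a dark patch (values ~10) beside a bright patch (~200)
img = [
    [10, 12, 200, 201],
    [11, 10, 199, 200],
    [10, 11, 200, 198],
    [12, 10, 201, 200],
]
print(region_growing(img, threshold=15))
# Two regions: the left (dark) and right (bright) halves of the image
```

Note that the result depends on the order in which seeds are visited and on the chosen threshold, which illustrates why segmentation is called an ill-posed problem: several equally plausible partitions exist for the same image.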
Recently, the term semantic segmentation has emerged in the machine-learning community; it is in fact a combination of segmentation and categorisation (labelling) via deep-learning methods (e.g. convolutional neural networks).

External resources

Learning outcomes

Self assessment

Outgoing relations

Incoming relations

Contributors