Data checks and repairs

Introduction

Data checks and repairs is the step in which acquired data sets are checked for quality in terms of accuracy, consistency, and completeness. Errors can often be identified automatically and then corrected with manual editing methods. Alternatively, some software both identifies and corrects certain types of errors automatically. In these checks, the geometric, topological, and attribute components of spatial data can be distinguished.
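As an illustration of automatic error identification, the following minimal Python sketch flags geometric and attribute problems in a single polygon feature. The feature layout (a dictionary with a "ring" list and an "attributes" dictionary) and the function name check_feature are assumptions made for this example, not part of any particular GIS package. Topological checks, such as overlaps between neighbouring polygons, are left out.

```python
def check_feature(feature, required_fields):
    """Return a list of problems found in one polygon feature.

    `feature` is a hypothetical record of the form
    {"ring": [(x, y), ...], "attributes": {...}}.
    """
    problems = []

    # Geometric check: a valid ring needs at least 4 vertices
    # and must end where it starts.
    ring = feature.get("ring", [])
    if len(ring) < 4:
        problems.append("geometry: ring has fewer than 4 vertices")
    elif ring[0] != ring[-1]:
        problems.append("geometry: ring is not closed")

    # Attribute check: every required field must have a value.
    attributes = feature.get("attributes", {})
    for field in required_fields:
        if attributes.get(field) in (None, ""):
            problems.append(f"attribute: missing value for '{field}'")

    return problems


parcel = {
    "ring": [(0, 0), (1, 0), (1, 1), (0, 1)],  # last vertex does not repeat the first
    "attributes": {"name": ""},
}
for problem in check_feature(parcel, ["name"]):
    print(problem)
```

A report like this only identifies errors. Whether they are then corrected manually or automatically is a separate decision.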

Figure 1: Clean-up operations for vector data.

“Clean-up” operations are often performed in a standard sequence. For example, crossing lines are split before dangling lines are erased, and nodes are created at intersections before polygons are generated (Figure 1).
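The order of these operations can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure of any specific software: lines are straight two-point segments with integer coordinates, collinear overlaps are ignored, and the function names and the max_length tolerance are invented for the example. Segments are first split at their crossings, which creates nodes there. Only then are short dangling pieces erased, because before splitting an overshoot is still part of a longer line.

```python
import math
from collections import Counter
from fractions import Fraction


def intersection(seg_a, seg_b):
    """Return the point where two segments meet, or None.

    Assumes integer coordinates so the result is exact.
    """
    (x1, y1), (x2, y2) = seg_a
    (x3, y3), (x4, y4) = seg_b
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:  # parallel or collinear: not handled in this sketch
        return None
    t = Fraction((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3), denom)
    u = Fraction((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1), denom)
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def split_at_crossings(segments):
    """Split every segment at each interior point where another segment meets it."""
    pieces = []
    for i, seg in enumerate(segments):
        start, end = seg
        cuts = set()
        for j, other in enumerate(segments):
            if i != j:
                p = intersection(seg, other)
                if p is not None and p not in seg:
                    cuts.add(p)
        # Order the cut points along the segment, starting from `start`.
        ordered = sorted(
            cuts, key=lambda p: (p[0] - start[0]) ** 2 + (p[1] - start[1]) ** 2
        )
        points = [start] + ordered + [end]
        pieces.extend(zip(points, points[1:]))
    return pieces


def remove_dangles(segments, max_length):
    """Repeatedly erase short segments that have an unconnected endpoint."""
    segments = list(segments)
    while True:
        degree = Counter(p for seg in segments for p in seg)
        kept = [
            seg for seg in segments
            if not (
                min(degree[seg[0]], degree[seg[1]]) == 1
                and math.hypot(seg[1][0] - seg[0][0], seg[1][1] - seg[0][1]) <= max_length
            )
        ]
        if len(kept) == len(segments):
            return kept
        segments = kept


# Four lines drawn as a "#": each one overshoots the square it should bound.
lines = [((0, 1), (4, 1)), ((0, 3), (4, 3)), ((1, 0), (1, 4)), ((3, 0), (3, 4))]
pieces = split_at_crossings(lines)                # 12 pieces, nodes at the 4 crossings
cleaned = remove_dangles(pieces, max_length=1)    # the 4 sides of the square remain
```

Running remove_dangles on the unsplit lines would erase nothing here, since every line is longer than the tolerance. This is one reason the splitting step comes first.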

With polygon data, one usually starts with many polylines (in an unwieldy format known as spaghetti data) that are combined and cleaned in the first step (Figure 2a-b). This results in fewer polylines, with nodes created at their intersections. Then, polygons can be identified (Figure 2c). Sometimes polylines that should connect to form closed boundaries do not, and they must then be connected either manually or automatically. In a final step, the elementary topology of the polygons can be derived (Figure 2d).

Figure 2: Successive clean-up operations for vector data, turning spaghetti data into topological structure.
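Connecting polylines that fail to meet can be automated by snapping endpoints within a tolerance. The following Python sketch shows one simple way to do this. The function names, the tolerance value, and the rule of snapping to the nearest endpoint of another polyline are assumptions chosen for the illustration. Real software typically offers more options, such as snapping to any vertex or to a point along a line.

```python
import math
from collections import Counter


def dangling_endpoints(polylines):
    """Return the endpoints that are used by only one polyline end."""
    degree = Counter(p for line in polylines for p in (line[0], line[-1]))
    return [p for p, n in degree.items() if n == 1]


def snap_endpoints(polylines, tolerance):
    """Move each unconnected endpoint onto the nearest endpoint of
    another polyline, if one lies within `tolerance`."""
    lines = [list(line) for line in polylines]  # work on a copy
    for i, line in enumerate(lines):
        for end in (0, -1):
            p = line[end]
            others = [
                q
                for j, other in enumerate(lines) if j != i
                for q in (other[0], other[-1])
            ]
            if not others or p in others:
                continue  # nothing to snap to, or already connected
            nearest = min(others, key=lambda q: math.dist(p, q))
            if math.dist(p, nearest) <= tolerance:
                line[end] = nearest
    return lines


# Three polylines that should bound a triangle, but the third one
# stops slightly short at both ends.
boundary = [
    [(0, 0), (4, 0)],
    [(4, 0), (2, 3)],
    [(2.0, 2.9), (0.1, 0.0)],
]
snapped = snap_endpoints(boundary, tolerance=0.2)
print(dangling_endpoints(boundary))  # four unconnected endpoints
print(dangling_endpoints(snapped))   # none left: the boundary is closed
```

The tolerance is the critical design choice. If it is too small, gaps stay open. If it is too large, endpoints that were meant to be distinct are merged, which is why results of automatic snapping are usually reviewed.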

 
