2 - Spatial data modelling: computer representations

Explain and be able to apply basic vector and raster spatial data structures including selecting a suitable data structure for geographic phenomena (level 1, 2 and 3).

Concepts

  • Line representation

    Line data are used to represent one-dimensional objects such as roads, railroads, canals, rivers and power lines. Again, there is an issue of relevance for the application and the scale that the application requires.

    For the example of mapping tourist information, bus, subway and tram routes are likely to be relevant line features. Some cadastral systems, on the other hand, may consider roads to be two-dimensional features, i.e. having a width as well as length.

    Previously, we noted that arbitrary, continuous curvilinear features are just as difficult to represent as continuous fields. GISs, therefore, approximate such features (finitely!) as lists of nodes: the two end nodes and zero or more internal nodes, or vertices, define a line. Other terms for “line” that are commonly used in some GISs are polyline, arc or edge. A node or vertex is like a point, but it only serves to define the line and provide shape in order to obtain a better approximation of the actual feature. The straight parts of a line between two consecutive vertices or end nodes are called line segments. Many GISs store a line as a sequence of coordinates of its end nodes and vertices, assuming that all its segments are straight. This is usually good enough: where a single straight line segment is an unsatisfactory representation, multiple (smaller) line segments can be used instead.

    Still, in some cases we would like to have the opportunity to use arbitrary curvilinear features to represent real-world phenomena. Think of a garden design with perfectly circular or elliptical lawns, or of detailed topographic maps showing roundabouts and sidewalks. In principle all of this can be stored in a GIS, but currently many systems do not accommodate such shapes. A GIS function supporting curvilinear features uses parameterized mathematical descriptions, a discussion of which is beyond the scope of this textbook.

    Collections of (connected) lines may represent phenomena that are best viewed as networks. With networks, interesting questions arise that have to do with connectivity and network capacity. These relate to applications such as traffic monitoring and watershed management. Network elements (i.e. the lines that make up the network) commonly have extra values associated with them, such as distance, quality of the link or carrying capacity.
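
    The storage of a line as a sequence of coordinates can be sketched as follows. This is only an illustration, not how any particular GIS stores its data; the name `road` and its coordinates are hypothetical.

```python
import math

# Hypothetical polyline: two end nodes plus two internal vertices,
# stored as a sequence of (x, y) coordinate pairs.
road = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0), (6.0, 8.0)]


def segments(line):
    """Return the straight line segments between consecutive vertices."""
    return list(zip(line[:-1], line[1:]))


def length(line):
    """Sum the lengths of all straight segments of the line."""
    return sum(math.dist(p, q) for p, q in segments(line))


print(len(segments(road)))  # 3 segments
print(length(road))         # 5.0 + 3.0 + 4.0 = 12.0
```

    Adding more vertices gives more (and shorter) segments, and so a closer approximation of a curved feature.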

  • Topological data model

    The boundary model is an improved representation that deals with the disadvantages of the naive polygon representation, which is described in Polygons. It stores parts of a polygon’s boundary as non-looping arcs and indicates which polygon is on the left and which is on the right of each arc. A simple example of the boundary model can be seen below. It illustrates which additional information is stored about spatial relationships between lines and polygons. Obviously, real coordinates for nodes (and vertices) will also be stored in another table. The boundary model is also called the topological data model as it captures some topological information, such as polygon neighbourhood. You can read more about topological information in Topology. Observe that it is a matter of a simple query to find all the polygons that are neighbours of a given polygon, unlike with the naive polygon representation.
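
    A minimal sketch of such an arc table, and of the neighbour query it makes possible, might look like this. The table layout, the polygon names "A" and "B", and the label "W" for the outside world are assumptions made for illustration only.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    arc_id: str
    from_node: int
    to_node: int
    left: str   # polygon on the left when travelling from_node -> to_node
    right: str  # polygon on the right


# Hypothetical arc table: polygon A lies west of polygon B, and both
# share arc "a2". "W" stands for the outside world. Node and vertex
# coordinates would be kept in a separate table.
ARCS = [
    Arc("a1", 1, 2, left="W", right="A"),  # outer boundary of A
    Arc("a2", 1, 2, left="A", right="B"),  # boundary shared by A and B
    Arc("a3", 1, 2, left="B", right="W"),  # outer boundary of B
]


def neighbours(polygon, arcs):
    """Find all polygons that share at least one arc with `polygon`."""
    result = set()
    for arc in arcs:
        if arc.left == polygon:
            result.add(arc.right)
        elif arc.right == polygon:
            result.add(arc.left)
    return result


print(sorted(neighbours("A", ARCS)))  # ['B', 'W']
```

    The query only scans the left and right columns of the arc table; no coordinate comparison is needed.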

  • Regular Tessellation

    A regular tessellation is a partitioning of space into mutually exclusive cells that together make up the complete study area and in which the cells have the same shape and size. A simple example of this is a rectangular raster of unit squares, represented in a computer in the 2D case as an array of n × m elements.

  • Vector Representation

    A georeference is a coordinate pair from some geographic space, and is also known as a vector. Tessellations do not explicitly store georeferences of the phenomena they represent. Instead, they provide a georeference of, for instance, the lower left corner of the raster, plus an indicator of the raster’s resolution, thereby implicitly providing georeferences for all cells in the raster. In vector representations, georeferences are explicitly associated with the geographic phenomena.
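
    How a raster implicitly provides a georeference for every cell can be sketched as follows. The function name, its parameters, and the convention that row 0 is the top row of the array are assumptions for this example; actual GIS packages may use other conventions.

```python
def cell_centre(row, col, x_ll, y_ll, cell_size, n_rows):
    """Centre coordinate of cell (row, col).

    (x_ll, y_ll) is the georeference of the lower left corner of the
    raster, cell_size its resolution, and row 0 is the top row.
    """
    x = x_ll + (col + 0.5) * cell_size
    y = y_ll + (n_rows - row - 0.5) * cell_size
    return x, y


# Hypothetical raster of 3 rows with 10 m cells, lower left corner
# at (1000, 2000).
print(cell_centre(0, 0, 1000.0, 2000.0, 10.0, 3))  # (1005.0, 2025.0)
print(cell_centre(2, 0, 1000.0, 2000.0, 10.0, 3))  # (1005.0, 2005.0)
```

    Only three numbers (two corner coordinates and the cell size) are stored for the whole raster, whereas a vector representation stores coordinates with every feature.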

  • Irregular Tessellation

    Irregular tessellations are partitions of space into mutually distinct cells, but now the cells may vary in size and shape, allowing them to adapt to the spatial phenomena that they represent. Irregular tessellations are more complex than regular ones, but they are also more adaptive, which typically leads to a reduction in the amount of computer memory needed to store the data.

    Regular tessellations provide simple structures with straightforward algorithms that are, however, not adaptive to the phenomena they represent. This means they might not represent the phenomena in the most efficient way. For this reason, substantial research effort has been put into irregular tessellation.

  • Tessellation

    Tessellation (or tiling) is a partitioning of space into mutually exclusive cells that together make up the complete study space. For each cell, a (thematic) value is assigned to characterize that part of space. There are regular and irregular tessellations.

    In a regular tessellation, the cells have the same shape and size; a simple example of this is a rectangular raster of unit squares, represented in a computer in the 2D case as an array of n × m elements. These tessellations are known under various names in different GIS packages, such as rasters or raster maps. The size of the area that a single raster cell represents is called the raster's resolution.

    Irregular tessellations are partitions of space into mutually distinct cells, but now the cells may vary in size and shape, allowing them to adapt to the spatial phenomena that they represent.

  • Point representation

    Points are defined as single coordinate pairs (x, y) in 2D space, or coordinate triplets (x, y, z) in 3D space. Points are used to represent objects that are best described as shapeless, size-less, zero-dimensional features. Whether this is the case really depends on the purposes of the application and also on the spatial extent of the objects compared to the scale used in the application. For a tourist map of a city, a park would not usually be considered a point feature, but perhaps a museum would, and certainly a public phone booth might be represented as a point. In addition to the georeference, administrative or thematic data are usually stored for each point object that can capture relevant information about it. For phone-booth objects, for example, this may include the telephone company owning the booth, its phone number and the date it was last serviced.
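
    A point object with its georeference and thematic data could be sketched like this. The class name, field names and all attribute values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PointFeature:
    """A zero-dimensional feature: a coordinate pair plus thematic data."""
    x: float
    y: float
    attributes: dict = field(default_factory=dict)


# Hypothetical phone-booth object with made-up attribute values.
booth = PointFeature(
    x=257340.0,
    y=471210.0,
    attributes={
        "owner": "ExampleTel",
        "phone_number": "000-0000",
        "last_serviced": "2020-01-15",
    },
)

print(booth.x, booth.y)
print(booth.attributes["owner"])
```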

  • Area representation

    When area objects are stored using a vector approach, the usual technique is to apply a boundary model. This means that each area feature is represented by some arc/node structure that determines a polygon as the area’s boundary. A polygon representation for an area object is another example of a finite approximation of a phenomenon that may have a curvilinear boundary in reality.
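
    The idea of a polygon as a finite approximation of a curvilinear boundary can be illustrated with a circular area. The sketch below assumes a simple list of (x, y) vertices as the polygon's boundary and uses the shoelace formula for its area; the function names are made up for this example.

```python
import math


def polygon_area(ring):
    """Area of a simple polygon (shoelace formula).

    `ring` is a list of (x, y) vertices; the closing segment back to
    the first vertex is implied.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def circle_as_polygon(radius, n):
    """Approximate a circle around the origin by a polygon of n vertices."""
    return [
        (radius * math.cos(2 * math.pi * k / n),
         radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


# The more vertices, the closer the polygon's area gets to pi * r^2.
for n in (8, 36, 360):
    print(n, polygon_area(circle_as_polygon(1.0, n)))
```

    The polygon is inscribed in the circle, so its area stays slightly below that of the true circle and approaches it as the number of vertices grows.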

  • Resolution

    Resolution describes the ability to resolve small details (in space, time or the electromagnetic spectrum). When applied to spatial data, the term resolution is commonly associated with the cell width of the tessellation applied.

  • Spatial-Temporal data model

    The way we represent relevant components of the real world in our models determines the kinds of questions we can or cannot answer. We have already discussed representation issues for spatial features, but so far we have ignored issues for incorporating time. The main reason is that GISs still offer limited support for doing so. As a result, most studies require substantial efforts from the GIS user in data preparation and data manipulation. Also, in addition to an object or field having to be represented in 2D or 3D space, the temporal dimension is continuous in nature. Therefore, in order to represent it in a GIS we have to discretize the time dimension.

    Spatio-temporal data models are ways of organizing representations of space and time in a GIS. Several representation techniques have been proposed in the literature. Perhaps the most common of these is the “snapshot state”, which represents a single moment in time of an ongoing natural or man-made process. We may store a series of these “snapshot states” to represent “change”, but we must be aware that this is by no means a comprehensive representation of that process. Further treatment of spatio-temporal data models is outside the scope of this book and readers are referred to Langran (1992) for a discussion of relevant concepts and issues.
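
    A series of snapshot states can be sketched as a time-ordered list of (date, raster) pairs. The dates, land-cover values and the function name `state_at` are hypothetical; this is an illustration of discretized time, not a full spatio-temporal data model.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical series of land-cover snapshots, ordered by date.
# Each snapshot is a small 2 x 2 raster.
snapshots = [
    (date(2000, 1, 1), [["forest", "forest"], ["forest", "water"]]),
    (date(2010, 1, 1), [["forest", "urban"], ["forest", "water"]]),
    (date(2020, 1, 1), [["urban", "urban"], ["forest", "water"]]),
]


def state_at(when, series):
    """Return the most recent snapshot taken on or before `when`, or None."""
    times = [t for t, _ in series]
    i = bisect_right(times, when)
    return series[i - 1][1] if i > 0 else None


print(state_at(date(2015, 6, 1), snapshots))  # the 2010 snapshot
print(state_at(date(1990, 1, 1), snapshots))  # None: before the first snapshot
```

    Nothing is known about what happened between two snapshots, which is why such a series is not a comprehensive representation of the underlying process.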

  • Geographical representation

    A geographical representation considers the way geoinformation, such as fields and objects, is represented in computers.

    A geographic field can be represented by means of a tessellation, a TIN or a vector representation. The choice between them is determined by the requirements of the application in mind. It is more common to use tessellations, notably rasters, for field representation, but vector representations are in use too.

    The representation of geographic objects is most naturally supported with vectors. After all, objects are identified by the parameters of location, shape, size and orientation, and many of these parameters can be expressed in terms of vectors.

  • Triangulated Irregular Networks

    The Triangulated Irregular Network (TIN) is a commonly used data structure in GIS software. It is a standard implementation technique for digital terrain models, but it can also be used to represent any continuous field. The principles on which a TIN is based are simple. It is built from a set of locations for which we have a measurement, for instance an elevation. The locations can be arbitrarily scattered in space and are not usually on a regular grid. Any location together with its elevation value can be viewed as a point in three-dimensional space. From these 3D points, we can construct an irregular tessellation made of triangles.
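
    One way to see how a triangle of 3D points represents a continuous field is to estimate the value at a location inside it. The sketch below assumes linear interpolation over the plane through the three points (using barycentric weights); it handles a single triangle only and does not construct the triangulation itself. The function name and sample values are hypothetical.

```python
def interpolate_in_triangle(p, a, b, c):
    """Linearly interpolate the field value at 2D location p.

    a, b and c are the triangle's corner points as (x, y, z);
    p is an (x, y) location assumed to lie inside the triangle.
    """
    (xa, ya, za), (xb, yb, zb), (xc, yc, zc) = a, b, c
    x, y = p
    # Barycentric weights of p with respect to corners a, b and c.
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    wa = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    wb = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    wc = 1.0 - wa - wb
    return wa * za + wb * zb + wc * zc


# Hypothetical triangle with measured elevations of 10, 20 and 30 m.
a, b, c = (0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)
print(interpolate_in_triangle((2.0, 2.0), a, b, c))
```

    At a corner the result equals the measured value there, and along an edge it varies linearly between the two end values.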

  • Map scale

    In the practice of spatial data handling, one often comes across questions like “What is the resolution of the data?” or “At what scale is your data set?” Now that we have moved firmly into the digital age, these questions sometimes defy an easy answer. Map scale can be defined as the ratio between the distance on a printed map and the distance of the same stretch in the terrain.

    A 1:50,000 scale map means that 1 cm on the map represents 50,000 cm (i.e. 500 m) in the terrain. “Large-scale” means that the ratio is relatively large, so typically it means there is much detail to see, as on a 1:1000 printed map. “Small-scale”, in contrast, means a small ratio, hence less detail, as on a 1:2,500,000 printed map.
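
    The scale arithmetic above can be written as a small helper. The function name and its parameters are made up for this example.

```python
def ground_distance_m(map_distance_cm, scale_denominator):
    """Convert a distance measured on a printed map (in cm) to metres
    in the terrain, for a map at scale 1:scale_denominator."""
    return map_distance_cm * scale_denominator / 100.0


print(ground_distance_m(1, 50_000))     # 500.0 m at 1:50,000
print(ground_distance_m(1, 1_000))      # 10.0 m at 1:1000 (large scale)
print(ground_distance_m(1, 2_500_000))  # 25000.0 m at 1:2,500,000 (small scale)
```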