4 - Data management: GIS architecture and spatial databases

Explain the basic architecture of a GIS, its main components and how these components are related (level 2). Create a simple spatial database given a spatial problem (level 1, 2 and 3).

Concepts

Data storage and maintenance

The way that data are stored plays a central role in their processing and, ultimately, our understanding of them. In most available systems, spatial data are organized in layers by theme and/or scale. Examples are layers of thematic categories, such as land use, topography and administrative subdivisions, each according to their mapping scale. An important underlying principle is that a representation of the real world has to be designed such that it reflects phenomena and their relationships as naturally as possible. In a GIS, features are represented together with their attributes (geometric and non-geometric) and relationships. The geometry of features is represented with primitives of the respective dimension: a windmill probably as a point; an agricultural field as a polygon. The primitives follow either the vector or the raster approach.

Vector data types describe an object through its boundary, thus dividing the space into parts that are occupied by the respective objects. The raster approach subdivides space into (regular) cells, mostly as a square tessellation of two or three dimensions. These cells are called pixels in 2D and voxels in 3D. If the cell represents a discrete field, the data indicate for every cell which real-world feature it covers. In the case of a continuous field, the cell holds a representative value for that field. The table below lists advantages and disadvantages of raster and vector representations.

Table: **Raster and vector representations compared**

| | Raster representation | Vector representation |
|---|---|---|
| Advantages | simple data structure; simple implementation of overlays; efficient for image processing | efficient representation of topology; adapts well to scale changes; allows representing networks; allows easy association with attribute data |
| Disadvantages | less compact data structure; difficulties in representing topology; cell boundaries independent of feature boundaries | complex data structure; overlay more difficult to implement; inefficient for image processing; more update-intensive |
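
To make the contrast between the two representations concrete, the following is a minimal sketch that converts a vector polygon (a list of boundary vertices) into a raster of cells. The coordinates, the grid size and the function names `point_in_polygon` and `rasterize` are assumptions made for illustration only; the rule "a cell belongs to the feature if its centre lies inside the boundary" is one possible convention, not the only one.

```python
# Vector: an agricultural field described by its boundary
# (vertices in map units; the values are made up for illustration).
field = [(1.0, 1.0), (6.0, 1.5), (5.0, 5.5), (1.5, 4.0)]


def point_in_polygon(x, y, ring):
    """Ray-casting test: count boundary crossings of a ray going right from (x, y)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring, n_rows, n_cols, cell_size):
    """Return a grid (row 0 at the top) holding 1 where the cell centre is inside."""
    grid = []
    for r in range(n_rows):
        y = (n_rows - r - 0.5) * cell_size  # centre of row r, counted from the top
        row = []
        for c in range(n_cols):
            x = (c + 0.5) * cell_size       # centre of column c
            row.append(1 if point_in_polygon(x, y, ring) else 0)
        grid.append(row)
    return grid


# Raster: the same field as a discrete field of 1 x 1 cells.
grid = rasterize(field, n_rows=6, n_cols=7, cell_size=1.0)
for row in grid:
    print("".join("#" if v else "." for v in row))
```

The printed grid follows the field only approximately, which illustrates the disadvantage listed in the table: cell boundaries are independent of feature boundaries.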

The storage of a raster is, in principle, straightforward. It is stored in a file as a long list of values, one for each cell, preceded by a small list of extra data (the “file header”), which specifies how to interpret the long list. The order of the cell values in the list can, but need not necessarily, be left to right, top to bottom. This simple encoding scheme is known as row ordering. The header of the raster will typically specify how many rows and columns the raster has, which encoding scheme was used, and what sort of values are stored for each cell.
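
As a rough illustration of this storage scheme, the sketch below flattens a small raster into a header plus one long list of values in row ordering, and then reads a single cell back. The header keys (`rows`, `cols`, `order`, `dtype`) and the function names are hypothetical; real raster file formats define their own header layouts.

```python
def encode_raster(grid):
    """Flatten a 2D grid into (header, values) using row ordering."""
    n_rows = len(grid)
    n_cols = len(grid[0])
    # The header tells a reader how to interpret the long list of values.
    header = {"rows": n_rows, "cols": n_cols, "order": "row", "dtype": "int"}
    # Row ordering: left to right within a row, rows from top to bottom.
    values = [v for row in grid for v in row]
    return header, values


def cell_value(header, values, row, col):
    """Look up one cell; in row ordering its list position is row * cols + col."""
    if header["order"] != "row":
        raise ValueError("this sketch only understands row ordering")
    return values[row * header["cols"] + col]


# A tiny land-use raster with made-up class codes.
land_use = [
    [1, 1, 2],
    [1, 3, 2],
]
header, values = encode_raster(land_use)
print(header)
print(values)
print(cell_value(header, values, 1, 1))
```

Without the header, the list of six values could equally be read as a 3 x 2 or a 1 x 6 raster, which is why the header has to travel with the data.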

Raster files can be large. For efficiency reasons, it is wise to organize the long list of cell values in such a way that spatially nearby cells are also near to each other in the list. This is why other encoding schemes have been devised. The reader is referred to Laurini and Thompson (1992) for a more detailed discussion.
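
The text does not say which alternative encoding schemes it has in mind. One well-known scheme with the stated property is Morton (Z-order) ordering, used here purely as an assumed example: the list position of a cell is obtained by interleaving the bits of its row and column numbers, so that each 2 x 2 block of cells ends up in four consecutive list positions. The function names are hypothetical.

```python
def morton_index(row, col, bits=8):
    """Interleave the bits of row and col into a single list position."""
    index = 0
    for b in range(bits):
        index |= ((col >> b) & 1) << (2 * b)       # column bits go to even positions
        index |= ((row >> b) & 1) << (2 * b + 1)   # row bits go to odd positions
    return index


def morton_order(n_rows, n_cols):
    """Return all (row, col) cells sorted by their Morton index."""
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return sorted(cells, key=lambda rc: morton_index(rc[0], rc[1]))


order = morton_order(4, 4)
# The first four list positions hold one complete 2 x 2 block of cells.
print(order[:4])
```

In row ordering on the same 4 x 4 raster, cell (1, 0) sits four positions away from its neighbour (0, 0); in this ordering it sits two positions away.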

Low-level storage structures for vector data are much more complicated, and a discussion of this topic is beyond the scope of this textbook. The best intuitive understanding can be obtained from the Topological Data Model, which illustrates a boundary model for polygon objects. Similar structures are in use for line objects. For further, advanced reading, see Samet (1990). GIS packages support both spatial and attribute data, i.e. they accommodate spatial data storage using a vector approach and attribute data using tables. Historically, however, database management systems (DBMSs) have been based on the notion of tables for data storage.

GIS applications have been able to link to an external database to store attribute data and make use of its superior data management functions. Currently, all major GIS packages provide facilities to link with a DBMS and exchange attribute data with it. Spatial (vector) and attribute data are still sometimes stored in separate structures, although they can now be stored directly in a spatial database.

Maintenance of data, spatial or otherwise, can best be defined as the combination of activities needed to keep the data set up to date and as supportive as possible for the user community. It deals with obtaining new data and entering them into the system, as well as possibly replacing outdated data. The purpose is to have an up-to-date, stored data set available. After a major flood, for instance, we may have to update road-network data to reflect that roads have been washed away or have become otherwise impassable.

The need for updating spatial data originates from the requirements posed by the users, as well as the fact that many aspects of the real world change continuously. Data updates can take different forms. It may be that a completely new survey has been carried out, from which an entirely new data set will be derived, to replace the current set. This is typically the case if the spatial data originate from remote sensing, for example, from a new vegetation-cover set or from a new digital elevation model. Furthermore, local ground surveys may reveal local changes, such as new constructions or changes in land use or ownership. In such cases, local changes to a large spatial data set are typically required. Such local changes should take into account matters of data consistency, i.e. they should leave other spatial data within the same layer intact and correct.
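
The idea of a local change that leaves the rest of the layer intact can be sketched as follows. The road layer, its feature ids and attribute names, and the function `apply_local_update` are all hypothetical and kept deliberately small; a real system would also have to check geometric and topological consistency, which this sketch does not attempt.

```python
import copy

# A made-up road layer: feature id -> attribute data.
roads = {
    "r1": {"name": "River Road", "status": "open"},
    "r2": {"name": "Bridge Street", "status": "open"},
    "r3": {"name": "Hill Lane", "status": "open"},
}


def apply_local_update(layer, changes):
    """Return an updated copy of the layer; only features named in changes may differ."""
    unknown = set(changes) - set(layer)
    if unknown:
        raise KeyError(f"unknown feature ids: {sorted(unknown)}")
    updated = copy.deepcopy(layer)  # never modify the stored layer in place
    for fid, new_attrs in changes.items():
        updated[fid].update(new_attrs)
    # Consistency check: every feature outside the change set must be unchanged.
    for fid in layer:
        if fid not in changes and updated[fid] != layer[fid]:
            raise RuntimeError(f"feature {fid} was altered unexpectedly")
    return updated


# After the flood, one road is reported impassable.
after_flood = apply_local_update(roads, {"r2": {"status": "impassable"}})
print(after_flood["r2"])
print(after_flood["r1"] == roads["r1"])
```

Working on a copy and swapping it in only after the check passes is one simple way to avoid leaving the stored data set half-updated.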

Spatial database

Spatial databases, also known as geo-databases, are implemented directly on existing DBMSs using extension software to allow them to handle spatial objects.
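
The principle of extending an existing DBMS so that it can answer spatial questions can be imitated on a very small scale with Python's built-in `sqlite3` module, which allows user-defined functions to be registered and then called from SQL. This is only a toy under stated assumptions: the table `windmill`, its columns and the function `distance` are invented for the example, points are stored as plain `x`/`y` columns, and none of the geometry types, spatial indexes or operators offered by real spatial extensions (such as PostGIS or SpatiaLite) are present.

```python
import math
import sqlite3


def distance(x1, y1, x2, y2):
    """Planar distance between two points, in the units of the coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL statements can call it by name.
conn.create_function("distance", 4, distance)

conn.execute(
    "CREATE TABLE windmill (id INTEGER PRIMARY KEY, name TEXT, x REAL, y REAL)"
)
conn.executemany(
    "INSERT INTO windmill (name, x, y) VALUES (?, ?, ?)",
    [("North mill", 0.0, 5.0), ("East mill", 4.0, 3.0), ("Far mill", 30.0, 40.0)],
)

# A spatial selection expressed in ordinary SQL: windmills within 6 units of the origin.
rows = conn.execute(
    "SELECT name FROM windmill WHERE distance(x, y, 0.0, 0.0) <= 6.0 ORDER BY name"
).fetchall()
nearby = [name for (name,) in rows]
print(nearby)
conn.close()
```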

GIS

A GIS (geographic information system) is a computerized system that provides four sets of capabilities for handling georeferenced data:

1. Data Capture and Preparation
2. Data Management (storage and maintenance)
3. Data Manipulation and Analysis
4. Data Presentation

Relational data model

A data model is a language that allows the definition of:
- the structures that will be used to store the base data;
- the integrity constraints that the stored data have to obey at all moments in time;
- the computer programs used to manipulate the data.

For relational data models, the structures used to define the database are attributes, tuples, and relations. Computer programs either perform data extraction from the database without altering it, in which case they are termed queries, or they change the database contents, in which case we speak of updates or transactions.
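
The three ingredients of a data model listed above can be seen together in a short sketch using Python's built-in `sqlite3` module. The relation `parcel`, its attributes and the sample tuples are assumptions made for illustration; they do not come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Structure: a relation (table) whose attributes (columns) are declared up front.
# Integrity constraints: a unique key, no missing values, and a positive area.
conn.execute(
    """
    CREATE TABLE parcel (
        parcel_id INTEGER PRIMARY KEY,
        land_use  TEXT NOT NULL,
        area_m2   REAL NOT NULL CHECK (area_m2 > 0)
    )
    """
)

# Each inserted row is one tuple of the relation.
conn.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?)",
    [(1, "agriculture", 12000.0), (2, "residential", 800.0)],
)
conn.commit()

# Query: extracts data without altering the database.
large = conn.execute(
    "SELECT parcel_id FROM parcel WHERE area_m2 > 1000"
).fetchall()

# Update: changes the contents, wrapped in a transaction by the with-block.
with conn:
    conn.execute("UPDATE parcel SET land_use = 'residential' WHERE parcel_id = 1")

# An update that violates a constraint is rejected and rolled back.
try:
    with conn:
        conn.execute("INSERT INTO parcel VALUES (3, 'forest', -5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM parcel").fetchone()[0]
new_use = conn.execute(
    "SELECT land_use FROM parcel WHERE parcel_id = 1"
).fetchone()[0]
print(large, rejected, count, new_use)
conn.close()
```

Because the DBMS enforces the constraints itself, the stored data obey them at all moments in time, regardless of which program issues the update.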