[IP1-3-1-1] DEM generation with 'Structure-from-Motion'

Structure from motion (SfM) is a photogrammetric technique for estimating the 3D structure of a scene: correspondences are established between multiple overlapping images and used to measure motion parallax. When a camera moves over a surface while taking successive overlapping images, the apparent positions of surface features shift relative to one another from one image to the next. The size of this shift depends on the distance between each feature and the camera, and thus on the surface elevation. This motion parallax can be used to generate an accurate 3D representation of the surface.

The photogrammetric problem solved by SfM is similar to that of stereo vision. SfM has gained popularity with the advent of inexpensive cameras, whose internal geometry is variable, unlike that of the metrically stabilized cameras traditionally used in airborne mapping. Even when GPS location and orientation metadata are inaccurate or missing, SfM can still produce (hyper)local DEMs as long as the images overlap sufficiently. Both airborne and spaceborne platforms can be used, provided that they carry 2D frame cameras that can be described by a pinhole camera model.

Generating a digital elevation model (DEM) with SfM is typically handled automatically by specialized software. First, image correspondences are detected. Feature points are identified in the individual images using local contrast feature detectors. The features extracted from each image are matched against those in all overlapping images, and erroneous matches are filtered out. This typically yields hundreds or thousands of tie points per image, which allows robust matching even when the a priori uncertainty in camera orientation is large. A bundle adjustment then solves for the 3D coordinates of the feature points, the position and orientation of the camera for each image, and the camera's internal characteristics, producing an initial, so-called sparse 3D point cloud.

Next, ground control points (GCPs) can be introduced. These are surface features, either naturally present or deliberately placed in the scene, that users can identify at the pixel level in the images. When their coordinates are also measured in the field with an error smaller than the pixel size on the ground, they can be used to constrain the bundle adjustment, improving georeferencing and camera calibration to an accuracy similar to that of the GCP measurements or the ground sampling distance (GSD).

Since the steps above yield a match for only a small subset of all pixels, an additional step called dense image matching is added. It uses the refined camera positions and orientations from the bundle adjustment to rectify and overlay two or more images, which are then compared row by row and along 16 different directions in a process called semi-global matching (SGM). Matching pixels are identified along these lines, and their 3D positions are inferred photogrammetrically by intersecting the corresponding image rays. By combining the results from the different directions, a 3D coordinate is obtained for almost every pixel, with similar accuracy.

Finally, DEM products on a regularly spaced grid are generated from the dense point cloud and exported. Depending on the point classes included in the export (obtained through topographic filtering or deep-learning-based classification of the dense point cloud), the result is a digital surface model (DSM) or a digital terrain model (DTM).
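
The link between motion parallax and surface elevation can be made concrete with a small sketch. It assumes an idealized case that is much simpler than a real SfM solution: two nadir-looking pinhole cameras at the same flying height, a known baseline between them, and a focal length expressed in pixels. The function names and all numbers are hypothetical and chosen only for illustration.

```python
def project_x(ground_x, cam_x, cam_height, focal_px, elevation):
    """Image x-coordinate (pixels) of a point seen by a nadir pinhole camera.

    The point lies at horizontal position ground_x and at the given elevation;
    the camera sits at cam_x, cam_height above the reference level.
    """
    return focal_px * (ground_x - cam_x) / (cam_height - elevation)


def elevation_from_parallax(x1, x2, baseline, cam_height, focal_px):
    """Recover elevation from the shift of a point between two images."""
    parallax = x1 - x2  # larger for points closer to the camera
    return cam_height - focal_px * baseline / parallax


# Assumed setup: 4000 px focal length, 100 m flying height, 20 m baseline.
FOCAL_PX, HEIGHT, BASELINE = 4000.0, 100.0, 20.0

for true_elevation in (0.0, 5.0, 12.5):
    x1 = project_x(30.0, 0.0, HEIGHT, FOCAL_PX, true_elevation)
    x2 = project_x(30.0, BASELINE, HEIGHT, FOCAL_PX, true_elevation)
    estimate = elevation_from_parallax(x1, x2, BASELINE, HEIGHT, FOCAL_PX)
    print(f"parallax {x1 - x2:8.2f} px -> elevation {estimate:6.2f} m")
```

In this toy geometry the parallax grows as the point gets closer to the camera, which is why higher terrain shifts more between images. A bundle adjustment generalizes this idea to many images with unknown camera positions, orientations, and internal parameters.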
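
To illustrate the idea behind semi-global matching, the sketch below aggregates matching costs along a single path (left to right on one image row). It follows the commonly cited SGM recurrence, with a small penalty `p1` for a disparity change of one and a larger penalty `p2` for bigger jumps. The cost values and penalties are made up for the example, and a real implementation would sum many such paths over whole rectified images rather than one row.

```python
def aggregate_path(cost, p1, p2):
    """Aggregate costs along one path; cost[x][d] is the matching cost of
    pixel x at disparity d. Returns the path-aggregated costs."""
    n_disp = len(cost[0])
    aggregated = [list(cost[0])]  # the first pixel has no predecessor
    for x in range(1, len(cost)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            # Same disparity is free; any larger jump costs p2.
            candidates = [prev[d], prev_min + p2]
            # A change of one disparity level costs p1.
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d < n_disp - 1:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the values from growing along the path.
            row.append(cost[x][d] + min(candidates) - prev_min)
        aggregated.append(row)
    return aggregated


def best_disparity(cost_row):
    """Index of the lowest cost (the first one in case of a tie)."""
    return min(range(len(cost_row)), key=cost_row.__getitem__)


# Three pixels, three candidate disparities. The middle pixel is ambiguous
# (for example a textureless patch): all disparities cost the same.
cost = [[9, 0, 9], [4, 4, 4], [9, 0, 9]]
aggregated = aggregate_path(cost, p1=1, p2=8)

raw = [best_disparity(row) for row in cost]
smoothed = [best_disparity(row) for row in aggregated]
print(raw)       # [1, 0, 1]
print(smoothed)  # [1, 1, 1]
```

The ambiguous middle pixel inherits the disparity of its neighbour once the smoothness penalties are applied, which is the effect that lets dense matching assign a plausible 3D coordinate to almost every pixel.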
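
The final gridding step, and the difference between a DSM and a DTM, can be sketched as follows. This is a minimal illustration, not how any particular software package works: it assumes points are already classified with ASPRS LAS-style codes (2 for ground, 5 for high vegetation, 6 for building), bins them into a regular grid, and leaves empty cells as NaN, whereas production tools would interpolate such gaps. The function name `rasterize` and the sample points are hypothetical.

```python
import math

GROUND = 2  # assumed LAS-style class code for ground points


def rasterize(points, cell_size, n_rows, n_cols, pick=max):
    """Bin (x, y, z) points into a grid with its origin at (0, 0).

    Row index increases with y and column index with x. When several points
    fall into one cell, `pick` decides which elevation is kept.
    """
    grid = [[math.nan] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        row, col = int(y // cell_size), int(x // cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            current = grid[row][col]
            grid[row][col] = z if math.isnan(current) else pick(current, z)
    return grid


# Hypothetical dense point cloud: (x, y, z, class code).
cloud = [
    (0.2, 0.3, 10.0, 2),  # ground
    (0.6, 0.4, 14.5, 5),  # high vegetation above the same cell
    (1.5, 0.5, 10.2, 2),  # ground
    (1.4, 1.6, 18.0, 6),  # building roof, no ground point in this cell
    (0.5, 1.5, 10.4, 2),  # ground
]

# DSM: all points, highest elevation per cell.
dsm = rasterize([(x, y, z) for x, y, z, _ in cloud], 1.0, 2, 2, pick=max)
# DTM: ground points only, lowest elevation per cell.
dtm = rasterize([(x, y, z) for x, y, z, c in cloud if c == GROUND],
                1.0, 2, 2, pick=min)

print(dsm)
print(dtm)
```

The cell under the building has a value in the DSM but stays empty in the DTM, which shows why terrain models need gap filling wherever no ground points were observed.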
