2099 - Discuss how remote sensing data is organized and stored

Concepts

  • [PS3-3] Data storage
    EO data consist of unstructured image data and structured descriptive information attached to each image, known as metadata. EO systems are developing rapidly and sensor resolution is continuously improving. As a result, a vast amount of EO data is generated every day, and data volumes have been growing geometrically. According to the current literature, storage and management methods for EO data fall into four groups in terms of their underlying technology:

    1. File systems: Traditionally, EO data were managed and organized manually in file systems, and shared and exchanged through storage devices. For large amounts of EO data, however, this approach leads to inefficient management, extra storage costs, and weak data security. Conventional file systems cannot efficiently support the data retrieval, analysis, and use that practical applications and research now require. To address these problems, parallel and distributed file systems (see item 3) were introduced to support data-intensive applications.

    2. Relational Database Management Systems (RDBMS): At present, most EO data are stored and managed by combining an RDBMS with middleware. On the one hand, traditional RDBMS functionality is extended to suit the storage and management characteristics of EO data. The two general ways of doing so are adding new data types and encapsulating complex data types as objects in the RDBMS. The former can meet the basic requirements of EO data storage and management, but cannot directly operate on spatial data or create spatial indexes. Extending the RDBMS itself is mainly done by Database Management System (DBMS) developers; examples include Oracle Spatial GeoRaster, IBM DB2 Spatial Extender, PostGIS for PostgreSQL, and the MySQL spatial extensions.

       On the other hand, geographic software extends its data management capabilities through spatial database engine middleware, an approach typically taken by companies that develop geographic information systems (GIS). A Spatial Database Engine (SDE) sits between users and the DBMS. For storage, the SDE receives user data and stores them in the RDBMS; for retrieval, it reads data from the RDBMS and presents them through user interfaces. This solution stores EO data in the RDBMS and manages them interactively through the user interfaces provided by the SDE. SDE technology is very mature and widely used in many application fields. Because SDEs are developed by GIS software companies, they have good compatibility with integrated GIS software platforms.

    3. Distributed file systems: Distributed file systems are a more recent technology for solving data-intensive computing problems. Several have emerged, such as PVFS, GPFS, ZFS, GFS, HDFS, and Lustre.

    4. Large-scale network storage systems: These are a type of distributed file system with data sharing and remote access functionality. With improving hardware performance and the rapid development of network technologies, Storage Area Networks (SAN) and Network Attached Storage (NAS) have been introduced into distributed file systems. Large-scale network storage systems use different storage and management strategies for EO image files and their metadata: image files are stored and managed in a distributed file system, for example HDFS, while their metadata are stored, processed, and managed in RDBMS metadata servers. Managing image files and metadata separately can improve the efficiency of EO data management and balance the load on the distributed file system. Systems of this kind that have already been developed include the Celerra, CLARiiON, and Symmetrix storage solutions of EMC, IBM HPSS, MSS and RASCHAL of the National Aeronautics and Space Administration (NASA), the Microsoft earth image storage system, and the Google Earth image storage system.
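
The separation of image files from their metadata described for large-scale network storage systems can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the text above: an in-memory SQLite database stands in for the RDBMS metadata server, plain path strings stand in for files held in a distributed file system, and the table name, column names, and scene records are hypothetical. The idea is that a query touches only the small metadata table and returns the locations of the matching image files.

```python
import sqlite3

# Hypothetical scene records: (scene_id, file_path, sensor, acquisition date,
# min_lon, min_lat, max_lon, max_lat). All values are made up for illustration.
SCENES = [
    ("scene_001", "/archive/eo/scene_001.tif", "sensor_a", "2020-01-05", 10.0, 45.0, 11.0, 46.0),
    ("scene_002", "/archive/eo/scene_002.tif", "sensor_a", "2021-03-10", 10.0, 45.0, 11.0, 46.0),
    ("scene_003", "/archive/eo/scene_003.tif", "sensor_b", "2020-06-01", 20.0, 50.0, 21.0, 51.0),
    ("scene_004", "/archive/eo/scene_004.tif", "sensor_b", "2020-07-15", 10.2, 45.2, 11.2, 46.2),
]


def build_catalog(conn, scenes):
    """Create the metadata table and load one row per image file."""
    conn.execute(
        """
        CREATE TABLE scene_metadata (
            scene_id  TEXT PRIMARY KEY,
            file_path TEXT NOT NULL,   -- where the image itself is stored
            sensor    TEXT,
            acquired  TEXT,            -- ISO date, so string comparison sorts correctly
            min_lon REAL, min_lat REAL, max_lon REAL, max_lat REAL
        )
        """
    )
    conn.executemany("INSERT INTO scene_metadata VALUES (?, ?, ?, ?, ?, ?, ?, ?)", scenes)
    conn.commit()


def find_scenes(conn, lon, lat, start, end):
    """Return (scene_id, file_path) for scenes covering a point within a date range."""
    return conn.execute(
        """
        SELECT scene_id, file_path FROM scene_metadata
        WHERE min_lon <= ? AND max_lon >= ?
          AND min_lat <= ? AND max_lat >= ?
          AND acquired BETWEEN ? AND ?
        ORDER BY acquired
        """,
        (lon, lon, lat, lat, start, end),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_catalog(conn, SCENES)
hits = find_scenes(conn, 10.5, 45.5, "2020-01-01", "2020-12-31")
for scene_id, file_path in hits:
    print(scene_id, file_path)
```

Only the paths returned by the query would then be read from the file store, so the large image files are never scanned during a search. A production system would typically use spatial indexes rather than plain bounding-box comparisons.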
  • [PS3-6] Data formats
    Data format refers to the way in which digital data are organized and stored. The data format for a remote sensing mission is usually chosen based on a number of considerations, including the requirements of the sensing system, the mission objectives, the design and technology of the data processing, archiving, and distribution systems, and community data standards. Earth observation data usually come as raster data. Raster data follow a data model that holds digital numbers or values in a regularly spaced matrix of cells arranged in rows and columns covering a two-dimensional space. A digital Earth observation image may contain several such two-dimensional layers, e.g. one layer for each spectral band in the optical or microwave region of the electromagnetic spectrum. The cells in a layer are also called pixels, short for picture elements. The data of an image are stored on a storage medium in one of three layouts: Band-Interleaved-by-Sample (BIS, more commonly called Band-Interleaved-by-Pixel, BIP), Band Sequential (BSQ), or Band-Interleaved-by-Line (BIL). These layouts differ in the order in which the data dimensions (bands, lines, and samples) are written. The term data format can also refer to the file format; file formats commonly used in remote sensing include GeoTIFF, NetCDF, and HDF. Exact details on the data format of an Earth observation data set are usually provided by the originator of the data, e.g. space agencies such as NASA or ESA, or private companies.
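
The difference between the three interleaving layouts can be made concrete with a small sketch. It assumes a tiny, made-up image of 2 bands, 2 lines, and 2 samples held as nested Python lists; the function names are hypothetical, and Band-Interleaved-by-Sample is treated here as the layout more commonly called Band-Interleaved-by-Pixel (BIP). Each function computes the position of one value in the flat sequence written to storage, using zero-based indices.

```python
def bsq_offset(band, row, col, n_bands, n_rows, n_cols):
    """Band Sequential: all of band 0, then all of band 1, and so on."""
    return (band * n_rows + row) * n_cols + col


def bil_offset(band, row, col, n_bands, n_rows, n_cols):
    """Band-Interleaved-by-Line: for each line, that line of every band in turn."""
    return (row * n_bands + band) * n_cols + col


def bis_offset(band, row, col, n_bands, n_rows, n_cols):
    """Band-Interleaved-by-Sample (BIP): for each pixel, its value in every band."""
    return (row * n_cols + col) * n_bands + band


def to_layout(cube, offset_fn):
    """Flatten cube[band][row][col] into a list ordered by the given layout."""
    n_bands, n_rows, n_cols = len(cube), len(cube[0]), len(cube[0][0])
    flat = [None] * (n_bands * n_rows * n_cols)
    for b in range(n_bands):
        for r in range(n_rows):
            for c in range(n_cols):
                flat[offset_fn(b, r, c, n_bands, n_rows, n_cols)] = cube[b][r][c]
    return flat


# Made-up values: the tens digit is the band, the units digit numbers the pixel.
cube = [
    [[11, 12], [13, 14]],  # first band
    [[21, 22], [23, 24]],  # second band
]

bsq = to_layout(cube, bsq_offset)  # [11, 12, 13, 14, 21, 22, 23, 24]
bil = to_layout(cube, bil_offset)  # [11, 12, 21, 22, 13, 14, 23, 24]
bis = to_layout(cube, bis_offset)  # [11, 21, 12, 22, 13, 23, 14, 24]
print(bsq, bil, bis, sep="\n")
```

As a rule of thumb, BSQ suits access to whole bands, the sample-interleaved layout suits access to the full spectrum of individual pixels, and BIL is a compromise between the two.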