[PS3-3] Data storage

EO data consist of unstructured image data and the structured descriptive information attached to each image, known as metadata. EO systems are developing rapidly and sensor resolution keeps improving. As a result, a vast amount of EO data is generated every day, and data volumes are growing geometrically. According to the current literature, EO data storage and management methods fall into four groups, distinguished by their underlying technology:

1. File systems: Traditionally, EO data were managed and organized manually in file systems, and shared and exchanged through storage devices. For large amounts of EO data, however, this method leads to inefficient management, extra storage cost, and weak data security. Plain file systems cannot efficiently support the data retrieval, analysis, and use that practical applications and research work now require. To solve these problems, parallel and distributed file systems (see item 3) were introduced to support data-intensive applications.

2. Relational Database Management Systems (RDBMS): At present, most EO data are stored and managed by combining an RDBMS with middleware. On the one hand, traditional RDBMS functionality is extended to fit the storage and management characteristics of EO data. The two general ways of doing so are adding new data types and encapsulating complex data types as objects in the RDBMS. The former meets the basic requirements of EO data storage and management, but cannot operate directly on spatial data or create spatial indexes. Extending the RDBMS itself is mainly done by Database Management System (DBMS) developers; examples are Oracle Spatial GeoRaster, IBM DB2 Spatial Extender, PostGIS for PostgreSQL, and the MySQL spatial extensions.
On the other hand, geographical software extends its data management capabilities through spatial database engine middleware, an approach usually taken by the software companies that develop geographical information systems (GIS). A Spatial Database Engine (SDE) sits between users and the DBMS. When storing data, the SDE receives user data and writes them into the RDBMS; when retrieving data, it reads them from the RDBMS and presents them through user interfaces. In this solution, EO data are stored in the RDBMS and managed interactively through the user interfaces that the SDE provides. SDE technology is mature and widely used in many application fields. Because SDEs are developed by GIS software companies, they are highly compatible with integrated GIS software platforms.

3. Distributed file systems: Distributed file systems are a more recent technology for solving data-intensive computing problems. Several have emerged, such as PVFS, GPFS, ZFS, GFS, HDFS, and Lustre.

4. Large-scale network storage systems: These are a type of distributed file system with data sharing and remote access functionality. With improving hardware performance and the rapid development of network technologies, Storage Area Networks (SAN) and Network Attached Storage (NAS) have been introduced into distributed file systems. Large-scale network storage systems use different storage and management strategies for EO image files and for their metadata: the image files are stored and managed in HDFS, while their metadata are stored, processed, and managed in RDBMS metadata servers. Managing EO image files and their metadata in different ways improves the efficiency of EO data management and balances the load on the distributed file system.
Systems of this kind that have already been developed include the Celerra, CLARiiON, and Symmetrix storage solutions of EMC, IBM HPSS, MSS, and RASCHAL of the National Aeronautics and Space Administration (NASA), the Microsoft earth image storage system, and the Google Earth image storage system.
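
The split described under item 4, with image files in a file store and their metadata in a relational database, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the design of any system named above: a local directory stands in for HDFS, SQLite stands in for the RDBMS metadata server, and the table layout, the function names (create_catalog, ingest, find_scenes), and the sample scene are all hypothetical. Retrieval uses a plain bounding-box comparison on ordinary numeric columns, which is roughly what remains possible when no spatial data types or spatial indexes are available.

```python
import sqlite3
import tempfile
from pathlib import Path


def create_catalog(conn):
    """Create the metadata table (SQLite stands in for the RDBMS metadata server)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS scenes (
            scene_id TEXT PRIMARY KEY,
            sensor   TEXT NOT NULL,
            acquired TEXT NOT NULL,  -- ISO 8601 date
            min_lon  REAL NOT NULL,
            min_lat  REAL NOT NULL,
            max_lon  REAL NOT NULL,
            max_lat  REAL NOT NULL,
            path     TEXT NOT NULL   -- where the image file lives in the file store
        )
        """
    )


def ingest(conn, store_dir, scene_id, sensor, acquired, bbox, pixels):
    """Write the image bytes to the file store and the description to the catalog."""
    path = Path(store_dir) / f"{scene_id}.bin"
    path.write_bytes(pixels)  # a local directory stands in for HDFS here
    conn.execute(
        "INSERT INTO scenes VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (scene_id, sensor, acquired, *bbox, str(path)),
    )
    conn.commit()


def find_scenes(conn, bbox):
    """Return (scene_id, path) for scenes whose bounding box intersects bbox."""
    min_lon, min_lat, max_lon, max_lat = bbox
    rows = conn.execute(
        """
        SELECT scene_id, path FROM scenes
        WHERE min_lon <= ? AND max_lon >= ?
          AND min_lat <= ? AND max_lat >= ?
        ORDER BY scene_id
        """,
        (max_lon, min_lon, max_lat, min_lat),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_catalog(conn)
    store = tempfile.mkdtemp()
    # Made-up sample scene; bbox is (min_lon, min_lat, max_lon, max_lat).
    ingest(conn, store, "scene_001", "example-sensor", "2020-03-01",
           (10.0, 45.0, 11.0, 46.0), b"\x00\x01")
    for scene_id, path in find_scenes(conn, (10.5, 45.5, 12.0, 47.0)):
        print(scene_id, Path(path).stat().st_size, "bytes")
```

Searches touch only the small metadata table; the large image files are read only after a query has selected them, which is the load-balancing benefit the text attributes to managing the two kinds of data separately.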

External resources

  • Cheng, Y., Zhou, K., Wang, J., Yan, J. (2020): Big Earth Observation Data Integration in Remote Sensing Based on a Distributed Spatial Framework. Remote Sens. 2020, 12, 972; doi: 10.3390/rs12060972.
  • Jensen, J. R. (2005). Introductory digital image processing: a remote sensing perspective (3rd ed.). Upper Saddle River, N.J.: Prentice Hall.
  • Kou, W., Yang, X., Liang, C., Xie, C., Gan, S. (2016): HDFS enabled storage and management of remote sensing data. 2nd IEEE International Conference on Computer and Communications (ICCC), Comp. Comm. 2016, pp. 80-84, doi: 10.1109/CompComm.2016.7924669.
  • Li, G., Huang, Z. (2017): Data infrastructure for remote sensing big data: Integration, management and on-demand service. J. Comput. Res. Dev. 2017, 54, 267–283.
