On Understanding Big Data Impacts in Remotely Sensed Image Classification Using Support Vector Machine Methods

Gabriele Cavallaro, Morris Riedel, Matthias Richerzhagen, Jon Atli Benediktsson, Antonio Plaza
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Issue 99, pp. 1-13, 2015
[ DOI ] [ Juelich ]

Image Classification

Owing to the recent development of sensor resolutions onboard different Earth observation platforms, remote sensing is an important source of information for mapping and monitoring natural and man-made land covers. Of particular importance is the increasing amounts of available hyperspectral data originating from airborne and satellite sensors such as AVIRIS, HyMap, and Hyperion with very high spectral resolution (i.e., high number of spectral channels) containing rich information for a wide range of applications. A relevant example is the separation of different types of land-cover classes using the data in order to understand, e.g., impacts of natural disasters or changing of city buildings over time.

More recently, such increases in the data volume, velocity, and variety of data contributed to the term big data that stand for challenges shared with many other scientific disciplines. On one hand, the amount of available data is increasing in a way that raises the demand for automatic data analysis elements since many of the available data collections are massively underutilized lacking experts for manual investigation. On the other hand, proven statistical methods (e.g., dimensionality reduction) driven by manual approaches have a significant impact in reducing the amount of big data toward smaller smart data contributing to the more recently used terms data value and veracity (i.e., less noise, lower dimensions that capture the most important information).

This paper aims to take stock of which proven statistical data mining methods in remote sensing are used to contribute to smart data analysis processes in the light of possible automation as well as scalable and parallel processing techniques. We focus on parallel support vector machines (SVMs) as one of the best out-of-the-box classification methods.