Deep learning algorithms have largely been developed in academia, so most of the code is open source. Successfully training deep networks, however, requires thousands of labeled training samples, and at present this training data is typically not open. In their presentation at this year's FOSS4GNA (Free and Open Source Software for Geospatial North America) gathering in St. Louis, Jason Brown and Courtney Whalen, data scientists at Astraea, showed how deep learning trained on publicly available labeled data was applied to track deforestation and reforestation in Mato Grosso, a state in the central Amazon region of Brazil.
Chris Holmes of Planet Labs, in his insightful talk at FOSS4GNA in St. Louis about the application of deep learning to geospatial data, identified a challenge in making this technology open. Deep learning algorithms have largely been developed in academia, and as a result most of the code is open source. For example, U-Net, a deep neural network model originally developed for medical image segmentation, is open source and has been applied to identifying building footprints. Successful training of deep networks, however, requires thousands of labeled training samples. Producing labeled data involves people manually ground-truthing land use types and other features so that the deep learning algorithms can learn what to recognize. At present this training data is typically not open. In their presentation, Jason Brown and Courtney Whalen, both data scientists at Astraea, showed how deep learning trained on publicly available labeled data was used to track deforestation and reforestation in Mato Grosso, a state in the central Amazon region of Brazil.
The analysis is computationally intensive, so a distributed computation engine built from open source components was used. Spark is a top-level Apache project that enables distributed processing for global-scale computation. RasterFrames is a free and open source toolkit that lets scientists, data scientists, and software developers process and analyze geospatial-temporal raster data with the same flexibility and ease as any other data type in Spark DataFrames. It is a LocationTech raster project built on GeoTrellis. With this software, each year of imagery required 6 to 7 hours of computation on 48 cores.
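To put the quoted compute cost in perspective, the arithmetic can be sketched as follows. The per-year figures (6 to 7 hours on 48 cores) come from the talk; the total assumes that all 17 imagery years (2001 through 2017) were processed at the same cost, which the talk did not state explicitly. The function name `core_hours` is purely illustrative.

```python
def core_hours(hours_per_year, cores=48, years=1):
    """Wall-clock hours times core count times number of imagery years."""
    return hours_per_year * cores * years


# Figures from the talk: 6 to 7 hours per imagery year on 48 cores.
per_year_low = core_hours(6)
per_year_high = core_hours(7)

# Assumption: every year from 2001 through 2017 costs the same to process.
n_years = len(range(2001, 2018))
total_low = core_hours(6, years=n_years)
total_high = core_hours(7, years=n_years)

print(f"per imagery year: {per_year_low} to {per_year_high} core-hours")
print(f"all {n_years} years: {total_low} to {total_high} core-hours")
```

Under those assumptions the full time series comes to roughly five thousand core-hours, which is why a distributed engine is a natural fit.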
The imagery was captured for the years 2001 through 2017 by MODIS, an instrument that flies on NASA's Terra and Aqua satellites. MODIS measures the light reflected from ground cover in several bands, including red, green, blue, short-wave infrared, and near infrared. The imagery used has a spatial resolution of 500 by 500 meters and a revisit rate of one to two days. From the red and near-infrared bands, the normalized difference vegetation index (NDVI) can be calculated, and from the resulting data, monthly means and yearly aggregates can be derived.
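As a rough illustration of the NDVI step, the index is the standard ratio (NIR - Red) / (NIR + Red), computed per pixel. The sketch below uses plain Python rather than RasterFrames, and the reflectance values are made up for demonstration. Taking the yearly aggregate as the mean of the monthly means is an assumption; the talk did not say which aggregate was used.

```python
from collections import defaultdict
from statistics import mean


def ndvi(nir, red):
    """Standard NDVI: (NIR - Red) / (NIR + Red). Returns None if undefined."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def monthly_means(observations):
    """Group (year, month, nir, red) tuples and average NDVI per month."""
    buckets = defaultdict(list)
    for year, month, nir, red in observations:
        value = ndvi(nir, red)
        if value is not None:  # skip pixels where NDVI is undefined
            buckets[(year, month)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def yearly_mean(monthly, year):
    """Assumed yearly aggregate: the mean of that year's monthly means."""
    values = [v for (y, _month), v in monthly.items() if y == year]
    return mean(values) if values else None


# Made-up reflectances for one pixel location, for illustration only.
samples = [
    (2012, 1, 0.50, 0.10),
    (2012, 1, 0.40, 0.10),
    (2012, 2, 0.30, 0.10),
    (2012, 2, 0.00, 0.00),  # no signal: NDVI undefined, so it is skipped
]

monthly = monthly_means(samples)
print(monthly)
print(yearly_mean(monthly, 2012))
```

Dense, healthy vegetation reflects strongly in the near infrared and absorbs red light, so forested pixels tend toward higher NDVI values than cleared land.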
The training data came from the System for Terrestrial Ecosystem Parameterization (STEP), which has 2000 manually labeled sites scattered across all continents and covering 17 different land cover types, including five forest types. The model was trained on MODIS 2012 data. 80% of the labeled data was used for training; after training was completed, the remaining 20% was used to test the model.
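The 80/20 split can be sketched as below. This is a minimal illustration, assuming a simple shuffled split over site identifiers; the talk did not say whether the split was random, stratified by land cover type, or spatially blocked. The `site_ids` list and the helper `train_test_split` are hypothetical stand-ins, not part of STEP or RasterFrames.

```python
import random


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items reproducibly and cut it at train_fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers standing in for the 2000 labeled STEP sites.
site_ids = list(range(2000))
train_sites, test_sites = train_test_split(site_ids)

print(len(train_sites), len(test_sites))
```

With 2000 sites this yields 1600 for training and 400 held out for testing. In practice a split stratified by land cover class may be preferable, so that rarer classes appear in both sets.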
After training and testing, the model was first applied to Mato Grosso, a large state in central Brazil that has seen extensive deforestation. The rate of deforestation predicted by the model tracked the rate estimated independently by Global Forest Watch for the years 2001 to 2017. The major feature, a slowdown in the rate of deforestation in 2011 that was probably the result of increased enforcement by the state government, is clearly discernible.
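One way a yearly deforestation figure could be derived from per-pixel classifications is to count forest pixels each year and convert the year-over-year difference to area. The sketch below assumes the nominal 500 m by 500 m pixel mentioned earlier (0.25 square kilometers per pixel). The pixel counts are invented for illustration and are not results from the talk, and the helper names are hypothetical.

```python
# Nominal pixel footprint from the text: 500 m x 500 m = 0.25 km^2.
PIXEL_AREA_KM2 = 0.5 * 0.5


def forest_area_km2(forest_pixel_count):
    """Convert a count of forest-classified pixels to square kilometers."""
    return forest_pixel_count * PIXEL_AREA_KM2


def yearly_change_km2(forest_pixels_by_year):
    """Net forest area change per year; negative values mean net loss."""
    years = sorted(forest_pixels_by_year)
    return {
        later: forest_area_km2(
            forest_pixels_by_year[later] - forest_pixels_by_year[earlier]
        )
        for earlier, later in zip(years, years[1:])
    }


# Invented counts for illustration only, not figures from the presentation.
example_counts = {2010: 1_000_000, 2011: 980_000, 2012: 972_000}
changes = yearly_change_km2(example_counts)

for year, delta in changes.items():
    print(year, delta)
```

A series like this, computed for each year from 2001 to 2017, is the kind of curve that could be compared against an independent estimate such as the one from Global Forest Watch.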
The successful application of deep learning in Mato Grosso has encouraged Astraea to aim at applying this approach globally. They also intend to use higher-resolution satellite data and to handle seasonal differences better.
The System for Terrestrial Ecosystem Parameterization (STEP) is a model for deriving vegetation and land surface parameters from remote sensing data for use in remote sensing-based classification of land cover, ecosystems, and vegetation types. The model defines parameters that relate to important ecological and biogeophysical parameters and that can be reliably measured or inferred from remote sensing, collateral, and field plot data. STEP is maintained as a database of training polygons drawn on high spatial resolution imagery that can be extracted with GIS to produce a global land cover classification. STEP is periodically reviewed to filter out inconsistent sites and augmented to fill gaps in biogeographical coverage. The database was originally created to follow the International Geosphere-Biosphere Programme (IGBP) land cover legend but it has since evolved to support any number of additional classifications.