Deep learning is increasingly finding practical applications in geospatial data, particularly satellite imagery. At the GeoIgnite conference in Ottawa, Nicolas Martinez of Statistics Canada described a project scheduled to kick off in July that will use machine learning to identify housing starts across Canada. The goal of the project is to improve the accuracy and coverage of the housing starts survey, specifically to fill data gaps for smaller, remote, and Indigenous communities, which are generally excluded from the current survey program.
Canada Mortgage and Housing Corporation (CMHC) currently employs a significant number of people at its head office, in addition to field agents across the country who physically confirm residential starts and completions. The objective of the project is to apply convolutional neural networks (CNNs) to satellite imagery to extract information about housing starts and completions. The imagery will be preprocessed so that it can be used to develop a machine learning model. The largest challenge will be developing training and testing imagery datasets.
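To make the CNN idea a little more concrete, here is a minimal sketch of the sliding-window operation that a convolutional layer applies to an image. It is illustrative only and is not the project's code: the function name `convolve2d`, the 4x4 tile, and the hand-picked kernel are all hypothetical, and in a real CNN the kernel weights would be learned from labeled imagery rather than written by hand.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the sliding-window step used by CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            # Multiply the kernel with the image patch under it and sum the products.
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Hypothetical single-band tile: dark ground (0) on the left, bright roof (1) on the right.
tile = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked vertical-edge kernel (an assumption for illustration only).
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

response = convolve2d(tile, edge_kernel)
print(response)
```

The response is largest in the column where the dark and bright regions meet. A CNN stacks many such learned filters to pick up on edges, textures, and eventually whole objects.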
An early high-profile application of deep learning to photo imagery was by Geoffrey Hinton at Google, who used it to distinguish dogs and other objects in photos that people uploaded. More recent applications to satellite imagery have targeted land use and agricultural identification. For example, remotely sensed imagery from a satellite can be used to create a model that will differentiate between corn and potatoes, using factors that can be calculated from the imagery such as normalized difference vegetation index (NDVI) (min, max, mean), texture (min, max, mean), vegetation height, and geometric factors such as orientation. I blogged about how open source code and publicly available training data have been applied to track deforestation and reforestation in Mato Grosso, a state in the central Amazon region of Brazil. A study in the Netherlands used a CNN to identify blocked waterways using overflight imagery with a success rate of 97%.
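As a rough illustration of the NDVI factors mentioned above, the sketch below computes NDVI per pixel using the standard formula (NIR - Red) / (NIR + Red) and then summarizes one field as (min, max, mean). The function names and reflectance values are hypothetical, and returning 0 for a pixel with no signal is an assumption made here for simplicity.

```python
from statistics import mean


def ndvi(nir, red):
    """NDVI for one pixel: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat a pixel with no signal as NDVI 0
    return (nir - red) / total


def ndvi_features(nir_band, red_band):
    """Return (min, max, mean) NDVI over the pixels of one field."""
    values = [ndvi(n, r) for n, r in zip(nir_band, red_band)]
    return min(values), max(values), mean(values)


# Hypothetical near-infrared and red reflectances for three pixels of one field.
nir_pixels = [0.50, 0.60, 0.40]
red_pixels = [0.10, 0.20, 0.40]

features = ndvi_features(nir_pixels, red_pixels)
print(features)
```

Summaries like these, computed per field, could then be combined with texture and geometric factors as inputs to a crop classification model.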
An important application of satellite imagery is identifying built structures. I have blogged about how satellite imagery has been used to identify buildings and transportation networks by applying a neural network model together with OpenStreetMap layers and high-resolution WorldView multispectral imagery. NVIDIA has demonstrated the ability to automate detection of many road networks using deep learning algorithms and high-resolution multispectral imagery. Another application in the Netherlands used a CNN and imagery to differentiate residential roofs with dormer windows from those without.
The key to effective application of deep learning is good training and test data: ground-truthed data that involves, for example, someone on the ground identifying whether the fields seen in the imagery are corn or potatoes or something else. There are publicly available training datasets that can be used to train satellite imagery CNN models. For example, I blogged about the System for Terrestrial Ecosystem Parameterization (STEP) dataset, which has 2000 manually labeled sites covering 17 different land cover types scattered across all continents. A large publicly available training dataset derived from Sentinel-2 imagery contains 30,000 polygons of land use training data for ten classes of land use: annual crop, forest, herbaceous vegetation, highway, industrial, pastures, permanent crop, residential, river, and sea or lake.
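One common way to turn ground-truthed samples into training and test sets is a stratified split, which holds out a share of each class so that rare classes still appear in the test data. The sketch below is a minimal illustration under stated assumptions: the function name `stratified_split`, the 20% test fraction, and the sample data are all hypothetical and do not come from any of the datasets mentioned above.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (site_id, label) samples into train and test lists, class by class."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Hold out at least one sample per class for testing.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical ground-truthed fields: 10 labeled corn, 5 labeled potatoes.
labeled = [(i, "corn") for i in range(10)] + [(i, "potatoes") for i in range(10, 15)]

train_set, test_set = stratified_split(labeled, test_fraction=0.2)
print(len(train_set), len(test_set))
```

Note that a class with only one labeled sample would end up entirely in the test set under this sketch, so very rare classes need more ground truth or a different strategy.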
The Statistics Canada project is scheduled to begin in July with initial work on a proof of concept, with CNN testing to begin in August and September. The communities chosen for the initial phase are Kitchener-Cambridge-Waterloo, Red Deer, and Iqaluit. The first phase is scheduled to be completed by winter 2019. The project plans to use TensorFlow and Keras, open source Python libraries for applying machine learning.
Artificial intelligence (AI) has often suffered from inflated expectations, complex code, heavy processing requirements, and a shortage of practical applications. A growing body of practical applications in the geospatial domain now shows that, thanks to the immense processing power available in the cloud, machine learning can generate practical and useful results fairly easily. As Chris Holmes pointed out in his talk at FOSS4GNA, the challenge for Statistics Canada will be to develop reliable and publicly available training and test datasets to enable deep learning models to be created for a broader range of applications using the huge volumes of satellite imagery and other geospatial data that are now available.