At this year’s AGI GeoCommunity '13 Conference in Nottingham, Laura Kinley gave a very germane presentation about her research on using crowdsourced data as a source of authoritative land cover data. Land cover data is used for a variety of purposes, including socio-economic ones such as monitoring how land is used and determining policy, and ecological ones such as ecosystem services, climate change modelling, environmental management, and ecosystem health indicators.
The most common techniques for large-scale land use monitoring are automated, using satellite and aerial imagery. But automated methods can be unreliable for a number of reasons, including misclassification caused by poor or missing ground truth data for training, inconsistent nomenclatures, and out-of-date information because of the cost of regular maintenance and updating.
The questions Laura is attempting to answer are:
- What kind of information do existing crowd contributions contain?
- Do existing crowd data sources contain good quality land cover information (in terms of coverage, temporal accuracy, and attribute accuracy)?
- Is it viable to use crowdsourced input in combination with authoritative Centre for Ecology & Hydrology (CEH) land cover data?
The methodologies she applies as part of her research include semantic analysis of textual data and determination of spatial and attribute agreement between crowdsourced content and official land cover data. She also considers data density and temporal appropriateness.
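To make the agreement idea concrete, here is a minimal sketch of one way attribute agreement could be measured: compare the class assigned to each grid square by each source, over the squares both sources cover. The grid references, class names, and values below are invented for illustration and are not Laura's actual data or method.

```python
# Hypothetical grid square -> land cover class, official-survey style.
official = {
    "SK5739": "urban",
    "SK5740": "grassland",
    "SK5741": "woodland",
    "SK5742": "arable",
}

# Hypothetical crowd-contributed classes for (mostly) the same squares.
crowd = {
    "SK5739": "urban",
    "SK5740": "meadow",      # same concept as grassland, different term
    "SK5741": "woodland",
    "SK5743": "water",       # square absent from the official sample
}

# Attribute agreement: fraction of shared squares with matching labels.
shared = official.keys() & crowd.keys()
matches = sum(1 for sq in shared if official[sq] == crowd[sq])
agreement = matches / len(shared)

print(f"shared squares: {len(shared)}, agreement: {agreement:.0%}")
```

Note how the "meadow" vs "grassland" mismatch drags agreement down even though the contributor was arguably right, which is exactly the standardization problem discussed below.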
The two main crowd data sources are OpenStreetMap and Geograph, both of which collect information on land cover. One of the more interesting results of the semantic analysis is that OpenStreetMap is very consistent in its use of classifications: only 16 land cover classifications are used by OpenStreetMap, as opposed to 369 unique terms used by Geograph. This highlights an important problem: the low level of standardization both in the terms used to classify land cover and in how data is collected and classified.
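The kind of vocabulary comparison behind that 16-vs-369 finding can be sketched very simply: collect each source's terms and count the distinct ones. The tag lists below are invented examples, not the real OSM or Geograph datasets.

```python
# Invented samples of classification terms from each source.
osm_tags = ["forest", "farmland", "residential", "forest", "meadow",
            "farmland", "residential"]
geograph_terms = ["woodland", "wood", "forest", "copse", "plantation",
                  "farmland", "arable field", "pasture", "meadow"]

# Distinct vocabulary per source: a small, repeated set vs a long tail
# of near-synonyms ("wood", "woodland", "copse", ...).
osm_vocab = set(osm_tags)
geograph_vocab = set(geograph_terms)

print(f"OSM uses {len(osm_vocab)} distinct terms; "
      f"Geograph uses {len(geograph_vocab)}")
print(f"terms in common: {sorted(osm_vocab & geograph_vocab)}")
```

The small overlap between the two vocabularies illustrates why comparing the sources, or merging them with official data, needs the term-mapping work described at the end of this post.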
An important advantage of crowdsourced data is that it is generally much more timely than the official CEH data. For example, 56.8% of Geograph data was captured and uploaded since January 2011, and only 1.8% of Geograph grid squares predate the most recent available official CEH survey date. There is the question of whether this level of timeliness can be maintained.
Her general findings to date have been that the lack of standardization is currently a major problem preventing crowdsourced data from achieving and maintaining the level of quality that the Ordnance Survey requires of authoritative data sources.
What Laura sees as needed are methods of encouraging enthusiasts to develop and adopt standards for collecting and classifying land cover data. But she recognizes that there is a tradeoff between restricting projects to achieve the necessary level of quality and encouraging user-led innovation. She also suggests that aligning crowdsourced projects more closely with professional practices would enable professional bodies and businesses to provide feedback.
Future work Laura outlined includes streamlining the process of obtaining and testing crowdsourced data and standardizing the terms used by crowdsourced projects by mapping common user terms onto official definitions.
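The term-mapping idea can be sketched as a simple normalization step: a lookup table from free-text crowd terms to a small set of official-style classes. The mapping and class names below are invented for illustration; a real mapping would be built against the actual CEH nomenclature.

```python
# Hypothetical crowd term -> official-style class lookup table.
TERM_MAP = {
    "wood": "woodland",
    "copse": "woodland",
    "forest": "woodland",
    "meadow": "grassland",
    "pasture": "grassland",
    "arable field": "arable",
}

def to_official(term: str) -> str:
    """Map a crowd term onto an official class, or flag it as unmapped."""
    key = term.strip().lower()
    return TERM_MAP.get(key, "unclassified")

print(to_official("Copse"))        # -> woodland
print(to_official("tidal flats"))  # -> unclassified
```

Terms that fall through to "unclassified" are exactly the cases where professional feedback to contributors, as Laura suggests, would be most useful.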