At the recent Canadian Institute of Geomatics (CIG) Ottawa Branch workshop "Big Data: What is it? Technology and Applications", Monica Wachowicz of the University of New Brunswick pointed out that the Gartner Hype Cycle 2014 contained a brand new entry, Data Science, that was not in the Gartner Hype Cycle 2013. In the Gartner Hype Cycle 2015, both Data Science and Big Data have disappeared, and a new entry with the intriguing name Citizen Data Science has appeared.
2013 Big Data
2014 Big Data and Data Science
2015 Citizen Data Science
What is data science?
Monica referenced research by Harlan Harris et al., “Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work”, which surveyed data scientists to try to determine what data scientists do. The authors divided data scientists into five categories based on their self-ranked skill sets: Statistics (which includes spatial statistics), Math/Operations Research, Business, Programming, and Machine Learning/Big Data. They also divided them into four categories based on the respondents' self-identification: Data Researchers, Data Business people, Data Developers, and Data Creatives.
- Data Business people are most focused on the organization and how data projects yield profit.
- Data Creatives excel at applying a wide range of tools and technologies to a problem, or creating innovative prototypes at hackathons.
- Data Developers are focused on the technical problem of managing data — how to get it, store it, and learn from it.
- Data Researchers' backgrounds are in statistics or mathematics; 75% of them have published in peer-reviewed journals, and over half have a PhD.
- A defining feature of data scientists is the breadth of their skills, and their ability to single-handedly do at least prototype-level versions of all the steps needed to derive new insights or build data products.
- The evidence points in favor of a scientific versus a tools-based education for data scientists.
- 70% of the respondents had at least a Master’s degree.
- 40% had undergraduate degrees in scientific fields, specifically, physical or social sciences (but not mathematics, computer science, statistics, or engineering).
The survey showed that the most successful data scientists are those with substantial, deep expertise in at least one aspect of data science: statistics, big data, or business communication. The authors concluded that data science is a collaborative and creative field, where the successful professional is able to work with database administrators, business people, and others to get data projects completed in innovative ways.
What is citizen data science?
One of the biggest drivers for a new type of data specialist is that there are simply not enough data scientists to satisfy the demand from virtually all sectors of the economy. Gartner has recommended cultivating “citizen data scientists”, which it identifies as people on the business side who may have some undergraduate mathematics or social science background and who can be assigned to exploring and analyzing data with the appropriate software tools. Last year Gartner predicted that the demand for citizen data scientists will increase five times more rapidly than the demand for the highly skilled data scientists Harlan Harris studied.
Software tools are critical for helping citizen data scientists find real insights and avoid simple statistical mistakes. For example, one technology that Gartner identified is "smart data discovery", a next-generation data discovery capability that provides business users or citizen data scientists with insights from advanced analytics and helps them avoid some of the common statistical pitfalls.
Citizen geodata science
There is at least anecdotal evidence that a trend toward "citizen geodata science" has been under way for some time. At the India Geospatial Forum in Hyderabad in 2014, I moderated a session on electric power. One of the speakers was Arup Ghosh, Chief Technology Officer at Tata Power Delhi Distribution Ltd (TPDDL), who presented his perspective on implementing geospatial technology in a private utility. The geospatial group at TPDDL had about 60 field personnel and 18 analysts and support staff, none of whom had an educational background in geospatial data and technology. Twelve were electrical engineers and the rest were people with electric power experience. All learned geospatial data management and simple analytics "on the fly".
At a recent GoGeomatics Social, Jonathan Murphy gave a presentation about his experience working in Northern Alberta preparing terrain for seismic surveys. Geospatial data and technology were used in all aspects of field operations. The staff were experienced in seismic surveying, winter drilling programs, wildfire management, and road and facility construction, but had minimal education in geospatial data management and analytics. They too had picked up enough geospatial knowledge "on the fly" to do their jobs.
These are two examples of what could be called "citizen geodata science", which I suspect is part of Gartner's citizen data science trend and is almost certainly growing more rapidly than traditional "geospatial data science" as geospatial technology is adopted into vertical industries. The challenge is how to reach these people and help them avoid the common pitfalls of geodata management and geoanalytics. Possible channels include MOOCs, community colleges, conferences that focus on vertical industries and include geospatial technology vendors, presentations and hands-on training, and vendor marketing.