A very exciting project has been proposed at LocationTech (it's in the Project Proposal Phase as defined in the Eclipse Development Process). Simply put, GeoWave intends to do for "big data" databases (initially Apache Accumulo) what PostGIS does for SQL databases (PostgreSQL). GeoWave is open source software (licensed under Apache 2.0) that adds support for geographic objects, multi-dimensional indexing and geospatial operators to Apache Accumulo.
To deal with data volumes too large for traditional SQL databases, Google began developing "BigTable" in 2004. BigTable is a compressed, high-performance, proprietary data storage system built on the Google File System and used by a number of Google applications, including Google Maps. Apache Accumulo is a distributed database based on Google's BigTable design and built on top of Apache Hadoop and other Apache projects. For putting geospatial data into a key/value store like Accumulo, the key concept is the "geospatial hash", which converts a 2D, 3D or 4D coordinate (lon and lat; lon, lat and elevation; or lon, lat, elevation and time) to an integer index, such as one derived from a quadtree subdivision, that can be used to order and rapidly retrieve spatial data. With GeoWave you can manage massive amounts of geoinformation in key/value databases such as Accumulo and take advantage of frameworks such as MapReduce, which can be used with Accumulo for distributed processing.
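To make the "geospatial hash" idea concrete, here is a minimal sketch in Python of a quadtree-style (Z-order) key for a 2D coordinate. It is only an illustration of the general technique and is not taken from GeoWave's code; the function name `quadtree_index` and the `levels` parameter are hypothetical.

```python
def quadtree_index(lon: float, lat: float, levels: int = 16) -> int:
    """Map a lon/lat pair to one integer by interleaving quantized bits.

    Hypothetical helper for illustration only, not GeoWave's API.
    Each pair of bits in the result selects one quadrant at one
    level of a quadtree subdivision of the lon/lat plane.
    """
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinate out of range")

    cells = 1 << levels  # number of cells along each axis
    # Quantize each axis to an integer cell number in [0, cells - 1].
    x = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    y = min(int((lat + 90.0) / 180.0 * cells), cells - 1)

    key = 0
    # Interleave bits from most to least significant: x bit, then y bit.
    for bit in range(levels - 1, -1, -1):
        key = (key << 1) | ((x >> bit) & 1)
        key = (key << 1) | ((y >> bit) & 1)
    return key


if __name__ == "__main__":
    # Two nearby points and one distant point, keyed at the same depth.
    for name, lon, lat in [("A", -77.04, 38.90),
                           ("B", -77.03, 38.91),
                           ("C", 151.21, -33.87)]:
        print(name, format(quadtree_index(lon, lat, levels=12), "024b"))
```

Because a key at one level is a prefix of the key at the next finer level, all points inside the same quadtree cell share a key prefix. In a sorted key/value store, a cell can then be fetched with a single range scan over that prefix, which is the property that makes this kind of index useful for ordering and retrieving spatial data.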
Connecting Accumulo to GeoServer, GeoTools, and PDAL
In addition, GeoWave includes a GeoServer plugin that enables geospatial data in Accumulo to be shared and visualized via GeoServer's OGC standard web services. It also provides plugins to connect the popular geospatial toolset GeoTools and the point cloud library PDAL to an Accumulo-based data store. The PDAL plugin makes it possible to interact with point cloud data in Accumulo through the PDAL library.
The GeoWave project's work plan is to extend the same geospatial capabilities to other distributed key-value stores in addition to Accumulo. The next data store will be HBase. The project also plans to support other geospatial frameworks in addition to GeoTools/GeoServer, and Mapnik is the next framework targeted for GeoWave support. The GeoWave team says it is very interested in GeoGig, and support for this geospatial data versioning library is currently on its backlog. GeoGig takes the concepts used in distributed version control systems such as Git and applies them to versioned spatial data.
GeoWave was developed at the National Geospatial-Intelligence Agency (NGA) in collaboration with RadiantBlue Technologies and Booz Allen Hamilton. The NGA released GeoWave under an open source license in June 2014. The primary goal of GeoWave is to bridge the gap between well-known geospatial projects such as GeoTools and distributed databases.
I blogged previously about GeoMesa, the first LocationTech project aimed at providing a foundation for storing, querying, and transforming spatio-temporal data in Accumulo. It implements interfaces that enable GeoServer and other GeoTools-based projects to use Accumulo as a data store.