One of the hot topics, if not the hottest, at this year's Distributech in San Antonio was big data analytics. David Lawrence of Duke Energy and David Mulder of Leidos gave an insightful presentation on the challenges Duke has encountered in cleaning, merging, and managing operational data, combining it with other types of data such as social media, and developing analytical tools to extract meaningful information.
By way of background, Duke is the largest electric power utility in the U.S., with 7.2 million distribution customers, 32,200 miles of transmission lines, and 4.9 GW of generation capacity.
David Lawrence's perspective on where utilities are with respect to smart grid is that many are moving beyond their initial AMI deployments into advanced automation. Improved operational performance, reduced costs, and better cybersecurity are driving the convergence of IT (information technology) and OT (operations technology). In addition, utilities are experiencing rapid adoption by their customers of distributed energy resources (DERs) such as wind and solar PV, a trend that solid-state transformers are accelerating. Dynamic load management is becoming more prevalent and will effectively become another DER. The convergence of IT and OT is making cybersecurity even more critical. Microgrids are on the radar but require faster response times and better integration with a variety of equipment. Utilities expect that “big data” analytics offers significant benefits, but most don’t yet have a solid strategy for implementing it.
A major challenge is that many operational systems, including outage management (OMS), customer information systems (CIS), billing systems, GIS, and others, are packaged as single-purpose “silos” and are not easily or cheaply made interoperable. Duke has hundreds of these siloed systems.
To begin the journey toward big data analytics and its expected benefits, Duke Energy has undertaken a series of data mining activities: mapping, monitoring, collecting, validating, storing, and visualizing the operational data from its smart grid test area, and discovering new relationships within it. Because Duke has data stored in hundreds of databases and systems, as a first step it has implemented a dedicated data warehouse called “the Sandbox” to mine data across operational silos for interesting insights and to discover new analytical tools.
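The core idea behind a sandbox warehouse is simply co-locating records from separate operational stores so they can be queried together. Here is a minimal sketch of that pattern using SQLite as a stand-in warehouse; the table names, fields, and records are hypothetical, not Duke's actual schema.

```python
# Sketch of the "sandbox" pattern: copy data out of siloed systems into one
# queryable store so cross-silo questions become ordinary SQL.
import sqlite3

# Stand-ins for two siloed sources: AMI interval reads and outage events.
ami_reads = [("M1001", "2012-06-01T00:00", 1.4),
             ("M1001", "2012-06-01T00:15", 1.6)]
outages = [("F42", "2012-06-01T00:10", "recloser lockout")]

conn = sqlite3.connect(":memory:")  # the "sandbox" warehouse
conn.execute("CREATE TABLE ami (meter TEXT, ts TEXT, kwh REAL)")
conn.execute("CREATE TABLE outage (feeder TEXT, ts TEXT, cause TEXT)")
conn.executemany("INSERT INTO ami VALUES (?, ?, ?)", ami_reads)
conn.executemany("INSERT INTO outage VALUES (?, ?, ?)", outages)

# Once co-located, each silo's data is reachable from a single connection.
ami_count = conn.execute("SELECT COUNT(*) FROM ami").fetchone()[0]
print(ami_count)
```

An in-memory database is used only to keep the sketch self-contained; a real sandbox would be a persistent, shared store.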
The data sources that Duke has integrated in the warehouse include operational sensors and systems (AMI, transformers, line sensors, capacitor banks and reclosers, outage management systems, billing systems, and smart grid communication nodes) as well as external sources such as weather stations, socioeconomic databases, and social network feeds. For their Sandbox warehouse they included about three months of data, from June to August 2012, comprising about 30 GB.
To help find and develop the analytical tools to mine the data it had assembled, Duke invited the vendor community to participate in a collaborative, non-competitive data mining exercise. The objective was to mine this data to develop new use cases. To prime the analytics pump, Duke provided six use cases. Participants were asked to provide new use case ideas, insights into what other types of data should be collected, data quality and interoperability issues, and data models appropriate for different types of analysis. Key lessons from the exercise were that:
- no single vendor had all the answers,
- interoperability issues between siloed operational systems limit the value of the data,
- custom data models are needed for different types of analysis (descriptive statistics, time series analysis, logistic regression and so on) of structured data,
- unstructured data such as social network feeds from outside Duke can add significant value.
The collaborative data mining exercise was very fruitful, generating over 150 new use cases. Examples include usage outlier analysis on meter interval data, customer peak analysis on meter interval data, outage analysis using outage data, and predicting customer energy usage by correlating historical weather with customer energy usage data (predictive analytics).
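To make one of these use cases concrete, here is a hedged sketch of usage outlier analysis on meter interval data, using a simple z-score rule (flag readings more than three standard deviations from the meter's own mean). The interval reads are invented for illustration; production analytics would be more sophisticated than this.

```python
# Usage outlier analysis sketch: flag interval reads whose z-score
# exceeds a threshold, relative to that meter's own history.
from statistics import mean, stdev

def usage_outliers(readings, threshold=3.0):
    """Return (index, value) pairs whose |z-score| exceeds the threshold."""
    mu, sigma = mean(readings), stdev(readings)
    if sigma == 0:
        return []
    return [(i, v) for i, v in enumerate(readings)
            if abs(v - mu) / sigma > threshold]

# Hypothetical 15-minute kWh reads for one meter, with one anomalous spike.
reads = [1.2, 1.3, 1.1, 1.4, 1.2, 1.3, 1.2, 40.0, 1.3, 1.1, 1.2, 1.4]
print(usage_outliers(reads))  # flags the spike at index 7
```

A per-meter baseline like this is the simplest possible model; seasonality and weather correlation (as in the predictive analytics use case) would refine it.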
Some of the most interesting insights into data requirements suggested by vendors include:
- Need for more historical data
- Linking data from different datasets across siloed systems was a major challenge and severely limited the analysis
- Semantics - part of the linkage problem is mapping different terminology for the same data across application silos (this is similar to what the CB-NL concept library initiative is attempting to address for the construction lifecycle)
- GPS data (geolocation) for all devices would add significant value
- Skill sets and staffing: power system and data science experts are needed to develop successful big data analytics.
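The semantics and linkage lessons above can be illustrated with a small sketch: the same device carries different field names in different silos, so joining their records requires an explicit vocabulary map. All field names and records here are hypothetical, not Duke's actual data.

```python
# Sketch of cross-silo linkage: map each silo's terminology onto a shared
# vocabulary, then join records on the canonical key.

# GIS calls the device "xfmr_id"; the billing system calls it "transformer_no".
gis_rows = [{"xfmr_id": "T-77", "lat": 35.22, "lon": -80.84}]
billing_rows = [{"transformer_no": "T-77", "customers": 12}]

# Canonical vocabulary: silo-specific field name -> shared name.
VOCAB = {"xfmr_id": "transformer", "transformer_no": "transformer"}

def canonicalize(row):
    """Rename a silo's fields into the shared vocabulary."""
    return {VOCAB.get(k, k): v for k, v in row.items()}

def join_on(key, *tables):
    """Merge canonicalized rows from several silos on a shared key."""
    merged = {}
    for table in tables:
        for row in map(canonicalize, table):
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())

print(join_on("transformer", gis_rows, billing_rows))
```

Maintaining the vocabulary map is the hard, ongoing part of the problem; this is the gap that shared concept libraries such as the CB-NL effort mentioned above aim to fill.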
The next steps are to consolidate the vendors' data issues, comments, and modeling methodologies into requirements for Sandbox 2.0. The long-term goal of the exercise is to define Duke Energy's big data architecture.