Big Data in the Utility Industry

Author: Thomas La Piana

Smart meters have become mandatory for utility companies, and with them comes a huge influx of data to manage. Big Data tools are the solution utility companies have been waiting for. The data gathered is an asset to any company that wants to increase efficiency and reduce its margin of error. In fact, some of the biggest problems utility companies face are perfect use cases for Big Data.

The most common use case is monitoring outages. With so much equipment in the field, knowing its status in real time is critical, but what happens when you have thousands or even millions of equipment sensors feeding you data? That much data would completely overwhelm a traditional data and analytics pipeline long before it could produce meaningful information in real time. The processing speed of a big data environment allows companies to react almost immediately and determine the scope of an outage more quickly.
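To make the idea of "determining the scope of an outage" concrete, here is a minimal sketch in plain Python. It assumes each meter reports a power status and belongs to a feeder line; the names `MeterReading`, `feeder_id`, `outage_scope` and the 50% threshold are hypothetical and chosen only for illustration. A production system would run this kind of aggregation on a distributed engine rather than in a single process.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MeterReading:
    meter_id: str
    feeder_id: str      # hypothetical grouping of meters by feeder line
    timestamp: float    # seconds since epoch
    has_power: bool


def outage_scope(readings, threshold=0.5):
    """Return {feeder_id: share of meters without power} for affected feeders.

    A feeder counts as affected when the share of its meters whose most
    recent reading shows no power is strictly greater than `threshold`.
    """
    # Keep only the most recent reading per meter.
    latest = {}
    for reading in readings:
        previous = latest.get(reading.meter_id)
        if previous is None or reading.timestamp >= previous.timestamp:
            latest[reading.meter_id] = reading

    # Count all meters and meters without power for each feeder.
    total = defaultdict(int)
    dark = defaultdict(int)
    for reading in latest.values():
        total[reading.feeder_id] += 1
        if not reading.has_power:
            dark[reading.feeder_id] += 1

    return {
        feeder: dark[feeder] / total[feeder]
        for feeder in total
        if dark[feeder] / total[feeder] > threshold
    }
```

Grouping by feeder is only one possible way to express scope; the same pattern would work with substations, postal codes, or any other grouping key the utility already tracks.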

Another common use case for utility companies is predictive maintenance. Ingesting and cleaning the data is the first step, but what about predicting maintenance needs or outages before they happen? At that point a Big Data solution is the practical choice. A prediction model can detect when a machine is showing tell-tale signs of an impending malfunction so the problem can be addressed before the machine fails. Helping companies keep the lights on and avoid emergency fixes lowers overall maintenance costs and reduces the overtime caused by unforeseen maintenance crises.
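As a rough illustration of spotting "tell-tale signs", the sketch below flags sensor readings that deviate sharply from the recent past. It is a simple rolling z-score heuristic, not a trained prediction model, and the function name `flag_anomalies`, the window size and the z-score limit are all assumptions made for this example.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(values, window=5, z_limit=3.0):
    """Return indices of readings that deviate strongly from the recent window.

    Each reading is compared with the mean and sample standard deviation of
    the `window` readings before it. Readings seen before the window is full
    are never flagged.
    """
    recent = deque(maxlen=window)
    flagged = []
    for index, value in enumerate(values):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Skip perfectly flat windows to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > z_limit:
                flagged.append(index)
        recent.append(value)
    return flagged
```

For example, a transformer temperature series such as `[50.0, 50.5, 49.5, 50.2, 49.8, 50.1, 75.0, 50.0]` would have the reading at index 6 flagged. A real predictive maintenance model would typically be trained on historical failure data and use many more signals than a single sensor.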

The number one big data solution for these types of problems is Apache Spark. Spark is often described as a successor to Hadoop MapReduce, built to address that engine's weaknesses, notably its reliance on writing intermediate results to disk. It is an in-memory data processing and analytics engine, rather than a database, and it is extremely flexible, fast and powerful. It offers APIs for Scala, Java, Python and R, so it will work with whichever major analytics language you want to use. Additionally, it is open source, so there are no expensive license fees and less risk of a sudden loss of vendor support.

A Spark cluster can ingest data about as fast as you can feed it and, given adequate compute and memory, can run complex analyses as that data is cleaned and then stored in a traditional database. This makes it applicable to many use cases. Notebook-style data visualization tools that run on top of Spark also allow for rapid prototyping of dashboards as well as instant data exploration.

As ATCG has seen, deploying real-time analytics and a predictive maintenance model leads to dramatically decreased equipment downtime, fewer general equipment failures, and increased ROI through accurate forecasting of the necessary allocation of resources.

If you need a basic understanding of big data, click the links below:

Big Data 101

Big Data Roadmap Case Study

For more advanced readers

Please click below to get our latest user guide on how to set up an Apache Spark environment for utility companies: