The last decade has shown that Big Data is indispensable across industries. From the auto industry to farming, data is being generated at an accelerating rate and tucked away in massive Data Lakes.
The automobile revolution, including continuous auto-health monitoring, one-push driver assistance, and fully connected cars, is pushing the frontier of data collection. New cars produced around 2020 will on average generate over four terabytes of data every day. Dealing with this massive amount of data will challenge and completely revolutionize the Big Data technology stack.
This growth in Big Data is not restricted to consumer technologies; agriculture is going through a revolution of its own in today’s era of the Internet of Things (IoT). Sensors and actuators across already industrialized farmland are yielding thousands, if not millions, of data points every second. The fight is on to revolutionize the farmer's experience as more companies pour billions of dollars into ‘Big Ag Big Data’ research. From auto-piloted drones that deliver pesticides to weather-proofed buried sensors that give second-by-second updates on the microclimate where plants grow, the Big Data IoT revolution has been embraced by even the oldest of industries.
With all this available data, expectations for reduced costs and bigger returns are high. However, many companies are realizing that the grass on the other “Big Data side” is not as green as they thought. While Big Data has provided boosts to the bottom line, the next generation of Big Data will require substantial investment into big insights, not just Big Data.
This article discusses five important concepts that are sweeping the Big Data world. Embracing them will accelerate insights and innovation in almost all industries.
Cloud transformation started with the Infrastructure-as-a-Service (IaaS) revolution. IaaS gives IT Operations the ability to rent servers seamlessly, without needing to physically maintain infrastructure within the organization. As industry catches up to this, Platform-as-a-Service (PaaS) is becoming the new reality. PaaS further eases the burden on IT teams by relieving them of mundane tasks like OS maintenance, allowing them to focus on higher-value goals such as cyber defense and faster compliance, all at a lower cost.
The true revolution in cloud transformation is the nascent field of Serverless Computing. Serverless Computing can largely remove the need to maintain servers, operating systems and even applications. Customers now just need to write their code and hand it to the cloud service provider, which takes care of the rest. Actions are packaged into cleanly defined web functions, which are hosted in the cloud and scaled on demand. Offering functionality via a web Application Programming Interface (API) not only allows internal applications to scale with the business, but also enables companies to sell selected services externally. Such APIs are opening up opportunities for organizations to monetize their data lakes by extracting insights from the data. An “insights-first” strategy is enabling organizations to head into a future of Insights-as-a-Service or, at the very least, Data-as-a-Service.
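To make the idea of a "cleanly defined web function" concrete, here is a minimal sketch in Python. It assumes a generic function-hosting platform that passes the HTTP request as a dictionary and expects a dictionary with a status code and a body in return. The names `handler`, `event`, `body` and `statusCode` are illustrative assumptions, not any particular provider's API.

```python
import json
from statistics import mean


def handler(event, context=None):
    """Hypothetical stateless web function: turn raw readings into an insight.

    `event` is assumed to be a dict whose "body" key holds a JSON string
    such as '{"readings": [1, 2, 3]}'. The function keeps no state between
    calls, which is what lets a platform scale it on demand.
    """
    try:
        payload = json.loads(event.get("body") or "{}")
        readings = [float(x) for x in payload["readings"]]
        if not readings:
            raise ValueError("no readings supplied")
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed JSON, a missing key, or non-numeric values all end up here.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}

    # Return the condensed insight rather than echoing the raw data.
    insight = {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
    }
    return {"statusCode": 200, "body": json.dumps(insight)}
```

Because the function returns a summary rather than the raw readings, the same endpoint could serve an internal dashboard or be exposed to paying external customers.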
Today, the data lake philosophy promotes a culture of storing data away without asking the important question of what the data is going to be used for. With data flowing into organizations at lightning speed, it’s becoming important to focus on extracting meaning from data early in the data creation/acquisition process. More and more organizations have to deal with garbage or irrelevant raw data occupying mind-boggling volumes in the lake. The problem is becoming significant enough to give rise to real-time analytics. Rather than just storing away data, real-time analytics focuses on storing ‘insights.’ This is not easy, however; it requires sophisticated analytics and sophisticated resources.
Organizations are investing in Data Science teams to condense large volumes of data through algorithms and store away only meaningful information. Technologies like Apache Storm, Apache Spark, and Azure Stream Analytics are starting to lay the foundation for real-time use of data science in decision making. Traditional financial-services fraud models ran on older platforms and proprietary solutions and took over six months to refresh. By contrast, the newer real-time platforms are broadly applicable, can transact at millisecond speeds, and offer almost instantaneous model refresh. Data Science can now react to changing market needs at breakneck speed.
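As a rough illustration of storing insights instead of raw data, the sketch below condenses a stream of readings into a fixed-size summary using Welford's online algorithm for the running mean and variance. It is plain Python, not the API of Storm, Spark, or any streaming service, and the names `RunningSummary` and `condense` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningSummary:
    """Constant-size summary of a stream (Welford's online algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0              # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, value: float) -> None:
        """Fold one new reading into the summary, then discard the reading."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def variance(self) -> float:
        """Sample variance; defined as 0.0 until two readings have arrived."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def condense(stream) -> RunningSummary:
    """Consume an iterable of readings and keep only the summary."""
    summary = RunningSummary()
    for value in stream:
        summary.update(float(value))
    return summary
```

The summary occupies the same few fields whether the stream holds eight readings or eight billion, which is the storage trade-off real-time analytics is after.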
Machine Learning (ML) is very closely related to analytics. In fact, many of the techniques used in traditional statistics and analytics form the foundation for modern Machine Learning and Artificial Intelligence solutions. The key difference, though, is that many Machine Learning techniques, supervised learning in particular, rely on labeled data (often labeled by humans) to drive a continuous learning loop with their algorithms. Machine Learning algorithms can use complex correlations between various data streams to learn how to predict important decision points for the organization.
While ML represents a great opportunity for almost all industries, enabling production-ready ML systems requires platforms that are capable of state-of-the-art data ingestion, data transformation and model hosting. Most organizations stop short when it comes to implementing such production-ready ML IT solutions. The key is to embrace change management (within IT and Data Science teams) to develop a new DevOps model that works seamlessly for engineers and scientists alike, while driving the entire company towards an analytics-driven organizational model.
The goal of moving towards an analytics-driven organization is to allow faster innovation and quick experimentation with new ideas. This is especially true when it comes to customer-facing departments, solutions and products. Fail-fast experimentation, especially social-media-driven A/B testing, is rapidly changing the way companies reach out to and experiment with their customers.
From web technology companies to traditional retailers, organizations are moving to innovative fail-fast experimentation and pushing the frontiers of traditional marketing. Setting up and running these A/B experiments is hard, and it demands great teamwork between various parts of the organization before a viable outcome materializes. To ensure there is an organized system for mining Big Data and extracting insightful outcomes, it is important that organizations invest in fail-fast experimentation platforms and put a good process in place to triage new ideas and innovations in a structured manner.
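One common way to decide whether an A/B experiment has produced a viable outcome is a two-proportion z-test on conversion counts. The sketch below is one such approach under the usual large-sample assumptions, not a description of any specific experimentation platform; the function name and parameters are illustrative.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare conversion rates of variant A (control) and variant B.

    Returns (z, p_value) for a two-sided test using the pooled standard
    error. Assumes independent samples large enough for the normal
    approximation to hold.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Both variants converted everyone or no one: no detectable difference.
        return 0.0, 1.0
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A team might agree in advance to ship variant B only if `p_value` falls below a chosen threshold such as 0.05 and `z` is positive; fixing that rule before the experiment starts is part of the structured triage process described above.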
Testing-in-Production (TiP), as the name implies, shortens the typical DevOps cycle so that developers can push solutions out to customers with little or no delay. The cycle is usually long because of the traditional Dev, Test and Release stages.
TiP approaches this problem in an innovative way. To start, it’s common to define a metric of customer dissatisfaction that can be measured quickly and effectively through one of the customer touchpoint channels. With this metric defined, developers (and experimenters) are given near-production release authority. The platform is designed so that a new code release serves a new experience to a randomly selected group of customers, while a control population is held back for comparison. Customer dissatisfaction is measured continuously on both groups, and even the slightest increase in dissatisfaction automatically triggers the platform to revert to the old experience, thereby rejecting the new code release. This enables the organization to innovate rapidly with little or no friction in the ideation process. TiP works seamlessly with Agile DevOps, and some of the pioneering Internet companies are adopting this methodology.
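The rollback loop described above can be sketched roughly as follows. Everything here is an assumption made for illustration: customers are assigned to groups by hashing their ID (a deterministic stand-in for random selection), dissatisfaction is a numeric score where higher is worse, and `min_samples` is an added safeguard so the check does not fire on a handful of observations.

```python
import hashlib
from statistics import mean


def assign_group(customer_id: str, treatment_share: float = 0.1) -> str:
    """Place a customer in 'treatment' or 'control' (hypothetical scheme).

    Hashing the ID gives a stable, pseudo-random bucket in [0, 10000),
    so the same customer always sees the same experience.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "treatment" if bucket < treatment_share * 10_000 else "control"


def should_revert(control_scores, treatment_scores,
                  tolerance: float = 0.0, min_samples: int = 30) -> bool:
    """Return True if the new release should be rolled back.

    With tolerance=0.0, any increase in mean dissatisfaction for the
    treatment group over the control group triggers a revert.
    """
    if len(control_scores) < min_samples or len(treatment_scores) < min_samples:
        return False  # not enough evidence yet; keep collecting
    return mean(treatment_scores) - mean(control_scores) > tolerance
```

A platform would call `should_revert` on a rolling window of scores and, when it returns `True`, route the treatment group back to the old experience. In practice a small non-zero `tolerance` or a statistical test may be preferable, since a zero tolerance will also react to ordinary noise.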