I’ve spent my career looking at how large quantities of complex information affect every part of our lives, and this is the most exciting time to be doing that. Information affects finances. Information affects your health. It affects the life choices presented to you. The importance of accumulating enormous amounts of detailed data about all of us and every aspect of business cannot be overstated. Ten years ago, who imagined that so much of the planet would be photographed and those photographs made widely available, as in Google Earth? Or who would have imagined that we would be so willing to share large amounts of personal information through public and quasi-public outlets like Twitter, Facebook and Foursquare?
William McKnight, Consultant, McKnight Consulting Group
William will be presenting the following sessions: Master Data Management; Big Data Technology and Use Cases; and Strategies for Consolidating Enterprise Data Warehouses and Data Marts into a Single Platform, at our Enterprise Data and BI Conference Europe 2016, 7-10 November, London.
How much data are we talking about? The world generated about 4 zettabytes of digital data in 2013. IDC forecasts that we will generate 40 zettabytes (ZB) by 2020. Not all of that data is used, but increasingly, more of it is. And it’s not just stored once. Data with value is branched off into numerous databases across multiple companies. In just the last few years, as much data has been generated as had ever existed before.
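To put those two figures in perspective, the short sketch below works out the constant yearly growth rate they imply, which comes to roughly 39% a year. It assumes smooth compound growth between 2013 and 2020, which is a simplification, and the function name is purely illustrative.

```python
def implied_annual_growth(start_value: float, end_value: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text: about 4 ZB in 2013, a forecast 40 ZB by 2020.
# Assumption: growth compounds evenly over the seven years in between.
rate = implied_annual_growth(4.0, 40.0, 2020 - 2013)
print(f"Implied growth: {rate:.1%} per year")
```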
The Rise of Machine Data
It is machines that are primarily responsible. Machine data contains critical insights. It allows us to do unprecedented triangulation of physical objects, including all of us.
Unlike traditional structured data – for example, data stored in a traditional relational database for batch reporting – machine data is non-standard, highly diverse, dynamic and high volume.
For example, each of the underlying customer touch systems in a product purchase can generate millions of machine data events daily. When we look more closely at the data, we see that it contains valuable information: customer, order, time waiting on hold, Twitter ID and even what was tweeted.
If you can correlate and visualize related events across these disparate sources, you can build a picture of activity, behavior and experience. And if you can do all of this in real-time, you can respond more quickly to events that matter.
You can extrapolate this example to a wide range of use cases – security and fraud, transaction monitoring and analysis, web analytics, IT operations and so on.
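As a rough illustration of correlating events across disparate sources, the sketch below groups a handful of made-up events by customer and orders them by time to form one activity timeline per customer. The event records, field names (`source`, `customer`, `time`, `detail`) and the `correlate_by_customer` helper are all assumptions for the example, not the output of any particular product.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical machine data events from three customer touch systems.
# The field names and values are assumptions made for this illustration.
events = [
    {"source": "twitter", "customer": "c42", "time": "2016-03-01T10:31:00", "detail": "tweeted a complaint"},
    {"source": "order_system", "customer": "c77", "time": "2016-03-01T10:05:00", "detail": "order 1002 placed"},
    {"source": "call_center", "customer": "c42", "time": "2016-03-01T10:20:00", "detail": "on hold 540 seconds"},
    {"source": "order_system", "customer": "c42", "time": "2016-03-01T10:00:00", "detail": "order 1001 placed"},
]


def correlate_by_customer(events):
    """Group events from all sources by customer and sort each group by time."""
    timelines = defaultdict(list)
    for event in events:
        timelines[event["customer"]].append(event)
    for timeline in timelines.values():
        timeline.sort(key=lambda e: datetime.fromisoformat(e["time"]))
    return dict(timelines)


timelines = correlate_by_customer(events)
for event in timelines["c42"]:
    print(event["time"], event["source"], event["detail"])
```

Read top to bottom, the timeline for customer `c42` shows an order, then a long hold, then a tweet, which is the kind of picture of activity and experience described above.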
The Future Sources of Digital Data
We’re on the way to 50 billion connected devices. Processors are embedded in things everywhere. Most physical objects are on the way to being online. Internet Protocol version 6 (IPv6), with its 128-bit addresses, allows for about 340 undecillion (3.4 × 10^38) unique addresses. Relatively speaking, the internet in a few years will swell from the size of a baseball to the size of the sun. Here are just a few of the advances to come, all of which will create digital data:
- The Pill Cam, which you swallow so it can take photos of your insides – one of many under-the-skin devices.
- The Bodytel, the Glucotel and the IHealth Oximeter for no-sting blood measurements.
- Trash cans that sense their fullness so they can do compaction and call for emptying over Wi-Fi.
- Vending machines that take pictures of you and show images of what people like you buy – and what the vending machine would prefer you buy now.
- The Moticam, which captures everything a microscope looks at.
- The Bodyguardian, which allows physicians to monitor biometric data around the clock.
- Rail sensors that help transportation companies prevent derailments caused by failing wheels and bearings. At Union Pacific, 20 million sensor readings spot 1,500 issues per day with wheels and bearings.
Companies that can harness this data can benefit accordingly. The industries growing fastest are those that are adopting technology and using data to understand themselves, and within an industry, the companies that best harness their data are growing fastest.
Companies are beginning to value information more highly than any individual transaction or supporting process. What is more important to a retailer: making a sale, or understanding that customer X would make that purchase at that time in that store? It is the latter, because that affects a life-long relationship – if you know what to do with the data.
“Software Eats the World”
All of this relates to Marc Andreessen’s claim that “software eats the world.” It means that every industry is becoming a software industry and a data-intensive industry. We really can’t afford not to store and process more data if we want to be successful and keep improving relative to peers.
For example, the data generated from sensor networks can be terabytes per second. A trans-Atlantic flight generates 650 terabytes of data from the airplane engine sensors. This information holds a lot of value for informing the repair and optimal use of the aircraft.
Or consider a truck-based asset delivery customer with average journey times of 19.6 hours for a delivery corridor. A combination of tracking and insight allowed the discovery of floating optimal departure times across each day of the week. Scheduling departures during these times reduces the journey duration by up to 48%. Missing them can add over 100% to journey times, which more than doubles the cost.
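For scale, a 48% reduction on a 19.6-hour journey leaves about 10.2 hours, while adding 100% gives 39.2 hours. The sketch below shows one simple way such departure windows might be found from tracking data: average the journey duration for each day and departure hour, then pick the fastest hour per day. The journey records are invented for illustration, and this grouping approach is an assumption, not the method the customer actually used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tracked journeys: (day of week, departure hour, journey hours).
journeys = [
    ("Mon", 6, 11.0), ("Mon", 6, 10.5), ("Mon", 9, 21.0), ("Mon", 14, 38.0),
    ("Tue", 5, 12.0), ("Tue", 8, 19.0), ("Tue", 8, 20.0), ("Tue", 15, 40.5),
]


def best_departure_hours(journeys):
    """Return {day: (hour, mean journey hours)} for the fastest departure hour per day."""
    durations = defaultdict(list)
    for day, hour, hours in journeys:
        durations[(day, hour)].append(hours)
    best = {}
    for (day, hour), values in durations.items():
        avg = mean(values)
        if day not in best or avg < best[day][1]:
            best[day] = (hour, avg)
    return best


best = best_departure_hours(journeys)
baseline = 19.6  # average journey time quoted in the text
for day, (hour, avg) in best.items():
    saving = 1 - avg / baseline
    print(f"{day}: depart at {hour:02d}:00, mean {avg:.1f} h, {saving:.0%} below the {baseline} h average")
```

Because the best hour is computed separately for each day, the window is free to "float" from one day of the week to the next.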
However, when you have an abundance of something, your relationship with that something changes.
To understand the enormity of the potential of data, let’s go back to the Industrial Revolution. That’s when there was an exodus of people from agriculture into manufacturing and the country urbanized en masse. We are now in the early throes of the Information Revolution and jobs are being reshuffled with higher value placed on those who can incorporate different kinds of information into every job. Who can use information to do their jobs?
There will be mistakes on that journey.
Our personal proclivities and psychographics are now in private hands. Sort of. Those private hands have a jaded view of what that is. Historically, corporations used information to make judgments about you – and this was mostly done on paper and barely usable – now they reach into the (landmine) data caverns of third-party curators of our digital footprints. If it can be monetized, it will be. But the curation today is in its infancy.
This curation requires the cooperation of the owners of the data – quite often application companies and also the people implicated in the use – to agree to share.
Are you a cigar smoker? Well, you did subscribe to Cigar Aficionado. Do you own a horse? Well, you do live on a lot zoned for horses. You get the point. Scores for people across thousands of dimensions are being calculated, often with imprecise data.
Companies know that if they can accurately anticipate your next move, they have a tremendous advantage in the market. But stores greeting you by name and recalling your last purchase, like in the movie Minority Report, are the tip of the iceberg of possibilities for how people will be treated in the Information Revolution.
Companies also know this interest in data extends to other companies and increasingly create lines of business for their data. It’s a ‘Wild West’ of data.
Our loose digital cues become sacrosanct in the mad rush to label us so companies can take informed actions. At the same time, companies are building their data science to handle more nuances in our data so they can treat us, more or less, as individuals. Until the data is accurate and the science is vastly improved, there will be errors. And companies have repeatedly shown their willingness to accept this and take chances.
It makes you wonder what other data we might be willing – or coerced – to give up to give business an edge. We’ve only seen the first pitch of this game. What else do we have to give? We give our clicks. Our corporate interactions go into that Wild West I spoke of, and there’s a maelstrom of analysis over it.
How about our DNA? We could get very personal there. At some level, we can skip all other data because the DNA is definitive about, well, just about everything. Although I’m not sure that ship has to sail for there to be a vastly different human experience from what we have today.
We are a ways off from DNA harvesting and understanding. It’s being worked on, but know that the commensurate technology is there to do anything a company wants to do with today’s data. Companies can actually afford to store and process much more data than they are storing and processing today. Business needs to be planning for that. What data could it use?
Business spending on big data is clearly trending upward. It’s projected to be the top item of spend in many industries. Companies are adopting Hadoop and NoSQL, although larger companies struggle to get them into production. That will get fixed with the advent of more robust systems management tools and the increased pressure to save all data at lower costs. Graph databases are the fastest-growing database category. Streaming data is becoming more common for real-time data analysis. All of these involve big data and all were barely spoken of 10 years ago.
Most businesses have to admit that no matter what business they are in, they are in the business of information and everything else simply allows them to pursue business-as-usual.
So data is valuable. It gives business the view it needs to understand and improve itself. This adoption has driven the improvements in hardware. And while data is proving itself to be the next natural resource, there is a dark side. Data misinterpretation. Data misrepresentation. Hacking is at an all-time high. All the IoT devices are susceptible.
I will now discuss how the value of data demands that we capture its value in environments we control. “Let no data escape” must be the mantra of our systems development. However, we don’t need to store all data forever. We don’t even need to store every piece of data, but we do need to glean every possible value out of every data element possible.
For example, high-volume data can be evaluated as it streams in to determine whether it is useful to real-time applications such as next-best-offer, fraud detection and account verification. It doesn’t have to be stored anywhere. However, the analytic value of the transaction could be pulled into the profiles in master data management, and the data could move on to the data warehouse for long-term storage and reporting.
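A minimal sketch of that flow follows, assuming each event is a small record with a customer and an amount. Each event is checked once as it passes, its analytic value is folded into a customer profile, and it is handed on for long-term storage. The `process_stream` function, the threshold rule and the plain dict and list standing in for master data management and the data warehouse are all hypothetical placeholders.

```python
def process_stream(events, profiles, warehouse, fraud_threshold=5000.0):
    """Evaluate each event as it arrives; the stream itself is never stored here.

    events: an iterable of dicts with "customer" and "amount" keys (assumed shape).
    profiles: dict standing in for master data management customer profiles.
    warehouse: list standing in for the data warehouse load queue.
    Yields an alert for every event that trips the hypothetical fraud rule.
    """
    for event in events:
        # Real-time use: a deliberately simple rule in place of a fraud model.
        if event["amount"] > fraud_threshold:
            yield {"customer": event["customer"], "reason": "amount over threshold"}
        # Analytic value: fold the event into the customer's profile.
        profile = profiles.setdefault(event["customer"], {"count": 0, "total": 0.0})
        profile["count"] += 1
        profile["total"] += event["amount"]
        # Long-term reporting: pass the event on to the warehouse.
        warehouse.append(event)


profiles, warehouse = {}, []
incoming = iter([
    {"customer": "c42", "amount": 120.0},
    {"customer": "c42", "amount": 9800.0},
    {"customer": "c77", "amount": 45.5},
])
alerts = list(process_stream(incoming, profiles, warehouse))
print(alerts)
print(profiles)
```

Using a generator means events are handled one at a time as they arrive, so the real-time check never waits on the profile or warehouse steps for later events.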
Data from the third-party data marketplace can also be interesting. It is well past time to think of data as an asset and to think about what data you could use to your advantage. Chances are that data is available. Is your team?
This mantra implies that we must grow the data science of our organization to deal with the many and varied forms of data. While not everyone will be a degreed data scientist, the individuals in the organization who can deal with the greatest amount of information will be most successful. Gone are the days when a valued job gets the “data drop” monthly and proceeds without new information for a month or longer. Gone are the days when a valued job deals with only one type of information.
Those jobs exist, but they are being devalued. The degree to which one can capitalize on the next natural resource of information is the degree to which one will be valued in the information revolution that is upon us.
All of this cannot be accomplished without an intense focus on the many and growing technical bases that can be used to store, view and manage data. More of them than ever have merit in organizations today, which is why I advocate that companies have a Chief Data Architect, or similar position, to govern the introduction of new data technologies.
The vendor market has kept up. As these systems continue to double their price-performance, bandwidth and storage capabilities annually, all things become possible.
Technologies to Deploy Now
- Hadoop/Spark Ecosystem – This ecosystem will evolve, but its foundation of scale-out file systems without overhead, moving towards stronger non-functionals, will not change.
- Master Data Management – Despite the intense resistance to sharing that these projects create, efficiently collecting or generating data to share in small and large ways is essential to the bottom line; the generation capabilities of MDM are increasingly being required.
- Internet of Things – Though not a technology, a consideration of using the internet as the processing backbone of new applications is increasingly compelling.
- Cloud – It’s hard to imagine just “cloud” as being a category, but at least starting there, it is a major disruptive force to IT-as-we-know-it.
- NoSQL – Perhaps the moniker will morph again, this time away from “not only SQL” to something that doesn’t imply its origination as the antithesis of a programming language; anyway, online digital strategies simply need to process too much information for any other operational approach.
We will not run out of data, but we may be overwhelmed by it.
We are in the data economy, and data is the next natural resource. Unlike other natural resources, every business must have a relationship with this one. Also unlike other natural resources, it is not entirely evident what that relationship needs to be. We need to figure it out in earnest. And I hope we keep the balance in favor of the human experience.
William McKnight is President of McKnight Consulting Group (www.mcknightcg.com). He is an internationally recognized authority in information management. His consulting work has included many of the Global 2000 and numerous midmarket companies. His teams have won several best practice competitions for their implementations and many of his clients have gone public with their success stories. His strategies form the information management plan for leading companies in various industries. William is author of the book “Information Management: Strategies for Gaining a Competitive Advantage with Data”. William is a very popular speaker worldwide and a prolific writer with hundreds of articles and white papers published. William is a distinguished entrepreneur, and a former Fortune 50 technology executive and software engineer. He provides clients with strategies, architectures, platform and tool selection, and complete programs to manage information. Follow William on Twitter: @williammcknight.
Copyright: William McKnight, McKnight Consulting Group