Dr. Barry Devlin, Founder and Principal, 9sight Consulting
Barry Devlin will be presenting the virtual public course:
Essentials of Data Warehouses, Lakes and BI in Digital Business – 15-16 March 2022
[2021] In October 2015, I wrote a blog for TechTarget entitled “Dodging a Data-Driven Debacle.” You can check out the link to assure yourself that I haven’t (Dominic Cummings-like) post-edited the text! But, when I came across it recently, I thought it would be interesting to comment lightly on it with the benefit of five years of hindsight. My 2021 comments here are prefixed with [2021].
Volkswagen alerts us to missing ethical thinking in the otherwise powerful concept of data-driven business. Thanks, guys!
One of the most intriguing examples of being supremely data-driven came recently from, of all places, the automobile industry. I refer to the emissions-rigging escapade of Volkswagen. As an exemplar of using sensors and analytics to achieve a business goal, the company’s approach was impressive, however questionable the ethics. Personally, I would be unsurprised to see further similar approaches exposed in the automobile and other industries. Because, beyond the disappointment and outrage being vented on the company and individuals therein, the scam exposes two widespread misconceptions about data-driven business today.
[2021] Data-driven business has morphed into digital transformation in recent years, but the transformation is in word only. The principles remain the same. And, without doubt, the misconceptions also live on. Plus ça change, plus c’est la même chose…
There is currently immense interest in building data-driven business models across the IT industry. The explosion of so-called big data, first from social media and human web interaction and more recently from various sensors and other data-generating devices, has galvanized technology development in every aspect of hardware, networking and software. Every vendor worth their silicon is developing an Internet of Things (IoT) solution. Every consultant is discussing how business will be disrupted, while start-ups in every industry are actively doing it. Even traditional businesses are jumping aboard the IoT bandwagon. I’m not going to reel off the possibilities here; there are hundreds, if not thousands, of posts and papers out there already. My interest is in a couple of assumptions that underlie every opportunity: (1) that the data collected is valid and sufficiently reliable for the purpose to which it’s being put and (2) that (even partially) automated decisions based on the analysis of such data are better than those made by biased or otherwise debilitated humans. Both of those assumptions have been fairly well trashed by Volkswagen, yet I fear that few in the industry have noticed.
[2021] My recent keynote at the Enterprise Data and BI and Analytics Conference was entitled “When Good Data Goes Bad.” I’ve published a series of articles under the same title with Cutter Consortium. They speak directly to point 1 above. This is a problem that continues to get worse as we collect, use, and repurpose ever more data. It remains one of the most insidious challenges in today’s data-infested world and, worryingly, it continues to be downplayed and even ignored.
The possibility that IoT devices may provide invalid or unreliable data is often discussed in terms of failures of the devices themselves or the network infrastructure that connects them. Such failures are, of course, a reality. However, a more subtle and insidious possibility is that the devices may be designed deliberately to provide misleading data. With estimates of 20-30 billion devices on the IoT within 5 years, the opportunity to make mischief seems unmissable. Theo Priestley correctly questions who governs whether the data generated is even valid in the first place. In a legislative environment where pollutants are monitored and being driven ever lower, the value in devices that under-measure pollutants has been made obvious. Most governments now favour self-regulation by industry players themselves, so the risks of getting caught are limited. Meanwhile, opportunities for fraud and deception arise in every sector. A tiny over-measurement of the gas, water or electricity flowing into a few thousand random households in a city could enable a significant increase in profit for the supplier with minimal risk of detection. As devices become more sophisticated and more programmable remotely, further opportunities for deceit and cover-up are emerging.
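To put rough numbers on the over-measurement scenario, here is a minimal Python sketch. Every figure in it (the household count, the average annual bill and the 1% over-reading) is a hypothetical assumption chosen purely for illustration, not data from any real supplier.

```python
def extra_revenue(households: int, annual_bill: float, over_read: float) -> float:
    """Extra yearly revenue when meters over-read consumption by a fixed fraction."""
    return households * annual_bill * over_read


# Hypothetical inputs, assumed for illustration only
HOUSEHOLDS = 5_000      # "a few thousand random households"
ANNUAL_BILL = 1_200.0   # average yearly bill per household, in currency units
OVER_READ = 0.01        # meters report 1% more than was actually delivered

# What a single household overpays in a year
per_household = ANNUAL_BILL * OVER_READ

# What the supplier gains across all affected households
total = extra_revenue(HOUSEHOLDS, ANNUAL_BILL, OVER_READ)

print(f"Overcharge per household per year: {per_household:.2f}")
print(f"Extra supplier revenue per year:   {total:,.2f}")
```

Under these assumed figures, each household overpays about 12 currency units a year, an amount easily lost in normal month-to-month variation, while the supplier gains about 60,000. The asymmetry is the point: the loss is too small for any one customer to notice, yet the aggregate is worth having.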
In the case of automated decision-making systems, the increasing sophistication of the algorithms in use is already recognized as being beyond the comprehension of the vast majority of business people, and in many cases even beyond that of the programmers of the systems themselves. In these circumstances, discerning the intention of the algorithm (or, rather, of its designer) after the fact becomes largely impossible. Gartner is already talking about an algorithm economy in which billions of algorithms exist and their workings are declared proprietary. Here, the question arises of who governs the ethics of the algorithms; indeed, who is even thinking about the topic? The responsibility lies with the businesses involved, because regulation at this scale is logistically impossible. And the suggestion made by some that there should be third-party devices and algorithms responsible for monitoring the other devices and algorithms only adds another layer of indirection to the question of ultimate responsibility for the correct/fair/balanced/just operation of these technologies. This, in the last analysis, comes down to the ethics of those directly involved.
So, for those businesses jumping aboard the data-driven business train, the question of intent should be front and centre in the minds of CIOs and even CEOs. What is the intent behind the design of the sensors and algorithms being built? Is there a firm ethical foundation behind their design and implementation? Or is the intention merely to obey the letter of the law and avoid being caught out, in the interest of maximizing profit? If there is no such foundation, the consequences for your business may be far-reaching. Indeed, the consequences for the entire data and IT industry may be deeply destructive.
In short, we might yet be thankful to the folks at Volkswagen for alerting us to this missing thinking in the otherwise powerful concept of data-driven business.
[2021] We might indeed “be thankful to the folks at Volkswagen” if anything had changed in our level of governance of data, both big and small. Information technology tools continue to evolve, increasing our ability to collect ever more data and analyse it to within an inch of its life.
Since the original publication of this article, three new architectural patterns—data lakehouse, data fabric, and data mesh—have emerged, all of which will feature in my updated two-day online workshop “Essentials of Data Warehouses, Lakes and BI in Digital Business,” scheduled for 15-16 March 2022. But we will also examine briefly the consequences of an over-attachment to the concept of becoming data-driven rather than information-informed.
Image: The Death of Socrates, Jacques-Louis David (1787)