Data Governance: “Where’s the Money, Honey?” - IRM Connects, by IRM UK

Becoming a data-driven company: what’s not to like about it? Many organizations are embarking on their own “Big Data” journey, and many find that it is not obvious how to create awareness of “data as an asset” throughout the organization. Launching a Data Governance plan implies driving change, which is often one of the hardest parts of these programs. How can we, as data professionals, make our lives easy(ier)?

Tom Breur, VP Data Analytics, Cengage

Tom will be speaking on Establishing Data Governance in a Greenfield Data Driven Organization at IRM UK’s co-located conferences on Data Governance and MDM, 16-19 May 2016, London

First of all, and as “everybody” notes, you need senior management sponsorship. Unless upper management endorses these efforts, you might as well stay home. Of course, it is fairly easy to say you support a program; at the end of the day, the people involved in the day-to-day work on Data Governance will (still) be expected to do most of the hard work.

Leading change in Data Governance programs, especially in the early stages, is best done through information: a compelling story and convincing examples of how doing things in some “new” way will benefit the organization. If you need to rely on senior management’s “stick-and-carrot”, you may be able to bring about some change, but it probably won’t last. There has to be a better way, and there is.

Data Governance programs can have many objectives. One of them typically is to instill a sense of ownership and decision rights with regard to data assets. Once you begin to treat data like the other valuable (albeit tangible) assets the company owns, many of the same governance principles can be applied.

For example, when it comes to ownership, you want to make explicit who (person or committee) has the ultimate right to decide about forthcoming changes in IT systems. Your budget holder is a likely candidate for system owner, but there can be others, too. Often teams or committees hold this responsibility, and then it helps to be explicit about it. Who, exactly, gets to make these decisions?

Besides ownership, you also want to make public who should be consulted about planned changes (who has a right to provide input), and who should be informed after changes have been put in place. Data are used and reused downstream from primary processes, and it is exactly this secondary usage that creates so much value. Of course, secondary usage of data can also be a cause for concern, and serious headaches… For these reasons, the entire value chain that data flow through needs to be considered and informed when making changes.
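One way to make these decision rights explicit is to keep a simple register per system, with a RACI-style split between who decides, who is consulted, and who is informed. Below is a minimal Python sketch, assuming a hypothetical `DecisionRights` record and `change_plan` helper; the system and role names are invented for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRights:
    """Hypothetical register entry: who decides about, is consulted on,
    and is informed of changes to one IT system."""
    system: str
    owner: str  # person or committee with the ultimate right to decide
    consulted: list[str] = field(default_factory=list)  # may give input before a change
    informed: list[str] = field(default_factory=list)   # told after the change is in place


def change_plan(register: dict[str, DecisionRights], system: str) -> dict[str, list[str]]:
    """Look up who approves, who to consult beforehand, and who to inform afterwards.

    Raises KeyError when the system has no entry, which exposes exactly the
    gap ("Who, exactly, gets to make these decisions?") the register should close.
    """
    rights = register[system]
    return {
        "approve": [rights.owner],
        "consult_before": list(rights.consulted),
        "inform_after": list(rights.informed),
    }


# Illustrative, hypothetical entry.
register = {
    "application_processing": DecisionRights(
        system="application_processing",
        owner="Card Operations steering committee",
        consulted=["Credit Risk Analytics"],
        informed=["BI Reporting", "Marketing Analytics"],
    )
}
print(change_plan(register, "application_processing"))
```

The structure itself matters less than the habit: every system gets exactly one named owner, and the consulted and informed lists are written down rather than assumed.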

IT systems exist upstream, largely independent of data usage, yet they can be the source of a wide variety of analytical applications and reporting. When business owners make changes to these source systems, downstream data usage will be affected. Are stakeholders aware of all the different ways data are being reused downstream? Probably not. Do they know how much it costs to curate inaccurate source data? Maybe even more important: what are the costs when downstream business decisions are compromised? Quantifying these consequences goes a long way towards creating awareness about managing “data as an asset.”
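A partial answer to “who is affected downstream?” can come from recording data lineage as a graph and walking it. The sketch below assumes a hypothetical lineage map (each entry lists the datasets or applications a node feeds directly) and uses a plain breadth-first search; in practice the lineage would come from whatever metadata or catalog tooling your organization uses.

```python
from collections import deque

# Hypothetical lineage map: each node lists the assets it feeds directly.
LINEAGE: dict[str, list[str]] = {
    "application_processing": ["credit_data_mart"],
    "credit_data_mart": ["application_scorecard", "risk_dashboard"],
    "application_scorecard": ["accept_reject_decisions"],
    "risk_dashboard": [],
    "accept_reject_decisions": [],
}


def downstream_of(source: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every asset that directly or indirectly consumes `source`,
    in breadth-first order (nearest consumers first)."""
    seen = {source}
    affected: list[str] = []
    queue = deque(lineage.get(source, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # guards against cycles and shared consumers
            continue
        seen.add(node)
        affected.append(node)
        queue.extend(lineage.get(node, []))
    return affected


# The owners of each affected asset belong on the "consult" or "inform" list.
print(downstream_of("application_processing", LINEAGE))
```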

I once worked for a credit card company where errors in the application processing data trickled down into suboptimal assessment of credit risk. The way this works is that historical data about credit card applicants (along with their eventual credit history) are used to predict “likelihood to default” for future applications. You gather data about thousands of applications and track their subsequent payment behavior. For every record you append a flag “defaulted on loan yes/no”, and then use that file to build a model that predicts likelihood to default.
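To show the mechanics, here is a minimal sketch of that setup: a handful of invented historical applications, each with the “defaulted” flag appended, and a logistic regression fitted by plain batch gradient descent. Logistic regression is a common choice for application scorecards, but it is an assumption here; the article does not say which modeling technique was used, and the features (`home_owner`, income in units of 10k) and all values are hypothetical.

```python
import math
from collections.abc import Sequence

# Hypothetical historical applications: (home_owner, income_10k, defaulted).
# "defaulted" is the yes/no flag appended after observing payment behavior.
HISTORY = [
    (1, 6.0, 0), (1, 4.5, 0), (1, 3.0, 0), (1, 2.5, 1), (1, 5.0, 1),
    (0, 5.0, 0), (0, 3.5, 1), (0, 2.0, 1), (0, 1.5, 1), (0, 2.5, 0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_default(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Predicted likelihood to default for one application."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_scorecard(rows, learning_rate: float = 0.1, epochs: int = 20_000):
    """Fit a logistic regression by batch gradient descent on the log-loss."""
    n_features = len(rows[0]) - 1
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for *features, defaulted in rows:
            error = predict_default(weights, bias, features) - defaulted
            for j in range(n_features):
                grad_w[j] += error * features[j]
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


weights, bias = fit_scorecard(HISTORY)
print(f"P(default | renter, income 3.0) = {predict_default(weights, bias, (0, 3.0)):.2f}")
print(f"P(default | owner,  income 3.0) = {predict_default(weights, bias, (1, 3.0)):.2f}")
```

Because the model learns from whatever the recorded attributes say, a mis-keyed `home_owner` value shifts the fitted weights, and with them every future prediction.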

The quality of these data has a material impact on the accuracy of the models you build. When a past applicant is flagged as a home owner but in reality was renting (renting is associated with higher credit risk), those input data will make credit scoring models less accurate. But by how much? Is that really such a big deal? You won’t know until you measure…

An example from my personal experience: at a credit card company where I worked, we undertook a wholesale scrubbing of this application scoring database. Many thousands of applications were re-entered manually, so that the “correct” values, as submitted on the application forms, were captured. This allowed me to compare how well the scorecard made predictions when based on the “old” (dirty) versus the cleaned version of the database. Next, I ran all processed applications through these two scoring models to see how many applications would be assigned to a different category (accept/reject). In the industry, this is aptly named a “confusion matrix.”
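The comparison described above boils down to cross-tabulating the accept/reject decision of the “dirty” scorecard against that of the “clean” one for the same applications. A minimal sketch, assuming hypothetical score values and a hypothetical cutoff of 600 points (higher score means lower risk; applications at or above the cutoff are accepted):

```python
from collections import Counter


def decision(score: float, cutoff: float) -> str:
    """Accept at or above the cutoff, reject below it."""
    return "accept" if score >= cutoff else "reject"


def cross_tab(dirty_scores, clean_scores, cutoff: float) -> Counter:
    """Count (dirty decision, clean decision) pairs for the same applications."""
    return Counter(
        (decision(d, cutoff), decision(c, cutoff))
        for d, c in zip(dirty_scores, clean_scores)
    )


# Hypothetical scores for the same six applications under both scorecards.
dirty = [620, 580, 700, 655, 590, 610]
clean = [640, 600, 690, 595, 570, 630]

table = cross_tab(dirty, clean, cutoff=600)
print("dirty \\ clean   accept  reject")
for dirty_decision in ("accept", "reject"):
    counts = [table[(dirty_decision, c)] for c in ("accept", "reject")]
    print(f"{dirty_decision:<14}{counts[0]:>8}{counts[1]:>8}")
```

The off-diagonal cells are the applications whose decision flips once the data are cleaned; those are the ones that carry a dollar cost.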

In the credit card business you can estimate what it “costs” when an application gets rejected that really should have been approved: the marketing expense incurred to attract this applicant. In Direct Marketing these costs can be readily calculated. The more expensive misclassification is an applicant who was accepted but really should have been rejected. A defaulting credit card customer costs on average a few thousand dollars in write-off and collection costs.

In this manner I created a well-founded estimate of the business benefits of cleaning the database. Because this client was reluctant to share their default rates (which have a linear impact on the total dollar value I calculated), I provided a “best” (fair) guess, as well as a pessimistic and an optimistic dollar value, based on industry-standard average default rates (reasonable lower and upper bounds for default rates). My calculation came to $7-11M in benefits for the upcoming (first) year.
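The arithmetic behind such an estimate can be sketched as follows. Wrongly rejected applicants cost their acquisition (marketing) expense; wrongly accepted applicants cost, in expectation, default rate × loss per default. Every number below (the counts, a $100 acquisition cost, a $3,000 loss per default, and the three default rates) is a hypothetical placeholder, not a figure from the case; the point is only that the total moves linearly with the assumed default rate, which is why a range is reported.

```python
def cleaning_benefit(
    wrongly_rejected: int,    # dirty model rejects, clean model accepts
    wrongly_accepted: int,    # dirty model accepts, clean model rejects
    acquisition_cost: float,  # marketing expense per applicant attracted
    default_rate: float,      # assumed share of wrongly accepted who go on to default
    loss_per_default: float,  # write-off plus collection costs per default
) -> float:
    """Misclassification cost avoided by scoring on cleaned data (a sketch)."""
    wasted_marketing = wrongly_rejected * acquisition_cost
    expected_losses = wrongly_accepted * default_rate * loss_per_default
    return wasted_marketing + expected_losses


# Hypothetical placeholders only; none of these figures come from the case.
scenarios = {"low": 0.04, "central": 0.06, "high": 0.08}
for name, rate in scenarios.items():
    benefit = cleaning_benefit(
        wrongly_rejected=2_000,
        wrongly_accepted=1_500,
        acquisition_cost=100.0,
        default_rate=rate,
        loss_per_default=3_000.0,
    )
    print(f"{name:>7} default rate {rate:.0%}: ${benefit:,.0f}")
```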

Because the range I came up with sounded unrealistic (too high), they had their internal credit risk expert redo my calculations (without telling me, which is fine, btw). He, of course, did have access to some of the key parameters, and came to a whopping $10-15M. A plan to improve data entry accuracy was put into place immediately, and every middle manager who later came to work in the back office was handed this report on his first day, to ensure that the importance of accurate data entry was on everybody’s mind…

There’s a lesson in this case, maybe two. As the business owner later told me: “I was always aware that data quality issues were costly, and they are often more costly than you think.” But more important, I think, is another lesson I have learned over the years: if there is one language that every manager in every business knows and understands, it is “dollars” (or euros, pounds, or whatever you use). Those numbers “stick”, and they seem to be the most powerful lever for driving awareness of “data as an asset.”

Tom Breur, VP Data Analytics, Cengage: Tom has a background in Database Management and Market Research. He has specialized in how companies can make better use of their data. He is an accomplished teacher at universities, in MBA programs, and for the IQCP (Information Quality Certified Professional) and CBIP (Certified Business Intelligence Professional) programs. He is a regular keynote speaker at international conferences. At the moment he is a member of the editorial board of the Journal of Targeting, the Journal of Financial Services Management, and Banking Review. He is Chief Editor of the new Palgrave Journal of Marketing Analytics and has been cited, among others, in Harvard Management Update on state-of-the-art data analytics. Follow Tom @tombreur. Contact him at tombreur1963@gmail.com
