In various contexts, I increasingly find myself in an architectural discussion about whether data should be delivered in an integrated form or whether it should be delivered as integrateable data. In the latter case, the data is not yet integrated, but it is very easy to integrate; the integration itself is left to the data consumers.
Rick van der Lans, Independent Analyst, Consultant, Author and Lecturer, R20/Consultancy
Rick will be presenting the course ‘Practical Guidelines for Designing Modern Data Architectures’ on 21 April 2021 via live streaming.
This article was previously published here.
Integrated means that the data is first corrected and transformed. For example, key values are synchronized, different codes that identify the same value are replaced with standardized codes, master data is used to identify the correct data, and incorrect values are replaced with correct ones. When all that work is done, the data from the different sources is combined, and the integrated result is stored in a data platform and made available to data consumers.
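To make the producer-side approach more concrete, here is a minimal Python sketch. The two source extracts, their key formats ('C-001' versus '1'), and the COUNTRY_CODES table are all hypothetical; the code table merely stands in for master or reference data. The producer synchronizes the keys, replaces the country codes with standardized ones, and produces one combined result that would then be stored and offered to consumers.

```python
# Hypothetical extracts from two source systems describing the same customers.
crm_customers = [
    {"cust_id": "C-001", "name": "Acme BV", "country": "Netherlands"},
    {"cust_id": "C-002", "name": "Globex", "country": "UK"},
]
erp_orders = [
    {"customer": "1", "amount": 250.0},
    {"customer": "2", "amount": 100.0},
    {"customer": "1", "amount": 50.0},
]

# Hypothetical reference data: source-specific codes -> standardized ISO codes.
COUNTRY_CODES = {"Netherlands": "NL", "NL": "NL", "UK": "GB", "United Kingdom": "GB"}


def crm_key(cust_id: str) -> int:
    """Synchronize the CRM key format with the ERP one: 'C-001' -> 1."""
    return int(cust_id.split("-")[1])


def integrate(customers, orders):
    """Producer-side integration: correct, standardize, and combine the sources."""
    totals = {}
    for order in orders:
        key = int(order["customer"])  # ERP keys are plain numbers
        totals[key] = totals.get(key, 0.0) + order["amount"]

    integrated = []
    for customer in customers:
        key = crm_key(customer["cust_id"])
        integrated.append({
            "customer_key": key,
            "name": customer["name"],
            "country": COUNTRY_CODES[customer["country"]],
            "total_amount": totals.get(key, 0.0),
        })
    return integrated


# The single integrated result is what would be stored and offered to consumers.
integrated_customers = integrate(crm_customers, erp_orders)
print(integrated_customers)
```

Every consumer receives the same pre-combined table; a consumer who needs the data combined differently depends on the producer to build it.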
With integrateable data, the data is also corrected and transformed, and the result is likely to be stored as well, but that is where it stops: the result is not stored in integrated form. The actual integration is left to the data consumers. The integration work they have to do is relatively straightforward, however, because the data has been designed to be integrated.
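A minimal Python sketch of the integrateable alternative, again under hypothetical assumptions (invented source extracts, key formats, and a COUNTRY_CODES table). The producer corrects and transforms each source on its own and keeps them as separate datasets that share one key format and one code set. Combining them is left to the consumer, for whom it reduces to a simple key lookup.

```python
# Hypothetical extracts from two source systems.
crm_customers = [
    {"cust_id": "C-001", "name": "Acme BV", "country": "Netherlands"},
    {"cust_id": "C-002", "name": "Globex", "country": "UK"},
]
erp_orders = [
    {"customer": "1", "amount": 250.0},
    {"customer": "2", "amount": 100.0},
    {"customer": "1", "amount": 50.0},
]

# Hypothetical reference data for standardizing country codes.
COUNTRY_CODES = {"Netherlands": "NL", "UK": "GB"}


def prepare_customers(rows):
    """Producer: correct and transform customers, but do not combine them with orders."""
    return [
        {
            "customer_key": int(r["cust_id"].split("-")[1]),  # 'C-001' -> 1
            "name": r["name"],
            "country": COUNTRY_CODES[r["country"]],
        }
        for r in rows
    ]


def prepare_orders(rows):
    """Producer: use the same key format as prepare_customers so the sets join trivially."""
    return [{"customer_key": int(r["customer"]), "amount": r["amount"]} for r in rows]


# Two separately stored datasets: each is integrateable, neither is integrated.
customers = prepare_customers(crm_customers)
orders = prepare_orders(erp_orders)


def revenue_by_country(customers, orders):
    """Consumer-side integration: a plain key lookup, because keys are already aligned."""
    country_of = {c["customer_key"]: c["country"] for c in customers}
    revenue = {}
    for order in orders:
        country = country_of[order["customer_key"]]
        revenue[country] = revenue.get(country, 0.0) + order["amount"]
    return revenue


print(revenue_by_country(customers, orders))
```

Another consumer could combine the same two datasets for a different purpose, such as revenue per customer, without the producer having to build and store that combination in advance.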
I always find discussions around integrated data versus integrateable data fascinating. Personally, I am increasingly leaning towards integrateable data to create more modular data architectures in which large, centralized components play a less crucial role and do not become a single point of failure.
Note that this is not a new concept. For example, Ralph Kimball’s conformed dimensions make it easy to develop multi-star queries, that is, queries that combine facts from several star schemas. In other words, the conformed dimensions make fact data stored in different fact tables integrateable. And, in a way, data vault models organize data in an integrateable fashion.
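To illustrate the conformed-dimension case, here is a small sketch using Python's built-in sqlite3 module. The table names and rows are invented for illustration, and a single dim_product table shared by both fact tables serves as the simplest form of a conformed dimension. The multi-star (drill-across) query aggregates each fact table separately by the conformed category attribute and then joins the two result sets on that attribute.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical star schemas: two fact tables sharing one conformed product dimension.
con.executescript("""
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales   (product_key INTEGER REFERENCES dim_product, amount REAL);
CREATE TABLE fact_returns (product_key INTEGER REFERENCES dim_product, amount REAL);
INSERT INTO dim_product  VALUES (1, 'Books'), (2, 'Games');
INSERT INTO fact_sales   VALUES (1, 100.0), (1, 40.0), (2, 80.0);
INSERT INTO fact_returns VALUES (1, 10.0), (2, 30.0);
""")

# Drill-across: aggregate each star on the conformed attribute first,
# then combine the two aggregated result sets on that same attribute.
query = """
WITH s_agg AS (
    SELECT d.category, SUM(f.amount) AS sales_total
    FROM fact_sales f JOIN dim_product d USING (product_key)
    GROUP BY d.category
),
r_agg AS (
    SELECT d.category, SUM(f.amount) AS returns_total
    FROM fact_returns f JOIN dim_product d USING (product_key)
    GROUP BY d.category
)
SELECT s.category, s.sales_total, COALESCE(r.returns_total, 0) AS returns_total
FROM s_agg s LEFT JOIN r_agg r USING (category)
ORDER BY s.category
"""
rows = con.execute(query).fetchall()
print(rows)
```

Aggregating each star before combining is deliberate: joining both fact tables directly through the dimension would pair every sales row with every returns row for the same product and inflate the sums.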
You could say that traditional forms of data integration represent early, or producer-based, data integration, whereas making data integrateable corresponds to late, or consumer-based, data integration.
I recommend that you study your own data architectures and determine whether they provide integrated or integrateable data. If they provide integrated data, ask yourself why, and what the pros and cons of that solution are. Then determine what the pros and cons would be if the architecture supported integrateable data. You may be positively surprised.
Rick van der Lans is a highly respected independent analyst, consultant, author, and internationally acclaimed lecturer specialising in data architectures, data warehousing, business intelligence, big data, and database technology. In 2018 he was selected as the sixth most influential BI analyst worldwide by onalytica.com. He has presented countless seminars, webinars, and keynotes at industry-leading conferences. For many years, he served as the chairman of the annual European Enterprise Data and Business Intelligence Conference in London and the annual Data Warehousing and Business Intelligence Summit in The Netherlands. Rick helps clients worldwide to design their data warehouse, big data, and business intelligence architectures and solutions and assists them with selecting the right products. He has been influential in introducing the new logical data warehouse architecture worldwide, which helps organisations to develop more agile business intelligence systems. Over the years, Rick has written hundreds of articles and blogs for newspapers and websites and has authored many educational and popular white papers for a long list of vendors. He was the author of the first available book on SQL, Introduction to SQL, which has been translated into several languages with more than 100,000 copies sold. Recently published books are Data Virtualisation for Business Intelligence Systems and Data Virtualization: Selected Writings. He presents seminars, keynotes, and in-house sessions on data architectures, big data and analytics, data virtualization, the logical data warehouse, data warehousing, and business intelligence.
Copyright Rick van der Lans, Independent Analyst, Consultant, Author and Lecturer, R20/Consultancy