The data warehouse, the data lake, the data hub, and the data lakehouse differ in many ways, but they have a few things in common. First, they are all monolithic, meaning that each is developed on a single data platform that stores all the data. These platforms can be implemented using, for example, SQL database servers, file systems, or Hadoop clusters. Second, because they are monolithic, they are centralized and managed by a single group of data engineers who must be hyper-domain experts to work with all that data.
Rick van der Lans, Independent Analyst, Consultant, Author and Lecturer, R20/Consultancy
Rick will be presenting the course ‘Practical Guidelines for Designing Modern Data Architectures’ on 21 April 2021 via live streaming.
This article was previously published here.
In this blog, I want to focus on the third characteristic they have in common: they are all multi-domain data platforms. They are all loaded with data from multiple business domains, such as human resources, invoicing, transportation, finance, and manufacturing. We do this almost blindly, without asking ourselves whether it makes sense.
Let’s face it, most data consumers use data from a single domain. It’s highly unlikely that, for example, employees working for the human resources department use manufacturing data, or vice versa. Most of the data produced by a business domain is used by that domain. Obviously, there are data consumers who need data from multiple domains. But is it worth creating a large centralized data platform to make life easy for a minority of the data consumers, when it is irrelevant to the majority?
From this perspective, I like the data mesh architecture that focuses on a more decentralized data architecture with separate solutions for separate domains. This is consistent with the assumption that most data consumers use data from the domain they work for. Evidently, in a data mesh, the domains need interfaces that allow data from different domains to be easily integrated for the multi-domain data consumers.
In more traditional architectures, early domain integration is deployed: data from all domains is integrated when it is loaded into the central platform. The data mesh and other architectures support late domain integration, in which data from different domains is integrated only when a consumer needs it. With the latter, you try to avoid the development of these massive, centralized, and monolithic data architectures. Note that I am not saying that we should stop developing centralized data architectures, but we should always weigh the pros and cons beforehand. I have the feeling this is not always done explicitly in projects. Centralized data architectures are optional, not mandatory.
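To make the idea of late domain integration a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical and only serves as illustration: the class names, the rows() interface, the integrate_late function, and the sample data are assumptions, not part of any specific data mesh product. Each domain keeps its own data and exposes it through the same small interface, and the join across domains happens only when a multi-domain consumer asks for it.

```python
from typing import Dict, List, Protocol

Row = Dict[str, object]


class DomainDataProduct(Protocol):
    """Hypothetical interface that every domain exposes to its consumers."""

    def rows(self) -> List[Row]: ...


class HumanResources:
    """Illustrative HR domain; owns and serves its own data."""

    def rows(self) -> List[Row]:
        return [
            {"employee_id": 1, "name": "Ann"},
            {"employee_id": 2, "name": "Bob"},
        ]


class Manufacturing:
    """Illustrative manufacturing domain; owns and serves its own data."""

    def rows(self) -> List[Row]:
        return [
            {"employee_id": 1, "units_built": 40},
            {"employee_id": 1, "units_built": 25},
        ]


def integrate_late(left: DomainDataProduct, right: DomainDataProduct, key: str) -> List[Row]:
    """Join two domains on demand (inner join on `key`), at consumption time."""
    index: Dict[object, List[Row]] = {}
    for row in right.rows():
        index.setdefault(row[key], []).append(row)
    return [
        {**l, **r}
        for l in left.rows()
        for r in index.get(l[key], [])
    ]


# A multi-domain consumer integrates only when it needs to.
combined = integrate_late(HumanResources(), Manufacturing(), "employee_id")
```

Single-domain consumers simply call rows() on their own domain and never pay for the integration. With early domain integration, by contrast, the equivalent of integrate_late would run for all domains up front, when the central platform is loaded.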
Rick van der Lans is a highly respected independent analyst, consultant, author, and internationally acclaimed lecturer specialising in data architectures, data warehousing, business intelligence, big data, and database technology. In 2018 he was selected as the sixth most influential BI analyst worldwide by onalytica.com. He has presented countless seminars, webinars, and keynotes at industry-leading conferences. For many years, he served as the chairman of the annual European Enterprise Data and Business Intelligence Conference in London and the annual Data Warehousing and Business Intelligence Summit in The Netherlands. Rick helps clients worldwide to design their data warehouse, big data, and business intelligence architectures and solutions and assists them with selecting the right products. He has been influential in introducing the new logical data warehouse architecture worldwide, which helps organisations to develop more agile business intelligence systems. Over the years, Rick has written hundreds of articles and blogs for newspapers and websites and has authored many educational and popular white papers for a long list of vendors. He was the author of the first available book on SQL, Introduction to SQL, which has been translated into several languages with more than 100,000 copies sold. Recently published books are Data Virtualisation for Business Intelligence Systems and Data Virtualization: Selected Writings. He presents seminars, keynotes, and in-house sessions on data architectures, big data and analytics, data virtualization, the logical data warehouse, data warehousing, and business intelligence.
Copyright Rick van der Lans, Independent Analyst, Consultant, Author and Lecturer, R20/Consultancy