The Need for Agile in Data Warehousing

In Business Intelligence, Data Management, Enterprise Design by IRM UK

The world is changing. No – the world as we knew it in IT and data warehousing has changed. Big Data, Agile, and the Cloud are hot topics. But companies still need to collect, report, and analyze their data. Usually this requires some form of data warehousing or business intelligence system. So how do we do that in the modern IT landscape in a way that allows us to be agile and to deal, directly or indirectly, with semi-structured data and the cloud?

Kent Graziano, Senior Technical Evangelist, Snowflake Computing, kent.graziano@snowflake.net

Kent will be speaking at the Enterprise Data and Business Intelligence & Analytics Conference Europe, 20-23 November 2017. He will be teaching the pre-conference workshop Agile Methods and Data Warehousing: How to Deliver Faster and presenting a conference session on Data Warehousing: Today and Beyond.

First off, we need to change our evil ways – we can no longer afford to take years to deliver data to the business. We cannot spend months doing detailed analysis to develop use cases and detailed specification documents, then spend more months building enterprise-scale data models, only to deploy them and find that the source systems have changed and the models have no place to hold the now-relevant data critical to business success.

We have to be more agile than that. So we need to adopt, and adapt to, the agile worldview. To do that, I believe there are a few things we must do.

Adopt an Agile Methodology

By this I mean Scrum, Kanban, ScrumBan, or DAD (Disciplined Agile Delivery), among others.

Go read the blogs, read the books, study these methods. Attend a conference (like IRM UK). Figure out what will work for your organization’s culture and leverage the skills of your staff. One size does not fit all.

In past engagements I have used approaches primarily based on Scrum and Kanban. Both have been very effective once we got our processes down.

If you need/want help, find a good agile coach.

Use an Agile Data Engineering Approach

If you want to develop your data warehouse in an agile, iterative manner, then you need a way to design your EDW repository that lends itself to this approach without causing huge re-engineering pains (known as refactoring) in future iterations.

The best way I have found is using the Data Vault modeling approach. It was designed specifically for building data warehouses in this manner. I have written much about this approach and give many talks showing examples of successful agile projects using Data Vault. And there is plenty of material available to help you learn how to do it (see the books on the sidebar of my blog).

Use Data Warehouse Automation Software

There is no better way to get agile and deliver results fast than to automate as much of your development work as possible. If you use repeatable patterns (like Data Vault) in your design methodology, then it is even easier to automate and greatly reduce your time to market.
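To make the point about repeatable patterns concrete, here is a minimal sketch in Python of metadata-driven generation for one Data Vault pattern, the hub. It is an illustration only, not how any particular automation product works. The `HubSpec` class, the `hub_` and `_hk` naming conventions, and the column types are all assumptions made for this example. What it does reflect is the standard hub shape: a hash key, the business key, a load timestamp, and a record source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubSpec:
    """Hypothetical metadata describing one Data Vault hub."""
    name: str                       # business concept, e.g. "customer"
    business_key: str               # column that carries the business key
    key_type: str = "VARCHAR(50)"   # assumed data type of the business key


def hub_ddl(spec: HubSpec) -> str:
    """Render a CREATE TABLE statement for a hub from its metadata.

    Every hub has the same shape, so a single template covers them all.
    """
    return (
        f"CREATE TABLE hub_{spec.name} (\n"
        f"    {spec.name}_hk CHAR(32) NOT NULL PRIMARY KEY,\n"
        f"    {spec.business_key} {spec.key_type} NOT NULL,\n"
        f"    load_dts TIMESTAMP NOT NULL,\n"
        f"    record_source VARCHAR(100) NOT NULL\n"
        f");"
    )


# Adding a hub to the model is one more line of metadata, not new code.
specs = [
    HubSpec("customer", "customer_number"),
    HubSpec("product", "sku"),
]

for spec in specs:
    print(hub_ddl(spec))
    print()
```

Because the template never changes, a new source system means new metadata rather than new hand-written DDL, which is where the time savings come from.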

There are two vendors in the market that I like a lot and have had some experience with. They are WhereScape and AnalytixDS. And both support not only “traditional” approaches to data warehousing (like automating the ETL for a Type 2 Slowly Changing Dimension) but they both also support Data Vault.

Which of these tools you might use depends on your approach, your current tools, and your skills.

If you are coming from a more traditional DW paradigm and use ETL tools like Informatica, Talend, or DataStage, then I would recommend you look at AnalytixDS Mapping Manager which allows you to generate your ETL code from source to target mappings.

If you are just getting started or are committed to more of a database-centric approach and want your ETL or ELT code to run in the database, then look at WhereScape’s products.

Both are great companies with knowledgeable people and happy customers.

Your third option is to write your own automation routines. There are many shops doing that as well. Just be sure you have the appropriate skills in house and can allocate the upfront time to get going (a month or so at least).
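As an illustration of the kind of pattern a home-grown routine would need to encode, here is a minimal in-memory sketch of the Type 2 Slowly Changing Dimension logic mentioned earlier. It is written in Python for readability. A real routine would more likely generate SQL that runs in the database. The `DimRow` class, the `apply_scd2` function, and the use of a missing end date to mark the current version are assumptions made for this example.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension member (hypothetical structure)."""
    business_key: str
    attributes: tuple                 # tracked attribute values
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(dimension: List[DimRow], business_key: str,
               attributes: tuple, as_of: date) -> List[DimRow]:
    """Return a new dimension with one incoming record applied.

    Unchanged records are ignored, changed records close the current
    version and open a new one, and unseen keys get a first version.
    """
    result = []
    needs_new_version = True
    for row in dimension:
        if row.business_key == business_key and row.is_current:
            if row.attributes == attributes:
                needs_new_version = False                    # no change
                result.append(row)
            else:
                result.append(replace(row, valid_to=as_of))  # close old version
        else:
            result.append(row)
    if needs_new_version:
        result.append(DimRow(business_key, attributes, valid_from=as_of))
    return result


dim: List[DimRow] = []
dim = apply_scd2(dim, "C001", ("Denver",), date(2017, 1, 1))
dim = apply_scd2(dim, "C001", ("Boulder",), date(2017, 6, 1))
for row in dim:
    print(row)
```

The rules are the same for every dimension, so once they are captured in a routine like this, only the table and column names change from one dimension to the next.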

Deploy on an Agile Data Warehouse Platform

So now that I have learned about Elastic Data Warehousing in the cloud, I can’t imagine trying to do an agile DW project any other way. The cloud is the future for data warehousing and the cloud is here now.

Of course I am referring to Snowflake Computing’s DWaaS (data warehouse as a service) offering. Yes, I might be a bit biased since I do work for them now, but…this tech is really good!

From a features perspective, what I am talking about is having a high-powered, easily scalable database that supports BI and analytic workloads and does not require a ton of time to configure and tweak.

Why do I think that is a success criterion? Because I have spent way too many months on way too many “agile” projects waiting to get access to the hardware! Or I get access and we either run out of space (e.g., “we had no idea you need THAT much storage”) or we can’t properly test production-level loads and queries because the development box does not have enough horsepower.

Taking advantage of the elasticity of the cloud solves both of these problems. The folks at Snowflake have built an RDBMS in the cloud that harnesses this elasticity specifically for data warehouse and analytic workloads, letting you scale both storage and compute resources up and down on demand.

That, along with its many other features, gives me the agile infrastructure I need to get an agile data warehouse project off the ground almost instantly. And I can do a Data Vault on Snowflake too.

Very cool.

So what do you think? Are you ready to accelerate your team’s performance and adopt an agile approach to data warehousing?

Kent Graziano is a Senior Technical Evangelist with Snowflake Computing and the author of The Data Warrior blog (http://kentgraziano.com). He is a Data Vault Master and certified Data Vault 2.0 Practitioner (CDVP2), Oracle ACE Director, member of the OakTable Network, former member of the Boulder BI Brain Trust (#BBBT), expert data modeler and architect with over 30 years of experience, including over two decades doing data warehousing and business intelligence, in multiple industries, with multiple architectures. Kent is an internationally recognized expert in Data Modeling and Agile Data Warehousing. He has developed and led many successful software and data warehouse implementation teams, including multiple agile DW/BI teams. He has written numerous articles, authored three Kindle books (including A Check List for Doing Data Model Design Reviews and An Introduction to Agile Data Engineering), co-authored four other books on Data Modeling and Data Vault, and has given hundreds of presentations, nationally and internationally. In 2014, he was voted one of the best presenters at OUGF14 in Helsinki, Finland. Follow Kent @kentgraziano

Copyright Kent Graziano, Senior Technical Evangelist, Snowflake Computing
