What is a data warehouse, and what does the term mean today?
This industry, over the past 20 years, has seen many vendors forging definitions for data warehousing that conveniently matched their current technology capabilities. It’s left the industry in such confusion that it’s important to surface what each party’s definition of data warehousing is before having a conversation about it.
Organizations go on for years and even decades without having this documented. To a large degree, it doesn’t matter what the definition is for the company, but it matters that team members, and the user community, see it similarly. This is especially paramount between the CDO and the VP of Data Warehousing.
Data Warehouse Components
To identify all of the requisite data warehouse components, it is necessary to clarify several aspects. Is the data warehouse the large physical database? Does the term include the data marts? How about the data lake(s), which many are using as a term that overrides their (databases formerly known as the) data warehouses? Does it include only those databases under central control? Only those databases that are “architected”?
Some use the term to mean the entire analytics infrastructure. Is the VP of Data Warehousing the VP of that? This leads me to my next couple of questions.
What is the domain of the data warehouse in the enterprise?
Not every data warehouse (I am using the term to mean a physical database) is for the entire organization – not by a long shot. Unless this data warehouse is actually serving the entire organization, and not just a department, a subject area or a source like an ERP, then there must be multiple VPs of Data Warehousing or data warehouses without a VP. Why is this the case? Is it time to sort this out organizationally?
Follow-on question: Where does your budget come from?
This leads to the next set of primary and follow-up data warehousing questions:
Who is our enterprise data platform VP, and why do we have a VP for a single data platform?
For years, I have been encouraging data warehouse professionals to view themselves as data professionals. Data skills are in high demand operationally as well as analytically, and it’s the analytic side that has generated the most data skills over the last decade. It’s time the operations utilized those skills to, for example, add analytics to operations, do real data modeling for operations and ensure quality data is in use. Keeping their talents for the data warehouse is limiting the organization and thwarting analytic data maturity.
Follow-on question: Are you pushing all our data to the data warehouse?
While the data warehouse remains of utmost importance to organization analytics and is a must-have, it is increasingly the case that without accompanying structures, overall analytics are hampered. The fact is many new analytic functions don’t belong in the data warehouse. It’s usually not because adding data to a platform that is shared is not a good idea, or will not scale. It’s because the existing data warehouse has been built in a non-scalable fashion and/or the needs of the new applications greatly exceed the standards met by the existing data in the data warehouse. This is for load speed, performance, metadata, tooling and data quality. Eventually, even the best of data warehouses need a refresh and new applications aren’t signing up to be the guinea pig for that usually.
There are also technical reasons why the data warehouse, based on a relational database, is inappropriate for some analytic data. The characteristics of so-called big data for example – larger amounts of data with fluent ingest, but with a smaller, science-based user population – make the data warehouse inappropriate by price-performance comparison.
Graph databases, hyperlytical databases, master data management, etc. have a place in most enterprises and the data warehouse almost surely is not built on these. These need to be selectively added, and done so architecturally.
A couple of follow-up data warehousing questions:
Question 1: Do you have the credibility to control enterprise data platform decisions?
Question 2: How do you interface with data governance? With Master Data Management?
What is our analytic data maturity?
In order to competently answer all of these critical data warehousing questions, there are many things the CDO should know about the data environment, so he or she can represent it appropriately when managing the project portfolio.
This question is very encompassing and gets to the heart of many of these things.
Data maturity modeling is understanding where you stand vis-à-vis the possibilities across multiple dimensions. It is a cold, hard, quantifiable, objective look at data use in a company, department line-of-business, or whatever scope is modeled. Although the ratings are interesting and inform urgency, what’s most interesting should be what the next level in each category looks like and making plans to get there.
There are four follow-up data warehousing questions:
Question 1: What is the data quality score for the data warehouse?
Question 2: Why aren’t we using more streaming approaches for data integration?
Question 3: Why aren’t we more in the cloud?
Question 4: What has been the innovation in the data warehouse in the past year?
While answering all of these data warehousing questions in a single conversation is interesting, the ongoing dialog is more important. These two roles are hugely intertwined. The CDO needs to strongly align with those data stores that have high leverage in the organization, not the least of which is the data warehouse.
William McKnight has advised many of the world’s best-known organizations. His strategies form the information management plan for leading companies in various industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading database, data lake, streaming and data integration products. William is the #1 global influencer in data warehousing and master data management and he leads McKnight Consulting Group, which has placed on the Inc. 5000 list in 2018 and 2017.