As organizations handle increasingly diverse, complex, and voluminous data, many are transitioning from on-premises data warehouses to data warehouses in the cloud. In this new environment, organizations are managing real-time data from sources such as machine data and sensor data for advanced analytics, including machine learning.
In this video, Mark Cusack, CTO at Yellowbrick; Mark Atchison, Senior Manager of the Enterprise Data Warehouse at Bread Financial; and Fern Halper, VP of Research and Senior Research Director for Advanced Analytics at TDWI, discuss modern cloud analytics environments, focusing on real-time data management and best practices for migrating, managing, and analyzing real-time data.
Bread Financial’s traditional data warehouse was unable to keep up with the company’s growing data needs, leading to slow query times and difficulties in accessing and analyzing data.
To address these challenges, Bread Financial decided to move to a modern analytics environment with the Yellowbrick Data Warehouse. Learn how Yellowbrick allowed Bread Financial to easily and quickly load and analyze large volumes of data, as well as easily scale as their data needs grow.
Hello, everyone, and welcome to the TDWI webinar program. I’m Andrew Miller, and I’ll be your moderator today. For today’s program, we’re going to talk about Moving from a Traditional Data Warehouse to a Modern Analytics Environment: The Bread Financial Journey. Our sponsor today is Yellowbrick, and for our presentations, we’ll hear first from Fern Halper with TDWI. After Fern speaks, we will be joined by Mark Atchison with Bread Financial and Mark Cusack with Yellowbrick.
Before I turn the time over to our speakers, please allow me to go over a few basics. Today’s webinar will be about an hour long. At the end of our presentations, our speakers will host a question-and-answer period. If at any time during these presentations you’d like to submit a question, just use the ask a question area on your screen to type in your question.
If you have any technical difficulties during the webinar, please click on the help area located below the slide window, and you’ll receive technical assistance. If you’d like to discuss this webinar on Twitter with fellow attendees, just include the hashtag #TDWI in your tweets. Finally, if you’d like a copy of today’s presentation, use the Click Here for a PDF link there on the left middle of your console. In addition, we are recording today’s event, and we’ll be emailing you a link to an archived version so you can view the presentation again later if you choose, or share it with a colleague.
Again, today we’re going to be discussing moving from a traditional data warehouse to a modern analytics environment, and our first speaker today is Fern Halper. She’s Vice President and Senior Director of TDWI Research for Advanced Analytics. She’s well known in the analytics community, having published hundreds of pieces on data mining and information technology over the past 20 years. Fern is also the co-author of several Dummies books on cloud computing and big data. She focuses on advanced analytics, including predictive analytics, machine learning, AI, cognitive computing, and big data analytics approaches. She has been a partner at the industry analyst firm Hurwitz & Associates and a lead data analyst for Bell Labs. She’s taught at both Colgate University and Bentley University, and her Ph.D. is from Texas A&M University. Please welcome Fern; I’ll hand it over to you now.
Great. Thanks, Andrew. Hi, everyone. Welcome to this webinar today on Moving From a Traditional Data Warehouse to a Modern Analytics Environment. We know that you’re busy and we really appreciate you attending today. So this is what we’re going to do: before we hear a case study from Bread Financial, I’m going to introduce the topic and talk about the modern data and analytics environment. Then as Andrew mentioned, we’ll hear from Mark Atchison from Bread Financial about how a cloud data warehouse is enabling Bread Financial to do things that they couldn’t do before and what their cloud data journey looked like.
We’ll also have a roundtable discussion with Bread Financial and then also with Mark Cusack from Yellowbrick about the cloud and the modern analytics environment. Be sure to be thinking also about your questions because we’ll do an audience Q&A at the end, and Andrew gave you the instructions for that.
Okay, so it’s no surprise to anyone that the data landscape is changing. At TDWI, we see organizations collecting an increasing amount of diverse data. This includes not only structured data, it includes unstructured data, it includes other kinds of data: geospatial data, time series data, machine data, and so on. That data comes from many sources. Those sources even include structured data sources such as SaaS systems. With all of this, a number of organizations, even a majority of them, are collecting hundreds of terabytes. We also see organizations collecting petabytes of data.
Why are they collecting it? It’s a business reason. It’s to support analytics and advanced analytics that are becoming more popular. So it’s about self-service analytics, which is where many companies are now, but it’s also about these more advanced analytics such as predictive analytics and machine learning where we see that the demand continues to grow.
Organizations want to be able to do things like predict churn or predict fraud and better understand their customers. They want to be able to deploy new applications that might make use of more advanced technologies like deep learning, say, for image recognition. There’s a number of applications we see for that.
When you think about this more advanced analytics, it’s often very compute-intensive and it’s iterative. The reality is that traditional on-premises data warehouses often can’t keep up with the volume of data, the types of data, and the speed at which the data’s coming. I mean, there are some that can, obviously, but oftentimes we see that they can’t.
I mean, just to give you a sense of some of this new data that’s supporting analytics, here’s a chart from 2022 where we asked the question, “What kind of data is your organization currently managing and looking to manage in the next year?” The blue is managing now, the red is managing in the next year, and the green is no plans.
You can see, of course, from the top bar there that structured data rules, but remember, as I was saying, this isn’t just necessarily data from a billing system; it might be data from a SaaS system or third-party industry data. It could be weather data. It includes internal text and all of that kind of data. You see them all down there on the list: structured data, demographic data, time series data, machine data, text data, geospatial data, all of them. Even machine data, down towards the bottom of the list, is at over 20%, with plans to collect more. And even at the bottom, the video data, also with plans to collect more.
The reason why is that organizations want to be able to enrich their structured datasets with other kinds of data for analytics. That might be extracting sentiment from text data and then marrying that to structured data, or using machine data for proactive maintenance to identify and fix problems with equipment before they occur. The data that’s associated with past failures in that use case might be used to predict the probability of a future problem. And then as new readings are taken, the data goes through the model, and the model can alert people if there’s a high probability of failure.
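The proactive-maintenance flow described here can be sketched very simply: a model trained on past failure data scores each new sensor reading, and an alert fires when the estimated failure probability crosses a threshold. The sketch below is illustrative only; the feature names, coefficients, and threshold are hypothetical stand-ins for whatever a real training process would produce.

```python
import math

# Hypothetical coefficients, standing in for values learned
# from historical failure data. These numbers are illustrative only.
WEIGHTS = {"temperature_c": 0.08, "vibration_mm_s": 0.9}
BIAS = -12.0
ALERT_THRESHOLD = 0.8  # alert when P(failure) exceeds 80%

def failure_probability(reading: dict) -> float:
    """Score one sensor reading with a simple logistic model."""
    z = BIAS + sum(WEIGHTS[k] * reading[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def check_reading(reading: dict) -> bool:
    """Return True if this reading should trigger a maintenance alert."""
    return failure_probability(reading) > ALERT_THRESHOLD

# As new readings arrive, each one goes through the model.
normal = {"temperature_c": 60, "vibration_mm_s": 2.0}
degraded = {"temperature_c": 95, "vibration_mm_s": 7.0}
```

In a real deployment the scoring step would sit in a streaming pipeline fed by the machine data, with the enriched results landing back in the warehouse for analysis.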
The point is that enriched datasets are valuable datasets. So organizations moving to analytics, looking to capture all different kinds of data and higher volumes of data are more often moving to the cloud. A major trend then is this move to the cloud.
Here’s another chart, again from the 2022 survey, where we asked, “What data management and analytics tools does your organization currently use? Please select all that apply.” You can see at the top of the chart that 61% are still using a data warehouse that’s on-premises, along with analytics tools that are on-premises.
But if you just go down a few bars, you can see that 55% are already using a data warehouse in the cloud. And so, what that says to me is that the gap between a data warehouse in the cloud and the data warehouse on premises is shrinking. We’ve been tracking this data for a long time, and it used to be that we would see 75, 80% of respondents had a data warehouse on-premises and a lot fewer had one in the cloud. But that gap is actually shrinking. About 50% of the respondents to this survey were using both an on-premises data warehouse and a data warehouse in the cloud. So they’re doing both.
Likewise, more respondents in this survey were using a data lake in the cloud than one on-premises: 44% versus 20%. So they’re moving straight to the cloud if they’re looking to store data in a data lake. What this means is that hybrid environments are the norm. For instance, I was showing you that 50% of the respondents were using a data warehouse both on-premises and in the cloud. So they’re moving to the cloud, but they still have an on-premises deployment, and they may always have on-premises deployments. Likewise, they’re often moving to more than one cloud, and that’s also a hybrid environment. The question becomes, how do you enrich datasets in a hybrid environment for analytics, and how do you bring all of this data together?
So oftentimes, that’s where we see organizations making use of a data fabric approach. We’re defining a data fabric here as a way to map and connect relevant application data stores with metadata together. In essence, the data fabric is stitching together disparate data in an intelligent fashion.
One example of a data fabric approach is data virtualization, where data might be integrated via a layer that connects a number of different systems. And so, it creates these logical views in which the data looks consolidated, although the data hasn’t necessarily been moved or physically altered. So when users go to use the virtualization layer, they don’t necessarily have to worry about what’s beneath. They just see that virtualization layer.
In fact, in a recent survey, the majority of respondents to that survey felt that data virtualization was a good choice for integrating and unifying data because it gets rid of data replication issues and makes data integration more real time. That can really help with the data democratization efforts, the self-service that I was talking about earlier, because the data virtualization layer creates this logical view in which the data looks consolidated. So people aren’t necessarily going around searching for all of their data, they’re getting it through the abstraction layer that can also point to self-service BI tools.
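The data virtualization idea described above can be sketched in a few lines: the sources stay where they are, and a logical view joins them at query time so consumers never touch the underlying systems. The two in-memory "sources" below (a billing store and a CRM) and all the field names are hypothetical, purely to illustrate the shape of the abstraction.

```python
# Two hypothetical source systems; neither is copied or altered.
billing_rows = [{"customer_id": 1, "balance": 120.0},
                {"customer_id": 2, "balance": 0.0}]
crm_rows = [{"customer_id": 1, "segment": "gold"},
            {"customer_id": 2, "segment": "silver"}]

def logical_customer_view():
    """Yield one consolidated row per customer, joined at query
    time. The data stays in place in each source system."""
    segments = {r["customer_id"]: r["segment"] for r in crm_rows}
    for row in billing_rows:
        yield {**row, "segment": segments.get(row["customer_id"])}

# Consumers query the view, not the underlying systems.
unified = list(logical_customer_view())
```

A real virtualization layer does this federation across databases, SaaS APIs, and files, with pushdown optimization, but the consumer-facing contract is the same: one logical view, no physical consolidation.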
What we see is a number of organizations deploying a data fabric approach. This comes from a 2022 survey where we asked, “What do you believe is the best way to unify your data storage environment?” You can see on the left there, that first green bar: about 30% of the respondents were saying, “We’re going to centralize all of our data in a cloud data warehouse or data lake platform.” So that’s how they were going to unify their storage environment.
You can see that about 20% said they were going to utilize a data fabric approach. And the majority, about 45%, said it was going to be a mix of approaches where appropriate. They’re going to use both where it makes sense, and that probably makes sense. I mean, so that’s a brief introduction to today’s topic. What I’m going to do is turn it over to Andrew so he can introduce our case study speaker. So Andrew, over to you.
Thank you so much, Fern, for that great presentation. Before I introduce our next speaker, just a quick reminder to our audience that if you do have a question, you can please enter it at any time in the Ask a Question window. We will be answering audience questions in the final portion of this program.
Our next speaker is Mark Atchison with Bread Financial. Mark has spent the last 19 years at Bread Financial and is privileged to lead teams that built and supported Bread’s enterprise data warehouse from conception to maturity. It delivered governed client dashboard data by 4:00 AM daily, Sarbanes-Oxley financial reports, and workforce planning for hundreds of end users and thousands of employees.
Please welcome Mark, and I’ll hand it over to you now. For those that are tuning in, Mark is dialed in today, but you will still be able to hear him. Thanks so much, Mark. Go right ahead.
Hey, thanks, Andrew, and thank you, Fern, for the great lead-in there. Yeah, so I appreciate the opportunity to tell you a little bit about our journey and a little bit about our organization.
A quick overview of the agenda: we’ll talk about Bread Financial real quickly, who we are, and then what has been our guidance for our journey through data and analytics and what we’ve done over the last… I don’t think we’re actually going to go back to 2000, that’s a bit far, but maybe over the last decade or so. And then we’ll talk about what our plans for the future are and how we see, obviously, cloud continuing to drive and eventually dominate our data and analytics platforms.
Bread Financial is, as you can see, a financial services company. It is a very rapid-growth organization; it’s grown, as you see there, about 5,000% over 20 years. I’ve been there about 20 years. I don’t know if I get to claim cause and effect there or not, but maybe I’ll see if that’s something my boss thinks was a true connection. That growth has been a driver of our data analytics journey and of our decision-making processes for how we will grow into the future.
The demand for data that comes along with growth like that is just significant. For example, on the data warehouse side, about 12 to 15 years ago we began the data warehouse for 20 analysts. Now there are 20 analytic departments, and we have 1,000 direct end users of our data warehouse and thousands more that are indirect consumers of that data through business intelligence platforms and other environments like that. The appetite for what we can do is very high.
Like a lot of organizations, we’ve been driven by and matched ourselves against the EIM maturity model. We’ve probably taken a little bit of a lily pad approach to all of these different objectives and goals, with different degrees of success. But our journey has been very fast-moving.
Specifically, if we talk about how we began and how we have moved toward cloud: we began entirely on-premises about a decade ago, 12 years ago, with the Netezza data warehouse appliance (Netezza was later acquired by IBM). We were really only single-purpose at the time. We were just trying to relieve the cost and expense of these 20 people running analytics on our mainframe, our banking systems.
Besides relieving those costs, we also needed to meet some emerging data security requirements that just couldn’t be met effectively on the mainframe. Well, we did pretty well there. And before we knew it, 20 analysts turned into 200, turned into 500, and now for several years it has been almost 1,000 of these end-user analysts using different tools against the data warehouse. We’ve moved into business decision support. As a financial institution, of course, one of the major focuses we have is on managing losses and managing fraud. Fraud is a heavy consumer of the data warehouse.
For all of our banking customers, we run call centers, which is a huge investment for our organization. All of the tooling and planning that goes into handling the caseloads and call loads within the care centers bounces off of the data warehouse as well. We moved forward, we got a few years into that, and we had success. We had a huge pile of raw data, and it was becoming more and more difficult for large groups of these users to make sense of it and come up with the same answers.
And so, that moved us into the early stages of data governance practices: defining data marts and coming up with governed definitions of metrics, CDEs, and KPIs so we could start to provide that one version of the truth. We then also started to see additional curation, aggregation, and time series of this data. This was really the foundation for a lot of the analytics that now began to take over.
As we moved towards the end of those first five or six years, our consumption layer started to evolve as well. Originally, a lot of our end users were just using desktop SQL. Actually, an alarming number of people still use desktop SQL; we probably have more SQL programmers than a lot of organizations do. But everything was also on-premises for analytics: primarily SAS as a tool for modeling, a tool for analytics, for ETL, and on-premises MicroStrategy. Everything on-premises.
And so what we began to see in the middle of the last decade was all of those tools starting to migrate to the cloud. The MicroStrategy cloud implementation was completed. We moved away from SAS and initially into, again, a lot of just hardcore programming, Python, R, things like that. But now for the last couple of years, we’ve been having great success with the Dataiku product, again in the cloud.
In order to take advantage of those, that meant that this huge treasure trove of data that we had put together in the on-premises data warehouse had to be accessible to the cloud. Of course, it could be accessible through direct SQL access, but the amount of data that you’d be moving back and forth for every single analytical query was just overwhelming. It was not efficient or effective.
And so, we came up with the concept of a data hub, an enterprise data hub, or the beginnings of our data lake and essentially moved a copy of that data from our on-premises data warehouse appliance out to the cloud to enable all of these cloud tools. Beyond that, then those tools now started to build out additional capabilities, machine learning, self-service analytics for both associates or employees, internal self-service, and external for our clients and customers.
But the process was initially still all of the data routing through our on-premises data warehouse. Of course, that made it not a single point of failure, but a single point of entry, which meant it was a roadblock at times. And so, we began ingesting data directly into the data hub and into the data lake. Maybe it’s a little bit of a backward approach as to how you would start to generate cloud data because we started on-premises, but now at this point, our footprint of data in the cloud is much larger than the footprint of data that we have in the on-premises data warehouse.
In about 2018/2019, IBM decided to get out of the appliance business, and we’re glad they did, because that then brought us to Yellowbrick. Although we looked at some cloud platforms at the time, we didn’t find a great fit in 2018/2019. It wasn’t so much the capabilities of the data warehouses in the cloud (the funding model wasn’t great for us then), but we just had such a dependency on the on-premises appliance with regard to use cases and business processes that moving all of those in a rapid fashion was not really feasible.
We ended up selecting the Yellowbrick appliance as a successor and couldn’t have been happier with that choice. It was fundamentally what we wished IBM and Netezza would have grown up to be. And so, we really leveraged that heavily in the time since that transition.
I think it ended up being about twice as big as our Netezza footprint, and we thought that would last; with Moore’s Law, it lasted about two years. And so, we found ourselves in 2021/2022 essentially maxing out that on-premises appliance. So now, just within the data warehouse space, we were at a decision point of, “Do we invest in additional on-premises capacity, or is this the time for us to start looking at cloud?” As we looked around, at what you see across the top line in the previous slide and this slide, you see all of our consumption layer moving to the cloud.
We’ve also now seen a roadmap from our overall IT office that they would like to be out of the data center business by 2025/2026 entirely. The choice is pretty clear. We are now at the end of the journey so far. As we look at 2023, this is where we are. We want to see this move toward the data fabric environment.
We see some of the motivating reasons. We’re moving to cloud out of necessity and out of opportunity. We’ve had great success with our tool deployments and our use cases in the cloud. We haven’t moved all of the business dependencies from on-prem to the cloud, but the ones that we have in the cloud have been very successful. We want to be able to take advantage of some of the things Fern discussed there. We want to stop making copies of the same data over and over and over again. We would like to see that virtualization layer where it makes sense. It may still make sense for us to create time series or aggregations or CDE-type definitions or additional copies of that data in the cloud, but we want to have a much more intentional model there where we’re taking advantage of tools and processes that are part of what a data fabric movement is all about.
I skipped over a couple of our roadmap pieces, but I think the main point is still just that while we’ve had incredible success on-premises and it’s been the way in which we’ve grown so far, it’s clear that if we’re going to continue to keep up with the pace of our business and where the tools are going and where our IT roadmap is taking us, we need to go in this direction.
Yellowbrick on-prem will continue to be incredibly important to us for some time, just because of that dependency and all the business processes and decision support use cases that already exist; migrating those will take a while for the business. But Yellowbrick’s cloud offering will be a fundamental component of our logical data warehouse as we dive into that next year and beyond.
Hit most of my points. Andrew, I will send it back to you, and if you want to maybe get some perspective from Yellowbrick. Thanks.
Thank you so much, Mark, that was a great presentation. This does bring me to our next speaker today, who is Mark Cusack, the Chief Technical Officer at Yellowbrick. Before joining Yellowbrick, Mark was Vice President for Data and Analytics at Teradata, where he led a variety of product management and technology teams in data warehouse and advanced analytics groups. He was also the Chief Architect of Teradata’s Internet of Things analytics effort. Mark joined Teradata in 2014 when Teradata acquired the startup RainStor, where he was the co-founding developer and chief architect.
Prior to RainStor, Mark was a lead scientist in the UK Ministry of Defence. Mark holds a Ph.D. in computational physics from Newcastle University in the UK, with a thesis centered on the electronic and non-linear optical properties of quantum dots. As a research fellow at Newcastle, he developed new techniques to model these novel quantum structures using large-scale parallel and distributed computing approaches. Please welcome Mark. I’ll actually bring back Fern so that we can begin the panel discussion.
All right. Welcome both Marks, I guess, Mark from Bread Financial and Mark from Yellowbrick. Let’s get into our panel discussion here with the first question about… We’ve been talking about the cloud and some opportunities, let’s talk more about that, what opportunities do you see for the cloud and advanced analytics? For example, what can be done that couldn’t be done before? Mark Cusack, I’ll start off with you since you were just introduced.
Hi, Fern. Nice to be here. Well, I think Mark explained a lot of the reasons and rationale for what they couldn’t do before on-prem and where they want to go to in the future. But of course, for companies like Bread Financial and many others that grew up in on-prem data centers, they were typically running their analytics, their core data warehousing in a very fixed capacity. What they would end up doing is tuning those resources to make the maximum use of the floor tiles they had in their data centers.
But they really had little in reserve to take on additional workloads. I think obviously what cloud brings you is elasticity. It brings you the ability to start to experiment, to scale out, to try new use cases that ultimately will drive more business value much, much faster. I think there are other aspects to it as well: as data gravity moves into the cloud, there are more public datasets available, and there’s a growing breadth of cloud-native AI and ML services and data integration services to take advantage of there. And so, the center of mass around data analytics is really moving there, and I think that’s opening up a lot of new opportunities too.
Are there any use cases that you see your clients undertaking with some of these new advanced analytics?
Absolutely. It really speaks again to the ability to take advantage of aspects like data sharing and new public datasets. And so, we have customers, for example, companies we’re working with, that are augmenting their existing data warehousing workloads within the cloud to drive a lot of AI and ML work. One company we’re working with today, for example, spends a lot of time using drones to take photographs of natural disasters or even of things like electricity transformers.
And so, they do a lot of machine learning and manipulation to understand, for example, the risks of these transformers malfunctioning and causing wildfires. And so, what they’re doing is they need the capacity of the cloud to store millions upon millions of images, and they need the ability of the AI and ML capabilities in the cloud to do that image processing. But they still need the data warehouse to actually bring a lot of that metadata together to allow them to do a lot of analytics over that too. So we’re seeing a whole bunch of new use cases of data lakes meeting these new ML opportunities.
Yeah, really interesting. Mark Atchison from Bread Financial, what about opportunities? I know you talked about some of them. Are there others that you’re thinking about that can be done that couldn’t be done before?
Yeah, no, that’s all really spot on with what Mark Cusack just gave us there. That has been our experience as well. I think we probably felt like we were about as good as anyone at maximizing what we could do on-premises. We pushed all of our tools to the edge. But that’s the edge, and now what do you do next? What we found, also, is that the newer tools, the newer opportunities, whether they be in machine learning or business intelligence, any of those tools, are investing in the cloud as well. So you want to be where that investment happens.
I actually was on a work call this morning talking about… Our end users, our end consumers, 100 million or so of them, do lots of things. They do those things online, they do those things on their phone, and they’re doing those things through our app. There’s so much data out there. Being able to unite that unstructured data with the structured metadata that goes with it, that’s beyond our ability today. We really can’t do that. It’s a struggle just to harvest the structured data off of it and bring it into the data warehouse.
But to be able to put those together under a data fabric and do both, to deep dive or read intentions or read customer sentiment data and see how that aligns with expenses going up, profits going down, whatever that might be, say profits going up, that’s more fun, that’s something we just can’t do today.
Actually, just to follow on a little bit from what Mark said, I think he makes a really good point. As vendors have moved to the cloud over the last 10 years, we’ve taken the opportunity to really rethink the user experience around data and analytics as well, particularly in data warehousing. If you look at what the UIs and user experience in data warehousing looked like 10 years ago, it looked terrible. I think today we’ve taken a fresh look, with much more accessible APIs and easier ways to join different services together in the cloud. All of that, at the end of the day, is driving much more of a self-service-capable experience than we’ve ever had before.
Yeah, I mean, so building on that, in terms of this data warehouse, Mark Cusack, I’ll start with you, what is the role of the data warehouse in future cloud architectures?
The data warehouse will not go away. I think it’s been a pivotal element of every enterprise’s data strategy for the last 20 or 30 years. Very often the crown jewels of data are in that enterprise data warehouse. I don’t think that picture will change. But what we’ve obviously seen over the last five to 10 years is the growth out into data that isn’t directly suited to highly structured SQL analysis. We’re looking at more semi-structured and unstructured data.
I mentioned the image use cases before, but at the end of the day, any output from an ML process is actually structured in nature. And so, what we’ll see is just more and more data, whether it’s metadata from ML outputs and processes or scoring results, going into a data warehouse and being managed there. I think it’s still going to be a first-class citizen as part of a wider data lakehouse architecture as we build out.
Yeah, that makes sense for sure. It seems like more organizations are looking at having both a data warehouse and a data lake or thinking about them as one, trying to unify in some way. Mark from Bread, how about you, what role do you see the data warehouse even at your company in future cloud architectures? You showed that picture, so it’s obviously part of the future cloud architecture.
Yeah, for sure. In our case, I would’ve said a year ago that the data warehouse was the foundation of our cloud architecture. We inverted it: we started with the data on-prem, and we pushed that data warehouse out, copying it to the cloud. Over time, we’re considering how we move our use cases there, but it’s really not that at all now.
Now the data warehouse is almost our unit of growth in the cloud. So rather than having one data warehouse that we’ve inverted and made available in the cloud to leverage there, there are seven or eight or nine data warehouses that we have acquired through acquisition and through growth, and they’re already existing data structures. Decomposing those is probably neither feasible nor necessary. How we put together two, three, four, five, six, seven data warehouses in any kind of coherent fashion, that’s becoming the unit of growth for us in the cloud, not just the foundation of how we started.
Yeah, that’s interesting.
I think that move to a logical data warehouse is incredibly important. Again, going back to that first question in a sense, around what can you do in the cloud that you can’t really do on-premises: when you start to have a logical data warehouse with this data fabric, data virtualization layer across it, obviously it allows you to reduce the amount of data copying and movement that you have to do, which is incredibly important. But it also actually makes it easier to satisfy regulatory use cases and data sovereignty use cases.
We have customers, for example, that themselves have customers in India whose data cannot leave the Indian subcontinent. And so, with cloud, it makes it easier for them to spin up a Yellowbrick instance in India. And if they want to do some level of data virtualization that matches the regulations they need to adhere to, they can do that. I think the future is delocalized schemas, it’s data fabrics, and it’s cloud data warehousing, tying this all together.
Especially to address some of the more advanced use cases we were talking about. I know we got a couple of questions, from registrants and others, around actually migrating to the cloud. This question is a good one: what are the top challenges faced by enterprises when they’re trying to move to the cloud? Mark from Bread, let me start with you. Were there any challenges you faced when trying to move to the cloud, and how did you address them?
Yeah, naturally, of course. Anything is going to present some amount of challenge, but we’ve had a lot of success adopting the cloud. Moving implies leaving on-premises behind, and that’s the biggest challenge for us. We’re expanding very well, the cloud has been very, very good to us so far, and there’s a huge future ahead of it.
Moving to the cloud is a little different than just adopting, expanding, and adding the cloud. I think that’s our biggest challenge. When you talk about adoption, it’s not just that we have created the capability and made those tools and that analytics available, it’s getting the business and the customers to adopt it as opposed to continuing to rely on prior platforms.
So more of a cultural type of thing. Mark from Yellowbrick, what about you, what do you see your clients being faced with, and what are they doing to overcome it?
I mean, again, on the cultural side of things, it often leads to a skillset gap, because now you’ve got to retool and re-skill your organization to deal with new concepts, new tooling, and new approaches in the cloud. Things can get particularly fraught if you’re maintaining that hybrid stance, where you potentially have different technology stacks in the cloud and in your on-prem data center. I think that’s one potential issue.
I think another one is: where do you even start? I always come back to the data warehouse, obviously, I’m biased that way, but when you think about migrating your data warehouse and your workloads from on-prem to the cloud, it’s not just the data warehouse you’re moving, it’s the entire ecosystem and all of the connective tissue: upstream, the data sources, data integration, and ETL parts of the equation, and then downstream, the BI tool and reporting layers as well.
And so, there’s a tendency, what do you do? Do you take the move to the cloud as the opportunity to start with a blank sheet of paper and completely rethink your analytical ecosystem from scratch? Or do you say, “Well, it’s too expensive or time-consuming to move everything at once. Maybe we move the data warehouse but try to maintain those ‘legacy’ components around it as well”? What I see working best among our customers is the don’t-boil-the-ocean approach: they make gradual moves over time, prove success, prove some ROI on it, and then move on. Otherwise, it can be quite a hair-raising experience.
And if the customers were using Yellowbrick on-prem and then they’re moving to the cloud, I know there’s a lot of plumbing that obviously goes with it, but is it easier in some respects because it’s a platform they’re familiar with, I guess is what I’m asking? Does that piece of it at least make it easier?
Yes, it is a huge help actually, Fern, because you’re able to run the same workloads in Yellowbrick on-prem that you can run in the cloud, so there is no workload conversion or workload migration that needs to take place. You maintain all of your linkages to that legacy ecosystem tooling around ETL and the BI side of things even in our cloud instance. Again, with Yellowbrick, you don’t have to boil the ocean: you can modernize, move to the cloud with Yellowbrick Data Warehouse, move your workloads there, but still maintain your ETL processes until the time comes that you want to migrate those over too, if you wish.
Okay. Yeah, that definitely makes sense. We were talking about the logical data architecture and the data fabric, so let’s talk a little more about that. What about the logical data architecture in terms of bringing data together for analytics? Mark from Yellowbrick, I know you were talking about that, but how does that typically work in cloud environments and in hybrid environments, from your perspective?
Well, it works in pretty much the same way, although you’ve got some additional considerations to account for. We have partners like Denodo that can provide data virtualization above Yellowbrick and allow you to join Yellowbrick data with third-party data warehouses or other sources like that. That’s existed on-prem, and it works as is today in the cloud.
I think the thing you have to start to consider is the cost of doing this, because cloud providers always attach a cost to things like moving data and making API calls, things you probably wouldn’t necessarily think of in an on-prem environment. There, those costs are basically amortized over the lifetime of the spend on your on-prem data warehouse.
So you really have to start to think, “If I’m going to join data in AWS with data in Azure, what’s the cost of moving that data to join those two tables in different locations?” And so, you have to start to think about which technologies will allow you to minimize the amount of data movement when you’re doing virtualization or query fabrics and laying these out. You also need to consider other aspects, like data latency, that start to become more apparent.
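To make that cross-cloud cost concrete, here is a minimal back-of-the-envelope sketch. The per-GB egress rate below is an illustrative assumption, not any provider’s quoted price, and the table size and schedule are hypothetical.

```python
# Back-of-the-envelope egress cost for shipping one table across clouds
# to perform a join. The rate is an illustrative assumption only; real
# per-GB egress pricing varies by provider, region, and volume tier.

EGRESS_RATE_PER_GB = 0.09  # assumed $/GB, for illustration


def cross_cloud_join_cost(table_size_gb: float, runs_per_month: int,
                          rate_per_gb: float = EGRESS_RATE_PER_GB) -> float:
    """Monthly egress cost of copying a table out of one cloud each run."""
    return table_size_gb * runs_per_month * rate_per_gb


# A nightly job joining a 500 GB table in AWS against data in Azure:
monthly = cross_cloud_join_cost(table_size_gb=500, runs_per_month=30)
print(f"~${monthly:,.2f}/month in egress alone")  # ~$1,350.00/month
```

The point of the sketch is the one the panel makes: pushing processing to where the data lives turns that recurring per-run transfer into a one-time or zero cost.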
Yeah, I’ve heard about latency issues. But my understanding was that some of that has gotten a lot better, even for something iterative like building a machine learning model. It used to be that data virtualization wasn’t so hot at that, let me say it that way. But performance has gotten better.
I think obviously, yes, it has gotten better, but I would say things like the egress charges that the cloud service providers levy start to build up, and they can become more and more of a serious cost. So you need to give real thought to how you target data fabric technologies that will push more of the processing towards where the data lives in the cloud, and not try to pull everything together and unify it in one place, because that will get expensive.
Yeah, it’s a good point. Mark from Bread, how about you? I know you’re just now moving to data fabrics. Is there anything you want to add about logical data architectures at your company?
I think I’ll tell you in a year, right? But it’ll be dictated to some extent by the tools: what works best with the tools we’re employing for our workloads and the ones that get adopted in the cloud. I think I mentioned we have essentially 1,000 SQL programmers at our company sitting on top of our Yellowbrick on-premises appliance, probably about nine of whom write good SQL. That’s not the way we want to analyze data as a mature organization. I’d rather have about 20 people writing SQL and 980 of them using cloud tools to access that data, and to do that use case I mentioned from this morning, where today we can harvest some of that data and put it in a structured environment. But they also want to be able to deep dive, to look at the sentiment data that goes along with that structured data. The tools that can enable that kind of capability, what works for them, and what physical and logical setup works well with those tools, I think that’s going to dictate where we go in the first couple of years.
Here’s a general type of question in terms of what you see working and what’s not working, getting at some more of these points. Mark from Yellowbrick, let’s start with you: what are the pros of the cloud, the pros of the data fabric, and the cons of both?
Wow, that’s a huge question. If we start kind of –
I know. And you have two minutes to answer it.
Okay, then. Challenge accepted. From a wider perspective, if you think about cloud challenges in general, it’s about the ongoing operational expense of running in the cloud. You’ve got this wealth of elasticity, which is fantastic for addressing these use cases, but more troubling if you have an inelastic wallet. When you can have unfettered spend on the cloud for new use cases, you’re going to start to see some sticker shock at the end of each month with your bill. There is a level of unpredictability, and a lot of growing complaints about the magnitude of spend on the cloud.
And so, I think one of the things that is really important is to get early tracking in place of your spend and your costs, and to start looking to optimize from the outset. From a data fabric perspective, I touched on some of these things a little earlier. Now, the most efficient thing you could ever do is create a centralized data warehouse with all your schemas in one place. You can join free of charge, effectively, to your heart’s desire, with great performance, et cetera. But in today’s world, and for example in Mark’s case with Bread Financial and all the acquisitions they’ve done, you’re going to end up with decentralized schemas and a logical data warehouse. There’s no way around that.
As I mentioned earlier, data fabric approaches that apply automation around metadata management and cataloging, that make it easy to join remotely and to combine datasets across geographies if necessary, and to do so efficiently, are really critical. That’s the kind of thing you need to look at when you’re thinking about these things.
Yeah, that makes sense. Mark from Bread, do you have anything to add there?
I think for us it’s changed the ROI game a little bit, right? When we made the on-premises investments, that was a large CapEx expense upfront, amortized over years, and the people pressing us and asking us to build things in that environment generally got to follow the same calendar, which was a luxury for them. And I’m not sure that we ever had a good way of measuring that ROI.
Now if you’re in a position of pitching a three-week, three-month, even a three-year initiative on the cloud, you better know at the end of it what you got back. So it’s not so much that it costs money in the cloud or that there’s expense associated with the effort, it’s that you want to choose the ones that are going to give you the best ROI, and I don’t know that that’s a bad thing.
No, that’s a fair point. It’s absolutely fine if you’re spending a fortune on the cloud, providing you’re actually getting real business value out of it. So totally agree.
Yeah. I want to leave time for audience Q&A, so I just have one last question here: any other advice or best practices you want to add in terms of data analytics, data fabrics, and the cloud? Mark from Yellowbrick, let’s start with you.
Yeah, following on a little from my previous response, I think it’s really important to get cost governance controls in the cloud in place from the outset, and to have very much an iterative cycle as you’re building new requirements and applications to serve new use cases in the cloud: constantly reviewing and reevaluating how you’re spending and looking for opportunities to optimize.
As a blatant advertisement for Yellowbrick, what we’ve tried to do as we deployed our data warehouse in the cloud is reduce spend by making use of the most efficient underlying cloud hardware available. You don’t have to just consume scale, scale, scale to get the performance you need, and we try to do that at a much lower price point. But regardless of where you’re going in the cloud and what your use cases and technologies are, I think getting strict cost governance and monitoring in place is really important.
That’s a good one. How about you, Mark from Bread Financial? What other advice do you have?
Yeah, I mean, easy advice: hire the best people, find good partners, good tools. I find myself like the guy from the college keg party sitting in the middle of two Ph.D.s today, so that’s probably a good example of how to do it. But those are key. It’s easy to go off into the cloud and wander without good direction, get stuck, and not know where to go. It’s really important to have good partners and to find people who have been there before. For a lot of things in the cloud, we feel like we’re almost the first ones doing it, but there are still good partners who can help you choose left versus right, or whatever that might be. And that’s really key, because you do have to manage that ROI much more closely.
Yeah, both good sets of advice, for sure. Thank you both so much, by the way. I’m going to hand it over to Andrew now, and we’ll go to audience Q&A, so let’s take some audience questions.
Absolutely, yeah, thank you so much. Great panel discussion, everybody, that was fantastic. I’ll move into the audience Q&A now. For this particular question, I’m going to start with you, Fern, but if either of the Marks have something to add, please do so. We’ve had a few questions come in from the audience regarding the data fabric. Fern, can you briefly explain the data fabric a little more in-depth, to set the stage for those who need it?
Okay, sure. Yeah, and I see some questions here about data fabric, data mesh. No, not the data mesh. When I think about a data fabric, I think about it simply as a way to stitch disparate data together from different systems, whether they be on-premises or in the cloud. It’s not simple to do, but I think that’s the definition, that’s what it is. One way to accomplish it, as I said, is via a data virtualization layer. Yellowbrick, you mentioned Denodo is a partner; that’s an example of a data virtualization layer.
That’s different than a data mesh, which isn’t necessarily an architecture, it’s a socio-technical paradigm. That is a whole other webinar in and of itself. You can use a data fabric approach in a data mesh, I guess, but I would say go to the TDWI website, go into research and resources, and we have a bunch of papers about the data mesh. Read some of that and understand our perspective on it.
Mark from Yellowbrick, maybe you should talk a little bit about how you define the data fabric.
I mean, I think you pretty much summed it up. I think the critical part is not just data federation and virtualization, it’s the automated management of the metadata and catalog around it, so you don’t have to go off and search and understand different data sources in different locations to be able to stitch them together. There’s a lot of automation around it as well. I could wax on about data mesh for quite a long time, but I won’t, Fern. I have quite strong opinions about it. I think the jury’s still out on how successful it will be, but I’ll leave it at that.
Yeah, we just had an executive summit in Orlando, and on day two, after we’d talked about the data mesh on day one, we asked the audience what the most overhyped thing we’d discussed was, since we’d covered cloud and data management and whatnot. Two-thirds of the audience said data mesh. So I agree with you that the jury is still out on that.
Okay, thank you for that. I think that was helpful for those who still had questions regarding that. The next question is for Mark Cusack, and it has two parts. The first part: does Yellowbrick work both on-premises and in the cloud? The second part: does it work with all of the major cloud providers?
Yes and yes. So, very quick answer. All major cloud providers are supported, and it works on-prem, in all the clouds, and even in a hybrid stance as well.
Fantastic. I’ll move on to Mark Atchison for this question: “Mark, will you be using open source or commercial machine learning products? Are you able to share what products you use for your analytics?”
Yes and yes again, like Mark answered on that one. There’s a lot of incentive to move quickly, and a lot of times you move very quickly with open source tools. But there’s also so much rich capability and content being developed in some of the commercial tools that we have built most of our externally facing, productionalized analytics platforms on top of commercial tools. I think I mentioned our heritage: on-premises, we were all SAS, and then eventually Python programmers. Now we are invested in Dataiku. We’re using Databricks for a lot of data movement. We have been through DataRobot as our machine learning platform, and we’re now looking into Alteryx and some other tools that we think may be a better fit in the cloud for us. So it’s an array, but we’ve definitely been open to, and really dependent on, using open source tools as well, especially for velocity.
All right, fantastic. Back over to Mark Cusack for this one, “How does Yellowbrick support self-service analytics? Is there a marketplace for analytics, tools, and services?”
Well, the way we support self-service analytics really speaks to two different areas. First is our user experience, in the cloud in particular. We’ve made it very, very easy for business analysts and data scientists to serve up data, run reports, run queries, and run their favorite data science tool against Yellowbrick. Our web-based UI makes it very easy for folks to do exploratory data analysis. We even have some simple visualization capabilities within the UI, so you can get started and do some iterative development very, very quickly.
The other aspect that’s really important with Yellowbrick is that we maintain PostgreSQL compatibility from the outside, even to the extent that we use Postgres ODBC and JDBC drivers. The critical thing here is that this opens up the entire open source Postgres ecosystem of client tooling, which you can use directly against Yellowbrick. Mark mentioned Python and R earlier; all of the Python and R client tools that data scientists might want to use against Yellowbrick, they can just use the Postgres versions out of the box. It’s very easy. We have this pre-built analytical ecosystem around us that makes self-service very, very easy.
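As an illustration of that Postgres compatibility, a minimal sketch: since Yellowbrick accepts standard Postgres drivers (per the discussion), an ordinary libpq-style connection string is all a client tool needs. The host, database, and user names below are hypothetical placeholders, not real endpoints.

```python
# Sketch: because the warehouse speaks the Postgres wire protocol (per the
# talk), any libpq-based client (psql, psycopg2, SQLAlchemy, R's RPostgres)
# can target it with an ordinary DSN. All names below are hypothetical.

def pg_dsn(host: str, dbname: str, user: str, port: int = 5432) -> str:
    """Build a libpq key/value connection string."""
    return f"host={host} port={port} dbname={dbname} user={user}"


dsn = pg_dsn("yb.example.internal", "analytics", "analyst")
print(dsn)  # host=yb.example.internal port=5432 dbname=analytics user=analyst
# With the psycopg2 package installed, connecting would then be just:
#   conn = psycopg2.connect(dsn)
```

Nothing Yellowbrick-specific appears in the client code, which is the point Mark is making: the existing Postgres ecosystem works unmodified.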
Fantastic. Piggybacking off of that, this one, again, is for Mark Atchison: who in the organization is performing self-service analytics on your end? Are business users doing that?
Yes, very much the business users. Those former SAS programmers have become the adopters and consumers of that data, using the self-service analytic tools. A lot of times they’re doing rapid development, prototyping, maybe a proof of concept, and even presenting some of those results out to our clients and customers. It’s typically our internal business users, the power user group, those folks who were typically on SAS or doing that Python and R programming; those are the folks who are mostly using it.
Okay, fantastic. I think we have time for one more question here. This one could be answered by both Marks, but Mark Cusack, I’ll start with you, and then Mark Atchison, if you have something to add, please do so. Mark Cusack, from your perspective, with data shared among repositories, applications, and analytics tools, both on-premises and in the cloud, what challenges have you seen your clients face with synchronizing metadata and definitions?
Yeah, that’s a really great question, and it’s one that I don’t think has simply a technical answer; there’s an organizational and cultural answer to it as well. I’m going to go in the direction of saying that part of this is about having some level of centralized governance to start with. You’ve got to decide as a corporation your data definitions, the data types you’re going to use, the security standards. You need to have all of this in place, a common set of standards, a kind of lexicon if you will, to allow your different departments and lines of business to agree on what the metadata means in the first place, before you get to synchronizing it. There are a variety of tools and data cataloging capabilities for technical and business metadata out there, but I think you need to start by fixing the process challenges rather than the technology challenges.
Yeah, I would echo that. In our case, we had to stay away from boiling the ocean. We didn’t need the cloud to make this difficult; we were already struggling with it on-prem, like all organizations. But we were actually pushed by regulatory requirements to focus in on a smaller subset of our metadata, really try to nail that down first, and then build out from there, as opposed to trying to bring all of those systems into a centralized governance environment all at once. That’s just too much.
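The centralized-lexicon idea both speakers describe can be sketched as a simple pre-sync check: validate each department’s schema against corporate definitions before any catalog synchronization. The column names and type labels below are illustrative assumptions, not anyone’s actual standards.

```python
# Toy sketch: checking department metadata against a centrally governed
# lexicon before synchronizing catalogs. All names/types are illustrative.

CORPORATE_LEXICON = {
    "customer_id": "int64",
    "order_date": "date",
    "amount": "decimal(18,2)",
}


def validate_metadata(dept_schema: dict) -> list:
    """Return (column, department_type, expected_type) conflicts."""
    conflicts = []
    for column, dtype in dept_schema.items():
        expected = CORPORATE_LEXICON.get(column)
        if expected is not None and expected != dtype:
            conflicts.append((column, dtype, expected))
    return conflicts


# One acquired warehouse stored customer IDs as strings:
print(validate_metadata({"customer_id": "varchar(32)",
                         "amount": "decimal(18,2)"}))
# [('customer_id', 'varchar(32)', 'int64')]
```

Starting from a small governed subset, as Mark from Bread describes, just means keeping `CORPORATE_LEXICON` deliberately small at first and growing it over time.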
Okay, fantastic. Well, that does bring us to the end of our time today. Let me take a moment here to thank our speakers. We heard from Fern Halper with TDWI, Mark Atchison with Bread Financial, and Mark Cusack with Yellowbrick. Also, thank you again to Yellowbrick for sponsoring today’s webinar. Please remember that we did record today’s webinar, and we’ll be emailing you a link to an archived version of the presentation, which you can feel free to share with colleagues. Also, don’t forget, if you’d like a copy of today’s presentation, use the Click Here for a PDF link.
Finally, I’d like to remind you that TDWI offers a wealth of information, including the latest research, reports, and webinars about BI, data warehousing, and a host of related topics, so I encourage you to tap into that expertise at tdwi.org. Lastly, from all of us here, thank you so much for attending. This does conclude today’s event.