Data Warehouse on Kubernetes

The Strategic Advantages of Hybrid Cloud Data Warehouse

Hema Ganapathy: Hello, everyone. Thank you for joining us for today’s webcast. We’re very excited about today’s topic, the strategic advantages of hybrid cloud data warehousing. We are welcoming 451 Research to present their findings from a report and some research that they did, and I’ll also be doing an introduction to Yellowbrick. Before we begin, let me do some brief introductions. My name is Hema Ganapathy. I’m in product marketing at Yellowbrick, and I’ll be your host and moderator today. Joining me as co-presenter is Matt Aslett from 451 Research. And I’d like to tell you a little bit about Matt. Matt is a research vice president with responsibility for 451 Research’s Data, AI, and Analytics channel, including operational and analytic databases, Hadoop, grid/cache, stream processing, integration, data governance, and data management, as well as data science and analytics.

Matt’s primary areas of focus currently include distributed data management, data catalogs, business intelligence and analytics, data science management, and enterprise knowledge graphs. Matt is a regular speaker at client and industry events, and he’s delivered keynotes and moderated panels at Strata + Hadoop World and DataWorks Summit, to name a few. He’s also been named by Enterprise Management 360 as one of the top 10 influential data leaders to follow in 2019. So we’re very excited to have Matt here. Before we begin, I’d like to go over a couple of housekeeping items. Our Twitter handle, as you see below, is @Yellowbrickdata. We will be tweeting live during this webinar, and I encourage you to do the same with the hashtag #YBLive that you see at the bottom. We will be answering questions at the end of the webinar, and we’re specifically leaving some time to do that. So please enter your questions into the questions window during the presentation and we will get to them. And before I begin, again, I’d like to thank Matt for being here, and I’m going to hand it over to you, Matt.

Matt Aslett: Thanks, Hema. Thank you, everybody, for joining this webinar. So yeah, I run the Data, AI, and Analytics channel here at 451 Research. Before I get into the topic today, just to give you a little introduction to 451 Research for anybody who’s not come across the company before. We are an industry analyst firm focused specifically on innovation. And as of December of last year, we’re now a part of the much larger S&P Global, specifically S&P Global Market Intelligence. And you know, we’re pleased to be part of that organization, obviously benefiting from being bought by a much larger company and all the resources that come with that. But what is great as well is that things, from our perspective, haven’t changed too much.

You know, we’re still very much focused. We slotted in nicely to continue our focus on IT, and in particular IT innovation, and, if anything, to accelerate that focus. In relation to the team that I run, our technology market coverage includes, obviously, the data platforms, the back-end databases, relational and non-relational, including NoSQL, Hadoop, Spark, and obviously data warehousing. We then cover data management: data integration, MDM, data quality, et cetera. And also the more front-end side of things, data science and analytics platforms and tools. And of course, especially in the last couple of years, there has been a lot of focus on artificial intelligence and machine learning, so that’s also part of our focus. There’s a team of seven of us currently who deliver across that, and as well as running the channel, I help contribute to each of those categories to some extent, as you heard earlier.

So, on to the focus of today’s webinar, which is specifically data warehousing and the cloud. What I’m going to be talking about for the next 20 to 25 minutes or so is some of the results of a research report that we put together early this year and that was published in April 2020. As you can see, that report was commissioned by Yellowbrick, but as with all of our research, whether it’s our own research or it’s commissioned by an external organization, it was independently and objectively written. And we did that based on our ongoing research of this space, particularly the evolution of hybrid IT environments, but also the ongoing changes in the data warehousing market.

And also, you know, based on some of the survey data that we collect as part of our ongoing research. We do that with members of what we call the 451 Alliance, which is a member-driven community of IT professionals and end-users. So practitioners out there who work with technology as part of their daily lives, and they help shape our view of the world by taking part in the surveys. You’ll see some of that data as we go through the results here, but what I will also say is that there’s much more in that report than we can actually cover in this short webinar. I know we’ll be sharing the details of the report, and I would encourage you to take a look and download it. It’s freely available, and it will complement what I’m talking about here and probably add more detail in some of the areas as well.

So if we think about hybrid cloud and data warehousing, and we think about the overall trends that are driving the industry right now, I think they start with the one obvious major trend that’s been driving the overall industry for some time, which is the shift to, or the growth of, cloud computing. Certainly, the phrase we’ve used here is that the cloud is becoming the default business environment. We’ve been seeing this progressing over a number of years: organizations deploying an increasing number of workloads in the cloud, looking for advantages like agility, the ability to spin resources up and down as required, and efficiency, in particular the opportunity to reduce upfront costs in the underlying data center infrastructure. In relation to data workloads in particular, we see there’s almost an air of inevitability about this, as more and more data is both generated in the cloud, from cloud and SaaS applications, and also stored in the cloud in low-cost cloud storage.

So we see that increasingly analytics and data processing workloads are also moving to the cloud. And we’ve seen that take a few phases. Initially, it was perhaps predominantly things like development and test and backup and recovery environments. Increasingly now, we do see that it is the actual enterprise mission-critical applications that, not always, but increasingly, are being deployed into cloud environments. To give you a sense of some of the survey work we do, this slide illustrates the trend. As increasing volumes of data are being generated in the cloud, enterprises are increasingly rethinking their strategies for data warehousing and analytics. We see that a lot of these workloads are moving towards SaaS, infrastructure-as-a-service, and platform-as-a-service environments, and specifically we see a decreasing use of on-premises, traditional non-cloud infrastructure, which, if we look back, has been the primary environment for running any database workloads, both operational and analytic. On this slide, we’re asking people about what they’re using today and what they intend to use two years from now. We still see that on-premises non-cloud infrastructure is going to be a significant proportion of the overall workloads, but it is rapidly declining.

And we’ve seen this consistently in our surveys over the years. It’s interesting, as we see that those workloads are increasingly being spread across multiple environments, be it on-premises private cloud, SaaS as I said, hosted private cloud, infrastructure as a service, or platform as a service. That data is being spread around and across multiple locations, and we particularly see this happening for analytic workloads. As we see here, we asked organizations about what data platforms they were planning to adopt specifically in the next two years. And we see that analytic database and data warehouse software actually tops that list, with 41% of all respondents saying they’re looking to adopt analytic database software in the next two years. And interestingly, we see a ratio of almost two to one in favor of database-as-a-service consumption for analytic databases rather than a traditional on-premises deployment.

So we do see that, whilst this is happening across all database workloads, it is having a particular effect in the analytic database space, and it’s happening with multiple analytics application workloads. This was another survey we did, where we tried to zero in on not just where the database is, but what the databases are being used for. These are the results of asking specifically about the analytics workloads. We did the same thing for the different kinds of operational workloads. And as you can see, for ad hoc and self-service analytics, for business intelligence reports and dashboards, and for data science workloads, we see this swing to the cloud. Cloud is already being used for some of those workloads by respondents, and obviously on-premises continues to be used, but we do see, for example, in the case of business intelligence reports and dashboards, a 31 percentage point swing towards the cloud over the next two years in terms of the number of users deploying the databases for those workloads into cloud environments.

And so, as I said, we see this happening for analytic databases, and for analytic databases across multiple analytics workloads. Why do we see this happening? There are obviously multiple reasons feeding into this, and one of those is this concept of data gravity. As we’ve said, data is increasingly being generated and stored in the cloud. To borrow some data from our storage colleagues, we see that 60% of enterprises have already adopted public cloud storage services. Another 8% are in pilot or proof of concept, and more than that are planning to do so over the next year or two. More significantly, from our perspective, thinking about it from an analytics perspective, we see that enterprises are increasingly looking at this cloud storage layer as a primary platform, not just for storing data, hopefully as cheaply as possible, but also for actually processing and analyzing it, by spinning up a separate layer of compute engine products and services.

Enterprises are able to create what we’ve talked about as an abstracted data architecture, and they’re then able to take the analytic workloads to the data in that cloud storage and analyze that data as and when required. This is an architectural change that we first highlighted back in 2016, and it’s an ongoing process. We see that more and more organizations are adopting it, to the extent that in a recent survey we saw that 71% of enterprises agreed or completely agreed that object storage will be a primary data platform for data processing and analytics at their organization in the next two years. Predominantly, we see this being driven by cloud storage adoption, as I said. But I think it’s important to note that this separation of compute and storage, whilst it is closely associated with the adoption of cloud storage services and, as I said, an increase in the volume of data being stored in those cloud storage services, is not by any means limited to the cloud.

And we do also see the same effect being replicated in on-premises environments through the deployment of cloud-native architecture. That said, data gravity doesn’t only pull in one direction. I think perhaps we have a bit of a tendency in the industry to think about data gravity as solely pulling workloads to the cloud. What we see is that, even as the reliance on on-premises non-cloud infrastructure is expected to diminish, as I stated earlier, it still remains the dominant location for existing data warehouse deployments that are driving business decision-making today. And in fact, the weight of those existing on-premises database deployments also exerts a gravitational pull on analytic workloads. Indeed, in many organizations, particularly later adopters of cloud services, that investment in on-premises infrastructure may exert a greater gravitational force, as mission-critical enterprise workloads remain on-premises, even as development and test and other less mission-critical workloads might be deployed into the cloud.

And, as I said, our research indicates that a significant proportion of companies are still using on-premises deployments for their existing database, and particularly analytic database and data warehouse, deployments. We see that 43% of organizations are using on-premises analytic database software today. Only 25% of respondents are using an analytic database or data warehouse as a service. So, as I said, whilst the overall momentum and trajectory is towards the cloud, we still have to take notice of the fact that there is this substantial existing investment in on-premises deployments. And, as I said, those too have gravity. Again, when we talk about the adoption and growth of cloud services, it’s very easy to fall into a bit of a trap of talking about workloads migrating to the cloud and thinking about migration patterns only moving in one direction.

Actually, when you think about it, databases, like birds, don’t follow a single migration pattern. And our data increasingly illustrates this, that there are multiple directions in which different workloads are going. More to the point, like some birds, some databases don’t migrate at all. It’s easy to fall into the assumption that all workloads are going to migrate, and that all workloads are going to migrate in the same direction. And that simply isn’t the case. Indeed, you can argue that there’s almost more of a gravitational equilibrium being created, as there is a significant proportion of companies that are actually looking to retain or modernize their existing mission-critical workloads where they reside, rather than migrate, shift, or refactor them to a cloud environment.

We see this from our Voice of the Enterprise: Digital Pulse survey, in which 9% of respondents are looking to keep their current mission-critical legacy workloads completely unchanged, just keep them where they are, whilst 35%, perhaps in some ways more interestingly, are looking to retain those workloads on-premises whilst updating them to a modern architecture. We also see that there are multiple reasons why organizations have taken this approach of modernizing in place. The most significant of these is leveraging IT infrastructure and data center investments. Obviously, if there’s been a significant outlay, companies want to make the most of those investments before perhaps moving on to other locations and data processing capabilities. Data and system security is also a key reason for choosing a specific location for any data processing workload, as are application dependencies and, obviously, data locality and data sovereignty requirements.

So this desire to modernize mission-critical workloads in place is a real key factor as to why we say that almost two-thirds of organizations are currently pursuing a hybrid IT strategy. If you see here, we actually see that 6% of respondents are, sorry, my mistake, 8% are using both on-premises cloud and off-premises public cloud, but using them separately. More to the point, there is the much larger proportion, where we see 57% of respondents moving towards a hybrid IT environment that leverages both on-premises systems and off-premises, cloud-hosted resources in an integrated fashion. I think what we’ve seen is that for many years this concept of hybrid cloud or hybrid IT was actually considered something that enterprises ended up doing by accident.

It was like the unintended consequence of failing to fully embrace the potential advantages of public cloud, or perhaps of failing to limit the tendency of users to launch their own shadow IT projects to bypass formal IT policies. So while many IT organizations may have initially stumbled into the use of multiple cloud computing environments alongside their existing on-premises investments, today we see that hybrid cloud is a more strategic choice. It’s viewed not only as the logical and inevitable consequence of an abundance of choice in relation to computing and data storage location options, but also, as I say, as a strategic imperative that actually enables enterprises to make the most efficient use of the variety of infrastructure location options. And again, there are multiple reasons why we see enterprises choosing to operate in a hybrid fashion.

The most popular of those is in order to have the potential to migrate workloads as needed between on-premises and public cloud environments. Now, it’s an interesting question as to whether enterprises actually take advantage of this theoretical potential. That differs from organization to organization, and it’s dependent on multiple factors, including things like requirements for data residency, cost, performance, and security and risk. What we do see is that while many enterprises may not actually be moving workloads between on-premises and off-premises environments on a regular basis, one of the key advantages of a hybrid IT strategy is that they have the option to do so when the related variables indicate that it would be appropriate. So there are a couple of key variables, which I’ve already mentioned, that will drive that decision.

Obviously, one of those we see is performance. Performance is a key consideration for workload placement generally, and in relation to analytics in particular there are some key implications. One of those is predictability. Put simply, enterprises are less able to predict the performance of resources that they don’t have control over. So in a public cloud environment, this can potentially be an issue. Service level commitments are obviously available from cloud providers, and they could lead to compensation if performance guarantees are not met, which is good. But by that point, you’ve already as an organization faced whatever challenge or outage might have come from that performance limitation in the first place. So while compensation is always nice, it’s obviously after the fact.

And what we see is that, given that there are multiple dependencies that can impact the performance of any given computing workload, including things like the underlying infrastructure and networking, it’s obviously understandable that for workloads that absolutely require guaranteed performance, for example financial trading and e-commerce payment systems, enterprises might choose to retain control over those resources in order to have that predictability in relation to performance. For analytic workloads in particular, we see that query latency is another key consideration. Again, financial services is a key example of this, where potentially microsecond delays can mean the difference between gaining and conceding competitive advantage in algorithmic trading environments, for example. That’s obviously an extreme case, but it serves to illustrate why the time taken to transmit data from an operational application to the cloud for further analysis is potentially an argument against the use of a cloud data warehouse for analysis of that particular data. Even in less extreme examples, query speed is an important consideration when thinking about migrating an analytic workload to a new location.

Another key consideration, obviously, in relation to the cloud is economics. And we do see that whilst there are multiple theoretical economic advantages to running data processing workloads in the cloud, some organizations have seen that the potential advantages of cloud for data processing don’t always materialize. That could be perhaps for performance reasons, as we’ve talked about, and there are various examples of this. As a typical example, what we’ve seen is that first-generation cloud data warehousing saw many enterprises spinning up virtual machines to create what they intended to be an ephemeral data service to analyze data that was in long-running cloud storage, that separation of compute and storage that I described earlier. Theoretically, this means only having to pay for the compute resources as they’re required to analyze the data, hence the potential for cost savings.

In practice, however, the time taken to spin up and down virtual machines, combined with the need to maintain things like security and metadata policies, actually often leads to the need to maintain those long-running compute clusters, meaning that the potential economic advantages of ephemeral servers may not actually materialize. Data from 451 Research’s Digital Economics Unit highlights the potential opportunities that strategic adoption of hybrid cloud can provide to avoid some of these challenges. For the past few years, we’ve been publishing a heat map that shows the break-even points for total cost of operations between public and private cloud. There’s an example of that on this slide, and as you can see, utilization is the really important factor here. Put simply, the more private cloud is used and the better managed it is, in terms of the number of virtual machines per engineer, the cheaper the unit cost of each private cloud resource will be.

And the more that can be saved compared with a public cloud. So the chart illustrates the relative cost of public and private cloud based on utilization and labor efficiency. The green area is where the private cloud provides more savings than the public cloud, the red area obviously indicates the opposite, and the white area effectively represents the break-even point between the two. So in this example, a private cloud becomes cheaper at 70% utilization and labor efficiencies of 200 to one. Obviously, this is provided for illustration purposes only, and individual circumstances will differ. It is clear, however, that hybrid cloud can provide the potential to increase utilization as efficiently as possible, as enterprises are able to combine the use of private cloud for delivering a stable and constant capacity and public cloud for, amongst other things, bursting above maximum capacity. Of course, managing both on- and off-premises resources in an integrated fashion is not easy.
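To make the break-even idea concrete, here is a minimal sketch of the kind of arithmetic that sits behind a heat map like the one described above. It is not 451 Research’s model. The cost formula and every figure in it (capacity, infrastructure cost, engineer cost, public cloud price) are hypothetical, chosen only so that the break-even lands at 70% utilization with a labor efficiency of 200 virtual machines per engineer, as in the example on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrivateCloud:
    """Hypothetical private cloud cost inputs (all values are assumptions)."""
    capacity_vms: int            # VM slots the installed hardware can host
    annual_infra_cost: float     # fixed yearly cost: hardware, facilities, licenses
    engineer_annual_cost: float  # fully loaded yearly cost of one engineer
    vms_per_engineer: int        # labor efficiency, e.g. 200 means "200 to one"


def private_cost_per_vm(pc: PrivateCloud, utilization: float) -> float:
    """Yearly cost of one *used* VM at a given utilization (0 < utilization <= 1).

    Fixed infrastructure cost is spread over the VMs actually in use, so the
    unit cost falls as utilization rises. Labor is charged per VM.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    used_vms = pc.capacity_vms * utilization
    infra_per_vm = pc.annual_infra_cost / used_vms
    labor_per_vm = pc.engineer_annual_cost / pc.vms_per_engineer
    return infra_per_vm + labor_per_vm


def break_even_utilization(pc: PrivateCloud, public_cost_per_vm: float) -> Optional[float]:
    """Utilization at which private and public unit costs are equal.

    Returns None when labor alone costs at least as much as the public price,
    in which case private cloud never breaks even under this simple model.
    """
    labor_per_vm = pc.engineer_annual_cost / pc.vms_per_engineer
    margin = public_cost_per_vm - labor_per_vm
    if margin <= 0:
        return None
    return pc.annual_infra_cost / (pc.capacity_vms * margin)


# Illustrative figures only.
example = PrivateCloud(
    capacity_vms=1000,
    annual_infra_cost=1_400_000.0,
    engineer_annual_cost=120_000.0,
    vms_per_engineer=200,
)
PUBLIC_COST_PER_VM = 2_600.0  # hypothetical yearly public cloud price per VM
```

Under these assumed figures, `break_even_utilization(example, PUBLIC_COST_PER_VM)` comes out at roughly 0.7. Below that utilization the public cloud is the cheaper option per virtual machine, and above it the private cloud is, which is the green and red split the heat map shows. Lowering `vms_per_engineer` pushes the break-even point higher, which is why labor efficiency is the chart’s second axis.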

Dynamic allocation of resources based on the individual workloads does demand robust engineering and automation. As hybrid IT becomes more of a strategic choice, we see that enterprises are paying much closer attention to the factors that can influence decisions around data storage and processing locations, such as, as I’ve mentioned, performance, cost, residency requirements, agility, security, and risk. Similarly, coming back to the core topic here of data warehousing, we think companies really need to think carefully about their choices for data warehousing in the cloud. As data is distributed across more locations, as we talked about earlier, managing data across those multiple locations has already become one of the primary concerns for data and analytics professionals, as illustrated here, alongside things like data security, data quality, privacy, and data governance.

And we think that this is only likely to rise in importance as we see increasing volumes of data distributed across multiple clouds. So if you think about the choices for cloud data warehousing specifically, or I should say for hybrid data warehousing specifically, whilst cloud data warehousing environments have already proven popular, as we’ve stated, they are in many cases unsuitable for a hybrid IT strategy, given that the databases that support them are rarely available for on-premises use. Additionally, while we see that all major on-premises database products today can also be deployed in the cloud, there is a big difference between running the same database in the cloud and on-premises, and actually having an integrated hybrid data warehousing approach. In the same way, as we talked about earlier, there’s a difference between having an integrated hybrid cloud approach that leverages both on- and off-premises resources in a unified way and a segregated approach that uses both but without any attempt to integrate them. Enterprises don’t just need a consistent experience when using a data warehousing product either on-premises or in the cloud. What they are increasingly looking for in the longer term is an integrated environment that is actually designed to support a hybrid cloud architecture and multi-location data management.

So that’s a bit of a rapid run through some of the key findings from the report that I mentioned earlier. Obviously, as I said, there’s much more we could go into, but I’ll finish off here by just talking about some of the key recommendations that we’ve made based on this research and the findings. The first is that organizations that have not already done so should definitely explore the potential advantages of taking a strategic, integrated approach to hybrid cloud. We saw that an increasing number of organizations are taking that approach, and it’s now actually the majority of organizations that are doing so. So if you haven’t even considered that yet, you really need to start thinking about it, because there are some potential advantages to be had, as we’ve described. Those investigations should involve the evaluation of existing and new application workloads and their related data storage and processing stacks, to ascertain their potential suitability for deployment in the cloud or for on-premises modernization. And as we said, there are multiple factors that could be considered as part of that, including things like security and risk, data governance, and data sovereignty, but performance and, obviously, economics are key to that as well.

And then finally, we see that enterprises should also be specifically reviewing the data warehousing capabilities of their potential vendors with regard to the ability to manage and analyze data in multiple locations. It’s not just about the ability of a product or service to support public cloud or as-a-service consumption. Increasingly, as we said, it’s going to be important to be able to manage data across multiple locations, including on-premises and indeed not just one public cloud, but multiple public clouds. So with that, I’ll thank you for your time. We’re going to open up for Q&A shortly, so I’m glad to take questions then. Obviously, if you want to get in contact with us directly, you can see my details here, and I’d be glad to take any questions offline if anybody’s got any follow-up on that. I’m obliged to show this slide, but I won’t dwell on it. I will hand it over to Hema, who’s going to talk to you specifically about Yellowbrick and where it fits in the landscape that we just described.

Hema Ganapathy: Thank you, Matt. That was a very informative presentation. I really appreciate having you here. So I’m going to go over a quick introduction to Yellowbrick and want to talk to you about some of our product offerings and how we actually are the only modern hybrid cloud data warehouse. Next slide.

So for our Yellowbrick solutions, we have an on-premises as well as a cloud service offering. Coupled with the cloud service, we have cloud disaster recovery, and the on-premises offering is a subscription model. We have a data warehouse instance that has innovations in all of the layers, in compute, storage, networking, and software, purpose-built for a hybrid scenario. And one of the things that we have in the product is extremely high performance in a very small footprint with extensible scale. Now, building upon what Matt said, we actually have an integrated hybrid cloud approach, because we have a unified data warehouse both on-premises and in the cloud. Our cloud service leverages the exact same hardware innovations that are in the hardware instance, but in the cloud. And one of the key differentiators for us is that we enable multi-cloud support, so support for AWS, Azure, and GCP. And as I said before, we couple the cloud service with cloud disaster recovery to ensure that your data is protected no matter what happens in the cloud environment. Next slide.

Some of our key differentiators are really around our price performance. So we have a hundred times performance. We’re very predictable and reliable, and we scale to not just terabytes, but petabytes of data. We’ve also architected innovative simplicity into the product, where you’re able to query data immediately. And we look and act like an RDBMS, so you don’t need any special skills or special resources to be able to implement Yellowbrick. And we have flexible deployment, both on-premises and in the cloud and in a hybrid cloud scenario, so you have a broad choice as to how you can deploy Yellowbrick within your environment. Next slide. I’d like to go over some of the use cases for Yellowbrick, so you understand how you can use Yellowbrick in your environment.

Next slide. I mentioned the hybrid cloud, but again, to build upon what Matt was saying, we have a unified hybrid cloud approach so that you can run your workloads on-premises, in private cloud, and in the major public clouds, in any combination that you choose.

And that gives you a level of flexibility that other vendors don’t offer you today. Our instances are single-tenant, reserved, and always on. There’s near real-time synchronization between the instances, and we have an annual subscription model. And really, what this does is this: we have a hardware instance, as well as a cloud offering based on that hardware instance, that offers you a scale of performance and concurrency that’s not available on the market today. So remember I mentioned before, petabytes of data, not just terabytes of data, and concurrency at thousands of users without having any performance hit for your applications. And of course, we integrate with all of the tooling ecosystems: BI, data science, and data motion. So we make it easy for you to integrate Yellowbrick into your environment. Next slide.

So we also offer application acceleration with an open and non-proprietary interface that’s fully ANSI SQL ready, with Postgres drivers. And so you can integrate Yellowbrick into all of your BI applications. You’ll see some examples here on the slide, but we don’t charge any additional licensing fees or costs for that. And you’re able to seamlessly connect Yellowbrick into your existing environment without the need for any retooling, rescripting, or rewriting of anything. So it’s a very simple migration to Yellowbrick as you look to migrate your workloads onto a hybrid cloud or cloud scenario. Excellent.

And another key application for Yellowbrick is, you know, we help you protect your data lake investment. Matt touched upon it in one of his slides, but there are a lot of data lakes out there. And, you know, it’s kind of onerous to get information out of the data lakes because of speed and just massive amounts of data. It’s difficult to data-mine them. And what you can do with Yellowbrick is actually help accelerate any of the queries coming from your data lake. And so we actually provide location flexibility and agility, so you can then strip off some data marts from your data lake, put them through Yellowbrick for processing, and make them available to your data analysts and your data scientists. And Yellowbrick understands all data formats. So it’s a very easy fit with your data lake, and we help you protect your data lake investment by helping you optimize your data lake as much as possible.

Next slide. So just to summarize what Yellowbrick does for your business: we actually scale to manage your largest datasets with the best price-performance today. We make your business more efficient with innovative simplicity, and we actually simplify migration to the cloud or to hybrid cloud with flexible deployment. Next slide. I encourage you to follow Yellowbrick on Twitter, Facebook, and LinkedIn, and as you can see here, all of the handles are available on the slide. And if you want to see what Yellowbrick can do for you, I, again, suggest that you visit us at and book a demo today. We can show you what Yellowbrick can do for you with your own datasets. Next slide.

And as Matt mentioned, we will be providing the Pathfinder report, The Strategic Advantages of Hybrid Cloud Data Warehousing, to all attendees from the webinar. The report is also available on our website. And so I encourage you to read the report. There’s a lot more information in there; it’s very information-rich. And again, I want to thank Matt for presenting today, and thank you all for attending. We are going to get to the question portion of the webinar. And so I’d like you to enter your questions into the question window, and we will get to those. And so Matt, I have a question that’s come in here. The question is, do you see workloads moving from the cloud back to on-premises?

Matt Aslett: Yeah, we do. Although, I mean, it’s relatively rare. I think, you know, one of the things I didn’t really talk about is that whilst we do see database workloads, as we said, moving in multiple directions, ultimately it’s always been true that organizations don’t move their database workloads around unless they’ve got a very, very good reason to, because, you know, it’s costly and it’s complex and it’s difficult. And so that’s why we see a lot of companies do that upfront work to figure out first, you know, is this going to be the right location. But as we’ve talked about, there have been some examples of organizations that have tried to move more aggressively, perhaps, than they should towards the cloud, and have then perhaps thought better of it.

And certainly, the biggest reasons we’ve seen for companies actually, you know, repatriating data, or sorry, repatriating workloads back from the cloud to on-premises are performance and availability issues. Also, obviously, data sovereignty and regulatory changes can be a key trigger point for that, and in relation to data warehousing in particular. So, yeah, definitely performance, availability, and also costs. You know, as we talked about, performance and costs obviously sometimes go hand in hand, particularly in relation to perhaps some of the expectations not quite being met. So we do see that happening.

Hema Ganapathy: So I think you’ve touched upon some of these points, but there’s another question here that says, is there a pattern of which workloads move from cloud back to on-prem?

Matt Aslett: Yeah, I’d say it definitely relates to, you know, those key issues that we talked about. So, you know, performance, availability, cost, and data sovereignty and regulatory changes. Those are the biggest reasons. Obviously, which applications that means specifically is going to differ from organization to organization. But those are the key reasons that we see that would drive that kind of movement. Yep.

Hema Ganapathy: Great. Thank you. Another question here. Are you saying that the cloud is best used for storage and not data processing? Can you elaborate on that a little bit?

Matt Aslett: Yeah, no, not at all. I suppose what we’re seeing is that, for many companies, cloud adoption has been storage first, put it that way. So it’s not that, you know, cloud is better suited to storage than data processing. It’s just that in many cases we’ve seen that the data processing and analytics workloads have followed the data. We definitely see that a lot of companies start with, you know, one of the key considerations being, for example, lowering their storage costs, and that therefore drives the use of cloud storage. Particularly, as I said, if a significant proportion of their data is actually generated in that environment in the first place.

And so then what we see is that companies have to then consider, well, you know, given that an increasing volume of data is now being stored in the cloud, what is the most efficient way to actually process that data? You know, it may not necessarily be to pull it out of cloud storage to another environment. I mean, you know, there are multiple ways people can use Yellowbrick, and that kind of illustrates that there are multiple options for organizations to try and find the most efficient way of doing that for them. So, yeah, particularly what we’ve seen is that cloud storage costs are consistently declining faster than data processing costs. So yeah, it’s often cloud storage first, and then data processing considerations follow that. But it’s definitely not a case of the cloud being better suited to one than the other.

Hema Ganapathy: Understood. Yeah. Some of the things that we see from our customers is that there are additional costs for migrating data out of the cloud. So that’s a key consideration. Do you have any comments on that? And actually, there’s a question here that says, what are the costs of public cloud versus hybrid cloud? And so can you talk to that?

Matt Aslett: Yeah, no, definitely a good point, obviously, in terms of the cost of moving data. Yeah, absolutely. And that, you know, potentially could be prohibitive, and a good reason for companies to keep data wherever it resides, be it on-premises or in the cloud. Yeah. The cost of moving data into the cloud or out of it, you know, can be really prohibitive. And so yes, that’s definitely a key challenge. And in relation to the specific cost comparison of public and hybrid cloud, you know, it definitely depends. Obviously, that chart I provided earlier is just an illustration of a theoretical example. You know, what we do see is that utilization is key, as well as labor efficiency.

And obviously, you know, the individual’s ability to negotiate discounts comes into play, and the leverage they might have over an existing supplier, be it on-premises or in the cloud. But what we do see is that if, and arguably it’s a big if, organizations are able to manage their private cloud at high levels of efficiency and utilization, and also manage the public and private cloud in a unified single environment, then our data absolutely points to the fact that a hybrid approach should be the most cost-effective option for cloud consumption. Clearly, you know, the devil is very much in the details there. And obviously, I mentioned that the data I presented came from our Digital Economics Unit, and some of my colleagues in that space are, you know, experts in their area. So if that’s something you’re considering and you’re looking for help and advice, I definitely would recommend going to them, and they can at the very least talk you through the calculations and the way in which we go about calculating that. And then you can start thinking about how to apply that to your own workloads.

Hema Ganapathy: Great. Thank you. Yeah, I think that cost conversation can be quite a long one. We hear a lot from our customer base on the additional hidden costs of cloud. There is one more question here. How does Yellowbrick accelerate and extend the data lake? Do you make files in the lake available via an external table, or does the data have to be ingested into Yellowbrick? So, yes, the data can be ingested directly into Yellowbrick from the data lake so that we can process it there. You can actually strip off a subset of your data, something that we like to call a data mart, and process that in Yellowbrick, so that you don’t have the delays that happen with Hive when you’re using Hadoop in your data lake. And we can certainly give you more information about that. We’ll do some outreach to you, the individual who asked the question. Matt, is there anything else that you’d like to say or, you know, add to the conversation?

Matt Aslett: Yeah, just thanks, obviously, to everybody for attending the webinar or listening to the replay. I think, you know, this is definitely a key topic that we see a lot of focus on from our clients moving forward. Particularly, you know, given the current circumstances, some companies are obviously trying to control their costs, but also thinking about potentially accelerating some of their transformation plans. So yeah, it is definitely a key consideration, and in particular, as we said, hybrid is absolutely going to be the default for most organizations. And so I think, you know, it’s not enough just to be using the same database in different locations. You have to think about it more strategically. And I think we do see enterprises doing that, and we expect more of that this year. So yeah, it will definitely be an ongoing focus for us. And if there are any companies out there who are going through that and would like to talk to us about it, or pick our brains about that, then we’re more than happy to speak to them.

Hema Ganapathy: Thank you, Matt. And as Yellowbrick, the only modern data warehouse for hybrid cloud, we’re very excited to hear you say things like that. We also hear the same from our customer base. And so I just want to wrap up and thank you, everyone, for attending the webinar today. I also want to say that we have a couple of really great webinars coming up in May, including a customer case study with one of our key customers in the retail vertical. And that’s coming up on May 28th. So please go to to register, and we’d very much appreciate you being here. Please don’t hesitate to email us at and to stay up to date on our upcoming webinars, or follow trends and best practices. You can also follow us on Twitter, LinkedIn, and Facebook. Thank you again for joining us today. And we will see you next time.

