Cloud Repatriation in Data Warehousing
A few interesting articles, coupled with my recent chat with 451 Research (S&P Global) analyst James Curtis, have prompted me to think about the trend toward cloud repatriation, weak as it is, and how it applies to data warehousing and analytics.
Results from 451 Research’s “Voice of the Enterprise: Data & Analytics, Data Platforms 2022” survey, which underpinned my online conversation with James Curtis, unexpectedly confirmed that we are not at peak cloud for data. The strongest area of growth for data and analytics is in the cloud, with participants forecasting a jump in cloud-based data platforms within two years.
Also interesting is that survey participants forecast growth in on-premises private cloud and managed database environments. One clear trend that comes out of this data is that organizations are looking for managed, cloud-like experiences to reduce run cost and complexity. Growth in on-premises could be organic but could also be a result of cloud optimization and repatriation.
The public cloud is great for agility and ensuring organizations aren’t left in the lurch when they need to pivot quickly or develop new solutions. During my time in cloud sales, I was horrified to hear one of my early adopter customers had coined the term “de-cloud” about seven or eight years ago. It knocked me for six at the time. However, as a team, we rationalized it by saying the customer hadn’t moved to a cloud operating model and so hadn’t seen the true benefits of the cloud. The reality was that their workloads were running 24×7 and hardly fluctuated. When a workload is stable, particularly one that is always on and does not need agility, the cloud is often more expensive.
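The always-on argument comes down to simple arithmetic, sketched below. Every number is made up for illustration: the hourly rate, the fixed monthly cost, and the function names are assumptions, not pricing from any cloud provider or from Yellowbrick.

```python
# Rough break-even sketch for pay-per-hour versus fixed-cost capacity.
# All rates and costs here are hypothetical placeholders.

HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def on_demand_monthly_cost(hourly_rate: float, hours_used: float) -> float:
    """Monthly bill when capacity is paid for only while it is running."""
    return hourly_rate * hours_used


def break_even_hours(fixed_monthly_cost: float, hourly_rate: float) -> float:
    """Hours per month above which the fixed-cost option becomes cheaper."""
    return fixed_monthly_cost / hourly_rate


if __name__ == "__main__":
    hourly_rate = 4.0            # assumed on-demand price per hour
    fixed_monthly_cost = 1800.0  # assumed amortized fixed cost per month

    always_on = on_demand_monthly_cost(hourly_rate, HOURS_PER_MONTH)
    threshold = break_even_hours(fixed_monthly_cost, hourly_rate)

    print(f"Always-on, pay-per-hour: {always_on:.2f} per month")
    print(f"Fixed-cost option:       {fixed_monthly_cost:.2f} per month")
    print(f"Break-even at {threshold:.0f} hours "
          f"({threshold / HOURS_PER_MONTH:.0%} utilization)")
```

With these assumed figures, a workload that runs only part of the month favors paying by the hour, while one that runs around the clock sits well past the break-even point. Real comparisons would also need to account for staffing, facilities, discounts, and reserved-capacity pricing, none of which are modeled here.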
Storage is a quick example of where efficiency can be delivered. In the cloud, particularly with cloud object storage, the same data is often stored multiple times. On-premises storage platforms leverage techniques like block-level de-duplication to minimize costs; in the cloud you don’t get this by default. Storage options in the cloud are limited to super-reliable tiers with multiple nines of availability. Can you imagine the furor if a cloud provider lost some data? This reliability is delivered by replicating data multiple times between different servers, different availability zones, and different regions; the customer ultimately pays for these replicas. Yellowbrick’s on-premises offering uses erasure coding to dial in the reliability our customers need without multiplying the storage cost in the same way.
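To make the storage point concrete, here is a minimal sketch comparing the raw capacity consumed by full replication with that of an erasure-coded layout. The three-copy and 8+2 figures are assumptions chosen for illustration only; they are not the settings of any particular cloud provider or of Yellowbrick.

```python
# Raw-capacity comparison: full replication versus erasure coding.
# The scheme parameters used in the example are illustrative assumptions.


def replicated_raw_tb(logical_tb: float, copies: int) -> float:
    """Raw capacity consumed when every byte is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return logical_tb * copies


def erasure_coded_raw_tb(logical_tb: float, data_shards: int,
                         parity_shards: int) -> float:
    """Raw capacity when data is split into `data_shards` pieces and
    protected by `parity_shards` additional parity pieces."""
    if data_shards < 1 or parity_shards < 0:
        raise ValueError("need data_shards >= 1 and parity_shards >= 0")
    return logical_tb * (data_shards + parity_shards) / data_shards


if __name__ == "__main__":
    logical_tb = 100.0  # assumed size of the data set

    replicated = replicated_raw_tb(logical_tb, copies=3)
    encoded = erasure_coded_raw_tb(logical_tb, data_shards=8, parity_shards=2)

    print(f"3x replication:     {replicated:.0f} TB raw")
    print(f"8+2 erasure coding: {encoded:.0f} TB raw")
```

Under these assumptions, three copies triple the raw footprint, while the 8+2 layout adds 25 percent and still tolerates the loss of any two shards. Changing the shard counts is the "dial" for trading storage overhead against fault tolerance.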
Cloud Analytics Platform
An all-in vision for cloud analytics would make the case for sacrificing efficiency and run cost in favor of agility. There’s certainly a case for that, particularly if you layer on the cost of running different data warehouses or analytics platforms in the cloud and on-premises. At face value, it doesn’t make sense to run in multiple locations from a run cost, complexity, user experience, or agility perspective.
If you’ve investigated cloud data warehouse platforms from the hyperscalers like Azure Synapse, Amazon Redshift, or Google BigQuery, or from the fastest-growing solutions like Snowflake, the above scenario is clearly the case. Despite recent announcements about bringing on-premises or multi-cloud data into the fold, these continue to be cloud-only solutions. Similarly, market-leading on-premises solutions like Teradata and Netezza, which initially resisted cloudification, magically found a way to run in the public cloud. Ultimately it was changing their business models that proved the heavier lift, not their technology pivot.
Yellowbrick Data Warehouse Solution for Cloud Repatriation
With Yellowbrick, that calculus changes. Yellowbrick Data Warehouse is designed to meet exactly this need: to run anywhere, whether on-premises, in the cloud, or across multiple clouds. Yellowbrick eliminates this false choice. Our customers are free to grow on-premises, free to grow in the cloud, and, should they choose to in the future, free to repatriate or cloudify some or all of their data workloads without adding complexity. They are also ready to meet any current or future data residency requirements.
There is minimal difference in the experience of managing or using Yellowbrick in the cloud or on-premises. Yellowbrick’s software pricing models are the same across cloud and on-premises. The only real difference is that, like for like, infrastructure costs come in lower on-premises; in the cloud we trade cost for agility.
At a personal level, the concept of ubiquitous cloud computing has started to scare me a little. As a society, we are super dependent on technology. Without a more diverse cloud provider landscape, to me, it seems like we are sleepwalking into a potential future catastrophe with a repeat of the “too big to fail” scenarios of the financial services meltdown.
Let us know what you think. Are you seeing the repatriation of data and analytics? Have you actively chosen not to move mission-critical data workloads to the cloud?