Why Private Data Cloud?

Sometimes nothing short of total control over data security and data residency is acceptable.

At Yellowbrick, we’ve taken a different approach to cloud database deployment. Rather than the pervasive Platform as a Service (PaaS) or Software as a Service (SaaS) models, we run entirely inside a customer’s cloud account or data center. You still get the SaaS-like simplified administration and operations, and the agile elasticity, that customers have come to expect. This approach has many benefits for our customers, in particular those concerned with data residency or privacy, but also for those concerned about cost. In this post, I’ll try to explain why we went down this “Private Data Cloud” route.

Trust is a fundamental tenet of the public cloud

The need for privacy and security in the public cloud is self-evident. Public cloud providers invest billions of US dollars to ensure their platforms are secure and trustworthy. To address the specific needs of data residency and privacy, cloud service providers have invested in dedicated data centers with additional government-required controls on personnel and encryption. A quick search of any of their websites will surface their credentials for FedRAMP, ITAR, HIPAA, PCI-DSS and any number of global standards. Trust is an essential pillar of public cloud.

For security-sensitive public sector organizations, there are well-established dedicated sovereign AWS GovCloud and Azure Government regions in the US and other sovereign cloud regions. Google has created an entirely new Google Public Sector subsidiary to meet the needs of US federal and public sector agencies. More recently, cloud providers have been actively responding to changes in EU-US data transfer rules following rulings from the European Court of Justice. Cloud providers have been tripping over themselves to ensure customers are aware of their data sovereignty credentials, with AWS recently strengthening its sovereign offerings and assurances, and Azure advancing its EU Data Boundary program, which reduces the data transfers that happen as part of everyday cloud operations so that more data stays in the EU region.

While you may immediately think of military, defense, and intelligence as needing advanced cybersecurity protection, other critical national infrastructure programs such as energy and transport are equally at risk. In the UK, a recent spate of cybersecurity attacks against the UK’s National Health Service exposed some sensitive patient data and impacted critical hospitals and the 111 health emergency information service, preventing people from accessing help in emergencies.

The 2021 cyberattack on the Colonial Pipeline disrupted US fuel supplies, affecting airlines and causing panic-buying at gas stations. This major incident sparked a flurry of activity in US government circles, with the Cybersecurity and Infrastructure Security Agency (CISA) setting up a new Joint Ransomware Task Force with the FBI and launching the Shields Up public information campaign to improve readiness posture in commercial organizations.

Many so-called hacks are just human error – companies misconfiguring permissions on public-facing cloud services or network firewalls, allowing public or unprivileged access where it should never have been allowed. In the cloud, this is an easy thing to do: an incorrect checkbox clicked here or there, a typo in a provisioning script, or a temporary developer configuration that makes its way into production. Any service that can be configured to be public can also be made public by mistake.

Public cloud providers are not invulnerable

Even with billions of dollars invested in security, cloud service providers (CSPs) like AWS, Azure, and Google Cloud cannot fully protect customers from self-inflicted security design errors. CSPs have also made great efforts to encourage their customers to follow “Well-Architected Framework” principles, so that security and availability are considered at all stages, from system design to delivery and all the way to operations. Good cloud system and application design assumes most components will fail unexpectedly at some point.

Vulnerabilities have been reported in cloud service provider platforms, which form the building blocks of many thousands of applications. As recently as last year, a flaw in GCP’s Cloud SQL service could have enabled regular users to escalate their permissions and see data they shouldn’t have had access to. Flaws in Google’s application authentication provided a potential route for attackers to deploy hidden malicious applications in a GCP project.

If you are using a managed platform, you are outsourcing some elements of data security and risk management. Software as a Service (SaaS) and Platform as a Service (PaaS) cloud services run by cloud providers or other cloud solution vendors are layered on top of base cloud provider infrastructure and can introduce their own vulnerabilities. Because these services are multi-tenant, there is always the risk, however small, that an intrusion into the platform could expose the data of multiple customers. Security holes have previously been found in cloud platforms.

In 2021, an exploit was found that enabled users to gain access to other users’ data in Azure Cosmos DB, a popular NoSQL database service used in thousands of applications, through errors in the configuration of an adjacent Jupyter notebook service. In 2022, security research firm Orca discovered a vulnerability in Azure’s Synapse big data services and in Azure Data Factory, a data integration service, that could have allowed users to run data jobs in other customers’ Data Factory, giving them access to those customers’ data. In December 2023, an intrusion into MongoDB’s network exposed some customer-related data and system logs, scaring users of its flagship Atlas service.

All these vulnerabilities were quickly addressed and resolved by the service providers, but they demonstrate the ever-present risk in multi-tenant solutions. Where even a small exposure can be catastrophic for an organization’s future or reputation, cloud provider assurances and opaque third-party operations are sometimes not enough to mitigate the potential impact.

The rise of the Private Data Cloud

Yellowbrick’s customers have always had privacy and security front of mind. Our single-tenant, BYOC (Bring Your Own Cloud) model enables our Yellowbrick Data Warehouse solution to be totally private, running in a customer’s own cloud, minimizing the risk of accidental data exfiltration wherever it runs, in any cloud or private data center. Privacy-sensitive organizations that have data residency or privacy concerns can use Yellowbrick to build a Private Data Cloud.

If you want to build an organizational data cloud that spans multiple cloud providers, you are usually left to devise your own solutions – essentially missing out on all the benefits of managed SaaS platforms. Snowflake and Databricks stepped up in the data analytics arena to address some of these concerns. However, their multi-tenant solutions are still at risk from vulnerabilities that appear in their own platforms or base cloud provider platforms with the potential to expose data from multiple customers. Of course, they invest significantly in security and make huge efforts to make sure this doesn’t happen. However, as we’ve discussed, the risk persists, and for some organizations, the risk of outsized impact still outweighs the benefits.

If you additionally need your data cloud to span on-premises data centers or co-lo facilities, you are out of luck. Early cloud provider initiatives such as Azure Stack and Google Anthos don’t even get close to delivering the scalable data cloud that organizations need on-premises. Yellowbrick is the only data platform provider that offers a modern, consistent, simple-to-operate, SaaS-like experience across multiple clouds and on-premises locations. Yellowbrick’s Private Data Cloud extends even to fully air-gapped environments with limited external network connectivity, such as mining, ships, or military/intelligence field operations.

A key technology enabling Yellowbrick’s consistent but still managed Private Data Cloud experience is Kubernetes, which provides a common infrastructure orchestration framework across different environments. Yellowbrick’s former lead architect, Robert Wipfel, calls it the “cloud OS” and predicts that the “cloud will be wherever you can run k8s”. A recent report by Nutanix found that 97% of organizations surveyed are also investing in it, with at least 42% reporting challenges building and managing Kubernetes environments.

The benefits extend beyond security and privacy. The unit cost of running data platforms also varies based on location. It may be cheaper to run certain 24×7 or regular workloads on-premises, where their predictability makes the benefits of cloud agility less compelling. There is also a significant 25 to 20% cost premium for using dedicated government cloud regions provided by the CSPs. A small but increasing number of vocal proponents of on-premises, such as David Heinemeier Hansson and the team at 37signals, actively advocate for organizations to reconsider the benefits of self-managed private cloud.

Please don’t think I am in any way trying to denigrate cloud services, but there are times when you need to keep total control or need absolute certainty on cost. You can try to build your own Private Data Cloud service. As you can see from this recent report on Hybrid-Cloud and Multicloud Analytical Data Platforms by Constellation Research, there aren’t many vendors prioritizing this approach. So, if you are looking to adopt a hybrid multi-cloud approach to your data platform, whether because of cost concerns, data security, or simply the reality of your IT estate, then take the much easier route and come and talk to us about Yellowbrick’s Private Data Cloud solution.
