Keeping Your Cloud Data Platforms Secure

Recent major attacks against Snowflake customers have resulted in significant data breaches. Yellowbrick’s CEO, Neil Carson, wrote an excellent breakdown of the issues in a recent blog post. Snowflake’s answer was to exhort all customers to switch on MFA. Is MFA really the magic solution here? I’m not convinced. I believe there are deeper-rooted problems in cloud data security at play.

MFA, the silver bullet?

MFA, or Multi-Factor Authentication, requires you to authenticate in multiple ways to prove you are who you say you are. It combines two or more of the following: something you know (e.g. a password), something you have (e.g. a cellphone), and something you are (e.g. a fingerprint). We all experience the most common scenarios: a password followed by a one-time code sent to our cellphone, or apps prompting for fingerprint ID. Only if you are holding the physical phone and have access to it pronto will you be able to pass that test.
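To make the "something you have" factor concrete, here is a minimal sketch of how a one-time code can be generated and checked. It uses the time-based one-time password scheme that authenticator apps implement (RFC 6238, built on RFC 4226) rather than the SMS delivery described above, and the function names `hotp`, `totp`, and `verify_login` are illustrative, not part of any vendor's API.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a given counter value."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None, step: int = 30) -> str:
    """Time-based variant (RFC 6238): the counter is the current 30-second window."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step))


def verify_login(password_ok: bool, submitted_code: str,
                 secret: bytes, now: Optional[float] = None) -> bool:
    """Both factors must pass: the password check AND the code from the device."""
    code_ok = hmac.compare_digest(submitted_code, totp(secret, now))
    return password_ok and code_ok
```

The shared secret lives on the phone and on the server, so a stolen password alone fails `verify_login`. That is the whole argument for MFA in the breach scenario discussed here.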

If MFA had been turned on, the thinking goes, the hacker with stolen credentials would not have been able to complete authentication to Snowflake and therefore would not have been able to get at the data they eventually stole. You can turn on MFA in two ways. The easiest, particularly if you are a small company with limited IT resources, is to use Snowflake’s own MFA solution (powered by Cisco Duo). Turning on MFA is reasonably straightforward in Snowflake… but you have to know to do it. It’s off by default (for now). No one tries to hide this, not even Snowflake themselves.

Most large enterprises use a technique called Single Sign-On (SSO), which enables staff to log on to many different systems using their normal corporate credentials (the same ones you use to log in to your machine or get your email). More often than not, MFA is part of that SSO authentication process. You can configure Snowflake to use your corporate SSO solution and identity provider (IdP). Snowflake can then rely on your own IdP, configured by your own security team, to ensure that you are who you say you are after performing all the policy checks that your administrator has enabled.

Easy isn’t always better

So why isn’t it on by default? If you are selling a service, you want to make it as easy as possible for people to get started. Not everyone you sell to is an IT guru. In fact, many vendors target non-IT folk because they can sell to them based on value and ROI outcomes, and avoid technical complexities up front. Who wants hard anyway? Easy is better. Right?

“Easy” is often another way of avoiding the hard bit of the solution design, not doing away with it altogether. Configuring SSO for applications like Snowflake requires a web of people across different IT functions, such as cloud and network teams with privileged access, plus sign-off from information security (InfoSec) teams. That is less than easy in most large organizations, and it hurts demo-ability and sales velocity.

How do multiple cloud services talk securely?

The situation gets more complicated when you layer cloud services. For example, how do you securely connect from Azure’s enterprise data preparation tool, Azure Data Factory (ADF), part of Microsoft Fabric (formerly Azure Synapse) to Snowflake? It’s well documented.

Getting it done is not trivial. You need admin access to multiple technologies, and several roundtrips to Snowflake support. Rinse and repeat for each Snowflake or ADF instance you set up. Now figure out how to automate it for your production deployments.

Now clearly there are people who can and have set things up correctly for their specific needs. However, even experienced IT and cloud pros would struggle to read that set of instructions and know exactly what is going on. That’s just one pairing of tools. Often there’s a chain of tools in any cloud application, and it is rare that there are only two. Now automate this process so you can repeat it… The problem is worse with platform-as-a-service (PaaS) or software-as-a-service (SaaS) solutions that don’t fully run inside your network. Add into the mix OAuth, External OAuth, Private Link and Private Link Service, Managed Identities or Service Accounts, and Private DNS. You soon realize that you are not totally in control. All the solutions strive to protect you, but all do it differently.

MFA can also get in the way of automating tasks and programmatic app-to-app authentication. Let’s say we want to build a report and have it refresh automagically every day. When we run the report, we authenticate with MFA. How does the automation tooling do MFA? There are ways to approach this, but many tools can’t manage it. There’s a temptation to ask for an “exception” and turn it off for this one user or this scenario. I’m not saying that’s what happened in the recent spate of data breaches, but it’s one reason why things like MFA don’t get used and another reason why it isn’t on by default.

Source: Runtime

Most data projects in the cloud need multiple services to interface together securely. There may be a cloud data source, a storage service like S3, a data integration tool like Azure Data Factory or AWS Glue, an orchestration tool like Airflow or scheduled AWS Lambda functions, a data warehouse like AWS Redshift, Databricks, Google BigQuery or Snowflake, and a BI tool like Power BI, Looker, or Tableau. There is no cross-service trust. Each tool operates in its own security bubble and requires you to figure out how to make things work securely – or if that is even possible.

Cloud security spaghetti

Here’s another example. Snowflake has excellent tools to make sure that you can only connect to Snowflake from networks your IT team has blessed. These are called Network Policies and Rules. Great for Azure and AWS, but in Google Cloud they don’t work (or at least not yet). To achieve something similar, you need Google’s Private Service Connect together with the more expensive Snowflake Business Critical Edition. If you want to layer SSO on top of that, there’s another complex set of instructions with more roundtrips to Snowflake support.
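The idea behind a network policy is simple even when configuring one is not. As a rough sketch only (this is not Snowflake's implementation or syntax), the check below decides whether a client address may connect, given an allow list and a block list of CIDR ranges. The function name `is_allowed`, the example ranges, and the rule that the block list wins are all assumptions made for illustration.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str,
               allowed: Iterable[str],
               blocked: Iterable[str]) -> bool:
    """Return True only if client_ip is in an allowed range and in no blocked range."""
    ip = ipaddress.ip_address(client_ip)
    # Assumption of this sketch: a match on the block list always wins.
    if any(ip in ipaddress.ip_network(cidr) for cidr in blocked):
        return False
    # Secure by default: anything not explicitly allowed is refused.
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed)


# Hypothetical corporate range with one quarantined host carved out.
ALLOWED = ["203.0.113.0/24"]
BLOCKED = ["203.0.113.99/32"]
```

With these example lists, `is_allowed("203.0.113.10", ALLOWED, BLOCKED)` passes, while the quarantined host and any address outside the corporate range are refused.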

And then for each scenario there are exceptions. Take this one for example:

Source: Snowflake User Guide

Who can keep up with all these different permutations? It’s truly mind-boggling. It amazes me that there aren’t more security incidents. As someone with a technical background who has advised customers on cloud data solutions for many years at Microsoft and elsewhere, I was often perplexed by the security challenges myself. I know security design continues to challenge consultants and advisors, even those from the solution providers themselves.

The obvious temptation then is to try to do everything in one platform, in the hope that the vendor can stitch their own services together in a way that makes sense and is easier to manage. Remember, though, that PaaS services, even if they are all owned, developed, and managed by a single vendor, are really a conglomeration of independent services. It is just as complicated for them. I remember, in my time with Microsoft, how customers were highly concerned at the need for a public IP address for Azure Synapse and other Azure data services. At best you could firewall it off; you couldn’t eliminate it entirely. Things have improved slightly since then.

Getting data security right

Platforms hosting sensitive data should be secure and inaccessible by default. The principle of least privilege access should apply at all times. The more traditional approach of putting critical platforms behind your own network firewalls, your policy controls, managed by your security teams remains the safest option. As can be seen in Snowflake’s reaction, ultimately security responsibility falls to you and your organization.

At Yellowbrick, our largest customers recognized these challenges from the very get-go and pushed us explicitly not to go down the route of Snowflake and other data platforms in our cloud platform. Their caution is paying off. Yellowbrick’s data platform always sits behind your firewall, in your network, protected by your own security teams. It is easier to reason about and easier to design, and we’ve made it just as easy to manage as SaaS or PaaS data solutions. Better data security, no management compromise. And, of course, higher performance and lower cost than your existing managed data service.

If you are using a managed PaaS or SaaS service, turn on MFA. MFA isn’t a magic bullet but will help in some cases. Data encryption and tokenization solutions should also be part of the mix. Ensure conditional access policies in your Identity Provider’s SSO platform are configured to adequately protect data services. Push your data platform vendors to ensure they are constantly monitoring traffic to minimize the risk of bad actors, just like your credit card company or bank does. Don’t forget to audit access to ensure that your controls are effective and working.
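An access audit can start very small. The sketch below assumes you can export a list of accounts with a few attributes from your data platform or IdP. The `Account` fields and the `risky_accounts` helper are hypothetical, not a real export format. It simply flags accounts that can still log in with a password alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Account:
    name: str
    has_password: bool   # a password login is still possible
    mfa_enrolled: bool   # a second factor is registered
    sso_only: bool       # login is forced through the corporate IdP


def risky_accounts(accounts: List[Account]) -> List[str]:
    """Names of accounts that a stolen password alone would unlock."""
    return [
        a.name
        for a in accounts
        if a.has_password and not a.mfa_enrolled and not a.sso_only
    ]


# Hypothetical export, including the kind of "exception" account
# that gets created for an automated report refresh.
accounts = [
    Account("alice", has_password=True, mfa_enrolled=True, sso_only=False),
    Account("bob", has_password=False, mfa_enrolled=False, sso_only=True),
    Account("report_refresh", has_password=True, mfa_enrolled=False, sso_only=False),
]
print(risky_accounts(accounts))
```

Running a check like this on a schedule is one way to notice when an MFA "exception" quietly becomes permanent.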

I don’t in any way mean to pick on Snowflake. The problem here is a much more general one. How do you reliably secure communication between multiple managed cloud services when each cloud vendor, each data platform, each data integration platform, each AI/ML tool, and each BI platform operates in its own security bubble, and each has its own idiosyncratic approach to authentication and authorization? I’ve yet to find a data governance platform that solves these problems in their entirety.

Stay data safe, folks.
