Data Warehouse on Kubernetes

DBAs Face Up To Kubernetes

The role of the DBA has gone through many twists and turns over the last couple of decades. It began with deciding which files to put on which disk spindle, on which storage controller, in which RAID configuration. The shift to SAN storage brought battles with storage admins. Developers needed the DBA’s god-like understanding of database internals and arcane query execution plans to deliver the performance they needed. Mission-critical services needed highly available clusters and replicas to be built. And as databases and applications sprawled and concerns about security started to rise, automation of centralized and virtualized fleets of thousands of databases became the new normal. Most recently, teams are left figuring out how to optimize data services in the cloud in terms of cost, performance, and complexity, and choosing between an ever-changing variety of database architectures.

Indeed, many different roles now take on DBA responsibilities; it’s rare to find a dedicated DBA in all but the largest enterprises. Platform engineers and DevOps teams often take on the responsibility to build the automation to deploy databases, with developers left to configure and design the database model and processes.

The new breed of analytics-focused database technologies, like Yellowbrick, is expressly designed to be autonomous: essentially “load and go”, eliminating the need to understand database storage models and query execution plans. Yellowbrick is self-indexing and self-optimizing, with no secondary indexes. For time-critical, repetitive processes there’s some scope for optimization, but the aim is to run fast out of the box. Maintenance tasks such as statistics and index maintenance, object garbage collection, and data compaction (vacuuming) are now automated. Yellowbrick’s Direct Data Accelerator approach efficiently utilizes compute resources without the use of in-memory caching, delivering the fastest query processing without manual optimization.

With Yellowbrick, one task that remains for the DBA is workload management – ensuring fair utilization of resources between different consumers inside the same cluster. It’s a control point we believe is important for customers to ensure great experiences for their end users while keeping control over costs. Giving each set of users their own cluster is another approach to de-conflicting different workloads.
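To make the idea of fair utilization concrete, here is a minimal sketch of weighted fair sharing between consumer groups. It is an illustration only, not Yellowbrick's workload management mechanism: the function name `fair_shares`, the notion of "slots", and the group names are all hypothetical.

```python
from typing import Dict


def fair_shares(total_slots: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split a fixed number of execution slots between groups by weight.

    Hypothetical illustration: 'slots' and the group names are assumptions,
    not part of any real product API.
    """
    total_weight = sum(weights.values())
    # Each group first receives the floor of its proportional share.
    shares = {name: total_slots * w // total_weight for name, w in weights.items()}
    # Any slots lost to rounding go to the highest-weighted groups first.
    leftover = total_slots - sum(shares.values())
    for name in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    # Dashboards get the largest share; ad hoc queries the smallest.
    print(fair_shares(16, {"etl": 2, "dashboards": 5, "adhoc": 1}))
```

The point of a scheme like this is that no single group can starve the others: every consumer with a non-zero weight is guaranteed a share of the cluster proportional to that weight.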

DBAs also worry about high availability: ensuring their database service is resilient to failures. Yellowbrick is built on Kubernetes, which ensures any failed processes are restarted, delivering a highly resilient database service. This is even more important in the cloud, as interruptions can and do happen outside of your control. Highly resilient cloud object storage, like AWS S3 and Azure Data Lake Storage, minimizes the potential for data loss with eleven 9s of durability. When you need DR, Yellowbrick’s built-in replication continuously replicates data to a DR instance with minimal configuration. Backup and restore are also easy to automate.
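To put the durability figure above in perspective, the short calculation below works through what eleven 9s means in practice. It is a back-of-the-envelope sketch that assumes the figure is read as a per-object, per-year durability of 99.999999999% and that object losses are independent. The function names are illustrative.

```python
def expected_annual_losses(object_count: int, nines: int = 11) -> float:
    """Expected number of objects lost per year at the given durability.

    Assumes durability is quoted per object per year and that losses
    are independent (a simplification).
    """
    annual_loss_probability = 10.0 ** -nines  # eleven 9s -> 1e-11
    return object_count * annual_loss_probability


def years_per_expected_loss(object_count: int, nines: int = 11) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_losses(object_count, nines)


if __name__ == "__main__":
    objects = 10_000_000
    print(f"Expected losses per year: {expected_annual_losses(objects):.1e}")
    print(f"Roughly one loss every {years_per_expected_loss(objects):,.0f} years")
```

Under these assumptions, storing ten million objects works out to an expected loss of about one object every ten thousand years.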

Kubernetes also provides the framework for elasticity – the ability to grow and shrink clusters or add clusters and load-balance across clusters on-demand. Elasticity minimizes the need for forward capacity planning and over-provisioning, another DBA activity that’s gone by the wayside.

Of course, the great thing about Kubernetes is that it brings portability, which means you can run Yellowbrick anywhere you need to – public cloud or private cloud, without changing how you operate or manage it.

A major downside of Kubernetes is that it can be complex to manage. With Yellowbrick, Kubernetes is very much an internal, hidden technology. All DBA operations are executed via SQL, even operations like scaling clusters. Just as well, since one of the biggest reported challenges in Kubernetes adoption is difficulty finding advanced skills, with 48% of respondents citing lack of in-house skills as the biggest challenge with Kubernetes and containerization (according to Canonical’s Cloud Native Usage Report 2022). Advanced Kubernetes users can of course look under the covers, and we are working on opening up lower-level APIs and a dedicated Kubernetes operator for teams that prefer to use this route for management.

Another major challenge with Kubernetes is security, with 38% of respondents in the above report indicating challenges and only 13.5% reporting they’ve mastered security. Yellowbrick takes care of ensuring that Kubernetes services are highly secured and up to date, which is of great importance to many of our customers, particularly our Federal and Financial Services customers.

Some 71% of respondents host databases in Kubernetes, according to the Cloud Native Computing Foundation’s 2022 survey. Yet only 2% are running big data platforms like Spark, and MPP databases don’t even make the list, according to Dynatrace’s Kubernetes in the Wild Report 2023. If only there were an elastic MPP data warehouse built natively for Kubernetes! Meet Yellowbrick.

In summary, Yellowbrick Data Warehouse’s Kubernetes foundation provides all the benefits of portability, resilience, and elasticity. We’ve done the hard work to abstract away the complexity of deploying, securing, and managing Kubernetes technology. With our investment in Kubernetes, Yellowbrick’s customers are future-proofed against changes in underlying technology platforms on-prem or in the cloud. Our approach reduces both costs and complexity, and enables a high-performance MPP data warehouse to be managed just like any other application.

