The Yellowbrick data warehouse is a massively parallel processing (MPP) SQL relational database that delivers agility through elastic scale, ease of use, and a management-free “as-a-service” model that vastly simplifies operations. It is designed for the most demanding batch, real-time, ad hoc, and mixed workloads, and it can run complex queries at up to petabyte scale across numerous nodes with guaranteed sub-second response times. All of this comes without giving up any control of data, and with an identical experience whether running on-premises or in GCP, Azure, or AWS. We’re proud to have the best performance in the industry at the lowest possible cost: no one runs data warehousing workloads faster than Yellowbrick.
What are MPP Databases?
We just highlighted that the Yellowbrick data warehouse architecture is built around a massively parallel processing (MPP) SQL relational database. For those unfamiliar with the concept: an MPP database distributes work across multiple processors — each with its own operating system and memory — to dramatically accelerate even the most demanding batch, real-time, ad hoc, and mixed workloads. The MPP architecture leveraged by Yellowbrick can run highly complex queries at up to petabyte scale across numerous nodes, with guaranteed sub-second response times.
Yellowbrick was conceived with the goal of optimizing price/performance. New SQL analytics use cases are emerging all the time, and more concurrent users are consuming more ad hoc analytics. That requires more performance per dollar spent, and Yellowbrick data warehouse architecture (see high-level view below) leapfrogs the industry in this respect. It’s not uncommon for customers to see their workloads run tens or hundreds of times faster at a fraction of the cost compared to cloud-only or legacy data warehouses.
Yellowbrick Data Architecture: Transformative vs. Incremental
With the Yellowbrick data warehouse architecture, however, we aren’t interested in incremental improvements in efficiency. Incremental is boring! Rather, our goal is to make step-function improvements in economics, and when it comes to data processing, these improvements come from modern hardware technologies that are more efficient than traditional systems.
Inefficiency of high-throughput data processing with Linux
Today’s hardware instances are routinely available with hundreds of gigabytes to terabytes of memory and dozens of CPU cores: at the time of writing, a single off-the-shelf instance can support 2TB of RAM and 128 CPU cores (256 vCPUs), and we envisage that by 2023, 192 cores (384 vCPUs) will be commonly available. Running generic software on these instances does not work well. Operating system schedulers were built to wait for events and “context switch”: threads wait for events — such as a keypress, a network packet arriving, storage I/O completing, or a synchronization primitive becoming available — and the scheduler switches between competing threads and processes to be as fair as possible and to use buffers efficiently. As a result, it’s not uncommon for modern databases to perform tens of thousands of context switches per second per CPU core, and millions per second in aggregate.
Conventional wisdom states that if you’re spending under about 10% of CPU time context switching, you’re in good shape; context switches are cheap with a good operating system. However, this assumption is outdated. Modern CPUs get their performance from processing data held in their caches, typically called L1, L2, and L3. The L1 cache contains data pertinent to the most recent processing; the L2 cache is larger but slower to access, and likewise the L3. The L1 cache per CPU core is measured in tens of KB, the L2 cache in hundreds of KB, and the L3 cache in single-digit megabytes. Every context switch pollutes these small caches with another thread’s data, so the incoming thread stalls on comparatively slow main memory before it can run at full speed.
When this context switching and bouncing in and out of complex Linux kernel subsystems is happening continuously across dozens of cores, any modern CPU will struggle to work efficiently. The DBAs will be none the wiser because the CPU will be 100% utilized, but under the covers, due to the inherent limitations of the database architecture, the database is achieving only a fraction of the theoretical maximum efficiency.
The Yellowbrick Kernel for a Modern Data Warehouse Architecture
To avoid these Linux-intrinsic problems, we built a new OS kernel — which we call the Yellowbrick kernel — from scratch, in order to establish the necessary foundation and capacity for a modern data warehouse architecture. It implements a new execution model to eliminate measurable context switching overhead, and eliminates penalties associated with accessing storage, the network, and other hardware devices. We do that with a new, reactive programming model for the entire data path.
The Yellowbrick Kernel is implemented as a “user space bypass kernel” – a Linux process that takes control of most of the machine and attached I/O devices. As a Linux process, it can run comfortably in container environments such as Kubernetes, in virtual machines, or on bare metal. It assesses how much “bypass” capability is possible in each environment and then adapts to use as much of it as it can, so it performs optimally in a VM on a 10-year-old laptop, a container in Amazon EKS, on OpenShift in a private cloud, or on bare metal in a custom-designed blade server. When Yellowbrick starts, Linux is relegated to being a supervisor agent that collects logs and statistics, with all core data path functionality bypassing it completely.
Some of the principles of this new programming model, which is at the heart of the Yellowbrick data warehouse architecture, include:
Memory management: The Yellowbrick kernel intrinsically understands NUMA (Non-Uniform Memory Access) machines. At database startup, almost all memory in the system is handed to Yellowbrick and pinned (to make sure Linux never swaps our process in or out). Physical-to-virtual mappings are noted so hardware devices can directly and safely access the database memory, bypassing the kernel.
Threading and processes: Yellowbrick has a modern threading model based on reactive concepts such as futures and co-routines. Small, individual units of work called tasks are scheduled and run to completion without preemptive context switching.
Device drivers: Traditional device drivers run in the Linux kernel and interrupt execution whenever something happens. In contrast, Yellowbrick device drivers are asynchronous and polling in nature. Access to drivers is always via queues with well-defined interfaces. Drivers are present for general PCIe devices, NVMe SSDs, various network adapters, and so on, all of which work without Linux’s involvement. In cases where Yellowbrick runs without bypass being available, emulated drivers for each class (network, storage, and so on) fall back on the Linux kernel or on software emulation.
Networking: Like many modern, microservices-based software stacks, Yellowbrick is implemented in a variety of different languages. We primarily make use of C, C++, and Java, with a sprinkling of Go and Python where necessary, and these services need to talk to each other. The use of abstracted, high-performance, zero-copy networking with standard interfaces brings benefits to the Yellowbrick database that can’t be matched by legacy databases: We have clocked a single CPU core sending and receiving 16GB/sec of data across the network in the MPP fast path, with time to spare. When using the Linux kernel, around 1.5GB/sec is the limit and the CPU core is fully loaded, leaving no time for data processing. The Yellowbrick architecture and networking technology allows expensive parts of database queries – such as re-distribution of data for joins, aggregates (GROUP BY), and sorting – to run 10x more efficiently than competing databases, using a fraction of the resources.
For more details about Yellowbrick data warehouse architecture, and its related design concepts, watch CEO Neil Carson’s keynote from the recent Yellowbrick Summit 2021 virtual event:
And for an even deeper dive, read the Yellowbrick Database Architecture White Paper.