Yellowbrick Data Warehouse: Engineered for Extreme Efficiency
Some of the smartest database minds in the world have spent the last eight years building Yellowbrick’s MPP database engine from scratch. The technology underpinning the Yellowbrick Data Warehouse is highly differentiated from competitors’ – we are not simply delivering an MPP layer on top of someone else’s database. Yellowbrick’s fast, efficient data warehouse processing engine translates into lower resource and energy consumption, faster queries, and higher workload density – ultimately delivering more productivity and value. The Yellowbrick Data Warehouse unlocks data for every enterprise user, with efficiencies that drive down costs.
There are four primary components to the Yellowbrick database engine, all written from scratch by Yellowbrick: the Storage Engine, the Execution Engine, the Workload Manager, and the Query Compiler. In addition, Yellowbrick’s engineers have optimized the entire data path and operating system process management, something we call the Direct Data Accelerator®.
All of these optimizations work silently behind the scenes, whether Yellowbrick is running in the public cloud on commodity compute and storage or on its optimized hardware appliance, Andromeda.
This overview will walk through some of what makes Yellowbrick technically different.
Yellowbrick’s Direct Data Accelerator® shrinks or removes bottlenecks in the flow of data all the way from storage through to the CPU, across the network, and back to the client. This requires optimizing operations at a far lower level of the technology stack than most vendors would dare to tread – in areas buried so deep that they rarely see the light of day.
Re-envisioning the Operating System
Most database platforms run on general-purpose Operating Systems (OS) built to run many different workloads together. Yellowbrick has re-envisioned a single-purpose OS, optimized for database workload efficiency, by-passing the OS for task scheduling, device interfaces, and memory management. Co-operative multitasking, both within a single node and across distributed compute nodes, ensures queries get answered faster.
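To make the scheduling idea concrete, here is a minimal, hypothetical sketch (not Yellowbrick code) of co-operative multitasking: each task voluntarily yields control after a unit of work, so a simple round-robin scheduler interleaves tasks without any pre-emption or kernel context switches.

```python
# Hypothetical sketch of co-operative multitasking: tasks yield control
# voluntarily rather than being pre-empted by the OS scheduler.
from collections import deque

def task(name, steps):
    """A unit of query work that yields after each processing step."""
    for step in range(steps):
        yield f"{name}:step{step}"  # hand control back to the scheduler

def run_cooperative(tasks):
    """Round-robin scheduler: resume each task until it yields or finishes."""
    queue = deque(tasks)
    trace = []
    while queue:
        t = queue.popleft()
        try:
            trace.append(next(t))   # run task until its next voluntary yield
            queue.append(t)         # re-queue it behind the others
        except StopIteration:
            pass                    # task finished; drop it
    return trace

trace = run_cooperative([task("scan", 2), task("join", 2)])
# Tasks interleave deterministically: scan, join, scan, join
```

Because every switch happens at a known yield point, the scheduler needs no locks or kernel involvement – the property a single-purpose OS exploits for efficiency.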
By-passing Main Memory Increases Performance and Efficiency
Traditionally, database platforms move data from storage into main memory and only then start operating on it, based on an outdated assumption that this improves performance. In practice, this wastes CPU cycles shuttling data in and out of an in-memory cache and consumes valuable main memory – memory that could be supporting critical calculations rather than serving database internals. By architecting for modern high-performance NVMe storage and random reads, and ignoring legacy assumptions, data gets to the CPU at memory transfer speeds.
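The difference between the two paths can be illustrated with a toy sketch (hypothetical, not Yellowbrick internals): one function stages every block in a simulated in-memory buffer cache before computing, while the other consumes each block directly as it arrives, avoiding the staging copy entirely.

```python
# Toy illustration of the two data paths: staging blocks in an intermediate
# "buffer cache" versus streaming each block straight to the consumer.
def sum_via_cache(blocks):
    cache = []                  # simulated in-memory buffer cache
    copies = 0
    for b in blocks:
        cache.append(bytes(b))  # extra copy just to populate the cache
        copies += 1
    total = sum(sum(b) for b in cache)
    return total, copies

def sum_direct(blocks):
    copies = 0                  # no staging: consume each block as it arrives
    total = sum(sum(b) for b in blocks)
    return total, copies

blocks = [bytes([1, 2, 3]), bytes([4, 5])]
# Both paths produce the same answer; the direct path makes zero staging copies.
```

The answer is identical either way; only the direct path frees both the CPU cycles spent copying and the memory the cache would have occupied.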
Re-imagining Device Interfaces with OS By-pass
Workload Manager
Execution Engine
Flows Data Efficiently Through the Query Graph