Andromeda Server Hardware Instance – Yellowbrick
The Yellowbrick Data Warehouse is a cloud-native, parallel SQL database designed for the most demanding batch, ad hoc, real-time, and mixed workloads.
For on-premises use cases, Yellowbrick has developed the Andromeda server hardware instance and our new Kalidah processor.
Yellowbrick’s Database & Andromeda Server
Together, the Kalidah processor and Andromeda system optimize price/performance, driving new efficiencies.
The Yellowbrick cloud-native data warehouse also provides deployment flexibility by offering an identical data warehouse for both cloud and on-premises use cases.
Yellowbrick enables a more efficient business model by allowing customers to consume a modern, elastic, SaaS user experience in their own cloud account with predictable costs.
Significant Andromeda Price/Performance
With our database and Andromeda server, it’s not uncommon to find one server node providing the equivalent query throughput of a dozen or more nodes of competitive cloud and on-premises databases, at a fraction of the total cost.
The Andromeda system provides optimized price/performance for new use cases that require more concurrent users and more ad hoc analytics.
Instance Design for Data Warehousing
Parallel data warehouse workloads place substantial stress on servers, networks, and storage, similar to supercomputer applications. Unlike storage systems that just read or write data from disks and send it over a network, MPP database servers require large amounts of compute to process and transform the data before it’s read or written, and as much memory bandwidth as possible to support random lookups of data for operations such as aggregates and joins.
Furthermore, all the servers in a cluster need to continually coordinate query processing (requiring ultra-low network latency to rapidly execute short queries) and exchange data (requiring massive amounts of streaming bandwidth for large queries). During query processing, throughput will be bound by the network (latency or bandwidth), computation (cores or memory channels), or storage (reads or writes for spilling), depending on the operators in use.
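The interplay described above can be made concrete with a small sketch. This is purely illustrative, using hypothetical per-resource throughput numbers rather than Yellowbrick measurements: for a given query stage, the bound resource is simply the one with the lowest effective throughput.

```python
# Illustrative only: estimate which resource bounds a query stage by
# finding the minimum effective throughput across network, compute,
# and storage. All numbers below are hypothetical examples, not
# Yellowbrick specifications.

def bottleneck(rates_gb_per_s):
    """Return the name of the resource with the lowest throughput."""
    return min(rates_gb_per_s, key=rates_gb_per_s.get)

# A large hash-join stage that shuffles rows across the network,
# probes in-memory hash tables, and spills intermediate data to NVMe:
stage = {
    "network": 20.0,   # streaming exchange bandwidth per node, GB/s
    "compute": 35.0,   # rate cores/memory channels can process, GB/s
    "storage": 16.0,   # write bandwidth available for spilling, GB/s
}
print(bottleneck(stage))  # -> storage
```

Change the operator mix (say, a short coordination-heavy query) and the binding resource shifts, which is why the text above calls out latency, bandwidth, cores, memory channels, and spill I/O as distinct constraints.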
Data warehouses are becoming Tier 1, business-critical applications, requiring instances to be highly available at the hardware and system level: fully resilient to the failure of hardware components (fans, power supplies, drives, adapters, etc.), network failure, server node failure, and partial power failure.
For compute, we care about the cost of each CPU core, which largely dictates how fast we can execute instructions, and the cost per memory channel, which largely dictates how fast we can do large aggregates, joins, and sorts. With the introduction of AMD’s EPYC processors, 64 cores of compute with eight memory channels can be acquired affordably, yielding the lowest possible price per core and per memory channel.
100Gb networks are now the sweet spot in cost per unit of bandwidth. Since a redundant network architecture is required for high availability, each server node has access to two network interfaces running over two separate switches. In addition, we have made use of features on the EPYC processor and the network interface to closely couple the fabric and query processing, enabling us to drive an incredible 200Gb/sec per node of data across the network – roughly 20GB/sec per node, full duplex, or 400GB/sec per chassis. To make this process efficient, we use a remote direct memory access (RDMA) fabric that allows direct movement of data – typically cache-resident – between nodes, with no TCP/IP or Linux kernel in the way to slow things down.
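The per-node bandwidth figures above follow from simple unit arithmetic. In this back-of-the-envelope sketch, the ~20% deduction for protocol and encoding overhead is an assumption made for illustration, not a published Yellowbrick number:

```python
# Back-of-the-envelope check on the per-node fabric figures above.
# Two 100Gb interfaces give 200 Gb/s of raw line rate, i.e. 25 GB/s;
# after an assumed ~20% protocol/encoding overhead (illustrative,
# not a vendor figure) the usable rate lands near the cited
# ~20 GB/s per node.
line_rate_gbps = 2 * 100            # two 100Gb interfaces per node
raw_gb_per_s = line_rate_gbps / 8   # convert bits to bytes
assumed_overhead = 0.20             # hypothetical overhead fraction
usable_gb_per_s = raw_gb_per_s * (1 - assumed_overhead)
print(raw_gb_per_s, usable_gb_per_s)  # 25.0 20.0
```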
Each Andromeda server supports 8x 7mm NVMe U.2 drives, offering 24GB/sec of read bandwidth per node and 16GB/sec of write bandwidth. Because data is compressed, the effective read bandwidth per node is over 3x higher, sometimes peaking at over 100GB/sec of user data scanned per server node. To scan data at this rate, we need a hardware accelerator.
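The effect of compression on scan rate is multiplicative: effective scan bandwidth is raw drive bandwidth times the compression ratio. A quick sketch using the figures cited above (the 4.2x ratio is simply an example of a ratio that crosses the 100GB/sec mark, not a quoted specification):

```python
# Effective scan rate = raw drive read bandwidth x compression ratio.
# The 8 NVMe drives above deliver 24 GB/s of raw reads per node; at
# the >3x compression cited in the text that is 72+ GB/s of user
# data, and a ratio above roughly 4.2x (illustrative) pushes the
# node past 100 GB/s scanned.
raw_read_gb_per_s = 24
for ratio in (3.0, 4.2):
    effective = raw_read_gb_per_s * ratio
    print(f"{ratio}x compression -> {effective:.1f} GB/s scanned")
```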
Download the full whitepaper to learn how Andromeda-optimized instances are designed to bring significant performance, efficiency, and economic advantages to customers deploying Yellowbrick inside private clouds.