Trickle Loading Data via JDBC

Unlike traditional OLAP systems, a Yellowbrick database stores data in both column-oriented and row-oriented storage, which lets it support high-performance, large-scale analytics as well as real-time trickle loads. Column-oriented storage is common in OLAP systems because it is the most efficient layout for large-scale analytics; however, it requires data to be loaded in large batches of ten million rows or more. Because of this restriction, most analytic databases support only large bulk data loads, which delay the time until data is available for processing and complicate data transfer.

Supporting “real-time” data processing requires an approach that allows individual rows to be inserted into the database. Yellowbrick therefore implements a row-oriented storage engine in addition to columnar storage, which allows individual rows to be loaded efficiently with SQL INSERT statements. Data is always visible in real time because queries automatically process data from both the row store and the column store. Yellowbrick also moves data from the row store to the column store automatically in the background, maximizing performance without user intervention.

This section describes the best practices required to achieve an ingest rate of up to 250,000 rows per second using SQL INSERT statements and the JDBC driver. For higher ingest rates, up to and exceeding 10,000,000 rows per second, use the Yellowbrick bulk loader (ybload) instead.
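As a rough illustration of row-at-a-time loading through JDBC, the following sketch sends single-row INSERT statements through a PreparedStatement, groups them into JDBC batches, and commits once at the end. It uses only the standard java.sql API. The class name, the method names, the table and column names in the usage note, and the batch size are all hypothetical, and batching with a single commit is a general JDBC technique shown here as an assumption, not a statement of the specific practices this section recommends.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class; none of these names come from the Yellowbrick documentation.
public class TrickleInsertSketch {

    // Builds a parameterized single-row INSERT statement for the given table and columns.
    public static String buildInsertSql(String table, String... columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.length, "?"));
        return "INSERT INTO " + table
                + " (" + String.join(", ", columns) + ")"
                + " VALUES (" + placeholders + ")";
    }

    // Binds each row to the prepared INSERT, sends the rows in JDBC batches of
    // batchSize (an assumed tuning knob), and commits once at the end.
    // Returns the number of rows sent.
    public static int insertRows(Connection conn, String sql,
                                 List<Object[]> rows, int batchSize) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // avoid one commit per row
        int sent = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                sent++;
                if (sent % batchSize == 0) {
                    ps.executeBatch(); // flush a full batch
                }
            }
            if (sent % batchSize != 0) {
                ps.executeBatch(); // flush the final partial batch
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // discard the uncommitted rows on failure
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return sent;
    }
}
```

A caller would obtain a Connection from the JDBC driver in the usual way (for example with DriverManager.getConnection and a URL appropriate to the installation), build the statement with buildInsertSql("events", "id", "payload"), and pass the rows to insertRows. The table name events and its columns are placeholders for illustration only.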