A Data Engineering Use Case

A while ago I wrote a blog espousing the benefits of using Yellowbrick alongside Snowflake. I was recently chatting with a Yellowbrick client, NCSolutions, who follows this model: they engineer massive quantities of data for their retail customers in Yellowbrick on AWS and use Snowflake’s data-sharing model to disseminate it. This blog explores the rationale behind this architecture.

The “better together” argument is based on workload and processing window: some jobs are better suited to Yellowbrick’s unique high-performance architecture. Furthermore, cloud consumption-based pricing models come at a premium. If you need agility and elasticity, the premium is worth it, but if you are running the same workload every day and it’s growing only slowly, then it probably isn’t. With Yellowbrick, throughput is higher and cloud consumption costs are lower, because a system designed for performance is also a system designed for efficiency.
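To make the pricing trade-off concrete, here is a minimal sketch of the break-even arithmetic between pay-per-use and fixed-capacity pricing. The function names, the 30-day month, and every rate are illustrative assumptions; none of them are actual Yellowbrick, Snowflake, or AWS prices.

```python
# Hypothetical cost model: compares pay-per-use pricing with a flat monthly fee.
# All names and numbers are illustrative assumptions, not vendor pricing.

def monthly_consumption_cost(hours_per_day, rate_per_hour, days_per_month=30):
    """Cost of a workload billed per compute hour."""
    return hours_per_day * rate_per_hour * days_per_month


def breakeven_hours_per_day(fixed_monthly_cost, rate_per_hour, days_per_month=30):
    """Daily usage above which a flat monthly fee becomes the cheaper choice."""
    return fixed_monthly_cost / (rate_per_hour * days_per_month)


def cheaper_option(hours_per_day, rate_per_hour, fixed_monthly_cost, days_per_month=30):
    """Return which pricing model costs less for a steady daily workload."""
    consumption = monthly_consumption_cost(hours_per_day, rate_per_hour, days_per_month)
    return "consumption" if consumption < fixed_monthly_cost else "fixed"
```

With an assumed rate of 10 per hour and an assumed flat fee of 3,000 per month, the break-even point is 10 hours a day: a bursty 4-hour workload favors consumption pricing, while a steady 16-hour daily workload favors fixed capacity.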

The arguments against best-of-breed are typically the cost of integration and the opportunity cost of not creating value elsewhere. Ultimately, it depends on the use case, but as this NCSolutions example shows, high-performance analytics was key, and integration with Snowflake was relatively straightforward.

NCSolutions delivers a unique and differentiated service for ad performance management. Through Snowflake Marketplace, NCSolutions offers their “CPG Insights Stream”, a first-of-its-kind service that provides a holistic view of consumer-packaged goods (CPG) shopping behavior in the United States at the household level. This unified data source enables CPG brands to identify their best prospects and reach them with the right message regardless of the medium. In short, with NCSolutions, CPG companies drive more revenue from advertising.

Assembling CPG Insights Stream is a mammoth feat of data engineering. Deducing ad effectiveness and predicting consumer behavior from billions of rows of point-of-sale (POS) information, and correlating it with other data sources such as industry-specific events, event-driven promotions, and weather-dependent factors, is NCSolutions’ IP. They have built a unique business around their command of data engineering.

Data engineering involves designing, building, and maintaining systems for collecting, storing, processing, and delivering data. It encompasses data collection from various sources, determining storage solutions, transforming and processing data for usability, integrating diverse data sources, ensuring data quality, and constructing efficient data pipelines. Data engineers focus on scalability, security, and automation while monitoring systems for optimal performance. Unsurprisingly, this is critical for creating NCSolutions’ strong foundation for effective data utilization and analysis.

NCSolutions uses Yellowbrick on AWS as their data engineering and preparation platform. They did not share with me their rationale for using Yellowbrick over Snowflake, so I can only surmise that it’s because Yellowbrick is better at this type of data processing and can do it faster and at lower cost. In any performance management discipline, speed is of the essence. A one-way flow of data from Yellowbrick to Snowflake makes this more feasible, with Snowflake being the single source of truth and Yellowbrick providing the transient pre-processing step necessary to get there. Data consolidation, correlation, transformation, and deep analytics are done in Yellowbrick. BI and data access for CPG Insights Stream subscribers is via Snowflake Data Cloud and Snowflake Marketplace.
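The one-way flow can be sketched in a few lines of plain Python. This is an in-memory illustration only: it calls no Yellowbrick or Snowflake API, and every name in it (`PosRow`, `Promotion`, `engineer`, `SharedStore`) is hypothetical rather than part of NCSolutions’ actual pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PosRow:
    """One hypothetical point-of-sale record."""
    household_id: str
    product: str
    amount: float
    day: str


@dataclass(frozen=True)
class Promotion:
    """One hypothetical promotion event for a product on a given day."""
    product: str
    day: str


def engineer(pos_rows, promotions):
    """Engineering stage: consolidate spend per household and product,
    and correlate it with promotion days."""
    promo_days = {(p.product, p.day) for p in promotions}
    totals = defaultdict(lambda: {"spend": 0.0, "promo_spend": 0.0})
    for row in pos_rows:
        key = (row.household_id, row.product)
        totals[key]["spend"] += row.amount
        if (row.product, row.day) in promo_days:
            totals[key]["promo_spend"] += row.amount
    return [
        {"household_id": household, "product": product, **values}
        for (household, product), values in sorted(totals.items())
    ]


class SharedStore:
    """Stand-in for the sharing platform: it accepts published tables and
    serves reads. Nothing flows back to the engineering stage."""

    def __init__(self):
        self._tables = {}

    def publish(self, name, rows):
        # Store an immutable snapshot so subscribers cannot alter it.
        self._tables[name] = tuple(dict(r) for r in rows)

    def read(self, name):
        # Hand out copies; the published snapshot stays untouched.
        return [dict(r) for r in self._tables[name]]
```

Because the sharing side only ever receives finished tables, there is no synchronization logic to maintain between the two systems, which is where most integration cost usually comes from.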

To summarize: best of breed only works when the benefits of adding another system outweigh the added complexity and the implementation and running costs. The benefit needs to be at least an order of magnitude to have any real impact. In this example, complexity is significantly reduced by the one-way data flow, with Yellowbrick operating as a pre-processing step for data engineers and Snowflake doing the things it excels at, including Snowflake Marketplace. Data volumes and latency are clearly factors that also need to be considered, as is access to proficient data engineering skills.

NCSolutions data engineering use case

Yellowbrick is used for data consolidation, transformation, correlation and advanced analytics before data is sent to Snowflake Marketplace for sharing with service subscribers.
