Data Warehouse on Kubernetes


Yellowbrick Data Warehouse dramatically improves performance and reliability of a critical fraud detection application

ThreatMetrix – Split-second Fraud Detection
Industry: Financial Services
Business use cases: Risk Management & Fraud Detection
Technical use case: Data Lake Augmentation


ThreatMetrix, now part of LexisNexis Risk Solutions, is a global leader in digital fraud detection and identity authentication. At the center of its operations is the LexisNexis Digital Identity Network (DIN), which is powered by a sophisticated ML model and serves more than 5,000 brands in 244 countries, reflecting the company's global reach in digital security.

Key Statistics and Operation

  • The DIN processes over 8 billion transactions monthly across 8.2 billion devices.
  • The system streams more than 200 data points and calculates 1,000 additional properties for each transaction, in under 60 milliseconds on average.
  • Clients query a 300TB multi-tenant database more than 25,000 times a day, while up to 1TB of new data is ingested from a data lake via Kafka.
  • The platform handles complex, concurrent queries from hundreds of users that span months of data and millions of records.
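To put the monthly and daily volumes above on a per-second scale, here is a back-of-the-envelope sketch. It assumes a 30-day month and load spread evenly across the day, so it gives averages only; real peaks will be higher. The helper name `average_rate_per_second` is hypothetical and used only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(count: float, days: float) -> float:
    """Hypothetical helper: average events per second if `count` events
    are spread evenly over `days` days."""
    return count / (days * SECONDS_PER_DAY)


# Figures from the statistics above; the 30-day month is an assumption.
transactions_per_second = average_rate_per_second(8_000_000_000, days=30)
queries_per_second = average_rate_per_second(25_000, days=1)

print(f"{transactions_per_second:,.0f} transactions/s on average")
print(f"{queries_per_second:.2f} client queries/s on average")
```

Under these assumptions the DIN averages roughly 3,086 transactions per second, each of which must be scored within the 60-millisecond budget, while client analytics queries average well under one per second. That contrast is a reminder that the warehouse workload is dominated by query complexity and data volume rather than raw query rate.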


Challenges Faced by Business Users

LexisNexis’s data pipeline was originally built on a mix of technologies, including Apache Kafka, Apache Cassandra, Apache Apex, Apache Impala, and Greenplum. Despite these advanced technologies, LexisNexis encountered significant operational challenges, especially during peak activity periods. Growing data volumes and a rising number of users strained the infrastructure, leading to several critical issues:

  • Data Ingestion Delays: Ingesting data took up to a minute due to small-file writes and necessary compaction.
  • Long Query Completion Times: Customers faced query times up to three minutes, significantly hindering efficiency.
  • Frequent Outages: Unpredictable outages in the data pipeline led to customer frustration.
  • Complex to Change: Implementing business process changes, such as adding new data columns, was a lengthy process, often taking weeks.


Next-Gen Database Needs for DIN:

  1. Flexible Query Capability: Support customer-initiated, ad-hoc queries, with no predefined queries, over six months of data on datasets larger than 3 billion records.
  2. Rapid Data Ingestion: Ingest more than 5,000 rows per second, with new data ready for querying within a minute.
  3. Wide Tables: Store a main table with 40,000 rows, 1,200 columns, and more than 1 petabyte of data.
  4. High User and Query Volume: Support over 250 users simultaneously and process more than 100,000 daily queries, keeping query response times below 50 milliseconds.
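The ingestion and query requirements above can be translated into daily and per-second figures with simple arithmetic. This sketch assumes ingestion runs continuously at exactly the stated minimum of 5,000 rows per second and that queries are spread evenly over 24 hours; the variable names are illustrative only.

```python
SECONDS_PER_DAY = 86_400

ingest_rows_per_second = 5_000  # requirement 2 (stated as a minimum)
freshness_seconds = 60          # requirement 2: queryable within a minute
daily_queries = 100_000         # requirement 4

# Rows landing per day at the minimum sustained ingest rate.
rows_per_day = ingest_rows_per_second * SECONDS_PER_DAY

# Rows that can arrive during one freshness window, i.e. the most data
# that may be loaded but not yet queryable at the stated rate.
rows_within_freshness_window = ingest_rows_per_second * freshness_seconds

# Average query rate if the daily total were spread evenly.
average_queries_per_second = daily_queries / SECONDS_PER_DAY

print(f"{rows_per_day:,} rows/day")
print(f"{rows_within_freshness_window:,} rows per freshness window")
print(f"{average_queries_per_second:.2f} queries/s on average")
```

At these rates the system would take in about 432 million rows a day and must make each window of up to 300,000 new rows queryable within 60 seconds, while sustaining just over one query per second on average with sub-50-millisecond responses.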

3X Faster With a Quarter of the Nodes

By moving to Yellowbrick, which integrated smoothly with its existing data pipeline, LexisNexis achieved a significant performance boost. End users saw marked improvements, with most operations completing in milliseconds. LexisNexis achieved this on only 15 nodes, a quarter of the previous node count, and with 80% less memory than the prior solution.
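The headline figures (3X speed, 15 nodes as a quarter of the previous count, 80% less memory) imply a few derived numbers. This is a rough sketch: the per-node figure assumes performance would otherwise scale linearly with node count, which real clusters rarely do exactly, so treat it as an order-of-magnitude indication rather than a measured result.

```python
yellowbrick_nodes = 15
node_fraction = 1 / 4    # "a quarter of the previous number"
speedup = 3              # "3X speed"
memory_reduction = 0.80  # "80% less memory"

# Node count of the previous solution implied by the quarter ratio.
previous_nodes = round(yellowbrick_nodes / node_fraction)

# Assumption: linear scaling with node count, so 3x the speed on
# 1/4 of the nodes implies about 12x the work per node.
per_node_speedup = speedup / node_fraction

# Share of the previous memory footprint that remains.
remaining_memory_share = 1 - memory_reduction

print(f"Previous cluster: about {previous_nodes} nodes")
print(f"Implied per-node speedup: about {per_node_speedup:.0f}x")
print(f"Memory footprint: {remaining_memory_share:.0%} of the original")
```

The output corresponds to roughly 60 nodes before the migration, about a 12x gain per node under the linear-scaling assumption, and a memory footprint of 20% of the original.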

Results include:

  • Improved Customer Experience: Leveraging Yellowbrick’s rapid processing, LexisNexis delivers up-to-date and in-depth insights more efficiently.
  • Minimal Management: Yellowbrick’s automated resource allocation reduces administrative needs, with no manual performance tuning required.
  • Reliable, Flexible Service: The stability and global distribution of Yellowbrick instances provide reliable service and flexible workload management, improving overall customer satisfaction.

“Compared to other data warehouses and Hadoop-based solutions, Yellowbrick Data provides superior performance.”

- Matthias Baumhof,
CTO, LexisNexis Risk Solutions



Learn More About the Only Modern Data Warehouse for Hybrid Cloud

Run analytics 10 to 100x faster and reach insights that were never possible before.

Simpler to Manage
Configure, load and query billions of rows in minutes.

Shrink your data warehouse footprint by as much as 97% and save millions in operational and management costs.

Accessible Anywhere
Achieve high speed analytics in your data center or in any cloud.