How To Avoid Drowning In Your Data Lake

Yellowbrick

April 24, 2020

A few years ago, Gartner warned us about some significant risks in data lakes that could eventually lead to “data swamps.” For many companies, that prediction was all too accurate. Many that have invested millions are still looking for business value, frustrated that data lakes have not delivered on the original promise of enabling actionable analytics on huge amounts of data.

The reality is that data lakes are useful as low-cost storage and for managing a variety of unstructured and semi-structured data, but they struggle as a true real-time analytics environment. Despite repeated attempts by open source and commercial solutions (e.g., Apache Hive, Apache Impala, and Greenplum), most Hadoop- or cloud-based data lakes can’t support thousands of concurrent analytics users, sophisticated ad hoc queries, data-intensive reports, or any of the other demands of a true real-time analytics system.

Instead, the right answer is to augment the data lake’s cheap storage with a fully modern analytics environment that is purpose-built to support sub-second ANSI SQL queries, even for the most complex workloads and for up to thousands of concurrent users in their favorite BI and data science tools. That environment needs to understand the required file formats (ORC, Parquet, JSON, etc.), ingest data extremely quickly in batch or as a real-time stream, and make it all queryable instantly. It should also simplify and streamline data management and eliminate the need for specialized data engineering skills.

Finally, the word “modern” implies that you should also have the flexibility to run workloads wherever it makes the most sense: in an on-premises data center, in the cloud, or both. Today, neither traditional data warehouses, nor SQL-on-Hadoop engines, nor cloud-native data warehouses check all those boxes—but Yellowbrick does.

Read the white paper, “Unlocking Data Lake Value with Hybrid Cloud Analytics,” which explains the design principles behind Yellowbrick that make it an ideal solution for augmenting (or even replacing) data lakes as described above.
