You’ll have heard some variation of “data is the new oil.” I’m not the biggest fan of that analogy, but for those who care about the environment, the expression takes on new meaning.
Hidden Costs of Analytics Workloads
Analytics workloads are computationally expensive. In a typical data center, you will find tens of cabinets of equipment dedicated to the core analytics database – and often more than one such database. These cabinets contain Storage Area Network (SAN) infrastructure with thousands of individual drives, storage servers, multiple network fabric switches, and then the database servers themselves with thousands of cores and hundreds of terabytes of memory – all wrapped in thousands of miles of cable. And that’s before counting the air cooling required to keep these machines running optimally.
Clearly, there’s an ongoing energy footprint for machines that run 24x7x365 – even with modern power management, there is a significant constant energy draw. There are also sunk energy costs in manufacturing this hardware and transporting it. Let’s ignore, for now, the legions of people responsible for its care and feeding.
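To make that constant draw concrete, here is a back-of-envelope sketch. Every number in it – rack count, per-rack power, PUE, grid carbon intensity – is an illustrative assumption, not a measurement from any particular facility:

```python
# Back-of-envelope estimate of the annual energy and carbon cost of an
# always-on analytics platform. All input figures below are illustrative
# assumptions, not measurements.

HOURS_PER_YEAR = 24 * 365                 # running 24x7x365 = 8,760 hours

racks = 10                                # assumed: cabinets dedicated to analytics
avg_draw_kw_per_rack = 8.0                # assumed average IT load per rack, in kW
pue = 1.5                                 # assumed Power Usage Effectiveness (cooling etc.)
grid_kg_co2_per_kwh = 0.4                 # assumed grid carbon intensity, kg CO2 per kWh

it_energy_kwh = racks * avg_draw_kw_per_rack * HOURS_PER_YEAR
total_energy_kwh = it_energy_kwh * pue    # PUE scales IT load up to whole-facility load
co2_tonnes = total_energy_kwh * grid_kg_co2_per_kwh / 1000

print(f"IT energy:      {it_energy_kwh:,.0f} kWh/year")
print(f"Facility total: {total_energy_kwh:,.0f} kWh/year")
print(f"Carbon:         {co2_tonnes:,.1f} tonnes CO2/year")
```

Even with these modest assumptions, a ten-rack platform lands in the hundreds of tonnes of CO2 per year before a single query has delivered any business value.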
The Analytics Carbon Cost
But we’ve barely scratched the surface of the analytics carbon cost. Around this central analytics infrastructure sits the analytics sprawl – hundreds of applications, Business Intelligence tools, supplementary databases, and data marts – needed because the central platform, no matter how well it was architected ten years ago, can no longer meet the data-hungry needs of the modern business. Incompatibility between analytics vendors means that data is often copied from one platform, reformatted for another, and then sometimes copied back again.
Add to that all the integration and monitoring technology needed to keep the flow of data going, because things can and do go wrong. All that data is duplicated multiple times in backups and disaster recovery systems, with copies often shipped between data centers. When we dig a little, we find many systems, integrations, and applications are kept around simply because we’ve always had them: we are afraid to turn them off because we don’t want to risk unforeseen impacts. We end up with multiple copies of the same data in multiple siloed repositories, created to resolve resource and financial conflicts between the needs of different teams.
All of this takes compute power which draws energy and adds to our carbon footprint.
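A quick sketch shows how fast those copies multiply one dataset’s storage footprint. The copy counts below are assumptions chosen for illustration, not figures from any real estate:

```python
# Illustrative sketch of data-copy amplification across an analytics estate.
# Every multiplier here is an assumption for illustration only.

primary_tb = 100                          # assumed size of the source dataset, in TB

copies = {
    "primary warehouse":             1,
    "data marts / supplementary DBs": 3,  # assumed copies held by downstream teams
    "BI tool extracts":              2,   # assumed cached extracts
    "backups":                       2,   # assumed retained backup generations
    "disaster recovery replica":     1,
}

total_tb = sum(primary_tb * n for n in copies.values())
print(f"Logical data: {primary_tb} TB")
print(f"Stored data:  {total_tb} TB ({total_tb // primary_tb}x amplification)")
```

Under these assumptions, 100 TB of logical data occupies 900 TB of powered, spinning storage – a 9x amplification before any cloud replication is counted.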
Cloud is Not Always Efficient
Even data at rest carries a carbon cost. You may think, “Well, if I unplug my external disk drive, it isn’t drawing any power.” But the enterprise technology world rarely works like that. With the exception of robotic archiving solutions using magnetic tape, most data is kept online, ready for action, even if it is hardly ever used. It’s just easier that way – or at least that’s what vendors have led us to believe. We are generating more data than ever before, and we are keeping an ever-higher percentage of it around, just in case it turns out to be useful.
“Yes, I know,” you say, “but I’m shutting down my data center and moving to the cloud. The cloud is greener, right?” Cloud providers invest in highly efficient data centers, in recycling, and in the green energy infrastructure needed to power them. True. However, the cloud rarely drives consolidation and rationalization. We are mostly running everything we were running before – sometimes less efficiently, now that we have less control – and every day we think of new uses for that data. Vendors are focused on making data more accessible and easier to consume and process, but often at the cost of processing efficiency.
Cloud vendors have a reputation to uphold. They can’t afford to lose any of your data, so they make sure – really, really sure – that it is safe by replicating it multiple times. Even if you wouldn’t mind losing a particular bit of data, the cloud vendor doesn’t know that, and they can’t risk you changing your mind. They also need to guarantee capacity the moment you decide you need it – in other words, lots of spare compute.
Reducing Our Carbon Impact
It’s not all gloom, though. Data analytics can power new, more efficient business models; reduce the carbon footprint of supply chains and delivery through intelligent scheduling; help us create more efficient engines; and reduce water and chemical consumption in agriculture. Data plays a huge role in helping us reduce our carbon impact.
But we can do more. At Yellowbrick, we strive to build the most efficient cloud data warehouse engine on the planet for powering business analytics. Our engineers focus on extracting efficiency from every component in the chain, driving exceptional query performance and higher user concurrency on a radically smaller infrastructure footprint. One side effect is lower cost. Another surprise for customers migrating to Yellowbrick is that they end up running on a fraction of their previous infrastructure – sometimes up to 97% less – while getting through more analytics tasks, for more users, much faster, delivering greater analytics productivity, happier analytics teams, and more immediate, more impactful business insights.
Check out this analysis of multiple Yellowbrick customers by Nucleus Research, which concluded that Yellowbrick customers benefit from 100-fold query performance improvements and an 8x reduction in data center footprint. Alternatively, hear directly from Nigel Pratt, SVP Technology at Symphony RetailAI, on how they consolidated multiple platforms – including Netezza, Amazon Redshift, SQL Server, and 1010Data – onto Yellowbrick, achieving simplicity and major cost savings.
You really don’t need hundreds of terabytes of data to realize these savings. With Yellowbrick’s cloud data warehouse offering, there’s no minimum size. Talk to one of our team today to see how we can shrink your data and analytics carbon footprint.