Workload Analytics: Tickling the Soft Underbelly of the Platform

Yellowbrick | Mark Cusack
Mark Cusack
5 Min Read

I’ve recently written about the strengths and weaknesses of LLM-based text-to-SQL. I wanted to continue the discussion, this time turning the focus way from how text-to-SQL can be a useful tool for helping business analysts and data scientists craft a good SQL starting point, and look instead at how it can help in the area of workload analytics.

Workload analytics (WLA), or “analytics on analytics”, focuses inwards, asking questions about how well (or not) analytics workloads are running on a data platform. These questions are particularly interesting to DBAs, operations teams and especially to the finance department. Some questions worth asking include:

How can I improve the user experience for my end-users?

How can I optimize the costs of my cloud data warehouse?

How’s the utilization of my platform trending?

Questions like these are best answered by analyzing a platform’s system logs. Most respectable databases store their logs in tables that can be directly queried via SQL. So the task becomes one of crafting SQL queries to pull insights from the logs to help characterize and optimize the platform.

In the same way that text-to-SQL can provide a leg up in helping data engineers and data scientists understand their customer data, it can also help with understanding how a data platform performs. While I’m not claiming that text-to-SQL can provide 100% accurate recommendations based on the SQL it produces (it really, really can’t), it can certainly generate foundational queries that can be used to delve deeper into platform workloads.

To investigate its value as WLA tool, I used Dataherald and GPT-4o to analyze the query logs on a Yellowbrick system. The target system is running in AWS and is used by our sales engineers to run demos, train on, and for prototyping. As described previously, I modified the Dataherald code to add Yellowbrick as a target platform. I also forked and modified a Streamlit-based frontend app to add a conversational capability that would allow me to drill down into the workload analysis by asking deeper questions as part of the chat. If you want to hook it up all up to Yellowbrick, you can sign up to do so here: https://cloudlabs.yellowbrick.com/.

There’s around 100 users registered on this internal Yellowbrick system which maintains query logs for the past 30 days. I provided Dataherald with instructions to ignore queries against system views and databases as part of the prompt, as well as some annotations on fields, and then started to ask questions:

To which it responded with:

Looks like a good SQL query answer to a simple question, and it masked the user names nicely. Next, I thought I’d drill into what “user_1” had been up to – they’d clearly been busy:

Well, it discarded the anonymity in this case (lol, whutt?), but at least it correctly picked out the top user’s real login from the prompt history. Ok Resource Hog, what was your workload mix like against the most queried database?

After giving it a hint that the ‘type’ column in the sys.log_query view was a good place to look for the type of query, it came up with something reasonable from a SQL perspective:

Clearly, our friend ‘whutt’ was selecting the stuffing out of that database. How did they rack up 66,000 queries in 30 days?

Which generated:

If I was partial to a bet, I’d wager those JDBC queries were generated by JMeter during concurrency testing. Since Yellowbrick doesn’t charge by the query or GB scanned, go nuts, you great big Resource Hog, ‘whutt.’

While identifying a capricious user of capacity is fun, the real value, I think, is in being able to rapidly ask questions and getting decent baseline SQL and answers, as well as the ability to drill further into the responses. This is a potential timesaver rather than a way to open up access to those that don’t write SQL. It’s perfect for impatient people like me who have a love/hate relationship with the language and want to crack on in the general direction of the answer without getting mired in the setup.

More importantly, workload analytics — looking inward into how our platform performs from a user’s perspective — is an incredibly important and overlooked data discipline. I really do think that a little text-to-SQL action will improve user experience, optimize cost and give us better insight into the performance of our platforms.

Sign up for our newsletter and stay up to date