I’d like to take you through an interesting bug I dug into and fixed in my first months here at Yellowbrick. I joined the planning and optimization team about four months ago, which focuses most of its time on the query-planning component of our core product, the Yellowbrick Data Warehouse, a MPP (Massively Parallel Processing) SQL database. The query planner makes sure queries run quickly, efficiently, and that the results of a query are correct.
The bug in question comes from a query that a customer sent to us. The query, when ran, very quickly caused the database to run out of memory. This was an interesting bug to dive into as a new team member because it gave me a lot of exposure to the different parts of the stack.
I call this bug the “Case of the Exploding COALESCE” for reasons which I hope will become obvious. In this blog post, you’ll get a glimpse of the structure of the Yellowbrick stack, and an example of the abstract problems computer science students learn about manifesting in the real world.
The Yellowbrick Stack
Before diving into the query that sparked the “Case of the Exploding COALESCE” I’d like to give you a brief overview of the Yellowbrick stack. There are three main components to consider:
- Postgres front-end (forked and heavily modified)
- Yellowbrick Lime Query Compiler
- Yellowbrick Workers
When a user submits a query, it first lands in Postgres front-end which parses the query and produces a query plan. That query plan is then serialized and sent off to Lime. Lime has several jobs, but in this context, its job is to take the serialized query plan and generate query-specific C++ code that is then compiled and sent to the workers – this all happens on-the-fly. To finally execute the query, the workers execute the compiled C++ code and send the results back to the user via Postgres.
There are definitely many more moving pieces required to realize the Yellowbrick stack in its entirety, but we now have the basic prerequisite knowledge of the stack to make sense of the bug and the rest of this blog post.
The Explosions Begin
Now let’s dive into the query that sparked the investigation. Of course, I can’t share the exact query submitted to us, which was kind of large and difficult to understand without staring at it for a bit (okay more than a bit), but I can share a query that captures the exact essence of the problem.
The query contains a lot of COALESCE functions. As a quick reminder, a COALESCE function takes any number of arguments and returns the first non-null argument, and if all arguments are null then it returns null; it’s a convenient way to encode basic if/else logic where nullness is concerned.
Here is the version of the query I’d like to share with you:
SELECT COALESCE(column1, 'default') as field1, COALESCE(field1, column2) as field2, ... COALESCE(field24, column25) as field25 FROM some_table;
It is important to point out something here that is not seen in many other databases. We can see in the query that field2 expression references field1 as part of its definition. The only other database I am aware of that has this convenient syntactic sugar is Teradata. For clarity, if we had to write out the definition for field2 without this syntactic sugar, it would look like this:
COALESCE(COALESCE(column1, 'default'), column2)' );
It then follows that in the query field25 would have COALESCE functions nested 25 layers deep.
When I ran the query the memory the query planner (i.e., Postgres) consumed quickly ballooned and an OOM (out of memory) event occurred – it didn’t really matter how much memory I made available. Even if I made 256GB of memory available Postgres would still eat through all of it. I honestly didn’t quite know where to begin at first, so I attached a debugger and hit pause while I watched Postgres’ memory consumption swell. This wound up being a reasonably effective way to get going.
The debugger landed in a function that calculated the common typemod (type modifier) of the arguments in a COALESCE function – let’s call it select_common_typemod. If you’ve worked with SQL, you’ve seen a typemod before. It’s not common terminology though. As a basic example, when declaring a column as varchar(10) the typemod is 10. The reason we need to calculate the common typemod is that the arguments to a COALESCE function can have different typemods and we need to know the largest typemod across them to accurately estimate the memory required to execute a query. The following is the core of what select_common_typemod does (in pseudo-code):
def select_common_typemod(coalesce_expr): typemod = expression_typemod(coalesce_expr.first_arg()) for arg in coalesce_expr.args(): typemod = max(typemod, expression_typemod(arg)) return typemod
And expression_typemod is defined as:
def expression_typemod(expr): if type_of_expression(expr) == "coalesce": return select_common_typemod(expr) # code to calculate the typemod of other expressions # for example, a varchar expression return -1
What a place to land because this is where the issue was, though admittedly I had to stare at it for a bit. Consider the case where we are evaluating the outer COALESE function in field2. When we pass a COALESCE function to select_common_typemod the first thing it does is call expression_typemod on coalesce_expr‘s first argument, which is itself another COALESCE function and results in another call to select_common_typemod. This is a totally legit way to traverse through the nested COALESCE functions, but the problem is that we call expression_typemod a second time on coalesce_expr‘s first argument in the loop in select_common_typemod. This means that for each “layer” of the nested COALESCE functions we duplicate the number of calls to select_common_typemod (via expression_typemod) which results in an exponential explosion of function calls.
If we now want to evaluate field25, we will need on the order of 225 = 33,554,432 calls to select_common_typemod! This is a lot of function calls, but the stack doesn’t get too deep at any given point. The OOM is a result of this large number of function calls and how Postgres handles memory and garbage collection. In short, memory allocations are associated with “contexts” in Postgres, and all allocations associated with a context are automatically cleaned up when a context is freed. The benefit here is that a programmer doesn’t need to worry about freeing each allocation as it will be cleaned up later. In our case there was a single context active for the duration of these roughly 33 million function calls, so even if one allocates just a small amount of memory in each call (which does happen, but I have excluded it from the pseudo-code) the total memory allocated quickly swells to whatever the system has available.
def select_common_typemod(coalesce_expr): if coalesce_expr.typemod is not None: return coalesce.typemod typemod = expression_typemod(coalesce_expr.first_arg()) for arg in coalesce_expr.args(): typemod = max(typemod, expression_typemod(arg)) coalesce_expr.typemod = typemod return typemod def expression_typemod(expr): if type_of_expression(expr) == "coalesce": if expr.typemod is not None: return expr.typemod return select_common_typemod(expr) # code to calculate the typemod of other expressions # for example, a varchar expression return -1
With the fix in place, I was ready for happy days. The number of function calls to select_common_typemod was now linear in the number of coalesce statements. I ran the query again and… the query failed to execute. Postgres didn’t run out of memory – that issue was fixed. I began digging into the logs of the rest of the stack and saw that Lime was generating OOM events now.
The Fix (Part 2)
The Lime logs told me that it was running out of memory during the templating phase of the code generation. That is, we hadn’t even gotten to the point where we had started to compile the code; we were generating so much code to compile that we were eating up all of the system’s available memory.
This was maybe the second or third time where I needed to poke around in Lime. I figured this was still related to the deep-nesting of COALESCE functions so I took a look at how Lime generates code to represent COALESCE functions. It turns out they were represented as a series of nested if-else statements. As an example, the following is the pseudo-code for how COALESCE(column1, column2) was represented:
IF column1.not_null() THEN return column1 ELSE IF column2.not_null() THEN return column2 ELSE return NULL
This seemed fine at first glance, but things got a bit hairier when we needed to represent nested COALESCE functions. To fully appreciate where it gets weird, let’s look at the same pseudo-code as before, but in its more generic template-like form:
IF (coalesce_argument_1).not_null() THEN RETURN (coalesge_argument_1) ELSE (include IF template for coalesge_argument_2)
That means the representation of COALESCE(COALESCE(column1, ‘default’), column2) was:
IF ( IF column1.not_null() THEN RETURN column1 ELSE RETURN 'default' ).not_null() THEN IF column1.not_null() THEN RETURN column1 ELSE RETURN 'default' ELSE IF column2.not_null THEN RETURN column2 ELSE RETURN NULL
Another case of duplication! By outputting the same text in the condition and body of the if statement, we see yet another exponential explosion, except this time in the amount of text produced instead of the number of function calls.
The solution was again straightforward once I could see the issue. I decided to do two things to remedy the situation: removed the code duplication and unwound the nested if-statements. The following is what the same COALESCE function looked like after my changes:
x = ( IF column1.not_null() THEN RETURN column1 ELSE RETURN 'default' ) IF x.not_null() THEN RETURN x IF column2.not_null() THEN RETURN column2 RETURN NULL
With this second fix in place, I ran the query again and… success! I got back the right results quickly and without generating any OOM events (or any noticeable increase in memory for that matter). Happy days had arrived!
In solving the “Case of the Exploding COALESCE” we dove into a real-world example of how things that are hard to see when looking at the source code result in big issues for the “right” input. We found two instances of logic that led to an exponential blow-up in memory consumption and solved them with memoization and deduplication, respectively. It turns out the problems we learned about in our algorithms classes really do pop up in all sorts of places in the wild.
There is one last thing I would also like to talk about. When reading this, one might have asked themselves, “Who would actually write a query like this?” Who indeed – as it is a bit unusual, at least to a human. In many instances like this no one did. All sorts of engineers, data scientists, and others use a variety of tools to generate SQL queries. In previous roles, I’ve worked on projects that use ORMs and I remember them outputting some pretty strange SQL at times. Given the strange nature of the query, I’m tempted to say it was generated by a machine. But at the same time, it used our uncommon syntactic sugar for referencing previously defined fields. So, I’m not exactly sure, but at the end of the day, it didn’t really matter as it’s something that should have run issue-free irrespective of how it came to be.
Thanks for reading, and happy coding!