Luke Evers

MySQL "Vertical" Coalesce

Posted on 28 December 2017

The coalesce function in MySQL is a great way to get the first non-NULL value, but what if you want to do something similar but with multiple rows for a single column (or multiple columns)?

Disclaimer: you probably can do this in other databases as well (or maybe there’s even a better way), but I only tested this on MySQL.

Example

Let’s assume we have a table (called form) where a user begins to fill out a form. They can leave, and we still save some data in the database, but if they come back with a different session, they’re going to create a new form instead of continuing where they left off. In this example, we have an id for the form pk, a user_id for the foreign key to the user’s pk, three questions (NULL or an integer), and some timestamps.

id user_id q_1 q_2 q_3 created_at updated_at
1 1 NULL NULL NULL 2017-08-01 15:30:00 2017-08-01 15:30:00
2 1 2 3 NULL 2017-08-02 15:30:00 2017-08-02 15:30:00
3 1 NULL NULL 1 2017-08-03 15:30:00 2017-08-03 15:30:00
4 1 NULL 2 1 2017-08-04 15:30:00 2017-08-04 15:30:00

Since a user can have multiple forms, we could just take most recently updated form, like this:

SELECT
    *
FROM
    form
WHERE
    user_id = 1
ORDER BY
    updated_at DESC
LIMIT 1

Which gives us:

id user_id q_1 q_2 q_3 created_at updated_at
4 1 NULL 2 1 2017-08-04 15:30:00 2017-08-04 15:30:00

Great! We got their most recent form. But wait, they never answered question 1! If only they answered question 1. Question 1 was so important! Now we’re wondering if they’ve ever answered the first question. Let’s try another query:

SELECT
    *
FROM
    form
WHERE
    user_id = 1
AND
    q_1 IS NOT NULL
ORDER BY
    updated_at DESC
LIMIT 1

Which gives us:

id user_id q_1 q_2 q_3 created_at updated_at
2 1 2 3 NULL 2017-08-02 15:30:00 2017-08-02 15:30:00

Looks like it. Even though the data is older, we still want to know what they answered for that question, even if the data is older.

Solution

I’ve not done any benchmarks on this, but I’ve solved this problem by doing a combination of SUBSTRING_INDEX and GROUP_CONCAT.

SELECT 
    user_id,
    SUBSTRING_INDEX(GROUP_CONCAT(q_1 ORDER BY updated_at DESC SEPARATOR '@@INDEX@@'), '@@INDEX@@', 1) AS q_1,
    SUBSTRING_INDEX(GROUP_CONCAT(q_2 ORDER BY updated_at DESC SEPARATOR '@@INDEX@@'), '@@INDEX@@', 1) AS q_2,
    SUBSTRING_INDEX(GROUP_CONCAT(q_3 ORDER BY updated_at DESC SEPARATOR '@@INDEX@@'), '@@INDEX@@', 1) AS q_3
FROM
    form
WHERE
    user_id = 1
ORDER BY
    updated_at DESC
LIMIT 1

And for our final result we get:

user_id q_1 q_2 q_3
1 2 2 1

You’ll notice that I didn’t include id in the final result. Since we’re doing this manipulation of data, there isn’t really an identifier for this query in the database.

Obviously, replace @@INDEX@@ with something unique to your data, as if it’s not something unique the data is going to get chopped off. It’s also very verbose. With GROUP_CONCAT I’m ordering by the most recently updated form every time, but depending on your data and what you’re trying to do, you could use that statement to identify what’s the most important part of your data you’re trying to find.

comments powered by Disqus