Skip to content

Null query result when using SQL CASE expression #17768

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
MohTaheri opened this issue Feb 28, 2025 · 2 comments
Open

Null query result when using SQL CASE expression #17768

MohTaheri opened this issue Feb 28, 2025 · 2 comments

Comments

@MohTaheri
Copy link

MohTaheri commented Feb 28, 2025

Affected Version

Druid 32.0.0

Description

After upgrading to version 32, where the druid.generic.useDefaultValueForNull=true feature is deprecated, we’re getting incorrect results when executing SQL queries using CASE.
The query is supposed to return a result based on a given condition, but when a type value doesn’t match any condition, it returns NULL instead of the expected ELSE value or COALESCE result.

SELECT
        t.id,
        SUM(CASE WHEN type = 'X'
            THEN NVL(amount, 0)
            ELSE 0
        END) AS total_amount
    FROM
        example t
    GROUP BY t.id

We followed all the steps in the migration guide, and no null values were inserted during ingestion. All raw data contains non-null values.

Image

@MohTaheri
Copy link
Author

@clintropolis Sorry, do you have any idea how the new release will affect that?

@gianm
Copy link
Contributor

gianm commented Apr 3, 2025

Probably this happens when none of the rows for a t.id match type = 'X'. When a CASE statement like this appears inside a SUM, it hits a bug in our SQL dialect currently, specifically with the rule DruidAggregateCaseToFilterRule. It really should be 0 according to standard SQL, but we're returning NULL. Essentially we are doing a rewrite of this SUM on CASE (which would be slow if run directly) into a filtered aggregator (fast, uses indexes). But that changes the null-handing behavior, which is a bug.

You can disable the rewrite by setting extendedFilteredSumRewrite: false, but that will slow down performance of the aggregator, since it'll no longer be able to use indexes.

You can also rewrite your query like this, which will be both fast and correct:

SELECT
    t.page,
    NVL(SUM(added) FILTER(WHERE page = 'X'), 0) AS total_amount
FROM
    wikipedia t
GROUP BY t.page

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants