-
Notifications
You must be signed in to change notification settings - Fork 3.7k
ClassCastException when grouping on virtualColumn with nvl expression #17821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I tried to repro this and wasn't able to. It is possible it's been fixed since Druid 31.0.0. There have been changes in the area of vectorized expressions since then. Here is what I did:
{
"queryType": "groupBy",
"dataSource": {
"type": "union",
"dataSources": [ "error-state", "help-state" ]
},
"dimensions": [ "code" ],
"granularity": "all",
"virtualColumns": [
{
"type": "expression",
"name": "code",
"expression": "nvl(errorCode, helpCode)",
"outputType": "STRING"
}
],
"intervals": [ "0000/3000" ],
"context": {"vectorize": true}
} I got good results:
|
I tried your example in a fresh docker compose with both 31.0.0 and 32.0.1 (with the sample compose file and environment file linked in the docs). For me when I run the groupBy query right after the data has been inserted, I actually do get
However, if I run the groupBy even once with I also tried to change the help and error codes to contain non-numeric characters (as they do in our case), as well as tweak the timestamps and partitioning (e.g., "by month") to see if it matters when the data is not in the same time range. But it made no difference, I still always only get the error until I make one query with So yes, it seems to be difficult to reproduce, and I guess there must be some other factor at play in our real world setup where it happens every time (?). The original datasources in question are using Kafka-based ingestion, in case that matters. |
Affected Version
31.0.0 (have not tried with 32.0.0 yet)
Description
The following error is produced when running a groupBy query where
context.vectorize=true
and the grouping dimension is a virtual column of the formnvl(col1, col2)
where both columns are string typed,col1
exists in at least some segments andcol2
does not exist in all segments:Logs do not seem to offer much help:
For context, in our case, we have two datasources,
error-state
andhelp-state
, which contain columnserrorCode
andhelpCode
, respectively, and the query looks like this (actually it is more complex, this is just a minimal reproducing example):If we change the expression to
nvl(errorCode, nvl(helpCode, ''))
or evennvl(errorCode, nvl(helpCode, null))
, the query works as expected (this is our workaround for now).The same can be reproduced with any datasource (can be a single table as well), by grouping on an nvl expression where the second argument is a reference to a column that doesn't exist in every segment.
This is not a big issue for us because of the workaround, however, we would expect such queries to either A just work, or B return a useful error message telling the user if there is something in the query that is not supported.
The text was updated successfully, but these errors were encountered: