Affected Version
31.0.0 (have not tried with 32.0.0 yet)
Description
The following error is produced when running a groupBy query with `context.vectorize=true`, where the grouping dimension is a virtual column of the form `nvl(col1, col2)`, both columns are string-typed, `col1` exists in at least some segments, and `col2` does not exist in every segment:
```json
{
  "error": "Unknown exception",
  "errorClass": "java.lang.RuntimeException",
  "host": "...",
  "errorCode": "legacyQueryException",
  "persona": "OPERATOR",
  "category": "RUNTIME_FAILURE",
  "errorMessage": "java.util.concurrent.ExecutionException: java.lang.ClassCastException",
  "context": {
    "host": "...",
    "errorClass": "java.lang.RuntimeException",
    "legacyErrorCode": "Unknown exception"
  }
}
```
The logs do not offer much help:
```
2025-03-19T11:19:51,063 ERROR [processing-5] org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunner - Exception with one of the sequences!
java.lang.ClassCastException: null
org.apache.druid.query.QueryException: java.util.concurrent.ExecutionException: java.lang.ClassCastException
	at jdk.internal.reflect.GeneratedConstructorAccessor132.newInstance(Unknown Source) ~[?:?]
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500) ~[?:?]
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481) ~[?:?]
	at com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:124) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:291) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.ValueInstantiator.createFromObjectWith(ValueInstantiator.java:288) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:202) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:455) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.std.ThrowableDeserializer.deserializeFromObject(ThrowableDeserializer.java:65) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:196) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4569) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2798) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at org.apache.druid.client.JsonParserIterator.init(JsonParserIterator.java:196) [druid-server-31.0.0.jar:31.0.0]
	at org.apache.druid.client.JsonParserIterator.hasNext(JsonParserIterator.java:102) [druid-server-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.BaseSequence.toYielder(BaseSequence.java:70) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.MappedSequence.toYielder(MappedSequence.java:49) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$ResultBatch.fromSequence(ParallelMergeCombiningSequence.java:932) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$SequenceBatcher.block(ParallelMergeCombiningSequence.java:984) [druid-processing-31.0.0.jar:31.0.0]
	at java.base/java.util.concurrent.ForkJoinPool.compensatedBlock(ForkJoinPool.java:3451) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3434) [?:?]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$SequenceBatcher.getBatchYielder(ParallelMergeCombiningSequence.java:972) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$YielderBatchedResultsCursor.block(ParallelMergeCombiningSequence.java:1107) [druid-processing-31.0.0.jar:31.0.0]
	at java.base/java.util.concurrent.ForkJoinPool.compensatedBlock(ForkJoinPool.java:3451) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3434) [?:?]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$BatchedResultsCursor.nextBatch(ParallelMergeCombiningSequence.java:1021) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$YielderBatchedResultsCursor.initialize(ParallelMergeCombiningSequence.java:1081) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$PrepareMergeCombineInputsAction.compute(ParallelMergeCombiningSequence.java:772) [druid-processing-31.0.0.jar:31.0.0]
	at java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194) [?:?]
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) [?:?]
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) [?:?]
```
For context, in our case we have two datasources, `error-state` and `help-state`, which contain the columns `errorCode` and `helpCode`, respectively. The query looks like this (the real query is more complex; this is a minimal reproducing example):
```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "union",
    "dataSources": [ "error-state", "help-state" ]
  },
  "dimensions": [ "code" ],
  "granularity": "all",
  "virtualColumns": [
    {
      "type": "expression",
      "name": "code",
      "expression": "nvl(errorCode, helpCode)",
      "outputType": "STRING"
    }
  ],
  "intervals": [ "2025-03/2025-04" ],
  "context": { "vectorize": true }
}
```
If we change the expression to `nvl(errorCode, nvl(helpCode, ''))` or even `nvl(errorCode, nvl(helpCode, null))`, the query works as expected (this is our workaround for now).
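To make A/B testing the two variants easy, here is a small Python sketch that builds both payloads; the `build_query` helper is ours (not part of Druid), and the datasource and column names are just the ones from the example above. The only difference between the failing and working payloads is the virtual-column expression:

```python
def build_query(expression):
    """Build the native groupBy payload, varying only the virtual-column expression."""
    return {
        "queryType": "groupBy",
        "dataSource": {"type": "union", "dataSources": ["error-state", "help-state"]},
        "dimensions": ["code"],
        "granularity": "all",
        "virtualColumns": [
            {
                "type": "expression",
                "name": "code",
                "expression": expression,
                "outputType": "STRING",
            }
        ],
        "intervals": ["2025-03/2025-04"],
        "context": {"vectorize": True},
    }

failing = build_query("nvl(errorCode, helpCode)")           # triggers ClassCastException
working = build_query("nvl(errorCode, nvl(helpCode, ''))")  # workaround

# The two payloads differ only in the virtualColumns entry.
diff = {key for key in failing if failing[key] != working[key]}
print(diff)  # {'virtualColumns'}
```

Either payload can be POSTed to the cluster as a normal native query; only the `failing` variant produces the error above.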
The same behavior can be reproduced with any datasource (a single table works as well) by grouping on an `nvl` expression whose second argument references a column that does not exist in every segment.
Thanks to the workaround, this is not a big issue for us; however, we would expect such queries to either (a) just work, or (b) fail with a useful error message telling the user that something in the query is not supported.