
ClassCastException when grouping on virtualColumn with nvl expression #17821


Affected Version

31.0.0 (have not tried with 32.0.0 yet)

Description

The following error is produced when running a groupBy query with context.vectorize=true where the grouping dimension is a virtual column of the form nvl(col1, col2), both columns are string-typed, col1 exists in at least some segments, and col2 does not exist in every segment:

{
  "error": "Unknown exception",
  "errorClass": "java.lang.RuntimeException",
  "host": "...",
  "errorCode": "legacyQueryException",
  "persona": "OPERATOR",
  "category": "RUNTIME_FAILURE",
  "errorMessage": "java.util.concurrent.ExecutionException: java.lang.ClassCastException",
  "context": {
    "host": "...",
    "errorClass": "java.lang.RuntimeException",
    "legacyErrorCode": "Unknown exception"
  }
}

Logs do not seem to offer much help:

2025-03-19T11:19:51,063 ERROR [processing-5] org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunner - Exception with one of the sequences!
java.lang.ClassCastException: null

org.apache.druid.query.QueryException: java.util.concurrent.ExecutionException: java.lang.ClassCastException
	at jdk.internal.reflect.GeneratedConstructorAccessor132.newInstance(Unknown Source) ~[?:?]
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500) ~[?:?]
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481) ~[?:?]
	at com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:124) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:291) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.ValueInstantiator.createFromObjectWith(ValueInstantiator.java:288) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:202) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:455) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.std.ThrowableDeserializer.deserializeFromObject(ThrowableDeserializer.java:65) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:196) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4569) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2798) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at org.apache.druid.client.JsonParserIterator.init(JsonParserIterator.java:196) [druid-server-31.0.0.jar:31.0.0]
	at org.apache.druid.client.JsonParserIterator.hasNext(JsonParserIterator.java:102) [druid-server-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.BaseSequence.toYielder(BaseSequence.java:70) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.MappedSequence.toYielder(MappedSequence.java:49) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$ResultBatch.fromSequence(ParallelMergeCombiningSequence.java:932) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$SequenceBatcher.block(ParallelMergeCombiningSequence.java:984) [druid-processing-31.0.0.jar:31.0.0]
	at java.base/java.util.concurrent.ForkJoinPool.compensatedBlock(ForkJoinPool.java:3451) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3434) [?:?]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$SequenceBatcher.getBatchYielder(ParallelMergeCombiningSequence.java:972) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$YielderBatchedResultsCursor.block(ParallelMergeCombiningSequence.java:1107) [druid-processing-31.0.0.jar:31.0.0]
	at java.base/java.util.concurrent.ForkJoinPool.compensatedBlock(ForkJoinPool.java:3451) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3434) [?:?]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$BatchedResultsCursor.nextBatch(ParallelMergeCombiningSequence.java:1021) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$YielderBatchedResultsCursor.initialize(ParallelMergeCombiningSequence.java:1081) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$PrepareMergeCombineInputsAction.compute(ParallelMergeCombiningSequence.java:772) [druid-processing-31.0.0.jar:31.0.0]
	at java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194) [?:?]
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) [?:?]
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) [?:?]

For context: in our case we have two datasources, error-state and help-state, which contain the columns errorCode and helpCode, respectively. The actual query is more complex; the following is a minimal reproducing example:

{
  "queryType": "groupBy",
  "dataSource": {
    "type": "union",
    "dataSources": [ "error-state", "help-state" ]
  },
  "dimensions": [ "code" ],
  "granularity": "all",
  "virtualColumns": [
    {
      "type": "expression",
      "name": "code",
      "expression": "nvl(errorCode, helpCode)",
      "outputType": "STRING"
    }
  ],
  "intervals": [  "2025-03/2025-04" ],
  "context": {"vectorize": true}
}

If we change the expression to nvl(errorCode, nvl(helpCode, '')) or even nvl(errorCode, nvl(helpCode, null)), the query works as expected (this is our workaround for now).
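
For reference, a minimal sketch of the workaround: the same virtual column as in the query above, with the nested nvl substituted in.

{
  "type": "expression",
  "name": "code",
  "expression": "nvl(errorCode, nvl(helpCode, ''))",
  "outputType": "STRING"
}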

The same can be reproduced with any datasource (a single table works as well) by grouping on an nvl expression whose second argument references a column that doesn't exist in every segment.
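
As an illustration only, a hypothetical single-table sketch: the datasource my-table, the existing string column realCol, and the column missingCol (absent from some segments) are made-up names, not part of our setup.

{
  "queryType": "groupBy",
  "dataSource": "my-table",
  "dimensions": [ "code" ],
  "granularity": "all",
  "virtualColumns": [
    {
      "type": "expression",
      "name": "code",
      "expression": "nvl(realCol, missingCol)",
      "outputType": "STRING"
    }
  ],
  "intervals": [ "2025-03/2025-04" ],
  "context": { "vectorize": true }
}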

This is not a big issue for us because of the workaround; however, we would expect such queries to either (a) just work, or (b) return a useful error message telling the user that something in the query is not supported.
