ClassCastException when grouping on virtualColumn with nvl expression #17821

Open
pmmag opened this issue Mar 19, 2025 · 2 comments

pmmag commented Mar 19, 2025

Affected Version

31.0.0 (have not tried with 32.0.0 yet)

Description

The following error is produced when running a groupBy query with context.vectorize=true where the grouping dimension is a virtual column of the form nvl(col1, col2), both columns are string-typed, col1 exists in at least some segments, and col2 does not exist in all segments:

{
  "error": "Unknown exception",
  "errorClass": "java.lang.RuntimeException",
  "host": "...",
  "errorCode": "legacyQueryException",
  "persona": "OPERATOR",
  "category": "RUNTIME_FAILURE",
  "errorMessage": "java.util.concurrent.ExecutionException: java.lang.ClassCastException",
  "context": {
    "host": "...",
    "errorClass": "java.lang.RuntimeException",
    "legacyErrorCode": "Unknown exception"
  }
}

Logs do not seem to offer much help:

2025-03-19T11:19:51,063 ERROR [processing-5] org.apache.druid.query.groupby.epinephelinae.GroupByMergingQueryRunner - Exception with one of the sequences!
java.lang.ClassCastException: null

org.apache.druid.query.QueryException: java.util.concurrent.ExecutionException: java.lang.ClassCastException
	at jdk.internal.reflect.GeneratedConstructorAccessor132.newInstance(Unknown Source) ~[?:?]
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500) ~[?:?]
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481) ~[?:?]
	at com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:124) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:291) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.ValueInstantiator.createFromObjectWith(ValueInstantiator.java:288) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:202) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:455) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.std.ThrowableDeserializer.deserializeFromObject(ThrowableDeserializer.java:65) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:196) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4569) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2798) ~[jackson-databind-2.12.7.1.jar:2.12.7.1]
	at org.apache.druid.client.JsonParserIterator.init(JsonParserIterator.java:196) [druid-server-31.0.0.jar:31.0.0]
	at org.apache.druid.client.JsonParserIterator.hasNext(JsonParserIterator.java:102) [druid-server-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.BaseSequence.toYielder(BaseSequence.java:70) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.MappedSequence.toYielder(MappedSequence.java:49) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$ResultBatch.fromSequence(ParallelMergeCombiningSequence.java:932) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$SequenceBatcher.block(ParallelMergeCombiningSequence.java:984) [druid-processing-31.0.0.jar:31.0.0]
	at java.base/java.util.concurrent.ForkJoinPool.compensatedBlock(ForkJoinPool.java:3451) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3434) [?:?]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$SequenceBatcher.getBatchYielder(ParallelMergeCombiningSequence.java:972) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$YielderBatchedResultsCursor.block(ParallelMergeCombiningSequence.java:1107) [druid-processing-31.0.0.jar:31.0.0]
	at java.base/java.util.concurrent.ForkJoinPool.compensatedBlock(ForkJoinPool.java:3451) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3434) [?:?]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$BatchedResultsCursor.nextBatch(ParallelMergeCombiningSequence.java:1021) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$YielderBatchedResultsCursor.initialize(ParallelMergeCombiningSequence.java:1081) [druid-processing-31.0.0.jar:31.0.0]
	at org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence$PrepareMergeCombineInputsAction.compute(ParallelMergeCombiningSequence.java:772) [druid-processing-31.0.0.jar:31.0.0]
	at java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194) [?:?]
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) [?:?]
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) [?:?]
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) [?:?]

For context: in our case we have two datasources, error-state and help-state, which contain the columns errorCode and helpCode, respectively. The query looks like this (the real query is more complex; this is just a minimal reproducing example):

{
  "queryType": "groupBy",
  "dataSource": {
    "type": "union",
    "dataSources": [ "error-state", "help-state" ]
  },
  "dimensions": [ "code" ],
  "granularity": "all",
  "virtualColumns": [
    {
      "type": "expression",
      "name": "code",
      "expression": "nvl(errorCode, helpCode)",
      "outputType": "STRING"
    }
  ],
  "intervals": [  "2025-03/2025-04" ],
  "context": {"vectorize": true}
}

If we change the expression to nvl(errorCode, nvl(helpCode, '')) or even nvl(errorCode, nvl(helpCode, null)), the query works as expected (this is our workaround for now).
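
For reference, this is the virtual column from the query above with the nested nvl workaround applied (only the expression changes; everything else in the query stays the same):

{
  "type": "expression",
  "name": "code",
  "expression": "nvl(errorCode, nvl(helpCode, ''))",
  "outputType": "STRING"
}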

The same can be reproduced with any datasource (a single table works as well) by grouping on an nvl expression whose second argument references a column that doesn't exist in every segment.
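
A sketch of such a single-table query, with hypothetical names (some-table, existingCol, and missingCol are placeholders; missingCol stands for a column that doesn't exist in every segment):

{
  "queryType": "groupBy",
  "dataSource": "some-table",
  "dimensions": [ "combined" ],
  "granularity": "all",
  "virtualColumns": [
    {
      "type": "expression",
      "name": "combined",
      "expression": "nvl(existingCol, missingCol)",
      "outputType": "STRING"
    }
  ],
  "intervals": [ "2025-03/2025-04" ],
  "context": {"vectorize": true}
}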

This is not a big issue for us because of the workaround; however, we would expect such queries to either (a) just work, or (b) return a useful error message telling the user that something in the query is not supported.

gianm commented Apr 3, 2025

I tried to repro this and wasn't able to. It is possible it's been fixed since Druid 31.0.0. There have been changes in the area of vectorized expressions since then.

Here is what I did:

1. Populated error-state with errorCode:

REPLACE INTO "error-state"
OVERWRITE ALL
SELECT * FROM (VALUES ('1'), ('2')) t (errorCode)
PARTITIONED BY ALL

2. Populated help-state with helpCode:

REPLACE INTO "help-state"
OVERWRITE ALL
SELECT * FROM (VALUES ('3'), ('4')) t (helpCode)
PARTITIONED BY ALL

3. Ran this query (same as yours, but with a broader time interval, since the datasets I inserted would use the default timestamp 1970-01-01):
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "union",
    "dataSources": [ "error-state", "help-state" ]
  },
  "dimensions": [ "code" ],
  "granularity": "all",
  "virtualColumns": [
    {
      "type": "expression",
      "name": "code",
      "expression": "nvl(errorCode, helpCode)",
      "outputType": "STRING"
    }
  ],
  "intervals": [  "0000/3000" ],
  "context": {"vectorize": true}
}

I got good results:

{"code":"1"}
{"code":"2"}
{"code":"3"}
{"code":"4"}

pmmag commented Apr 7, 2025

I tried your example in a fresh Docker Compose setup with both 31.0.0 and 32.0.1 (using the sample compose file and environment file linked in the docs).

For me, when I run the groupBy query right after the data has been inserted, I actually do get:

{
  "error": "Unknown exception",
  "errorClass": "java.lang.RuntimeException",
  "host": "172.21.0.5:8083",
  "errorCode": "legacyQueryException",
  "persona": "OPERATOR",
  "category": "RUNTIME_FAILURE",
  "errorMessage": "java.util.concurrent.ExecutionException: java.lang.ClassCastException: class [J cannot be cast to class [Ljava.lang.Object; ([J and [Ljava.lang.Object; are in module java.base of loader 'bootstrap')",
  "context": {
    "host": "172.21.0.5:8083",
    "errorClass": "java.lang.RuntimeException",
    "legacyErrorCode": "Unknown exception"
  }
}

However, if I run the groupBy even once with vectorize: false, all subsequent queries with vectorize: true work afterwards (which is not the case with our actual Druid 31.0.0 installation, where we get the error every time)... Perhaps this is some kind of user error, or I just need to wait longer after inserting the data?
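
Concretely, the one-off run that makes the subsequent vectorized queries work differs only in the context flag (everything else in the query stays the same as above):

  "context": {"vectorize": false}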

I also tried changing the help and error codes to contain non-numeric characters (as they do in our case), as well as tweaking the timestamps and partitioning (e.g., "by month") to see whether it matters when the data is not in the same time range. But it made no difference; I still always get the error until I run one query with vectorize: false. Yet, in our real-world datasources, the errors persist.

So yes, it seems to be difficult to reproduce, and I guess there must be some other factor at play in our real-world setup, where it happens every time(?). The original datasources in question use Kafka-based ingestion, in case that matters.
