druid fresh setup coordination & overlord db migration failing #17870

Open
btalukder-ea opened this issue Apr 3, 2025 · 8 comments
@btalukder-ea

We were setting up Druid 32.0.1 using docker-compose, but the coordinator service keeps restarting. When we check the error, it is a DB migration failure.

The DB is a fresh database with no previous data.

I think a race condition is happening between threads. Looking at the logs:
first it says it is creating table druid_tasks, then that Table[druid_tasks] doesn't exist;
then a line is printed: Adding column[type] to table[druid_tasks];
but the next moment the task migration fails because column "type" does not exist. The error printed in the log:
2025-04-03T13:28:38,248 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Creating table[druid_config]
2025-04-03T13:28:40,420 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Creating table[druid_audit]
2025-04-03T13:28:40,746 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Creating table[druid_tasks]
2025-04-03T13:28:41,176 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Creating Index on Table [druid_tasks], sql: [CREATE INDEX idx_druid_tasks_active_created_date ON druid_tasks(active,created_date)]
2025-04-03T13:28:41,677 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Adding column[type] to table[druid_tasks].
2025-04-03T13:28:41,870 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Adding column[group_id] to table[druid_tasks].
2025-04-03T13:28:42,070 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Table[druid_tasks] doesn't exist.
2025-04-03T13:28:42,264 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Creating table[druid_tasklocks]
2025-04-03T13:28:42,579 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Creating table[druid_dataSource]
2025-04-03T13:28:42,890 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Creating table[druid_pendingSegments]
2025-04-03T13:28:43,214 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Adding column[upgraded_from_segment_id] to table[druid_pendingSegments].
2025-04-03T13:28:43,406 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Adding column[task_allocator_id] to table[druid_pendingSegments].
2025-04-03T13:28:43,410 WARN [pool-20-thread-1] org.apache.druid.metadata.SQLMetadataStorageActionHandler - Task migration failed while reading entries from task table
org.skife.jdbi.v2.exceptions.CallbackFailedException: org.skife.jdbi.v2.exceptions.UnableToExecuteStatementException: org.postgresql.util.PSQLException: ERROR: column "type" does not exist
Position: 45 [statement:"SELECT * FROM druid_tasks WHERE id > '' AND type IS null ORDER BY id LIMIT 100", located:"SELECT * FROM druid_tasks WHERE id > '' AND type IS null ORDER BY id LIMIT 100", rewritten:"SELECT * FROM druid_tasks WHERE id > '' AND type IS null ORDER BY id LIMIT 100", arguments:{ positional:{}, named:{}, finder:[]}]
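
The interleaving in the log above can be sketched as a toy check-then-act race (this is not Druid's actual code; FakeMetadataStore and every name in it are hypothetical, just to illustrate how a reader that queries the table before the migration thread commits the new column sees the error from the log):

```python
# Toy model of the suspected race (not Druid's actual code):
# one thread creates/alters the table step by step, while another
# worker queries for the new column before it has been added.

class FakeMetadataStore:
    def __init__(self):
        self.tables = {}  # table name -> set of column names

    def create_table(self, name, columns):
        self.tables.setdefault(name, set()).update(columns)

    def add_column(self, name, column):
        self.tables[name].add(column)

    def select_with_column(self, name, column):
        # Fails like Postgres does when the column is not there yet.
        cols = self.tables.get(name, set())
        if column not in cols:
            raise RuntimeError(f'column "{column}" does not exist')
        return []  # no rows in a fresh table

store = FakeMetadataStore()

# Step 1: migration thread creates the base table (old schema, no "type").
store.create_table("druid_tasks", {"id", "active", "created_date"})

# Step 2: the task-migration worker wakes up and queries for the new
# column before the migration thread has added it -- this matches the
# "Task migration failed" warning in the coordinator log.
try:
    store.select_with_column("druid_tasks", "type")
except RuntimeError as e:
    print(e)  # column "type" does not exist

# Step 3: the migration thread finishes; a retry of the same query succeeds.
store.add_column("druid_tasks", "type")
print(store.select_with_column("druid_tasks", "type"))  # []
```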

@btalukder-ea
Author

Could someone please look into this? If required, I can share the docker-compose file.

@gianm
Contributor

gianm commented Apr 3, 2025

Could you please share some steps to reproduce this? Ideally a set of commands to run from either a fresh git checkout or a fresh download of some release artifact.

@btalukder-ea
Author

btalukder-ea commented Apr 4, 2025

This is the docker-compose file we used initially, started with docker-compose up.

In the environment file:
druid_extensions_loadList=["postgresql-metadata-storage"]
druid_metadata_storage_type=postgresql
druid_metadata_storage_connector_connectURI=jdbc:postgresql://<connector-uri>?currentSchema=druid # this is the schema it populates
druid_metadata_storage_connector_user=username
druid_metadata_storage_connector_password=password
druid_metadata_postgres_dbTableSchema=druidt # this setting does not work

druid_storage_type=google
druid_google_bucket=druid_data
druid_zk_service_host=zookeeper
druid_zk_service_port=2181

version: "3.3"
volumes:
  metadata_data: {}
services:
  zookeeper:
    image: zookeeper:3.8
    restart: always
    ports:
      - "2181:2181"

  druid-coordinator:
    image: apache/druid:32.0.1
    restart: always
    environment:
      - DRUID_NODE_TYPE=coordinator
    command:
      - coordinator
    depends_on:
      - zookeeper
    env_file:
      - environment        
    ports:
      - "8081:8081"
    volumes:
      - ./gcs-service-account.json:/opt/druid/conf/gcs-service-account.json

  druid-overlord:
    image: apache/druid:32.0.1
    restart: always
    env_file:
      - environment
    command:
      - overlord
    depends_on:
      - zookeeper
      - druid-coordinator  
    environment:
      - DRUID_NODE_TYPE=overlord
    ports:
      - "8090:8090"

  druid-broker:
    image: apache/druid:32.0.1
    restart: always
    command:
      - broker    
    depends_on:
      - zookeeper
    env_file:
      - environment        
    environment:
      - DRUID_NODE_TYPE=broker
    ports:
      - "8082:8082"

  druid-historical:
    image: apache/druid:32.0.1
    restart: always
    command:
      - historical    
    depends_on:
      - zookeeper   
    env_file:
      - environment        
    environment:
      - DRUID_NODE_TYPE=historical
    ports:
      - "8083:8083"
    volumes:
      - ./gcs-service-account.json:/opt/druid/conf/gcs-service-account.json

  druid-middlemanager:
    image: apache/druid:32.0.1
    restart: always
    command:
      - middleManager    
    depends_on:
      - zookeeper    
    env_file:
      - environment        
    environment:
      - DRUID_NODE_TYPE=middleManager
    ports:
      - "8091:8091"

  druid-router:
    image: apache/druid:32.0.1
    env_file:
     - environment
    command:
      - router    
    restart: always
    depends_on:
      - zookeeper     
    environment:
      - DRUID_NODE_TYPE=router
    ports:
      - "8888:8888"
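
If the root cause is the coordinator and overlord racing to create and alter the same metadata tables at startup (which would be consistent with the "relation already exists" and "column does not exist" errors), one workaround sketch is to let the coordinator finish starting before the overlord boots, so only one process runs the migration on a fresh database. This fragment assumes the standard Druid /status/health endpoint, that wget is available inside the apache/druid image, and a docker compose version that supports the condition form of depends_on (the legacy version: "3.3" schema does not):

```yaml
# Sketch only: gate the overlord on a healthy coordinator so a single
# process performs the schema migration on a fresh database.
services:
  druid-coordinator:
    image: apache/druid:32.0.1
    command: ["coordinator"]
    healthcheck:
      test: ["CMD", "wget", "-q", "-O", "-", "http://localhost:8081/status/health"]
      interval: 5s
      timeout: 3s
      retries: 30

  druid-overlord:
    image: apache/druid:32.0.1
    command: ["overlord"]
    depends_on:
      druid-coordinator:
        condition: service_healthy
```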

@btalukder-ea
Author

btalukder-ea commented Apr 21, 2025

Could anyone help here? The Postgres we are using is AWS RDS, and both Postgres and Druid (EC2 instance) are in the same zone.

@btalukder-ea
Author

[screenshot]
If we use a local Postgres, we do not see this issue, but with RDS the issue occurs consistently.

@btalukder-ea
Author

The table is also created, but the columns the log reports adding are missing.

[screenshot]

@btalukder-ea
Author

used BOOLEAN NOT NULL,
payload BYTEA NOT NULL,
used_status_last_updated VARCHAR(255) NOT NULL,
PRIMARY KEY (id)
) was aborted: ERROR: relation "druid_segments" already exists  Call getNextException to see other errors in the batch.
        at org.postgresql.jdbc.BatchResultHandler.handleCompletion(BatchResultHandler.java:186) ~[?:?]
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:590) ~[?:?]
        at org.postgresql.jdbc.PgStatement.internalExecuteBatch(PgStatement.java:912) ~[?:?]
        at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:936) ~[?:?]
        at org.apache.commons.dbcp2.DelegatingStatement.executeBatch(DelegatingStatement.java:345) ~[commons-dbcp2-2.0.1.jar:2.0.1]
        at org.apache.commons.dbcp2.DelegatingStatement.executeBatch(DelegatingStatement.java:345) ~[commons-dbcp2-2.0.1.jar:2.0.1]
        at org.skife.jdbi.v2.Batch.execute(Batch.java:121) ~[jdbi-2.63.1.jar:2.63.1]
        at org.apache.druid.metadata.SQLMetadataConnector.lambda$createTable$2(SQLMetadataConnector.java:227) ~[druid-server-32.0.1.jar:32.0.1]
        at org.skife.jdbi.v2.DBI.withHandle(DBI.java:281) ~[jdbi-2.63.1.jar:2.63.1]
        ... 22 more
Caused by: org.postgresql.util.PSQLException: ERROR: relation "druid_segments" already exists
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2725) ~[?:?]
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2412) ~[?:?]
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:579) ~[?:?]
        at org.postgresql.jdbc.PgStatement.internalExecuteBatch(PgStatement.java:912) ~[?:?]
        at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:936) ~[?:?]
        at org.apache.commons.dbcp2.DelegatingStatement.executeBatch(DelegatingStatement.java:345) ~[commons-dbcp2-2.0.1.jar:2.0.1]
        at org.apache.commons.dbcp2.DelegatingStatement.executeBatch(DelegatingStatement.java:345) ~[commons-dbcp2-2.0.1.jar:2.0.1]
        at org.skife.jdbi.v2.Batch.execute(Batch.java:121) ~[jdbi-2.63.1.jar:2.63.1]
        at org.apache.druid.metadata.SQLMetadataConnector.lambda$createTable$2(SQLMetadataConnector.java:227) ~[druid-server-32.0.1.jar:32.0.1]
        at org.skife.jdbi.v2.DBI.withHandle(DBI.java:281) ~[jdbi-2.63.1.jar:2.63.1]
        ... 22 more
2025-04-21T08:44:01,337 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Adding columns [upgraded_from_segment_id, used_status_last_updated] to table[druid_segments].
2025-04-21T08:44:01,338 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Table[druid_segments] doesn't exist.
2025-04-21T08:44:01,343 INFO [main] org.apache.druid.metadata.SQLMetadataConnector - Creating Index on Table [druid_segments], sql: [CREATE INDEX idx_druid_segments_datasource_upgraded_from_segment_id ON druid_segments(dataSource,upgraded_from_segment_id)]
2025-04-21T08:44:01,344 ERROR [main] org.apache.druid.metadata.SQLMetadataConnector - Exception while creating index on table [druid_segments]
org.skife.jdbi.v2.exceptions.CallbackFailedException: org.skife.jdbi.v2.exceptions.UnableToExecuteStatementException: org.postgresql.util.PSQLException: ERROR: column "upgraded_from_segment_id" does not exist [statement:"CREATE INDEX idx_druid_segments_datasource_upgraded_from_segment_id ON druid_segments(dataSource,upgraded_from_segment_id)", located:"CREATE INDEX idx_druid_segments_datasource_upgraded_from_segment_id ON druid_segments(dataSource,upgraded_from_segment_id)", rewritten:"CREATE INDEX idx_druid_segments_datasource_upgraded_from_segment_id ON druid_segments(dataSource,upgraded_from_segment_id)", arguments:{ positional:{}, named:{}, finder:[]}]
        at org.skife.jdbi.v2.DBI.withHandle(DBI.java:284) ~[jdbi-2.63.1.jar:2.63.1]
        at org.apache.druid.metadata.SQLMetadataConnector.lambda$retryWithHandle$0(SQLMetadataConnector.java:157) ~[druid-server-32.0.1.jar:32.0.1]
        at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:129) ~[druid-processing-32.0.1.jar:32.0.1]
        at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) ~[druid-processing-32.0.1.jar:32.0.1]
        at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:163) ~[druid-processing-32.0.1.jar:32.0.1]
        at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:153) ~[druid-processing-32.0.1.jar:32.0.1]
        at org.apache.druid.metadata.SQLMetadataConnector.retryWithHandle(SQLMetadataConnector.java:157) ~[druid-server-32.0.1.jar:32.0.1]
        at org.apache.druid.metadata.SQLMetadataConnector.retryWithHandle(SQLMetadataConnector.java:167) ~[druid-server-32.0.1.jar:32.0.1]
        at org.apache.druid.metadata.SQLMetadataConnector.createIndex(SQLMetadataConnector.java:1082) [druid-server-32.0.1.jar:32.0.1]
        at org.apache.druid.metadata.SQLMetadataConnector.alterSegmentTable(SQLMetadataConnector.java:607) [druid-server-32.0.1.jar:32.0.1]
        at org.apache.druid.metadata.SQLMetadataConnector.createSegmentTable(SQLMetadataConnector.java:760) [druid-server-32.0.1.jar:32.0.1]
        at org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator.start(IndexerSQLMetadataStorageCoordinator.java:149) [druid-server-32.0.1.jar:32.0.1]
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.base/java.lang.reflect.Method.invoke(Method.java:569) ~[?:?]
        at org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:446) [druid-processing-32.0.1.jar:32.0.1]
        at org.apache.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:341) [druid-processing-32.0.1.jar:32.0.1]
        at org.apache.druid.guice.LifecycleModule$2.start(LifecycleModule.java:152) [druid-processing-32.0.1.jar:32.0.1]
        at org.apache.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:136) [druid-services-32.0.1.jar:32.0.1]
        at org.apache.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:94) [druid-services-32.0.1.jar:32.0.1]
        at org.apache.druid.cli.ServerRunnable.run(ServerRunnable.java:70) [druid-services-32.0.1.jar:32.0.1]
        at org.apache.druid.cli.Main.main(Main.java:112) [druid-services-32.0.1.jar:32.0.1]
Caused by: org.skife.jdbi.v2.exceptions.UnableToExecuteStatementException: org.postgresql.util.PSQLException: ERROR: column "upgraded_from_segment_id" does not exist [statement:"CREATE INDEX idx_druid_segments_datasource_upgraded_from_segment_id ON druid_segments(dataSource,upgraded_from_segment_id)", located:"CREATE INDEX idx_druid_segments_datasource_upgraded_from_segment_id ON druid_segments(dataSource,upgraded_from_segment_id)", rewritten:"CREATE INDEX idx_druid_segments_datasource_upgraded_from_segment_id ON druid_segments(dataSource,upgraded_from_segment_id)", arguments:{ positional:{}, named:{}, finder:[]}]
        at org.skife.jdbi.v2.SQLStatement.internalExecute(SQLStatement.java:1334) ~[jdbi-2.63.1.jar:2.63.1]
        at org.skife.jdbi.v2.Update.execute(Update.java:56) ~[jdbi-2.63.1.jar:2.63.1]
        at org.skife.jdbi.v2.BasicHandle.update(BasicHandle.java:294) ~[jdbi-2.63.1.jar:2.63.1]
        at org.skife.jdbi.v2.BasicHandle.execute(BasicHandle.java:412) ~[jdbi-2.63.1.jar:2.63.1]

@btalukder-ea
Author

We are not using the public schema:

druid_metadata_storage_type=postgresql
druid_metadata_storage_connector_connectURI=jdbc:postgresql://HOST:5432/druidtest?currentSchema=druid
druid_metadata_storage_connector_user=dev_baitanik
druid_metadata_storage_connector_password=pass
druid_metadata_postgres_dbTableSchema=druid
