Skip to content

server process (PID 229) was terminated by signal 7: Bus error #946

Closed
@dylan2intel

Description

@dylan2intel

postgres 14 deployed in kubernetes with 64Gi huge page on running TPC-C benchmarking with 800 warehouses(~80GB data size)

$ free -g
              total        used        free      shared  buff/cache   available
Mem:            503         134         256           0         113         366
Swap:             0           0           0

kubectl version

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:38:05Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:52:18Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}

Detailed errors:

kubectl logs -n sfdev-test-static-hammerdb-tpcc-postgresql-disk-hugepage-on-128   postgresql-7df8c8b6c4-q9jz2
....
.....
2022-04-05 11:55:28.034 GMT [1] LOG:  database system is ready to accept connections
**2022-04-05 12:06:05.719 GMT [1] LOG:  server process (PID 229) was terminated by signal 7: Bus error
2022-04-05 12:06:05.719 GMT [1] DETAIL:  Failed process was running: insert into order_line (ol_o_id, ol_d_id, ol_w_id, ol_number, ol_i_id, ol_supply_w_id, ol_quantity, ol_amount, ol_dist_info, ol_delivery_d) values ('2601','6','798','1','78438','798','5','0.0','WYsLGpch9MMPlrLtOvBAcfPY',null),('2601','6','798','2','78366','798','5','0.0','dkbbzWuT2fYGEA46ysZT4Soq',null),('2601','6','798','3','83817','798','5','0.0','akE67RK2vhlovKtmTT9h4OR6',null),('2601','6','798','4','36375','798','5','0.0','ayTlJV0KL3FbYB4w57kes8VY',null),('2601','6','798','5','40293','798','5','0.0','ZXl83xwLLJnEjeRTwHpayPfy',null),('2601','6','798','6','5079','798','5','0.0','S3zeBuSPobwm9mjZj6tnuShV',null),('2601','6','798','7','73194','798','5','0.0','teE2ELm3J5a81TK2McP10ojr',null),('2601','6','798','8','74758','798','5','0.0','yaykpdVaqKrKZdtSujY2WpNx',null),('2602','6','798','1','25330','798','5','0.0','atSwfYVJbCjI5gfj6S2tjxPE',null),('2602','6','798','2','74285','798','5','0.0','KqMS78XCkxexHrRxSXsQfMT6',null),('2602','6','798','3','10538','798','5','0.0','Foz8BVQA1qsu8lH80MZWXiC6',null),('2602',
2022-04-05 12:06:05.719 GMT [1] LOG:  terminating any other active server processes**
2022-04-05 12:06:05.775 GMT [1] LOG:  all server processes terminated; reinitializing
2022-04-05 12:06:06.024 GMT [249] LOG:  database system was interrupted; last known up at 2022-04-05 12:05:35 GMT
2022-04-05 12:06:08.718 GMT [249] LOG:  database system was not properly shut down; automatic recovery in progress
2022-04-05 12:06:08.718 GMT [249] LOG:  redo starts at 9/F358258
2022-04-05 12:07:19.931 GMT [258] FATAL:  the database system is in recovery mode
2022-04-05 12:07:39.762 GMT [261] FATAL:  the database system is in recovery mode
2022-04-05 12:07:44.540 GMT [263] FATAL:  the database system is in recovery mode
2022-04-05 12:07:56.819 GMT [265] FATAL:  the database system is in recovery mode
2022-04-05 12:08:47.695 GMT [249] LOG:  redo done at 14/74451FB0 system usage: CPU: user: 133.00 s, system: 25.88 s, elapsed: 158.97 s
2022-04-05 12:08:55.073 GMT [1] LOG:  issuing SIGKILL to recalcitrant children
2022-04-05 12:08:55.085 GMT [1] LOG:  all server processes terminated; reinitializing
2022-04-05 12:08:55.314 GMT [273] LOG:  database system shutdown was interrupted; last known up at 2022-04-05 12:08:47 GMT
2022-04-05 12:08:56.072 GMT [273] LOG:  database system was not properly shut down; automatic recovery in progress
2022-04-05 12:08:56.073 GMT [273] LOG:  redo starts at 9/F358258
2022-04-05 12:08:58.177 GMT [275] FATAL:  the database system is in recovery mode
2022-04-05 12:09:02.147 GMT [277] FATAL:  the database system is in recovery mode
2022-04-05 12:09:04.930 GMT [279] FATAL:  the database system is in recovery mode
2022-04-05 12:11:29.808 GMT [273] LOG:  successfully skipped missing contrecord at 14/74451FF8, overwritten at 2022-04-05 12:08:47.699097+00
2022-04-05 12:11:29.808 GMT [273] CONTEXT:  WAL redo at 14/74452018 for XLOG/OVERWRITE_CONTRECORD: lsn 14/74451FF8; time 2022-04-05 12:08:47.699097+00
2022-04-05 12:11:29.808 GMT [273] LOG:  invalid record length at 14/74452048: wanted 24, got 0
2022-04-05 12:11:29.808 GMT [273] LOG:  redo done at 14/74452018 system usage: CPU: user: 131.11 s, system: 22.60 s, elapsed: 153.73 s
2022-04-05 12:12:29.781 GMT [1] LOG:  database system is ready to accept connections

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionUsability question, not directly related to an error with the image

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions