gt lastNum error reported when synchronizing blocks #6310

jakamobiii opened this issue May 9, 2025 · 27 comments

@jakamobiii

Software Versions

OS : Linux
JVM : Oracle Corporation 1.8.0_161 amd64
Version : 4.8.0

Expected behavior

The node is running normally and no exception should be thrown.

Actual behavior

I deployed a node online and found an exception while checking the logs. I would like to know what the specific error is, whether it has any impact on the system, and whether it will affect block synchronization or data consistency.

Image

Frequency

Only appeared once.

@xxo1shine
Contributor

@jakamobiii Can you post the interaction information with node 146.190.13.115? Use the command grep "146.190.13.115:36448" tron.log -A1 to obtain the log.

@317787106
Contributor

@jakamobiii It's similar to issue #6272.

@jakamobiii
Author

@xxo1shine Here is the final part of the interaction log.

17:11:03.620 INFO  [peerWorker-17] [net](P2pEventHandlerImpl.java:169) Receive message from  peer: /146.190.13.115:36448, type: INVENTORY
invType: BLOCK, size: 1, First hash: 000000000442326920522f340fc09ba7d047157ecc0dbbbd46316e0761372510
--
17:11:06.470 INFO  [peerWorker-8] [net](PeerConnection.java:184) Send peer /146.190.13.115:36448 message type: INVENTORY
invType: BLOCK, size: 1, First hash: 000000000442326ae9efc83412a1140d2b37fa811adf7d4d0205d150e2addafb
--
17:11:09.428 INFO  [peerWorker-27] [net](PeerConnection.java:184) Send peer /146.190.13.115:36448 message type: INVENTORY
invType: BLOCK, size: 1, First hash: 000000000442326b4507fbdacd4f7b7efc76e7849cf6663c3f36014d2b15169e
--
Peer /146.190.13.115:36448
connect time: 211s [57ms]
--
17:11:12.528 INFO  [peerWorker-17] [net](PeerConnection.java:184) Send peer /146.190.13.115:36448 message type: INVENTORY
invType: BLOCK, size: 1, First hash: 000000000442326c4371a99df87807da7826d22dd9a9b6fb2d55b8d772f6124c
--
17:11:15.327 INFO  [peerWorker-17] [net](P2pEventHandlerImpl.java:169) Receive message from  peer: /146.190.13.115:36448, type: FETCH_INV_DATA
invType: BLOCK, size: 1, First hash: 000000000442326b4507fbdacd4f7b7efc76e7849cf6663c3f36014d2b15169e
17:11:15.327 INFO  [peerWorker-17] [net](PeerConnection.java:184) Send peer /146.190.13.115:36448 message type: BLOCK
Num:71447147,ID:000000000442326b4507fbdacd4f7b7efc76e7849cf6663c3f36014d2b15169e, trx size: 293
--
17:11:15.403 INFO  [peerWorker-27] [net](PeerConnection.java:184) Send peer /146.190.13.115:36448 message type: INVENTORY
invType: BLOCK, size: 1, First hash: 000000000442326ddc5e37e27d941ee898720831ca748d12be5820273a6dfd5f
--
17:11:15.630 INFO  [peerWorker-17] [net](P2pEventHandlerImpl.java:169) Receive message from  peer: /146.190.13.115:36448, type: SYNC_BLOCK_CHAIN
size: 5, start block: Num:71447126,ID:0000000004423256c427270750a95df7004342bd2a7bc0181dfb421bcb23451b, end block Num:71447145,ID:000000000442326920522f340fc09ba7d047157ecc0dbbbd46316e0761372510
17:11:15.630 INFO  [peerWorker-17] [net](PeerConnection.java:184) Send peer /146.190.13.115:36448 message type: BLOCK_CHAIN_INVENTORY
size: 5, first blockId: Num:71447145,ID:000000000442326920522f340fc09ba7d047157ecc0dbbbd46316e0761372510, end blockId: Num:71447149,ID:000000000442326ddc5e37e27d941ee898720831ca748d12be5820273a6dfd5f, remain_num: 0
--
17:11:19.051 INFO  [peerWorker-17] [net](P2pEventHandlerImpl.java:169) Receive message from  peer: /146.190.13.115:36448, type: SYNC_BLOCK_CHAIN
size: 5, start block: Num:71447129,ID:00000000044232596231772542030f3f93d04d36702621eed100206b2f0b2e34, end block Num:71447146,ID:000000000442326ae9efc83412a1140d2b37fa811adf7d4d0205d150e2addafb
17:11:19.053 ERROR [peerWorker-17] [net](P2pEventHandlerImpl.java:278) Message from /146.190.13.115:36448 process failed, type: SYNC_BLOCK_CHAIN
size: 5, start block: Num:71447129,ID:00000000044232596231772542030f3f93d04d36702621eed100206b2f0b2e34, end block Num:71447146,ID:000000000442326ae9efc83412a1140d2b37fa811adf7d4d0205d150e2addafb 

@xxo1shine
Contributor

@jakamobiii Thank you for providing the log. Based on the log, the error appears to be caused by the following logic.

  • The highest block in the last BLOCK_CHAIN_INVENTORY message sent by the node is 71447149, so lastSyncBlockId is set to 71447149. The code is as follows:
    peer.setLastSyncBlockId(blockIds.peekLast());
  • The SYNC_BLOCK_CHAIN message received afterwards carries 71447146 as its highest block, which is lower than 71447149, the highest block of the BLOCK_CHAIN_INVENTORY message, so an error is reported. The code is as follows:
    BlockId lastSyncBlockId = peer.getLastSyncBlockId();
    long lastNum = blockIds.get(blockIds.size() - 1).getNum();
    if (lastSyncBlockId != null && lastSyncBlockId.getNum() > lastNum) {
      throw new P2pException(TypeEnum.BAD_MESSAGE,
          "lastSyncNum:" + lastSyncBlockId.getNum() + " gt lastNum:" + lastNum);
    }

Further analysis is needed to determine which scenarios trigger this situation.
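To make the failing comparison concrete, here is a minimal standalone sketch of the check using the block numbers from the log. The class SyncCheckSketch, its nested BlockId, and the use of IllegalStateException in place of P2pException are stand-ins invented for illustration; they are not the java-tron classes.

```java
// Simplified stand-in for the check quoted above; not the java-tron classes.
final class SyncCheckSketch {

  /** Hypothetical minimal BlockId that holds only the block number. */
  static final class BlockId {
    private final long num;

    BlockId(long num) {
      this.num = num;
    }

    long getNum() {
      return num;
    }
  }

  /**
   * Mirrors the quoted condition: reject a chain summary whose highest block
   * is lower than the last block ID already advertised to the peer.
   * IllegalStateException is used here only as a placeholder for P2pException.
   */
  static void check(BlockId lastSyncBlockId, long lastNum) {
    if (lastSyncBlockId != null && lastSyncBlockId.getNum() > lastNum) {
      throw new IllegalStateException(
          "lastSyncNum:" + lastSyncBlockId.getNum() + " gt lastNum:" + lastNum);
    }
  }

  public static void main(String[] args) {
    // 71447149: highest block of the BLOCK_CHAIN_INVENTORY reply in the log.
    BlockId lastSync = new BlockId(71447149L);
    try {
      // 71447146: highest block of the second SYNC_BLOCK_CHAIN in the log.
      check(lastSync, 71447146L);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With these inputs the exception message has the same shape as the "gt lastNum" text in the issue title.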

@xxo1shine
Contributor

@jakamobiii There is a concurrency problem between the peer thread and the block processing thread. The following sequence can trigger this error.

  • Peer thread:
    After processing the block chain inventory message, it sets blockBothHave to 71447146. It then executes the getBlockChainSummary function and, because blockBothHave is 71447146, sets beginBlockId = blockBothHave = 71447146. At this point the peer thread is descheduled.

  • Block processing thread:
    It processes blocks 71447147, 71447148, and 71447149, so the syncBlockToFetch queue is now empty.

  • Peer thread:

    1. Set List<BlockId> blockIds = new ArrayList<>(peer.getSyncBlockToFetch()); blockIds is empty at this time.
    2. Set syncBeginNumber = tronNetDelegate.getSyncBeginNumber(), which equals the solidified block height 71447129.
    3. Set highNoFork = high = beginBlockId = 71447146.
    4. Set long realHigh = high + blockIds.size(). Since blockIds.size() is 0, realHigh is 71447146. As a result, the last sync block chain message sent has 71447129 as its lowest block and 71447146 as its highest block.

This scenario is extremely unlikely. When it does occur, the connection to the peer is dropped. It currently has little impact on the system and does not affect block synchronization or broadcasting.
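The arithmetic in step 4 can be checked in isolation. The sketch below assumes only the formula realHigh = high + blockIds.size() stated above; the class and method names are hypothetical, and the pending block IDs are represented as plain block numbers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration of the realHigh arithmetic only; not java-tron code.
final class SummaryRangeSketch {

  /** realHigh = high + number of block IDs still waiting to be fetched. */
  static long realHigh(long high, List<Long> pendingBlockIds) {
    return high + pendingBlockIds.size();
  }

  public static void main(String[] args) {
    long high = 71447146L; // beginBlockId captured before the queue was drained

    // Queue already drained by the block processing thread: summary tops out at 71447146.
    System.out.println(realHigh(high, Collections.<Long>emptyList()));

    // Queue still holding the three pending blocks: summary would top out at 71447149.
    System.out.println(realHigh(high, Arrays.asList(71447147L, 71447148L, 71447149L)));
  }
}
```

The first case yields a highest block below 71447149, which is the condition the receiving side rejects.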

@abn2357
abn2357 commented May 19, 2025

What is the approximate value of this low probability? Is there an estimated figure?

@xxo1shine
Contributor

@abn2357 There is no exact statistic. It is estimated to appear only about once every few days, and it has no impact on the system.

@abn2357
abn2357 commented May 23, 2025

My understanding is that the first sync_block_chain request was triggered because the peer node received block 71447147 from this node while the peer's main chain height was still 71447145. Since block 71447147 could not find its parent block, the peer initiated a synchronization request to this node. My question is: under what conditions is the second sync_block_chain request triggered?

@xxo1shine
Contributor

@abn2357 The second request is triggered when the BLOCK_CHAIN_INVENTORY message is processed.

@abn2357
abn2357 commented May 23, 2025

@xxo1shine Could you point to the exact code location? Thanks.

@xxo1shine
Contributor

@abn2357 See the ChainInventoryMsgHandler.processMessage() function. At the end of this function, syncService.syncNext(peer) is executed when the else branch is taken. The code is as follows:

if ((chainInventoryMessage.getRemainNum() == 0 && !peer.getSyncBlockToFetch().isEmpty())
    || (chainInventoryMessage.getRemainNum() != 0
    && peer.getSyncBlockToFetch().size() > syncFetchBatchNum)) {
  syncService.setFetchFlag(true);
} else {
  syncService.syncNext(peer);
}

When syncNext is executed, the sync_block_chain message is sent.
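The branch condition can be read as a small pure function. The sketch below restates it with plain parameters so the two outcomes are easy to see; the class and method names are hypothetical, and the batch size of 2000 passed in main is only a placeholder value chosen for the example, not a confirmed java-tron setting.

```java
// Restatement of the quoted branch condition as a pure function; not java-tron code.
final class ChainInventoryBranchSketch {

  /**
   * Returns true when the handler would set the fetch flag,
   * false when it would call syncNext (and so send another sync_block_chain).
   */
  static boolean shouldFetch(long remainNum, int pendingSize, int syncFetchBatchNum) {
    return (remainNum == 0 && pendingSize > 0)
        || (remainNum != 0 && pendingSize > syncFetchBatchNum);
  }

  public static void main(String[] args) {
    int batch = 2000; // placeholder value for syncFetchBatchNum

    // remain_num is 0 and blocks are still pending: fetch them.
    System.out.println(shouldFetch(0, 3, batch));

    // remain_num is 0 and the queue is already empty: syncNext is called instead.
    System.out.println(shouldFetch(0, 0, batch));
  }
}
```

With remain_num equal to 0, as in the logged BLOCK_CHAIN_INVENTORY message, syncNext is reached only if the pending queue is already empty when the condition is evaluated.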

@xxo1shine
Contributor

xxo1shine commented May 23, 2025

@abn2357 Maybe my description above is not accurate. It may be that the variable BlockId blockBothHave = new BlockId(); is not declared volatile, so after the peer thread sets it to 71447146, the later modification by the block processing thread is not visible to the peer thread.

@abn2357
abn2357 commented May 23, 2025

Triggering syncNext requires peer.getSyncBlockToFetch().isEmpty() to be true, and if it is true, the main chain height has already reached 71447149. We need to figure out what caused peer.blockBothHave to remain stuck at 71447146.
We both agree that there is an issue with peer.blockBothHave, but what exactly caused it still needs further investigation.

@xxo1shine
Contributor

xxo1shine commented May 23, 2025

@abn2357 This description might be clearer.

  • Peer thread:
    After processing the block chain inventory message, it sets blockBothHave to 71447146.

  • Block processing thread:
    It processes blocks 71447147, 71447148, and 71447149, so the syncBlockToFetch queue is now empty.
    It sets blockBothHave = 71447149.

  • Peer thread:
    It triggers syncNext and executes the getBlockChainSummary function.

    1. Since the variable blockBothHave is not declared volatile, the value seen at this time may still be 71447146, so beginBlockId = blockBothHave = 71447146.
    2. Set List<BlockId> blockIds = new ArrayList<>(peer.getSyncBlockToFetch()); blockIds is empty at this time.
    3. Set syncBeginNumber = tronNetDelegate.getSyncBeginNumber(), which equals the solidified block height 71447129.
    4. Set highNoFork = high = beginBlockId = 71447146.
    5. Set long realHigh = high + blockIds.size(). Since blockIds.size() is 0, realHigh is 71447146. As a result, the last sync block chain message sent has 71447129 as its lowest block and 71447146 as its highest block.

@abn2357
abn2357 commented May 23, 2025

@xxo1shine If blockBothHave is set to 71447146 in the peer thread, peer.getSyncBlockToFetch() is not empty. In that case, how is the second syncNext triggered in the processMessage function of ChainInventoryMsgHandler.java, as you describe?

@xxo1shine
Contributor

@abn2357 As described above, the syncBlockToFetch queue is currently empty.

@abn2357
abn2357 commented May 23, 2025

@xxo1shine After setting peer.blockBothHave, the peer thread immediately executes the following code:

peer.setFetchAble(true);
if ((chainInventoryMessage.getRemainNum() == 0 && !peer.getSyncBlockToFetch().isEmpty())
    || (chainInventoryMessage.getRemainNum() != 0
    && peer.getSyncBlockToFetch().size() > syncFetchBatchNum)) {
  syncService.setFetchFlag(true);
} else {
  syncService.syncNext(peer);
}

Do you mean that the block processing thread processes 71447147, 71447148, and 71447149 in such a short time that peer.getSyncBlockToFetch().isEmpty() is already true when the peer thread checks it?

@xxo1shine
Contributor

xxo1shine commented May 23, 2025

@abn2357 The following code sets blockBothHave to 71447146. The peer thread is then descheduled (its time slice expires), and by the time it resumes, the block processing thread has processed the three blocks.

synchronized (tronNetDelegate.getBlockLock()) {
  try {
    BlockId blockId = null;
    while (!peer.getSyncBlockToFetch().isEmpty() && tronNetDelegate
        .containBlock(peer.getSyncBlockToFetch().peek())) {
      blockId = peer.getSyncBlockToFetch().pop();
      peer.setBlockBothHave(blockId);
    }
    if (blockId != null) {
      logger.info("Block {} from {} is processed",
          blockId.getString(), peer.getInetAddress());
    }
  } catch (NoSuchElementException e) {
    logger.warn("Process ChainInventoryMessage failed, peer {}, isDisconnect:{}",
        peer.getInetAddress(), peer.isDisconnect());
    peer.setFetchAble(true);
    return;
  }
}
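The drain loop in this snippet can be sketched on its own to show how the queue and blockBothHave move together. In this sketch block IDs are reduced to plain block numbers, the set of locally stored blocks stands in for tronNetDelegate.containBlock, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Isolated sketch of the drain loop; not java-tron code.
final class DrainSketch {

  /**
   * Pops every leading block number that is already stored locally.
   * Returns the last popped number (the new blockBothHave), or -1 if nothing was popped.
   */
  static long drain(Deque<Long> syncBlockToFetch, Set<Long> localBlocks) {
    long blockBothHave = -1L;
    while (!syncBlockToFetch.isEmpty() && localBlocks.contains(syncBlockToFetch.peek())) {
      blockBothHave = syncBlockToFetch.pop();
    }
    return blockBothHave;
  }

  public static void main(String[] args) {
    Deque<Long> queue =
        new ArrayDeque<>(Arrays.asList(71447146L, 71447147L, 71447148L, 71447149L));
    // Only 71447146 is stored locally at this point.
    Set<Long> local = new HashSet<>(Arrays.asList(71447146L));

    System.out.println(drain(queue, local)); // last block both sides have
    System.out.println(queue.size());        // blocks still to fetch
  }
}
```

In this model the value returned by the loop and the emptiness of the queue change in the same step, which is the point raised in the follow-up comment.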

@abn2357
abn2357 commented May 23, 2025

Processing the 3 blocks does empty getSyncBlockToFetch, but popping each element from getSyncBlockToFetch also updates blockBothHave.

@xxo1shine
Contributor

xxo1shine commented May 23, 2025

@abn2357 Since the variable blockBothHave is not declared volatile, the peer thread may still hold its old value, 71447146, in the CPU cache.

Even if the block processing thread modifies the value, the peer thread may still read 71447146 because of the CPU cache.
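If the visibility hypothesis is right, one conventional remedy in Java is to declare the shared field volatile, so that a write by one thread is guaranteed to be visible to later reads by another. The sketch below shows only that idea; PeerStateSketch and its methods are hypothetical, the block ID is reduced to a block number, and this is not a patch from the java-tron repository. Whether it addresses the reported error is not confirmed in this thread.

```java
// Sketch of a volatile field shared between two threads; not java-tron code.
final class PeerStateSketch {

  // volatile: a write here happens-before any later read of the same field,
  // so a reader cannot keep seeing a stale value indefinitely.
  private volatile long blockBothHaveNum;

  void setBlockBothHaveNum(long num) {
    this.blockBothHaveNum = num;
  }

  long getBlockBothHaveNum() {
    return blockBothHaveNum;
  }

  public static void main(String[] args) throws InterruptedException {
    final PeerStateSketch peer = new PeerStateSketch();
    peer.setBlockBothHaveNum(71447146L); // written by the "peer thread"

    // Stand-in for the block processing thread advancing the value.
    Thread blockThread = new Thread(() -> peer.setBlockBothHaveNum(71447149L));
    blockThread.start();
    blockThread.join();

    System.out.println(peer.getBlockBothHaveNum());
  }
}
```

Note that volatile only addresses visibility of the single field. It does not make the read of blockBothHave and the copy of syncBlockToFetch one atomic step, so the interleaving described earlier in the thread would need separate handling.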

@abn2357
abn2357 commented May 23, 2025

Yes, thanks, this is a real possibility. Since we have no logs from the peer, we cannot tell whether this is the actual cause.

yuekun0707 moved this to Todo in java-tron May 27, 2025
yuekun0707 moved this from To Do to In Progress in java-tron May 29, 2025
@abn2357

abn2357 commented May 29, 2025

@xxo1shine Hey buddy, I have a question about one of the peer.setBlockBothHave(blockId) calls, the one made when handling the fetchinvdata message. At that point the node cannot confirm that the peer will actually receive the block, so why is blockBothHave set there?

@xxo1shine
Contributor

@abn2357 TCP is a reliable transport: as long as the block is sent, it will be received. If it cannot be received, the peer status check will disconnect the peer. See the statusCheck function.

@abn2357
abn2357 commented Jun 3, 2025

What are the benefits of setting peer.blockBothHave here in advance compared to setting it after confirming receipt? Logically, it should be set after confirming receipt.

@xxo1shine
Contributor

@abn2357 This is two nodes communicating. A sends a block to B. How does A know when B receives it?

@abn2357
abn2357 commented Jun 3, 2025

I mean that B sets blockBothHave when B receives the block. As for when A should set blockBothHave, is there a more logical approach?

@xxo1shine
Contributor

@abn2357 A sends block 100 to B and sets blockBothHave to 100. After B receives block 100, it also sets blockBothHave to 100, so the state is consistent. If A did not set this field after sending the block, A's blockBothHave might still be 99 while B's is already 100. If A does not request blocks from B, A's blockBothHave might never be updated. blockBothHave is mainly used for log printing and troubleshooting during the broadcast stage.
