Section-Bloom Control Optimization #6335

Open

bladehan1 opened this issue May 30, 2025 · 7 comments

@bladehan1
Contributor

Background

The current storage of section-bloom data is strongly coupled to the JSON-RPC interface toggle (node.jsonrpc.httpFullNodeEnable), which leads to the following issues (a simplified sketch of the current coupling follows this list):

  1. Irreversible Data Loss: If a user disables the toggle and later re-enables it, section-bloom data generated during the disabled period cannot be recovered, causing permanent failures in eth_getLogs queries.
  2. Confusing Configuration Semantics: A single toggle controls both the data layer and interface layer, blurring behavioral boundaries for users.
  3. Contradictory Default Behavior: Full nodes should natively support complete log queries, but the current configuration defaults to disabling this functionality.
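For illustration, the write path is currently gated by the JSON-RPC toggle roughly as sketched below (a simplified sketch; the exact getter name may differ), which is why blocks processed while the toggle is off never receive section-bloom entries:

    // Current behavior (simplified sketch): the section-bloom write is skipped
    // whenever the JSON-RPC full-node interface is disabled.
    if (CommonParameter.getInstance().isJsonRpcHttpFullNodeEnable()) {
      Bloom blockBloom = chainBaseManager.getSectionBloomStore()
          .initBlockSection(transactionRetCapsule);
      chainBaseManager.getSectionBloomStore().write(block.getNum());
      block.setBloom(blockBloom);
    }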


Data Status (as of 2025-05-29, Mainnet Block Height 72.6 million):

  • Theoretical Storage Size: 17.3 GB
  • Actual Compressed Size: 12.9 GB (compression saves ~25.4%)
  • Annual Growth: ~10 million new blocks per year, a maximum theoretical increase of 2.4 GB; factoring in the ~25.4% compression saving, actual storage growth is ~1.79 GB/year (worked out below).
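For reference, these figures are consistent with one another: 17.3 GB ÷ 72.6 million blocks ≈ 240 bytes of section-bloom data per block, so ~10 million new blocks add ≈ 2.4 GB before compression, and 2.4 GB × (1 − 0.254) ≈ 1.79 GB after compression.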

Rationale

Why should this feature exist?

  1. Data Integrity: Index data should be decoupled from interface availability to prevent configuration changes from causing data gaps.
  2. Operational Flexibility: Users may need to temporarily disable interfaces (e.g., during security audits) without compromising data completeness.

What are the use cases?

  1. Full Node Operation: Preserve section-bloom data for future analysis even when the JSON-RPC interface is disabled.
  2. Light Node Optimization: Resource-constrained nodes can disable data writes (storage.writeSectionBloom=false) to save storage; see the config sketch after this list.
  3. Chain Service Providers: DApps require guaranteed completeness of eth_getLogs results, unaffected by temporary configuration changes.
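For the light-node case above, the toggle would most likely live in the storage block of the node's config file. The snippet below is a hypothetical sketch only: the key name follows this proposal, and the default value is still open for discussion.

    storage {
      # ... existing storage settings ...

      # Proposed in this issue: controls whether section-bloom index data is written
      # during block processing. A resource-constrained node can set this to false to
      # save roughly 1.8 GB of storage per year, at the cost of eth_getLogs not
      # covering blocks produced while writes are disabled.
      writeSectionBloom = false
    }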

Specification

Candidate Solutions Comparison

Solution 1: Unconditional Writes
  • Description: Remove all write condition checks and write section-bloom data unconditionally.
  • Pros: Simple implementation (minimal code changes); 100% data integrity.
  • Cons: No option to disable writes (no choice for storage-sensitive users); violates config controllability.

Solution 2: New Independent Config
  • Description: Add storage.writeSectionBloom as a data toggle, fully decoupling data and interface control.
  • Pros: Users are free to enable or disable writes; unified control for eth_getFilterChanges and eth_getLogs.
  • Cons: The new config also affects eth_getFilterChanges; missing historical data requires a full sync to recover.

Solution 3: Hybrid Config
  • Description: storage.writeSectionBloom plus the legacy control, decoupling persistent data from interface control.
  • Pros: Granular write control; compatible with the legacy interface logic for eth_getFilterChanges.
  • Cons: Code redundancy; overly complex config granularity.

Test Specification

Default configuration:
  • Solution 1: Continuous writes, interface works normally.
  • Solution 2: Same.
  • Solution 3: Same.

Disable writes (where applicable):
  • Solution 1: N/A (writes cannot be disabled).
  • Solution 2: Section-bloom writes halted; eth_getLogs returns existing data only; eth_getFilterChanges returns no data.
  • Solution 3: Section-bloom writes halted; eth_getLogs returns existing data only; eth_getFilterChanges still returns data.

Scope Of Impact

  1. Affected Modules:
    • Block processing pipeline (BlockProcessor)
    • LevelDB storage (SectionBloomStore)
    • JSON-RPC interface layer (EthApi)
  2. Affected Interfaces:
    • eth_getLogs
    • eth_getFilterLogs
    • eth_getFilterChanges

Implementation

Approach

Recommended: Solution 2 (New independent config), with steps:

  1. Code Decoupling

    • Separate the data write logic from interface control so that the write is gated only by the new storage.writeSectionBloom setting (a sketch of how the flag could be defined and parsed follows these steps):

        // Write the section-bloom index whenever the data toggle is on,
        // regardless of whether the JSON-RPC interface is enabled.
        if (CommonParameter.getInstance().writeSectionBloom()) {
          Bloom blockBloom = chainBaseManager.getSectionBloomStore()
              .initBlockSection(transactionRetCapsule);
          chainBaseManager.getSectionBloomStore().write(block.getNum());
          block.setBloom(blockBloom);
        }
      
  2. Documentation Updates

    • Explicitly state: Disabling the data toggle mid-operation requires full block sync to recover missing historical data.
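Complementing step 1, one way the new flag might be wired up is sketched below; the field name, accessor, config key, and parsing location are assumptions based on this proposal, not final implementation details.

    // In CommonParameter (hypothetical field and accessors; the proposal recommends
    // defaulting to true so that full nodes keep writing the index):
    private boolean writeSectionBloom = true;

    public boolean writeSectionBloom() {
      return writeSectionBloom;
    }

    public void setWriteSectionBloom(boolean writeSectionBloom) {
      this.writeSectionBloom = writeSectionBloom;
    }

    // In the config-parsing code (hypothetical), where config is the parsed node
    // configuration and the key matches the proposed storage.writeSectionBloom entry:
    if (config.hasPath("storage.writeSectionBloom")) {
      CommonParameter.getInstance()
          .setWriteSectionBloom(config.getBoolean("storage.writeSectionBloom"));
    }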

Are you willing to implement this feature?

Yes. Willing to lead development.

Questions for community discussion:

  1. Solution Selection: Do you agree with recommending Solution 2?
  2. Default Value: Should storage.writeSectionBloom default to true (data priority) or false (resource priority)?
@lxcmyf
Contributor

lxcmyf commented May 30, 2025

@bladehan1
Has testing shown any difference between data priority and resource priority?

@yuekun0707 yuekun0707 self-assigned this May 30, 2025
@yuekun0707 yuekun0707 moved this to To Do in java-tron May 30, 2025
@waynercheung
Contributor

@bladehan1 > Recommended: Solution 2 (New independent config)

For Solution 2, if the user enables the JSON-RPC API but disables the newly added storage.writeSectionBloom setting, the log-related APIs such as eth_getLogs will not work for new log data. That could cause trouble for users, right?

Besides, what is the default value of storage.writeSectionBloom?
Consider two scenarios: if the default is false but the user enables the JSON-RPC API, eth_getLogs will not work; if the default is true and the JSON-RPC API stays disabled (its default), disk usage will be larger than before, though not dramatically so.

There may also be other scenarios we need to discuss.

Overall, though, I think adding a separate storage.writeSectionBloom setting may be a good solution, and we should document it in detail so users get a comprehensive breakdown.

@bladehan1
Contributor Author

bladehan1 commented May 30, 2025

> @bladehan1 Has testing shown any difference between data priority and resource priority?

With data priority, section-bloom data is stored by default and eth_getLogs queries are supported; compared with resource priority, the additional storage is about 1.7 GB per year.

@Sunny6889

@bladehan1 Since the additional storage is only about 1.7 GB per year, how about we simply always enable section-bloom and remove the configuration option?

@bladehan1
Contributor Author

@waynercheung
storage.writeSectionBloom defaults to true.
We should provide detailed documentation for this setting; users must understand the consequences of changing this option before doing so.

@0xbigapple

I think it's reasonable to set the default value of storage.writeSectionBloom to true. Compared with the additional ~1.7 GB of storage per year, users being unable to retrieve data from eth_getLogs is much worse.

@bladehan1
Contributor Author

> @bladehan1 Since the additional storage is only about 1.7 GB per year, how about we simply always enable section-bloom and remove the configuration option?

The 1.7 GB annual storage increase has minimal impact on light nodes (current baseline: ~60 GB) and is effectively negligible for full nodes, which justifies enabling it by default.
But removing the configuration option entirely concerns me: is that too radical? Could we keep it configurable during a transition period?
I'd appreciate others' perspectives on this balance.
