custodian: retry proof retrieval indefinitely and avoid fatal receive errors #1927

ffranr · 2026-01-05T16:50:33Z

Fixes #1923.

Summary

add unlimitedtries to proof backoff and use it for custodian receive paths so missing universe proofs don’t exhaust retries
treat custodian receive/inspection errors as non-fatal to prevent tapd shutdowns on transient proof issues
document unlimitedtries in sample-tapd.conf and add courier config validation helpers

gemini-code-assist · 2026-01-05T16:50:51Z

Summary of Changes

This pull request significantly enhances the robustness of the custodian service by implementing indefinite retries for proof retrieval and making most receive and inspection errors non-fatal. These changes aim to prevent tapd from shutting down due to temporary network issues or missing proofs, ensuring more resilient operation. Additionally, it introduces configuration validation for courier settings and updates documentation for the new retry behavior.

Highlights

Indefinite Proof Retrieval Retries: The custodian's proof retrieval mechanism now retries indefinitely for missing universe proofs, preventing exhaustion of retries and ensuring eventual proof acquisition.
Non-Fatal Error Handling: Custodian receive and inspection errors are now treated as non-fatal, logging the issues and continuing operation instead of causing tapd shutdowns due to transient proof problems.
Configuration Validation and Documentation: New validation helpers have been added for courier configurations, and the unlimitedtries option for backoff is now documented in sample-tapd.conf.

gemini-code-assist

Code Review

This pull request introduces two main improvements for the custodian: it enables indefinite retries for proof retrieval and makes error handling more robust by treating transient errors as non-fatal. This is achieved by adding an unlimitedtries option to the backoff configuration and updating the custodian's event loop to log and continue on non-critical errors. The changes are well-implemented, and I have a few suggestions to improve code style and maintainability.

proof/courier.go

tapcfg/server.go

Ensure proof retrieval has unlimited retries.

coveralls · 2026-01-05T17:15:24Z

Pull Request Test Coverage Report for Build 20722871752

Details

0 of 116 (0.0%) changed or added relevant lines in 4 files are covered.
28661 unchanged lines in 201 files lost coverage.
Overall coverage decreased (-22.1%) to 34.896%

Changes Missing Coverage	Changed/Added Lines	%
tapcfg/config.go	1	0.0%
tapgarden/custodian.go	14	0.0%
tapcfg/server.go	41	0.0%
proof/courier.go	60	0.0%

Files with Coverage Reduction	New Missed Lines	%
universe/supplyverifier/util.go	1	98.75%
proof/util.go	2	81.63%
tapdb/cache_logger.go	2	94.44%
tapdb/migrations.go	2	76.19%
tapdb/mssmt.go	2	88.64%
address/log.go	3	0.0%
commitment/log.go	3	0.0%
internal/pedersen/commitment.go	3	95.31%
lndservices/log.go	3	0.0%
rfq/log.go	3	0.0%

Totals
Change from base Build 20468127774:	-22.1%
Covered Lines:	31475
Relevant Lines:	90196

💛 - Coveralls

Roasbeef · 2026-01-05T21:16:45Z

proof/courier.go

+	// UnlimitedTries indicates that we should retry indefinitely until the
+	// transfer succeeds or the context is canceled. If true, NumTries must
+	// be zero.
+	UnlimitedTries bool `long:"unlimitedtries" description:"Retry indefinitely instead of stopping after a fixed number of attempts."`


Hmm, not sure we should expose a setting like this. It could end up being an accidental DoS.

Roasbeef · 2026-01-05T21:19:11Z

tapgarden/custodian.go

+			log.Errorf("Unable to check proof availability for "+
+				"event (outpoint=%v): %v", event.Outpoint, err)
+
+			if fn.ErrorAs[*fn.CriticalError](err) {


Are we actually using CriticalError anywhere? IMO this is the fundamental fix: we don't need to fatalf/criticalf when we fail to fetch a proof.

ffranr · 2026-01-09T14:12:41Z

Replaced by #1941 which is more narrowly focused on the early termination problem and does not add indefinite retry to proof retrieval.

ffranr added 2 commits January 5, 2026 14:15

proof: add support for unlimited retry in proof transfer backoff config

2c6b1a9

proof: add validation methods for courier configurations

94ad383

ffranr self-assigned this Jan 5, 2026

ffranr added the bug, proofs, and receive labels Jan 5, 2026

github-project-automation bot added this to Taproot-Assets Project Board Jan 5, 2026

github-project-automation bot moved this to 🆕 New in Taproot-Assets Project Board Jan 5, 2026

ffranr mentioned this pull request Jan 5, 2026

[bug]: How to Fix 'No Universe Proof Found' Error Causing Taproot Assets (tapd) to Crash? #1923

Open

gemini-code-assist bot reviewed Jan 5, 2026

ffranr added 5 commits January 5, 2026 17:02

proof: add error validation for CourierDispatch initialization

d8d3249

server: add custodian proof courier dispatcher with unlimited retries

350fb5c

Ensure proof retrieval has unlimited retries.

config: add unlimited retry option to courier backoff sample config

b1ab7a9

tapgarden: avoid tapd shutdown on non-critical errors in custodian

ea667cc

docs: update release notes

ebd8140

ffranr force-pushed the tpm/issues/1923-universe-proof-crash branch from 6e1ed7f to ebd8140 January 5, 2026 17:02

Roasbeef requested changes Jan 5, 2026

github-project-automation bot moved this from 🆕 New to 👀 In review in Taproot-Assets Project Board Jan 5, 2026

ffranr closed this Jan 9, 2026

github-project-automation bot moved this from 👀 In review to ✅ Done in Taproot-Assets Project Board Jan 9, 2026
