feat: FailSafe — structured failure memory and auto-clustering for SIA#29

Open
ayaelnakeb wants to merge 2 commits into hexo-ai:main from ayaelnakeb:main

@ayaelnakeb

FailSafe adds generation-persistent structured failure memory to SIA's
feedback loop.

After each generation, it automatically clusters failures from results.json
and stdout logs (format errors, zero-accuracy classes, regressions,
strong classes) and hard-injects them at the top of every feedback prompt.
This replaces the passive context.md file pointer with an actionable,
structured failure signal.
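
The clustering step described above could look roughly like the sketch below. It is not the PR's `_cluster_failures()`: the function name, the record keys (`class`, `correct`, `format_error`), and the `strong_threshold` cutoff are all assumptions made for illustration, since the actual results.json schema is not shown here.

```python
from collections import defaultdict


def cluster_failures_sketch(results, prev_accuracy=None, strong_threshold=0.8):
    """Group per-case results into the four clusters named in the description.

    results: list of dicts with HYPOTHETICAL keys 'class', 'correct' and an
             optional 'format_error' flag (the real schema may differ).
    prev_accuracy: optional dict mapping class -> accuracy in the previous
                   generation, used to detect regressions.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    format_errors = []

    for index, record in enumerate(results):
        totals[record["class"]] += 1
        if record.get("format_error"):
            # Unparseable output counts as a miss and is tracked separately.
            format_errors.append(index)
        elif record["correct"]:
            hits[record["class"]] += 1

    accuracy = {cls: hits[cls] / totals[cls] for cls in totals}
    prev = prev_accuracy or {}

    return {
        "format_errors": format_errors,
        "zero_accuracy": sorted(c for c, a in accuracy.items() if a == 0.0),
        "regressions": sorted(
            c for c, a in accuracy.items() if c in prev and a < prev[c]
        ),
        "strong": sorted(c for c, a in accuracy.items() if a >= strong_threshold),
        "accuracy": accuracy,
    }
```

Keeping the previous generation's per-class accuracy around is what makes the memory "generation-persistent" in this sketch: a class only counts as a regression relative to what was recorded before.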

Results on LawBench (913 cases, 191 classes):

  • Baseline: 14% → 17% → 16% (regressed in generation 3)
  • FailSafe: 14% → 26.3% → 30% (monotonic improvement, no regressions)
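
To make the "regressed" and "monotonic" labels concrete, here is a minimal check over the per-generation scores quoted above. The helper name `regressed_generations` is hypothetical and is not part of the PR; generations are numbered from 1.

```python
def regressed_generations(scores):
    """Return the 1-based generation numbers whose score dropped
    relative to the generation immediately before."""
    return [
        i + 1
        for i in range(1, len(scores))
        if scores[i] < scores[i - 1]
    ]


# Scores (in percent) as reported in the results above.
baseline = [14.0, 17.0, 16.0]
failsafe = [14.0, 26.3, 30.0]

baseline_regressions = regressed_generations(baseline)
failsafe_regressions = regressed_generations(failsafe)
```

Under this definition the baseline run flags generation 3 and the FailSafe run flags nothing, matching the annotations in the list.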

Files changed: orchestrator.py, prompts.py, agent_impls/pydantic_ai.py

After each generation, automatically cluster failures from logs and
results.json (format errors, zero-accuracy classes, regressions, strong
classes) and inject the structured memory at the top of every feedback
prompt. Prevents regressions and surfaces per-class failure signal that
was invisible in the aggregate score.

- orchestrator.py: _cluster_failures() + _update_structured_memory()
- prompts.py: inject memory.md at top of feedback prompt
- agent_impls/pydantic_ai.py: add list_dir tool + system_prompt for OpenAI
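
As a rough illustration of the prompts.py change, the sketch below renders clustered failures as markdown and places them ahead of the rest of the feedback prompt. The function names, section titles, and dictionary keys are assumptions for illustration only; the PR's actual memory.md layout is not shown in this description.

```python
def render_memory_sketch(clusters):
    """Render a clusters dict as markdown text.

    The keys and section titles here are HYPOTHETICAL, chosen to mirror
    the four cluster types named in the PR description.
    """
    sections = [
        ("Format errors", "format_errors"),
        ("Zero-accuracy classes", "zero_accuracy"),
        ("Regressions", "regressions"),
        ("Strong classes", "strong"),
    ]
    lines = ["# Structured failure memory"]
    for title, key in sections:
        items = clusters.get(key) or []
        lines.append(f"## {title} ({len(items)})")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


def build_feedback_prompt_sketch(base_prompt, memory_text):
    """Put the memory text at the very top of the feedback prompt.

    With no memory yet (for example, before the first generation),
    the base prompt is returned unchanged.
    """
    if not memory_text.strip():
        return base_prompt
    return f"{memory_text.rstrip()}\n\n{base_prompt}"
```

Placing the memory first, rather than linking to a file the agent may or may not open, is the design choice the description calls hard-injection: the failure signal is always in the context the model reads.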