feat: FailSafe — structured failure memory and auto-clustering for SIA by ayaelnakeb · Pull Request #29 · hexo-ai/sia

ayaelnakeb · 2026-06-07T00:51:29Z

FailSafe adds generation-persistent structured failure memory to SIA's
feedback loop.

After each generation, FailSafe automatically clusters failures from
results.json and stdout logs into categories (format errors,
zero-accuracy classes, regressions, strong classes) and hard-injects
them at the top of every feedback prompt. This replaces the passive
context.md file pointer with an actionable, structured failure signal.
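The clustering step described above might look roughly like the sketch below. It is not the code from this PR: the record fields (`label`, `prediction`, `format_ok`), the function names, and the 0.8 "strong class" threshold are all assumptions made for illustration, since the actual results.json schema is not shown here.

```python
from collections import defaultdict


def per_class_accuracy(records):
    """Return {class_label: accuracy} for a list of result records."""
    totals, correct = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["label"]] += 1
        if r["prediction"] == r["label"]:
            correct[r["label"]] += 1
    return {c: correct[c] / totals[c] for c in totals}


def cluster_failures_sketch(records, prev_accuracy=None, strong_threshold=0.8):
    """Group one generation's results into the four categories named above.

    records:        hypothetical list of dicts with "label", "prediction"
                    and an optional "format_ok" flag (assumed schema).
    prev_accuracy:  per-class accuracy from the previous generation, if any.
    """
    acc = per_class_accuracy(records)
    prev_accuracy = prev_accuracy or {}
    return {
        # Count of outputs that could not be parsed at all.
        "format_errors": sum(1 for r in records if not r.get("format_ok", True)),
        # Classes where no case was answered correctly.
        "zero_accuracy": sorted(c for c, a in acc.items() if a == 0.0),
        # Classes whose accuracy dropped relative to the previous generation.
        "regressions": sorted(
            c for c, a in acc.items() if a < prev_accuracy.get(c, 0.0)
        ),
        # Classes that already work well and should not be disturbed.
        "strong": sorted(c for c, a in acc.items() if a >= strong_threshold),
    }
```

Keeping the per-class accuracy from the previous generation is what lets a regression be detected per class, even when the aggregate score goes up.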

Results on LawBench (913 cases, 191 classes):

Baseline: 14% → 17% → 16% (regressed gen 3)
FailSafe: 14% → 26.3% → 30% (monotonic improvement, 0 regressions)
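As a quick check of the "0 regressions" claim, the generation-over-generation scores above can be scanned for any drop. The helper name `count_regressions` is hypothetical; the numbers are the ones reported in this description.

```python
def count_regressions(scores):
    """Count generations whose score is lower than the previous generation's."""
    return sum(1 for prev, cur in zip(scores, scores[1:]) if cur < prev)


# Accuracy (%) per generation, as reported above.
baseline = [14.0, 17.0, 16.0]
failsafe = [14.0, 26.3, 30.0]

print("baseline regressions:", count_regressions(baseline))
print("FailSafe regressions:", count_regressions(failsafe))
```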

Files changed: orchestrator.py, prompts.py, agent_impls/pydantic_ai.py

After each generation, automatically cluster failures from logs and results.json (format errors, zero-accuracy classes, regressions, strong classes) and inject the structured memory at the top of every feedback prompt. Prevents regressions and surfaces per-class failure signal that was invisible in the aggregate score.

- orchestrator.py: _cluster_failures() + _update_structured_memory()
- prompts.py: inject memory.md at top of feedback prompt
- agent_impls/pydantic_ai.py: add list_dir tool + system_prompt for OpenAI
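The injection step in prompts.py could be sketched as follows. This is an illustration only: `render_memory` and `inject_memory` are hypothetical names, and the layout of the memory text is an assumption, not the memory.md format used by this PR.

```python
def render_memory(clusters):
    """Render a {category: items} mapping as a short markdown block."""
    lines = ["## Structured failure memory"]
    for name, items in clusters.items():
        if isinstance(items, (list, tuple)):
            value = ", ".join(map(str, items))
        else:
            value = str(items)
        # An empty category is shown explicitly rather than left blank.
        lines.append(f"- {name}: {value or 'none'}")
    return "\n".join(lines)


def inject_memory(feedback_prompt, memory_text):
    """Place the memory block at the very top of the feedback prompt."""
    if not memory_text.strip():
        # Nothing recorded yet (e.g. first generation): leave the prompt as-is.
        return feedback_prompt
    return f"{memory_text.strip()}\n\n{feedback_prompt}"
```

Putting the memory first, rather than pointing to a file the agent may or may not open, is the difference between the "hard-inject" approach and the earlier passive context.md pointer.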

ayaelnakeb added 2 commits June 6, 2026 17:23

add Nebius meta profile for full Nebius-only runs

b34954a

8RON8 approved these changes Jun 12, 2026

ayaelnakeb wants to merge 2 commits into hexo-ai:main from ayaelnakeb:main
