🔨 Improve Platform Operations and Trust

Open
No due date
Last updated Jan 5, 2026
66% complete

By the end of Q4 2025, make our platform the service that application teams instinctively trust to stay up, stay responsive, and stay out of their way.

Success Indicators

  • Monitor Mean Time To Recover (MTTR)
  • Keep-the-Lights-On (KTLO) tickets ≤ 30% of stories in any sprint (rolling average).
  • 100% of re-architecture / “big bet” decisions have an ADR or decision log published ≤ 5 days after kickoff.
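
As a rough illustration of how the three indicators above could be computed, here is a minimal Python sketch. It is not the team's actual KTLO ratio script. The function names, the three-sprint window, the reading of "rolling average" as the mean of per-sprint ratios, and the treatment of "5 days" as calendar days are all assumptions, since the milestone does not specify them.

```python
from datetime import date, datetime, timedelta
from statistics import mean


def ktlo_ratio(ktlo_stories: int, total_stories: int) -> float:
    """Share of one sprint's stories that were KTLO tickets."""
    return ktlo_stories / total_stories if total_stories else 0.0


def rolling_ktlo(sprints: list[tuple[int, int]], window: int = 3) -> float:
    """Mean KTLO ratio over the last `window` sprints.

    `sprints` holds (ktlo_stories, total_stories) pairs, oldest first.
    The window size is an assumption, not something the milestone defines.
    """
    recent = sprints[-window:]
    return mean(ktlo_ratio(k, t) for k, t in recent)


def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to recover: average of (recovered - detected) per incident."""
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


def adr_on_time(kickoff: date, published: date, limit_days: int = 5) -> bool:
    """True if the ADR was published within `limit_days` calendar days of kickoff."""
    return (published - kickoff).days <= limit_days


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    sprints = [(6, 20), (5, 20), (7, 20)]
    print(f"Rolling KTLO ratio: {rolling_ktlo(sprints):.0%}")

    incidents = [
        (datetime(2025, 10, 1, 10, 0), datetime(2025, 10, 1, 10, 30)),
        (datetime(2025, 10, 2, 9, 0), datetime(2025, 10, 2, 10, 30)),
    ]
    print(f"MTTR: {mttr(incidents)}")

    print("ADR on time:", adr_on_time(date(2025, 10, 1), date(2025, 10, 6)))
```

If the team instead pools stories across sprints (total KTLO stories divided by total stories in the window), `rolling_ktlo` would need to sum both columns before dividing. The two readings can differ when sprint sizes vary.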

Notes

  • The milestone is open-ended (no fixed due date); teams progress it alongside normal roadmap work.
  • The KTLO ratio script runs after every sprint retrospective, and the result is shared with the team.
  • Future (out of scope): cost-of-downtime KPI, change-failure-rate SLO, automated post-incident survey.
