We help your Agent get smarter
every time it fails.

EvalFix automatically finds what's breaking in your AI product, figures out why, and fixes it — so your team spends less time debugging and more time shipping.

$ evalfix run my-agent/
$ evalfix fix my-agent/

prompt diff — v2 → v3
- You are a helpful assistant that answers
- questions clearly and concisely.
+ You are a helpful assistant. When asked for
+ structured output (JSON, haiku, numbered steps),
+ follow the exact format requested. Otherwise,
+ answer clearly and concisely.
reasoning

3 tests failed due to format non-compliance. The prompt lacked explicit format instructions. Added format guidance targeting json_only_output, respond_in_haiku, and numbered_steps.

score 0.42 → 0.91

Get started in seconds

Two packages. That's it.

CLI
$ pip install evalfix

Run evals, analyze failures, and fix prompts from your terminal or CI pipeline.

$ evalfix run my-agent/
$ evalfix fix my-agent/
✓ Fixed in 1 iteration · score 0.42 → 0.91
SDK · support_bot.py
$ pip install evalfix-sdk
from evalfix_sdk import capture, configure

# Queue captured failures as JSONL eval cases
configure(queue_file="support-bot/.evalfix/failures.jsonl")

# quality_score, user_msg, and response come from your own scoring code
if quality_score < 0.7:
    capture(
        input=user_msg,
        output=response,
        expected="Empathetic, actionable reply",
        score=quality_score,
    )

This is how real-world failures become eval cases. Every bad response your agent produces in production gets captured here — so evalfix fix is always optimizing against what actually breaks, not examples you invented.

Never blocks. Never throws.
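That guarantee can be sketched in plain Python: append the record to the JSONL queue on a daemon thread and swallow any error, so the agent's hot path is never at risk. This is a hypothetical illustration of the pattern, not the evalfix-sdk implementation; the name `safe_capture` and the record fields are assumptions.

```python
import json
import threading

def safe_capture(queue_file, **record):
    """Append a failure record to a JSONL queue without ever
    blocking or raising in the caller's thread."""
    def _write():
        try:
            with open(queue_file, "a") as f:
                f.write(json.dumps(record) + "\n")
        except Exception:
            pass  # a lost record is better than a crashed agent

    t = threading.Thread(target=_write, daemon=True)
    t.start()
    return t  # callers may join() in tests; production code ignores it

safe_capture(
    "failures.jsonl",
    input="My order never arrived",
    output="Your ticket is closed.",
    expected="Empathetic, actionable reply",
    score=0.18,
)
```

The daemon thread keeps file I/O off the request path, and the bare `except` trades a possibly lost record for a guarantee that capture can never take the agent down.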

Why not just edit the prompt yourself?

You can't fix what you can't see.

Manual prompt editing is a guess. evalfix gives you the failure, the context, and the fix — verified against your real test cases before it ships.

Without evalfix
# you get a Slack message
⚠ support-bot responses feel off today
# you open the prompt file
You are a helpful assistant that answers
questions clearly and concisely.
# you make a guess
+ Please be more empathetic.
# you deploy and hope
No evals. No diff. No rollback.
With evalfix
# evalfix captured 7 failures overnight
empathy_check · 0.18
got: "Your ticket is closed." expected: acknowledgement
action_items · 0.31
no next steps offered in 6/7 cases
# evalfix fix support-bot/
▸ root cause: prompt lacks tone + CTA guidance
+ Acknowledge the issue, then offer a clear next step.
✓ score 0.24 → 0.87 · verified on 7 real cases

Evaluation methods

Exact match · Contains · Regex · LLM-as-judge · Custom

Evaluate the way your use case demands — from deterministic checks to AI-graded rubrics.
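The five methods can be sketched as plain scoring functions. These signatures are hypothetical illustrations of what each check decides, not evalfix's actual API.

```python
import re

def exact_match(output, expected):
    # Deterministic: pass only if the strings match after trimming whitespace
    return 1.0 if output.strip() == expected.strip() else 0.0

def contains(output, needle):
    # Pass if the expected substring appears anywhere in the output
    return 1.0 if needle in output else 0.0

def regex(output, pattern):
    # Pass if the pattern matches anywhere in the output
    return 1.0 if re.search(pattern, output) else 0.0

def llm_as_judge(output, rubric, judge):
    # `judge` is any callable that grades output against a rubric and
    # returns a float in [0, 1] -- e.g. a wrapped LLM call
    return judge(output, rubric)

# "Custom" is simply any callable with the same shape, e.g.:
def custom_word_count(output, max_words):
    return 1.0 if len(output.split()) <= max_words else 0.0
```

Deterministic checks (exact match, contains, regex) are cheap and reproducible; LLM-as-judge trades cost for the ability to grade open-ended qualities like tone against a rubric.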