Nothing Failed. That’s the Problem.
Your agent optimizes for green, not for truth.
Context
In my last issue, *Your AI Tests Are Probably Useless*, I showed why AI tests fail by duplicating production logic.
That’s step one.
But even without duplication, your tests can still lie to you.
Because AI optimizes for passing tests.
Not for failing when it matters.
Tests pass even when behavior is broken
Your test suite stays green—even after you break production code.
That’s not safety. That’s theater.
Force tests to verify observable side effects, not internal calls.
Bad:

```js
expect(mailer.send).toHaveBeenCalled();
```

Better:

```js
expect(sentEmails).toEqual([
  { to: "a@x.com", subject: "Hi" }
]);
```

If you can delete core logic and tests still pass, they’re useless.
Agent extension: flag tests that only verify internal calls instead of real outcomes.
Over-mocking hides real failures
Mocks make everything look correct.
Until production explodes.
When you mock your data layer, you stop testing real behavior.
```js
// production
async function getActiveUsers(repo) {
  return (await repo.findAll()).filter(u => u.isActive);
}
```

```js
// test (over-mocked)
repo.findAll.mockResolvedValue([
  { isActive: true },
  { isActive: false }
]);

const result = await getActiveUsers(repo);
expect(result).toHaveLength(1);
```

If your real query is broken, this still passes.
You tested the mock, not the system.
Agent extension: flag tests that mock internal boundaries like repos or services when a simple in-memory version would test real behavior.
Weak assertions create ambiguity
Some tests only prove what didn’t happen.
That tells you almost nothing.
Bad:

```js
expect(result).not.toBeNull();
expect(list).not.toHaveLength(0);
```

Better:

```js
expect(result).toEqual(expectedUser);
expect(list).toHaveLength(3);
```

Ambiguous tests pass too easily.
Precise tests fail when they should.
Agent extension: reject weak assertions and require explicit expected values.
Tests ignore edge cases
AI sticks to the happy path.
That’s where bugs don’t live.
Edge cases are where things break.
```js
// production
function divide(a, b) {
  return a / b;
}
```

```js
// test (happy path only)
expect(divide(10, 2)).toBe(5);
```

That covers the easy case.
It says nothing about zero, null, or bad input.
Agent extension: require at least one edge case per test (null, empty, zero, duplicate, or invalid input).
Tests never fail
A test suite that never catches a bug is worse than no suite at all.
It gives you fake confidence.
Break the code on purpose.
```js
// production (bug introduced)
function isEven(n) {
  return n % 2 !== 0;
}
```

```js
// test
expect(isEven(2)).toBe(true);
```

If this still passes, your test isn’t protecting you.
A good test should fail immediately.
Agent extension: require a “what would break this?” check for every test.
Bonus: recently I’ve been telling the agent to try to poke some holes in the implementation, as in PR #1057. You’ll be surprised!
The Agent
---
name: "test-reviewer"
description: "Use this agent when reviewing or writing tests. Ensures tests verify real behavior, avoid false confidence, and fail when production code is broken."
---
# Test Reviewer Agent
You are a specialist in reviewing tests. Your goal is to prevent false confidence by ensuring tests verify real behavior and fail when they should.
---
## Core Principle: Tests Must Fail When Behavior Breaks
A test that passes when production code is broken is worse than no test.
---
## Rule 1: Assert Observable Behavior
Flag tests that only verify internal calls.
Prefer:
- Returned values
- State changes
- External side effects
---
## Rule 2: Don’t Mock Internal Boundaries
Flag:
- Mocked repositories
- Mocked services
Allow:
- External systems (APIs, payments, email)
Suggest in-memory replacements where possible.
---
## Rule 3: Prefer Strong Assertions
Flag:
```js
expect(result).not.toBeNull();
expect(list).not.toHaveLength(0);
```
Prefer:
```js
expect(result).toEqual(expected);
expect(list).toHaveLength(3);
```
---
## Rule 4: Require Edge Cases
Each test should include at least one:
- Empty input
- Null / undefined
- Zero
- Duplicate data
- Invalid state
---
## Rule 5: Mutation Check
Ask:
"What change to the production code should make this fail?"
If unclear, the test is weak.
---
## Review Process
1. Identify behavior under test
2. Check for mocked internals
3. Check for missing edge cases
4. Check assertion strength
5. Verify the test would fail if logic breaks
---
## Output Format
- Issue
- Why it matters
- Suggested fix

👉 Action Step
Add one of these rules to your test agent and run it on your next PR.
Stuck? Reply to this post or hit reply to this email!
📰 Weekly shoutout
📣 Share
There’s no easier way to help this newsletter grow than by sharing it with the world. If you liked it, found something helpful, or you know someone who knows someone to whom this could be helpful, share it:
🏆 Subscribe
Actually, there’s one easier thing you can do to grow and help grow: subscribe to this newsletter. I’ll keep putting in the work and distilling what I’ve learned as a software engineer and consultant. Simply sign up here:

