Nothing Failed. That’s the Problem.
Your agent optimizes for green, not for truth.
Context
In my last issue, *Your AI Tests Are Probably Useless*, I showed why AI tests fail by duplicating production logic.
That’s step one.
But even without duplication, your tests can still lie to you.
Because AI optimizes for passing tests.
Not for failing when it matters.
Tests pass even when behavior is broken
Your test suite stays green—even after you break production code.
That’s not safety. That’s theater.
Force tests to verify observable side effects, not internal calls.
Bad:

```js
expect(mailer.send).toHaveBeenCalled();
```

Better:

```js
expect(sentEmails).toEqual([
  { to: "a@x.com", subject: "Hi" }
]);
```

If you can delete core logic and tests still pass, they’re useless.
Agent extension: flag tests that only verify internal calls instead of real outcomes.
Over-mocking hides real failures
Mocks make everything look correct.
Until production explodes.
When you mock your data layer, you stop testing real behavior.
```js
// production
async function getActiveUsers(repo) {
  return (await repo.findAll()).filter(u => u.isActive);
}
```

```js
// test (over-mocked)
repo.findAll.mockResolvedValue([
  { isActive: true },
  { isActive: false }
]);

const result = await getActiveUsers(repo);
expect(result).toHaveLength(1);
```

If your real query is broken, this still passes.
You tested the mock, not the system.
Agent extension: flag tests that mock internal boundaries like repos or services when a simple in-memory version would test real behavior.
Weak assertions create ambiguity
Some tests only prove what didn’t happen.
That tells you almost nothing.
Bad:

```js
expect(result).not.toBeNull();
expect(list).not.toHaveLength(0);
```

Better:

```js
expect(result).toEqual(expectedUser);
expect(list).toHaveLength(3);
```

Ambiguous tests pass too easily.
Precise tests fail when they should.
Agent extension: reject weak assertions and require explicit expected values.
Tests ignore edge cases
AI sticks to the happy path.
That’s where bugs don’t live.
Edge cases are where things break.
```js
// production
function divide(a, b) {
  return a / b;
}
```

```js
// test (happy path only)
expect(divide(10, 2)).toBe(5);
```

That covers the easy case.
It says nothing about zero, null, or bad input.
Agent extension: require at least one edge case per test (null, empty, zero, duplicate, or invalid input).
Tests never fail
A test suite that never catches a bug is worse than no suite at all.
It gives you fake confidence.
Break the code on purpose.
```js
// production (bug introduced)
function isEven(n) {
  return n % 2 !== 0;
}
```

```js
// test
expect(isEven(2)).toBe(true);
```

If this still passes, your test isn’t protecting you.
A good test should fail immediately.
Agent extension: require a “what would break this?” check for every test.
Bonus: recently I’ve been telling the agent to try to poke some holes in the implementation, as in PR #1057. You’ll be surprised!
The Agent
---
name: "test-reviewer"
description: "Use this agent when reviewing or writing tests. Ensures tests verify real behavior, avoid false confidence, and fail when production code is broken."
---
# Test Reviewer Agent
You are a specialist in reviewing tests. Your goal is to prevent false confidence by ensuring tests verify real behavior and fail when they should.
---
## Core Principle: Tests Must Fail When Behavior Breaks
A test that passes when production code is broken is worse than no test.
---
## Rule 1: Assert Observable Behavior
Flag tests that only verify internal calls.
Prefer:
- Returned values
- State changes
- External side effects
---
## Rule 2: Don’t Mock Internal Boundaries
Flag:
- Mocked repositories
- Mocked services
Allow:
- External systems (APIs, payments, email)
Suggest in-memory replacements where possible.
---
## Rule 3: Prefer Strong Assertions
Flag:
```js
expect(result).not.toBeNull();
expect(list).not.toHaveLength(0);
```
Prefer:
```js
expect(result).toEqual(expected);
expect(list).toHaveLength(3);
```
---
## Rule 4: Require Edge Cases
Each test should include at least one:
- Empty input
- Null / undefined
- Zero
- Duplicate data
- Invalid state
---
## Rule 5: Mutation Check
Ask:
"What change to the production code should make this fail?"
If unclear, the test is weak.
---
## Review Process
1. Identify behavior under test
2. Check for mocked internals
3. Check for missing edge cases
4. Check assertion strength
5. Verify the test would fail if logic breaks
---
## Output Format
- Issue
- Why it matters
- Suggested fix

👉 Action Step
Add one of these rules to your test agent and run it on your next PR.
Stuck? Reply to this post or hit reply to this email!
📰 Weekly shoutout
📣 Share
There’s no easier way to help this newsletter grow than by sharing it with the world. If you liked it, found something helpful, or you know someone who knows someone to whom this could be helpful, share it:
🏆 Subscribe
Actually, there’s one easier thing you can do to grow and help grow: subscribe to this newsletter. I’ll keep putting in the work and distilling what I’ve learned as a software engineer and consultant. Simply sign up here:

