Module 3 · Lesson 3.3

Reviewing what the agent did.

The most important lesson in Module 3. You will learn the four ways an agent's edit can go wrong, the five-move review you run on every change, and the written checklist you use for your first thirty directed edits. The Diff Review Trainer gives you three real scenarios to practice on — two of which are built to fool you. You don't need to read code to do this; you do need to know what you asked for and how to verify the agent did it.

Stage 1 of 3

Read & Understand

5 concept blocks

Why this is the most important lesson in Module 3 CORE

You can get by without knowing the details of how the Code tab renders its panels or where the Claude Code CLI prints its status line. Tools change. What does not change — will not change for a long time — is the skill of reviewing the agent's work.

A change the agent makes is a claim. The claim is: “If you accept this, your program will do what you asked and nothing else.” Your job as the director is to check that claim. Not because agents are sloppy — frontier coding agents are good, often better than a human at writing the obvious-looking fix — but because “does what you asked” is not a property the agent can verify alone. Only you know what you asked for, and only you can confirm the result matches.

This lesson does three things. It names the four ways an agent's edit can go wrong. It gives you a written five-move review you can run on any change. And it puts that review on real changes — including one that looks right and is wrong, which is the category that catches everybody eventually.

Read this lesson slowly. The payoff is not in the next ten directed edits. It is in the next thousand.

Reviewing for a director, not a programmer CORE

You are not a programmer reading code. You are a director reviewing the agent's work. Those are different jobs and they call for different tools.

A programmer reviews a change by reading every line of code in the diff and judging whether each line is correct. That works if you can read the language fluently. This course does not require you to.

A director reviews a change by checking the shape of what the agent did, reading the agent's summary of the change, asking the agent to explain anything unfamiliar, running the result with realistic input, and iterating if anything is wrong. That works whether or not you can read the language — and it scales to every kind of agentic work, not just code.

One small bit of mechanics first: a diff is the set of changes between what a file used to contain and what it contains now. Lines removed appear with a - prefix (red); lines added appear with a + prefix (green); unchanged lines for context appear plain. You don't need to read the lines to use the diff — you just need to see the shape: which files, how big, what's added, what's removed, anything you didn't expect.
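Here is what a tiny, hypothetical diff looks like — the filename, line numbers, and code are made up purely to show the shape:

```diff
--- a/tidy.py
+++ b/tidy.py
@@ -12,3 +12,3 @@
 def move_file(name):
-    folder = name.split(".")[1]
+    folder = name.rsplit(".", 1)[-1]
     print("moving", name)
```

One line removed (the - prefix, shown red), one line added (the + prefix, shown green), two unchanged lines for context. That is the entire shape of a one-line change: one file, one removal, one addition, nothing unexpected.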

The five-move review is the rest of this lesson. Before that, the four things you're looking for.

The four failure modes CORE

Every agent change that goes wrong goes wrong in one of these four ways. Naming them gives you faster recognition when you hit one — and each one has a director's move that catches it without code-reading.

Misplaced change. The fix works here but won't work for the next similar case. Example: in Lesson 3.2 you directed the agent to fix a bug where certain filenames end up in the wrong folder — specifically, files with two dots in the name like notes.tar.gz. The agent could “patch” the bug by adding a one-off rule just for .tar.gz files, and the test would pass. But the next file with two dots in its name (a different kind of backup file, a different kind of archive) would have the exact same bug all over again, because the rule was glued onto one specific case rather than fixed at the root. The right fix lives in the part of the program that handles filenames in general, so it covers every similar case at once. How a director catches this: ask the agent — “Will this fix work for similar cases I haven't shown you, or only for the exact one I asked about?” A good agent will admit when its fix is narrow.
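To make "glued on" versus "fixed at the root" concrete, here is a minimal sketch. The function names and the exact rule are hypothetical — illustrative only, not taken from the course's starter repo:

```python
# Hypothetical sketch of a narrow "patch" vs. a root-cause fix for the
# two-dots filename bug.

def get_extension_narrow(filename: str) -> str:
    """The glued-on fix: special-cases .tar.gz and nothing else."""
    if filename.endswith(".tar.gz"):     # passes the test the agent was shown...
        return "gz"
    return filename.split(".")[1]        # ...but still breaks on "backup.2024.zip"

def get_extension_general(filename: str) -> str:
    """The root-cause fix: always take the text after the LAST dot."""
    return filename.rsplit(".", 1)[-1]
```

The narrow version handles notes.tar.gz and immediately fails the next two-dot name: backup.2024.zip comes back as "2024" instead of "zip". The general version handles both, and every two-dot name it has never seen, because the fix lives where filenames in general are handled.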

Scope creep. The agent does more than you asked. Example: you asked for a bug fix in tidy.py; the change also reformats three other files “for consistency.” The extras may be harmless, they may be helpful, and they may include a silent behavior change you did not ask for and will not notice until something breaks in two weeks. The rule is: unrequested changes are bugs until proven otherwise. How a director catches this: look at the shape of the change. Ask the agent — “Did you change anything I didn't ask for?” Ask it directly to list every file it touched and why.

Silent deletion. The agent removes something that was carrying real behavior, and the deletion isn't replaced. Example: a line that creates a folder before writing to it gets removed because “the system handles that automatically” — which isn't quite true, and the program now crashes the first time someone runs it on a fresh machine. The behavior that was there yesterday is gone, and the agent's summary may not mention it. How a director catches this: ask the agent — “What did you delete and why?” And run the result against realistic inputs that exercise behavior the change might have removed.
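A concrete, hypothetical sketch of why that deleted line mattered. The folder names are made up; the point is that writing into a folder that doesn't exist yet crashes, so the creation line was carrying real behavior:

```python
import pathlib
import tempfile

# Hypothetical sketch: the one line the agent deleted was doing real work.
out_dir = pathlib.Path(tempfile.mkdtemp()) / "sorted" / "documents"

# Without this line, the write below raises FileNotFoundError on a fresh
# machine, because sorted/documents does not exist yet. The system does
# NOT create missing parent folders automatically.
out_dir.mkdir(parents=True, exist_ok=True)

(out_dir / "report.txt").write_text("hello")
print((out_dir / "report.txt").read_text())
```

Delete the mkdir line and the script still looks complete, still reads cleanly, and works fine on any machine where the folder already happens to exist — which is exactly why the deletion is silent.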

Plausible wrong. The change looks right, runs without errors, passes the tests — but does the wrong thing on a case the tests didn't cover. Example: the file-sorting script from Lesson 3.2 has tests for files like report.pdf, photo.jpg, and notes.txt. The agent's fix passes all of them. But the first time you run the script on your real Downloads folder, it crashes on a file with no extension at all — a screenshot saved with just a name, or a download the website didn't label. The tests pass because nobody wrote a test for “file with no extension.” This is the hardest failure mode to catch, and it's the one that most rewards running the result against real inputs (your actual Downloads folder, your actual notes, your actual data) rather than just trusting that “tests pass” means “works.”
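The no-extension crash is easy to reproduce in miniature. A hypothetical sketch (the function and its folder table are illustrative, not the course's actual script):

```python
# Hypothetical sketch of a "plausible wrong": passes every written test,
# crashes on an input nobody wrote a test for.

def folder_for(filename: str) -> str:
    extension = filename.rsplit(".", 1)[1]   # IndexError when there is no dot
    return {"pdf": "documents", "jpg": "photos", "txt": "notes"}.get(
        extension, "other"
    )

# The tests someone thought to write — all pass:
#   folder_for("report.pdf")  -> "documents"
#   folder_for("photo.jpg")   -> "photos"
#   folder_for("notes.txt")   -> "notes"
# The real Downloads folder — crashes:
#   folder_for("Screenshot")  -> IndexError: no dot, so no second piece
```

Every automated signal is green right up until a real input exercises the one case the tests skipped — which is why Move 4 of the review below insists on running the result on realistic input.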

Naming these four failure modes gives you four questions you can ask of any diff — the four diff-review questions the rest of the course refers back to:

  1. Right place? — does the change land where it should, in a way that will generalize? (Catches misplaced change.)
  2. Right scope? — did the agent change only what you asked? (Catches scope creep.)
  3. Legitimate deletion? — if anything was removed, did the agent name it and replace its behavior? (Catches silent deletion.)
  4. Any surprise? — tests pass and the diff looks reasonable, but does the result match the goal on real input? (Catches plausible wrong.)

These four questions are the compressed form of the failure-mode taxonomy. The five-move review below is how you actually run them on a diff.

The five-move review CORE

Here is how a director runs the review. Five moves, in order, every time. The first time it will feel slow. By the tenth time it will compress into a single pass.

Move 1 — Check the shape. Before anything else, look at what files changed and how big the change is. Most agent tools show this at the top of the diff: “3 files changed, 14 lines added, 6 lines removed.” Ask yourself:

  • Did the agent change the files I expected, and only those files?
  • Is the size of the change roughly what I expected? (A one-line bug fix shouldn't be 80 lines.)
  • Are there any new files I didn't ask for, or files deleted that I didn't ask to remove?

If the answer to any of these is “no” or “I'm not sure,” that's the cue to slow down. Shape is the cheapest and fastest signal you have — most scope-creep failures are visible from the file list alone, before you read a single line of the change.

Move 2 — Read the agent's summary. Every modern coding agent gives you a plain-English summary of what it did. The Code tab does this inline next to the diff; the Claude Code CLI does it in the terminal. Read it. Then compare it, sentence by sentence, against the prompt you wrote.

  • Does the summary describe what you asked for?
  • Does it mention things you did not ask for? (Those are scope creep.)
  • Does it mention removing something? (That deserves a follow-up — see Move 3.)
  • If the summary is vague (“made the changes”, “fixed the issue”), ask the agent to be more specific: “List every change you made and why.”

The summary is the agent's claim about what it did. Move 1 checks the shape of the work; Move 2 checks the story of the work.

Move 3 — Ask the agent about anything unfamiliar. This is the move most non-coders skip and shouldn't. For any line, file, or concept in the change that you can't make sense of, ask the agent in plain English:

  • “What does this part do?”
  • “Why did you make this change?”
  • “What happens if I run this on an empty input? On a really large input? On a file with weird characters?”
  • “Are there any cases where this fix won't work?”
  • “Did you remove anything that was doing real work?”

The agent already has full context on the change — it just made it. Asking is the fastest diagnostic you have. If the agent's answer doesn't satisfy you, that is itself information: the change isn't ready to accept yet. This is the move that scales — every kind of work the rest of this course teaches uses the same pattern. When uncertain, ask the agent.

Move 4 — Run the result. Tests passing is not the same as “the change works.” A test exercises one case the test author thought to write. Real input is the hundred cases nobody thought of.

For coding work specifically: actually run the script on a realistic input and confirm the output matches what you wanted. Not what the test wanted — what you wanted. If the change is supposed to fix a bug, reproduce the bug yourself first (so you know what “broken” looks like), then run the change and confirm it's fixed.

For changes that aren't immediately runnable, the equivalent move is to walk through the agent's output yourself and check it against your goal. “Does this brief actually answer the question I asked?” “Does this email actually say what I want it to say?” The principle: don't trust automated success signals as proof of behavioral success.

Move 5 — Iterate. If anything in moves 1–4 didn't satisfy you, tell the agent what's still wrong and ask for the next pass. Do not start trying to fix it yourself. The agent is the one who wrote the change; the agent is the one who fixes it. Phrase the iteration specifically:

  • “This change touches config.py, which I didn't ask about. Please give me a version that only touches tidy.py.”
  • “Your change passes the test, but when I run it on a real Downloads folder, files starting with .env end up in the wrong place. Please fix that case and add a test for it.”
  • “You removed the line that creates the output folder. I think we still need that. Please put it back, or explain why it's safe to remove.”

The iteration is the back-and-forth that the rest of the course assumes you can do. After a few passes, most changes are clean. After enough real edits, the iteration becomes natural — but the only way to get there is to keep doing it deliberately at the start.

A printable director's review checklist lives at /resources/module-03/diff-review-checklist/. Print it. Use it on every real change you direct in this module. With practice, the five moves compress into a single mental pass — but don't rush that.

Answering the three common objections CORE

When students first hear the five-move review, three objections show up. They're worth answering directly, because the habit is easier to keep when you understand why it holds.

“This is too slow. I'll never get anything done.” It is slower than auto-accepting, yes. It is not slower than recovering from the bugs auto-accepting introduces. In practice the review adds a small amount of time per change; the bugs it catches can take much longer to find and recover from later. The math is not close.

“The agent is smart enough that I can skip this on small changes.” The catch-rate on small changes is lower than on large ones, because reviewers pay less attention. “Small and obvious” is precisely where silent deletions and plausible-wrongs slip in. The honest rule is the simple one: run the five moves, every time. The exception — “unless it's a throwaway test on a folder I'm about to delete” — is narrow enough to not be worth carving out.

“If I have to review every change anyway, why am I using an agent?” Because reviewing a change is faster than making it. The agent is still doing the work — locating, planning, writing, running tests. You are still getting the time back. The review is the last, cheapest step of a pipeline whose expensive parts the agent absorbed. You're not the typist; you're the director.

The underlying principle is the one the rest of your career rides on: an AI agent can do most of the work much faster than you can, but it cannot take responsibility for the result. The director's job is to take the responsibility the tool cannot. The five-move review is the mechanical form of that responsibility.

Stage 2 of 3

Try & Build

1 recipe + activity

Looking at the change in VS Code RECIPE

Tool VS Code
Last verified 2026-04-17
Next review 2026-07-17

VS Code's built-in diff view is a clean way to see shape — what files changed, how big the change is, where things were added or removed — even if you don't read the code line-by-line. This takes 20 seconds.

For the current uncommitted change (pre-accept review):

  1. Open the Source Control panel: Ctrl+Shift+G (Windows/Linux) or Cmd+Shift+G (macOS), or click the branch icon in the left sidebar.
  2. Under “Changes,” you'll see the file list — that's your shape check (Move 1). Click any file to see its before/after side-by-side.
  3. The colored bands (red for removed, green for added) tell you at a glance how big the change is. You're not reading the code; you're looking at the footprint of the change.
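If you prefer the terminal, the same shape check is available from git itself. This assumes you're working inside the project's git repository:

```shell
# Shape check from the terminal: one line per changed file, plus a total
# of lines added and removed — the footprint, not the code.
git diff --stat

# Shape of what's staged for commit, if you've already run `git add`:
git diff --stat --cached
```

Either way, you're answering the same Move-1 questions: which files, how big, anything unexpected.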

For a diff between two branches or commits:

  1. Open the Command Palette: Ctrl+Shift+P / Cmd+Shift+P.
  2. Type “Git: Compare” and select the comparison you want.

Safe default — when in doubt, ask the agent

If you see something in the diff you can't make sense of, paste a copy of that section back into the Code tab (or the Claude Code CLI session if that's where you're working) and ask: “What does this part do, and why is it here?” The agent has full context on the change it just made; asking is faster than guessing.

Try it — Review three real changes CORE

Open the interactive activity →

The interactive activity walks you through three pre-staged scenarios. For each one, you see the goal, the agent's plain-English summary of what it did, the shape of the diff (file count, line count, files touched), and the test output. You can click follow-up questions to see what the agent says when asked. Then you decide accept / ask for a revision / reject and name the failure mode if any.

  1. Read the goal stated above each scenario.
  2. Run the five-move review on each one — shape, summary, ask the agent, run the result, iterate.
  3. Decide: accept as-is, ask for a revision, or reject. If “revise,” write the one-sentence revision prompt.
  4. Name the failure mode, if any: misplaced, scope creep, silent deletion, or plausible wrong.

Summary of the three scenarios (no spoilers — try the activity first):

  • Scenario A — “Fix off-by-one in range_sum.” A small change to the loop bounds. Tests pass.
  • Scenario B — “Add a --dry-run flag to tidy.py.” A larger change adding the flag, wiring it into the main function, and touching two unrelated config files.
  • Scenario C — “Speed up the file-extension lookup.” A modest-sized change replacing an existing if/else chain with a lookup table. Tests pass.

After you have made your call on each, read the activity's explanation. One of the changes is correct and cleanly scoped; one has scope creep (Scenario B touches config files no one asked about); one is a plausible-wrong (Scenario C's lookup table is missing two cases the if/else chain had — the tests do not cover those cases, so the tests pass, and the bug is invisible until someone sorts a .heic file).
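Scenario C's trap is easy to reproduce in miniature. A hypothetical sketch (function names and folder table are illustrative, not the activity's actual code) of how an if/else chain can quietly lose cases when converted to a lookup table:

```python
# Hypothetical sketch of Scenario C's bug: the "faster" rewrite drops cases.

def folder_for_old(ext: str) -> str:
    # The original if/else chain — slower to read, but complete.
    if ext in ("jpg", "jpeg", "png", "heic"):
        return "photos"
    elif ext == "pdf":
        return "documents"
    else:
        return "other"

# The agent's lookup table — faster, but "heic" silently fell out.
FOLDERS = {"jpg": "photos", "jpeg": "photos", "png": "photos", "pdf": "documents"}

def folder_for_new(ext: str) -> str:
    return FOLDERS.get(ext, "other")

# The tests cover jpg and pdf, so both versions pass. Only a real .heic
# file reveals that folder_for_new files it under "other".
```

The tests agree on every case they cover, which is exactly why “tests pass” can't certify the rewrite — only a diff of the old behavior against the new one (or a real .heic input) exposes the dropped case.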

Deliverable. The worksheet's printable one-pager, filled in. Keep it; Lesson 3.4 revisits “plausible wrong” in detail.

Done with the hands-on?

When the recipe steps and any activity above are complete, mark this stage to unlock the assessment, reflection, and project checkpoint.

Stage 3 of 3

Check & Reflect

key concepts, quiz, reflection, checkpoint, instructor note

Quick check

Four questions. Q4 is deliberately harder than the others.

Q1. Match each failure mode to its cue. Cues: (i) the diff touches files the prompt did not mention; (ii) code disappears and is not replaced; (iii) a fix works here but will not generalize to the next similar case; (iv) the tests pass but the goal is not fully met. Modes: (a) misplaced, (b) scope creep, (c) silent deletion, (d) plausible wrong.
  • A i→a, ii→b, iii→c, iv→d
  • B i→b, ii→c, iii→a, iv→d
  • C i→b, ii→a, iii→c, iv→d
  • D i→d, ii→c, iii→a, iv→b
Show explanation

Answer: B. (i) touching unexpected files is scope creep. (ii) disappearing code is silent deletion. (iii) a fix that will not generalize is misplaced — the right level of abstraction was skipped. (iv) passes tests and misses the goal is plausible wrong. Memorizing this mapping is the point of the question; you will see these cues in the wild.

Q2. You direct the agent to fix a bug. The agent's summary says exactly what you asked for. The tests pass. The diff touches one file. What's the one move you still have to do before you accept the change?
  • A Nothing; all three signals agree.
  • B Read the agent's summary one more time.
  • C Run the result yourself and confirm it does what you asked for, on a realistic input.
  • D Ask the agent to write more tests.
Show explanation

Answer: C. The summary, the shape, and the test output are all agent-side signals. The one signal that comes from you is whether running the result actually produces the behavior you wanted. That's Move 4 of the five-move review, and it's the one most often skipped because the other three look reassuring. A is the failure mode the lesson is built to prevent. B doesn't add information. D might be a good idea but isn't a substitute for running it yourself.

Q3. A student argues: “Scope creep is a feature, not a bug. The agent improves things I did not know needed improving.” How would Module 3 respond?
  • A Agree; this is a legitimate style of use.
  • B Partially agree — scope creep is fine if the student reviews the extra changes as carefully as the requested ones.
  • C Disagree — unrequested changes are bugs until proven otherwise, because you cannot pre-commit to reviewing changes you did not know were coming; the right move is to ask for minimally-scoped diffs.
  • D Disagree — agents should never make changes outside what was asked.
Show explanation

Answer: C. Option B is the tempting middle ground and it is not quite right, because the reviewer who did not expect an extra change is also the reviewer most likely to skim it. The course's position is pragmatic, not maximal: ask for minimally-scoped diffs by default so the review is bounded; if the agent suggests a broader improvement, accept it as a separate second prompt where you are reviewing with the expectation of breadth. D overstates; agents proposing auxiliary improvements separately is healthy.

Q4. (Harder.) You review a change. The goal was “fix the crash when the input file is empty.” The agent's summary says: “I added a check for an empty file and return an empty result. I also added handling for malformed files.” The tests cover the empty-file case but not the malformed-file case. The diff touches one file. What's the right call?
  • A Accept — the goal was met.
  • B Accept — the malformed-file handling is out of scope of the prompt, but tests pass.
  • C Reject — the agent introduced extra behavior.
  • D Revise — the goal was met, but the malformed-file handling is scope creep that needs to be either justified or split out; ask the agent to either remove it or move it to a separate change with its own test.
Show explanation

Answer: D. This is exactly the scope-creep failure mode, and the agent told you about it in its summary (Move 2). The goal was met, so outright rejection C is too heavy. But the malformed-file handling was not requested and was not tested. The revise prompt is short and specific: “Please give me a version that only handles the empty-file case. If you think malformed-file handling is also worth doing, propose it as a separate change with its own test.” A lets unrequested behavior land. B is the “out of scope, so fine” mistake — out of scope is exactly why you noticed it.

Reflection prompt

The move you most often skip.

In 6–8 sentences: Pick the hardest of the three Diff Review Trainer scenarios. When you first read it, did you spot the problem — or did you read the explanation and think “of course, I should have seen that”? What specifically kept you from catching it on your own? Which of the five moves — check the shape, read the summary, ask the agent, run the result, iterate — is the move you are most likely to skip, and what is one concrete thing you'll do differently on Lesson 3.4's harder edits to make sure you don't skip it?

Project checkpoint

Capstone log: Entry 2 — the deliberately loose prompt.

Time for Entry 2 of your capstone directed-edit log. The prompt:

Pick one of the three starter-repo changes you have not yet made — --dry-run, --verbose, or “skip symlinks.” This time, make the edit deliberately harder to review. Use a prompt that is intentionally loose — something like “make the script a bit more robust to weird files” — and let the agent interpret. Then run the five-move review on the resulting change.

The point of this entry is not to produce a good change. It is to produce a change that exercises your review skill. Some of what the agent proposes will be scope creep or misplaced. Catching that is the entry. Log:

  • The (deliberately loose) prompt you wrote.
  • The shape of the change the agent produced (file count, line count, anything unexpected).
  • The agent's summary of what it did.
  • One question you asked the agent during review and what it told you.
  • Whether you ran the result yourself and what happened.
  • Your decision: accept, revise, or reject.
  • One sentence: what you would ask for differently next time.

Save to /capstone/directed-edit-log-draft.md as Entry 2.

Next in Module 3

Lesson 3.4 — When AI coding works brilliantly and when it fails.

The balanced view: four strong zones where coding agents genuinely outperform you, and eight tripwires that tell you to slow down or step out. Plus the zones map you will keep on your wall.

Continue to Lesson 3.4 →