Module 8 · End-of-Module Check

Ten questions. Pick the shape, write the contract, diagnose the handoff, audit for runaway.

10 questions · Passing bar: 11.5 / 15, with full credit on at least one applied question

This is the integrative assessment for Module 8. It confirms you can pick an orchestration shape that fits the goal, write handoff contracts that hold up under a running pipeline, diagnose a failing pipeline to the weakest handoff (not the weakest agent), budget and audit a pipeline for runaway, and own a frozen pipeline with a practiced kill switch — not merely recall that those steps exist. Multiple choice and short answer are closed-book. Applied items are open-workstation: keep your frozen /capstone/pipeline-v1/blueprint.md, runaway-audit.md, pre-flight-cost-estimate.md, the traces folder, and the before/after block open.

How to take this check

  • Do it in one sitting.
  • For the multiple-choice and short-answer sections, close every AI tool and tab. This checks your internalized model, not the model’s.
  • For each multiple-choice item, pick an answer before you reveal the explanation. Guessing and then reading the answer is not the same as knowing.
  • For short-answer items, write your response on paper or in a text file before you reveal the rubric and model answer. Compare honestly.
  • For the applied section, open your frozen /capstone/pipeline-v1/ folder: blueprint.md, runaway-audit.md, pre-flight-cost-estimate.md, the traces folder, and the before/after block. Q9 and Q10 are open-workstation on purpose.
  • If you miss a question, the feedback names the lesson(s) to revisit.

Multiple choice CORE

Six questions. Concept recall and diagnosis. One point each.

Q1. A student wants to build a pipeline that, given a research topic, decides how many sub-topics the topic should be decomposed into (based on what the topic turns out to cover), spawns a worker for each, and then combines their output into a single brief. Which orchestration shape fits?
  • A Sequential pipeline.
  • B Parallel workers with a fixed fan-out.
  • C Hierarchical supervisor — the decomposition itself requires judgment at runtime.
  • D Do not orchestrate; use a single well-tuned research skill.

Answer: C. Review Lesson 8.1 Content Block 3 if missed. The critical phrase is “decides how many sub-topics … based on what the topic turns out to cover” — decomposition is runtime-variable, which is the defining property of a hierarchical supervisor. (A) and (B) both require a fixed decomposition. (D) is appropriate for narrow topics, but as stated the task explicitly needs runtime decomposition.
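
The mapping behind this answer can be sketched as a small decision function. This is a hypothetical helper, not something taken from the lesson: the function name, its parameters, and in particular the `independent` flag used to separate parallel from sequential are assumptions added for illustration.

```python
def pick_shape(needs_decomposition: bool, decided_at_runtime: bool, independent: bool) -> str:
    """Return an orchestration shape for a goal (illustrative rule of thumb only)."""
    if not needs_decomposition:
        # Option D: a narrow task that a single well-tuned skill can handle.
        return "single skill"
    if decided_at_runtime:
        # Option C: the decomposition itself needs judgment while the pipeline runs.
        return "hierarchical supervisor"
    # Fixed decomposition. Assumption: independent parts fan out, ordered parts chain.
    return "parallel workers" if independent else "sequential pipeline"


# Q1's task: sub-topic count depends on what the topic turns out to cover.
print(pick_shape(needs_decomposition=True, decided_at_runtime=True, independent=True))
```

The only branch the question actually tests is the second one; the rest is there to show where the distractors would land.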

Q2. A handoff contract has three of the four required fields filled in — input shape, output shape, success criteria — but no failure mode. What is the specific risk of shipping the pipeline in this state?
  • A The pipeline will refuse to run.
  • B Retry-runaway: without a bounded failure mode, a flaky stage can retry silently and multiply token usage without anyone noticing until the bill arrives.
  • C The receiving agent will not know how to consume the input.
  • D The pipeline will be slower but otherwise fine.

Answer: B. Review Lesson 8.2 Content Block 2 if missed. A missing failure-mode field is the specific opening through which retry runaway enters — “retry until successful” is the de-facto behavior when the field is empty. (A) is wrong; nothing technical prevents a three-field contract from running. (C) names an input-shape problem, which is field 1. (D) understates the risk — retry runaway can be a dollar-sized problem, not a latency one.
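
One way to catch this state before shipping is to treat the contract as a record and refuse to freeze it while any field is empty. The sketch below is a minimal illustration; the `HandoffContract` class and `missing_fields` helper are hypothetical names, and only the four field names come from the module.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class HandoffContract:
    """The four fields of a handoff contract, each held as free text."""
    input_shape: str = ""
    output_shape: str = ""
    success_criteria: str = ""
    failure_mode: str = ""


def missing_fields(contract: HandoffContract) -> List[str]:
    """Return the names of any fields left empty or whitespace-only."""
    return [f.name for f in fields(contract) if not getattr(contract, f.name).strip()]


# The contract from Q2: three fields filled in, no failure mode.
q2 = HandoffContract(
    input_shape="sources.md",
    output_shape="draft.md",
    success_criteria="Draft covers the topic.",
)
print(missing_fields(q2))
```

A non-empty result is the signal to stop; nothing technical prevents the three-field contract from running, so the check has to be deliberate.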

Q3. Your sequential pipeline has four agents: researcher, drafter, reviewer, packager. How many handoff contracts does the blueprint need?
  • A One.
  • B Two.
  • C Three.
  • D Four.

Answer: C. Review Lesson 8.2 Content Block 3 if missed. A sequential pipeline with N agents has N–1 handoffs: researcher → drafter, drafter → reviewer, reviewer → packager. Each needs its own four-field contract. (A), (B), and (D) miscount. (D) in particular conflates agents with handoffs — a common first-draft mistake.
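
The N–1 count falls out of pairing each agent with the one that follows it. A minimal sketch, with a hypothetical helper name:

```python
from typing import List, Tuple


def sequential_handoffs(agents: List[str]) -> List[Tuple[str, str]]:
    """Pair each agent with its successor; every pair needs its own contract."""
    return list(zip(agents, agents[1:]))


handoffs = sequential_handoffs(["researcher", "drafter", "reviewer", "packager"])
for upstream, downstream in handoffs:
    print(f"{upstream} -> {downstream}")
print(len(handoffs), "contracts")
```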

Q4. Which of the following is not one of Module 8’s three runaway rails?
  • A Pipeline depth ≤ 3 layers.
  • B Fan-out ≤ 4 children per parent.
  • C Bounded retries on every handoff contract’s failure mode.
  • D Per-agent context window capped at 8,000 tokens.

Answer: D. Review Lesson 8.5 Content Block 1 if missed. The three rails are depth, fan-out, and bounded retries — (A), (B), and (C). Context-window caps are a cost-estimation consideration, not one of the runaway rails. A pipeline can respect the three rails and still have a large per-agent context; the pre-flight cost estimate is the tool that catches that case separately.
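
The three rails can be checked mechanically against a description of the pipeline. The sketch below is one possible shape for such a check, not the course's audit tool. The `Node` class is hypothetical, depth is counted with the top-level parent as layer 1 (an assumption), and the retry bound is stored on the receiving agent as a stand-in for the failure mode of the handoff into it.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

MAX_DEPTH = 3    # rail A: layers, counting the top-level parent as layer 1 (assumption)
MAX_FAN_OUT = 4  # rail B: children per parent


@dataclass
class Node:
    name: str
    max_retries: Optional[int] = 0  # bound on the handoff into this agent; None = unbounded
    children: List["Node"] = field(default_factory=list)


def depth(node: Node) -> int:
    """Number of layers from this node down to its deepest descendant."""
    return 1 + max((depth(c) for c in node.children), default=0)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and every descendant, parent first."""
    yield node
    for child in node.children:
        yield from walk(child)


def rail_violations(root: Node) -> List[str]:
    """Return one message per broken rail; an empty list means all three hold."""
    problems = []
    if depth(root) > MAX_DEPTH:
        problems.append(f"depth {depth(root)} > {MAX_DEPTH}")
    for n in walk(root):
        if len(n.children) > MAX_FAN_OUT:
            problems.append(f"{n.name}: fan-out {len(n.children)} > {MAX_FAN_OUT}")
        if n.max_retries is None:
            problems.append(f"{n.name}: unbounded retries")
    return problems
```

Note that nothing in this check looks at context size, which matches the answer: that case belongs to the pre-flight cost estimate.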

Q5. Two weeks after freezing your pipeline, you add a citation-checking agent as a fourth stage. Which of the following must you re-run before the modified pipeline is allowed to run?
  • A Nothing — the freeze has already been audited.
  • B The pre-flight cost estimate, to verify the new agent does not push the pipeline outside budget; and the Runaway audit, because a new agent changes depth, contract count, and possibly fan-out.
  • C Only the Runaway audit.
  • D Only the pre-flight cost estimate.

Answer: B. Review Lesson 8.5 Content Blocks 2 and 4 if missed. A structural change to the pipeline re-opens both the cost estimate (new tokens) and the Runaway audit (new agent, new handoff contract, possibly new shape). (A) is the error the module is designed to prevent — a frozen pipeline is frozen against unaudited drift, not against intentional change. (C) and (D) each pass half the test.
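
The cost half of that re-run is simple arithmetic over the agent-by-agent table. In the sketch below every number is made up for illustration, including the budget; only the idea of summing per-agent estimates and comparing the total to the budget comes from the module.

```python
from typing import Dict


def pipeline_total(estimates: Dict[str, int]) -> int:
    """Sum the per-agent token estimates."""
    return sum(estimates.values())


def inside_budget(estimates: Dict[str, int], budget: int) -> bool:
    """True when the pipeline total does not exceed the budget."""
    return pipeline_total(estimates) <= budget


BUDGET = 60_000  # hypothetical token budget
frozen = {"researcher": 20_000, "drafter": 15_000, "reviewer": 12_000}  # hypothetical
modified = {**frozen, "citation-checker": 18_000}  # the new fourth stage

print(pipeline_total(frozen), inside_budget(frozen, BUDGET))      # 47000 True
print(pipeline_total(modified), inside_budget(modified, BUDGET))  # 65000 False
```

With these invented figures the frozen pipeline fits and the modified one does not, which is the case the re-run exists to surface before the first invocation.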

Q6. A pipeline runs away — the parent session’s token count is climbing past the estimate, visibly, while you watch. You have documented a kill switch on the blueprint. The correct action is:
  • A Wait to see what output is produced; you may still get something useful.
  • B Execute the documented kill switch immediately, then diagnose from the trace.
  • C Increase the budget to match the actual burn rate.
  • D Delete the pipeline and start over.

Answer: B. Review Lesson 8.5 Content Block 3 if missed. A kill switch exists to be used in this exact situation; hesitating to “see what happens” is the fumble the practiced-kill-switch discipline exists to prevent. (A) accepts runaway cost for a chance at useful output — a bad trade every time. (C) pays for the runaway after the fact. (D) is an overreaction; the pipeline may well be fixable with a tightened contract or a cut retry loop.


Short answer CORE

Two questions, 3–4 sentences each. Up to 2.5 points each. Write your response before revealing the rubric.

Q7. In your own words, explain why handoff-is-a-contract is the module’s headline technical insight. Include what the agents on either side of a handoff are actually doing when they read each other’s output, and why a vague handoff silently corrupts downstream work rather than loudly failing.

Rubric (5 sub-points, up to 2.5 points total):

  • (0.5) Identifies that a pipeline’s quality is set by the interfaces between agents, not by any single agent.
  • (0.5) Explains that the downstream agent reads the upstream output and has to interpret it — the quality of the interpretation depends entirely on whether the contract told both agents what the output was supposed to be.
  • (0.5) Names the specific failure mode of vague handoffs: the downstream agent produces plausible-looking output from garbled input, so no error fires, and the corruption is silent.
  • (0.5) Contrasts against an obvious-failure case (the downstream agent outright refuses to run) and notes that silent corruption is the harder problem because nothing alerts the student.
  • (0.5) Correctly identifies the implication: the contract deserves at least as much tuning attention as the agent’s system prompt.

A passing short-answer (3–4 sentences) hits at least four of the five bullets.


Model answer. “A pipeline is only as good as its handoffs. The upstream agent produces a file; the downstream agent reads it and has to decide what the file means. If the contract between them is vague, the downstream agent will invent an interpretation — usually a plausible one — and keep going. No error fires. The final output looks fine until you read it carefully and realize the citations do not match the claims, or the draft skipped sections the reviewer was supposed to check. That silent-corruption pattern is the expensive one, because there is nothing obvious to debug. So the contract deserves at least as much tuning attention as any single agent’s system prompt — probably more.”

Remediation: re-read Lesson 8.2 Content Blocks 1–2. Re-run the Handoff Contract Drill.

Q8. You run a three-agent sequential pipeline (research → draft → review) and the final review’s output is weak — it misses several claims the drafter should have caught, and its citation list disagrees with the sources. The drafter and the reviewer each look fine in isolation. Where is the bug most likely to be, and what is the first move you make to diagnose it?

Rubric (5 sub-points, up to 2.5 points total):

  • (0.5) Correctly identifies the bug as most likely in the research → draft handoff contract, the draft → review handoff contract, or both — not in either agent individually.
  • (0.5) Names the diagnostic move: read the trace from the top, stage by stage, looking for the first intermediate file whose shape does not match the shape the next stage expects.
  • (0.5) Notes that the symptom (“weak review”) is downstream of where the problem actually lives — the weakness shows up late because it had room to compound.
  • (0.5) Names one specific contract-level check that might catch this: “does the draft → review contract require a specific citation format that forces the drafter to cite every claim?” or equivalent.
  • (0.5) States the fix: tighten the upstream contract (and, if needed, the verification step that enforces it) rather than rewriting the agent’s system prompt first.

Model answer. “The bug is almost certainly in one of the two handoff contracts, not in either agent. Because each agent looks fine in isolation, the weakness is at the interface — most likely the research → draft contract was not specific about the shape of the sources file (one bullet per claim with a URL, or just a list of links), which let the drafter make plausible-looking but unsupported claims, which the reviewer could not re-ground because the sources file did not match claim to source. The first diagnostic move is to read the trace from the top: look at sources.md, then look at draft.md, and find the first place where the shape of the upstream file did not cleanly match what the downstream agent needed. The fix is to tighten the upstream contract and the verification step — e.g., require every drafter claim to cite a bullet from sources.md by anchor — rather than rewriting the reviewer’s system prompt first.”
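
The top-down read described in the model answer can be sketched as a loop that stops at the first intermediate file failing its shape check. The helper names and the two shape checks below are hypothetical and deliberately crude; a real check would encode whatever the relevant contract actually requires.

```python
from typing import Callable, List, Optional, Tuple

# (file name, file contents, shape check the next stage relies on)
Stage = Tuple[str, str, Callable[[str], bool]]


def first_mismatch(trace: List[Stage]) -> Optional[str]:
    """Walk the trace in pipeline order; return the first file whose shape check fails."""
    for name, contents, shape_ok in trace:
        if not shape_ok(contents):
            return name
    return None


def sources_ok(text: str) -> bool:
    # Hypothetical check: a key-sources section with at least one URL.
    return "## Key sources" in text and "http" in text


def draft_ok(text: str) -> bool:
    # Hypothetical check: the draft has a claims section.
    return "## Claims" in text


trace = [
    ("sources.md", "a loose list of links with no sections", sources_ok),
    ("draft.md", "## Claims\n- an unsupported claim", draft_ok),
]
print(first_mismatch(trace))
```

Here the draft passes its own check, yet the walk stops at sources.md, which is the point of reading from the top: the symptom shows up late, the cause sits upstream.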

Remediation: re-read Lesson 8.2 Content Blocks 3 and 5, and the tightening-loop section of whichever lesson (8.3 or 8.4) is your frozen path.


Applied CORE

Two questions, half a page each, up to 2.5 points each. Open-workstation: keep your frozen /capstone/pipeline-v1/blueprint.md, runaway-audit.md, pre-flight-cost-estimate.md, traces folder, and before/after block open. Full credit requires the analysis be grounded in your own artifacts, not a generic response.

Q9 — Contract-tightening diagnosis (applied). Open your /capstone/pipeline-v1/traces/<path>/before-after.md from Lesson 8.3 or 8.4 (whichever is your frozen path) and the relevant handoff contracts in blueprint.md. In half a page: (a) quote the weakest handoff contract’s before state (all four fields) — the version that was in place during invocation 1; (b) quote the after state of the same contract; (c) name which of the four fields carried the change that mattered, and cite the specific observable difference between invocation 1 and invocation 2 that demonstrated the change worked; (d) predict one failure mode the tightened contract still does not prevent.

Scoring rubric (5 sub-points, up to 2.5 points total):

  • (0.5) Before contract quoted verbatim (not paraphrased).
  • (0.5) After contract quoted verbatim.
  • (0.5) Correctly identifies which field carried the load (one of: input shape, output shape, success criteria, failure mode) and names it cleanly.
  • (0.5) Cites a specific, observable difference between invocation 1 and invocation 2 — e.g., “invocation 1’s draft had three uncited claims; invocation 2 had zero” — not a vibe.
  • (0.5) Names an honest remaining failure mode — demonstrates the student understands the contract is not bulletproof and has a sense of what the next tightening round would address.

Full credit requires the analysis be grounded in the student’s own traces, not a generic response.


Model answer (illustrative — your specifics must come from your own before-after).

  1. Before contract (research → draft, invocation 1):
    • Input shape: “research-sweeper produces a markdown file at 01-research/sources.md.”
    • Output shape: “brief-drafter produces a markdown file at 02-draft/draft.md.”
    • Success criteria: “Draft covers the topic.”
    • Failure mode: “Retry if draft is missing.”
  2. After contract (invocation 2):
    • Input shape: “sources.md contains a ## Summary section (100–300 words) and a ## Key sources section (≥ 3 bullets, each a URL + one-sentence quote).”
    • Output shape: “draft.md is 400–800 words, has a ## Claims section with exactly one claim per bullet from sources.md, each bulleted with an anchor link back to the sources.md bullet it cites.”
    • Success criteria: “Every claim bullet cites a source; zero uncited claims; word count inside band.”
    • Failure mode: “Zero retries. If any claim is uncited or word count is outside band, parent writes 00-status/draft-failed.md and stops.”
  3. Load-bearing field: output shape. The invocation-1 draft had 3 uncited claims; invocation-2 had zero, because the output shape now names the exact citation format the drafter must produce. Success criteria and failure mode had to change along with it, but output shape is what the drafter actually reads.
  4. Remaining failure mode: the contract still does not prevent the drafter from citing a source that exists in sources.md but does not actually support the claim. Catching that would need either a reviewer stage whose contract requires re-reading the cited bullet, or a stricter success criterion that requires each claim to include the quoted sentence. Next tightening round.

Remediation: a miss here sends the student back to Lesson 8.2 and to their frozen path’s build lesson (8.3 or 8.4). Run a third tightening round if the before/after is not sharp enough to write about.

Q10 — Runaway audit (applied). Open your /capstone/pipeline-v1/runaway-audit.md and pre-flight-cost-estimate.md. Pick the audit check whose pass was hardest to achieve: the one where you had to revise the pipeline before the check passed, or the one that surfaced something you had not expected. In half a page: (a) name which of the nine checks it was and what initial state flagged it as failing; (b) describe the revision you made (to a handoff contract, an agent prompt, the tool allowlist, the budget, the kill-switch documentation, or the shape itself) and cite the specific files or sections you changed; (c) explain why the fix was the right fix, rather than a different legitimate fix you considered; (d) predict what the next review (at day 90) will find for this check if your fix holds, and what it will find if it does not.

Scoring rubric (5 sub-points, up to 2.5 points total):

  • (0.5) Specific check named correctly — one of the nine, not a summary.
  • (0.5) Initial failing state described concretely — not “it failed,” but “the fan-out step in the drafter spawned five workers when the cap is four” or equivalent.
  • (0.5) Fix is traced to specific files / sections, not described generically.
  • (0.5) Justification rests on a concrete reason: the student names a factor, not a vibe. Full credit if the student also names an alternative fix they considered and rejected with a reason.
  • (0.5) 90-day prediction is specific and falsifiable — e.g., “if the fix holds, the audit at day 90 should still show Pass on this check and no new summary-skipped.md files in 00-status/; if it does not hold, I would expect to see recurring skipped-status files driven by the same root cause.”

A passing Q10 requires that the student treat the audit as a thinking tool for an ongoing pipeline, not a one-time checklist.


Model answer (illustrative — your specifics must come from your own audit).

  1. Check: Check 4 — No handoff contract has an unbounded retry. Initial state: the research → draft contract’s failure mode read “retry if draft is missing” — no cap, no alternative. Check flagged as failing.
  2. Revision: edited /capstone/pipeline-v1/blueprint.md § Handoffs — research → draft — failure mode to read: “Zero retries. On first failure, write 00-status/draft-failed.md and stop.” Also edited the parent orchestrator prompt in /capstone/pipeline-v1/subagents/<parent>/orchestrator-prompt.md step 4 to call out the stop-on-failure explicitly.
  3. Why this fix: the drafter’s failures were all drafts missing entirely due to an upstream sources.md mis-shape — a retry would not fix the input, so zero retries is correct. I considered “one retry with a tighter prompt” (the course default) but rejected it because the failure was an upstream shape error, not a drafter inconsistency — a retry would have burned tokens with the same garbled input. Bounded retries live in the audit; the right bound depends on what kind of failure the stage is vulnerable to.
  4. 90-day prediction: if the fix holds, the day-90 audit should still show Pass on Check 4 and no new draft-failed.md files in 00-status/ (the upstream fix on the research contract should mean the drafter never sees a mis-shaped input). If instead I see recurring draft-failed.md files, the upstream contract is drifting and I need to tighten the research stage’s output shape — not re-enable retries.
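
A bounded failure mode like the one in this illustrative answer can be sketched as a runner that tries a stage a fixed number of times, then writes a status file and stops. The `run_stage` helper and its signature are hypothetical; how a real parent orchestrator enforces the bound depends on the frozen path.

```python
from pathlib import Path
from typing import Callable


def run_stage(stage: Callable[[], bool], max_retries: int, status_file: Path) -> bool:
    """Run `stage` at most 1 + max_retries times.

    Returns True on the first success. After the final failure, writes
    `status_file` and returns False so the caller can stop the pipeline.
    """
    attempts = 1 + max_retries
    for _ in range(attempts):
        if stage():
            return True
    status_file.parent.mkdir(parents=True, exist_ok=True)
    status_file.write_text(f"stage failed after {attempts} attempt(s)\n")
    return False
```

Called with `max_retries=0`, this gives the zero-retry behavior from the revised contract: one attempt, then a status file and a stop. The unbounded "retry if draft is missing" version has no equivalent here because the loop always has a fixed length.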

Remediation: a miss here sends the student back to Lesson 8.5. Re-run the full Runaway audit from scratch and produce a written log of each check with pass / fail notes, including which revisions were needed.


Parent / instructor scoring summary

Total: 15 points across 10 questions.

  • Multiple choice (Q1–Q6): 1 point each — 6 points.
  • Short answer (Q7–Q8): up to 2.5 points each — 5 points.
  • Applied (Q9–Q10): up to 2.5 points each — 5 points.

Passing bar: 11.5 of 15 or better, with at least one applied question at full credit. A miss on the applied section sends the student back to Lesson 8.2 (if the missed question is Q9, contract diagnosis) or Lesson 8.5 (if the missed question is Q10, the Runaway audit) before Module 9.
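
For anyone tallying by script, the passing rule has two conditions, and a high total alone does not satisfy it. A minimal sketch, with a hypothetical function name and the per-question scores passed in as lists:

```python
from typing import List

PASS_TOTAL = 11.5   # minimum total score
FULL_APPLIED = 2.5  # full credit on one applied question


def passes(mc_correct: int, short: List[float], applied: List[float]) -> bool:
    """Apply both conditions: total at or above the bar, and one applied item at full credit."""
    total = mc_correct + sum(short) + sum(applied)
    return total >= PASS_TOTAL and any(a >= FULL_APPLIED for a in applied)


print(passes(5, [2.0, 2.0], [2.5, 1.5]))  # True: total 13.0, Q9 at full credit
print(passes(6, [2.5, 2.5], [2.0, 2.0]))  # False: high total, no applied item at full credit
```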

Weighting suggestions for parents issuing credit:

  • Multiple choice (Q1–6): 40% of Module 8 score.
  • Short answer (Q7–8): 20%.
  • Applied (Q9–10): 40%. Q9 and Q10 are the load-bearing items — they demonstrate applied judgment on the student’s own pipeline, not recall.

Evidence to file in the student’s credit portfolio for Module 8. This check alone is not sufficient evidence of Module 8 completion. The full Module 8 portfolio is:

  1. This completed check (all ten answers written out). File in /ops/credit-docs/module-08/.
  2. /capstone/pipeline-v1/blueprint.md — all sections filled, including the path-picking decision, budget, kill switch, and next-review date.
  3. /capstone/pipeline-v1/runaway-audit.md — nine checks, all Pass.
  4. /capstone/pipeline-v1/pre-flight-cost-estimate.md — agent-by-agent table, pipeline total inside budget.
  5. /capstone/pipeline-v1/subagents/ or scheduled-tasks/ — depending on which path is frozen.
  6. /capstone/pipeline-v1/traces/ — two invocations captured, each with the path’s canonical trace format, plus a before-after.md.
  7. /capstone/pipeline-v1-draft/<other-path>-sibling/ — the path not picked, preserved with a short notes.md.
  8. The Module 8 retrospective in my-first-loop.md: a one-paragraph summary, what was surprising, what will break on the next-review date if nothing changes, and one honest sentence on whether the student will actually use the pipeline.

Transcript language. If a parent is assembling a transcript, the transcript line for Module 8 can accurately say: “Designed, composed, and audited multi-agent AI pipelines — picked the orchestration shape from sequential, parallel, and hierarchical patterns; authored four-field handoff contracts and tightened them across two real invocations; built the same pipeline as Claude Code subagents and as chained Cowork scheduled tasks, picked the frozen path by the goal’s shape, and passed a nine-check runaway audit with a practiced kill switch and a pre-flight cost estimate inside budget.”

Remediation if missed:

  • Q1: Re-read Lesson 8.1 Content Block 3. The three orchestration shapes map cleanly to three kinds of decomposition; if the mapping is not crisp, the rest of the module loses leverage.
  • Q2: Re-read Lesson 8.2 Content Block 2. The four-field handoff contract exists to close the four specific openings through which runaway enters.
  • Q3: Re-read Lesson 8.2 Content Block 3 (contract counting by pattern). N–1 for sequential, N for parallel (one per worker), depth-dependent for hierarchical.
  • Q4: Re-read Lesson 8.5 Content Block 1. The three rails and why each cap is set where it is.
  • Q5: Re-read Lesson 8.5 Content Blocks 2 and 4. Any pipeline change that affects tokens or shape re-opens both the pre-flight estimate and the Runaway audit.
  • Q6: Re-read Lesson 8.5 Content Block 3. If your answer was not “execute the kill switch immediately,” practice the kill switch again so the motor habit is there when you need it.
  • Q7: Re-read Lesson 8.2 Content Blocks 1–2. Handoff-is-a-contract is the module’s headline technical insight; if it is not clear, nothing downstream works.
  • Q8: Re-read Lesson 8.2 Content Block 5 and whichever tightening-loop section (8.3 or 8.4) is your frozen path. The diagnosis pattern is always “read the trace top-down and find the first intermediate file whose shape surprised the next stage.”
  • Q9: The contract-tightening muscle is the load-bearing skill of Module 8. Return to your frozen path’s build lesson and run a third tightening round. Write a sharper before/after.
  • Q10: The Runaway audit is the load-bearing discipline of Module 8. Return to Lesson 8.5 and re-run the nine checks from scratch, producing a written pass / fail log alongside the blueprint.

If the student passes at 11.5 / 15 or above with at least one applied question at full credit, Module 8 is complete and Module 9 can begin. Below that bar, target remediation to the specific lesson(s) listed above before moving on.

Next up

Module 9 — Security, Privacy & Responsibility.

The frozen pipeline you shipped in Module 8 is the artifact Module 9 audits. Module 9 examines the security posture of every handoff (shared-state prompt injection), every skill the pipeline depends on (supply-chain risk), and every credential the pipeline handles. A clean Module 8 freeze makes Module 9 possible; a messy one makes Module 9 miserable.
