Module 7 · End-of-Module Check

Ten questions. Design, evaluate, author, retire.

Passing bar: 11.5 / 15, with full credit on at least one applied question.

This is the integrative assessment for Module 7. It confirms you can design a skill that fires reliably (description-as-classifier), evaluate a third-party extension with the minimum viable audit, author and package your own extensions with honest permission surfaces, retire extensions cleanly when they stop earning their keep, and read your own register critically enough to catch pile-up and drift before they become expensive — not just recall that those steps exist. The multiple-choice and short-answer sections are closed-book. The applied section is open-workstation: keep your frozen extension-register-v1.md, the hygiene ritual worksheet, your custom skill’s SKILL.md and CHANGELOG.md, and your custom plugin’s SECURITY.md in front of you.

How to take this check

  • Do it in one sitting.
  • For the multiple-choice and short-answer sections, close every AI tool and tab. This checks your internalized model, not the model’s.
  • For each multiple-choice item, pick an answer before you reveal the explanation. Guessing and then reading the answer is not the same as knowing.
  • For short-answer items, write your response on paper or in a text file before you reveal the model answer. Compare honestly.
  • For the applied section, open your frozen /capstone/extension-register-v1.md, the hygiene ritual worksheet, your custom-skill-v1/SKILL.md and CHANGELOG.md, and your custom-plugin-v1/SECURITY.md. Q9 and Q10 are open-workstation on purpose.
  • If you miss a question, the feedback names the lesson(s) to revisit.

Multiple choice CORE

Six questions. Concept recall and diagnosis. One point each.

Q1. The six-place taxonomy from Lesson 7.1 places memory and skill at different points in the stack. The key distinction is:
  • A Memories persist across sessions and skills do not.
  • B Memories tune how the agent talks to you; skills give the agent a named capability it can reach for on its own when the description matches.
  • C Skills are written in prose and memories are written in YAML.
  • D Memories are stored in the cloud and skills are stored locally.
Show explanation

Answer: B. Review Lesson 7.1 Block 2 if missed. Both persist across sessions, so (A) is wrong. (C) and (D) are implementation fictions. The distinguishing property is what the artifact is for — memory tunes conversational style; a skill adds a discrete named capability.

Q2. A skill fires reliably on “summarize this research paper” but also fires on “can you read this article for me quickly?” when the student wanted only a one-paragraph reply. The description reads: “Use for research work.” The most likely failure mode is:
  • A The body is too long.
  • B The description is overfit (too narrow).
  • C The description is too broad: “research work” catches adjacent, lighter-weight requests that should have stayed in-session. Add an explicit exclusion.
  • D The skill is missing a supporting script.
Show explanation

Answer: C. Review Lesson 7.3 Block 2 if missed. The skill is firing on cases the student did not intend because the description’s scope is wider than the procedure justifies. The fix is an exclusion block: “Do not use for single-article reads, casual summaries, or when the user has already named a single source.” (B) is the wrong diagnosis — overfit would under-fire. (A) and (D) are body-level issues, not trigger issues.
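The broad-vs-tight distinction can be sketched as a toy matcher. This is an illustration of the classifier framing only, not the agent's actual matching mechanism; the function name, trigger phrases, and substring matching are all my own simplifications:

```python
# Toy model of description-as-classifier: a request fires a skill when it
# matches a trigger phrase and hits no exclusion. The real agent matches
# semantically, not by substring; this only illustrates the scoping logic.

def skill_fires(request: str, triggers: list[str], exclusions: list[str]) -> bool:
    req = request.lower()
    if any(x in req for x in exclusions):
        return False  # exclusion block wins over any trigger match
    return any(t in req for t in triggers)

# Broad description ("research work"): catches the lighter-weight request too.
broad = skill_fires("can you read this article for me quickly?",
                    triggers=["research", "article"], exclusions=[])

# Tightened description with an explicit exclusion block: stays in-session.
tight = skill_fires("can you read this article for me quickly?",
                    triggers=["research sweep", "triangulate sources"],
                    exclusions=["read this article", "single source"])
```

With the broad trigger list the request fires the skill; with the tightened description plus exclusion block it does not, which is exactly the Q2 fix.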

Q3. You are evaluating an off-the-shelf Cowork plugin. The manifest lists mail:send as a granted capability. The plugin has a configurable send_enabled: false setting. You are using the plugin for personal note-taking and never want a message to go out. Which row of the outbound decision matrix applies, and what is the install posture?
  • A Option A (clean audit), install.
  • B Option B (drafts-only mode), install with send_enabled: false, record the setting in the register, revisit in the hygiene ritual.
  • C Option E (conscious override), install with a 30-day review.
  • D Option D, decline.
Show explanation

Answer: B. Review Lesson 7.2 Block 4 if missed. The plugin is not clean-audit (send capability exists) and not decline-required (a drafts-only mode is available). Option B is the textbook case. Option E is reserved for extensions that cannot be narrowed and are being installed anyway with a specific, time-bound reason — not the situation here.
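The matrix rows named in Q3 can be encoded as a small decision function. The option letters follow this check's labels, but the conditions are my paraphrase of the lesson, not an official specification:

```python
# Sketch of the outbound decision matrix from Lesson 7.2 (paraphrased).
# Option letters mirror the check; conditions are illustrative.

def outbound_posture(has_send: bool, drafts_only_available: bool,
                     time_bound_reason: bool = False) -> str:
    if not has_send:
        return "A: clean audit - install"
    if drafts_only_available:
        return "B: install with send disabled, record in register, revisit at ritual"
    if time_bound_reason:
        return "E: conscious override - install with 30-day review"
    return "D: decline"
```

The Q3 scenario (send capability present, `send_enabled: false` available) lands on row B; E only applies when the surface cannot be narrowed and there is a specific, time-bound reason to install anyway.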

Q4. Your register has one authored-skill row whose last-invoked column shows 74 days ago and whose next-review date is 20 days out. None of the other retirement signals apply. The ritual’s correct decision for this row is:
  • A Keep — the review is still upcoming.
  • B Retire — 74 days of disuse is past the 30–60 day retirement-signal threshold, and no seasonal tag is on the row. Retire, unless you can name a specific upcoming use.
  • C Refactor — the skill is fine, the description just needs tightening.
  • D Replace — find a better skill.
Show explanation

Answer: B. Review Lesson 7.5 Block 3 if missed. “Use has stopped” is the first retirement signal. 74 days is past the threshold. The next review date being 20 days out does not change the signal — the signal was already lit. The right move is retire (unless a seasonal tag explains the gap, which the prompt rules out). (A) misreads the review-date semantics; (C) and (D) are reflexive.
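The disuse rule in Q4 reduces to a short check. Field names and the default threshold are illustrative (the lesson gives a 30–60 day range; 60 is the generous end):

```python
# Sketch of the "use has stopped" retirement check from Lesson 7.5.
# Threshold and parameter names are my own; the 30-60 day range is the lesson's.

def disuse_decision(days_since_last_invoked: int, seasonal: bool,
                    named_upcoming_use: bool, threshold: int = 60) -> str:
    if days_since_last_invoked <= threshold:
        return "keep"  # signal not yet lit
    if seasonal or named_upcoming_use:
        return "keep"  # the gap is explained
    return "retire"
```

Note that the next-review date does not appear as an input at all: once the signal is lit, an upcoming review date does not defer the decision.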

Q5. The security questionnaire adds two questions to the Lesson 7.2 audit. They are:
  • A Performance and accessibility.
  • B S6 (what installing the plugin removes from the user’s control — what defaults it sets that the user no longer decides per-call) and S7 (the plugin’s update posture — auto-update, notify on permission delta, manual pull).
  • C Cost and latency.
  • D Language coverage and internationalization.
Show explanation

Answer: B. Review Lesson 7.4 Block 3 if missed. S6 makes the convenience-vs-control tradeoff explicit; S7 makes update-time permission changes auditable. (A), (C), and (D) are non-answers — legitimate concerns, not part of the module’s questionnaire.

Q6. Which of the following is not one of the five retirement signals?
  • A Use has stopped.
  • B The underlying tool has changed (model deprecated, API shifted).
  • C The extension has an open issue.
  • D The permission surface has widened between versions.
Show explanation

Answer: C. Review Lesson 7.5 Block 3 if missed. The five signals are use-stopped, tool-changed, description-no-longer-matches, permission-widened, better-option-matured. An open issue is a maintenance signal but is not by itself a retirement trigger; retire when the issue has actual impact on the work you use the extension for.


Short answer CORE

Two questions, 3–4 sentences each. Up to 2.5 points each. Write your response before revealing the rubric.

Q7. In your own words, explain why description-as-classifier is the module’s headline technical insight. Include what the agent is actually doing when it decides whether to fire a skill, and why a vague description silently fails rather than loudly fails.

Rubric (5 sub-points, up to 2.5 points total):

  • (0.5) Names that the agent reads descriptions — not bodies — to decide which skill to fire.
  • (0.5) Explains that the description functions as the input to a classifier: “does this job belong to this skill?”
  • (0.5) Names the specific failure mode of vagueness — the skill never fires, rather than firing wrongly — so the student does not notice the problem (no visible error); they just do not get the lift they expected.
  • (0.5) Contrasts against loud failure modes like a body-level bug (which the student would see in the output).
  • (0.5) Correctly identifies the implication: the description deserves the same tuning attention as the body.

A passing short-answer (3–4 sentences) hits at least four of the five bullets.

Show model answer

Model answer. “When I ask the agent for something, it does not inspect every installed skill’s body to decide what to do. It reads the short description field and uses that as the input to a classifier: ‘does this request belong to this skill?’ If the description is vague, the classifier never matches — the skill silently does not fire, and the student just gets a generic chat response. That is worse than a loud bug, because there is no error to debug; the student concludes ‘skills are mysterious’ and stops building. The implication is that the description deserves as much careful tuning as the body. Both are prompts, for two different decisions.”

Remediation: re-read Lesson 7.3 Block 2 and re-run the description-tuning drill.

Q8. You author a custom Cowork plugin. Its bundled skill needs to read the user’s agent-access mail label and draft a daily summary. Three scopes are available in the plugin framework: mail:read_all, mail:read_label, and mail:read+draft. Explain which scope you grant and why, and name one concrete thing that would change your answer.

Rubric (5 sub-points, up to 2.5 points total):

  • (0.5) Grants mail:read_label (narrowest scope that does the job — read a specific label, nothing more).
  • (0.5) Explains least-privilege reasoning: the plugin does not need wider read access, so do not request it.
  • (0.5) Notes that draft capability is not needed — drafting happens by writing to a local file, not via the mail provider’s draft API — so mail:read+draft is excluded.
  • (0.5) Names a change-condition that would flip the scope — e.g., “if the skill later needs to draft a reply that stays in the user’s Gmail drafts folder, mail:read+draft becomes correct; this would be a minor-version bump with a changelog entry and a fresh audit.”
  • (0.5) References the plugin-level security discipline — the decision is documented in SECURITY.md and the register.
Show model answer

Model answer. mail:read_label. The skill only needs the agent-access label, so a full-mailbox scope (mail:read_all) would be an unnecessary widening; mail:read+draft adds draft capability I do not use because the summary is written to a local file, not to Gmail drafts. This is least-privilege applied: narrowest scope that does the job. The answer would change if a future version of the skill started drafting replies that live in the mail provider’s drafts folder — then mail:read+draft becomes the correct scope, with a minor-version bump, a changelog entry, and a fresh pass through the S4 and S6 questions in SECURITY.md.

Remediation: re-read Lesson 7.4 Block 4 (least-privilege, applied) and Block 3 (questionnaire).
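The least-privilege selection in Q8 can be sketched as a narrowest-scope-first rule. The scope strings mirror the question; the two boolean needs and the function name are my own illustration:

```python
# Least-privilege scope selection for the Q8 scenario: pick the narrowest
# mail scope that covers what the skill actually does. Scope names are the
# question's; the selection rule paraphrases the lesson.

def choose_mail_scope(needs_full_mailbox: bool, needs_provider_drafts: bool) -> str:
    if needs_provider_drafts:
        return "mail:read+draft"   # drafts must live in the provider's drafts folder
    if needs_full_mailbox:
        return "mail:read_all"
    return "mail:read_label"       # read one label, nothing more
```

The change-condition from the rubric is visible in the first branch: the moment drafting moves from a local file into the provider's drafts folder, the correct scope flips, and that flip should arrive with a version bump, a changelog entry, and a fresh audit.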


Applied CORE

Two questions, half a page each, up to 2.5 points each. Open-workstation: keep your frozen register, your custom skill’s CHANGELOG.md, and your custom plugin’s SECURITY.md open. Full credit requires the analysis be grounded in your own artifacts, not a generic response.

Q9 — Description-tuning diagnosis (applied). Open your custom skill’s SKILL.md and the skill’s CHANGELOG.md. In half a page: (a) quote your current description field verbatim; (b) name which of the three description failure modes your first draft description most resembled (vague, overfit, or triggerless) and explain how you know from the Round 1 trigger test results in your changelog; (c) propose one specific edit you would make to the current description if you re-ran the trigger test today, and predict what failure-mode risk the edit reduces. You may cite specific test phrases from your changelog.

Scoring rubric (5 sub-points, up to 2.5 points total):

  • (0.5) Description quoted verbatim.
  • (0.5) Names the first-draft failure mode and cites changelog evidence (specific test request that failed, with the reason).
  • (0.5) Proposes a concrete edit (not a vague “make it better”).
  • (0.5) Predicts the failure-mode risk the edit addresses (e.g., “reduces Mode 2 / overfit by adding an exclusion for casual summaries”).
  • (0.5) Cites at least one specific trigger-test phrase from the changelog by name.

Full credit requires the analysis be grounded in the student’s own changelog, not a generic response.

Show model answer

Model answer (illustrative — your specifics must come from your own skill).

  1. Current description (verbatim): “Runs a multi-source research sweep on a named topic. Use when the user asks to triangulate sources, check citations, produce a sources.md, or build a competitive landscape. Inputs: a topic string. Outputs: a sources.md with three independent sources, one fabrication-risk callout, and a one-paragraph synthesis. Do not use for single-article summaries or casual curiosity lookups.”
  2. First-draft failure mode (vague): my Round 1 description was “Helps with research.” The changelog Round 1 test shows this failed to fire on “do a research sweep on carbon capture” — the agent produced a generic chat reply instead of invoking the skill, which is the silent-fail pattern of Mode 1 / vague. I confirmed this by seeing the skill was not named in the session trace at all.
  3. Proposed edit & predicted risk: add a sentence to the exclusion block: “Do not use when the user names a single author or source — prefer in-session prompting for single-source reads.” The Round 3 test “read the Stanford NLP paper and summarize it” fired the sweep when I wanted an in-session reply. This edit reduces Mode 2 / overfit risk by narrowing the exclusion to name the “single named source” shape explicitly, which is the surface my Round 3 test exposed.

Remediation: a miss here sends the student back to Lesson 7.3. Run a fourth tuning round; add a changelog entry. This is a core module capability and must be demonstrated before Module 8.

Q10 — Hygiene ritual applied (applied). Open your frozen extension-register-v1.md. Pick the row whose hygiene decision was hardest — the one where the keep / refactor / replace / retire call was not obvious. In half a page: (a) name the row and summarize its current state in two sentences; (b) walk through the five ritual steps and cite which step surfaced the issue that made the decision hard; (c) name the decision you reached and the reasoning that tipped it — if you are uncertain, describe the one additional piece of information that would have made it easy; (d) predict what you would expect to see on this row in 90 days, at the next ritual sweep, if your decision was correct.

Scoring rubric (5 sub-points, up to 2.5 points total):

  • (0.5) Row correctly identified and summarized (not a generic row).
  • (0.5) Correctly cites which of the five ritual steps surfaced the issue.
  • (0.5) Reasoning tips cleanly — the student names a concrete factor, not a vibe.
  • (0.5) Honest uncertainty acknowledged if present (the rubric rewards honesty; pretending confidence you do not have loses a point).
  • (0.5) 90-day prediction is specific and falsifiable (e.g., “if keep was right, use count should be > 5 and no collisions with newly authored skills; if replace was right, the new extension should have its own row with 2 real-use traces”).

A passing Q10 requires that the student demonstrate the ritual is a thinking tool, not a checklist.

Show model answer

Model answer (illustrative — your specifics must come from your own register).

  1. Row: research-sweep (authored skill). State: frozen v1.0, 6 invocations in the last 60 days, description tight after Round 3 tuning, permission surface unchanged.
  2. Ritual walkthrough: Step 1 (descriptions aloud) passed. Step 2 (permissions vs audit) passed. Step 3 (review dates) passed — row’s next review is 35 days out. Step 4 (collisions) surfaced the issue: my newly installed quick-summary plugin shares the trigger phrase “summarize this article” with my research-sweep skill’s exclusion block. The agent has been picking the plugin correctly on 4 of 5 recent short-article requests, but the collision is not explicit — it is working by the exclusion block doing its job, not by an affirmative disambiguation.
  3. Decision: Refactor, not retire. The skill is still earning its keep, but the exclusion block should become a positive reference to the plugin by name — tightening the classifier input, not just ruling out adjacent cases. The one piece of information that would have made this easier: 30 more days of collision data — right now I only have 5 short-article requests in the window.
  4. 90-day prediction: if refactor was right, the next ritual pass should show (a) the 10+ short-article requests in that 90-day window all routed to quick-summary without the agent reaching for research-sweep; (b) the refactored description hitting a new Round 4 trigger test at 6/6; (c) no new collision surfaced with any additional plugin installed since. If instead I see research-sweep firing on short-article requests anyway, the refactor was the wrong call and I should replace rather than patch.

Remediation: a miss here sends the student back to Lesson 7.5. Re-run the hygiene ritual on at least three rows with the worksheet open; produce a written ritual log alongside the register.


Parent / instructor scoring summary

Total: 15 points across 10 questions.

  • Multiple choice (Q1–Q6): 1 point each — 6 points.
  • Short answer (Q7–Q8): up to 2.5 points each — 5 points.
  • Applied (Q9–Q10): up to 2.5 points each — 5 points (the raw section totals sum to 16; the effective scale is capped at 15).

Passing bar: 11.5 of 15 or better, with at least one applied question at full credit. A miss on the applied section sends the student back to Lesson 7.3 (if the missed applied is Q9, description-tuning) or Lesson 7.5 (if the missed applied is Q10, hygiene-ritual) before Module 8.

Weighting suggestions for parents issuing credit:

  • Multiple choice (Q1–6): 40% of Module 7 score.
  • Short answer (Q7–8): 20%.
  • Applied (Q9–10): 40%. Q9 and Q10 are the load-bearing items — they demonstrate applied judgment on the student’s own artifacts, not recall.

Evidence to file in the student’s credit portfolio for Module 7. This check alone is not sufficient evidence of Module 7 completion. The full Module 7 portfolio is:

  1. This completed check (all ten answers written out). File in /ops/credit-docs/module-07/.
  2. /capstone/extension-register-v1.md — 5–7 rows, all audited, every row with current next-review date and status column filled.
  3. /capstone/custom-skill-v1/SKILL.md, README.md, CHANGELOG.md (3+ entries from Rounds 1, 2, 3 of the tuning loop), traces/ (2 real-use runs).
  4. /capstone/custom-plugin-v1/ — manifest, skills/, README.md, SECURITY.md (all seven questionnaire items answered), CHANGELOG.md, traces/ (2 real-use runs), and a clean-uninstall confirmation.
  5. The Module 7 retrospective in my-first-loop.md — 300–500 words answering the four retrospective questions.
  6. The updated Extension Posture section in my-first-loop.md — with the pre-module version visible as historical record.
  7. A short (2–3 sentence) instructor note on Q9 and Q10 — which description-tuning failure mode the student diagnosed and which row they chose for the hardest hygiene decision.

Transcript language. If a parent is assembling a transcript, the transcript line for Module 7 can accurately say: “Designed, audited, authored, and retired AI extensions — installed and evaluated community plugins with an outbound decision matrix, built and tuned a custom skill across three trigger-test rounds, wrapped it in a custom Cowork plugin with a seven-question security disclosure, and maintained a frozen extension register using a five-step quarterly hygiene ritual.”

Remediation if missed:

  • Q1: Re-read Lesson 7.1 Blocks 2–3. The six-place taxonomy is the navigation map for the whole module.
  • Q2: Re-read Lesson 7.3 Block 2. Re-run the description-tuning drill in /activities/module-07/description-tuning-drill.html.
  • Q3: Re-read Lesson 7.2 Block 4 (the outbound decision matrix). Re-audit one community plugin using the minimum-viable-audit worksheet.
  • Q4: Re-read Lesson 7.5 Block 3. The use-has-stopped signal is the quietest and the strongest.
  • Q5: Re-read Lesson 7.4 Block 3 (the seven-question security questionnaire). S6 and S7 are the plugin-specific additions.
  • Q6: Re-read Lesson 7.5 Block 3. Print the five signals and keep them near your workstation.
  • Q7: Re-read Lesson 7.3 Blocks 1–2. Description-as-classifier is the module’s headline technical insight; if it is not clear, nothing else downstream works.
  • Q8: Re-read Lesson 7.4 Block 4 (least-privilege, applied). Re-audit your plugin’s SECURITY.md answers against your actual skill’s needs.
  • Q9: The description-tuning muscle is the load-bearing authoring skill of Module 7. Return to Lesson 7.3. Run a fourth tuning round; add a changelog entry. This is a core module capability and must be demonstrated before Module 8.
  • Q10: The hygiene ritual is the load-bearing discipline of Module 7. Return to Lesson 7.5. Re-run the ritual on at least three rows with the worksheet open; produce a written ritual log alongside the register.

If the student passes at 11.5 / 15 or above with at least one applied question at full credit, Module 7 is complete and Module 8 can begin. Below that bar, target remediation to the specific lesson(s) listed above before moving on.

Next up

Module 8 — Agent Orchestration.

The skills and plugin you shipped in Module 7 are the natural components Module 8 composes. Module 8 teaches you to design systems where multiple agents hand off work — sequential pipelines, parallel workers, hierarchical supervisors — and to debug the handoff failures that only show up when more than one agent is in the loop. A clean Module 7 register makes Module 8 possible; a messy one makes Module 8 miserable.

Module 8 opens when the Module 7 portfolio is complete — this check, the frozen register, the custom skill, the custom plugin, and the Module 7 retrospective.

Open Module 8 →