Module 7 · Lesson 7.3 · Headline authoring lesson

Building your first custom skill

This is the module’s headline authoring lesson. A Claude Code skill is a folder. SKILL.md has a structured header at the top (called the frontmatter) and a body. The description is a live classifier — the prompt the agent reads to decide whether to fire this skill at all — and it has three predictable failure modes: vague, overfit, triggerless. You will author one skill from scratch, run the three-round tuning loop, save two real-use traces, and freeze the skill into your capstone with a changelog that shows what you learned.

Stage 1 of 3

Read & Understand

3 concept blocks

What a skill actually is on disk CORE

A Claude Code skill is a folder. That is the whole story. The folder lives at one of three addresses — user-scope ~/.claude/skills/, project-scope <project>/.claude/skills/, or inside a plugin’s skills/ folder. Inside, the only file that is strictly required is SKILL.md. Everything else — example inputs, scripts, templates, reference data — is optional, and lives alongside SKILL.md in the same folder.

SKILL.md has two sections. The frontmatter is a YAML block at the top bounded by --- lines. It carries structured fields the agent reads before it reads anything else — minimally name and description, optionally model (which model this skill prefers), allowed-tools (a list naming which tools this skill may use, for students who want to lock a skill to a narrow tool surface), and a handful of other conventional fields. The body is the prose that follows the frontmatter — the actual instructions the agent consults once it has decided this skill applies.

Concretely, a minimal SKILL.md looks like this — frontmatter on top, body underneath:

---
name: research-sweep
description: One sentence about what this skill does and when to use it.
---

(The body goes here — the step-by-step the agent follows once this skill fires.)

The two --- lines are the fences that mark the frontmatter; everything between them is structured fields the agent reads first. Everything after the closing --- is the body. That is the entire shape; the rest of this lesson is about authoring each half well.
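For reference, here is the same frontmatter with the optional fields from above filled in. This is a sketch: the field names (model, allowed-tools) are the ones this lesson names, but the values and the exact list syntax are illustrative and may differ in your installation.

---
name: research-sweep
description: One sentence about what this skill does and when to use it.
model: sonnet            # illustrative: which model this skill prefers
allowed-tools:           # illustrative: lock the skill to a narrow tool surface
  - WebSearch
  - Write
---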

The split is important because the agent reads the frontmatter before it reads the body. When the agent is deciding which of the (possibly dozens of) installed skills to reach for, it reads descriptions — not bodies. A skill whose body is a masterpiece of prompt design but whose description is vague will never fire. A skill whose description is tight but whose body is thin will fire and then do the job badly. Both halves have to be good; they are good for different reasons.

Two corollaries worth naming:

  • The description field is a prompt. It is not a blurb for a directory page. Every word in it is consumed by a model making a routing decision. Write it as if you were giving a smart intern instructions for when to pull this folder off the shelf.
  • The body is also a prompt. Once the skill fires, the agent reads the body and uses it as instructions for how to do the work. The same rules apply as everything you have learned about prompting — be specific, name the input shape, name the output shape, name the failure modes, show an example.

A useful mental model is that a skill is a two-layer prompt. The outer layer (description) answers “does this job belong to me?” The inner layer (body) answers “given that it does, how do I do it?” A well-authored skill has distinct, honest answers at both layers.

Description as a classifier: three failure modes CORE

Module 7’s headline technical insight is that the description field is not decoration; it is a live classifier. Three predictable ways students write descriptions that fail:

Mode 1 — Vague description, never fires. “Helps with research.” The agent will almost never reach for this skill because nothing distinctive is claimed. Every time the student asks for research help, the agent decides the skill is no more relevant than any other generic option and produces a generic chat response. The student concludes “skills are mysterious” and stops building them.

The fix is specificity: “Runs a multi-source research sweep on a named topic. Use when the user asks to triangulate sources, check citations, produce a sources.md, or build a competitive landscape. Inputs: a topic string. Outputs: a sources.md file with at least three independent sources, one fabrication-risk callout, and a one-paragraph synthesis.”

Mode 2 — Overfit description, fires on adjacent wrong work. “Runs when the user talks about research.” Too broad. The agent will fire this skill on any research-adjacent request — a single summarization, a casual question about a paper, a quick curiosity lookup — whether or not the skill’s actual body is the right tool for that request. The student gets a heavy-weight triangulation procedure applied to a one-line question and wastes tokens, time, and attention.

The fix is bounding: include an explicit exclusion block. “Do not use for single-article summaries, casual curiosity lookups, or when the user has already named sources and only wants you to read them. For those, prefer in-session prompting or the quick-summary skill.”

Mode 3 — Honest-about-outputs, silent-about-triggers. “This skill produces a sources.md with three independent sources and a fabrication-risk callout.” Technically accurate, but the agent reads this and does not know when to use it. It sounds like a description of output, not a trigger for when the skill applies. The agent reaches for it rarely, and when it does, it is by luck rather than design.

The fix is an explicit triggering phrase list: “Use when the user asks for: a research sweep, source triangulation, a sources.md, a competitive landscape, or a fabrication check. Use when the user names a topic and wants independent sources. Do not use when the user wants a single article read or a casual summary.”

A good description is descriptive + inclusive + exclusive, in roughly that order. It describes what the skill does in one sentence, lists the phrases that should invoke it, and names the adjacent cases it explicitly does not cover. The order matters because the description is read top-down by the model; put the descriptive sentence first so the model knows what the skill is before it decides whether to reach for it.
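Pulled apart, the research-sweep description used throughout this lesson separates cleanly into the three parts (labels added here for teaching; they do not appear in the file):

Descriptive: Runs a multi-source research sweep on a named topic.
Inclusive:   Use when the user asks to triangulate sources, run a
             research sweep, produce a sources.md, check citations,
             or build a competitive landscape.
Exclusive:   Do not use for single-article summaries, casual
             lookups, or when the user has already named sources
             and only wants them read.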

Write descriptions for your future agent, not your current self

It is tempting to write descriptions that make sense to you reading them. That is fine for naming, but wrong for routing. The description is read by a model that has no idea what is inside the body; it only sees the description and has to decide. Read your description aloud, imagine you are the model, and ask: “Given only this text, would I pick this skill for the request ‘do a research sweep on carbon capture’?” If the answer is not a clean yes, the description needs more signal.

Anatomy of the body: what goes where CORE

The body is the part of SKILL.md that runs once the skill has been chosen. Five sections, in order, have proven to work well across the skills students write in this module:

1. Purpose paragraph (2–4 sentences). A tight restatement of what the skill does and who it is for. Yes, this overlaps the description. It is worth the overlap; the body is the first thing the agent reads as instruction, and starting with purpose re-aligns the model before it does the work.

2. Inputs (one section). List what the skill expects from the user. If the skill takes a topic, say so. If it takes a path to a file, say so. If it tolerates an empty input and will ask a clarifying question, say so. A skill that is silent about inputs will, over many runs, do inconsistent things; describing inputs makes the behavior reproducible.

3. Procedure (the longest section). The step-by-step the agent should follow when the skill fires. Numbered steps. Every step should be concrete — “read the topic, search three source types (academic, news, primary), triangulate disagreements, flag any source that lacks a verifiable link as a fabrication risk, write sources.md with the schema in Appendix A.” Generic steps produce generic output. Specific steps produce the kind of output the description promised.

4. Output contract (one section). What the skill produces. File name conventions, section headings, any required fields. If the skill writes multiple files, name each one. If the skill writes nothing and only returns a chat response, say that and describe the response shape. The output contract is what students use later, in the invocation trace, to verify the skill did the named job.

5. Failure modes (one short section). The predictable ways this skill can go wrong, and what the skill should do when each occurs. “If no independent sources are found, say so and produce a sources.md with only the primary source and an explicit lowconfidence: true flag.” “If the topic cannot be narrowed to a single research question, pause and ask the user to narrow before searching.” This is the section students most commonly skip and most commonly wish they had written when the skill misfires in week three.

Optional sections worth adding when relevant:

  • Examples: one concrete input/output pair is worth more than many abstract sentences.
  • Appendices / schemas: if the output contract references a specific file schema, keep it here.
  • Supporting scripts: if the procedure calls a script for deterministic formatting, the script lives in the same folder and the body refers to it by name.

A good body is short enough to read in one sitting. A skill body that does not fit comfortably in one read is usually two skills.
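Assembled, a body skeleton with the five sections looks like this (a sketch; the headings are a convention of this lesson, not a requirement of the format):

## Purpose
Two to four sentences: what this skill does and who it is for.

## Inputs
- topic (string, required). If empty or too broad, ask one
  clarifying question before proceeding.

## Procedure
1. First concrete step.
2. ...

## Output contract
Writes sources.md using the schema in Appendix A: required keys,
section headings, one-paragraph synthesis at the bottom.

## Failure modes
- No independent sources found: say so; write sources.md with only
  the primary source and an explicit lowconfidence: true flag.
- Topic cannot be narrowed: pause and ask the user to narrow.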

Stage 2 of 3

Try & Build

3 recipes + activity

Walkthrough: the research-sweep skill RECIPE

Tool: Claude Code CLI; ~/.claude/skills/research-sweep/; skill trace flag
Last verified: 2026-04-18
Next review: 2026-07-18
Supported OSes: macOS, Linux, Windows

The goal of this walkthrough is not to ship research-sweep (the course provides a reference version in the Recipe Book). The goal is to walk through the authoring move-by-move so the student internalizes the workflow on a skill they will then adapt in the activity for their own candidate. The canonical version of this walkthrough, with exact commands and file paths, lives at /recipe-book/author-a-claude-code-skill.md.

Step 1 — Create the folder. Choose a scope. User-scope (~/.claude/skills/research-sweep/) is appropriate for a skill you want available everywhere. Project-scope (<project>/.claude/skills/research-sweep/) is appropriate if the skill is specific to one project’s conventions. For Lesson 7.3, the student uses user-scope unless they have a strong reason not to. Create the folder; create an empty SKILL.md inside.
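At the end of Step 1 the folder is as bare as it sounds:

~/.claude/skills/research-sweep/
└── SKILL.md    (empty, for now)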

Step 2 — Write the frontmatter. The minimal frontmatter for this walkthrough:

---
name: research-sweep
description: >-
  Runs a multi-source research sweep on a named topic.
  Use when the user asks to triangulate sources, run a research
  sweep, produce a sources.md, check citations, or build a
  competitive landscape on a specific topic. Inputs: a topic
  string. Outputs: a sources.md file with at least three
  independent sources, one fabrication-risk callout, and a
  one-paragraph synthesis. Do not use for single-article
  summaries, casual lookups, or when the user has already named
  sources and only wants them read.
---

Notice the description is long. That is deliberate; terse descriptions are the most common Mode-1 failure. Notice also the >- marker: a multi-line description needs YAML’s folded block style, because a bare multi-line value containing inner colons (Inputs:, Outputs:) will break strict YAML parsers.

Step 3 — Write the body. Five sections in the order from Content Block 3. For the walkthrough, the body’s Procedure section should look something like:

1. Read the topic string. If it is unambiguous, proceed. If it
   is broad (e.g., "climate change"), ask one clarifying
   question to narrow to a specific sub-question.

2. Search three source types: academic (Google Scholar, JSTOR
   if available, or the student's library), news (the last 90
   days of major outlets), and primary (government documents,
   official reports, organization statements).

3. Produce at least three independent sources from distinct
   author/institutional bases. A single author's blog post from
   three different dates is one source, not three.

4. Triangulate disagreements explicitly. If sources disagree on
   a fact, name the disagreement and the position of each
   source.

5. Flag any source that does not have a verifiable link or a
   stable identifier as a fabrication-risk with `risk:
   unverified`.

6. Write sources.md with the schema in Appendix A, including a
   one-paragraph synthesis at the bottom.

The schema in the appendix names the required keys (title, URL, author, published date, source type, summary, risk) and the structure of the synthesis paragraph.
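One plausible shape for a single entry under that schema (a sketch; the lesson fixes the required keys, not the layout):

## Source 1
title:       ...
url:         ...   (must be verifiable, or set risk: unverified)
author:      ...
published:   ...
source-type: academic | news | primary
summary:     one or two sentences on what this source claims
risk:        verified | unverified

The one-paragraph synthesis follows the last entry.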

Step 4 — Smoke test. Open a Claude Code session in a real project. Type a request that should fire the skill — “Run a research sweep on carbon capture patents filed in 2025.” Watch whether the skill triggers. If Claude Code has a skill-trace mode enabled (via its CLI flag; see the Recipe Book), the trace shows which skill the model chose and why. If the skill fires and produces a sources.md that conforms to the contract, the smoke test passed.

Step 5 — Save the trace. Whatever the chat session produced, save the transcript (or a screenshot of the key section) to <skill-folder>/traces/2026-04-XX-smoke-test.md. Traces are not decoration; they are the evidence that the skill does the job it claimed. You will save two real-use traces for the capstone.
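A trace needs only enough detail to verify the claim. A minimal sketch (counts illustrative):

# traces/2026-04-XX-smoke-test.md

Request: "Run a research sweep on carbon capture patents filed in 2025."
Skill fired: research-sweep (confirmed in the skill trace)
Output: sources.md with four sources, one flagged risk: unverified,
        one-paragraph synthesis present
Verdict: conforms to the output contract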

The three-round tuning loop RECIPE

A skill that passes the smoke test is not a finished skill. It is a candidate. Run the three-round tuning loop before freezing anything. Each round produces a dated entry in CHANGELOG.md so future-you can see what changed and why.
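An entry does not need to be long. Dated, with what changed and why, is enough (contents illustrative):

## 2026-04-XX (Round 1, trigger test)
Two of three should-fire requests fired; one should-not-fire request
also fired. Added one inclusive phrase and one exclusion. Re-ran:
all six routed correctly.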

Round 1 — Trigger test

Write six test requests — three that should fire this skill, three that should not. For research-sweep, good tests look like:

  • Should fire: “Run a research sweep on carbon capture patents filed in 2025.” “Triangulate sources on the rise of youth mental health interventions since 2020.” “Produce a sources.md on the Supreme Court case Patel v. United States.”
  • Should not fire: “Summarize this article I uploaded.” “What is 2+2?” “Draft an email to my teacher about the homework extension.”

Run each in a Claude Code session with a trace visible. Record which triggered and which did not. If a should-fire request did not fire, the description is undertuned — tighten it. If a should-not-fire request did fire, the description is overtuned — add a more specific exclusion. Changelog entry, dated.

Round 2 — Body test

The skill now fires correctly. Run three real requests end-to-end and read the outputs. Does the sources.md the skill produces actually match the output contract? Does it have at least three independent sources? Does it call out fabrication risk? Does the one-paragraph synthesis read as a synthesis, or as three bullet points pretending to be one? If the procedure produces output that does not match the contract, tighten the procedure in the body (not the description). Changelog entry.

Round 3 — Scope test

Stress-test the edges. Ask the skill to run on an intentionally too-broad topic (“climate change”) — does it ask a clarifying question, as the failure-modes section promised? Ask it to run on a topic with no English-language sources — does it degrade gracefully with a lowconfidence: true flag, or does it hallucinate a source to satisfy the three-source minimum? Ask it to run on a topic you already know well so you can verify the sources are real. Any failure mode the skill encounters that is not already named in the Failure modes section gets added to that section, with what the skill should do next time. Changelog entry.
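What graceful degradation might look like in the output file, reusing the walkthrough’s flag (a sketch; the lesson fixes the flag, not the layout):

lowconfidence: true

## Source 1
(the single primary source, with risk: unverified if it lacks a
stable identifier)

Synthesis: one paragraph stating that no independent corroboration
was found, and what evidence would change that.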

Three rounds is the discipline. Students who skip rounds produce skills that look fine in the first week and quietly misfire by week three. The changelog — even three short entries — is how future-you knows what you learned by tuning.

Two invocation traces, real runs

The capstone requires that the skill have been invoked successfully at least twice in real use before you freeze it. Do not treat the tuning rounds as the only invocations. Round 2 and Round 3 can produce the traces, but prefer two runs on requests that came from your actual life — the ones that prompted you to build the skill in the first place. A skill that only ever ran on tuning tests is a skill whose ecological validity has not been checked.

Freezing the skill into /capstone/ RECIPE

Once the skill has passed all three tuning rounds and produced two real-use traces, freeze it.

Folder layout under /capstone/custom-skill-v1/:

custom-skill-v1/
├── SKILL.md          (frontmatter + body; the only required file)
├── README.md         (2-3 paragraphs: what this skill is, why
│                      you built it, how to install)
├── CHANGELOG.md      (dated entries from your three tuning rounds)
├── appendices/
│   └── schema.md     (output contract schema if your body
│                      references one)
├── scripts/          (optional)
│   └── (any helper scripts)
└── traces/
    ├── 2026-04-XX-real-run-1.md
    └── 2026-04-XX-real-run-2.md

Not every folder is required. A skill that uses no scripts does not need scripts/. A skill with no schema does not need appendices/. The SKILL.md, README.md, CHANGELOG.md, and traces/ are required.

Register row. In /capstone/extension-register-v1-draft.md, fill in the custom-skill row using the template from /resources/module-07/extension-register-row-template.md. Key fields (a filled-in sketch follows the list):

  • Type: Claude Code skill, authored.
  • What it does: one sentence, in the same shape as your frontmatter description.
  • Where it lives: both the live install path (~/.claude/skills/<name>/) and the frozen path (/capstone/custom-skill-v1/).
  • Invocation: two or three phrases that reliably trigger it.
  • Audience: only me.
  • Budget / model: rough per-invocation cost estimate + the model family the skill is written against.
  • Next review: a date sooner than the standard 90-day cadence. Authored skills drift faster in your first year of authoring, so they earn a tighter review window.
  • Keep/retire: the condition that would make you retire (e.g., “retire if I stop doing project-based research for a full semester”).
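Filled in for the walkthrough skill, the row might read (a sketch; budget and dates are yours to measure):

Type:           Claude Code skill, authored
What it does:   Runs a multi-source research sweep on a named topic.
Where it lives: ~/.claude/skills/research-sweep/ (live);
                /capstone/custom-skill-v1/ (frozen)
Invocation:     “run a research sweep on…”, “triangulate sources on…”,
                “produce a sources.md on…”
Audience:       only me
Budget / model: rough per-invocation estimate; the model named in
                the frontmatter
Next review:    60–75 days out
Keep/retire:    retire if I stop doing project-based research for a
                full semester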

Try it — Author, install, invoke, and iterate your custom skill CORE

six deliverables · open the description-tuning drill →

This is the module’s largest Try-it. The deliverables: a skill folder live on your machine, a matching folder frozen under /capstone/custom-skill-v1/, two invocation traces, a changelog with at least three entries, the register row, and the completed description-tuning drill.

Step 1 — Pick your candidate. Return to your Lesson 7.1 five-candidate sort. Choose one candidate in the “Skill candidate now” pile. If your sort left that pile empty, pick the strongest “Skill candidate if the shape stabilizes” candidate. If you have options, prefer a candidate whose shape is more stable and whose work you do more often.

Step 2 — Sketch the skill on paper. Before you write any YAML, open /resources/module-07/skill-planner.md and fill it in: name, what it does in one sentence, inputs, outputs, procedure in 4–7 numbered steps, output contract (file name + schema), and at least three failure modes. This planner is where most of the thinking should happen; the actual SKILL.md is mostly a transcription.

Step 3 — Author SKILL.md. Create the folder at the scope you chose (user-scope unless you have a reason otherwise). Write the frontmatter with a descriptive + inclusive + exclusive description. Write the body with the five sections. Read the whole thing back aloud; a skill that reads awkwardly is a skill that will fire awkwardly.

Step 4 — Run the three-round tuning loop. Three rounds, one changelog entry per round. Do not skip. The temptation to declare the skill done after Round 1 is strong; resist it.

Step 5 — Run two real-use invocations and save the traces. Real requests from your actual work, not tuning tests. Save the traces under traces/.

Step 6 — Complete the HTML activity Description-tuning drill. Open /activities/module-07/description-tuning-drill/. Five pre-baked skill descriptions, each with five test requests. For each pair, say whether the skill would fire and explain why. The drill calibrates your eye before you return to your own skill for Step 7. (Can be done after Step 3 if you prefer.)

Step 7 — Freeze the skill and fill the register row. Copy the live skill folder into /capstone/custom-skill-v1/. Write the README.md (what this skill is, why you built it, how to install on a clean machine). Fill in the custom-skill register row. Set a next-review date that is sooner than the standard 90 days, since authored skills drift faster in the first year of authoring.

If the skill never fires reliably

Some students, despite the tuning loop, end Lesson 7.3 with a skill that fires correctly on only three of its six test requests. That is common. It means the description needs a fourth round, not that you have failed the lesson. Note the failure pattern in the changelog with the word UNRESOLVED, freeze the skill anyway (Module 7 freezes the current state, not the perfect state), and add an item to the Lesson 7.5 hygiene ritual: “description needs another round.” You will come back to it.
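The UNRESOLVED note can be short (a sketch):

## 2026-05-XX (Round 4, trigger re-test) UNRESOLVED
Fires on three of six trigger tests. Exclusions hold; the inclusive
phrases still miss paraphrased requests. Queued for the Lesson 7.5
hygiene ritual: “description needs another round.”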

Done with the hands-on?

When the recipe steps and any activity above are complete, mark this stage to unlock the assessment, reflection, and project checkpoint.

Stage 3 of 3

Check & Reflect

key concepts, quiz, reflection, checkpoint, instructor note

Quick check

Five questions. Tap a question to reveal the answer and the reasoning.

Q1. A skill that never fires, despite the work clearly being in its lane, most likely suffers from which failure mode?
  • A Overfit description.
  • B Vague description.
  • C Body that is too long.
  • D Missing supporting script.
Show explanation

Answer: B. The agent is not even opening the folder because nothing in the description distinguishes it from generic chat. A produces misfires, not under-fires. C affects execution after triggering, not triggering itself. D is a body-level issue. The fix for Mode 1 is specificity in the description: name inputs, outputs, and at least one triggering phrase.

Q2. The recommended order for a description field is:
  • A Exclusive → descriptive → inclusive.
  • B Descriptive → inclusive → exclusive.
  • C Inclusive → descriptive → exclusive.
  • D It does not matter.
Show explanation

Answer: B. The model reads top-down. Describe what the skill is, then name the phrases that invoke it, then fence in the adjacent cases it does not cover. A and C bury the lede. D is the stance that leads to skills that never fire.

Q3. Which section of the body do students most commonly skip and most commonly regret skipping?
  • A Purpose.
  • B Procedure.
  • C Output contract.
  • D Failure modes.
Show explanation

Answer: D. Skills that do not name their failure modes quietly hallucinate through edge cases for weeks before anyone notices. Purpose and procedure are hard to skip — without them the skill does nothing at all. Output contract is easy to write. Failure modes is the one that feels skippable and is not.

Q4. You finish the trigger test (Round 1). All six test requests routed correctly. What should you do next?
  • A Ship the skill.
  • B Run Round 2 (body test) — the trigger is working, but you have not verified the skill produces outputs that match the contract.
  • C Run Round 3 (scope test).
  • D Add a new skill for the adjacent case that almost fired.
Show explanation

Answer: B. A skill that fires is not a skill that works. Round 2 checks whether the procedure produces the promised output. Skipping to A or C leaves the body untested. D is the pile-up pattern: adding skills instead of tuning the one you have.

Q5. The Module 7 capstone requires two invocation traces for the custom skill. Why does the lesson recommend the traces come from real-use requests rather than the tuning rounds?
  • A Tuning traces do not save to disk.
  • B The skill was tuned against those very requests; real requests are the ecological validity check.
  • C The agent does not produce traces in tuning rounds.
  • D Real-use traces are shorter.
Show explanation

Answer: B. A skill that only ever runs on its own test cases has been verified against itself. Real use surfaces the edge cases the author did not think to test for — and those are exactly the cases that matter for whether the skill earns its keep.

Reflection prompt

Which of the three tuning rounds taught you the most about your skill?

In 4–6 sentences: Was the bottleneck in whether the skill fired (description), whether it produced the right output when it fired (procedure / contract), or how it handled the edges (failure modes)? What does the answer tell you about where your skill-authoring instincts need the most calibration — and what one change would you make the next time you author a skill from scratch?

The answer is diagnostic. If Round 1 was the bottleneck, your instinct for description-as-classifier needs practice. If Round 2 was the bottleneck, your body is vague where it needs to be specific. If Round 3 was the bottleneck, you were over-optimistic about how often the edge cases would show up. All three failure modes are normal; knowing which is yours tells you what to work on in Lesson 7.4.

Project checkpoint

One frozen skill, two real-use traces, a changelog, and the register row.

By the end of this lesson, you should have:

1. Your custom skill live on your machine at the scope you chose (~/.claude/skills/<name>/ or <project>/.claude/skills/<name>/).

2. /capstone/custom-skill-v1/ containing SKILL.md, README.md, CHANGELOG.md with three+ entries, and traces/ with two real-use invocation traces. Optional appendices/ and scripts/ as needed.

3. The custom-skill row filled in /capstone/extension-register-v1-draft.md with a next-review date 60–75 days out.

4. Completed description-tuning drill HTML activity.

Do not proceed to Lesson 7.4 until the skill is frozen and the register row is complete. Lesson 7.4 will wrap this skill (or a second one of its size) into a Cowork plugin.

Next in Module 7

Lesson 7.4 — Building and packaging a custom plugin (Cowork path).

Wrap this skill (or a second one of its size) into a Cowork plugin. Write a manifest with the seven required fields. Answer the seven-question security questionnaire — including S6 (what installing the plugin removes from user control) and S7 (update posture). Run the uninstall test. Freeze the plugin with its own SECURITY.md.

Continue to Lesson 7.4 →