Three buckets, one principle CORE
Data classification sounds bureaucratic. It is not. It is the shortest way to turn “where should this go?” from a per-decision judgment call into a mechanical lookup — and a mechanical lookup is the kind of rule a tired student follows correctly at 11 p.m. when a judgment call would be wrong.
The three buckets Module 9 commits to:
- Public. Information that already lives on the open internet, or would be fine if it did. Examples: a Wikipedia article the student summarized, a public blog post, an open dataset, a news headline, a paragraph from a textbook, code from a public GitHub repository, a quote from a published book. The defining test: if a stranger read this tomorrow, would anything change? If nothing would change, it is public.
- Personal. Information about the student, their immediate household, or a third party the student has a relationship with, that the student has not chosen to publish. Examples: a draft of the student’s essay, a calendar event, an email thread, a note about a friend’s birthday, a meeting prep document, a research brief for a school project that has not yet been submitted. The defining test: would the student be fine if the same stranger read this tomorrow? If the answer is “I’d rather they didn’t, but it wouldn’t be a problem,” it is personal.
- Sensitive. Information whose disclosure would create real, non-trivial harm — to the student, their household, or a third party. Examples: medical or mental-health detail, financial-account specifics, immigration or legal status information, anyone else’s password or secret key, the student’s own credentials beyond what is already covered by Lesson 9.3, a third party’s private information the student was trusted with (a friend’s health situation, a sibling’s search history). The defining test: would the student or someone they care about experience real harm if a stranger read this tomorrow? If the answer is yes, it is sensitive.
The one principle that holds the scheme together: the lower a bucket’s tolerance for disclosure, the tighter the routing rule that follows must be. Public data may go anywhere. Personal data may go to cloud models whose terms the student has read and accepted. Sensitive data goes to a local model or not through an agent at all. The principle is mechanical and does not require judgment at the moment the data is about to be processed.
The scheme is deliberately coarse — three buckets, not seven. Finer classification schemes exist (some organizations use six or more tiers); they add complexity that one student cannot maintain. Three is the coarsest scheme that captures the distinction that matters: does disclosure matter? → no (public), yes but mildly (personal), yes and a real problem (sensitive).
What sensitive actually means (and why students under-classify it) CORE
The bucket students most often mis-classify is sensitive. The common failure mode is to label sensitive data as personal because the student does not, in the moment, feel threatened. The lesson’s correction: sensitivity is a property of what the data could become, not how the student feels about it right now. Three examples that students typically under-classify on first pass:
A friend’s mental-health remark in an email. A friend mentions in an email that they are struggling with anxiety. The student’s inbox triage agent processes the email. The friend’s mental-health status is sensitive — not because the student feels threatened, but because its disclosure would affect the friend. The student’s comfort is not the test; the friend’s downside is. An agent processing this email should not be running against a model whose terms allow training on user content, and the email body should not land in a shared-state folder that gets committed to git.
A financial-account digit a parent mentioned in passing. A parent texted the student the last four digits of their credit card to confirm a purchase. An agent summarizing the day’s messages reads the text. Those four digits are sensitive — not because a four-digit suffix alone is reconstructible, but because, combined with other information an attacker could find elsewhere (the parent’s name, approximate income, usual vendors), it moves them closer to an account-takeover. Information does not have to be standalone-catastrophic to be sensitive; contribution to a larger disclosure is enough.
The student’s own medical condition, even if “minor.” The student is being treated for something routine — allergies, anxiety, a skin condition. The student mentions it in a note they dictated for a personal-health-tracking agent to summarize. Medical information is sensitive by default, regardless of the student’s own assessment of its severity, because the downside of miscategorization is disproportionate (it affects insurance, employment, relationships in ways the student may not be able to fully predict at the moment of disclosure). The lesson’s rule: medical data is sensitive, full stop.
Two further notes on sensitivity:
- Sensitivity is sticky. Data that starts personal and later gets mixed with sensitive context inherits the sensitive rating. A summary of five emails is sensitive if any one of the five was sensitive. This is why the worksheet in the activity below asks you to rate data at the folder level as well as the artifact level: once a sensitive file has landed in a folder, the folder’s routing rule tightens.
- Third-party sensitive data is the hardest call. Most students are generous with their own data (“I don’t care if a cloud model sees my draft essay”) and stricter with data about others (“I would care if a cloud model saw my sister’s email to me”). The lesson’s guidance: when in doubt about a third party’s disclosure tolerance, rate up. Personal becomes sensitive; sensitive becomes “do not process through an agent at all.” The cost of being too conservative here is a slightly slower workflow; the cost of being too permissive is a trust breach with someone who did not consent to the routing.
Routing rules derived from classification CORE
The classification scheme is not the rule; the routing rules it implies are. Here are the rules Module 9 commits to.
Public data → any model. A Wikipedia article summarized by a research agent, a news headline triaged by an inbox agent, a public-repo code file reviewed by a coding partner. These can go to your cloud-default model, your local model, a cloud model you are evaluating, or a third-party MCP. Nothing about public data constrains the routing.
Personal data → vetted cloud model, or local. Personal data may go to a cloud model whose terms the student has read and accepted — which in practice means the default cloud model named in Module 2 (Anthropic’s Claude, OpenAI’s GPT family, or whichever the student committed to), and specifically the non-training-on-user-data tier of that provider if the provider offers one. Personal data does not go to cloud models the student has not vetted (most third-party-MCP-hosted models), to models whose terms allow training on user content without an opt-out, or to experimental models on platforms the student is trying for the first time. The local model is also a valid destination for personal data; the choice between local and vetted-cloud for personal data is a quality/convenience tradeoff, not a security one.
Sensitive data → local, or not through an agent at all. Sensitive data stays on the student’s own hardware. In practice this means: Ollama or LM Studio running a model the student pulled to their machine, with no network egress on the inference call. If the task genuinely requires a cloud-scale model to complete — if the local model cannot do the job well enough — the student’s options are (a) process the sensitive data themselves without an agent, or (b) redact/generalize the sensitive portions until the remaining task is personal-class, then route accordingly. The option that is not available is “just this once, I’ll send the sensitive data to the cloud model.” The whole point of the rule is that it is categorical, because exceptions are what lead to slow slips in posture.
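Redaction, exit (b) above, can be mechanical too: strip the sensitive specifics before the prompt is built, then route the remainder as personal-class. A minimal sketch; the note text, the sed pattern, and the placeholder are all invented for illustration:

```shell
# Replace the sensitive specific with a generic placeholder before any cloud routing.
# The pattern below is illustrative; real redaction rules depend on your own data.
note="Refill reminder: pick up sertraline prescription Tuesday"
redacted=$(printf '%s' "$note" | sed 's/sertraline prescription/\[medication\]/')
echo "$redacted"   # → Refill reminder: pick up [medication] Tuesday
```

The redacted version carries the task ("remind me Tuesday") without the sensitive specific, which is what makes it personal-class and routable to the vetted cloud default.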
There is an implicit fourth rule for data that is not supposed to be in an agent’s context at all: information the student does not own and was not given permission to redistribute. A friend’s private message forwarded to the student in confidence, an in-confidence document someone shared privately, a conversation overheard and written down. This data is not the student’s to route anywhere — not even locally — without the originator’s consent. Module 9 flags this once here and treats it as a responsibility question more than a routing question; Lesson 9.5 will revisit it under the ethics contract.
What about data that is public but embarrassing? A public tweet the student would rather not have their essay-writing agent process. The classification rule still applies: it is public (the test is “a stranger could read this tomorrow,” which for a public tweet is already true). The routing rule is “any model.” The student may choose to route it locally for their own comfort; that is fine, but it is a preference, not a security rule.
What about data that is personal but would become sensitive if combined with other data? The example from Content Block 2 — four digits of a card, plus the parent’s name — is the case. If the agent’s context will contain both pieces together, rate up: the combined artifact is sensitive, and the routing rule for the combined artifact applies to the agent call that reads both.
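Because the routing rule is a lookup, it can literally be written as one. A sketch in shell; the function name and destination strings are illustrative, and your posture document remains the authoritative copy:

```shell
# route_for_class: mechanical lookup from data class to allowed destinations.
# Names and wording here are illustrative, not part of the lesson's required artifacts.
route_for_class() {
  case "$1" in
    public)    echo "any model" ;;
    personal)  echo "vetted cloud default or local" ;;
    sensitive) echo "local only, or no agent" ;;
    *)         echo "unclassified: do not process" ;;
  esac
}

route_for_class sensitive   # → local only, or no agent
```

The default branch is the same idea as the lesson's rate-up guidance: anything not explicitly classified gets the strictest treatment.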
Finishing Section 3 (trust boundaries for the pipeline) CORE
Lesson 9.2 drafted Section 3 — Trust boundaries — and left the pipeline-specific portion for this lesson, because pipelines cannot be fully hardened until the data classification is done. Now finish it.
For your frozen pipeline from Module 8 (/capstone/pipeline-v1/), add to Section 3 of the posture document:
- The pipeline’s data classification. Usually this is the strictest classification of any artifact the pipeline reads or writes. For most students, the frozen pipeline reads public data (web search results), produces personal data (summaries, drafts), and is therefore personal-class for routing purposes. If your pipeline touches sensitive data at any step, re-classify.
- The per-stage trust boundary hardening. For each stage, note the segregation-and-refusal framing applied to its prompt, and the containment (audience-only-you) applied to the pipeline as a whole. If a stage does not yet have segregation-and-refusal applied, add it now.
- The shared-state folder hardening. The folder where stages hand off to each other is, itself, an injection surface (Lesson 9.2, Surface 3). The hardening here is: (a) the downstream stage’s prompt treats upstream-stage output as untrusted (it applies segregation and refusal to the upstream output), and (b) the shared-state folder is not a folder any third-party MCP or unaudited plugin can read or write.
Add a sentence to the posture document marking Section 3 as complete, with the date. This is the only section that was split across two lessons by design; the completion marker is how you know you have closed the loop.
Walking your capstone and labeling every artifact RECIPE
This is the main hands-on block of the lesson.
What to walk. Your capstone folder is the authoritative set, but you have more artifacts than just what is under /capstone/. Use this checklist:
- Everything under /capstone/ — my-first-loop.md, workstation manifest, Module 3 codebase posture, Module 4 source posture, Module 5 inbox/calendar posture, Module 6 schedule register, Module 7 plugin/skill register, Module 8 pipeline blueprint, /capstone/pipeline-v1/ and its shared-state subfolders, /capstone/security-posture.md itself.
- Every scheduled-task definition and the folders those tasks read from and write to (from Module 6 and Module 8).
- Every agent’s “input sources” — which folders, which email labels, which calendars, which URLs or URL patterns.
- Any notes-app page an agent reads, any sync folder an agent watches.
How to label. Open /resources/module-09/data-classification-table/ and fill one row per artifact (or per clearly-distinct group of artifacts — you do not need a row for every file in a public-repo source tree). Columns:
- Artifact — the file/folder/source, in plain language.
- Class — public / personal / sensitive.
- Reason — one phrase. (“Wikipedia article,” “my own unpublished essay draft,” “third-party mental-health context”).
- Routing rule — which models may process this artifact.
- Current routing — which models actually process it today.
- Gap? — yes/no. “Yes” if current routing does not match the required rule.
Honesty is the whole deal. The table is only useful if the “current routing” column is honest. Most students discover at least one gap — a folder they have labelled personal but whose contents have been read by a third-party MCP the student installed casually, or a file they now realize is sensitive but was being processed by their cloud-default research agent. The gap is not a failure of the lesson; the gap is why the lesson exists. The next block is the drill that closes the most important one.
Folder-level inheritance. When filling the table, work outside-in: label folders first, then walk their contents. The folder inherits the strictest classification of any child; a personal folder containing one sensitive file is a sensitive folder for routing purposes. This is why the /capstone/pipeline-v1/ shared-state folder is usually rated personal-or-sensitive on first pass — the research agent may drop a web-page quote (public) into the same folder where a personal email was previously summarized, and the folder’s rating is the strictest of any of its contents.
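The strictest-of-children rule is also mechanical. A sketch, assuming a numeric ranking that is purely illustrative (higher number means stricter class):

```shell
# rank: map a class name to a severity number; higher = stricter.
rank() {
  case "$1" in
    public)    echo 0 ;;
    personal)  echo 1 ;;
    sensitive) echo 2 ;;
  esac
}

# folder_class: return the strictest class among the arguments,
# i.e. the folder inherits the rating of its most sensitive child.
folder_class() {
  strictest="public"
  for c in "$@"; do
    if [ "$(rank "$c")" -gt "$(rank "$strictest")" ]; then
      strictest="$c"
    fi
  done
  echo "$strictest"
}

folder_class public personal public   # → personal
folder_class public sensitive         # → sensitive
```

One sensitive file is enough to tip the whole folder, which is exactly why the shared-state folder's rating tends to ratchet upward over time.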
Re-routing one sensitive flow to a local model RECIPE
Pick the highest-impact gap from the previous block — the one where current routing is cloud and required routing is local — and re-route it. If your table has no such gap (some students’ systems do not touch sensitive data at all), create one for the drill: write a synthetic sample of sensitive-class data (for example, three invented sentences in the style of a personal health note), route it through your cloud-default once to see what happens, then re-route the flow so the same kind of data would go local next time. The drill is about having actually done the re-routing, so that the next time you need to do it in a hurry, you already have.
Route a sensitive sub-task to a local model RECIPE
| Tool | Ollama (or LM Studio) + your coding agent pointed at the local endpoint |
| Platform | macOS, Windows, Linux |
| Version tested | Ollama 0.4.x; LM Studio 0.3.x |
| Last verified | 2026-04-20 |
| Next review | 2026-07-20 |
Pre-flight
Confirm your local model stack from Module 2 is up. On the Ollama path: ollama list and ollama run <model-name> to confirm a conversation opens. On the LM Studio path: open the app, load a model, confirm the local server is running at http://localhost:1234/v1 (or wherever the app binds). The Recipe Book entries recipe-book/install-ollama-mac.md, recipe-book/install-ollama-windows.md, and recipe-book/install-lm-studio.md carry the current versions and paths.
```
# confirm local model stack is up (Ollama path)
$ ollama list
$ ollama run llama3.2

# or confirm LM Studio’s local server is bound
$ curl http://localhost:1234/v1/models
```
The re-routing, on each path
- Claude Code CLI. The CLI’s primary inference goes to the cloud model; you do not route the CLI itself through Ollama. What you route instead is the sensitive sub-task. Add a small helper — a local-model CLI call the Claude Code CLI session invokes via the Bash tool — that handles the specific step where the sensitive data would enter context. The pattern: the CLI orchestrates the non-sensitive steps; when the sensitive step comes up, the session calls ollama run <model> "<prompt including the sensitive data>" (the prompt is passed as a positional argument) and reads the local model’s output. The sensitive data never leaves the machine.
- Cowork tab. The scheduled task that handles the sensitive sub-task targets the local model directly via its HTTP endpoint (LM Studio exposes an OpenAI-compatible API at http://localhost:1234/v1; Ollama exposes an endpoint at http://localhost:11434). The Recipe Book entry routing-sensitive-data-to-a-local-model carries the current task-definition syntax. The task’s non-sensitive steps can still use the cloud default; the data-flow line is drawn around the sensitive step specifically.
```
# Claude Code CLI pattern: session invokes local model for the sensitive step
$ ollama run llama3.2 "summarize this health note: <sensitive text>"

# Cowork-tab pattern: scheduled task targets the local HTTP endpoint
$ curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"..."}'
```
Verify the data did not leave the machine
This is the step most students skip. Do not skip it.
- Ollama: watch ~/.ollama/logs/server.log during the run. You should see the request land against your local model with no outbound HTTP calls related to the inference. You can also, before the run, disable network on your machine briefly and confirm the local call still succeeds; this is the strongest possible evidence.
- LM Studio: LM Studio’s “Developer” tab shows inbound requests to the local server; you should see the request appear there and nowhere else.
- Both paths: a packet-sniffing tool like tcpdump or a lightweight alternative is overkill for the drill; watching the app-level logs is sufficient.
```
# watch the Ollama server log while the drill runs
$ tail -f ~/.ollama/logs/server.log
```
After-action
Add one line to Section 2 of the posture document: “Re-routed [name of flow] from [cloud-default] to [local model] on [date]. Verified local-only via [log path / app log].”
If your local model’s quality is not adequate for a sensitive flow you genuinely need
The options, in order: (1) redact or generalize the sensitive specifics until the remaining task is personal-class, then route to your vetted cloud model; (2) use a larger local model that fits on your hardware (check what your machine can run before giving up on local); (3) decide the task genuinely cannot be automated and do it yourself. What you do not do is “just this once” route the raw sensitive data to the cloud. The Module 9 posture is categorical for a reason — exceptions are the thing posture is designed to prevent.
Try it — Classify, route, verify RECIPE
deliverables: completed data-classification table; Section 2 of /capstone/security-posture.md; pipeline-specific portion of Section 3 finished; one sensitive-data re-routing drill completed and verified, with one sensitive-data flow actually re-routed to a local model
Part 1 — Walk and classify.
Open /resources/module-09/data-classification-table/. Walk every artifact, folder, and input source from the checklist in Content Block 4. Fill every column. Be honest on “current routing” and “gap?”. Do not combine rows to save typing — each artifact you will route differently should be its own row.
Part 2 — Draft Section 2 of the posture document.
In /capstone/security-posture.md (template at /resources/module-09/security-posture-template/), fill Section 2 — Data classification:
```markdown
## 2. Data classification

I classify data my agents touch into three buckets, with routing rules:

- **Public** (e.g., web pages, news, public repos): any model.
- **Personal** (e.g., my drafts, my calendar, my inbox): vetted cloud default (provider named in workstation manifest) or local model.
- **Sensitive** (e.g., medical, financial, others’ private info, credentials beyond Section 4): local model only, or no agent at all.

**Artifacts classified (summary — full table in `/resources/module-09/your-completed-worksheets/data-classification-table.md`):**

- `/capstone/pipeline-v1/` → personal-class folder (inherits).
- Inbox (Module 5) → personal, with sensitive sub-category for emails matching <pattern>. See Section 3 for trust-boundary notes.
- Calendar (Module 5) → personal.
- <any sensitive-class flows that need special routing, named>

**Gaps found and closed:**

- <list>

**Gaps found and deferred (with deadline):**

- <list — there should be very few; any deferred gap has a date>
```
Part 3 — Re-route one sensitive flow and verify.
Pick the highest-impact gap. If none, run the drill on a synthetic sample. Follow the recipe callout above step by step — pre-flight, re-route, verify the call stayed local via the app-level log. Add the after-action line to Section 2.
Part 4 — Finish Section 3.
In Section 3 of the posture document, add the pipeline-specific portion from Content Block 6: per-stage hardening, shared-state-folder hardening, and the classification of the pipeline overall. Mark Section 3 complete with today’s date.
Done with the hands-on?
When the recipe steps and any activity above are complete, mark this stage to unlock the assessment, reflection, and project checkpoint.
Quick check — quiz
Four questions. Tap a question to reveal the answer and the reasoning.
Answer: C. A Wikipedia article is public regardless of the student’s embarrassment — the “stranger could read this tomorrow” test already applies, because a stranger can read it tomorrow. (A), (B), and (D) are all sensitive under the Module 9 definition: (A) is a third party’s mental-health detail, (B) is financial information that could combine with other data to create real harm, (D) is private information the student does not own and was not given permission to redistribute.
Answer: B. Folder classification inherits the strictest rating of any content. This is the Module 9 rule that makes routing mechanical at the folder level and that explains why shared-state folders need careful hygiene. (A) is wrong — majority does not decide. (C) misses the point: any agent that reads the folder reads all its files. (D) is incorrect; a summary that includes a sensitive detail is sensitive.
Answer: B. The rule is categorical, and the two named exits are the ones the lesson endorses. (A) is exactly the posture decay the module is designed to prevent. (C) is not an action. (D) is a reasonable student choice but is stricter than the rule — local models are fine for sensitive data, and “sensitive data should never be processed by any agent at all” is not the Module 9 posture unless the student decides that for themselves.
Answer: C. App-level logs are sufficient verification for this class of check; packet capture is overkill for a one-student system. (A) skips verification, which is how routing claims end up fictional. (B) is unnecessary for this tier. (D) is not a verification method — you cannot take the model’s word for where it ran.
Reflection prompt
What the table actually revealed
Write a short paragraph (4–6 sentences) in your journal or my-first-loop.md in response to the following: Walking the data-classification table, what surprised you most — the artifact you expected to be sensitive but turned out to be personal, or the artifact you had treated as personal and now realize is sensitive? How many “gaps” did your table reveal, and what does the existence of those gaps tell you about how you had been routing data before this lesson? If you had to sum up the posture change in one sentence — “I used to route X; now I route Y” — what is the sentence?
The purpose is to notice the shift from “cloud by default because it’s better” to “data class by default, with local and cloud as the two legitimate destinations depending on the class.” Most students leave this lesson with a meaningful change in their default routing, not just in their documentation.
Project checkpoint
By the end of this lesson, you should have: a data-classification table filled in for every artifact and input source the agents touch; Section 2 — Data classification written in /capstone/security-posture.md; Section 3 — Trust boundaries finished (pipeline-specific portion added, including the pipeline’s own classification, per-stage segregation-and-refusal hardening, and shared-state-folder hardening, with the date marking Section 3 complete); one sensitive-data flow re-routed to a local model, with the local-only verification performed via the app-level log and the after-action line logged in Section 2; and the reflection paragraph in your journal or my-first-loop.md. Do not proceed to Lesson 9.5 until Section 2 is written in your own voice, the re-routing drill has actually run, and Section 3 carries a completion date.
Instructor / parent note
This lesson does three jobs. First, it installs the three-class taxonomy — public, personal, sensitive — as the load-bearing rule of Module 9’s data posture, with the single principle that holds it together: the lower the bucket’s tolerance for disclosure, the tighter the routing rule. Second, it walks the student through every artifact their agents touch and produces an honest data-classification table, where the “current routing” column is the honesty test. Third, it converts one sensitive-data flow from cloud-default to local, with a verification step at the app log — because a re-routing the student has never actually performed is a re-routing they will postpone the first time it matters.
Watch for three failure modes. The first is the student who under-classifies sensitive data because they do not, in the moment, feel threatened — the friend’s health remark that “isn’t a big deal,” the parent’s card digits that “aren’t enough to do anything with.” Walk them back to Content Block 2: sensitivity is a property of what the data could become, not how the student feels right now, and the lesson’s rule is to rate up when uncertain, especially for third-party data. The second is the student who labels the table honestly, finds a gap, and then skips the re-routing drill because “I know what I’d do if I needed to.” The drill is the deliverable, not the knowledge — insist on the actual re-routing, with the verification line logged in Section 2. The third is the student who wants to make an exception for a cloud model they prefer (“but this one is so much better at medical summarization”). The lesson’s answer is the two named exits in Q3 — redact and route personal-class, or keep it local and accept the tradeoff. “Just this once” is the posture decay the module is built to prevent.
Parent prompt if the student’s table shows zero gaps: “Which agent reads email? Which folder does a third-party MCP touch? Walk me through what happened the last time you asked a cloud model to summarize something that mentioned another person by name.” Zero gaps on the first pass is either remarkable discipline or under-reading; the conversation usually surfaces which. Parent prompt if the student skipped the verification step: “Show me the log line that proves the sensitive data stayed local.” If they cannot produce it, the drill is not finished.
Next in Module 9
Lesson 9.5 — Cost, responsibility, and the incident loop.
Name a monthly AI-spend budget, set the provider hard cap to match, write the Cost posture and Incident loop sections, name the one human you will tell when something goes wrong, and run one realistic incident drill end-to-end through the four-step response loop. After the drill, you will freeze /capstone/security-posture.md as the ninth capstone artifact.