Module 1 · Lesson 1.4

Context windows, memory, and state.

By the end of this lesson, you can distinguish three layers of agent memory, explain why context windows are finite and what happens when they fill, name four distinct “forgetting” modes with the directing move that fixes each, and apply three moves to a real task.

Stage 1 of 3

Read & Understand

5 concept blocks

Why memory is where most agent failures begin CORE

The loop from Lesson 1.3 carries state across turns — everything the agent has learned, decided, or read so far. State is the agent’s working memory for the task. Without it, every turn of the loop would start from nothing.

But state does not live in some magical infinite space. It lives inside the model’s context window — a fixed-size buffer of tokens the model can see on any given turn. When the state grows beyond what the window can hold, something has to leave. And when something leaves, the model behaves as if that information never existed.

Memory is easy to skim past in the mental model — which is exactly why it is where most beginners get tripped up. Students who can describe how a model reasons will still hit memory failures if they do not understand how context, scratchpads, and external state relate. When an agent's output goes sideways, the cause is more often something missing from what the model can see than something the model itself got wrong. This lesson makes that surface visible so you can direct around it.

Three layers of agent memory CORE

Agents have access to memory at three different time scales. They are not interchangeable.

Short-term — the context window. These are the tokens the model can actually see on the current turn. It includes the original task, the agent’s prior reasoning, prior tool calls and results, and any system instructions. Different models have different window sizes; as of this writing, windows range from about 8,000 tokens on smaller models to 200,000+ on the largest. A rough conversion: 1,000 tokens ≈ 750 English words. To make those numbers feel real: an 8,000-token window is roughly 6,000 words of text — enough for one short essay and a handful of follow-up questions. A 200,000-token window is roughly a 600-page novel — enough that you'd have to be working pretty hard to fill it. When you start a new conversation, the short-term window resets to empty.
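
If you like to see the arithmetic, here is a minimal Python sketch using the rough conversion above. The 0.75 words-per-token ratio is an approximation, not what any particular tokenizer reports, so treat the numbers as a sanity check rather than a measurement.

```python
# Rough token arithmetic using the ~750 words per 1,000 tokens conversion above.
# Real tokenizers vary by model and by language; this is only a sanity check.

def estimate_tokens(text: str) -> int:
    """Approximate token count: roughly 1 token per 0.75 English words."""
    return round(len(text.split()) / 0.75)

def fits_in_window(text: str, window_tokens: int) -> bool:
    """Would this text alone fit inside a window of the given size?"""
    return estimate_tokens(text) <= window_tokens

draft = "word " * 6_000                   # a ~6,000-word draft
print(estimate_tokens(draft))             # ~8,000 tokens
print(fits_in_window(draft, 8_000))       # True, but with no room left for anything else
print(fits_in_window(draft, 200_000))     # True, with enormous headroom
```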

Medium-term — in-task scratchpads. During one task, the agent can write notes to itself: a file saying “sources found so far,” a short running summary, a plan it updates as it goes. These notes survive inside the task but only if the agent deliberately writes them and reads them back. There is nothing automatic about medium-term memory — it happens because the agent (or the director) designed the task to produce and consume notes.
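
If you are curious what that looks like in practice, here is a minimal Python sketch of the pattern. The file name scratchpad.md and the note format are arbitrary choices for illustration, not a prescribed API; the point is that the notes only exist because something deliberately writes and reads them.

```python
from pathlib import Path

# Minimal in-task scratchpad: the agent (or the code driving it) appends notes
# as it works and reads them back before the next step. Nothing here is automatic.
SCRATCHPAD = Path("scratchpad.md")

def note(line: str) -> None:
    """Append one finding or decision to the running notes for this task."""
    with SCRATCHPAD.open("a", encoding="utf-8") as f:
        f.write(f"- {line}\n")

def recall() -> str:
    """Read the notes back so they can be placed into the next turn's context."""
    return SCRATCHPAD.read_text(encoding="utf-8") if SCRATCHPAD.exists() else ""

note("Source found: 2023 city budget PDF, page 14")
note("Decision: exclude sources older than 2020")
print(recall())   # prepended to the next prompt so the plan survives the turn
```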

Long-term — external state. Files on disk, databases, calendars, email archives, note apps, a dedicated memory file. Anything that lives outside this conversation and can be read by a tool call next week. Long-term memory survives across tasks and sessions. It is the only kind of memory that persists after the context window is wiped.
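
A minimal sketch of the same idea at the long-term layer, assuming a JSON file named agent_memory.json that a tool reads at the start of each session. The file name and fields are invented for illustration; the mechanism could just as easily be a database or a note app behind a tool call.

```python
import json
from pathlib import Path

# Long-term memory as a plain JSON file on disk. It survives because it lives
# outside any conversation; a new session only benefits if a tool call reads it.
MEMORY_FILE = Path("agent_memory.json")

def load_memory() -> dict:
    """Called at the start of a session, e.g. by a 'read memory' tool."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text(encoding="utf-8"))
    return {}

def save_memory(memory: dict) -> None:
    """Persist anything worth keeping past this conversation."""
    MEMORY_FILE.write_text(json.dumps(memory, indent=2), encoding="utf-8")

memory = load_memory()
memory["writing_style"] = "plain sentences, no jargon, 1,200-word cap"
save_memory(memory)
# Next week, a brand-new conversation starts with an empty window, but a
# directed tool call to load_memory() brings this preference back.
```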

The student’s most common wrong model is “the AI remembers me.” It does not, except where you have given it a tool and directed it to look. Every piece of information the agent will use on the next turn, you must either keep in the context window or make retrievable via a tool.

Why context windows fill up, and what happens when they do CORE

Everything in the conversation costs tokens. The original task costs tokens. Each tool result costs tokens — sometimes a lot of tokens, if the tool returned a 40-page PDF. The model’s own reasoning text costs tokens. A long back-and-forth conversation with an agent can easily consume tens of thousands of tokens even without attachments.

When state grows toward the cap, one of two things happens:

  1. The system rejects the request. Common when you try to upload a huge file or paste a very long document. You get an error. This is the friendly failure mode, because at least it tells you.
  2. The system silently truncates. Usually by dropping the oldest content from the window. The model stops seeing what happened earlier in the conversation, but its new turns still produce confident output based on what is left in the window. There is typically no alarm, no warning, and no visible sign in the trace that the window just lost its memory of Turn 3.

The second failure mode is the dangerous one. The agent appears to be working; it just happens to be working from an incomplete picture. You notice the problem when it contradicts something it agreed to ten minutes ago, repeats a step it already did, or loses track of a constraint you stated up front. That moment of “why is it suddenly acting like it forgot?” is almost always a context-window truncation at work.
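
Here is a simplified Python sketch of that second failure mode. Real serving stacks differ in the details of what they keep (the system prompt usually survives), but the drop-the-oldest behavior, and its silence, are the point.

```python
# Simplified sketch of silent truncation: when the conversation exceeds its token
# budget, the oldest non-system messages are dropped first. No error, no warning;
# the model simply stops seeing the early turns.

def rough_tokens(text: str) -> int:
    return len(text.split()) * 4 // 3          # ~1 token per 0.75 words

def truncate_to_fit(messages: list[dict], budget: int) -> list[dict]:
    """Drop the oldest non-system messages until the history fits the budget."""
    kept = list(messages)
    while sum(rough_tokens(m["content"]) for m in kept) > budget:
        droppable = [i for i, m in enumerate(kept) if m["role"] != "system"]
        if not droppable:
            break                              # nothing left that can be dropped
        del kept[droppable[0]]                 # oldest non-system message goes first
    return kept

history = [
    {"role": "system", "content": "You are a careful research assistant."},
    {"role": "user", "content": "The essay must be exactly 1,200 words and is due Friday."},
    {"role": "assistant", "content": "Understood. " * 3_000},   # one very long turn
    {"role": "user", "content": "Now draft the essay."},
]
visible = truncate_to_fit(history, budget=2_000)
print([m["content"][:40] for m in visible])    # the word-count constraint is gone
```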

Four “forgetting” modes, and the directing move for each CORE

When an agent seems to forget something, it is always one of four distinct modes. Naming the mode tells you the fix.

  1. Context-window truncation. The conversation grew past the window; the oldest tokens were dropped. Fix: use a model with a larger window, shorten the up-front context, or ask the agent to write a running summary to a scratchpad file.
  2. New conversation reset. You started a new chat, and the window is empty. The agent has no memory of the prior session — even if “you” are the same user, the tokens are gone. Fix: put the information the agent needs into external memory (a file, a memory system, a document) and direct the agent to read it at the start of the new conversation.
  3. Tool result never made it into state. The agent called a tool, but the result was too large, was filtered, or never came back. The next turn sees no evidence it happened. Fix: check the trace (Lesson 1.3), find the missing result, and rework the task so the tool call returns something small enough to fit or gets summarized before adding to state.
  4. External state not consulted. The information lives in a file the agent could read, but on this task the agent did not look. Fix: be explicit in the task — “first read notes.md, then proceed” — or build a habit into the agent’s instructions so it checks relevant external state every time.

The skill is the diagnosis. “The agent forgot something” is uninformative. “We hit the window cap and lost the third turn” tells you exactly what to do next.
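
To make the fix for mode 1 concrete, here is a minimal sketch of running-summary compaction: old turns get collapsed into one short summary instead of silently falling off the end. The summarize() function below is a placeholder for a model call, not a real library function.

```python
# Sketch of the mode-1 fix: periodically replace old turns with a short running
# summary that stays in the window (and can also be written to a scratchpad file).

def summarize(turns: list[str]) -> str:
    # Placeholder: a real implementation would ask the model, e.g.
    # "Summarize the decisions and constraints in these turns in 5 bullets."
    return "RUNNING SUMMARY of earlier turns: " + " | ".join(t[:40] for t in turns)

def compact(history: list[str], keep_recent: int = 4) -> list[str]:
    """Replace everything but the most recent turns with one summary entry."""
    if len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent

turns = [f"turn {i}: ..." for i in range(1, 21)]
print(compact(turns))   # one summary line plus the four most recent turns
```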

Three directing moves for memory CORE

Three moves turn up over and over as you advance through the course.

Budget context. Give the agent what it needs to start, not your whole hard drive. A sharply scoped task with 2,000 tokens of relevant input beats a vague task with 20,000 tokens of loosely related input almost every time. When you are tempted to paste “just in case” material, first ask whether the agent actually needs it for this turn. If it does not, leave it out.

Use external memory for anything that must persist. If the answer needs to survive past this conversation — or past this turn, for a really long task — write it somewhere. A file. A memory system. A document. Do not rely on the context window to hold something important across a session boundary. The context window is working memory; it is not long-term storage.

Split long tasks. A task that needs 300,000 tokens of reasoning is almost always better structured as three tasks of 100,000 tokens, each producing a file that the next task reads. You trade a little orchestration for enormous gains in reliability and cost. Module 8 is entirely about this pattern; you are meeting the core intuition here.
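
A minimal sketch of the split pattern, assuming run_agent() stands in for whatever agent interface you actually use; every name and task wording here is illustrative. Each stage stays small, and the file it writes becomes the next stage's input.

```python
from pathlib import Path

# Sketch of the split pattern: three smaller tasks chained through files, so no
# single run has to hold the whole job in one context window.

def run_agent(task: str) -> str:
    """Placeholder for a real agent call; replace with your tool of choice."""
    return f"[agent output for: {task[:50]}...]"

def stage(task: str, reads: Path | None, writes: Path) -> None:
    """Run one bounded task, feeding it the previous stage's file and saving its output."""
    context = reads.read_text(encoding="utf-8") if reads else ""
    result = run_agent(f"{task}\n\nInput from the previous stage:\n{context}")
    writes.write_text(result, encoding="utf-8")

stage("Gather sources on the topic and list key quotes.", None, Path("sources.md"))
stage("Turn the sources into a structured outline.", Path("sources.md"), Path("outline.md"))
stage("Draft the research brief from the outline.", Path("outline.md"), Path("brief.md"))
```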

When an agent seems to forget something, ask: which of the four forgetting modes, and which of the three directing moves fixes it? Two questions, and you'll have a working diagnosis for nearly any memory problem you meet.


Stage 2 of 3

Try & Build

activity

Try it CORE

Context-window overload

Goal. Feel, directly, what it is like for a context window to fill up and silently drop information.

Live version — run this in any AI chat interface you have. You do not need an agent; any long-form chatbot will do.

  1. Plant a fact. In your first message, say: “Remember, my favorite color is turquoise. You will need this at the end.”
  2. Fill the window. Over the next 8–12 messages, ask the model to do something verbose — write a long story, summarize a long article you paste in, explain something at length. Your goal is to push a lot of tokens into the conversation. If you have access to a smaller-window model or free tier, the effect will show up faster; if you are on a large-window model, you may need 20+ messages.
  3. Probe. Ask: “What was my favorite color?” Then ask: “Are you sure? Did I ever mention a different one?”

Observe. Did the model still remember? Did it hallucinate a different color? Did it hedge? Did it admit it could not recall? Do the same exercise in a brand-new conversation with the same model (skipping step 1) and ask “what is my favorite color?” — compare how the two answers feel.

Write down. Three sentences describing what you saw, and which of the four forgetting modes from Block 4 you observed (or did not). If you saw no forgetting, you have a big-window model — try again with more content, or accept that your current model handles this particular case fine and describe why that is.

Deliverable. Three sentences of observation plus a named forgetting mode. File it with your Module 1 work.

Done with the hands-on?

When the recipe steps and any activity above are complete, mark this stage to unlock the assessment, reflection, and project checkpoint.

Stage 3 of 3

Check & Reflect

key concepts, quiz, reflection, checkpoint, instructor note

Key concepts CORE

  • Agent memory has three layers: the context window (short-term), in-task scratchpads (medium-term), and external state (long-term). Only external state survives across conversations.
  • Context windows are finite (8K–200K+ tokens). When they fill up, either the request is rejected (friendly) or state is silently truncated (dangerous).
  • There are four distinct forgetting modes: window truncation, new-conversation reset, missing tool result, and unread external state. Each has its own fix.
  • Three directing moves: budget context, use external memory, split long tasks.
  • “The agent forgot something” is not a diagnosis. The diagnosis names the mode.

Quick check CORE

Four questions. Pick the best answer, then reveal the explanation — the why matters more than the letter.

Q1. Which statement best describes the relationship between state and the context window?
  • A State is stored permanently; the context window is just a temporary display.
  • B State is what the agent has learned during the task; the context window is the finite buffer of tokens the model can actually see on this turn.
  • C They are two different names for the same thing.
  • D State lives on the server; the context window lives in your browser.
Show explanation

Answer: B. State is the agent’s working knowledge for the task — including prior turns, tool results, and reasoning. The context window is the model’s fixed ceiling on how much of that state it can process at once. When state exceeds window capacity, something has to leave. A misstates persistence — state is not automatically permanent. C merges two distinct concepts that behave very differently. D is a red herring about infrastructure.

Q2. You have been chatting with an AI assistant for 45 minutes. You mentioned early on that your essay is due Friday and must be 1,200 words exactly. Now, near the end of the session, the assistant drafts a 2,000-word essay and seems to have forgotten the deadline entirely. Which forgetting mode is most likely at work?
  • A New-conversation reset.
  • B External state not consulted.
  • C Context-window truncation.
  • D The model never understood the instruction in the first place.
Show explanation

Answer: C. A long conversation (45 minutes, many turns) with a reasonable-size window can easily push early instructions out of the visible state. The model now produces confident output based on what remains in the window — which no longer includes your word-count constraint. A is wrong because the conversation never reset. B would apply only if the instruction had been saved to an external file. D is possible but less likely given the classic pattern of “the early constraint gets dropped after a long conversation.”

Q3. You want an agent to remember a piece of information — say, your writing style preferences — across every future conversation. Which layer of memory should you use?
  • A The context window.
  • B An in-task scratchpad.
  • C External state (a memory file, profile, or similar).
  • D Nothing — the model will remember if you say it confidently enough.
Show explanation

Answer: C. Only external state persists across conversations. The context window resets when you start a new chat; in-task scratchpads only survive the current task. External state — a memory file the agent is directed to read — is the only layer that carries information forward indefinitely. D reflects a common mental-model failure: models do not have a sense of “you” across sessions unless you give them a tool to retrieve one.

Q4. An agent is drafting a long research brief and seems to contradict something it established earlier in the same task. What is the best first directing move to try?
  • A Start over with a fresh conversation and hope it works.
  • B Ask the agent to write a running summary to a scratchpad file, then consult that file on each major step.
  • C Yell at the agent.
  • D Switch to a different tool entirely.
Show explanation

Answer: B. Medium-term memory via scratchpads is the right fix when state is at risk of drifting inside a single task. The agent summarizes what it has established so far, writes the summary to a file, and reads it back before each major reasoning step. This keeps the essentials in the window even as the task lengthens. A loses the work you have already done. C does not help. D is premature — the pattern, not the tool, is usually the problem.

Reflection prompt

Where does your tool actually remember?

In 4–6 sentences, answer: Pick an agent or AI tool you have used more than once. Where — if anywhere — does it carry memory between conversations? Is that memory automatic, or did you set it up? Name one piece of information you wish it remembered but does not, and sketch what you would need to do (which memory layer, which directing move) to make that memory reliable.

This reflection is practice for the capstone. Students who design capstones that depend on “the agent will remember” without a plan for where that memory lives are the ones who have to rebuild in Week 6.

Project checkpoint

Where the agent’s long-term memory will live.

Open the note you started in Lesson 1.1 — by now it has the candidate task, the confidence risk, the tools, and the done condition. Underneath, add one more line:

“The information this agent will need to remember between runs is ____, and it will live in ____.”

If the task is a one-shot (“find me three hikes near Salt Lake City and save the list”), the answer to “between runs” may be “nothing — each run stands alone.” That is a real, valid answer and worth noting. If the task is recurring (“summarize my new email each morning”), you are now naming the external state the agent will need — a log file, a memory entry, a database. You are not building it yet. You are specifying it, in plain English, before Module 2 asks you to stand it up.

Next in Module 1

Lesson 1.5 — The directing mindset.

Bring every concept together. Name the vocabulary of the course. Design your first loop on paper — the candidate capstone you’ll carry through the rest of the program.

Continue to Lesson 1.5 →