Local-Model Latency & Cost Worksheet

Module 2, Lesson 2.2 · paper fallback · pencil-and-paper

Use this worksheet if the HTML calculator is unavailable. You will run three prompts through your local model, record the numbers your runner prints, and do the break-even math by hand. Keep this page — the project checkpoint reads the throughput figure off this sheet.

Your setup

runner (Ollama / LM Studio / other): ________________
model pulled (e.g., llama3.1:8b-instruct-q4_K_M): ________________
machine (OS + chip + RAM): ________________
date run: ________________

Usability thresholds — a reminder

comfortable: 30+ tok/s, first token < 2 s. Iterative work feels fine.
workable: 8–29 tok/s, first token 2–8 s. Slow but usable for single-shot tasks.
below threshold: < 8 tok/s or first token > 30 s. Pull a smaller model, or move the task to cloud.
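If you want to double-check your verdicts programmatically later, the thresholds above can be sketched as a small Python helper. This is a hypothetical helper, not part of the course tooling; it treats anything below the 8 tok/s "workable" floor as below threshold.

```python
def verdict(tok_per_s: float, first_token_s: float) -> str:
    """Classify a run against the worksheet's usability thresholds."""
    # Below threshold: generation too slow, or too long a wait to start.
    if tok_per_s < 8 or first_token_s > 30:
        return "below threshold"
    # Comfortable: fast generation AND a snappy first token.
    if tok_per_s >= 30 and first_token_s < 2:
        return "comfortable"
    # Everything in between is workable for single-shot tasks.
    return "workable"
```

For example, `verdict(12, 5)` returns `"workable"` — fast enough to use, too slow to iterate comfortably.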

Three prompts — record the numbers your runner prints

PROMPT 1: short · ~20 tokens in, ~80 tokens out

“In one sentence, explain what a local LLM runner does. Do not use the name of any specific product.”

first token (s): __________
tok/sec: __________
total time (s): __________
verdict (comfortable / workable / below): __________

PROMPT 2: medium · ~150 tokens in, ~400 tokens out

“Summarize the following three paragraphs into five bullet points for a study guide. [Paste any three paragraphs from an article or textbook here.]”

first token (s): __________
tok/sec: __________
total time (s): __________
verdict (comfortable / workable / below): __________

PROMPT 3: long · ~600 tokens in, ~800 tokens out

“Given the following two-page document, produce a one-page brief with: a three-sentence summary, five key points, and three open questions a reader should still have. [Paste a ~600-word document here.]”

first token (s): __________
tok/sec: __________
total time (s): __________
verdict (comfortable / workable / below): __________
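If your runner prints total time but not tok/sec, you can back the throughput out of the numbers you did record. A minimal sketch, assuming you know the output token count (many runners print it in their final stats line):

```python
def tokens_per_second(tokens_out: int, total_time_s: float,
                      first_token_s: float) -> float:
    """Estimate generation throughput from a timed run.

    Subtracts the wait before the first token, so the result reflects
    generation speed rather than prompt-processing overhead.
    """
    generation_time = total_time_s - first_token_s
    return tokens_out / generation_time
```

For the medium prompt above, a 21 s run with a 1 s first-token wait and 400 tokens out gives `tokens_per_second(400, 21.0, 1.0)` = 20.0 tok/s — workable.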

Break-even vs. cloud

Use your medium prompt throughput as your working number. Assume a representative cloud price of $0.003 per medium prompt (varies by provider and model; this is a rough midpoint). Local electricity for the same prompt on a laptop is roughly $0.00005 — effectively zero at this scale. The break-even question is: how many prompts per day before local saves you meaningful money?

1. Cost per medium prompt — cloud (given): $0.003
2. Cost per medium prompt — local electricity (given): $0.00005
3. Savings per prompt if local works for the task (#1 − #2): $ __________
4. Your daily prompt count, a realistic guess (estimate): __________ per day
5. Daily savings if all of #4 ran local (#3 × #4): $ __________ / day
6. Monthly savings (#5 × 30): $ __________ / mo
7. Prompts/day at which local saves > $5 per month (5 ÷ (30 × #3)): __________ per day

Read row 7 carefully. If your daily use is well below the number in row 7, the cost case for local is weak — stay cloud-first and use local for privacy or offline only. If your daily use is at or above that number, local starts to pay for the setup effort. Neither answer is wrong; both are defensible.
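The same arithmetic as rows 1–7, as a quick Python check. The $0.003 and $0.00005 figures are the worksheet's stated assumptions, not measured values, and the 40-prompts/day count is a placeholder for your own row 4 estimate:

```python
CLOUD_COST = 0.003      # $ per medium prompt, rough midpoint (row 1)
LOCAL_COST = 0.00005    # $ electricity per medium prompt (row 2)
PROMPTS_PER_DAY = 40    # replace with your own row 4 estimate

savings_per_prompt = CLOUD_COST - LOCAL_COST          # row 3
daily_savings = savings_per_prompt * PROMPTS_PER_DAY  # row 5
monthly_savings = daily_savings * 30                  # row 6
break_even = 5 / (30 * savings_per_prompt)            # row 7

print(f"monthly savings: ${monthly_savings:.2f}")
print(f"break-even: {break_even:.0f} prompts/day for >$5/mo")
```

With these placeholder numbers, monthly savings come to $3.54 and the break-even lands at about 56 prompts/day — so 40 prompts/day would sit just below the row 7 line.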

Verdicts

Local is comfortable for tasks of this size on my machine:
Local is too slow for:
Throughput I will record in my-first-loop.md: ______ tok/s on ______
Based on row 7, my cost case for local is: strong / neutral / weak

Student name: _________________________    Date: ___________