Local-Model Latency & Cost Worksheet
Use this worksheet if the HTML calculator is unavailable. You will run three prompts through your local model, record the numbers your runner prints, and do the break-even math by hand. Keep this page — the project checkpoint reads the throughput figure off this sheet.
Your setup
Usability thresholds — a reminder
Three prompts — record the numbers your runner prints (if yours prints no speed figure, see the timing sketch after Prompt 3)
PROMPT 1 · short · ~20 tokens in, ~80 tokens out
“In one sentence, explain what a local LLM runner does. Do not use the name of any specific product.”
PROMPT 2 · medium · ~150 tokens in, ~400 tokens out
“Summarize the following three paragraphs into five bullet points for a study guide. [Paste any three paragraphs from an article or textbook here.]”
PROMPT 3 · long · ~600 tokens in, ~800 tokens out
“Given the following two-page document, produce a one-page brief with: a three-sentence summary, five key points, and three open questions a reader should still have. [Paste a ~600-word document here.]”
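Most runners print a tokens-per-second figure after each generation. If yours does not, a short script can recover one. Below is a minimal sketch, assuming a local server with an OpenAI-compatible endpoint (llama.cpp's server, Ollama, and LM Studio all offer one); the URL, port, and model name are placeholders, not part of this worksheet.

```python
# Minimal timing sketch -- assumes a local server with an OpenAI-compatible
# endpoint (llama.cpp server, Ollama, LM Studio). The URL, port, and model
# name below are placeholders; substitute whatever your runner exposes.
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"  # placeholder endpoint
PROMPT = "In one sentence, explain what a local LLM runner does."

start = time.perf_counter()
resp = requests.post(
    URL,
    json={
        "model": "local-model",  # placeholder id; match your runner's model name
        "messages": [{"role": "user", "content": PROMPT}],
        "max_tokens": 80,
    },
    timeout=300,
)
elapsed = time.perf_counter() - start

# Most OpenAI-compatible local servers report a usage block with token counts.
out_tokens = resp.json()["usage"]["completion_tokens"]
# Wall-clock time includes prompt processing, so this slightly understates
# pure generation speed -- close enough for this worksheet.
print(f"{out_tokens} tokens in {elapsed:.1f}s -> {out_tokens / elapsed:.1f} tok/s")
```

Run it once per prompt, swapping in the prompt text above, and record each tok/s figure on this sheet.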
Break-even vs. cloud
Use your medium prompt throughput as your working number. Assume a representative cloud price of $0.003 per medium prompt (varies by provider and model; this is a rough midpoint). Local electricity for the same prompt on a laptop is roughly $0.00005 — effectively zero at this scale, so each medium prompt run locally saves roughly $0.00295 (cloud price minus electricity). The break-even question is: how many prompts per day before local saves you meaningful money?
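To make the arithmetic concrete, here is a minimal worked sketch using the figures above; the $20-per-month savings target is a hypothetical stand-in for "meaningful money", not a number from this worksheet.

```python
# Worked break-even sketch using the worksheet's representative figures.
# The $20/month savings target is a hypothetical stand-in for "meaningful money".
cloud_cost = 0.003      # USD per medium prompt, representative cloud price
local_cost = 0.00005    # USD per medium prompt, rough laptop electricity
savings = cloud_cost - local_cost              # ~$0.00295 saved per prompt

monthly_target = 20.0                          # hypothetical savings goal, USD/month
prompts_per_month = monthly_target / savings   # ~6,780 prompts
prompts_per_day = prompts_per_month / 30       # ~226 prompts/day

print(f"~{prompts_per_day:.0f} medium prompts/day to save ${monthly_target:.0f}/month")
```

Compare the printed figure against your actual daily prompt count when filling in row 7.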
Read row 7 carefully. If your daily use is well below the number in row 7, the cost case for local is weak — stay cloud-first and use local for privacy or offline only. If your daily use is at or above that number, local starts to pay for the setup effort. Neither answer is wrong; both are defensible.