|
|
Claude Prompt Caching
Author: Venkata Sudhakar
Every Claude API call charges for every input token - your system prompt, background documents, conversation history. If a product FAQ chatbot sends a 5,000-token product manual with every customer question, you pay for those 5,000 tokens ten thousand times a day. Prompt caching solves this: mark the large static portion of your prompt with a cache_control header and Claude stores those tokens server-side for up to 5 minutes. Subsequent requests that include the same cached prefix cost only 10% of normal input pricing for that portion. For any application that sends the same large context repeatedly, this single change cuts input token costs by 80-90%. You enable caching by adding a cache_control block with type "ephemeral" to any content block you want cached. Put static content first (system instructions, reference documents) and dynamic content last (user question). The cache lasts 5 minutes from the last use - each hit resets the timer, so a busy chatbot has effectively permanent caching during peak hours. The API response shows cache_read_input_tokens (tokens served from cache at 10% cost) and cache_creation_input_tokens (tokens written to cache at 125% cost, charged once). The below example shows a home appliance support chatbot that caches its product manual. The same manual is referenced by thousands of customers daily - caching it cuts the input cost from full price to 10% for every request after the first.
Running three customer questions to observe cache behaviour,
It gives the following output,
Q1: My machine is showing E2. What should I do?
[CACHE MISS (first call)] cached=0 written=412 regular=38
Answer: E2 is a drainage error. Check your drain hose is not kinked or blocked,
then clean the pump filter behind the bottom-front panel...
Q2: What programme should I use for my sports kit?
[CACHE HIT] cached=412 written=0 regular=21
Answer: Use the Sports programme - designed for technical fabrics and activewear,
it runs for 1 hour 20 minutes...
Q3: How often should I clean the pump filter?
[CACHE HIT] cached=412 written=0 regular=22
Answer: Clean the pump filter quarterly. It is located behind the bottom-front
panel of the machine...
=== Daily Cost for 10,000 Customer Questions ===
Without caching: $3.20
With caching: $0.32
Saving: 90%
# Q1: Cache MISS - manual written to cache (charged once at 125%)
# Q2 and Q3: Cache HIT - manual served at 10% cost
# 90% cost reduction at 10,000 daily calls = saves $2.88/day = $86/month
Three rules for maximum cache efficiency: always put static content before dynamic content so the cached prefix is identical across requests; cache at the right granularity - cache the product manual, not the whole conversation history which changes every turn; and keep your system prompt consistent across all calls because even one character difference breaks the cache. Prompt caching works best for: large knowledge bases queried by many customers, long system prompts with detailed instructions, and the growing conversation history in multi-turn agents where earlier turns are stable.
|
|