
OpenAI Prompt Caching - Reducing Costs on Repeated Context

Author: Venkata Sudhakar

OpenAI Prompt Caching automatically reduces the cost of API calls when the same large context prefix is sent repeatedly. When a prompt begins with a cached prefix, OpenAI charges a discounted rate (50 percent off for gpt-4o) for the cached tokens instead of full price. ShopMax India uses prompt caching to serve hundreds of daily customer queries against a large product catalogue system prompt, reducing monthly API costs significantly without any code changes.

Prompt caching is automatic on gpt-4o, gpt-4o-mini, and o-series models - no explicit configuration is needed. The cache is keyed on the exact text of the prompt prefix: the prefix must be at least 1024 tokens long, and cached token counts then grow in 128-token increments. To benefit from caching, always place stable, reusable content at the start of the prompt (system instructions, product catalogue, policy documents) and append the variable user query at the end. The usage object in the response includes a prompt_tokens_details field whose cached_tokens value shows how many tokens were served from the cache.

The example below shows ShopMax India sending multiple customer queries against a large product catalogue prompt and observing cache hits in the usage statistics.
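A minimal sketch of such a script - the catalogue text, queries, and model name here are illustrative placeholders, and a real catalogue prompt would need to exceed 1024 tokens for caching to kick in:

```python
# Sketch of the ShopMax India flow. The catalogue below is a short stand-in;
# only prefixes of 1024+ tokens are actually cacheable.

# Static prefix: identical across all requests so it can be cached.
CATALOGUE_PROMPT = (
    "You are a support assistant for ShopMax India. Product catalogue:\n"
    "- Sony WH-1000XM5 Headphones: Rs 29,990, Active Noise Cancellation\n"
    "- Samsung 65-inch LED TV: Rs 89,999, EMI available (3/6/12 months, 0%)\n"
    "- Dell Inspiron 15 Laptop: Rs 58,499, Intel i5, 16GB RAM, 512GB SSD\n"
)

def build_messages(query: str) -> list[dict]:
    """Static content first, variable user query last, to maximise cache hits."""
    return [
        {"role": "system", "content": CATALOGUE_PROMPT},
        {"role": "user", "content": query},
    ]

def report(usage) -> str:
    """Format the usage object, including tokens served from the cache."""
    cached = usage.prompt_tokens_details.cached_tokens
    return (
        f"{usage.prompt_tokens} prompt ({cached} cached), "
        f"{usage.completion_tokens} completion"
    )

def main() -> None:
    # Imported lazily so the helper functions above work without the SDK.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    queries = [
        "What is the price of the Sony headphones and do they support noise cancellation?",
        "Is EMI available on the Samsung TV? What are the options?",
        "Which laptop do you have under Rs 60000 and what are the specs?",
    ]
    for i, query in enumerate(queries, start=1):
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=build_messages(query),
        )
        print(f"Query {i}: {query[:60]}...")
        print(f"  Answer: {resp.choices[0].message.content}")
        print(f"  Tokens: {report(resp.usage)}")

# Calling main() sends three requests; after the first, the shared prefix
# should show up as a non-zero cached_tokens count in the usage statistics.
```

Because every request starts with the identical CATALOGUE_PROMPT, only the first call in a cache window pays full price for the prefix; subsequent calls report the reused portion in cached_tokens.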


It gives the following output,

Query 1: What is the price of the Sony headphones and do they suppor...
  Answer: The Sony WH-1000XM5 Headphones are priced at Rs 29,990. Yes, they feature industry-leading Active Noise Cancellation...
  Tokens: 1298 prompt (0 cached), 58 completion

Query 2: Is EMI available on the Samsung TV? What are the options?...
  Answer: Yes, EMI is available on the Samsung 65-inch LED TV (Rs 89,999). You can choose from 3, 6, or 12-month plans at 0 percent interest...
  Tokens: 1301 prompt (1280 cached), 62 completion

Query 3: Which laptop do you have under Rs 60000 and what are the sp...
  Answer: ShopMax India offers the Dell Inspiron 15 Laptop at Rs 58,499 - Intel i5 processor, 16GB RAM, and 512GB SSD...
  Tokens: 1296 prompt (1280 cached), 55 completion

To maximise cache hits, always structure prompts with the static prefix first and the dynamic user content last - even a single character change in the cached prefix invalidates it. Cache entries typically expire after 5-10 minutes of inactivity, so high-traffic applications will see very high cache hit rates. For very large static contexts (full product catalogues, legal documents), consider using a single shared system prompt across all users rather than personalising it, to maximise cache reuse. Monitor cached_tokens in production to measure actual savings - for example, a 2,000-token cached prefix reused across 1,000 daily queries bills roughly 2 million prompt tokens at the 50 percent discounted rate, saving the cost of about 1 million tokens per day. Prompt caching also applies to multi-turn conversations where previous turns are re-sent as history.
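The savings estimate above can be sketched as a back-of-the-envelope calculation - the prefix size, daily volume, and per-token price here are illustrative assumptions, not ShopMax India's real figures:

```python
# Illustrative best-case estimate: assumes every query after the first
# hits the cache. Prices and volumes are assumptions for this sketch.
PREFIX_TOKENS = 2_000        # static catalogue prefix (above the 1024-token minimum)
QUERIES_PER_DAY = 1_000
PRICE_PER_1M_INPUT = 2.50    # assumed USD price per million input tokens
CACHE_DISCOUNT = 0.50        # cached prefix tokens billed at 50 percent of full price

daily_prefix_tokens = PREFIX_TOKENS * QUERIES_PER_DAY            # 2,000,000 tokens/day
full_cost = daily_prefix_tokens / 1_000_000 * PRICE_PER_1M_INPUT
cached_cost = full_cost * CACHE_DISCOUNT

print(f"Prefix cost without caching: ${full_cost:.2f}/day")
print(f"Prefix cost with caching:    ${cached_cost:.2f}/day")
print(f"Daily saving on the prefix:  ${full_cost - cached_cost:.2f}")
```

The completion tokens and the variable query tokens are unaffected by caching, so the discount applies only to the repeated prefix portion of each request.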
