
OpenAI Streaming Responses in Python

Author: Venkata Sudhakar

OpenAI's streaming API lets your application receive tokens as they are generated, rather than waiting for the full response. This makes your application feel faster and more responsive. For a customer service chatbot at ShopMax India, streaming means customers see replies forming in real time instead of waiting several seconds.

To enable streaming, pass stream=True to client.chat.completions.create(). The API returns a generator that yields ChatCompletionChunk objects. Each chunk contains a delta field with the incremental text content. You collect these deltas and print or display them as they arrive.

The following example shows a streaming chat completion for a ShopMax India order status query. Tokens arrive one by one and are printed without a newline, creating a live-typing effect in the terminal.
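The original listing is not reproduced here, but a minimal sketch looks like the following. It assumes the openai v1 Python SDK with OPENAI_API_KEY set in the environment; the model name and prompt are placeholders.

```python
def collect_stream(stream) -> str:
    """Print streamed tokens as they arrive and return the accumulated text."""
    full_response = ""
    for chunk in stream:
        delta = chunk.choices[0].delta.content  # incremental text, or None
        if delta is not None:                   # the final chunk carries no text
            print(delta, end="", flush=True)    # no newline: live-typing effect
            full_response += delta
    print()
    return full_response


def main() -> None:
    # Assumes the openai v1 Python SDK and a valid OPENAI_API_KEY.
    from openai import OpenAI

    client = OpenAI()
    stream = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable model works
        messages=[{"role": "user",
                   "content": "What is the status of ShopMax order #SM-78234?"}],
        stream=True,
    )
    text = collect_stream(stream)
    print(f"Total characters received: {len(text)}")
```

Calling main() with a valid API key runs the query and prints the response token by token.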


Running the example gives output like the following:

Your order #SM-78234 placed in Mumbai is currently being processed at our Andheri
warehouse. Expected delivery to your address is within 2-3 business days. You will
receive an SMS notification once the order is dispatched.

Total characters received: 198

Always collect the full streamed response into a variable if you need to log or process it after display. For production deployments at ShopMax India, wrap the stream loop in a try-except block to handle network interruptions gracefully, and use stream.close() in a finally clause to release the connection.
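That advice can be sketched as follows. This is an illustrative pattern, not ShopMax India's actual code: it assumes the openai v1 SDK, whose stream object exposes close(), and it catches a broad Exception for brevity where production code would catch the SDK's specific network errors.

```python
def stream_with_recovery(client, messages, model="gpt-4o-mini") -> str:
    """Stream a chat completion, keeping any partial text if the
    connection drops, and always releasing the HTTP connection."""
    full_response = ""
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
    )
    try:
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta is not None:
                print(delta, end="", flush=True)
                full_response += delta
    except Exception as exc:
        # In production, catch openai.APIConnectionError and
        # openai.APITimeoutError instead of a bare Exception, and log
        # the partial text collected so far.
        print(f"\n[stream interrupted: {exc}]")
    finally:
        stream.close()  # release the underlying HTTP connection
    return full_response
```

The function returns whatever text arrived before any interruption, so the partial reply can still be logged or shown to the customer.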


 
  


  