tl  tr
  Home | Tutorials | Articles | Videos | Products | Tools | Search
Interviews | Open Source | Tag Cloud | Follow Us | Bookmark | Contact   
 Generative AI > Large Language Models > Claude Streaming Responses

Claude Streaming Responses

Author: Venkata Sudhakar

Streaming with Claude works the same way as with other LLM providers but uses the Anthropic SDK stream helper which handles the SSE protocol cleanly. When a Claude response takes 3-5 seconds to generate, showing a blank screen until completion makes your application feel unresponsive. With streaming, the first words appear in under a second and continue flowing until the answer is complete. For any user-facing business application - an HR policy chatbot, a legal document explainer, a sales proposal writer - streaming is the difference between a product that feels alive and one that feels broken while it thinks.

The Anthropic Python SDK provides a clean context manager for streaming: client.messages.stream(...). Inside the with block, you iterate over text_stream which yields each text chunk as it arrives. The stream object also accumulates the full response so you can access the complete text, usage statistics, and stop reason after the stream completes. This is useful for logging and billing even when you are streaming to the end user in real time.

The below example shows an HR self-service portal where employees ask about leave policies. The answer streams live to the terminal - in a web app this would stream to the browser using Server-Sent Events.


It gives the following output - words appear progressively as Claude generates them,

Employee: I am expecting a baby in July. How much maternity leave can I take?

HR Assistant: Congratulations! Under MegaCorp's Leave Policy 2025, you are
entitled to 26 weeks (approximately 6 months) of fully paid maternity leave
for your first or second child.

Here is what you need to know:
- The 26 weeks can begin up to 8 weeks before your expected delivery date
- You will continue to receive your full salary during this period
- Your job position is protected while you are on maternity leave

To apply, submit a Maternity Leave Application form to HR at least 8 weeks
before your intended start date, along with a medical certificate from your
doctor confirming the expected delivery date.

[Time to first token: 0.31s | Total: 3.8s | Tokens: 187 in, 142 out]

# First words appeared in 0.31 seconds instead of waiting 3.8s for full response
# Employee sees the answer building live - far better user experience

The FastAPI endpoint streams Claude tokens directly to the browser via SSE,

GET /hr/ask?question=How+many+sick+days+do+I+get

data: Under
data:  MegaCorp
data: 's
data:  Leave
data:  Policy
...(continues token by token)...
data: [DONE]

# Each "data:" event fires the browser onmessage handler
# The answer builds word by word in the UI - no spinner, no wait

Use streaming for any Claude response longer than a short sentence that appears in a user-facing interface. The perceived speed improvement is dramatic even though total generation time is the same. For batch processing or background jobs where no human is watching, skip streaming - non-streaming calls are slightly simpler to handle and you do not need the token-by-token delivery. The Anthropic SDK also provides an async streaming API (client.messages.stream in an async context) for FastAPI and other async frameworks - always prefer the async version in production web servers to avoid blocking the event loop.


 
  


  
bl  br