
Agent Chat using LangChain Part 2 – Token Streaming with WebSockets
We all dislike typing a message to an AI agent and then just staring at a spinning loader for ten seconds with zero feedback. It feels slow and lifeless.
In part 2 of this series, I’m going to walk you through how I added proper token streaming over WebSockets so users see the agent’s response appear word by word, just like in ChatGPT.
Why Token Streaming Matters
When Claude (or any modern LLM) generates a response, it doesn’t spit out the whole thing at once—it produces tokens one by one. If you wait for the complete response before sending anything to the client, the user is left hanging until it’s all done.
Streaming changes everything.
- It feels way faster, even if the total time is identical
- People can start reading right away
- They get immediate feedback that the agent is working
Why WebSockets
Sure, you could use Server-Sent Events for one-way streaming, and that would work fine for just tokens. But WebSockets give me bidirectional communication, which turned out to be perfect for showing real-time progress when the agent calls tools. For example, when it decides to run search_contacts, I can push a quick “Using search_contacts…” update before the tool even finishes.
The Backend Implementation
I built an AgentGateway in NestJS using the built-in WebSocket support with Socket.io. The main handler listens for incoming chat messages and streams everything back as it happens.
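To show the shape of that handler, here is a dependency-free sketch. It is written against a minimal socket interface rather than the full NestJS decorator setup, and the event names (`token`, `tool_progress`, `done`) and callback names are my own placeholders, not the project's actual ones:

```typescript
// Minimal stand-in for a Socket.io socket: just the emit surface we use.
interface ClientSocket {
  emit(event: string, payload: unknown): void;
}

// Callbacks the agent service invokes as the stream progresses.
interface StreamCallbacks {
  onToken(text: string): void;
  onToolStart(toolName: string): void;
  onToolEnd(toolName: string): void;
}

// The gateway handler wires socket emits to the streaming callbacks,
// then sends one final "done" event with the complete response.
async function handleChatMessage(
  socket: ClientSocket,
  message: string,
  chatWithStreaming: (msg: string, cb: StreamCallbacks) => Promise<string>,
): Promise<void> {
  socket.emit('status', 'Agent is thinking…');
  const finalText = await chatWithStreaming(message, {
    onToken: (text) => socket.emit('token', text),
    onToolStart: (name) => socket.emit('tool_progress', `Using ${name}…`),
    onToolEnd: (name) => socket.emit('tool_progress', `${name} finished`),
  });
  socket.emit('done', finalText);
}
```

Injecting `chatWithStreaming` as a parameter keeps the gateway a thin translation layer: it knows about sockets and events, nothing about Claude or tools.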
The real work happens in chatWithStreaming, which takes callbacks for tokens and tool progress. On the Anthropic side, I switched to their streaming API instead of the regular synchronous call.
The idea is to switch from a synchronous API call to an asynchronous stream using Anthropic’s SDK: iterate through the incoming stream events (text deltas) and trigger a callback (onToken) that sends text to the client immediately, rather than waiting for the full response.
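A minimal sketch of that loop. The event shape mirrors the Anthropic SDK’s streaming events (a `content_block_delta` carrying a `text_delta`), but the types below are local stand-ins rather than the SDK’s own:

```typescript
// Local stand-ins for the Anthropic SDK's stream events; the real stream
// yields several event types, but we only care about text deltas here.
interface TextDeltaEvent {
  type: 'content_block_delta';
  delta: { type: 'text_delta'; text: string };
}
type StreamEvent = TextDeltaEvent | { type: string };

// Forward each text delta to onToken as it arrives, and return the
// accumulated text once the stream is exhausted.
async function forwardTextDeltas(
  stream: AsyncIterable<StreamEvent>,
  onToken: (text: string) => void,
): Promise<string> {
  let full = '';
  for await (const event of stream) {
    if ('delta' in event && event.delta.type === 'text_delta') {
      full += event.delta.text;
      onToken(event.delta.text); // emit to the client immediately
    }
  }
  return full;
}
```

Because the function also accumulates the full text, the caller still has the complete message at the end for tool-call processing.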
I just forward each text delta to the callback, which emits it straight to the client. Once the stream ends, I grab the final message so I can process any tool calls that might still be pending.
Handling Authentication
WebSockets bypass the usual HTTP middleware, so I had to handle auth myself. When a client connects, they have 30 seconds to send an authenticate event with their JWT. If they do, I verify it and attach the user info to the socket. If not, I kick them off.
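That flow can be sketched like this. `AuthSocket` is a stand-in for a real Socket.io socket (which is also an EventEmitter with a `disconnect()` method), and `verifyJwt` is injected in place of whatever JWT library the HTTP side uses:

```typescript
import { EventEmitter } from 'node:events';

// Stand-in for a connected socket: emitter + disconnect is all we need.
class AuthSocket extends EventEmitter {
  user?: { id: string };
  disconnected = false;
  disconnect(): void {
    this.disconnected = true;
  }
}

// On connect, give the client a grace period to send an `authenticate`
// event with a JWT. Verify it and attach the user, or kick them off.
function setupSocketAuth(
  socket: AuthSocket,
  verifyJwt: (token: string) => { id: string } | null,
  graceMs = 30_000,
): void {
  const deadline = setTimeout(() => {
    if (!socket.user) socket.disconnect(); // never authenticated: kick off
  }, graceMs);

  socket.on('authenticate', (token: string) => {
    const user = verifyJwt(token);
    if (user) {
      socket.user = user;   // attach identity to the socket
      clearTimeout(deadline);
    } else {
      socket.disconnect();  // bad token: kick off immediately
    }
  });
}
```

Doing this per-socket rather than per-request means every later message handler can simply check `socket.user` instead of re-verifying the token.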
The User Experience Difference
Honestly, it’s night and day. Now users see:
- An immediate “Agent is thinking…” indicator
- Text appearing smoothly as Claude generates it
- Live updates like “Using search_contacts…” when tools run
- Confirmation when tools finish
- The final polished response with any links or buttons
It stops feeling like you’re submitting a form and waiting for processing. It actually feels like chatting with a helpful assistant.
Keeping a REST Fallback
I didn’t want to break things for environments where WebSockets might be blocked, so I kept the original REST endpoint. It still works exactly like before—just returns the full response once everything is done. The WebSocket version is purely additive; the frontend picks the best option available.
The Angular Side
In Angular, I wrapped everything in an AgentService that manages the socket connection and exposes RxJS observables the component can subscribe to. I used BehaviorSubjects so late subscribers still get the current state.
In the template, while streaming I show a special bubble with a dashed border and a little typing animation.
The sendMessage method checks if the socket is connected and uses WebSockets if possible, otherwise falls back to the REST call.
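The routing decision itself is small. Here is a sketch with both transports injected so the logic stays testable; the interface and method names are illustrative, not the article’s actual signatures:

```typescript
// Transport senders are injected so the routing logic stays testable.
interface Transports {
  socketConnected(): boolean;
  sendViaSocket(message: string): void;          // reply streams in via events
  sendViaRest(message: string): Promise<string>; // resolves with the full reply
}

// Prefer the WebSocket path when connected; otherwise fall back to REST.
// Returns null for the socket path (tokens arrive asynchronously) or the
// complete response text for the REST path.
async function sendMessage(message: string, t: Transports): Promise<string | null> {
  if (t.socketConnected()) {
    t.sendViaSocket(message);
    return null;
  }
  return t.sendViaRest(message);
}
```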
This way the feature works everywhere, and users only miss the fancy streaming if their network blocks WebSockets.

Wrapping Up
Adding proper token streaming completely changed how the agent feels—from “kinda slow but functional” to “this is actually fun to use.” It took some extra work (auth on sockets, managing state, coordinating all the callbacks), but the improvement in user experience was absolutely worth it. If you’re adding any kind of AI chat to your app, do yourself a favor and build streaming in from the beginning. You’ll thank yourself later.
In the next part of this series, I’ll show some fun examples of this at work.


