How I Learned to Build a Streaming Chat
I was curious about how ChatGPT makes text appear word by word, so I decided to build it myself. Here's what I learned about streams, chunks, and the tricky 'buffer' problem.
Apps like ChatGPT make text appear one word at a time, and it's such a useful pattern: it feels much more alive than waiting for a loading spinner and then seeing a big block of text pop up all at once.
It turns out this is called streaming, and I decided to dive in and see if I could figure out how it works.
Seeing It in Action
Before I looked at any code, I wanted to build something that worked. Here is a small chat component I made. If you type something, you'll see the response "stream" in, chunk by chunk.
[Interactive demo: Chat Panel]
The "Garden Hose" Concept
The biggest thing I learned is that instead of the server sending one big "package" of data, it opens a connection and keeps it open.
Think of it like a garden hose. In a normal website, you ask for water, wait for the bucket to fill up, and then the server hands you the bucket. With streaming, the server just turns on the hose, and you start catching the water as soon as it comes out.
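Here's that same idea in fetch terms, as a tiny sketch (/api/chat is just a placeholder for the endpoint I build below):

// The bucket: wait for the whole response, then use it
const whole = await (await fetch("/api/chat")).text();

// The hose: grab a reader and catch chunks as they come out
const response = await fetch("/api/chat");
const reader = response.body.getReader();
const { value } = await reader.read(); // the first splash of water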
How the Data Flows
I made this diagram to help me visualize what's happening:
sequenceDiagram
    participant Me as My React App
    participant Server as The API
    Me->>Server: "Hey, can we start a chat?"
    activate Server
    Server-->>Me: "Sure! Here is a piece: 'Hello '"
    activate Me
    Me->>Me: Show 'Hello ' on screen
    Server-->>Me: "And another piece: 'there!'"
    Me->>Me: Show 'Hello there!'
    Server-->>Me: "That's everything!"
    deactivate Server
    deactivate Me
How I Built the Server
To make this work, the server speaks a simple format called Server-Sent Events (SSE): every message starts with data: and ends with two newlines (\n\n), which act as the end-of-message marker.
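With my server sending one word per message, the raw text going over the wire looks like this (the blank line between messages is the \n\n):

data: {"text":"Hello "}

data: {"text":"there!"}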
Here is the simple version of the code I wrote for my server:
// Inside my API route handler (any server that lets you return
// a web-standard Response works; I'm leaving the framework out)
const words = ["Hello ", "there!"];

// Setting up the "hose"
const stream = new ReadableStream({
  start(controller) {
    for (const word of words) {
      // We wrap each word in the "data: " format
      const message = `data: ${JSON.stringify({ text: word })}\n\n`;
      controller.enqueue(new TextEncoder().encode(message));
    }
    controller.close(); // Turn off the hose
  },
});

return new Response(stream, {
  headers: { "Content-Type": "text/event-stream" },
});
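To sanity-check the server, I hit it with curl. The -N flag turns off curl's output buffering so the chunks show up as they arrive (this assumes the route lives at /api/chat on a local dev server):

curl -N http://localhost:3000/api/chat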
The Tricky Part: The Buffer
This is where I got stuck for a while! Network chunks don't line up with message boundaries, so sometimes the browser receives half of a message. Imagine getting data: {"te and then having to wait a millisecond for the rest. If you try to JSON.parse that as a finished message, it throws, because it isn't finished yet.
I learned that I need a "buffer"—a temporary spot to hold onto these broken pieces until they are complete.
How I handled the broken pieces:
- Catch a chunk of data.
- Add it to my buffer string.
- Check if there is a \n\n (the "end of message" marker).
- If yes, take that full message out and show it.
- If no, just wait for more data to arrive.
Here is the reading loop that does all of that:

const response = await fetch("/api/chat"); // the server route from earlier
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true keeps multi-byte characters intact
  // even when a chunk splits one down the middle
  buffer += decoder.decode(value, { stream: true });

  const parts = buffer.split("\n\n");
  // Save the last part back to the buffer
  // (it might be a half-finished message!)
  buffer = parts.pop() || "";

  for (const part of parts) {
    if (part.startsWith("data: ")) {
      const data = JSON.parse(part.slice(6));
      setMyMessage(prev => prev + data.text);
    }
  }
}
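In my actual component, that loop lives inside a click handler. Here's a stripped-down sketch (the component and handler names are just what I picked):

import { useState } from "react";

function ChatPanel() {
  const [myMessage, setMyMessage] = useState("");

  async function sendMessage() {
    setMyMessage(""); // clear the previous answer
    // ...the reading loop from above goes here,
    // calling setMyMessage as each piece arrives
  }

  return (
    <div>
      <button onClick={sendMessage}>Ask</button>
      <p>{myMessage}</p>
    </div>
  );
}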
Easier Ways to Do This
While I loved learning how to do this from scratch, I found out that people have already built amazing tools that handle all the "broken pieces" and buffering for us:
- TanStack AI: A really cool tool that makes streaming feel like normal React state.
- Vercel AI SDK: Great if you are connecting to things like OpenAI or Anthropic.
- fetch-event-source: A library that lets you use POST requests with streaming (the browser's built-in EventSource only supports GET); see the sketch after this list.
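For example, here's roughly what that last one looks like, based on the @microsoft/fetch-event-source README (I haven't swapped my hand-rolled loop for it yet, and this assumes the server accepts POST):

import { fetchEventSource } from "@microsoft/fetch-event-source";

await fetchEventSource("/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ prompt: "Hi there" }),
  onmessage(event) {
    // event.data is already the text after "data: ",
    // so there's no manual buffer-splitting needed
    const data = JSON.parse(event.data);
    setMyMessage(prev => prev + data.text);
  },
});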
Wrapping Up
Building this taught me that "real-time" isn't magic—it's just about being ready to handle data as it arrives instead of waiting for it all at once.
If you're curious about this, I highly recommend trying to build it yourself first. It makes you realize how much work those libraries are doing behind the scenes!