
Text Streaming on Frontend: From JavaScript Protocols to Real-Time LLM Chat
January 19, 2026 • 11 min read

Introduction
If you’ve ever used ChatGPT, Claude, or any other modern LLM chat interface, you are familiar with the almost “magic” experience of asking a question and watching the answer flow to the screen, as if someone is typing it word by word in real time. You don’t have that distressing feeling of staring at a blank page with a spinner while waiting for the model to answer. The answer just “flows”.
This is not just an aesthetic choice. It became the industry standard. When users interact with conversational AIs nowadays, they expect this behavior. So when we started implementing the Agent Chat feature at Looqbox, this was naturally one of the first requirements for the frontend interface.
But how does that work under the hood? How do we get data that comes in chunks from the server and progressively display it on the screen in real time? In this article we will explore some of the base concepts used in this scenario, and implement a minimal example of a real time chat interface in pure Vanilla Javascript, without any external libraries, demonstrating the native browser streaming capabilities. Here are the key concepts we will need to understand for this implementation:
- The Iterator and Iterable Javascript protocols;
- Generators and Async Generators;
- Server-Sent Events (SSE);
- The ReadableStream interface;
Let’s get to it!
What are the Iterator and Iterable protocols in Javascript?
If you study or work with Javascript, you probably already use these protocols quite often without realizing it, in the form of arrays, maps, sets, strings, and basically any collection.
An Iterator is an object that implements the Iterator Protocol. It is the most primitive protocol Javascript provides for creating objects that can be iterated over. At its lowest level, an Iterator implements a next() function that returns an object containing the current value and done, a boolean flag indicating whether the iteration is over:
const iterator = {
next() {
return {
value: 0,
done: false
}
}
}

In this example, the next() function is static and always returns the same value. We can, however, add a counter to this object to demonstrate a basic iteration, watching it deliver the values in sequence, one at a time:
const iterator = {
counter: 0,
next() {
return {
value: this.counter++,
done: this.counter > 5
}
}
}
let next
do {
next = iterator.next()
console.log(next.value)
} while (!next.done)
/*
Output:
0
1
2
3
4
5
*/

Moving up from this concept, we have the Iterable object: an object that implements the [Symbol.iterator]() method, which returns an Iterator object. This allows us to use any of Javascript’s native iteration capabilities, such as the for...of loop, the spread operator [...obj], and so on.
To convert our previous example into an Iterable object, we can implement the symbol as:
const iterable = {
counter: 0,
next() {
return {
value: this.counter++,
done: this.counter > 5
}
},
[Symbol.iterator](){
return this;
}
}
for (const i of iterable) {
console.log(i)
}
/*
Output:
0
1
2
3
4
}

We can also move the other way around, and produce an Iterator from an Iterable object (like an array, for example):
const array = [0, 1, 2, 3, 4, 5]
const iterator = array[Symbol.iterator]()
console.log(iterator.next().value)
console.log(iterator.next().value)
console.log(iterator.next().value)
/*
Output:
0
1
2
*/

One important thing to note about Iterators is that they are consumable objects. They maintain an internal state: once we iterate over one until the end (done: true), there is no way to go back to the beginning and iterate again. That is where the Generator protocol comes in handy.
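We can see this exhaustion in action by spreading the same iterator twice (array iterators are themselves iterable, so the spread operator works on them):

```javascript
const array = [0, 1, 2];

// Obtain a fresh Iterator from the Iterable array
const iterator = array[Symbol.iterator]();

// The first pass consumes every value...
const firstPass = [...iterator];
console.log(firstPass); // [0, 1, 2]

// ...so a second pass over the exhausted iterator yields nothing
const secondPass = [...iterator];
console.log(secondPass); // []
```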
Generators and Async Generators
The Generator Protocol is one step above the previous Iterator concept. It is a simpler way of creating Iterators and Iterables by defining a function that returns these objects. Generators remove the need to manually manage internal iterator state.
The Generator Function can be defined by using a star (*) next to the function keyword, and defining what values are to be returned by the next() function by using the yield keyword:
function* generator(){
let count = 0;
while (true) {
if(count >= 5){
break;
}
yield count;
count++
}
}

The yield keyword “pauses” the function execution until next() is called again, making this equivalent to our previous example: it walks through the items one at a time.
This function returns an instance of a Generator Object, which implements both the Iterator and Iterable protocols at the same time. We can use the Generator function to create multiple Generator Object instances to demonstrate this behavior:
const iter1 = generator()
const iter2 = generator()
console.log("consuming as an iterable:")
for (const i of iter1){
console.log(i)
}
console.log("consuming as an iterator:")
console.log(iter2.next().value)
console.log(iter2.next().value)
console.log(iter2.next().value)
/*
Output:
consuming as an iterable:
0
1
2
3
4
consuming as an iterator:
0
1
2
*/

It can be confusing at first that the Generator Object is both an Iterable and an Iterator at the same time, but this simply means that calling [Symbol.iterator] on a Generator Object returns the object itself (just like in our first iterable example).
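We can verify this self-referential behavior directly (the range generator below is just an illustrative helper):

```javascript
// Illustrative generator: yields 0, 1, ..., max - 1
function* range(max) {
  for (let i = 0; i < max; i++) {
    yield i;
  }
}

const gen = range(3);

// Calling [Symbol.iterator]() on a Generator Object returns the object itself
console.log(gen[Symbol.iterator]() === gen); // true

// Because of that, any iteration continues from wherever next() stopped
gen.next(); // manually consume the 0
const rest = [...gen];
console.log(rest); // [1, 2]
```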
And what about the Async Generator protocol? It is basically the same thing as the regular Generator, but instead of plain values, the Async Generator function yields Promises. Here is an example of an Async Generator function that yields Promises created by the setTimeout function from node:timers/promises:
import { setTimeout } from "node:timers/promises";
async function* asyncGenerator(max = 10) {
let count = 0;
while (true) {
if (count >= max) {
break;
}
yield setTimeout(count * 100, count);
count++;
}
}

The Async Generator object implements the [Symbol.asyncIterator] symbol, meaning it can be iterated over as an asynchronous iterable, for example with a for await...of loop:
const numbers = asyncGenerator();
for await (const i of numbers) {
console.log(i);
}

This will be important for our example implementation, because this is exactly how we will consume the LLM response chunks coming from the server, processing them one by one as they arrive.
The Server-Sent Events (SSE) protocol
I’ve mentioned a few times that the LLM response comes in “chunks” from the server, but what exactly does that mean?
The Server-Sent Events (SSE) protocol enables the server to push data streams over a single, long-lived HTTP connection, using the text/event-stream format. This format consists of a series of events sent sequentially from the server to the client, each made of text fields and terminated by two consecutive newlines (an empty line), in the following format:
id: <event id>
event: <event name>
data: <event data>

The SSE protocol is natively supported by browsers via the EventSource interface, which allows us to listen to the data stream and attach event listeners to process each event individually:
const eventSource = new EventSource(<event endpoint>);
eventSource.onmessage = (event) => {
console.log('New message:', event.data);
};

EventSource handles automatic reconnection if the connection drops, ensuring the client keeps receiving real-time updates from the server. This interface, however, is limited: it doesn’t support HTTP methods other than GET, nor passing custom HTTP headers with the request.
But since SSE events arrive as text, they can be manually parsed and processed by opening the connection with fetch and reading the incoming chunks through the ReadableStream interface. This approach gives us full control: we can use POST requests (important for sending the conversation history), add authentication headers, and manually handle connection errors and completion signals (like the [DONE] message many LLM APIs send).
The ReadableStream interface
In Javascript, a ReadableStream is an interface representing a stream of byte data that can be read from. It is available as the body property of the Response object we obtain from the native Fetch API.
When fetching an endpoint that returns a text/event-stream content type, we can obtain a continuous stream of data from the response by calling the getReader() method from the response body. Here is an example of SSE parsing with a ReadableStream:
const response = await fetch('<SSE endpoint>', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({ message: 'Hello!' })
});
// Get the ReadableStream from the response body
const reader = response.body.getReader();
// Read chunks in a loop
while (true) {
const { done, value } = await reader.read();
if (done) {
console.log('Stream finished');
break;
}
console.log('Received chunk:', value);
}

The reader.read() method returns a promise that resolves to an object containing two properties: done (a boolean indicating if the stream has ended) and value (a Uint8Array containing the raw bytes of the chunk).
Familiar, isn’t it? It is important to mention that the ReadableStreamDefaultReader IS NOT an AsyncIterator, but the design consistency makes the API intuitive: we are manually reading chunks in a pattern that mirrors async iteration.
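If we want the ergonomics of for await...of anyway, it is straightforward to bridge the two with an Async Generator. The streamToAsyncIterator helper below is our own illustrative wrapper, not a standard API:

```javascript
// Wraps any ReadableStream in an Async Generator so it can be consumed
// with for await...of. Illustrative helper, not part of the standard API.
async function* streamToAsyncIterator(stream) {
  const reader = stream.getReader();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) return;
      yield value;
    }
  } finally {
    // Always release the lock, even if the consumer breaks out early
    reader.releaseLock();
  }
}

// Collects every chunk of a stream into an array using the wrapper
async function collect(stream) {
  const received = [];
  for await (const chunk of streamToAsyncIterator(stream)) {
    received.push(chunk);
  }
  return received;
}

// Demonstration with a hand-made stream (in practice this would be
// the response.body from fetch)
const demoStream = new ReadableStream({
  start(controller) {
    controller.enqueue("first chunk");
    controller.enqueue("second chunk");
    controller.close();
  },
});

collect(demoStream).then((received) => {
  console.log(received); // ["first chunk", "second chunk"]
});
```

Newer engines also implement [Symbol.asyncIterator] directly on ReadableStream, letting you for await over the stream itself, but browser support for that has historically been uneven, so an explicit wrapper like this is a safe fallback.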
Now that we understand the basic concepts we are going to use, let’s get to the fun part!
Implementing a minimal LLM chat interface
For the LLM completion API, we are going to use the Groq platform, which provides a fast AI inference API compatible with the OpenAI standard. It offers free API keys for testing, so there is no need to buy token credits from a provider like OpenAI or Anthropic for this demonstration. You can create your own Groq account and generate free API tokens here.
For the UI, we are building the simplest possible screen for an LLM chat, containing only a text input for writing the user messages and an area that will hold the message thread as it is generated:
<!doctype html>
<html>
<head>
<link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
<title>Streaming POC</title>
<meta charset="UTF-8" />
<link rel="stylesheet" href="src/css/index.css" />
<script type="module" src="src/js/index.js"></script>
<link
href="https://fonts.googleapis.com/css2?family=Oxygen:wght@300;400;700&display=swap"
rel="stylesheet"
/>
</head>
<body>
<section id="container">
<div id="thread"></div>
<div id="input">
<textarea
id="message-input"
placeholder="Type your message here"
></textarea>
<button id="send-button">
<img src="src/assets/send-icon.svg" alt="Send" />
</button>
</div>
</section>
</body>
</html>

After some pretty CSS styling, here is what this page looks like:

Now, for the functionality part, let’s start by implementing the function that fetches the ReadableStream object. We will use the native Fetch API to send a POST request to the API’s completions endpoint, passing the API token in the headers and sending the array of messages in the body. We also specify the model we want to use (one of the Llama models, for simplicity) and the flag stream: true, to indicate that we expect the response in the text/event-stream format (SSE). This function returns the response.body property, which is already a ReadableStream ready for parsing:
import { API_URL, GROQ_API_KEY } from "./config.js";
export const openChatStream = async (messages) => {
const response = await fetch(API_URL, {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${GROQ_API_KEY}`,
},
body: JSON.stringify({
messages,
model: "llama-3.3-70b-versatile",
stream: true,
}),
});
return response.body;
};

⚠️ Never expose API keys in frontend code. Our example does this for simplicity, but production apps should proxy requests through a backend, where keys can be stored securely. Your backend should also implement rate limiting and user authentication before forwarding requests to the LLM API to prevent abuse.
Now let’s parse this stream. For that, we will implement an Async Generator Function that calls openChatStream to obtain the ReadableStream to consume, then decodes and parses each chunk from the stream and yields its text content.
According to the OpenAI chat streaming documentation, the basic chat completion chunk object for a text response is returned with the following format:
interface ChunkData {
id: string,
object: string,
created: number,
model: string,
system_fingerprint: string,
choices: Array<{
index: number,
delta: {
role: "assistant" | "user",
content: string
},
}>,
};

So, in order to extract the actual text content from a chunk structured like that, we need to parse the chunk’s data line as JSON and access data.choices[0].delta.content. This is what the parsing looks like in our example implementation:
async function* parseStream(messages) {
const stream = await openChatStream(messages);
const reader = stream.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() || "";
for (const line of lines) {
if (line.startsWith("data: ")) {
const data = line.slice(6);
if (data === "[DONE]") return;
const parsedData = JSON.parse(data);
yield parsedData.choices[0].delta.content ?? "";
}
}
}
}

Now let’s get these chunks to our interface. We will start by defining some basic variables and functions to help us with DOM manipulation:
const messageInput = document.querySelector("#message-input");
const sendButton = document.querySelector("#send-button");
const threadContainer = document.querySelector("#container #thread");
const createUserMessageElement = (message) => {
const messageElement = document.createElement("div");
messageElement.classList.add("user-message");
messageElement.textContent = message;
threadContainer.appendChild(messageElement);
};
const createAssistantMessageElement = () => {
const messageElement = document.createElement("div");
messageElement.classList.add("assistant-message");
threadContainer.appendChild(messageElement);
};
const getCurrentAssistantMessage = () => {
const lastMessage = threadContainer.querySelector(
".assistant-message:last-child",
);
return lastMessage;
};

Then, we are going to define an event handler for our text input box, to handle sending the message and manipulating the DOM to render the message chunks as they are consumed from the stream. This handler function needs to:
- Get the message value from the input box and append it as a user message to a global messages array (that will be used in the API request);
- Create the user and assistant message elements on the DOM for rendering the text;
- Create an Async Generator Object by calling our previously defined parseStream function;
- Asynchronously iterate over it with a for await loop, getting the text content returned from each chunk, appending it to the final text message, and updating the UI to display the complete text;
- At the end of the stream, push the completed assistant message to the global messages array to be included in subsequent requests.
The final handler implementation looks like this:
let streamedMessage = "";
let messages = [];
const handleSendMessage = async () => {
const message = messageInput.value;
if (message) {
messageInput.value = "";
messages.push({
role: "user",
content: message,
});
createUserMessageElement(message);
createAssistantMessageElement();
const lastAssistantMessage = getCurrentAssistantMessage();
const messageChunks = parseStream(messages);
for await (const chunk of messageChunks) {
streamedMessage += chunk;
lastAssistantMessage.textContent = streamedMessage;
threadContainer.scrollTop = threadContainer.scrollHeight;
}
streamedMessage = "";
messages.push({
role: "assistant",
content: lastAssistantMessage.textContent,
});
}
};
sendButton.addEventListener("click", handleSendMessage);

And here is what the interface looks like when we send a message:

The repository with this working example is available here.
Considerations for production
While our minimal example works fine for demonstration purposes, when implementing a feature like this for a production app, there are several essential considerations to make. Let’s mention some of them:
Error Handling and Recovery
Network failures and malformed responses can happen in production. The stream consumption should be wrapped in try-catch blocks to handle exceptions and define recovery mechanisms in case they occur.
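A minimal sketch of that isolation, working over any async iterable of chunks (such as the parseStream generator from our example); the consumeSafely and flakyChunks names are illustrative:

```javascript
// Consumes an async iterable of chunks, funnelling any mid-stream failure
// (network errors, malformed JSON, aborted requests) into a single result
// object instead of letting it escape. Illustrative helper.
async function consumeSafely(chunks, onChunk) {
  try {
    for await (const chunk of chunks) {
      onChunk(chunk);
    }
    return { ok: true };
  } catch (error) {
    return { ok: false, error };
  }
}

// Demonstration with a generator that fails mid-stream
async function* flakyChunks() {
  yield "Hello";
  yield ", wor";
  throw new Error("connection reset");
}

consumeSafely(flakyChunks(), (chunk) => console.log(chunk)).then((result) => {
  // The partial text was still rendered; the error was captured for recovery
  console.log(result.ok); // false
});
```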
Request Cancellation
Users expect to be able to cancel the agent’s generation mid-stream at any time. This can be implemented with the AbortController interface, passing its signal to the fetch request. This prevents wasting tokens when users no longer need the completion to finish, and improves the overall user experience.
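A sketch of how this could be wired up; the key part is passing controller.signal to fetch, so that abort() cancels the request and rejects any pending reader.read() with an AbortError. The factory shape and names here are illustrative:

```javascript
// Hypothetical cancellable chat request wrapper
function createCancellableRequest() {
  const controller = new AbortController();
  return {
    // Would be called like openChatStream, with the abort signal attached
    start: (url, payload) =>
      fetch(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
        signal: controller.signal, // aborting cancels the in-flight stream
      }),
    cancel: () => controller.abort(),
    isCancelled: () => controller.signal.aborted,
  };
}

// e.g. a "stop generating" button handler would simply call cancel()
const request = createCancellableRequest();
request.cancel();
console.log(request.isCancelled()); // true
```

After abort(), the for await loop consuming the stream throws, which is another reason to wrap consumption in a try-catch as discussed above.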
Performance optimization
Updating the DOM on every tiny chunk can cause performance issues, especially with fast-streaming models. Consider batching updates using requestAnimationFrame to sync DOM updates with the browser’s repaint cycle. This significantly reduces CPU usage and makes scrolling smoother, especially important on lower-end devices.
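One possible shape for this batching, sketched as a small factory (the name createBatchedRenderer is our own; the scheduler defaults to requestAnimationFrame in the browser and is injectable elsewhere):

```javascript
// Frame-batched rendering: chunks accumulate in memory and the render
// callback runs at most once per scheduled frame.
function createBatchedRenderer(render, schedule = (cb) => requestAnimationFrame(cb)) {
  let pending = "";
  let frameQueued = false;
  return (chunk) => {
    pending += chunk;
    if (frameQueued) return; // a flush is already scheduled for this frame
    frameQueued = true;
    schedule(() => {
      frameQueued = false;
      render(pending);
    });
  };
}

// Usage inside the for await loop, replacing the direct textContent writes:
// const appendChunk = createBatchedRenderer((text) => {
//   lastAssistantMessage.textContent = text;
// });
// for await (const chunk of messageChunks) appendChunk(chunk);
```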
You could also consider using a dedicated library to handle the rendering or the stream consumption. For example, in the Looqbox interface we used the Assistant-ui library, that automatically handles the rendering of chunks based on a custom structured Async Generator Function we implemented.
Retry logic
It is important to implement handling for when an SSE request fails mid-stream, to prevent incomplete responses. This should include exponential backoff logic, retrying with increasing delays until the retry budget is exhausted. It handles temporary network issues and ensures a smooth experience for the user.
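A sketch of such a helper, with illustrative defaults (3 retries, 500 ms base delay); note that the simplest recovery for a mid-stream failure is retrying the whole request, since resuming a partial completion would need server-side support:

```javascript
// Retries an async operation with exponential backoff: 500ms, 1s, 2s...
// maxRetries and baseDelayMs are illustrative defaults.
async function withRetry(operation, maxRetries = 3, baseDelayMs = 500) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= maxRetries) throw error; // retry budget exhausted
      const delay = baseDelayMs * 2 ** attempt; // doubles on every attempt
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: const stream = await withRetry(() => openChatStream(messages));
```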
User experience
The interface should disable the “send” button while a response is being streamed, to prevent the user from sending multiple requests at once. Also, since most LLM responses come in Markdown format, it is important to implement a dedicated component responsible for rendering this format correctly.
References
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Iterators_and_generators
https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events
https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream
https://console.groq.com/docs/api-reference#chat
https://platform.openai.com/docs/api-reference/chat-streaming/streaming