Pattern
Difficulty: Intermediate

Stop Generation

Let users abort a streaming model response with an AbortController, keep the partial text, and actually stop the work on the server.

By Den Odell


Problem

The model misread the question and is now confidently generating six paragraphs about the wrong thing. The user can see it going wrong in real time, and they can do nothing but watch it finish. There’s no stop button, so they wait out the mistake, or they refresh the page and lose the whole conversation.

Even when there is a “stop” button, it’s often a lie. It hides the streaming text and re-enables the composer, but the request is still open and tokens are still being generated and billed on the server. The user thinks they stopped it; your infrastructure knows they didn’t.

A model response is more worth interrupting than almost anything else a UI does. It's slow, so there's time to react; it changes from one run to the next, so it often goes somewhere the user doesn't want; and it costs money per token, so wasted generation is wasted money. Without a real stop, the user is left trapped watching a mistake unfold.

Solution

Create an AbortController when the generation starts and pass its signal to fetch. Keep the controller around while the response streams. When the user clicks stop, call controller.abort(). If the response headers haven't arrived yet, the pending fetch rejects; once the body is streaming, the next reader.read() rejects instead, which ends the read loop. Either way the rejection is an AbortError: catch it and treat it as a normal outcome, not a failure.
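Stripped of any framework, those mechanics fit in one function. The sketch below assumes Node 18 or later (for the global fetch, AbortController, and TextDecoder). The names streamGeneration, startDemoServer, and demoClientStop, the local test server, and its token timing are all hypothetical stand-ins for a real chat endpoint.

```javascript
const http = require('node:http');

// Stream a POST response into onText until it ends or `signal` aborts.
// Resolves with { status, text }; an abort resolves as 'stopped' instead of rejecting.
async function streamGeneration(url, body, { signal, onText }) {
  let text = '';
  try {
    const res = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal,
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    while (true) {
      // Rejects with an AbortError once the signal fires mid-stream
      const { value, done } = await reader.read();
      if (done) break;
      text += decoder.decode(value, { stream: true });
      onText(text);
    }
    return { status: 'complete', text };
  } catch (err) {
    // Stopping is a normal outcome: report it and keep whatever text already arrived
    return { status: err.name === 'AbortError' ? 'stopped' : 'error', text };
  }
}

// Hypothetical local endpoint that writes one "token" every 50 ms, ten in total.
function startDemoServer() {
  const server = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/plain; charset=utf-8' });
    let i = 0;
    const timer = setInterval(() => {
      res.write(`tok${i} `);
      i += 1;
      if (i === 10) {
        clearInterval(timer);
        res.end();
      }
    }, 50);
    res.on('close', () => clearInterval(timer)); // stop writing if the client leaves
  });
  return new Promise(resolve => server.listen(0, '127.0.0.1', () => resolve(server)));
}

// Simulate a user pressing stop as soon as the first token is on screen.
async function demoClientStop() {
  const server = await startDemoServer();
  const controller = new AbortController();
  const url = `http://127.0.0.1:${server.address().port}/api/chat`;
  const result = await streamGeneration(url, { prompt: 'hello' }, {
    signal: controller.signal,
    onText: () => controller.abort(),
  });
  server.closeAllConnections?.(); // Node 18.2+; tidies up any lingering socket
  server.close();
  return result;
}
```

Because this helper resolves rather than rejects on a stop, callers never need their own AbortError check, and the partial text travels with the status.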

Crucially, keep the tokens that already arrived. Aborting should mark the message stopped and leave its partial content in place, so the user keeps the half-answer they interrupted, and can hand it to Response Regeneration if they want a fresh try. Swap the Prompt Input’s send button for the stop button while streaming, and swap it back when the generation ends for any reason.
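The rules in that paragraph (keep the partial text, mark it stopped, show stop only while streaming) can be pinned down as a small state machine. This is a hypothetical sketch in the pure (state, action) shape that React's useReducer expects; the action names start, token, finish, abort, and fail are illustrative and not part of any library.

```javascript
// Hypothetical reducer for one assistant turn. Pure, so it also works with useReducer.
const initialTurn = { status: 'idle', text: '' };

function generationReducer(state, action) {
  switch (action.type) {
    case 'start':
      return { status: 'streaming', text: '' };
    case 'token':
      // Ignore tokens that straggle in after the turn has ended for any reason
      return state.status === 'streaming'
        ? { ...state, text: state.text + action.text }
        : state;
    case 'finish':
      return state.status === 'streaming' ? { ...state, status: 'complete' } : state;
    case 'abort':
      // Keep the partial text; only the status changes
      return state.status === 'streaming' ? { ...state, status: 'stopped' } : state;
    case 'fail':
      return state.status === 'streaming' ? { ...state, status: 'error' } : state;
    default:
      return state;
  }
}

// The composer shows exactly one of the two buttons, derived from status alone.
function composerButton(state) {
  return state.status === 'streaming' ? 'stop' : 'send';
}
```

Since every terminal action only applies while streaming, whichever of finish or abort arrives first wins and the other becomes a no-op, which is one way to settle the near-completion race.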

Aborting the fetch stops the browser reading, but on many setups the server keeps generating unless you tell it otherwise. Wire your server to notice when the client's connection closes and abort the upstream model call too. In Express, listen for res.on('close') and check res.writableEnded; don't rely on req.on('close'), which in Node 16 and later fires as soon as the request body has been read rather than when the client disconnects. Without that, the user's stop saves them from reading the tokens but not from paying for them.
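To watch that close handling work end to end, here is a self-contained sketch using Node's built-in http module rather than Express. The upstream "model" is a hypothetical async generator, fakeModelStream, that honours an AbortSignal in place of a real SDK call; the helper names, timings, and token counts are arbitrary. It listens for close on the response object, because in Node 16 and later the request's close event fires once the request body has been read, not when the client goes away.

```javascript
const http = require('node:http');

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical upstream model: one token every 20 ms, up to 50, stopping as soon
// as `signal` aborts. A real SDK would take the signal as a request option instead.
async function* fakeModelStream(signal, stats) {
  for (let i = 0; i < 50; i++) {
    await sleep(20);
    if (signal.aborted) return;
    stats.generated += 1;
    yield `tok${i} `;
  }
}

function startProxy(stats) {
  const server = http.createServer(async (req, res) => {
    const upstream = new AbortController();
    // 'close' on the response fires when the connection goes away. If the response
    // hasn't been ended yet, the client disconnected, so cancel the upstream work.
    res.on('close', () => {
      if (!res.writableEnded) upstream.abort();
    });
    res.writeHead(200, { 'Content-Type': 'text/plain; charset=utf-8' });
    for await (const token of fakeModelStream(upstream.signal, stats)) {
      res.write(token);
    }
    if (!upstream.signal.aborted) res.end();
  });
  return new Promise(resolve => server.listen(0, '127.0.0.1', () => resolve(server)));
}

// Read one token, press stop, then confirm the upstream generator stopped producing.
async function demoServerStop() {
  const stats = { generated: 0 };
  const server = await startProxy(stats);
  const controller = new AbortController();
  const res = await fetch(`http://127.0.0.1:${server.address().port}/`, {
    signal: controller.signal,
  });
  await res.body.getReader().read(); // first token arrives
  controller.abort();                // the user presses stop
  await sleep(100);                  // let the server notice the disconnect
  const atStop = stats.generated;
  await sleep(200);                  // without the close handler, ~10 more tokens would appear here
  server.closeAllConnections?.();
  server.close();
  return { atStop, later: stats.generated };
}
```

Assuming your provider's SDK accepts an AbortSignal, swapping the fake generator for the real call only changes where the signal is passed; the close-to-abort wiring stays the same.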

Example

The controller wiring in React, preserving partial output on abort, and closing the loop on the server with Express.

Wiring the Abort

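import { useEffect, useRef, useState } from 'react';

function useGeneration() {
  const [text, setText] = useState('');
  const [status, setStatus] = useState('idle'); // idle | streaming | complete | stopped | error
  const controllerRef = useRef(null);

  // Abort any in-flight generation if the component unmounts
  useEffect(() => () => controllerRef.current?.abort(), []);

  const start = async (prompt) => {
    // Cancel a previous turn that is still streaming before starting a new one
    controllerRef.current?.abort();
    const controller = new AbortController();
    controllerRef.current = controller;
    // Only the current turn may update state; a superseded turn stays silent
    const isCurrent = () => controllerRef.current === controller;

    setText('');
    setStatus('streaming');

    try {
      const res = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt }),
        signal: controller.signal,
      });
      if (!res.ok) throw new Error(`Request failed: ${res.status}`);

      const reader = res.body.getReader();
      const decoder = new TextDecoder();

      while (true) {
        const { value, done } = await reader.read(); // rejects with AbortError on stop
        if (done) break;
        const chunk = decoder.decode(value, { stream: true });
        if (isCurrent()) setText(prev => prev + chunk);
      }
      if (isCurrent()) setStatus('complete');
    } catch (err) {
      // Abort is an expected outcome, not an error; the partial text stays in place
      if (isCurrent()) setStatus(err.name === 'AbortError' ? 'stopped' : 'error');
    } finally {
      // Drop the reference so a later stop() can't hit a finished request
      if (isCurrent()) controllerRef.current = null;
    }
  };

  const stop = () => controllerRef.current?.abort();

  return { text, status, start, stop };
}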

The Stop Control

The button reflects the live state and reads clearly to assistive tech.

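function GenerationControls({ status, onStop }) {
  // Only render while a response is streaming; the composer shows Send otherwise
  if (status !== 'streaming') return null;
  return (
    // StopIcon is assumed to be your own icon component. aria-hidden keeps it out of
    // the accessible name, so the visible text "Stop generating" is what gets announced.
    <button type="button" className="stop" onClick={onStop}>
      <StopIcon aria-hidden="true" /> Stop generating
    </button>
  );
}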

Keeping the Partial Answer

Aborting should never blank the message. Mark it stopped and leave the text, with an affordance to regenerate.

function AssistantMessage({ text, status, onRegenerate }) {
  return (
    <article data-status={status}>
      <div className="content">{text}</div>
      {status === 'stopped' && (
        <footer className="stopped-note">
          <span>Stopped.</span>
          <button type="button" onClick={onRegenerate}>Regenerate</button>
        </footer>
      )}
    </article>
  );
}

Stopping the Work on the Server

Aborting the browser fetch closes the connection; detect that on the server and cancel the upstream model call so generation actually halts.

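// Express handler proxying a streaming model API. model.chat is a placeholder for your
// provider's SDK, assumed to accept an AbortSignal and return an async iterable of text.
app.post('/api/chat', express.json(), async (req, res) => {
  const upstream = new AbortController();

  // Listen on res, not req: since Node 16, req's 'close' fires once the body is read.
  // A 'close' before res.end() means the browser went away, so stop generating too.
  res.on('close', () => {
    if (!res.writableEnded) upstream.abort();
  });

  try {
    const modelStream = await model.chat({
      messages: [{ role: 'user', content: req.body.prompt }],
      signal: upstream.signal,
    });
    res.type('text/plain');
    for await (const chunk of modelStream) res.write(chunk);
    res.end();
  } catch (err) {
    // An abort is the user's stop reaching upstream; anything else is a real failure
    if (upstream.signal.aborted) return;
    if (res.headersSent) res.end();
    else res.sendStatus(502);
  }
});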

Benefits

  • Users escape a response that’s going the wrong way instead of waiting out a visible mistake.
  • Wiring abort through to the server stops token generation, so you don’t pay for output nobody reads.
  • Preserving partial text means stopping is non-destructive: the user keeps what was useful and can regenerate the rest.
  • It builds directly on the Streaming Response reader and signal you already have, so it takes little to add.
  • A clear stop control makes the whole interface feel controllable rather than something that runs away from the user.

Tradeoffs

  • Aborting the fetch doesn’t stop server-side generation by itself; the connection-close handling is easy to forget and invisible when it’s missing.
  • AbortError arrives as a rejected promise, so forgetting to special-case it turns a normal stop into a spurious error state.
  • There’s a small race near completion where the user hits stop just as the last token arrives, so decide whether that lands as stopped or complete.
  • Partial responses can be misleading if truncated mid-sentence, so the stopped state needs to be visually obvious.
  • Cleaning up the controller reference matters; a stale controller from a previous turn can abort the wrong request if you’re not careful.
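The stale-controller problem in that last bullet comes down to ownership: which turn a given stop, status update, or cleanup belongs to. One framework-free way to make that explicit is a hypothetical createTurnTracker helper (not from any library) that hands out a fresh controller per turn, aborts the previous turn when a new one begins, and lets late callbacks from an old turn find out they've been superseded.

```javascript
// Hypothetical helper: owns the AbortController for the *current* turn only.
function createTurnTracker() {
  let current = null;
  return {
    // Begin a new turn, aborting any turn still in flight.
    begin() {
      current?.controller.abort();
      const turn = { controller: new AbortController() };
      // Late callbacks check this before touching shared UI state
      turn.isCurrent = () => current === turn;
      // Cleanup from an old turn must not clear a newer turn's controller
      turn.end = () => {
        if (current === turn) current = null;
      };
      current = turn;
      return turn;
    },
    // The stop button: aborts the current turn, never an older or finished one.
    stop() {
      current?.controller.abort();
    },
  };
}
```

A streaming loop would call begin() at the start of each request, pass turn.controller.signal to fetch, guard every state update with turn.isCurrent(), and call turn.end() in a finally block.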

Summary

A streaming model response is slow, often wrong, and billed per token, so it is worth letting users interrupt. Hold an AbortController while streaming, abort it on click, catch the AbortError as a normal stopped outcome, and keep the partial text. Then close the loop on the server so the generation actually ends. Otherwise you’ve only hidden the tokens you’re still paying for.
