> Stop streaming tokens. Start running workflows.
Most agent apps start the same way. A /chat endpoint, a model call, and a stream of tokens going back to the browser. It works, until it doesn't.
The moment you want to render a chart mid-response, gate a step behind a tool approval, or let the user close the tab and come back to a half-finished task, a raw stream falls apart. You're trying to shove too much into one shape.
This is where workflows start earning their keep. I've been building mine on Mastra, and the split has been clean. Agents for the fuzzy thinking, workflows for anything I want to control, replay, or resume.
## The streaming-only trap
A plain streaming agent call looks like this:
```ts
const stream = await agent.stream("summarize this PR");
for await (const chunk of stream) {
  process.stdout.write(chunk.textDelta);
}
```

Fast, cheap, and fine for a chatbot. But the contract is: tokens, top to bottom, until done. You don't get clean step boundaries. You don't get a resume point. You can't easily say "step 2 is a React component, step 3 is a table."
## What a workflow gives you
A Mastra workflow is a graph of typed steps. Each step has an input schema, an output schema, and an execute. You chain them with .then() and finish with .commit().
```ts
import { createWorkflow, createStep } from "@mastra/core/workflows";
import { z } from "zod";

const analyzePR = createStep({
  id: "analyze-pr",
  inputSchema: z.object({ prUrl: z.string().url() }),
  outputSchema: z.object({
    files: z.array(z.string()),
    risk: z.enum(["low", "med", "high"]),
  }),
  execute: async ({ inputData }) => {
    // call an agent, a tool, whatever
    return { files: ["app/api/auth.ts"], risk: "med" };
  },
});

const writeSummary = createStep({
  id: "write-summary",
  inputSchema: z.object({
    files: z.array(z.string()),
    risk: z.string(),
  }),
  outputSchema: z.object({ markdown: z.string() }),
  execute: async ({ inputData }) => {
    return {
      markdown: `### Risk: ${inputData.risk}\n- ${inputData.files.join("\n- ")}`,
    };
  },
});

export const reviewWorkflow = createWorkflow({
  id: "pr-review",
  inputSchema: z.object({ prUrl: z.string().url() }),
  outputSchema: z.object({ markdown: z.string() }),
})
  .then(analyzePR)
  .then(writeSummary)
  .commit();
```

Two things to notice. First, each step has a schema, so you're no longer parsing intent out of a token stream. Second, every run has an ID, which is the thing that unlocks everything else.
```ts
const run = await reviewWorkflow.createRun();
const result = await run.start({
  inputData: { prUrl: "https://github.com/x/y/pull/1" },
});
console.log(run.runId); // persist this
console.log(result.result); // typed to outputSchema
```

## Streaming the workflow, not just tokens
You don't lose streaming. You trade a flat token stream for a structured one: step starts, step outputs, step completions.
```ts
const stream = await run.stream({ inputData: { prUrl } });
for await (const chunk of stream) {
  // chunk tells you which step, status, and payload.
  // perfect for driving generative UI on the client.
  ui.dispatch(chunk);
}
```

On the frontend this is where generative UI stops being a party trick. Step 1 emits JSON, render a `<FileList />`. Step 2 emits markdown, render `<Markdown />`. Step 3 emits a chart spec, render `<Chart />`. The workflow tells you what it's emitting and when, so the UI can branch.
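The exact chunk shape varies with how you've set things up, so treat this as a sketch under assumed field names (`stepId`, `status`, `payload` are hypothetical): client-side state is just a fold over the stream, one entry per step.

```typescript
// Hypothetical chunk shape -- adjust to the actual events your
// Mastra version emits on the workflow stream.
type StepChunk = {
  stepId: string;
  status: "running" | "success";
  payload?: unknown;
};

type StepView = { stepId: string; status: string; payload?: unknown };

// Fold a stream of chunks into per-step view state. New step ids append;
// repeat chunks for the same step update status and payload in place.
function reduceChunks(views: StepView[], chunk: StepChunk): StepView[] {
  const existing = views.find((v) => v.stepId === chunk.stepId);
  if (!existing) return [...views, { ...chunk }];
  return views.map((v) =>
    v.stepId === chunk.stepId
      ? { ...v, status: chunk.status, payload: chunk.payload ?? v.payload }
      : v
  );
}
```

A React client would hold the `StepView[]` in state and re-render the branch for each step as its status flips.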
## Going further: return UI blocks, not markdown
Once the output is typed, you can push it past plain markdown and have the workflow describe the UI itself. A discriminated union of "UI blocks" is the sweet spot. The model picks a block type, the client has a registry that maps each type to a real component.
```ts
import { z } from "zod";
import { createStep } from "@mastra/core/workflows";

const uiBlock = z.discriminatedUnion("type", [
  z.object({ type: z.literal("markdown"), text: z.string() }),
  z.object({
    type: z.literal("card"),
    title: z.string(),
    body: z.string(),
  }),
  z.object({
    type: z.literal("chart"),
    kind: z.enum(["bar", "line"]),
    data: z.array(z.object({ x: z.string(), y: z.number() })),
  }),
  z.object({
    type: z.literal("approval"),
    prompt: z.string(),
    actionId: z.string(),
  }),
]);

const renderStep = createStep({
  id: "render",
  inputSchema: z.object({ files: z.array(z.string()), risk: z.string() }),
  outputSchema: z.object({ blocks: z.array(uiBlock) }),
  execute: async ({ inputData }) => ({
    blocks: [
      { type: "card", title: "Risk", body: inputData.risk },
      { type: "markdown", text: `Files:\n- ${inputData.files.join("\n- ")}` },
      { type: "approval", prompt: "Merge?", actionId: "merge-pr" },
    ],
  }),
});
```

On the client, a tiny switch is the whole renderer:
```tsx
const registry = {
  markdown: ({ text }) => <Markdown source={text} />,
  card: ({ title, body }) => <Card title={title}>{body}</Card>,
  chart: ({ kind, data }) => <Chart kind={kind} data={data} />,
  approval: ({ prompt, actionId }) => (
    <ApprovalButton id={actionId} label={prompt} />
  ),
};

export function BlockRenderer({ blocks }) {
  return blocks.map((b, i) => {
    const C = registry[b.type];
    return C ? <C key={i} {...b} /> : null;
  });
}
```

Why bother with the extra schema? Because the model is now picking from a closed set of components you've already designed and tested. No rogue HTML, no half-rendered markdown tables, no jailbreak-via-image. The output is as predictable as the schema you wrote.
One gotcha: if you stream partial objects, validate before rendering, or you'll flash half-built blocks. Easiest is to render per completed step, not per token.
## Resume where the user left off
Because the run has an ID and Mastra snapshots state after each step, a workflow is resumable by default. Close the tab, refresh, come back tomorrow, pick up at the last completed step.
```ts
// later, same runId
const resumed = await run.resumeStream({
  resumeData: { approved: true },
});
for await (const chunk of resumed) {
  ui.dispatch(chunk);
}

// or just fetch the final state
const result = await reviewWorkflow.runById(runId);
```

Store the `runId` against the user and you have a persistence layer for free. No custom checkpointing, no "where was I?" logic.
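"Persistence layer for free" still means putting the `runId` somewhere. A minimal sketch of that bookkeeping, with an in-memory `Map` standing in for whatever database you actually use (all names here are hypothetical):

```typescript
// One record per workflow run, keyed by user.
type RunRecord = {
  runId: string;
  workflowId: string;
  status: "running" | "done";
};

const runsByUser = new Map<string, RunRecord[]>();

// Upsert: a second save for the same runId replaces the old record,
// so flipping status to "done" is the same call as creating the run.
function saveRun(userId: string, record: RunRecord): void {
  const runs = runsByUser.get(userId) ?? [];
  runsByUser.set(userId, [
    ...runs.filter((r) => r.runId !== record.runId),
    record,
  ]);
}

// Everything still in flight -- these are the runs to resume on reload.
function resumableRuns(userId: string): RunRecord[] {
  return (runsByUser.get(userId) ?? []).filter((r) => r.status === "running");
}
```

Swap the `Map` for a table with a unique index on `(userId, runId)` and the shape carries over directly.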
## Faking threads with workflows
Mastra has first-class memory and threads for chat-style history:
```ts
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";

const agent = new Agent({
  id: "reviewer",
  memory: new Memory({
    options: { observationalMemory: true },
  }),
});

const memory = await agent.getMemory();
const thread = await memory?.createThread({
  resourceId: "user_123",
  title: "PR review session",
});
```

But threads and workflows aren't either/or; they compose. A thread is the conversation. A workflow is one turn inside it. I store both:
```ts
// schema sketch
type Turn = {
  threadId: string; // from Memory
  runId: string; // from workflow.createRun()
  workflowId: "pr-review" | "deploy" | "research";
  status: "running" | "done" | "paused";
};
```

On reload, I fetch the thread, list its turns, and for any turn still running I call `resumeStream(runId)` to rehydrate the UI. The user sees a conversation. Underneath, each message is a deterministic workflow with its own checkpoint.
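Given that `Turn` record, deciding what to rehydrate on reload is a one-line filter. A sketch, assuming paused turns (waiting on an approval) should be resumed alongside running ones:

```typescript
type Turn = {
  threadId: string; // from Memory
  runId: string; // from workflow.createRun()
  workflowId: "pr-review" | "deploy" | "research";
  status: "running" | "done" | "paused";
};

// Turns that still have a live workflow behind them: resume these on
// reload. Done turns just render from their stored results.
function turnsToResume(turns: Turn[]): Turn[] {
  return turns.filter((t) => t.status === "running" || t.status === "paused");
}
```

The reload handler then loops over `turnsToResume(turns)` and calls `resumeStream` for each `runId`.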
## When to reach for which
| Situation | Use |
|---|---|
| Open-ended chat, tool-picking, vibes | Agent + .stream() |
| Known steps, typed outputs, generative UI | Workflow |
| User might disconnect mid-task | Workflow (for the resume) |
| Steps need approval / human-in-the-loop | Workflow (suspend / resume) |
| Prototype, one-shot answers | Agent |
Mastra's own framing is blunt: "don't ask the model to figure out checkout. Define it." Same applies to anything with a known shape.
## Honest tradeoffs

### What you gain
- Predictable responses. Zod schemas constrain what the model can return, so your UI never has to defend against surprise shapes.
- Typed inputs and outputs per step. No more prompt-engineering JSON.
- A `runId` you can resume, replay, or debug.
- A clean path to generative UI: workflow emits a JSON block list, client renders from a component registry.
- Easy human-in-the-loop via step suspension.
### What it costs
- More boilerplate up front: schemas, step files, a workflow definition.
- Less flexible mid-turn. The model can't invent a new step.
- You need persistence wired up. Mastra gives you adapters, but it's still a decision.
- Harder to explain to teammates who think "agent === chatbot".
## The rule I've landed on
If the output has a shape I care about, or if losing the tab mid-generation would ruin the UX, it's a workflow. Everything else can stay a stream.
Streaming isn't wrong. It's just the wrong primitive once your agent starts doing real work.