Streaming and Suspense in the App Router: Faster Perceived Loads

Streaming lets the App Router send HTML before the data is ready. How Suspense, loading.tsx, and parallel fetching cut perceived load time without rewrites.

A dashboard I inherited took 2.3 seconds to send its first byte. Not to finish loading — to send the first byte. The page awaited three database calls before returning a single character of HTML, so the browser sat on a blank white screen while the server worked. The fix wasn't a faster database or a bigger box. It was deleting four awaits and wrapping the slow widgets in <Suspense>. TTFB dropped to 180ms, the shell painted immediately, and the slow widgets filled in as their data arrived. Same data, same queries, same server — radically different feel.

That's streaming. It's the single highest-leverage performance technique in the Next.js App Router, and most teams either don't use it or use it wrong. Here's how it actually works and where it bites.

The blocking model you're probably still running

In a traditional server-rendered page, the server is all-or-nothing. It runs every data fetch, renders the complete HTML tree, and only then flushes the response. Your time-to-first-byte is gated by your slowest dependency. One sluggish analytics query at the bottom holds the entire document hostage.

Here's the version I keep finding in production:

// app/dashboard/page.tsx — the blocking anti-pattern
export default async function DashboardPage() {
  const user = await getUser();          // 40ms
  const revenue = await getRevenue();    // 900ms — the slow one
  const activity = await getActivity();  // 120ms
 
  return (
    <main>
      <Header user={user} />
      <RevenueChart data={revenue} />
      <ActivityFeed items={activity} />
    </main>
  );
}

Two problems here, and they compound. First, the awaits are sequential: getActivity doesn't start until getRevenue resolves, even though the two are independent. That's a request waterfall, and it means 40 + 900 + 120 = 1060ms of data fetching before anything renders. Second, even after you fix the waterfall, the whole page still blocks on the 900ms revenue query.

Streaming attacks both.

How the server actually streams HTML

The App Router renders React Server Components to a stream, not a string. Under the hood it relies on React's streaming server renderer (renderToReadableStream and its Node counterpart renderToPipeableStream, both available since React 18), which flushes HTML in chunks as the tree resolves. When the renderer hits a <Suspense> boundary whose children are still pending, it emits the fallback immediately as part of the initial document, marks that slot with a placeholder, and keeps the connection open. When the suspended data resolves, the server sends an out-of-order chunk containing the real HTML plus a tiny inline <script> that swaps the fallback for the real content in place. No client round-trip, no hydration wait for the rest of the page.

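Transport-wise this is an ordinary streamed HTTP response (chunked transfer encoding on HTTP/1.1, data frames on HTTP/2) plus React's small streaming runtime. It works before the client bundle has loaded or hydrated: the fallbacks and the streamed-in content are real server HTML, and the swap is done by the inline script, not by React on the client. That's the part people miss. One limit: with JavaScript disabled entirely, the late chunks still arrive in the document but nothing swaps them into place, so the fallbacks stay on screen.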

The mental model: everything outside a Suspense boundary is your shell and must be fast. Everything inside can be slow and arrive late.
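
To make the out-of-order part concrete, here is a toy model in plain TypeScript. It is a sketch under loud assumptions: streamPage, the Slot type, the slot-/data- ids, and the swap() call are all made up for illustration and are not React's real wire format. What it does show is the ordering: the shell and fallbacks go out first, then one chunk per slot in whatever order the data resolves.

```typescript
// Toy model of out-of-order streaming. The chunk markup and the swap() call
// are illustrative assumptions, not React's real wire format.
type Slot = { id: string; fallback: string; content: Promise<string> };

async function* streamPage(shellHtml: string, slots: Slot[]): AsyncGenerator<string> {
  // 1. Flush the shell at once, with each fallback sitting in its placeholder.
  const placeholders = slots
    .map((s) => `<div id="slot-${s.id}">${s.fallback}</div>`)
    .join("");
  yield shellHtml + placeholders;

  // 2. Tag every pending promise with its slot id so we know which one settled.
  const pending = new Map<string, Promise<{ id: string; html: string }>>();
  for (const s of slots) {
    pending.set(s.id, s.content.then((html) => ({ id: s.id, html })));
  }

  // 3. Emit a chunk for whichever slot resolves next, regardless of document order.
  while (pending.size > 0) {
    const { id, html } = await Promise.race(Array.from(pending.values()));
    pending.delete(id);
    yield `<template id="data-${id}">${html}</template><script>swap("${id}")</script>`;
  }
}

// Demo helper: a promise that resolves with `value` after `ms` milliseconds.
function after<T>(ms: number, value: T): Promise<T> {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Streams a two-slot page and collects the chunks in arrival order.
async function collectChunks(): Promise<string[]> {
  const chunks: string[] = [];
  const slots: Slot[] = [
    { id: "revenue", fallback: "chart skeleton", content: after(90, "<Chart/>") },
    { id: "activity", fallback: "feed skeleton", content: after(12, "<Feed/>") },
  ];
  for await (const chunk of streamPage("<header>Dashboard</header>", slots)) {
    chunks.push(chunk);
  }
  return chunks;
}
```

Calling collectChunks() yields three chunks: the shell with both skeletons, then the activity chunk, then the revenue chunk, even though revenue comes first in the document.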

Granular Suspense beats one big spinner

Here's the same dashboard, rewritten to stream. The shell — header and layout — renders instantly. Revenue and activity each get their own boundary so they stream independently, and a slow revenue query never blocks the fast feed.

// app/dashboard/page.tsx
import { Suspense } from "react";
import { ChartSkeleton, FeedSkeleton } from "./skeletons";
 
export default async function DashboardPage() {
  const user = await getUser(); // 40ms — part of the shell, intentionally awaited
 
  return (
    <main>
      <Header user={user} />
      <Suspense fallback={<ChartSkeleton />}>
        <RevenueChart />
      </Suspense>
      <Suspense fallback={<FeedSkeleton />}>
        <ActivityFeed />
      </Suspense>
    </main>
  );
}

Notice the data fetching moved into the components. Each is now an async Server Component fetching its own data:

// app/dashboard/RevenueChart.tsx
async function RevenueChart() {
  const revenue = await getRevenue(); // 900ms — but it no longer blocks the shell
  return <Chart data={revenue} />;
}

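When React renders the page, it sees RevenueChart suspend, emits <ChartSkeleton /> into the initial HTML, and moves on. In server time, the header and two skeletons are ready to flush once getUser resolves at ~40ms. The feed follows at ~160ms (40 + 120) and revenue at ~940ms (40 + 900); add your usual network overhead to each. The slowest query takes just as long as before, but the page feels instant because it was never blank.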

The granularity matters. If you wrap both widgets in a single shared boundary, the fast feed waits for the slow chart, because a Suspense boundary resolves only when all its children resolve. Smaller boundaries = more independent streaming. The tradeoff is layout shift: each boundary that pops in can nudge the page. Reserve space in your skeletons (fixed heights, aspect-ratio) so streamed content lands without tanking your CLS.
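
The "resolves only when all its children resolve" rule is easy to check with arithmetic. The sketch below is a plain TypeScript model, not anything React exposes: revealTimes and the Region type are hypothetical names, and the 120ms and 900ms figures are the query durations from the dashboard example, measured from the moment rendering starts.

```typescript
// Each region has a server-side ready time; a boundary reveals all of its
// regions together, at the moment the slowest one is ready.
type Region = { name: string; readyAtMs: number };

function revealTimes(boundaries: Region[][]): Map<string, number> {
  const result = new Map<string, number>();
  for (const boundary of boundaries) {
    const revealAt = Math.max(...boundary.map((r) => r.readyAtMs));
    for (const region of boundary) {
      result.set(region.name, revealAt);
    }
  }
  return result;
}

const feed: Region = { name: "ActivityFeed", readyAtMs: 120 };
const chart: Region = { name: "RevenueChart", readyAtMs: 900 };

const shared = revealTimes([[feed, chart]]);     // one boundary around both
const granular = revealTimes([[feed], [chart]]); // one boundary each

console.log(shared.get("ActivityFeed"));   // 900: the feed waits for the chart
console.log(granular.get("ActivityFeed")); // 120: the feed streams on its own
```

The chart reveals at 900ms either way; only the fast region gains from its own boundary, which is a reasonable test for whether a new boundary is worth the extra layout shift.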

loading.tsx is a Suspense boundary you didn't write

Next.js gives you a free, route-level boundary. Drop a loading.tsx next to a page.tsx and the framework wraps the page in <Suspense> with that file as the fallback:

// app/dashboard/loading.tsx
// HeaderSkeleton is assumed to live alongside the other skeletons
import { HeaderSkeleton, ChartSkeleton, FeedSkeleton } from "./skeletons";
 
export default function Loading() {
  return (
    <main>
      <HeaderSkeleton />
      <ChartSkeleton />
      <FeedSkeleton />
    </main>
  );
}

This fallback shows instantly on navigation while the page's async work runs, and — critically — it covers client-side navigations within the App Router too, giving you an instant loading state the moment a link is clicked. It's the coarsest boundary: the whole page. Use loading.tsx for the route-level shell, then add granular <Suspense> inside the page for per-widget streaming. They compose: the route shell shows first, then individual widgets stream into the rendered page.

One caveat: loading.tsx only applies to the segment's own page render. It won't show while a layout above the boundary resolves its data — put slow layout data behind its own <Suspense>.

The waterfall trap, and how to actually fetch in parallel

Streaming hides latency; it doesn't remove it. If your independent fetches still run sequentially, you've just made the waiting prettier. Kill waterfalls by starting promises before you await.

The naive sequential version:

// Sequential — total time = sum of all three
const profile = await getProfile(userId);
const orders = await getOrders(userId);
const recommendations = await getRecommendations(userId);

The fix is Promise.all, which kicks off all three and waits for the slowest:

// Parallel — total time = max of the three
const [profile, orders, recommendations] = await Promise.all([
  getProfile(userId),
  getOrders(userId),
  getRecommendations(userId),
]);
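
If you want to see the sum-versus-max difference without a database, a rough timing sketch like this one works. fakeQuery, elapsedMs, and compare are hypothetical stand-ins, and the delays are arbitrary small numbers so the demo runs quickly.

```typescript
// Hypothetical stand-in for a data call: resolves with `label` after `ms` milliseconds.
function fakeQuery(label: string, ms: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(label), ms));
}

// Runs `work` and returns how long it took in milliseconds.
async function elapsedMs(work: () => Promise<unknown>): Promise<number> {
  const start = Date.now();
  await work();
  return Date.now() - start;
}

async function compare(): Promise<{ sequential: number; parallel: number }> {
  // Sequential: each query starts only after the previous one has finished.
  const sequential = await elapsedMs(async () => {
    await fakeQuery("profile", 40);
    await fakeQuery("orders", 90);
    await fakeQuery("recommendations", 12);
  });

  // Parallel: all three timers are running before the first await.
  const parallel = await elapsedMs(() =>
    Promise.all([
      fakeQuery("profile", 40),
      fakeQuery("orders", 90),
      fakeQuery("recommendations", 12),
    ]),
  );

  return { sequential, parallel };
}
```

On an idle machine compare() should report roughly 142ms sequential against roughly 90ms parallel, though timers are not exact. One behavior to keep in mind: Promise.all rejects as soon as any input rejects, so if one failing widget should not sink the others, Promise.allSettled or separate Suspense boundaries are the safer shape.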

Sometimes you can't await everything in one place — you need to pass a still-pending promise down to a child. Start the promise without awaiting, then hand the promise itself to the component:

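// app/profile/[userId]/page.tsx (dynamic segment assumed so the route has a userId)
import { Suspense } from "react";

export default async function ProfilePage({
  params,
}: {
  params: Promise<{ userId: string }>; // Next.js 15+ passes params as a Promise
}) {
  const { userId } = await params; // route params resolve almost immediately
  // Fire the request now. Do NOT await here.
  const ordersPromise = getOrders(userId);

  return (
    <Suspense fallback={<OrdersSkeleton />}>
      <Orders ordersPromise={ordersPromise} />
    </Suspense>
  );
}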

In the child, React 19's use() API unwraps the promise and suspends until it resolves:

// app/profile/Orders.tsx
"use client";
import { use } from "react";
 
export function Orders({ ordersPromise }: { ordersPromise: Promise<Order[]> }) {
  const orders = use(ordersPromise); // suspends until resolved
  return <OrderList orders={orders} />;
}

This pattern — start the promise high, consume it low with use() — initiates fetches at the top of the tree (so they run in parallel and early) while deferring the render to a Client Component behind a boundary. It's how you avoid the trap where moving fetches into components reintroduces a waterfall.
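
Stripped of React, the difference between fetching low and starting high comes down to when the request is created. The sketch below logs start and end events for two fake requests; trackedFetch, consumeOrders, fetchLow, and startHigh are hypothetical names, and consumeOrders plays the role that use() plays in the component version.

```typescript
// Hypothetical fetch stand-in that records when it starts and when it finishes.
function trackedFetch(log: string[], name: string, ms: number): Promise<string> {
  log.push(`${name}:start`);
  return new Promise((resolve) =>
    setTimeout(() => {
      log.push(`${name}:end`);
      resolve(name);
    }, ms),
  );
}

// The "child": it only consumes a promise that someone else started.
async function consumeOrders(ordersPromise: Promise<string>): Promise<string> {
  return ordersPromise;
}

// Waterfall: the orders request is not created until the profile await finishes.
async function fetchLow(log: string[]): Promise<void> {
  await trackedFetch(log, "profile", 20);
  await consumeOrders(trackedFetch(log, "orders", 10));
}

// Start high, consume low: orders is already in flight while profile loads.
async function startHigh(log: string[]): Promise<void> {
  const ordersPromise = trackedFetch(log, "orders", 10); // started, not awaited
  await trackedFetch(log, "profile", 20);
  await consumeOrders(ordersPromise);
}
```

After fetchLow the log reads profile:start, profile:end, orders:start, orders:end. After startHigh, orders:start appears before profile:end, which is the overlap you are after.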

When NOT to stream

Streaming is not free and not universal. Skip it when:

| Situation | Why streaming hurts |
| --- | --- |
| SEO-critical content above the fold | Crawlers may index the fallback; keep primary content in the synchronous shell |
| Data resolves in under ~100ms | The skeleton flashes and vanishes, perceived as jank, not speed |
| You need an HTTP status or redirect based on the data | Once the first byte streams, you've already sent 200; you can't change headers |
| set-cookie / auth that depends on fetched data | Headers are flushed with the shell, before the streamed data exists |
| Static pages with no slow data | Just prerender them; streaming adds machinery for no gain |

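The status-code one catches people. If a request might 404 or 401 based on a fetch, do that fetch in the shell before the first byte goes out, so notFound() or redirect() can still send proper headers. Keep in mind that a loading.tsx turns the page itself into streamed content, so a fetch awaited in page.tsx underneath it may already be too late to change the status. Stream only what's safe to commit to a 200.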
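
The ordering constraint can be modeled without any framework. In this sketch the StreamedResponse type, handleRequest, chunksOf, findProduct, and loadReviews are all hypothetical; the only point is that the status-deciding lookup is awaited before the response object exists, while the slow data is started and left to stream.

```typescript
// Toy model of a streamed response: the status is fixed when the object is
// created, and the body keeps producing chunks afterwards.
type StreamedResponse = { status: number; body: AsyncGenerator<string> };

// Yields each part in order, waiting for any part that is still a pending promise.
async function* chunksOf(...parts: Array<string | Promise<string>>): AsyncGenerator<string> {
  for (const part of parts) {
    yield await part;
  }
}

async function handleRequest(
  findProduct: () => Promise<{ name: string } | null>, // hypothetical gating lookup
  loadReviews: () => Promise<string>, // hypothetical slow widget data
): Promise<StreamedResponse> {
  // Gating fetch: awaited BEFORE any status is committed.
  const product = await findProduct();
  if (product === null) {
    return { status: 404, body: chunksOf("Not found") };
  }

  // Safe to commit to 200 now; the slow data streams in after the shell.
  const reviews = loadReviews(); // started, not awaited
  return { status: 200, body: chunksOf(`<h1>${product.name}</h1>`, reviews) };
}
```

If findProduct were moved inside the body generator instead, the 200 would already be fixed by the time the lookup came back empty, which is the failure mode described above.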

The relationship to Partial Prerendering

Streaming and Suspense are the foundation for Partial Prerendering (PPR), which in Next.js 15+ uses these exact boundaries to split a single route into static and dynamic. At build time, Next.js prerenders everything outside your Suspense boundaries into a static shell served instantly from the edge/CDN. Everything inside a boundary is dynamic, streamed in at request time.

So the same <Suspense> you added for perceived performance becomes the seam between "cached forever" and "rendered per request." A product page can serve its static layout, nav, and description from cache in single-digit milliseconds, then stream the personalized price and stock check. You get static-site TTFB with dynamic data, in one route, with no getStaticProps/getServerSideProps split. Check the "Partial Prerendering" page in the Next.js docs for the current opt-in flag — the API has been stabilizing across 15.x and 16, so trust the version you're on rather than a flag name from a blog.

The takeaway: Suspense boundaries are not just loading UI. They are the architectural unit of where your page is static vs. dynamic. Place them accordingly.

A decision checklist

Before you ship a route, run through this:

  • Is the shell fast? Everything outside a Suspense boundary must resolve in single-digit-to-low-tens of milliseconds. If getUser() takes 400ms, it doesn't belong there.
  • Did you kill the waterfalls? Independent fetches use Promise.all, or start-then-pass-promise. Sequential awaits of unrelated data are a bug.
  • Are boundaries granular? One boundary per independently-slow region, not one giant spinner — but not so granular you cause layout-shift confetti.
  • Do skeletons reserve space? Fixed dimensions so streamed content lands without CLS.
  • Did you keep status-affecting fetches in the shell? Anything that might notFound(), redirect(), or set headers runs first.
  • Is there a loading.tsx? Route-level instant feedback on navigation, with inner <Suspense> for per-widget streaming.
  • Did you avoid streaming sub-100ms data? A skeleton that appears and vanishes within a few frames reads as jank, not speed.

Streaming isn't a rewrite. It's deleting awaits, moving fetches into components, and drawing boundaries around the slow parts. The first dashboard I mentioned took an afternoon. The win was the kind users actually notice — the difference between "is this broken?" and "that was instant."

Further reading

  • Next.js documentation — "Loading UI and Streaming" and "Partial Prerendering" (nextjs.org/docs)
  • React documentation — <Suspense> and the use API (react.dev)
  • web.dev — Core Web Vitals, for measuring the TTFB, LCP, and CLS impact (web.dev)