
What years of Chromatic taught me to build differently

March 22, 2026 · Anil Anar

My team has used Chromatic for visual regression testing since late 2022. I never had a single moment where I thought “this is broken, I need to leave.” It was slower than that. A billing frustration here, a workflow confusion there, a growing suspicion that Chromatic’s Playwright/Cypress archiving approach — serializing the DOM, shipping it to a remote fleet, re-rendering it there — was architecturally incapable of capturing what a real browser actually renders. I never tried those plugins. I didn’t need to. Shadow DOM styles that can’t be serialized, canvas content that’s opaque to the DOM tree, JS-driven state that depends on timing — these aren’t bugs to be fixed, they’re inherent limits of the serialize-transport-re-render model. I wasn’t going to waste time debugging an approach that’s theoretically unsound when page.screenshot() already captures exactly what I see.

Any one of these frustrations is forgivable. All of them together, compounding over three years, eventually crossed a line where I stopped wanting to file feature requests and started wanting to write code.

So I did. I built pxdiff.

The billing model punishes you either way

Chromatic’s Pro tier costs $399/month for 85,000 snapshots. Go over, you pay $0.008 per extra snapshot. Stay under, you’ve overpaid for capacity you didn’t use. There’s no winning move — busy months hit you with overages, slow months waste your budget.

TurboSnap is supposed to help by skipping unchanged components. But “skipping” is generous — Chromatic still bills each skipped snapshot at 1/5 the regular price. So if you have 200 stories and change one, you still pay for the other 199. Less, sure. But not nothing. When I realized I was paying a fraction of the capture price for every story I didn’t touch, I felt cheated. The optimization exists to reduce the damage of the billing model, not to fix it.
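To put numbers on that, here’s a quick sketch using Chromatic’s published overage rate (a simplification: on-plan snapshots draw down your prepaid quota rather than billing per shot, but the 1/5 ratio works the same way):

```typescript
// Cost of one CI run over 200 stories with one changed component,
// at Chromatic's $0.008 overage rate. Skipped snapshots bill at 1/5.
const stories = 200;
const changed = 1;
const snapshotRate = 0.008;           // $ per full snapshot
const skippedRate = snapshotRate / 5; // $ per "skipped" snapshot

const withoutTurboSnap = stories * snapshotRate;
const withTurboSnap =
  changed * snapshotRate + (stories - changed) * skippedRate;

console.log(withoutTurboSnap.toFixed(2)); // "1.60"
console.log(withTurboSnap.toFixed(2));    // "0.33"
```

A 5x saving, sure — but that’s still roughly 40x the price of the one snapshot you actually needed.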

I wanted something simpler: pay for what you use. pxdiff charges per screenshot — $0.004, $0.003, or $0.002 depending on volume. No monthly minimums, no tiers to optimize around, no overages. Diffs are free, permanently. You deposit credits and they drain as you use them. Busy month costs more, slow month costs less. The math is obvious.
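The two models are easiest to compare as code. A sketch with the rates above; the volume tiers depend on usage, so assume the flat $0.004 rate for illustration:

```typescript
// Monthly cost as a function of snapshot count, using the numbers above.
// pxdiff's volume tiers are simplified to a flat $0.004 here.
const chromaticCost = (n: number) => 399 + Math.max(0, n - 85_000) * 0.008;
const pxdiffCost = (n: number) => n * 0.004;

console.log(chromaticCost(30_000));  // 399 (slow month: quota wasted)
console.log(pxdiffCost(30_000));     // 120
console.log(chromaticCost(100_000)); // 519 (busy month: overages)
console.log(pxdiffCost(100_000));    // 400
```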

Agentic coding broke the workflow

Here’s a thing that didn’t exist when I started using Chromatic: I now write about 20% of my code myself. The rest is AI agents — Claude Code, Cursor, OpenAI Codex. The development loop has fundamentally changed. An agent makes a change to a component, and both of us need to see the visual diff of that one component — me to verify the result, the agent to close its own feedback loop. Visual regression output is becoming an input to the next iteration, not just a gate at the end.

Chromatic’s workflow assumes you’re running full Storybook builds in CI. Full-suite-or-nothing was always the wrong approach — even before agents, you were paying to re-test 200 unchanged components to check the one you actually modified. But in 2026, when I’m iterating on a single component with an AI agent that needs visual feedback in real time, the mismatch is unbearable. Running a full suite to check one button is friction that kills the loop.

This is related to a broader problem: partial captures are unnecessarily hard. Chromatic’s --only-changed (TurboSnap) is their answer, but it’s a CI-time optimization, not a developer-time tool.

pxdiff is built around partial captures as a first-class concept. The Playwright plugin does inline diffing per-screenshot as your tests run:

// Playwright
await expect(page).toMatchPxdiff("button-primary");

The Vitest Browser Mode plugin works the same way — something Chromatic doesn’t support at all:

// Vitest Browser Mode
await expect(page).toMatchPxdiff("button-primary");

You can also run a partial Storybook capture — just the stories you care about — or use pxdiff upload with a folder of PNGs if you’re working outside any framework. Each screenshot uploads, diffs against its baseline, and returns a result. No batch. No waiting for components you didn’t touch.

The DOM serialization wall

When you use Chromatic with Playwright or Cypress, here’s what happens: Chromatic captures an archive of your DOM state — HTML, stylesheets, assets — uploads it to their capture fleet, and re-renders it there to take the screenshot.

The issue isn’t that the archive is buggy (though there are bugs). It’s that certain categories of content are architecturally impossible to archive faithfully. Shadow DOM styles in Web Components? Not captured — the elements render but their encapsulated styles are gone. Iframe content that updates dynamically? Every snapshot looks identical to the first, because the archive captured the initial frame source, not the content that loaded into it later. Canvas and WebGL are opaque to the DOM — there’s nothing to serialize. Complex JS-driven state that depends on timing or interaction sequences gets flattened into a static snapshot of the DOM tree at one arbitrary moment.

That last issue is the most telling. In one public bug report, the reporter found that Playwright’s own page.screenshot() captured the correct content every time — the browser was rendering the right thing. But Chromatic’s archive couldn’t represent it. The screenshots were perfect; the DOM serialization was wrong. That gap between “what the browser actually rendered” and “what the archive could reconstruct” is not a bug to be fixed. It’s a fundamental limitation of the architecture.

pxdiff doesn’t do this. The Playwright plugin calls page.screenshot() right there in your test, in the browser that’s already rendering your app. The PNG bytes get uploaded and diffed server-side using pixelmatch (same algorithm Chromatic uses, same threshold of 0.063). No serialization, no re-rendering, no hoping the remote environment can reconstruct what your browser already rendered correctly. The screenshot is the screenshot.
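If you’re curious what the server-side diff amounts to, here’s a deliberately simplified version of the idea. The real pixelmatch compares colors in YIQ space and detects anti-aliasing, so treat this as an illustration of per-pixel thresholding, not pxdiff’s actual implementation:

```typescript
// Simplified per-pixel diff in the spirit of pixelmatch (illustrative only:
// the real library uses YIQ color distance and anti-aliasing detection).
// `threshold` plays the same role as the 0.063 mentioned above.
function countDiffPixels(
  a: Uint8ClampedArray, // RGBA baseline
  b: Uint8ClampedArray, // RGBA candidate
  threshold = 0.063,
): number {
  if (a.length !== b.length) throw new Error("image sizes differ");
  // Maximum possible squared distance between two RGBA pixels.
  const maxDelta = 4 * 255 * 255;
  let diff = 0;
  for (let i = 0; i < a.length; i += 4) {
    let delta = 0;
    for (let c = 0; c < 4; c++) {
      const d = a[i + c] - b[i + c];
      delta += d * d;
    }
    // Count the pixel as changed if its normalized distance exceeds threshold.
    if (delta / maxDelta > threshold * threshold) diff++;
  }
  return diff;
}

// Two 1x2 images: first pixel identical, second pixel black vs white.
const base = new Uint8ClampedArray([255, 0, 0, 255, 0, 0, 0, 255]);
const next = new Uint8ClampedArray([255, 0, 0, 255, 255, 255, 255, 255]);
console.log(countDiffPixels(base, next)); // 1
```

The point is the simplicity: the input is the PNG the browser already produced, so there is nothing to reconstruct.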

Two review flows that nobody understands

My team has used Chromatic for over three years. And I’d bet money that if I asked any of them to explain the difference between Chromatic’s “tests” and “review” flows, they’d struggle. I know I would. Two separate flows for what is conceptually one task — “did anything change visually, and is it intentional?” — is an unnecessary split that creates confusion.

And the comments. Chromatic has a commenting feature. Over three years and roughly 9,000 builds, I can find maybe 30 comments total. That’s about 1 comment per 300 builds. The problem is obvious: you’re already reviewing the PR in GitHub. Switching to a second review surface to leave a comment about a visual change that you’re going to discuss in the PR anyway is pointless context-switching. Nobody does it.

pxdiff has one flow. Visual changes show up as a GitHub check run status. Click through to the review UI, approve or reject, done. Comments happen in the PR where the rest of the conversation already lives.

Stacked PRs are hostile territory

If you use stacked PRs (and you should — they’re the best way to keep PRs small), Chromatic will fight you. When a PR targets a branch that doesn’t have a Chromatic build, baseline resolution breaks. Chromatic can’t figure out what to compare against because it expects a linear chain of builds off your main branch.

I built a chain-walk resolver that handles this. When you open a PR against a feature branch, pxdiff walks the PR target chain — your branch’s target, that target’s target, all the way up to main if needed — and finds the closest baseline at each level. It also does ancestor commit lookups for local development, where there’s no PR to follow. Stacked PRs, trunk-based development, long-lived feature branches — the baseline resolution just works because it’s designed around how git actually works, not around a simplified mental model of “everything targets main.”
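Stripped of the git plumbing, the chain walk is a small loop. A sketch (names and data shapes are illustrative, not pxdiff’s real internals):

```typescript
// Sketch of chain-walk baseline resolution. `targets` maps a branch to the
// branch its PR targets; `baselines` holds branches that already have builds.
function resolveBaseline(
  branch: string,
  targets: Map<string, string>,
  baselines: Set<string>,
): string | null {
  let current: string | undefined = targets.get(branch);
  const seen = new Set<string>([branch]);
  while (current && !seen.has(current)) {
    if (baselines.has(current)) return current; // closest ancestor with a build
    seen.add(current);
    current = targets.get(current); // keep walking up the PR stack
  }
  return null; // no baseline anywhere up the chain
}

// A stacked-PR chain: feature-c -> feature-b -> feature-a -> main.
// Only feature-a and main have builds, so feature-c resolves to feature-a.
const targets = new Map([
  ["feature-c", "feature-b"],
  ["feature-b", "feature-a"],
  ["feature-a", "main"],
]);
const baselines = new Set(["main", "feature-a"]);
console.log(resolveBaseline("feature-c", targets, baselines)); // "feature-a"
```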

A product that went wide instead of deep

This is the hardest criticism to make because it’s about priorities, not incompetence. Chromatic’s team has been pouring effort into Storybook — supporting every framework, every bundler, interaction testing, instrumentation, addon architecture, doc generation. They went incredibly wide. And honestly, the Storybook ecosystem work is impressive.

But that’s not what I’m paying Chromatic for. I’m paying for visual regression testing. And the VRT product hasn’t meaningfully changed in three years. The UI is the same. The workflow hasn’t adapted to how people develop in 2026. The pricing got more expensive, not less. When your development workflow has shifted dramatically — agentic coding, stacked PRs, framework diversity beyond Storybook — and the visual testing tool hasn’t shifted with you, the gap becomes impossible to ignore.

The investment went into making Storybook the center of everything. Meanwhile, people are writing more Playwright tests, more Cypress tests, more Vitest Browser Mode tests — and less Storybook. The gravity moved, but the product didn’t.

What I actually built

pxdiff is a solo-founder project. No VC, no board, no pressure to optimize for enterprise upsells. The architecture is straightforward: screenshot capture and diffing on serverless compute (scale-to-zero, so I’m not paying for idle servers and neither are you), baselines in S3-compatible storage, metadata in Postgres. Plugins for Playwright, Cypress, Vitest, Storybook, and Ladle. Or just pxdiff upload with a folder of PNGs — framework-agnostic means framework-agnostic.

CI never fails on visual changes. That’s a deliberate design choice. Visual diffs surface as a “requires action” GitHub check status. Your CI stays green, your deployment isn’t blocked, but the visual changes are visible and need explicit approval before merge. Visual review is decoupled from the CI pipeline, because that decision belongs with a reviewer, not with an exit code.
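This maps directly onto GitHub’s Checks API: a check run with conclusion action_required shows up as “requires action” without turning the commit red. A sketch of the payload (everything except the conclusion value is illustrative):

```typescript
// Shape of a check run reporting visual changes. `action_required` is a real
// GitHub Checks API conclusion; the name and output text here are made up.
const checkRun = {
  name: "pxdiff / visual review",          // illustrative check name
  head_sha: "<commit sha>",                // the commit being reviewed
  status: "completed",
  conclusion: "action_required",           // renders as "requires action"
  output: {
    title: "3 visual changes need review", // illustrative
    summary: "Approve or reject the diffs in the pxdiff review UI.",
  },
};
// POST this object to /repos/{owner}/{repo}/check-runs via the GitHub API.
console.log(checkRun.conclusion); // "action_required"
```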

It’s open for signups at pxdiff.com. I’m not going to pretend it does everything Chromatic does — but the gap isn’t where you’d expect. Chromatic hasn’t accumulated a deep bench of VRT features over the years. What pxdiff intentionally doesn’t have is reporting on Storybook interaction tests — and that’s by design. Storybook play functions use testing-library and user-event for interaction testing, and those tools simulate DOM events instead of driving real browser input. That’s why developers love writing tests in Playwright, Cypress, and Vitest Browser Mode but dread writing them with testing-library — whether in Jest or in Storybook play functions. pxdiff isn’t playing catch-up. It’s making different choices about where testing actually works.

If any of this resonates, I’d genuinely love to hear what frustrates you about your current visual testing setup. I’m building this in the open and the roadmap is shaped by real pain points, not investor pitch decks.

GitHub | Docs | pxdiff.com