tldr: BackstopJS is an open-source visual regression testing tool that compares screenshots of your web app across code changes. It's free, MIT-licensed, runs on Chrome Headless, and catches pixel-level UI regressions before your users do.
What BackstopJS does
BackstopJS takes screenshots of your web pages, saves them as "reference" images, then takes new screenshots after you make changes. It compares the two sets pixel by pixel. If something shifted, disappeared, or broke visually, BackstopJS shows you exactly where.
That's the whole idea. No AI. No machine learning. Just pixel comparison.
The project lives at garris/BackstopJS on GitHub. MIT-licensed. Active contributors. Used by thousands of teams, especially in the Drupal and WordPress ecosystems where theme updates and plugin changes can wreck layouts in ways unit tests never catch.
If you're new to this category, start with what visual regression testing is before going deeper here.
Why BackstopJS exists
Before BackstopJS, you had two choices for visual regression testing. You could pay for a SaaS tool like Percy or Applitools. Or you could stitch together your own solution with headless Chrome, ImageMagick, and a pile of bash scripts.
BackstopJS fills the gap. It gives you a proper framework with configuration files, CLI commands, a report UI, and CI/CD integration. All free. All local. No vendor dependencies.
The trade-off is clear: you manage the infrastructure yourself. You run Chrome Headless locally or in your CI pipeline. You store reference screenshots in your repo or on a shared drive. There's no cloud dashboard, no team collaboration features, and no AI-powered diffing. But you own everything, and the cost is zero.
Getting started
Installation
BackstopJS requires Node.js. Install it globally or as a project dependency.
# Global install
npm install -g backstopjs
# Or as a dev dependency
npm install --save-dev backstopjs
Initialize your project
Run the init command to generate a starter configuration file.
backstop init
This creates a backstop.json file in your project root and a backstop_data directory for screenshots and reports.
Your first backstop.json
The configuration file is where everything lives. Here's a realistic example for testing a marketing site with a few key pages.
{
  "id": "my-project",
  "viewports": [
    {
      "label": "phone",
      "width": 375,
      "height": 812
    },
    {
      "label": "tablet",
      "width": 768,
      "height": 1024
    },
    {
      "label": "desktop",
      "width": 1440,
      "height": 900
    }
  ],
  "scenarios": [
    {
      "label": "Homepage",
      "url": "https://staging.example.com",
      "selectors": ["document"],
      "delay": 1000,
      "misMatchThreshold": 0.1
    },
    {
      "label": "Pricing page",
      "url": "https://staging.example.com/pricing",
      "selectors": ["document"],
      "delay": 1500,
      "misMatchThreshold": 0.1
    },
    {
      "label": "Login form",
      "url": "https://staging.example.com/login",
      "selectors": [".login-form"],
      "delay": 500,
      "misMatchThreshold": 0.05
    }
  ],
  "paths": {
    "bitmaps_reference": "backstop_data/bitmaps_reference",
    "bitmaps_test": "backstop_data/bitmaps_test",
    "engine_scripts": "backstop_data/engine_scripts",
    "html_report": "backstop_data/html_report",
    "ci_report": "backstop_data/ci_report"
  },
  "engine": "puppeteer",
  "engineOptions": {
    "args": ["--no-sandbox"]
  },
  "report": ["browser"],
  "debug": false,
  "debugWindow": false
}
Let's break down the key parts.
viewports defines the screen sizes to test. BackstopJS takes a screenshot at each viewport for every scenario, so the three viewports and three scenarios above produce nine screenshots per run. Think about which breakpoints matter before adding too many.
scenarios is an array of pages or components to test. Each scenario has a URL and one or more CSS selectors. Use "document" to capture the full page. Use a specific selector like ".login-form" to capture just one component.
misMatchThreshold controls sensitivity. A value of 0.1 means a 0.1% pixel difference is allowed before flagging a failure; on a 1440x900 screenshot, that's a budget of roughly 1,300 pixels. Set this too low and anti-aliasing differences between environments will flood you with false positives. Set it too high and you'll miss real regressions.
engine can be "puppeteer" or "playwright". Both use Chrome Headless under the hood.
The three core commands
BackstopJS has three commands you'll use constantly.
backstop reference
backstop reference
This takes screenshots of every scenario at every viewport and saves them as your "known good" baseline. Run this when your UI is in a correct state. These reference images are what future tests compare against.
backstop test
backstop test
This takes new screenshots and compares them against your references. If any screenshot differs beyond the misMatchThreshold, the test fails. BackstopJS exits with a non-zero code, so your CI pipeline catches it.
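If you want the report published even when the run fails, a small wrapper script can capture the exit code, let your artifact upload happen, and fail at the end. A minimal sketch:

# Run the suite but defer the failure until the report is saved
backstop test --config=backstop.json || TEST_FAILED=1
# ...upload backstop_data/html_report as a build artifact here...
exit ${TEST_FAILED:-0}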
backstop approve
backstop approve
When a test fails because of an intentional design change, run approve to promote the new test screenshots to reference images. This is how you update your baseline after a deliberate UI update.
The workflow loop looks like this: reference (set baseline) -> make changes -> test (compare) -> approve (if changes are intentional) -> repeat.
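In shell terms, one iteration of the loop looks like this:

backstop reference   # capture the known-good baseline
# ...make your UI changes...
backstop test        # compare new screenshots against the baseline
backstop approve     # only if the diff was intentional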
Custom interactions with scripting
Static page screenshots are useful but limited. Most real applications need user interactions: clicking buttons, filling forms, waiting for animations, dismissing modals.
BackstopJS supports Puppeteer and Playwright scripts for custom interactions. You write a script, reference it in your scenario, and BackstopJS runs it before taking the screenshot.
Here's a Puppeteer script that logs into an app before capturing the dashboard.
// backstop_data/engine_scripts/puppet/login.js
module.exports = async (page, scenario, viewport) => {
  // An unauthenticated visit to the scenario URL lands on the login form
  await page.goto(scenario.url);
  await page.waitForSelector('#email');
  // Fill in test credentials and submit
  await page.type('#email', 'test@example.com');
  await page.type('#password', 'password123');
  await page.click('[data-testid="login-btn"]');
  // Wait for the post-login redirect before BackstopJS captures the page
  await page.waitForNavigation();
};
Then reference it in your scenario.
{
  "label": "Dashboard after login",
  "url": "https://staging.example.com/dashboard",
  "onBeforeScript": "puppet/login.js",
  "selectors": ["document"],
  "delay": 2000,
  "misMatchThreshold": 0.1
}
The onBeforeScript runs before the screenshot is taken. There's also onReadyScript which runs after the page loads but before capture. Use onBeforeScript for navigation and authentication. Use onReadyScript for final page state adjustments like scrolling to a specific position or hovering over an element.
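As an illustration, here's a minimal onReadyScript sketch; the selectors (.pricing-table, .nav-link) are placeholders for your own markup.

// backstop_data/engine_scripts/puppet/onReady.js
module.exports = async (page, scenario, viewport) => {
  // Scroll a specific section into view before the screenshot
  await page.evaluate(() => {
    const el = document.querySelector('.pricing-table');
    if (el) el.scrollIntoView();
  });
  // Hover a nav link so the hover state is what gets captured
  await page.hover('.nav-link');
};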
You can also use Playwright as the engine. Just set "engine": "playwright" in your config and write your scripts using the Playwright API instead. If you're already using Playwright for visual testing, BackstopJS can fit into your existing workflow.
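As a sketch, a Playwright version of the earlier login script could look like this (same placeholder selectors and credentials):

// backstop_data/engine_scripts/playwright/login.js
module.exports = async (page, scenario) => {
  await page.goto(scenario.url);
  // fill() auto-waits for the element, so no explicit waitForSelector is needed
  await page.fill('#email', 'test@example.com');
  await page.fill('#password', 'password123');
  await page.click('[data-testid="login-btn"]');
  // Wait for network activity to settle after the redirect
  await page.waitForLoadState('networkidle');
};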
The HTML report UI
The report is one of BackstopJS's best features. After running backstop test, it generates an HTML report you can open in your browser.
The report shows three views for each failing scenario:
- Reference image. What the page looked like when you approved it.
- Test image. What the page looks like now.
- Diff overlay. A pink/magenta highlight showing exactly which pixels changed.
There's also a scrubber interface. You drag a slider left and right to compare the reference and test images directly on top of each other. This is incredibly useful for catching subtle changes like a 1px border shift or a font weight change.
Passing scenarios show up with a green checkmark. Failing scenarios show up in red with the diff percentage. You can filter to see only failures, which is what you'll care about 99% of the time.
The report is static HTML. You can host it as a CI artifact. Many teams upload the html_report directory to S3 or attach it to their pull request for reviewers to inspect.
CI/CD integration
BackstopJS runs in any CI pipeline that supports Node.js and a headless browser. Here's a GitHub Actions example.
name: Visual Regression Tests

on: [pull_request]

jobs:
  visual-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Install BackstopJS
        run: npm install -g backstopjs
      - name: Run visual regression tests
        run: backstop test --config=backstop.json
      - name: Upload report
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: backstop-report
          path: backstop_data/html_report
A few things to note.
Reference images live in your repo. Commit the backstop_data/bitmaps_reference directory. This is how your CI pipeline knows what the baseline looks like. Yes, this adds binary files to your repo. For most projects, the trade-off is acceptable. A typical reference set is 10-50 MB.
Chrome needs --no-sandbox in CI. CI jobs frequently run as root, especially inside Docker containers on GitHub Actions and GitLab CI, and Chrome refuses to start its sandbox as root. The --no-sandbox flag in engineOptions works around this.
Environment consistency matters. Font rendering, anti-aliasing, and subpixel rendering differ between macOS, Windows, and Linux. If your developers run macOS and CI runs Ubuntu, you'll get false positives. Generate reference images in the same environment where tests run. This usually means generating references in CI, not on developer machines.
Docker solves the consistency problem. BackstopJS publishes a Docker image. Use it in CI to ensure identical rendering.
docker run --rm -v $(pwd):/src backstopjs/backstopjs reference
docker run --rm -v $(pwd):/src backstopjs/backstopjs test
This guarantees the same Chrome version, the same fonts, and the same rendering engine for both reference and test screenshots.
GitLab CI example
For teams on GitLab, the setup is similar.
visual-regression:
  image: backstopjs/backstopjs:latest
  stage: test
  script:
    - backstop test --config=backstop.json
  artifacts:
    when: on_failure
    paths:
      - backstop_data/html_report
    expire_in: 7 days
  only:
    - merge_requests
Using the official BackstopJS Docker image as your CI image eliminates font rendering inconsistencies entirely. No separate Chrome install needed. No --no-sandbox flag. It just works.
Updating references in CI
A common workflow: when a developer makes intentional UI changes, they need to update reference images. Don't let developers run backstop approve on their laptops. Instead, add a CI job that generates fresh references.
# In a dedicated CI step or script
backstop reference --config=backstop.json
# Then commit the updated reference images
git add backstop_data/bitmaps_reference
git commit -m "Update visual regression references"
Some teams automate this with a CI job triggered by a specific commit message or PR label. Others require a manual step to ensure someone actually reviewed the visual diff before approving.
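As a sketch, a label-gated GitHub Actions job might look like this; the update-visuals label name is hypothetical, and push permissions vary by setup.

update-references:
  if: contains(github.event.pull_request.labels.*.name, 'update-visuals')
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with:
        ref: ${{ github.head_ref }}
    - run: npm ci
    - run: npx backstop reference --config=backstop.json
    - run: |
        git config user.name "ci-bot"
        git config user.email "ci-bot@example.com"
        git add backstop_data/bitmaps_reference
        git commit -m "Update visual regression references"
        git push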
Handling dynamic content
Real web apps have dynamic content. Timestamps, user avatars, ads, randomized content blocks. These change between runs and cause false positives.
BackstopJS handles this with a few built-in options.
hideSelectors and removeSelectors
{
  "label": "Homepage",
  "url": "https://staging.example.com",
  "hideSelectors": [".dynamic-banner", ".timestamp"],
  "removeSelectors": [".chat-widget", ".cookie-consent"],
  "selectors": ["document"]
}
hideSelectors sets elements to visibility: hidden. They still take up space but render as blank. removeSelectors sets elements to display: none. They're completely removed from layout. Use hideSelectors when you want to preserve page layout. Use removeSelectors when the element affects layout in ways you don't want to test.
delay and readySelector
{
  "label": "Dashboard with charts",
  "url": "https://staging.example.com/dashboard",
  "readySelector": ".chart-loaded",
  "delay": 500,
  "selectors": ["document"]
}
readySelector tells BackstopJS to wait until a specific element appears before taking the screenshot. delay adds extra time after the page loads. Use both together for pages with async content. The readySelector waits for the content to appear, and delay gives animations time to settle.
readyEvent
For complex single-page applications, you can fire a custom event from your app when it's ready.
{
  "label": "SPA dashboard",
  "url": "https://staging.example.com/app",
  "readyEvent": "backstopjs_ready",
  "selectors": ["document"]
}
Then in your app code (only in test environments), fire the event:
document.dispatchEvent(new Event('backstopjs_ready'));
This is the most reliable way to handle SPAs where page load events don't correspond to the UI being ready.
BackstopJS in the Drupal and WordPress world
BackstopJS has a particularly strong following in the Drupal and WordPress ecosystems. There's a good reason for this. CMS-driven sites change frequently, often by content editors who aren't running test suites. A theme update, a new plugin, a core version bump. Any of these can break layouts in ways that are invisible to automated functional tests.
At DrupalSouth 2025, there was a dedicated session on using BackstopJS for GovCMS workflows. Government sites need to maintain strict visual consistency across hundreds of pages. BackstopJS lets teams capture baselines for every template variation and catch regressions when Drupal modules update.
The typical setup for CMS projects:
- Define scenarios for each page template (homepage, article, listing page, search results).
- Add viewports for mobile, tablet, and desktop.
- Run reference screenshots against a known-good staging environment.
- Run tests after every deployment or dependency update.
- Review the HTML report and approve intentional changes.
For WordPress specifically, teams often run BackstopJS after theme or plugin updates. A WooCommerce update might subtly change the cart layout. A PHP version bump might affect how certain shortcodes render. BackstopJS catches these before they reach production.
The pattern works well because CMS sites tend to have a finite set of page templates. You're not testing infinite user states. You're testing 10-20 templates across 3 viewports. That's 30-60 screenshots. Manageable. Meaningful. And the ROI is immediate: the first time BackstopJS catches a broken layout from a plugin update, it pays for the setup time ten times over.
Performance and scaling
BackstopJS runs scenarios in parallel by default. The asyncCaptureLimit setting controls how many scenarios capture simultaneously.
{
  "asyncCaptureLimit": 5,
  "asyncCompareLimit": 50
}
asyncCaptureLimit limits parallel browser instances during screenshot capture. Set this based on your machine's RAM. Each Chrome instance uses 100-300 MB. If you have 8 GB of RAM, keep this at 3-5.
asyncCompareLimit limits parallel image comparisons. Comparison is CPU-bound but not memory-intensive. You can set this much higher.
For large projects with 100+ scenarios, a full BackstopJS run takes 5-15 minutes depending on your CI machine. That's not fast enough for pre-commit hooks, but it's fine for PR checks and nightly runs.
If you need faster feedback, scope your tests. Run a subset of critical scenarios on every PR and the full suite nightly.
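The --filter option takes a scenario-label regex, which makes scoping straightforward:

# On every PR: only the critical scenarios
backstop test --filter="Homepage|Pricing"

# Nightly: the full suite
backstop test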
BackstopJS vs. paid tools
This is the question everyone asks. Here's an honest comparison.
BackstopJS vs. Percy
Percy (now part of BrowserStack) is the most well-known paid visual testing tool. The core difference: Percy runs in the cloud, manages infrastructure for you, and provides a team collaboration UI for approving/rejecting changes. BackstopJS runs locally or in your CI, and you manage everything.
Percy also supports cross-browser rendering. BackstopJS only uses Chrome. If you need to verify how your app looks in Safari or Firefox, BackstopJS can't help.
Percy's pricing starts at around $400/month for teams. BackstopJS is free.
BackstopJS vs. Applitools
Applitools uses AI-powered visual comparison. Instead of pixel-by-pixel matching, it uses computer vision to detect "meaningful" changes and ignore noise like anti-aliasing differences. This means fewer false positives. BackstopJS uses pure pixel comparison, which is more sensitive but also more noisy.
Applitools is significantly more expensive. It's the right choice for enterprise teams with hundreds of tests across multiple browsers. BackstopJS is the right choice for teams who want something free that works.
BackstopJS vs. Playwright visual comparison
Playwright has built-in screenshot comparison via expect(page).toHaveScreenshot(). It's lighter weight than BackstopJS but lacks the report UI, the configuration-driven approach, and the CLI workflow. If you're already using Playwright for functional tests, its built-in visual comparison might be enough. If you want a dedicated visual testing framework with better reporting and CMS-ecosystem support, BackstopJS is the better pick.
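For contrast, the Playwright built-in check is a few lines inside an ordinary test file; maxDiffPixelRatio plays roughly the role of misMatchThreshold here.

// homepage.spec.js
const { test, expect } = require('@playwright/test');

test('homepage visual check', async ({ page }) => {
  await page.goto('https://staging.example.com');
  // Compares against a stored snapshot; creates one on the first run
  await expect(page).toHaveScreenshot('homepage.png', { maxDiffPixelRatio: 0.001 });
});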
For a full comparison of options, see visual regression testing tools and open-source visual regression testing tools.
When BackstopJS is the wrong choice
Be honest about the trade-offs.
You need cross-browser testing. BackstopJS only runs Chrome. Period.
You need AI diffing. Pixel comparison generates false positives. Anti-aliasing, font rendering, and subpixel differences across environments will produce noise. You'll spend time tuning misMatchThreshold values and adding hideSelectors. AI-based tools like Applitools reduce this friction significantly.
You want zero infrastructure. BackstopJS requires you to run Chrome Headless, store reference images, and manage the pipeline. SaaS tools handle all of this.
Your team needs collaboration features. There's no web dashboard. No commenting. No approval workflows beyond backstop approve on the CLI. If your QA team and designers need to review visual changes together, a paid tool with a proper UI makes more sense.
If you want AI-powered visual testing without managing any infrastructure, Bug0 Studio handles visual regression detection as part of its end-to-end testing. No Chrome instances to manage, no reference images to store. If your team would rather hand off QA entirely, Bug0 Managed provides forward-deployed engineers who own your test suite.
Advanced configuration
Scenario-level overrides
Every top-level config option can be overridden at the scenario level. This is useful when different pages need different thresholds or viewports.
{
  "label": "Marketing hero section",
  "url": "https://staging.example.com",
  "selectors": [".hero"],
  "misMatchThreshold": 0.5,
  "viewports": [
    {
      "label": "desktop-only",
      "width": 1440,
      "height": 900
    }
  ]
}
This scenario uses a higher mismatch threshold (because the hero section has animated elements) and only tests the desktop viewport.
Multiple config files
For large projects, split your configuration into multiple files.
backstop test --config=backstop-homepage.json
backstop test --config=backstop-checkout.json
backstop test --config=backstop-admin.json
Run them in parallel in CI for faster feedback. Each config file can target a different part of your application with different thresholds and viewports.
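In GitHub Actions, for example, a build matrix fans the configs out into parallel jobs. A sketch, assuming backstopjs is a dev dependency:

visual-test:
  runs-on: ubuntu-latest
  strategy:
    matrix:
      config: [backstop-homepage.json, backstop-checkout.json, backstop-admin.json]
  steps:
    - uses: actions/checkout@v4
    - run: npm ci
    - run: npx backstop test --config=${{ matrix.config }}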
Cookie and authentication scenarios
For authenticated pages, set cookies directly.
{
  "label": "Admin dashboard",
  "url": "https://staging.example.com/admin",
  "cookiePath": "backstop_data/cookies.json",
  "selectors": ["document"]
}
The cookies file is a standard JSON array of cookie objects.
[
  {
    "name": "session_id",
    "value": "abc123",
    "domain": "staging.example.com",
    "path": "/",
    "httpOnly": true,
    "secure": true
  }
]
For more complex authentication (OAuth flows, multi-step login), use onBeforeScript with a Puppeteer or Playwright script as shown earlier.
Common pitfalls and how to avoid them
False positives from font rendering
This is the number one source of pain with BackstopJS. Different operating systems render fonts differently: anti-aliasing, hinting, and subpixel rendering all vary between macOS, Windows, and Linux. The result: identical HTML produces slightly different screenshots, and BackstopJS flags them as failures.
Fix: Always generate reference screenshots in the same environment as your test screenshots. Use Docker. Or generate both reference and test in CI.
Reference image drift
If developers run backstop approve locally without reviewing the report carefully, bad screenshots become the new baseline. Over time, reference images drift from the actual intended design.
Fix: Treat reference image updates like code reviews. Require PR approval for changes to backstop_data/bitmaps_reference. Add a CI check that flags when reference images change.
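A minimal sketch of such a check, assuming GitHub Actions (the ::warning:: syntax) and a main branch:

# Warn reviewers when a PR touches the visual baseline
if git diff --name-only origin/main...HEAD | grep -q '^backstop_data/bitmaps_reference/'; then
  echo "::warning::Reference images changed - review the visual diff before merging"
fi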
Too many scenarios, slow pipeline
Each scenario at each viewport is a separate Chrome tab. 50 scenarios at 3 viewports means 150 screenshots. That takes time.
Fix: Prioritize. Test your most critical pages and components. Not every page needs visual regression testing. Focus on pages where visual bugs have the highest business impact: checkout, pricing, landing pages, onboarding flows.
Animations causing flaky tests
CSS animations and transitions create non-deterministic screenshots. The screenshot might capture the animation mid-frame.
Fix: Use delay to wait for animations to complete. Or disable animations in your test environment with a CSS override.
{
  "onReadyScript": "puppet/disableAnimations.js"
}
// backstop_data/engine_scripts/puppet/disableAnimations.js
module.exports = async (page) => {
  await page.evaluate(() => {
    // Inject a style tag that zeroes out every animation and transition
    const style = document.createElement('style');
    style.textContent = `
      *, *::before, *::after {
        animation-duration: 0s !important;
        transition-duration: 0s !important;
      }
    `;
    document.head.appendChild(style);
  });
};
Recommended project structure
Here's how to organize BackstopJS in a real project.
project-root/
  backstop.json
  backstop_data/
    bitmaps_reference/    # Committed to git
    bitmaps_test/         # Gitignored
    html_report/          # Gitignored
    engine_scripts/
      puppet/
        login.js
        disableAnimations.js
        onReady.js
Add to your .gitignore:
backstop_data/bitmaps_test
backstop_data/html_report
backstop_data/ci_report
Keep bitmaps_reference and engine_scripts in version control. Everything else is generated on each run.
FAQs
What is BackstopJS visual regression testing?
BackstopJS is an open-source, MIT-licensed tool for visual regression testing. It takes screenshots of your web pages using Chrome Headless, compares them against approved reference images pixel by pixel, and reports any visual differences. It's configured via a backstop.json file and runs from the command line.
Is BackstopJS free?
Yes. BackstopJS is completely free and open source under the MIT license. The GitHub repository is garris/BackstopJS. There are no paid tiers or premium features. You run it on your own infrastructure.
How does BackstopJS compare to Percy or Applitools?
BackstopJS is free and self-hosted. Percy and Applitools are paid SaaS tools. The main technical difference is that BackstopJS uses pure pixel comparison while Applitools uses AI-powered visual diffing that reduces false positives. Percy offers cloud rendering and cross-browser testing. BackstopJS only supports Chrome. For teams on a budget who are comfortable managing their own infrastructure, BackstopJS is a strong choice. For teams that need cross-browser support, AI diffing, or collaboration features, paid tools are worth the investment.
Can BackstopJS test behind a login?
Yes. You have two options. First, you can provide session cookies via the cookiePath config option. Second, you can write Puppeteer or Playwright scripts that perform the login flow before screenshots are captured. Use the onBeforeScript scenario property to reference your login script.
Does BackstopJS work with Playwright?
Yes. BackstopJS supports both Puppeteer and Playwright as rendering engines. Set "engine": "playwright" in your backstop.json config and write your interaction scripts using the Playwright API. Both engines use Chrome Headless for rendering.
How do I reduce false positives in BackstopJS?
Three strategies. First, generate reference and test screenshots in the same environment (use Docker). Second, increase misMatchThreshold slightly (0.1-0.5%) to tolerate anti-aliasing differences. Third, use hideSelectors or removeSelectors to exclude dynamic content like timestamps, ads, and user-specific data.
Can I use BackstopJS in GitHub Actions or GitLab CI?
Yes. BackstopJS runs in any CI environment that supports Node.js and headless Chrome. Install BackstopJS, run backstop test, and upload the HTML report as a CI artifact on failure. Use the --no-sandbox Chrome flag in CI environments and the BackstopJS Docker image for consistent rendering.
When should I choose BackstopJS over built-in Playwright visual testing?
Choose BackstopJS when you want a dedicated visual testing framework with a configuration-driven approach, an interactive HTML report with diff overlays and scrubber, and strong CMS ecosystem support. Choose Playwright's built-in toHaveScreenshot() when you're already using Playwright for functional tests and want lightweight visual checks without a separate tool. BackstopJS is the better pick for teams doing visual-first testing across many pages and viewports.
