Playwright CLI vs. Playwright MCP: which one should drive your AI browser tests?


TL;DR: Microsoft ships two ways to hand an AI agent a browser: Playwright MCP and the newer Playwright CLI. Both shipped fresh releases on June 10, 2026, and they answer the same question with opposite trade-offs. MCP keeps a live browser and rich page state in context. The CLI keeps your token budget. Here is how to choose, and the part neither one solves for you.


Two tools, same job, opposite bills

You want an AI agent to open your app, click through a flow, and tell you what broke. Until this year you had one official path for that: Playwright MCP. Point Claude Code or Cursor at the server, and the agent drives a real browser.

Earlier in 2026, Microsoft shipped a second path: Playwright CLI, published as @playwright/cli and built for coding agents. On June 10, 2026, both tools put out new releases on the same day, the CLI at version 0.1.14 and the MCP server at version 0.0.76. Same team, two front doors to the same engine.

This confuses people, and the confusion is fair. Both let an agent navigate, click, type, and screenshot. Both run on top of Playwright. Read the marketing for each and you would think they compete.

They do not. They split the work by where your agent spends its budget. One spends tokens. The other spends shell access. Pick the wrong one and your agent either runs out of context halfway through a suite, or it cannot run at all.

Let me define both, then show you the seam.


What Playwright MCP is

Playwright MCP is a Model Context Protocol server. It exposes the browser to an LLM as a set of tools. The agent calls tools such as browser_click, browser_type, and browser_snapshot, and the server runs the matching Playwright action and hands back the result.

The key design choice is how it sees the page. MCP does not send screenshots by default. It sends an accessibility tree snapshot, a structured text map of every element, its role, and its label. The model reasons over that text instead of pixels. No vision model required. The element references are deterministic, so the agent is not guessing at coordinates.

That snapshot is the whole point, and also the whole cost. You get rich, continuous page state the agent can reason over across many turns. You also pay for that state in tokens, every turn.
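To make "a structured text map" concrete, here is a minimal sketch of how a tree of elements could be flattened into role-and-label lines with stable references. The A11yNode type, the e1/e2 reference naming, and the line format are assumptions made for illustration. They are not the schema Playwright MCP actually emits.

```typescript
// Hypothetical node shape; not Playwright MCP's real snapshot schema.
interface A11yNode {
  role: string;
  name: string;
  children?: A11yNode[];
}

// Flatten the tree into indented text lines, giving each node a
// sequential reference (e1, e2, ...) in depth-first order.
function renderSnapshot(root: A11yNode): string[] {
  const lines: string[] = [];
  let counter = 0;
  const visit = (node: A11yNode, depth: number): void => {
    counter += 1;
    const indent = "  ".repeat(depth);
    lines.push(`${indent}- ${node.role} "${node.name}" [ref=e${counter}]`);
    for (const child of node.children ?? []) {
      visit(child, depth + 1);
    }
  };
  visit(root, 0);
  return lines;
}

// A made-up login form used as sample input.
const loginForm: A11yNode = {
  role: "form",
  name: "Sign in",
  children: [
    { role: "textbox", name: "Email" },
    { role: "textbox", name: "Password" },
    { role: "button", name: "Sign in" },
  ],
};

const snapshotLines = renderSnapshot(loginForm);
```

The same tree always yields the same references, which is the sense in which the agent is not guessing. Every line is also text the model has to read, so the output grows with the page.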

You install it as a server entry in your MCP client:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

It works in Claude Desktop, Cursor, VS Code, Goose, and anything else that speaks MCP. The server exposes tools across navigation, tabs, network mocking, storage, tracing, and locator generation.


What Playwright CLI is

Playwright CLI is a command-line tool built for coding agents that already have a shell and a filesystem, such as Claude Code, GitHub Copilot, and Cursor in agent mode. The agent runs commands like playwright-cli click, playwright-cli snapshot, and playwright-cli screenshot, and reads results back from stdout or from files on disk.

You install it globally and pull in the skills:

npm install -g @playwright/cli@latest
playwright-cli install --skills

The agent learns the command set by running playwright-cli --help when it needs it. Nothing gets pre-loaded into context. A snapshot writes a compact element map. A screenshot saves a PNG to disk and returns the path, not the image bytes.

Here is Microsoft's own line on why this exists, straight from the CLI docs:

CLI invocations are more token-efficient: they avoid loading large tool schemas and verbose accessibility trees into the model context.

That sentence is the entire argument. Hold onto it.


The real difference: token budget vs. persistent state

Strip away the surface and the split is simple.

MCP loads the tool schemas and the page snapshot into the model's context window. That is what makes it powerful for a long, autonomous loop. The agent keeps a live browser open, keeps reasoning over the same page state, and self-corrects across dozens of turns without re-establishing where it is.

The CLI keeps almost nothing in context. Each command runs, returns a short result, and exits. Artifacts go to disk. The agent reaches back for detail only when it asks.

So the question is not "which tool is better." It is "what is scarce for this agent." If the scarce resource is context, because the agent is also holding your codebase, your test files, and a long reasoning chain, the CLI wins. If the scarce resource is continuity, because the agent needs to explore and heal over a long unattended run, MCP wins.

One independent benchmark put numbers on it: roughly 114,000 tokens for a browser task over MCP versus roughly 27,000 over the CLI, about a 4x cut (ytyng.com, 2026). Treat that as directional, not gospel. Your mileage depends on page complexity and how often the agent snapshots. The direction matches Microsoft's own framing, so the shape is right even if the multiplier moves.
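If you want to see what those two figures imply, the arithmetic is short. The token counts below are the rounded numbers quoted above. The 200,000-token context window is an assumption picked only for illustration, so swap in your own model's limit.

```typescript
// Rounded figures quoted in the text, from one independent benchmark.
const mcpTokens = 114000;
const cliTokens = 27000;

// Assumption for illustration only: a 200,000-token context window.
const contextWindow = 200000;

// Percentage of the window, rounded to one decimal place.
function windowPercent(tokens: number): number {
  return Math.round((tokens * 1000) / contextWindow) / 10;
}

const summary = {
  // How many times more tokens the MCP run used.
  ratio: Number((mcpTokens / cliTokens).toFixed(1)),
  // Share of tokens the CLI run avoided, as a whole percent.
  savedPercent: Math.round((1 - cliTokens / mcpTokens) * 100),
  // Share of the assumed window that one task consumes.
  mcpWindowPercent: windowPercent(mcpTokens),
  cliWindowPercent: windowPercent(cliTokens),
};

console.log(summary);
```

Under that assumed window, one MCP task takes more than half the budget and the CLI task takes well under a fifth. The ratio is the part to trust least, for the reasons above.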


Playwright MCP: advantages and disadvantages

Where MCP wins

Persistent browser, persistent context. The session stays live across turns. For a long autonomous run, the agent never loses its place. This is why Microsoft points self-healing and exploratory loops at MCP.

Rich introspection. The accessibility snapshot gives the model a full, structured view of the page to reason over. When the agent needs to understand a complex form or a deep menu, that detail is right there in context.

Runs anywhere MCP runs. No shell, no filesystem required. If your agent only speaks MCP, like a hosted assistant or a locked-down client, this is the path that works at all.

Deterministic, no vision model. Snapshot mode uses stable element references. You are not paying for image tokens or fighting coordinate drift. Vision is there as an opt-in for the cases that need it.

Where MCP hurts

It eats your context window. The schemas and snapshots that make MCP smart are the same bytes that crowd out your codebase. For a coding agent generating a full suite, that is a real ceiling.

Snapshots balloon on heavy pages. A dense dashboard produces a large accessibility tree. Feed that back turn after turn and the cost compounds fast.

Heavier to stand up. You configure a server. Fine for a permanent setup, friction for a quick task inside a shell you already have.

It is not a security boundary. The docs say this plainly. An agent driving a real browser with your session needs the same guardrails you would put on any automation with credentials.
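The compounding cost of snapshots is easy to underestimate, so here is a rough model of it. It assumes every turn appends one snapshot to the conversation and every model call re-reads the full history. Real clients may truncate, summarize, or cache, and the token sizes below are hypothetical, not measurements.

```typescript
// Assumption: each turn appends one snapshot of `snapshotTokens` to the
// conversation, and each model call re-reads the whole history so far.
function cumulativeInputTokens(snapshotTokens: number, turns: number): number {
  let contextSize = 0; // tokens currently held in the conversation
  let totalRead = 0;   // tokens the model has processed across all calls
  for (let turn = 1; turn <= turns; turn++) {
    contextSize += snapshotTokens;
    totalRead += contextSize;
  }
  return totalRead;
}

// Hypothetical snapshot sizes for a light page and a dense dashboard.
const lightPage = cumulativeInputTokens(500, 20);
const densePage = cumulativeInputTokens(5000, 20);

console.log({ lightPage, densePage });
```

Under these assumptions the total grows with the square of the turn count, which is why a dense page hurts far more on turn twenty than on turn two.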


Playwright CLI: advantages and disadvantages

Where the CLI wins

Token efficiency. No schemas in context, no full tree dumped per turn, artifacts on disk. The agent spends its window on your code and its reasoning, not on plumbing.

It fits how coding agents already work. Claude Code and Copilot have a shell and a filesystem. The CLI meets them there. No new transport to wire up.

Artifacts land on disk, not in context. Screenshots, videos, and traces save as files. The agent gets a path. You get evidence you can open later without paying for it in tokens now.

Simple install, skills on demand. One global install, then the agent reads --help when it needs a command. The surface area stays out of the way until it is wanted.

Where the CLI hurts

It needs a shell and a filesystem. No shell, no CLI. A pure MCP client cannot use it. This is the mirror image of MCP's reach.

Less continuous page state. Command-by-command execution means the agent reconstructs context more often. For a long unattended self-healing loop, that is exactly the continuity MCP was built to keep.

Younger and smaller. It launched this year. The ecosystem, the examples, and the community answers are thinner than MCP's.

More bookkeeping for the agent. Session names, artifact paths, and disk cleanup are now the agent's job. Power, with a little more rope.


How to actually choose

Skip the feature grid. Ask one question: what is your agent short on?

Pick the CLI when the agent is a coding agent with your repo open. It is generating or repairing a test suite, it already has a shell, and its context is precious because your codebase is sitting in it. This is the common case for Claude Code and Copilot, and it is the case Microsoft built the CLI for.

Pick MCP when the agent runs long, runs alone, and needs to keep a browser and its page state alive across many turns. Exploratory crawling. A self-healing loop that reasons over structure for an hour. A client that speaks only MCP and has no shell at all.

Plenty of teams will run both. The CLI for the agent that writes the tests, MCP for the agent that roams the app unattended. They are not rivals. They are two tools on the same belt.
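The rule of thumb above fits in a few lines of code. This is a sketch of the reasoning, not an official selector. The AgentProfile fields are hypothetical names, and sending an agent that is short on both context and continuity to the CLI is my own tie-break.

```typescript
type Tool = "playwright-cli" | "playwright-mcp";

// Hypothetical description of an agent; field names are illustrative.
interface AgentProfile {
  hasShell: boolean;            // can it run shell commands and read files?
  contextIsScarce: boolean;     // is the window already full of code and reasoning?
  needsLongLivedState: boolean; // long, unattended, exploratory or self-healing run?
}

function chooseTool(agent: AgentProfile): Tool {
  // No shell or filesystem means the CLI cannot run at all.
  if (!agent.hasShell) return "playwright-mcp";
  // Continuity is the scarce resource and context is not.
  if (agent.needsLongLivedState && !agent.contextIsScarce) return "playwright-mcp";
  // Default for coding agents with a repo open (tie-break is an assumption).
  return "playwright-cli";
}

const codingAgent = chooseTool({ hasShell: true, contextIsScarce: true, needsLongLivedState: false });
const roamingAgent = chooseTool({ hasShell: true, contextIsScarce: false, needsLongLivedState: true });
const hostedAssistant = chooseTool({ hasShell: false, contextIsScarce: true, needsLongLivedState: false });
```

If your team runs both tools, call it once per agent rather than once per project.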


The problem neither tool solves

Here is the part the CLI-vs-MCP debate quietly skips.

Both tools are about authoring. They give an agent a browser so it can write or run a test once. Neither one owns what happens on run number two hundred.

Because the test the agent wrote still goes stale. The selector it picked still breaks when you rename a button. The flow it learned on Tuesday still fails on Friday when the UI shifts. You have automated the writing and inherited the maintenance. That is the bill that actually hurts, and it arrives every sprint, forever.

This is the gap Passmark was built to close. Passmark is the open-source Playwright library that runs Bug0's AI testing, and it treats AI as the fallback, not the hot path.

You write a step in plain English. Passmark resolves it to a Playwright action once, then caches the proven action by flow and description. On every run after, the cached action replays in milliseconds with no AI call. So the token-efficiency argument that splits the CLI and MCP camps mostly disappears at the suite level. You are not paying a model to re-derive a working click on every run. You pay once, then replay.

await runSteps({
  page,
  userFlow: "Add product to cart",
  steps: [
    { description: "Navigate to https://demo.vercel.store" },
    { description: "Click Acme Circles T-Shirt" },
    { description: "Add to cart", waitUntil: "My Cart is visible" },
  ],
  assertions: [{ assertion: "My Cart shows Acme Circles T-Shirt" }],
  test,
  expect,
});

When the UI changes and a cached step fails, Passmark does not fail the run on a dead selector. It re-resolves the step with fresh AI execution, re-caches the new action, and keeps going. The suite heals itself once and stays fast after. That is the maintenance bill, gone, without a human editing locators.
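The cache-then-heal pattern is simple enough to sketch. What follows is a simplified, synchronous illustration of the idea, not Passmark's implementation. StepCache, Resolve, and Execute are hypothetical names, the resolver stands in for the AI call, and the executor stands in for running the action in a browser.

```typescript
// Hypothetical types; this is not Passmark's internal API.
type Action = string;                           // e.g. a serialized locator plus action
type Resolve = (description: string) => Action; // stands in for the AI call
type Execute = (action: Action) => boolean;     // true if the action worked

class StepCache {
  private readonly actions = new Map<string, Action>();
  private readonly resolve: Resolve;
  private readonly execute: Execute;
  aiCalls = 0;

  constructor(resolve: Resolve, execute: Execute) {
    this.resolve = resolve;
    this.execute = execute;
  }

  run(flow: string, description: string): boolean {
    const key = `${flow}::${description}`;
    const cached = this.actions.get(key);
    // Hot path: replay the proven action with no AI call.
    if (cached !== undefined && this.execute(cached)) return true;
    // Fallback: cache miss or stale action, so resolve again and re-cache.
    this.aiCalls += 1;
    const fresh = this.resolve(description);
    if (!this.execute(fresh)) return false;
    this.actions.set(key, fresh);
    return true;
  }
}

// Simulated UI: an action works only if it targets the current button label.
let buttonLabel = "Add to cart";
const cache = new StepCache(
  () => `click:${buttonLabel}`,
  (action) => action === `click:${buttonLabel}`,
);

const outcomes: boolean[] = [];
outcomes.push(cache.run("checkout", "Add to cart")); // miss: first AI call
outcomes.push(cache.run("checkout", "Add to cart")); // hit: no AI call
buttonLabel = "Add to bag";                          // the UI changes
outcomes.push(cache.run("checkout", "Add to cart")); // stale: heals with a second AI call
outcomes.push(cache.run("checkout", "Add to cart")); // hit again
```

Four runs, two AI calls, and the second call happens only because the button was renamed. That is the shape of "AI as the fallback, not the hot path."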

Assertions get the same treatment from the other end. Each one runs on Claude and Gemini in parallel, and an arbiter model settles any disagreement, so a single model's hallucination never decides your pass or fail. For ephemeral UI like a toast that flashes and vanishes, you set video: true and the assertion runs against the recorded step sequence instead of a lucky screenshot.
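Two judges with an arbiter as tie-break can be sketched like this. It is a synchronous illustration with hypothetical names (Judge, decide), not Passmark's code. In a real setup the two judges would be asynchronous model calls run concurrently, and the stubs below would be replaced by those calls.

```typescript
type Verdict = "pass" | "fail";
type Judge = (assertion: string) => Verdict; // stands in for one model's answer

interface Decision {
  verdict: Verdict;
  arbitrated: boolean; // true only when the two judges disagreed
}

// Hypothetical helper: the arbiter is consulted only on disagreement.
function decide(assertion: string, judgeA: Judge, judgeB: Judge, arbiter: Judge): Decision {
  const a = judgeA(assertion);
  const b = judgeB(assertion);
  if (a === b) return { verdict: a, arbitrated: false };
  return { verdict: arbiter(assertion), arbitrated: true };
}

// Stub judges for demonstration.
const alwaysPass: Judge = () => "pass";
const alwaysFail: Judge = () => "fail";

const agreed = decide("My Cart shows Acme Circles T-Shirt", alwaysPass, alwaysPass, alwaysFail);
const disputed = decide("My Cart shows Acme Circles T-Shirt", alwaysPass, alwaysFail, alwaysFail);
```

When the judges agree, the arbiter never runs, so the third call is paid for only on the assertions where it can change the outcome.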

Be clear about the boundaries. Passmark is the wrong tool for a one-off. If you need an agent to scrape a page once, reach for the CLI. If you are building an agent that roams an app with no fixed path, MCP fits better. Passmark earns its keep on the flows you run on every deploy, the ones you are tired of repairing by hand. That is regression testing, and it is the only job it claims.

The CLI and MCP get an agent to write the test. Passmark gets the test to survive your next deploy. Different problems. You need both solved.


Where this leaves you

The CLI-vs-MCP question is real and worth getting right. If your coding agent keeps running out of room, move it to the CLI. If your autonomous agent keeps losing its place, give it MCP. Microsoft built two doors on purpose, and now you know which is which.

But do not mistake authoring for testing. An agent that writes a thousand tests has handed you a thousand things to maintain. The win is not generating tests faster. The win is owning a suite that stays green while you ship, without a person babysitting selectors.

That is the outcome Bug0 delivers. A forward-deployed engineer plans your coverage, builds it on Passmark, and verifies every run, while the engine caches, heals, and reaches consensus on its own. You get the speed of an AI agent and the durability of a maintained suite, for one flat rate. You build, we test.


FAQs

What is the difference between Playwright CLI and Playwright MCP?

Both let an AI agent drive a Playwright browser. Playwright MCP is a Model Context Protocol server that keeps a live browser and a full accessibility-tree snapshot in the model's context, which suits long autonomous loops. Playwright CLI is a command-line tool that runs each action as a shell command and writes artifacts to disk, which keeps token usage low for coding agents. MCP spends tokens to hold state. The CLI spends shell access to save tokens.

Which is more token-efficient, Playwright CLI or MCP?

The CLI. Microsoft's own docs say CLI invocations avoid loading large tool schemas and verbose accessibility trees into context. One independent 2026 benchmark measured roughly 114,000 tokens for a task over MCP versus roughly 27,000 over the CLI. Treat the exact figure as directional, since it depends on page complexity and snapshot frequency, but the direction is consistent with Microsoft's framing.

When should I use Playwright MCP instead of the CLI?

Use MCP when the agent runs long and alone and needs continuous browser state, such as exploratory crawling or a self-healing loop that reasons over page structure for an extended run. Also use MCP when your client speaks only MCP and has no shell or filesystem access, since the CLI requires both.

Do Playwright CLI and MCP replace writing tests by hand?

No. Both help an agent author or run a test. Neither maintains it. The test still breaks when your UI changes, and you still own the repair. That maintenance bill is the real cost of a test suite over time.

How does Passmark fit with Playwright CLI and MCP?

Passmark is the open-source Playwright library that runs Bug0's AI testing. It is about durability, not just authoring. You write steps in plain English, Passmark caches the resolved Playwright action so repeat runs skip the AI, and it auto-heals the step when the UI changes. Where the CLI and MCP get an agent to write a test once, Passmark keeps that test passing across deploys. Source is at github.com/bug0inc/passmark.

How does Bug0 use all of this?

Bug0 Managed is a done-for-you QA service built on Passmark. A dedicated forward-deployed engineer plans your coverage, builds your tests on the Passmark engine, and verifies every run, while the engine caches, heals, and reaches multi-model consensus on assertions. You get full coverage and a suite that stays green, for a flat monthly rate, without operating any of the tooling yourself.


Ship every deploy with confidence.

Bug0 gives you a dedicated AI QA engineer that tests every critical flow, on every PR, with zero test code to maintain. 200+ engineering teams already made the switch.

From $2,500/mo. Full coverage in 7 days.

Go on vacation. Bug0 never sleeps. The AI tests every commit, every deploy, every schedule. Your forward-deployed engineer reviews every failure and files the bugs. Coverage holds while you're off the grid.
