tldr: Production testing means running tests against your live environment. Done with feature flags, canary deployments, synthetic users, and observability, it catches bugs no staging environment can reproduce. Done carelessly, it is how you create incidents.
Why teams test in production
Three problems make production testing necessary.
Staging is never production. Different data scale, different traffic patterns, different external integrations. Bugs that depend on those differences never surface until the code reaches production.
Some integrations are unique to production. Payment processors, carrier APIs, and partner OAuth flows often have no realistic non-production equivalent.
Real users do real things. No matter how thorough QA is, users will exercise paths the team did not predict.
The choice is not whether bugs reach production. They will. The choice is whether you find them or your users do.
Safe production testing techniques
Five techniques cover most cases. None require shipping untested code blind.
1. Feature flags
Ship the code disabled. Turn it on for internal users, then 1%, then 10%, then everyone. If something breaks, flip the flag off.
This is the foundation of safe production work. Without feature flags, every other technique is more dangerous.
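A minimal sketch of a percentage rollout, assuming a homegrown flag check (real flag services like LaunchDarkly or Unleash handle this for you). The key property is determinism: hashing the flag name together with the user ID gives each user a stable bucket, so a user does not flip between on and off across requests. The flag name, user IDs, and internal-user set here are all hypothetical.

```python
import hashlib

def flag_enabled(flag_name: str, user_id: str, rollout_pct: int,
                 internal_users: set) -> bool:
    """Graduated rollout: internal users first, then a stable hash bucket.

    rollout_pct of 0 is the kill switch; 100 is fully launched.
    """
    if user_id in internal_users:
        return True
    # Hash flag + user so each flag gets an independent bucket assignment;
    # the same user lands in the same bucket on every request.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_pct
```

Flipping the flag off is then just setting `rollout_pct` to 0 in config, with no redeploy.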
2. Canary deployments
Deploy the new version to one server while the rest run the old version. Compare error rates, latency, and business metrics. If the canary looks worse, abort.
Canary requires good observability. Without metrics, you have no way to compare.
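The abort decision can be as simple as comparing a few metrics against thresholds. A sketch, assuming you can pull error rate and p95 latency for both the canary and the baseline fleet; the specific thresholds here are illustrative, not recommendations.

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    error_rate: float      # fraction of failed requests, e.g. 0.01 = 1%
    p95_latency_ms: float  # 95th-percentile request latency

def canary_healthy(canary: Metrics, baseline: Metrics,
                   max_error_delta: float = 0.005,
                   max_latency_ratio: float = 1.2) -> bool:
    """Abort criteria: canary is meaningfully worse than baseline
    on either error rate or tail latency."""
    if canary.error_rate > baseline.error_rate + max_error_delta:
        return False
    if canary.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio:
        return False
    return True
```

In practice this check runs on a timer during the canary window, and a `False` triggers an automatic rollback.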
3. Synthetic users
Run automated tests against production using known test accounts. Continuously verify that critical flows still work. Most failures get caught within minutes instead of hours.
Tools like Bug0 make this practical: write a flow once, schedule it, get alerted when it fails. The open-source Passmark engine runs against any environment, including production.
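Whatever tool runs the flows, the scheduling harness tends to look the same: run the flow, retry once or twice so a single transient blip does not page anyone, then alert if it still fails. A generic sketch, with the actual flow and alert channel passed in as callables (both hypothetical here):

```python
import time

def run_synthetic_check(flow, alert, retries: int = 2,
                        delay_s: float = 0.0) -> bool:
    """Run one synthetic flow against production.

    Retries before alerting so a transient network blip does not page
    the on-call; alerts only when every attempt has failed.
    """
    last_error = None
    for _attempt in range(retries + 1):
        try:
            flow()          # e.g. log in as a synthetic user, run checkout
            return True
        except Exception as exc:
            last_error = exc
            time.sleep(delay_s)
    alert(f"synthetic check failed after {retries + 1} attempts: {last_error}")
    return False
```

A scheduler (cron, or the test platform itself) calls this every few minutes per critical flow.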
4. Shadow traffic
Send a copy of real production traffic to the new system, but throw away its responses. Compare the new system's behavior to the old. Useful for backend rewrites and migrations.
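The core of a shadow setup is that users only ever see the old system's response; the new system's response is used solely for comparison. A minimal sketch, assuming both systems are reachable as handler functions and mismatches are recorded for later analysis (all names hypothetical):

```python
def shadow_compare(request, primary_handler, shadow_handler, record_mismatch):
    """Serve from the primary (old) system; mirror the request to the
    shadow (new) system and discard its response except for comparison."""
    primary_response = primary_handler(request)
    try:
        shadow_response = shadow_handler(request)
        if shadow_response != primary_response:
            record_mismatch(request, primary_response, shadow_response)
    except Exception as exc:
        # A crashing shadow must never affect the user-facing response.
        record_mismatch(request, primary_response, f"shadow error: {exc}")
    return primary_response
```

Production deployments usually do the mirroring at the proxy layer (e.g. a service mesh) and fire the shadow call asynchronously, but the comparison logic is the same.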
5. A/B testing
Show different versions of a feature to different user segments. Measure outcomes. Useful for product decisions, also useful for catching subtle regressions before full rollout.
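Variant assignment needs to be sticky: a user who flips between versions mid-experiment pollutes the measurement. A common approach, sketched here with hypothetical experiment and user IDs, is a stable hash of experiment plus user:

```python
import hashlib

def assign_variant(experiment: str, user_id: str,
                   variants=("control", "treatment")) -> str:
    """Stable assignment: the same user always sees the same variant
    for a given experiment, with no assignment table to store."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Hashing on `experiment` as well as `user_id` keeps assignments independent across experiments, so one experiment's split does not correlate with another's.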
What you should not test in production
Some tests do not belong in production, period.
- Load tests that affect real users. Run those against a staging environment that mirrors production.
- Tests that write garbage data. Even with cleanup scripts, tests leak. Use synthetic accounts only.
- Destructive tests. Anything that drops, deletes, or corrupts. Save for ephemeral environments.
- Penetration tests without coordination. Active scans on production trip security alerts and look like real attacks.
What "synthetic users" means in practice
A synthetic user is a real account in production that is owned by your engineering team and used only for testing.
Properties of a good synthetic account:
- Tagged in your analytics so its activity does not pollute metrics.
- Has realistic but synthetic data (fake name, fake email at a domain you control).
- Cannot affect other users (does not appear in social features, leaderboards, etc.).
- Has a known initial state, restored before each test run.
Most production test suites run against five to ten synthetic accounts: one per major user role.
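The properties above can be baked into a seeding step that runs before each test pass. A sketch, assuming an abstract account store; the roles, domain, and tag name are illustrative placeholders for whatever your product actually uses:

```python
ROLES = ["buyer", "seller", "admin", "support", "guest"]

def seed_synthetic_accounts(store: dict) -> None:
    """Restore one synthetic account per major role to a known
    initial state before a test run."""
    for role in ROLES:
        store[f"synthetic-{role}"] = {
            "name": f"Synthetic {role.title()}",   # realistic but obviously fake
            "email": f"{role}@qa.example",         # domain you control
            "role": role,
            "analytics_tag": "synthetic",          # filtered out of metrics
            "visible_to_others": False,            # hidden from social features
            "state": "initial",                    # known starting state
        }
```

Running this before every suite, rather than cleaning up afterwards, means a crashed test run cannot leave the next run starting from dirty state.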
Observability is non-negotiable
Production testing without observability is reckless. You need at minimum:
- Error tracking (Sentry, Datadog Errors).
- Request logging with correlation IDs.
- Real-time metrics on the flows you are testing.
- Alerting tied to deployment events.
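Correlation IDs are the cheapest item on that list to add. The pattern: reuse an incoming ID if the caller sent one, mint one otherwise, and attach it to every log line and downstream call. A minimal sketch using a hypothetical `X-Correlation-ID` header name (any consistent header works):

```python
import uuid

def with_correlation_id(headers: dict) -> dict:
    """Reuse an incoming correlation ID or mint a new one, so every
    log line for this request can be tied back to a single trace."""
    cid = headers.get("X-Correlation-ID") or str(uuid.uuid4())
    return {**headers, "X-Correlation-ID": cid}
```

When a synthetic test fails, you search logs for that one ID and see every service the request touched.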
A failed production test is useful only if you can trace what happened. Without observability, you have a red light and no way to diagnose.
FAQs
Is production testing the same as monitoring?
Related but not the same. Monitoring observes whatever traffic happens. Production testing actively exercises specific flows. Both are valuable.
How often should I run production tests?
Critical paths every 5 to 15 minutes. Important paths hourly. Less critical paths daily. Calibrate based on how quickly you would want to know about a failure.
Will my synthetic users skew analytics?
Only if you do not tag them. Most analytics platforms support filtering specific user IDs out of reports.
What happens when a production test fails?
It pages an on-call engineer. The test artifact (screenshot, network log, DOM snapshot) makes the issue immediately diagnosable.
How does Bug0 enable production testing?
Bug0 is an outsourced QA team that runs your end-to-end suite against production on a schedule, with full reproduction artifacts when something fails. The same tests run against staging, PRs, and production from a single source of truth.
