Deployment testing

tldr: Deployment testing verifies that a release is safe to deploy and behaves correctly after deploy. Pre-deploy smoke tests, canary checks, rollback drills, and post-deploy synthetic monitoring all fall under this umbrella.


What deployment testing covers

Three phases, each with its own checks.

1. Pre-deploy

The build is ready, the pipeline is green, but is the deploy itself safe?

  • Migration scripts apply cleanly on a production-shaped database.
  • Configuration changes are validated against the target environment.
  • Feature flags default to the correct state.
  • Rollback procedure is documented and tested.

This stage catches the class of bug where the code is fine but the deploy mechanism is broken.
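
One of the checks above, validating configuration against the target environment, can be sketched as a small pre-deploy script. This is a minimal illustration, not a definitive implementation: the key names in REQUIRED_KEYS and the validate_config helper are hypothetical and stand in for whatever your service actually reads at startup.

```python
from typing import Any

# Hypothetical schema: the keys this service needs and their expected types.
REQUIRED_KEYS: dict[str, type] = {
    "DATABASE_URL": str,
    "MAX_CONNECTIONS": int,
    "PAYMENTS_ENABLED": bool,
}


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the config is valid."""
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        # Exact type match, so that True is not accepted where an int is expected.
        elif type(config[key]) is not expected:
            actual = type(config[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {actual}")
    return problems
```

Running this against the config rendered for the target environment, and failing the pipeline on a non-empty result, catches a bad value before any instance starts with it.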

2. During deploy

The deployment is happening. Tests run continuously to detect immediate failures.

  • Health checks pass on each new instance.
  • Old and new versions coexist correctly during rolling deployment.
  • Database migrations complete without holding table locks long enough to block live traffic.
  • Traffic shifts gradually if using canary or blue-green.
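
The per-instance health check in the list above can be sketched as a bounded retry loop. This is a rough sketch under assumptions: wait_until_healthy is a hypothetical helper, and the check callable stands in for whatever probe your platform uses (an HTTP endpoint, a TCP connect, a readiness command).

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it passes or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        # No point sleeping after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

A rolling deploy would call this once per new instance and stop the rollout on the first False, so a bad build never replaces more than one healthy instance.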

3. Post-deploy

The new code is live. Tests verify it actually works in production.

  • Smoke tests against the live environment.
  • Synthetic monitoring on critical user flows.
  • Real user monitoring picks up unusual error or latency patterns.
  • Business metrics (orders, signups, conversions) stay within expected ranges.
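
The business-metric check above can be made concrete with explicit ranges. The sketch below is an assumption-laden illustration: metrics_out_of_range is a hypothetical helper, and both the metric names and the ranges would come from your own historical data.

```python
def metrics_out_of_range(
    observed: dict[str, float],
    expected: dict[str, tuple[float, float]],
) -> list[str]:
    """Return the names of metrics that are missing or outside their range."""
    flagged = []
    for name, (low, high) in expected.items():
        value = observed.get(name)
        # A metric that stopped reporting is as suspicious as one out of range.
        if value is None or not (low <= value <= high):
            flagged.append(name)
    return flagged
```

For example, with an expected range of 80 to 150 orders per ten minutes, an observed value of 12 after a deploy would be flagged even if every health check is green.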

See production testing for the broader pattern of testing live systems.


What "safe to deploy" means

Specific exit criteria, not vibes.

  • All pipeline tests pass. See CI/CD testing.
  • Migration tested against a production-data clone.
  • Feature flags configured intentionally for this release.
  • Runbook reviewed for the changes in this build.
  • Roll-forward and rollback paths defined.
  • On-call engineer identified for the deploy window.

Skipping any of these increases the risk of a deploy-time incident.
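
One way to keep these criteria explicit is to encode them as a gate that the deploy job evaluates. The following is a minimal sketch; ReleaseChecklist and unmet_criteria are hypothetical names, and how each field gets populated (CI status, a ticket, a paging schedule) is left as an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReleaseChecklist:
    pipeline_green: bool
    migration_tested_on_clone: bool
    flags_reviewed: bool
    runbook_reviewed: bool
    rollback_path_defined: bool
    on_call_engineer: Optional[str]  # name or handle; None if nobody is assigned


def unmet_criteria(checklist: ReleaseChecklist) -> list[str]:
    """Return the exit criteria that are not yet satisfied."""
    checks = {
        "pipeline tests pass": checklist.pipeline_green,
        "migration tested against production-data clone": checklist.migration_tested_on_clone,
        "feature flags configured intentionally": checklist.flags_reviewed,
        "runbook reviewed": checklist.runbook_reviewed,
        "roll-forward and rollback paths defined": checklist.rollback_path_defined,
        "on-call engineer identified": bool(checklist.on_call_engineer),
    }
    return [name for name, ok in checks.items() if not ok]
```

The deploy proceeds only when the returned list is empty, which turns each skipped item into a visible decision instead of an oversight.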


Canary deployments

Deploy to a small fraction of traffic first. Compare metrics between old and new versions. If the new version looks worse, abort.

Useful for:

  • Changes with hard-to-predict load profiles.
  • Changes affecting business-critical metrics.
  • Anything where rolling back is expensive.

Tools: most modern deployment platforms support canaries, for example Argo Rollouts on Kubernetes, AWS CodeDeploy, and Google Cloud Deploy.
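
The compare-and-abort decision can be sketched as a small function over error counts. This is an illustrative sketch, not how any particular platform does its analysis: canary_verdict, the 1-percentage-point tolerance, and the minimum sample size are all assumptions to tune for your traffic.

```python
def canary_verdict(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_rate_increase: float = 0.01,
    min_requests: int = 100,
) -> str:
    """Return "wait", "abort", or "promote" from request and error counts."""
    # Too little traffic on either side makes the comparison meaningless.
    # max(..., 1) also guards the divisions below against zero requests.
    if min(baseline_requests, canary_requests) < max(min_requests, 1):
        return "wait"
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    if canary_rate - baseline_rate > max_rate_increase:
        return "abort"
    return "promote"
```

A real canary analysis would usually compare latency and business metrics as well; error rate alone is the simplest signal to start with.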


Blue-green deployments

Run two complete copies of the production environment. Switch traffic between them instantly. Rollback is a traffic switch.

Useful for:

  • Stateless services where doubling infrastructure briefly is affordable.
  • Cases where you need instant rollback.

Less useful for:

  • Stateful systems where blue and green cannot share state.
  • Cost-constrained deployments.
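
The idea that rollback is just a traffic switch can be shown with a toy router. This is a sketch under assumptions: BlueGreenRouter is a hypothetical class, and in practice the switch would be a load balancer, DNS, or service-mesh change, not an in-process attribute.

```python
from typing import Callable


class BlueGreenRouter:
    """Tracks which of two environments receives live traffic."""

    def __init__(self, active: str = "blue") -> None:
        if active not in ("blue", "green"):
            raise ValueError("active must be 'blue' or 'green'")
        self.active = active

    @property
    def idle(self) -> str:
        return "green" if self.active == "blue" else "blue"

    def switch(self, is_healthy: Callable[[str], bool]) -> bool:
        """Point traffic at the idle environment, but only if it is healthy."""
        target = self.idle
        if not is_healthy(target):
            return False  # traffic stays where it is
        self.active = target
        return True
```

Calling switch once promotes the new environment; calling it again is the rollback. The health check before the switch is the traffic-switch verification that blue-green deploys need.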

Rollback testing

The forgotten half of deployment testing. Most teams know how to deploy. Fewer have actually tested the rollback path.

Quarterly rollback drills:

  • Deploy a known-good change to staging.
  • Roll it back.
  • Verify the system returns to a fully working state.

Without this, rollback becomes the high-stress experiment you run during an incident.
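
The drill steps above reduce to a simple invariant: the state after rollback must equal the state before deploy. A minimal sketch, assuming hypothetical snapshot, deploy, and rollback callables that wrap your real tooling:

```python
from typing import Any, Callable


def run_rollback_drill(
    snapshot: Callable[[], Any],
    deploy: Callable[[], None],
    rollback: Callable[[], None],
) -> bool:
    """Deploy, roll back, and report whether the original state was restored."""
    before = snapshot()
    deploy()
    rollback()
    return snapshot() == before
```

What goes into the snapshot matters most. Including the schema version alongside the application version, for instance, exposes a rollback that reverts the code but leaves a migration applied.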


What gets missed

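Database migrations. A new column with a default value seems harmless. On a 100-million-row table, some database engines and versions rewrite the whole table to apply it (PostgreSQL before version 11, for example), which can block writes for an hour. Test migrations on production-shaped data, on the same engine version as production, before deploy.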
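
Timing a migration against production-shaped data can be sketched as follows. Note the loud assumption: the sketch uses Python's built-in sqlite3 only so that it is self-contained, and SQLite's locking and ALTER TABLE behavior differ from server databases, so real measurements must run on the same engine and version as production. The orders table, build_sample_db, and time_migration are hypothetical.

```python
import sqlite3
import time


def build_sample_db(rows: int) -> sqlite3.Connection:
    """Create an in-memory table with `rows` rows to stand in for real data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany(
        "INSERT INTO orders (total) VALUES (?)",
        ((float(i),) for i in range(rows)),
    )
    conn.commit()
    return conn


def time_migration(conn: sqlite3.Connection, statements: list[str]) -> float:
    """Run the migration statements in order and return elapsed seconds."""
    start = time.perf_counter()
    for statement in statements:
        conn.execute(statement)
    conn.commit()  # commit any transaction the statements left open
    return time.perf_counter() - start
```

Comparing the returned duration with an agreed budget (say, a few seconds of blocked writes) gives a pass/fail signal before the migration ever reaches production.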

Feature flag defaults. A new feature sits behind a flag, the flag defaults to "on," and the feature ships unexpectedly. Always default new flags to "off" and explicitly toggle them on after deploy.
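
The "new flags default to off" rule is easy to enforce mechanically. A minimal sketch, assuming flag defaults can be exported as a name-to-boolean mapping; new_flags_defaulting_on and the flag names in the usage note are hypothetical.

```python
def new_flags_defaulting_on(
    current_defaults: dict[str, bool],
    previously_released: set[str],
) -> list[str]:
    """Return flags introduced in this release whose default is on."""
    return sorted(
        name
        for name, default in current_defaults.items()
        if name not in previously_released and default
    )
```

With current defaults of {"new_checkout": True, "dark_mode": True} and only "dark_mode" previously released, the function returns ["new_checkout"], and the pre-deploy check fails until that default is flipped.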

Third-party integrations. The new code calls a partner API that has a stricter rate limit than expected. Production traffic exceeds it. Test against the real partner sandbox where possible.

Cache and session state. A new schema for cached objects, deployed without cache invalidation, breaks every user with an active session. Always plan cache invalidation as part of deploy.
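
One common way to plan invalidation, though not the only one, is to put a schema version in every cache key so that new code never reads entries written in the old layout. The sketch below assumes a plain dict as the cache; CACHE_SCHEMA_VERSION, cache_key, and read_session are hypothetical names.

```python
from typing import Any, Optional

# Hypothetical: bumped in the same release that changes the cached object layout.
CACHE_SCHEMA_VERSION = 2


def cache_key(kind: str, ident: str, version: int = CACHE_SCHEMA_VERSION) -> str:
    """Build a key that is unique per object kind, schema version, and id."""
    return f"{kind}:v{version}:{ident}"


def read_session(cache: dict[str, Any], user_id: str) -> Optional[Any]:
    """Look up a session under the current schema version only."""
    return cache.get(cache_key("session", user_id))
```

Old entries are simply never found and expire on their own. The trade-off is a burst of cache misses right after deploy, which is worth including in the load expectations for the release.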


How AI testing fits

AI testing platforms like Bug0 run end-to-end flows continuously against any environment. Pre-deploy, the same tests run against the release candidate. Post-deploy, they run against production. A failure surfaces with full reproduction context (screenshot, network trace, DOM snapshot) within minutes.

This compresses the deploy-test-monitor loop: by the time the deploy completes, the smoke results are already in.


FAQs

How is deployment testing different from release testing?

Release testing verifies the build is shippable. Deployment testing verifies that the deploy mechanism itself works and that the release behaves correctly once it is live. Different concerns, both needed.

Should I test deploys to staging the same way?

Yes. Staging deploys should mirror production deploys exactly. If you only test deploy procedures against production, you have no rehearsal.

What about zero-downtime deploys?

That is the goal of rolling, canary, and blue-green strategies. Each has its own deployment test requirements: rolling deploys need to verify old/new version coexistence, canary deploys need a metric comparison between the old and new versions, and blue-green needs traffic-switch verification.

How often should rollback be tested?

Quarterly minimum. Monthly if you deploy frequently. The cost of testing is small. The cost of a broken rollback during an incident is enormous.

How does Bug0 help with deployment testing?

Bug0 is a done-for-you QA service that runs the same E2E suite against pre-deploy, staging, and production environments. Deploy gate, smoke check, and ongoing monitoring all use one source of truth.

