CI/CD testing

tldr: CI/CD testing runs the right tests at each pipeline stage: fast unit and lint on commit, integration and smoke E2E on PR, full regression on merge, smoke and synthetic monitoring after deploy. The point is fast feedback at each stage, not running every test all the time.


What "CI/CD testing" actually means

Two related concepts get bundled into the term.

Continuous Integration testing. Tests that run when code is integrated (commit, PR, merge). Goal: catch issues early, before they accumulate.

Continuous Delivery / Deployment testing. Tests that gate or follow the release. Goal: ensure the build is shippable and confirm it works once shipped.

Both fit inside the same pipeline, but the test types and frequencies differ at each stage.


A pipeline that works

Most teams converge on roughly this shape.

Stage 1: On every commit

  • Lint, type check, formatter
  • Unit tests
  • Fast integration tests with stubs

Budget: under 5 minutes. Fails the build if anything regresses.
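As a sketch, this commit stage might look like the following. GitHub Actions is assumed here (any CI system works the same way), and the npm scripts are placeholders for your own lint, type-check, and test commands:

```yaml
# Hypothetical commit-stage workflow — adapt names and commands to your stack.
name: commit-stage
on: push
jobs:
  fast-checks:
    runs-on: ubuntu-latest
    timeout-minutes: 5              # enforce the 5-minute budget in the pipeline itself
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm                # cache dependencies between runs
      - run: npm ci
      - run: npm run lint           # lint, type check, formatter
      - run: npm test               # unit + fast integration with stubs
```

The `timeout-minutes` line is the budget made executable: if the stage drifts past 5 minutes, the build fails loudly instead of slowly eroding.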

Stage 2: On PR open

  • Full integration suite
  • Smoke E2E (5-15 critical-path tests)
  • Security scan (SAST, dependency check)
  • Code coverage report

Budget: under 15 minutes. Provides a clear pass/fail signal for code review.


Stage 3: Pre-merge / on merge to main

  • Full E2E regression
  • Visual regression
  • Cross-browser checks
  • Performance baseline check

Budget: 30 minutes is reasonable; slower is acceptable if the suite is reliable.

Stage 4: Post-deploy

  • Smoke tests against production
  • Synthetic monitoring on critical flows
  • Real user monitoring (RUM)

Budget: continuous. Detects regressions within minutes.
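The four stages map naturally onto CI triggers, typically as separate workflow files. A minimal sketch assuming GitHub Actions (the trigger-to-stage mapping is the point, not the exact syntax):

```yaml
# Hypothetical file names; Stage 1 (on: push) is shown in the section above.
# .github/workflows/pr.yml — Stage 2: integration suite, smoke E2E, security scan
on:
  pull_request:
    branches: [main]
---
# .github/workflows/main.yml — Stage 3: full E2E regression, visual, perf baseline
on:
  push:
    branches: [main]
---
# .github/workflows/synthetic.yml — Stage 4: synthetic monitoring of critical flows
on:
  schedule:
    - cron: "*/10 * * * *"   # smoke production every 10 minutes
```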

Bug0 runs Stage 2 and Stage 3 as part of a forward-deployed QA team. The same test suite covers PR validation, post-merge regression, and post-deploy smoke. See testing in DevOps for the broader practice.


What makes CI/CD testing hard

Three problems compound.

1. Test speed

The unit test suite that ran in 2 minutes a year ago now runs in 25. The PR pipeline takes 45 minutes. Engineers stop running tests locally because the wait is too long.

Fixes:

  • Parallelize at the test level.
  • Split slow tests into a separate stage.
  • Cache dependencies and build artifacts aggressively.
  • Run only affected tests on commits using build graph analysis.
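Test-level parallelization is often a one-line change in the runner. A hedged sketch using a GitHub Actions matrix with Playwright's built-in sharding — swap in your own runner's equivalent if you use something else:

```yaml
# Hypothetical job: 4 machines each run a quarter of the E2E suite.
jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false               # let all shards finish so you see every failure
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright test --shard=${{ matrix.shard }}/4
```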

2. Flakiness

Tests that pass sometimes and fail other times destroy trust. Engineers retry until green, which masks real failures.

Fixes:

  • Track flake rate per test. Quarantine repeat offenders.
  • Invest in deterministic test data and ephemeral environments.
  • Move to AI testing platforms for E2E, which are far less flake-prone than selector-based suites.
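Quarantine can live directly in the pipeline: tag flaky tests, exclude them from the blocking job, and keep running them in a non-blocking job so you still collect flake data. A sketch assuming Playwright-style tag grepping (the `@quarantine` tag convention is an assumption, not a standard):

```yaml
jobs:
  tests:                             # blocking: stable tests only
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright test --grep-invert @quarantine
  quarantined:                       # non-blocking: still runs, still reports
    runs-on: ubuntu-latest
    continue-on-error: true
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright test --grep @quarantine
```

A test leaves quarantine by passing consistently, or leaves the suite entirely; it should never sit tagged indefinitely.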

3. Maintenance

Selectors break. Fixtures drift. The suite rots. By year two, half the tests are commented out and no one runs the suite locally.

Fixes:

  • Budget time explicitly for test maintenance.
  • Use AI-driven test platforms that adapt to UI changes automatically. See test maintenance.

What to test at each layer

A useful mental model is the test pyramid, adapted for CI/CD.

Layer             Where in pipeline   Volume     Speed
Unit              On commit           Thousands  Milliseconds each
Integration       On commit / PR      Hundreds   Seconds each
API contract      On PR               Tens       Seconds each
E2E smoke         On PR               5-20       Seconds to minutes
E2E full          On merge            Hundreds   Minutes total with parallelization
Production smoke  Post-deploy         5-20       Minutes

The pyramid is still mostly right. AI testing has made the E2E layers dramatically cheaper, so the practical mix tilts slightly toward more E2E than the original pyramid assumed.


Exit criteria for the pipeline

A build "passes CI/CD" when:

  • All unit and integration tests pass.
  • All P0 acceptance criteria covered by smoke tests pass.
  • Full regression on main is green.
  • No new critical security findings.
  • Performance budgets met.

If any of these fail, the build does not deploy. Tools like feature flags and canary deployments add a safety layer beyond the pipeline itself.
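In CI terms, the exit criteria collapse into one aggregating gate job. A sketch assuming GitHub Actions, with placeholder job names standing in for the real checks:

```yaml
jobs:
  # ...unit, integration, smoke-e2e, security-scan, perf-budget defined elsewhere...
  gate:
    needs: [unit, integration, smoke-e2e, security-scan, perf-budget]
    if: always()                     # run even when an upstream job failed
    runs-on: ubuntu-latest
    steps:
      - name: Fail if any exit criterion failed
        if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
        run: exit 1
```

Branch protection then only needs to require the single `gate` check, so adding a new criterion later means editing the workflow, not the repository settings.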


FAQs

Should every commit run E2E tests?

No. Run smoke E2E on PR, full E2E on merge. Running every E2E test on every commit makes the feedback loop too slow.

How long should the full pipeline take?

PR pipeline under 15 minutes. Post-merge pipeline under 30 minutes. Above these thresholds, parallelize or split.

What about deploying to production?

The pipeline produces a deployable artifact. Whether to deploy is a separate decision (manual approval, time-of-day rules, feature flag rollout). See deployment testing.

Should I use the same test for PR and post-deploy?

Yes, where possible. Same suite means same coverage, same confidence. Tools like Bug0 run identical flows against PR environments, staging, and production from a single source of truth.

How does Bug0 fit CI/CD?

Bug0 plugs into your CI as an outsourced QA team that runs E2E on every PR and reports back to the pipeline as a gate. Failed tests block the merge. Passing tests trigger downstream stages.

Ship every deploy with confidence.

Bug0 gives you a dedicated AI QA engineer that tests every critical flow, on every PR, with zero test code to maintain. 200+ engineering teams already made the switch.

From $2,500/mo. Full coverage in 7 days.

Go on vacation.
Bug0 never sleeps.

Your AI QA engineer runs 24/7 — on every commit, every deploy, every schedule. Full coverage while you're off the grid.