tldr: Test execution runs your test suite and records pass, fail, blocked, and skipped results. The mechanics that scale from 10 tests to 10,000: parallelization, isolation, deterministic data, and clear failure context.
What "execution" actually involves
Six things happen on every test run.
- Provision the environment.
- Set up test data.
- Run the test.
- Record the result.
- Capture artifacts on failure.
- Tear down the environment.
Each step is a place where a run can fail for reasons that have nothing to do with the product. A test that fails because the environment did not provision is a test infrastructure failure, not a product bug. The two get confused constantly.
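As a concrete sketch, here is that lifecycle expressed with a pytest fixture. The provisioning and seeding helpers are illustrative stand-ins for real infrastructure code, not any framework's API:

```python
import pytest

def provision_env():
    # 1. Provision: real suites start containers, servers, browsers here.
    return {"base_url": "http://localhost:8080"}

def seed_data(environment):
    # 2. Set up test data in a known state.
    environment["users"] = [{"id": 1, "name": "alice"}]

@pytest.fixture
def env():
    environment = provision_env()
    seed_data(environment)
    yield environment        # 3. the test body runs here
    environment.clear()      # 6. teardown runs even if the test failed

def test_user_exists(env):   # 4. pytest records a pass/fail per test
    assert env["users"][0]["name"] == "alice"
    # 5. Artifact capture on failure is sketched in the failure
    #    artifacts section below.
```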
Parallelization
Sequential test runs do not scale past a few hundred tests. After that, suite duration becomes the bottleneck.
Three parallelization strategies, from finest to coarsest grain.
Test-level parallelism. Each test runs in its own worker. Requires test isolation (no shared state).
Suite-level parallelism. Suites run in parallel; tests within a suite run sequentially. Less isolation needed.
Cross-machine parallelism. Tests distributed across multiple CI runners. Needed for very large suites.
Most modern test frameworks (Jest, Pytest, JUnit, Playwright) support test-level parallelism out of the box. Use it.
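With pytest, for example, test-level parallelism comes from the pytest-xdist plugin: install it with `pip install pytest-xdist` and run `pytest -n auto` to get one worker per CPU core. The sketch below shows two tests that are safe to parallelize because each writes only to its own `tmp_path` scratch directory; the test names and file contents are illustrative:

```python
# Run with: pytest -n auto  (requires pytest-xdist)
# tmp_path is a built-in pytest fixture: a unique directory per test,
# so parallel workers never contend for shared files.

def test_writes_report_a(tmp_path):
    report = tmp_path / "report.txt"   # unique per test, per worker
    report.write_text("a")
    assert report.read_text() == "a"

def test_writes_report_b(tmp_path):
    report = tmp_path / "report.txt"   # same filename, different directory
    report.write_text("b")
    assert report.read_text() == "b"
```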
Isolation
Tests that share state break under parallelism. The minimum isolation requirements:
- Each test has its own data (or a known, reset state).
- Each test has its own environment (or a clean fixture).
- No test depends on the order of other tests.
Achieving this is harder than it sounds. The most common violation: tests that depend on shared database state. Fix: per-test transactions or ephemeral databases.
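The ephemeral-database version, sketched with pytest and sqlite3's in-memory mode (table and test names are illustrative; with a real database server the same shape works as a per-test transaction rolled back in teardown):

```python
import sqlite3
import pytest

@pytest.fixture
def db():
    conn = sqlite3.connect(":memory:")  # fresh database per test, never shared
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    yield conn
    conn.close()                        # vanishes after the test, no cleanup debt

def test_insert_is_visible_here(db):
    db.execute("INSERT INTO users (name) VALUES ('alice')")
    assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1

def test_starts_from_a_clean_table(db):
    # Passes no matter what other tests inserted, in any order.
    assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0
```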
Recording results
Each test result has a status, a duration, and (on failure) artifacts.
Statuses worth tracking:
- Pass. Worked as expected.
- Fail. Did not work as expected.
- Blocked. Could not run because of an environment issue.
- Skipped. Intentionally not run.
- Flaky. Passed on retry. Worth tracking separately.
Lumping blocked into fail hides infrastructure problems. Lumping flaky into pass hides test stability issues.
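A minimal sketch of a result record that keeps all five statuses distinct; the names are illustrative, not any framework's schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    PASS = "pass"
    FAIL = "fail"        # assertion failed: a candidate product bug
    BLOCKED = "blocked"  # environment or setup error: an infrastructure problem
    SKIPPED = "skipped"  # intentionally not run
    FLAKY = "flaky"      # failed, then passed on retry

@dataclass
class TestResult:
    name: str
    status: Status
    duration_s: float
    artifacts: list[str] = field(default_factory=list)  # paths, empty unless failed

# Blocked stays separate from fail and flaky stays separate from pass,
# so dashboards can surface infra problems and instability on their own.
result = TestResult("test_checkout", Status.BLOCKED, 0.4)
```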
Failure artifacts
A failure without artifacts is a failure that takes hours to debug. Capture:
- Stack trace
- Screenshot or video (for UI tests)
- Network logs (for integration and E2E tests)
- DOM snapshot (for web tests)
- Server logs from the test environment
Bug0 captures all of these on every failure. Most teams using traditional automation miss at least two.
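In pytest, failure-only capture can hang off the `pytest_runtest_makereport` hook in `conftest.py`. The hook is real pytest API; `capture_screenshot` and `dump_server_logs` below are hypothetical stand-ins for your UI driver and log collector:

```python
import pytest

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call" and report.failed:
        artifact_dir = item.config.rootpath / "artifacts" / item.name
        artifact_dir.mkdir(parents=True, exist_ok=True)
        # The stack trace is already in report.longrepr; persist it.
        (artifact_dir / "traceback.txt").write_text(str(report.longrepr))
        # capture_screenshot(artifact_dir / "failure.png")  # UI tests (hypothetical)
        # dump_server_logs(artifact_dir / "server.log")     # env logs (hypothetical)
```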
Continuous execution
Modern teams run tests continuously, not on a schedule.
- On every commit: fast tests (unit, lint).
- On every PR: integration and smoke E2E.
- On every merge: full E2E regression.
- In production: continuous synthetic monitoring.
The frequency matters. Tests that run hourly catch bugs hours after they were introduced. Tests that run on every commit catch them within minutes.
What slows execution down
Sequential setup. Spinning up environments takes time. Cache and reuse where possible.
Slow tests in the critical path. A 5-minute test in the unit suite blocks the whole suite. Move slow tests to a separate stage.
Flaky tests with retries. A test retried twice takes up to three times as long. Fix the flake or remove the test.
Bloated test data. A 10GB seed database takes minutes to load. Trim aggressively.
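A sketch of the separate-stage approach with pytest markers; register the `slow` marker in your pytest config, then run `pytest -m "not slow"` on every commit and `pytest -m slow` in a later stage. The tests shown are placeholders:

```python
import pytest

@pytest.mark.slow
def test_full_report_generation():
    ...  # a multi-minute test that should not block the unit suite

def test_report_header():
    ...  # fast test, stays in the critical path
```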
FAQs
How long should the test suite take?
Unit and integration: under 5 minutes. Full E2E: under 30 minutes. Above this, parallelize or split.
Should I retry failed tests?
For genuine flakes, yes (with limits). For real failures, no. Retry without diagnosis hides bugs.
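With pytest, bounded retries can be scoped to an individual known flake via the pytest-rerunfailures plugin, rather than blanket-retrying the suite; the test shown is a placeholder:

```python
import pytest

# Requires: pip install pytest-rerunfailures
@pytest.mark.flaky(reruns=2, reruns_delay=1)
def test_third_party_webhook_roundtrip():
    ...  # known flake with an open ticket; all other tests fail on first attempt
```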
What is the difference between pass rate and stability?
Pass rate is the percentage of tests passing in a given run. Stability is whether the same test passes consistently across runs. A suite can show a 99% pass rate while individual tests flip between pass and fail from run to run.
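A toy computation makes the distinction concrete; the run history below is illustrative:

```python
runs = [  # one suite run per row
    {"test_login": "pass", "test_search": "pass"},
    {"test_login": "pass", "test_search": "fail"},
    {"test_login": "pass", "test_search": "pass"},
    {"test_login": "pass", "test_search": "pass"},
]

latest = runs[-1]
pass_rate = sum(s == "pass" for s in latest.values()) / len(latest)
print(f"pass rate (latest run): {pass_rate:.0%}")   # 100%

# Stability: does the same test pass consistently across runs?
for name in latest:
    history = [run[name] for run in runs]
    stability = history.count("pass") / len(history)
    print(f"{name}: {stability:.0%} stable")        # test_search: 75%
```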
How does Bug0 handle test execution?
Bug0 runs tests in parallel by default with full artifact capture on failure. Test stability is high because the agent adapts to UI changes that would break selector-based tests.
