Types of Testing (and How They Fit Together)
Testing Fundamentals introduces the individual techniques; this page is the map. Conversations about testing often go wrong because "unit test", "black-box test" and "performance test" sound like items on one list when they actually answer different questions. Untangle the axes and the types stop competing and start cooperating.
Three Axes That Get Muddled
Axis 1 — Scope: how much of the system is under test?
| Level | Under test | Replaced by test doubles | Typical speed |
|---|---|---|---|
| Unit | One behaviour (function, class) | Everything else | Milliseconds |
| Integration | A few components wired together (code + database, code + queue) | Remote third parties | Seconds |
| System / end-to-end | The deployed application | As little as possible | Minutes |
| Acceptance | A stakeholder-visible behaviour, phrased in their language | Varies | Varies |
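As a sketch of what "everything else" being replaced by doubles can look like at unit scope, the example below tests a pricing engine against a hand-written stub instead of a real repository. `PricingEngine`, `StubCustomerRepository`, `tier_of` and the 15% gold rate are illustrative assumptions, not an API defined on this page.

```python
# Hypothetical names throughout: a sketch of a unit test that replaces
# the repository with a stub so that only the pricing logic is under test.

GOLD_RATE_PERCENT = 15  # assumed rate, chosen to match the examples on this page


class PricingEngine:
    """Computes a discount from a customer's stored tier."""

    def __init__(self, repo):
        self._repo = repo

    def discount_for_customer(self, customer_id, total):
        tier = self._repo.tier_of(customer_id)
        if tier == "gold":
            return total * GOLD_RATE_PERCENT // 100
        return 0


class StubCustomerRepository:
    """Test double: returns canned tiers and touches no database."""

    def __init__(self, tiers):
        self._tiers = tiers

    def tier_of(self, customer_id):
        return self._tiers[customer_id]


def test_gold_customer_gets_discount():
    engine = PricingEngine(StubCustomerRepository({7: "gold"}))
    assert engine.discount_for_customer(7, total=100) == 15


def test_standard_customer_gets_no_discount():
    engine = PricingEngine(StubCustomerRepository({8: "standard"}))
    assert engine.discount_for_customer(8, total=100) == 0
```

Because the stub answers from a dictionary, a failure here can only come from the discount logic, which is what keeps unit tests fast and easy to diagnose.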
Axis 2 — Knowledge: what can the test see? Black-box tests know only the specification; white-box tests exploit the code's structure (see Code Path Analysis); grey-box mixes the two. Any scope can be tested at any knowledge level — a black-box unit test is perfectly normal.
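To make the knowledge axis concrete, here is a minimal sketch of one unit tested at two knowledge levels. The `shipping_cost` function and its threshold of 50 are hypothetical, invented only for this illustration.

```python
# Sketch: the same unit tested black-box and white-box.
# shipping_cost and FREE_SHIPPING_THRESHOLD are hypothetical.

FREE_SHIPPING_THRESHOLD = 50


def shipping_cost(total):
    """Spec: orders of 50 or more ship free; smaller orders cost 5."""
    if total >= FREE_SHIPPING_THRESHOLD:
        return 0
    return 5


def test_black_box_from_the_spec():
    # Written from the documented behaviour alone, without reading the body.
    assert shipping_cost(100) == 0
    assert shipping_cost(10) == 5


def test_white_box_at_the_branch():
    # Written after reading the code: targets the `>=` comparison and
    # makes sure both branches of the `if` are executed.
    assert shipping_cost(FREE_SHIPPING_THRESHOLD) == 0
    assert shipping_cost(FREE_SHIPPING_THRESHOLD - 1) == 5
```

Both tests sit at unit scope; only the source of their inputs differs, which is why scope and knowledge are separate axes.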
Axis 3 — The question being asked. Functional testing asks "does it do the right thing?"; non-functional testing asks "does it do the thing right?" — performance, load, security, accessibility, usability, recoverability. A load test is not a "bigger" unit test; it answers a different question entirely.
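The difference between the two questions can be sketched with one function and two tests. The timing check below is a crude stand-in for a real performance or load test, and the call count and one-second budget are assumptions picked to be generous, not recommended values.

```python
import time

# Hypothetical rule, matching the 15% gold discount used on this page.
def discount_for(tier, total):
    return total * 15 // 100 if tier == "gold" else 0


def test_functional_right_answer():
    # Functional: is the result correct?
    assert discount_for("gold", 100) == 15


def test_non_functional_time_budget():
    # Non-functional: is the result produced within an agreed budget?
    calls = 10_000
    budget_seconds = 1.0  # assumed budget, deliberately generous
    start = time.perf_counter()
    for _ in range(calls):
        discount_for("gold", 100)
    elapsed = time.perf_counter() - start
    assert elapsed < budget_seconds
```

The second test would still pass if the function returned the wrong number quickly, which is exactly why neither test can substitute for the other.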
The Same Behaviour at Three Altitudes
The types are complementary because each altitude fails for a different reason. Here is one pricing rule examined three ways:
# UNIT — the pricing rule in isolation (milliseconds; run thousands)
def test_gold_discount_rate():
    assert discount_for(tier="gold", total=100) == 15
# INTEGRATION — the rule wired to a real repository (seconds; run dozens)
def test_discount_uses_stored_tier(postgres):
    repo = CustomerRepository(postgres)
    repo.save(Customer(id=7, tier="gold"))
    engine = PricingEngine(repo)
    assert engine.discount_for_customer(7, total=100) == 15
# ACCEPTANCE — the behaviour a stakeholder asked for (minutes; run a few)
# Given a gold-tier customer
# When they check out a £100 basket
# Then the receipt shows a "£15 loyalty discount"
# (executable via pytest-bdd — see the TDD/BDD page)
// UNIT — the pricing rule in isolation (milliseconds; run thousands)
TEST_CASE("gold discount rate") {
    CHECK(discount_for("gold", 100) == 15);
}
// INTEGRATION — the rule wired to a real repository (seconds; run dozens)
TEST_CASE("discount uses stored tier") {
    CustomerRepository repo{postgres_url};
    repo.save(Customer{7, "gold"});
    PricingEngine engine{repo};
    CHECK(engine.discount_for_customer(7, 100) == 15);
}
// ACCEPTANCE — the behaviour a stakeholder asked for (minutes; run a few)
// Given a gold-tier customer
// When they check out a £100 basket
// Then the receipt shows a "£15 loyalty discount"
// (executable via cucumber-cpp — see the TDD/BDD page)
// UNIT — the pricing rule in isolation (milliseconds; run thousands)
@Test
void goldDiscountRate() {
    assertEquals(15, discountFor("gold", 100));
}
// INTEGRATION — the rule wired to a real repository (seconds; run dozens)
@Test
void discountUsesStoredTier() {
    var repo = new CustomerRepository(postgresUrl);
    repo.save(new Customer(7, "gold"));
    var engine = new PricingEngine(repo);
    assertEquals(15, engine.discountForCustomer(7, 100));
}
// ACCEPTANCE — the behaviour a stakeholder asked for (minutes; run a few)
// Given a gold-tier customer
// When they check out a £100 basket
// Then the receipt shows a "£15 loyalty discount"
// (executable via Cucumber-JVM — see the TDD/BDD page)
// UNIT — the pricing rule in isolation (milliseconds; run thousands)
[Fact]
public void GoldDiscountRate() =>
    Assert.Equal(15, DiscountFor(tier: "gold", total: 100));
// INTEGRATION — the rule wired to a real repository (seconds; run dozens)
[Fact]
public void DiscountUsesStoredTier()
{
    var repo = new CustomerRepository(PostgresUrl);
    repo.Save(new Customer(7, "gold"));
    var engine = new PricingEngine(repo);
    Assert.Equal(15, engine.DiscountForCustomer(7, total: 100));
}
// ACCEPTANCE — the behaviour a stakeholder asked for (minutes; run a few)
// Given a gold-tier customer
// When they check out a £100 basket
// Then the receipt shows a "£15 loyalty discount"
// (executable via SpecFlow — see the TDD/BDD page)
# UNIT — the pricing rule in isolation (milliseconds; run thousands)
RSpec.describe "gold discount rate" do
  it "is 15% of the total" do
    expect(discount_for(tier: "gold", total: 100)).to eq(15)
  end
end
# INTEGRATION — the rule wired to a real repository (seconds; run dozens)
RSpec.describe PricingEngine do
  it "uses the customer's stored tier" do
    repo = CustomerRepository.new(postgres_url)
    repo.save(Customer.new(id: 7, tier: "gold"))
    engine = PricingEngine.new(repo)
    expect(engine.discount_for_customer(7, total: 100)).to eq(15)
  end
end
# ACCEPTANCE — the behaviour a stakeholder asked for (minutes; run a few)
# Given a gold-tier customer
# When they check out a £100 basket
# Then the receipt shows a "£15 loyalty discount"
# (executable via Cucumber — see the TDD/BDD page)
If the unit test fails, the rule is wrong. If only the integration test fails, the wiring is wrong (the tier isn't being read from the database). If only the acceptance test fails, everyone built what nobody wanted. Deleting any layer doesn't remove the failures — it removes your ability to localise them.
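The snippets above depend on classes that are not shown. As a self-contained sketch of the same localisation argument, the version below swaps PostgreSQL for an in-memory SQLite database from Python's standard library so that it runs anywhere. The table layout, the `tier_of` method and the simplified `save` signature are assumptions made for this illustration.

```python
import sqlite3

# Hypothetical schema and classes; SQLite stands in for the real database.


def discount_for(tier, total):
    """The pricing rule on its own: gold customers get 15%."""
    return total * 15 // 100 if tier == "gold" else 0


class CustomerRepository:
    def __init__(self, conn):
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id INTEGER PRIMARY KEY, tier TEXT NOT NULL)"
        )

    def save(self, customer_id, tier):
        self._conn.execute(
            "INSERT OR REPLACE INTO customers (id, tier) VALUES (?, ?)",
            (customer_id, tier),
        )

    def tier_of(self, customer_id):
        row = self._conn.execute(
            "SELECT tier FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


class PricingEngine:
    def __init__(self, repo):
        self._repo = repo

    def discount_for_customer(self, customer_id, total):
        return discount_for(self._repo.tier_of(customer_id), total)


def test_unit_rule():
    # Fails only if the rule itself is wrong.
    assert discount_for("gold", 100) == 15


def test_integration_reads_stored_tier():
    # Fails if the rule is wrong OR the tier is not read back correctly.
    conn = sqlite3.connect(":memory:")
    repo = CustomerRepository(conn)
    repo.save(7, "gold")
    engine = PricingEngine(repo)
    assert engine.discount_for_customer(7, total=100) == 15
    conn.close()
```

If `tier_of` selected the wrong column, `test_unit_rule` would stay green while `test_integration_reads_stored_tier` went red, and that pattern is what points at the wiring and not the rule.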
The Test Pyramid (and Its Critics)
The pyramid's advice is economic, not moral: put most tests where they are cheapest to write, fastest to run and easiest to diagnose [1]. The classic anti-pattern is the inverted "ice-cream cone" — hundreds of slow, flaky end-to-end tests over a hollow unit layer. But the pyramid is a heuristic, not scripture: teams whose risk lives in integration (heavy I/O, thin logic) legitimately fatten the middle layer [2].
A Question-Driven Field Guide
| The question you're asking | Reach for |
|---|---|
| Is this logic correct for the cases I thought of? | Unit tests (example-based) |
| Is it correct for the cases I didn't think of? | Property-based tests (see TDD & BDD) |
| Did I exercise all the code's decision points? | Code path analysis and coverage criteria |
| Do my components actually talk to each other? | Integration tests (real database, real queue) |
| Will the other team's service still accept my messages? | Contract tests (Pact) |
| Does the whole thing work for a user? | A few end-to-end journeys |
| Did we build what the stakeholder meant? | Acceptance tests in their language (BDD) |
| Would my tests notice if the code were wrong? | Mutation testing |
| Is testing happening at the right project stage — including ethics? | The V model, with its third V for Values |
Read the table top to bottom and a shape emerges: the types form a pipeline of doubt. Each one removes a category of uncertainty the others cannot see, and the last two audit the process itself — mutation testing checks the tests, the V model checks that the right conversations happened at the right time.
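The idea behind the mutation-testing row can be sketched by hand: introduce a deliberate bug (a mutant) and see whether the suite notices. Real tools generate and run mutants automatically; the function, the mutant and both suites below are hypothetical and written out manually for illustration.

```python
# Hand-rolled sketch of mutation testing; all names are hypothetical.


def discount_for(tier, total):
    """Original rule: gold customers get 15%, everyone else gets nothing."""
    return total * 15 // 100 if tier == "gold" else 0


def mutant_discount_for(tier, total):
    """Mutant: the condition `tier == "gold"` has been replaced by True."""
    return total * 15 // 100


def weak_suite(fn):
    """Checks only the gold case."""
    return fn("gold", 100) == 15


def strong_suite(fn):
    """Checks the gold case and a non-gold case."""
    return fn("gold", 100) == 15 and fn("standard", 100) == 0


def mutant_killed(suite):
    """A mutant is killed when the suite passes on the original and fails on the mutant."""
    return suite(discount_for) and not suite(mutant_discount_for)


print(mutant_killed(weak_suite))    # False: the mutant survives
print(mutant_killed(strong_suite))  # True: the mutant is killed
```

A surviving mutant marks a behaviour that no test pins down; here the weak suite never checks that non-gold customers are excluded.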
References
[1] Fowler, M. (2012). "TestPyramid." martinfowler.com. https://martinfowler.com/bliki/TestPyramid.html
[2] Vocke, H. (2018). "The Practical Test Pyramid." martinfowler.com. https://martinfowler.com/articles/practical-test-pyramid.html
[3] ISTQB. Standard Glossary of Terms Used in Software Testing. https://glossary.istqb.org/