Types of Testing (and How They Fit Together)

Testing Fundamentals introduces the individual techniques; this page is the map. Conversations about testing often go wrong because "unit test", "black-box test" and "performance test" sound like items on one list when they actually sit on different axes: scope, knowledge, and the question being asked. Untangle the axes and the types stop competing and start cooperating.

Three Axes That Get Muddled

Axis 1 — Scope: how much of the system is under test?

Level	Under test	Replaced by test doubles	Typical speed
Unit	One behaviour (function, class)	Everything else	Milliseconds
Integration	A few components wired together (code + database, code + queue)	Remote third parties	Seconds
System / end-to-end	The deployed application	As little as possible	Minutes
Acceptance	A stakeholder-visible behaviour, phrased in their language	Varies	Varies

Axis 2 — Knowledge: what can the test see? Black-box tests know only the specification; white-box tests exploit the code's structure (see Code Path Analysis); grey-box mixes the two. Any scope can be tested at any knowledge level — a black-box unit test is perfectly normal.
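To make the knowledge axis concrete, here is a minimal sketch of two unit tests at the same scope but different knowledge levels. The `discount_for` implementation is hypothetical (integer pounds, 15% for gold, nothing otherwise); it is only an assumption chosen to match the pricing rule used later on this page.

```python
GOLD_RATE_PERCENT = 15


def discount_for(tier: str, total: int) -> int:
    """Hypothetical pricing rule: gold customers get 15% off, everyone else nothing."""
    if tier == "gold":
        return total * GOLD_RATE_PERCENT // 100
    return 0


def test_black_box_gold_discount():
    # Written from the specification alone: "gold customers get 15% off".
    assert discount_for("gold", 100) == 15


def test_white_box_branches_and_rounding():
    # Written after reading the code: one `if`, so one input per branch,
    # plus a case that exposes the floor division.
    assert discount_for("gold", 100) == 15    # true branch
    assert discount_for("silver", 100) == 0   # false branch
    assert discount_for("gold", 99) == 14     # 99 * 15 // 100 rounds down
```

Both tests are unit tests. The difference is where the test cases came from, not how much of the system they exercise.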

Axis 3 — The question being asked. Functional testing asks "does it do the right thing?"; non-functional testing asks "does it do the thing right?" — performance, load, security, accessibility, usability, recoverability. A load test is not a "bigger" unit test; it answers a different question entirely.
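As a rough illustration of the third axis, the sketch below asks two different questions of the same hypothetical `discount_for` function. The timing helper and the one-second budget are assumptions made for the example; real performance and load testing normally uses dedicated tooling and a realistic environment.

```python
import time


def discount_for(tier: str, total: int) -> int:
    """Hypothetical pricing rule: 15% off for gold customers."""
    return total * 15 // 100 if tier == "gold" else 0


def time_calls(fn, calls: int) -> float:
    """Return the wall-clock seconds taken to call `fn` `calls` times."""
    start = time.perf_counter()
    for _ in range(calls):
        fn()
    return time.perf_counter() - start


def test_functional_gold_discount():
    # Functional question: does it do the right thing?
    assert discount_for("gold", 100) == 15


def test_performance_budget():
    # Non-functional question: is it fast enough?
    # The budget is deliberately generous so the test does not flake.
    elapsed = time_calls(lambda: discount_for("gold", 100), calls=10_000)
    assert elapsed < 1.0
```

The second test can fail while the first passes, and vice versa, which is why neither is a substitute for the other.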

The Same Behaviour at Three Altitudes

The types are complementary because each altitude fails for a different reason. Here is one pricing rule examined three ways:

# UNIT — the pricing rule in isolation (milliseconds; run thousands)
def test_gold_discount_rate():
    assert discount_for(tier="gold", total=100) == 15

# INTEGRATION — the rule wired to a real repository (seconds; run dozens)
def test_discount_uses_stored_tier(postgres):
    repo = CustomerRepository(postgres)
    repo.save(Customer(id=7, tier="gold"))
    engine = PricingEngine(repo)

    assert engine.discount_for_customer(7, total=100) == 15

# ACCEPTANCE — the behaviour a stakeholder asked for (minutes; run a few)
# Given a gold-tier customer
# When they check out a £100 basket
# Then the receipt shows a "£15 loyalty discount"
# (executable via pytest-bdd — see the TDD/BDD page)

// UNIT — the pricing rule in isolation (milliseconds; run thousands)
TEST_CASE("gold discount rate") {
    CHECK(discount_for("gold", 100) == 15);
}

// INTEGRATION — the rule wired to a real repository (seconds; run dozens)
TEST_CASE("discount uses stored tier") {
    CustomerRepository repo{postgres_url};
    repo.save(Customer{7, "gold"});
    PricingEngine engine{repo};

    CHECK(engine.discount_for_customer(7, 100) == 15);
}

// ACCEPTANCE — the behaviour a stakeholder asked for (minutes; run a few)
// Given a gold-tier customer
// When they check out a £100 basket
// Then the receipt shows a "£15 loyalty discount"
// (executable via cucumber-cpp — see the TDD/BDD page)

// UNIT — the pricing rule in isolation (milliseconds; run thousands)
@Test
void goldDiscountRate() {
    assertEquals(15, discountFor("gold", 100));
}

// INTEGRATION — the rule wired to a real repository (seconds; run dozens)
@Test
void discountUsesStoredTier() {
    var repo = new CustomerRepository(postgresUrl);
    repo.save(new Customer(7, "gold"));
    var engine = new PricingEngine(repo);

    assertEquals(15, engine.discountForCustomer(7, 100));
}

// ACCEPTANCE — the behaviour a stakeholder asked for (minutes; run a few)
// Given a gold-tier customer
// When they check out a £100 basket
// Then the receipt shows a "£15 loyalty discount"
// (executable via Cucumber-JVM — see the TDD/BDD page)

// UNIT — the pricing rule in isolation (milliseconds; run thousands)
[Fact]
public void GoldDiscountRate() =>
    Assert.Equal(15, DiscountFor(tier: "gold", total: 100));

// INTEGRATION — the rule wired to a real repository (seconds; run dozens)
[Fact]
public void DiscountUsesStoredTier()
{
    var repo = new CustomerRepository(PostgresUrl);
    repo.Save(new Customer(7, "gold"));
    var engine = new PricingEngine(repo);

    Assert.Equal(15, engine.DiscountForCustomer(7, total: 100));
}

// ACCEPTANCE — the behaviour a stakeholder asked for (minutes; run a few)
// Given a gold-tier customer
// When they check out a £100 basket
// Then the receipt shows a "£15 loyalty discount"
// (executable via SpecFlow — see the TDD/BDD page)

# UNIT — the pricing rule in isolation (milliseconds; run thousands)
RSpec.describe "gold discount rate" do
  it "is 15% of the total" do
    expect(discount_for(tier: "gold", total: 100)).to eq(15)
  end
end

# INTEGRATION — the rule wired to a real repository (seconds; run dozens)
RSpec.describe PricingEngine do
  it "uses the customer's stored tier" do
    repo = CustomerRepository.new(postgres_url)
    repo.save(Customer.new(id: 7, tier: "gold"))
    engine = PricingEngine.new(repo)

    expect(engine.discount_for_customer(7, total: 100)).to eq(15)
  end
end

# ACCEPTANCE — the behaviour a stakeholder asked for (minutes; run a few)
# Given a gold-tier customer
# When they check out a £100 basket
# Then the receipt shows a "£15 loyalty discount"
# (executable via Cucumber — see the TDD/BDD page)

If the unit test fails, the rule is wrong. If only the integration test fails, the wiring is wrong (the tier isn't being read from the database). If only the acceptance test fails, everyone built what nobody wanted. Deleting any layer doesn't remove the failures — it removes your ability to localise them.
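The localisation argument can be tried out in a self-contained sketch. Everything here is an assumption made for illustration: an in-memory SQLite database stands in for Postgres, and `Customer`, `CustomerRepository`, `PricingEngine` and `MisWiredRepository` are hypothetical minimal versions of the classes named in the tests above.

```python
import sqlite3
from dataclasses import dataclass


def discount_for(tier, total):
    """The pricing rule on its own: 15% off for gold customers."""
    return total * 15 // 100 if tier == "gold" else 0


@dataclass
class Customer:
    id: int
    tier: str


class CustomerRepository:
    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id INTEGER PRIMARY KEY, tier TEXT NOT NULL)"
        )

    def save(self, customer):
        self.conn.execute(
            "INSERT OR REPLACE INTO customers (id, tier) VALUES (?, ?)",
            (customer.id, customer.tier),
        )

    def tier_of(self, customer_id):
        row = self.conn.execute(
            "SELECT tier FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


class MisWiredRepository(CustomerRepository):
    """Simulates a wiring bug: the stored tier is never read back."""

    def tier_of(self, customer_id):
        return None


class PricingEngine:
    def __init__(self, repo):
        self.repo = repo

    def discount_for_customer(self, customer_id, total):
        return discount_for(self.repo.tier_of(customer_id), total)


def test_gold_discount_rate():  # unit
    assert discount_for("gold", 100) == 15


def test_discount_uses_stored_tier(repo_class=CustomerRepository):  # integration
    repo = repo_class(sqlite3.connect(":memory:"))
    repo.save(Customer(id=7, tier="gold"))
    assert PricingEngine(repo).discount_for_customer(7, total=100) == 15
```

Running `test_discount_uses_stored_tier(MisWiredRepository)` raises an `AssertionError` while `test_gold_discount_rate()` still passes, which points at the wiring rather than the rule.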

The Test Pyramid (and Its Critics)

graph TD
    A["End-to-end / UI — few, slow, high confidence"] --> B["Integration & contract — some"]
    B --> C["Unit & property — many, fast, precise"]
    style A fill:#FF7B2A,color:#fff
    style B fill:#FFC857
    style C fill:#2ECC71,color:#fff

The pyramid's advice is economic, not moral: put most tests where they are cheapest to write, fastest to run and easiest to diagnose [1]. The classic anti-pattern is the inverted "ice-cream cone" — hundreds of slow, flaky end-to-end tests over a hollow unit layer. But the pyramid is a heuristic, not scripture: teams whose risk lives in integration (heavy I/O, thin logic) legitimately fatten the middle layer [2].

A Question-Driven Field Guide

The question you're asking	Reach for
Is this logic correct for the cases I thought of?	Unit tests (example-based)
Is it correct for the cases I didn't think of?	Property-based tests (see TDD & BDD)
Did I exercise all the code's decision points?	Code path analysis and coverage criteria
Do my components actually talk to each other?	Integration tests (real database, real queue)
Will the other team's service still accept my messages?	Contract tests (Pact)
Does the whole thing work for a user?	A few end-to-end journeys
Did we build what the stakeholder meant?	Acceptance tests in their language (BDD)
Would my tests notice if the code were wrong?	Mutation testing
Is testing happening at the right project stage — including ethics?	The V model, with its third V for Values
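The first two rows of the table can be contrasted in a short sketch. This is a hand-rolled property check using only the standard library, with a hypothetical `discount_for` and made-up tier names; a dedicated library such as Hypothesis would add input shrinking and smarter generation.

```python
import random


def discount_for(tier: str, total: int) -> int:
    """Hypothetical pricing rule: 15% off for gold customers."""
    return total * 15 // 100 if tier == "gold" else 0


def test_example_based():
    # Covers the one case the author thought of.
    assert discount_for("gold", 100) == 15


def test_property_based(trials: int = 1000, seed: int = 0):
    # Covers many generated cases by checking properties that must always hold.
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(trials):
        tier = rng.choice(["gold", "silver", "bronze", ""])
        total = rng.randint(0, 1_000_000)
        discount = discount_for(tier, total)

        # Property 1: a discount is never negative and never exceeds the total.
        assert 0 <= discount <= total, (tier, total, discount)
        # Property 2: a larger basket never earns a smaller discount.
        assert discount_for(tier, total + 1) >= discount, (tier, total)
```

The properties say nothing about 15% specifically, so they complement the example-based test rather than replace it.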

Read the table top to bottom and a shape emerges: the types form a pipeline of doubt. Each one removes a category of uncertainty the others cannot see, and the last two audit the process itself — mutation testing checks the tests, the V model checks that the right conversations happened at the right time.
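What "checking the tests" means can be shown with a hand-made mutant. This is only a sketch of the idea: the functions and the two suites are hypothetical, and real mutation-testing tools (for example mutmut for Python or PIT for Java) generate and run the mutants automatically.

```python
def discount_for(tier, total):
    """Original rule: 15% off for gold customers."""
    return total * 15 // 100 if tier == "gold" else 0


def mutant_discount_for(tier, total):
    """Hand-made mutant: `==` flipped to `!=`."""
    return total * 15 // 100 if tier != "gold" else 0


def weak_suite(fn) -> bool:
    # Only checks that the result is non-negative.
    return fn("gold", 100) >= 0


def strong_suite(fn) -> bool:
    # Pins down the expected value for both branches.
    return fn("gold", 100) == 15 and fn("silver", 100) == 0


def kills(suite, mutant) -> bool:
    """A suite kills a mutant if it passes on the original and fails on the mutant."""
    return suite(discount_for) and not suite(mutant)


weak_result = kills(weak_suite, mutant_discount_for)      # False: mutant survives
strong_result = kills(strong_suite, mutant_discount_for)  # True: mutant is killed
```

A surviving mutant means the code could be wrong in that way without any test noticing.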

References

[1] Fowler, M. (2012). "TestPyramid." martinfowler.com. https://martinfowler.com/bliki/TestPyramid.html
[2] Vocke, H. (2018). "The Practical Test Pyramid." martinfowler.com. https://martinfowler.com/articles/practical-test-pyramid.html
[3] ISTQB. Standard Glossary of Terms Used in Software Testing. https://glossary.istqb.org/