Data Structures as Behavioural Contracts
Students meet ArrayList, HashMap and TreeSet as a menu of options and pick whichever appeared in the last example they read. That framing misses what a collections library actually is: a set of behavioural contracts (interfaces) with multiple implementations, each occupying a deliberate point in an algorithmic trade-off space. Choose by contract and cost, not by habit.
The Interface Contract
List<String> list = new ArrayList<>(); // why not ArrayList<String> list = ...?
The left-hand side is a promise of behaviour: ordered, indexable, duplicates allowed. The right-hand side is one way of keeping that promise. Declaring against the interface means every caller depends only on the promise — so when profiling later shows you need a LinkedList, or a test needs an immutable List.of(...), nothing but one line changes. The collections framework is the standard library's masterclass in the dependency-inversion habit from SOLID:
| Contract | The promise | Common implementations |
|---|---|---|
List<T> | Ordered, positional access, duplicates | ArrayList, LinkedList |
Set<T> | No duplicates (per equals) | HashSet, LinkedHashSet, TreeSet |
Map<K,V> | Key → value, unique keys | HashMap, TreeMap, ConcurrentHashMap |
Deque<T> | Efficient both-ends access | ArrayDeque, LinkedList |
Big-O as a Design Metric
Complexity notation belongs in coding decisions, not just maths lectures — and the mechanism is always memory layout. An array is one contiguous block: index \(i\) is at address \(\text{base} + i \times \text{size}\), one multiplication away — \(O(1)\) — and iteration streams through CPU cache beautifully. A linked structure is nodes scattered across the heap: reaching element \(i\) means walking \(i\) pointers — \(O(n)\) — and each hop is a probable cache miss.
| Operation | ArrayList | LinkedList | HashMap | TreeMap |
|---|---|---|---|---|
| Access by index / key | \(O(1)\) | \(O(n)\) | \(O(1)\) average | \(O(\log n)\) |
| Search by value | \(O(n)\) | \(O(n)\) | — | — |
| Insert / delete at end | \(O(1)\) amortised | \(O(1)\) | \(O(1)\) average | \(O(\log n)\) |
| Insert / delete in middle | \(O(n)\) (shift) | \(O(1)\) once positioned (+ \(O(n)\) to position) | — | — |
| Ordered iteration | insertion order | insertion order | no order | sorted, \(O(n)\) |
Note the honest reading of LinkedList's famous \(O(1)\) insertion: it applies after you have walked to the position. For most workloads ArrayList wins even at "linked-list-shaped" tasks, because contiguous memory and the cache dominate constant factors. Big-O chooses the shape; the memory hierarchy chooses among close calls — measure when it matters.
The Mechanics of Hashing
A HashMap promises \(O(1)\) lookup through a beautiful trick: hashCode() maps a key to a bucket index, and only that bucket's (tiny) contents are compared with equals(). The trick rests on a contract between two methods you may not have known were related:
- If
a.equals(b)thena.hashCode() == b.hashCode()— always. - Unequal objects should spread across hash values (quality, not correctness).
- An object's hash must not change while it sits in a hash-based collection.
// The classic disaster: equals overridden, hashCode forgotten
public class Ticket {
private final String id;
@Override public boolean equals(Object o) {
return o instanceof Ticket t && t.id.equals(id);
}
// no hashCode(): two "equal" Tickets land in DIFFERENT buckets
}
Set<Ticket> seen = new HashSet<>();
seen.add(new Ticket("T-1"));
seen.contains(new Ticket("T-1")); // false! The set "loses" the ticket.
Break rule 1 and hash collections silently malfunction — duplicates accumulate, lookups miss. (A record generates both methods consistently; another argument for records.) Break rule 2 — many keys, few hash values — and every key lands in the same bucket: the bucket's contents degrade into a linear scan and your \(O(1)\) lookup quietly becomes an \(O(n)\) crawl (modern Java partially rescues this by treeifying large buckets to \(O(\log n)\), but the fix for bad hashing is better hashing). Rule 3 is the sting in the tail: mutate a field that feeds hashCode() while the object is inside a HashSet, and the object is now filed under a hash it no longer has — unfindable, unremovable. Immutable keys (see Advanced Java) make rule 3 unbreakable by construction.
Choosing in Practice
- Start with the contract the caller needs (ordered? unique? keyed? sorted?), then pick the cheapest implementation that keeps it.
- Default to
ArrayList,HashMap,HashSet; move toTree*only when you need sorted iteration or range queries; reach forArrayDequefor stacks/queues. - Let Big-O veto, not elect: it rules out catastrophic choices (
containson a hugeListin a loop); among survivors, the cache and the profiler decide.