Reverse-Engineering Dawkins' Biomorphs

How we went from guesswork to verified fidelity — using algorithm auditing, binary archaeology, and genetic algorithm search.

The Insect: gene 7 = 3 (old range) vs gene 7 = 8 (real Dawkins value)

The Problem

Richard Dawkins' 1988 paper The Evolution of Evolvability contains roughly 50 biomorph figures — insects, ferns, trees, letters of the alphabet, echinoderms. These are the canonical examples that made the concept famous. But the paper never lists their genotypes.

When we built Biomorph Builder — an interactive reimplementation of Dawkins' system — we wanted to reproduce these exact figures. Our first attempt: stare at the paper, guess the gene values, tweak until it looks "close enough." The results were recognizable but unconvincing. The proportions were off. The branching angles were wrong. We were eyeballing 9-dimensional space.

This is the story of how we went from guesswork to verified fidelity.

Step 1: Verify the Algorithm

Before blaming the genes, we verified the algorithm. Dawkins published his Tree and PlugIn procedures in Pascal on page 207 of the paper:

procedure PlugIn(ThisGenotype: Genotype);
begin
  order := gene[9];
  dx[3] := gene[1]; dx[4] := gene[2]; dx[5] := gene[3];
  dx[1] := -dx[3]; dx[0] := -dx[4]; dx[2] := 0; dx[6] := 0; dx[7] := -dx[5];
  dy[2] := gene[4]; dy[3] := gene[5]; dy[4] := gene[6]; dy[5] := gene[7]; dy[6] := gene[8];
  dy[0] := dy[4]; dy[1] := dy[3]; dy[7] := dy[5];
end;

We compared this line-by-line against our JavaScript defineVectors function. The result: exact match. Our implementation uses array indices offset by +2 from Dawkins' (our indices 2–9 vs Dawkins' 0–7), but this is compensated for by starting the recursion at direction index 4 instead of Dawkins' index 2. Every direction vector is identical.
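For readers who want to follow the comparison, here is a minimal JavaScript transcription of the Pascal above, using Dawkins' own 0–7 direction indices rather than our shifted ones (the function and variable names here are illustrative, not our production code):

```javascript
// Transcription of Dawkins' PlugIn procedure (0-based direction indices).
// `genes` is a 9-element array: genes[0..7] shape the 8 compass directions,
// genes[8] is the recursion depth ("order" in the Pascal).
function defineVectors(genes) {
  const dx = new Array(8).fill(0);
  const dy = new Array(8).fill(0);

  // Horizontal deltas: three genes, mirrored for bilateral symmetry.
  dx[3] = genes[0]; dx[4] = genes[1]; dx[5] = genes[2];
  dx[1] = -dx[3];   dx[0] = -dx[4];  dx[7] = -dx[5];
  dx[2] = 0;        dx[6] = 0;       // straight up / straight down

  // Vertical deltas: five genes, three of them mirrored.
  dy[2] = genes[3]; dy[3] = genes[4]; dy[4] = genes[5];
  dy[5] = genes[6]; dy[6] = genes[7];
  dy[0] = dy[4];    dy[1] = dy[3];   dy[7] = dy[5];

  return { dx, dy, order: genes[8] };
}
```

Feeding in the Insect's effective genotype (shown later in this post) reproduces its eight direction vectors exactly.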

The recursion logic (drawTree) also matches: same binary branching, same depth decrement, same direction accumulation. The algorithm isn't the problem.

Step 2: Discover the Real Gene Range

This was the breakthrough. Our initial implementation used gene ranges of [-3, 3] for genes 1–8 — a reasonable guess based on the paper's description and the fact that small values produce recognizable biomorphs. But something was wrong: Dawkins' famous Insect biomorph has gene 7 = 8. His Chess piece has gene 7 = 6. These are literally impossible in a [-3, 3] range.

The Trickle Factor

The key to understanding Dawkins' gene values is a variable called trickle that appears in the original Blind Watchmaker source code but is never mentioned in the paper. The stored gene values are scaled-up integers: the program divides genes 1–8 by trickle to get the effective values it actually draws with. With trickle = 10, a stored value of 80 is an effective 8.

We expanded our gene ranges from [-3, 3] to [-9, 9], which covers >90% of Dawkins' original specimens. The visual difference was immediate and dramatic — the Insect's swept-back wings, previously invisible, suddenly appeared with full fidelity.
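As a one-line sketch of the scaling (the function name effectiveGenes is ours; the behavior matches the raw/effective Insect values shown later in this post, where trickle = 10 and gene 9, the depth, is used as-is):

```javascript
// Convert stored (raw) gene values to the effective values used for
// drawing: genes 1-8 are divided by trickle, gene 9 (depth) is untouched.
function effectiveGenes(rawGenes, trickle) {
  return rawGenes.map((g, i) => (i < 8 ? g / trickle : g));
}
```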

How we found the range
  1. Algorithm audit against Dawkins' published Pascal confirmed our code was correct
  2. Source code archaeology of the WatchmakerSuite repository (a preservation of Dawkins' original Macintosh program) revealed the trickle scaling factor
  3. Binary parsing of the zoo files provided ground truth effective gene values
  4. Statistical analysis of all 42 Exhibition zoo specimens showed effective ranges clustering in [-9, 9]

Step 3: Extract the Original Specimens

The WatchmakerSuite repository contains Dawkins' original saved biomorphs as binary files — the actual creatures he bred and curated in 1986–1988. These are the ground truth.

Binary File Format

Each biomorph is stored as a 40-byte "person" record in big-endian format:

| Bytes | Field | Description |
|-------|-------|-------------|
| 0–17 | genes[1..9] | 16-bit signed integers (9 genes × 2 bytes) |
| 18–19 | segNo | Segment count |
| 20–21 | segDist | Segment spacing |
| 22–23 | completeness | Single or Double |
| 24–25 | spokes | NorthOnly, NSouth, or Radial |
| 26–39 | gradient genes | Per-gene gradient factors |
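A Node.js sketch of parsing one such record, following the byte layout above (the field names and the parsePerson function are our naming, not identifiers from Dawkins' program):

```javascript
// Parse one 40-byte big-endian "person" record from a zoo file.
// `buf` is a Node.js Buffer containing exactly one record.
function parsePerson(buf) {
  const genes = [];
  for (let i = 0; i < 9; i++) genes.push(buf.readInt16BE(i * 2)); // bytes 0-17
  return {
    genes,
    segNo: buf.readInt16BE(18),        // bytes 18-19: segment count
    segDist: buf.readInt16BE(20),      // bytes 20-21: segment spacing
    completeness: buf.readInt16BE(22), // bytes 22-23: Single or Double
    spokes: buf.readInt16BE(24),       // bytes 24-25: symmetry mode
    gradients: Array.from({ length: 7 }, // bytes 26-39: gradient factors
      (_, i) => buf.readInt16BE(26 + i * 2)),
  };
}
```

Note that the genes here are still raw values; they must be divided by trickle to get effective values.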

We parsed three collections of saved biomorphs, among them the 42-specimen Exhibition zoo.

All 74 extracted genotypes are stored in shared/dawkins-zoo.json and browsable in the Museum.

Example: The Insect
Raw genes:  [10, 10, -40, 10, -10, -20, 80, -40, 6]
Trickle:    10
Effective:  [ 1,  1,  -4,  1,  -1,  -2,  8,  -4, 6]

Gene 7 = 8 creates those iconic swept-back wings. Under our old [-3, 3] range, this was clamped to 3 — producing a completely different, much less dramatic shape.

Step 4: Automated Image Matching

For figures where we don't have the original binary data — or where we want to verify our matches against the paper's printed figures — we built an automated search using a genetic algorithm.

The Meta-Circularity

Dawkins' whole point was that cumulative selection — small random mutations filtered by a fitness criterion — can navigate enormous search spaces efficiently. We're using his method to find his specimens. The selection criterion changed from "looks interesting to a zoologist" to "looks like this specific picture" — but the evolutionary machinery is the same.

The Search Space

With the expanded [-9, 9] range: genes 1–8 have 19 values each, giving 19⁸ ≈ 16.9 billion combinations. Multiply by 8 depth values and there are ~135 billion possible mode-1 genotypes. Far too large for exhaustive search. The GA is essential.
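A minimal hill-climbing sketch in the spirit of Dawkins' breeding screen, under the assumption of single-gene, single-step mutations (the names randomGenotype, mutate, and evolve are illustrative; score stands in for any image-scoring function, higher = better):

```javascript
// Random genotype: genes 1-8 in [-9, 9], gene 9 (depth) in [1, 8].
function randomGenotype() {
  const g = Array.from({ length: 8 }, () => Math.floor(Math.random() * 19) - 9);
  g.push(1 + Math.floor(Math.random() * 8));
  return g;
}

// Mutate one gene by +/-1, clamped to its legal range.
function mutate(genes) {
  const child = genes.slice();
  const i = Math.floor(Math.random() * 9);
  const [lo, hi] = i < 8 ? [-9, 9] : [1, 8];
  child[i] = Math.max(lo, Math.min(hi, child[i] + (Math.random() < 0.5 ? -1 : 1)));
  return child;
}

// Breed litters of mutants from the current champion and keep the fittest,
// exactly as in the interactive breeding screen.
function evolve(score, generations = 20, litterSize = 300) {
  let best = randomGenotype();
  let bestScore = score(best);
  for (let gen = 0; gen < generations; gen++) {
    for (let i = 0; i < litterSize; i++) {
      const child = mutate(best);
      const s = score(child);
      if (s > bestScore) { best = child; bestScore = s; }
    }
  }
  return best;
}
```

With the default parameters this performs 20 × 300 = 6,000 evaluations.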

Scoring Functions: A Hard-Won Lesson

We tried three scoring approaches, each solving a different problem:

1. Intersection-over-Union (IoU): Score = |A ∩ B| / |A ∪ B|. Too strict — even a 1-pixel shift produces near-zero overlap. The fitness landscape is effectively flat, giving the GA no gradient to follow.

2. Chamfer Distance: For each black pixel in image A, compute the distance to the nearest black pixel in image B. Much better — captures "almost right" matches. Achieved a perfect 1.0 on clean ground truth. But scored only 0.065 on real PDF scans — blurry gray scan pixels don't match crisp 1px rendered lines.

3. Grayscale Blur + NCC: Both images converted to grayscale, inverted, blurred, then compared using normalized cross-correlation. Scored 0.795 on the same PDF scan where chamfer scored 0.065 — a 12× improvement. NCC is invariant to brightness scaling; the blur creates tolerance for thickness and position differences.
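The NCC step itself is standard; here is a sketch over two equal-length grayscale pixel arrays, assumed to be already inverted and blurred (the function name ncc is ours):

```javascript
// Normalized cross-correlation between two equal-length grayscale arrays.
// Returns a value in [-1, 1]; 1 means identical up to a linear brightness
// transform, which is what makes NCC robust to scan brightness.
function ncc(a, b) {
  const n = a.length;
  const meanA = a.reduce((s, v) => s + v, 0) / n;
  const meanB = b.reduce((s, v) => s + v, 0) / n;
  let num = 0, varA = 0, varB = 0;
  for (let i = 0; i < n; i++) {
    const da = a[i] - meanA, db = b[i] - meanB;
    num += da * db;   // covariance term
    varA += da * da;  // variance of a
    varB += db * db;  // variance of b
  }
  return num / Math.sqrt(varA * varB); // NaN if either image is flat
}
```

Because means are subtracted and the result is normalized by the standard deviations, a uniformly brighter or darker copy of the same image still scores 1.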

The optimal scoring function depends on the noise characteristics of the data. A function that works perfectly for clean-on-clean comparison can completely fail when the reference is noisy. This mirrors a deep truth about evolution: the fitness landscape's shape matters more than the search algorithm.

You can try the search yourself in the Gene Search tool.

Step 5: Populate the Paper

With all three data sources — binary archaeology, GA search, and manual fitting — we replaced the guessed genotypes throughout the interactive paper:

| Figure | Description | Source | # Specimens |
|--------|-------------|--------|-------------|
| 4 | Basic tree | BasicTree preset (trickle=9) | 1 |
| 5 | Breeding screen | BasicTree as parent | 9 |
| 6 | Mode 1 portfolio | Exhibition zoo #1–15 | 10 |
| 10 | Segmented portfolio | Exhibition zoo #18–27 | 8 |
| 12 | Asymmetric segmented | Exhibition zoo #16–26 | 6 |
| 13 | Radial portfolio | Exhibition zoo #28–39 | 6 |
| 14 | Echinoderms | Exhibition zoo #28–42 | 6 |

The visual improvement was dramatic. The old guessed specimens (all constrained to [-3, 3]) looked generic and repetitive. The real Dawkins specimens use the full [-9, 9] range and show the extraordinary morphological diversity that made biomorphs famous.

What We Learned

  1. The gene range was the real problem. We spent significant effort on sophisticated image-matching algorithms, only to discover that the fundamental issue was much simpler: our gene ranges were too narrow. Check your assumptions before optimizing your algorithms.
  2. Source code archaeology beats reverse engineering. The GA search was elegant and meta-circular, but parsing the actual binary files from Dawkins' original program gave us ground truth in minutes. When the original data exists, go find it.
  3. Hidden scaling factors change everything. The trickle variable isn't mentioned in the paper, doesn't appear in the published Pascal code, and only exists in the full program source. Documentation debt from 1988 is real.
  4. The fitness function is everything — and context-dependent. IoU made the landscape flat. Chamfer distance worked for clean references but failed on scans. Grayscale NCC handled noise. The optimal scoring function depends on the data's noise characteristics.
  5. Genotypic degeneracy is real even in simple systems. The GA found a correct phenotype via a different genotype — negated horizontal genes producing a mirror-identical biomorph. Multiple genotypes map to identical phenotypes. A real biological parallel: codon degeneracy in DNA.
  6. Selection really is powerful. With correct scoring, the GA found a perfect match in 20 generations (~6,000 evaluations) out of 46 million possible genotypes. That's 0.013% of the search space. Dawkins' central argument demonstrated by the very tool we built to study his work.
  7. Reference quality sets the ceiling. A blurry 170×160 crop from a 1988 paper scan doesn't contain enough information to uniquely identify a genotype. Higher-confidence results need either better references, the original binary data, or human-in-the-loop refinement.
  8. Dawkins' algorithm is remarkably compact. The entire generative procedure is ~20 lines of code. Yet it produces enough morphological diversity that 42 curated specimens fill six distinct figure portfolios, each with its own visual character.
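The degeneracy in lesson 5 is easy to demonstrate: negating the three horizontal genes negates every dx while leaving dy untouched, so the drawn figure is a left-right mirror image, and a bilaterally symmetric biomorph is its own mirror image. A self-contained sketch (horizontalDeltas is an illustrative name; the index mapping is copied from Dawkins' PlugIn, 0-based):

```javascript
// Horizontal direction vectors from the three horizontal genes,
// per Dawkins' PlugIn (0-based direction indices).
function horizontalDeltas(genes) {
  const dx = new Array(8).fill(0);
  dx[3] = genes[0]; dx[4] = genes[1]; dx[5] = genes[2];
  dx[1] = -dx[3];   dx[0] = -dx[4];  dx[7] = -dx[5];
  return dx; // dx[2] and dx[6] stay 0 (straight up / straight down)
}

const insect   = [1, 1, -4, 1, -1, -2, 8, -4, 6];          // effective genes
const mirrored = insect.map((g, i) => (i < 3 ? -g : g));   // negate genes 1-3

const dxA = horizontalDeltas(insect);
const dxB = horizontalDeltas(mirrored);
// dxB[i] === -dxA[i] for every i: the mirrored genotype draws the
// same bilaterally symmetric phenotype.
```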

Fidelity Summary

| Component | Status | Notes |
|-----------|--------|-------|
| defineVectors | Exact match | Verified gene by gene against published Pascal |
| drawTree | Exact match | Same binary recursion, direction accumulation |
| Gene ranges (g1–g8) | [-9, 9] | Covers >90% of Exhibition zoo |
| Depth range (g9) | [1, 8] | Matches original |
| Bilateral symmetry | Correct | Built into vector definitions |
| Segmentation | Correct | Repeat loop with segment count and spacing |
| Gradients | Simplified | 2 gradient genes vs Dawkins' per-gene gradients |
| Radial symmetry | Approximate | 4-way only (Dawkins also 4-way) |