Install
openclaw skills install bookforge-characterization-test-writingWrite tests that pin down the actual current behavior of untested legacy code as a safety net for change. Use whenever a developer needs to create a regression safety net before modifying code — 'I don't know what tests to write', 'what should I test in legacy code', 'how do I write tests for code I didn't write', 'tests to preserve behavior', 'golden tests', 'snapshot tests for old code', 'characterization test', 'cover before modify'. Activates for 'legacy code testing', 'untested code', 'write tests before changing', 'regression safety net', 'behavior-preservation tests', 'pin down behavior'.
openclaw skills install bookforge-characterization-test-writingYou are about to change untested legacy code and need to know that your changes won't silently alter the system's existing behavior. You do not have specs or requirements documents you trust — or if you do, you know the code may have drifted from them. You need a behavioral safety net anchored to what the code actually does, not to what you hope it does.
This skill is Step 4 of the legacy-code-change-algorithm. Before you arrive here, change-effect-analysis should have told you where to write tests. This skill tells you how.
Do not use this skill to find bugs. Manual exploration is faster for bug discovery. Characterization tests are a change-detection mechanism, not a correctness oracle.
Before writing a single test, collect:
change-effect-analysis first.change-effect-analysis. This tells you which methods to prioritize.test-harness-entry-diagnostics before proceeding.Use the effect analysis output to prioritize. Feathers' three-level heuristic, in priority order:
Why this order matters: Starting too broad wastes effort characterizing code your change won't touch. Starting too narrow misses regression surface above the change point. The three levels give you the minimum safety net, nothing more.
Open or create a test file for the target class. Instantiate the class under test using the simplest construction path available.
// Example: minimal harness setup
void testInitialState() {
PageGenerator generator = new PageGenerator();
// harness is ready — proceed to Step 3
}
If construction requires dependencies that are difficult to create (database connections, external services, deeply coupled collaborators), this is a dependency problem — invoke test-harness-entry-diagnostics. Do not skip the harness step by writing untestable assertions.
Choose an expected value you are certain is wrong. Something obviously fake ("fred", -999, "PLACEHOLDER") works well. The point is not to guess the right answer — the point is to force the test runner to tell you the real answer.
// Deliberately wrong — "fred" is certainly not what generate() returns
void testGenerate_initialState() {
PageGenerator generator = new PageGenerator();
assertEquals("fred", generator.generate()); // We know this will fail
}
Why write an assertion you know is wrong? Because the test framework's failure message will contain the actual value. The failure IS the answer. This technique turns the test runner into a probe: you're not guessing the behavior, you're asking the code to show you.
Run the test. Do not fix it yet. Read the failure message carefully.
junit.framework.ComparisonFailure: expected:<fred> but was:<>
The but was: portion is the actual current behavior of the code under those conditions. Write it down.
Change the expected value to what the failure message reported. The test now documents one true fact about the system.
void testGenerate_initialState() {
PageGenerator generator = new PageGenerator();
assertEquals("", generator.generate()); // Documents: empty PageGenerator generates ""
}
Run the test again. It must pass. If it does not (e.g., the output is nondeterministic or environment-dependent), investigate before proceeding.
You will encounter behaviors during characterization that look like bugs. The code produces an output that seems clearly wrong — an off-by-one, an unexpected null, an exception where a value was expected.
Do not change the test expectation to the "correct" value. The test must document what the code actually does, not what it should do. Changing it to the intended behavior creates a test that fails immediately — a correctness test, not a characterization test.
Instead:
// SUSPECTED BUG: parse('') throws NullPointerException.
// Preserving actual behavior. See discovered-bugs.md entry #2.
@Test
public void testParse_emptyString_throwsNPE() {
assertThrows(NullPointerException.class, () -> parser.parse(""));
}
discovered-bugs.md (create this file in the test directory or working directory):
## Bug Candidate #2
- Method: CsvParser.parse(String)
- Input: empty string ""
- Actual behavior: throws NullPointerException
- Expected (likely correct): return empty list []
- Decision needed: fix before or after coverage is established?
- Risk if fixed now: unknown callers may depend on the exception
Why preserve the buggy behavior in the test? Because callers in legacy systems often depend on behaviors that look like bugs. A method that throws NPE on empty input may have callers that already catch that exception and route around it. Fixing the bug silently changes the callers' environment. Preserve, flag, and decide deliberately — after full coverage is in place and you can see the ripple effects.
Return to the target code. Ask: does what I've characterized so far cover every code path my change will touch?
Techniques for finding more cases:
if branch, every loop condition, every method call is a hint about what to exercise.Continue iterating (Steps 3–7) until you can answer yes to: "If my planned change silently alters behavior outside its intended scope, will at least one of these tests catch it?"
Review the full characterization test suite against the planned change:
You do not need complete coverage of the class. You need coverage sufficient for this change. Stop there.
| Input | Required | Description |
|---|---|---|
| Target class/method | Yes | The code you will change |
| Change description | Yes | What you plan to modify — defines coverage scope |
| Effect analysis output | Yes | From change-effect-analysis — test-placement plan or effect sketch |
| Test framework | Yes | Configured xUnit runner targeting the codebase |
| Output | Description |
|---|---|
| Characterization test suite | Source files containing passing tests that document actual behavior |
discovered-bugs.md | Log of flagged suspicious behaviors with context, preserved-but-suspicious test references, and decision notes |
"What the system DOES is more important than what it SHOULD do." In nearly every legacy system, the code has diverged from the original intent. Characterization tests document reality, not the requirements document. If you write tests based on what you think the code should do, you are writing bug-finding tests — a different and less useful activity for this context.
A deliberately-failing assertion is a diagnostic tool — the failure IS the answer. You are not guessing the expected value. You are asking the test runner to reveal it. The only investment is running the test twice: once to learn the actual value, once to confirm the updated assertion passes.
Discovered bugs go on paper, not into the corrected test.
Preserve actual behavior in the test. Flag the bug in discovered-bugs.md. Decide whether to fix it after full coverage is established and you can see downstream effects. This is not laziness — it is disciplined risk management in a system where callers may depend on buggy behavior.
Stop when coverage is sufficient for THIS change. The goal is not full system coverage. It is a safety net for a specific planned change. Overcharacterizing wastes effort on code paths your change will never affect. The effect analysis output defines the boundary.
Testing to detect change — not testing to show correctness. These are two different philosophies. Tests that show correctness assert the right answer. Tests that detect change assert the current answer and alert you when it shifts. Characterization tests use the second philosophy. This is intentional. The tests have no moral authority over the code — they just record what the pieces do, so you can see when something has moved.
You need to change PageGenerator.generate(). No tests exist. You don't know what it currently returns.
// Iteration 1: Probe with obviously-wrong value
void testGenerate_noData() {
PageGenerator generator = new PageGenerator();
assertEquals("fred", generator.generate());
}
// Failure: expected:<fred> but was:<>
// Update:
void testGenerate_noData() {
PageGenerator generator = new PageGenerator();
assertEquals("", generator.generate()); // Documents: returns "" with no data loaded
}
// Iteration 2: Feed data, probe again
void testGenerate_withBaseRow() {
PageGenerator generator = new PageGenerator();
generator.assoc(RowMappings.getRow(Page.BASE_ROW));
assertEquals("fred", generator.generate());
}
// Failure: expected:<fred> but was:<node><carry>1.1 vectrai</carry></node>
// Update:
void testGenerate_withBaseRow() {
PageGenerator generator = new PageGenerator();
generator.assoc(RowMappings.getRow(Page.BASE_ROW));
assertEquals("<node><carry>1.1 vectrai</carry></node>", generator.generate());
}
After two iterations, you have two tests that document exactly what PageGenerator.generate() produces under two conditions. Any change you make that accidentally alters either of these outputs will be caught immediately.
You are characterizing CsvParser.parse(String input). Your probe:
void testParse_emptyInput() {
CsvParser parser = new CsvParser();
assertEquals("fred", parser.parse("").toString());
}
// Failure: NullPointerException at CsvParser.parse(CsvParser.java:34)
The code throws NPE on empty input. This looks like a bug — most CSV parsers should return an empty list. What do you do?
// SUSPECTED BUG: throws NPE on empty input. See discovered-bugs.md #2.
void testParse_emptyInput_throwsNPE() {
assertThrows(NullPointerException.class, () -> new CsvParser().parse(""));
}
discovered-bugs.md:
## Bug Candidate #2 — CsvParser.parse("")
Actual: NullPointerException at CsvParser.java:34
Expected (likely): return Collections.emptyList()
Risk: 3 callers identified in OrderImporter, UserUploader, DataMigrationJob.
Decision: review callers before fixing. Mark for post-coverage fix.
You plan to extract FuelShare.addReading()'s inner if (lease.isMonthly()) block and move it to ZonedHawthorneLease.computeValue(). Before moving:
// Characterize the else branch (gallons >= CORP_MIN) — the path being moved
void testAddReading_gallonsAboveCorpMin_chargesAtRate() {
StandardLease lease = new StandardLease(Lease.MONTHLY);
FuelShare share = new FuelShare(lease);
share.addReading(FuelShare.CORP_MIN + 1, new Date());
assertEquals(expectedCost, share.getCost());
}
Choose inputs that force the else branch. Confirm the test exercises the conversion from priceForGallons(gallons) to totalPrice — use a debugger or sensing variable if uncertain. After extraction, run the same test against computeValue() directly to confirm the behavior survived the move.
references/legacy-code-fundamentals.md — core definitions: legacy code, seams, characterization testsreferences/test-harness-patterns.md — common patterns for getting legacy classes into test harnessesCC BY-SA 4.0 — BookForge Skills, derived from "Working Effectively with Legacy Code" by Michael C. Feathers. Skill interpretation, structure, and synthesis are original BookForge content.
Direct dependencies (run before this skill):
legacy-code-change-algorithm — this skill is Step 4 of that algorithmchange-effect-analysis — provides the effect sketch and test-placement plan that scopes this skill's workunit-test-quality-checker — validates the characterization tests you write here meet quality thresholdsInvoke if blocked:
test-harness-entry-diagnostics — when the target class cannot be instantiated in the test harness (Step 2 blocker)Next steps:
duplication-removal-via-extraction — apply after characterization tests are in place, when refactoring to remove duplication