Testing Master Manual - Single Source of Truth
This document is the authoritative guide for all testing, architecture rules, and stability loop workflows.
Table of Contents
- Executive Summary & Objectives
- The Architecture
- The Trace Schema
- The Workflow (The "Loop")
- Coding Standards (The Strategy Pattern)
- Control API & Configuration
- Monitoring & Debugging
Executive Summary & Objectives
Project background: An original contract clause is sent to the LLM, which reviews it and returns a revised clause. The tool then applies the differences between the original and revised clauses as tracked changes via the MS Word API.
Goal: Stability Against Non-Deterministic LLM Output
The primary goal is Stability Testing - ensuring the system handles non-deterministic LLM output consistently and correctly.
Key Objectives:
- Validate document restoration - Ensure document is restored to clean state before each test
- Verify original test clause - Confirm same original clause is passed to LLM each time
- Achieve consistent results - Get 5 consecutive passes where diff-ed output matches LLM output
- Confirm anchor behavior - Verify correct anchor finding (no failures, no fallbacks)
- Record & Replay - Capture exact Word API behavior in traces for offline debugging
Concept: Record & Replay (Not Simulation)
Instead of traditional mocks, we use Record & Replay:
- Record: WordAdapter records exact API inputs/outputs to trace-log.json during live Word execution
- Replay: ReplayWordAdapter replays these exact API calls offline to reproduce bugs deterministically
- Stability: This handles non-deterministic LLM output by capturing the exact Word behavior that led to failures
Why Record & Replay?
- LLM Output is Non-Deterministic: The same stability test may produce different LLM outputs, making it unreliable for verification
- Word Behavior is Unpredictable: MS Word has hidden characters, edge cases, and behaviors that are impossible to simulate accurately
- Exact Reproduction: Traces capture exact Word API behavior, allowing deterministic bug reproduction
- No Guessing: Don't simulate Word behavior - replay exact reality from traces
- Offline Debugging: No need for live Word instance - traces can be replayed in Jest tests
- Fast Feedback: ReplayWordAdapter tests run quickly without Word overhead
- Regression Prevention: Traces can be replayed after code changes to verify fixes don't break existing behavior
Critical Principle: You are not writing tests to simulate the issue. You are writing tests to host the recording of the issue. This is the only way to reliably fix logic bugs caused by Word's unpredictability.
Key Concept: Coding Agent Controls Loop
CRITICAL: When MS Word shows "waiting for trigger", it means:
- ✅ The Word machine has paused and is waiting
- ✅ The coding agent (assistant) controls when the next iteration starts
- ✅ The loop will NOT auto-continue - it waits for the coding agent to trigger it
- ✅ The coding agent will automatically analyze logs, create Jest tests, fix code, verify tests pass, and trigger the next iteration
No manual intervention is needed - the coding agent handles everything automatically after each iteration completes.
The Architecture
Components
The stability testing system consists of three components:
- Word Add-in (taskpane.js): Runs stability loop, records traces on failure, pushes logs to dev server
- Dev Server (webpack.config.cjs): Receives logs, stores trace-log.json artifacts
- Coding Agent: Analyzes traces, uses ReplayWordAdapter to reproduce bugs offline, fixes code, triggers next iteration
The Loop: Trace Generation on Failure
Normal Flow
- Word Add-in runs stability test iteration
- WordAdapter executes Word API calls (with tracing enabled)
- Test completes → If pass, continue to next iteration
- If failure → Save trace to trace-log.json and pause loop
Trace Generation
When a test failure occurs:
- WordAdapter has been recording all API calls to its internal trace array
- On failure, the trace is serialized and saved to logs/trace-log-{testRunNumber}.json
- Trace format follows strict JSON schema (see below)
- Loop pauses and waits for coding agent to analyze and fix
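A minimal sketch of what this failure path might look like inside the stability loop. The helper runIteration and the surrounding variables are illustrative assumptions, not the actual taskpane.js API; the endpoints and payload fields come from this manual.
async function runWithTracing(testRunNumber, originalText, expectedText) {
  const adapter = new WordAdapter();
  adapter.enableTracing(); // record every API call for this iteration
  try {
    await runIteration(adapter); // hypothetical: one stability test iteration
  } catch (error) {
    // On failure, serialize the recorded trace and ship it to the dev server
    await fetch('/api/trace-log', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        testRunNumber,
        originalText,
        expectedText,
        finalText: await adapter.getWholeDocumentText(),
        trace: adapter.getTrace(),
        timestamp: new Date().toISOString()
      })
    });
    await fetch('/api/e2e-loop/pause', { method: 'POST' }); // pause until the agent triggers
  }
}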
ReplayWordAdapter Modes
Mode A: Replay (Stability Debugging)
When loadTrace() is called:
- All API calls are compared against trace entries
- Method name must match exactly
- Arguments must match exactly (Range objects compared by start/end/text)
- Returns result from trace (not computed)
- Throws TraceDeviationError on any mismatch
Mode B: Simulation (Legacy Fallback - Not Recommended)
When loadTrace() is never called:
- Uses string manipulation to simulate Word API behavior
- Preserves existing unit tests that don't use traces
- ⚠️ WARNING: Simulation mode is unreliable - always prefer trace replay for Stability Loop failures
- Useful for general integration testing without specific traces (but trace replay is preferred)
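The two modes differ only in whether loadTrace() is called. A minimal sketch (the trace file path is illustrative):
import ReplayWordAdapter from './src/lib/replay-word-adapter.js';
import fs from 'fs';
// Mode A: Replay - every API call is checked against the recorded trace
const traceData = JSON.parse(fs.readFileSync('logs/trace-log-1.json', 'utf8'));
const replayAdapter = new ReplayWordAdapter(traceData.originalText);
replayAdapter.loadTrace(traceData.trace); // deviations now throw TraceDeviationError
// Mode B: Simulation (legacy) - no loadTrace(), results are computed by string manipulation
const simAdapter = new ReplayWordAdapter('Some document text');
// simAdapter only approximates Word behavior; prefer Mode A for Stability Loop failures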
File Structure
word-ai-redliner/
├── src/
│ ├── lib/
│ │ ├── word-adapter.js # Records traces during live execution
│ │ └── replay-word-adapter.js # Replays traces offline
│ └── taskpane/
│ └── taskpane.js # Stability loop (enables tracing, saves traces on failure)
├── tests/
│ └── tests_index.md # Dynamic inventory of all test files (Must update this when adding tests)
├── logs/
│ ├── trace-log-1.json # Trace from test run 1 (if failure)
│ ├── trace-log-2.json # Trace from test run 2 (if failure)
│ ├── e2e-test-logs.json # Human-readable test logs
│ └── fix-logs.json # Code fix logs
└── webpack.config.cjs # Dev server (stores trace files)
The Trace Schema
Trace File Format
Trace files saved to logs/trace-log-{testRunNumber}.json contain complete test context:
{
"testRunNumber": 5,
"testId": "test-doc-1234567890-abc",
"originalText": "Original document text...",
"expectedText": "Expected LLM output text...",
"finalText": "Final document text after operations...",
"trace": [
{
"timestamp": 1703123456789,
"method": "searchWithinRange",
"args": [
{
"start": 0,
"end": 100,
"text": "Sample document text..."
},
"search query",
{
"ignorePunct": false,
"ignoreSpace": false
}
],
"result": [
{
"start": 10,
"end": 25,
"text": "search query"
}
]
},
{
"timestamp": 1703123456790,
"method": "insertTextAtRange",
"args": [
{
"start": 10,
"end": 25,
"text": "search query"
},
"replacement text",
"replace"
],
"result": null
},
{
"timestamp": 1703123456791,
"method": "getWholeDocumentText",
"args": [],
"result": "Sample document with replacement text..."
},
{
"timestamp": 1703123456792,
"method": "applyOperation",
"args": [
{
"start": 0,
"end": 100,
"text": "Sample document text..."
},
{
"type": "REPLACE",
"origText": "old text",
"newText": "new text",
"beforeContext": "context before",
"afterContext": "context after"
}
],
"result": {
"success": true,
"strategy": "PrimaryAnchor"
}
}
],
"timestamp": "2025-11-20T07:19:39.047Z"
}
Trace File Schema
Top-Level Fields:
- testRunNumber: Test run number (used in filename: trace-log-{testRunNumber}.json)
- testId: Unique test identifier
- originalText: Original document text (for test setup - use with new ReplayWordAdapter(originalText))
- expectedText: Expected LLM output (for verification)
- finalText: Final document text after operations (for comparison)
- trace: Array of trace entries with exact API calls (use with adapter.loadTrace(trace))
- timestamp: When trace was saved (ISO format)
Trace Entry Schema (within trace array)
Each entry in the trace array records exact API inputs and outputs:
- timestamp: Unix timestamp in milliseconds
- method: Method name (e.g., "searchWithinRange", "insertTextAtRange")
- args: Array of serialized arguments
  - Range objects are serialized as {start, end, text}
  - Primitives (strings, numbers, booleans) are preserved as-is
  - Objects are JSON-serialized
- result: Serialized result
  - Range objects are serialized as {start, end, text}
  - Arrays of ranges are arrays of serialized ranges
  - Errors are recorded as {error: "message", code: "errorCode"}
  - null or undefined for void methods
  - For applyOperation: result includes {success: boolean, strategy: string} when an anchor is found
    - success: true if the operation succeeded, false if the anchor was not found
    - strategy: Name of the strategy that found the anchor (e.g., "PrimaryAnchor", "FuzzyScanNormalized")
    - Rationale: Logging which strategy succeeded enables observability and prevents "silent degradation"
    - Why critical? If _searchByFuzzyScan starts matching 90% of anchors, it indicates the Primary strategy is broken, even if tests are passing
Trace Recording in WordAdapter
Enabling Tracing
import WordAdapter from './src/lib/word-adapter.js';
import fs from 'fs';
const wordAdapter = new WordAdapter();
// Enable tracing before test run
wordAdapter.enableTracing();
// Run test operations...
await wordAdapter.searchWithinRange(range, query, options);
await wordAdapter.insertTextAtRange(range, text, 'replace');
// On failure, save trace
const trace = wordAdapter.getTrace();
fs.writeFileSync('logs/trace-log-5.json', JSON.stringify(trace, null, 2));
Automatic Recording
All public methods in WordAdapter automatically record traces when tracingEnabled is true:
- searchWithinRange()
- insertTextAtRange()
- deleteRange()
- getRangeText()
- getWholeDocumentText()
- applyOperation()
- And all other public methods
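A sketch of the recording pattern these methods follow. The class and internal names here are assumptions for illustration, not the actual WordAdapter internals:
// Hypothetical sketch of automatic trace recording
class TracingAdapter {
  constructor() {
    this.tracingEnabled = false;
    this.trace = [];
  }
  enableTracing() { this.tracingEnabled = true; }
  getTrace() { return this.trace; }
  _serializeRange(range) {
    // Ranges are reduced to plain {start, end, text} objects in the trace
    return { start: range.start, end: range.end, text: range.text };
  }
  _record(method, args, result) {
    if (this.tracingEnabled) {
      this.trace.push({ timestamp: Date.now(), method, args, result });
    }
  }
  async getRangeText(range) {
    const result = range.text; // stand-in for the real Word API call
    this._record('getRangeText', [this._serializeRange(range)], result);
    return result;
  }
}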
Trace Deviation Errors
When ReplayWordAdapter detects a mismatch:
TraceDeviationError: Trace deviation at index 5: searchWithinRange() called with unexpected arguments
methodName: 'searchWithinRange'
expected: { start: 0, end: 100, text: '...' }
actual: { start: 0, end: 150, text: '...' }
traceIndex: 5
This indicates:
- Expected: What was recorded in the trace
- Actual: What the code is trying to call now
- Root cause: Code behavior changed, or trace is from different scenario
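A hedged Jest sketch for inspecting a deviation; the error field access (methodName, expected, actual, traceIndex) is assumed from the error shape shown above:
import ReplayWordAdapter from './src/lib/replay-word-adapter.js';
import fs from 'fs';
test('replay surfaces trace deviations with context', async () => {
  const traceData = JSON.parse(fs.readFileSync('logs/trace-log-5.json', 'utf8'));
  const adapter = new ReplayWordAdapter(traceData.originalText);
  adapter.loadTrace(traceData.trace);
  try {
    // A call that diverges from the recorded entry throws TraceDeviationError
    await adapter.searchWithinRange({ start: 0, end: 150, text: '...' }, 'query', {});
  } catch (error) {
    console.log(error.methodName, 'deviated at trace index', error.traceIndex);
    console.log('expected:', error.expected);
    console.log('actual:', error.actual);
  }
});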
The Workflow (The "Loop")
Step 1: Failure in Word (Trace Generation)
When a stability test fails:
- WordAdapter has been recording all API calls
- Trace is saved to logs/trace-log-{testRunNumber}.json
- Loop pauses and waits for coding agent
Step 2: Pause & Agent Takeover
The loop MUST pause after each iteration and will NOT auto-continue on its own.
The loop:
- Calls /api/e2e-loop/pause after each iteration
- Polls /api/e2e-loop/status every 2 seconds (see the sketch after this list)
- Waits indefinitely until coding agent triggers next iteration
- Will NOT auto-continue if server is unreachable
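A minimal sketch of this pause-and-poll behavior, using the documented endpoints and status payload (function name illustrative):
async function pauseAndWaitForTrigger() {
  await fetch('/api/e2e-loop/pause', { method: 'POST' });
  while (true) {
    await new Promise(resolve => setTimeout(resolve, 2000)); // poll every 2 seconds
    try {
      const response = await fetch('/api/e2e-loop/status');
      const { canProceed } = await response.json(); // { canProceed, waitingForTrigger, lastIteration }
      if (canProceed) return;
    } catch (e) {
      // Server unreachable: keep waiting - the loop never auto-continues
    }
  }
}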
Step 3: Offline Reproduction (Loading Traces into ReplayWordAdapter)
The coding agent uses ReplayWordAdapter to reproduce the bug offline:
import ReplayWordAdapter from './src/lib/replay-word-adapter.js';
import { applyAmendment } from './src/lib/diff-orchestrator.js';
import fs from 'fs';
// Load trace (includes originalText, expectedText, and trace array)
const traceData = JSON.parse(fs.readFileSync('logs/trace-log-5.json', 'utf8'));
// Create ReplayWordAdapter with initial document state from trace
const adapter = new ReplayWordAdapter(traceData.originalText);
// Load trace array for Replay Mode
// CRITICAL: This replays EXACT Word behavior, not simulated behavior
adapter.loadTrace(traceData.trace);
// Now replay the exact operations that caused the failure
const bodyRange = {
start: 0,
end: traceData.originalText.length,
getText: () => adapter.document
};
// Replay exact API calls from trace
await applyAmendment(
bodyRange,
traceData.originalText,
traceData.expectedText,
null,
adapter
);
// Verify final state matches expected
const finalText = await adapter.getWholeDocumentText();
expect(finalText).toBe(traceData.expectedText);
Key Points:
- ✅ Load trace file which includes originalText, expectedText, and trace array
- ✅ Use traceData.originalText for initial document state
- ✅ Use traceData.trace (not traceData itself) for loadTrace()
- ✅ Replay exact operations - don't simulate or guess Word behavior (see the Jest wrapper below)
Step 4: The Fix (Reference Coding Standards Below)
CRITICAL: All fixes to anchor finding logic MUST follow the Strategy Pattern. Ad-hoc patching is forbidden.
4.1: Identify the Pattern
Analyze the trace to understand why the anchor search failed:
- Punctuation issues? (e.g., Word ignores punctuation in search)
- Spacing issues? (e.g., missing spaces, malformed text)
- Missing context? (e.g., operation has no beforeContext/afterContext)
- New Word behavior? (e.g., Word splits words across table cells)
- Malformed text? (e.g., camelCase words stuck together from previous operations)
4.2: Select or Create Strategy
Option A: Tune Existing Strategy
If an existing Strategy (e.g., _searchByPrimaryAnchor) should have caught it but didn't:
- Identify the gap: What specific condition did the strategy miss?
- Enhance the strategy: Add logic to handle the new condition within that strategy method
- Verify: Ensure the trace replay passes with the enhanced strategy
Example: If _searchByPrimaryAnchor fails because of malformed text, add normalization logic to that strategy (not a new fallback).
Option B: Create New Strategy
If the failure represents a completely new class of issue:
- Create new private method: _searchBy[Name]() following the naming convention
- Implement specific logic: Handle only this new class of issue
- Fail fast: Return null immediately if strategy doesn't apply
- Return strategy name: Return { range: result, strategy: 'StrategyName' } or null
Example: If Word splits words across table cells, create _searchAcrossTableCells() strategy.
4.3: Register Strategy
Add the new method to the strategies pipeline array in word-adapter.js:
// In _findAnchorInContext method
const strategies = [
this._searchByPrimaryAnchor,
this._searchByContextCombination,
this._searchByOrigTextOnly,
this._searchByFuzzyScan,
this._searchByInsertionPoints,
this._searchAcrossTableCells // ← New strategy added here
];
Placement: Add strategies in order of preference (most specific → least specific).
4.4: Verify Fix
- Replay trace: Load trace with adapter.loadTrace(traceData.trace)
- Run Jest test: Ensure trace replay passes
- Check strategy name: Verify the correct strategy is being used (check trace logs)
- Run all tests: npm test to ensure no regressions
Success Criteria:
- Trace replay passes
- Strategy name is logged correctly
- No regressions in existing tests
No Second Round of Live Testing Needed: Once trace replay succeeds and Jest tests pass, the fix is considered verified. The next stability iteration will naturally test the fix in the live Word environment, but we don't need to wait for a second successful run because LLM output variance makes it unreliable as a verification step.
Step 5: Verification & Trigger Next
After fixes are verified:
- Coding agent triggers next iteration via /api/e2e-loop/trigger
- Loop continues with fixed code
- Process repeats until 5 consecutive passes
Coding Agent Responsibilities
The coding agent (assistant) is fully responsible for the following tasks after each stability test iteration:
- Analyze Logs: Read logs/e2e-test-logs.json to identify failures and root causes
- Load Trace: If failure occurred, load logs/trace-log-{testRunNumber}.json to get exact Word API behavior
- Reproduce with ReplayWordAdapter (Trace Replay):
  - Create ReplayWordAdapter instance with initial document state
  - Call loadTrace(traceData.trace) to enable Replay Mode
  - CRITICAL: Do NOT simulate or guess Word behavior - always replay from traces
  - Reproduce the bug offline by replaying exact API calls from trace
  - Write Jest test that wraps the trace file
- Fix Code: The coding agent MUST fix the code (WordAdapter and related logic) until trace replay succeeds
- Verify Fix: Replay trace again to ensure fix works, then run npm test to ensure no regressions
- Automatically Trigger Next Iteration: After verification, the coding agent MUST automatically call /api/e2e-loop/trigger
Critical Requirements:
- The coding agent is responsible for creating all Jest tests - no manual test creation needed
- The coding agent MUST use trace replay, not simulation - always load traces with loadTrace() and replay exact Word behavior
- The coding agent MUST NOT simulate or guess Word behavior - always replay from traces captured during failures
- The coding agent is responsible for fixing all code - fixes are applied automatically
- Fixes cannot involve hardcoding of specific words or tokens - fixes must be able to generalize to other clauses
- The coding agent is responsible for verifying Jest tests pass - verification happens automatically
- The user does NOT need to manually trigger the next iteration - the coding agent handles everything automatically
Note: All these steps are performed automatically by the coding agent. The user does not need to manually create tests, fix code, verify tests, or trigger iterations - the coding agent handles the entire workflow automatically.
Coding Standards (The Strategy Pattern)
Anti-Pattern (Forbidden)
DO NOT use ad-hoc patching or numbered fallbacks:
❌ Forbidden Patterns:
- Numbered fallbacks (e.g., "Fallback 2a", "Fallback 3h")
- Nested if/else blocks for specific edge cases inside main methods
- Ad-hoc boolean flags (if (specialCase) { ... }) scattered throughout code
- Inline special-case handling within _findAnchorInContext or similar methods
Why Forbidden?
- Creates "spaghetti code" that becomes unmaintainable
- Makes debugging nearly impossible (non-linear logic flow)
- Leads to silent degradation (fixes mask root causes)
- Violates single responsibility principle
Requirement: Standalone Strategy Methods
Every search logic modification MUST be implemented as a Standalone Strategy Method.
✅ Required Pattern:
- Each search strategy is a private method with a single, clear responsibility
- Strategies are named by intent/logic, not by history or order
- Strategies return { range: Range | null, strategy: string } or null
- Strategies are registered in an ordered pipeline array
Naming Convention
Strategies must be named by intent/logic, not by history:
❌ BAD:
tryFallback4()
tryFallback2a()
_searchByFallback3h()
✅ GOOD:
_searchByPrimaryAnchor() // Searches full anchor (before+orig+after)
_searchByContextCombination() // Searches context combinations
_searchByOrigTextOnly() // Searches origText alone
_searchByFuzzyScan() // Normalized text, wildcards, partial matches
_searchByInsertionPoints() // INSERT-specific logic
_searchAcrossTableCells() // Handles Word splitting words across table cells
Rationale: Strategy names should describe what they do, not when they were added or where they appear in the fallback chain.
Strategy Pipeline
All strategies are registered in a single, ordered array in _findAnchorInContext:
const strategies = [
this._searchByPrimaryAnchor, // Most specific - try first
this._searchByContextCombination, // Context combinations
this._searchByOrigTextOnly, // OrigText alone
this._searchByFuzzyScan, // Fuzzy matching
this._searchByInsertionPoints // INSERT-specific
];
for (const strategy of strategies) {
const result = await strategy.call(this, selectionRange, op, context);
if (result && result.range) {
return result; // Return both range and strategy name
}
}
Benefits:
- Linear, predictable execution flow
- Easy to add new strategies (just add to array)
- Easy to reorder strategies (change array order)
- Each strategy is isolated and testable
- Strategy name is logged and recorded in traces for observability
Observability Requirement
CRITICAL: Every strategy MUST return its name so we can track which strategy succeeded:
// ✅ GOOD: Returns strategy name
return { range: result, strategy: 'PrimaryAnchor' };
// ❌ BAD: Returns only range
return result;
Why? If _searchByFuzzyScan starts matching 90% of anchors, it indicates our Primary strategy is broken, even if tests are passing. This prevents "silent degradation."
Strategy Template
Use this template when creating a new strategy:
/**
* Strategy: [Descriptive Name]
* Context: Handles cases where [specific Word behavior or edge case]
*
* @private
* @param {Object} selectionRange - Word Range to search within
* @param {Object} op - Operation object
* @param {Object} context - Word context
* @returns {Promise<Object|null>} Range object or null, with strategy name if found
*/
async _searchBy[Name](selectionRange, op, context) {
const logger = (await import('./logger.js')).default;
const MAX_SEARCH_LENGTH = 250;
// 1. Check preconditions (fail fast if strategy doesn't apply)
if (!this._isApplicable(op)) {
return null; // Strategy doesn't apply - return immediately
}
// 2. Execute specific search logic
try {
// Build search query
const query = this._buildQuery(op);
// Truncate if needed (use shared helper)
const truncatedQuery = this._truncateAnchor(query, MAX_SEARCH_LENGTH);
// Search with fallback (use shared helper)
const result = await this._searchWithFallback(selectionRange, truncatedQuery, context);
if (result) {
logger.debug(`[Strategy: [Name]] Found anchor using [description]`);
return { range: result, strategy: '[Name]' }; // Return strategy name!
}
} catch (error) {
logger.debug(`[Strategy: [Name]] Error: ${error.message}`);
}
// 3. Return null if not found
return null;
}
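For illustration, here is the template filled in for the table-cell scenario mentioned earlier. The applicability check and the whitespace-normalization logic are assumptions for the sketch, not the shipped implementation:
/**
 * Strategy: SearchAcrossTableCells
 * Context: Handles cases where Word splits words across table cells
 * (hypothetical instance of the template above)
 */
async _searchAcrossTableCells(selectionRange, op, context) {
  const logger = (await import('./logger.js')).default;
  const MAX_SEARCH_LENGTH = 250;
  // 1. Fail fast: only applies when the operation has original text to anchor on
  if (!op || !op.origText) {
    return null;
  }
  try {
    // 2. Assumption: collapsing whitespace approximates text split across cells
    const query = op.origText.replace(/\s+/g, ' ').trim();
    const truncatedQuery = this._truncateAnchor(query, MAX_SEARCH_LENGTH);
    const result = await this._searchWithFallback(selectionRange, truncatedQuery, context);
    if (result) {
      logger.debug(`[Strategy: SearchAcrossTableCells] Found anchor after normalizing cell-split text`);
      return { range: result, strategy: 'SearchAcrossTableCells' };
    }
  } catch (error) {
    logger.debug(`[Strategy: SearchAcrossTableCells] Error: ${error.message}`);
  }
  // 3. Return null if not found
  return null;
}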
Key Principles
- Fail Fast: Return null immediately if strategy doesn't apply
- Use Shared Helpers: Always use _truncateAnchor() and _searchWithFallback()
- Return Strategy Name: Always return { range, strategy } format (not just range)
- Log Strategy Name: Include [Strategy: Name] prefix in debug logs
- Single Responsibility: Each strategy handles ONE specific class of issue
Control API & Configuration
Endpoints
GET /api/e2e-loop/status
- Returns: { canProceed: boolean, waitingForTrigger: boolean, lastIteration: number }
- Used by: Loop polling for trigger
POST /api/e2e-loop/trigger
- Action: Sets canProceed = true, waitingForTrigger = false
- Used by: Coding agent to trigger next iteration
POST /api/e2e-loop/pause
- Action: Sets canProceed = false, waitingForTrigger = true
- Used by: Loop to pause after each iteration
POST /api/trace-log
- Action: Receives trace logs from Word add-in
- Stores: Trace files to logs/trace-log-{testRunNumber}.json
POST /api/fix-log
- Action: Receives fix logs from coding agent
- Stores: Fix logs to logs/fix-logs.json
Fix Log Schema:
Each fix entry must follow this JSON structure:
{
"timestamp": "ISO-8601 String",
"level": "INFO",
"message": "Code Fix Applied",
"metadata": {
"type": "fix-applied",
"file": "src/lib/word-adapter.js",
"issue": "Description of the bug",
"fix": "Description of the strategy added",
"testRunNumber": 123,
"strategyName": "_searchByTableCells"
}
}
Required Fields:
- timestamp: ISO-8601 formatted timestamp string
- level: Log level (typically "INFO")
- message: Fixed message "Code Fix Applied"
- metadata.type: Must be "fix-applied"
- metadata.file: File path that was modified
- metadata.issue: Description of the bug being fixed
- metadata.fix: Description of the fix/strategy applied
- metadata.testRunNumber: Test run number when fix was applied (optional but recommended)
- metadata.strategyName: Name of strategy method added (optional but recommended)
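A hedged example of posting a fix log that satisfies this schema; the dev server base URL is an assumption (adjust to your setup):
// Assumption: dev server runs at https://localhost:3000
await fetch('https://localhost:3000/api/fix-log', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    timestamp: new Date().toISOString(),
    level: 'INFO',
    message: 'Code Fix Applied',
    metadata: {
      type: 'fix-applied',
      file: 'src/lib/word-adapter.js',
      issue: 'Anchor not found when Word splits words across table cells',
      fix: 'Added _searchAcrossTableCells strategy',
      testRunNumber: 123,
      strategyName: '_searchAcrossTableCells'
    }
  })
});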
POST /log
- Action: Receives general logs from Word add-in
- Stores: Logs to logs/e2e-test-logs.json
GET /logs
- Returns: All logs from logs/e2e-test-logs.json
POST /logs/clear
- Action: Clears logs (one-time at start)
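A sketch of how the coding agent might drive the control endpoints from a Node script (base URL assumed):
// Hypothetical control script for the coding agent
const BASE = 'https://localhost:3000'; // assumption: local dev server address
const status = await (await fetch(`${BASE}/api/e2e-loop/status`)).json();
if (status.waitingForTrigger) {
  // Fixes verified offline - release the loop for the next iteration
  await fetch(`${BASE}/api/e2e-loop/trigger`, { method: 'POST' });
}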
Loop Flow
1. Loop runs test iteration
2. Loop pauses (calls /api/e2e-loop/pause) - CRITICAL: Loop MUST pause after each iteration
3. Loop polls /api/e2e-loop/status every 2 seconds (waits until triggered)
4. Coding agent automatically:
a. Analyzes logs from logs/e2e-test-logs.json
b. Creates Jest tests (ReplayWordAdapter with trace replay) to reproduce issues found in Stability Loop logs
c. Fixes code until Jest tests pass (offline verification)
d. Automatically calls /api/e2e-loop/trigger after Jest tests pass
5. Loop detects canProceed = true
6. Loop continues to next iteration
IMPORTANT: The loop will NOT auto-continue on its own - it waits for the coding agent.
However, the coding agent will AUTOMATICALLY trigger the next iteration after:
- Analyzing logs
- Creating Jest tests
- Fixing code
- Verifying Jest tests pass
No manual intervention is needed - the coding agent handles the entire workflow automatically.
The loop will NOT auto-continue if the server is unreachable - it waits until the coding agent can trigger it.
Monitoring & Debugging
Key Metrics to Monitor
- Test Run Count: Should increment sequentially
- Consecutive Passes: Should increment on success, reset on failure
- Validation Pass Rate: Percentage of successful validations
- Anchor Behavior: Number of anchor failures per run
- Strategy Distribution: Which strategies are succeeding (from trace logs)
- Log Hash Patterns: Detect identical runs
- Error Patterns: Track common error types
- Fix History: Track what fixes were attempted and their outcomes
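Strategy distribution can be tallied directly from the recorded traces; a sketch assuming trace files live in logs/ as described above:
import fs from 'fs';
// Tally which strategies succeeded across all recorded traces
const counts = {};
for (const file of fs.readdirSync('logs').filter(f => f.startsWith('trace-log-'))) {
  const { trace } = JSON.parse(fs.readFileSync(`logs/${file}`, 'utf8'));
  for (const entry of trace) {
    if (entry.method === 'applyOperation' && entry.result && entry.result.strategy) {
      counts[entry.result.strategy] = (counts[entry.result.strategy] || 0) + 1;
    }
  }
}
console.table(counts); // e.g., a spike in FuzzyScan signals a broken Primary strategy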
Debugging Tools
- Log Analysis: src/e2e/analyze-logs.js
- Test Runner: src/e2e/test-runner.js
- Show Clean Copy: src/e2e/show-clean-copy.js
- Trigger Control: src/e2e/trigger-next-iteration.js
- Run All Tests: src/e2e/run-all-tests.js
- Fix Logging: src/e2e/log-fix.js (for coding agent to log fixes)
Success Criteria Summary
✅ Test passes when:
- Diff-ed output (clean copy) exactly matches LLM output
- No anchor failures detected
- Primary strategies preferred; secondary strategies (fuzzy/scan) logged but accepted
- Document restored cleanly (no tracked changes)
- Original test clause validated correctly
✅ Overall success when:
- 5 consecutive test runs pass
- All 5 runs have correct anchor behavior
- No stop conditions triggered
Stop Conditions
The loop stops when:
- A. Max Runs: testRunNumber > 500
- B. Word Errors: Invalid document state errors detected
- C. Timeout: No logs for 5 minutes (after retry)
- D. Oscillation: Same edit → same LLM output → same error pattern repeats
- E. Identical Hashes: Same log hash appears 6 consecutive times
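One way stop condition E could be implemented; the hashing scheme (SHA-256 over a stable JSON serialization of each run's logs) is an assumption, not the documented algorithm:
import crypto from 'crypto';
// Assumption: each run's logs are serialized to a stable JSON string before hashing
function logHash(runLogs) {
  return crypto.createHash('sha256').update(JSON.stringify(runLogs)).digest('hex');
}
function shouldStopOnIdenticalHashes(hashes) {
  // Stop condition E: same log hash appears 6 consecutive times
  const last = hashes.slice(-6);
  return last.length === 6 && last.every(h => h === last[0]);
}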
Summary
This manual consolidates all testing documentation into a single source of truth. Key principles:
- Record & Replay: Always replay exact Word behavior from traces, never simulate
- Strategy Pattern: All fixes must use standalone strategy methods, no ad-hoc patching
- Coding Agent Control: The agent automatically handles the entire workflow
- Observability: Strategy names are logged to detect silent degradation
- Trace-Based Debugging: All failures are debugged offline using recorded traces
For test file organization and Jest test details, see tests/tests_index.md.