Documenting my test plan requirements and the resulting implementation.

1. Before each test run, the testing script should check that the original test clause is still present after each document refresh and that the same original clause is being passed to the LLM. If either check fails, throw an error and stop the loop.

2. After additional tests are created and pass, all previous tests should be re-run and must pass (if not, debug until all tests pass). I recall there were unit tests and integration tests previously (28 original, 40+ after a few manual test runs). Add on to those, and number them so I know how many tests have been run.

3. Clear the existing logs. Going forward, each run in the logs should be numbered (e.g. test run 1, test run 2, etc.) with date and time stamps, and should contain the diff-ed output from that run. Output the diff-ed output in this chat window too.

4. Add another final success condition: in addition to the 5 successful runs (where the diff-ed output matches the LLM output), the logs must confirm correct anchor behaviour.

5. Conditions for stopping the loop (hitting any condition should stop the loop):

A. The number of run loops (the run counter from point 3 above) hits 500.

B. Any error indicating the Word document is no longer in a valid editable state, e.g. a rich API call failed, a property was not loaded, track changes mode not supported, failure to insert or delete content controls, failure to replace tokens or apply formatting, or the selection collapsed unexpectedly.

C. No new logs from the MS Word machine detected within 5 minutes (timeout) after the dev machine triggers a new LLM run, with 1 retry allowed after 2 minutes (see the sketch after this list).

D. The same edit -> same LLM output -> same error repeatedly, or oscillation between two or more diff patterns (e.g. A -> B -> A -> B cycles).

E. Identical log hashes for 6 consecutive runs, i.e. the system produces the exact same log hash for 6 test runs in a row.
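
Condition C is the trickiest of these to enforce, so here is a minimal sketch of the dev-machine side. The helpers `getLatestLogTimestamp()` and `retriggerLlmRun()` are hypothetical placeholders for however the dev machine polls the Word machine's logs and re-triggers a run:

```js
// Hedged sketch of stop condition C: 5-minute timeout on fresh Word-machine
// logs, with a single retry of the LLM run after 2 minutes of silence.
const RETRY_AFTER_MS = 2 * 60 * 1000;
const TIMEOUT_MS = 5 * 60 * 1000;

async function waitForWordLogs(runStartedAt, getLatestLogTimestamp, retriggerLlmRun) {
  let retried = false;
  for (;;) {
    const latest = await getLatestLogTimestamp(); // epoch ms of newest log line
    if (latest > runStartedAt) return true;       // new logs arrived; keep going

    const elapsed = Date.now() - runStartedAt;
    if (!retried && elapsed >= RETRY_AFTER_MS) {
      await retriggerLlmRun();                    // the single allowed retry
      retried = true;
    }
    if (elapsed >= TIMEOUT_MS) return false;      // condition C: stop the loop

    await new Promise((r) => setTimeout(r, 5000)); // poll every 5 seconds
  }
}
```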


Implemented features

1. Validation checks

  • Validates original test clause after document refresh
  • Validates original test clause before LLM call
  • Stops loop with error if validation fails
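
A minimal sketch of the Word-side check, assuming the Office.js Word API running inside the add-in; `ORIGINAL_CLAUSE` is a placeholder for the actual test clause, and a real check may need to normalize whitespace before comparing:

```js
// Hedged sketch: verify the original test clause is still present after a
// document refresh, and that the exact same clause is about to go to the LLM.
const ORIGINAL_CLAUSE = "..."; // placeholder for the clause under test

async function assertOriginalClausePresent() {
  await Word.run(async (context) => {
    const body = context.document.body;
    body.load("text");
    await context.sync();
    if (!body.text.includes(ORIGINAL_CLAUSE)) {
      // Stop the loop: the document no longer contains the original clause.
      throw new Error("Validation failed: original clause missing after refresh");
    }
  });
}

function assertClauseSentToLlm(clauseForLlm) {
  if (clauseForLlm !== ORIGINAL_CLAUSE) {
    throw new Error("Validation failed: clause sent to LLM differs from original");
  }
}
```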

2. Test re-running logic

  • Created src/e2e/run-all-tests.js to run all unit and integration tests
  • Updated src/e2e/run-analysis.js to automatically run all tests after generating new test cases
  • Tests are numbered and tracked
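
A sketch of the shape src/e2e/run-all-tests.js could take; the test directory layout and the plain `node <file>` invocation are assumptions, not the project's actual runner:

```js
// Hedged sketch of src/e2e/run-all-tests.js: run every unit and integration
// test in order and number them. TEST_DIRS and the node invocation are
// assumptions about the project layout.
const { execFileSync } = require("node:child_process");
const { readdirSync } = require("node:fs");
const path = require("node:path");

const TEST_DIRS = ["src/unit", "src/integration"]; // assumed locations

let testNumber = 0;
for (const dir of TEST_DIRS) {
  for (const file of readdirSync(dir).filter((f) => f.endsWith(".test.js"))) {
    testNumber += 1;
    console.log(`TEST ${testNumber}: ${path.join(dir, file)}`);
    // A failing test exits non-zero, which throws here and halts the run.
    execFileSync("node", [path.join(dir, file)], { stdio: "inherit" });
  }
}
console.log(`All ${testNumber} tests passed`);
```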

3. Improved logging

  • Logs are cleared at start
  • Each run is numbered: "TEST RUN 1", "TEST RUN 2", etc.
  • Timestamps included (ISO format)
  • Diff-ed output logged and displayed in chat window (first 500 chars)
  • Full diff-ed output included in structured logs
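
A sketch of a log writer matching these rules; the field names and log path are assumptions, not the exact schema:

```js
// Hedged sketch: one structured entry per test run, full diff in the log,
// truncated diff echoed to the chat window.
const fs = require("node:fs");

const LOG_FILE = "logs/test-runs.log"; // assumed path

function logTestRun(runNumber, diffOutput) {
  const entry = {
    label: `TEST RUN ${runNumber}`,
    timestamp: new Date().toISOString(), // ISO-format timestamp
    diff: diffOutput,                    // full diff-ed output
  };
  fs.appendFileSync(LOG_FILE, JSON.stringify(entry) + "\n");
  // The chat window only gets the first 500 chars.
  console.log(`${entry.label} @ ${entry.timestamp}\n${diffOutput.slice(0, 500)}`);
}
```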

4. Anchor behavior success condition

  • Added checkAnchorBehavior() function
  • Checks logs for anchor failures and fallback strategies
  • Success requires 5 consecutive passes AND correct anchor behavior
  • Resets consecutive passes if anchor behavior issues detected
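
A minimal sketch of what checkAnchorBehavior() could scan for; the marker strings and the consecutive-pass bookkeeping are assumptions about the real implementation:

```js
// Hedged sketch of checkAnchorBehavior(): flag any recent log line that
// mentions an anchor failure or a fallback strategy.
const ANCHOR_ISSUE_MARKERS = ["anchor failure", "fallback strategy"]; // assumed

function checkAnchorBehavior(logLines) {
  const issues = logLines.filter((line) =>
    ANCHOR_ISSUE_MARKERS.some((m) => line.toLowerCase().includes(m))
  );
  return { ok: issues.length === 0, issues };
}

// In the main loop (sketch): a run only counts toward the 5 consecutive
// passes when the diff matches AND anchor behavior is clean; otherwise the
// counter resets to zero.
// consecutivePasses = diffMatches && checkAnchorBehavior(logs).ok
//   ? consecutivePasses + 1 : 0;
```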

5. Stop conditions

All stop conditions implemented:

  • A. Maximum runs: Stops at 500 test runs
  • B. Word document errors: Detects invalid state errors (property not loaded, API failures, track changes issues, etc.)
  • C. Timeout: Stops if no logs from Word machine for 5 minutes (with 1 retry after 2 minutes)
  • D. Repetition/oscillation: Detects repeated same edit → same LLM output → same error loops, and oscillating diff patterns
  • E. Identical log hashes: Stops if same log hash appears 6 consecutive times
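
Conditions D and E both reduce to tracking a short window of per-run log hashes. A sketch, assuming SHA-256 over each run's log text and a period-2 cycle check for the oscillation case:

```js
// Hedged sketch of stop conditions D and E: hash each run's logs, then look
// for 6 identical consecutive hashes (E) or an A -> B -> A -> B cycle (D).
const { createHash } = require("node:crypto");

const recentHashes = [];

function recordRun(logText) {
  recentHashes.push(createHash("sha256").update(logText).digest("hex"));
  if (recentHashes.length > 12) recentHashes.shift(); // keep a short window
}

function stopReason() {
  if (recentHashes.length < 6) return null;
  const last6 = recentHashes.slice(-6);
  // Condition E: the same hash 6 runs in a row.
  if (last6.every((h) => h === last6[0])) return "identical-logs";
  // Condition D: strict period-2 oscillation, e.g. A, B, A, B, A, B.
  if (last6.every((h, i) => h === last6[i % 2]) && last6[0] !== last6[1]) {
    return "oscillation";
  }
  return null;
}
```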

Additional improvements

  • Log hash tracking for oscillation detection
  • Error pattern tracking for debugging
  • Enhanced metadata in all log entries (testRunNumber, timestamps)
  • Better error messages with context