Tokio's Famous Deadlock Pattern, Caught and Fixed in an Async Runtime
/deadlock-finder-and-fixer ran an exhaustive audit pass over asupersync (Jeffrey's Tokio replacement) and filed 8 skill-tagged beads: 5 findings of specific concurrency hazards (each with a shipped fix in git) and 3 clean audits over whole subsystems. The headline finding (`asupersync-0x7fdb`, watch::send_modify calling a user closure while holding the write lock) is the canonical Rust async deadlock pattern. It was found in the project's own code and fixed via a clone-modify-update refactor in commit `3a6ad1ea8`.
"I am increasingly delivering very big, sophisticated skills like the saas security audit skill, the deadlock finder skill, the tax skill, etc."
-- @doodlestein on X (2026-04-16)
1) The Hook
There is a canonical concurrency bug in the Rust async ecosystem. watch::send_modify calls a user-supplied closure while the channel's internal write-lock guard is held. If that closure touches a second watch channel while another thread holds that channel's lock and is waiting on the first, you get a textbook AB-BA deadlock. The pattern is well-known and explicitly called out in Tokio's watch module documentation.
Asupersync (Jeffrey's Tokio replacement, the runtime under several FrankenSuite projects) shipped its own version of the same bug. The deadlock-finder-and-fixer skill caught it, filed bead asupersync-0x7fdb with the title [deadlock-audit] watch::send_modify runs caller closure inside value.write(), and a fix shipped (3a6ad1ea8) refactoring send_modify to a clone-modify-update pattern.
That same skill ran an exhaustive audit pass over asupersync and produced eight tagged beads in total: five finding specific concurrency hazards (each with a corresponding fix in git) and three reporting clean audits over whole subsystems. The three clean audits matter as much as the five findings; they're the proof that someone walked the code and didn't just stop at the first scary-looking pattern.
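The closure-under-write-lock hazard described in this section can be observed without actually hanging a process. The following is a minimal sketch using only `std::sync::RwLock`; `MiniWatch`, `send_modify_under_lock`, and `is_locked` are hypothetical names for illustration and are not asupersync's or Tokio's types. The closure only probes the lock with `try_read`, because a blocking acquisition at that point is exactly the deadlock.

```rust
use std::sync::{RwLock, TryLockError};

/// Hypothetical stand-in for a watch sender; not asupersync's or Tokio's type.
struct MiniWatch<T> {
    value: RwLock<T>,
}

impl<T> MiniWatch<T> {
    fn new(value: T) -> Self {
        Self { value: RwLock::new(value) }
    }

    /// Hazardous shape: the caller's closure runs while the write guard is alive.
    fn send_modify_under_lock(&self, f: impl FnOnce(&mut T)) {
        let mut guard = self.value.write().unwrap();
        f(&mut *guard);
    }

    /// Non-blocking probe: true if the lock is currently held exclusively.
    fn is_locked(&self) -> bool {
        matches!(self.value.try_read(), Err(TryLockError::WouldBlock))
    }
}

/// Returns whether the closure observed the channel's own lock as held.
fn closure_sees_lock_held() -> bool {
    let chan = MiniWatch::new(0u32);
    let mut observed = false;
    chan.send_modify_under_lock(|v| {
        *v += 1;
        // A blocking read or write on `chan` here would never return;
        // this sketch only probes with try_read.
        observed = chan.is_locked();
    });
    observed
}

fn main() {
    println!("lock held inside closure: {}", closure_sees_lock_held());
}
```

With two such channels and two threads, each closure blocking on the other channel's lock gives the AB-BA cycle.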
2) The Challenge
Concurrency bugs are the worst category of software defect. They slip through fully green test suites. They emerge under load in production, often months after the offending commit. They produce hung processes, flaky tests, lost wakeups, livelocks, silent message drops, and database "is locked" timeouts that nobody can reproduce on demand. The community has decades of literature about lock-ordering, await-holding-lock, lost-wakeup, reader-starvation, and the dozen other named pitfalls. None of that literature walks YOUR code looking for YOUR specific instances.
Manual audit gets you a few good hits and then runs out of attention budget. Static analyzers that flag every Mutex::lock() produce hundreds of false positives. Dynamic tools (TSAN, lockdep) catch what they observe, which is whatever your tests happen to exercise.
3) The Discovery
/deadlock-finder-and-fixer is a structured audit workflow rather than a generic "look for bugs" prompt. The SKILL.md opens with two explicit rules that govern every finding:
The Universal Rule. When you think you found the deadlock and fixed the three instances you could see, there is almost always a fourth. Keep searching until you can prove exhaustively, by code audit, that no hazard remains.
The False-Positive Rule. Every finding must survive: "Can I construct a concrete interleaving of real threads that reaches this state?" If you cannot, it is not a bug; it is a pattern match.
Around those rules sits a Symptom Triage Table mapping observed runtime behavior to bug class, plus separate sections for each class:
| Observed symptom | Bug class |
|---|---|
| Process 0% CPU, threads in futex_wait / __lll_lock_wait | Classic AB-BA / self-deadlock |
| Async tasks pending, all tokio workers in epoll_wait | Mutex held across .await, channel cycle |
| 100% CPU, futex spam, no progress | Livelock, retry storm, broken condvar |
| database is locked, SQLITE_BUSY, timeouts | SQLite WAL contention, long-transaction writer fight |
| Hang during library load, strlen / malloc hangs | LD_PRELOAD / runtime-init reentrancy |
The runtime triage half of the workflow lives in the gdb-for-debugging sibling skill (gdb backtraces of all threads, lock-graph construction, async runtime analysis, TSAN/rr). This skill handles the parts that don't need a running process: taxonomy, static-audit discovery, fix catalog, prevention-by-design.
Importantly, the skill files its findings as beads with explicit [deadlock-audit] or [deadlock-finder] tags, so future code review and AI agents can grep for them. Clean audits are filed too. A bead titled "src/sync/* + src/channel/* deep audit complete — no findings" is a positive assertion that someone audited those files and didn't find issues, which is a real piece of evidence about the code's safety.
4) The Transformation
The canonical worked example is asupersync-0x7fdb. The skill walked src/channel/watch.rs, found the closure-under-write-lock pattern, and filed the bead with that exact wording. The branch tag [br-asupersync-0x7fdb] was used by the agent that took the bead and shipped the fix. Commit 3a6ad1ea8 (touching src/channel/watch.rs):
[br-asupersync-0x7fdb] Fix watch::send_modify deadlock by avoiding closure under write lock
Refactored send_modify to use clone-modify-update pattern instead of calling
user closure while holding the write lock. This prevents deadlocks when user
closures try to access other watch channels.
Old implementation: f(&mut guard.0) called while holding write lock
New implementation: Clone value, call f() without locks, then atomically update
The doc comment on send_modify was updated in the same commit to spell out the new contract for users:
To avoid deadlocks, this method clones the current value, releases the lock, applies the closure to the clone, then reacquires the lock to update the value. This prevents user closures from running while holding the write lock.
That's the canonical shape: a bead with a precise location and pattern name, a branch tag tying the fix back to the bead, a fix that adopts a known-good pattern (clone-modify-update) instead of just sprinkling more locks, and a doc comment that records the new invariant for future readers.
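The clone-modify-update steps quoted from the doc comment can be sketched as follows. This is an illustration under stated assumptions, not the code from commit 3a6ad1ea8: `MiniWatch` and `get` are hypothetical names, the lock is a plain `std::sync::RwLock`, and waking receivers is omitted.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for a watch sender; not asupersync's actual type.
struct MiniWatch<T> {
    value: RwLock<T>,
}

impl<T: Clone> MiniWatch<T> {
    fn new(value: T) -> Self {
        Self { value: RwLock::new(value) }
    }

    fn send_modify(&self, f: impl FnOnce(&mut T)) {
        // 1. Clone under a short read lock; the guard is dropped at the
        //    end of this statement.
        let mut next = self.value.read().unwrap().clone();
        // 2. Run the user closure with no lock held.
        f(&mut next);
        // 3. Reacquire the lock only to publish the new value.
        *self.value.write().unwrap() = next;
    }

    fn get(&self) -> T {
        self.value.read().unwrap().clone()
    }
}

/// The closure touches a second channel, which is safe because no lock
/// is held while it runs.
fn demo() -> (u32, u32) {
    let a = MiniWatch::new(1u32);
    let b = MiniWatch::new(10u32);
    a.send_modify(|v| {
        b.send_modify(|w| *w += 1);
        *v += b.get();
    });
    (a.get(), b.get())
}

fn main() {
    println!("{:?}", demo());
}
```

One trade-off in this sketch: because the lock is released between the clone and the publish, two concurrent `send_modify` callers can overwrite each other's update, and the value type must be `Clone`. How asupersync handles that window is not shown in the source.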
5) The Results
Eight skill-attributed beads on asupersync, all closed:
Findings (the skill caught a real concurrency hazard):
| Bead | Finding |
|---|---|
| asupersync-0x7fdb | [deadlock-audit] watch::send_modify runs caller closure inside value.write() |
| asupersync-df28bg | [deadlock-audit] CrashController emits evidence while holding controller state lock |
| asupersync-ryqcl6 | [deadlock-audit] SharedIoDriver invokes on_event while holding inner mutex |
| asupersync-xgujaf | [deadlock-audit] runtime/state.rs:2556-2558 read-then-write on cancel_waker is non-atomic TOCTOU |
| asupersync-iwqn3q | [deadlock-finder] src/lab/runtime.rs:1801-1808 lock-ordering hazard: scheduler.lock held across cx_inner.read() |
Clean audits (the skill ran and proved no hazards in those modules):
| Bead | Outcome |
|---|---|
| asupersync-3sduke | [deadlock-finder] src/sync/* + src/channel/* deep audit complete — no findings |
| asupersync-drggpw | [deadlock-finder] Sweep async cancel paths for await-holding-lock across channel/, sync/, obligation/, combinator/ |
| asupersync-zhrk5y | [deadlock-audit] sync/* + scheduler/three_lane.rs + runtime/state.rs — clean except 1 TOCTOU finding |
The TOCTOU finding from asupersync-zhrk5y is the same one filed separately as asupersync-xgujaf, which is what the False-Positive Rule looks like in practice: the skill reports the broader area as clean and surfaces the single real hazard inside it as its own bead, instead of flooding the project with pattern-match noise.
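The read-then-write shape named in that bead title is easy to miss in review, so a small sketch may help. Everything here is hypothetical: `WakerSlot` and its methods are illustrative names, and a `String` stands in for a real waker handle. It contrasts a check and a clear done in two separate critical sections with one that does both under a single write lock.

```rust
use std::sync::RwLock;

/// Hypothetical slot; a String stands in for a real waker handle.
struct WakerSlot {
    inner: RwLock<Option<String>>,
}

impl WakerSlot {
    fn new(waker: Option<String>) -> Self {
        Self { inner: RwLock::new(waker) }
    }

    /// Racy shape: the check and the clear are separate critical sections.
    #[allow(dead_code)]
    fn clear_read_then_write(&self) -> bool {
        let present = self.inner.read().unwrap().is_some();
        // Window: another thread may install or take a waker here, so
        // `present` can be stale by the time the write lock is taken.
        if present {
            *self.inner.write().unwrap() = None;
        }
        present
    }

    /// Single critical section: observe and clear under one write lock.
    fn clear_single_write(&self) -> Option<String> {
        self.inner.write().unwrap().take()
    }
}

/// Clears twice: the first call gets the waker, the second finds nothing.
fn demo_clear() -> (Option<String>, Option<String>) {
    let slot = WakerSlot::new(Some("waker-1".to_string()));
    (slot.clear_single_write(), slot.clear_single_write())
}

fn main() {
    println!("{:?}", demo_clear());
}
```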
Every one of the five findings has a corresponding fix in git log:
| Finding bead | Fix commit |
|---|---|
| asupersync-0x7fdb | 3a6ad1ea8 (clone-modify-update; explicit [br-asupersync-0x7fdb] tag) |
| asupersync-df28bg | 15da98895 (CrashController drops state lock before evidence emission; bead closed with close_reason: "Already fixed in 15da98895") |
| asupersync-ryqcl6 | 99043ae8e fix(io_driver): prevent deadlock in on_event callbacks |
| asupersync-xgujaf | 12187f2a4 fix(runtime/state): atomic single-write clear of cancel_waker on task completion [br-asupersync-xgujaf] |
| asupersync-iwqn3q | dc69ed4e8 fix(lab/runtime): hoist cx_inner.read() out of scheduler.lock() scope to repair lock-ordering inversion [br-asupersync-iwqn3q] |
None of these fixes are macro one-liners; each one rewrites the offending code path to a known-good concurrency idiom (wake-outside-lock, clone-then-modify, atomic compare-exchange instead of read-then-write, hoisting an inner lock out of the scope of an outer one). One bead (df28bg) was retroactively credited to a prior commit that already fixed the issue. That's what the bead system is supposed to do: record the audit finding even when the fix landed first.
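Of the idioms listed, hoisting an inner lock out of an outer lock's scope is the simplest to show in isolation. The sketch below is a hypothetical reduction: `LabRuntime`, its fields, and its methods are illustrative and not the code in src/lab/runtime.rs. The nested version is only dangerous when some other path acquires the same two locks in the opposite order; the hoisted version never holds both at once, so no ordering can be inverted.

```rust
use std::sync::{Mutex, RwLock};

/// Hypothetical runtime with an outer scheduler lock and an inner context lock.
struct LabRuntime {
    scheduler: Mutex<Vec<u64>>,
    cx_inner: RwLock<u64>,
}

impl LabRuntime {
    fn new(task_id: u64) -> Self {
        Self {
            scheduler: Mutex::new(Vec::new()),
            cx_inner: RwLock::new(task_id),
        }
    }

    /// Hazardous shape: cx_inner is acquired while scheduler is still held.
    #[allow(dead_code)]
    fn schedule_nested(&self) {
        let mut queue = self.scheduler.lock().unwrap();
        let id = *self.cx_inner.read().unwrap();
        queue.push(id);
    }

    /// Hoisted shape: the read guard is dropped before the scheduler
    /// lock is taken, so the two locks are never held together.
    fn schedule_hoisted(&self) {
        let id = *self.cx_inner.read().unwrap();
        self.scheduler.lock().unwrap().push(id);
    }

    fn queued(&self) -> Vec<u64> {
        self.scheduler.lock().unwrap().clone()
    }
}

fn demo_hoisted() -> Vec<u64> {
    let rt = LabRuntime::new(7);
    rt.schedule_hoisted();
    rt.schedule_hoisted();
    rt.queued()
}

fn main() {
    println!("{:?}", demo_hoisted());
}
```

The cost of hoisting is that the value read from `cx_inner` may change before it is queued, so the idiom fits when a snapshot is acceptable.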
In context, asupersync's .beads/issues.jsonl carries dozens of additional concurrency-class beads filed across the project's lifetime: lost-wakeup races in the parker, scheduler starvation under continuous write load, mutex wake-under-lock in barrier.rs, sync-primitives cancellation deadlocks, and so on. The eight skill-attributed beads are not the only concurrency work on the project, but they are the ones where the discovery itself can be attributed cleanly to a single audit pass with an explicit rule about what counts as a real finding.
6) The Meta Layer
The author's framing of this skill, in the X post that opened this case study, is a "very big, sophisticated skill" alongside the SaaS security audit and tax skills. The size comes from how much codified concurrency lore the SKILL.md carries (taxonomy, anti-patterns, fix catalog, validation gates) and how many companion skills it composes with. The runtime side of debugging happens in gdb-for-debugging. After a major fix lands, multi-pass-bug-hunting runs as the deeper sweep. cass mines prior concurrency-bug sessions to surface precedent. All ship in the same subscription.
The most valuable secondary effect is the same one the isomorphic-refactor case study highlighted: artifacts. Every closed bead with a [deadlock-audit] or [deadlock-finder] tag is an explicit, searchable record of where someone walked the code and what they found. New contributors can grep the beads file and learn the project's concurrency hazards as a first-class part of onboarding, instead of having to discover them by hanging a CI run.
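Searching the beads file for these tags can be as simple as a substring match per line. The sketch below assumes nothing about the schema of .beads/issues.jsonl beyond one record per line; the `id` and `title` field names in the sample are assumptions made for illustration, and the third sample record is made up.

```rust
const TAGS: [&str; 2] = ["[deadlock-audit]", "[deadlock-finder]"];

/// Returns the lines carrying either audit tag (plain substring match,
/// no JSON parsing and no assumption about field names).
fn audit_beads(jsonl: &str) -> Vec<&str> {
    jsonl
        .lines()
        .filter(|line| TAGS.iter().any(|tag| line.contains(*tag)))
        .collect()
}

/// Sample input. The record shape is assumed; the last line is hypothetical.
fn sample() -> &'static str {
    concat!(
        r#"{"id":"asupersync-0x7fdb","title":"[deadlock-audit] watch::send_modify runs caller closure inside value.write()"}"#,
        "\n",
        r#"{"id":"asupersync-3sduke","title":"[deadlock-finder] src/sync/* + src/channel/* deep audit complete"}"#,
        "\n",
        r#"{"id":"example-0000","title":"hypothetical bead without an audit tag"}"#,
        "\n",
    )
}

fn main() {
    for line in audit_beads(sample()) {
        println!("{line}");
    }
}
```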
Source: https://x.com/doodlestein/status/2044648265438654745 · Skill page: https://jeffreys-skills.md/skills/deadlock-finder-and-fixer · Sibling skills: gdb-for-debugging, multi-pass-bug-hunting