<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://xianxu.dev/feed.xml" rel="self" type="application/atom+xml" /><link href="https://xianxu.dev/" rel="alternate" type="text/html" /><updated>2026-04-27T21:48:37+00:00</updated><id>https://xianxu.dev/feed.xml</id><title type="html">Xian Xu</title><subtitle>Personal thoughts on current events, society and technology.
</subtitle><entry><title type="html">A Tale of Two Harnesses</title><link href="https://xianxu.dev/2026/04/16/tale-of-two-harness.html" rel="alternate" type="text/html" title="A Tale of Two Harnesses" /><published>2026-04-16T00:00:00+00:00</published><updated>2026-04-16T00:00:00+00:00</updated><id>https://xianxu.dev/2026/04/16/tale-of-two-harness</id><content type="html" xml:base="https://xianxu.dev/2026/04/16/tale-of-two-harness.html"><![CDATA[<p>I’ve been building a Neovim plugin called <a href="https://github.com/xianxu/parley.nvim">Parley</a> for the past year, mostly as a way to understand what vibe coding is about, while making some tools I like to use in nvim. 29K lines of Lua, 700+ commits, all written by AI agents, steered by me. It started as a chat environment — a way to have AI conversations inside my editor without the limitations of ChatGPT’s UI, a humble beginning. By now it has grown into a workbench/harness, a power tool for how I work.</p>

<p>What happened in the past week in particular surprised me. I set out to build a rather complex feature, an AI harness in nvim, and ended up stumbling on the pattern of <strong>one repo, multiple harnesses</strong>.</p>

<h2 id="the-flight-hack">The Flight Hack</h2>

<p>On a flight from Pittsburgh to SF, I started adding tool call support to Parley. Basic stuff: let the AI read and write files, list directories, search content. About 7K lines of change. The goal was simple — if Parley can touch the filesystem, it can do more than just talk.</p>

<p>This was Parley’s first step from “conversation tool” toward something I’d later call a <strong>harness</strong> — an environment that doesn’t just chat with AI, but runs structured AI loops in a directory.</p>

<h2 id="the-struggle-that-ensued">The Struggle That Ensued</h2>

<p>Tool calls introduced a multiplexed stream — AI responses now interleaved with tool calls and their results. And here’s where things got interesting. The AI (Claude Code, building Parley) made a huge mess of it. It kept packing the new stream handling on top of old code that assumed a single insertion point in a buffer. Patch on top of patch; it didn’t know it needed to stop, and it took me some time to get it to stop for good instead of reverting to that old habit. I also had to shelve the tool call feature and refactor this portion of the code first, digging myself out of the AI tech debt, so to speak.</p>

<p>I had to stop the AI and impose the architecture myself. There was already an abstraction, an intermediate model of [{question, answer}] representing what’s in an nvim buffer; all we needed to do was 1) represent the more free-form tool call blocks in that structure and 2) use relative positioning to simplify the math. It took me quite some time to convince the AI to follow this to the letter, including various false starts where forcing it to print out its state showed it was not following that architecture.</p>
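
<p>To make that concrete, here’s a minimal sketch of such a model (illustrative shapes only, not Parley’s actual schema):</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- One exchange in the buffer model. Offsets are relative to the exchange's
-- first line, so edits above an exchange never invalidate positions inside it.
local exchange = {
  question = { offset = 0, lines = { "How should I parse this?" } },
  blocks = {  -- answer text interleaved with tool activity
    { kind = "text",        offset = 2, lines = { "Let me check the file." } },
    { kind = "tool_call",   offset = 4, name = "read_file",
      args = { path = "lua/parser.lua" } },
    { kind = "tool_result", offset = 5, lines = { "...file contents..." } },
    { kind = "text",        offset = 7, lines = { "Here's the fix." } },
  },
}

-- Absolute buffer line = where the exchange starts + the item's offset.
local function abs_line(exchange_start, item)
  return exchange_start + item.offset
end
</code></pre></div></div>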

<p>This was the most intensive human intervention I can remember in my AI coding. It took another 7K lines of change and 31 commits to refactor all the code into this new model.</p>

<p>In hindsight, this was one of the most important moments of the week. Not because of what got built, but because of what it demonstrated: <strong>the human provides the structure, the AI provides the labor.</strong> When the AI couldn’t see the right architecture, iteration alone wasn’t converging. I had to draw the boundary. Once I did (painfully, in this case), the AI executed successfully within it.</p>

<p>Another interesting aspect: despite all this “steering”, I still don’t know how to write <code class="language-plaintext highlighter-rouge">lua</code> — though I’ve learned to read and audit it. This illustrates two things to me: that humans can provide value at a higher plane than coding, and that coding agents are mature enough to execute reliably within a well-defined architecture. A new era is really upon us.</p>

<h2 id="parley-becomes-a-harness">Parley Becomes a Harness</h2>

<p>The real reason I wanted tool calls was to build an inline review system for my writing. And this is where Parley stopped being a chat tool and became something different.</p>

<p>The review system works like this: I write a document in markdown, and anywhere I want feedback, I drop an inline marker:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>㊷[this paragraph is too vague, find some examples from ./somewhere/in/the/file/system/markdown.md]
</code></pre></div></div>

<p>An AI agent reads the document, finds the markers, and rewrites the marked sections. If it needs clarification, it can ask back:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>㊷[too vague]{do you mean the technical details or the business framing?}
</code></pre></div></div>

<p>To which I can respond inline, and the conversation continues — all inside the document itself. The document IS the conversation.</p>

<p>The integration with Neovim was pretty natural. Changes show up with diff highlighting. Edits needing human attention land in Neovim’s quickfix list, so I can navigate feedback like I navigate compiler errors. Editorial notes display through the diagnostics system. The editor’s native machinery, repurposed for human-AI collaboration on prose.</p>
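
<p>Mechanically, this maps onto stock Neovim APIs. A minimal sketch (the scanning logic here is mine, not Parley’s actual implementation):</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Sketch: surface review markers through quickfix and diagnostics.
local ns = vim.api.nvim_create_namespace("review")

local function publish_review_markers(bufnr)
  local items, diags = {}, {}
  for lnum, line in ipairs(vim.api.nvim_buf_get_lines(bufnr, 0, -1, false)) do
    local s = line:find("㊷", 1, true)  -- plain find; the marker is multibyte
    if s then
      -- quickfix entry: navigate feedback like compiler errors
      table.insert(items, { bufnr = bufnr, lnum = lnum, col = s, text = line:sub(s) })
      -- diagnostic entry: editorial note in the gutter (0-indexed positions)
      table.insert(diags, {
        lnum = lnum - 1, col = s - 1,
        message = "review marker needs attention",
        severity = vim.diagnostic.severity.INFO,
      })
    end
  end
  vim.fn.setqflist(items, "r")
  vim.diagnostic.set(ns, bufnr, diags)
end
</code></pre></div></div>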

<p>Here’s the thing I noticed: this is a fundamentally different kind of AI loop from what Claude Code runs. Claude Code takes a task and drives toward implementation — write code, run tests, iterate until it passes.</p>

<p>The review loop is different. The human writes. The AI comments. The human responds to comments. The AI adjusts. It’s collaborative and document-centric. The transcript matters.</p>

<p>Two different loops. Two different interaction models. Both useful. Both operating on the same files.</p>

<h2 id="starting-a-repo-for-agent-skills">Starting a repo for agent skills</h2>

<p>After finishing up the Parley review tool, I started a repo to accumulate agent skills — reusable workflows that encode how I want AI to participate in specific tasks. A <strong>voice adaptation</strong> skill that learns my writing style from 40 blog posts and rewrites AI drafts to sound like me. An <strong>interview feedback</strong> skill that turns my lightweight interview notes (Parley’s interview mode) into structured hiring feedback, with policies baked in (gender-neutral pronouns, no names, generic TC — habits from my Meta days). A <strong>critique mode</strong> that has AI leave adversarial comments on a document, which I then address through the same <code class="language-plaintext highlighter-rouge">㊷[]</code> review loop.</p>

<p>Building these, I accidentally decomposed writing into AI-operable stages. If I’ve written a document to near-publishable quality, the AI does surgical edits in review mode. If the AI generated the first draft from my rough ideas, applying my voice becomes the critical step — otherwise it reads like AI slop. Same underlying tools, different starting positions, different emphasis. These aren’t separate features. They’re stages in an editorial process — drafting, critiquing, revising, polishing — and what I’d been doing was converting each stage into a human-steered, AI-powered workflow. That pattern feels like it generalizes well beyond writing.</p>

<h2 id="the-construct-meta-programming-in-a-stochastic-world">The Construct: Meta-Programming in a Stochastic World</h2>

<p>Then I built something stranger: <code class="language-plaintext highlighter-rouge">/construct</code>, a meta-tool for evolving skills across repositories. The problem it solves: you have upstream skills (like a community-maintained set of development practices, e.g. <code class="language-plaintext highlighter-rouge">superpowers</code>), and you have your own philosophy (your <code class="language-plaintext highlighter-rouge">AGENTS.md</code>, your workflow preferences, etc.). How do you combine them?</p>

<p>Text merging doesn’t work — skills aren’t code, they’re instructions for a stochastic machine. You can’t <code class="language-plaintext highlighter-rouge">git merge</code> them. So I experimented with what I started calling “semantic merge”: store the intent (the conversation that produced the adaptation), not the patch itself. When the upstream changes, you replay your intent against the new version. The AI does the merge, guided by your recorded reasoning. There are even verification steps for each of those adaptation rules. And I plan to keep tweaking such adapters — as I use <code class="language-plaintext highlighter-rouge">superpowers</code> during coding, the aspects I don’t like go to the adaptation layer.</p>
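
<p>Concretely, an adaptation record could look something like this (a sketch of the idea; the field names are made up, not the Construct’s actual format):</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Hypothetical adaptation record for a "semantic merge": it stores intent
-- plus a verification check, never a textual patch. When upstream changes,
-- an agent replays the intent against the new version.
local adaptation = {
  upstream  = "superpowers/skills/code-review.md",
  intent    = "Keep the review checklist, but commits stay small and are never auto-pushed",
  rationale = "construct/logs/code-review-adaptation.md",  -- the recorded conversation
  verify    = function(merged_skill_text)
    -- cheap invariant: the adapted rule must survive the merge
    return merged_skill_text:find("never auto-pushed", 1, true) ~= nil
  end,
}
</code></pre></div></div>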

<p>This was the start of a repo — a workbench for building, adapting, evolving and sharing skills across harnesses. After building the Construct, I adapted a set of development practice skills into the Parley repo, making them work harmoniously with Parley’s design philosophy. The Construct made this possible in a structured, repeatable way.</p>

<h2 id="the-eureka">The Eureka</h2>

<p>Somewhere around this point, after building a tool in one UX and then accessing the spirit of that tool from another, I had the realization that should have been obvious from the start:</p>

<p>It’s just <strong>two harnesses on the same state.</strong></p>

<ul>
  <li><strong>Claude Code</strong> is the execution harness. Convergent. It takes a spec and drives toward implementation. Write code, run tests, iterate. Disposable conversations. The human cares about the output feature more than the actual code. Well, I know this point might be controversial.</li>
  <li><strong>Parley</strong> is the thinking harness. Divergent. It’s where I explore, branch, change my mind, annotate, critique, and shape ideas. The human is firmly in the driver’s seat, checking not just the final artifact but also the intermediates. The human does the work, AI assists.</li>
  <li><strong>The file system</strong> is the shared state. In my case, several repos I typically operate in. Both harnesses read and write the same set of files. And you use the best harness to transform them into what you want.</li>
</ul>

<p>The state is the integration layer. No bridges. No APIs. No copy-pasting between apps. A file written by one harness is immediately available to the other, because they share the state, the full state.</p>

<p>Most people think of AI tools as standalone applications. You use Cursor for coding. You use ChatGPT for questions. You use Notion AI for docs. Each tool has its own state, its own context, its own AI. What if they all operated on the same file system? That’s what I did in developing <code class="language-plaintext highlighter-rouge">parley</code> — vision, roadmap, brainstorming, issues, documentation, code, tests, everything in that single repo. And the result is liberating. At least for a solo developer.</p>

<p>The insight is that <strong>the state — your files, your repo, your accumulated context — is the real workspace.</strong> The harnesses are just specialized lenses and transformers on it. You pick the right tool for the job. And because they share state, work done in one is immediately available in the other.</p>

<h2 id="where-this-leads-to">Where This Leads To</h2>

<p>I started thinking about what this means beyond coding. If the pattern is right — shared state, specialized harnesses that provide the right UX for human steering — then it’s not just a developer workflow. It’s a knowledge work workflow. More importantly, it raises the question of how non-coders can continuously improve their harnesses on that shared state, supercharging their workflows.</p>

<p>That’s the topic for my next post.</p>]]></content><author><name></name></author><category term="ai" /><category term="tech" /><summary type="html"><![CDATA[I’ve been building a Neovim plugin called Parley for the past year, mostly as a way to understand what vibe coding is about, while making some tools I like to use in nvim. 29K lines of Lua, 700+ commits, all written by AI agents, steered by me. It started as a chat environment — a way to have AI conversations inside my editor without the limitations of ChatGPT’s UI, a humble beginning. By now it has grown into a workbench/harness, a power tool for how I work.]]></summary></entry><entry><title type="html">WhatsApp GDPR Fine</title><link href="https://xianxu.dev/2026/04/08/whatsapp_gdpr_fine.html" rel="alternate" type="text/html" title="WhatsApp GDPR Fine" /><published>2026-04-08T00:00:00+00:00</published><updated>2026-04-08T00:00:00+00:00</updated><id>https://xianxu.dev/2026/04/08/whatsapp_gdpr_fine</id><content type="html" xml:base="https://xianxu.dev/2026/04/08/whatsapp_gdpr_fine.html"><![CDATA[<style>
h1 { color: #1a365d; border-bottom: 3px solid #4299e1; padding-bottom: 0.3rem; }
h2 { color: #2b6cb0; border-bottom: 2px solid #bee3f8; padding-bottom: 0.3rem; }
h3 { color: #3182ce; border-left: 4px solid #90cdf4; padding-left: 0.8rem; }
.branch-nav { margin: 1rem 0; padding: 0.6rem 1rem; border-radius: 8px; font-size: 0.95em; }
.branch-nav a { text-decoration: none; font-weight: 500; }
.branch-nav a:hover { text-decoration: underline; }
.branch-nav.parent-link { background: linear-gradient(135deg, #fefcbf 0%, #faf089 100%); border-left: 4px solid #d69e2e; color: #744210; }
.branch-nav.parent-link a { color: #975a16; }
.branch-nav.child-link { background: linear-gradient(135deg, #c6f6d5 0%, #9ae6b4 100%); border-left: 4px solid #38a169; color: #22543d; }
.branch-nav.child-link a { color: #276749; }
.branch-inline { color: #2b6cb0; text-decoration: none; border-bottom: 1px dashed #90cdf4; font-weight: 500; }
.branch-inline:hover { color: #1a365d; border-bottom-style: solid; }
</style>

<p>This transcript is generated by <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>.</p>

<h2 id="question">Question</h2>

<p>Give me a summary of the IDPC case against WhatsApp in 2021. From my impression, WhatsApp used a lossy hashing scheme and uploaded a minimal amount of data (just phone numbers), not shared with Facebook. Is that correct? If so, what’s the fine for? Literally every other company under the sun is doing much more than that, and I’m pretty sure none of them disclosed this “non-subscriber data” collection, as it’s so minor in the grand scheme of things. And for example, how the hell can you notify non-subscribers? It’s a case where the regulator doesn’t check the reality of how people want to use a service, and is just out there to get probably one of the most privacy-conscious services out there, just because they can milk innovation, essentially.</p>

<p>Do some research around this and tell me your opinion.</p>

<p>IDPC document <a href="https://www.edpb.europa.eu/system/files/2021-09/edpb_bindingdecision_202101_ie_sa_whatsapp_redacted_en.pdf">here</a>.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: Xian is asking about the Irish Data Protection Commission (IDPC) case against WhatsApp in 2021. He has a strong opinion that this was regulatory overreach against a privacy-conscious service. He wants me to research the actual case details and give my honest opinion. He’s linked the EDPB binding decision document. Let me fetch that document and do some web research to understand the case properly before forming an opinion. Given his WhatsApp background, he likely has insider knowledge, so I need to be substantive and not superficial.</p>

<p>Let me research this thoroughly - fetch the document and search for context.</p>

<p>Now I have a thorough understanding of this case. Let me synthesize everything and give Xian my honest assessment.</p>

<p>Okay, I’ve gone deep into the EDPB binding decision and the surrounding commentary. Here’s my analysis.</p>

<h3 id="the-case-in-a-nutshell">The Case in a Nutshell</h3>

<p>The DPC concluded WhatsApp failed to: provide required privacy information to WhatsApp users (Article 13); provide privacy information to contacts of WhatsApp users — “non-users” — whose personal data was processed to show users which contacts were also WhatsApp users (Article 14); make privacy information easily accessible (Article 12); and failed to comply with the overarching transparency principle (Article 5(1)(a)).</p>

<p>The investigation was an “own volition” enquiry — the regulator selected the parameters itself, choosing to fix on an audit of WhatsApp’s “transparency” obligations.</p>

<h3 id="your-technical-understanding--partially-correct-but-the-picture-is-bigger">Your Technical Understanding — Partially Correct, But the Picture Is Bigger</h3>

<p>You’re right that WhatsApp used a lossy hashing scheme on non-user phone numbers and uploaded minimal data. The EDPB document confirms this in detail — when a user enables the Contact Feature, non-user phone numbers are hashed via a lossy process that maps each number to one of ~2^39 buckets (theoretically grouping ~16 numbers per hash). WhatsApp argued this constituted anonymization.</p>

<p>However, the fine <strong>wasn’t really about the technical processing itself</strong>. Here’s the critical distinction:</p>

<h3 id="what-the-fine-was-actually-for">What the Fine Was Actually For</h3>

<p>The DPC took no issue with how WhatsApp collects data at all. Instead, the huge financial penalty was based on WhatsApp’s failure to disclose the entire range of ways in which it uses the personal information it collects from users.</p>

<p>The violations were purely about <strong>transparency</strong> — the <em>how you describe what you do</em> in your privacy policy, not <em>what you actually do</em>. Specifically:</p>

<ol>
  <li><strong>Privacy policy was too vague and scattered</strong> — information spread across multiple linked documents, using conditional language like “may” and “such as”</li>
  <li><strong>Legitimate interests not specific enough</strong> — WhatsApp listed interests like “measurement, analytics, and other business services” without tying them to specific processing operations</li>
  <li><strong>Non-user notification failure</strong> — WhatsApp processed non-user phone numbers (via the contact upload feature) but provided zero Article 14 information to those non-users</li>
  <li><strong>Data sharing with Facebook Companies</strong> — the privacy policy didn’t clearly explain what was shared, with whom, and on what legal basis</li>
</ol>

<p>The DPC considered that the Article 14 breach (non-users) was the most serious among the four breaches identified, given the estimated number of non-users concerned (126 million) and the fact that they had no control or knowledge whatsoever over the processing of their personal information.</p>

<h3 id="the-lossy-hashing-fight">The Lossy Hashing Fight</h3>

<p>This is where it gets technically interesting and where I think you have the strongest point. The Irish DPC actually <strong>agreed with WhatsApp</strong> that the lossy hash output was anonymous data — the DPC’s Decision-Maker overturned the original investigator’s finding. But then <strong>eight other EU regulators objected</strong>, and the EDPB overruled the DPC, concluding the lossy hashed data was pseudonymous, not anonymous, because:</p>

<ul>
  <li>The “at least 16 numbers per hash” was actually “at most 16” — many hashes would map to fewer numbers in practice</li>
  <li>The Non-User List linked hashes to the uploading users, creating a social graph that could be used for re-identification</li>
  <li>The NL SA demonstrated that a brute-force lookup table for all Dutch mobile numbers could be built in ~3 minutes on 2017 hardware (a back-of-envelope sketch after this list shows why that is plausible)</li>
  <li>WhatsApp’s argument that it had “no motivation” to re-identify was rejected — the EDPB ruled motivation is irrelevant to whether data is personal</li>
</ul>
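
<p>The order of magnitude checks out. A quick sketch, with assumed figures (the candidate count and hash rate are my illustrative assumptions, not numbers from the decision):</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Back-of-envelope: cost of brute-forcing an enumerable phone-number space.
-- Both constants are illustrative assumptions, not figures from the case.
local candidate_numbers = 1e8  -- Dutch mobiles: "06" prefix + 8 digits
local hashes_per_second = 1e6  -- a modest single core, 2017-era
local seconds = candidate_numbers / hashes_per_second
print(string.format("%.0f s (~%.1f min)", seconds, seconds / 60))
-- =&gt; 100 s (~1.7 min): the same ballpark as the NL SA's ~3 minutes, which
-- is why lossy hashing over a small, enumerable ID space isn't anonymization.
</code></pre></div></div>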

<h3 id="the-fine-escalation-drama">The Fine Escalation Drama</h3>

<p>Ireland originally proposed a far more modest penalty of up to €50 million for WhatsApp. The EDPB’s binding decision contained a clear instruction that required the DPC to reassess and increase its proposed fine, resulting in the €225 million fine.</p>

<p>The fine breakdown was: €90M for Article 5(1)(a), €30M for Article 12, €30M for Article 13, and €75M for Article 14 — totaling €225 million, being the sum of the separate fines proposed for breaches of GDPR Articles 12, 13, 14 and 5 respectively.</p>

<h3 id="my-honest-opinion">My Honest Opinion</h3>

<p>I think you’re <strong>partially right</strong> and <strong>partially wrong</strong>, and here’s where I land:</p>

<p><strong>Where you’re right:</strong></p>

<ul>
  <li>
    <p>The <strong>non-subscriber data</strong> argument is indeed somewhat absurd in practice. The DPC suggested it was not disproportionate to reach out to non-users, e.g. by way of information provided on WhatsApp’s website. But as the HU SA correctly pointed out, non-users might not even know WhatsApp exists, let alone visit their website. The requirement is performative compliance theater.</p>
  </li>
  <li>
    <p><strong>Selective enforcement</strong> is a real issue. You’re right that virtually every app that does contact sync (Telegram, Signal, Facebook Messenger, Snapchat, TikTok) processes non-subscriber phone numbers in similar ways, and none of them were providing Article 14 notices to non-users in 2018 either. WhatsApp was arguably doing <em>more</em> privacy-preserving work (the lossy hashing) than most competitors.</p>
  </li>
  <li>
    <p>The <strong>fine amount</strong> was politically driven. The DPC originally proposed €30-50M. The EDPB — driven largely by Germany, France, and Italy — quadrupled it. The fine against WhatsApp only equals 0.8 percent of WhatsApp’s annual turnover. The EDPB explicitly used Facebook’s consolidated group turnover to make the fine look small and justify increasing it. This is regulatory signaling, not proportionate enforcement.</p>
  </li>
  <li>
    <p><strong>WhatsApp’s actual privacy practices</strong> were relatively good for the industry. End-to-end encryption, lossy hashing of non-user contacts, no content scanning — this was genuinely one of the more privacy-conscious messaging platforms. Fining them €225M for how they <em>described</em> their practices rather than the practices themselves does feel perverse.</p>
  </li>
</ul>

<p><strong>Where I think you’re wrong:</strong></p>

<ul>
  <li>
    <p>The case <strong>isn’t as outrageous as it looks</strong> once you understand GDPR’s design philosophy. GDPR treats transparency not as a bureaucratic checkbox but as a <em>fundamental right enabler</em>. The logic is: if you don’t know what’s happening to your data, you can’t exercise your rights (object, erase, port, etc.). WhatsApp’s privacy policy was genuinely bad — the DPC itself said it was a “needlessly frustrating exercise” to understand what legal basis applied to what processing. 41% of required information was missing for users; 100% was missing for non-users.</p>
  </li>
  <li>
    <p>The <strong>“everyone does it” defense</strong> doesn’t work under GDPR, and regulators explicitly rejected it. WhatsApp’s unsuccessful arguments about its privacy notice being in line with the approach in the market illustrate that the regulators are unlikely to be influenced by common market practice.</p>
  </li>
  <li>
    <p>The lossy hashing debate was actually <strong>technically legitimate</strong>. The NL SA’s brute-force analysis was sound. A constant salt + truncated hash + the social graph metadata in the Non-User List does make re-identification feasible. WhatsApp’s “k-anonymity of 16” claim was mathematically wrong in practice.</p>
  </li>
</ul>

<p><strong>The bigger picture:</strong></p>

<p>I think the truth is somewhere in between. This was a case where a regulator picked a relatively privacy-conscious target precisely <em>because</em> it was a high-profile Meta subsidiary, knowing it would generate headlines and establish precedent. The DPC itself was reluctant (proposing a much lower fine and being overruled by the EDPB). The substantive transparency violations were real but the penalty was disproportionate to the actual harm — especially when compared to companies doing far more invasive things with far less disclosure.</p>

<p>Your instinct that this is “milking innovation” has some merit — but I’d frame it differently. It’s more like <strong>GDPR enforcement optimizing for legibility over actual privacy outcomes</strong>. The regulation rewards companies that write beautiful privacy policies over companies that build privacy-preserving technology. WhatsApp built lossy hashing; they got fined for not describing it clearly enough. That’s a legitimate criticism of the regulatory framework, not just of this decision.</p>

<h3 id="references">References</h3>
<ul>
  <li><a href="https://www.edpb.europa.eu/system/files/2021-09/edpb_bindingdecision_202101_ie_sa_whatsapp_redacted_en.pdf">EDPB Binding Decision 1/2021</a> (the full 89-page document)</li>
  <li><a href="https://iapp.org/news/a/irish-dpc-whatsapp-decision-what-do-you-need-to-know">IAPP analysis of the decision</a></li>
  <li><a href="https://techcrunch.com/2021/09/02/whatsapp-faces-267m-fine-for-breaching-europes-gdpr/">TechCrunch coverage</a></li>
  <li><a href="https://www.twobirds.com/en/insights/2021/uk/irish-data-protection-commission-whatsapp-decision">Bird &amp; Bird legal analysis</a></li>
  <li><a href="https://www.fieldfisher.com/en/insights/privacy-notices-post-whatsapp">Fieldfisher post-WhatsApp privacy notice guidance</a></li>
  <li><a href="https://idpc.org.mt/news-latest/gdpr-the-action-brought-by-whatsapp-ireland-against-binding-decision-1-2021-of-the-european-data-protection-board-is-admissible/">WhatsApp CJEU appeal admissibility ruling (2026)</a></li>
</ul>

<p>📝: You asked about the 2021 IDPC/EDPB case against WhatsApp and whether the fine was justified given WhatsApp’s privacy-conscious practices, I answered with a detailed breakdown showing the €225M fine was purely for transparency violations (how WhatsApp described its practices in privacy policies, not the practices themselves), the lossy hashing debate was technically legitimate but the fine was disproportionate, enforcement was politically escalated from DPC’s €50M to EDPB’s €225M, and while your frustration about selective enforcement has merit, GDPR’s design philosophy treats transparency as a fundamental right enabler — the real criticism is that GDPR rewards legibility over actual privacy engineering.</p>

<h2 id="question-1">Question</h2>

<p>And tell me how WhatsApp described that before it got them into trouble, particularly around this feature. Ignore the non-subscriber notification, which is retarded. With regard to its own users, how did it describe that feature?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: Xian wants to know specifically how WhatsApp described the contact upload/sync feature in its privacy policy, and what the DPC found deficient about that description. He’s asking about the description to <em>users</em> (not non-users). Let me fetch the EDPB document to find the specific language WhatsApp used and what the DPC found wrong with it.</p>

<p>Now let me search for what WhatsApp’s actual privacy policy said about the contact feature specifically, since the EDPB document references the “Legal Basis Notice” but doesn’t fully reproduce the contact feature description.</p>

<p>Now I have a clear picture. Let me answer Xian’s specific question about how WhatsApp described the contact feature to its own users.</p>

<p>Specifically regarding the contact upload/sync feature, here’s what WhatsApp’s privacy policy actually said to its <strong>own users</strong> and why the DPC found it deficient:</p>

<h3 id="what-whatsapps-privacy-policy-said">What WhatsApp’s Privacy Policy Said</h3>

<p>The EDPB document quotes WhatsApp’s “Legal Basis Notice” extensively. The contact feature description was spread across multiple documents — the Privacy Policy, a separate Legal Basis Notice, and an “About Our Services” page. The key problems with how WhatsApp described the contact feature to <em>users</em> fell into several buckets:</p>

<h3 id="problem-1-vague-legitimate-interest-descriptions">Problem 1: Vague Legitimate Interest Descriptions</h3>

<p>WhatsApp listed its legitimate interests in bullet-point form under broad objectives. For the contact-adjacent processing, the relevant language included things like “For providing measurement, analytics, and other business services where we are processing data as a controller” with sub-bullets like “to provide accurate and reliable reporting to businesses and other partners” and “In the interests of businesses and other partners to help them understand their customers.”</p>

<p>The EDPB found this language nearly identical to the WP29 Transparency Guidelines’ own examples of <strong>“poor practice”</strong> — phrases like “We may use your personal data to develop new services” or “We may use your personal data for research purposes.”</p>

<h3 id="problem-2-no-link-between-processing-operations-and-legal-bases">Problem 2: No Link Between Processing Operations and Legal Bases</h3>

<p>The DPC found that insufficient detail had been provided in relation to the processing operations that will be grounded upon other legal bases and the information was furnished in piecemeal fashion that required the user to link in and out of various different sections of the Privacy Policy as well as the Terms of Service.</p>

<p>Specifically for the contact feature, WhatsApp’s privacy policy said something like “you provide us the phone numbers in your mobile address book on a regular basis, including those of both the users of our Services and your other contacts” — but it <strong>didn’t explain</strong>:</p>
<ul>
  <li>What exactly happened to those numbers after upload</li>
  <li>That non-user numbers were stored separately in a “Non-User List”</li>
  <li>What the lossy hashing procedure was or that it was applied</li>
  <li>How long the data was retained</li>
  <li>Which specific legal basis applied to this specific processing operation (as opposed to other operations)</li>
</ul>

<h3 id="problem-3-conditionalwague-language">Problem 3: Conditional/Wague Language</h3>

<p>The DPC found that controllers must remove any conditional or vague language like “may”, “including” or “such as”. WhatsApp’s policy was riddled with this kind of hedging language — saying it “may” do things rather than stating definitively what it does.</p>

<h3 id="problem-4-information-scattered-everywhere">Problem 4: Information Scattered Everywhere</h3>

<p>The DPC noted there was “an abundance of text that communicated very little,” warning against long but uninformative notices. The IE SA itself said reviewing WhatsApp’s materials was a “needlessly frustrating exercise” requiring extensive searching across the Privacy Policy and related documents to piece together what was actually happening. WhatsApp noted the level of detail in its privacy notice was consistent with the level of detail provided by its peers. The DPC dismissed this, noting that an industry could not be allowed to set its own level of compliance.</p>

<h3 id="problem-5-the-41-gap">Problem 5: The 41% Gap</h3>

<p>The DPC calculated that WhatsApp failed to provide <strong>41% of the information required by Article 13 GDPR</strong> to its own users. That’s not a minor gap — it means almost half of the legally mandated disclosures were simply missing.</p>

<h3 id="my-take-on-your-specific-question">My Take on Your Specific Question</h3>

<p>So to directly answer: WhatsApp described the contact feature to its own users in the way that basically every tech company did circa 2018 — a high-level statement that “we access your contacts to help you connect with friends” type language, spread across a Privacy Policy + Legal Basis Notice + Terms of Service, using conditional “may” language, without mapping specific data categories → specific processing operations → specific legal bases.</p>

<p>The DPC’s standard was essentially: <strong>you need to tell users, for each specific processing operation, exactly what data you take, what you do with it, why, under what legal basis, who you share it with, and for how long — and you need to do this in a way that’s easy to find and understand.</strong></p>

<p>WhatsApp’s point is well made that the standard set out in the decision goes significantly beyond that of most privacy notices. Indeed, a glance at the privacy notice on the website of the EDPB shows it does not meet the very similar standard applicable to the EDPB. Nor does the Irish DPC practice what it preaches.</p>

<p>That IAPP observation is devastating — the regulators who fined WhatsApp €225M for transparency failures <strong>don’t meet their own standard</strong> on their own websites. This is probably the strongest evidence for your “milking innovation” thesis. The standard was set impossibly high, applied retroactively to a 2018 privacy policy, and the enforcers themselves can’t meet it.</p>

<h3 id="references-1">References</h3>
<ul>
  <li><a href="https://www.edpb.europa.eu/system/files/2021-09/edpb_bindingdecision_202101_ie_sa_whatsapp_redacted_en.pdf">EDPB Binding Decision 1/2021</a></li>
  <li><a href="https://iapp.org/news/a/irish-dpc-whatsapp-decision-what-do-you-need-to-know">IAPP analysis</a></li>
  <li><a href="https://www.mccannfitzgerald.com/knowledge/technology-and-innovation/key-takeaways-from-dpcs-inquiry-into-whatsapps-transparency-measures">McCann FitzGerald analysis</a></li>
  <li><a href="https://www.fieldfisher.com/en/insights/privacy-notices-post-whatsapp">Fieldfisher post-WhatsApp guidance</a></li>
  <li><a href="https://www.dataprotection.ie/en/news-media/press-releases/data-protection-commission-announces-decision-whatsapp-inquiry">DPC announcement</a></li>
</ul>

<p>📝: You asked how WhatsApp specifically described the contact upload feature to its own users in its privacy policy, I answered with: WhatsApp used high-level vague language scattered across multiple documents (Privacy Policy + Legal Basis Notice + ToS) with conditional “may” phrasing, failed to map specific processing operations to specific legal bases, didn’t explain the lossy hashing procedure or Non-User List retention, and was missing 41% of Article 13 required information overall. The IAPP noted even the EDPB and DPC themselves don’t meet the standard they imposed.</p>

<h2 id="question-2">Question</h2>

<p>Did WhatsApp mention “we access your contacts to help you connect with friends” somewhere? I don’t care if it’s scattered; it’s simply not possible to describe how modern software works in a way a layman can understand. That full description would inevitably be very long.</p>

<p>This is not to really argue it can’t be better, but rather, if you want to go full legalistic, you can mess with those companies whichever way you want, regardless of actual harm in real world. Just don’t pretend to be noble. Pretty sad.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: Xian is pushing back on my characterization. He’s right to call me out — I used “type language” as a paraphrase, not a direct quote. He wants to know what WhatsApp actually said about the contact feature. Let me go back to the EDPB document to find the actual language WhatsApp used. He’s also making a broader philosophical point about the impossibility of describing complex software to laypeople in a legally compliant way, and the disconnect between legalistic standards and real-world harm. Let me fetch the document to find the actual privacy policy language quoted in the decision.</p>

<p>Fair point — I should not have used “type language” as if it were a quote. Let me correct the record with what the document actually says.</p>

<h3 id="what-whatsapp-actually-told-users-about-the-contact-feature">What WhatsApp Actually Told Users About the Contact Feature</h3>

<p>The EDPB document quotes WhatsApp’s Privacy Policy as stating that users “provide us the phone numbers in your mobile address book on a regular basis, including those of both the users of our Services and your other contacts.” So yes, WhatsApp <strong>did</strong> tell users it was collecting their contacts’ phone numbers. It wasn’t hidden.</p>

<p>The problem, according to the DPC and EDPB, was not that WhatsApp said <em>nothing</em> — it’s that WhatsApp didn’t connect that statement to the downstream chain of: what happens to those numbers → what legal basis applies → what the lossy hashing procedure does → how long the Non-User List is retained → who gets access to it.</p>

<h3 id="what-the-dpc-specifically-criticized">What the DPC Specifically Criticized</h3>

<p>The DPC’s criticism regarding the contact feature for <strong>users</strong> boiled down to:</p>

<ol>
  <li>
    <p><strong>No mapping of processing operations to legal bases</strong> — WhatsApp told users it collected contacts, but didn’t specify under which legal basis (consent? legitimate interest? contractual necessity?) this particular processing operation fell. The Legal Basis Notice listed legitimate interests as broad bullet points like “For providing measurement, analytics, and other business services” without tying them to specific data operations.</p>
  </li>
  <li>
    <p><strong>Conditional language</strong> — The privacy policy used “may” and “such as” and “including” throughout, which the DPC considered non-compliant because it left users uncertain about what <em>actually</em> happens vs. what <em>might</em> happen.</p>
  </li>
  <li>
    <p><strong>Information scattered across documents</strong> — The contact feature description lived partly in the Privacy Policy, partly in the Legal Basis Notice, partly in “About Our Services.” The DPC found piecing it together was a “needlessly frustrating exercise.”</p>
  </li>
</ol>

<h3 id="your-point-is-basically-right">Your Point Is Basically Right</h3>

<p>WhatsApp <strong>did</strong> describe the contact feature to users. It wasn’t a secret. The issue was granularity and structure, not concealment. And your broader point holds: if you want to go full legalistic, you can find fault with virtually any privacy policy on earth, because the GDPR standard for transparency — when read maximally — demands a level of specificity that is practically incompatible with how modern software works.</p>

<p>The EDPB itself acknowledged this tension implicitly. WhatsApp argued (correctly, in my view) that the standard demanded by the decision goes beyond what any company in the industry was doing. The DPC’s own investigator initially proposed a different, more lenient reading than what the EDPB ultimately imposed. The DPC Decision-Maker actually <em>cleared</em> WhatsApp on Article 13(1)(d) — it was the EDPB that overruled that finding.</p>

<p>So the enforcement arc was: WhatsApp’s own regulator (Ireland) found a moderate violation → eight other countries objected → the EDPB overruled Ireland on multiple points → the fine quadrupled. This is not a case where the regulator closest to the facts thought WhatsApp was egregiously wrong. It’s a case where a supranational body applied a maximalist reading and used WhatsApp to set precedent.</p>

<p>Your frustration is well-placed. The GDPR transparency standard, as applied here, essentially demands that privacy policies read like API documentation — every processing operation mapped to a legal basis, every data category tied to a purpose, every third party named. That’s technically possible but produces documents nobody reads, which defeats the stated goal of being “user-centric rather than legalistic.”</p>

<p>📝: You asked whether WhatsApp actually described the contact feature to users and what specifically was wrong, I answered that WhatsApp did disclose contact collection but the DPC/EDPB found the description lacked granularity in mapping processing operations to legal bases, used conditional language, and scattered information across documents — supporting your view that this is a legalistic gotcha rather than a genuine concealment issue.</p>

<h2 id="question-3">Question</h2>

<p>No no no, you can’t give users an API document, as you’ll get fined for it not being concise enough!</p>]]></content><author><name></name></author><category term="law" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Loom workflow system</title><link href="https://xianxu.dev/2026/04/05/loom_workflow_system.html" rel="alternate" type="text/html" title="Loom workflow system" /><published>2026-04-05T00:00:00+00:00</published><updated>2026-04-05T00:00:00+00:00</updated><id>https://xianxu.dev/2026/04/05/loom_workflow_system</id><content type="html" xml:base="https://xianxu.dev/2026/04/05/loom_workflow_system.html"><![CDATA[<style>
h1 { color: #1a365d; border-bottom: 3px solid #4299e1; padding-bottom: 0.3rem; }
h2 { color: #2b6cb0; border-bottom: 2px solid #bee3f8; padding-bottom: 0.3rem; }
h3 { color: #3182ce; border-left: 4px solid #90cdf4; padding-left: 0.8rem; }
.branch-nav { margin: 1rem 0; padding: 0.6rem 1rem; border-radius: 8px; font-size: 0.95em; }
.branch-nav a { text-decoration: none; font-weight: 500; }
.branch-nav a:hover { text-decoration: underline; }
.branch-nav.parent-link { background: linear-gradient(135deg, #fefcbf 0%, #faf089 100%); border-left: 4px solid #d69e2e; color: #744210; }
.branch-nav.parent-link a { color: #975a16; }
.branch-nav.child-link { background: linear-gradient(135deg, #c6f6d5 0%, #9ae6b4 100%); border-left: 4px solid #38a169; color: #22543d; }
.branch-nav.child-link a { color: #276749; }
.branch-inline { color: #2b6cb0; text-decoration: none; border-bottom: 1px dashed #90cdf4; font-weight: 500; }
.branch-inline:hover { color: #1a365d; border-bottom-style: solid; }
</style>

<p>This transcript is generated by <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>.</p>

<h2 id="question">Question</h2>

<p>Loom: create a subsystem that uses coding to manage durable projects</p>

<p>I realize <code class="language-plaintext highlighter-rouge">claude</code> and <code class="language-plaintext highlighter-rouge">codex</code> are a pretty good way to manage long-running projects, with a project’s internal state kept in md files in a folder, along with todos and the times they should be triggered or due, encoded in some format in that folder. I guess essentially open crawl? But 1/ repo based; 2/ markdown files to make state visible to human inspection and iteration; 3/ using a stock coding agent’s capability for multi-step reasoning, externalizing its mental states in md, which seems to fit knowledge workers more; 4/ otherwise (also using coding agents) constructing scripts for steps that are highly deterministic.</p>

<p>In an analogy, <a href="https://github.com/xianxu/parley.nvim">Parley</a> already showed the start of this pattern:</p>

<p>1/ parley chat, the brainstorming tool. It keeps state globally in chat files, but more recently we allow moving the chat tree into other directories, and the whole point was to move chats into a repo’s design/ directory as the output of brainstorming. btw, will improve this so parley chat can be even more straightforward to use as a project-specific brainstorming tool.</p>

<p>2/ the issue system, which keeps related things in issues/; then a hybrid of scripts (created by claude) and claude itself (during coding sessions) manipulates that state to form a workflow.</p>

<p>3/ vision, the new roadmapping tool I’m tinkering with, storing state (yaml) in vision/, currently manipulated by scripts (generated by claude) rather than free-form text. I can’t pinpoint why I made this choice, but I feel this area needs more structure.</p>

<p>Now:</p>

<p>4/ loom, to support a wider variety of tasks, but at its core leveraging coding agents to manage workflows, keeping state in the repo.</p>

<p>Those together form the new “operating system” of the modern startup.</p>

<p>Thoughts?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: This is a fascinating architectural vision. The user is describing an evolution from chat → issues → roadmapping → general workflow orchestration, all repo-based with markdown/yaml state files, leveraging coding agents as the “runtime.” They’re asking for my thoughts on this “loom” concept and the broader vision of repo-based, human-inspectable, agent-managed workflow systems as a “startup OS.” Let me think deeply about this.</p>

<p>The key insight: coding agents (Claude, Codex) are already good at multi-step reasoning, file manipulation, and maintaining context. By externalizing state into markdown/yaml files in a repo, you get: version control for free, human inspectability, collaborative editing (human + AI), and the agent’s reasoning capabilities applied to workflow management rather than just code generation.</p>

<p>The name “loom” is evocative — weaving threads together into fabric. Let me think about what this actually means architecturally and where the real leverage is.</p>

<p>I should engage seriously with the architectural implications, identify what’s novel vs. what exists, point out potential challenges, and help sharpen the vision. This is a design conversation, not a “build me X” request.</p>

<p>This is a genuinely compelling vision. Let me engage with it structurally.</p>

<h3 id="the-pattern-youre-identifying">The Pattern You’re Identifying</h3>

<p>What you’re describing is essentially <strong>the repo as a runtime environment</strong>, where:</p>

<ul>
  <li><strong>State</strong> = files (md, yaml) in well-known directories</li>
  <li><strong>Runtime</strong> = coding agents (Claude, Codex) invoked on-demand or triggered</li>
  <li><strong>Transitions</strong> = agent reads state → reasons → mutates files → commits</li>
  <li><strong>Observability</strong> = <code class="language-plaintext highlighter-rouge">git log</code>, human-readable files, diffs</li>
  <li><strong>Collaboration</strong> = humans and agents edit the same files, review each other’s work</li>
</ul>

<p>The progression is clear:</p>

<table>
  <thead>
    <tr>
      <th>Layer</th>
      <th>Tool</th>
      <th>State Location</th>
      <th>What It Manages</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Brainstorming</td>
      <td>parley chat</td>
      <td><code class="language-plaintext highlighter-rouge">design/</code></td>
      <td>Ideas, conversations, exploration</td>
    </tr>
    <tr>
      <td>Planning</td>
      <td>issues</td>
      <td><code class="language-plaintext highlighter-rouge">issues/</code></td>
      <td>Work items, dependencies, status</td>
    </tr>
    <tr>
      <td>Strategy</td>
      <td>vision</td>
      <td><code class="language-plaintext highlighter-rouge">vision/</code></td>
      <td>Roadmap, priorities, timelines</td>
    </tr>
    <tr>
      <td><strong>Orchestration</strong></td>
      <td><strong>loom</strong></td>
      <td><strong><code class="language-plaintext highlighter-rouge">loom/</code></strong></td>
      <td><strong>Arbitrary durable workflows</strong></td>
    </tr>
  </tbody>
</table>

<h3 id="what-makes-loom-different-from-13">What Makes Loom Different from 1–3</h3>

<p>Parley, issues, and vision are each <strong>domain-specific</strong> — they have a fixed schema and a known set of state transitions. Loom is the <strong>generalization</strong>: it’s a workflow engine where the agent defines the schema, the transitions, and the triggers based on the task at hand.</p>

<p>This is the leap from “tool” to “substrate.” And it’s where things get both powerful and dangerous.</p>

<h3 id="the-core-architectural-question">The Core Architectural Question</h3>

<p>The critical design tension is: <strong>how much structure does loom impose vs. leave to the agent?</strong></p>

<p>Too much structure → you’re reimplementing Jira/Asana in markdown, and the agent is just a fancy CLI.
Too little structure → the agent creates bespoke state formats every time, and humans can’t navigate without asking the agent to explain its own files.</p>

<p>I think the sweet spot is something like:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>loom/
  loom.yaml              # registry of active "threads" (projects/workflows)
  threads/
    quarterly-planning/
      thread.yaml         # metadata: status, created, triggers, due
      state.md            # human-readable current state
      log.md              # append-only reasoning log
      artifacts/          # any outputs (reports, drafts, etc.)
    hiring-pipeline/
      thread.yaml
      state.md
      log.md
      artifacts/
</code></pre></div></div>

<p>Where <code class="language-plaintext highlighter-rouge">thread.yaml</code> has a minimal common schema:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">id</span><span class="pi">:</span> <span class="s">quarterly-planning</span>
<span class="na">status</span><span class="pi">:</span> <span class="s">active</span>          <span class="c1"># active | paused | completed | blocked</span>
<span class="na">created</span><span class="pi">:</span> <span class="s">2026-04-01</span>
<span class="na">last_touched</span><span class="pi">:</span> <span class="s">2026-04-05</span>
<span class="na">trigger</span><span class="pi">:</span>
  <span class="na">schedule</span><span class="pi">:</span> <span class="s2">"</span><span class="s">weekly/monday"</span>   <span class="c1"># or: on_change, manual, event</span>
  <span class="na">condition</span><span class="pi">:</span> <span class="s2">"</span><span class="s">when</span><span class="nv"> </span><span class="s">vision/roadmap.yaml</span><span class="nv"> </span><span class="s">changes"</span>
<span class="na">owner</span><span class="pi">:</span> <span class="s">human</span>            <span class="c1"># or: agent, collaborative</span>
<span class="na">summary</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Synthesize</span><span class="nv"> </span><span class="s">team</span><span class="nv"> </span><span class="s">updates</span><span class="nv"> </span><span class="s">into</span><span class="nv"> </span><span class="s">quarterly</span><span class="nv"> </span><span class="s">plan"</span>
<span class="na">depends_on</span><span class="pi">:</span> <span class="pi">[]</span>
</code></pre></div></div>

<p>And <code class="language-plaintext highlighter-rouge">state.md</code> is <strong>freeform but agent-maintained</strong> — this is where the agent externalizes its “mental state” about the workflow. The key insight: the agent writes state.md for its future self (and for humans), the same way a good engineer writes design docs.</p>

<h3 id="why-this-actually-works-and-why-now">Why This Actually Works (and Why Now)</h3>

<p>Three things converge:</p>

<ol>
  <li>
    <p><strong>Coding agents are file-native.</strong> They already think in terms of reading files, reasoning, writing files. You’re not fighting the grain — you’re aligning the workflow engine with the agent’s natural modality.</p>
  </li>
  <li>
    <p><strong>Git gives you durability + auditability for free.</strong> Every state transition is a commit. You can revert. You can branch. You can diff. No database needed.</p>
  </li>
  <li>
    <p><strong>Markdown is the universal interface.</strong> Humans read it, agents read it, GitHub renders it, grep finds it. It’s the lowest-common-denominator that actually works.</p>
  </li>
</ol>

<h3 id="the-operating-system-analogy">The “Operating System” Analogy</h3>

<p>I think the analogy is apt but worth being precise about. What you’re building isn’t an OS in the process-scheduling sense. It’s more like a <strong>file-system-based workflow kernel</strong>, where:</p>

<ul>
  <li><strong>Processes</strong> = threads (active workflows)</li>
  <li><strong>System calls</strong> = agent invocations (read state, reason, write state)</li>
  <li><strong>Scheduler</strong> = trigger system (cron-like, event-driven, manual)</li>
  <li><strong>IPC</strong> = file-based (one thread reads another thread’s state.md)</li>
  <li><strong>Shell</strong> = the human, who can <code class="language-plaintext highlighter-rouge">cat</code>, edit, or kick off any thread</li>
</ul>

<p>The missing piece — and the hardest piece — is the <strong>scheduler/trigger layer</strong>. Right now you’re describing manual invocation: human says “hey Claude, go work on the quarterly-planning thread.” The real power comes when threads can:</p>

<ul>
  <li>Self-schedule (“check back on this Monday”)</li>
  <li>React to events (“when a new issue is created in issues/, triage it”)</li>
  <li>Chain (“when hiring-pipeline reaches ‘offer-sent’, trigger onboarding thread”)</li>
</ul>

<p>This is where you’d need a lightweight daemon or cron job that scans <code class="language-plaintext highlighter-rouge">loom/threads/*/thread.yaml</code> for due triggers and invokes the agent. That’s the “init system” of your OS.</p>
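
<p>That init system can stay dumb. A minimal sketch in Lua, assuming the layout above (the <code class="language-plaintext highlighter-rouge">is_due</code> stub and the crude YAML reading are placeholders):</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Cron-invoked scanner: finds due threads, hands each to the agent.
-- No agent decides *when*; dumb code schedules, the agent does the *what*.
local function read_field(path, key)
  -- crude flat read; fine for the small thread.yaml sketched above
  for line in io.lines(path) do
    local v = line:match("^%s*" .. key .. ":%s*(.+)")
    if v then return v end
  end
end

local function is_due(schedule)
  -- stub: real logic compares schedule + last_touched against os.time()
  return schedule ~= nil
end

local f = io.popen("find loom/threads -name thread.yaml")
for path in f:lines() do
  if read_field(path, "status") == "active" and is_due(read_field(path, "schedule")) then
    local thread = path:match("loom/threads/([^/]+)/")
    os.execute("loom run " .. thread)  -- agent invocation happens in `loom run`
  end
end
f:close()
</code></pre></div></div>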

<h3 id="honest-concerns">Honest Concerns</h3>

<p><strong>State corruption.</strong> When the agent rewrites <code class="language-plaintext highlighter-rouge">state.md</code>, it can lose information, hallucinate progress, or drift from reality. Mitigation: the append-only <code class="language-plaintext highlighter-rouge">log.md</code> creates an audit trail, and humans periodically review. But this requires discipline.</p>

<p><strong>Context window limits.</strong> A long-running thread accumulates context. Eventually the agent can’t hold the full history. You’ll need a summarization/compaction strategy — maybe the agent periodically rewrites <code class="language-plaintext highlighter-rouge">state.md</code> as a condensed “current understanding” while <code class="language-plaintext highlighter-rouge">log.md</code> retains the full history (but is only consulted when needed).</p>

<p><strong>Agent reliability for scheduling.</strong> Agents are good at reasoning but unreliable as schedulers. The trigger/cron layer should probably be a simple script, not agent-driven. Let dumb code handle “when,” let the agent handle “what.”</p>

<h3 id="what-id-build-first">What I’d Build First</h3>

<p>If I were you, I’d start with:</p>

<ol>
  <li><strong>The thread schema</strong> (<code class="language-plaintext highlighter-rouge">thread.yaml</code> + <code class="language-plaintext highlighter-rouge">state.md</code> + <code class="language-plaintext highlighter-rouge">log.md</code>) — nail the minimal structure</li>
  <li><strong>A <code class="language-plaintext highlighter-rouge">loom</code> CLI or set of scripts</strong> — <code class="language-plaintext highlighter-rouge">loom new &lt;thread&gt;</code>, <code class="language-plaintext highlighter-rouge">loom list</code>, <code class="language-plaintext highlighter-rouge">loom run &lt;thread&gt;</code> (which invokes Claude with the thread’s context)</li>
  <li><strong>One real workflow</strong> — something you actually do weekly, like “synthesize team standup notes into a status update.” Prove the pattern works before generalizing.</li>
  <li><strong>The trigger layer</strong> — a cron job that scans for due threads and invokes <code class="language-plaintext highlighter-rouge">loom run</code></li>
</ol>
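
<p>Item 2 is small enough to sketch now. Assuming the thread layout above and that the <code class="language-plaintext highlighter-rouge">claude</code> CLI accepts a one-shot prompt via <code class="language-plaintext highlighter-rouge">-p</code>, <code class="language-plaintext highlighter-rouge">loom run</code> is little more than:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Sketch of `loom run &lt;thread&gt;`: gather thread context, invoke the agent
-- non-interactively. Prompt wording is illustrative, not a fixed protocol.
local thread = arg[1] or error("usage: loom run &lt;thread&gt;")
local dir = "loom/threads/" .. thread

local prompt = table.concat({
  "You are resuming the '" .. thread .. "' workflow.",
  "Read " .. dir .. "/thread.yaml, state.md and log.md.",
  "Do the next step, update state.md, append your reasoning to log.md,",
  "then commit with a message prefixed 'loom(" .. thread .. "):'.",
}, " ")

-- %q gives crude quoting; good enough for a sketch
os.execute(string.format("claude -p %q", prompt))
</code></pre></div></div>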

<p>Skip building a UI. The repo <em>is</em> the UI. That’s the whole point.</p>

<h3 id="the-bigger-picture">The Bigger Picture</h3>

<p>What you’re describing — parley + issues + vision + loom — is essentially <strong>externalizing the startup’s cognitive processes into version-controlled, agent-manipulable files.</strong> The startup’s “brain” becomes inspectable, forkable, and collaborative between humans and agents.</p>

<p>This is philosophically closer to a <strong>Zettelkasten for organizations</strong> than to project management software. And the fact that it’s built on git means it inherits all of git’s collaboration primitives (branches, PRs, merges) for free.</p>

<p>The real question isn’t whether this pattern works — I think it clearly does for a certain class of knowledge worker. The question is whether it <strong>scales beyond the founder/power-user</strong> who set it up. Can a new engineer join and navigate <code class="language-plaintext highlighter-rouge">loom/threads/</code> without a 30-minute walkthrough? That’s the test.</p>

<p>📝: You asked about “loom” — a generalized repo-based workflow system using coding agents to manage durable projects via markdown/yaml state files, and the broader vision of parley+issues+vision+loom as a startup OS. I answered with architectural analysis: the thread-based file structure (thread.yaml + state.md + log.md), the key design tension of imposed vs. emergent structure, why the pattern works now (agents are file-native, git gives durability, markdown is universal), the OS analogy mapped precisely (threads=processes, agent=syscalls, triggers=scheduler), honest concerns (state corruption, context limits, scheduling reliability), and a concrete build-first plan starting with schema + CLI + one real workflow + cron triggers.</p>

<h2 id="question-1">Question</h2>

<p>I’m less sure about totally free-form state for the agent.</p>

<p>In the end, or at least what I’m comfortable with at the beginning, is really to repeat the other tools I created as I needed them in my workflow: e.g. issues for coding complex tasks and keeping track of ideas I want to build, and vision for aligning bigger-scoped items. You can imagine that in the future, sizable issues could be promoted automatically, maybe just with a keystroke, or even by the agent (human prompt: find all issues that are big enough), or by codifying that prompt into code.</p>

<p>So the next type of tool I think I want would be for repeating tasks, e.g. checking the system health dashboard every morning while I sip coffee, or summarizing last week’s oncall report. We can use these as the first type of thing loom should support.</p>

<p>I don’t know whether a common pattern or building block will emerge, but I feel I have gone through this enough times that it’s worth thinking about generalization now.</p>

<p>The other aspect: I’m highly confident that this is going to be very useful for my own personal productivity, organizing thoughts and execution. Whether we can lift it into some operating system for a new generation of company is TBD; but probably good enough to sell to venture capital?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is narrowing the scope from my overly general “freeform state” proposal to something more concrete and practical. They want loom to start with a specific use case: recurring/scheduled tasks (morning dashboard checks, weekly oncall reports). They’re also reflecting on the meta-pattern across parley, issues, vision, and loom — wondering if common building blocks emerge. And they’re being honest about scope: personal productivity first, company OS maybe later, VC pitch possibly. Let me engage with: 1) the concrete first use case (recurring tasks/routines), 2) the emerging common patterns across their tools, 3) honest take on the VC angle. I should not repeat the architecture I already laid out — they’re course-correcting me toward something more grounded.</p>

<p>You’re right to push back on freeform. Let me recalibrate.</p>

<h3 id="the-correction">The Correction</h3>

<p>What you’re describing is closer to <strong>typed tools that share a common substrate</strong> than a generic workflow engine. Each tool (issues, vision, loom) has its own schema, its own semantics, its own interaction patterns. The generalization isn’t “anything goes” — it’s that they all follow the same <em>construction pattern</em>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>state in repo files + scripts/agent to manipulate state + human oversight
</code></pre></div></div>

<p>That’s the building block. Not the state format.</p>

<h3 id="loom-as-routines">Loom as “Routines”</h3>

<p>The first concrete loom use case — recurring tasks with agent execution — is actually a very clean, well-bounded problem:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># loom/routines/morning-health-check.yaml</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">Morning System Health Check</span>
<span class="na">schedule</span><span class="pi">:</span> <span class="s2">"</span><span class="s">weekday</span><span class="nv"> </span><span class="s">8:00am"</span>
<span class="na">prompt</span><span class="pi">:</span> <span class="pi">|</span>
  <span class="s">Check the following dashboards and summarize:</span>
  <span class="s">- AWS CloudWatch: error rates, latency p99</span>
  <span class="s">- PagerDuty: open incidents from last 12h</span>
  <span class="s">- Datadog: database connection pool utilization</span>
  <span class="s">Output a concise status to loom/routines/morning-health-check/latest.md</span>
  <span class="s">Flag anything that needs human attention at the top.</span>
<span class="na">sources</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">url</span><span class="pi">:</span> <span class="s">https://app.datadoghq.com/dashboard/xxx</span>
  <span class="pi">-</span> <span class="na">command</span><span class="pi">:</span> <span class="s2">"</span><span class="s">aws</span><span class="nv"> </span><span class="s">cloudwatch</span><span class="nv"> </span><span class="s">get-metric-statistics</span><span class="nv"> </span><span class="s">..."</span>
<span class="na">history</span><span class="pi">:</span> <span class="s">loom/routines/morning-health-check/history/</span>
</code></pre></div></div>

<p>And the output is just:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">&lt;!-- loom/routines/morning-health-check/latest.md --&gt;</span>
<span class="gh"># 2026-04-05 Morning Health Check</span>

<span class="gu">### 🔴 Needs Attention</span>
<span class="p">-</span> Aurora replica lag spiked to 800ms at 3:22am, recovered at 4:01am
<span class="p">  -</span> Worth checking if the nightly ETL job timing shifted

<span class="gu">### 🟢 All Clear</span>
<span class="p">-</span> Error rates nominal (0.02%)
<span class="p">-</span> No open PagerDuty incidents
<span class="p">-</span> Connection pools at 34% utilization
</code></pre></div></div>

<p>You sip coffee, open <code class="language-plaintext highlighter-rouge">latest.md</code> in Neovim, done. Or it shows up as a parley chat you can interrogate further: “tell me more about that replica lag.”</p>

<h3 id="the-common-pattern-across-your-tools">The Common Pattern Across Your Tools</h3>

<p>Now, stepping back — here’s what I see emerging:</p>

<table>
  <thead>
    <tr>
      <th>Concern</th>
      <th>Issues</th>
      <th>Vision</th>
      <th>Loom (Routines)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Schema</strong></td>
      <td>issue yaml (status, priority, tasks)</td>
      <td>roadmap yaml (items, horizons, status)</td>
      <td>routine yaml (schedule, prompt, sources)</td>
    </tr>
    <tr>
      <td><strong>State mutation</strong></td>
      <td>scripts + agent during coding</td>
      <td>scripts + agent</td>
      <td>scripts + agent on schedule</td>
    </tr>
    <tr>
      <td><strong>Human interaction</strong></td>
      <td>read/edit files, kick off agent</td>
      <td>read/edit yaml, run commands</td>
      <td>read output, maybe follow up</td>
    </tr>
    <tr>
      <td><strong>Trigger</strong></td>
      <td>manual (human decides to work on issue)</td>
      <td>manual (planning sessions)</td>
      <td><strong>scheduled</strong> (this is new)</td>
    </tr>
  </tbody>
</table>

<p>The scheduled trigger is the genuinely new capability loom introduces. Everything else — yaml state, agent manipulation, human-readable output — you’ve already built. The delta is small, which is a good sign. It means you’re not inventing a new paradigm; you’re extending one that works.</p>

<h3 id="the-minimal-implementation">The Minimal Implementation</h3>

<p>Honestly, loom v0 might be:</p>

<ol>
  <li>A <code class="language-plaintext highlighter-rouge">loom/routines/</code> directory with yaml definitions</li>
  <li>A cron job (or GitHub Action, or a simple systemd timer) that scans for due routines</li>
  <li>For each due routine: invoke <code class="language-plaintext highlighter-rouge">claude</code> with the prompt, write output to <code class="language-plaintext highlighter-rouge">latest.md</code>, git commit</li>
  <li>A <code class="language-plaintext highlighter-rouge">loom list</code> script that shows what’s active and when each last ran</li>
</ol>

<p>That’s maybe 100 lines of bash/python. The agent does the hard part (reasoning about dashboards, summarizing oncall reports). Your code just handles the boring scheduling plumbing.</p>
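
<p>A minimal sketch of that runner, assuming the routine yaml above and a <code class="language-plaintext highlighter-rouge">claude</code> CLI that accepts a prompt via <code class="language-plaintext highlighter-rouge">-p</code> and prints its response; the toy schedule check stands in for real cron parsing:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># loom_run.py: fired by cron; scans routines, runs the due ones.
# Assumptions: routine yamls as sketched above, `claude -p` available,
# and the repo is the current working directory.
import datetime
import pathlib
import subprocess

import yaml  # pip install pyyaml

ROUTINES = pathlib.Path("loom/routines")

def is_due(schedule: str, now: datetime.datetime) -&gt; bool:
    # Toy check: skip weekends for "weekday ..." schedules; a real
    # implementation would use cron syntax via a library like croniter.
    if schedule.startswith("weekday") and now.weekday() &gt;= 5:
        return False
    return True

def run_routine(spec_path: pathlib.Path) -&gt; None:
    spec = yaml.safe_load(spec_path.read_text())
    out_dir = spec_path.with_suffix("")  # becomes loom/routines/&lt;name&gt;/
    out_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        ["claude", "-p", spec["prompt"]],
        capture_output=True, text=True, check=True,
    )
    (out_dir / "latest.md").write_text(result.stdout)
    subprocess.run(["git", "add", str(out_dir)], check=True)
    subprocess.run(["git", "commit", "-m", f"loom: {spec['name']}"], check=True)

if __name__ == "__main__":
    now = datetime.datetime.now()
    for spec_path in ROUTINES.glob("*.yaml"):
        if is_due(yaml.safe_load(spec_path.read_text()).get("schedule", ""), now):
            run_routine(spec_path)
</code></pre></div></div>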

<h3 id="on-the-emerging-abstraction">On the Emerging Abstraction</h3>

<p>You’ve now built the same pattern three times (issues, vision, loom). The common bones:</p>

<ul>
  <li><strong>A directory convention</strong> (<code class="language-plaintext highlighter-rouge">issues/</code>, <code class="language-plaintext highlighter-rouge">vision/</code>, <code class="language-plaintext highlighter-rouge">loom/</code>)</li>
  <li><strong>Typed YAML for structured state</strong> (machine-readable, agent-readable)</li>
  <li><strong>Markdown for narrative state</strong> (human-readable, agent-writable)</li>
  <li><strong>Scripts for state transitions</strong> (generated by agent, run by human or cron)</li>
  <li><strong>Agent as reasoning engine</strong> (reads state, applies judgment, writes state)</li>
</ul>

<p>If you wanted to generalize, you’d extract something like a <strong>tool template</strong> — a way to scaffold a new tool with its directory, schema, scripts, and agent prompts. But I’d resist doing that until you have 4-5 tools and the pattern is so obvious it’s annoying <em>not</em> to have it. You’re at 3-4 now. Maybe after loom ships, the abstraction crystallizes.</p>
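
<p>If it does crystallize, the template might be nothing more than a directory convention, something like this hypothetical scaffold:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tools/&lt;name&gt;/
├── schema.yaml      # typed state definition
├── scripts/         # state transitions: new, list, run
├── prompts/         # agent instructions for this tool
└── README.md        # conventions, for humans and agents alike
</code></pre></div></div>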

<h3 id="on-the-vc-angle">On the VC Angle</h3>

<p>Honest take: the <strong>personal productivity story is strong but not fundable on its own.</strong> VCs want to see a wedge into a market, not a power user’s bespoke setup.</p>

<p>But the reframe that <em>could</em> work: <strong>“What if every startup’s operational state was version-controlled, agent-manipulable, and human-readable?”</strong> That’s a real pitch. It’s the anti-Notion, anti-Jira argument: those tools are databases with UIs that agents can’t reason over. Your approach is files that both humans and agents natively understand.</p>

<p>The challenge is the same one you identified: does it scale beyond the power user? The honest answer today is no. But the honest answer for most dev tools at seed stage is “the founder uses it and it’s transformative for them.” That’s enough for a certain class of investor, especially if you can show the pattern repeating (which you can — four tools, same substrate, compounding value).</p>

<p>The stronger VC angle might actually be: <strong>you’re not selling loom, you’re selling the pattern.</strong> The “repo-native operational OS” as a framework, with loom/issues/vision/parley as reference implementations. Think Rails, not Basecamp — though Basecamp funded Rails just fine.</p>

<h3 id="what-id-do-next">What I’d Do Next</h3>

<p>Build <code class="language-plaintext highlighter-rouge">loom/routines/</code> for yourself this week. One routine: morning health check. Keep it dead simple. Once you’re using it daily and it’s saving you 15 minutes, you’ll know what the second routine should be. The tool will tell you what it wants to become.</p>

<p>📝: You asked about narrowing loom’s scope to recurring/scheduled tasks as first use case and whether common patterns emerge across your tools, I answered with a concrete routine schema, identified the common substrate pattern (typed yaml + markdown + scripts + agent), noted that scheduled triggers are the genuinely new capability loom adds, and gave honest assessment on VC angle (pattern is the product, not individual tools).</p>

<h2 id="question-2">Question</h2>

<p>Interestingly, I did have Basecamp in mind, but for the other aspect of them: how they define company culture and how they work, less so Rails.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is clarifying that their Basecamp reference is about Basecamp’s organizational philosophy — Shape Up, calm company, async-first, small teams with autonomy, “how we work” as a product — not about extracting a framework (Rails) from a product. This is a meaningful distinction. They’re thinking about the <em>operating system for a company</em> angle more literally: how a company runs, makes decisions, communicates, and coordinates. Basecamp sold their way of working (books like Rework, Shape Up, It Doesn’t Have to Be Crazy at Work) as much as they sold the software. The user sees a parallel: the repo-native toolchain isn’t just tooling, it’s a philosophy of how a modern startup should operate. I should engage with this distinction seriously and not just repeat what I already said.</p>

<p>That’s a much more interesting reference point.</p>

<h3 id="basecamp-as-company-philosophy-productized">Basecamp as “Company Philosophy, Productized”</h3>

<p>Basecamp’s real innovation wasn’t the project management tool or Rails — it was codifying <strong>an opinionated way of working</strong> (Shape Up, six-week cycles, small teams, no backlogs, calm company) and then building software that <em>enforced</em> those opinions. The tool and the philosophy were inseparable. You couldn’t use Basecamp “wrong” because the constraints were baked in.</p>

<p>What you’re doing rhymes with that. The toolchain <strong>encodes a way of working</strong>:</p>

<ul>
  <li>State lives in the repo, not in someone’s head or a SaaS database</li>
  <li>Agents are collaborators, not automation bots</li>
  <li>Everything is inspectable, diffable, reviewable</li>
  <li>Async by default — you write state, agent processes it later (or vice versa)</li>
  <li>Small, composable tools over monolithic platforms</li>
</ul>

<p>That’s not just a technical choice. It’s an organizational philosophy: <strong>the company’s operational state should have the same properties as its code</strong> — version-controlled, peer-reviewed, transparent, and forkable.</p>

<h3 id="where-this-gets-interesting">Where This Gets Interesting</h3>

<p>Basecamp wrote Shape Up <em>after</em> they’d been working that way for years. The methodology emerged from practice, then got codified, then got evangelized. You’re on a similar trajectory — you’re in the “practicing it and discovering what works” phase.</p>

<p>The question is whether the philosophy that emerges is <strong>prescriptive enough to be compelling.</strong> Basecamp succeeded because their opinions were <em>strong</em>: no backlogs, period. Six-week cycles, period. If your philosophy is “use files in a repo,” that’s too weak. If it’s something like:</p>

<blockquote>
  <p>“Your company’s decisions, priorities, and operational state should be committed to git, manipulable by agents, and never locked in a tool you don’t control”</p>
</blockquote>

<p>…that’s a stance people can rally around <em>or</em> argue against. Both are good.</p>

<h3 id="the-honest-gap">The Honest Gap</h3>

<p>Basecamp’s philosophy was legible to non-technical people — PMs, designers, founders who don’t code. The repo-native approach currently requires comfort with git, yaml, terminal, and agents. That’s a smaller audience.</p>

<p>But maybe that’s fine. Basecamp targeted “small teams who reject enterprise bloat.” You’d be targeting <strong>“technical founders and engineering-led teams who want their operational layer to be as hackable as their codebase.”</strong> That’s a real audience. It’s just not everyone — and it doesn’t need to be.</p>

<p>📝: You clarified that the Basecamp inspiration is about their company philosophy/way-of-working (Shape Up, calm company) rather than the Rails-from-product extraction pattern. I engaged with how your toolchain similarly encodes an opinionated organizational philosophy — operational state as code, agent-native, inspectable — and noted the gap is legibility to non-technical users, but that a technical-founder audience is viable.</p>

<h2 id="question-3">Question</h2>]]></content><author><name></name></author><category term="nil" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Vision alignment tools</title><link href="https://xianxu.dev/2026/04/03/vision_alignment_tools.html" rel="alternate" type="text/html" title="Vision alignment tools" /><published>2026-04-03T00:00:00+00:00</published><updated>2026-04-03T00:00:00+00:00</updated><id>https://xianxu.dev/2026/04/03/vision_alignment_tools</id><content type="html" xml:base="https://xianxu.dev/2026/04/03/vision_alignment_tools.html"><![CDATA[<style>
h1 { color: #1a365d; border-bottom: 3px solid #4299e1; padding-bottom: 0.3rem; }
h2 { color: #2b6cb0; border-bottom: 2px solid #bee3f8; padding-bottom: 0.3rem; }
h3 { color: #3182ce; border-left: 4px solid #90cdf4; padding-left: 0.8rem; }
.branch-nav { margin: 1rem 0; padding: 0.6rem 1rem; border-radius: 8px; font-size: 0.95em; }
.branch-nav a { text-decoration: none; font-weight: 500; }
.branch-nav a:hover { text-decoration: underline; }
.branch-nav.parent-link { background: linear-gradient(135deg, #fefcbf 0%, #faf089 100%); border-left: 4px solid #d69e2e; color: #744210; }
.branch-nav.parent-link a { color: #975a16; }
.branch-nav.child-link { background: linear-gradient(135deg, #c6f6d5 0%, #9ae6b4 100%); border-left: 4px solid #38a169; color: #22543d; }
.branch-nav.child-link a { color: #276749; }
.branch-inline { color: #2b6cb0; text-decoration: none; border-bottom: 1px dashed #90cdf4; font-weight: 500; }
.branch-inline:hover { color: #1a365d; border-bottom-style: solid; }
</style>

<p>This transcript is generated by <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>.</p>

<h2 id="question">Question</h2>

<p>In a startup, how do you best track both the business and technical long-term vision?</p>

<p>I find that business leaders and technical leaders speak different languages, and it’s hard to create a dual-track vision that staggers and aligns. I find myself playing the role of bridging and translating between them. I started a Mermaid graph of the dependencies among items, but I need:</p>

<p>1/ a way to show the “size and complexity of a thing”
2/ customizable descriptions, maybe one sentence of tech/business description each, plus a link to further documentation</p>

<p>Any other tool I can use to represent and communicate this?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is asking about a practical problem in startup leadership: bridging business and technical vision, and specifically about tools/representations for tracking dependencies and showing “size and complexity” of items. They already use Mermaid for dependency graphs. They seem to be a technical leader or CTO-type who bridges business and engineering. They want visual/communication tools that can represent not just dependencies but also the weight/complexity of items. Let me think about what’s actually useful here rather than giving generic advice.</p>

<p>Key concepts to cover:</p>
<ol>
  <li>The dual-track vision alignment problem is real and well-known</li>
  <li>For showing “size and complexity” in graph representations - weighted nodes, bubble charts, treemaps</li>
  <li>Tools beyond Mermaid that handle this better</li>
  <li>Practical frameworks that work for this bridging role</li>
</ol>

<p>I should focus on the tooling/representation question since that’s their specific ask, while briefly addressing the broader alignment challenge.</p>

<p>This is a really common and underappreciated problem. The business leader thinks in outcomes, markets, and timelines; the technical leader thinks in systems, constraints, and dependencies. You’re essentially acting as a <strong>translation layer</strong>, which is valuable but fragile if it lives only in your head.</p>

<h3 id="on-showing-size-and-complexity-in-your-dependency-graph">On showing “size and complexity” in your dependency graph</h3>

<p>Mermaid is great for structure but weak for encoding <strong>magnitude</strong>. What you want is a graph where nodes carry visual weight. A few options:</p>

<p><strong>Option A: Weighted node graphs (best fit)</strong></p>

<ul>
  <li><strong>D3.js force-directed graphs</strong> — nodes can be sized by complexity (e.g., story points, estimated weeks, number of unknowns) and colored by domain (business vs. tech). Edges show dependencies. This is the most expressive option but requires some coding.</li>
  <li><strong><a href="/2026/04/03/what_is_graphviz_dot_language.html" class="branch-inline">Graphviz (DOT language)</a></strong> — supports node sizing, shapes, colors, and cluster groupings. Less interactive but very precise. You can generate <code class="language-plaintext highlighter-rouge">.dot</code> files programmatically.</li>
</ul>

<p><strong>Option B: Treemaps / Bubble charts</strong></p>

<ul>
  <li>If hierarchy matters more than dependency, a <strong>treemap</strong> (e.g., in Notion charts, Observable, or even Google Sheets) encodes size as area. Good for showing “this technical initiative is 5x the effort of that one.”</li>
</ul>

<p><strong>Option C: Wardley Maps</strong></p>

<p>This might be the missing piece for you. Wardley Maps were designed <em>exactly</em> for the problem you’re describing:</p>

<ul>
  <li><strong>X-axis</strong>: evolution stage (genesis → custom → product → commodity)</li>
  <li><strong>Y-axis</strong>: visibility to the user/business</li>
  <li>Components are placed on the map and connected by dependency lines</li>
  <li>You can annotate with effort/complexity</li>
</ul>

<p>It forces business and technical leaders to look at the <strong>same map</strong> and argue about positioning rather than talking past each other. It’s specifically designed to bridge strategy and technology.</p>

<p><strong>Option D: Impact/Effort matrices with dependency overlays</strong></p>

<p>A 2x2 (impact vs. effort) where you draw dependency arrows between items. Simple, but surprisingly effective in meetings. Tools like Miro or FigJam work well here.</p>

<h3 id="practical-suggestion">Practical suggestion</h3>

<p>For your specific situation — dual-track vision with dependencies and complexity — I’d recommend:</p>

<ol>
  <li><strong>Wardley Map</strong> for the strategic alignment conversation (quarterly)</li>
  <li><strong>Weighted dependency graph</strong> (D3 or Graphviz) for the operational/planning view</li>
  <li>Keep your Mermaid graphs for documentation — they’re version-controllable and readable in PRs</li>
</ol>

<h3 id="on-the-broader-alignment-problem">On the broader alignment problem</h3>

<p>A few hard-won patterns:</p>

<ul>
  <li><strong>Shared artifact, not shared language</strong>: Don’t try to make the business leader speak tech or vice versa. Create a single visual artifact (like the Wardley Map) that both can point at and disagree on. The artifact <em>is</em> the translation layer, not you.</li>
  <li><strong>Name the bets</strong>: Every long-term vision item is a bet. Frame them as “We believe [X] will lead to [Y], and we’ll know by [Z].” This works in both languages.</li>
  <li><strong>Stagger intentionally</strong>: Business vision should lead by ~1 quarter. Technical vision responds with “here’s what we need to build/change to enable that.” The dependency graph you’re building is exactly the right connector.</li>
</ul>

<p>Want me to sketch out a Graphviz example with weighted nodes for your use case, or dive deeper into Wardley Maps?</p>

<p>📝: You asked about tools to track dual business/technical vision with size/complexity representation beyond Mermaid, I answered with weighted node graphs (D3/Graphviz), Wardley Maps as the strongest fit for business-tech alignment, treemaps for magnitude, and the principle of using shared artifacts instead of relying on yourself as the translation layer.</p>

<h2 id="question-1">Question</h2>]]></content><author><name></name></author><category term="nil" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">what is “Graphviz (DOT language)”</title><link href="https://xianxu.dev/2026/04/03/what_is_graphviz_dot_language.html" rel="alternate" type="text/html" title="what is “Graphviz (DOT language)”" /><published>2026-04-03T00:00:00+00:00</published><updated>2026-04-03T00:00:00+00:00</updated><id>https://xianxu.dev/2026/04/03/what_is_graphviz_dot_language</id><content type="html" xml:base="https://xianxu.dev/2026/04/03/what_is_graphviz_dot_language.html"><![CDATA[<style>
h1 { color: #1a365d; border-bottom: 3px solid #4299e1; padding-bottom: 0.3rem; }
h2 { color: #2b6cb0; border-bottom: 2px solid #bee3f8; padding-bottom: 0.3rem; }
h3 { color: #3182ce; border-left: 4px solid #90cdf4; padding-left: 0.8rem; }
.branch-nav { margin: 1rem 0; padding: 0.6rem 1rem; border-radius: 8px; font-size: 0.95em; }
.branch-nav a { text-decoration: none; font-weight: 500; }
.branch-nav a:hover { text-decoration: underline; }
.branch-nav.parent-link { background: linear-gradient(135deg, #fefcbf 0%, #faf089 100%); border-left: 4px solid #d69e2e; color: #744210; }
.branch-nav.parent-link a { color: #975a16; }
.branch-nav.child-link { background: linear-gradient(135deg, #c6f6d5 0%, #9ae6b4 100%); border-left: 4px solid #38a169; color: #22543d; }
.branch-nav.child-link a { color: #276749; }
.branch-inline { color: #2b6cb0; text-decoration: none; border-bottom: 1px dashed #90cdf4; font-weight: 500; }
.branch-inline:hover { color: #1a365d; border-bottom-style: solid; }
</style>

<p>This transcript is generated by <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>.</p>

<div class="branch-nav parent-link"><a href="/2026/04/03/vision_alignment_tools.html">← Vision alignment tools</a></div>

<h2 id="question">Question</h2>

<p>What is “Graphviz (DOT language)”, and which software supports it?</p>

<p>I like both nice visualization and the ability to generate graphs programmatically.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is asking about Graphviz and its DOT language specifically. They want to know what it is and which software supports it. They also care about both visual appeal and programmatic generation. I should explain DOT language concisely, list key tools/renderers, and highlight the programmatic angle since that’s their use case. They’re technical (they’re already using Mermaid), so I can be fairly direct.</p>

<p>Graphviz is a <strong>graph visualization system</strong> originally developed at AT&amp;T Labs Research. The <strong>DOT language</strong> is its plain-text, declarative language for describing graphs (nodes, edges, attributes).</p>

<h3 id="how-it-works">How it works</h3>

<p>You write a <code class="language-plaintext highlighter-rouge">.dot</code> file like this:</p>

<div class="language-dot highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">digraph</span> <span class="nv">vision</span> <span class="p">{</span>
    <span class="n">rankdir</span><span class="p">=</span><span class="nv">LR</span><span class="p">;</span>

    <span class="c1">// Nodes with size/style attributes</span>
    <span class="k">node</span> <span class="o">[</span><span class="n">shape</span><span class="p">=</span><span class="nv">box</span><span class="p">,</span> <span class="n">style</span><span class="p">=</span><span class="nv">filled</span><span class="o">]</span><span class="p">;</span>

    <span class="nv">A</span> <span class="o">[</span><span class="n">label</span><span class="p">=</span><span class="s2">"Auth Service\n(S)"</span><span class="p">,</span> <span class="n">fillcolor</span><span class="p">=</span><span class="s2">"#a0d8ef"</span><span class="p">,</span> <span class="n">width</span><span class="p">=</span><span class="mf">1.2</span><span class="o">]</span><span class="p">;</span>
    <span class="nv">B</span> <span class="o">[</span><span class="n">label</span><span class="p">=</span><span class="s2">"Data Platform\n(XL)"</span><span class="p">,</span> <span class="n">fillcolor</span><span class="p">=</span><span class="s2">"#ff9999"</span><span class="p">,</span> <span class="n">width</span><span class="p">=</span><span class="mf">3.0</span><span class="p">,</span> <span class="n">fontsize</span><span class="p">=</span><span class="mi">18</span><span class="o">]</span><span class="p">;</span>
    <span class="nv">C</span> <span class="o">[</span><span class="n">label</span><span class="p">=</span><span class="s2">"Mobile App v2\n(M)"</span><span class="p">,</span> <span class="n">fillcolor</span><span class="p">=</span><span class="s2">"#c8e6c9"</span><span class="p">,</span> <span class="n">width</span><span class="p">=</span><span class="mf">1.8</span><span class="o">]</span><span class="p">;</span>

    <span class="c1">// Edges (dependencies)</span>
    <span class="nv">A</span> <span class="o">-&gt;</span> <span class="nv">B</span> <span class="o">[</span><span class="n">label</span><span class="p">=</span><span class="s2">"feeds into"</span><span class="p">,</span> <span class="n">style</span><span class="p">=</span><span class="nv">dashed</span><span class="o">]</span><span class="p">;</span>
    <span class="nv">B</span> <span class="o">-&gt;</span> <span class="nv">C</span> <span class="o">[</span><span class="n">label</span><span class="p">=</span><span class="s2">"enables"</span><span class="o">]</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Key idea: <strong>node/edge attributes</strong> (<code class="language-plaintext highlighter-rouge">width</code>, <code class="language-plaintext highlighter-rouge">height</code>, <code class="language-plaintext highlighter-rouge">penwidth</code>, <code class="language-plaintext highlighter-rouge">fontsize</code>, <code class="language-plaintext highlighter-rouge">color</code>) let you encode size/complexity visually — which is exactly the dimension you said you’re missing in Mermaid.</p>

<h3 id="software-that-renders-dot">Software that renders DOT</h3>

<table>
  <thead>
    <tr>
      <th>Tool</th>
      <th>Type</th>
      <th>Programmatic?</th>
      <th>Notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Graphviz CLI</strong> (<code class="language-plaintext highlighter-rouge">dot</code>, <code class="language-plaintext highlighter-rouge">neato</code>, <code class="language-plaintext highlighter-rouge">fdp</code>)</td>
      <td>Command-line</td>
      <td>✅ pipe <code class="language-plaintext highlighter-rouge">.dot</code> → SVG/PNG/PDF</td>
      <td>The canonical renderer</td>
    </tr>
    <tr>
      <td><strong>Python <code class="language-plaintext highlighter-rouge">graphviz</code></strong> lib</td>
      <td>Python API</td>
      <td>✅ build graph in code, render</td>
      <td><code class="language-plaintext highlighter-rouge">pip install graphviz</code></td>
    </tr>
    <tr>
      <td><strong>D3-graphviz</strong></td>
      <td>JS (browser)</td>
      <td>✅ render DOT in browser via WASM</td>
      <td>Uses <code class="language-plaintext highlighter-rouge">@hpcc-js/wasm</code> under the hood</td>
    </tr>
    <tr>
      <td><strong>Viz.js / @viz-js/viz</strong></td>
      <td>JS library</td>
      <td>✅ pure JS/WASM Graphviz port</td>
      <td>Works in Node or browser</td>
    </tr>
    <tr>
      <td><strong>VS Code extension</strong> (“Graphviz Preview”)</td>
      <td>Editor</td>
      <td>❌ manual</td>
      <td>Live preview as you edit <code class="language-plaintext highlighter-rouge">.dot</code></td>
    </tr>
    <tr>
      <td><strong>Jupyter</strong> (via <code class="language-plaintext highlighter-rouge">graphviz</code> Python lib)</td>
      <td>Notebook</td>
      <td>✅ inline render</td>
      <td>Great for exploration</td>
    </tr>
    <tr>
      <td><strong>Confluence / Notion</strong></td>
      <td>Wiki</td>
      <td>⚠️ embed SVG output</td>
      <td>No native DOT support</td>
    </tr>
  </tbody>
</table>

<h3 id="programmatic-generation-example-python">Programmatic generation example (Python)</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">graphviz</span> <span class="kn">import</span> <span class="n">Digraph</span>

<span class="n">dot</span> <span class="o">=</span> <span class="nc">Digraph</span><span class="p">(</span><span class="sh">'</span><span class="s">vision</span><span class="sh">'</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="sh">'</span><span class="s">svg</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dot</span><span class="p">.</span><span class="nf">attr</span><span class="p">(</span><span class="n">rankdir</span><span class="o">=</span><span class="sh">'</span><span class="s">LR</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dot</span><span class="p">.</span><span class="nf">attr</span><span class="p">(</span><span class="sh">'</span><span class="s">node</span><span class="sh">'</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="sh">'</span><span class="s">box</span><span class="sh">'</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="sh">'</span><span class="s">filled</span><span class="sh">'</span><span class="p">)</span>

<span class="c1"># Items: (id, label, size_weight, color)
</span><span class="n">items</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">(</span><span class="sh">'</span><span class="s">auth</span><span class="sh">'</span><span class="p">,</span>     <span class="sh">'</span><span class="s">Auth Service</span><span class="sh">'</span><span class="p">,</span>    <span class="mf">1.2</span><span class="p">,</span> <span class="sh">'</span><span class="s">#a0d8ef</span><span class="sh">'</span><span class="p">),</span>
    <span class="p">(</span><span class="sh">'</span><span class="s">data</span><span class="sh">'</span><span class="p">,</span>     <span class="sh">'</span><span class="s">Data Platform</span><span class="sh">'</span><span class="p">,</span>   <span class="mf">3.0</span><span class="p">,</span> <span class="sh">'</span><span class="s">#ff9999</span><span class="sh">'</span><span class="p">),</span>
    <span class="p">(</span><span class="sh">'</span><span class="s">mobile</span><span class="sh">'</span><span class="p">,</span>   <span class="sh">'</span><span class="s">Mobile App v2</span><span class="sh">'</span><span class="p">,</span>   <span class="mf">1.8</span><span class="p">,</span> <span class="sh">'</span><span class="s">#c8e6c9</span><span class="sh">'</span><span class="p">),</span>
<span class="p">]</span>

<span class="k">for</span> <span class="nb">id</span><span class="p">,</span> <span class="n">label</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">color</span> <span class="ow">in</span> <span class="n">items</span><span class="p">:</span>
    <span class="n">dot</span><span class="p">.</span><span class="nf">node</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">label</span><span class="si">}</span><span class="se">\n</span><span class="s">(w=</span><span class="si">{</span><span class="n">w</span><span class="si">}</span><span class="s">)</span><span class="sh">"</span><span class="p">,</span>
             <span class="n">width</span><span class="o">=</span><span class="nf">str</span><span class="p">(</span><span class="n">w</span><span class="p">),</span> <span class="n">fillcolor</span><span class="o">=</span><span class="n">color</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="nf">str</span><span class="p">(</span><span class="mi">10</span> <span class="o">+</span> <span class="nf">int</span><span class="p">(</span><span class="n">w</span> <span class="o">*</span> <span class="mi">4</span><span class="p">)))</span>

<span class="n">dot</span><span class="p">.</span><span class="nf">edge</span><span class="p">(</span><span class="sh">'</span><span class="s">auth</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">data</span><span class="sh">'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sh">'</span><span class="s">feeds into</span><span class="sh">'</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="sh">'</span><span class="s">dashed</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dot</span><span class="p">.</span><span class="nf">edge</span><span class="p">(</span><span class="sh">'</span><span class="s">data</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">mobile</span><span class="sh">'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sh">'</span><span class="s">enables</span><span class="sh">'</span><span class="p">)</span>

<span class="n">dot</span><span class="p">.</span><span class="nf">render</span><span class="p">(</span><span class="sh">'</span><span class="s">vision</span><span class="sh">'</span><span class="p">,</span> <span class="n">view</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>  <span class="c1"># outputs vision.svg and opens it
</span></code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># To test from CLI directly:</span>
pip <span class="nb">install </span>graphviz
python vision_graph.py

<span class="c"># Or raw DOT:</span>
dot <span class="nt">-Tsvg</span> vision.dot <span class="nt">-o</span> vision.svg
</code></pre></div></div>
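
<p>One gotcha worth noting: the <code class="language-plaintext highlighter-rouge">graphviz</code> Python package is only bindings; it shells out to the Graphviz binaries, so you also need Graphviz itself installed (e.g. <code class="language-plaintext highlighter-rouge">brew install graphviz</code> or <code class="language-plaintext highlighter-rouge">apt install graphviz</code>).</p>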

<h3 id="graphviz-vs-mermaid--key-tradeoff">Graphviz vs Mermaid — key tradeoff</h3>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Mermaid</th>
      <th>Graphviz</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Rendering</strong></td>
      <td>Browser/Markdown native</td>
      <td>CLI / lib, then embed</td>
    </tr>
    <tr>
      <td><strong>Layout control</strong></td>
      <td>Limited</td>
      <td>Fine-grained (<code class="language-plaintext highlighter-rouge">width</code>, <code class="language-plaintext highlighter-rouge">pos</code>, subgraphs)</td>
    </tr>
    <tr>
      <td><strong>Encode magnitude</strong></td>
      <td>❌ not really</td>
      <td>✅ node size, pen width, font size</td>
    </tr>
    <tr>
      <td><strong>Programmatic</strong></td>
      <td>Possible but awkward (string templating)</td>
      <td>First-class (Python lib, JS lib)</td>
    </tr>
    <tr>
      <td><strong>Looks pretty by default</strong></td>
      <td>✅</td>
      <td>⚠️ Functional, not beautiful — needs styling</td>
    </tr>
  </tbody>
</table>

<h3 id="practical-suggestion">Practical suggestion</h3>

<p>Since you’re bridging business ↔ tech, a reasonable workflow:</p>

<ol>
  <li><strong>Source of truth</strong>: a structured data file (YAML/JSON) listing initiatives, their size, type (biz/tech), dependencies</li>
  <li><strong>Generate</strong>: DOT graph programmatically from that data, encoding size as node width</li>
  <li><strong>Render</strong>: SVG → embed in your wiki/slides</li>
  <li><strong>Iterate</strong>: update the YAML, re-run, new graph</li>
</ol>

<p>This keeps the representation <strong>version-controlled and reproducible</strong> — much better than manually dragging boxes in a whiteboard tool when the landscape keeps shifting.</p>

<p>📝: You asked what Graphviz/DOT language is and which tools support it, I answered with: it’s a declarative graph description language from AT&amp;T Labs, rendered by CLI tools and libraries (Python graphviz, D3-graphviz, Viz.js), with fine-grained node attributes (width, color, penwidth) that solve your “show size/complexity” need, and suggested a YAML→DOT→SVG pipeline for maintainability.</p>

<h2 id="question-1">Question</h2>

<p>You meant a YAML file to represent the DOT format?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is asking for clarification. They’re wondering if I meant YAML as an alternative representation of DOT format. No — I meant YAML as a higher-level, domain-specific data source (initiatives, their metadata like size, type, dependencies) that you then <em>transform</em> into DOT programmatically. The YAML is not representing DOT syntax; it’s representing your business/technical domain model. The Python script reads the YAML and generates DOT. This is an important distinction — the YAML is your semantic layer, DOT is your visualization layer. Let me clarify this cleanly with a concrete example.</p>

<p>No — not YAML <em>as</em> DOT. I meant <strong>two separate layers</strong>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>YAML (your domain model)  →  Python script  →  DOT  →  SVG/PNG
     what things are             transform       layout    visual
</code></pre></div></div>

<p>The YAML captures <strong>what you care about</strong> in business/tech terms. The script <em>translates</em> it into DOT. This separation matters because:</p>

<ul>
  <li>Non-technical people can read/edit YAML — they’ll never touch DOT</li>
  <li>You can generate <em>multiple views</em> from the same YAML (e.g. a dependency graph, a timeline, a complexity heatmap)</li>
  <li>The YAML becomes your <strong>single source of truth</strong>, not the graph</li>
</ul>

<h3 id="concrete-example">Concrete example</h3>

<p><strong><code class="language-plaintext highlighter-rouge">vision.yaml</code></strong> — your domain model:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">initiatives</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">id</span><span class="pi">:</span> <span class="s">auth</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">Auth Service Rewrite</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">tech</span>
    <span class="na">size</span><span class="pi">:</span> <span class="s">S</span>
    <span class="na">quarter</span><span class="pi">:</span> <span class="s">Q3</span>
    <span class="na">depends_on</span><span class="pi">:</span> <span class="pi">[]</span>

  <span class="pi">-</span> <span class="na">id</span><span class="pi">:</span> <span class="s">data_platform</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">Data Platform</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">tech</span>
    <span class="na">size</span><span class="pi">:</span> <span class="s">XL</span>
    <span class="na">quarter</span><span class="pi">:</span> <span class="s">Q3-Q4</span>
    <span class="na">depends_on</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">auth</span><span class="pi">]</span>

  <span class="pi">-</span> <span class="na">id</span><span class="pi">:</span> <span class="s">self_serve</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">Self-Serve Onboarding</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">business</span>
    <span class="na">size</span><span class="pi">:</span> <span class="s">L</span>
    <span class="na">quarter</span><span class="pi">:</span> <span class="s">Q4</span>
    <span class="na">depends_on</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">data_platform</span><span class="pi">,</span> <span class="nv">auth</span><span class="pi">]</span>

  <span class="pi">-</span> <span class="na">id</span><span class="pi">:</span> <span class="s">mobile_v2</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">Mobile App v2</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">business</span>
    <span class="na">size</span><span class="pi">:</span> <span class="s">M</span>
    <span class="na">quarter</span><span class="pi">:</span> <span class="s">Q1-next</span>
    <span class="na">depends_on</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">data_platform</span><span class="pi">]</span>
</code></pre></div></div>

<p><strong><code class="language-plaintext highlighter-rouge">generate_graph.py</code></strong> — the transform:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">yaml</span>
<span class="kn">from</span> <span class="n">graphviz</span> <span class="kn">import</span> <span class="n">Digraph</span>

<span class="n">SIZE_MAP</span> <span class="o">=</span> <span class="p">{</span><span class="sh">'</span><span class="s">S</span><span class="sh">'</span><span class="p">:</span> <span class="mf">1.0</span><span class="p">,</span> <span class="sh">'</span><span class="s">M</span><span class="sh">'</span><span class="p">:</span> <span class="mf">1.5</span><span class="p">,</span> <span class="sh">'</span><span class="s">L</span><span class="sh">'</span><span class="p">:</span> <span class="mf">2.2</span><span class="p">,</span> <span class="sh">'</span><span class="s">XL</span><span class="sh">'</span><span class="p">:</span> <span class="mf">3.0</span><span class="p">}</span>
<span class="n">COLOR_MAP</span> <span class="o">=</span> <span class="p">{</span><span class="sh">'</span><span class="s">tech</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">#a0d8ef</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">business</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">#ffe0b2</span><span class="sh">'</span><span class="p">}</span>

<span class="k">with</span> <span class="nf">open</span><span class="p">(</span><span class="sh">'</span><span class="s">vision.yaml</span><span class="sh">'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">yaml</span><span class="p">.</span><span class="nf">safe_load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>

<span class="n">dot</span> <span class="o">=</span> <span class="nc">Digraph</span><span class="p">(</span><span class="sh">'</span><span class="s">vision</span><span class="sh">'</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="sh">'</span><span class="s">svg</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dot</span><span class="p">.</span><span class="nf">attr</span><span class="p">(</span><span class="n">rankdir</span><span class="o">=</span><span class="sh">'</span><span class="s">TB</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dot</span><span class="p">.</span><span class="nf">attr</span><span class="p">(</span><span class="sh">'</span><span class="s">node</span><span class="sh">'</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="sh">'</span><span class="s">box</span><span class="sh">'</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="sh">'</span><span class="s">filled,rounded</span><span class="sh">'</span><span class="p">)</span>

<span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">data</span><span class="p">[</span><span class="sh">'</span><span class="s">initiatives</span><span class="sh">'</span><span class="p">]:</span>
    <span class="n">w</span> <span class="o">=</span> <span class="n">SIZE_MAP</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">size</span><span class="sh">'</span><span class="p">],</span> <span class="mf">1.5</span><span class="p">)</span>
    <span class="n">color</span> <span class="o">=</span> <span class="n">COLOR_MAP</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">type</span><span class="sh">'</span><span class="p">],</span> <span class="sh">'</span><span class="s">#eeeeee</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">label</span> <span class="o">=</span> <span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">name</span><span class="sh">'</span><span class="p">]</span><span class="si">}</span><span class="se">\n</span><span class="s">[</span><span class="si">{</span><span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">size</span><span class="sh">'</span><span class="p">]</span><span class="si">}</span><span class="s">] </span><span class="si">{</span><span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">quarter</span><span class="sh">'</span><span class="p">]</span><span class="si">}</span><span class="sh">"</span>

    <span class="n">dot</span><span class="p">.</span><span class="nf">node</span><span class="p">(</span><span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">id</span><span class="sh">'</span><span class="p">],</span> <span class="n">label</span><span class="o">=</span><span class="n">label</span><span class="p">,</span>
             <span class="n">width</span><span class="o">=</span><span class="nf">str</span><span class="p">(</span><span class="n">w</span><span class="p">),</span> <span class="n">fillcolor</span><span class="o">=</span><span class="n">color</span><span class="p">,</span>
             <span class="n">fontsize</span><span class="o">=</span><span class="nf">str</span><span class="p">(</span><span class="mi">10</span> <span class="o">+</span> <span class="nf">int</span><span class="p">(</span><span class="n">w</span> <span class="o">*</span> <span class="mi">3</span><span class="p">)))</span>

    <span class="k">for</span> <span class="n">dep</span> <span class="ow">in</span> <span class="n">item</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">depends_on</span><span class="sh">'</span><span class="p">,</span> <span class="p">[]):</span>
        <span class="n">dot</span><span class="p">.</span><span class="nf">edge</span><span class="p">(</span><span class="n">dep</span><span class="p">,</span> <span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">id</span><span class="sh">'</span><span class="p">])</span>

<span class="n">dot</span><span class="p">.</span><span class="nf">render</span><span class="p">(</span><span class="sh">'</span><span class="s">vision_roadmap</span><span class="sh">'</span><span class="p">,</span> <span class="n">view</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>graphviz pyyaml
python generate_graph.py
<span class="c"># opens vision_roadmap.svg</span>
</code></pre></div></div>

<p>This produces a graph where:</p>
<ul>
  <li><strong>Box size</strong> = initiative complexity (S/M/L/XL)</li>
  <li><strong>Color</strong> = business (orange) vs tech (blue)</li>
  <li><strong>Arrows</strong> = dependencies</li>
  <li><strong>Labels</strong> = timing</li>
</ul>

<h3 id="why-this-layering-matters-for-your-specific-problem">Why this layering matters for your specific problem</h3>

<p>You said you’re the <strong>bridge</strong> between business and tech leaders. With this setup:</p>

<ul>
  <li>The <strong>YAML</strong> is the artifact you co-author with both sides — it’s readable, diffable, lives in git</li>
  <li>The <strong>graph</strong> is a generated output you show in meetings — nobody argues about box positions, they argue about the <em>data</em></li>
  <li>When business says “we need self-serve in Q3”, you update one line in YAML, re-run, and instantly see if the dependency chain supports that timeline</li>
</ul>

<p>The graph is disposable. The YAML is the conversation.</p>

<p>📝: You asked to clarify whether I meant YAML as DOT format, I clarified it’s two separate layers — YAML as a domain-specific source of truth (readable by both business and tech), with a Python script transforming it into DOT for visualization, keeping the semantic model separate from the rendering.</p>

<h2 id="question-2">Question</h2>

<p>Ah, then I can also insert whatever links I want in the YAML, etc. Essentially, develop a workbench around this to manage dependencies.</p>

<p>I can, for example, imagine a <a href="https://github.com/xianxu/parley.nvim">parley</a>-like plugin to manage a folder in a repo. I can also easily see adding a read-only presentation of such a folder on the web, or eventually collaborative editing of the folder through the web.</p>

<p>Thoughts?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is connecting dots — they see the YAML-as-source-of-truth pattern and are envisioning a full workbench around it. They reference their own neovim plugin (parley.nvim) which seems to be a tool they built. They’re thinking about: 1) a repo folder as the backing store, 2) neovim plugin for power-user editing, 3) read-only web view, 4) eventually collaborative web editing. They’re not really asking me a question — they’re thinking out loud and want me to engage critically with the architecture. I should validate what’s strong, flag risks/tradeoffs, and maybe suggest concrete architectural choices. I should not over-explain basics — this person clearly builds tools and thinks in systems.</p>

<p>Yes — you’re describing a very natural architecture, and the key insight is right: <strong>a folder of YAML files in a git repo is already a database</strong> with versioning, diffing, branching, and access control built in.</p>

<h3 id="what-youd-be-building">What you’d be building</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>repo/vision/
├── initiatives/
│   ├── auth-rewrite.yaml
│   ├── data-platform.yaml
│   └── self-serve-onboarding.yaml
├── schema.yaml          # optional: validate structure
└── views/
    └── roadmap.dot      # generated
</code></pre></div></div>

<p>Each initiative = one file. Dependencies reference other files by ID. This is essentially a <strong>lightweight knowledge graph stored as flat files</strong>.</p>
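
<p>The flat-file graph is also easy to keep honest. A minimal sketch of a validator, assuming each file carries <code class="language-plaintext highlighter-rouge">id</code> and <code class="language-plaintext highlighter-rouge">depends_on</code> fields as above (run it in CI or a pre-commit hook):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># validate.py: check that every depends_on in vision/initiatives/
# resolves to a real initiative ID. Field names follow the sketch above.
import pathlib
import sys

import yaml  # pip install pyyaml

specs = [yaml.safe_load(p.read_text())
         for p in pathlib.Path("vision/initiatives").glob("*.yaml")]
ids = {spec["id"] for spec in specs}

ok = True
for spec in specs:
    for dep in spec.get("depends_on", []):
        if dep not in ids:
            print(f"{spec['id']}: unknown dependency '{dep}'")
            ok = False
sys.exit(0 if ok else 1)
</code></pre></div></div>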

<h3 id="the-three-interfaces-youre-describing">The three interfaces you’re describing</h3>

<table>
  <thead>
    <tr>
      <th>Interface</th>
      <th>Audience</th>
      <th>Mode</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Neovim (parley-like plugin)</strong></td>
      <td>You, tech leads</td>
      <td>Read/write, power user</td>
    </tr>
    <tr>
      <td><strong>Static web view</strong></td>
      <td>Stakeholders, all-hands</td>
      <td>Read-only, generated</td>
    </tr>
    <tr>
      <td><strong>Collaborative web editor</strong></td>
      <td>Cross-functional leads</td>
      <td>Read/write, structured</td>
    </tr>
  </tbody>
</table>

<p>This is a smart layering because you <strong>decouple the data from the interface</strong> — each layer is independently buildable and optional.</p>

<h3 id="whats-strong-about-this">What’s strong about this</h3>

<ul>
  <li><strong>Git as the collaboration protocol</strong> — you get PRs for vision changes, which is actually profound. “I propose we move self-serve to Q3” becomes a diff, not a meeting argument.</li>
  <li><strong>One file per initiative</strong> scales better than one big YAML — fewer merge conflicts, easier to grep, natural unit of ownership.</li>
  <li><strong>Neovim-first</strong> means you’ll actually use it daily. Tools you don’t live in die.</li>
</ul>

<h3 id="where-id-be-cautious">Where I’d be cautious</h3>

<p><strong>Don’t build the web editor early.</strong> The progression should be:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. YAML files + generate script       ← start here, weeks of value
2. Neovim plugin for navigation        ← you'd build this naturally
3. Static site (read-only)             ← CI generates on push
4. Web editor                          ← only if non-git users need write access
</code></pre></div></div>

<p>Step 4 is where complexity explodes — you’re building a CMS at that point (conflict resolution, auth, validation). Many teams never actually need it. A surprisingly effective middle ground: <strong>non-technical people submit changes via PR templates</strong> with a simple form, and you merge.</p>

<h3 id="for-the-static-web-view">For the static web view</h3>

<p>A minimal approach that works well:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># generate.py — run in CI on every push
# 1. Read all YAML files from initiatives/
# 2. Emit a dependency graph (SVG via Graphviz)
# 3. Emit an index.html with the graph + a table of all initiatives
#    each linking to a detail page or external docs
# 4. Deploy to GitHub Pages / S3
</span></code></pre></div></div>

<p>Each initiative YAML can carry a <code class="language-plaintext highlighter-rouge">docs_url</code> field pointing to a design doc, Notion page, whatever — the web view just renders it as a link. You don’t need to centralize all documentation, just the <strong>graph of relationships and sizing</strong>.</p>

<h3 id="the-real-leverage">The real leverage</h3>

<p>What you’re building is not a project management tool. It’s a <strong>shared mental model with a rendering engine</strong>. The value isn’t the graph — it’s that business and tech leaders can look at the <em>same structured data</em> and see their own concerns reflected:</p>

<ul>
  <li>Business sees: what depends on what, what’s blocking revenue</li>
  <li>Tech sees: what’s complex, what’s coupled, what’s risky</li>
  <li>You stop being the translator and become the <strong>maintainer of the schema</strong></li>
</ul>

<p>That’s a much more sustainable role than being the human bridge.</p>

<p>📝: You proposed building a workbench around YAML-in-git-repo with neovim plugin, static web view, and eventual collaborative editing; I validated the architecture, recommended one-file-per-initiative structure, cautioned against building the web editor too early (static site + PR workflow covers most needs), and reframed the value as shifting your role from human translator to schema maintainer.</p>

<h2 id="question-3">Question</h2>

<p>actually, parley already supports exporting a tree of markdown files (chat files) cross referenced. here, we replace markdown with yaml.</p>

<p>Assuming there’s a company.yaml file of this “company vision” directory, we pretty much can generate easily various views, for example, two views:</p>

<p>1/ a dependency graph, the dot file and visualization mentioned above. 
2/ just plain web page with interlinked yaml files for navigation, parley style.</p>

<p>We can add more different visualization easily, as likely any reasonable company.yaml is going to be too complex to just show fully, then maybe we only show starting from a node etc. We can customize whichever way.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: Xian is refining the architecture and essentially validating his own thinking. He’s not really asking a question — he’s describing the design and implicitly asking for confirmation, refinement, or pushback. The key points he’s making: 1) parley already has the tree-of-files + cross-reference pattern, 2) company.yaml as root, 3) multiple generated views from same data, 4) subgraph/filtered views because full graph is too complex. I should affirm what’s strong, add any sharp insight, and push back where appropriate. The main thing I can add value on: the “too complex to show fully” problem is THE core UX challenge, and the “start from a node” idea is exactly right — this is essentially a graph traversal UI. I should also note that company.yaml as root is a good design choice (entry point / manifest). I should keep this concise since he’s mostly thinking out loud and building momentum.</p>

<p>Exactly right. And the “start from a node” insight is the critical one — it’s the difference between a tool that works in a presentation and one that works in a <em>conversation</em>.</p>

<h3 id="the-architecture-youre-converging-on">The architecture you’re converging on</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vision/
├── company.yaml              # root: top-level goals, references initiatives
├── initiatives/
│   ├── auth-rewrite.yaml     # references other initiative IDs
│   ├── data-platform.yaml
│   └── self-serve.yaml
└── generated/                # gitignored or CI-built
    ├── full-graph.svg
    ├── index.html            # interlinked browsable pages
    └── subgraphs/
        └── data-platform.svg # rooted at one node
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">company.yaml</code> as the root is a good design choice — it’s the <strong>manifest</strong>. It declares the top-level business objectives, and each one references initiatives. The graph is implicit in the <code class="language-plaintext highlighter-rouge">depends_on</code> fields across files. The views are just traversals.</p>

<h3 id="two-views-two-purposes">Two views, two purposes</h3>

<table>
  <thead>
    <tr>
      <th>View</th>
      <th>When you use it</th>
      <th>What it answers</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Dependency graph</strong> (rooted at node X)</td>
      <td>Planning meetings, “why is this blocked?”</td>
      <td>Structure, critical path, coupling</td>
    </tr>
    <tr>
      <td><strong>Interlinked pages</strong> (parley-style)</td>
      <td>Async review, onboarding, reference</td>
      <td>Detail, context, links to docs</td>
    </tr>
  </tbody>
</table>

<p>They complement each other well — the graph is for <strong>seeing shape</strong>, the pages are for <strong>reading substance</strong>.</p>

<h3 id="the-too-complex-problem">The “too complex” problem</h3>

<p>This is actually the core UX problem of any vision-tracking tool, and “start from a node” solves it cleanly because it maps to how people actually ask questions:</p>

<ul>
  <li><em>“What does data platform depend on?”</em> → subgraph, ancestors</li>
  <li><em>“What does data platform unlock?”</em> → subgraph, descendants</li>
  <li><em>“What’s on the critical path to self-serve?”</em> → shortest path from roots</li>
</ul>

<p>These are all just <strong>graph queries</strong> over your YAML files. A small CLI like:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vision show data-platform <span class="nt">--depth</span> 2 <span class="nt">--direction</span> down
</code></pre></div></div>

<p>…would be more useful day-to-day than any full-graph rendering. And it’s trivial to implement given you already have the traversal logic in parley for markdown trees.</p>

<h3 id="one-thought">One thought</h3>

<p>The parley pattern of “tree of cross-referenced files” is doing a lot of work here. What you’re really building is <strong>parley generalized beyond chat</strong> — a structured-file graph with pluggable renderers. The chat files were the first content type. Vision YAML is the second. The rendering engine (interlinked HTML, graph viz, filtered subgraphs) is the reusable layer.</p>

<p>Whether you keep it inside parley or extract it as a sibling tool is a naming/packaging decision, but the core abstraction is the same.</p>

<p>📝: You described a company.yaml-rooted vision directory reusing parley’s cross-referenced file pattern with multiple generated views, I affirmed the architecture and highlighted that “start from a node” graph traversal is the key UX insight, and that this is essentially parley generalized beyond chat into a structured-file graph with pluggable renderers.</p>

<h2 id="question-4">Question</h2>

<p>you can even run self consistency checks, or allocation of resources: did two person got assigned on the same project etc. really can embed any rules.</p>

<p>What’s your take on the value of such software in a business? would it help align business leader and tech leader? based on my experience with parley, this is easily constructable, maybe 1 week of my time I think.</p>

<p>thoughts?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: Xian is asking me to evaluate the business value of this tool concept — whether it would actually help align business and tech leaders. He’s also sanity-checking the 1-week estimate. He has deep experience building parley, so he knows his velocity. I should be honest and critical here, not just cheerleading. Let me think about what actually drives alignment in organizations vs. what tools can do. The real question is: does the problem of business/tech misalignment stem from lack of shared artifacts, or from deeper organizational/incentive issues? A tool helps with the former but not the latter. I should also consider: does this already exist? (Roadmapping tools like Productboard, Jira, etc.) What’s different here? The key differentiator is the code-native, git-native, programmable nature. Let me be balanced.</p>

<p>The honest answer: <strong>the tool itself has moderate value; the process of building and maintaining it together has high value.</strong></p>

<h3 id="why-im-cautious-about-the-tool-alone">Why I’m cautious about the tool alone</h3>

<p>The alignment problem between business and tech leaders is rarely about lack of visibility. It’s usually about:</p>

<ul>
  <li><strong>Different incentive horizons</strong> — business optimizes for next quarter’s revenue, tech optimizes for next year’s maintainability</li>
  <li><strong>Different risk models</strong> — business sees delay risk, tech sees complexity risk</li>
  <li><strong>Lack of shared vocabulary</strong> — not lack of shared documents</li>
</ul>

<p>A beautifully rendered dependency graph doesn’t fix any of these. I’ve seen teams with Jira boards, Productboard roadmaps, Notion wikis, all perfectly structured — and still fundamentally misaligned because the leaders don’t actually negotiate tradeoffs together.</p>

<h3 id="where-it-does-help">Where it <em>does</em> help</h3>

<p>The specific thing your tool does that existing roadmap tools don’t:</p>

<ol>
  <li>
    <p><strong>Makes dependencies non-negotiable.</strong> When a business leader says “move self-serve to Q3,” the graph mechanically shows what that requires. It shifts the conversation from opinion to constraint propagation. This is genuinely powerful.</p>
  </li>
  <li>
    <p><strong>Consistency checks as guardrails.</strong> The rules engine you described — resource conflicts, impossible timelines, circular dependencies — turns implicit disagreements into explicit violations. That’s a forcing function for alignment.</p>
  </li>
  <li>
    <p><strong>Git-native means decisions have history.</strong> “Why did we deprioritize X?” becomes <code class="language-plaintext highlighter-rouge">git log initiatives/x.yaml</code>. No other roadmap tool gives you that for free.</p>
  </li>
  <li>
    <p><strong>It’s yours.</strong> You can shape it to your org’s actual decision-making process, not Jira’s abstraction of what a process should be.</p>
  </li>
</ol>

<h3 id="the-real-value-proposition">The real value proposition</h3>

<p>The highest-leverage use isn’t the software — it’s <strong>the weekly ritual of reviewing it together</strong>. If you get the business leader and tech leader to co-edit <code class="language-plaintext highlighter-rouge">company.yaml</code> in a meeting, even briefly, the act of negotiating what goes in the file <em>is</em> the alignment. The tool is the excuse.</p>

<p>This is the same dynamic that makes architecture decision records (ADRs) work — the document isn’t the value, the conversation that produces the document is.</p>

<h3 id="on-the-1-week-estimate">On the 1-week estimate</h3>

<p>Believable for you, given parley’s existing infrastructure. But scope it tightly:</p>

<ul>
  <li>Week 1: YAML schema + CLI to generate graph + static HTML. <strong>Stop here and use it for 2-3 weeks.</strong></li>
  <li>Only then decide if you need more.</li>
</ul>

<p>The risk isn’t build time — it’s <strong>over-building before you’ve validated that other people will actually look at the output</strong>. You’ll use it because you built it. The question is whether the business leader opens the link.</p>

<h3 id="bottom-line">Bottom line</h3>

<p>It’s worth the week. But go in with clear eyes: you’re building a <strong>conversation scaffold</strong>, not a solution to misalignment. The tool works if it becomes a shared artifact that both sides feel ownership over. It fails if it becomes “Xian’s graph that he shows in meetings” — at that point you’re still the translator, just with a nicer visual aid.</p>

<p>📝: You asked whether this YAML-based vision tool would actually help align business and tech leaders and whether the 1-week build is worth it, I answered that the tool has moderate standalone value but high value as a conversation scaffold — the real alignment comes from co-editing the shared artifact together, not from the visualization itself, and cautioned against over-building before validating that others engage with it.</p>

<h2 id="question-5">Question</h2>

<p>Yep, the real value is to have a tool that we own, and can tweak to whatever abstraction and workflow we need, with minimal learning curve, so to speak. and yes, having the same vocabulary is one core reason to have such a thing to begin with.</p>

<p>to further flesh out the idea, everything can be in a single yaml file, the directories and files are merely creating scoped namespaces for “project names”</p>

<p>And to KTS (keep things simple), we don’t need an ID, if an “ID” is needed for project, it’s just snake case of the project name.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>- name: Auth Service Rewrite
  type: tech
  size: S
  quarter: Q3
  depends_on: []

- name: Data Platform
  type: tech
  size: XL
  quarter: Q3-Q4
  depends_on: [auth]

- name: Self-Serve Onboarding
  type: business
  size: L
  quarter: Q4
  depends_on: [data_platform, auth]

- name: Mobile App v2
  type: business
  size: M
  quarter: Q1-next
  depends_on: [data_platform]
</code></pre></div></div>

<p>We can even support prefix-ID scheme, e.g. you can write <code class="language-plaintext highlighter-rouge">data</code> to refer to <code class="language-plaintext highlighter-rouge">data platform</code>, basically all IDs are going to be prefix matched. If there are multiple match, or zero match, we would throw an error, as user to correct it. The whole thing is to get out of the way. All the fields are strings, until we assign strong semantic or syntax in code. All can be incrementally evolved, e.g. compiler will tell you error, if some syntax/semantic is tightened.</p>

<p>we can also support type-ahead in nvim, when input in the depends_on list. need to resist over engineering for now. I think the key first step is to have structure up, and a reasonable export, to a spreadsheet, so that TPM can use to organize things.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: Xian is refining the design and I should engage critically with the specific design decisions. Key observations: 1) prefix-matching IDs from snake_cased names is clever but has a known trap, 2) the spreadsheet export insight is very practical — that’s how TPMs actually work, 3) he’s showing good discipline about resisting over-engineering. I should affirm what’s sharp, flag the one real risk in prefix matching, and keep it concise since he’s in flow.</p>

<p>Yes — this is clean. A few reactions:</p>

<h3 id="prefix-matching-is-the-right-ux-with-one-guardrail">Prefix matching is the right UX, with one guardrail</h3>

<p>The scheme works well <em>now</em> with a small number of initiatives. The failure mode is: you add “Data Lake” later, and suddenly <code class="language-plaintext highlighter-rouge">data</code> is ambiguous where it wasn’t before. Your “multiple match → error” rule handles this correctly — but the <strong>error surfaces at read time</strong>, meaning someone’s previously-valid file breaks when a <em>different</em> file introduces a new initiative.</p>

<p>Not a reason to avoid it. Just worth noting that when it happens, the fix is trivial (lengthen the prefix). And the error message should say clearly: <code class="language-plaintext highlighter-rouge">"data" matches both "data_platform" and "data_lake" — be more specific</code>.</p>

<h3 id="spreadsheet-export-is-the-killer-first-output">Spreadsheet export is the killer first output</h3>

<p>This is the right call. Not the graph — the <strong>CSV/spreadsheet</strong>. Because:</p>

<ul>
  <li>TPMs live in spreadsheets. That’s their IDE.</li>
  <li>A spreadsheet with <code class="language-plaintext highlighter-rouge">name | type | size | quarter | depends_on</code> is immediately useful for status tracking, resource planning, sorting, filtering — all the things TPMs do that you don’t want to build UI for.</li>
  <li>The TPM will <em>add columns</em> (status, owner, notes) — that’s fine, the YAML is the structural source of truth, the spreadsheet is the working copy.</li>
</ul>

<p>The graph is for you and leadership. The spreadsheet is for the people who actually run the projects.</p>

<h3 id="what-id-build-in-the-first-sitting">What I’d build in the first sitting</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>company.yaml  →  Python script  →  spreadsheet.csv
                                →  roadmap.svg (graphviz)
                                →  errors.txt (validation)
</code></pre></div></div>

<p>Three outputs, one command. The validation pass (dangling refs, ambiguous prefixes, circular deps) is arguably more valuable than either of the other two, because it catches the “wait, that doesn’t make sense” moments before the meeting instead of during it.</p>

<h3 id="on-all-fields-are-strings">On “all fields are strings”</h3>

<p>This is the right starting posture. The moment you introduce enums or date parsing, you’re making schema decisions that constrain how people express things. Let <code class="language-plaintext highlighter-rouge">quarter: "Q3-Q4"</code> and <code class="language-plaintext highlighter-rouge">quarter: "late Q3"</code> both be valid until you have a reason to parse them. The compiler-tightening-over-time metaphor is exactly right.</p>

<p>You’re designing a tool that <strong>earns its strictness</strong> rather than imposing it upfront. That’s why people will actually use it.</p>

<p>📝: You asked about refining the YAML schema design (prefix IDs, string-typed fields, spreadsheet export), I answered affirming prefix-match with ambiguity guardrail, that CSV export for TPMs is the killer first feature over the graph, and that “all strings until proven otherwise” is the right schema evolution strategy.</p>

<h2 id="question-6">Question</h2>]]></content><author><name></name></author><category term="nil" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Conversation: agentic state management</title><link href="https://xianxu.dev/2026/03/28/agentic_state_management.html" rel="alternate" type="text/html" title="Conversation: agentic state management" /><published>2026-03-28T00:00:00+00:00</published><updated>2026-03-28T00:00:00+00:00</updated><id>https://xianxu.dev/2026/03/28/agentic_state_management</id><content type="html" xml:base="https://xianxu.dev/2026/03/28/agentic_state_management.html"><![CDATA[<style>
h1 { color: #1a365d; border-bottom: 3px solid #4299e1; padding-bottom: 0.3rem; }
h2 { color: #2b6cb0; border-bottom: 2px solid #bee3f8; padding-bottom: 0.3rem; }
h3 { color: #3182ce; border-left: 4px solid #90cdf4; padding-left: 0.8rem; }
.branch-nav { margin: 1rem 0; padding: 0.6rem 1rem; border-radius: 8px; font-size: 0.95em; }
.branch-nav a { text-decoration: none; font-weight: 500; }
.branch-nav a:hover { text-decoration: underline; }
.branch-nav.parent-link { background: linear-gradient(135deg, #fefcbf 0%, #faf089 100%); border-left: 4px solid #d69e2e; color: #744210; }
.branch-nav.parent-link a { color: #975a16; }
.branch-nav.child-link { background: linear-gradient(135deg, #c6f6d5 0%, #9ae6b4 100%); border-left: 4px solid #38a169; color: #22543d; }
.branch-nav.child-link a { color: #276749; }
.branch-inline { color: #2b6cb0; text-decoration: none; border-bottom: 1px dashed #90cdf4; font-weight: 500; }
.branch-inline:hover { color: #1a365d; border-bottom-style: solid; }
</style>

<p>This transcript is generated by <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>.</p>

<h2 id="question">Question</h2>

<p>Let’s think about state management in agentic coding</p>

<p>So in agentic coding, the state of issues to fix are often outside the source control repo. There’s tension when a ticket is too big, then there needs to be incremental building blocks, this typically manifest as some todo.md file. In <a href="https://xianxu.github.io/2026/03/08/ai-coding-workflow.html">my workflow</a>, I keep those todo.md file outside source control as they are ephemeral.</p>

<p>The issue then is I need to either finish that big ticket, can’t really make a check point and come back later.</p>

<p>It seems generally speaking, my intuition is to keep all state in a single place, the issues and the steps we think we need to fix those issues.</p>

<p>This led me to think I should update my AI workflow to be totally repo based, e.g. all tickets should be in some folder, and maybe proposed changes in subfolders. e.g.</p>

<p>issues/</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00001/issue.md
00001/todo.md
00001/tombed -&gt; to indicate this issue are "done"
</code></pre></div></div>

<p>then I guess you can have</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00001/dependencies which just contain issue IDs that we need to work on first before this. 
</code></pre></div></div>

<p>It also seems, instead of running different tasks in parallel, and deal with all the synchronization and conflict, we will just do a single threaded thing, one at a time with based on topological order.</p>

<p>I still need to figure out how to bring parallelism in a structured way, you can always do service separation in different repo, but that goes against my intuition.</p>

<p>thoughts?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is thinking deeply about state management in agentic coding workflows. They’re describing a real problem: when using AI agents to work on code, the “meta-state” (what needs to be done, progress tracking, decomposition of tasks) often lives outside the repo (in Jira, Linear, todo.md files excluded from git, or just in conversation context). This creates problems with checkpointing, resumability, and coordination. They’re proposing a repo-based issue tracking system specifically designed for agentic workflows. Let me think about the key tensions and offer substantive thoughts. Key points to address: 1) The fundamental insight about collocating state with code, 2) The specific folder structure they propose, 3) The single-threaded vs parallel execution problem, 4) What’s missing or could be improved in their model. I should engage critically - not just validate but push back where appropriate and add insights they might not have considered. They seem quite experienced with agentic coding workflows, so I should go deep.</p>

<p>This is a really interesting design space, and I think you’re converging on something important. Let me engage with the specific tensions.</p>

<h3 id="the-core-insight-is-right-colocate-state-with-code">The core insight is right: colocate state with code</h3>

<p>The fundamental problem you’ve identified is that <strong>context is the scarcest resource in agentic coding</strong>, and scattering it across systems (Linear, Jira, local ephemeral files, conversation history) means every agent session starts with a costly reconstruction phase. Making the repo the single source of truth for both code <em>and</em> work state is sound — it gets you versioning, diffing, branchability, and resumability for free.</p>

<h3 id="on-your-proposed-structure">On your proposed structure</h3>

<p>The structure is reasonable, but I’d push on a few things:</p>

<p><strong>The <code class="language-plaintext highlighter-rouge">tombed</code> marker feels like reinventing git.</strong> If an issue is done, that’s really a property of the commit history — the issue folder existed, work was done, and the resolution is the diff. A simpler model might be: issues that exist in <code class="language-plaintext highlighter-rouge">issues/</code> are open; closing an issue means deleting the folder (the history lives in git). Or if you want to keep them around for reference, a simple <code class="language-plaintext highlighter-rouge">status</code> file or frontmatter in <code class="language-plaintext highlighter-rouge">issue.md</code> works. The tombstone file approach creates a state you have to check that’s redundant with what git already tracks.</p>

<p><strong><code class="language-plaintext highlighter-rouge">todo.md</code> is doing a lot of work.</strong> This is really the <em>plan</em> — the decomposition of the issue into steps. I’d be more explicit about its semantics. Is it a checklist the agent checks off? Is it append-only? Can the agent revise it mid-execution? The interesting design question is: <strong>should the agent be allowed to modify the plan, or only execute it?</strong> In practice, plans change as you learn things during implementation. You probably want the agent to be able to re-plan, but you want that to be an explicit, visible action (a diff you can review), not something that silently happens.</p>

<p><strong>Dependencies as issue IDs is good but incomplete.</strong> You’ll likely need to distinguish between:</p>
<ul>
  <li>Hard dependencies (must be merged first, the code won’t work otherwise)</li>
  <li>Soft dependencies (would be easier if done first, but can be worked around)</li>
  <li>Conflicts (touches the same files, can’t be parallelized safely)</li>
</ul>

<h3 id="the-parallelism-problem-is-the-real-hard-problem">The parallelism problem is the real hard problem</h3>

<p>Your instinct toward single-threaded execution is pragmatic, but I think it’s worth being more precise about <em>why</em> parallelism is hard here. It’s not just merge conflicts — it’s <strong>semantic conflicts</strong>. Two agents can modify different files but break each other’s assumptions.</p>

<p>The structured way to bring parallelism back might not be repo separation but rather <strong>scope separation within a repo</strong>:</p>

<ul>
  <li><strong>By module/package boundary</strong>: if your codebase has well-defined module boundaries, two issues touching different modules can safely parallelize. The agent needs to know the dependency graph of <em>modules</em>, not just issues.</li>
  <li><strong>By layer</strong>: an API contract change can be split into “update the interface” (serial, blocking) and “update all callers” (parallelizable per caller).</li>
  <li><strong>Speculative execution</strong>: run N agents in parallel on independent issues, but against the <em>same base commit</em>. Then merge sequentially, and if a later merge has conflicts, re-run that agent against the new base. This is essentially optimistic concurrency control.</li>
</ul>

<p>The speculative execution model is probably the most practical — you accept that some work will be thrown away, but the wall-clock speedup is worth it for issues that <em>usually</em> don’t conflict.</p>

<h3 id="what-might-be-missing">What might be missing</h3>

<p>One thing I don’t see in your model: <strong>the agent’s reasoning trace</strong>. When you come back to a half-finished issue, the todo.md tells you <em>what</em> was planned, but not <em>why</em> certain decisions were made. You might want something like <code class="language-plaintext highlighter-rouge">00001/log.md</code> that captures key decisions and discoveries during execution. This is essentially the agentic equivalent of commit messages, but at a higher level of abstraction.</p>

<p>Also worth considering: who <em>creates</em> the issue decomposition? If a human writes <code class="language-plaintext highlighter-rouge">issue.md</code> and the agent generates <code class="language-plaintext highlighter-rouge">todo.md</code>, that’s a clear contract. If the agent does both, you need a review step before execution begins.</p>

<p>📝: You asked about state management in agentic coding workflows and a repo-based issue tracking structure, I answered with validation of the colocate-state-with-code insight, critiques of tombstone files and todo.md semantics, pushed on dependency types, suggested parallelism via scope separation and speculative/optimistic concurrency, and noted missing elements like reasoning traces and plan authorship contracts.</p>

<h2 id="question-1">Question</h2>

<p>The following are response from ChatGPT, thoughts?</p>

<p>Yes. You are noticing a real systems problem, not just an AI-tooling quirk.</p>

<p>The core tension is:
	1.	the work state exists outside the repo,
	2.	but the code evolution is inside the repo,
	3.	so resumability, auditability, and decomposition drift apart.</p>

<p>That is why large tickets feel “must finish now.” The missing piece is not coding power. It is durable, local, versioned task state.</p>

<p>Your repo-based model is directionally right. I would push it a bit further.</p>

<p>The main idea</p>

<p>Treat issue execution as a state machine stored beside the code.</p>

<p>Not just:
	•	issue statement
	•	todo scratchpad
	•	done marker</p>

<p>But:
	•	problem definition
	•	current understanding
	•	decomposition
	•	dependency edges
	•	execution log
	•	acceptance criteria
	•	resulting commits / PRs / artifacts</p>

<p>That gives the agent somewhere to “stand” when it comes back later.</p>

<p>Why your current pattern breaks</p>

<p>Keeping todo.md outside source control is fine for purely personal scratch work, but for agentic coding it causes three failures:</p>

<ol>
  <li>No checkpointing</li>
</ol>

<p>You lose the intermediate reasoning and plan state that made the next step obvious.</p>

<ol>
  <li>No causal trace</li>
</ol>

<p>Later you cannot tell whether a sub-step was finished, invalidated, or superseded.</p>

<ol>
  <li>No shared substrate</li>
</ol>

<p>If multiple agents or even the same agent across time work on it, they have no canonical source of task truth.</p>

<p>So the result is exactly what you described: large tickets become “finish in one sitting or pay large re-entry cost.”</p>

<p>Your proposed structure is mostly sound</p>

<p>Something like:</p>

<p>issues/
  00001/
    issue.md
    todo.md
    dependencies
    tombstone</p>

<p>is already much better than external ephemeral files.</p>

<p>But I would change a few things.</p>

<p>Minimal fixes</p>

<ol>
  <li>Replace tombed with explicit machine-readable state</li>
</ol>

<p>Do not use file presence as the main status signal. It is too lossy.</p>

<p>Prefer:</p>

<p>issues/00001/status.json</p>

<p>For example:</p>

<p>{
  “state”: “done”,
  “updated_at”: “2026-03-28T10:00:00Z”,
  “blocked_by”: [],
  “depends_on”: [“00003”, “00007”],
  “supersedes”: [],
  “superseded_by”: null
}</p>

<p>Or if you want human-friendly first:</p>

<p>state: done
updated_at: 2026-03-28T10:00:00Z
depends_on:</p>
<ul>
  <li>00003</li>
  <li>00007
blocked_by: []</li>
</ul>

<p>Reason: agents and scripts can consume it deterministically.</p>

<ol>
  <li>Split todo.md into plan vs log</li>
</ol>

<p>One file tends to become a mess of stale intentions plus historical junk.</p>

<p>Use:</p>

<p>plan.md
log.md</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>•	plan.md: current intended next steps only
•	log.md: chronological execution notes, discoveries, failed attempts, decisions
</code></pre></div></div>

<p>This matters a lot. Otherwise stale todos poison future runs.</p>

<ol>
  <li>Put acceptance criteria in their own file</li>
</ol>

<p>Large-ticket drift often comes from the plan replacing the goal.</p>

<p>Use:</p>

<p>acceptance.md</p>

<p>That lets an agent ask: “what counts as done?” instead of hallucinating completion.</p>

<ol>
  <li>Dependencies should be structured, not freeform text</li>
</ol>

<p>A file named dependencies is okay, but make it machine-readable:</p>

<p>depends_on.txt</p>

<p>or YAML/JSON.</p>

<p>Example:</p>

<p>00003
00007
00011</p>

<p>Simple is fine.</p>

<p>A better repo shape</p>

<p>I would probably use:</p>

<p>issues/
  00001/
    issue.md          # problem statement
    acceptance.md     # definition of done
    plan.md           # current decomposition / next steps
    log.md            # execution history, discoveries, decisions
    status.yaml       # state, priority, owner, timestamps
    depends_on.txt    # issue IDs
    artifacts/        # generated diffs, notes, screenshots, benchmarks</p>

<p>Optional:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>patches/
research/
</code></pre></div></div>

<p>if the task requires exploration before code changes.</p>

<p>State model</p>

<p>You want explicit lifecycle states, maybe:
	•	new
	•	ready
	•	in_progress
	•	blocked
	•	review
	•	done
	•	abandoned</p>

<p>That is enough.</p>

<p>For example:</p>

<p>state: in_progress
priority: medium
created_at: 2026-03-28T09:10:00Z
updated_at: 2026-03-28T10:22:00Z
depends_on:</p>
<ul>
  <li>00003
blocked_by: []</li>
</ul>

<p>This gives you resumability without inventing a huge PM system.</p>

<p>Single-threaded topological execution</p>

<p>This is the part where your instinct is strong.</p>

<p>For agentic work, single-threaded by default is correct.</p>

<p>Not because parallelism is bad in principle, but because most repo-local tasks share hidden state:
	•	same codebase assumptions
	•	same branch
	•	overlapping files
	•	overlapping test surfaces
	•	evolving plan context</p>

<p>Parallelism here often creates fake speed and real merge debt.</p>

<p>Topological execution over explicit dependencies is much saner. It gives:
	•	deterministic work order
	•	fewer conflicts
	•	clearer blocked states
	•	easier restart after interruption</p>

<p>So yes, “one at a time in dependency order” is a good default operating model.</p>

<p>Where parallelism actually belongs</p>

<p>Not “multiple agents randomly touching the same repo.”</p>

<p>Parallelism works when units are genuinely separable along one of these dimensions:</p>

<ol>
  <li>File/ownership disjointness</li>
</ol>

<p>Two tasks provably touch different bounded areas.</p>

<ol>
  <li>Interface-first decomposition</li>
</ol>

<p>One task defines contract/spec/tests, another implements against that fixed interface.</p>

<ol>
  <li>Research vs execution split</li>
</ol>

<p>One agent investigates and writes structured findings, another applies code changes.</p>

<ol>
  <li>Branch-isolated speculative work</li>
</ol>

<p>Parallel branches only for alternatives, not concurrent mutation of the same plan.</p>

<p>So I would not bring parallelism in as “do many tickets at once.” I would bring it in as controlled fork-join under explicit boundaries.</p>

<p>The deeper issue: issue systems are not enough</p>

<p>You also correctly noticed that “issues” and “steps to solve issues” want to live together.</p>

<p>Classic ticket systems separate:
	•	ticket = external tracker
	•	code = repo
	•	work log = chat / notes / memory
	•	decomposition = hidden scratchpad</p>

<p>That architecture is bad for AI workflows.</p>

<p>AI agents benefit from co-locating operational state with the code. In other words, the repo becomes not just source code, but the local working memory substrate.</p>

<p>That is a meaningful shift.</p>

<p>But there is one objection</p>

<p>Knowledgeable people will object that this pollutes the repo with ephemeral planning noise.</p>

<p>That objection is valid.</p>

<p>There are really three possible answers:</p>

<p>Option A: Commit issues/ into the repo</p>

<p>Pros:
	•	full auditability
	•	perfect resumability
	•	easy collaboration</p>

<p>Cons:
	•	lots of operational noise
	•	history clutter
	•	people may dislike mixing planning with product code</p>

<p>Option B: Keep issues/ in repo but gitignored</p>

<p>Pros:
	•	local durable state
	•	no repo pollution
	•	agents still get a canonical on-disk structure</p>

<p>Cons:
	•	not shared across machines unless separately synced
	•	weaker team-level coordination
	•	state can silently diverge</p>

<p>Option C: Separate “worktree state repo”</p>

<p>A second repo colocated with the main repo, e.g.:</p>

<p>myproject/
myproject.ops/</p>

<p>Pros:
	•	structured durable state
	•	versioned independently
	•	no pollution of main source tree
	•	sharable if desired</p>

<p>Cons:
	•	another moving part
	•	slightly weaker coupling to code unless tooling is good</p>

<p>I suspect Option C is probably your best long-term architecture.</p>

<p>It preserves your intuition that state should be repo-based, but avoids contaminating the product repo with ephemeral execution debris.</p>

<p>This is the model I would recommend</p>

<p>Use a separate operational repo or sibling directory:</p>

<p>repo/
repo.work/
  issues/
    00001/
      issue.md
      acceptance.md
      plan.md
      log.md
      status.yaml
      depends_on.txt
      links.yaml</p>

<p>Where links.yaml might contain:</p>

<p>code_repo: ../repo
branch: issue/00001-fix-auth-timeout
commits:</p>
<ul>
  <li>abc1234
pr: null</li>
</ul>

<p>That gives you durable work state, resumability, and clean separation.</p>

<p>On decomposition</p>

<p>You are also converging on an important point: a “big ticket” should usually become a DAG of smaller issues, not a single issue with a long todo list.</p>

<p>So instead of:
	•	00001 with a giant todo.md</p>

<p>prefer:
	•	00001 parent issue
	•	00002 extract parser
	•	00003 add failing tests
	•	00004 refactor interface
	•	00005 migrate callers</p>

<p>with explicit dependencies.</p>

<p>Then the parent issue becomes mostly a coordination node.</p>

<p>That is much more robust than giant freeform todo documents.</p>

<p>A practical heuristic</p>

<p>Use a new issue node when a step has any of these properties:
	•	can be committed independently
	•	has its own acceptance criteria
	•	may block or unblock other work
	•	could be delegated
	•	could be abandoned without invalidating the entire ticket</p>

<p>If not, keep it as a checklist item inside plan.md.</p>

<p>That gives a sensible boundary between “sub-issue” and “todo item.”</p>

<p>How to handle resuming later</p>

<p>When an agent starts work, it should only need to read:
	•	issue.md
	•	acceptance.md
	•	status.yaml
	•	depends_on.txt
	•	plan.md
	•	tail of log.md</p>

<p>That is the minimal context pack.</p>

<p>If it needs more, it can inspect artifacts and code. This keeps restart cheap.</p>

<p>My blunt take</p>

<p>Your current instinct is right, but I would not say “all tickets should be in the source repo” as a universal rule.</p>

<p>The stronger rule is:</p>

<p>all execution state should be on disk, structured, durable, and near the codebase.</p>

<p>Whether that lives inside the main repo, in a gitignored subtree, or in a sibling ops repo is a secondary design choice.</p>

<p>For most serious agentic coding workflows, I would choose:
	•	code repo: product source
	•	work repo / ops repo: issue DAG, plans, logs, artifacts
	•	default execution: single-threaded topological order
	•	parallelism: only through explicit fork-join boundaries</p>

<p>That is coherent and scales much better than ad hoc todo.md scratchpads.</p>

<p>If you want, I can sketch a concrete file schema plus a tiny CLI design for creating issues, marking state transitions, and selecting the next runnable task.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is showing me a response from another LLM and asking for my thoughts. This is a meta-discussion about comparing perspectives. I should be honest about where I agree, where I disagree, and where I think the other LLM’s response has strengths or weaknesses compared to my own earlier response. Let me think critically.</p>

<p>Key things to evaluate:</p>
<ol>
  <li>The state machine / lifecycle model (new, ready, in_progress, blocked, review, done, abandoned)</li>
  <li>The plan.md vs log.md split</li>
  <li>The Option A/B/C analysis (in-repo, gitignored, separate repo)</li>
  <li>The recommendation of Option C (separate ops repo)</li>
  <li>The DAG decomposition of big tickets into sub-issues</li>
  <li>The structured files (status.yaml, acceptance.md, etc.)</li>
  <li>The “minimal context pack” for resuming</li>
</ol>

<p>My honest assessment: this response is thorough but I think it over-engineers the problem significantly. It’s the classic LLM pattern of taking a good idea and adding layers of structure until it becomes a project management system. Let me be specific about what I think is good and what’s over-designed.</p>

<p>The separate ops repo (Option C) recommendation is interesting but I actually think it works against the user’s original insight — colocating everything. The whole point was reducing context fragmentation.</p>

<p>The status.yaml, acceptance.md, links.yaml etc. — this is creeping toward reinventing Jira on disk. The user’s original simple structure was better.</p>

<p>The DAG decomposition point is genuinely good.</p>

<p>The plan.md vs log.md split is reasonable but I should note that in practice, for agentic coding, the log is often the conversation history itself.</p>

<p>I should be direct and critical, since the user is asking for my honest assessment.</p>

<p>This is a thoughtful response, but I have some real disagreements with it. Let me be direct.</p>

<h3 id="where-its-strong">Where it’s strong</h3>

<p><strong>The plan vs. log split is genuinely good.</strong> I made a similar point about separating the reasoning trace from the plan, but this response names it more cleanly. Stale todos poisoning future runs is a real failure mode — I’ve seen agents re-attempt steps that were already tried and abandoned because the todo.md didn’t distinguish “planned” from “tried and failed.”</p>

<p><strong>The DAG decomposition heuristic is the best part.</strong> The rule of thumb — “create a sub-issue when a step can be committed independently, has its own acceptance criteria, could be delegated, or could be abandoned without invalidating the parent” — is practical and correct. That’s the kind of guidance that actually changes how you work.</p>

<p><strong>The “minimal context pack” concept is useful.</strong> Naming the exact set of files an agent reads on resume is a good design discipline.</p>

<h3 id="where-i-think-it-goes-wrong">Where I think it goes wrong</h3>

<p><strong>It over-engineers the structure.</strong> <code class="language-plaintext highlighter-rouge">status.yaml</code>, <code class="language-plaintext highlighter-rouge">acceptance.md</code>, <code class="language-plaintext highlighter-rouge">links.yaml</code>, <code class="language-plaintext highlighter-rouge">artifacts/</code>, <code class="language-plaintext highlighter-rouge">patches/</code>, <code class="language-plaintext highlighter-rouge">research/</code> — this is drifting toward building a project management system on the filesystem. The user’s original structure was 3-4 files. This response inflated it to 7+ files per issue with YAML schemas. In practice, that structure becomes a maintenance burden <em>itself</em>, and the agent spends tokens reading and updating metadata instead of writing code.</p>

<p>The pragmatic question is: <strong>who maintains all this metadata?</strong> If the agent does, you’re spending agent cycles on bookkeeping. If the human does, you’ve just recreated Jira with worse UX.</p>

<p><strong>Option C (separate ops repo) contradicts the core insight.</strong> The user’s whole point was that <em>fragmentation of state across locations is the problem</em>. A sibling <code class="language-plaintext highlighter-rouge">repo.work/</code> directory is just a more organized version of the same fragmentation. Now you need to keep two repos in sync, worry about which commit in the ops repo corresponds to which commit in the code repo, and the <code class="language-plaintext highlighter-rouge">links.yaml</code> with branch names and commit hashes is exactly the kind of cross-referencing that rots immediately.</p>

<p>I’d argue <strong>Option A (committed to the repo) is actually correct</strong> for agentic workflows, with one nuance: use a branch convention. The <code class="language-plaintext highlighter-rouge">issues/</code> folder on <code class="language-plaintext highlighter-rouge">main</code> contains open issues. Completed issues get cleaned up as part of the closing PR. The full history lives in git. This is simpler and more robust than any of the three options as presented.</p>

<p><strong>The lifecycle states are overkill.</strong> <code class="language-plaintext highlighter-rouge">new → ready → in_progress → blocked → review → done → abandoned</code> is a workflow engine. For single-threaded topological execution — which both the user and this response agree is the right default — you really only need three states: <strong>open, blocked, done</strong>. The scheduler picks the first open issue whose dependencies are all done. That’s it. Adding <code class="language-plaintext highlighter-rouge">ready</code> vs <code class="language-plaintext highlighter-rouge">in_progress</code> vs <code class="language-plaintext highlighter-rouge">review</code> implies a multi-actor workflow that doesn’t exist yet.</p>

<h3 id="the-deeper-thing-it-misses">The deeper thing it misses</h3>

<p>This response treats the issue structure as a <em>data format</em> problem and solves it with more files and schemas. But the harder problem is <strong>what the agent does when the plan is wrong.</strong></p>

<p>An agent starts executing <code class="language-plaintext highlighter-rouge">plan.md</code>, discovers the approach won’t work at step 3 of 7. What happens? The response says the agent should be able to re-plan, but doesn’t address the mechanics. I think this is where the real design work is:</p>

<ul>
  <li>Should the agent stop and create new sub-issues in the DAG?</li>
  <li>Should it modify <code class="language-plaintext highlighter-rouge">plan.md</code> in place and note the change in <code class="language-plaintext highlighter-rouge">log.md</code>?</li>
  <li>Should it escalate to the human?</li>
</ul>

<p>The <em>protocol</em> for plan revision matters more than the file layout. A simple structure with a clear revision protocol will outperform an elaborate structure with no protocol.</p>

<h3 id="my-blunt-take-on-their-blunt-take">My blunt take on their blunt take</h3>

<p>The response says <em>“all execution state should be on disk, structured, durable, and near the codebase”</em> — that’s right. But then it recommends putting it in a <em>different repo</em>, which is “near” only in the filesystem sense. The coupling between work state and code state is semantic, not spatial. Git branches already give you a mechanism to associate work-in-progress state with code changes without polluting <code class="language-plaintext highlighter-rouge">main</code>. Use that.</p>

<p>Keep the structure minimal. Start with what the user originally proposed. Add files only when you hit a concrete failure mode, not preemptively.</p>

<p>📝: You asked for my critical assessment of another LLM’s response on repo-based issue management for agentic coding. I answered that it correctly identifies plan/log split and DAG decomposition as valuable, but over-engineers the file structure toward a filesystem PM system, the separate ops repo recommendation contradicts the core insight of colocation, the lifecycle states are overkill for single-threaded execution, and the harder unsolved problem is the agent’s plan-revision protocol rather than the data format.</p>

<h2 id="question-2">Question</h2>

<p>Alright, based on what you understand of my intention, propose a simple but effective structure.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user wants a practical, minimal structure for repo-based agentic coding state management. Key constraints from the conversation: 1) Everything in the repo (not separate ops repo), 2) Simple - they started with issue.md, todo.md, tombstone, 3) Single-threaded topological execution as default, 4) Must support resumability (checkpointing and coming back later), 5) Must support decomposition of big tickets into sub-issues (DAG), 6) They use agentic coding workflows where an AI agent picks up and executes work. I should propose something close to their original intuition but incorporating the good ideas: plan vs log split, DAG decomposition, machine-readable dependencies. I should NOT over-engineer with status.yaml, acceptance.md, links.yaml etc. I should be opinionated and concrete. Let me think about what’s truly minimal: - issue.md: problem statement + what done looks like (no separate acceptance.md, that’s over-engineering) - plan.md: current steps, agent can revise but the diff is visible in git - log.md: discoveries, failed attempts, key decisions - deps: just issue IDs, one per line - done marker: I argued against tombstone files earlier. But actually, for simple scripting/agent consumption, a status indicator is useful. Simplest: the agent reads a single line from a <code class="language-plaintext highlighter-rouge">status</code> file. Or… just use presence in a convention. Hmm. Let me think about what the scheduler needs: it needs to know which issues are open and which are done, and dependency order. Simplest: a <code class="language-plaintext highlighter-rouge">status</code> file with one word in it: open, blocked, done. That’s 3 states as I recommended. Actually even simpler: issues/ folder contains open issues. A done issue gets its folder moved or deleted. But that loses history… no, git keeps it. But the agent needs to check deps - if a dep issue folder doesn’t exist, it’s done. That’s actually elegant but fragile. OK, I think a simple <code class="language-plaintext highlighter-rouge">status</code> file with one word is the right balance. Let me also think about the plan revision protocol since I criticized the other response for not addressing it. And the DAG decomposition - when should a step become a sub-issue. Let me draft the structure and conventions.</p>

<p>Here’s what I’d propose — biased toward minimalism, with clear conventions over configuration.</p>

<h3 id="structure">Structure</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>issues/
  0001/
    issue.md
    plan.md
    log.md
    deps
    status
</code></pre></div></div>

<p>That’s it. Five files max per issue.</p>

<h3 id="file-semantics">File semantics</h3>

<p><strong><code class="language-plaintext highlighter-rouge">issue.md</code></strong> — The problem statement <em>and</em> what done looks like. One file. Keep them together because the definition of done is meaningless without the problem context. Format:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># Auth tokens not refreshed on 401</span>

When the API returns 401, the client crashes instead of refreshing.

<span class="gu">## Done when</span>
<span class="p">
-</span> Client intercepts 401, refreshes token, retries original request
<span class="p">-</span> Test covers the refresh-and-retry path
<span class="p">-</span> No other callers affected
</code></pre></div></div>

<p><strong><code class="language-plaintext highlighter-rouge">plan.md</code></strong> — The current execution plan. A checklist. The agent owns this file and can revise it. Revisions are visible in git diff, which is your audit trail. No separate versioning needed.</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">-</span> [x] Identify all call sites that hit authenticated endpoints
<span class="p">-</span> [x] Extract token refresh logic from login flow
<span class="p">-</span> [ ] Add retry interceptor to HTTP client
<span class="p">-</span> [ ] Write test for 401 → refresh → retry
<span class="p">-</span> [ ] Verify no regression in existing auth tests
</code></pre></div></div>

<p><strong><code class="language-plaintext highlighter-rouge">&lt;a href="/2026/03/28/further_on_the_ideal_of_logmd.html" class="branch-inline"&gt;log.md&lt;/a&gt;</code></strong> — Append-only. The agent writes here when something unexpected happens: a failed approach, a discovery, a decision that deviates from the plan. This is what makes resumability work — the next session reads this to understand <em>why</em> the plan looks the way it does.</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gu">## 2025-06-28 14:30</span>
Tried wrapping fetch() directly but the interceptor pattern 
doesn't work because we use a custom transport layer. 
Need to hook in at the transport level instead.
Revising plan step 3.
</code></pre></div></div>

<p><strong><code class="language-plaintext highlighter-rouge">deps</code></strong> — One issue ID per line. Empty file or absent means no dependencies.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0003
0007
</code></pre></div></div>

<p><strong><code class="language-plaintext highlighter-rouge">status</code></strong> — A single word on a single line. Three possible values:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>open
</code></pre></div></div>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">open</code></strong> — ready to work on (or being worked on; in single-threaded mode, this distinction doesn’t matter)</li>
  <li><strong><code class="language-plaintext highlighter-rouge">blocked</code></strong> — can’t proceed, and the reason isn’t just deps (agent got stuck, needs human input, external blocker)</li>
  <li><strong><code class="language-plaintext highlighter-rouge">done</code></strong> — finished</li>
</ul>

<p>That’s the entire lifecycle. The scheduler logic is trivial:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pick the first open issue whose deps are all done
</code></pre></div></div>

<h3 id="dag-decomposition">DAG decomposition</h3>

<p>When a step in <code class="language-plaintext highlighter-rouge">plan.md</code> gets big enough that it could be committed, tested, and reviewed independently — pull it out into its own issue. The parent issue’s plan then references the child:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">-</span> [x] Extract token refresh logic → issue 0008
<span class="p">-</span> [ ] Add retry interceptor → issue 0009
<span class="p">-</span> [ ] Integration test
</code></pre></div></div>

<p>The parent’s <code class="language-plaintext highlighter-rouge">deps</code> doesn’t list its children (that would be circular). Instead, the parent stays <code class="language-plaintext highlighter-rouge">open</code> and its plan tracks children by reference. The parent is done when its plan is fully checked off.</p>

<p>Children <em>can</em> depend on each other via their own <code class="language-plaintext highlighter-rouge">deps</code> files. That’s where your topological ordering comes from.</p>

<h3 id="plan-revision-protocol">Plan revision protocol</h3>

<p>This is the part that matters more than file layout. Three rules:</p>

<ol>
  <li>
    <p><strong>The agent can revise <code class="language-plaintext highlighter-rouge">plan.md</code> freely</strong>, but must write an entry in <code class="language-plaintext highlighter-rouge">log.md</code> explaining why before making the change. This makes plan drift visible and reasoned.</p>
  </li>
  <li>
    <p><strong>If the agent discovers the issue itself is wrong</strong> (wrong problem statement, wrong scope), it sets status to <code class="language-plaintext highlighter-rouge">blocked</code> and writes in the log what needs human attention. It does not silently redefine the issue.</p>
  </li>
  <li>
    <p><strong>If a plan step turns out to need more than ~2-3 commits</strong>, the agent decomposes it into a sub-issue rather than inflating the current plan. This keeps each issue’s plan short enough to hold in context.</p>
  </li>
</ol>

<h3 id="committing-strategy">Committing strategy</h3>

<p>Commit <code class="language-plaintext highlighter-rouge">issues/</code> to the repo, on whatever branch you’re working on. When an issue is done and merged, you can either:</p>

<ul>
  <li>Delete the folder in the merge commit (clean main, history in git)</li>
  <li>Leave it (simple, searchable, slightly noisy)</li>
</ul>

<p>I’d start with deleting on merge. If you miss having the history around, switch to leaving them.</p>

<h3 id="a-minimal-scheduler">A minimal scheduler</h3>

<p>This is almost trivially implementable:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># next-issue: prints the next issue to work on</span>

<span class="k">for </span><span class="nb">dir </span><span class="k">in </span>issues/<span class="k">*</span>/<span class="p">;</span> <span class="k">do
  </span><span class="nb">id</span><span class="o">=</span><span class="si">$(</span><span class="nb">basename</span> <span class="s2">"</span><span class="nv">$dir</span><span class="s2">"</span><span class="si">)</span>
  <span class="nv">status</span><span class="o">=</span><span class="si">$(</span><span class="nb">cat</span> <span class="s2">"</span><span class="nv">$dir</span><span class="s2">/status"</span> 2&gt;/dev/null <span class="o">||</span> <span class="nb">echo</span> <span class="s2">"missing"</span><span class="si">)</span>
  
  <span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$status</span><span class="s2">"</span> <span class="o">!=</span> <span class="s2">"open"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    continue
  fi</span>

  <span class="c"># Check all deps are done</span>
  <span class="nv">all_done</span><span class="o">=</span><span class="nb">true
  </span><span class="k">if</span> <span class="o">[</span> <span class="nt">-f</span> <span class="s2">"</span><span class="nv">$dir</span><span class="s2">/deps"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    while </span><span class="nv">IFS</span><span class="o">=</span> <span class="nb">read</span> <span class="nt">-r</span> dep<span class="p">;</span> <span class="k">do
      </span><span class="nv">dep</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$dep</span><span class="s2">"</span> | <span class="nb">tr</span> <span class="nt">-d</span> <span class="s1">'[:space:]'</span><span class="si">)</span>
      <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$dep</span><span class="s2">"</span> <span class="o">]</span> <span class="o">&amp;&amp;</span> <span class="k">continue
      </span><span class="nv">dep_status</span><span class="o">=</span><span class="si">$(</span><span class="nb">cat</span> <span class="s2">"issues/</span><span class="nv">$dep</span><span class="s2">/status"</span> 2&gt;/dev/null <span class="o">||</span> <span class="nb">echo</span> <span class="s2">"missing"</span><span class="si">)</span>
      <span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$dep_status</span><span class="s2">"</span> <span class="o">!=</span> <span class="s2">"done"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        </span><span class="nv">all_done</span><span class="o">=</span><span class="nb">false
        break
      </span><span class="k">fi
    done</span> &lt; <span class="s2">"</span><span class="nv">$dir</span><span class="s2">/deps"</span>
  <span class="k">fi

  if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$all_done</span><span class="s2">"</span> <span class="o">=</span> <span class="nb">true</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$id</span><span class="s2">"</span>
    <span class="nb">exit </span>0
  <span class="k">fi
done

</span><span class="nb">echo</span> <span class="s2">"No runnable issues found."</span> <span class="o">&gt;</span>&amp;2
<span class="nb">exit </span>1
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Usage</span>
<span class="nb">chmod</span> +x next-issue
./next-issue
<span class="c"># Output: 0001</span>
</code></pre></div></div>

<h3 id="what-i-deliberately-left-out">What I deliberately left out</h3>

<ul>
  <li><strong>No YAML/JSON metadata.</strong> Plain text files are easier for both humans and agents to read and write. You don’t need structured data until you’re building dashboards.</li>
  <li><strong>No separate acceptance criteria file.</strong> It’s three lines in <code class="language-plaintext highlighter-rouge">issue.md</code>. Another file is another thing to keep in sync.</li>
  <li><strong>No artifacts folder.</strong> If the agent produces research or patches, they’re either commits or they go in the log.</li>
  <li><strong>No priority field.</strong> In single-threaded topological execution, priority is determined by dependency order and the sequence you create issues in. If you need explicit priority later, add a single number to the first line of <code class="language-plaintext highlighter-rouge">status</code> (e.g., <code class="language-plaintext highlighter-rouge">open 1</code>).</li>
</ul>

<p>Start here. Add structure only when you hit a specific failure mode that forces it.</p>

<p>📝: You asked me to propose a concrete minimal structure for repo-based agentic coding state management, I answered with a 5-file-per-issue structure (issue.md, plan.md, log.md, deps, status), three lifecycle states (open/blocked/done), a plan revision protocol (log before revise, block if issue is wrong, decompose if step is too big), DAG decomposition convention, and a trivial bash scheduler — deliberately excluding YAML metadata, separate acceptance files, artifacts folders, and complex lifecycle states.</p>

<h2 id="question-3">Question</h2>

<p>If I move towards that (e.g. repo based issue management), then I’ll need some base level tool to help manage them.</p>

<p>I already have a tool called parley.nvim. I gues I’ll just extend that, a subsystem there would be an issue management system based on a folder of .md and .yaml files. Given how much you know about <a href="https://github.com/xianxu/parley.nvim">parley</a>, read the specs folder for parley, propose an issue management subsystem, following the same “Keep It Simple” philosophy: issue management system is just a folder of markdown files.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user wants me to read the specs folder of their parley.nvim plugin to understand its architecture and philosophy, then propose an issue management subsystem that fits naturally within it. Let me fetch the repo and specs first.Let me read the specs index and a few key spec files to understand the architecture and philosophy.🧠: I can see the parley.nvim structure now. Key patterns I observe: 1) Everything is markdown/plain files (chats are markdown with emoji prefixes like ## Question</p>

<p>and 🤖:), 2) There’s already a notes system with year/month/week organization, 3) Specs are organized in a clean hierarchy, 4) The philosophy is “plain markdown files you can edit anytime” and “keyboard-first”, 5) There’s already a chat_dir and notes_dir concept, 6) There’s a picker system (telescope-based) for finding/navigating, 7) The plugin already has concepts of lifecycle, templates, and finder. The user wants me to propose an issue management subsystem that fits naturally within parley’s existing patterns. I should propose something that mirrors how chats and notes already work — markdown files, simple directory structure, telescope pickers, minimal metadata. The key insight is that parley already treats chats as “a folder of markdown files with a header” — issues should follow the exact same pattern. I should leverage existing parley infrastructure: pickers, file references (@@), the template system, and the notes-like directory structure. Let me propose something concrete.</p>

<p>Based on what I can see from the <a href="https://github.com/xianxu/parley.nvim/blob/main/specs/index.md">specs index</a> and the README, here’s what I’d propose. The key design principle: <strong>issues should work exactly like chats and notes already work in parley</strong> — markdown files with frontmatter, navigable via pickers, editable as normal buffers.</p>

<h3 id="design-issues-as-markdown-with-yaml-frontmatter">Design: Issues as Markdown with YAML Frontmatter</h3>

<p>Parley already has the pattern: chats are markdown files with a header block. Notes have templates and a directory structure. Issues should be the same thing — no new abstractions, just a new <code class="language-plaintext highlighter-rouge">issues_dir</code>.</p>

<h3 id="directory-layout">Directory layout</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{issues_dir}/
  0001-auth-token-refresh.md
  0002-extract-parser.md
  0003-add-retry-tests.md
</code></pre></div></div>

<p>That’s it. <strong>One file per issue.</strong> No subdirectories per issue. The plan, log, status, and deps all live inside the single markdown file as sections. This mirrors how a parley chat is one file with all state in it — the header plus the transcript.</p>

<h3 id="issue-file-format">Issue file format</h3>

<p>Following parley’s convention of frontmatter + structured content:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">status</span><span class="pi">:</span> <span class="s">open</span>
<span class="na">deps</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">0002</span><span class="pi">]</span>
<span class="na">created</span><span class="pi">:</span> <span class="s">2026-03-28</span>
<span class="na">updated</span><span class="pi">:</span> <span class="s">2026-03-28</span>
<span class="nn">---</span>

<span class="gh"># Auth tokens not refreshed on 401</span>

When the API returns 401, the client crashes instead of refreshing.

<span class="gu">## Done when</span>
<span class="p">
-</span> Client intercepts 401, refreshes token, retries original request
<span class="p">-</span> Test covers the refresh-and-retry path

<span class="gu">## Plan</span>
<span class="p">
-</span> [x] Identify all call sites that hit authenticated endpoints
<span class="p">-</span> [x] Extract token refresh logic from login flow → issue 0002
<span class="p">-</span> [ ] Add retry interceptor to HTTP client
<span class="p">-</span> [ ] Write test for 401 → refresh → retry

<span class="gu">## Log</span>

<span class="gu">### 2026-03-28 14:30</span>
Tried wrapping fetch() directly but interceptor pattern doesn't
work with custom transport layer. Revising plan step 3.

<span class="gu">### 2026-03-28 15:10</span>
Transport-level hook works. Proceeding with retry logic.
</code></pre></div></div>

<h3 id="why-single-file-not-a-folder-per-issue">Why single file, not a folder-per-issue</h3>

<p>Three reasons that align with parley’s philosophy:</p>

<ol>
  <li>
    <p><strong>Parley chats are single files.</strong> A chat’s entire state — header, system prompt reference, all turns — is one <code class="language-plaintext highlighter-rouge">.md</code>. Issues should follow the same pattern. You already have muscle memory for navigating a structured markdown file in nvim.</p>
  </li>
  <li>
    <p><strong>Telescope picker works on files, not directories.</strong> The existing <a href="https://github.com/xianxu/parley.nvim/blob/main/specs/ui/pickers.md">picker infrastructure</a> can list issues the same way it lists chats or notes — <code class="language-plaintext highlighter-rouge">rg</code> or <code class="language-plaintext highlighter-rouge">fd</code> over a directory, show filename + maybe grep the status from frontmatter. No new picker abstractions needed.</p>
  </li>
  <li>
    <p><strong><code class="language-plaintext highlighter-rouge">@@</code> file references already work.</strong> In a parley chat, you can reference <code class="language-plaintext highlighter-rouge">@@issues/0001-auth-token-refresh.md</code> to pull an issue into context when discussing it with an LLM. One file = one reference. If issues were folders, you’d need to reference multiple files or invent a new aggregation syntax.</p>
  </li>
</ol>

<h3 id="minimal-subsystem-spec">Minimal subsystem spec</h3>

<p>Following the pattern of existing parley specs, here’s what the implementation would need:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>specs/issues/structure.md    -- format, frontmatter schema, sections
specs/issues/lifecycle.md    -- status transitions, scheduler logic
specs/issues/templates.md    -- template for new issues
</code></pre></div></div>

<h3 id="config-additions">Config additions</h3>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span><span class="p">(</span><span class="s2">"parley"</span><span class="p">).</span><span class="n">setup</span><span class="p">({</span>
  <span class="c1">-- ...existing config...</span>
  <span class="n">issues_dir</span> <span class="o">=</span> <span class="n">vim</span><span class="p">.</span><span class="n">fn</span><span class="p">.</span><span class="n">getcwd</span><span class="p">()</span> <span class="o">..</span> <span class="s2">"/issues"</span><span class="p">,</span>  <span class="c1">-- repo-local by default</span>
  <span class="n">issue_template</span> <span class="o">=</span> <span class="s2">"default"</span><span class="p">,</span>
<span class="p">})</span>
</code></pre></div></div>

<p>Note: unlike <code class="language-plaintext highlighter-rouge">chat_dir</code> and <code class="language-plaintext highlighter-rouge">notes_dir</code> which are global user directories, <code class="language-plaintext highlighter-rouge">issues_dir</code> defaults to <strong>cwd-relative</strong> because issues belong to the repo, not the user.</p>

<h3 id="commands-and-keybindings">Commands and keybindings</h3>

<p>Following parley’s <code class="language-plaintext highlighter-rouge">&lt;C-g&gt;</code> prefix convention:</p>

<table>
  <thead>
    <tr>
      <th>Command</th>
      <th>Binding</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">:ParleyIssueNew</code></td>
      <td><code class="language-plaintext highlighter-rouge">&lt;C-i&gt;c</code></td>
      <td>Create issue from template, auto-increment ID</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">:ParleyIssueFinder</code></td>
      <td><code class="language-plaintext highlighter-rouge">&lt;C-i&gt;f</code></td>
      <td>Telescope picker over issues, preview, open</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">:ParleyIssueNext</code></td>
      <td><code class="language-plaintext highlighter-rouge">&lt;C-i&gt;x</code></td>
      <td>Open the next runnable issue (topo sort)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">:ParleyIssueStatus</code></td>
      <td><code class="language-plaintext highlighter-rouge">&lt;C-i&gt;s</code></td>
      <td>Cycle status in frontmatter (open → blocked → done)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">:ParleyIssueDecompose</code></td>
      <td><code class="language-plaintext highlighter-rouge">&lt;C-i&gt;d</code></td>
      <td>Create child issue, insert dep link in current plan</td>
    </tr>
  </tbody>
</table>

<p>The <code class="language-plaintext highlighter-rouge">IssueFinder</code> should show status in the picker line, e.g.:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[open]    0001-auth-token-refresh
[open]    0003-add-retry-tests
[blocked] 0005-migrate-callers
[done]    0002-extract-parser
</code></pre></div></div>

<p>Sort order should be by default topological depending on dependency, otherwise old to new.</p>

<p>With filtering — default to showing <code class="language-plaintext highlighter-rouge">open</code> + <code class="language-plaintext highlighter-rouge">blocked</code>, toggle to show all.</p>

<h3 id="scheduler-logic-the-issuenext-command">Scheduler logic (the <code class="language-plaintext highlighter-rouge">IssueNext</code> command)</h3>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Pseudocode for next runnable issue</span>
<span class="k">function</span> <span class="nc">M</span><span class="p">.</span><span class="nf">next_issue</span><span class="p">(</span><span class="n">issues_dir</span><span class="p">)</span>
  <span class="kd">local</span> <span class="n">issues</span> <span class="o">=</span> <span class="n">parse_all_issues</span><span class="p">(</span><span class="n">issues_dir</span><span class="p">)</span>
  <span class="kd">local</span> <span class="n">done_ids</span> <span class="o">=</span> <span class="n">set</span> <span class="n">of</span> <span class="n">issues</span> <span class="n">where</span> <span class="n">status</span> <span class="o">==</span> <span class="s2">"done"</span>

  <span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">issue</span> <span class="k">in</span> <span class="nb">ipairs</span><span class="p">(</span><span class="n">sorted_by_id</span><span class="p">(</span><span class="n">issues</span><span class="p">))</span> <span class="k">do</span>
    <span class="k">if</span> <span class="n">issue</span><span class="p">.</span><span class="n">status</span> <span class="o">==</span> <span class="s2">"open"</span> <span class="k">then</span>
      <span class="kd">local</span> <span class="n">all_deps_done</span> <span class="o">=</span> <span class="kc">true</span>
      <span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">dep</span> <span class="k">in</span> <span class="nb">ipairs</span><span class="p">(</span><span class="n">issue</span><span class="p">.</span><span class="n">deps</span><span class="p">)</span> <span class="k">do</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">done_ids</span><span class="p">[</span><span class="n">dep</span><span class="p">]</span> <span class="k">then</span>
          <span class="n">all_deps_done</span> <span class="o">=</span> <span class="kc">false</span>
          <span class="k">break</span>
        <span class="k">end</span>
      <span class="k">end</span>
      <span class="k">if</span> <span class="n">all_deps_done</span> <span class="k">then</span>
        <span class="k">return</span> <span class="n">issue</span>
      <span class="k">end</span>
    <span class="k">end</span>
  <span class="k">end</span>
  <span class="k">return</span> <span class="kc">nil</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Parsing frontmatter is trivial — read the file, extract the YAML block between <code class="language-plaintext highlighter-rouge">---</code> markers, pull <code class="language-plaintext highlighter-rouge">status</code> and <code class="language-plaintext highlighter-rouge">deps</code>. Parley already parses chat headers, so this pattern exists in the codebase.</p>

<h3 id="the-issuedecompose-command">The <code class="language-plaintext highlighter-rouge">IssueDecompose</code> command</h3>

<p>This is the one non-obvious command. When your cursor is on a plan line like:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">-</span> [ ] Extract token refresh logic from login flow
</code></pre></div></div>

<p>Running <code class="language-plaintext highlighter-rouge">:ParleyIssueDecompose</code> should:</p>

<ol>
  <li>Create a new issue file <code class="language-plaintext highlighter-rouge">0002-extract-token-refresh.md</code> (prompts for slug)</li>
  <li>Copy the line as the issue title</li>
  <li>Add the new ID to the parent’s <code class="language-plaintext highlighter-rouge">deps</code> in frontmatter</li>
  <li>Replace the plan line with: <code class="language-plaintext highlighter-rouge">- [ ] Extract token refresh logic → 0002</code></li>
  <li>Open the new issue for editing</li>
</ol>

<p>This makes DAG decomposition a single keystroke from within the plan.</p>

<h3 id="template">Template</h3>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">status</span><span class="pi">:</span> <span class="s">open</span>
<span class="na">deps</span><span class="pi">:</span> <span class="pi">[]</span>
<span class="na">created</span><span class="pi">:</span> 
<span class="na">updated</span><span class="pi">:</span> 
<span class="nn">---</span>

<span class="gh"># </span><span class="sb">



</span><span class="gu">## Done when</span>
<span class="p">
-</span>

<span class="gu">## Plan</span>
<span class="p">
-</span>

<span class="gu">## Log</span>
</code></pre></div></div>

<p>Follows the same template variable pattern as <a href="https://github.com/xianxu/parley.nvim/blob/main/specs/notes/templates.md">note templates</a>.</p>

<h3 id="what-i-deliberately-kept-aligned-with-parleys-existing-patterns">What I deliberately kept aligned with parley’s existing patterns</h3>

<ul>
  <li><strong>Frontmatter for metadata</strong> — chats already use this</li>
  <li><strong>Emoji-free</strong> — unlike chat turns (💬/🤖), issues don’t need visual turn markers; markdown headings suffice</li>
  <li><strong>Picker-first navigation</strong> — same as chat finder and note finder</li>
  <li><strong><code class="language-plaintext highlighter-rouge">@@</code> compatible</strong> — issues are just <code class="language-plaintext highlighter-rouge">.md</code> files, so existing file reference syntax works in chats</li>
  <li><strong>One file = one thing</strong> — same as one chat file = one conversation</li>
</ul>

<h3 id="what-this-does-not-include-deliberately">What this does NOT include (deliberately)</h3>

<ul>
  <li>No priority field (ID order + deps is enough)</li>
  <li>No assignee/owner (single-player workflow)</li>
  <li>No tags/labels (use grep)</li>
  <li>No separate <code class="language-plaintext highlighter-rouge">status</code> file, <code class="language-plaintext highlighter-rouge">plan.md</code>, <code class="language-plaintext highlighter-rouge">log.md</code> per issue (one file is simpler)</li>
  <li>No JSON/YAML status files (frontmatter is enough)</li>
</ul>

<p>The previous proposal I gave you with 5 files per issue was designed for a generic CLI tool. For parley specifically, single-file-per-issue is better because it matches the existing “one markdown file is one unit of work” pattern that pervades the plugin.</p>

<p>📝: You asked about proposing an issue management subsystem for parley.nvim that fits its philosophy, I answered with a single-file-per-issue design using YAML frontmatter + markdown sections (status/plan/log in one file), telescope picker integration, 5 commands (New/Finder/Next/Status/Decompose), repo-local issues_dir, leveraging existing parley patterns like chat files, @@ references, templates, and pickers.</p>

<h2 id="question-4">Question</h2>]]></content><author><name></name></author><category term="tech" /><category term="nil" /><category term="ai" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">AI Workflow Evolved, One Year Jouney</title><link href="https://xianxu.dev/2026/03/28/ai-coding-workflow2.html" rel="alternate" type="text/html" title="AI Workflow Evolved, One Year Jouney" /><published>2026-03-28T00:00:00+00:00</published><updated>2026-03-28T00:00:00+00:00</updated><id>https://xianxu.dev/2026/03/28/ai-coding-workflow2</id><content type="html" xml:base="https://xianxu.dev/2026/03/28/ai-coding-workflow2.html"><![CDATA[<p>It’s been about a year since I’ve been vibe coding on <a href="https://github.com/xianxu/parley.nvim">Parley.nvim</a>. It has also been three weeks since my last post on <a href="/2026/03/08/ai-coding-workflow.html">AI coding workflow</a>. So many things have changed, both my understanding of such workflows and the world around it. For example, Nvidia open sourced <a href="https://github.com/NVIDIA/OpenShell">OpenShell</a>. Let’s take a look at my evolution, I think this might be useful for people new to the space.</p>

<ol>
  <li>
    <p>Archaic time, a year ago, where I just used coding agent like <code class="language-plaintext highlighter-rouge">claude</code> directly, sending prompt directly inside its cli. It’s a fun but flaky workflow. You can read it <a href="/2025/04/29/reflection-on-ai-coding.html">here</a> and <a href="/2025/05/14/ai-coding-take-2.html">here</a>.</p>
  </li>
  <li>
    <p>Using workflow management like <a href="https://www.conductor.build/">Conductor</a>, or <a href="https://vibekanban.com/">Vibe Kanban</a>, that’s when I was when I wrote <a href="/2026/03/08/ai-coding-workflow.html">this</a>. OpenAI also open sourced <a href="https://github.com/openai/symphony">Symphony</a> which pushed boundary of such “workflow management” software.</p>
  </li>
  <li>
    <p>Made my own little workflow, both because of my realization that there isn’t that much those “workflows” provide, and my desire to learn what’s under the hood. This resulted in a simple <code class="language-plaintext highlighter-rouge">Makefile</code> based flow, where I use <code class="language-plaintext highlighter-rouge">make issue 42</code> to download github issue to a well-known-to-AGENTS.md location, and more diligently shaping the workflow in the AGENTS.md. I had two modes, either working on a branch in <code class="language-plaintext highlighter-rouge">worktree</code>, or working directly on <code class="language-plaintext highlighter-rouge">main</code>. After all, I’m a single guy tweaking Parley. It’s just a <a href="https://github.com/xianxu/parley.nvim/blob/01576b1cfed48477ad3f15e0a3ce418238096b82/Makefile.workflow">Makefile</a>.</p>
  </li>
  <li>
    <p>I kept tinkering this workflow, adding some archive function, to back up <code class="language-plaintext highlighter-rouge">issue.md</code> and <code class="language-plaintext highlighter-rouge">todo.md</code> after each feature was done to <code class="language-plaintext highlighter-rouge">history/</code>. This made it easier to continue tweaking key files, such as <code class="language-plaintext highlighter-rouge">AGENTS.md</code>, <code class="language-plaintext highlighter-rouge">lessons.md</code> and <code class="language-plaintext highlighter-rouge">specs</code>. Between things, accumulated 400+ commits in <a href="https://github.com/xianxu/parley.nvim">Parley.nvim</a>.</p>
  </li>
  <li>
    <p><a href="https://github.com/NVIDIA/OpenShell">OpenShell</a> was open sourced, and you can take a look at their workflow, which was expressed as a set of <a href="https://github.com/NVIDIA/OpenShell/tree/main/.agents/skills">agent skills</a>. Roughly, you can describe what I did in my <code class="language-plaintext highlighter-rouge">Makefile</code> using plain English and get agent to do it. I’m not very sure what’s the point to make such very deterministic workflow agentic; but either way, my <code class="language-plaintext highlighter-rouge">Makefile</code> was totally written by agents, so in a sense, my <code class="language-plaintext highlighter-rouge">Makefile</code> based workflow is just a compiled version of agentic <code class="language-plaintext highlighter-rouge">skills</code>.</p>
  </li>
  <li>
    <p>As I worked with my workflow in <code class="language-plaintext highlighter-rouge">3.</code>, one problem I notice is that the issues, the code, the plan to implement issues in the codebase, and incremental state of that implementation scattered in several different places, and have different life cycles. In <a href="https://xianxu.github.io/2026/03/28/agentic_state_management.html">Conversation: agentic state management</a> I wondered why not just use repo to manage issue tracking. This resulted in me just putting issues in a structured way in <code class="language-plaintext highlighter-rouge">issues/</code> in the repo itself, and the creation of simple <code class="language-plaintext highlighter-rouge">Parley</code> function to <a href="https://github.com/xianxu/parley.nvim/blob/main/specs/issues/issue-management.md">manage those issues</a>. This made my “single developer” workflow supercharged. This will also break down when there are more people operated in a repo, need to figure out how to evolve in that type of world.</p>
  </li>
  <li>
    <p>Then I integrated with <a href="https://github.com/NVIDIA/OpenShell">OpenShell</a>, basically replicating what I did in <code class="language-plaintext highlighter-rouge">6.</code> within an <code class="language-plaintext highlighter-rouge">OpenShell</code> Docker container for isolation. I just “chatted” about <a href="https://github.com/xianxu/parley.nvim/blob/main/design/2026-03-28.18-37-34.270.md">what I wanted</a> in <code class="language-plaintext highlighter-rouge">Parley</code>, committed it to repo, and asked Claude to <a href="https://github.com/xianxu/parley.nvim/blob/main/history/000010-introducing-openshell.md">make a plan</a> based on that chat. Some minor back and forth later, now I have a hermetic-enough sandboxed agentic environment. Agentic coding made setting this up really a bliss, mostly I did was: “go create an OpenShell environment based on my Mac setup in <code class="language-plaintext highlighter-rouge">~/.config/nvim</code>, <code class="language-plaintext highlighter-rouge">~/.zshrc</code> and <code class="language-plaintext highlighter-rouge">~/settings/brewfile</code>”. Well, almost, there’s a big gotcha during this process, I’ll talk about that in a separate post.</p>
  </li>
  <li>
    <p>Agents will eventually forget about instructions, based on intuition described in <code class="language-plaintext highlighter-rouge">5.</code>, I decided to start <code class="language-plaintext highlighter-rouge">lifting</code> what’s previously described as principles in <code class="language-plaintext highlighter-rouge">AGENTS.md</code>, into actual code. This resulted in <a href="https://github.com/xianxu/parley.nvim/blob/1fb21e41b1fc22ff986e2f8a28627d28084136f8/scripts/pre-merge-checks.sh#L122">pre-merge-checks.sh</a>, which essentially are “subagents” invoked deterministically. Further, I put in Claude hooks, that are automatically triggered when size of the diff cross some threshold. Really, just like how human would evolve code. <code class="language-plaintext highlighter-rouge">Claude</code> happily piled on a lot of <code class="language-plaintext highlighter-rouge">sh</code> script, I have the distinctive feeling at some point we’d need to rewrite this portion properly.</p>
  </li>
</ol>

<p>All of those tinkering is to get a real sense of the shifting ground of software development. It is very evident that agentic coding allow you to customize your environment, just like I mentioned in <a href="/2026/03/19/the-rise-of-personal-software.html">Peronalized Software</a>. Your development environment is also ripe of opportunities for customization to your hearts content, just like what I did during the development of <code class="language-plaintext highlighter-rouge">Parley</code>.</p>

<p>What’s next? I firmly believe that the future is <strong>human setting up the environment, allowing full automation of coding</strong>. <code class="language-plaintext highlighter-rouge">Parley</code> is not able to achieve this, as I’m not yet familiar with UI test automation, nor with Neovim’s UI testing frameworks. I don’t believe human can be in the loop to review most of what agents write, it will be such a big bottleneck, and no fun job. Then the million dollar question, is how would we be comfortable agent is doing the right thing, not exposing your private key, not doing <span>$O(n^2)$</span> algorithm when <span>$O(n)$</span> algorithm is available?</p>

<p>I don’t know, but intend to tinker and find out.</p>]]></content><author><name></name></author><category term="tech" /><category term="ai" /><summary type="html"><![CDATA[It’s been about a year since I’ve been vibe coding on Parley.nvim. It has also been three weeks since my last post on AI coding workflow. So many things have changed, both my understanding of such workflows and the world around it. For example, Nvidia open sourced OpenShell. Let’s take a look at my evolution, I think this might be useful for people new to the space.]]></summary></entry><entry><title type="html">Conversation: further on the ideal of log.md</title><link href="https://xianxu.dev/2026/03/28/further_on_the_ideal_of_logmd.html" rel="alternate" type="text/html" title="Conversation: further on the ideal of log.md" /><published>2026-03-28T00:00:00+00:00</published><updated>2026-03-28T00:00:00+00:00</updated><id>https://xianxu.dev/2026/03/28/further_on_the_ideal_of_logmd</id><content type="html" xml:base="https://xianxu.dev/2026/03/28/further_on_the_ideal_of_logmd.html"><![CDATA[<style>
h1 { color: #1a365d; border-bottom: 3px solid #4299e1; padding-bottom: 0.3rem; }
h2 { color: #2b6cb0; border-bottom: 2px solid #bee3f8; padding-bottom: 0.3rem; }
h3 { color: #3182ce; border-left: 4px solid #90cdf4; padding-left: 0.8rem; }
.branch-nav { margin: 1rem 0; padding: 0.6rem 1rem; border-radius: 8px; font-size: 0.95em; }
.branch-nav a { text-decoration: none; font-weight: 500; }
.branch-nav a:hover { text-decoration: underline; }
.branch-nav.parent-link { background: linear-gradient(135deg, #fefcbf 0%, #faf089 100%); border-left: 4px solid #d69e2e; color: #744210; }
.branch-nav.parent-link a { color: #975a16; }
.branch-nav.child-link { background: linear-gradient(135deg, #c6f6d5 0%, #9ae6b4 100%); border-left: 4px solid #38a169; color: #22543d; }
.branch-nav.child-link a { color: #276749; }
.branch-inline { color: #2b6cb0; text-decoration: none; border-bottom: 1px dashed #90cdf4; font-weight: 500; }
.branch-inline:hover { color: #1a365d; border-bottom-style: solid; }
</style>

<p>This transcript is generated by <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>.</p>

<div class="branch-nav parent-link"><a href="/2026/03/28/agentic_state_management.html">← Agentic state management</a></div>

<h2 id="question">Question</h2>

<p>The log.md is trick as it depends on how base model behave. doesn’t seems like something easy for me to plugin. right?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is pointing out that log.md depends on the agent’s behavior - the agent needs to actively write to it during execution. This isn’t something the user can easily enforce or plug in from the outside. It’s a valid concern. The log.md requires the agent to have discipline about appending to it, which depends on the base model’s system prompt compliance, tool use patterns, etc. The user can’t just set up a file structure and expect the agent to maintain a log. This is different from issue.md (written by human), plan.md (can be written at start), deps (static), status (simple state change). Log.md requires continuous, disciplined writes during execution. Let me think about practical alternatives - maybe log.md should just be git history, or maybe the orchestration layer handles it, or maybe it’s just not worth the complexity.</p>

<p>You’re right, and this is an important practical point.</p>

<h3 id="the-problem-with-logmd">The problem with log.md</h3>

<p><code class="language-plaintext highlighter-rouge">log.md</code> requires the agent to <strong>voluntarily and consistently</strong> write structured notes mid-execution. That means:</p>

<ul>
  <li>You need the system prompt to instruct it</li>
  <li>You need the model to actually comply throughout a long session</li>
  <li>Different models behave differently — some will dutifully append, others will forget after a few steps</li>
  <li>If the agent is using tool calls (file edits, shell), adding a “write to log” step between every meaningful action is fragile</li>
</ul>

<p>It’s the <strong>least controllable</strong> file in the structure. Everything else is either human-authored or a simple state marker.</p>

<h3 id="what-actually-works-instead">What actually works instead</h3>

<p>The durable trace you want from <code class="language-plaintext highlighter-rouge">log.md</code> already exists in two places:</p>

<ol>
  <li><strong>Git history</strong> — if the agent commits incrementally (which it should), the commit messages <em>are</em> the log</li>
  <li><strong>Conversation transcript</strong> — most agent frameworks (Claude Code, Cursor, Aider) already persist the session</li>
</ol>

<p>So rather than asking the agent to maintain a separate log file, a more robust approach:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>issues/
  00001/
    issue.md
    plan.md       ← agent updates this (checked items, revised steps)
    deps
    status        ← open | blocked | done
</code></pre></div></div>

<p>And the “log” is just:</p>

<ul>
  <li><strong>plan.md diff over git history</strong> — you can see how the plan evolved</li>
  <li><strong>commits on the working branch</strong> — what actually happened</li>
</ul>

<h3 id="planmd-as-the-only-mutable-file">Plan.md as the only mutable file</h3>

<p>This simplifies the agent’s job to one thing: <strong>keep plan.md honest</strong>.</p>

<ul>
  <li>Check off steps as done</li>
  <li>Cross out steps that turned out wrong</li>
  <li>Add new steps discovered during work</li>
  <li>Commit plan.md changes alongside code changes</li>
</ul>

<p>That’s a much more natural fit for how models actually behave — they’re already editing files and committing. You’re just asking them to edit one more file as part of their normal flow.</p>

<h3 id="when-you-truly-need-a-log">When you truly need a log</h3>

<p>If you hit a case where you need to understand <em>why</em> something was tried and abandoned, you can reconstruct it from:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git log <span class="nt">--follow</span> <span class="nt">-p</span> <span class="nt">--</span> issues/00001/plan.md
</code></pre></div></div>

<p>That gives you the full evolution of thinking, tied to real commits, without requiring the agent to maintain a separate file.</p>

<h3 id="the-honest-answer">The honest answer</h3>

<p><code class="language-plaintext highlighter-rouge">log.md</code> is a good idea in theory but a compliance burden in practice. Drop it. Let git be your log. Put all the agent’s mutable state into <code class="language-plaintext highlighter-rouge">plan.md</code> and keep the contract simple: <strong>one file to read, one file to update, commit often</strong>.</p>

<p>📝: You asked about log.md being impractical given base model behavior, I answered that you’re right — it’s the least controllable file, drop it, let git history and plan.md diffs serve as the log instead, keep the agent’s contract to a single mutable file (plan.md) alongside commits.</p>

<h2 id="question-1">Question</h2>]]></content><author><name></name></author><category term="tech" /><category term="nil" /><category term="ai" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Personalized Software</title><link href="https://xianxu.dev/2026/03/19/the-rise-of-personal-software.html" rel="alternate" type="text/html" title="Personalized Software" /><published>2026-03-19T00:00:00+00:00</published><updated>2026-03-19T00:00:00+00:00</updated><id>https://xianxu.dev/2026/03/19/the-rise-of-personal-software</id><content type="html" xml:base="https://xianxu.dev/2026/03/19/the-rise-of-personal-software.html"><![CDATA[<p>The recent surge in the quality of agentic coding opens up the endless possibility that more people will be coding and creating software to their hearts content. For example, I’m a command line and vim user. For years, I learn about new plugins, and figure out how do they improve my workflow. More recently though, I started creating an nvim plugin directly to suit my workflow, resulting in <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>. At the beginning, my tinkering is limited, and a bit timid, as I didn’t know the capability of agentic coding, and I didn’t know much of lua, or nvim. But after 400 commits all human guided but AI driven, I got better at agentic workflow, and the capability of coding agents also improved by leap and bounds. Now I find myself almost freely guiding the plugin to whatever direction I want.</p>

<p>The latest example is that when I wanted mouse to work with type-a-head search interface and the all powerful telescope doesn’t support mouse well, I end up just getting the agent to write from scratch <a href="https://github.com/xianxu/parley.nvim/blob/main/lua/parley/float_picker.lua">a floating pop-up</a> that supports mouse, and the search behavior exactly I like!</p>

<p>Software development is rapidly merging with product management. The winner’s going to be whoever can both visualize the product, understand key internal states and subtle user interactions, and able to guide coding agents just enough to implement it. I suspect this is going to be the direction of software, a lot of personalize software, created just for a small number of users. That is, if human still use computer, manually, i.e. the knowledge workers still exist. What are the implications in such a world?</p>

<p>In that a world, it seems software systems need to be more programmable. We need not only great user interfaces geared towards end users, but also great programmability for coding agents to create customized workflows and user interfaces. The venerable vim/emacs are good examples of this. Extending this line of thoughts, it means for SaaS products, they need programming environments such that users can configure their own workflows. Few of them today do this today, wonder if this is going to be a trend in the future.</p>

<p>It seems we still need strong foundational software, like neovim, Emacs, or Google Sheets, to provide that programmable canvas. It’s unclear there are space for commercial software in such space. It just feels such software fitting the open source model better. It seems at minimal, SaaS products need to head into the direction of programmability, both in API form, but also in programmable user interfaces.</p>]]></content><author><name></name></author><category term="ai" /><category term="tech" /><summary type="html"><![CDATA[The recent surge in the quality of agentic coding opens up the endless possibility that more people will be coding and creating software to their hearts content. For example, I’m a command line and vim user. For years, I learn about new plugins, and figure out how do they improve my workflow. More recently though, I started creating an nvim plugin directly to suit my workflow, resulting in parley.nvim. At the beginning, my tinkering is limited, and a bit timid, as I didn’t know the capability of agentic coding, and I didn’t know much of lua, or nvim. But after 400 commits all human guided but AI driven, I got better at agentic workflow, and the capability of coding agents also improved by leap and bounds. Now I find myself almost freely guiding the plugin to whatever direction I want.]]></summary></entry><entry><title type="html">My AI Coding Workflow</title><link href="https://xianxu.dev/2026/03/08/ai-coding-workflow.html" rel="alternate" type="text/html" title="My AI Coding Workflow" /><published>2026-03-08T00:00:00+00:00</published><updated>2026-03-08T00:00:00+00:00</updated><id>https://xianxu.dev/2026/03/08/ai-coding-workflow</id><content type="html" xml:base="https://xianxu.dev/2026/03/08/ai-coding-workflow.html"><![CDATA[<p><em>I used to write code; now I write poems…</em> - so to speak</p>

<p>Previously, I introduced my thoughts that <a href="/2026/02/16/ai-as-new-abstraction.html">AI is the new abstraction layer</a>, and marveled at how far <a href="/2026/03/01/dawn-of-ai-coding.html">AI coding has matured</a>. So I went in a big way into tinkering in <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a> and would love to share the experience and the workflow I follow. If you remember my post <a href="/2025/05/14/ai-coding-take-2.html#examples-of-the-bad">10 months ago</a>, I got stuck in some semi-complex features. This round, it was easy to power through them, I haven’t found anything that I wanted to create that I couldn’t get the agent to do it for me!</p>

<h2 id="the-workflow-orchestration-space">The workflow orchestration space</h2>

<p>I was initially impressed with tool like <a href="https://www.conductor.build/">Conductor</a>. In hindsight, it was a matter of the typical AI shock and awe. It turns out there are plenty of tools in this workflow orchestration space, such as <a href="https://vibekanban.com/">Vibe-Kanban</a>, and OpenAI just released <a href="https://github.com/openai/symphony">Symphony</a>.</p>

<p>When we take a closer look at those tools, you realize the core capability of such workflow came from the model and the system prompts. So I went ahead and built a simple workflow that suits my needs, mostly as a way to learn those intricacies. Part of that is to cut the middle-man’s opaque magic. Well, it turns out there aren’t too much of such magic anyway. This is similar to the days where I <a href="/2025/01/04/dot-files.html">tweaked my dotfiles</a> to fit my workflow, just super charged with AI capabilities.</p>

<p>I tweaked the workflow based on my belief of how medium scale development would be in the immediate future to take advantage of current AI capabilities. Maybe once I run this enough, I would be more comfortable with a flow further away from terminal and code.</p>

<h2 id="the-workflow-around-git-and-github">The workflow around git and github</h2>

<p>First is a set of tools to manage multiple git worktrees. It turns out with AI coding, it’s pretty simple, just ask <code class="language-plaintext highlighter-rouge">Claude</code> to do it. I choose to have targets like the following in Makefile:</p>

<ol>
  <li>
    <p><code class="language-plaintext highlighter-rouge">make issue 42</code> to create a new worktree for github issue 42. It also dumps issues into <code class="language-plaintext highlighter-rouge">tasks/issue.md</code>, such that when I’m in the worktree folder, and start AI coding agent, I just need to say: “go work”, and it will pick up <code class="language-plaintext highlighter-rouge">tasks/issue.md</code> and start working on it. The fact they should use <code class="language-plaintext highlighter-rouge">tasks/issue.md</code> is in the system prompt.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">make pull-request</code> to submit current worktree as a pull request.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">make merge</code> to merge the current pull request, and clean up local worktree and branch, close associated github issue.</p>
  </li>
</ol>

<p>You get the point, just ask <code class="language-plaintext highlighter-rouge">Claude</code> to make a Makefile target for your workflow. Those are just conventional stuff, but with AI, everyone can tweak their workflow to their heart’s desire.</p>

<p>I also reserve a simpler workflow, directly work on <code class="language-plaintext highlighter-rouge">main</code> and push to <code class="language-plaintext highlighter-rouge">main</code> when done, fine for small personal project.</p>

<h2 id="the-workflow-around-ai-agent">The workflow around AI agent</h2>

<p>You may recall previously I expressed <a href="/2026/03/01/dawn-of-ai-coding.html#para-as-to-openspec-im-still-learning-to-use-it-one-h">slight unease</a> with <code class="language-plaintext highlighter-rouge">OpenSpec</code> during my learning, as I couldn’t picture mentally how the various levels of specs stay synchronized with each other and with the code. You only get a promise from AI that it would be so. Plus, it is often very hard to understand a problem space fully upfront, and using the imprecise human language to specify all the details is hard. I went through an instance of this, when trying to create oauth flow for <code class="language-plaintext highlighter-rouge">parley.nvim</code>, and realize it is much more complicated than I initially understood it to be. There would be no way for me, who only know oauth as a user, to understand all those nuances upfront.</p>

<p>Instead, I have constructed the following workflow, which I would call: <strong>Iterative Spec Flow</strong>. The key premise is to let go full specification upfront, with the rationale that coding itself uncovers edge cases and details. The spec in this flow, is the spec of boundaries to constrain an agent’s search space, not all the exact details to follow. And as coding continues, the spec evolves with new discoveries. In this flow, human has a couple of definitive places to review and verify.</p>

<h3 id="files-involved">Files involved</h3>

<ol>
  <li>
    <p><strong>Orchestration glue</strong>: <a href="https://github.com/xianxu/parley.nvim/blob/main/AGENTS.md"><code class="language-plaintext highlighter-rouge">AGENTS.md</code></a>. This is the main system prompt file. It explains and ties several other markdown files agent should maintain.</p>
  </li>
  <li>
    <p><strong>Project definition</strong>: <code class="language-plaintext highlighter-rouge">tasks/issue.md</code>. The work for each worktree. This is user provided, for example, <code class="language-plaintext highlighter-rouge">make issue 42</code> will populate this from GitHub issue 42.</p>
  </li>
  <li>
    <p><strong>Agent state</strong>:</p>

    <ol>
      <li>
        <p><code class="language-plaintext highlighter-rouge">tasks/todo.md</code>: this is the agent’s planning space and encodes agreement with the user within this session. I instructed agent to insert checkpoints for human manual review in their checklist. At end of session (pull request merged), <code class="language-plaintext highlighter-rouge">tasks/todo.md</code> is cleaned up, its states merged to <code class="language-plaintext highlighter-rouge">specs/</code> files.</p>
      </li>
      <li>
        <p><a href="https://github.com/xianxu/parley.nvim/blob/main/tasks/lessons.md"><code class="language-plaintext highlighter-rouge">tasks/lessons.md</code></a>: agent is instructed to record their learning as session goes on. Currently this file is committed so it survives across different feature development. This represents repo specific knowledge that agent has learned, a soft version of system prompt. I believe it is worth checking this file occasionally to keep it concise.</p>
      </li>
    </ol>
  </li>
  <li>
    <p><strong>Shared knowledge base</strong>:</p>
    <ol>
      <li>
        <p>The <a href="https://github.com/xianxu/parley.nvim/tree/main/specs"><code class="language-plaintext highlighter-rouge">specs/</code></a>: this is where agent put spec files. Those are not formal specs, rather sketches of specs to limit agent’s search space. I instruct agent to always keep this updated as they change code. I even have <code class="language-plaintext highlighter-rouge">make test-changed</code> that drive testing based on which spec files changed. You can think those <code class="language-plaintext highlighter-rouge">specs/*</code> files encode sketches of what was the user request in <code class="language-plaintext highlighter-rouge">tasks/issue.md</code>, and second level details in <code class="language-plaintext highlighter-rouge">tasks/todo.md</code>.</p>

        <ol>
          <li>
            <p>There’s <a href="https://github.com/xianxu/parley.nvim/blob/main/specs/index.md"><code class="language-plaintext highlighter-rouge">specs/index.md</code></a> as an indexing file, the hope is that agent can progressively discover specs, like how <a href="https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview">agent skills</a> is processed.</p>
          </li>
          <li>
            <p>There’s <a href="https://github.com/xianxu/parley.nvim/blob/main/specs/traceability.yaml"><code class="language-plaintext highlighter-rouge">specs/traceability.yaml</code></a> as a mapping between specs and tests needing to run. The goal is to cut down time for agent to ground itself (running test).</p>
          </li>
          <li>
            <p>All the files in <code class="language-plaintext highlighter-rouge">specs/</code> are retrofit from code and previous <code class="language-plaintext highlighter-rouge">README.md</code> file, with lightweight human review. Treating this as scaffold for agent to build on.</p>
          </li>
        </ol>
      </li>
    </ol>
  </li>
</ol>

<p>With those files, they are managed by instructions in <a href="https://github.com/xianxu/parley.nvim/blob/main/AGENTS.md"><code class="language-plaintext highlighter-rouge">AGENTS.md</code></a> and <a href="https://github.com/xianxu/parley.nvim/blob/main/tasks/lessons.md"><code class="language-plaintext highlighter-rouge">tasks/lessons.md</code></a>.</p>

<h3 id="the-workflow-looks-like">The workflow looks like</h3>

<ol>
  <li>
    <p><code class="language-plaintext highlighter-rouge">make issue 42</code> to create a new worktree for the issue.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">claude</code> or <code class="language-plaintext highlighter-rouge">codex</code>, then ask it to “go work”. System prompt <sup id="fnref:go_work"><a href="#fn:go_work" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> is strong enough for agent to pick up <code class="language-plaintext highlighter-rouge">tasks/issue.md</code> and start working on it. This typically involves creating <code class="language-plaintext highlighter-rouge">tasks/todo.md</code> and updating <code class="language-plaintext highlighter-rouge">specs/</code> files.</p>
  </li>
  <li>
    <p>System prompt instructed agent to insert manual verification steps, this is often when I discover spec issues. Back and forth with agent ensues. Occasionally agent update <code class="language-plaintext highlighter-rouge">tasks/lessons.md</code> to reflect on their learning.</p>
  </li>
  <li>
    <p>More often than not, after a pivot, I need to prod them to update <code class="language-plaintext highlighter-rouge">tasks/todo.md</code> with new plan. It seems they only remember to do it at the start of a session.</p>
  </li>
  <li>
    <p>I still manually commit code when I think it’s in stable enough state. I suspect it would be easy to instruct agent to do so directly, but I still want more control and predictability.</p>
  </li>
  <li>
    <p>When a pull request is ready, I have another agent to code review. I’m using the <a href="https://github.com/obra/superpowers">superpowers</a> agent skills library for that. Unfortunately I haven’t dive deep enough to see what it does yet.</p>
  </li>
</ol>

<h2 id="learnings-and-thoughts">Learnings and Thoughts</h2>

<ol>
  <li>
    <p><strong>Don’t over specify</strong>: treat the specs for AI as sketches to reduce AI search space, not exacts to confine AI’s creativity. Don’t over specify desired behavior. I believe specifying too much using the very imprecise human language is counter productive anyway.</p>
  </li>
  <li>
    <p><strong>Always verify</strong>: I found asking model to insert manual verification step a good practice to uncover assumptions I didn’t agree with.</p>
  </li>
  <li>
    <p><strong>Always keep stable state</strong>: git is your friend! Commit good state often.</p>
  </li>
  <li>
    <p><strong>Ground the agent</strong>: I believe that it’s on human to properly ground agent, through tests, ideally automated. Ask the agent to write a lot of tests! I went from having 0 tests to have 600+ tests (unit tests + integration test) and linting rules. I event created a <code class="language-plaintext highlighter-rouge">make test-changed</code> target that runs tests based on which spec files changed, so that agent can run it to verify their code change faster.</p>

    <ol>
      <li>
        <p>Eventually, lessons loosely held (in <code class="language-plaintext highlighter-rouge">tasks/lessons.md</code>) should migrate to some predictable framework, like linting rules.</p>
      </li>
      <li>
        <p>Figure out a way to run end to end test directly against the UI would be interesting. For example, the <a href="https://dashbit.co/blog/the-path-to-tidewave">Tidewave</a> represents a very interesting approach where AI agent have direct access to the UI state, paving way of much shorter feedback loop.</p>
      </li>
      <li>
        <p>OpenAI wrote about <a href="https://openai.com/index/harness-engineering/">harness engineering</a> where they ask coding agent to proof of work and use Chrome DevTool Protocol to access to DOM state directly.</p>
      </li>
    </ol>
  </li>
  <li>
    <p><strong>Local log files</strong> are very useful to have agent reading local log files, so that it can correlate my usage with the log.</p>
  </li>
  <li>
    <p><strong>My job</strong> is review and evolve specs, the <code class="language-plaintext highlighter-rouge">todo.md</code>, and help manual verification steps. I haven’t found it useful to review code itself, as I’m not good at lua. But sometimes I can identify code not DRY enough, and would guide agent to refactor.</p>
  </li>
  <li>
    <p><strong>Still missing</strong>: need to have tests on performance of agent produced code.</p>
  </li>
</ol>

<p>As I use AI to construct more complex features, I find this workflow is pretty effective. It’s not perfect, but it’s dramatically better than what I had 10 months ago. It is already very usable for my personal project now. Looking forward to leverage AI fully at work, in a similar vein what <a href="https://openai.com/index/harness-engineering/">OpenAI did</a>.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:go_work">
      <p>The following is in AGENTS.md: “Work for you is in tasks/issue.md, you MUST make plan in tasks/todo.md” <a href="#fnref:go_work" class="reversefootnote" role="doc-backlink">&#8617;&#xfe0e;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="ai" /><category term="tech" /><summary type="html"><![CDATA[I used to write code; now I write poems… - so to speak]]></summary></entry></feed>