<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://xianxu.dev/feed.xml" rel="self" type="application/atom+xml" /><link href="https://xianxu.dev/" rel="alternate" type="text/html" /><updated>2026-04-27T21:48:37+00:00</updated><id>https://xianxu.dev/feed.xml</id><title type="html">Xian Xu</title><subtitle>Personal thoughts on current events, society and technology.
</subtitle><entry><title type="html">A Tale of Two Harnesses</title><link href="https://xianxu.dev/2026/04/16/tale-of-two-harness.html" rel="alternate" type="text/html" title="A Tale of Two Harnesses" /><published>2026-04-16T00:00:00+00:00</published><updated>2026-04-16T00:00:00+00:00</updated><id>https://xianxu.dev/2026/04/16/tale-of-two-harness</id><content type="html" xml:base="https://xianxu.dev/2026/04/16/tale-of-two-harness.html"><![CDATA[<p>I’ve been building a Neovim plugin called <a href="https://github.com/xianxu/parley.nvim">Parley</a> for the past year, mostly as a way to understand what vibe coding is about, while making some tools I like to use in nvim. 29K lines of Lua, 700+ commits, all written by AI agents, steered by me. It started as a chat environment — a way to have AI conversations inside my editor without the limitations of ChatGPT’s UI, a humble beginning. By now it has grown into a workbench/harness, a power tool for how I work.</p>

<p>What happened in the past week in particular surprised me. I set out to build a rather complex feature, an AI harness in nvim, and ended up stumbling on the pattern of <strong>one repo, multiple harnesses</strong>.</p>

<h2 id="the-flight-hack">The Flight Hack</h2>

<p>On a flight from Pittsburgh to SF, I started adding tool call support to Parley. Basic stuff: let the AI read and write files, list directories, search content. About 7K lines of change. The goal was simple — if Parley can touch the filesystem, it can do more than just talk.</p>

<p>This was Parley’s first step from “conversation tool” toward something I’d later call a <strong>harness</strong> — an environment that doesn’t just chat with AI, but runs structured AI loops in a directory.</p>

<h2 id="the-struggle-that-ensued">The Struggle That Ensued</h2>

<p>Tool calls introduced a multiplexed stream — AI responses now interleaved with tool calls and their results. And here’s where things got interesting. The AI (Claude Code, building Parley) made a huge mess of it. It kept packing the new stream handling on top of old code that assumed a single insertion point in a buffer. Patch on top of patch; it didn’t know it needed to stop, and it took me some time to get it to stop for good instead of reverting to that old habit. I also had to shelve the tool call feature and refactor this portion of the code first, digging myself out of the AI tech debt, so to speak.</p>

<p>I had to stop the AI and impose the architecture myself. There was already an abstraction, an intermediate model of [{question, answer}] representing what’s in an nvim buffer; all we needed to do was 1) represent the more free-form tool call blocks in that structure and 2) use relative positioning to simplify the math. It took me quite some time to convince the AI to follow this to the letter, including various false starts where forcing it to print out its state showed it was not following that architecture.</p>
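
<p>To make that concrete, here’s a minimal sketch of such a model (illustrative shapes only, not Parley’s actual schema):</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- One exchange in the buffer model. Offsets are relative to the exchange's
-- first line, so edits above an exchange never invalidate positions inside it.
local exchange = {
  question = { offset = 0, lines = { "How should I parse this?" } },
  blocks = {  -- answer text interleaved with tool activity
    { kind = "text",        offset = 2, lines = { "Let me check the file." } },
    { kind = "tool_call",   offset = 4, name = "read_file",
      args = { path = "lua/parser.lua" } },
    { kind = "tool_result", offset = 5, lines = { "...file contents..." } },
    { kind = "text",        offset = 7, lines = { "Here's the fix." } },
  },
}

-- Absolute buffer line = where the exchange starts + the item's offset.
local function abs_line(exchange_start, item)
  return exchange_start + item.offset
end
</code></pre></div></div>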

<p>This was the most intensive human intervention I can remember in my AI coding. It took another 7K lines of change and 31 commits to refactor all the code into this new model.</p>

<p>In hindsight, this was one of the most important moments of the week. Not because of what got built, but because of what it demonstrated: <strong>the human provides the structure, the AI provides the labor.</strong> When the AI couldn’t see the right architecture, iteration alone wasn’t converging. I had to draw the boundary. Once I did (painfully, in this case), the AI executed successfully within it.</p>

<p>Another interesting aspect: despite all this “steering”, I still don’t know how to write <code class="language-plaintext highlighter-rouge">lua</code> — though I’ve learned to read and audit it. This illustrates two things to me: that humans can provide value at a higher plane than coding, and that coding agents are mature enough to execute reliably within a well-defined architecture. A new era is really upon us.</p>

<h2 id="parley-becomes-a-harness">Parley Becomes a Harness</h2>

<p>The real reason I wanted tool calls was to build an inline review system for my writing. And this is where Parley stopped being a chat tool and became something different.</p>

<p>The review system works like this: I write a document in markdown, and anywhere I want feedback, I drop an inline marker:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>㊷[this paragraph is too vague, find some examples from ./somewhere/in/the/file/system/markdown.md]
</code></pre></div></div>

<p>An AI agent reads the document, finds the markers, and rewrites the marked sections. If it needs clarification, it can ask back:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>㊷[too vague]{do you mean the technical details or the business framing?}
</code></pre></div></div>

<p>To which I can respond inline, and the conversation continues — all inside the document itself. The document IS the conversation.</p>

<p>The integration with Neovim was pretty natural. Changes show up with diff highlighting. Edits needing human attention land in Neovim’s quickfix list, so I can navigate feedback like I navigate compiler errors. Editorial notes display through the diagnostics system. The editor’s native machinery, repurposed for human-AI collaboration on prose.</p>
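
<p>Mechanically, this maps onto stock Neovim APIs. A minimal sketch (the scanning logic here is mine, not Parley’s actual implementation):</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Sketch: surface review markers through quickfix and diagnostics.
local ns = vim.api.nvim_create_namespace("review")

local function publish_review_markers(bufnr)
  local items, diags = {}, {}
  for lnum, line in ipairs(vim.api.nvim_buf_get_lines(bufnr, 0, -1, false)) do
    local s = line:find("㊷", 1, true)  -- plain find; the marker is multibyte
    if s then
      -- quickfix entry: navigate feedback like compiler errors
      table.insert(items, { bufnr = bufnr, lnum = lnum, col = s, text = line:sub(s) })
      -- diagnostic entry: editorial note in the gutter (0-indexed positions)
      table.insert(diags, {
        lnum = lnum - 1, col = s - 1,
        message = "review marker needs attention",
        severity = vim.diagnostic.severity.INFO,
      })
    end
  end
  vim.fn.setqflist(items, "r")
  vim.diagnostic.set(ns, bufnr, diags)
end
</code></pre></div></div>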

<p>Here’s the thing I noticed: this is a fundamentally different kind of AI loop from what Claude Code runs. Claude Code takes a task and drives toward implementation — write code, run tests, iterate until it passes.</p>

<p>The review loop is different. The human writes. The AI comments. The human responds to comments. The AI adjusts. It’s collaborative and document-centric. The transcript matters.</p>

<p>Two different loops. Two different interaction models. Both useful. Both operating on the same files.</p>

<h2 id="starting-a-repo-for-agent-skills">Starting a repo for agent skills</h2>

<p>After finishing up the Parley review tool, I started a repo to accumulate agent skills — reusable workflows that encode how I want AI to participate in specific tasks. A <strong>voice adaptation</strong> skill that learns my writing style from 40 blog posts and rewrites AI drafts to sound like me. An <strong>interview feedback</strong> skill that turns my lightweight interview notes (Parley’s interview mode) into structured hiring feedback, with policies baked in (gender-neutral pronouns, no names, generic TC — habits from my Meta days). A <strong>critique mode</strong> that has AI leave adversarial comments on a document, which I then address through the same <code class="language-plaintext highlighter-rouge">㊷[]</code> review loop.</p>

<p>Building these, I accidentally decomposed writing into AI-operable stages. If I’ve written a document to near-publishable quality, the AI does surgical edits in review mode. If the AI generated the first draft from my rough ideas, applying my voice becomes the critical step — otherwise it reads like AI slop. Same underlying tools, different starting positions, different emphasis. These aren’t separate features. They’re stages in an editorial process — drafting, critiquing, revising, polishing — and what I’d been doing was converting each stage into a human-steered, AI-powered workflow. That pattern feels like it generalizes well beyond writing.</p>

<h2 id="the-construct-meta-programming-in-a-stochastic-world">The Construct: Meta-Programming in a Stochastic World</h2>

<p>Then I built something stranger: <code class="language-plaintext highlighter-rouge">/construct</code>, a meta-tool for evolving skills across repositories. The problem it solves: you have upstream skills (like a community-maintained set of development practices, e.g. <code class="language-plaintext highlighter-rouge">superpowers</code>), and you have your own philosophy (your <code class="language-plaintext highlighter-rouge">AGENTS.md</code>, your workflow preferences, etc.). How do you combine them?</p>

<p>Text merging doesn’t work — skills aren’t code, they’re instructions for a stochastic machine. You can’t <code class="language-plaintext highlighter-rouge">git merge</code> them. So I experimented with what I started calling “semantic merge”: store the intent (the conversation that produced the adaptation), not the patch itself. When the upstream changes, you replay your intent against the new version. The AI does the merge, guided by your recorded reasoning. There are even verification steps for each of those adaptation rules. And I plan to keep tweaking such adapters — as I use <code class="language-plaintext highlighter-rouge">superpowers</code> during coding, the aspects I don’t like go to the adaptation layer.</p>
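
<p>Concretely, an adaptation record could look something like this (a sketch of the idea; the field names are made up, not the Construct’s actual format):</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Hypothetical adaptation record for a "semantic merge": it stores intent
-- plus a verification check, never a textual patch. When upstream changes,
-- an agent replays the intent against the new version.
local adaptation = {
  upstream  = "superpowers/skills/code-review.md",
  intent    = "Keep the review checklist, but commits stay small and are never auto-pushed",
  rationale = "construct/logs/code-review-adaptation.md",  -- the recorded conversation
  verify    = function(merged_skill_text)
    -- cheap invariant: the adapted rule must survive the merge
    return merged_skill_text:find("never auto-pushed", 1, true) ~= nil
  end,
}
</code></pre></div></div>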

<p>This was the start of a repo — a workbench for building, adapting, evolving and sharing skills across harnesses. After building the Construct, I adapted a set of development practice skills into the Parley repo, making them work harmoniously with Parley’s design philosophy. The Construct made this possible in a structured, repeatable way.</p>

<h2 id="the-eureka">The Eureka</h2>

<p>Somewhere around this point, after building a tool in one UX and then accessing the spirit of that tool from another, I had the realization that should have been obvious from the start:</p>

<p>It’s just <strong>two harnesses on the same state.</strong></p>

<ul>
  <li><strong>Claude Code</strong> is the execution harness. Convergent. It takes a spec and drives toward implementation. Write code, run tests, iterate. Disposable conversations. The human cares about the output feature more than the actual code. Well, I know this point might be controversial.</li>
  <li><strong>Parley</strong> is the thinking harness. Divergent. It’s where I explore, branch, change my mind, annotate, critique, and shape ideas. The human is firmly in the driver’s seat, checking not just the final artifact but also the intermediates. The human does the work, AI assists.</li>
  <li><strong>The file system</strong> is the shared state. In my case, several repos I typically operate in. Both harnesses read and write the same set of files. And you use the best harness to transform them into what you want.</li>
</ul>

<p>The state is the integration layer. No bridges. No APIs. No copy-pasting between apps. A file written by one harness is immediately available to the other, because they share the state, the full state.</p>

<p>Most people think of AI tools as standalone applications. You use Cursor for coding. You use ChatGPT for questions. You use Notion AI for docs. Each tool has its own state, its own context, its own AI. What if they all operated on the same file system? That’s what I did in developing <code class="language-plaintext highlighter-rouge">parley</code> — vision, roadmap, brainstorming, issues, documentation, code, tests, everything in that single repo. And the result is liberating. At least for a solo developer.</p>

<p>The insight is that <strong>the state — your files, your repo, your accumulated context — is the real workspace.</strong> The harnesses are just specialized lenses and transformers on it. You pick the right tool for the job. And because they share state, work done in one is immediately available in the other.</p>

<h2 id="where-this-leads-to">Where This Leads To</h2>

<p>I started thinking about what this means beyond coding. If the pattern is right — shared state, specialized harnesses that provide the right UX for human steering — then it’s not just a developer workflow. It’s a knowledge work workflow. More importantly, it raises the question of how non-coders can continuously improve their harnesses on that shared state, supercharging their workflows.</p>

<p>That’s the topic for my next post.</p>]]></content><author><name></name></author><category term="ai" /><category term="tech" /><summary type="html"><![CDATA[I’ve been building a Neovim plugin called Parley for the past year, mostly as a way to understand what vibe coding is about, while making some tools I like to use in nvim. 29K lines of Lua, 700+ commits, all written by AI agents, steered by me. It started as a chat environment — a way to have AI conversations inside my editor without the limitations of ChatGPT’s UI, a humble beginning. By now it has grown into a workbench/harness, a power tool for how I work.]]></summary></entry><entry><title type="html">WhatsApp GDPR Fine</title><link href="https://xianxu.dev/2026/04/08/whatsapp_gdpr_fine.html" rel="alternate" type="text/html" title="WhatsApp GDPR Fine" /><published>2026-04-08T00:00:00+00:00</published><updated>2026-04-08T00:00:00+00:00</updated><id>https://xianxu.dev/2026/04/08/whatsapp_gdpr_fine</id><content type="html" xml:base="https://xianxu.dev/2026/04/08/whatsapp_gdpr_fine.html"><![CDATA[<style>
h1 { color: #1a365d; border-bottom: 3px solid #4299e1; padding-bottom: 0.3rem; }
h2 { color: #2b6cb0; border-bottom: 2px solid #bee3f8; padding-bottom: 0.3rem; }
h3 { color: #3182ce; border-left: 4px solid #90cdf4; padding-left: 0.8rem; }
.branch-nav { margin: 1rem 0; padding: 0.6rem 1rem; border-radius: 8px; font-size: 0.95em; }
.branch-nav a { text-decoration: none; font-weight: 500; }
.branch-nav a:hover { text-decoration: underline; }
.branch-nav.parent-link { background: linear-gradient(135deg, #fefcbf 0%, #faf089 100%); border-left: 4px solid #d69e2e; color: #744210; }
.branch-nav.parent-link a { color: #975a16; }
.branch-nav.child-link { background: linear-gradient(135deg, #c6f6d5 0%, #9ae6b4 100%); border-left: 4px solid #38a169; color: #22543d; }
.branch-nav.child-link a { color: #276749; }
.branch-inline { color: #2b6cb0; text-decoration: none; border-bottom: 1px dashed #90cdf4; font-weight: 500; }
.branch-inline:hover { color: #1a365d; border-bottom-style: solid; }
</style>

<p>This transcript is generated by <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>.</p>

<h2 id="question">Question</h2>

<p>Give me a summary of the IDPC case against WhatsApp in 2021. From my impression, WhatsApp used a lossy hashing scheme and uploaded a minimal amount of data (just phone numbers), not shared with Facebook. Is that correct? If so, what’s the fine for? Literally every other company under the sun is doing much more than that, and I’m pretty sure none of them disclosed this “non-subscriber data” collection, as it’s so minor in the grand scheme of things. And for example, how the hell can you notify non-subscribers? It’s a case where the regulator doesn’t check the reality of how people want to use a service, and is just out there to get probably one of the most privacy-conscious services out there, just because they can milk innovation, essentially.</p>

<p>Do some research around this and tell me your opinion.</p>

<p>IDPC document <a href="https://www.edpb.europa.eu/system/files/2021-09/edpb_bindingdecision_202101_ie_sa_whatsapp_redacted_en.pdf">here</a>.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: Xian is asking about the Irish Data Protection Commission (IDPC) case against WhatsApp in 2021. He has a strong opinion that this was regulatory overreach against a privacy-conscious service. He wants me to research the actual case details and give my honest opinion. He’s linked the EDPB binding decision document. Let me fetch that document and do some web research to understand the case properly before forming an opinion. Given his WhatsApp background, he likely has insider knowledge, so I need to be substantive and not superficial.</p>

<p>Let me research this thoroughly - fetch the document and search for context.</p>

<p>Now I have a thorough understanding of this case. Let me synthesize everything and give Xian my honest assessment.</p>

<p>Okay, I’ve gone deep into the EDPB binding decision and the surrounding commentary. Here’s my analysis.</p>

<h3 id="the-case-in-a-nutshell">The Case in a Nutshell</h3>

<p>The DPC concluded WhatsApp failed to: provide required privacy information to WhatsApp users (Article 13); provide privacy information to contacts of WhatsApp users — “non-users” — whose personal data was processed to show users which contacts were also WhatsApp users (Article 14); make privacy information easily accessible (Article 12); and failed to comply with the overarching transparency principle (Article 5(1)(a)).</p>

<p>The investigation was an “own volition” enquiry — the regulator selected the parameters itself, choosing to fix on an audit of WhatsApp’s “transparency” obligations.</p>

<h3 id="your-technical-understanding--partially-correct-but-the-picture-is-bigger">Your Technical Understanding — Partially Correct, But the Picture Is Bigger</h3>

<p>You’re right that WhatsApp used a lossy hashing scheme on non-user phone numbers and uploaded minimal data. The EDPB document confirms this in detail — when a user enables the Contact Feature, non-user phone numbers are hashed via a lossy process that maps each number to one of ~2^39 buckets (theoretically grouping ~16 numbers per hash). WhatsApp argued this constituted anonymization.</p>

<p>However, the fine <strong>wasn’t really about the technical processing itself</strong>. Here’s the critical distinction:</p>

<h3 id="what-the-fine-was-actually-for">What the Fine Was Actually For</h3>

<p>The DPC took no issue with how WhatsApp collects data at all. Instead, the huge financial penalty was based on WhatsApp’s failure to disclose the entire range of ways in which it uses the personal information it collects from users.</p>

<p>The violations were purely about <strong>transparency</strong> — the <em>how you describe what you do</em> in your privacy policy, not <em>what you actually do</em>. Specifically:</p>

<ol>
  <li><strong>Privacy policy was too vague and scattered</strong> — information spread across multiple linked documents, using conditional language like “may” and “such as”</li>
  <li><strong>Legitimate interests not specific enough</strong> — WhatsApp listed interests like “measurement, analytics, and other business services” without tying them to specific processing operations</li>
  <li><strong>Non-user notification failure</strong> — WhatsApp processed non-user phone numbers (via the contact upload feature) but provided zero Article 14 information to those non-users</li>
  <li><strong>Data sharing with Facebook Companies</strong> — the privacy policy didn’t clearly explain what was shared, with whom, and on what legal basis</li>
</ol>

<p>The DPC considered that the Article 14 breach (non-users) was the most serious among the four breaches identified, given the estimated number of non-users concerned (126 million) and the fact that they had no control or knowledge whatsoever over the processing of their personal information.</p>

<h3 id="the-lossy-hashing-fight">The Lossy Hashing Fight</h3>

<p>This is where it gets technically interesting and where I think you have the strongest point. The Irish DPC actually <strong>agreed with WhatsApp</strong> that the lossy hash output was anonymous data — the DPC’s Decision-Maker overturned the original investigator’s finding. But then <strong>eight other EU regulators objected</strong>, and the EDPB overruled the DPC, concluding the lossy hashed data was pseudonymous, not anonymous, because:</p>

<ul>
  <li>The “at least 16 numbers per hash” was actually “at most 16” — many hashes would map to fewer numbers in practice</li>
  <li>The Non-User List linked hashes to the uploading users, creating a social graph that could be used for re-identification</li>
  <li>The NL SA demonstrated that a brute-force lookup table for all Dutch mobile numbers could be built in ~3 minutes on 2017 hardware (a back-of-envelope sketch after this list shows why that is plausible)</li>
  <li>WhatsApp’s argument that it had “no motivation” to re-identify was rejected — the EDPB ruled motivation is irrelevant to whether data is personal</li>
</ul>
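
<p>The order of magnitude checks out. A quick sketch, with assumed figures (the candidate count and hash rate are my illustrative assumptions, not numbers from the decision):</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Back-of-envelope: cost of brute-forcing an enumerable phone-number space.
-- Both constants are illustrative assumptions, not figures from the case.
local candidate_numbers = 1e8  -- Dutch mobiles: "06" prefix + 8 digits
local hashes_per_second = 1e6  -- a modest single core, 2017-era
local seconds = candidate_numbers / hashes_per_second
print(string.format("%.0f s (~%.1f min)", seconds, seconds / 60))
-- =&gt; 100 s (~1.7 min): the same ballpark as the NL SA's ~3 minutes, which
-- is why lossy hashing over a small, enumerable ID space isn't anonymization.
</code></pre></div></div>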

<h3 id="the-fine-escalation-drama">The Fine Escalation Drama</h3>

<p>Ireland originally proposed a far more modest penalty of up to €50 million for WhatsApp. The EDPB’s binding decision contained a clear instruction that required the DPC to reassess and increase its proposed fine, resulting in the €225 million fine.</p>

<p>The fine breakdown was: €90M for Article 5(1)(a), €30M for Article 12, €30M for Article 13, and €75M for Article 14 — totaling €225 million, being the sum of the separate fines proposed for breaches of GDPR Articles 12, 13, 14 and 5 respectively.</p>

<h3 id="my-honest-opinion">My Honest Opinion</h3>

<p>I think you’re <strong>partially right</strong> and <strong>partially wrong</strong>, and here’s where I land:</p>

<p><strong>Where you’re right:</strong></p>

<ul>
  <li>
    <p>The <strong>non-subscriber data</strong> argument is indeed somewhat absurd in practice. The DPC suggested it was not disproportionate to reach out to non-users, e.g. by way of information provided on WhatsApp’s website. But as the HU SA correctly pointed out, non-users might not even know WhatsApp exists, let alone visit their website. The requirement is performative compliance theater.</p>
  </li>
  <li>
    <p><strong>Selective enforcement</strong> is a real issue. You’re right that virtually every app that does contact sync (Telegram, Signal, Facebook Messenger, Snapchat, TikTok) processes non-subscriber phone numbers in similar ways, and none of them were providing Article 14 notices to non-users in 2018 either. WhatsApp was arguably doing <em>more</em> privacy-preserving work (the lossy hashing) than most competitors.</p>
  </li>
  <li>
    <p>The <strong>fine amount</strong> was politically driven. The DPC originally proposed €30-50M. The EDPB — driven largely by Germany, France, and Italy — quadrupled it. The fine against WhatsApp only equals 0.8 percent of WhatsApp’s annual turnover. The EDPB explicitly used Facebook’s consolidated group turnover to make the fine look small and justify increasing it. This is regulatory signaling, not proportionate enforcement.</p>
  </li>
  <li>
    <p><strong>WhatsApp’s actual privacy practices</strong> were relatively good for the industry. End-to-end encryption, lossy hashing of non-user contacts, no content scanning — this was genuinely one of the more privacy-conscious messaging platforms. Fining them €225M for how they <em>described</em> their practices rather than the practices themselves does feel perverse.</p>
  </li>
</ul>

<p><strong>Where I think you’re wrong:</strong></p>

<ul>
  <li>
    <p>The case <strong>isn’t as outrageous as it looks</strong> once you understand GDPR’s design philosophy. GDPR treats transparency not as a bureaucratic checkbox but as a <em>fundamental right enabler</em>. The logic is: if you don’t know what’s happening to your data, you can’t exercise your rights (object, erase, port, etc.). WhatsApp’s privacy policy was genuinely bad — the DPC itself said it was a “needlessly frustrating exercise” to understand what legal basis applied to what processing. 41% of required information was missing for users; 100% was missing for non-users.</p>
  </li>
  <li>
    <p>The <strong>“everyone does it” defense</strong> doesn’t work under GDPR, and regulators explicitly rejected it. WhatsApp’s unsuccessful arguments about its privacy notice being in line with the approach in the market illustrate that the regulators are unlikely to be influenced by common market practice.</p>
  </li>
  <li>
    <p>The lossy hashing debate was actually <strong>technically legitimate</strong>. The NL SA’s brute-force analysis was sound. A constant salt + truncated hash + the social graph metadata in the Non-User List does make re-identification feasible. WhatsApp’s “k-anonymity of 16” claim was mathematically wrong in practice.</p>
  </li>
</ul>

<p><strong>The bigger picture:</strong></p>

<p>I think the truth is somewhere in between. This was a case where a regulator picked a relatively privacy-conscious target precisely <em>because</em> it was a high-profile Meta subsidiary, knowing it would generate headlines and establish precedent. The DPC itself was reluctant (proposing a much lower fine and being overruled by the EDPB). The substantive transparency violations were real but the penalty was disproportionate to the actual harm — especially when compared to companies doing far more invasive things with far less disclosure.</p>

<p>Your instinct that this is “milking innovation” has some merit — but I’d frame it differently. It’s more like <strong>GDPR enforcement optimizing for legibility over actual privacy outcomes</strong>. The regulation rewards companies that write beautiful privacy policies over companies that build privacy-preserving technology. WhatsApp built lossy hashing; they got fined for not describing it clearly enough. That’s a legitimate criticism of the regulatory framework, not just of this decision.</p>

<h3 id="references">References</h3>
<ul>
  <li><a href="https://www.edpb.europa.eu/system/files/2021-09/edpb_bindingdecision_202101_ie_sa_whatsapp_redacted_en.pdf">EDPB Binding Decision 1/2021</a> (the full 89-page document)</li>
  <li><a href="https://iapp.org/news/a/irish-dpc-whatsapp-decision-what-do-you-need-to-know">IAPP analysis of the decision</a></li>
  <li><a href="https://techcrunch.com/2021/09/02/whatsapp-faces-267m-fine-for-breaching-europes-gdpr/">TechCrunch coverage</a></li>
  <li><a href="https://www.twobirds.com/en/insights/2021/uk/irish-data-protection-commission-whatsapp-decision">Bird &amp; Bird legal analysis</a></li>
  <li><a href="https://www.fieldfisher.com/en/insights/privacy-notices-post-whatsapp">Fieldfisher post-WhatsApp privacy notice guidance</a></li>
  <li><a href="https://idpc.org.mt/news-latest/gdpr-the-action-brought-by-whatsapp-ireland-against-binding-decision-1-2021-of-the-european-data-protection-board-is-admissible/">WhatsApp CJEU appeal admissibility ruling (2026)</a></li>
</ul>

<p>📝: You asked about the 2021 IDPC/EDPB case against WhatsApp and whether the fine was justified given WhatsApp’s privacy-conscious practices, I answered with a detailed breakdown showing the €225M fine was purely for transparency violations (how WhatsApp described its practices in privacy policies, not the practices themselves), the lossy hashing debate was technically legitimate but the fine was disproportionate, enforcement was politically escalated from DPC’s €50M to EDPB’s €225M, and while your frustration about selective enforcement has merit, GDPR’s design philosophy treats transparency as a fundamental right enabler — the real criticism is that GDPR rewards legibility over actual privacy engineering.</p>

<h2 id="question-1">Question</h2>

<p>And tell me how WhatsApp described that before it got them into trouble, particularly around this feature. Ignore the non-subscriber notification, which is retarded. With regard to its own users, how did it describe that feature?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: Xian wants to know specifically how WhatsApp described the contact upload/sync feature in its privacy policy, and what the DPC found deficient about that description. He’s asking about the description to <em>users</em> (not non-users). Let me fetch the EDPB document to find the specific language WhatsApp used and what the DPC found wrong with it.</p>

<p>Now let me search for what WhatsApp’s actual privacy policy said about the contact feature specifically, since the EDPB document references the “Legal Basis Notice” but doesn’t fully reproduce the contact feature description.</p>

<p>Now I have a clear picture. Let me answer Xian’s specific question about how WhatsApp described the contact feature to its own users.</p>

<p>Specifically regarding the contact upload/sync feature, here’s what WhatsApp’s privacy policy actually said to its <strong>own users</strong> and why the DPC found it deficient:</p>

<h3 id="what-whatsapps-privacy-policy-said">What WhatsApp’s Privacy Policy Said</h3>

<p>The EDPB document quotes WhatsApp’s “Legal Basis Notice” extensively. The contact feature description was spread across multiple documents — the Privacy Policy, a separate Legal Basis Notice, and an “About Our Services” page. The key problems with how WhatsApp described the contact feature to <em>users</em> fell into several buckets:</p>

<h3 id="problem-1-vague-legitimate-interest-descriptions">Problem 1: Vague Legitimate Interest Descriptions</h3>

<p>WhatsApp listed its legitimate interests in bullet-point form under broad objectives. For the contact-adjacent processing, the relevant language included things like “For providing measurement, analytics, and other business services where we are processing data as a controller” with sub-bullets like “to provide accurate and reliable reporting to businesses and other partners” and “In the interests of businesses and other partners to help them understand their customers.”</p>

<p>The EDPB found this language nearly identical to the WP29 Transparency Guidelines’ own examples of <strong>“poor practice”</strong> — phrases like “We may use your personal data to develop new services” or “We may use your personal data for research purposes.”</p>

<h3 id="problem-2-no-link-between-processing-operations-and-legal-bases">Problem 2: No Link Between Processing Operations and Legal Bases</h3>

<p>The DPC found that insufficient detail had been provided in relation to the processing operations that will be grounded upon other legal bases and the information was furnished in piecemeal fashion that required the user to link in and out of various different sections of the Privacy Policy as well as the Terms of Service.</p>

<p>Specifically for the contact feature, WhatsApp’s privacy policy said something like “you provide us the phone numbers in your mobile address book on a regular basis, including those of both the users of our Services and your other contacts” — but it <strong>didn’t explain</strong>:</p>
<ul>
  <li>What exactly happened to those numbers after upload</li>
  <li>That non-user numbers were stored separately in a “Non-User List”</li>
  <li>What the lossy hashing procedure was or that it was applied</li>
  <li>How long the data was retained</li>
  <li>Which specific legal basis applied to this specific processing operation (as opposed to other operations)</li>
</ul>

<h3 id="problem-3-conditionalwague-language">Problem 3: Conditional/Wague Language</h3>

<p>The DPC found that controllers must remove any conditional or vague language like “may”, “including” or “such as”. WhatsApp’s policy was riddled with this kind of hedging language — saying it “may” do things rather than stating definitively what it does.</p>

<h3 id="problem-4-information-scattered-everywhere">Problem 4: Information Scattered Everywhere</h3>

<p>The DPC noted there was “an abundance of text that communicated very little,” warning against long but uninformative notices. The IE SA itself said reviewing WhatsApp’s materials was a “needlessly frustrating exercise” requiring extensive searching across the Privacy Policy and related documents to piece together what was actually happening. WhatsApp noted the level of detail in its privacy notice was consistent with the level of detail provided by its peers. The DPC dismissed this, noting that an industry could not be allowed to set its own level of compliance.</p>

<h3 id="problem-5-the-41-gap">Problem 5: The 41% Gap</h3>

<p>The DPC calculated that WhatsApp failed to provide <strong>41% of the information required by Article 13 GDPR</strong> to its own users. That’s not a minor gap — it means almost half of the legally mandated disclosures were simply missing.</p>

<h3 id="my-take-on-your-specific-question">My Take on Your Specific Question</h3>

<p>So to directly answer: WhatsApp described the contact feature to its own users in the way that basically every tech company did circa 2018 — a high-level statement that “we access your contacts to help you connect with friends” type language, spread across a Privacy Policy + Legal Basis Notice + Terms of Service, using conditional “may” language, without mapping specific data categories → specific processing operations → specific legal bases.</p>

<p>The DPC’s standard was essentially: <strong>you need to tell users, for each specific processing operation, exactly what data you take, what you do with it, why, under what legal basis, who you share it with, and for how long — and you need to do this in a way that’s easy to find and understand.</strong></p>

<p>WhatsApp’s point is well made that the standard set out in the decision goes significantly beyond that of most privacy notices. Indeed, a glance at the privacy notice on the website of the EDPB shows it does not meet the very similar standard applicable to the EDPB. Nor does the Irish DPC practice what it preaches.</p>

<p>That IAPP observation is devastating — the regulators who fined WhatsApp €225M for transparency failures <strong>don’t meet their own standard</strong> on their own websites. This is probably the strongest evidence for your “milking innovation” thesis. The standard was set impossibly high, applied retroactively to a 2018 privacy policy, and the enforcers themselves can’t meet it.</p>

<h3 id="references-1">References</h3>
<ul>
  <li><a href="https://www.edpb.europa.eu/system/files/2021-09/edpb_bindingdecision_202101_ie_sa_whatsapp_redacted_en.pdf">EDPB Binding Decision 1/2021</a></li>
  <li><a href="https://iapp.org/news/a/irish-dpc-whatsapp-decision-what-do-you-need-to-know">IAPP analysis</a></li>
  <li><a href="https://www.mccannfitzgerald.com/knowledge/technology-and-innovation/key-takeaways-from-dpcs-inquiry-into-whatsapps-transparency-measures">McCann FitzGerald analysis</a></li>
  <li><a href="https://www.fieldfisher.com/en/insights/privacy-notices-post-whatsapp">Fieldfisher post-WhatsApp guidance</a></li>
  <li><a href="https://www.dataprotection.ie/en/news-media/press-releases/data-protection-commission-announces-decision-whatsapp-inquiry">DPC announcement</a></li>
</ul>

<p>📝: You asked how WhatsApp specifically described the contact upload feature to its own users in its privacy policy, I answered with: WhatsApp used high-level vague language scattered across multiple documents (Privacy Policy + Legal Basis Notice + ToS) with conditional “may” phrasing, failed to map specific processing operations to specific legal bases, didn’t explain the lossy hashing procedure or Non-User List retention, and was missing 41% of Article 13 required information overall. The IAPP noted even the EDPB and DPC themselves don’t meet the standard they imposed.</p>

<h2 id="question-2">Question</h2>

<p>Did WhatsApp mention “we access your contacts to help you connect with friends” somewhere? I don’t care if it’s scattered; it’s simply not possible to describe how modern software works in a way a layman can understand. That full description would inevitably be very long.</p>

<p>This is not to really argue it can’t be better, but rather, if you want to go full legalistic, you can mess with those companies whichever way you want, regardless of actual harm in real world. Just don’t pretend to be noble. Pretty sad.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: Xian is pushing back on my characterization. He’s right to call me out — I used “type language” as a paraphrase, not a direct quote. He wants to know what WhatsApp actually said about the contact feature. Let me go back to the EDPB document to find the actual language WhatsApp used. He’s also making a broader philosophical point about the impossibility of describing complex software to laypeople in a legally compliant way, and the disconnect between legalistic standards and real-world harm. Let me fetch the document to find the actual privacy policy language quoted in the decision.</p>

<p>Fair point — I should not have used “type language” as if it were a quote. Let me correct the record with what the document actually says.</p>

<h3 id="what-whatsapp-actually-told-users-about-the-contact-feature">What WhatsApp Actually Told Users About the Contact Feature</h3>

<p>The EDPB document quotes WhatsApp’s Privacy Policy as stating that users “provide us the phone numbers in your mobile address book on a regular basis, including those of both the users of our Services and your other contacts.” So yes, WhatsApp <strong>did</strong> tell users it was collecting their contacts’ phone numbers. It wasn’t hidden.</p>

<p>The problem, according to the DPC and EDPB, was not that WhatsApp said <em>nothing</em> — it’s that WhatsApp didn’t connect that statement to the downstream chain of: what happens to those numbers → what legal basis applies → what the lossy hashing procedure does → how long the Non-User List is retained → who gets access to it.</p>

<h3 id="what-the-dpc-specifically-criticized">What the DPC Specifically Criticized</h3>

<p>The DPC’s criticism regarding the contact feature for <strong>users</strong> boiled down to:</p>

<ol>
  <li>
    <p><strong>No mapping of processing operations to legal bases</strong> — WhatsApp told users it collected contacts, but didn’t specify under which legal basis (consent? legitimate interest? contractual necessity?) this particular processing operation fell. The Legal Basis Notice listed legitimate interests as broad bullet points like “For providing measurement, analytics, and other business services” without tying them to specific data operations.</p>
  </li>
  <li>
    <p><strong>Conditional language</strong> — The privacy policy used “may” and “such as” and “including” throughout, which the DPC considered non-compliant because it left users uncertain about what <em>actually</em> happens vs. what <em>might</em> happen.</p>
  </li>
  <li>
    <p><strong>Information scattered across documents</strong> — The contact feature description lived partly in the Privacy Policy, partly in the Legal Basis Notice, partly in “About Our Services.” The DPC found piecing it together was a “needlessly frustrating exercise.”</p>
  </li>
</ol>

<h3 id="your-point-is-basically-right">Your Point Is Basically Right</h3>

<p>WhatsApp <strong>did</strong> describe the contact feature to users. It wasn’t a secret. The issue was granularity and structure, not concealment. And your broader point holds: if you want to go full legalistic, you can find fault with virtually any privacy policy on earth, because the GDPR standard for transparency — when read maximally — demands a level of specificity that is practically incompatible with how modern software works.</p>

<p>The EDPB itself acknowledged this tension implicitly. WhatsApp argued (correctly, in my view) that the standard demanded by the decision goes beyond what any company in the industry was doing. The DPC’s own investigator initially proposed a different, more lenient reading than what the EDPB ultimately imposed. The DPC Decision-Maker actually <em>cleared</em> WhatsApp on Article 13(1)(d) — it was the EDPB that overruled that finding.</p>

<p>So the enforcement arc was: WhatsApp’s own regulator (Ireland) found a moderate violation → eight other countries objected → the EDPB overruled Ireland on multiple points → the fine quadrupled. This is not a case where the regulator closest to the facts thought WhatsApp was egregiously wrong. It’s a case where a supranational body applied a maximalist reading and used WhatsApp to set precedent.</p>

<p>Your frustration is well-placed. The GDPR transparency standard, as applied here, essentially demands that privacy policies read like API documentation — every processing operation mapped to a legal basis, every data category tied to a purpose, every third party named. That’s technically possible but produces documents nobody reads, which defeats the stated goal of being “user-centric rather than legalistic.”</p>

<p>📝: You asked whether WhatsApp actually described the contact feature to users and what specifically was wrong, I answered that WhatsApp did disclose contact collection but the DPC/EDPB found the description lacked granularity in mapping processing operations to legal bases, used conditional language, and scattered information across documents — supporting your view that this is a legalistic gotcha rather than a genuine concealment issue.</p>

<h2 id="question-3">Question</h2>

<p>No no no, you can’t give users an API document, as you’ll get fined for it not being concise enough!</p>]]></content><author><name></name></author><category term="law" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Loom workflow system</title><link href="https://xianxu.dev/2026/04/05/loom_workflow_system.html" rel="alternate" type="text/html" title="Loom workflow system" /><published>2026-04-05T00:00:00+00:00</published><updated>2026-04-05T00:00:00+00:00</updated><id>https://xianxu.dev/2026/04/05/loom_workflow_system</id><content type="html" xml:base="https://xianxu.dev/2026/04/05/loom_workflow_system.html"><![CDATA[<style>
h1 { color: #1a365d; border-bottom: 3px solid #4299e1; padding-bottom: 0.3rem; }
h2 { color: #2b6cb0; border-bottom: 2px solid #bee3f8; padding-bottom: 0.3rem; }
h3 { color: #3182ce; border-left: 4px solid #90cdf4; padding-left: 0.8rem; }
.branch-nav { margin: 1rem 0; padding: 0.6rem 1rem; border-radius: 8px; font-size: 0.95em; }
.branch-nav a { text-decoration: none; font-weight: 500; }
.branch-nav a:hover { text-decoration: underline; }
.branch-nav.parent-link { background: linear-gradient(135deg, #fefcbf 0%, #faf089 100%); border-left: 4px solid #d69e2e; color: #744210; }
.branch-nav.parent-link a { color: #975a16; }
.branch-nav.child-link { background: linear-gradient(135deg, #c6f6d5 0%, #9ae6b4 100%); border-left: 4px solid #38a169; color: #22543d; }
.branch-nav.child-link a { color: #276749; }
.branch-inline { color: #2b6cb0; text-decoration: none; border-bottom: 1px dashed #90cdf4; font-weight: 500; }
.branch-inline:hover { color: #1a365d; border-bottom-style: solid; }
</style>

<p>This transcript is generated by <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>.</p>

<h2 id="question">Question</h2>

<p>Loom: create a subsystem that uses coding to manage durable projects</p>

<p>I realize <code class="language-plaintext highlighter-rouge">claude</code> and <code class="language-plaintext highlighter-rouge">codex</code> are a pretty good way to manage long-running projects, with a project’s internal state kept in md files in a folder, along with todos and the times they should be triggered or due, encoded in some format in that folder. I guess essentially open crawl? But 1/ repo based; 2/ markdown files to make state visible to human inspection and iteration; 3/ using a stock coding agent’s capability for multi-step reasoning, externalizing its mental states in md, which seems to fit knowledge workers more; 4/ otherwise (also using coding agents) constructing scripts for steps that are highly deterministic.</p>

<p>In an analogy, <a href="https://github.com/xianxu/parley.nvim">Parley</a> already showed the start of this pattern:</p>

<p>1/ parley chat, the brainstorming tool. It keeps state globally in chat files, but more recently we allow moving the chat tree into other directories, and the whole point was to move chats into a repo’s design/ directory as the output of brainstorming. btw, will improve this so parley chat can be even more straightforward to use as a project-specific brainstorming tool.</p>

<p>2/ the issue system, which keeps related things in issues/; then a hybrid of scripts (created by claude) and claude itself (during coding sessions) manipulates that state to form a workflow.</p>

<p>3/ vision, the new roadmapping tool I’m tinkering with, storing state (yaml) in vision/, currently manipulated by scripts (generated by claude) rather than free-form text. I can’t pinpoint why I made this choice, but I feel this area needs more structure.</p>

<p>Now:</p>

<p>4/ loom, to support a wider variety of tasks, but at its core leveraging coding agents to manage workflows, keeping state in the repo.</p>

<p>Those together form the new “operating system” of the modern startup.</p>

<p>Thoughts?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: This is a fascinating architectural vision. The user is describing an evolution from chat → issues → roadmapping → general workflow orchestration, all repo-based with markdown/yaml state files, leveraging coding agents as the “runtime.” They’re asking for my thoughts on this “loom” concept and the broader vision of repo-based, human-inspectable, agent-managed workflow systems as a “startup OS.” Let me think deeply about this.</p>

<p>The key insight: coding agents (Claude, Codex) are already good at multi-step reasoning, file manipulation, and maintaining context. By externalizing state into markdown/yaml files in a repo, you get: version control for free, human inspectability, collaborative editing (human + AI), and the agent’s reasoning capabilities applied to workflow management rather than just code generation.</p>

<p>The name “loom” is evocative — weaving threads together into fabric. Let me think about what this actually means architecturally and where the real leverage is.</p>

<p>I should engage seriously with the architectural implications, identify what’s novel vs. what exists, point out potential challenges, and help sharpen the vision. This is a design conversation, not a “build me X” request.</p>

<p>This is a genuinely compelling vision. Let me engage with it structurally.</p>

<h3 id="the-pattern-youre-identifying">The Pattern You’re Identifying</h3>

<p>What you’re describing is essentially <strong>the repo as a runtime environment</strong>, where:</p>

<ul>
  <li><strong>State</strong> = files (md, yaml) in well-known directories</li>
  <li><strong>Runtime</strong> = coding agents (Claude, Codex) invoked on-demand or triggered</li>
  <li><strong>Transitions</strong> = agent reads state → reasons → mutates files → commits</li>
  <li><strong>Observability</strong> = <code class="language-plaintext highlighter-rouge">git log</code>, human-readable files, diffs</li>
  <li><strong>Collaboration</strong> = humans and agents edit the same files, review each other’s work</li>
</ul>

<p>The progression is clear:</p>

<table>
  <thead>
    <tr>
      <th>Layer</th>
      <th>Tool</th>
      <th>State Location</th>
      <th>What It Manages</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Brainstorming</td>
      <td>parley chat</td>
      <td><code class="language-plaintext highlighter-rouge">design/</code></td>
      <td>Ideas, conversations, exploration</td>
    </tr>
    <tr>
      <td>Planning</td>
      <td>issues</td>
      <td><code class="language-plaintext highlighter-rouge">issues/</code></td>
      <td>Work items, dependencies, status</td>
    </tr>
    <tr>
      <td>Strategy</td>
      <td>vision</td>
      <td><code class="language-plaintext highlighter-rouge">vision/</code></td>
      <td>Roadmap, priorities, timelines</td>
    </tr>
    <tr>
      <td><strong>Orchestration</strong></td>
      <td><strong>loom</strong></td>
      <td><strong><code class="language-plaintext highlighter-rouge">loom/</code></strong></td>
      <td><strong>Arbitrary durable workflows</strong></td>
    </tr>
  </tbody>
</table>

<h3 id="what-makes-loom-different-from-13">What Makes Loom Different from 1–3</h3>

<p>Parley, issues, and vision are each <strong>domain-specific</strong> — they have a fixed schema and a known set of state transitions. Loom is the <strong>generalization</strong>: it’s a workflow engine where the agent defines the schema, the transitions, and the triggers based on the task at hand.</p>

<p>This is the leap from “tool” to “substrate.” And it’s where things get both powerful and dangerous.</p>

<h3 id="the-core-architectural-question">The Core Architectural Question</h3>

<p>The critical design tension is: <strong>how much structure does loom impose vs. leave to the agent?</strong></p>

<p>Too much structure → you’re reimplementing Jira/Asana in markdown, and the agent is just a fancy CLI.
Too little structure → the agent creates bespoke state formats every time, and humans can’t navigate without asking the agent to explain its own files.</p>

<p>I think the sweet spot is something like:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>loom/
  loom.yaml              # registry of active "threads" (projects/workflows)
  threads/
    quarterly-planning/
      thread.yaml         # metadata: status, created, triggers, due
      state.md            # human-readable current state
      log.md              # append-only reasoning log
      artifacts/          # any outputs (reports, drafts, etc.)
    hiring-pipeline/
      thread.yaml
      state.md
      log.md
      artifacts/
</code></pre></div></div>

<p>Where <code class="language-plaintext highlighter-rouge">thread.yaml</code> has a minimal common schema:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">id</span><span class="pi">:</span> <span class="s">quarterly-planning</span>
<span class="na">status</span><span class="pi">:</span> <span class="s">active</span>          <span class="c1"># active | paused | completed | blocked</span>
<span class="na">created</span><span class="pi">:</span> <span class="s">2026-04-01</span>
<span class="na">last_touched</span><span class="pi">:</span> <span class="s">2026-04-05</span>
<span class="na">trigger</span><span class="pi">:</span>
  <span class="na">schedule</span><span class="pi">:</span> <span class="s2">"</span><span class="s">weekly/monday"</span>   <span class="c1"># or: on_change, manual, event</span>
  <span class="na">condition</span><span class="pi">:</span> <span class="s2">"</span><span class="s">when</span><span class="nv"> </span><span class="s">vision/roadmap.yaml</span><span class="nv"> </span><span class="s">changes"</span>
<span class="na">owner</span><span class="pi">:</span> <span class="s">human</span>            <span class="c1"># or: agent, collaborative</span>
<span class="na">summary</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Synthesize</span><span class="nv"> </span><span class="s">team</span><span class="nv"> </span><span class="s">updates</span><span class="nv"> </span><span class="s">into</span><span class="nv"> </span><span class="s">quarterly</span><span class="nv"> </span><span class="s">plan"</span>
<span class="na">depends_on</span><span class="pi">:</span> <span class="pi">[]</span>
</code></pre></div></div>

<p>And <code class="language-plaintext highlighter-rouge">state.md</code> is <strong>freeform but agent-maintained</strong> — this is where the agent externalizes its “mental state” about the workflow. The key insight: the agent writes state.md for its future self (and for humans), the same way a good engineer writes design docs.</p>

<h3 id="why-this-actually-works-and-why-now">Why This Actually Works (and Why Now)</h3>

<p>Three things converge:</p>

<ol>
  <li>
    <p><strong>Coding agents are file-native.</strong> They already think in terms of reading files, reasoning, writing files. You’re not fighting the grain — you’re aligning the workflow engine with the agent’s natural modality.</p>
  </li>
  <li>
    <p><strong>Git gives you durability + auditability for free.</strong> Every state transition is a commit. You can revert. You can branch. You can diff. No database needed.</p>
  </li>
  <li>
    <p><strong>Markdown is the universal interface.</strong> Humans read it, agents read it, GitHub renders it, grep finds it. It’s the lowest-common-denominator that actually works.</p>
  </li>
</ol>

<h3 id="the-operating-system-analogy">The “Operating System” Analogy</h3>

<p>I think the analogy is apt but worth being precise about. What you’re building isn’t an OS in the process-scheduling sense. It’s more like a <strong>file-system-based workflow kernel</strong>, where:</p>

<ul>
  <li><strong>Processes</strong> = threads (active workflows)</li>
  <li><strong>System calls</strong> = agent invocations (read state, reason, write state)</li>
  <li><strong>Scheduler</strong> = trigger system (cron-like, event-driven, manual)</li>
  <li><strong>IPC</strong> = file-based (one thread reads another thread’s state.md)</li>
  <li><strong>Shell</strong> = the human, who can <code class="language-plaintext highlighter-rouge">cat</code>, edit, or kick off any thread</li>
</ul>

<p>The missing piece — and the hardest piece — is the <strong>scheduler/trigger layer</strong>. Right now you’re describing manual invocation: human says “hey Claude, go work on the quarterly-planning thread.” The real power comes when threads can:</p>

<ul>
  <li>Self-schedule (“check back on this Monday”)</li>
  <li>React to events (“when a new issue is created in issues/, triage it”)</li>
  <li>Chain (“when hiring-pipeline reaches ‘offer-sent’, trigger onboarding thread”)</li>
</ul>

<p>This is where you’d need a lightweight daemon or cron job that scans <code class="language-plaintext highlighter-rouge">loom/threads/*/thread.yaml</code> for due triggers and invokes the agent. That’s the “init system” of your OS.</p>
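
<p>That init system can stay dumb. A minimal sketch in Lua, assuming the layout above (the <code class="language-plaintext highlighter-rouge">is_due</code> stub and the crude YAML reading are placeholders):</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Cron-invoked scanner: finds due threads, hands each to the agent.
-- No agent decides *when*; dumb code schedules, the agent does the *what*.
local function read_field(path, key)
  -- crude flat read; fine for the small thread.yaml sketched above
  for line in io.lines(path) do
    local v = line:match("^%s*" .. key .. ":%s*(.+)")
    if v then return v end
  end
end

local function is_due(schedule)
  -- stub: real logic compares schedule + last_touched against os.time()
  return schedule ~= nil
end

local f = io.popen("find loom/threads -name thread.yaml")
for path in f:lines() do
  if read_field(path, "status") == "active" and is_due(read_field(path, "schedule")) then
    local thread = path:match("loom/threads/([^/]+)/")
    os.execute("loom run " .. thread)  -- agent invocation happens in `loom run`
  end
end
f:close()
</code></pre></div></div>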

<h3 id="honest-concerns">Honest Concerns</h3>

<p><strong>State corruption.</strong> When the agent rewrites <code class="language-plaintext highlighter-rouge">state.md</code>, it can lose information, hallucinate progress, or drift from reality. Mitigation: the append-only <code class="language-plaintext highlighter-rouge">log.md</code> creates an audit trail, and humans periodically review. But this requires discipline.</p>

<p><strong>Context window limits.</strong> A long-running thread accumulates context. Eventually the agent can’t hold the full history. You’ll need a summarization/compaction strategy — maybe the agent periodically rewrites <code class="language-plaintext highlighter-rouge">state.md</code> as a condensed “current understanding” while <code class="language-plaintext highlighter-rouge">log.md</code> retains the full history (but is only consulted when needed).</p>

<p><strong>Agent reliability for scheduling.</strong> Agents are good at reasoning but unreliable as schedulers. The trigger/cron layer should probably be a simple script, not agent-driven. Let dumb code handle “when,” let the agent handle “what.”</p>

<h3 id="what-id-build-first">What I’d Build First</h3>

<p>If I were you, I’d start with:</p>

<ol>
  <li><strong>The thread schema</strong> (<code class="language-plaintext highlighter-rouge">thread.yaml</code> + <code class="language-plaintext highlighter-rouge">state.md</code> + <code class="language-plaintext highlighter-rouge">log.md</code>) — nail the minimal structure</li>
  <li><strong>A <code class="language-plaintext highlighter-rouge">loom</code> CLI or set of scripts</strong> — <code class="language-plaintext highlighter-rouge">loom new &lt;thread&gt;</code>, <code class="language-plaintext highlighter-rouge">loom list</code>, <code class="language-plaintext highlighter-rouge">loom run &lt;thread&gt;</code> (which invokes Claude with the thread’s context)</li>
  <li><strong>One real workflow</strong> — something you actually do weekly, like “synthesize team standup notes into a status update.” Prove the pattern works before generalizing.</li>
  <li><strong>The trigger layer</strong> — a cron job that scans for due threads and invokes <code class="language-plaintext highlighter-rouge">loom run</code></li>
</ol>
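
<p>Item 2 is small enough to sketch now. Assuming the thread layout above and that the <code class="language-plaintext highlighter-rouge">claude</code> CLI accepts a one-shot prompt via <code class="language-plaintext highlighter-rouge">-p</code>, <code class="language-plaintext highlighter-rouge">loom run</code> is little more than:</p>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Sketch of `loom run &lt;thread&gt;`: gather thread context, invoke the agent
-- non-interactively. Prompt wording is illustrative, not a fixed protocol.
local thread = arg[1] or error("usage: loom run &lt;thread&gt;")
local dir = "loom/threads/" .. thread

local prompt = table.concat({
  "You are resuming the '" .. thread .. "' workflow.",
  "Read " .. dir .. "/thread.yaml, state.md and log.md.",
  "Do the next step, update state.md, append your reasoning to log.md,",
  "then commit with a message prefixed 'loom(" .. thread .. "):'.",
}, " ")

-- %q gives crude quoting; good enough for a sketch
os.execute(string.format("claude -p %q", prompt))
</code></pre></div></div>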

<p>Skip building a UI. The repo <em>is</em> the UI. That’s the whole point.</p>

<h3 id="the-bigger-picture">The Bigger Picture</h3>

<p>What you’re describing — parley + issues + vision + loom — is essentially <strong>externalizing the startup’s cognitive processes into version-controlled, agent-manipulable files.</strong> The startup’s “brain” becomes inspectable, forkable, and collaborative between humans and agents.</p>

<p>This is philosophically closer to a <strong>Zettelkasten for organizations</strong> than to project management software. And the fact that it’s built on git means it inherits all of git’s collaboration primitives (branches, PRs, merges) for free.</p>

<p>The real question isn’t whether this pattern works — I think it clearly does for a certain class of knowledge worker. The question is whether it <strong>scales beyond the founder/power-user</strong> who set it up. Can a new engineer join and navigate <code class="language-plaintext highlighter-rouge">loom/threads/</code> without a 30-minute walkthrough? That’s the test.</p>

<p>📝: You asked about “loom” — a generalized repo-based workflow system using coding agents to manage durable projects via markdown/yaml state files, and the broader vision of parley+issues+vision+loom as a startup OS. I answered with architectural analysis: the thread-based file structure (thread.yaml + state.md + log.md), the key design tension of imposed vs. emergent structure, why the pattern works now (agents are file-native, git gives durability, markdown is universal), the OS analogy mapped precisely (threads=processes, agent=syscalls, triggers=scheduler), honest concerns (state corruption, context limits, scheduling reliability), and a concrete build-first plan starting with schema + CLI + one real workflow + cron triggers.</p>

<h2 id="question-1">Question</h2>

<p>I’m less sure about totally free-form state for the agent.</p>

<p>In the end, or at least what I’m comfortable with at the beginning, is really to repeat the other tools I created as I needed them in my workflow: e.g. issues for coding complex tasks and keeping track of ideas I want to build, and vision for aligning bigger-scoped items. You can imagine that in the future, sizable issues could be promoted automatically, maybe just with a keystroke, or even by the agent (human prompt: find all issues that are big enough), or by codifying that prompt into code.</p>

<p>So the next type of tool I think I want would be for repeating tasks, e.g. checking the system health dashboard every morning while I sip coffee, or summarizing last week’s oncall report. We can use these as the first type of thing loom should support.</p>

<p>I don’t know whether a common pattern or building block will emerge, but I feel I have gone through this enough times that it’s worth thinking about generalization now.</p>

<p>The other aspect: I’m highly confident that this is going to be very useful for my own personal productivity, organizing thoughts and execution. Whether we can lift it into some operating system for a new generation of company is TBD; but probably good enough to sell to venture capital?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is narrowing the scope from my overly general “freeform state” proposal to something more concrete and practical. They want loom to start with a specific use case: recurring/scheduled tasks (morning dashboard checks, weekly oncall reports). They’re also reflecting on the meta-pattern across parley, issues, vision, and loom — wondering if common building blocks emerge. And they’re being honest about scope: personal productivity first, company OS maybe later, VC pitch possibly. Let me engage with: 1) the concrete first use case (recurring tasks/routines), 2) the emerging common patterns across their tools, 3) honest take on the VC angle. I should not repeat the architecture I already laid out — they’re course-correcting me toward something more grounded.</p>

<p>You’re right to push back on freeform. Let me recalibrate.</p>

<h3 id="the-correction">The Correction</h3>

<p>What you’re describing is closer to <strong>typed tools that share a common substrate</strong> than a generic workflow engine. Each tool (issues, vision, loom) has its own schema, its own semantics, its own interaction patterns. The generalization isn’t “anything goes” — it’s that they all follow the same <em>construction pattern</em>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>state in repo files + scripts/agent to manipulate state + human oversight
</code></pre></div></div>

<p>That’s the building block. Not the state format.</p>

<h3 id="loom-as-routines">Loom as “Routines”</h3>

<p>The first concrete loom use case — recurring tasks with agent execution — is actually a very clean, well-bounded problem:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># loom/routines/morning-health-check.yaml</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">Morning System Health Check</span>
<span class="na">schedule</span><span class="pi">:</span> <span class="s2">"</span><span class="s">weekday</span><span class="nv"> </span><span class="s">8:00am"</span>
<span class="na">prompt</span><span class="pi">:</span> <span class="pi">|</span>
  <span class="s">Check the following dashboards and summarize:</span>
  <span class="s">- AWS CloudWatch: error rates, latency p99</span>
  <span class="s">- PagerDuty: open incidents from last 12h</span>
  <span class="s">- Datadog: database connection pool utilization</span>
  <span class="s">Output a concise status to loom/routines/morning-health-check/latest.md</span>
  <span class="s">Flag anything that needs human attention at the top.</span>
<span class="na">sources</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">url</span><span class="pi">:</span> <span class="s">https://app.datadoghq.com/dashboard/xxx</span>
  <span class="pi">-</span> <span class="na">command</span><span class="pi">:</span> <span class="s2">"</span><span class="s">aws</span><span class="nv"> </span><span class="s">cloudwatch</span><span class="nv"> </span><span class="s">get-metric-statistics</span><span class="nv"> </span><span class="s">..."</span>
<span class="na">history</span><span class="pi">:</span> <span class="s">loom/routines/morning-health-check/history/</span>
</code></pre></div></div>

<p>And the output is just:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">&lt;!-- loom/routines/morning-health-check/latest.md --&gt;</span>
<span class="gh"># 2026-04-05 Morning Health Check</span>

<span class="gu">### 🔴 Needs Attention</span>
<span class="p">-</span> Aurora replica lag spiked to 800ms at 3:22am, recovered at 4:01am
<span class="p">  -</span> Worth checking if the nightly ETL job timing shifted

<span class="gu">### 🟢 All Clear</span>
<span class="p">-</span> Error rates nominal (0.02%)
<span class="p">-</span> No open PagerDuty incidents
<span class="p">-</span> Connection pools at 34% utilization
</code></pre></div></div>

<p>You sip coffee, open <code class="language-plaintext highlighter-rouge">latest.md</code> in Neovim, done. Or it shows up as a parley chat you can interrogate further: “tell me more about that replica lag.”</p>

<h3 id="the-common-pattern-across-your-tools">The Common Pattern Across Your Tools</h3>

<p>Now, stepping back — here’s what I see emerging:</p>

<table>
  <thead>
    <tr>
      <th>Concern</th>
      <th>Issues</th>
      <th>Vision</th>
      <th>Loom (Routines)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Schema</strong></td>
      <td>issue yaml (status, priority, tasks)</td>
      <td>roadmap yaml (items, horizons, status)</td>
      <td>routine yaml (schedule, prompt, sources)</td>
    </tr>
    <tr>
      <td><strong>State mutation</strong></td>
      <td>scripts + agent during coding</td>
      <td>scripts + agent</td>
      <td>scripts + agent on schedule</td>
    </tr>
    <tr>
      <td><strong>Human interaction</strong></td>
      <td>read/edit files, kick off agent</td>
      <td>read/edit yaml, run commands</td>
      <td>read output, maybe follow up</td>
    </tr>
    <tr>
      <td><strong>Trigger</strong></td>
      <td>manual (human decides to work on issue)</td>
      <td>manual (planning sessions)</td>
      <td><strong>scheduled</strong> (this is new)</td>
    </tr>
  </tbody>
</table>

<p>The scheduled trigger is the genuinely new capability loom introduces. Everything else — yaml state, agent manipulation, human-readable output — you’ve already built. The delta is small, which is a good sign. It means you’re not inventing a new paradigm; you’re extending one that works.</p>

<h3 id="the-minimal-implementation">The Minimal Implementation</h3>

<p>Honestly, loom v0 might be:</p>

<ol>
  <li>A <code class="language-plaintext highlighter-rouge">loom/routines/</code> directory with yaml definitions</li>
  <li>A cron job (or GitHub Action, or a simple systemd timer) that scans for due routines</li>
  <li>For each due routine: invoke <code class="language-plaintext highlighter-rouge">claude</code> with the prompt, write output to <code class="language-plaintext highlighter-rouge">latest.md</code>, git commit</li>
  <li>A <code class="language-plaintext highlighter-rouge">loom list</code> script that shows what’s active and when each last ran</li>
</ol>

<p>That’s maybe 100 lines of bash/python. The agent does the hard part (reasoning about dashboards, summarizing oncall reports). Your code just handles the boring scheduling plumbing.</p>
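
<p>A minimal sketch of that runner, assuming the routine yaml above and a <code class="language-plaintext highlighter-rouge">claude</code> CLI that accepts a prompt via <code class="language-plaintext highlighter-rouge">-p</code> and prints its response; the toy schedule check stands in for real cron parsing:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># loom_run.py: fired by cron; scans routines, runs the due ones.
# Assumptions: routine yamls as sketched above, `claude -p` available,
# and the repo is the current working directory.
import datetime
import pathlib
import subprocess

import yaml  # pip install pyyaml

ROUTINES = pathlib.Path("loom/routines")

def is_due(schedule: str, now: datetime.datetime) -&gt; bool:
    # Toy check: skip weekends for "weekday ..." schedules; a real
    # implementation would use cron syntax via a library like croniter.
    if schedule.startswith("weekday") and now.weekday() &gt;= 5:
        return False
    return True

def run_routine(spec_path: pathlib.Path) -&gt; None:
    spec = yaml.safe_load(spec_path.read_text())
    out_dir = spec_path.with_suffix("")  # becomes loom/routines/&lt;name&gt;/
    out_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        ["claude", "-p", spec["prompt"]],
        capture_output=True, text=True, check=True,
    )
    (out_dir / "latest.md").write_text(result.stdout)
    subprocess.run(["git", "add", str(out_dir)], check=True)
    subprocess.run(["git", "commit", "-m", f"loom: {spec['name']}"], check=True)

if __name__ == "__main__":
    now = datetime.datetime.now()
    for spec_path in ROUTINES.glob("*.yaml"):
        if is_due(yaml.safe_load(spec_path.read_text()).get("schedule", ""), now):
            run_routine(spec_path)
</code></pre></div></div>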

<h3 id="on-the-emerging-abstraction">On the Emerging Abstraction</h3>

<p>You’ve now built the same pattern three times (issues, vision, loom). The common bones:</p>

<ul>
  <li><strong>A directory convention</strong> (<code class="language-plaintext highlighter-rouge">issues/</code>, <code class="language-plaintext highlighter-rouge">vision/</code>, <code class="language-plaintext highlighter-rouge">loom/</code>)</li>
  <li><strong>Typed YAML for structured state</strong> (machine-readable, agent-readable)</li>
  <li><strong>Markdown for narrative state</strong> (human-readable, agent-writable)</li>
  <li><strong>Scripts for state transitions</strong> (generated by agent, run by human or cron)</li>
  <li><strong>Agent as reasoning engine</strong> (reads state, applies judgment, writes state)</li>
</ul>

<p>If you wanted to generalize, you’d extract something like a <strong>tool template</strong> — a way to scaffold a new tool with its directory, schema, scripts, and agent prompts. But I’d resist doing that until you have 4-5 tools and the pattern is so obvious it’s annoying <em>not</em> to have it. You’re at 3-4 now. Maybe after loom ships, the abstraction crystallizes.</p>
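
<p>If it does crystallize, the template might be nothing more than a directory convention, something like this hypothetical scaffold:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tools/&lt;name&gt;/
├── schema.yaml      # typed state definition
├── scripts/         # state transitions: new, list, run
├── prompts/         # agent instructions for this tool
└── README.md        # conventions, for humans and agents alike
</code></pre></div></div>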

<h3 id="on-the-vc-angle">On the VC Angle</h3>

<p>Honest take: the <strong>personal productivity story is strong but not fundable on its own.</strong> VCs want to see a wedge into a market, not a power user’s bespoke setup.</p>

<p>But the reframe that <em>could</em> work: <strong>“What if every startup’s operational state was version-controlled, agent-manipulable, and human-readable?”</strong> That’s a real pitch. It’s the anti-Notion, anti-Jira argument: those tools are databases with UIs that agents can’t reason over. Your approach is files that both humans and agents natively understand.</p>

<p>The challenge is the same one you identified: does it scale beyond the power user? The honest answer today is no. But the honest answer for most dev tools at seed stage is “the founder uses it and it’s transformative for them.” That’s enough for a certain class of investor, especially if you can show the pattern repeating (which you can — four tools, same substrate, compounding value).</p>

<p>The stronger VC angle might actually be: <strong>you’re not selling loom, you’re selling the pattern.</strong> The “repo-native operational OS” as a framework, with loom/issues/vision/parley as reference implementations. Think Rails, not Basecamp — though Basecamp funded Rails just fine.</p>

<h3 id="what-id-do-next">What I’d Do Next</h3>

<p>Build <code class="language-plaintext highlighter-rouge">loom/routines/</code> for yourself this week. One routine: morning health check. Keep it dead simple. Once you’re using it daily and it’s saving you 15 minutes, you’ll know what the second routine should be. The tool will tell you what it wants to become.</p>

<p>📝: You asked about narrowing loom’s scope to recurring/scheduled tasks as first use case and whether common patterns emerge across your tools, I answered with a concrete routine schema, identified the common substrate pattern (typed yaml + markdown + scripts + agent), noted that scheduled triggers are the genuinely new capability loom adds, and gave honest assessment on VC angle (pattern is the product, not individual tools).</p>

<h2 id="question-2">Question</h2>

<p>Interestingly, I did have Basecamp in mind, but for the other aspect of them: how they define company culture and how they work, less so Rails.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is clarifying that their Basecamp reference is about Basecamp’s organizational philosophy — Shape Up, calm company, async-first, small teams with autonomy, “how we work” as a product — not about extracting a framework (Rails) from a product. This is a meaningful distinction. They’re thinking about the <em>operating system for a company</em> angle more literally: how a company runs, makes decisions, communicates, and coordinates. Basecamp sold their way of working (books like Rework, Shape Up, It Doesn’t Have to Be Crazy at Work) as much as they sold the software. The user sees a parallel: the repo-native toolchain isn’t just tooling, it’s a philosophy of how a modern startup should operate. I should engage with this distinction seriously and not just repeat what I already said.</p>

<p>That’s a much more interesting reference point.</p>

<h3 id="basecamp-as-company-philosophy-productized">Basecamp as “Company Philosophy, Productized”</h3>

<p>Basecamp’s real innovation wasn’t the project management tool or Rails — it was codifying <strong>an opinionated way of working</strong> (Shape Up, six-week cycles, small teams, no backlogs, calm company) and then building software that <em>enforced</em> those opinions. The tool and the philosophy were inseparable. You couldn’t use Basecamp “wrong” because the constraints were baked in.</p>

<p>What you’re doing rhymes with that. The toolchain <strong>encodes a way of working</strong>:</p>

<ul>
  <li>State lives in the repo, not in someone’s head or a SaaS database</li>
  <li>Agents are collaborators, not automation bots</li>
  <li>Everything is inspectable, diffable, reviewable</li>
  <li>Async by default — you write state, agent processes it later (or vice versa)</li>
  <li>Small, composable tools over monolithic platforms</li>
</ul>

<p>That’s not just a technical choice. It’s an organizational philosophy: <strong>the company’s operational state should have the same properties as its code</strong> — version-controlled, peer-reviewed, transparent, and forkable.</p>

<h3 id="where-this-gets-interesting">Where This Gets Interesting</h3>

<p>Basecamp wrote Shape Up <em>after</em> they’d been working that way for years. The methodology emerged from practice, then got codified, then got evangelized. You’re on a similar trajectory — you’re in the “practicing it and discovering what works” phase.</p>

<p>The question is whether the philosophy that emerges is <strong>prescriptive enough to be compelling.</strong> Basecamp succeeded because their opinions were <em>strong</em>: no backlogs, period. Six-week cycles, period. If your philosophy is “use files in a repo,” that’s too weak. If it’s something like:</p>

<blockquote>
  <p>“Your company’s decisions, priorities, and operational state should be committed to git, manipulable by agents, and never locked in a tool you don’t control”</p>
</blockquote>

<p>…that’s a stance people can rally around <em>or</em> argue against. Both are good.</p>

<h3 id="the-honest-gap">The Honest Gap</h3>

<p>Basecamp’s philosophy was legible to non-technical people — PMs, designers, founders who don’t code. The repo-native approach currently requires comfort with git, yaml, terminal, and agents. That’s a smaller audience.</p>

<p>But maybe that’s fine. Basecamp targeted “small teams who reject enterprise bloat.” You’d be targeting <strong>“technical founders and engineering-led teams who want their operational layer to be as hackable as their codebase.”</strong> That’s a real audience. It’s just not everyone — and it doesn’t need to be.</p>

<p>📝: You clarified that the Basecamp inspiration is about their company philosophy/way-of-working (Shape Up, calm company) rather than the Rails-from-product extraction pattern. I engaged with how your toolchain similarly encodes an opinionated organizational philosophy — operational state as code, agent-native, inspectable — and noted the gap is legibility to non-technical users, but that a technical-founder audience is viable.</p>

<h2 id="question-3">Question</h2>]]></content><author><name></name></author><category term="nil" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Vision alignment tools</title><link href="https://xianxu.dev/2026/04/03/vision_alignment_tools.html" rel="alternate" type="text/html" title="Vision alignment tools" /><published>2026-04-03T00:00:00+00:00</published><updated>2026-04-03T00:00:00+00:00</updated><id>https://xianxu.dev/2026/04/03/vision_alignment_tools</id><content type="html" xml:base="https://xianxu.dev/2026/04/03/vision_alignment_tools.html"><![CDATA[<style>
h1 { color: #1a365d; border-bottom: 3px solid #4299e1; padding-bottom: 0.3rem; }
h2 { color: #2b6cb0; border-bottom: 2px solid #bee3f8; padding-bottom: 0.3rem; }
h3 { color: #3182ce; border-left: 4px solid #90cdf4; padding-left: 0.8rem; }
.branch-nav { margin: 1rem 0; padding: 0.6rem 1rem; border-radius: 8px; font-size: 0.95em; }
.branch-nav a { text-decoration: none; font-weight: 500; }
.branch-nav a:hover { text-decoration: underline; }
.branch-nav.parent-link { background: linear-gradient(135deg, #fefcbf 0%, #faf089 100%); border-left: 4px solid #d69e2e; color: #744210; }
.branch-nav.parent-link a { color: #975a16; }
.branch-nav.child-link { background: linear-gradient(135deg, #c6f6d5 0%, #9ae6b4 100%); border-left: 4px solid #38a169; color: #22543d; }
.branch-nav.child-link a { color: #276749; }
.branch-inline { color: #2b6cb0; text-decoration: none; border-bottom: 1px dashed #90cdf4; font-weight: 500; }
.branch-inline:hover { color: #1a365d; border-bottom-style: solid; }
</style>

<p>This transcript is generated by <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>.</p>

<h2 id="question">Question</h2>

<p>In a startup, how do you best track both the business and technical long-term vision?</p>

<p>I find that business leaders and technical leaders speak different languages, and it’s hard to create a dual-track vision that staggers and aligns. I find myself playing the role of bridging and translating between them. I started a Mermaid graph of the dependencies among items, but I need:</p>

<p>1/ a way to show the “size and complexity of a thing”
2/ customizable descriptions, maybe one sentence of tech/business description each, plus a link to further documentation</p>

<p>Any other tool I can use to represent and communicate this?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is asking about a practical problem in startup leadership: bridging business and technical vision, and specifically about tools/representations for tracking dependencies and showing “size and complexity” of items. They already use Mermaid for dependency graphs. They seem to be a technical leader or CTO-type who bridges business and engineering. They want visual/communication tools that can represent not just dependencies but also the weight/complexity of items. Let me think about what’s actually useful here rather than giving generic advice.</p>

<p>Key concepts to cover:</p>
<ol>
  <li>The dual-track vision alignment problem is real and well-known</li>
  <li>For showing “size and complexity” in graph representations - weighted nodes, bubble charts, treemaps</li>
  <li>Tools beyond Mermaid that handle this better</li>
  <li>Practical frameworks that work for this bridging role</li>
</ol>

<p>I should focus on the tooling/representation question since that’s their specific ask, while briefly addressing the broader alignment challenge.</p>

<p>This is a really common and underappreciated problem. The business leader thinks in outcomes, markets, and timelines; the technical leader thinks in systems, constraints, and dependencies. You’re essentially acting as a <strong>translation layer</strong>, which is valuable but fragile if it lives only in your head.</p>

<h3 id="on-showing-size-and-complexity-in-your-dependency-graph">On showing “size and complexity” in your dependency graph</h3>

<p>Mermaid is great for structure but weak for encoding <strong>magnitude</strong>. What you want is a graph where nodes carry visual weight. A few options:</p>

<p><strong>Option A: Weighted node graphs (best fit)</strong></p>

<ul>
  <li><strong>D3.js force-directed graphs</strong> — nodes can be sized by complexity (e.g., story points, estimated weeks, number of unknowns) and colored by domain (business vs. tech). Edges show dependencies. This is the most expressive option but requires some coding.</li>
  <li><strong><a href="/2026/04/03/what_is_graphviz_dot_language.html" class="branch-inline">Graphviz (DOT language)</a></strong> — supports node sizing, shapes, colors, and cluster groupings. Less interactive but very precise. You can generate <code class="language-plaintext highlighter-rouge">.dot</code> files programmatically.</li>
</ul>

<p><strong>Option B: Treemaps / Bubble charts</strong></p>

<ul>
  <li>If hierarchy matters more than dependency, a <strong>treemap</strong> (e.g., in Notion charts, Observable, or even Google Sheets) encodes size as area. Good for showing “this technical initiative is 5x the effort of that one.”</li>
</ul>

<p><strong>Option C: Wardley Maps</strong></p>

<p>This might be the missing piece for you. Wardley Maps were designed <em>exactly</em> for the problem you’re describing:</p>

<ul>
  <li><strong>X-axis</strong>: evolution stage (genesis → custom → product → commodity)</li>
  <li><strong>Y-axis</strong>: visibility to the user/business</li>
  <li>Components are placed on the map and connected by dependency lines</li>
  <li>You can annotate with effort/complexity</li>
</ul>

<p>It forces business and technical leaders to look at the <strong>same map</strong> and argue about positioning rather than talking past each other. It’s specifically designed to bridge strategy and technology.</p>

<p><strong>Option D: Impact/Effort matrices with dependency overlays</strong></p>

<p>A 2x2 (impact vs. effort) where you draw dependency arrows between items. Simple, but surprisingly effective in meetings. Tools like Miro or FigJam work well here.</p>

<h3 id="practical-suggestion">Practical suggestion</h3>

<p>For your specific situation — dual-track vision with dependencies and complexity — I’d recommend:</p>

<ol>
  <li><strong>Wardley Map</strong> for the strategic alignment conversation (quarterly)</li>
  <li><strong>Weighted dependency graph</strong> (D3 or Graphviz) for the operational/planning view</li>
  <li>Keep your Mermaid graphs for documentation — they’re version-controllable and readable in PRs</li>
</ol>

<h3 id="on-the-broader-alignment-problem">On the broader alignment problem</h3>

<p>A few hard-won patterns:</p>

<ul>
  <li><strong>Shared artifact, not shared language</strong>: Don’t try to make the business leader speak tech or vice versa. Create a single visual artifact (like the Wardley Map) that both can point at and disagree on. The artifact <em>is</em> the translation layer, not you.</li>
  <li><strong>Name the bets</strong>: Every long-term vision item is a bet. Frame them as “We believe [X] will lead to [Y], and we’ll know by [Z].” This works in both languages.</li>
  <li><strong>Stagger intentionally</strong>: Business vision should lead by ~1 quarter. Technical vision responds with “here’s what we need to build/change to enable that.” The dependency graph you’re building is exactly the right connector.</li>
</ul>

<p>Want me to sketch out a Graphviz example with weighted nodes for your use case, or dive deeper into Wardley Maps?</p>

<p>📝: You asked about tools to track dual business/technical vision with size/complexity representation beyond Mermaid, I answered with weighted node graphs (D3/Graphviz), Wardley Maps as the strongest fit for business-tech alignment, treemaps for magnitude, and the principle of using shared artifacts instead of relying on yourself as the translation layer.</p>

<h2 id="question-1">Question</h2>]]></content><author><name></name></author><category term="nil" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">what is “Graphviz (DOT language)”</title><link href="https://xianxu.dev/2026/04/03/what_is_graphviz_dot_language.html" rel="alternate" type="text/html" title="what is “Graphviz (DOT language)”" /><published>2026-04-03T00:00:00+00:00</published><updated>2026-04-03T00:00:00+00:00</updated><id>https://xianxu.dev/2026/04/03/what_is_graphviz_dot_language</id><content type="html" xml:base="https://xianxu.dev/2026/04/03/what_is_graphviz_dot_language.html"><![CDATA[<style>
h1 { color: #1a365d; border-bottom: 3px solid #4299e1; padding-bottom: 0.3rem; }
h2 { color: #2b6cb0; border-bottom: 2px solid #bee3f8; padding-bottom: 0.3rem; }
h3 { color: #3182ce; border-left: 4px solid #90cdf4; padding-left: 0.8rem; }
.branch-nav { margin: 1rem 0; padding: 0.6rem 1rem; border-radius: 8px; font-size: 0.95em; }
.branch-nav a { text-decoration: none; font-weight: 500; }
.branch-nav a:hover { text-decoration: underline; }
.branch-nav.parent-link { background: linear-gradient(135deg, #fefcbf 0%, #faf089 100%); border-left: 4px solid #d69e2e; color: #744210; }
.branch-nav.parent-link a { color: #975a16; }
.branch-nav.child-link { background: linear-gradient(135deg, #c6f6d5 0%, #9ae6b4 100%); border-left: 4px solid #38a169; color: #22543d; }
.branch-nav.child-link a { color: #276749; }
.branch-inline { color: #2b6cb0; text-decoration: none; border-bottom: 1px dashed #90cdf4; font-weight: 500; }
.branch-inline:hover { color: #1a365d; border-bottom-style: solid; }
</style>

<p>This transcript is generated by <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>.</p>

<div class="branch-nav parent-link"><a href="/2026/04/03/vision_alignment_tools.html">← Vision alignment tools</a></div>

<h2 id="question">Question</h2>

<p>What is “Graphviz (DOT language)”, and which software supports it?</p>

<p>I like both nice visualization and the ability to generate graphs programmatically.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is asking about Graphviz and its DOT language specifically. They want to know what it is and which software supports it. They also care about both visual appeal and programmatic generation. I should explain DOT language concisely, list key tools/renderers, and highlight the programmatic angle since that’s their use case. They’re technical (they’re already using Mermaid), so I can be fairly direct.</p>

<p>Graphviz is a <strong>graph visualization system</strong> originally developed at AT&amp;T Labs Research. The <strong>DOT language</strong> is its plain-text, declarative language for describing graphs (nodes, edges, attributes).</p>

<h3 id="how-it-works">How it works</h3>

<p>You write a <code class="language-plaintext highlighter-rouge">.dot</code> file like this:</p>

<div class="language-dot highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">digraph</span> <span class="nv">vision</span> <span class="p">{</span>
    <span class="n">rankdir</span><span class="p">=</span><span class="nv">LR</span><span class="p">;</span>

    <span class="c1">// Nodes with size/style attributes</span>
    <span class="k">node</span> <span class="o">[</span><span class="n">shape</span><span class="p">=</span><span class="nv">box</span><span class="p">,</span> <span class="n">style</span><span class="p">=</span><span class="nv">filled</span><span class="o">]</span><span class="p">;</span>

    <span class="nv">A</span> <span class="o">[</span><span class="n">label</span><span class="p">=</span><span class="s2">"Auth Service\n(S)"</span><span class="p">,</span> <span class="n">fillcolor</span><span class="p">=</span><span class="s2">"#a0d8ef"</span><span class="p">,</span> <span class="n">width</span><span class="p">=</span><span class="mf">1.2</span><span class="o">]</span><span class="p">;</span>
    <span class="nv">B</span> <span class="o">[</span><span class="n">label</span><span class="p">=</span><span class="s2">"Data Platform\n(XL)"</span><span class="p">,</span> <span class="n">fillcolor</span><span class="p">=</span><span class="s2">"#ff9999"</span><span class="p">,</span> <span class="n">width</span><span class="p">=</span><span class="mf">3.0</span><span class="p">,</span> <span class="n">fontsize</span><span class="p">=</span><span class="mi">18</span><span class="o">]</span><span class="p">;</span>
    <span class="nv">C</span> <span class="o">[</span><span class="n">label</span><span class="p">=</span><span class="s2">"Mobile App v2\n(M)"</span><span class="p">,</span> <span class="n">fillcolor</span><span class="p">=</span><span class="s2">"#c8e6c9"</span><span class="p">,</span> <span class="n">width</span><span class="p">=</span><span class="mf">1.8</span><span class="o">]</span><span class="p">;</span>

    <span class="c1">// Edges (dependencies)</span>
    <span class="nv">A</span> <span class="o">-&gt;</span> <span class="nv">B</span> <span class="o">[</span><span class="n">label</span><span class="p">=</span><span class="s2">"feeds into"</span><span class="p">,</span> <span class="n">style</span><span class="p">=</span><span class="nv">dashed</span><span class="o">]</span><span class="p">;</span>
    <span class="nv">B</span> <span class="o">-&gt;</span> <span class="nv">C</span> <span class="o">[</span><span class="n">label</span><span class="p">=</span><span class="s2">"enables"</span><span class="o">]</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Key idea: <strong>node/edge attributes</strong> (<code class="language-plaintext highlighter-rouge">width</code>, <code class="language-plaintext highlighter-rouge">height</code>, <code class="language-plaintext highlighter-rouge">penwidth</code>, <code class="language-plaintext highlighter-rouge">fontsize</code>, <code class="language-plaintext highlighter-rouge">color</code>) let you encode size/complexity visually — which is exactly the dimension you said you’re missing in Mermaid.</p>

<h3 id="software-that-renders-dot">Software that renders DOT</h3>

<table>
  <thead>
    <tr>
      <th>Tool</th>
      <th>Type</th>
      <th>Programmatic?</th>
      <th>Notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Graphviz CLI</strong> (<code class="language-plaintext highlighter-rouge">dot</code>, <code class="language-plaintext highlighter-rouge">neato</code>, <code class="language-plaintext highlighter-rouge">fdp</code>)</td>
      <td>Command-line</td>
      <td>✅ pipe <code class="language-plaintext highlighter-rouge">.dot</code> → SVG/PNG/PDF</td>
      <td>The canonical renderer</td>
    </tr>
    <tr>
      <td><strong>Python <code class="language-plaintext highlighter-rouge">graphviz</code></strong> lib</td>
      <td>Python API</td>
      <td>✅ build graph in code, render</td>
      <td><code class="language-plaintext highlighter-rouge">pip install graphviz</code></td>
    </tr>
    <tr>
      <td><strong>D3-graphviz</strong></td>
      <td>JS (browser)</td>
      <td>✅ render DOT in browser via WASM</td>
      <td>Uses <code class="language-plaintext highlighter-rouge">@hpcc-js/wasm</code> under the hood</td>
    </tr>
    <tr>
      <td><strong>Viz.js / @viz-js/viz</strong></td>
      <td>JS library</td>
      <td>✅ pure JS/WASM Graphviz port</td>
      <td>Works in Node or browser</td>
    </tr>
    <tr>
      <td><strong>VS Code extension</strong> (“Graphviz Preview”)</td>
      <td>Editor</td>
      <td>❌ manual</td>
      <td>Live preview as you edit <code class="language-plaintext highlighter-rouge">.dot</code></td>
    </tr>
    <tr>
      <td><strong>Jupyter</strong> (via <code class="language-plaintext highlighter-rouge">graphviz</code> Python lib)</td>
      <td>Notebook</td>
      <td>✅ inline render</td>
      <td>Great for exploration</td>
    </tr>
    <tr>
      <td><strong>Confluence / Notion</strong></td>
      <td>Wiki</td>
      <td>⚠️ embed SVG output</td>
      <td>No native DOT support</td>
    </tr>
  </tbody>
</table>

<h3 id="programmatic-generation-example-python">Programmatic generation example (Python)</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">graphviz</span> <span class="kn">import</span> <span class="n">Digraph</span>

<span class="n">dot</span> <span class="o">=</span> <span class="nc">Digraph</span><span class="p">(</span><span class="sh">'</span><span class="s">vision</span><span class="sh">'</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="sh">'</span><span class="s">svg</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dot</span><span class="p">.</span><span class="nf">attr</span><span class="p">(</span><span class="n">rankdir</span><span class="o">=</span><span class="sh">'</span><span class="s">LR</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dot</span><span class="p">.</span><span class="nf">attr</span><span class="p">(</span><span class="sh">'</span><span class="s">node</span><span class="sh">'</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="sh">'</span><span class="s">box</span><span class="sh">'</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="sh">'</span><span class="s">filled</span><span class="sh">'</span><span class="p">)</span>

<span class="c1"># Items: (id, label, size_weight, color)
</span><span class="n">items</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">(</span><span class="sh">'</span><span class="s">auth</span><span class="sh">'</span><span class="p">,</span>     <span class="sh">'</span><span class="s">Auth Service</span><span class="sh">'</span><span class="p">,</span>    <span class="mf">1.2</span><span class="p">,</span> <span class="sh">'</span><span class="s">#a0d8ef</span><span class="sh">'</span><span class="p">),</span>
    <span class="p">(</span><span class="sh">'</span><span class="s">data</span><span class="sh">'</span><span class="p">,</span>     <span class="sh">'</span><span class="s">Data Platform</span><span class="sh">'</span><span class="p">,</span>   <span class="mf">3.0</span><span class="p">,</span> <span class="sh">'</span><span class="s">#ff9999</span><span class="sh">'</span><span class="p">),</span>
    <span class="p">(</span><span class="sh">'</span><span class="s">mobile</span><span class="sh">'</span><span class="p">,</span>   <span class="sh">'</span><span class="s">Mobile App v2</span><span class="sh">'</span><span class="p">,</span>   <span class="mf">1.8</span><span class="p">,</span> <span class="sh">'</span><span class="s">#c8e6c9</span><span class="sh">'</span><span class="p">),</span>
<span class="p">]</span>

<span class="k">for</span> <span class="nb">id</span><span class="p">,</span> <span class="n">label</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">color</span> <span class="ow">in</span> <span class="n">items</span><span class="p">:</span>
    <span class="n">dot</span><span class="p">.</span><span class="nf">node</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">label</span><span class="si">}</span><span class="se">\n</span><span class="s">(w=</span><span class="si">{</span><span class="n">w</span><span class="si">}</span><span class="s">)</span><span class="sh">"</span><span class="p">,</span>
             <span class="n">width</span><span class="o">=</span><span class="nf">str</span><span class="p">(</span><span class="n">w</span><span class="p">),</span> <span class="n">fillcolor</span><span class="o">=</span><span class="n">color</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="nf">str</span><span class="p">(</span><span class="mi">10</span> <span class="o">+</span> <span class="nf">int</span><span class="p">(</span><span class="n">w</span> <span class="o">*</span> <span class="mi">4</span><span class="p">)))</span>

<span class="n">dot</span><span class="p">.</span><span class="nf">edge</span><span class="p">(</span><span class="sh">'</span><span class="s">auth</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">data</span><span class="sh">'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sh">'</span><span class="s">feeds into</span><span class="sh">'</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="sh">'</span><span class="s">dashed</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dot</span><span class="p">.</span><span class="nf">edge</span><span class="p">(</span><span class="sh">'</span><span class="s">data</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">mobile</span><span class="sh">'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sh">'</span><span class="s">enables</span><span class="sh">'</span><span class="p">)</span>

<span class="n">dot</span><span class="p">.</span><span class="nf">render</span><span class="p">(</span><span class="sh">'</span><span class="s">vision</span><span class="sh">'</span><span class="p">,</span> <span class="n">view</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>  <span class="c1"># outputs vision.svg and opens it
</span></code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># To test from CLI directly:</span>
pip <span class="nb">install </span>graphviz
python vision_graph.py

<span class="c"># Or raw DOT:</span>
dot <span class="nt">-Tsvg</span> vision.dot <span class="nt">-o</span> vision.svg
</code></pre></div></div>
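
<p>One gotcha worth noting: the <code class="language-plaintext highlighter-rouge">graphviz</code> Python package is only bindings; it shells out to the Graphviz binaries, so you also need Graphviz itself installed (e.g. <code class="language-plaintext highlighter-rouge">brew install graphviz</code> or <code class="language-plaintext highlighter-rouge">apt install graphviz</code>).</p>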

<h3 id="graphviz-vs-mermaid--key-tradeoff">Graphviz vs Mermaid — key tradeoff</h3>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Mermaid</th>
      <th>Graphviz</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Rendering</strong></td>
      <td>Browser/Markdown native</td>
      <td>CLI / lib, then embed</td>
    </tr>
    <tr>
      <td><strong>Layout control</strong></td>
      <td>Limited</td>
      <td>Fine-grained (<code class="language-plaintext highlighter-rouge">width</code>, <code class="language-plaintext highlighter-rouge">pos</code>, subgraphs)</td>
    </tr>
    <tr>
      <td><strong>Encode magnitude</strong></td>
      <td>❌ not really</td>
      <td>✅ node size, pen width, font size</td>
    </tr>
    <tr>
      <td><strong>Programmatic</strong></td>
      <td>Possible but awkward (string templating)</td>
      <td>First-class (Python lib, JS lib)</td>
    </tr>
    <tr>
      <td><strong>Looks pretty by default</strong></td>
      <td>✅</td>
      <td>⚠️ Functional, not beautiful — needs styling</td>
    </tr>
  </tbody>
</table>

<h3 id="practical-suggestion">Practical suggestion</h3>

<p>Since you’re bridging business ↔ tech, a reasonable workflow:</p>

<ol>
  <li><strong>Source of truth</strong>: a structured data file (YAML/JSON) listing initiatives, their size, type (biz/tech), dependencies</li>
  <li><strong>Generate</strong>: DOT graph programmatically from that data, encoding size as node width</li>
  <li><strong>Render</strong>: SVG → embed in your wiki/slides</li>
  <li><strong>Iterate</strong>: update the YAML, re-run, new graph</li>
</ol>

<p>This keeps the representation <strong>version-controlled and reproducible</strong> — much better than manually dragging boxes in a whiteboard tool when the landscape keeps shifting.</p>

<p>📝: You asked what Graphviz/DOT language is and which tools support it, I answered with: it’s a declarative graph description language from AT&amp;T Labs, rendered by CLI tools and libraries (Python graphviz, D3-graphviz, Viz.js), with fine-grained node attributes (width, color, penwidth) that solve your “show size/complexity” need, and suggested a YAML→DOT→SVG pipeline for maintainability.</p>

<h2 id="question-1">Question</h2>

<p>You meant a YAML file to represent the DOT format?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is asking for clarification. They’re wondering if I meant YAML as an alternative representation of DOT format. No — I meant YAML as a higher-level, domain-specific data source (initiatives, their metadata like size, type, dependencies) that you then <em>transform</em> into DOT programmatically. The YAML is not representing DOT syntax; it’s representing your business/technical domain model. The Python script reads the YAML and generates DOT. This is an important distinction — the YAML is your semantic layer, DOT is your visualization layer. Let me clarify this cleanly with a concrete example.</p>

<p>No — not YAML <em>as</em> DOT. I meant <strong>two separate layers</strong>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>YAML (your domain model)  →  Python script  →  DOT  →  SVG/PNG
     what things are             transform       layout    visual
</code></pre></div></div>

<p>The YAML captures <strong>what you care about</strong> in business/tech terms. The script <em>translates</em> it into DOT. This separation matters because:</p>

<ul>
  <li>Non-technical people can read/edit YAML — they’ll never touch DOT</li>
  <li>You can generate <em>multiple views</em> from the same YAML (e.g. a dependency graph, a timeline, a complexity heatmap)</li>
  <li>The YAML becomes your <strong>single source of truth</strong>, not the graph</li>
</ul>

<h3 id="concrete-example">Concrete example</h3>

<p><strong><code class="language-plaintext highlighter-rouge">vision.yaml</code></strong> — your domain model:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">initiatives</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">id</span><span class="pi">:</span> <span class="s">auth</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">Auth Service Rewrite</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">tech</span>
    <span class="na">size</span><span class="pi">:</span> <span class="s">S</span>
    <span class="na">quarter</span><span class="pi">:</span> <span class="s">Q3</span>
    <span class="na">depends_on</span><span class="pi">:</span> <span class="pi">[]</span>

  <span class="pi">-</span> <span class="na">id</span><span class="pi">:</span> <span class="s">data_platform</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">Data Platform</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">tech</span>
    <span class="na">size</span><span class="pi">:</span> <span class="s">XL</span>
    <span class="na">quarter</span><span class="pi">:</span> <span class="s">Q3-Q4</span>
    <span class="na">depends_on</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">auth</span><span class="pi">]</span>

  <span class="pi">-</span> <span class="na">id</span><span class="pi">:</span> <span class="s">self_serve</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">Self-Serve Onboarding</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">business</span>
    <span class="na">size</span><span class="pi">:</span> <span class="s">L</span>
    <span class="na">quarter</span><span class="pi">:</span> <span class="s">Q4</span>
    <span class="na">depends_on</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">data_platform</span><span class="pi">,</span> <span class="nv">auth</span><span class="pi">]</span>

  <span class="pi">-</span> <span class="na">id</span><span class="pi">:</span> <span class="s">mobile_v2</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">Mobile App v2</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">business</span>
    <span class="na">size</span><span class="pi">:</span> <span class="s">M</span>
    <span class="na">quarter</span><span class="pi">:</span> <span class="s">Q1-next</span>
    <span class="na">depends_on</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">data_platform</span><span class="pi">]</span>
</code></pre></div></div>

<p><strong><code class="language-plaintext highlighter-rouge">generate_graph.py</code></strong> — the transform:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">yaml</span>
<span class="kn">from</span> <span class="n">graphviz</span> <span class="kn">import</span> <span class="n">Digraph</span>

<span class="n">SIZE_MAP</span> <span class="o">=</span> <span class="p">{</span><span class="sh">'</span><span class="s">S</span><span class="sh">'</span><span class="p">:</span> <span class="mf">1.0</span><span class="p">,</span> <span class="sh">'</span><span class="s">M</span><span class="sh">'</span><span class="p">:</span> <span class="mf">1.5</span><span class="p">,</span> <span class="sh">'</span><span class="s">L</span><span class="sh">'</span><span class="p">:</span> <span class="mf">2.2</span><span class="p">,</span> <span class="sh">'</span><span class="s">XL</span><span class="sh">'</span><span class="p">:</span> <span class="mf">3.0</span><span class="p">}</span>
<span class="n">COLOR_MAP</span> <span class="o">=</span> <span class="p">{</span><span class="sh">'</span><span class="s">tech</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">#a0d8ef</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">business</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">#ffe0b2</span><span class="sh">'</span><span class="p">}</span>

<span class="k">with</span> <span class="nf">open</span><span class="p">(</span><span class="sh">'</span><span class="s">vision.yaml</span><span class="sh">'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">yaml</span><span class="p">.</span><span class="nf">safe_load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>

<span class="n">dot</span> <span class="o">=</span> <span class="nc">Digraph</span><span class="p">(</span><span class="sh">'</span><span class="s">vision</span><span class="sh">'</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="sh">'</span><span class="s">svg</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dot</span><span class="p">.</span><span class="nf">attr</span><span class="p">(</span><span class="n">rankdir</span><span class="o">=</span><span class="sh">'</span><span class="s">TB</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dot</span><span class="p">.</span><span class="nf">attr</span><span class="p">(</span><span class="sh">'</span><span class="s">node</span><span class="sh">'</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="sh">'</span><span class="s">box</span><span class="sh">'</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="sh">'</span><span class="s">filled,rounded</span><span class="sh">'</span><span class="p">)</span>

<span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">data</span><span class="p">[</span><span class="sh">'</span><span class="s">initiatives</span><span class="sh">'</span><span class="p">]:</span>
    <span class="n">w</span> <span class="o">=</span> <span class="n">SIZE_MAP</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">size</span><span class="sh">'</span><span class="p">],</span> <span class="mf">1.5</span><span class="p">)</span>
    <span class="n">color</span> <span class="o">=</span> <span class="n">COLOR_MAP</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">type</span><span class="sh">'</span><span class="p">],</span> <span class="sh">'</span><span class="s">#eeeeee</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">label</span> <span class="o">=</span> <span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">name</span><span class="sh">'</span><span class="p">]</span><span class="si">}</span><span class="se">\n</span><span class="s">[</span><span class="si">{</span><span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">size</span><span class="sh">'</span><span class="p">]</span><span class="si">}</span><span class="s">] </span><span class="si">{</span><span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">quarter</span><span class="sh">'</span><span class="p">]</span><span class="si">}</span><span class="sh">"</span>

    <span class="n">dot</span><span class="p">.</span><span class="nf">node</span><span class="p">(</span><span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">id</span><span class="sh">'</span><span class="p">],</span> <span class="n">label</span><span class="o">=</span><span class="n">label</span><span class="p">,</span>
             <span class="n">width</span><span class="o">=</span><span class="nf">str</span><span class="p">(</span><span class="n">w</span><span class="p">),</span> <span class="n">fillcolor</span><span class="o">=</span><span class="n">color</span><span class="p">,</span>
             <span class="n">fontsize</span><span class="o">=</span><span class="nf">str</span><span class="p">(</span><span class="mi">10</span> <span class="o">+</span> <span class="nf">int</span><span class="p">(</span><span class="n">w</span> <span class="o">*</span> <span class="mi">3</span><span class="p">)))</span>

    <span class="k">for</span> <span class="n">dep</span> <span class="ow">in</span> <span class="n">item</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">depends_on</span><span class="sh">'</span><span class="p">,</span> <span class="p">[]):</span>
        <span class="n">dot</span><span class="p">.</span><span class="nf">edge</span><span class="p">(</span><span class="n">dep</span><span class="p">,</span> <span class="n">item</span><span class="p">[</span><span class="sh">'</span><span class="s">id</span><span class="sh">'</span><span class="p">])</span>

<span class="n">dot</span><span class="p">.</span><span class="nf">render</span><span class="p">(</span><span class="sh">'</span><span class="s">vision_roadmap</span><span class="sh">'</span><span class="p">,</span> <span class="n">view</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>graphviz pyyaml
python generate_graph.py
<span class="c"># opens vision_roadmap.svg</span>
</code></pre></div></div>

<p>This produces a graph where:</p>
<ul>
  <li><strong>Box size</strong> = initiative complexity (S/M/L/XL)</li>
  <li><strong>Color</strong> = business (orange) vs tech (blue)</li>
  <li><strong>Arrows</strong> = dependencies</li>
  <li><strong>Labels</strong> = timing</li>
</ul>

<h3 id="why-this-layering-matters-for-your-specific-problem">Why this layering matters for your specific problem</h3>

<p>You said you’re the <strong>bridge</strong> between business and tech leaders. With this setup:</p>

<ul>
  <li>The <strong>YAML</strong> is the artifact you co-author with both sides — it’s readable, diffable, lives in git</li>
  <li>The <strong>graph</strong> is a generated output you show in meetings — nobody argues about box positions, they argue about the <em>data</em></li>
  <li>When business says “we need self-serve in Q3”, you update one line in YAML, re-run, and instantly see if the dependency chain supports that timeline</li>
</ul>

<p>The graph is disposable. The YAML is the conversation.</p>

<p>📝: You asked to clarify whether I meant YAML as DOT format, I clarified it’s two separate layers — YAML as a domain-specific source of truth (readable by both business and tech), with a Python script transforming it into DOT for visualization, keeping the semantic model separate from the rendering.</p>

<h2 id="question-2">Question</h2>

<p>Ah, then I can also insert whatever links I want in the YAML, etc. Essentially, develop a workbench around this to manage dependencies.</p>

<p>I can, for example, imagine a <a href="https://github.com/xianxu/parley.nvim">parley</a>-like plugin to manage a folder in a repo. I can also easily see adding a read-only presentation of such a folder on the web, or eventually collaborative editing of the folder through the web.</p>

<p>Thoughts?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is connecting dots — they see the YAML-as-source-of-truth pattern and are envisioning a full workbench around it. They reference their own neovim plugin (parley.nvim) which seems to be a tool they built. They’re thinking about: 1) a repo folder as the backing store, 2) neovim plugin for power-user editing, 3) read-only web view, 4) eventually collaborative web editing. They’re not really asking me a question — they’re thinking out loud and want me to engage critically with the architecture. I should validate what’s strong, flag risks/tradeoffs, and maybe suggest concrete architectural choices. I should not over-explain basics — this person clearly builds tools and thinks in systems.</p>

<p>Yes — you’re describing a very natural architecture, and the key insight is right: <strong>a folder of YAML files in a git repo is already a database</strong> with versioning, diffing, branching, and access control built in.</p>

<h3 id="what-youd-be-building">What you’d be building</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>repo/vision/
├── initiatives/
│   ├── auth-rewrite.yaml
│   ├── data-platform.yaml
│   └── self-serve-onboarding.yaml
├── schema.yaml          # optional: validate structure
└── views/
    └── roadmap.dot      # generated
</code></pre></div></div>

<p>Each initiative = one file. Dependencies reference other files by ID. This is essentially a <strong>lightweight knowledge graph stored as flat files</strong>.</p>
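
<p>The flat-file graph is also easy to keep honest. A minimal sketch of a validator, assuming each file carries <code class="language-plaintext highlighter-rouge">id</code> and <code class="language-plaintext highlighter-rouge">depends_on</code> fields as above (run it in CI or a pre-commit hook):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># validate.py: check that every depends_on in vision/initiatives/
# resolves to a real initiative ID. Field names follow the sketch above.
import pathlib
import sys

import yaml  # pip install pyyaml

specs = [yaml.safe_load(p.read_text())
         for p in pathlib.Path("vision/initiatives").glob("*.yaml")]
ids = {spec["id"] for spec in specs}

ok = True
for spec in specs:
    for dep in spec.get("depends_on", []):
        if dep not in ids:
            print(f"{spec['id']}: unknown dependency '{dep}'")
            ok = False
sys.exit(0 if ok else 1)
</code></pre></div></div>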

<h3 id="the-three-interfaces-youre-describing">The three interfaces you’re describing</h3>

<table>
  <thead>
    <tr>
      <th>Interface</th>
      <th>Audience</th>
      <th>Mode</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Neovim (parley-like plugin)</strong></td>
      <td>You, tech leads</td>
      <td>Read/write, power user</td>
    </tr>
    <tr>
      <td><strong>Static web view</strong></td>
      <td>Stakeholders, all-hands</td>
      <td>Read-only, generated</td>
    </tr>
    <tr>
      <td><strong>Collaborative web editor</strong></td>
      <td>Cross-functional leads</td>
      <td>Read/write, structured</td>
    </tr>
  </tbody>
</table>

<p>This is a smart layering because you <strong>decouple the data from the interface</strong> — each layer is independently buildable and optional.</p>

<h3 id="whats-strong-about-this">What’s strong about this</h3>

<ul>
  <li><strong>Git as the collaboration protocol</strong> — you get PRs for vision changes, which is actually profound. “I propose we move self-serve to Q3” becomes a diff, not a meeting argument.</li>
  <li><strong>One file per initiative</strong> scales better than one big YAML — fewer merge conflicts, easier to grep, natural unit of ownership.</li>
  <li><strong>Neovim-first</strong> means you’ll actually use it daily. Tools you don’t live in die.</li>
</ul>

<h3 id="where-id-be-cautious">Where I’d be cautious</h3>

<p><strong>Don’t build the web editor early.</strong> The progression should be:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. YAML files + generate script       ← start here, weeks of value
2. Neovim plugin for navigation        ← you'd build this naturally
3. Static site (read-only)             ← CI generates on push
4. Web editor                          ← only if non-git users need write access
</code></pre></div></div>

<p>Step 4 is where complexity explodes — you’re building a CMS at that point (conflict resolution, auth, validation). Many teams never actually need it. A surprisingly effective middle ground: <strong>non-technical people submit changes via PR templates</strong> with a simple form, and you merge.</p>

<h3 id="for-the-static-web-view">For the static web view</h3>

<p>A minimal approach that works well:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># generate.py — run in CI on every push
# 1. Read all YAML files from initiatives/
# 2. Emit a dependency graph (SVG via Graphviz)
# 3. Emit an index.html with the graph + a table of all initiatives
#    each linking to a detail page or external docs
# 4. Deploy to GitHub Pages / S3
</span></code></pre></div></div>

<p>Each initiative YAML can carry a <code class="language-plaintext highlighter-rouge">docs_url</code> field pointing to a design doc, Notion page, whatever — the web view just renders it as a link. You don’t need to centralize all documentation, just the <strong>graph of relationships and sizing</strong>.</p>

<h3 id="the-real-leverage">The real leverage</h3>

<p>What you’re building is not a project management tool. It’s a <strong>shared mental model with a rendering engine</strong>. The value isn’t the graph — it’s that business and tech leaders can look at the <em>same structured data</em> and see their own concerns reflected:</p>

<ul>
  <li>Business sees: what depends on what, what’s blocking revenue</li>
  <li>Tech sees: what’s complex, what’s coupled, what’s risky</li>
  <li>You stop being the translator and become the <strong>maintainer of the schema</strong></li>
</ul>

<p>That’s a much more sustainable role than being the human bridge.</p>

<p>📝: You proposed building a workbench around YAML-in-git-repo with neovim plugin, static web view, and eventual collaborative editing; I validated the architecture, recommended one-file-per-initiative structure, cautioned against building the web editor too early (static site + PR workflow covers most needs), and reframed the value as shifting your role from human translator to schema maintainer.</p>

<h2 id="question-3">Question</h2>

<p>actually, parley already supports exporting a tree of markdown files (chat files) cross referenced. here, we replace markdown with yaml.</p>

<p>Assuming there’s a company.yaml file of this “company vision” directory, we pretty much can generate easily various views, for example, two views:</p>

<p>1/ a dependency graph, the dot file and visualization mentioned above. 
2/ just plain web page with interlinked yaml files for navigation, parley style.</p>

<p>We can add more different visualization easily, as likely any reasonable company.yaml is going to be too complex to just show fully, then maybe we only show starting from a node etc. We can customize whichever way.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: Xian is refining the architecture and essentially validating his own thinking. He’s not really asking a question — he’s describing the design and implicitly asking for confirmation, refinement, or pushback. The key points he’s making: 1) parley already has the tree-of-files + cross-reference pattern, 2) company.yaml as root, 3) multiple generated views from same data, 4) subgraph/filtered views because full graph is too complex. I should affirm what’s strong, add any sharp insight, and push back where appropriate. The main thing I can add value on: the “too complex to show fully” problem is THE core UX challenge, and the “start from a node” idea is exactly right — this is essentially a graph traversal UI. I should also note that company.yaml as root is a good design choice (entry point / manifest). I should keep this concise since he’s mostly thinking out loud and building momentum.</p>

<p>Exactly right. And the “start from a node” insight is the critical one — it’s the difference between a tool that works in a presentation and one that works in a <em>conversation</em>.</p>

<h3 id="the-architecture-youre-converging-on">The architecture you’re converging on</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vision/
├── company.yaml              # root: top-level goals, references initiatives
├── initiatives/
│   ├── auth-rewrite.yaml     # references other initiative IDs
│   ├── data-platform.yaml
│   └── self-serve.yaml
└── generated/                # gitignored or CI-built
    ├── full-graph.svg
    ├── index.html            # interlinked browsable pages
    └── subgraphs/
        └── data-platform.svg # rooted at one node
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">company.yaml</code> as the root is a good design choice — it’s the <strong>manifest</strong>. It declares the top-level business objectives, and each one references initiatives. The graph is implicit in the <code class="language-plaintext highlighter-rouge">depends_on</code> fields across files. The views are just traversals.</p>

<h3 id="two-views-two-purposes">Two views, two purposes</h3>

<table>
  <thead>
    <tr>
      <th>View</th>
      <th>When you use it</th>
      <th>What it answers</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Dependency graph</strong> (rooted at node X)</td>
      <td>Planning meetings, “why is this blocked?”</td>
      <td>Structure, critical path, coupling</td>
    </tr>
    <tr>
      <td><strong>Interlinked pages</strong> (parley-style)</td>
      <td>Async review, onboarding, reference</td>
      <td>Detail, context, links to docs</td>
    </tr>
  </tbody>
</table>

<p>They complement each other well — the graph is for <strong>seeing shape</strong>, the pages are for <strong>reading substance</strong>.</p>

<h3 id="the-too-complex-problem">The “too complex” problem</h3>

<p>This is actually the core UX problem of any vision-tracking tool, and “start from a node” solves it cleanly because it maps to how people actually ask questions:</p>

<ul>
  <li><em>“What does data platform depend on?”</em> → subgraph, ancestors</li>
  <li><em>“What does data platform unlock?”</em> → subgraph, descendants</li>
  <li><em>“What’s on the critical path to self-serve?”</em> → shortest path from roots</li>
</ul>

<p>These are all just <strong>graph queries</strong> over your YAML files. A small CLI like:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vision show data-platform <span class="nt">--depth</span> 2 <span class="nt">--direction</span> down
</code></pre></div></div>

<p>…would be more useful day-to-day than any full-graph rendering. And it’s trivial to implement given you already have the traversal logic in parley for markdown trees.</p>

<h3 id="one-thought">One thought</h3>

<p>The parley pattern of “tree of cross-referenced files” is doing a lot of work here. What you’re really building is <strong>parley generalized beyond chat</strong> — a structured-file graph with pluggable renderers. The chat files were the first content type. Vision YAML is the second. The rendering engine (interlinked HTML, graph viz, filtered subgraphs) is the reusable layer.</p>

<p>Whether you keep it inside parley or extract it as a sibling tool is a naming/packaging decision, but the core abstraction is the same.</p>

<p>📝: You described a company.yaml-rooted vision directory reusing parley’s cross-referenced file pattern with multiple generated views, I affirmed the architecture and highlighted that “start from a node” graph traversal is the key UX insight, and that this is essentially parley generalized beyond chat into a structured-file graph with pluggable renderers.</p>

<h2 id="question-4">Question</h2>

<p>you can even run self consistency checks, or allocation of resources: did two person got assigned on the same project etc. really can embed any rules.</p>

<p>What’s your take on the value of such software in a business? would it help align business leader and tech leader? based on my experience with parley, this is easily constructable, maybe 1 week of my time I think.</p>

<p>thoughts?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: Xian is asking me to evaluate the business value of this tool concept — whether it would actually help align business and tech leaders. He’s also sanity-checking the 1-week estimate. He has deep experience building parley, so he knows his velocity. I should be honest and critical here, not just cheerleading. Let me think about what actually drives alignment in organizations vs. what tools can do. The real question is: does the problem of business/tech misalignment stem from lack of shared artifacts, or from deeper organizational/incentive issues? A tool helps with the former but not the latter. I should also consider: does this already exist? (Roadmapping tools like Productboard, Jira, etc.) What’s different here? The key differentiator is the code-native, git-native, programmable nature. Let me be balanced.</p>

<p>The honest answer: <strong>the tool itself has moderate value; the process of building and maintaining it together has high value.</strong></p>

<h3 id="why-im-cautious-about-the-tool-alone">Why I’m cautious about the tool alone</h3>

<p>The alignment problem between business and tech leaders is rarely about lack of visibility. It’s usually about:</p>

<ul>
  <li><strong>Different incentive horizons</strong> — business optimizes for next quarter’s revenue, tech optimizes for next year’s maintainability</li>
  <li><strong>Different risk models</strong> — business sees delay risk, tech sees complexity risk</li>
  <li><strong>Lack of shared vocabulary</strong> — not lack of shared documents</li>
</ul>

<p>A beautifully rendered dependency graph doesn’t fix any of these. I’ve seen teams with Jira boards, Productboard roadmaps, Notion wikis, all perfectly structured — and still fundamentally misaligned because the leaders don’t actually negotiate tradeoffs together.</p>

<h3 id="where-it-does-help">Where it <em>does</em> help</h3>

<p>The specific thing your tool does that existing roadmap tools don’t:</p>

<ol>
  <li>
    <p><strong>Makes dependencies non-negotiable.</strong> When a business leader says “move self-serve to Q3,” the graph mechanically shows what that requires. It shifts the conversation from opinion to constraint propagation. This is genuinely powerful.</p>
  </li>
  <li>
    <p><strong>Consistency checks as guardrails.</strong> The rules engine you described — resource conflicts, impossible timelines, circular dependencies — turns implicit disagreements into explicit violations. That’s a forcing function for alignment.</p>
  </li>
  <li>
    <p><strong>Git-native means decisions have history.</strong> “Why did we deprioritize X?” becomes <code class="language-plaintext highlighter-rouge">git log initiatives/x.yaml</code>. No other roadmap tool gives you that for free.</p>
  </li>
  <li>
    <p><strong>It’s yours.</strong> You can shape it to your org’s actual decision-making process, not Jira’s abstraction of what a process should be.</p>
  </li>
</ol>

<h3 id="the-real-value-proposition">The real value proposition</h3>

<p>The highest-leverage use isn’t the software — it’s <strong>the weekly ritual of reviewing it together</strong>. If you get the business leader and tech leader to co-edit <code class="language-plaintext highlighter-rouge">company.yaml</code> in a meeting, even briefly, the act of negotiating what goes in the file <em>is</em> the alignment. The tool is the excuse.</p>

<p>This is the same dynamic that makes architecture decision records (ADRs) work — the document isn’t the value, the conversation that produces the document is.</p>

<h3 id="on-the-1-week-estimate">On the 1-week estimate</h3>

<p>Believable for you, given parley’s existing infrastructure. But scope it tightly:</p>

<ul>
  <li>Week 1: YAML schema + CLI to generate graph + static HTML. <strong>Stop here and use it for 2-3 weeks.</strong></li>
  <li>Only then decide if you need more.</li>
</ul>

<p>The risk isn’t build time — it’s <strong>over-building before you’ve validated that other people will actually look at the output</strong>. You’ll use it because you built it. The question is whether the business leader opens the link.</p>

<h3 id="bottom-line">Bottom line</h3>

<p>It’s worth the week. But go in with clear eyes: you’re building a <strong>conversation scaffold</strong>, not a solution to misalignment. The tool works if it becomes a shared artifact that both sides feel ownership over. It fails if it becomes “Xian’s graph that he shows in meetings” — at that point you’re still the translator, just with a nicer visual aid.</p>

<p>📝: You asked whether this YAML-based vision tool would actually help align business and tech leaders and whether the 1-week build is worth it, I answered that the tool has moderate standalone value but high value as a conversation scaffold — the real alignment comes from co-editing the shared artifact together, not from the visualization itself, and cautioned against over-building before validating that others engage with it.</p>

<h2 id="question-5">Question</h2>

<p>Yep, the real value is to have a tool that we own, and can tweak to whatever abstraction and workflow we need, with minimal learning curve, so to speak. and yes, having the same vocabulary is one core reason to have such a thing to begin with.</p>

<p>to further flesh out the idea, everything can be in a single yaml file, the directories and files are merely creating scoped namespaces for “project names”</p>

<p>And to KTS (keep things simple), we don’t need an ID, if an “ID” is needed for project, it’s just snake case of the project name.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>- name: Auth Service Rewrite
  type: tech
  size: S
  quarter: Q3
  depends_on: []

- name: Data Platform
  type: tech
  size: XL
  quarter: Q3-Q4
  depends_on: [auth]

- name: Self-Serve Onboarding
  type: business
  size: L
  quarter: Q4
  depends_on: [data_platform, auth]

- name: Mobile App v2
  type: business
  size: M
  quarter: Q1-next
  depends_on: [data_platform]
</code></pre></div></div>

<p>We can even support prefix-ID scheme, e.g. you can write <code class="language-plaintext highlighter-rouge">data</code> to refer to <code class="language-plaintext highlighter-rouge">data platform</code>, basically all IDs are going to be prefix matched. If there are multiple match, or zero match, we would throw an error, as user to correct it. The whole thing is to get out of the way. All the fields are strings, until we assign strong semantic or syntax in code. All can be incrementally evolved, e.g. compiler will tell you error, if some syntax/semantic is tightened.</p>

<p>we can also support type-ahead in nvim, when input in the depends_on list. need to resist over engineering for now. I think the key first step is to have structure up, and a reasonable export, to a spreadsheet, so that TPM can use to organize things.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: Xian is refining the design and I should engage critically with the specific design decisions. Key observations: 1) prefix-matching IDs from snake_cased names is clever but has a known trap, 2) the spreadsheet export insight is very practical — that’s how TPMs actually work, 3) he’s showing good discipline about resisting over-engineering. I should affirm what’s sharp, flag the one real risk in prefix matching, and keep it concise since he’s in flow.</p>

<p>Yes — this is clean. A few reactions:</p>

<h3 id="prefix-matching-is-the-right-ux-with-one-guardrail">Prefix matching is the right UX, with one guardrail</h3>

<p>The scheme works well <em>now</em> with a small number of initiatives. The failure mode is: you add “Data Lake” later, and suddenly <code class="language-plaintext highlighter-rouge">data</code> is ambiguous where it wasn’t before. Your “multiple match → error” rule handles this correctly — but the <strong>error surfaces at read time</strong>, meaning someone’s previously-valid file breaks when a <em>different</em> file introduces a new initiative.</p>

<p>Not a reason to avoid it. Just worth noting that when it happens, the fix is trivial (lengthen the prefix). And the error message should say clearly: <code class="language-plaintext highlighter-rouge">"data" matches both "data_platform" and "data_lake" — be more specific</code>.</p>

<h3 id="spreadsheet-export-is-the-killer-first-output">Spreadsheet export is the killer first output</h3>

<p>This is the right call. Not the graph — the <strong>CSV/spreadsheet</strong>. Because:</p>

<ul>
  <li>TPMs live in spreadsheets. That’s their IDE.</li>
  <li>A spreadsheet with <code class="language-plaintext highlighter-rouge">name | type | size | quarter | depends_on</code> is immediately useful for status tracking, resource planning, sorting, filtering — all the things TPMs do that you don’t want to build UI for.</li>
  <li>The TPM will <em>add columns</em> (status, owner, notes) — that’s fine, the YAML is the structural source of truth, the spreadsheet is the working copy.</li>
</ul>

<p>The graph is for you and leadership. The spreadsheet is for the people who actually run the projects.</p>

<h3 id="what-id-build-in-the-first-sitting">What I’d build in the first sitting</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>company.yaml  →  Python script  →  spreadsheet.csv
                                →  roadmap.svg (graphviz)
                                →  errors.txt (validation)
</code></pre></div></div>

<p>Three outputs, one command. The validation pass (dangling refs, ambiguous prefixes, circular deps) is arguably more valuable than either of the other two, because it catches the “wait, that doesn’t make sense” moments before the meeting instead of during it.</p>

<h3 id="on-all-fields-are-strings">On “all fields are strings”</h3>

<p>This is the right starting posture. The moment you introduce enums or date parsing, you’re making schema decisions that constrain how people express things. Let <code class="language-plaintext highlighter-rouge">quarter: "Q3-Q4"</code> and <code class="language-plaintext highlighter-rouge">quarter: "late Q3"</code> both be valid until you have a reason to parse them. The compiler-tightening-over-time metaphor is exactly right.</p>

<p>You’re designing a tool that <strong>earns its strictness</strong> rather than imposing it upfront. That’s why people will actually use it.</p>

<p>📝: You asked about refining the YAML schema design (prefix IDs, string-typed fields, spreadsheet export), I answered affirming prefix-match with ambiguity guardrail, that CSV export for TPMs is the killer first feature over the graph, and that “all strings until proven otherwise” is the right schema evolution strategy.</p>

<h2 id="question-6">Question</h2>]]></content><author><name></name></author><category term="nil" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Conversation: agentic state management</title><link href="https://xianxu.dev/2026/03/28/agentic_state_management.html" rel="alternate" type="text/html" title="Conversation: agentic state management" /><published>2026-03-28T00:00:00+00:00</published><updated>2026-03-28T00:00:00+00:00</updated><id>https://xianxu.dev/2026/03/28/agentic_state_management</id><content type="html" xml:base="https://xianxu.dev/2026/03/28/agentic_state_management.html"><![CDATA[<style>
h1 { color: #1a365d; border-bottom: 3px solid #4299e1; padding-bottom: 0.3rem; }
h2 { color: #2b6cb0; border-bottom: 2px solid #bee3f8; padding-bottom: 0.3rem; }
h3 { color: #3182ce; border-left: 4px solid #90cdf4; padding-left: 0.8rem; }
.branch-nav { margin: 1rem 0; padding: 0.6rem 1rem; border-radius: 8px; font-size: 0.95em; }
.branch-nav a { text-decoration: none; font-weight: 500; }
.branch-nav a:hover { text-decoration: underline; }
.branch-nav.parent-link { background: linear-gradient(135deg, #fefcbf 0%, #faf089 100%); border-left: 4px solid #d69e2e; color: #744210; }
.branch-nav.parent-link a { color: #975a16; }
.branch-nav.child-link { background: linear-gradient(135deg, #c6f6d5 0%, #9ae6b4 100%); border-left: 4px solid #38a169; color: #22543d; }
.branch-nav.child-link a { color: #276749; }
.branch-inline { color: #2b6cb0; text-decoration: none; border-bottom: 1px dashed #90cdf4; font-weight: 500; }
.branch-inline:hover { color: #1a365d; border-bottom-style: solid; }
</style>

<p>This transcript is generated by <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>.</p>

<h2 id="question">Question</h2>

<p>Let’s think about state management in agentic coding</p>

<p>So in agentic coding, the state of issues to fix are often outside the source control repo. There’s tension when a ticket is too big, then there needs to be incremental building blocks, this typically manifest as some todo.md file. In <a href="https://xianxu.github.io/2026/03/08/ai-coding-workflow.html">my workflow</a>, I keep those todo.md file outside source control as they are ephemeral.</p>

<p>The issue then is I need to either finish that big ticket, can’t really make a check point and come back later.</p>

<p>It seems generally speaking, my intuition is to keep all state in a single place, the issues and the steps we think we need to fix those issues.</p>

<p>This led me to think I should update my AI workflow to be totally repo based, e.g. all tickets should be in some folder, and maybe proposed changes in subfolders. e.g.</p>

<p>issues/</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00001/issue.md
00001/todo.md
00001/tombed -&gt; to indicate this issue are "done"
</code></pre></div></div>

<p>then I guess you can have</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00001/dependencies which just contain issue IDs that we need to work on first before this. 
</code></pre></div></div>

<p>It also seems, instead of running different tasks in parallel, and deal with all the synchronization and conflict, we will just do a single threaded thing, one at a time with based on topological order.</p>

<p>I still need to figure out how to bring parallelism in a structured way, you can always do service separation in different repo, but that goes against my intuition.</p>

<p>thoughts?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is thinking deeply about state management in agentic coding workflows. They’re describing a real problem: when using AI agents to work on code, the “meta-state” (what needs to be done, progress tracking, decomposition of tasks) often lives outside the repo (in Jira, Linear, todo.md files excluded from git, or just in conversation context). This creates problems with checkpointing, resumability, and coordination. They’re proposing a repo-based issue tracking system specifically designed for agentic workflows. Let me think about the key tensions and offer substantive thoughts. Key points to address: 1) The fundamental insight about collocating state with code, 2) The specific folder structure they propose, 3) The single-threaded vs parallel execution problem, 4) What’s missing or could be improved in their model. I should engage critically - not just validate but push back where appropriate and add insights they might not have considered. They seem quite experienced with agentic coding workflows, so I should go deep.</p>

<p>This is a really interesting design space, and I think you’re converging on something important. Let me engage with the specific tensions.</p>

<h3 id="the-core-insight-is-right-colocate-state-with-code">The core insight is right: colocate state with code</h3>

<p>The fundamental problem you’ve identified is that <strong>context is the scarcest resource in agentic coding</strong>, and scattering it across systems (Linear, Jira, local ephemeral files, conversation history) means every agent session starts with a costly reconstruction phase. Making the repo the single source of truth for both code <em>and</em> work state is sound — it gets you versioning, diffing, branchability, and resumability for free.</p>

<h3 id="on-your-proposed-structure">On your proposed structure</h3>

<p>The structure is reasonable, but I’d push on a few things:</p>

<p><strong>The <code class="language-plaintext highlighter-rouge">tombed</code> marker feels like reinventing git.</strong> If an issue is done, that’s really a property of the commit history — the issue folder existed, work was done, and the resolution is the diff. A simpler model might be: issues that exist in <code class="language-plaintext highlighter-rouge">issues/</code> are open; closing an issue means deleting the folder (the history lives in git). Or if you want to keep them around for reference, a simple <code class="language-plaintext highlighter-rouge">status</code> file or frontmatter in <code class="language-plaintext highlighter-rouge">issue.md</code> works. The tombstone file approach creates a state you have to check that’s redundant with what git already tracks.</p>

<p><strong><code class="language-plaintext highlighter-rouge">todo.md</code> is doing a lot of work.</strong> This is really the <em>plan</em> — the decomposition of the issue into steps. I’d be more explicit about its semantics. Is it a checklist the agent checks off? Is it append-only? Can the agent revise it mid-execution? The interesting design question is: <strong>should the agent be allowed to modify the plan, or only execute it?</strong> In practice, plans change as you learn things during implementation. You probably want the agent to be able to re-plan, but you want that to be an explicit, visible action (a diff you can review), not something that silently happens.</p>

<p><strong>Dependencies as issue IDs is good but incomplete.</strong> You’ll likely need to distinguish between:</p>
<ul>
  <li>Hard dependencies (must be merged first, the code won’t work otherwise)</li>
  <li>Soft dependencies (would be easier if done first, but can be worked around)</li>
  <li>Conflicts (touches the same files, can’t be parallelized safely)</li>
</ul>

<h3 id="the-parallelism-problem-is-the-real-hard-problem">The parallelism problem is the real hard problem</h3>

<p>Your instinct toward single-threaded execution is pragmatic, but I think it’s worth being more precise about <em>why</em> parallelism is hard here. It’s not just merge conflicts — it’s <strong>semantic conflicts</strong>. Two agents can modify different files but break each other’s assumptions.</p>

<p>The structured way to bring parallelism back might not be repo separation but rather <strong>scope separation within a repo</strong>:</p>

<ul>
  <li><strong>By module/package boundary</strong>: if your codebase has well-defined module boundaries, two issues touching different modules can safely parallelize. The agent needs to know the dependency graph of <em>modules</em>, not just issues.</li>
  <li><strong>By layer</strong>: an API contract change can be split into “update the interface” (serial, blocking) and “update all callers” (parallelizable per caller).</li>
  <li><strong>Speculative execution</strong>: run N agents in parallel on independent issues, but against the <em>same base commit</em>. Then merge sequentially, and if a later merge has conflicts, re-run that agent against the new base. This is essentially optimistic concurrency control.</li>
</ul>

<p>The speculative execution model is probably the most practical — you accept that some work will be thrown away, but the wall-clock speedup is worth it for issues that <em>usually</em> don’t conflict.</p>

<h3 id="what-might-be-missing">What might be missing</h3>

<p>One thing I don’t see in your model: <strong>the agent’s reasoning trace</strong>. When you come back to a half-finished issue, the todo.md tells you <em>what</em> was planned, but not <em>why</em> certain decisions were made. You might want something like <code class="language-plaintext highlighter-rouge">00001/log.md</code> that captures key decisions and discoveries during execution. This is essentially the agentic equivalent of commit messages, but at a higher level of abstraction.</p>

<p>Also worth considering: who <em>creates</em> the issue decomposition? If a human writes <code class="language-plaintext highlighter-rouge">issue.md</code> and the agent generates <code class="language-plaintext highlighter-rouge">todo.md</code>, that’s a clear contract. If the agent does both, you need a review step before execution begins.</p>

<p>📝: You asked about state management in agentic coding workflows and a repo-based issue tracking structure, I answered with validation of the colocate-state-with-code insight, critiques of tombstone files and todo.md semantics, pushed on dependency types, suggested parallelism via scope separation and speculative/optimistic concurrency, and noted missing elements like reasoning traces and plan authorship contracts.</p>

<h2 id="question-1">Question</h2>

<p>The following are response from ChatGPT, thoughts?</p>

<p>Yes. You are noticing a real systems problem, not just an AI-tooling quirk.</p>

<p>The core tension is:
	1.	the work state exists outside the repo,
	2.	but the code evolution is inside the repo,
	3.	so resumability, auditability, and decomposition drift apart.</p>

<p>That is why large tickets feel “must finish now.” The missing piece is not coding power. It is durable, local, versioned task state.</p>

<p>Your repo-based model is directionally right. I would push it a bit further.</p>

<p>The main idea</p>

<p>Treat issue execution as a state machine stored beside the code.</p>

<p>Not just:
	•	issue statement
	•	todo scratchpad
	•	done marker</p>

<p>But:
	•	problem definition
	•	current understanding
	•	decomposition
	•	dependency edges
	•	execution log
	•	acceptance criteria
	•	resulting commits / PRs / artifacts</p>

<p>That gives the agent somewhere to “stand” when it comes back later.</p>

<p>Why your current pattern breaks</p>

<p>Keeping todo.md outside source control is fine for purely personal scratch work, but for agentic coding it causes three failures:</p>

<ol>
  <li>No checkpointing</li>
</ol>

<p>You lose the intermediate reasoning and plan state that made the next step obvious.</p>

<ol>
  <li>No causal trace</li>
</ol>

<p>Later you cannot tell whether a sub-step was finished, invalidated, or superseded.</p>

<ol>
  <li>No shared substrate</li>
</ol>

<p>If multiple agents or even the same agent across time work on it, they have no canonical source of task truth.</p>

<p>So the result is exactly what you described: large tickets become “finish in one sitting or pay large re-entry cost.”</p>

<p>Your proposed structure is mostly sound</p>

<p>Something like:</p>

<p>issues/
  00001/
    issue.md
    todo.md
    dependencies
    tombstone</p>

<p>is already much better than external ephemeral files.</p>

<p>But I would change a few things.</p>

<p>Minimal fixes</p>

<ol>
  <li>Replace tombed with explicit machine-readable state</li>
</ol>

<p>Do not use file presence as the main status signal. It is too lossy.</p>

<p>Prefer:</p>

<p>issues/00001/status.json</p>

<p>For example:</p>

<p>{
  “state”: “done”,
  “updated_at”: “2026-03-28T10:00:00Z”,
  “blocked_by”: [],
  “depends_on”: [“00003”, “00007”],
  “supersedes”: [],
  “superseded_by”: null
}</p>

<p>Or if you want human-friendly first:</p>

<p>state: done
updated_at: 2026-03-28T10:00:00Z
depends_on:</p>
<ul>
  <li>00003</li>
  <li>00007
blocked_by: []</li>
</ul>

<p>Reason: agents and scripts can consume it deterministically.</p>

<ol>
  <li>Split todo.md into plan vs log</li>
</ol>

<p>One file tends to become a mess of stale intentions plus historical junk.</p>

<p>Use:</p>

<p>plan.md
log.md</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>•	plan.md: current intended next steps only
•	log.md: chronological execution notes, discoveries, failed attempts, decisions
</code></pre></div></div>

<p>This matters a lot. Otherwise stale todos poison future runs.</p>

<ol>
  <li>Put acceptance criteria in their own file</li>
</ol>

<p>Large-ticket drift often comes from the plan replacing the goal.</p>

<p>Use:</p>

<p>acceptance.md</p>

<p>That lets an agent ask: “what counts as done?” instead of hallucinating completion.</p>

<ol>
  <li>Dependencies should be structured, not freeform text</li>
</ol>

<p>A file named dependencies is okay, but make it machine-readable:</p>

<p>depends_on.txt</p>

<p>or YAML/JSON.</p>

<p>Example:</p>

<p>00003
00007
00011</p>

<p>Simple is fine.</p>

<p>A better repo shape</p>

<p>I would probably use:</p>

<p>issues/
  00001/
    issue.md          # problem statement
    acceptance.md     # definition of done
    plan.md           # current decomposition / next steps
    log.md            # execution history, discoveries, decisions
    status.yaml       # state, priority, owner, timestamps
    depends_on.txt    # issue IDs
    artifacts/        # generated diffs, notes, screenshots, benchmarks</p>

<p>Optional:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>patches/
research/
</code></pre></div></div>

<p>if the task requires exploration before code changes.</p>

<p>State model</p>

<p>You want explicit lifecycle states, maybe:
	•	new
	•	ready
	•	in_progress
	•	blocked
	•	review
	•	done
	•	abandoned</p>

<p>That is enough.</p>

<p>For example:</p>

<p>state: in_progress
priority: medium
created_at: 2026-03-28T09:10:00Z
updated_at: 2026-03-28T10:22:00Z
depends_on:</p>
<ul>
  <li>00003
blocked_by: []</li>
</ul>

<p>This gives you resumability without inventing a huge PM system.</p>

<p>Single-threaded topological execution</p>

<p>This is the part where your instinct is strong.</p>

<p>For agentic work, single-threaded by default is correct.</p>

<p>Not because parallelism is bad in principle, but because most repo-local tasks share hidden state:
	•	same codebase assumptions
	•	same branch
	•	overlapping files
	•	overlapping test surfaces
	•	evolving plan context</p>

<p>Parallelism here often creates fake speed and real merge debt.</p>

<p>Topological execution over explicit dependencies is much saner. It gives:
	•	deterministic work order
	•	fewer conflicts
	•	clearer blocked states
	•	easier restart after interruption</p>

<p>So yes, “one at a time in dependency order” is a good default operating model.</p>

<p>Where parallelism actually belongs</p>

<p>Not “multiple agents randomly touching the same repo.”</p>

<p>Parallelism works when units are genuinely separable along one of these dimensions:</p>

<ol>
  <li>File/ownership disjointness</li>
</ol>

<p>Two tasks provably touch different bounded areas.</p>

<ol>
  <li>Interface-first decomposition</li>
</ol>

<p>One task defines contract/spec/tests, another implements against that fixed interface.</p>

<ol>
  <li>Research vs execution split</li>
</ol>

<p>One agent investigates and writes structured findings, another applies code changes.</p>

<ol>
  <li>Branch-isolated speculative work</li>
</ol>

<p>Parallel branches only for alternatives, not concurrent mutation of the same plan.</p>

<p>So I would not bring parallelism in as “do many tickets at once.” I would bring it in as controlled fork-join under explicit boundaries.</p>

<p>The deeper issue: issue systems are not enough</p>

<p>You also correctly noticed that “issues” and “steps to solve issues” want to live together.</p>

<p>Classic ticket systems separate:
	•	ticket = external tracker
	•	code = repo
	•	work log = chat / notes / memory
	•	decomposition = hidden scratchpad</p>

<p>That architecture is bad for AI workflows.</p>

<p>AI agents benefit from co-locating operational state with the code. In other words, the repo becomes not just source code, but the local working memory substrate.</p>

<p>That is a meaningful shift.</p>

<p>But there is one objection</p>

<p>Knowledgeable people will object that this pollutes the repo with ephemeral planning noise.</p>

<p>That objection is valid.</p>

<p>There are really three possible answers:</p>

<p>Option A: Commit issues/ into the repo</p>

<p>Pros:
	•	full auditability
	•	perfect resumability
	•	easy collaboration</p>

<p>Cons:
	•	lots of operational noise
	•	history clutter
	•	people may dislike mixing planning with product code</p>

<p>Option B: Keep issues/ in repo but gitignored</p>

<p>Pros:
	•	local durable state
	•	no repo pollution
	•	agents still get a canonical on-disk structure</p>

<p>Cons:
	•	not shared across machines unless separately synced
	•	weaker team-level coordination
	•	state can silently diverge</p>

<p>Option C: Separate “worktree state repo”</p>

<p>A second repo colocated with the main repo, e.g.:</p>

<p>myproject/
myproject.ops/</p>

<p>Pros:
	•	structured durable state
	•	versioned independently
	•	no pollution of main source tree
	•	sharable if desired</p>

<p>Cons:
	•	another moving part
	•	slightly weaker coupling to code unless tooling is good</p>

<p>I suspect Option C is probably your best long-term architecture.</p>

<p>It preserves your intuition that state should be repo-based, but avoids contaminating the product repo with ephemeral execution debris.</p>

<p>This is the model I would recommend</p>

<p>Use a separate operational repo or sibling directory:</p>

<p>repo/
repo.work/
  issues/
    00001/
      issue.md
      acceptance.md
      plan.md
      log.md
      status.yaml
      depends_on.txt
      links.yaml</p>

<p>Where links.yaml might contain:</p>

<p>code_repo: ../repo
branch: issue/00001-fix-auth-timeout
commits:</p>
<ul>
  <li>abc1234
pr: null</li>
</ul>

<p>That gives you durable work state, resumability, and clean separation.</p>

<p>On decomposition</p>

<p>You are also converging on an important point: a “big ticket” should usually become a DAG of smaller issues, not a single issue with a long todo list.</p>

<p>So instead of:
	•	00001 with a giant todo.md</p>

<p>prefer:
	•	00001 parent issue
	•	00002 extract parser
	•	00003 add failing tests
	•	00004 refactor interface
	•	00005 migrate callers</p>

<p>with explicit dependencies.</p>

<p>Then the parent issue becomes mostly a coordination node.</p>

<p>That is much more robust than giant freeform todo documents.</p>

<p>A practical heuristic</p>

<p>Use a new issue node when a step has any of these properties:
	•	can be committed independently
	•	has its own acceptance criteria
	•	may block or unblock other work
	•	could be delegated
	•	could be abandoned without invalidating the entire ticket</p>

<p>If not, keep it as a checklist item inside plan.md.</p>

<p>That gives a sensible boundary between “sub-issue” and “todo item.”</p>

<p>How to handle resuming later</p>

<p>When an agent starts work, it should only need to read:
	•	issue.md
	•	acceptance.md
	•	status.yaml
	•	depends_on.txt
	•	plan.md
	•	tail of log.md</p>

<p>That is the minimal context pack.</p>

<p>If it needs more, it can inspect artifacts and code. This keeps restart cheap.</p>

<p>My blunt take</p>

<p>Your current instinct is right, but I would not say “all tickets should be in the source repo” as a universal rule.</p>

<p>The stronger rule is:</p>

<p>all execution state should be on disk, structured, durable, and near the codebase.</p>

<p>Whether that lives inside the main repo, in a gitignored subtree, or in a sibling ops repo is a secondary design choice.</p>

<p>For most serious agentic coding workflows, I would choose:
	•	code repo: product source
	•	work repo / ops repo: issue DAG, plans, logs, artifacts
	•	default execution: single-threaded topological order
	•	parallelism: only through explicit fork-join boundaries</p>

<p>That is coherent and scales much better than ad hoc todo.md scratchpads.</p>

<p>If you want, I can sketch a concrete file schema plus a tiny CLI design for creating issues, marking state transitions, and selecting the next runnable task.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is showing me a response from another LLM and asking for my thoughts. This is a meta-discussion about comparing perspectives. I should be honest about where I agree, where I disagree, and where I think the other LLM’s response has strengths or weaknesses compared to my own earlier response. Let me think critically.</p>

<p>Key things to evaluate:</p>
<ol>
  <li>The state machine / lifecycle model (new, ready, in_progress, blocked, review, done, abandoned)</li>
  <li>The plan.md vs log.md split</li>
  <li>The Option A/B/C analysis (in-repo, gitignored, separate repo)</li>
  <li>The recommendation of Option C (separate ops repo)</li>
  <li>The DAG decomposition of big tickets into sub-issues</li>
  <li>The structured files (status.yaml, acceptance.md, etc.)</li>
  <li>The “minimal context pack” for resuming</li>
</ol>

<p>My honest assessment: this response is thorough but I think it over-engineers the problem significantly. It’s the classic LLM pattern of taking a good idea and adding layers of structure until it becomes a project management system. Let me be specific about what I think is good and what’s over-designed.</p>

<p>The separate ops repo (Option C) recommendation is interesting but I actually think it works against the user’s original insight — colocating everything. The whole point was reducing context fragmentation.</p>

<p>The status.yaml, acceptance.md, links.yaml etc. — this is creeping toward reinventing Jira on disk. The user’s original simple structure was better.</p>

<p>The DAG decomposition point is genuinely good.</p>

<p>The plan.md vs log.md split is reasonable but I should note that in practice, for agentic coding, the log is often the conversation history itself.</p>

<p>I should be direct and critical, since the user is asking for my honest assessment.</p>

<p>This is a thoughtful response, but I have some real disagreements with it. Let me be direct.</p>

<h3 id="where-its-strong">Where it’s strong</h3>

<p><strong>The plan vs. log split is genuinely good.</strong> I made a similar point about separating the reasoning trace from the plan, but this response names it more cleanly. Stale todos poisoning future runs is a real failure mode — I’ve seen agents re-attempt steps that were already tried and abandoned because the todo.md didn’t distinguish “planned” from “tried and failed.”</p>

<p><strong>The DAG decomposition heuristic is the best part.</strong> The rule of thumb — “create a sub-issue when a step can be committed independently, has its own acceptance criteria, could be delegated, or could be abandoned without invalidating the parent” — is practical and correct. That’s the kind of guidance that actually changes how you work.</p>

<p><strong>The “minimal context pack” concept is useful.</strong> Naming the exact set of files an agent reads on resume is a good design discipline.</p>

<h3 id="where-i-think-it-goes-wrong">Where I think it goes wrong</h3>

<p><strong>It over-engineers the structure.</strong> <code class="language-plaintext highlighter-rouge">status.yaml</code>, <code class="language-plaintext highlighter-rouge">acceptance.md</code>, <code class="language-plaintext highlighter-rouge">links.yaml</code>, <code class="language-plaintext highlighter-rouge">artifacts/</code>, <code class="language-plaintext highlighter-rouge">patches/</code>, <code class="language-plaintext highlighter-rouge">research/</code> — this is drifting toward building a project management system on the filesystem. The user’s original structure was 3-4 files. This response inflated it to 7+ files per issue with YAML schemas. In practice, that structure becomes a maintenance burden <em>itself</em>, and the agent spends tokens reading and updating metadata instead of writing code.</p>

<p>The pragmatic question is: <strong>who maintains all this metadata?</strong> If the agent does, you’re spending agent cycles on bookkeeping. If the human does, you’ve just recreated Jira with worse UX.</p>

<p><strong>Option C (separate ops repo) contradicts the core insight.</strong> The user’s whole point was that <em>fragmentation of state across locations is the problem</em>. A sibling <code class="language-plaintext highlighter-rouge">repo.work/</code> directory is just a more organized version of the same fragmentation. Now you need to keep two repos in sync, worry about which commit in the ops repo corresponds to which commit in the code repo, and the <code class="language-plaintext highlighter-rouge">links.yaml</code> with branch names and commit hashes is exactly the kind of cross-referencing that rots immediately.</p>

<p>I’d argue <strong>Option A (committed to the repo) is actually correct</strong> for agentic workflows, with one nuance: use a branch convention. The <code class="language-plaintext highlighter-rouge">issues/</code> folder on <code class="language-plaintext highlighter-rouge">main</code> contains open issues. Completed issues get cleaned up as part of the closing PR. The full history lives in git. This is simpler and more robust than any of the three options as presented.</p>

<p><strong>The lifecycle states are overkill.</strong> <code class="language-plaintext highlighter-rouge">new → ready → in_progress → blocked → review → done → abandoned</code> is a workflow engine. For single-threaded topological execution — which both the user and this response agree is the right default — you really only need three states: <strong>open, blocked, done</strong>. The scheduler picks the first open issue whose dependencies are all done. That’s it. Adding <code class="language-plaintext highlighter-rouge">ready</code> vs <code class="language-plaintext highlighter-rouge">in_progress</code> vs <code class="language-plaintext highlighter-rouge">review</code> implies a multi-actor workflow that doesn’t exist yet.</p>

<h3 id="the-deeper-thing-it-misses">The deeper thing it misses</h3>

<p>This response treats the issue structure as a <em>data format</em> problem and solves it with more files and schemas. But the harder problem is <strong>what the agent does when the plan is wrong.</strong></p>

<p>An agent starts executing <code class="language-plaintext highlighter-rouge">plan.md</code>, discovers the approach won’t work at step 3 of 7. What happens? The response says the agent should be able to re-plan, but doesn’t address the mechanics. I think this is where the real design work is:</p>

<ul>
  <li>Should the agent stop and create new sub-issues in the DAG?</li>
  <li>Should it modify <code class="language-plaintext highlighter-rouge">plan.md</code> in place and note the change in <code class="language-plaintext highlighter-rouge">log.md</code>?</li>
  <li>Should it escalate to the human?</li>
</ul>

<p>The <em>protocol</em> for plan revision matters more than the file layout. A simple structure with a clear revision protocol will outperform an elaborate structure with no protocol.</p>

<h3 id="my-blunt-take-on-their-blunt-take">My blunt take on their blunt take</h3>

<p>The response says <em>“all execution state should be on disk, structured, durable, and near the codebase”</em> — that’s right. But then it recommends putting it in a <em>different repo</em>, which is “near” only in the filesystem sense. The coupling between work state and code state is semantic, not spatial. Git branches already give you a mechanism to associate work-in-progress state with code changes without polluting <code class="language-plaintext highlighter-rouge">main</code>. Use that.</p>

<p>Keep the structure minimal. Start with what the user originally proposed. Add files only when you hit a concrete failure mode, not preemptively.</p>

<p>📝: You asked for my critical assessment of another LLM’s response on repo-based issue management for agentic coding. I answered that it correctly identifies plan/log split and DAG decomposition as valuable, but over-engineers the file structure toward a filesystem PM system, the separate ops repo recommendation contradicts the core insight of colocation, the lifecycle states are overkill for single-threaded execution, and the harder unsolved problem is the agent’s plan-revision protocol rather than the data format.</p>

<h2 id="question-2">Question</h2>

<p>Alright, based on what you understand of my intention, propose a simple but effective structure.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user wants a practical, minimal structure for repo-based agentic coding state management. Key constraints from the conversation: 1) Everything in the repo (not separate ops repo), 2) Simple - they started with issue.md, todo.md, tombstone, 3) Single-threaded topological execution as default, 4) Must support resumability (checkpointing and coming back later), 5) Must support decomposition of big tickets into sub-issues (DAG), 6) They use agentic coding workflows where an AI agent picks up and executes work. I should propose something close to their original intuition but incorporating the good ideas: plan vs log split, DAG decomposition, machine-readable dependencies. I should NOT over-engineer with status.yaml, acceptance.md, links.yaml etc. I should be opinionated and concrete. Let me think about what’s truly minimal: - issue.md: problem statement + what done looks like (no separate acceptance.md, that’s over-engineering) - plan.md: current steps, agent can revise but the diff is visible in git - log.md: discoveries, failed attempts, key decisions - deps: just issue IDs, one per line - done marker: I argued against tombstone files earlier. But actually, for simple scripting/agent consumption, a status indicator is useful. Simplest: the agent reads a single line from a <code class="language-plaintext highlighter-rouge">status</code> file. Or… just use presence in a convention. Hmm. Let me think about what the scheduler needs: it needs to know which issues are open and which are done, and dependency order. Simplest: a <code class="language-plaintext highlighter-rouge">status</code> file with one word in it: open, blocked, done. That’s 3 states as I recommended. Actually even simpler: issues/ folder contains open issues. A done issue gets its folder moved or deleted. But that loses history… no, git keeps it. But the agent needs to check deps - if a dep issue folder doesn’t exist, it’s done. That’s actually elegant but fragile. OK, I think a simple <code class="language-plaintext highlighter-rouge">status</code> file with one word is the right balance. Let me also think about the plan revision protocol since I criticized the other response for not addressing it. And the DAG decomposition - when should a step become a sub-issue. Let me draft the structure and conventions.</p>

<p>Here’s what I’d propose — biased toward minimalism, with clear conventions over configuration.</p>

<h3 id="structure">Structure</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>issues/
  0001/
    issue.md
    plan.md
    log.md
    deps
    status
</code></pre></div></div>

<p>That’s it. Five files max per issue.</p>

<h3 id="file-semantics">File semantics</h3>

<p><strong><code class="language-plaintext highlighter-rouge">issue.md</code></strong> — The problem statement <em>and</em> what done looks like. One file. Keep them together because the definition of done is meaningless without the problem context. Format:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># Auth tokens not refreshed on 401</span>

When the API returns 401, the client crashes instead of refreshing.

<span class="gu">## Done when</span>
<span class="p">
-</span> Client intercepts 401, refreshes token, retries original request
<span class="p">-</span> Test covers the refresh-and-retry path
<span class="p">-</span> No other callers affected
</code></pre></div></div>

<p><strong><code class="language-plaintext highlighter-rouge">plan.md</code></strong> — The current execution plan. A checklist. The agent owns this file and can revise it. Revisions are visible in git diff, which is your audit trail. No separate versioning needed.</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">-</span> [x] Identify all call sites that hit authenticated endpoints
<span class="p">-</span> [x] Extract token refresh logic from login flow
<span class="p">-</span> [ ] Add retry interceptor to HTTP client
<span class="p">-</span> [ ] Write test for 401 → refresh → retry
<span class="p">-</span> [ ] Verify no regression in existing auth tests
</code></pre></div></div>

<p><strong><code class="language-plaintext highlighter-rouge">&lt;a href="/2026/03/28/further_on_the_ideal_of_logmd.html" class="branch-inline"&gt;log.md&lt;/a&gt;</code></strong> — Append-only. The agent writes here when something unexpected happens: a failed approach, a discovery, a decision that deviates from the plan. This is what makes resumability work — the next session reads this to understand <em>why</em> the plan looks the way it does.</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gu">## 2025-06-28 14:30</span>
Tried wrapping fetch() directly but the interceptor pattern 
doesn't work because we use a custom transport layer. 
Need to hook in at the transport level instead.
Revising plan step 3.
</code></pre></div></div>

<p><strong><code class="language-plaintext highlighter-rouge">deps</code></strong> — One issue ID per line. Empty file or absent means no dependencies.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0003
0007
</code></pre></div></div>

<p><strong><code class="language-plaintext highlighter-rouge">status</code></strong> — A single word on a single line. Three possible values:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>open
</code></pre></div></div>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">open</code></strong> — ready to work on (or being worked on; in single-threaded mode, this distinction doesn’t matter)</li>
  <li><strong><code class="language-plaintext highlighter-rouge">blocked</code></strong> — can’t proceed, and the reason isn’t just deps (agent got stuck, needs human input, external blocker)</li>
  <li><strong><code class="language-plaintext highlighter-rouge">done</code></strong> — finished</li>
</ul>

<p>That’s the entire lifecycle. The scheduler logic is trivial:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pick the first open issue whose deps are all done
</code></pre></div></div>

<h3 id="dag-decomposition">DAG decomposition</h3>

<p>When a step in <code class="language-plaintext highlighter-rouge">plan.md</code> gets big enough that it could be committed, tested, and reviewed independently — pull it out into its own issue. The parent issue’s plan then references the child:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">-</span> [x] Extract token refresh logic → issue 0008
<span class="p">-</span> [ ] Add retry interceptor → issue 0009
<span class="p">-</span> [ ] Integration test
</code></pre></div></div>

<p>The parent’s <code class="language-plaintext highlighter-rouge">deps</code> doesn’t list its children (that would be circular). Instead, the parent stays <code class="language-plaintext highlighter-rouge">open</code> and its plan tracks children by reference. The parent is done when its plan is fully checked off.</p>

<p>Children <em>can</em> depend on each other via their own <code class="language-plaintext highlighter-rouge">deps</code> files. That’s where your topological ordering comes from.</p>

<h3 id="plan-revision-protocol">Plan revision protocol</h3>

<p>This is the part that matters more than file layout. Three rules:</p>

<ol>
  <li>
    <p><strong>The agent can revise <code class="language-plaintext highlighter-rouge">plan.md</code> freely</strong>, but must write an entry in <code class="language-plaintext highlighter-rouge">log.md</code> explaining why before making the change. This makes plan drift visible and reasoned.</p>
  </li>
  <li>
    <p><strong>If the agent discovers the issue itself is wrong</strong> (wrong problem statement, wrong scope), it sets status to <code class="language-plaintext highlighter-rouge">blocked</code> and writes in the log what needs human attention. It does not silently redefine the issue.</p>
  </li>
  <li>
    <p><strong>If a plan step turns out to need more than ~2-3 commits</strong>, the agent decomposes it into a sub-issue rather than inflating the current plan. This keeps each issue’s plan short enough to hold in context.</p>
  </li>
</ol>

<h3 id="committing-strategy">Committing strategy</h3>

<p>Commit <code class="language-plaintext highlighter-rouge">issues/</code> to the repo, on whatever branch you’re working on. When an issue is done and merged, you can either:</p>

<ul>
  <li>Delete the folder in the merge commit (clean main, history in git)</li>
  <li>Leave it (simple, searchable, slightly noisy)</li>
</ul>

<p>I’d start with deleting on merge. If you miss having the history around, switch to leaving them.</p>

<h3 id="a-minimal-scheduler">A minimal scheduler</h3>

<p>This is almost trivially implementable:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># next-issue: prints the next issue to work on</span>

<span class="k">for </span><span class="nb">dir </span><span class="k">in </span>issues/<span class="k">*</span>/<span class="p">;</span> <span class="k">do
  </span><span class="nb">id</span><span class="o">=</span><span class="si">$(</span><span class="nb">basename</span> <span class="s2">"</span><span class="nv">$dir</span><span class="s2">"</span><span class="si">)</span>
  <span class="nv">status</span><span class="o">=</span><span class="si">$(</span><span class="nb">cat</span> <span class="s2">"</span><span class="nv">$dir</span><span class="s2">/status"</span> 2&gt;/dev/null <span class="o">||</span> <span class="nb">echo</span> <span class="s2">"missing"</span><span class="si">)</span>
  
  <span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$status</span><span class="s2">"</span> <span class="o">!=</span> <span class="s2">"open"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    continue
  fi</span>

  <span class="c"># Check all deps are done</span>
  <span class="nv">all_done</span><span class="o">=</span><span class="nb">true
  </span><span class="k">if</span> <span class="o">[</span> <span class="nt">-f</span> <span class="s2">"</span><span class="nv">$dir</span><span class="s2">/deps"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    while </span><span class="nv">IFS</span><span class="o">=</span> <span class="nb">read</span> <span class="nt">-r</span> dep<span class="p">;</span> <span class="k">do
      </span><span class="nv">dep</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$dep</span><span class="s2">"</span> | <span class="nb">tr</span> <span class="nt">-d</span> <span class="s1">'[:space:]'</span><span class="si">)</span>
      <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$dep</span><span class="s2">"</span> <span class="o">]</span> <span class="o">&amp;&amp;</span> <span class="k">continue
      </span><span class="nv">dep_status</span><span class="o">=</span><span class="si">$(</span><span class="nb">cat</span> <span class="s2">"issues/</span><span class="nv">$dep</span><span class="s2">/status"</span> 2&gt;/dev/null <span class="o">||</span> <span class="nb">echo</span> <span class="s2">"missing"</span><span class="si">)</span>
      <span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$dep_status</span><span class="s2">"</span> <span class="o">!=</span> <span class="s2">"done"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        </span><span class="nv">all_done</span><span class="o">=</span><span class="nb">false
        break
      </span><span class="k">fi
    done</span> &lt; <span class="s2">"</span><span class="nv">$dir</span><span class="s2">/deps"</span>
  <span class="k">fi

  if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$all_done</span><span class="s2">"</span> <span class="o">=</span> <span class="nb">true</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$id</span><span class="s2">"</span>
    <span class="nb">exit </span>0
  <span class="k">fi
done

</span><span class="nb">echo</span> <span class="s2">"No runnable issues found."</span> <span class="o">&gt;</span>&amp;2
<span class="nb">exit </span>1
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Usage</span>
<span class="nb">chmod</span> +x next-issue
./next-issue
<span class="c"># Output: 0001</span>
</code></pre></div></div>

<h3 id="what-i-deliberately-left-out">What I deliberately left out</h3>

<ul>
  <li><strong>No YAML/JSON metadata.</strong> Plain text files are easier for both humans and agents to read and write. You don’t need structured data until you’re building dashboards.</li>
  <li><strong>No separate acceptance criteria file.</strong> It’s three lines in <code class="language-plaintext highlighter-rouge">issue.md</code>. Another file is another thing to keep in sync.</li>
  <li><strong>No artifacts folder.</strong> If the agent produces research or patches, they’re either commits or they go in the log.</li>
  <li><strong>No priority field.</strong> In single-threaded topological execution, priority is determined by dependency order and the sequence you create issues in. If you need explicit priority later, add a single number to the first line of <code class="language-plaintext highlighter-rouge">status</code> (e.g., <code class="language-plaintext highlighter-rouge">open 1</code>).</li>
</ul>

<p>Start here. Add structure only when you hit a specific failure mode that forces it.</p>

<p>📝: You asked me to propose a concrete minimal structure for repo-based agentic coding state management, I answered with a 5-file-per-issue structure (issue.md, plan.md, log.md, deps, status), three lifecycle states (open/blocked/done), a plan revision protocol (log before revise, block if issue is wrong, decompose if step is too big), DAG decomposition convention, and a trivial bash scheduler — deliberately excluding YAML metadata, separate acceptance files, artifacts folders, and complex lifecycle states.</p>

<h2 id="question-3">Question</h2>

<p>If I move towards that (e.g. repo based issue management), then I’ll need some base level tool to help manage them.</p>

<p>I already have a tool called parley.nvim. I gues I’ll just extend that, a subsystem there would be an issue management system based on a folder of .md and .yaml files. Given how much you know about <a href="https://github.com/xianxu/parley.nvim">parley</a>, read the specs folder for parley, propose an issue management subsystem, following the same “Keep It Simple” philosophy: issue management system is just a folder of markdown files.</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user wants me to read the specs folder of their parley.nvim plugin to understand its architecture and philosophy, then propose an issue management subsystem that fits naturally within it. Let me fetch the repo and specs first.Let me read the specs index and a few key spec files to understand the architecture and philosophy.🧠: I can see the parley.nvim structure now. Key patterns I observe: 1) Everything is markdown/plain files (chats are markdown with emoji prefixes like ## Question</p>

<p>and 🤖:), 2) There’s already a notes system with year/month/week organization, 3) Specs are organized in a clean hierarchy, 4) The philosophy is “plain markdown files you can edit anytime” and “keyboard-first”, 5) There’s already a chat_dir and notes_dir concept, 6) There’s a picker system (telescope-based) for finding/navigating, 7) The plugin already has concepts of lifecycle, templates, and finder. The user wants me to propose an issue management subsystem that fits naturally within parley’s existing patterns. I should propose something that mirrors how chats and notes already work — markdown files, simple directory structure, telescope pickers, minimal metadata. The key insight is that parley already treats chats as “a folder of markdown files with a header” — issues should follow the exact same pattern. I should leverage existing parley infrastructure: pickers, file references (@@), the template system, and the notes-like directory structure. Let me propose something concrete.</p>

<p>Based on what I can see from the <a href="https://github.com/xianxu/parley.nvim/blob/main/specs/index.md">specs index</a> and the README, here’s what I’d propose. The key design principle: <strong>issues should work exactly like chats and notes already work in parley</strong> — markdown files with frontmatter, navigable via pickers, editable as normal buffers.</p>

<h3 id="design-issues-as-markdown-with-yaml-frontmatter">Design: Issues as Markdown with YAML Frontmatter</h3>

<p>Parley already has the pattern: chats are markdown files with a header block. Notes have templates and a directory structure. Issues should be the same thing — no new abstractions, just a new <code class="language-plaintext highlighter-rouge">issues_dir</code>.</p>

<h3 id="directory-layout">Directory layout</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{issues_dir}/
  0001-auth-token-refresh.md
  0002-extract-parser.md
  0003-add-retry-tests.md
</code></pre></div></div>

<p>That’s it. <strong>One file per issue.</strong> No subdirectories per issue. The plan, log, status, and deps all live inside the single markdown file as sections. This mirrors how a parley chat is one file with all state in it — the header plus the transcript.</p>

<h3 id="issue-file-format">Issue file format</h3>

<p>Following parley’s convention of frontmatter + structured content:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">status</span><span class="pi">:</span> <span class="s">open</span>
<span class="na">deps</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">0002</span><span class="pi">]</span>
<span class="na">created</span><span class="pi">:</span> <span class="s">2026-03-28</span>
<span class="na">updated</span><span class="pi">:</span> <span class="s">2026-03-28</span>
<span class="nn">---</span>

<span class="gh"># Auth tokens not refreshed on 401</span>

When the API returns 401, the client crashes instead of refreshing.

<span class="gu">## Done when</span>
<span class="p">
-</span> Client intercepts 401, refreshes token, retries original request
<span class="p">-</span> Test covers the refresh-and-retry path

<span class="gu">## Plan</span>
<span class="p">
-</span> [x] Identify all call sites that hit authenticated endpoints
<span class="p">-</span> [x] Extract token refresh logic from login flow → issue 0002
<span class="p">-</span> [ ] Add retry interceptor to HTTP client
<span class="p">-</span> [ ] Write test for 401 → refresh → retry

<span class="gu">## Log</span>

<span class="gu">### 2026-03-28 14:30</span>
Tried wrapping fetch() directly but interceptor pattern doesn't
work with custom transport layer. Revising plan step 3.

<span class="gu">### 2026-03-28 15:10</span>
Transport-level hook works. Proceeding with retry logic.
</code></pre></div></div>

<h3 id="why-single-file-not-a-folder-per-issue">Why single file, not a folder-per-issue</h3>

<p>Three reasons that align with parley’s philosophy:</p>

<ol>
  <li>
    <p><strong>Parley chats are single files.</strong> A chat’s entire state — header, system prompt reference, all turns — is one <code class="language-plaintext highlighter-rouge">.md</code>. Issues should follow the same pattern. You already have muscle memory for navigating a structured markdown file in nvim.</p>
  </li>
  <li>
    <p><strong>Telescope picker works on files, not directories.</strong> The existing <a href="https://github.com/xianxu/parley.nvim/blob/main/specs/ui/pickers.md">picker infrastructure</a> can list issues the same way it lists chats or notes — <code class="language-plaintext highlighter-rouge">rg</code> or <code class="language-plaintext highlighter-rouge">fd</code> over a directory, show filename + maybe grep the status from frontmatter. No new picker abstractions needed.</p>
  </li>
  <li>
    <p><strong><code class="language-plaintext highlighter-rouge">@@</code> file references already work.</strong> In a parley chat, you can reference <code class="language-plaintext highlighter-rouge">@@issues/0001-auth-token-refresh.md</code> to pull an issue into context when discussing it with an LLM. One file = one reference. If issues were folders, you’d need to reference multiple files or invent a new aggregation syntax.</p>
  </li>
</ol>

<h3 id="minimal-subsystem-spec">Minimal subsystem spec</h3>

<p>Following the pattern of existing parley specs, here’s what the implementation would need:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>specs/issues/structure.md    -- format, frontmatter schema, sections
specs/issues/lifecycle.md    -- status transitions, scheduler logic
specs/issues/templates.md    -- template for new issues
</code></pre></div></div>

<h3 id="config-additions">Config additions</h3>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span><span class="p">(</span><span class="s2">"parley"</span><span class="p">).</span><span class="n">setup</span><span class="p">({</span>
  <span class="c1">-- ...existing config...</span>
  <span class="n">issues_dir</span> <span class="o">=</span> <span class="n">vim</span><span class="p">.</span><span class="n">fn</span><span class="p">.</span><span class="n">getcwd</span><span class="p">()</span> <span class="o">..</span> <span class="s2">"/issues"</span><span class="p">,</span>  <span class="c1">-- repo-local by default</span>
  <span class="n">issue_template</span> <span class="o">=</span> <span class="s2">"default"</span><span class="p">,</span>
<span class="p">})</span>
</code></pre></div></div>

<p>Note: unlike <code class="language-plaintext highlighter-rouge">chat_dir</code> and <code class="language-plaintext highlighter-rouge">notes_dir</code> which are global user directories, <code class="language-plaintext highlighter-rouge">issues_dir</code> defaults to <strong>cwd-relative</strong> because issues belong to the repo, not the user.</p>

<h3 id="commands-and-keybindings">Commands and keybindings</h3>

<p>Following parley’s <code class="language-plaintext highlighter-rouge">&lt;C-g&gt;</code> prefix convention:</p>

<table>
  <thead>
    <tr>
      <th>Command</th>
      <th>Binding</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">:ParleyIssueNew</code></td>
      <td><code class="language-plaintext highlighter-rouge">&lt;C-i&gt;c</code></td>
      <td>Create issue from template, auto-increment ID</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">:ParleyIssueFinder</code></td>
      <td><code class="language-plaintext highlighter-rouge">&lt;C-i&gt;f</code></td>
      <td>Telescope picker over issues, preview, open</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">:ParleyIssueNext</code></td>
      <td><code class="language-plaintext highlighter-rouge">&lt;C-i&gt;x</code></td>
      <td>Open the next runnable issue (topo sort)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">:ParleyIssueStatus</code></td>
      <td><code class="language-plaintext highlighter-rouge">&lt;C-i&gt;s</code></td>
      <td>Cycle status in frontmatter (open → blocked → done)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">:ParleyIssueDecompose</code></td>
      <td><code class="language-plaintext highlighter-rouge">&lt;C-i&gt;d</code></td>
      <td>Create child issue, insert dep link in current plan</td>
    </tr>
  </tbody>
</table>

<p>The <code class="language-plaintext highlighter-rouge">IssueFinder</code> should show status in the picker line, e.g.:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[open]    0001-auth-token-refresh
[open]    0003-add-retry-tests
[blocked] 0005-migrate-callers
[done]    0002-extract-parser
</code></pre></div></div>

<p>Sort order should be by default topological depending on dependency, otherwise old to new.</p>

<p>With filtering — default to showing <code class="language-plaintext highlighter-rouge">open</code> + <code class="language-plaintext highlighter-rouge">blocked</code>, toggle to show all.</p>

<h3 id="scheduler-logic-the-issuenext-command">Scheduler logic (the <code class="language-plaintext highlighter-rouge">IssueNext</code> command)</h3>

<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Pseudocode for next runnable issue</span>
<span class="k">function</span> <span class="nc">M</span><span class="p">.</span><span class="nf">next_issue</span><span class="p">(</span><span class="n">issues_dir</span><span class="p">)</span>
  <span class="kd">local</span> <span class="n">issues</span> <span class="o">=</span> <span class="n">parse_all_issues</span><span class="p">(</span><span class="n">issues_dir</span><span class="p">)</span>
  <span class="kd">local</span> <span class="n">done_ids</span> <span class="o">=</span> <span class="n">set</span> <span class="n">of</span> <span class="n">issues</span> <span class="n">where</span> <span class="n">status</span> <span class="o">==</span> <span class="s2">"done"</span>

  <span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">issue</span> <span class="k">in</span> <span class="nb">ipairs</span><span class="p">(</span><span class="n">sorted_by_id</span><span class="p">(</span><span class="n">issues</span><span class="p">))</span> <span class="k">do</span>
    <span class="k">if</span> <span class="n">issue</span><span class="p">.</span><span class="n">status</span> <span class="o">==</span> <span class="s2">"open"</span> <span class="k">then</span>
      <span class="kd">local</span> <span class="n">all_deps_done</span> <span class="o">=</span> <span class="kc">true</span>
      <span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">dep</span> <span class="k">in</span> <span class="nb">ipairs</span><span class="p">(</span><span class="n">issue</span><span class="p">.</span><span class="n">deps</span><span class="p">)</span> <span class="k">do</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">done_ids</span><span class="p">[</span><span class="n">dep</span><span class="p">]</span> <span class="k">then</span>
          <span class="n">all_deps_done</span> <span class="o">=</span> <span class="kc">false</span>
          <span class="k">break</span>
        <span class="k">end</span>
      <span class="k">end</span>
      <span class="k">if</span> <span class="n">all_deps_done</span> <span class="k">then</span>
        <span class="k">return</span> <span class="n">issue</span>
      <span class="k">end</span>
    <span class="k">end</span>
  <span class="k">end</span>
  <span class="k">return</span> <span class="kc">nil</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Parsing frontmatter is trivial — read the file, extract the YAML block between <code class="language-plaintext highlighter-rouge">---</code> markers, pull <code class="language-plaintext highlighter-rouge">status</code> and <code class="language-plaintext highlighter-rouge">deps</code>. Parley already parses chat headers, so this pattern exists in the codebase.</p>

<h3 id="the-issuedecompose-command">The <code class="language-plaintext highlighter-rouge">IssueDecompose</code> command</h3>

<p>This is the one non-obvious command. When your cursor is on a plan line like:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">-</span> [ ] Extract token refresh logic from login flow
</code></pre></div></div>

<p>Running <code class="language-plaintext highlighter-rouge">:ParleyIssueDecompose</code> should:</p>

<ol>
  <li>Create a new issue file <code class="language-plaintext highlighter-rouge">0002-extract-token-refresh.md</code> (prompts for slug)</li>
  <li>Copy the line as the issue title</li>
  <li>Add the new ID to the parent’s <code class="language-plaintext highlighter-rouge">deps</code> in frontmatter</li>
  <li>Replace the plan line with: <code class="language-plaintext highlighter-rouge">- [ ] Extract token refresh logic → 0002</code></li>
  <li>Open the new issue for editing</li>
</ol>

<p>This makes DAG decomposition a single keystroke from within the plan.</p>

<h3 id="template">Template</h3>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">status</span><span class="pi">:</span> <span class="s">open</span>
<span class="na">deps</span><span class="pi">:</span> <span class="pi">[]</span>
<span class="na">created</span><span class="pi">:</span> 
<span class="na">updated</span><span class="pi">:</span> 
<span class="nn">---</span>

<span class="gh"># </span><span class="sb">



</span><span class="gu">## Done when</span>
<span class="p">
-</span>

<span class="gu">## Plan</span>
<span class="p">
-</span>

<span class="gu">## Log</span>
</code></pre></div></div>

<p>Follows the same template variable pattern as <a href="https://github.com/xianxu/parley.nvim/blob/main/specs/notes/templates.md">note templates</a>.</p>

<h3 id="what-i-deliberately-kept-aligned-with-parleys-existing-patterns">What I deliberately kept aligned with parley’s existing patterns</h3>

<ul>
  <li><strong>Frontmatter for metadata</strong> — chats already use this</li>
  <li><strong>Emoji-free</strong> — unlike chat turns (💬/🤖), issues don’t need visual turn markers; markdown headings suffice</li>
  <li><strong>Picker-first navigation</strong> — same as chat finder and note finder</li>
  <li><strong><code class="language-plaintext highlighter-rouge">@@</code> compatible</strong> — issues are just <code class="language-plaintext highlighter-rouge">.md</code> files, so existing file reference syntax works in chats</li>
  <li><strong>One file = one thing</strong> — same as one chat file = one conversation</li>
</ul>

<h3 id="what-this-does-not-include-deliberately">What this does NOT include (deliberately)</h3>

<ul>
  <li>No priority field (ID order + deps is enough)</li>
  <li>No assignee/owner (single-player workflow)</li>
  <li>No tags/labels (use grep)</li>
  <li>No separate <code class="language-plaintext highlighter-rouge">status</code> file, <code class="language-plaintext highlighter-rouge">plan.md</code>, <code class="language-plaintext highlighter-rouge">log.md</code> per issue (one file is simpler)</li>
  <li>No JSON/YAML status files (frontmatter is enough)</li>
</ul>

<p>The previous proposal I gave you with 5 files per issue was designed for a generic CLI tool. For parley specifically, single-file-per-issue is better because it matches the existing “one markdown file is one unit of work” pattern that pervades the plugin.</p>

<p>📝: You asked about proposing an issue management subsystem for parley.nvim that fits its philosophy, I answered with a single-file-per-issue design using YAML frontmatter + markdown sections (status/plan/log in one file), telescope picker integration, 5 commands (New/Finder/Next/Status/Decompose), repo-local issues_dir, leveraging existing parley patterns like chat files, @@ references, templates, and pickers.</p>

<h2 id="question-4">Question</h2>]]></content><author><name></name></author><category term="tech" /><category term="nil" /><category term="ai" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">AI Workflow Evolved, One Year Jouney</title><link href="https://xianxu.dev/2026/03/28/ai-coding-workflow2.html" rel="alternate" type="text/html" title="AI Workflow Evolved, One Year Jouney" /><published>2026-03-28T00:00:00+00:00</published><updated>2026-03-28T00:00:00+00:00</updated><id>https://xianxu.dev/2026/03/28/ai-coding-workflow2</id><content type="html" xml:base="https://xianxu.dev/2026/03/28/ai-coding-workflow2.html"><![CDATA[<p>It’s been about a year since I’ve been vibe coding on <a href="https://github.com/xianxu/parley.nvim">Parley.nvim</a>. It has also been three weeks since my last post on <a href="/2026/03/08/ai-coding-workflow.html">AI coding workflow</a>. So many things have changed, both my understanding of such workflows and the world around it. For example, Nvidia open sourced <a href="https://github.com/NVIDIA/OpenShell">OpenShell</a>. Let’s take a look at my evolution, I think this might be useful for people new to the space.</p>

<ol>
  <li>
    <p>Archaic time, a year ago, where I just used coding agent like <code class="language-plaintext highlighter-rouge">claude</code> directly, sending prompt directly inside its cli. It’s a fun but flaky workflow. You can read it <a href="/2025/04/29/reflection-on-ai-coding.html">here</a> and <a href="/2025/05/14/ai-coding-take-2.html">here</a>.</p>
  </li>
  <li>
    <p>Using workflow management like <a href="https://www.conductor.build/">Conductor</a>, or <a href="https://vibekanban.com/">Vibe Kanban</a>, that’s when I was when I wrote <a href="/2026/03/08/ai-coding-workflow.html">this</a>. OpenAI also open sourced <a href="https://github.com/openai/symphony">Symphony</a> which pushed boundary of such “workflow management” software.</p>
  </li>
  <li>
    <p>Made my own little workflow, both because of my realization that there isn’t that much those “workflows” provide, and my desire to learn what’s under the hood. This resulted in a simple <code class="language-plaintext highlighter-rouge">Makefile</code> based flow, where I use <code class="language-plaintext highlighter-rouge">make issue 42</code> to download github issue to a well-known-to-AGENTS.md location, and more diligently shaping the workflow in the AGENTS.md. I had two modes, either working on a branch in <code class="language-plaintext highlighter-rouge">worktree</code>, or working directly on <code class="language-plaintext highlighter-rouge">main</code>. After all, I’m a single guy tweaking Parley. It’s just a <a href="https://github.com/xianxu/parley.nvim/blob/01576b1cfed48477ad3f15e0a3ce418238096b82/Makefile.workflow">Makefile</a>.</p>
  </li>
  <li>
    <p>I kept tinkering this workflow, adding some archive function, to back up <code class="language-plaintext highlighter-rouge">issue.md</code> and <code class="language-plaintext highlighter-rouge">todo.md</code> after each feature was done to <code class="language-plaintext highlighter-rouge">history/</code>. This made it easier to continue tweaking key files, such as <code class="language-plaintext highlighter-rouge">AGENTS.md</code>, <code class="language-plaintext highlighter-rouge">lessons.md</code> and <code class="language-plaintext highlighter-rouge">specs</code>. Between things, accumulated 400+ commits in <a href="https://github.com/xianxu/parley.nvim">Parley.nvim</a>.</p>
  </li>
  <li>
    <p><a href="https://github.com/NVIDIA/OpenShell">OpenShell</a> was open sourced, and you can take a look at their workflow, which was expressed as a set of <a href="https://github.com/NVIDIA/OpenShell/tree/main/.agents/skills">agent skills</a>. Roughly, you can describe what I did in my <code class="language-plaintext highlighter-rouge">Makefile</code> using plain English and get agent to do it. I’m not very sure what’s the point to make such very deterministic workflow agentic; but either way, my <code class="language-plaintext highlighter-rouge">Makefile</code> was totally written by agents, so in a sense, my <code class="language-plaintext highlighter-rouge">Makefile</code> based workflow is just a compiled version of agentic <code class="language-plaintext highlighter-rouge">skills</code>.</p>
  </li>
  <li>
    <p>As I worked with my workflow in <code class="language-plaintext highlighter-rouge">3.</code>, one problem I notice is that the issues, the code, the plan to implement issues in the codebase, and incremental state of that implementation scattered in several different places, and have different life cycles. In <a href="https://xianxu.github.io/2026/03/28/agentic_state_management.html">Conversation: agentic state management</a> I wondered why not just use repo to manage issue tracking. This resulted in me just putting issues in a structured way in <code class="language-plaintext highlighter-rouge">issues/</code> in the repo itself, and the creation of simple <code class="language-plaintext highlighter-rouge">Parley</code> function to <a href="https://github.com/xianxu/parley.nvim/blob/main/specs/issues/issue-management.md">manage those issues</a>. This made my “single developer” workflow supercharged. This will also break down when there are more people operated in a repo, need to figure out how to evolve in that type of world.</p>
  </li>
  <li>
    <p>Then I integrated with <a href="https://github.com/NVIDIA/OpenShell">OpenShell</a>, basically replicating what I did in <code class="language-plaintext highlighter-rouge">6.</code> within an <code class="language-plaintext highlighter-rouge">OpenShell</code> Docker container for isolation. I just “chatted” about <a href="https://github.com/xianxu/parley.nvim/blob/main/design/2026-03-28.18-37-34.270.md">what I wanted</a> in <code class="language-plaintext highlighter-rouge">Parley</code>, committed it to repo, and asked Claude to <a href="https://github.com/xianxu/parley.nvim/blob/main/history/000010-introducing-openshell.md">make a plan</a> based on that chat. Some minor back and forth later, now I have a hermetic-enough sandboxed agentic environment. Agentic coding made setting this up really a bliss, mostly I did was: “go create an OpenShell environment based on my Mac setup in <code class="language-plaintext highlighter-rouge">~/.config/nvim</code>, <code class="language-plaintext highlighter-rouge">~/.zshrc</code> and <code class="language-plaintext highlighter-rouge">~/settings/brewfile</code>”. Well, almost, there’s a big gotcha during this process, I’ll talk about that in a separate post.</p>
  </li>
  <li>
    <p>Agents will eventually forget about instructions, based on intuition described in <code class="language-plaintext highlighter-rouge">5.</code>, I decided to start <code class="language-plaintext highlighter-rouge">lifting</code> what’s previously described as principles in <code class="language-plaintext highlighter-rouge">AGENTS.md</code>, into actual code. This resulted in <a href="https://github.com/xianxu/parley.nvim/blob/1fb21e41b1fc22ff986e2f8a28627d28084136f8/scripts/pre-merge-checks.sh#L122">pre-merge-checks.sh</a>, which essentially are “subagents” invoked deterministically. Further, I put in Claude hooks, that are automatically triggered when size of the diff cross some threshold. Really, just like how human would evolve code. <code class="language-plaintext highlighter-rouge">Claude</code> happily piled on a lot of <code class="language-plaintext highlighter-rouge">sh</code> script, I have the distinctive feeling at some point we’d need to rewrite this portion properly.</p>
  </li>
</ol>

<p>All of those tinkering is to get a real sense of the shifting ground of software development. It is very evident that agentic coding allow you to customize your environment, just like I mentioned in <a href="/2026/03/19/the-rise-of-personal-software.html">Peronalized Software</a>. Your development environment is also ripe of opportunities for customization to your hearts content, just like what I did during the development of <code class="language-plaintext highlighter-rouge">Parley</code>.</p>

<p>What’s next? I firmly believe that the future is <strong>human setting up the environment, allowing full automation of coding</strong>. <code class="language-plaintext highlighter-rouge">Parley</code> is not able to achieve this, as I’m not yet familiar with UI test automation, nor with Neovim’s UI testing frameworks. I don’t believe human can be in the loop to review most of what agents write, it will be such a big bottleneck, and no fun job. Then the million dollar question, is how would we be comfortable agent is doing the right thing, not exposing your private key, not doing <span>$O(n^2)$</span> algorithm when <span>$O(n)$</span> algorithm is available?</p>

<p>I don’t know, but intend to tinker and find out.</p>]]></content><author><name></name></author><category term="tech" /><category term="ai" /><summary type="html"><![CDATA[It’s been about a year since I’ve been vibe coding on Parley.nvim. It has also been three weeks since my last post on AI coding workflow. So many things have changed, both my understanding of such workflows and the world around it. For example, Nvidia open sourced OpenShell. Let’s take a look at my evolution, I think this might be useful for people new to the space.]]></summary></entry><entry><title type="html">Conversation: further on the ideal of log.md</title><link href="https://xianxu.dev/2026/03/28/further_on_the_ideal_of_logmd.html" rel="alternate" type="text/html" title="Conversation: further on the ideal of log.md" /><published>2026-03-28T00:00:00+00:00</published><updated>2026-03-28T00:00:00+00:00</updated><id>https://xianxu.dev/2026/03/28/further_on_the_ideal_of_logmd</id><content type="html" xml:base="https://xianxu.dev/2026/03/28/further_on_the_ideal_of_logmd.html"><![CDATA[<style>
h1 { color: #1a365d; border-bottom: 3px solid #4299e1; padding-bottom: 0.3rem; }
h2 { color: #2b6cb0; border-bottom: 2px solid #bee3f8; padding-bottom: 0.3rem; }
h3 { color: #3182ce; border-left: 4px solid #90cdf4; padding-left: 0.8rem; }
.branch-nav { margin: 1rem 0; padding: 0.6rem 1rem; border-radius: 8px; font-size: 0.95em; }
.branch-nav a { text-decoration: none; font-weight: 500; }
.branch-nav a:hover { text-decoration: underline; }
.branch-nav.parent-link { background: linear-gradient(135deg, #fefcbf 0%, #faf089 100%); border-left: 4px solid #d69e2e; color: #744210; }
.branch-nav.parent-link a { color: #975a16; }
.branch-nav.child-link { background: linear-gradient(135deg, #c6f6d5 0%, #9ae6b4 100%); border-left: 4px solid #38a169; color: #22543d; }
.branch-nav.child-link a { color: #276749; }
.branch-inline { color: #2b6cb0; text-decoration: none; border-bottom: 1px dashed #90cdf4; font-weight: 500; }
.branch-inline:hover { color: #1a365d; border-bottom-style: solid; }
</style>

<p>This transcript is generated by <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>.</p>

<div class="branch-nav parent-link"><a href="/2026/03/28/agentic_state_management.html">← Agentic state management</a></div>

<h2 id="question">Question</h2>

<p>The log.md is trick as it depends on how base model behave. doesn’t seems like something easy for me to plugin. right?</p>

<p>🤖:[Claude-Code-Opus]</p>

<p>🧠: The user is pointing out that log.md depends on the agent’s behavior - the agent needs to actively write to it during execution. This isn’t something the user can easily enforce or plug in from the outside. It’s a valid concern. The log.md requires the agent to have discipline about appending to it, which depends on the base model’s system prompt compliance, tool use patterns, etc. The user can’t just set up a file structure and expect the agent to maintain a log. This is different from issue.md (written by human), plan.md (can be written at start), deps (static), status (simple state change). Log.md requires continuous, disciplined writes during execution. Let me think about practical alternatives - maybe log.md should just be git history, or maybe the orchestration layer handles it, or maybe it’s just not worth the complexity.</p>

<p>You’re right, and this is an important practical point.</p>

<h3 id="the-problem-with-logmd">The problem with log.md</h3>

<p><code class="language-plaintext highlighter-rouge">log.md</code> requires the agent to <strong>voluntarily and consistently</strong> write structured notes mid-execution. That means:</p>

<ul>
  <li>You need the system prompt to instruct it</li>
  <li>You need the model to actually comply throughout a long session</li>
  <li>Different models behave differently — some will dutifully append, others will forget after a few steps</li>
  <li>If the agent is using tool calls (file edits, shell), adding a “write to log” step between every meaningful action is fragile</li>
</ul>

<p>It’s the <strong>least controllable</strong> file in the structure. Everything else is either human-authored or a simple state marker.</p>

<h3 id="what-actually-works-instead">What actually works instead</h3>

<p>The durable trace you want from <code class="language-plaintext highlighter-rouge">log.md</code> already exists in two places:</p>

<ol>
  <li><strong>Git history</strong> — if the agent commits incrementally (which it should), the commit messages <em>are</em> the log</li>
  <li><strong>Conversation transcript</strong> — most agent frameworks (Claude Code, Cursor, Aider) already persist the session</li>
</ol>

<p>So rather than asking the agent to maintain a separate log file, a more robust approach:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>issues/
  00001/
    issue.md
    plan.md       ← agent updates this (checked items, revised steps)
    deps
    status        ← open | blocked | done
</code></pre></div></div>

<p>And the “log” is just:</p>

<ul>
  <li><strong>plan.md diff over git history</strong> — you can see how the plan evolved</li>
  <li><strong>commits on the working branch</strong> — what actually happened</li>
</ul>

<h3 id="planmd-as-the-only-mutable-file">Plan.md as the only mutable file</h3>

<p>This simplifies the agent’s job to one thing: <strong>keep plan.md honest</strong>.</p>

<ul>
  <li>Check off steps as done</li>
  <li>Cross out steps that turned out wrong</li>
  <li>Add new steps discovered during work</li>
  <li>Commit plan.md changes alongside code changes</li>
</ul>

<p>That’s a much more natural fit for how models actually behave — they’re already editing files and committing. You’re just asking them to edit one more file as part of their normal flow.</p>

<h3 id="when-you-truly-need-a-log">When you truly need a log</h3>

<p>If you hit a case where you need to understand <em>why</em> something was tried and abandoned, you can reconstruct it from:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git log <span class="nt">--follow</span> <span class="nt">-p</span> <span class="nt">--</span> issues/00001/plan.md
</code></pre></div></div>

<p>That gives you the full evolution of thinking, tied to real commits, without requiring the agent to maintain a separate file.</p>

<h3 id="the-honest-answer">The honest answer</h3>

<p><code class="language-plaintext highlighter-rouge">log.md</code> is a good idea in theory but a compliance burden in practice. Drop it. Let git be your log. Put all the agent’s mutable state into <code class="language-plaintext highlighter-rouge">plan.md</code> and keep the contract simple: <strong>one file to read, one file to update, commit often</strong>.</p>

<p>📝: You asked about log.md being impractical given base model behavior, I answered that you’re right — it’s the least controllable file, drop it, let git history and plan.md diffs serve as the log instead, keep the agent’s contract to a single mutable file (plan.md) alongside commits.</p>

<h2 id="question-1">Question</h2>]]></content><author><name></name></author><category term="tech" /><category term="nil" /><category term="ai" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Personalized Software</title><link href="https://xianxu.dev/2026/03/19/the-rise-of-personal-software.html" rel="alternate" type="text/html" title="Personalized Software" /><published>2026-03-19T00:00:00+00:00</published><updated>2026-03-19T00:00:00+00:00</updated><id>https://xianxu.dev/2026/03/19/the-rise-of-personal-software</id><content type="html" xml:base="https://xianxu.dev/2026/03/19/the-rise-of-personal-software.html"><![CDATA[<p>The recent surge in the quality of agentic coding opens up the endless possibility that more people will be coding and creating software to their hearts content. For example, I’m a command line and vim user. For years, I learn about new plugins, and figure out how do they improve my workflow. More recently though, I started creating an nvim plugin directly to suit my workflow, resulting in <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a>. At the beginning, my tinkering is limited, and a bit timid, as I didn’t know the capability of agentic coding, and I didn’t know much of lua, or nvim. But after 400 commits all human guided but AI driven, I got better at agentic workflow, and the capability of coding agents also improved by leap and bounds. Now I find myself almost freely guiding the plugin to whatever direction I want.</p>

<p>The latest example is that when I wanted mouse to work with type-a-head search interface and the all powerful telescope doesn’t support mouse well, I end up just getting the agent to write from scratch <a href="https://github.com/xianxu/parley.nvim/blob/main/lua/parley/float_picker.lua">a floating pop-up</a> that supports mouse, and the search behavior exactly I like!</p>

<p>Software development is rapidly merging with product management. The winner’s going to be whoever can both visualize the product, understand key internal states and subtle user interactions, and able to guide coding agents just enough to implement it. I suspect this is going to be the direction of software, a lot of personalize software, created just for a small number of users. That is, if human still use computer, manually, i.e. the knowledge workers still exist. What are the implications in such a world?</p>

<p>In that a world, it seems software systems need to be more programmable. We need not only great user interfaces geared towards end users, but also great programmability for coding agents to create customized workflows and user interfaces. The venerable vim/emacs are good examples of this. Extending this line of thoughts, it means for SaaS products, they need programming environments such that users can configure their own workflows. Few of them today do this today, wonder if this is going to be a trend in the future.</p>

<p>It seems we still need strong foundational software, like neovim, Emacs, or Google Sheets, to provide that programmable canvas. It’s unclear there are space for commercial software in such space. It just feels such software fitting the open source model better. It seems at minimal, SaaS products need to head into the direction of programmability, both in API form, but also in programmable user interfaces.</p>]]></content><author><name></name></author><category term="ai" /><category term="tech" /><summary type="html"><![CDATA[The recent surge in the quality of agentic coding opens up the endless possibility that more people will be coding and creating software to their hearts content. For example, I’m a command line and vim user. For years, I learn about new plugins, and figure out how do they improve my workflow. More recently though, I started creating an nvim plugin directly to suit my workflow, resulting in parley.nvim. At the beginning, my tinkering is limited, and a bit timid, as I didn’t know the capability of agentic coding, and I didn’t know much of lua, or nvim. But after 400 commits all human guided but AI driven, I got better at agentic workflow, and the capability of coding agents also improved by leap and bounds. Now I find myself almost freely guiding the plugin to whatever direction I want.]]></summary></entry><entry><title type="html">My AI Coding Workflow</title><link href="https://xianxu.dev/2026/03/08/ai-coding-workflow.html" rel="alternate" type="text/html" title="My AI Coding Workflow" /><published>2026-03-08T00:00:00+00:00</published><updated>2026-03-08T00:00:00+00:00</updated><id>https://xianxu.dev/2026/03/08/ai-coding-workflow</id><content type="html" xml:base="https://xianxu.dev/2026/03/08/ai-coding-workflow.html"><![CDATA[<p><em>I used to write code; now I write poems…</em> - so to speak</p>

<p>Previously, I introduced my thoughts that <a href="/2026/02/16/ai-as-new-abstraction.html">AI is the new abstraction layer</a>, and marveled at how far <a href="/2026/03/01/dawn-of-ai-coding.html">AI coding has matured</a>. So I went in a big way into tinkering in <a href="https://github.com/xianxu/parley.nvim">parley.nvim</a> and would love to share the experience and the workflow I follow. If you remember my post <a href="/2025/05/14/ai-coding-take-2.html#examples-of-the-bad">10 months ago</a>, I got stuck in some semi-complex features. This round, it was easy to power through them, I haven’t found anything that I wanted to create that I couldn’t get the agent to do it for me!</p>

<h2 id="the-workflow-orchestration-space">The workflow orchestration space</h2>

<p>I was initially impressed with tool like <a href="https://www.conductor.build/">Conductor</a>. In hindsight, it was a matter of the typical AI shock and awe. It turns out there are plenty of tools in this workflow orchestration space, such as <a href="https://vibekanban.com/">Vibe-Kanban</a>, and OpenAI just released <a href="https://github.com/openai/symphony">Symphony</a>.</p>

<p>When we take a closer look at those tools, you realize the core capability of such workflow came from the model and the system prompts. So I went ahead and built a simple workflow that suits my needs, mostly as a way to learn those intricacies. Part of that is to cut the middle-man’s opaque magic. Well, it turns out there aren’t too much of such magic anyway. This is similar to the days where I <a href="/2025/01/04/dot-files.html">tweaked my dotfiles</a> to fit my workflow, just super charged with AI capabilities.</p>

<p>I tweaked the workflow based on my belief of how medium scale development would be in the immediate future to take advantage of current AI capabilities. Maybe once I run this enough, I would be more comfortable with a flow further away from terminal and code.</p>

<h2 id="the-workflow-around-git-and-github">The workflow around git and github</h2>

<p>First is a set of tools to manage multiple git worktrees. It turns out with AI coding, it’s pretty simple, just ask <code class="language-plaintext highlighter-rouge">Claude</code> to do it. I choose to have targets like the following in Makefile:</p>

<ol>
  <li>
    <p><code class="language-plaintext highlighter-rouge">make issue 42</code> to create a new worktree for github issue 42. It also dumps issues into <code class="language-plaintext highlighter-rouge">tasks/issue.md</code>, such that when I’m in the worktree folder, and start AI coding agent, I just need to say: “go work”, and it will pick up <code class="language-plaintext highlighter-rouge">tasks/issue.md</code> and start working on it. The fact they should use <code class="language-plaintext highlighter-rouge">tasks/issue.md</code> is in the system prompt.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">make pull-request</code> to submit current worktree as a pull request.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">make merge</code> to merge the current pull request, and clean up local worktree and branch, close associated github issue.</p>
  </li>
</ol>

<p>You get the point, just ask <code class="language-plaintext highlighter-rouge">Claude</code> to make a Makefile target for your workflow. Those are just conventional stuff, but with AI, everyone can tweak their workflow to their heart’s desire.</p>

<p>I also reserve a simpler workflow, directly work on <code class="language-plaintext highlighter-rouge">main</code> and push to <code class="language-plaintext highlighter-rouge">main</code> when done, fine for small personal project.</p>

<h2 id="the-workflow-around-ai-agent">The workflow around AI agent</h2>

<p>You may recall previously I expressed <a href="/2026/03/01/dawn-of-ai-coding.html#para-as-to-openspec-im-still-learning-to-use-it-one-h">slight unease</a> with <code class="language-plaintext highlighter-rouge">OpenSpec</code> during my learning, as I couldn’t picture mentally how the various levels of specs stay synchronized with each other and with the code. You only get a promise from AI that it would be so. Plus, it is often very hard to understand a problem space fully upfront, and using the imprecise human language to specify all the details is hard. I went through an instance of this, when trying to create oauth flow for <code class="language-plaintext highlighter-rouge">parley.nvim</code>, and realize it is much more complicated than I initially understood it to be. There would be no way for me, who only know oauth as a user, to understand all those nuances upfront.</p>

<p>Instead, I have constructed the following workflow, which I would call: <strong>Iterative Spec Flow</strong>. The key premise is to let go full specification upfront, with the rationale that coding itself uncovers edge cases and details. The spec in this flow, is the spec of boundaries to constrain an agent’s search space, not all the exact details to follow. And as coding continues, the spec evolves with new discoveries. In this flow, human has a couple of definitive places to review and verify.</p>

<h3 id="files-involved">Files involved</h3>

<ol>
  <li>
    <p><strong>Orchestration glue</strong>: <a href="https://github.com/xianxu/parley.nvim/blob/main/AGENTS.md"><code class="language-plaintext highlighter-rouge">AGENTS.md</code></a>. This is the main system prompt file. It explains and ties several other markdown files agent should maintain.</p>
  </li>
  <li>
    <p><strong>Project definition</strong>: <code class="language-plaintext highlighter-rouge">tasks/issue.md</code>. The work for each worktree. This is user provided, for example, <code class="language-plaintext highlighter-rouge">make issue 42</code> will populate this from GitHub issue 42.</p>
  </li>
  <li>
    <p><strong>Agent state</strong>:</p>

    <ol>
      <li>
        <p><code class="language-plaintext highlighter-rouge">tasks/todo.md</code>: this is the agent’s planning space and encodes agreement with the user within this session. I instructed agent to insert checkpoints for human manual review in their checklist. At end of session (pull request merged), <code class="language-plaintext highlighter-rouge">tasks/todo.md</code> is cleaned up, its states merged to <code class="language-plaintext highlighter-rouge">specs/</code> files.</p>
      </li>
      <li>
        <p><a href="https://github.com/xianxu/parley.nvim/blob/main/tasks/lessons.md"><code class="language-plaintext highlighter-rouge">tasks/lessons.md</code></a>: agent is instructed to record their learning as session goes on. Currently this file is committed so it survives across different feature development. This represents repo specific knowledge that agent has learned, a soft version of system prompt. I believe it is worth checking this file occasionally to keep it concise.</p>
      </li>
    </ol>
  </li>
  <li>
    <p><strong>Shared knowledge base</strong>:</p>
    <ol>
      <li>
        <p>The <a href="https://github.com/xianxu/parley.nvim/tree/main/specs"><code class="language-plaintext highlighter-rouge">specs/</code></a>: this is where agent put spec files. Those are not formal specs, rather sketches of specs to limit agent’s search space. I instruct agent to always keep this updated as they change code. I even have <code class="language-plaintext highlighter-rouge">make test-changed</code> that drive testing based on which spec files changed. You can think those <code class="language-plaintext highlighter-rouge">specs/*</code> files encode sketches of what was the user request in <code class="language-plaintext highlighter-rouge">tasks/issue.md</code>, and second level details in <code class="language-plaintext highlighter-rouge">tasks/todo.md</code>.</p>

        <ol>
          <li>
            <p>There’s <a href="https://github.com/xianxu/parley.nvim/blob/main/specs/index.md"><code class="language-plaintext highlighter-rouge">specs/index.md</code></a> as an indexing file, the hope is that agent can progressively discover specs, like how <a href="https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview">agent skills</a> is processed.</p>
          </li>
          <li>
            <p>There’s <a href="https://github.com/xianxu/parley.nvim/blob/main/specs/traceability.yaml"><code class="language-plaintext highlighter-rouge">specs/traceability.yaml</code></a> as a mapping between specs and tests needing to run. The goal is to cut down time for agent to ground itself (running test).</p>
          </li>
          <li>
            <p>All the files in <code class="language-plaintext highlighter-rouge">specs/</code> are retrofit from code and previous <code class="language-plaintext highlighter-rouge">README.md</code> file, with lightweight human review. Treating this as scaffold for agent to build on.</p>
          </li>
        </ol>
      </li>
    </ol>
  </li>
</ol>

<p>With those files, they are managed by instructions in <a href="https://github.com/xianxu/parley.nvim/blob/main/AGENTS.md"><code class="language-plaintext highlighter-rouge">AGENTS.md</code></a> and <a href="https://github.com/xianxu/parley.nvim/blob/main/tasks/lessons.md"><code class="language-plaintext highlighter-rouge">tasks/lessons.md</code></a>.</p>

<h3 id="the-workflow-looks-like">The workflow looks like</h3>

<ol>
  <li>
    <p><code class="language-plaintext highlighter-rouge">make issue 42</code> to create a new worktree for the issue.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">claude</code> or <code class="language-plaintext highlighter-rouge">codex</code>, then ask it to “go work”. System prompt <sup id="fnref:go_work"><a href="#fn:go_work" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> is strong enough for agent to pick up <code class="language-plaintext highlighter-rouge">tasks/issue.md</code> and start working on it. This typically involves creating <code class="language-plaintext highlighter-rouge">tasks/todo.md</code> and updating <code class="language-plaintext highlighter-rouge">specs/</code> files.</p>
  </li>
  <li>
    <p>System prompt instructed agent to insert manual verification steps, this is often when I discover spec issues. Back and forth with agent ensues. Occasionally agent update <code class="language-plaintext highlighter-rouge">tasks/lessons.md</code> to reflect on their learning.</p>
  </li>
  <li>
    <p>More often than not, after a pivot, I need to prod them to update <code class="language-plaintext highlighter-rouge">tasks/todo.md</code> with new plan. It seems they only remember to do it at the start of a session.</p>
  </li>
  <li>
    <p>I still manually commit code when I think it’s in stable enough state. I suspect it would be easy to instruct agent to do so directly, but I still want more control and predictability.</p>
  </li>
  <li>
    <p>When a pull request is ready, I have another agent to code review. I’m using the <a href="https://github.com/obra/superpowers">superpowers</a> agent skills library for that. Unfortunately I haven’t dive deep enough to see what it does yet.</p>
  </li>
</ol>

<h2 id="learnings-and-thoughts">Learnings and Thoughts</h2>

<ol>
  <li>
    <p><strong>Don’t over specify</strong>: treat the specs for AI as sketches to reduce AI search space, not exacts to confine AI’s creativity. Don’t over specify desired behavior. I believe specifying too much using the very imprecise human language is counter productive anyway.</p>
  </li>
  <li>
    <p><strong>Always verify</strong>: I found asking model to insert manual verification step a good practice to uncover assumptions I didn’t agree with.</p>
  </li>
  <li>
    <p><strong>Always keep stable state</strong>: git is your friend! Commit good state often.</p>
  </li>
  <li>
    <p><strong>Ground the agent</strong>: I believe that it’s on human to properly ground agent, through tests, ideally automated. Ask the agent to write a lot of tests! I went from having 0 tests to have 600+ tests (unit tests + integration test) and linting rules. I event created a <code class="language-plaintext highlighter-rouge">make test-changed</code> target that runs tests based on which spec files changed, so that agent can run it to verify their code change faster.</p>

    <ol>
      <li>
        <p>Eventually, lessons loosely held (in <code class="language-plaintext highlighter-rouge">tasks/lessons.md</code>) should migrate to some predictable framework, like linting rules.</p>
      </li>
      <li>
        <p>Figure out a way to run end to end test directly against the UI would be interesting. For example, the <a href="https://dashbit.co/blog/the-path-to-tidewave">Tidewave</a> represents a very interesting approach where AI agent have direct access to the UI state, paving way of much shorter feedback loop.</p>
      </li>
      <li>
        <p>OpenAI wrote about <a href="https://openai.com/index/harness-engineering/">harness engineering</a> where they ask coding agent to proof of work and use Chrome DevTool Protocol to access to DOM state directly.</p>
      </li>
    </ol>
  </li>
  <li>
    <p><strong>Local log files</strong> are very useful to have agent reading local log files, so that it can correlate my usage with the log.</p>
  </li>
  <li>
    <p><strong>My job</strong> is review and evolve specs, the <code class="language-plaintext highlighter-rouge">todo.md</code>, and help manual verification steps. I haven’t found it useful to review code itself, as I’m not good at lua. But sometimes I can identify code not DRY enough, and would guide agent to refactor.</p>
  </li>
  <li>
    <p><strong>Still missing</strong>: need to have tests on performance of agent produced code.</p>
  </li>
</ol>

<p>As I use AI to construct more complex features, I find this workflow is pretty effective. It’s not perfect, but it’s dramatically better than what I had 10 months ago. It is already very usable for my personal project now. Looking forward to leverage AI fully at work, in a similar vein what <a href="https://openai.com/index/harness-engineering/">OpenAI did</a>.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:go_work">
      <p>The following is in AGENTS.md: “Work for you is in tasks/issue.md, you MUST make plan in tasks/todo.md” <a href="#fnref:go_work" class="reversefootnote" role="doc-backlink">&#8617;&#xfe0e;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="ai" /><category term="tech" /><summary type="html"><![CDATA[I used to write code; now I write poems… - so to speak]]></summary></entry></feed>