Turning the Dark Factory Lights On
Update 04/24/2026: This blog was written for version v0.13.4 of Gas City released at the end of March. Since then Gas City has moved to version 1.0. I would recommend checking out Steve's blog about the release: Welcome to Gas City
Agents finally started to work well broadly at the end of last year, when Opus 4.5 met the harness and adoption reached a critical mass. Software factories, or "dark factories", are now where agents were 18 months ago: they still need effort and expertise to adapt to a problem before they work well, the models don't quite master everything they need, and the user experiences are all over the place. One of these software factories that got my attention, because of Steve Yegge's writing, is Gas City. It is not the only one out there: Paperclip, Symphony, and StrongDM's Attractor (which arguably kicked it all off) are a few others. Gas City will serve as the motivating example.
Software factories should solve these problems, which individual agents like Claude Code, GitHub Copilot CLI, Codex, etc. don't currently solve:
- Allow you to easily manage many projects.
- Manage multiple lines of work within those projects, often in parallel.
- Let each line of work run for longer and at higher quality, lowering the burden of validation by spending more inference compute.
Gas City (well, Gas Town... we'll get there) allows you to manage many projects through the mayor. The mayor lets you manage and queue many different tasks in parallel, and that work can be more ambitious because Gas Town uses techniques that spend extra compute so each task executes reliably, for longer.
Gas City vs Gas Town
The easiest way to think about the distinction is that Gas City is an SDK and CLI that lets you use configurations, one of which is Gas Town. Another way to understand this is that Gas Town came first, and Gas City is taking the patterns from it to create a platform that can not only create Gas Town variants, but a whole host of other things. The focus here will be on the Gas Town configuration, while pointing out which parts belong to the SDK and which to the configuration. In this diagram, anything in yellow, like the mayor, Failure Recovery, or polecats, is a "Gas Town" concept defined by city.toml. Anything in blue, like the city concept itself, rigs, and beads, is a Gas City construct.

The City
The city is a directory that holds the city.toml that defines it, along with the agent configurations, prompts, and workflows that go with it. It is meant to be the place where you manage your various projects, each of which is called a rig. Adding a rig to the city "rigs" it for use with Gas City. The city level is for working across many things at once, while a rig is one project (think in terms of the granularity of a GitHub repo). This configuration is what turns what you are using into "Gas Town".
Meet the Mayor
Gas City has the concept of providers, which are the coding agents we all know: instances of Claude Code, GitHub Copilot CLI, Codex, etc. These underpin its concept of an agent, which most notably has a prompt_template. The prompt template is applied by sending it as the first message when the agent starts. For the Claude Code and Copilot providers at least, this is done simply by passing it as a command-line argument, like claude "<rendered prompt>". Additionally, for Claude Code only, it is also injected via a SessionStart hook. This duplication seems likely to get worked out; there is no compelling justification for why it exists. Ideally, if Gas City had its own agent implementation, this would be better served by setting the system prompt and splitting out the actions that the agent should take on startup.
The mayor is a cleverly designed agent whose prompt gives it the instructions it needs to manage everything below it. Its prompt template sums its job up nicely: "You are the Mayor - the global coordinator of Gas Town. You sit above all rigs, coordinating work across the entire workspace."
[[agent]]
name = "mayor"
scope = "city"
wake_mode = "fresh" # Start from system prompt each time (no conversation resume)
work_dir = ".gc/agents/mayor"
prompt_template = "prompts/mayor.md.tmpl"
nudge = "Check mail and hook status, then act accordingly." # Default text sent when another agent nudges the mayor
overlay_dir = "overlays/default"
idle_timeout = "1h" # Controller kills the session after 1h of inactivity
max_active_sessions = 1
[[named_session]] # Declares that a session should exist
template = "mayor"
scope = "city"
mode = "always" # Session is always running
The mayor is always running (mode = "always") and the controller will restart it on crash - which sends it that initial prompt template again. You talk to the mayor, which is just one of the provider instances running in tmux, and the mayor dispatches work to the agents that only write the code. It decides what needs to happen, files work items, and routes them to the right rig and agent. It has instructions to fix things directly when that is faster than dispatching, but dispatching is the encouraged behavior for a variety of benefits covered next. The controller is the process started by gc start that manages agent sessions, handling agent startup and restarting them when it detects they have died.
So... how does anything happen?
When you ask the mayor to do something, what actually happens? The mayor is instructed to route most work to a polecat, the rig-scoped coding agent, through beads.
### Prefer dispatching to polecats
When you file a bead, default to immediately dispatching it to a polecat:
bd create "Fix the auth timeout bug" -t task --json # file it
bd update <bead-id> --set-metadata gc.routed_to=<rig>/polecat # dispatch to polecat pool (pool reconciler picks up routed metadata)
Cutting through Yegge's metaphors, a bead is a filed task waiting to get done (similar to a GitHub issue). So the mayor creates a bead for a polecat to work on. Each bead has a title, status (open, in_progress, closed), priority, assignee, and a freeform key-value metadata bag. When the mayor runs bd update, all that happens is that the metadata on the bead is updated. Gas City's controller runs a reconciler loop on a tick. On each tick, the controller queries every bead store (the city's and each rig's) for ready and in-progress beads, then walks each one looking at its gc.routed_to metadata. There is a cap on how many polecats can be running at once, but as long as there is room, the reconciler spawns a new polecat for each routed bead that has no one to work it.
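To make the reconciler tick concrete, here is a minimal Python sketch of the decision it makes. This is not Gas City's code: the Bead class, reconcile_tick, and the per-pool cap are hypothetical names, and it simplifies by assuming every live session is already busy, so each unassigned routed bead needs a fresh session until the cap is reached.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bead:
    id: str
    status: str = "open"                 # open, in_progress, or closed
    assignee: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def reconcile_tick(beads, running, max_polecats):
    """Return the pools that need one more session spawned on this tick.

    `running` maps a pool name to its number of live sessions (assumed busy).
    """
    to_spawn = []
    live = dict(running)                 # copy so the caller's dict is untouched
    for bead in beads:
        pool = bead.metadata.get("gc.routed_to")
        if pool is None or bead.status == "closed":
            continue                     # not routed, or nothing left to do
        if bead.assignee is not None:
            continue                     # someone is already working it
        if live.get(pool, 0) >= max_polecats:
            continue                     # pool is at its cap; retry next tick
        live[pool] = live.get(pool, 0) + 1
        to_spawn.append(pool)
    return to_spawn
```

The point of the sketch is that the mayor never starts a polecat itself. It only writes metadata, and a loop like this one turns that metadata into running sessions.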
Spawning a polecat means running its pre_start script (which sets up a git worktree under .gc/worktrees/<rig>/polecats/<name>/), starting a tmux session with the provider command, and sending the polecat prompt. When the polecat starts, its prompt instructs it to run a work_query that looks for beads routed to its own pool. It then claims one by running bd update --claim, an atomic operation that sets assignee=<polecat> and status=in_progress in a single write. If two polecats race to claim the same bead, only one wins; the other gets an error and is left to pick a different bead.
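The atomic claim is a compare-and-set: the write only succeeds if the bead is still unclaimed. Below is a minimal sketch of that idea using Python's built-in sqlite3 as a stand-in for Dolt. The table layout, the claim function, and the bead id gc-1 are all assumptions for illustration, not how bd update --claim is actually implemented.

```python
import sqlite3


def claim(conn, bead_id, worker):
    """Try to claim a bead; return True only for the single winner."""
    cur = conn.execute(
        "UPDATE beads SET assignee = ?, status = 'in_progress' "
        "WHERE id = ? AND assignee IS NULL AND status = 'open'",
        (worker, bead_id),
    )
    conn.commit()
    # Zero rows changed means another worker got there first.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE beads (id TEXT PRIMARY KEY, status TEXT, assignee TEXT)")
conn.execute("INSERT INTO beads VALUES ('gc-1', 'open', NULL)")
conn.commit()

first = claim(conn, "gc-1", "polecat-a")   # wins the bead
second = claim(conn, "gc-1", "polecat-b")  # loses: the WHERE clause no longer matches
```

Because the check and the write happen in one statement, there is no window in which both polecats can believe they own the bead.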
One interesting note, although not critical to how it works, is that beads are stored in Dolt. It is a versioned SQL database that supports multiple concurrent writers, which is a nice property when you have a bunch of agents trying to write. So the flow so far is: the mayor files tasks as beads, the controller observes state and spawns polecats, and then polecats claim work atomically. Each polecat will be working on something different, since each bead is different. This is part of how Gas Town lets you have many things working at once.
Once a polecat finishes writing code, it hands off to the refinery, a rig-scoped singleton agent that owns the merge queue for that rig. The polecat's prompt template ends with what it calls the "done sequence":
git push origin HEAD
bd update <work-bead> \
--set-metadata branch=$(git branch --show-current) \
--set-metadata target=<default-branch>
bd update <work-bead> --status=open --assignee=<rig>/refinery --set-metadata gc.routed_to=<rig>/refinery
This is the same gc.routed_to pattern the mayor used to dispatch work in the first place. The polecat pushes its branch, stamps branch and target onto the bead, reassigns the bead to the refinery pool, and then tells the controller it is safe to kill this session. Surprisingly, none of this is enforced by code. If the polecat skips a step, the refinery will hit a missing field and reject the bead back to the pool. If the polecat just exits without reassigning, the bead sits orphaned with status=in_progress (until an agent discussed later figures it out).
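Since the handoff is a convention rather than an enforced contract, the check for a skipped step is easy to picture in code. The sketch below is hypothetical: in Gas Town the refinery agent notices a missing field by following its prompt, and check_handoff and REQUIRED_FIELDS are names invented here to show what a deterministic version could look like.

```python
# Fields the done sequence is expected to stamp onto the bead's metadata.
REQUIRED_FIELDS = ("branch", "target")


def check_handoff(bead):
    """Return (accepted, rejection_reason) for a bead routed to the refinery."""
    metadata = bead.get("metadata", {})
    missing = [name for name in REQUIRED_FIELDS if not metadata.get(name)]
    if missing:
        # Send the bead back to the pool with a reason a polecat can act on.
        return False, "missing metadata: " + ", ".join(missing)
    return True, None
```

A polecat that pushed its branch but forgot to stamp target would get its bead back with the reason "missing metadata: target".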
From there the reconciler takes over the same way it did for the polecat. The refinery is configured as a singleton (max_active_sessions = 1) with mode = "on_demand", so if there is not one running, the reconciler spawns one in its own worktree under .gc/worktrees/<rig>/refinery. It claims the bead, reads branch, target, and merge_strategy, then runs a formula called mol-refinery-patrol. A formula is a .toml file that describes a collection of steps with dependencies, variables, and optional control flow. The refinery's formula tells it to fetch, rebase the branch on the target, run the rig's configured quality checks, and then either fast-forward merge and push or create a GitHub PR, depending on the strategy.
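One way to read "steps with dependencies" is as a small graph that fixes the order of the work. The sketch below models the four refinery steps described above with Python's standard graphlib. The step names and the dictionary layout are assumptions made for illustration; the real formula schema is not shown here.

```python
from graphlib import TopologicalSorter

# Hypothetical step graph: each step maps to the steps it depends on.
STEPS = {
    "fetch": [],
    "rebase": ["fetch"],
    "quality_checks": ["rebase"],
    "land": ["quality_checks"],   # fast-forward merge and push, or open a PR
}


def execution_order(steps):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(steps).static_order())
```

For this linear chain, execution_order(STEPS) yields fetch, rebase, quality_checks, land. A cycle in the graph would raise graphlib.CycleError instead of silently producing an order.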
I found it surprising that formulas are not code-enforced. The entirety of the formula is provided to the refinery agent, and the syntax in the toml around dependencies and control flow is essentially a prompt engineering technique. The prompt does set hard limits, though. The core rule at the top is "You are a merge processor, NOT a developer." The refinery is forbidden from reading polecat code to understand intent, forbidden from fixing test failures, and forbidden from landing integration branches via raw git merge. If the rebase conflicts, it aborts, puts the bead back in the pool with rejection_reason set, and leaves the branch intact for a new polecat to pick up and resume. If tests fail and it's a branch regression, it follows the same flow but deletes the branch. If tests fail from a pre-existing bug on the target, it files a new bead and merges anyway. And this is how stuff is supposed to get done in Gas Town! However, agents can go wrong, especially when they are just being trusted to follow fairly complex instructions like the formulas.
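The refinery's rules boil down to a small decision table. Here is a sketch of that table in Python, with the caveat that the Outcome enum and the action names are invented labels for the behaviors described above, not identifiers from Gas Town.

```python
from enum import Enum


class Outcome(Enum):
    REBASE_CONFLICT = "rebase_conflict"
    BRANCH_REGRESSION = "branch_regression"      # tests fail because of the branch
    PREEXISTING_FAILURE = "preexisting_failure"  # tests already fail on the target
    CLEAN = "clean"


def plan_actions(outcome):
    """Map a merge attempt's outcome to the actions the prompt prescribes."""
    if outcome is Outcome.REBASE_CONFLICT:
        # Branch is kept so a new polecat can resume from it.
        return ["abort_rebase", "reject_bead", "keep_branch"]
    if outcome is Outcome.BRANCH_REGRESSION:
        return ["reject_bead", "delete_branch"]
    if outcome is Outcome.PREEXISTING_FAILURE:
        # Not the branch's fault: track the bug separately and land anyway.
        return ["file_new_bead", "land"]
    return ["land"]                              # merge and push, or open a PR
```

Note that no branch of the table includes "fix the code", which is the merge-processor rule expressed as a table.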
When it all goes wrong
Gas Town has a set of agents that are designed specifically to keep everything discussed so far alive. I won't get into the details for all of these, but the witness is constantly polling to detect if polecats are stuck (not literally crashed, but "semantically" not doing anything meaningful) by seeing if beads haven't been updated recently and checking if beads are orphaned. Notably, it is told it cannot kill things directly; it must file a "warrant" type bead for a dog to go execute. It sounds like witnesses at some point were too happy to kill polecats.
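The witness's two checks, stale beads and orphaned beads, can be sketched as a single pass over in-progress work. Everything here is an assumption for illustration: the dictionary fields, the find_stuck name, and the 30-minute threshold are not taken from Gas Town, where the witness is an agent following a prompt rather than a function.

```python
from datetime import datetime, timedelta, timezone


def find_stuck(beads, live_sessions, now, stale_after=timedelta(minutes=30)):
    """Return (bead_id, reason) pairs for in-progress beads that need attention."""
    findings = []
    for bead in beads:
        if bead["status"] != "in_progress":
            continue
        if bead["assignee"] not in live_sessions:
            findings.append((bead["id"], "orphaned"))  # claimed, but its session is gone
        elif now - bead["updated_at"] > stale_after:
            findings.append((bead["id"], "stale"))     # session alive, no recent progress
    return findings
```

In keeping with the rule above, a function like this only reports. Acting on a "stale" finding would mean filing a warrant bead for a dog, not killing the session directly.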
At the city level, the deacon is responsible for checking for stuck witnesses, refineries, or dogs. Similar to the witness, it is instructed to file warrant beads for dogs to handle. But wait, what if the deacon is not working? Well, Gas Town thought of that and that is what the boot is for. Its sole goal is to make sure the deacon is working. That covers the major components of Gas City and Gas Town. There is much more, but it is not as critical to the understanding of how it works (things like mail and wisps). If you are looking for more info, I found the Gas City Tutorials and the Beads documentation helpful.
Conclusion
Gas City has a lot of cool ideas that make it useful to learn from and a good example of the new ways that software is being created. The mayor being the main entry point, and ideally having an understanding of everything, is a nice user experience. You don't have to worry about starting multiple coding sessions and it can see the progress of all of them, although it is yearning for a UX that does not require constantly asking for status updates. Secondly, Beads lends itself to a "pull not push" architecture where every agent uses Beads, backed by Dolt, as its fresh source of truth each time. This is possible because Dolt, being a SQL database, is reliable at maintaining state, and the agents, given the prompts they have, are unlikely to poison beads with bad context. Although Dolt has features around version control of the database, Gas City did not appear to use those features explicitly, and across three projects I built with it, the "git" parts of Dolt were not needed. Finally, having it structured as a bunch of background processes that are always running and checking in on things is a pattern that will be in most software factories. I view it as an extra "safety" feature that keeps things going autonomously, for longer.
There are areas where I see others doing things differently or better. One is that Gas City's providers rely on existing CLI coding agents like Claude Code and GitHub Copilot CLI. This does have the benefit of easier adoption, as these agents often come with subscriptions that many people already have. Writing agents from scratch using the direct model API is also much more effort. However, it limits what you can do, which was clear from how Gas City instantiates its agents. The best software factories are going to have their own "CLI agent". Relatedly, Gas City relies heavily on the agent doing the right thing purely from the prompts and messages sent to it. Formulas are the exact thing you'd want to make more deterministic. I suspect this is done for similar reasons: it's harder code-wise and the off-the-shelf agents might not expose everything needed to do this well. Additionally, when I tried to build things, I found that testing and validation are not currently built in out of the box. Most of the time I had to firmly tell the mayor to make sure that what the refineries had merged in actually worked.
Looking forward, I think whoever gets a software factory product right will figure out:
- The user experience of interacting with it. I like the mayor. The approach others take, using something like GitHub Issues, is also nice.
- A way to see iterative progress. You should not have to wait hours or more to see if it worked before you can steer it again. This gets into the total wall time it takes as well. If I can use something like GitHub Copilot CLI to be way more in the loop, but get it done faster or with higher certainty, then for things I care about I will use that instead most of the time.
- Orchestrating agents to get work done in parallel without the issues of individual failures crashing things or leading to git-astrophes. Gas City is onto something here, although they do lack automatic validation (at least out of the box). Spoiler for a potential future blog, but this is top of mind with the work I'm doing at Microsoft: Digital Twins and Reality Check.
- Figuring out how to make the costs make sense: one simple interaction could set off an inordinate number of LLM calls.