Garry Tan's gstack: Why a 23-Step Workflow Beat 'Just Prompt the AI'
Garry Tan's gstack breaks AI-assisted coding into 23 specialist roles, and understanding why that structure matters could change how your whole team works with AI.

tl;dr
Garry Tan shipped 600,000 lines of production code in 60 days by treating AI-assisted coding as a series of specialist roles, not a single conversation. His open-source gstack defines 23 distinct skills, from planning and architecture through QA and reflection, each with its own context and purpose. The lesson is structural: AI performs better when you give it a defined job, not an open brief.
Garry Tan ran Y Combinator while shipping 600,000 lines of production code in 60 days. That number is worth sitting with, because the interesting part isn't the output. It's how he organised the work to make that output possible.
Most people using AI for coding operate the same way: open a chat window, describe a problem, read the response, adjust, repeat. It works. Occasionally it works well. But it has a ceiling, and that ceiling comes from treating a large language model like a single, infinitely capable colleague rather than a system that performs best when given a specific role and clear constraints.
gstack is Tan's answer to that ceiling. The repository, open-sourced on GitHub, organises AI-assisted software development into 23 specialist skills, each mapping to a discrete phase or function in the development lifecycle. Think of it as an org chart for a software team where every role is played by Claude.
What the 23 skills actually do
The framework moves through a recognisable development arc: think, plan, build, review, test, ship, reflect. But gstack makes each stage explicit rather than assumed. There's a skill for requirements analysis, one for system design, one for code generation, separate skills for code review and security review, a QA-specific skill, and a retrospective skill at the end. The SKILL.md file lays out the underlying philosophy: each skill gives the model a defined persona, a specific set of responsibilities, and an explicit output format.
That last part matters more than it might seem. When you ask Claude to "review this code," you get a general response shaped by whatever the model thinks "review" means. When you invoke the gstack QA skill, the model is operating inside a tighter definition: what to check, what format to use, what counts as done. The output becomes predictable. Predictable output is the thing most AI workflows lack.
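The skill definitions themselves live in the gstack repository; the shape of the idea, though, is easy to sketch. The following Python is illustrative only: the `Skill` class, the `qa` example, and `build_prompt` are hypothetical names I'm using to show how a persona, a responsibility list, and a fixed output format turn an open brief into a defined job. None of this is gstack's actual API.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """One specialist role: a persona, responsibilities, and a fixed output format."""
    name: str
    persona: str
    responsibilities: list[str]
    output_format: str

def build_prompt(skill: Skill, work_item: str) -> str:
    """Front-load the skill's context so the model gets a defined job, not an open brief."""
    duties = "\n".join(f"- {r}" for r in skill.responsibilities)
    return (
        f"You are {skill.persona}.\n"
        f"Your responsibilities:\n{duties}\n"
        f"Respond strictly in this format:\n{skill.output_format}\n\n"
        f"Work item:\n{work_item}"
    )

qa = Skill(
    name="qa",
    persona="a QA engineer verifying a change before release",
    responsibilities=["List the cases you tested", "Flag anything untested"],
    output_format="## Verdict (pass/fail)\n## Cases checked\n## Gaps",
)

print(build_prompt(qa, "Add rate limiting to the login endpoint"))
```

Same model, same work item, but the response now has to land inside a format you chose in advance. That's the whole trick.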
Structured roles don't constrain the AI. They constrain the chaos that comes from leaving the AI's job description undefined.
This is the same reason generic business prompts fall short without structure. A single broad prompt is an open brief. Open briefs produce variable results because the model has to guess at context, scope, and priority all at once. gstack eliminates that guesswork by front-loading the context into the skill definition itself. You're designing a better prompting environment, not just writing better prompts.
Why Tan could ship at that pace

The 600,000 lines in 60 days claim comes from Tan's own account of building while leading YC, and while there's no third-party audit of that number, the mechanism behind it is plausible and worth examining on its own terms.
[Chart: lines of production code shipped in 60 days. Source: Garry Tan via gstack documentation, 2025]
The throughline is parallelism through role separation. A single developer using gstack isn't doing one thing at a time; they're moving a task through a pipeline where each stage has a defined handler. You don't pause to figure out how to approach a security review because the security review skill already knows what it's doing. That cognitive offload accumulates fast across a 60-day sprint.
There's also a compounding effect on quality. When review, testing, and reflection are built into the workflow as named steps rather than optional extras, they actually happen. Most solo or small-team development skips QA not because developers don't value it, but because the friction of switching modes mid-flow is high enough that it gets deferred until something breaks. gstack lowers that friction by making the switch feel like a handoff rather than an interruption.
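The handoff mechanic can be sketched in a few lines. This is a minimal illustration under my own assumptions, not gstack's implementation: four stand-in stages instead of 23 skills, and each handler receives the task plus everything earlier stages produced, so no stage is skipped or improvised mid-flow.

```python
# Each stage is a named handler; the pipeline order is fixed, so review
# and QA happen as handoffs rather than optional extras.
def plan(task, context):
    return f"plan for: {task}"

def build(task, context):
    return f"code implementing {context['plan']}"

def review(task, context):
    return f"review of {context['build']}"

def qa(task, context):
    return f"qa report on {context['build']}"

PIPELINE = [("plan", plan), ("build", build), ("review", review), ("qa", qa)]

def run(task):
    """Move one task through every stage, accumulating each stage's output."""
    context = {}
    for name, handler in PIPELINE:
        context[name] = handler(task, context)
    return context

result = run("add rate limiting")
```

In a real setup each handler would be a model call with its own skill prompt; the point of the sketch is that the structure, not the developer's in-the-moment discipline, is what guarantees review and QA run.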
The structural argument against winging it
The broader case gstack makes isn't really about Claude or even about coding. It's about what happens when you impose process on an inherently variable tool.
AI models are probabilistic. Given the same input twice, they'll produce similar but not identical outputs. That variability is a feature in creative work and a liability in production software. The standard response is to iterate: prompt, check, re-prompt. gstack's response is different: reduce variability at the input level by standardising the context the model receives. If the input is consistent, the output distribution narrows.
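"Standardising the context" has a concrete, testable form: render the model's input deterministically. Here's a small sketch of that idea (the function name and template are mine, not gstack's): fields are serialised in a fixed order through a fixed template, so the same task always produces a byte-identical prompt, and the only remaining variance is the model's own.

```python
import json

def canonical_prompt(skill_name: str, fields: dict) -> str:
    """Deterministic input rendering: fixed field order, fixed template.
    The model stays probabilistic, but input-side variance drops to zero."""
    body = json.dumps(fields, sort_keys=True, indent=2)
    return f"[skill: {skill_name}]\n{body}"

a = canonical_prompt("code-review", {"diff": "@@ -1 +1 @@", "focus": "security"})
b = canonical_prompt("code-review", {"focus": "security", "diff": "@@ -1 +1 @@"})
assert a == b  # same inputs in any order produce an identical prompt
```

Identical prompts don't guarantee identical outputs, but they narrow the distribution, and they make the remaining variance attributable to the model rather than to sloppy inputs.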
The goal of a structured workflow isn't to make AI more powerful. It's to make its output more predictable, which is what production work actually requires.
The tradeoff is real. A 23-skill framework has a learning curve. You need to understand which skill applies when, and early in adoption you'll sometimes pick the wrong one or sequence them awkwardly. Some practitioners combining gstack with other tools report that the initial overhead is the hardest part. The framework is also currently Claude-specific; if your team uses GPT-4 or Gemini, you'll need to adapt the persona definitions and test whether the output consistency holds.
Neither of those is a reason to avoid it. They're reasons to phase it in deliberately rather than dropping 23 new skills on a team in a single sprint.
How to start without overhauling everything

Pick three skills and ignore the rest for now. The most immediately useful are the planning skill, the code review skill, and the retrospective skill. These three bracket a development cycle: they cover what you're building before you build it, whether the output meets the standard after you build it, and what to do differently next time. You can run a full project through just those three and already produce more consistent output than an unstructured prompt workflow.
Once those three feel natural, add the QA skill. Then the security review. Build the habit of reaching for the right skill rather than a general prompt before you expand the vocabulary further.
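One way to enforce that habit is to make "no skill selected" an error rather than a fallback to a general prompt. A hypothetical sketch (the skill names and `invoke` function are mine, not gstack identifiers):

```python
# Phased adoption: only the three starter skills exist, and anything else
# raises instead of silently degrading into an open-ended prompt.
STARTER_SKILLS = {"planning", "code-review", "retrospective"}

def invoke(skill: str, work_item: str) -> str:
    if skill not in STARTER_SKILLS:
        raise ValueError(
            f"No skill named {skill!r} adopted yet; add it deliberately "
            "rather than falling back to a general prompt."
        )
    return f"[{skill}] {work_item}"

print(invoke("planning", "rate limiting for login"))
```

Expanding the vocabulary then becomes an explicit decision: you add a skill to the set when the team is ready for it, not when someone improvises one mid-task.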
The gstack repository is public and free. Fork it, strip it down to the skills your team actually needs, and treat the rest as a backlog. The framework is opinionated by design, but that doesn't mean you have to adopt every opinion at once.
verdict
gstack is the most practically useful thing to come out of the current AI-coding wave precisely because it's boring. It applies decades of software engineering discipline (role clarity, process separation, defined outputs) to a tool that most people are still using like a search engine. Tan's 600K lines in 60 days will get the attention, but the actual idea is simpler: give the AI a job title, and it does a better job.
Start this week: fork the gstack repository, pick the planning skill and the code review skill, and run your next feature through both before you touch any other part of the framework. That's a two-step process, not twenty-three. See whether your output gets more consistent. Then decide how much more of the system you need.

Alec Chambers
Founder, ToolsForHumans
I've been building things online since I was 12 — 18 years of shipping products, picking tools, and finding out what actually works after the launch noise dies down. ToolsForHumans started as the research I kept needing: what practitioners are still recommending months after launch, and whether the search data backs it up. Since 2022 it's helped 600,000+ people find software that actually fits how they work.