workflow · 7 min read · 29 march 2026

How Markdown Files Became the Control Layer for AI Tools

A practical guide to AI markdown configuration: how CLAUDE.md, Cursor rules, and .github/copilot-instructions.md files work, what the research actually shows about their value, and how to write them so they help rather than hurt.


tl;dr

Markdown files like CLAUDE.md and .github/copilot-instructions.md have become the default way developers configure AI coding tools, but the evidence that they actually improve performance is thinner than the hype suggests. A 4% average task success improvement from human-written context files, at up to 19% extra cost, means writing these files carelessly can make things worse. The practical answer is to write less, target specific behaviours, and test whether your file is helping.

Developers have quietly agreed on something: if you want to shape how an AI coding tool behaves, you write it a Markdown file. No vendor-specific config format, no GUI settings panel. Just a .md file sitting in your repo, written in plain English, telling the model what you want.

This happened fast. A 2026 study of 2,923 GitHub repositories found context files (files like AGENTS.md and CLAUDE.md) in 90% of them. That's a convention forming in real time.

90% of repos surveyed use Markdown context files (Mohsenimofidi et al., arXiv 2026)

The appeal is obvious. Markdown is readable by both humans and models, requires no tooling, travels with the repository, and can be edited by anyone on the team. When Anthropic shipped Claude Code with CLAUDE.md support, when Cursor formalised its rules files, when GitHub added .github/copilot-instructions.md, they were each independently arriving at the same answer: a plain text file is the lightest possible control mechanism that actually works.

Markdown configuration files won because they are the only format that developers, AI models, and non-technical stakeholders can all read and edit without friction.

What These Files Are Actually Doing

There are roughly three types of content you'll find across these files, and they serve different purposes.

Project context covers what the codebase is, what stack it uses, what naming conventions matter, and what parts are off-limits. This is the architectural overview: "this is a Next.js 14 app using the App Router; do not use the Pages Router pattern anywhere."

Behavioural rules are the if/then constraints that shape how the model responds. "If you add a dependency, explain why in the commit message." "Do not modify files in /legacy without a comment flagging the risk." These narrow the model's default latitude down to something appropriate for your specific project.

Workflow instructions tell the model how to operate: run tests before committing, check for a related issue before creating a new one, always generate a summary comment at the top of new functions. These are the patterns you'd otherwise repeat in every prompt.
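Put together, the three content types might look like the sketch below in a single CLAUDE.md. The entries are drawn from the examples in this section; any specific commands or paths are illustrative, not prescriptive:

```markdown
# CLAUDE.md -- illustrative sketch, not a real project's file

## Project context
This is a Next.js 14 app using the App Router.
Do not use the Pages Router pattern anywhere.

## Behavioural rules
- If you add a dependency, explain why in the commit message.
- Do not modify files in /legacy without a comment flagging the risk.

## Workflow
- Run the test suite before committing.
- Check for a related issue before creating a new one.
- Add a summary comment at the top of new functions.
```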

The distinction matters because the evidence suggests these three types have very different returns. The ETH Zurich study cited in InfoQ, which tested Claude 3.5 Sonnet, GPT-5.1 mini, and other agents on real Python tasks, found that architectural overviews in context files did not reduce the time models spent locating relevant files. The model read the overview, then went looking anyway. Behavioural rules and workflow instructions showed more signal. Stop writing mini architecture docs and start writing specific constraints.

The Cost Problem Nobody Talks About

Figure: real-time cost accumulation from repeated API calls during context refinement

The same ETH Zurich research found that LLM-generated context files actually reduced task success rates. Human-written files did better, but only by about 4% on average, while pushing inference costs up by as much as 19%. The model is spending more steps processing your instructions, then performing marginally better. That's a real trade-off, and most teams aren't measuring it.

The reason is length. A long context file stuffed with background, caveats, and aspirational guidelines is noise to the model. Every token of irrelevant context competes with the actual task. The model isn't ignoring your file; it's reading all of it, every time, on every call. A 2,000-word CLAUDE.md covering the history of your project architecture and your team's preferred code review tone is costing you inference budget on tasks where none of that information is relevant.
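You can put a rough number on that overhead from the file's word count. The 1.3 tokens-per-word ratio below is a common rule of thumb for English text, not an exact figure; the real ratio depends on the model's tokenizer:

```shell
# Rough estimate of the prompt-token overhead a context file adds to every call.
# Assumes ~1.3 tokens per English word (a rule of thumb, tokenizer-dependent).
# Point FILE at your own context file.
FILE=${FILE:-CLAUDE.md}
words=$(wc -w < "$FILE" 2>/dev/null || echo 0)
tokens=$(( words * 13 / 10 ))
echo "$FILE: $words words, ~$tokens tokens prepended to every request"
```

At that ratio, a 2,000-word file is roughly 2,600 tokens of fixed cost on every single call, relevant or not.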

A shorter, more targeted context file usually outperforms a comprehensive one because it improves the signal-to-noise ratio of what the model has to read through.

The developer community has clearly sensed something is working here, even if the measurements are messy. Three personal AI configuration repositories, primarily Markdown files with prompts and rules, collectively gained over 64,000 GitHub stars in under two months according to OSS Insight. Garry Tan's gstack repo alone hit 50,000 stars in 16 days. Developers are sharing their Markdown configs the way they used to share dotfiles. That's cultural signal, not performance proof. It tells you these files feel useful, which is enough reason to take the practice seriously and measure it properly.

How to Write a Context File That Actually Helps

Given what the research shows, the practical approach is to treat your context file as a precision instrument, not a documentation dump.

Start with the behaviours you correct most often. Pull your last 20 AI interactions in the tool you're configuring. Find the three things you corrected repeatedly. Write a rule for each. That's your first draft. It will be short, maybe 10-15 lines, and it will be more useful than a file written from scratch in one sitting.
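A first draft built this way might look like the sketch below. Every rule here is an invented placeholder standing in for whatever corrections you actually repeat:

```markdown
# CLAUDE.md -- first draft: one rule per repeated correction

- Use the shared `apiClient` wrapper for HTTP calls; do not import axios directly.
- Write new tests with Vitest; the old Jest config is frozen, do not extend it.
- Never hand-edit files under src/__generated__/; regenerate from the schema instead.
```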

Separate concerns by file when the tool supports it. Cursor's rules system lets you scope rules to specific file types or directories. .github/copilot-instructions.md applies globally, but you can reference it alongside more specific guidance. If your frontend and backend have genuinely different conventions, don't force both into one file. Split them.
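Cursor's project rules live as `.mdc` files under `.cursor/rules/`, scoped via frontmatter. The fields shown below (`description`, `globs`, `alwaysApply`) follow Cursor's current convention, which may change between versions, and the rule content is illustrative:

```markdown
---
description: Frontend component conventions
globs: ["src/components/**/*.tsx"]
alwaysApply: false
---

- Use functional components with hooks; do not use class components.
- Co-locate component styles with the component file.
```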

Write rules as constraints, not descriptions. "The app uses React" is context. "Use functional components with hooks; do not use class components" is a constraint. Constraints are what change model behaviour. Descriptions are what you'd put in a README.

Test your file. Run the same prompt against the tool with and without the context file active. If the output is meaningfully better with it, keep the file. If the difference is negligible, cut the file down until you find the parts that are actually doing work. The ETH Zurich researchers found most teams have never done this test. That's why so many context files are accumulating content they don't need.

Version your context file alongside your code. When your stack changes, your rules should change. A stale CLAUDE.md that references a library you migrated away from six months ago is actively misleading the model. Treat it like a dependency: review it when you upgrade things.
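One lightweight way to enforce this is a CI guard that fails when the context file still mentions something you migrated away from. The library names below are hypothetical placeholders; swap in your own removed dependencies:

```shell
# CI guard: fail the build if the AI context file still references a dependency
# we migrated away from. "moment" and "enzyme" are hypothetical placeholders.
stale=$(grep -ciE 'moment|enzyme' CLAUDE.md 2>/dev/null)
if [ "${stale:-0}" != "0" ]; then
  echo "CLAUDE.md references a removed dependency; update it before committing" >&2
  exit 1
fi
echo "context file check passed"
```

Run it from the repo root in CI; updating the pattern list becomes part of every dependency migration.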

Where This Is Going

The Model Workspace Protocol paper on arXiv proposes a more structured approach: numbered folders, a CONTEXT.md that scopes the model's access to only the relevant parts of a workspace, and separate instruction files per task. It's an early signal that Markdown-based configuration is about to get more sophisticated, with tooling building around the convention rather than just trusting developers to write good files unaided.

That matters because the current state, every developer writing their own CLAUDE.md from scratch with no feedback loop, is genuinely inefficient. The practice works better when teams share their files, iterate on them based on actual output quality, and build internal libraries of rules that worked. This is the same discipline that separates teams getting consistent value from AI tools from those treating every session as a fresh start.

verdict

Markdown configuration files are the right abstraction for directing AI coding tools. They're portable, readable, and tool-agnostic in a way nothing else currently is. But the practice has outrun the evidence, and most context files in the wild are longer and vaguer than they should be. Write rules, not documentation; measure the difference; and cut anything that doesn't change the output.

Start today: open your CLAUDE.md, Cursor rules, or copilot-instructions.md and delete every line that describes what the project is rather than how the model should behave. If you don't have those files yet, open your last week of AI coding sessions, find your most repeated correction, and write one rule. One specific, testable constraint. Then watch whether you stop making that correction.


Alec Chambers

Founder, ToolsForHumans

I've been building things online since I was 12 — 18 years of shipping products, picking tools, and finding out what actually works after the launch noise dies down. ToolsForHumans started as the research I kept needing: what practitioners are still recommending months after launch, and whether the search data backs it up. Since 2022 it's helped 600,000+ people find software that actually fits how they work.