McKinsey's 1.5M Hour Gain: How to Find Your Organisation's Biggest AI Opportunity
McKinsey's internal AI pilot saved a large bank 1.5 million hours annually — here's the framework to find your organisation's equivalent bottleneck and size the real ROI before you commit.

tl;dr
A McKinsey pilot at a large global bank cut 1.5 million hours from a single annual process by applying AI to search and synthesis work. Most organisations have an equivalent bottleneck sitting untouched, but finding it requires a structured audit, not a hunch. This post gives you the framework to locate it, size it honestly, and avoid the cost traps that erode the headline number.
The most expensive work in most organisations is the work nobody tracks. Not the big projects or the meetings with agendas. The hours spent hunting for the right document, re-reading last quarter's report to answer a question that should take thirty seconds, stitching together data from four different systems before the actual analysis can begin. That's where the 1.5 million hours went in McKinsey's anonymised bank case study, published in the firm's 2023 generative AI productivity report. A 4.4 million-hour annual process shrank to 2.9 million hours once AI was applied to search and synthesis tasks. The process itself didn't change. The friction inside it did.
Most coverage glosses over what that finding actually shows. It wasn't a reinvention of the business. It was one subprocess, in one division, where employees spent a disproportionate share of their time retrieving and summarising information rather than acting on it. Time losses concentrate in specific, identifiable places, and those places are almost always search and synthesis work.
1.5 million hours saved in a single bank subprocess (McKinsey Global Institute, 2023)
Why Most Organisations Miss Their Version of This
The obvious question after reading that stat is: where does our equivalent bottleneck sit? Most teams can't answer it, and the reason is structural. Time tracking in knowledge work captures meetings and projects. It almost never captures the retrieval layer: the ten minutes before every meeting spent finding the deck, the thirty minutes reconstructing context that already exists somewhere. That friction is invisible in any standard productivity report.
McKinsey's own 2025 research on AI adoption in the workplace found that organisations systematically underestimate how much AI their employees are already using, which means leadership is making deployment decisions based on incomplete data. If you don't know what people are already doing informally to cope with information friction, you'll misdiagnose where the formal opportunity is.
The bottleneck you can't see in your time-tracking data is usually the one worth fixing first.
The fix is a workflow audit, and it doesn't need to be elaborate. Three questions, asked honestly across a sample of roles, will surface the pattern:
- What task do you do repeatedly that requires finding or summarising information before you can act?
- How long does that retrieval step take, and how often does it happen?
- What would you do with that time if it took thirty seconds instead of thirty minutes?
You're looking for high-frequency, high-friction tasks where the output of the retrieval step is structurally similar every time. That's where AI compounds. One-off research tasks are poor candidates. Weekly synthesis of the same data sources, answering the same category of customer question, pulling together the same type of pre-meeting briefing: those are the targets. Once you've identified the candidates, you can pair them with AI tools designed for workflow research and discovery to run a scoped pilot before committing broader resources.
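If you want to turn those three answers into a ranked shortlist, the arithmetic is simple enough to script. A minimal sketch in Python, where every task name and figure is a hypothetical placeholder for whatever your own audit surfaces:

```python
# Convert workflow-audit answers into annual hours at stake per task.
# All task names and figures are hypothetical placeholders; swap in the
# numbers your own two-hour audit surfaces.

WEEKS_PER_YEAR = 48  # working weeks; adjust for your organisation

# (task, occurrences per person per week, minutes per occurrence, people affected)
audit_answers = [
    ("pre-meeting briefing prep",    5,   30,  40),
    ("weekly data-source synthesis", 1,   90,  12),
    ("customer question lookup",     20,  10,  25),
    ("one-off market research",      0.2, 240, 6),  # low frequency: poor candidate
]

def annual_hours(freq_per_week, minutes, people):
    """Annual hours the retrieval step consumes across the whole team."""
    return freq_per_week * minutes / 60 * people * WEEKS_PER_YEAR

ranked = sorted(
    ((task, annual_hours(f, m, p)) for task, f, m, p in audit_answers),
    key=lambda item: item[1],
    reverse=True,
)

for task, hours in ranked:
    print(f"{task:32s} {hours:>8,.0f} hours/year")
```

The ranking usually surprises people: the tasks that feel most annoying are rarely the ones consuming the most aggregate hours.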
Sizing the ROI Without Fooling Yourself

The 1.5 million hour number is real, but the McKinsey report itself notes it came from a narrow pilot under controlled conditions. The same report acknowledges that "realising the full potential will require investments in data, talent, and change management." That's the actual work, not a footnote.
Independent research gives a more grounded baseline. An NBER field experiment with 5,179 call-centre agents found AI boosted productivity 14% for less experienced workers but only 3% for experts. The org-level gain depends entirely on who holds the bottleneck. A separate McKinsey analysis found that pilots showing 20-50% gains in controlled conditions typically settle at 10-20% firm-wide after year one, once integration friction sets in. Neither of those numbers is discouraging. They're just the honest range to plan against.
A workable ROI model for a search optimisation use case has four inputs:
- Baseline hours: how many hours per week the target task currently consumes across the team
- Realistic reduction rate: use 15-20% as your planning assumption, not the headline pilot number
- Loaded hourly cost: salary plus benefits, not just salary
- Deployment cost: include setup, integration, prompt engineering, and at least six months of maintenance
The ratio that actually matters is net hours recovered versus total cost of recovery. A 1.5 million hour gain at a bank running a custom LLM deployment with a reported cost base of over $10 million looks very different from a 500-person professional services firm using an off-the-shelf AI tool on a $30,000 annual contract. Both can be good investments. They require different break-even calculations.
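To see how differently those two deployments break even, here is a minimal sketch of the four-input model. The scenario figures are illustrative assumptions shaped loosely like the examples above, not numbers from the McKinsey report:

```python
# Four-input ROI model: baseline hours, reduction rate, loaded cost, deployment cost.
# All figures below are illustrative assumptions, not reported numbers.

def roi_model(baseline_hours_per_year, reduction_rate, loaded_hourly_cost,
              deployment_cost_year_one):
    """Return net hours recovered, gross and net value, and cost per hour recovered."""
    hours_recovered = baseline_hours_per_year * reduction_rate
    gross_value = hours_recovered * loaded_hourly_cost
    net_value = gross_value - deployment_cost_year_one
    cost_per_hour = deployment_cost_year_one / hours_recovered
    return hours_recovered, gross_value, net_value, cost_per_hour

scenarios = {
    # name: (baseline hours/yr, planning reduction, loaded $/hr, year-one cost)
    "custom LLM, large bank":              (4_400_000, 0.20, 85, 10_000_000),
    "off-the-shelf tool, 500-person firm": (40_000,    0.20, 95, 30_000),
}

for name, inputs in scenarios.items():
    hours, gross, net, per_hour = roi_model(*inputs)
    print(f"{name}: {hours:,.0f} hrs recovered, gross ${gross:,.0f}, "
          f"net ${net:,.0f}, ${per_hour:,.2f} per hour recovered")
```

Run with these placeholder inputs, both scenarios clear break-even comfortably, but the cost per hour recovered differs by a factor of three, which is exactly the kind of difference the ratio is meant to expose.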
Where Search Optimisation Pays Off Most
Across the research, the use cases with the best real-world ROI on search and synthesis share three characteristics: high task frequency, structured input data, and a downstream action that requires the retrieved information in a standard format. Customer support triaging, internal knowledge base queries, competitive intelligence summaries, contract review pre-screening, and financial reporting prep all fit this pattern. Open-ended research tasks, creative work, and anything requiring significant judgement calls do not.
The Bank for International Settlements analysis of enterprise AI pilots found that structured, high-volume tasks succeed in roughly 60% of implementations, while unstructured or variable tasks fail at similar rates. That asymmetry is your filter. If the task you're targeting requires a human to make a materially different judgement each time, AI will help at the margins. If it requires the same retrieval logic applied to new inputs, AI will handle most of it.
High frequency plus structured inputs plus standard output format is the profile of an AI-ready workflow.
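If it helps to apply that filter consistently across candidates, the profile reduces to a short screen. A sketch, with thresholds that are judgment calls rather than benchmarks from the cited research:

```python
# Screen a candidate task against the AI-ready workflow profile described above.
# The frequency threshold is a judgment call, not a figure from the research.

def ai_ready(frequency_per_week, inputs_are_structured,
             output_format_is_standard, needs_fresh_judgement_each_time):
    """Return a rough verdict on fit for a search-optimisation pilot."""
    if needs_fresh_judgement_each_time:
        return "poor: AI will help only at the margins"
    if frequency_per_week >= 1 and inputs_are_structured and output_format_is_standard:
        return "strong: high frequency, structured inputs, standard output"
    return "marginal: pilot only with a tightly limited scope"

print(ai_ready(20, True, True, False))    # e.g. support triage -> strong
print(ai_ready(0.2, False, False, True))  # e.g. open-ended research -> poor
```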
One honest limitation worth naming: the gains from search optimisation tend to concentrate in a small number of roles. The NBER research makes this explicit: less experienced workers handling high-volume, structured queries capture most of the productivity uplift. Senior staff with expert knowledge and varied tasks see far smaller gains. So before scoping a pilot, identify which roles hold the bottleneck. The answer shapes both the ROI projection and the change management conversation you'll need to have.
verdict
The 1.5 million hour case is real, but treating it as a benchmark rather than a ceiling is where most AI strategies go wrong. The organisations that capture comparable gains will be the ones that run honest workflow audits first, pick the narrowest high-frequency target, and plan for 15-20% real-world gains instead of the pilot number. The opportunity is there. The gap between finding it and inflating it is just discipline.
What to Do This Week
Pick one team. Ask them the three retrieval questions above. Time-box the audit to two hours. You're looking for a task that happens at least weekly, involves finding or summarising existing information, and produces an output that looks broadly similar each time. If you find one, map the current time cost across the whole team annually. Then model it at 15% reduction with a six-month payback requirement. If it clears that bar, you have a pilot worth running. If it doesn't, you've eliminated a distraction and can look at the next candidate. That's the whole process. Start there.
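For the modelling step at the end, the gate is one calculation. A worked sketch with placeholder numbers; substitute what your audit actually finds:

```python
# Week-one pilot gate: 15% reduction, six-month payback requirement.
# All inputs are hypothetical placeholders for your own audit findings.

team_annual_task_hours = 6_000   # mapped time cost across the whole team
reduction_rate = 0.15            # conservative planning assumption
loaded_hourly_cost = 90          # salary plus benefits, per hour
pilot_cost = 25_000              # setup, integration, six months of maintenance

monthly_value = team_annual_task_hours * reduction_rate * loaded_hourly_cost / 12
payback_months = pilot_cost / monthly_value

print(f"Payback: {payback_months:.1f} months")
print("Run the pilot" if payback_months <= 6 else "Look at the next candidate")
```

With these placeholder inputs the pilot pays back in under four months and clears the bar; halve the team's task hours and it doesn't. The point of the gate is that either answer is useful.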

Alec Chambers
Founder, ToolsForHumans
I've been building things online since I was 12 — 18 years of shipping products, picking tools, and finding out what actually works after the launch noise dies down. ToolsForHumans started as the research I kept needing: what practitioners are still recommending months after launch, and whether the search data backs it up. Since 2022 it's helped 600,000+ people find software that actually fits how they work.