Even the most advanced AI models still fail roughly half of the requirements for professional investment banking work.
April 15, 2026
Original Paper
BankerToolBench: Evaluating AI Agents in End-to-End Investment Banking Workflows
arXiv · 2604.11304
The Takeaway
BankerToolBench shows that, despite the hype, frontier models (like GPT-5.4) produce 0% client-ready output for complex finance workflows. They fail nearly half of the professional rubric criteria, stumbling on high-stakes tasks that demand perfect precision. This is a cold bucket of water for the 'AI agents will replace white-collar jobs' narrative. It highlights a massive 'utility gap': models that are impressive in demos but unusable in actual professional work. For finance firms, it means AI is still an 'assistant's assistant,' not a replacement for analysts.
From the abstract
Existing AI benchmarks lack the fidelity to assess economically meaningful progress on professional workflows. To evaluate frontier AI agents in a high-value, labor-intensive profession, we introduce BankerToolBench (BTB): an open-source benchmark of end-to-end analytical workflows routinely performed by junior investment bankers. To develop an ecologically valid benchmark grounded in representative work environments, we collaborated with 502 investment bankers from leading firms. BTB requires a