ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

Hugging Face Blog, May 27, 2026

AI Summary: plain English for professionals

# AI Still Struggles With Real Office Work

Even the most advanced AI systems fail more than half of the everyday IT tasks in ITBench-AA, a new benchmark from Artificial Analysis and IBM. The benchmark tests whether AI agents can solve common workplace IT problems, such as resetting passwords or managing user accounts, the kind of routine work a competent tech worker would be expected to handle. The best-performing frontier models scored below 50%, meaning they failed more often than they succeeded. The result points to a significant gap between what companies are being promised about AI's capabilities and what these systems can actually accomplish in real business environments.

