Frontier AI Models Fail Basic Enterprise IT Tasks: ITBench-AA Benchmark Shows 47% Peak Score in 2026
The first benchmark for agentic enterprise IT tasks reveals an uncomfortable truth: even the best AI model scores only 47% on real-world site reliability engineering (SRE) work. ITBench-AA, developed by Artificial Analysis and IBM, shows that frontier models struggle with Kubernetes incident diagnosis despite excelling at other benchmarks.
Dr. Sana Okafor