Most developers default to one model, many prompts. A finance simulation that runs each agent on a different lab's small model proves the opposite approach produces richer, more unpredictable behavior—and the engineering challenges aren't where you'd expect.
Nvidia released Nemotron 3 Ultra, a 550-billion-parameter open-weight model optimized for long-running agents. While it's the fastest among U.S. open-weight models and promises 30% cost savings, it still lags behind Chinese competitors and GPT-5.5 on core benchmarks.
Global IT services firm Endava has embedded AI agents throughout its software development process, from requirements gathering to deployment. The move signals a shift from AI as a coding assistant to AI as an autonomous workflow participant.
Everyone's focused on GPUs for AI, but the shift to autonomous agents is quietly turning compute on its head. The bottleneck isn't model inference anymore; it's orchestration, sandboxing, and tool execution, all of which are CPU workloads.
At Build 2026, Microsoft doubled down on a contrarian thesis: enterprise AI needs organizational memory more than bigger models. The company launched HorizonDB, GPU-accelerated warehousing, and made Fabric IQ generally available to give agents the context layer they're missing.
OpenAI is repositioning Codex beyond developers, adding Sites for shareable interactive dashboards, extended Annotations for documents, and curated plugins for sales, finance, and legal teams. With 1 million knowledge workers already using the platform weekly, this marks a direct challenge to Anthropic's Claude Cowork.
The enterprise software consensus on AI agents goes no further than one point: context matters. Hyland CEO Jitesh Ghai makes the contrarian bet that you get that context by preserving existing systems, not tearing them down. That is a direct challenge to the vendor playbook pushing cloud migration and process redesign.
The enterprise AI adoption crisis isn't a model quality problem—it's an architecture problem. IBM's production data from mainframe modernization to compliance automation shows that intelligent agent logic reduces token consumption by 15-30× while improving performance.
Replit is embedding Visa's payment infrastructure directly into its development platform, giving AI agents a cryptographic identity layer and native transaction capabilities. The partnership signals a shift from bolting payments onto finished products to building commerce into agents from day one.
Google unveiled Gemini Omni, a multimodal model that generates and edits video through natural language, alongside Gemini 3.5 Flash, designed for complex agentic workflows. Both models are rolling out to consumers and developers with significant implications for content creation and enterprise automation.
Snyk entered the AI pentesting market with Evo Continuous Offensive Security, targeting the 350-day gap left by traditional security testing. The platform uses LLM reasoning for context-dependent flaws while reserving deterministic scanning for known vulnerability classes.
The first benchmark for agentic enterprise IT tasks reveals an uncomfortable truth: the best AI models score below 50% on real-world site reliability engineering tasks. ITBench-AA, developed by Artificial Analysis and IBM, shows frontier models struggle with Kubernetes incident diagnosis despite excelling at other benchmarks.
When three major AI labs ship the same product within six weeks, that product stops being a differentiator. The managed agent runtime has become table stakes, and the real battle is now being fought over a file format most developers don't even think about yet.
Most AI agents forget everything between interactions. Learn how to build persistent memory into your agents using conversation buffers, vector stores, and retrieval patterns—so your agent remembers users across sessions.
Google I/O 2026's Dialogues stage brought together CEO Sundar Pichai, DeepMind's Demis Hassabis, and quantum computing experts to discuss proactive AI agents, quantum-AI convergence, and AI's expanding role in science and creativity. The sessions signal Google's push beyond chatbots into autonomous agents and quantum-accelerated AI research.
Learn to build a ReAct (Reasoning + Acting) agent that thinks through problems step-by-step using Claude's tool calling capabilities. This tutorial walks you through creating an agent that can use web search, perform calculations, and read files to answer complex questions.
A hands-on guide to building reliable AI agents using modern frameworks. Covers architecture patterns, tool use, memory systems, and deployment strategies that work in production.
OpenAI just released a Chrome extension that connects Codex directly to your browser, allowing agents to work across authenticated sessions and multiple tabs without commandeering your desktop. This moves AI agents closer to where modern work actually happens.
OpenAI just released GPT-Realtime-2, bringing GPT-5-level reasoning to voice interactions with a 4x larger context window. The update includes two specialized models for translation and transcription, signaling a push toward voice-first AI applications.
OpenAI has released Symphony, an open-source specification designed to standardize how AI agents coordinate and communicate. The move signals a strategic shift toward interoperability in multi-agent systems, potentially reshaping how developers build complex AI workflows.