NVIDIA's research shows that synthetic training data structured around task families—not raw scale—drives targeted capability gains. Their approach improved scientific reasoning by 11 points while keeping math and code performance stable.
DharmaOCR's methodology shows that Direct Preference Optimization isn't just for chat alignment. Applied after supervised fine-tuning, DPO reduced text degeneration by an average of 59.4% across five vision-language model families, with no exceptions.
Single-turn safety benchmarks don't predict real-world vulnerability. Cisco's testing of 15 frontier models reveals that iterative attacks succeed up to 88% of the time—even against models that look secure in standard evaluations.
The first benchmark for agentic enterprise IT tasks reveals an uncomfortable truth: the best AI models score below 50% on real-world site reliability engineering tasks. ITBench-AA, developed by Artificial Analysis and IBM, shows frontier models struggle with Kubernetes incident diagnosis despite excelling at other benchmarks.
NVIDIA's new diffusion language models generate multiple tokens in parallel, hitting 865 tokens/second on B200 hardware, roughly 6× faster than traditional autoregressive models. Unlike GPT-style generation, which produces one token at a time, these models draft and refine text blocks simultaneously while maintaining accuracy.
A structural failure mode in autoregressive language models causes fewer than 3% of requests to consume nearly half of total inference time. New research from DharmaOCR shows the problem is built into training objectives—and proposes a fix grounded in the training distribution itself.
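To see how so small a slice of traffic can dominate, the sketch below computes the share of total inference time consumed by the slowest fraction of requests. The workload is hypothetical and chosen only to illustrate the arithmetic; the durations are not taken from the research, and `tail_time_share` is an illustrative helper name.

```python
def tail_time_share(durations, fraction):
    """Share of total time consumed by the slowest `fraction` of requests."""
    if not durations:
        raise ValueError("durations must be non-empty")
    ordered = sorted(durations, reverse=True)
    # Number of requests in the tail; always count at least one.
    k = max(1, round(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical workload: 97 requests finish in 1 s each, while 3
# degenerate requests run until a length limit and take 30 s each.
durations = [1.0] * 97 + [30.0] * 3
share = tail_time_share(durations, 0.03)
print(f"Slowest 3% of requests use {share:.1%} of total time")
```

In this made-up case the three runaway requests account for 90 of 187 total seconds, a little under half, even though they make up only 3% of the traffic.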
GPT-5.2 identified a pattern human physicists missed, conjectured a formula for gluon scattering amplitudes, then proved it—marking a shift from AI as research tool to AI as research partner. The result challenges a decades-old assumption about particle interactions.