#language models

2 articles tagged with "language models"

NVIDIA's Diffusion Language Models Hit 865 Tokens/Second — 6× Faster Than GPT-Style Generation

NVIDIA's new diffusion language models generate multiple tokens in parallel, hitting 865 tokens/second on B200 hardware, roughly 6× faster than traditional autoregressive models. Unlike GPT-style generation, which produces one token at a time, these models draft and refine whole blocks of text simultaneously while maintaining accuracy.

Dr. Sana Okafor May 23, 2026

research 9 min read

Text Degeneration in LLMs: The Hidden Production Cost Inflating Inference by 42%

A structural failure mode in autoregressive language models causes fewer than 3% of requests to consume nearly half of total inference time. New research from DharmaOCR shows the problem is built into training objectives—and proposes a fix grounded in the training distribution itself.

Dr. Sana Okafor May 22, 2026