NVIDIA's Diffusion Language Models Hit 865 Tokens/Second — 6× Faster Than GPT-Style Generation
NVIDIA's new diffusion language models generate multiple tokens in parallel, reaching 865 tokens/second on B200 hardware, roughly 6× faster than traditional autoregressive models. Where GPT-style generation produces one token at a time, these models draft and refine whole blocks of text at once while maintaining accuracy.
Dr. Sana Okafor