🤖 Large Language Models

LLM Inference's Power Lie: 99.8% Wasted on Data Hauling, Not Crunching Numbers

We all figured memory bandwidth or VRAM capacity would cap LLMs first. Nope. Power is the brick wall, and most of it is burned shuffling weights around, not doing math.

[Figure: Pie chart of LLM inference power on an NVIDIA H100: 99.8% data movement vs. 0.2% compute]
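To see why the split is so lopsided, here's a back-of-envelope sketch in Python. Every constant is an illustrative assumption (per-byte and per-FLOP energy costs vary by process node and memory type), not a measured H100 figure, and the 70B-parameter model is hypothetical:

```python
# Back-of-envelope: energy spent moving weights vs. doing math, per decoded token.
# All constants are illustrative assumptions, NOT measured H100 numbers.

N_PARAMS = 70e9          # hypothetical model size: 70B parameters
BYTES_PER_PARAM = 2      # FP16 weights
PJ_PER_BYTE_HBM = 30.0   # assumed HBM read energy (picojoules per byte)
PJ_PER_FLOP = 0.5        # assumed FP16 arithmetic energy (picojoules per FLOP)

# Batch-1 decode is memory-bound: every weight is read once per token,
# and each parameter contributes roughly 2 FLOPs (multiply + accumulate).
bytes_moved = N_PARAMS * BYTES_PER_PARAM
flops = 2 * N_PARAMS

e_move = bytes_moved * PJ_PER_BYTE_HBM  # energy hauling weights from HBM
e_math = flops * PJ_PER_FLOP            # energy doing the arithmetic
total = e_move + e_math

print(f"data movement: {e_move / total:.1%}")  # ~98.4% with these assumptions
print(f"compute:       {e_math / total:.1%}")  # ~1.6%
```

Even this crude model puts movement near 98%; counting on-chip SRAM and register traffic as data movement too is what plausibly pushes the split toward the 99.8% headline.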

⚡ Key Takeaways

  • 99.8% of LLM inference power goes to moving data, not computing.
  • Dennard scaling is dead, so performance gains now cost TDP: 1000W+ GPUs are becoming standard (see the sketch after this list).
  • Datacenter power budgets now cap scaling; expect nuclear plants as neighbors.
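On the Dennard point: dynamic switching power is roughly P = C · V² · f. As long as supply voltage dropped with every node shrink, extra transistors came nearly free power-wise; with V now stuck near its floor, extra transistors cost extra watts. A minimal sketch with made-up numbers (not real GPU specs):

```python
# Dynamic CMOS power: P = C * V^2 * f. Illustrative numbers, not real GPU specs.

def dynamic_power_w(c_nf: float, v_volts: float, f_ghz: float) -> float:
    """Switching power in watts, with capacitance in nF and frequency in GHz."""
    return (c_nf * 1e-9) * v_volts**2 * (f_ghz * 1e9)

baseline = dynamic_power_w(c_nf=100, v_volts=1.0, f_ghz=1.5)  # ~150 W

# Dennard era: a node shrink doubled transistors (~2x switched capacitance)
# but also cut voltage ~30%, so total power stayed roughly flat.
dennard = dynamic_power_w(c_nf=200, v_volts=0.7, f_ghz=1.5)   # ~147 W

# Post-Dennard: voltage can't drop further, so doubling transistors
# roughly doubles power. Hence the march toward 1000W+ accelerators.
post = dynamic_power_w(c_nf=200, v_volts=1.0, f_ghz=1.5)      # ~300 W

print(f"baseline {baseline:.0f} W, Dennard shrink {dennard:.0f} W, "
      f"post-Dennard {post:.0f} W")
```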
Published by theAIcatchup · Community-driven. Code-first.


Originally reported by Dev.to
