AI & Machine Learning
MXFP8 MoE Training: 1.3x Speedup, But Skepticism Lingers
MXFP8 just sped up MoE training by 30% on a 256-GPU GB200 cluster. Quality on par with BF16? So the loss curves say. Let's poke some holes in the hype anyway.
theAIcatchup
Apr 07, 2026
4 min read
The 60-Second TL;DR
- MXFP8 yields 30.2% faster Llama 4 Scout training on 256 GB200 GPUs while matching BF16 loss curves (see the quantization sketch after this list).
- Equivalent convergence was shown at small batch sizes; training at real-world scale remains untested.
- TorchAO primitives make reproduction via TorchTitan straightforward, but the layers excluded from quantization hint at precision pitfalls (see the layer-exclusion sketch below).
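
What MXFP8 actually is: per the OCP Microscaling spec, every block of 32 values shares one power-of-two (E8M0) scale, and the scaled values are stored in FP8 E4M3. The snippet below is a minimal conceptual sketch of that quantize/dequantize round trip in plain PyTorch; it is not the TorchAO kernels, and the scale rule follows my reading of the spec.

```python
import torch

BLOCK = 32          # MX block size from the OCP Microscaling spec
E4M3_MAX = 448.0    # largest finite value of torch.float8_e4m3fn

def mxfp8_quantize(x: torch.Tensor):
    """Quantize a 1-D tensor (length divisible by 32) into MXFP8-style blocks."""
    blocks = x.float().reshape(-1, BLOCK)
    amax = blocks.abs().amax(dim=-1, keepdim=True)
    # Shared per-block scale is a power of two (E8M0): 2^(floor(log2(amax)) - 8),
    # where 8 is the exponent of the largest normal E4M3 value (448 = 1.75 * 2^8).
    scale = torch.exp2(torch.floor(torch.log2(amax.clamp(min=1e-38))) - 8)
    q = (blocks / scale).clamp(-E4M3_MAX, E4M3_MAX).to(torch.float8_e4m3fn)
    return q, scale

def mxfp8_dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct an approximation of the original values."""
    return (q.to(torch.float32) * scale).reshape(-1)

x = torch.randn(4096)
q, s = mxfp8_quantize(x)
print("max abs error:", (x - mxfp8_dequantize(q, s)).abs().max().item())
```

The per-block power-of-two scale is what keeps the format hardware-friendly on GB200: scaling is a pure exponent shift, so the 30% speedup comes almost entirely from the narrower FP8 matmuls.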
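The third bullet is the tricky part: not every layer tolerates MXFP8. In MoE stacks it is usually the expert feed-forward Linears that get quantized, while precision-sensitive pieces such as the router/gate, embeddings, and output head stay in BF16. Here is a hedged sketch of that selection pattern; the name keywords and the `to_mxfp8` callback are placeholders for illustration, not the actual TorchTitan configuration.

```python
import torch.nn as nn

# Assumed name patterns for precision-sensitive modules kept in BF16;
# the real Llama 4 Scout module names may differ.
EXCLUDE_KEYWORDS = ("router", "gate", "embed", "lm_head")

def should_quantize(name: str, module: nn.Module) -> bool:
    """Target only Linear layers whose qualified name avoids the excluded patterns."""
    return isinstance(module, nn.Linear) and not any(k in name for k in EXCLUDE_KEYWORDS)

def apply_mxfp8(model: nn.Module, to_mxfp8) -> nn.Module:
    """Walk the model and hand only eligible Linear layers to a conversion callback."""
    for name, module in model.named_modules():
        if should_quantize(name, module):
            to_mxfp8(module)  # hypothetical hook: swap weights/compute to MXFP8
    return model
```

This filter-style selection is how per-layer exclusions are typically expressed in quantization APIs, and the fact that the exclusion list exists at all is the precision pitfall the takeaway alludes to.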