How do I run Gemma 4 on Apple Silicon?

pip install rapid-mlx; rapid-mlx serve gemma-4-26b. API at localhost:8000/v1.

Gemma 4 vs Ollama: Which is faster on Mac?

Rapid-MLX wins decode (85 vs 75 tok/s on 26B). Tools broader.

Yes, native parser. Works with PydanticAI, LangChain, Aider out-of-box.

🤖 AI Dev Tools

Gemma 4 on Apple Silicon just got stupidly fast. One command, 85 tok/s, tools included – cloud services, take notes.

theAIcatchup Apr 07, 2026 4 min read

Gemma 4 hits 85 tok/s on Apple Silicon with one pip install via Rapid-MLX. 𝕏
Beats Ollama on decode speed, full tool calling for 18 model families. 𝕏
OpenAI-compatible API works with LangChain, Aider, PydanticAI – offline agents unlocked. 𝕏

Published by

Community-driven. Code-first.

#Gemma 4 #MLX inference #Rapid-MLX #apple silicon #local AI inference #local LLMs #local-llm

Get the best Open Source stories of the week in your inbox — no noise, no spam.

Originally reported by Dev.to