Local AI's Quiet Revolution: Gemma4 Fixes in llama.cpp, RTX cuBLAS Killer Bug, Whisper-Ollama UI
Your local LLM setup isn't dreaming anymore—llama.cpp just patched Gemma4's tool-calling woes. But watch out: NVIDIA's cuBLAS is choking RTX GPUs on basic math.
theAIcatchup · Apr 10, 2026 · 4 min read
⚡ Key Takeaways
llama.cpp's Gemma4 fixes unlock reliable tool calling and reasoning for local deployments.
cuBLAS MatMul bug costs RTX users 60% perf on key AI ops—driver fix imminent.
AmicoScript delivers privacy-first Whisper + Ollama for audio-to-insights workflows.
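The audio-to-insights workflow mentioned above can be sketched in a few lines of Python. This is not AmicoScript's actual API; it is a minimal illustration of the general pattern (transcribe locally with openai-whisper, then summarize with a local Ollama model), and the function names, file path, and model choice are all assumptions.

```python
def build_insight_prompt(transcript: str) -> str:
    """Wrap a raw transcript in a prompt asking a local LLM for key insights."""
    return (
        "Summarize the key insights from this transcript as bullet points:\n\n"
        + transcript.strip()
    )


def transcribe(audio_path: str) -> str:
    """Transcribe audio locally with openai-whisper (nothing leaves the machine)."""
    import whisper  # pip install openai-whisper

    model = whisper.load_model("base")  # model size is illustrative
    return model.transcribe(audio_path)["text"]


def summarize(transcript: str, model: str = "gemma3") -> str:
    """Send the transcript to a locally running Ollama model for insights."""
    import ollama  # pip install ollama; assumes an Ollama server is running

    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": build_insight_prompt(transcript)}],
    )
    return response["message"]["content"]


# Usage (assumes whisper and ollama are installed and the Ollama server is up):
#   text = transcribe("meeting.mp3")   # "meeting.mp3" is a hypothetical input
#   print(summarize(text))
```

Keeping both the speech-to-text and summarization steps on local models is what makes this kind of pipeline privacy-first: the audio and transcript never touch a remote API.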