🤖 Large Language Models

3.4GB AI Model Crushes 25GB Giants in Tool-Calling Tests

Forget parameter counts. A puny 3.4GB model just schooled the heavyweights in function calling. Here's the data shaking up LLM deployment.

[Leaderboard chart: Qwen3.5 4B leading function-calling accuracy over larger LLMs]

⚡ Key Takeaways

  • The 3.4GB Qwen3.5 4B tops function calling at 97.5%, beating 25GB models.
  • Size doesn't predict accuracy; tuning for structured output wins.
  • Local deployment on 8GB GPUs is now viable for agents — use llama.cpp + GBNF.
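
The llama.cpp route above works by constraining decoding with a GBNF grammar, so the model can only emit tokens that form valid tool-call JSON. A minimal sketch of such a grammar — the two-field `{"name": ..., "arguments": ...}` shape and the rule names are illustrative assumptions, not taken from the benchmark:

```gbnf
# Restrict output to a single JSON tool call: {"name": "...", "arguments": {...}}
# Schema shape and rule names are illustrative, not from the benchmark.
root      ::= "{" ws "\"name\"" ws ":" ws string "," ws "\"arguments\"" ws ":" ws object "}"
object    ::= "{" ws ( pair ("," ws pair)* )? ws "}"
pair      ::= string ws ":" ws value
value     ::= string | number | object | "true" | "false" | "null"
string    ::= "\"" [^"\\]* "\""
number    ::= "-"? [0-9]+ ("." [0-9]+)?
ws        ::= [ \t\n]*
```

Saved as, say, `toolcall.gbnf`, it can be passed to llama.cpp with `--grammar-file toolcall.gbnf`; sampling then rejects any token that would break the grammar, which is why small, well-tuned models can stay reliable on structured output.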
Published by theAIcatchup · Community-driven. Code-first.


Originally reported by Dev.to
