Google DeepMind Blog · Open Source · 2026-04-04

Google Launches Gemma 4: Open-Weight Multimodal Models That Run on a Single GPU and Rank Third Globally

Google released Gemma 4 on April 2, 2026 — four open-weight models under Apache 2.0 that handle text, images, video, and audio natively. The 31B Dense model currently ranks #3 on the open model global leaderboard. All four sizes run on consumer and prosumer hardware.


Google's Gemma 4 is the first open-weight model family to credibly compete with frontier closed models on reasoning and multimodal benchmarks while remaining runnable on a single GPU.

The family ships in four sizes: Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts (MoE), and 31B Dense. Every model in the family processes images and video natively. The E2B and E4B edge models also handle audio input directly — speech recognition without a separate transcription step — and support a 128K context window. The 26B and 31B models extend to 256K context.
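The single-GPU claim can be sanity-checked with back-of-the-envelope memory math. A minimal sketch, assuming the marketing sizes map directly to raw parameter counts (the "Effective" prefix on E2B/E4B suggests they may not) and using standard bytes-per-parameter figures for common quantization levels:

```python
def vram_gb(params_billion: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    """Weights-only VRAM footprint in GB, with ~20% headroom for KV cache
    and activations. 1e9 params * bytes / 1e9 bytes-per-GB cancels out."""
    return params_billion * bytes_per_param * overhead

# Nominal sizes from the announcement; fp16 = 2 bytes/param, int4 = 0.5.
MODELS = {"E2B": 2, "E4B": 4, "26B MoE": 26, "31B Dense": 31}
for name, size in MODELS.items():
    print(f"{name}: ~{vram_gb(size, 0.5):.1f} GB at int4, "
          f"~{vram_gb(size, 2):.1f} GB at fp16")
```

Under these assumptions the 31B Dense model needs roughly 19 GB at int4, which fits a single 24 GB prosumer GPU, while the edge models fit comfortably in consumer cards even at fp16.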

The 31B Dense model currently sits at #3 on the global open model arena leaderboard. The 26B MoE occupies #6. These rankings put Gemma 4 ahead of every previous Gemma generation and ahead of most competing open-weight families at comparable sizes.

Built from the same research stack as Gemini 3, Gemma 4 natively supports function-calling with structured JSON output — the primitive required for reliable agentic workflows. Google trained the entire family on 140+ languages.
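The agentic primitive here is that the model emits a machine-parseable tool call rather than free text. A minimal sketch of the receiving side, assuming an OpenAI-style function-calling convention (Gemma 4's exact wire format is not specified in the announcement, and `get_weather` is a hypothetical tool):

```python
import json

# Hypothetical tool schema in the common JSON Schema style.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def parse_tool_call(raw: str, tool: dict) -> dict:
    """Parse a model-emitted JSON tool call and check the schema's
    required arguments are present before dispatching."""
    call = json.loads(raw)
    args = call.get("arguments", {})
    missing = [k for k in tool["parameters"]["required"] if k not in args]
    if call.get("name") != tool["name"] or missing:
        raise ValueError(f"bad tool call {call!r}, missing {missing}")
    return args

# A well-formed structured response from the model:
args = parse_tool_call(
    '{"name": "get_weather", "arguments": {"city": "Zurich"}}', WEATHER_TOOL
)
print(args)  # → {'city': 'Zurich'}
```

Structured output matters because this validation step either succeeds deterministically or fails loudly; there is no regex scraping of prose.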

The license is Apache 2.0, which permits commercial use without per-unit fees or deployment restrictions. The models are available on Google Cloud, Hugging Face, Ollama, and Kaggle from day one.

The significance: open-weight models reaching #3 on the global leaderboard — behind only the largest closed proprietary models — changes the calculus for any team currently paying per-token API costs for reasoning tasks. The efficiency story is real: the 26B MoE activates only a fraction of parameters per token, making inference cost genuinely competitive with smaller dense models.
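The MoE cost argument reduces to simple arithmetic: per-token compute scales with *active* parameters, not total. The announcement does not state how many of the 26B MoE's parameters are active per token, so the figure below is purely illustrative:

```python
def moe_compute_ratio(total_b: float, active_b: float) -> float:
    """Fraction of a MoE model's parameters exercised per token;
    per-token FLOPs scale roughly with this fraction."""
    return active_b / total_b

# Hypothetical: if 6B of the 26B parameters were active per token,
# per-token compute would be comparable to a 6B dense model, while
# total capacity (and VRAM footprint) stays at 26B.
ratio = moe_compute_ratio(26, 6)
print(f"{ratio:.0%} of parameters active per token")
```

This is the sense in which MoE inference can be "genuinely competitive with smaller dense models": you pay dense-model compute per token while retaining the quality of a much larger parameter pool, at the cost of holding all experts in memory.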
