Ship or Skip — Daily AI Tool Reviews

LM Studio 0.4.0, released April 5, 2026, is a fundamental architectural shift for the most-used local LLM runner. The update introduces a headless CLI that lets developers start the inference engine as a background daemon with no display required, which opens local models up to CI/CD pipelines, Docker containers, cloud VMs, and scheduled tasks. For a tool that previously required a running desktop GUI to serve models, this is a watershed moment.

The CLI ships as `lms` (or `llmster` for those who prefer it), and handles everything the GUI did: downloading and managing models, starting inference servers, switching between models, and configuring hardware settings. The daemon mode persists through reboots and can be configured to auto-start specific models, making local LLM infrastructure operationally similar to Ollama but with LM Studio's broader model compatibility and richer hardware support.

Three features ship alongside the CLI: continuous batching (multiple simultaneous requests handled by one model instance, improving throughput for multi-user or multi-agent workloads), a stateful `/v1/chat` REST API that preserves conversation context between API calls, and an interactive terminal chat mode via `lms chat` that supports streaming and system prompt injection.
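To make the throughput claim for continuous batching concrete, here is a toy scheduling model. It is a sketch under loud assumptions, not LM Studio's scheduler: every decode step is assumed to cost the same no matter how many requests share it, and the function names and request lengths are invented for illustration. Static batching waits for the longest request in each group before admitting the next group, while continuous batching hands a freed slot to the next queued request right away.

```python
import heapq

def static_batch_steps(lengths, batch_size):
    """Decode steps when requests run in fixed groups.

    Each group occupies the model until its longest request finishes.
    """
    return sum(
        max(lengths[i:i + batch_size])
        for i in range(0, len(lengths), batch_size)
    )

def continuous_batch_steps(lengths, batch_size):
    """Decode steps when a freed slot is refilled immediately (FIFO queue)."""
    # Min-heap of the step at which each slot next becomes free.
    slots = [0] * batch_size
    heapq.heapify(slots)
    finish = 0
    for n in lengths:
        start = heapq.heappop(slots)   # earliest available slot
        end = start + n
        heapq.heappush(slots, end)
        finish = max(finish, end)
    return finish

# Hypothetical workload: output lengths in tokens, two slots available.
requests = [10, 100, 10, 10, 100, 10]
print(static_batch_steps(requests, 2))      # 210
print(continuous_batch_steps(requests, 2))  # 130
```

In this made-up workload, short requests no longer wait behind long ones, which is the effect that matters for multi-user and multi-agent traffic.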

The stateful API deserves particular attention. Most local inference servers, and indeed most cloud LLM APIs, are stateless: you send the full conversation history on every request, and the server has no memory between calls. LM Studio 0.4.0's stateful mode lets the server maintain context, so an agent loop sends only the new message each turn instead of re-sending thousands of tokens of history.
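A quick back-of-the-envelope calculation shows how fast re-sent history adds up. The sketch below counts only the tokens a client transmits per turn. The per-turn sizes are hypothetical, the function names are invented for illustration, and nothing here calls LM Studio or models how the server stores context.

```python
def stateless_tokens_sent(turns, user_tokens, reply_tokens):
    """Total tokens a client sends when every request carries full history."""
    total = 0
    history = 0  # tokens accumulated from earlier turns
    for _ in range(turns):
        total += history + user_tokens          # old history plus new message
        history += user_tokens + reply_tokens   # this turn joins the history
    return total

def stateful_tokens_sent(turns, user_tokens):
    """Total tokens a client sends when the server keeps the context."""
    return turns * user_tokens

# Hypothetical agent loop: 10 turns, 200-token messages, 300-token replies.
print(stateless_tokens_sent(10, 200, 300))  # 24500
print(stateful_tokens_sent(10, 200))        # 2000
```

Stateless traffic grows quadratically with the number of turns, while stateful traffic grows linearly, so the gap widens the longer an agent loop runs.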

The Hacker News thread focused heavily on running Gemma 4 under the new CLI — the model's efficiency profile makes it a natural match for 0.4.0's parallel request handling. The post reached 216 points, suggesting the update resonated with a developer audience that has been waiting for local LLMs to become first-class server infrastructure rather than desktop toys.

LM Studio Goes Headless: Local LLMs Can Now Run as a Server Daemon Without a GUI

Panel Takes