Hacker News · Launch · 2026-04-06

LM Studio Goes Headless: Local LLMs Can Now Run as a Server Daemon Without a GUI

LM Studio 0.4.0 ships a headless CLI that separates the inference engine from the GUI, enabling local language models to run as background server daemons in CI, Docker, and remote environments. Combined with a new stateful REST API and continuous batching, it's the most significant update to the local LLM stack in 2026.

LM Studio 0.4.0, released April 5, 2026, is a fundamental architectural shift for the most-used local LLM runner. The update introduces a headless CLI that lets developers start the inference engine as a background daemon — no display required — unlocking local models for CI/CD pipelines, Docker containers, cloud VMs, and scheduled tasks. For a tool that previously required a running desktop GUI to serve models, this is a watershed moment.

The CLI ships as `lms` (or `llmster` for those who prefer it), and handles everything the GUI did: downloading and managing models, starting inference servers, switching between models, and configuring hardware settings. The daemon mode persists through reboots and can be configured to auto-start specific models, making local LLM infrastructure operationally similar to Ollama but with LM Studio's broader model compatibility and richer hardware support.

Three features ship alongside the CLI: continuous batching (multiple simultaneous requests handled by one model instance, improving throughput for multi-user or multi-agent workloads), a stateful `/v1/chat` REST API that preserves conversation context between API calls, and an interactive terminal chat mode via `lms chat` that supports streaming and system prompt injection.
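To make the throughput claim concrete, here is a toy scheduling model in Python. It is not LM Studio's implementation; the function names, the slot count, and the idea of measuring work in "decode steps" are all assumptions made for illustration. It compares static batching, where a batch is held until its longest request finishes, with continuous batching, where a waiting request takes over a slot as soon as one frees up.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when requests run in fixed groups.

    Each group occupies the model until its longest request is done,
    so short requests leave their slots idle while they wait.
    """
    total = 0
    for i in range(0, len(lengths), batch_size):
        total += max(lengths[i:i + batch_size])
    return total


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when free slots are refilled before every step.

    Assumes each entry in `lengths` is a positive number of tokens
    still to be generated for that request.
    """
    queue = deque(lengths)
    active = []  # remaining tokens for each request currently in a slot
    steps = 0
    while queue or active:
        # Admit waiting requests into any free slots.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        # One decode step: every active request emits one token,
        # and requests that are finished leave the batch.
        active = [remaining - 1 for remaining in active if remaining - 1 > 0]
        steps += 1
    return steps


# Hypothetical workload: one long request and three short ones, two slots.
lengths = [8, 2, 2, 2]
print(static_batching_steps(lengths, batch_size=2))      # 10
print(continuous_batching_steps(lengths, batch_size=2))  # 8
```

In this made-up workload the long request no longer blocks the queue: the short requests cycle through the second slot while it runs, so the whole set finishes in 8 steps instead of 10. Real gains depend on the engine, the hardware, and the request mix.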

The stateful API deserves particular attention. Most local inference servers, and indeed most cloud LLM APIs, are stateless: you send the full conversation history on every request, and the server has no memory between calls. LM Studio 0.4.0's stateful mode lets the server maintain context, so an agent loop sends only the new message on each turn instead of re-sending thousands of tokens of history.
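A back-of-the-envelope sketch in Python shows how quickly the re-sent history adds up. The turn sizes and function names below are hypothetical, and the model counts only tokens the client transmits; whether the server also avoids re-processing the stored context depends on the implementation and is not modeled here.

```python
def stateless_tokens_sent(turns):
    """Tokens uploaded when every request carries the full history.

    `turns` is a list of (user_tokens, reply_tokens) pairs.
    """
    total = 0
    history = 0  # tokens accumulated in the conversation so far
    for user_tokens, reply_tokens in turns:
        total += history + user_tokens       # prior history plus the new message
        history += user_tokens + reply_tokens
    return total


def stateful_tokens_sent(turns):
    """Tokens uploaded when the server keeps the history itself."""
    return sum(user_tokens for user_tokens, _ in turns)


# Hypothetical agent loop: 10 turns, 200-token prompts, 300-token replies.
turns = [(200, 300)] * 10
print(stateless_tokens_sent(turns))  # 24500
print(stateful_tokens_sent(turns))   # 2000
```

Under these assumed numbers the stateless client uploads 24,500 tokens over ten turns against 2,000 for the stateful one, and the gap widens quadratically as the conversation grows.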

The Hacker News thread focused heavily on running Gemma 4 under the new CLI — the model's efficiency profile makes it a natural match for 0.4.0's parallel request handling. The post reached 216 points, suggesting the update resonated with a developer audience that has been waiting for local LLMs to become first-class server infrastructure rather than desktop toys.

Panel Takes

The Builder

Developer Perspective

The stateful /v1/chat API is the feature I didn't know I needed. Cutting context re-transmission from agent loops alone will meaningfully reduce local inference time. LM Studio 0.4.0 is the release that moves local models from demo to infrastructure.

The Skeptic

Reality Check

Ollama has been headless and daemon-friendly since 2024, and the broader ecosystem has built around it. LM Studio is playing catch-up on infrastructure while Ollama leads. The stateful API is nice but it creates lock-in — you can't easily switch inference backends if state is held server-side.

The Futurist

Big Picture

Headless local inference is the prerequisite for personal AI agents that are truly yours. When your models run as daemons on your own hardware with stateful context, you have an AI with persistent memory that no company can shut down, reprice, or subpoena.