PersonaPlex
NVIDIA's 7B voice model that talks and listens simultaneously — 70ms latency
PersonaPlex is NVIDIA's open research model for full-duplex voice conversation: it processes incoming speech and generates its spoken response at the same time, enabling real interruptions, barge-ins, and natural conversational overlap. Current voice AI pipelines are walkie-talkie style: the AI waits for you to stop, processes, then responds. PersonaPlex eliminates that turn-taking constraint.
The 7B-parameter model achieves ~70ms end-to-end response latency and handles persona and voice control through two mechanisms: a text prompt that describes the persona's personality and speaking style, and an optional audio sample for voice cloning. The duplex architecture means it can detect mid-sentence whether you're interrupting (and stop gracefully) versus just clearing your throat (and continue). It ships with inference code, persona configuration examples, and a demo server.
PersonaPlex was released in January 2026 as open research and is gaining significant traction this week (295 new stars today) as developers building voice agents discover it. The open model weights make it deployable on NVIDIA hardware without API dependencies, and the 7B scale means it runs comfortably on a single A100 or H100. The primary constraint is that full-duplex requires low-latency streaming infrastructure, so it's not a drop-in for existing HTTP-based voice pipelines.
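To make the difference from turn-taking concrete, here is a minimal sketch of a duplex loop: input frames keep arriving while the agent is still producing output, and the agent decides frame by frame whether to keep talking or yield. This is a toy simulation, not PersonaPlex's inference code or API. The names `microphone`, `agent`, and `run_duplex` are hypothetical, and the frame labels (`"silence"`, `"noise"`, `"speech"`) are hand-assigned stand-ins for a decision the real model would make from raw audio.

```python
import asyncio


async def microphone(frames, uplink):
    """Stream simulated input frames without waiting for the agent to finish."""
    for frame in frames:
        await uplink.put(frame)
        await asyncio.sleep(0)  # yield control so the agent runs between frames
    await uplink.put(None)  # end-of-stream marker (an assumption of this sketch)


async def agent(uplink, log):
    """Toy duplex agent: records one action per input frame it hears.

    A "speech" frame while the agent is talking counts as a barge-in, so the
    agent stops. Any other frame ("noise", "silence") is ignored and the agent
    keeps talking. After stopping, it only listens.
    """
    speaking = True
    while (frame := await uplink.get()) is not None:
        if frame == "speech" and speaking:
            speaking = False
            log.append("stop")
        elif speaking:
            log.append("talk")
        else:
            log.append("listen")


async def run_duplex(frames):
    """Run the input stream and the agent concurrently and return the agent's log."""
    uplink = asyncio.Queue()
    log = []
    await asyncio.gather(microphone(frames, uplink), agent(uplink, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(run_duplex(["silence", "noise", "speech", "speech"])))
    # prints ['talk', 'talk', 'stop', 'listen']
```

In a turn-taking pipeline the agent would not see the `"speech"` frame until it had finished its own output. Here both coroutines share one event loop, which is why a real deployment would likely need a persistent streaming transport such as WebSocket or WebRTC rather than request/response HTTP.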
Panel Reviews
The Builder
Developer Perspective
“70ms with real interruption handling is a leap over anything I've built with pipeline-based approaches. The persona control via text prompt is flexible enough to cover most use cases. The main engineering challenge is the streaming infrastructure — this isn't plug-and-play, you need WebSocket or WebRTC plumbing — but for serious voice agent work, that's worth the investment.”
The Skeptic
Reality Check
“Full-duplex in a research model doesn't mean production-ready full-duplex. The non-commercial research license blocks most commercial deployments, and NVIDIA-specific optimization creates hardware lock-in. OpenAI and ElevenLabs already have managed full-duplex APIs; wait for a commercial-licensed version before building on this.”
The Futurist
Big Picture
“Full-duplex voice AI removes the last major uncanny valley in AI conversation: the awkward pause while the model waits. Once this pattern is widespread, conversations with AI agents will sound indistinguishable from human calls. PersonaPlex is the open-source reference architecture for that future; competitors will ship commercial versions within months.”
The Creator
Content & Design
“The voice persona control is compelling for content creators building AI hosts or characters — you describe the personality and voice in text, provide an audio sample, and you get a consistent character. For podcasters and interactive content, this is a meaningful creative tool once it reaches more accessible hardware.”
Community Sentiment
“Research license vs commercial deployment gap”
“70ms latency benchmark credibility”
“Comparing to OpenAI and ElevenLabs realtime APIs”