Research that advances artificial intelligence for everyone
AI still leaves too many questions unanswered. At Parloa Labs, we open the black box to understand what today's systems are capable of and where they can go next.
Our research areas
We study the frontiers of voice and agentic AI, publish our work openly, share practical learnings, and contribute to the broader community’s understanding of what makes conversations work.
Voice infrastructure
Building telephony systems that handle real-time voice interactions: speech recognition, synthesis, latency optimization, and audio quality at scale.
Agent architecture
Designing AI agents that reason, plan, and execute complex tasks: model orchestration, multi-agent systems, and prompt engineering approaches.
Agent capabilities
Developing reusable skills for common scenarios, such as routing, authentication, knowledge retrieval, and integrations with backend systems.
Observability & optimization
Understanding agent behavior through analytics, simulations, evaluations, and continuous improvement based on production data.
Latest findings and discoveries
Building customer-facing data products: A builder’s perspective
At Parloa, our AI agents drive high-stakes customer interactions, which demands a data platform designed for resilience. This article gives an overview of the architecture and governance principles we’ve implemented to meet this challenge.
GPT-5.2 doesn’t just follow instructions; it follows through
There’s a specific kind of model failure we like to track closely. It doesn’t show up in latency graphs or user feedback. It sounds like the system is doing the right thing. It looks like it’s doing the right thing. And yet: the action never happens.
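One way to surface this failure mode is to cross-check what the agent says against what it actually did: flag assistant turns that claim a completed action but carry no executed tool call. The sketch below is illustrative only (the turn schema, field names, and claim phrases are hypothetical assumptions, not Parloa's production checks):

```python
import re

# Hypothetical phrases suggesting the agent claims to have completed an action.
CLAIM_PATTERNS = [
    r"\bI(?:'ve| have) (?:cancelled|canceled|updated|booked|sent|created)\b",
    r"\b(?:is|has been) (?:cancelled|canceled|updated|booked|sent|created)\b",
]

def flag_unbacked_claims(turns):
    """Return assistant turns that claim a completed action but contain
    no executed tool call.

    `turns` is a list of dicts shaped like
    {"role": "assistant", "text": "...", "tool_calls": [...]} (assumed schema).
    """
    flagged = []
    for turn in turns:
        if turn.get("role") != "assistant":
            continue
        claims_action = any(
            re.search(pattern, turn.get("text", ""), re.IGNORECASE)
            for pattern in CLAIM_PATTERNS
        )
        # The suspicious case: the words say "done", the log says nothing ran.
        if claims_action and not turn.get("tool_calls"):
            flagged.append(turn)
    return flagged
```

A phrase-matching heuristic like this is deliberately crude; in practice the claim-detection step could itself be an LLM judge, with the tool-call log as ground truth.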
The never-ending conversation: Measuring long-conversation performance in LLMs
At Parloa, our AI agents handle long conversations every day: tool-heavy dialogues where context grows fast and ambiguity grows faster. We wanted to know: does performance actually degrade when conversations get that long, or do today’s models hold steady?
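One simple way to pose that question empirically is to bucket conversations by turn count and compare a success metric per bucket. The sketch below assumes a hypothetical data shape, not the evaluation harness described in the article:

```python
from collections import defaultdict

def success_by_length(conversations, bucket_size=10):
    """Group conversations into turn-count buckets and compute the
    success rate per bucket, to see whether quality drops as
    conversations grow longer.

    Each conversation is a dict like {"turns": int, "success": bool}
    (an assumed schema for illustration).
    """
    totals = defaultdict(lambda: [0, 0])  # bucket start -> [successes, count]
    for conv in conversations:
        bucket = (conv["turns"] // bucket_size) * bucket_size
        totals[bucket][0] += int(conv["success"])
        totals[bucket][1] += 1
    return {bucket: successes / count
            for bucket, (successes, count) in sorted(totals.items())}
```

A flat curve across buckets suggests the model holds steady; a downward slope in the later buckets is the degradation signal, though per-bucket sample sizes need to be large enough for the comparison to mean anything.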
What happens when calls never end?
Every customer conversation has a rhythm: a start, a middle, and (generally) an end. But what happens when it doesn’t?
We got early access to GPT-5.1 Thinking. Naturally, we tested it
We’ve been quietly testing OpenAI’s GPT-5.1, running it through a series of internal real-world benchmarks, especially for tool calling and instruction following, which are foundational to how customer service agents operate inside Parloa’s AI agent management platform.
A Bayesian framework for A/B testing AI agents
We’re introducing a hierarchical Bayesian model for A/B testing AI agents. It combines deterministic binary metrics and LLM-judge scores into a single framework that accounts for variation across different groups.
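The core idea of combining a binary metric with judge scores can be sketched with Monte Carlo posterior sampling. Note this toy version is deliberately simplified and non-hierarchical (independent Beta-Binomial and normal posteriors, no partial pooling across groups); the framework in the article goes further:

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_beats(successes_a, trials_a, successes_b, trials_b,
                    judge_scores_a, judge_scores_b, n_samples=100_000):
    """Monte Carlo estimate of the probability that variant B beats A
    on a binary metric (Beta-Binomial posterior) and on mean LLM-judge
    score (normal approximation to the posterior of the mean)."""
    # Binary metric: Beta(1, 1) prior -> Beta(s + 1, n - s + 1) posterior.
    pa = rng.beta(successes_a + 1, trials_a - successes_a + 1, n_samples)
    pb = rng.beta(successes_b + 1, trials_b - successes_b + 1, n_samples)
    # Judge scores: approximate the posterior of the mean as normal,
    # with the standard error as its spread.
    ja = rng.normal(np.mean(judge_scores_a),
                    np.std(judge_scores_a, ddof=1) / np.sqrt(len(judge_scores_a)),
                    n_samples)
    jb = rng.normal(np.mean(judge_scores_b),
                    np.std(judge_scores_b, ddof=1) / np.sqrt(len(judge_scores_b)),
                    n_samples)
    return {
        "p_binary_b_better": float(np.mean(pb > pa)),
        "p_judge_b_better": float(np.mean(jb > ja)),
        "p_both_b_better": float(np.mean((pb > pa) & (jb > ja))),
    }
```

The joint probability (`p_both_b_better`) is what makes the combined view useful: a variant can win on the binary metric while losing on perceived quality, and sampling both posteriors together exposes that.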
We believe the best innovation happens at the intersection of theory and practice
Every day, Parloa’s AI agents handle millions of interactions across industries and languages. This gives us a unique vantage point to identify real challenges, test solutions at scale, and contribute meaningful findings back to the research community.