Research that advances artificial intelligence for everyone

AI still leaves too many questions unanswered. At Parloa Labs, we open the black box to understand what today's systems are capable of and where they can go next.

Our research areas

We study the frontiers of voice and agentic AI, publish our work openly, share practical learnings, and contribute to the broader community's understanding of what makes conversations work.

Voice infrastructure

Building telephony systems that handle real-time voice interactions: speech recognition, synthesis, latency optimization, and audio quality at scale.

Agent architecture

Designing AI agents that reason, plan, and execute complex tasks: model orchestration, multi-agent systems, and prompt engineering approaches.

Agent capabilities

Developing reusable skills for common scenarios, such as routing, authentication, knowledge retrieval, and integrations with backend systems.

Observability & optimization

Understanding agent behavior through analytics, simulations, evaluations, and continuous improvement based on production data.

Latest findings and discoveries

Insights
Building customer-facing data products: A builder’s perspective

At Parloa, our AI agents drive high-stakes customer interactions, which demands a data platform designed for resilience. This article gives an overview of the architecture and governance principles we've implemented to meet this challenge.

Elisabeth Reitmayr

Insights
GPT-5.2 doesn’t just follow instructions, it follows through

There’s a specific kind of model failure we like to track closely. It doesn’t show up in latency graphs or user feedback. It sounds like the system is doing the right thing. It looks like it’s doing the right thing. And yet: the action never happens.

Matthäus Deutsch and Stefan Ostwald

Research
The never-ending conversation: Measuring long-conversation performance in LLMs

At Parloa, our AI agents handle long conversations every day: tool-heavy dialogues where context grows fast and ambiguity grows faster. We wanted to know: does performance actually degrade when conversations get that long, or do today's models hold steady?

Mariano Kamp and Stefan Ostwald

Insights
What happens when calls never end?

Every customer conversation has a rhythm: a start, a middle, and (generally) an end. But what happens when it doesn’t?

Mariano Kamp and Stefan Ostwald

Research
We got early access to GPT-5.1 Thinking. Naturally, we tested it

We’ve been quietly testing OpenAI’s GPT-5.1, running it through a series of internal, real-world benchmarks, especially for tool calling and instruction following, capabilities that are foundational to how customer service agents operate inside Parloa’s AI agent management platform.

Matthäus Deutsch, Stefan Ostwald and Rouven Glauert

Research
A Bayesian framework for A/B testing AI agents

We’re introducing a hierarchical Bayesian model for A/B testing AI agents. It combines deterministic binary metrics and LLM-judge scores into a single framework that accounts for variation across different groups. 

Matthäus Deutsch, Stefan Ostwald and Rouven Glauert
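To make the structure of such a model concrete, here is a minimal sketch in PyMC of a hierarchical A/B model that shares a partially pooled treatment effect between a Bernoulli likelihood (a deterministic binary metric) and a Gaussian likelihood (an LLM-judge score). The synthetic data, priors, and variable names are illustrative assumptions, not the exact model described in the article.

```python
import numpy as np
import pymc as pm

# Synthetic data standing in for real conversations (illustrative only)
rng = np.random.default_rng(0)
n_groups, n_obs = 5, 400                  # e.g. groups = customer segments
group = rng.integers(0, n_groups, n_obs)  # group index per conversation
variant = rng.integers(0, 2, n_obs)       # 0 = control agent, 1 = candidate
resolved = rng.integers(0, 2, n_obs)      # deterministic binary metric
judge = rng.normal(0.7, 0.15, n_obs)      # LLM-judge score per conversation

with pm.Model() as model:
    # Treatment effect, partially pooled across groups
    mu_effect = pm.Normal("mu_effect", 0.0, 1.0)
    sigma_effect = pm.HalfNormal("sigma_effect", 0.5)
    effect = pm.Normal("effect", mu_effect, sigma_effect, shape=n_groups)

    # Per-group baselines for each metric
    base_bin = pm.Normal("base_bin", 0.0, 1.0, shape=n_groups)
    base_judge = pm.Normal("base_judge", 0.7, 0.3, shape=n_groups)

    # Binary outcome: logistic model on baseline + variant effect
    p = pm.math.invlogit(base_bin[group] + effect[group] * variant)
    pm.Bernoulli("resolved_obs", p=p, observed=resolved)

    # LLM-judge score: Gaussian likelihood, reusing the same group-level
    # effect via a learned scaling coefficient
    beta_judge = pm.Normal("beta_judge", 0.0, 0.5)
    sigma_judge = pm.HalfNormal("sigma_judge", 0.2)
    mu_judge = base_judge[group] + beta_judge * effect[group] * variant
    pm.Normal("judge_obs", mu=mu_judge, sigma=sigma_judge, observed=judge)

    idata = pm.sample(1000, tune=1000, target_accept=0.9)
```

The key design choice this sketch illustrates is partial pooling: each group gets its own effect estimate, but all group effects are drawn from a shared distribution, so sparse groups borrow strength from the rest of the data instead of producing noisy standalone estimates.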

Our approach

We believe the best innovation happens at the intersection of theory and practice

Every day, Parloa’s AI agents handle millions of interactions across industries and languages. This gives us a unique vantage point to identify real challenges, test solutions at scale, and contribute meaningful findings back to the research community.