Barge-in is one of the fastest ways a voice experience fails. Users talk over the AI. The system hesitates. Audio overlaps. Transcripts degrade. Conversations derail.
Testing barge-in properly means validating how your ASR, TTS, and voice infrastructure behave under real interruption pressure, not just whether an interruption happened.
Cekura is built to test barge-in behavior across different ASR engines, voice stacks, and infrastructure setups using automated, repeatable voice simulations.
Below is what a complete barge-in testing tool needs to cover and how Cekura supports it end to end.
What “Good” Barge-In Looks Like in Practice
Speech detection during active TTS
Barge-in only matters if user speech is detected while the AI is still talking. Cekura detects overlap at the audio level and explicitly labels these moments as interruption events. Detection works for very short interruptions under 300 ms and for softer, low-volume speech within ASR limits.
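As a rough sketch (timestamps and names are illustrative, not Cekura's internals), detecting an interruption event reduces to an interval-overlap check on audio timestamps:

```python
# Illustrative only: an interruption event as an interval-overlap check.
# All values in milliseconds; names are assumptions for this sketch.

def is_interruption(tts_start: float, tts_end: float,
                    speech_start: float, speech_end: float) -> bool:
    """True if user speech overlaps active TTS playback."""
    return speech_start < tts_end and speech_end > tts_start

# A 250 ms interjection landing inside a 2 s TTS utterance still counts,
# consistent with the sub-300 ms detection claim above.
assert is_interruption(0, 2_000, 1_500, 1_750)
```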
Fast and measurable TTS stoppage
It is not enough to know the user interrupted. You need to know how long the AI kept talking afterward. Cekura measures interruption overrun in milliseconds, capturing exactly how long TTS continued after user speech began.
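Conceptually, overrun is a difference between two timestamps. A minimal sketch, with hypothetical field names:

```python
# Hedged sketch: interruption overrun is how long TTS kept playing after
# user speech began. Field names are illustrative, not Cekura's schema.

def interruption_overrun_ms(user_speech_onset_ms: float,
                            tts_stop_ms: float) -> float:
    """Milliseconds of TTS audio emitted after the user started talking."""
    return max(0.0, tts_stop_ms - user_speech_onset_ms)

print(interruption_overrun_ms(user_speech_onset_ms=4_120, tts_stop_ms=4_480))
# -> 360.0  (the agent talked over the user for 360 ms)
```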
Accurate transcription under overlap
Overlapping speech is transcribed and retained. Teams can isolate interrupted runs and compare transcription quality against clean runs using custom or Python-based metrics to quantify degradation.
Correct recovery after interruption
After barge-in, the conversation must resume cleanly. Cekura evaluates recovery through instruction following, response consistency, and workflow success. Any deviation is timestamped at the moment state breaks.
Threshold-ready metrics
Latency, interruption counts, overrun duration, and timing data are available as numeric metrics. False and missed barge-in rates can be derived across scenarios. ASR WER deltas can be computed directly from transcripts when needed.
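For example, a WER delta between a clean run and an interrupted run can be computed from transcripts with nothing more than a word-level edit distance (plain Python; the transcripts here are illustrative):

```python
# Sketch: computing a WER delta between clean and interrupted runs directly
# from transcripts. Classic dynamic-programming edit distance over words.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(1, len(ref))

clean_wer = wer("cancel my order please", "cancel my order please")
interrupted_wer = wer("cancel my order please", "cancel my or there please")
print(f"WER delta under interruption: {interrupted_wer - clean_wer:.2f}")
```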
Audio Handling Across ASR Engines
Full-duplex voice testing
Cekura supports duplex conversations with overlapping send and receive paths through telephony, WebRTC, and WebSocket integrations including VAPI, Retell, LiveKit, Pipecat, and ElevenLabs.
No forced echo cancellation assumptions
Cekura does not require a specific AEC model. Stereo recordings are recommended so overlap and interruption events can be distinguished cleanly regardless of provider.
Outcome-focused VAD evaluation
Voice activity detection behavior is evaluated based on results, not configuration. Whether the ASR missed speech or triggered false interruptions shows up directly in interruption and silence metrics.
Latency without raw partial tokens
Partial ASR behavior is inferred through timing. Speech onset, detection, TTS stop, and final transcript availability are all timestamped for precise latency analysis.
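A sketch of what that event model might look like (field names are assumptions, not Cekura's API), with detection and stop latencies derived from the timestamps:

```python
# Illustrative event model: latency is derived from per-event timestamps
# rather than raw partial ASR tokens.
from dataclasses import dataclass

@dataclass
class BargeInEvents:
    speech_onset_ms: float       # user starts talking
    asr_detection_ms: float      # ASR flags speech
    tts_stop_ms: float           # TTS playback halts
    final_transcript_ms: float   # final transcript available

    @property
    def detection_latency_ms(self) -> float:
        return self.asr_detection_ms - self.speech_onset_ms

    @property
    def stop_latency_ms(self) -> float:
        return self.tts_stop_ms - self.speech_onset_ms

e = BargeInEvents(1000, 1180, 1420, 2050)
print(e.detection_latency_ms, e.stop_latency_ms)  # 180 420
```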
Latency and Timing You Can Actually Compare
Every barge-in test captures event-level timestamps for:
- TTS playback start
- User speech onset
- ASR detection timing
- TTS stop or overrun
- Final transcript readiness
From this, teams get distributions like mean, P50, and P90 for interruption latency and recovery time across ASR engines and versions.
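The aggregation step is standard; here is a minimal sketch using only the Python standard library, with illustrative latency values:

```python
# Mean/P50/P90 over per-call interruption latencies (values illustrative).
from statistics import mean, quantiles

latencies_ms = [142, 180, 205, 230, 260, 310, 355, 410, 520, 740]

deciles = quantiles(latencies_ms, n=10)  # 9 cut points: P10 .. P90
p50, p90 = deciles[4], deciles[8]
print(f"mean={mean(latencies_ms):.0f}ms  P50={p50:.0f}ms  P90={p90:.0f}ms")
```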
Recognition Quality Under Interruption
Barge-in rarely happens cleanly. Users interrupt mid-sentence, at prosodic peaks, or with varying speech styles.
Cekura supports:
- Mid-sentence and late interruptions
- Soft speech and hesitant interruptions
- Variations in pace, volume, and assertiveness
- Clean versus interrupted run comparisons using identical scenarios
This makes it possible to measure how much recognition quality drops when interruptions occur and which ASR engines degrade least.
Noise and Robustness Testing
Real calls include cafés, offices, background chatter, and silence artifacts.
Cekura personalities simulate:
- Ambient noise environments
- Noise-only interactions
- Near-field versus far-field speech effects
- Silence-related infrastructure failures
False interruptions and missed barge-ins can be inferred directly from noise-heavy and expected-interruption scenarios using interruption counts, overrun timing, and silence metrics.
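A hedged sketch of that inference rule (scenario labels and counts are illustrative, not a Cekura schema):

```python
# Noise-only calls should produce zero interruptions; scripted-interruption
# calls should produce at least one. Deviations classify the failure mode.

def classify_run(scenario: str, interruption_count: int) -> str | None:
    if scenario == "noise_only" and interruption_count > 0:
        return "false_interruption"   # noise tripped the barge-in path
    if scenario == "expected_interruption" and interruption_count == 0:
        return "missed_barge_in"      # scripted interruption went undetected
    return None

runs = [("noise_only", 2), ("expected_interruption", 0),
        ("expected_interruption", 1)]
print([classify_run(s, c) for s, c in runs])
# ['false_interruption', 'missed_barge_in', None]
```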
Automated Test Harness for Barge-In
Cekura replaces manual call testing with a deterministic harness:
- Controlled TTS playback
- Automated speech injection through scripted scenarios
- Simultaneous duplex routing
- Provider-agnostic transcript ingestion
- Millisecond-level event logging
Scenarios are reusable and can be triggered via CI, cron jobs, or A/B test runs to catch regressions early.
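A CI regression gate might look like the following sketch, where `run_scenario` and `ScenarioResult` are hypothetical stand-ins for whatever client or SDK call triggers the run, stubbed here so the example executes:

```python
# Minimal pytest-style CI gate: fail the pipeline when P90 overrun regresses
# past a locked baseline. The helper and threshold are illustrative.
from dataclasses import dataclass

@dataclass
class ScenarioResult:
    p90_overrun_ms: float

def run_scenario(name: str) -> ScenarioResult:
    # Stand-in for the real trigger (CLI, API, or SDK call); returns a
    # fixed value here so the sketch runs end to end.
    return ScenarioResult(p90_overrun_ms=360.0)

BASELINE_P90_OVERRUN_MS = 400  # locked from a known-good run

def test_barge_in_overrun_regression():
    result = run_scenario("mid_response_barge_in")
    assert result.p90_overrun_ms <= BASELINE_P90_OVERRUN_MS, (
        f"P90 overrun regressed: {result.p90_overrun_ms}ms > "
        f"{BASELINE_P90_OVERRUN_MS}ms baseline"
    )
```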
Barge-In Scenarios You Can Run Automatically
Teams can automate:
- Immediate interruptions
- Mid-response barge-ins
- Late interruptions
- Rapid repeated interruptions
- Noise-only calls
- Low-volume, low-assertiveness interruptions
Each scenario can be replayed across ASR engines, prompts, models, and infrastructure changes with identical inputs.
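Conceptually, a replay matrix is just the cross product of scenarios and configurations, so inputs stay identical across every cell (names here are assumptions):

```python
# Illustrative replay matrix: same scripted scenarios, every ASR engine.
from itertools import product

scenarios = ["immediate", "mid_response", "late",
             "rapid_repeat", "noise_only", "low_volume"]
asr_engines = ["engine_a", "engine_b"]

for scenario, engine in product(scenarios, asr_engines):
    print(f"queue run: scenario={scenario} asr={engine}")
```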
Scoring and Comparison Across ASR Engines
Cekura supports:
- Numeric, boolean, rating, and Python-defined metrics
- Custom pass and fail logic per call
- Aggregated scoring across latency, interruption behavior, accuracy, and stability
- Locked baselines for regression tracking
- Side-by-side comparisons across versions and providers
This makes ASR benchmarking concrete instead of anecdotal.
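As an example of what a Python-defined metric with per-call pass and fail logic could look like (the signature and fields are assumptions, not Cekura's metric API):

```python
# Illustrative per-call metric combining overrun and false-interruption
# checks into a rating plus a pass/fail verdict.

def barge_in_quality(call: dict) -> dict:
    overrun_ok = call["max_overrun_ms"] <= 500
    no_false_interrupts = call["false_interruption_count"] == 0
    return {
        "score": int(overrun_ok) + int(no_false_interrupts),  # 0-2 rating
        "passed": overrun_ok and no_false_interrupts,
    }

print(barge_in_quality({"max_overrun_ms": 320, "false_interruption_count": 0}))
# {'score': 2, 'passed': True}
```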
Why Teams Use Cekura for Barge-In Testing
If your voice agent speaks over users, misses interruptions, or loses state after overlap, you will not see it through basic logging.
Cekura lets teams test barge-in the way users actually experience it. Across ASR engines. Across noise conditions. Across real conversation dynamics. Before those failures hit production.
