Summary:
I would like to contribute a new .NET demo named VoiceChat, showing a real-time voice pipeline built with Semantic Kernel.
This demo is an end-to-end console app that:
- Captures microphone audio
- Uses voice activity detection (VAD) to gate processing
- Transcribes speech to text (STT)
- Sends text to an LLM via Semantic Kernel
- Streams the LLM’s response as audio (TTS) back to the user
- Handles user voice interruptions (barge-ins) so the user can cut in while the AI is speaking
- Orchestrates the flow with TPL Dataflow for efficient, non-blocking processing (see the sketch after this list)
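A minimal sketch of how such a pipeline could be wired with TPL Dataflow is shown below. The block bodies (`DetectSpeechAsync`, `TranscribeAsync`, `CompleteAsync`, `SpeakAsync`) are hypothetical placeholders rather than the demo's actual helpers; the point is the gated, bounded-capacity flow from captured audio through to playback:

```csharp
// Sketch only: requires the System.Threading.Tasks.Dataflow package.
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class VoicePipelineSketch
{
    public static ITargetBlock<byte[]> Build()
    {
        var options = new ExecutionDataflowBlockOptions
        {
            BoundedCapacity = 8 // backpressure: upstream waits while a block is full
        };

        // VAD gate: only forward audio chunks that actually contain speech.
        var vadBlock = new TransformManyBlock<byte[], byte[]>(
            async chunk => (await DetectSpeechAsync(chunk)) ? new[] { chunk } : Array.Empty<byte[]>(),
            options);

        // Speech-to-text.
        var sttBlock = new TransformBlock<byte[], string>(chunk => TranscribeAsync(chunk), options);

        // Text -> LLM response (via Semantic Kernel in the actual demo).
        var llmBlock = new TransformBlock<string, string>(prompt => CompleteAsync(prompt), options);

        // Text-to-speech playback.
        var ttsBlock = new ActionBlock<string>(reply => SpeakAsync(reply), options);

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        vadBlock.LinkTo(sttBlock, link);
        sttBlock.LinkTo(llmBlock, link);
        llmBlock.LinkTo(ttsBlock, link);

        return vadBlock; // microphone capture posts audio chunks here
    }

    // Hypothetical stand-ins for the real VAD/STT/LLM/TTS calls.
    static Task<bool> DetectSpeechAsync(byte[] chunk) => Task.FromResult(true);
    static Task<string> TranscribeAsync(byte[] chunk) => Task.FromResult("hello");
    static Task<string> CompleteAsync(string prompt) => Task.FromResult("hi there");
    static Task SpeakAsync(string reply) => Task.CompletedTask;
}
```

Captured microphone chunks would then be pushed into the returned block with `SendAsync`, and the bounded capacity provides the backpressure handling mentioned under "Why" below.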
Why:
While the existing dotnet/samples/Demos/OpenAIRealtime demo focuses on a minimal Semantic Kernel integration with the OpenAI Realtime API (preview), the VoiceChat demo adds voice activity detection (VAD), barge-in handling, and a TPL Dataflow-based architecture. These enhancements make it suitable as a starting point for production voice chat agents, and the demo is easy to run in a console environment.
The VoiceChat demo expands on the OpenAIRealtime sample by:
- Adding voice activity detection (VAD) to skip processing when no speech is detected, reducing unnecessary API calls and latency
- Using TPL Dataflow for structured, asynchronous message passing and backpressure handling, making the flow easier to extend or integrate into larger systems
- Demonstrating streaming responses from the LLM through to audio output, so users hear the reply as it’s generated (see the sketch at the end of this section)
- Providing a console-based implementation that is easy to run locally without additional UI frameworks
This makes the sample more representative of real-world voice agent architectures while keeping it runnable in a minimal environment.
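As a rough illustration of the streaming and barge-in behavior, the response loop could look something like the sketch below. The `SpeakChunkAsync` helper and the cancellation token wired to the VAD are assumptions for illustration; `IChatCompletionService.GetStreamingChatMessageContentsAsync` is the standard Semantic Kernel streaming API:

```csharp
// Sketch only: SpeakChunkAsync and the VAD-driven cancellation are hypothetical.
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public static class BargeInSketch
{
    public static async Task RespondAsync(
        IChatCompletionService chat,
        ChatHistory history,
        CancellationToken bargeIn) // cancelled when VAD detects the user speaking again
    {
        try
        {
            await foreach (var chunk in chat.GetStreamingChatMessageContentsAsync(
                history, cancellationToken: bargeIn))
            {
                if (string.IsNullOrEmpty(chunk.Content)) continue;

                // Synthesize and play each text chunk as soon as it arrives,
                // so the user hears the reply while it is still being generated.
                await SpeakChunkAsync(chunk.Content, bargeIn);
            }
        }
        catch (OperationCanceledException)
        {
            // User barged in: stop playback and return to listening.
        }
    }

    // Hypothetical TTS playback helper.
    static Task SpeakChunkAsync(string text, CancellationToken ct) => Task.CompletedTask;
}
```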
Link to working code
VoiceChat demo on GitHub