This repository contains a set of demos showcasing OpenAI’s Realtime API features, including speech recognition, noise reduction, and interactive voice responses. The demos live in the "realtime" directory and can be run in your browser after providing a valid OpenAI API key. They are also available online here.
Realtime Voice Agent (basic):
- Allows you to have a voice conversation with an AI agent.
- Prompts you to enter an OpenAI API key, select a voice model, and provide optional instructions.
Realtime Transcription (transcribe):
- Streams live audio input from your microphone (or a file) and transcribes it in real time.
Realtime Loopback (noise_reduction):
- Demonstrates how to monitor your local audio and hear the effect of noise reduction (a minimal sketch of this idea follows below).
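
For orientation, here is a minimal, hypothetical sketch of local microphone monitoring with the browser's built-in noise suppression toggled on or off. It is not the demo's actual code; the element id "monitor" is an assumption for illustration.

```js
// Hypothetical sketch: capture the microphone with or without the browser's
// noise suppression and loop it back to an <audio> element for monitoring.
async function monitorMic(noiseSuppression) {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { noiseSuppression, echoCancellation: false },
  });
  const audioEl = document.getElementById("monitor"); // assumed element id
  audioEl.srcObject = stream; // play the processed mic signal back locally
  await audioEl.play();
  return stream; // keep a handle so the tracks can be stopped later
}

// Compare the raw and cleaned signal:
// monitorMic(true);   // with noise suppression
// monitorMic(false);  // without
```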
- To run the demos locally, clone or download this repository.
- Open the file "realtime/index.html" in your browser, which serves as a hub for all demos.
- Pick a demo from the links provided.
- Enter a valid OpenAI API key (you can obtain one from the OpenAI dashboard). The key is stored in your browser's local storage (see the sketch after these steps).
- Adjust any settings or instructions, then start the microphone to see the realtime features in action.
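
As a rough sketch of how a demo page might persist the key, the snippet below uses the browser's `localStorage`. The storage key name "openai_api_key" is an assumption, not necessarily what the demos use.

```js
// Hypothetical sketch: prompt for the API key once and cache it locally.
function getApiKey() {
  let key = localStorage.getItem("openai_api_key");
  if (!key) {
    key = window.prompt("Enter your OpenAI API key:");
    if (key) localStorage.setItem("openai_api_key", key);
  }
  return key;
}
```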
- Each demo is contained in its own folder, with an HTML file, a corresponding JavaScript file, and CSS styles shared from "main.css".
- The primary logic for the basic voice agent is in "realtime/basic/main.js".
- The "Session" logic (located in "session.js") handles the realtime WebRTC connection, sending audio to OpenAI, and receiving text and audio responses.
Feel free to open a pull request or file an issue if you find a bug or have suggestions for improvements.
These demos are intended for educational purposes, to showcase the Realtime API's capabilities.