App: AI Voices

AI Voices provides text-to-speech, transcription, and lip-sync data generation powered by the AWS Polly neural voice engine. It converts any text into natural-sounding spoken audio across multiple voices and languages, with output delivered as MP3 or other audio formats, ready to serve directly to web clients or to store for reuse.
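A minimal sketch of what a synthesis request looks like at the AWS Polly level, assuming the boto3 SDK. The voice, engine, and format values below are illustrative defaults, not the app's actual configuration:

```python
def build_speech_request(text, voice_id="Joanna", engine="neural",
                         output_format="mp3", language_code="en-US"):
    """Assemble the parameter dict for Polly's synthesize_speech call.

    Defaults here are illustrative: any Polly neural voice, engine,
    output format, and language code can be substituted.
    """
    return {
        "Text": text,
        "VoiceId": voice_id,
        "Engine": engine,               # "neural" selects the neural voice engine
        "OutputFormat": output_format,  # "mp3", "ogg_vorbis", or "pcm"
        "LanguageCode": language_code,
    }

# Actual synthesis requires AWS credentials; the call itself looks like:
#   import boto3
#   polly = boto3.client("polly")
#   response = polly.synthesize_speech(**build_speech_request("Hello, world"))
#   audio_bytes = response["AudioStream"].read()  # MP3 bytes ready to serve
```

The returned audio stream can be written to storage or streamed straight to a web client, matching the delivery options described above.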
Beyond basic audio, AI Voices generates lip-sync tween data alongside speech - a frame-by-frame animation dataset that drives avatar mouth movements in sync with the generated audio. This enables realistic talking avatars and animated characters built directly from your content without any external animation software.
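To illustrate how lip-sync tween data can be derived, here is a sketch that converts Polly viseme speech marks (delivered as newline-delimited JSON, each mark carrying a millisecond timestamp and a viseme symbol) into a per-frame mouth-shape track. The frame rate and the "sil" silence placeholder are illustrative choices, not the app's actual output schema:

```python
import json

def visemes_to_frames(speech_marks_ndjson, duration_ms, fps=30):
    """Convert Polly viseme speech marks into a frame-by-frame viseme track.

    speech_marks_ndjson: newline-delimited JSON as returned by Polly with
    SpeechMarkTypes=["viseme"], e.g. {"time":125,"type":"viseme","value":"p"}.
    Returns one viseme symbol per animation frame.
    """
    marks = [json.loads(line)
             for line in speech_marks_ndjson.splitlines() if line.strip()]
    visemes = [(m["time"], m["value"]) for m in marks if m["type"] == "viseme"]

    frames = []
    current = "sil"  # assumed placeholder for silence before the first viseme
    i = 0
    total_frames = int(duration_ms / 1000 * fps)
    for f in range(total_frames):
        t = f * 1000 / fps  # timestamp of this frame in milliseconds
        # Advance to the most recent viseme at or before this frame's time.
        while i < len(visemes) and visemes[i][0] <= t:
            current = visemes[i][1]
            i += 1
        frames.append(current)
    return frames
```

An avatar renderer can then map each frame's viseme symbol to a mouth sprite or blend shape, keeping mouth movement locked to the audio timeline.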
AI Voices is designed as a service layer for the rest of the platform. The AI Chatbot can route its responses directly through Voices to produce spoken replies, combining conversational AI with audio output in a single flow. Chain Commands can call Voices as a node in any automation sequence. Custom apps can use it as a building block for any product that requires spoken output or transcription.
Key features:
- Text-to-speech synthesis using the AWS Polly neural voice engine - natural, high-quality spoken audio from any text input.
- Multiple voices and language codes - choose from a library of neural voices across dozens of languages and regional accents.
- Lip-sync tween data generation - frame-level animation data delivered alongside audio to drive realistic avatar mouth movement.
- Speech-to-text transcription for converting recorded or uploaded audio into text.
- Multiple output formats including MP3 for direct web delivery.
- Per-character voice configuration - set default voice, engine, output format, and language at the character level for consistent output across a project.
- Native AppBridge integration with the AI Chatbot - chatbot responses can be spoken automatically when audio output is enabled on a bot.
- Callable from Chain Commands for inclusion in any automated workflow that requires audio output.
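Per-character voice configuration can be pictured as a small set of defaults that flow into every synthesis request for that character. This is a hypothetical sketch; the field names and defaults are illustrative, not the app's actual schema:

```python
from dataclasses import dataclass

@dataclass
class CharacterVoiceConfig:
    """Illustrative per-character defaults: voice, engine, format, language."""
    voice_id: str = "Joanna"
    engine: str = "neural"
    output_format: str = "mp3"
    language_code: str = "en-US"

    def to_request(self, text):
        """Merge the character's defaults with a specific text to speak."""
        return {
            "Text": text,
            "VoiceId": self.voice_id,
            "Engine": self.engine,
            "OutputFormat": self.output_format,
            "LanguageCode": self.language_code,
        }

# Every line spoken by this character picks up the same voice settings:
narrator = CharacterVoiceConfig(voice_id="Matthew")
request = narrator.to_request("Welcome back.")
```

Setting these values once at the character level keeps output consistent across a project, whether the request originates from the AI Chatbot, a Chain Command node, or a custom app.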