AITranslations

AI-powered live translation for multilingual audiences and live video.

AITranslations is a private project for near real-time, AI-based translation. It is not a commercial offering. The goal is to let people follow the same event in different languages, with high translation quality, configurable privacy options, and controllable ongoing costs.

Access & rooms

Everything happens inside a “room”. A room is the shared context for an event: audio/video can be sent into the room, and receivers get the translation in their language.

Rooms can be public or password-protected. Sender access (streaming audio/video and managing settings) is limited to authorized users. Access to the platform is by invitation or on request.
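The access rules above can be sketched as a small data model. This is only an illustration under assumptions: the class name Room, its fields, and the methods can_receive and can_send are hypothetical and not taken from the AITranslations code base, and the password hashing shown (PBKDF2 from the Python standard library) is one reasonable choice rather than the project's documented method.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field
from typing import Optional, Set


def _hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 from the standard library; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


@dataclass
class Room:
    """Hypothetical model of a room: public or password-protected."""
    name: str
    salt: bytes = field(default_factory=lambda: secrets.token_bytes(16))
    password_hash: Optional[bytes] = None  # None means the room is public
    authorized_senders: Set[str] = field(default_factory=set)

    def set_password(self, password: str) -> None:
        # Store only a salted hash, never the plain password.
        self.password_hash = _hash_password(password, self.salt)

    def can_receive(self, password: Optional[str] = None) -> bool:
        # Public rooms admit any receiver; protected rooms need the password.
        if self.password_hash is None:
            return True
        if password is None:
            return False
        candidate = _hash_password(password, self.salt)
        return hmac.compare_digest(self.password_hash, candidate)

    def can_send(self, user_id: str) -> bool:
        # Streaming and settings are limited to authorized users.
        return user_id in self.authorized_senders
```

In this sketch, receiving and sending are checked separately, which mirrors the split between open (or password-gated) receiver access and restricted sender access.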

Cost transparency (Bring Your Own API Key)

AITranslations follows a “Bring Your Own API Key” approach: the platform connects to your own accounts with AI providers. This keeps provider choice and billing under your control.

Ongoing costs depend mainly on the chosen mode, the model, and the number of target languages, so they can be managed deliberately.
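As a rough illustration of how these factors interact, the following sketch estimates the cost of one event. It assumes that each target language runs as its own translation stream, so cost grows linearly with the number of languages, and that the provider bills per minute of audio. Both points are assumptions, and the names CostInputs and estimate_cost as well as the example price are hypothetical. Actual provider pricing may be per token or per character instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostInputs:
    minutes: float           # length of the event in minutes
    price_per_minute: float  # hypothetical per-language price for the chosen mode and model
    target_languages: int    # number of languages translated in parallel


def estimate_cost(inputs: CostInputs) -> float:
    """Estimate event cost, assuming one billed stream per target language."""
    if inputs.minutes < 0 or inputs.price_per_minute < 0 or inputs.target_languages < 0:
        raise ValueError("cost inputs must not be negative")
    return inputs.minutes * inputs.price_per_minute * inputs.target_languages


# Example with made-up numbers: a 60-minute event in 3 languages.
example = estimate_cost(CostInputs(minutes=60, price_per_minute=0.25, target_languages=3))
```

Under these assumptions, dropping one target language or switching to a cheaper mode or model reduces the estimate proportionally.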

Modularity & privacy

The platform is modular. Depending on your needs, you can use high-end models for maximum clarity, or choose setups that aim for stronger privacy (for example, alternative providers or self-hostable models).

Depending on the configuration, audio, text, or both are transmitted to external AI providers for processing. There is no advertising tracking; operational measurements may be used to keep the service reliable.

Three paths for live translation

AITranslations can process spoken language in three ways. The paths differ in whether translated audio is produced directly or whether text is created first.

1) Direct speech-to-speech translation (Speech-to-Speech, S2S)

A realtime model converts spoken language directly into translated speech. This approach is designed for natural playback and low delay.

2) Speech translation followed by speech output (AST→TTS)

Spoken language is first converted into translated text in the target language. A text-to-speech model (TTS) then turns it into speech output.

3) Transcript-based processing (STT/ASR → Translation → TTS)

Spoken language is transcribed first. The transcript is then translated and rendered as audio by a text-to-speech model (TTS).
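The three paths can be viewed as different compositions of processing stages. The sketch below shows that structure only: the stage signatures and function names are hypothetical, and the fake stages stand in for real provider calls so that the example runs without any external service.

```python
from typing import Callable

# Hypothetical stage signatures; real AI providers would sit behind these.
SpeechToSpeech = Callable[[bytes, str], bytes]       # audio, target language -> audio
SpeechTranslation = Callable[[bytes, str], str]      # audio, target language -> translated text
SpeechToText = Callable[[bytes], str]                # audio -> transcript
TextTranslation = Callable[[str, str], str]          # text, target language -> translated text
TextToSpeech = Callable[[str, str], bytes]           # text, language -> audio


def path_s2s(audio: bytes, lang: str, s2s: SpeechToSpeech) -> bytes:
    # Path 1: one realtime model, no intermediate text.
    return s2s(audio, lang)


def path_ast_tts(audio: bytes, lang: str,
                 ast: SpeechTranslation, tts: TextToSpeech) -> bytes:
    # Path 2: translated text first, then speech output.
    return tts(ast(audio, lang), lang)


def path_transcript(audio: bytes, lang: str, stt: SpeechToText,
                    translate: TextTranslation, tts: TextToSpeech) -> bytes:
    # Path 3: transcript, then translation, then speech output.
    return tts(translate(stt(audio), lang), lang)


# Fake stages for demonstration: "audio" is just UTF-8 text in bytes.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")

def fake_translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"

def fake_ast(audio: bytes, lang: str) -> str:
    return fake_translate(fake_stt(audio), lang)

def fake_tts(text: str, lang: str) -> bytes:
    return text.encode("utf-8")

def fake_s2s(audio: bytes, lang: str) -> bytes:
    return fake_tts(fake_ast(audio, lang), lang)
```

Path 3 exposes both the transcript and the translated text as intermediate results, path 2 exposes only the translated text, and path 1 exposes neither. That is the trade-off the three descriptions above point to.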

Quality in live use

AITranslations works best when the input signal is clear: a good microphone, sufficient proximity to the speaker, and a quiet environment noticeably improve clarity and translation quality.

Roles: receivers and senders

Receivers can join a room via link or QR code (and enter a password if required). The sender role requires authorization to stream audio/video and manage room settings.

Work in progress

AITranslations is still under active development. Some features may be incomplete, change without notice, or not work in every setup yet.