Transcribe AI - Transcription of audio and video recordings
The service generates transcripts from audio or video files and delivers files with spoken text and subtitles.
Transcription takes place on KIT's own hardware, so the data does not leave KIT. A translation can be requested via the EU service eTranslation.
Transcribe AI is available to all KIT employees, guests and partners.

Contact: servicedesk@scc.kit.edu

Links:
- Start Transcribe AI

General

Transcribe AI converts the spoken word in an audio or video file into text. The file can be uploaded via drag & drop or referenced by a link (URL) to the file.

The transcript is made available on the website as a text file for download. It is also provided as a subtitle file in WebVTT format, which supports the subtitling of videos. Specifying the source language helps to achieve a good transcription result.

A translation can also be requested. After transcription, the text is forwarded via an API to the EU service eTranslation and translated there. The target languages must be specified when the transcription is started. Automatic translation should not be used for highly sensitive content that must not leave KIT.

Transcription takes place on KIT's own hardware. Uploaded files are processed in the order in which the jobs are received, which can take some time depending on the current workload.

This service does not provide an archiving or backup function for audio and video files or their results.

The results (transcripts and subtitle files) are available for download for 7 days and are then deleted.

Included services

Transcribe AI runs OpenAI Whisper on in-house hardware and uses the Whisper speech recognition model to convert speech into text. Translation is carried out by the EU service eTranslation, which the EU operates outside KIT.

Service availability is limited to core working hours; maintenance takes place at off-peak times, when the service is used least.

The service creates a sentence-based subtitle file with timestamps in VTT format. Both the translated and the transcribed text can be downloaded as a text file and in subtitle format.
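
To illustrate what such a VTT file contains, the following minimal Python sketch reads WebVTT text into a list of timed cues. The sample text and the names `Cue`, `to_seconds` and `parse_vtt` are invented for illustration. Files produced by Transcribe AI may contain further elements, and the sketch does not strip any markup inside the cue text.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class Cue(NamedTuple):
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    text: str


# WebVTT timestamps are "hh:mm:ss.ttt" or the short form "mm:ss.ttt".
_STAMP = r"(?:\d+:)?\d{2}:\d{2}\.\d{3}"
_TIMING = re.compile(rf"({_STAMP})\s+-->\s+({_STAMP})")

# Invented sample, not actual Transcribe AI output.
SAMPLE = """WEBVTT

00:00:00.000 --> 00:00:04.200
Welcome to the lecture.

00:00:04.200 --> 00:00:09.000
Today we look at speech recognition.
"""


def to_seconds(stamp: str) -> float:
    """Convert a WebVTT timestamp into seconds."""
    parts = stamp.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_vtt(content: str) -> list[Cue]:
    """Return all cues; blocks without a timing line (header, notes) are skipped."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Blocks in a WebVTT file are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for index, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                # Everything after the timing line is the cue text.
                text = " ".join(part.strip() for part in lines[index + 1:])
                cues.append(Cue(to_seconds(match[1]), to_seconds(match[2]), text))
                break
    return cues


for cue in parse_vtt(SAMPLE):
    print(f"{cue.start:7.2f}s to {cue.end:7.2f}s  {cue.text}")
```

A structure like this can be useful for searching a transcript by time or for checking individual passages against the recording.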

As FFmpeg is used in the background to convert audio and video files, numerous formats are supported. These include the common audio formats MP3, OGG, WAV, AAC, M4A and OPUS and the video formats MP4, MPEG, MOV and AVI, among many others.
For video files, the audio track is extracted and used for transcription and translation. If a format is not listed here, it is still worth trying the transcription.

Please note that no authentication/login can be performed when specifying a link (URL) to a file. This means that only services that allow files to be downloaded without authentication can be used. For example, Amazon S3 offers the option of creating a time-limited access link to a file, which can then be used to download and transcribe precisely this file.

Use of subtitles

Many video players support the VTT format and therefore allow subtitling of the video. For example, the VTT file can be selected in the VLC Media Player via the menu item “Subtitles” -> “Add subtitles”.

Embedding in HTML is also possible: the VTT file is referenced by a <track> element inside the <video> element.
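
A minimal sketch of such an embedding, written in Python so that the markup can be generated per video: the standard HTML `<track>` element inside `<video>` points to the VTT file. The function name `video_with_subtitles`, the file names and the MP4 type are assumptions for illustration only.

```python
from html import escape


def video_with_subtitles(video_url: str, vtt_url: str,
                         lang: str = "en", label: str = "English") -> str:
    """Build an HTML <video> element that loads a WebVTT subtitle track."""
    # Escape all values so that they are safe inside attribute quotes.
    video_url, vtt_url, lang, label = (
        escape(value, quote=True) for value in (video_url, vtt_url, lang, label)
    )
    return (
        "<video controls>\n"
        # The MIME type is an assumption; adjust it to the actual video format.
        f'  <source src="{video_url}" type="video/mp4">\n'
        # "default" switches the subtitle track on when the video loads.
        f'  <track kind="subtitles" src="{vtt_url}" '
        f'srclang="{lang}" label="{label}" default>\n'
        "</video>"
    )


# Hypothetical file names.
print(video_with_subtitles("lecture.mp4", "lecture.vtt"))
```

Browsers generally load the subtitle file only from the same origin as the page unless cross-origin access is explicitly configured, so the video page and the VTT file are best served from the same web server.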

Requirements and restrictions

Transcriptions can only be created by KIT employees and persons with a GuP (guest and partner) account.

Access to Transcribe AI is only possible from inside the KIT network; a VPN connection is required for remote access.

In rare cases, “hallucinations” may occur: Whisper makes up text in places where nothing was spoken, or “recognizes” incorrect text. Please check each transcription result for correctness.

We have no influence on the result of the translation. For data protection reasons, the texts are forwarded to the EU service eTranslation.
