Service for transcribing audio and video

An audio and video transcription service that automatically creates and translates subtitles and summarizes the transcribed text.

Customer

Industry

Entertainment

Region

Russia

Client since

2020

Our client translates movies, TV series, and educational videos. Details about the client cannot be disclosed under the terms of the NDA.

Challenge

The volume of information people generate grows exponentially, and the number of audio and video files grows with it. The customer performed most transcription, translation, and summarization work manually or with primitive tools, which meant hiring more and more employees. Profits, however, barely grew: staffing costs kept rising while competitors cut prices by introducing AI systems into their business processes.

The customer decided to automate the transcription, translation, and summarization of audio and turned to us. We needed to develop a cloud-based platform that transcribes audio into text with timings and allows manual editing where necessary. The service also had to summarize the content of audio and video files and automatically generate an announcement that does not reveal the main plot. Finally, the service had to be integrated with the translator API specified by the customer so that transcribed audio could be translated automatically.
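As an illustration of what "text with timings" that stays manually editable could look like, the sketch below stores each transcribed phrase as a segment with start and end times and renders the list as SubRip (SRT) subtitles. The `Segment` class, the function names, and the choice of SRT as the output format are assumptions made for this example; the platform's actual data model and subtitle format are not described here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed phrase with its timing (hypothetical structure)."""
    start: float  # seconds from the beginning of the media file
    end: float
    text: str     # plain field, so an editor can correct it by hand


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text}"
        )
    return "\n\n".join(cues) + "\n"


segments = [
    Segment(0.0, 2.5, "Helo and welcome."),
    Segment(2.5, 5.0, "Today we talk about subtitles."),
]
# Manual correction of a recognition error before export.
segments[0].text = "Hello and welcome."
print(to_srt(segments))
```

Keeping the text separate from the timing means an editor can fix a misrecognized word without touching the synchronization, and a translated version can reuse the same timings.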

Solution

Our team chose React for the frontend and FastAPI for the backend. To run the computations for the neural networks that convert voice to text, we decided to use a distributed computing network built on Ray. For voice-to-text conversion, we decided to take several freely available pre-trained neural networks whose licenses permit commercial use, compare them, and choose the one with the best quality-to-performance ratio. Further training of the networks was permitted if necessary.
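One common way to compare candidate speech-to-text models is to transcribe the same reference recording with each, measure the word error rate (WER) against a known-correct transcript, and weigh it against processing speed. The sketch below does that with made-up model names and numbers. The choice of WER and real-time factor as metrics, the scoring formula with its `speed_weight` parameter, and all identifiers are assumptions; the criteria actually used on the project are not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # previous[j] = edit distance between the ref words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(substitution, previous[j] + 1, current[j - 1] + 1))
        previous = current
    return previous[-1] / len(ref)


def pick_best(results: dict[str, tuple[float, float]], speed_weight: float = 0.1) -> str:
    """Return the model name with the lowest combined cost.

    results maps model name -> (WER, real-time factor), where the real-time
    factor is processing time divided by audio duration (lower is faster).
    """
    def cost(name: str) -> float:
        wer, rtf = results[name]
        return wer + speed_weight * rtf

    return min(results, key=cost)


reference = "the quick brown fox jumps over the lazy dog"
# Hypothetical model outputs and real-time factors, for illustration only.
candidates = {
    "model_a": ("the quick brown fox jumps over the lazy dog", 0.9),
    "model_b": ("the quick brown fox jumped over a lazy dog", 0.2),
}
scores = {
    name: (word_error_rate(reference, text), rtf)
    for name, (text, rtf) in candidates.items()
}
print(scores)
print("selected:", pick_best(scores))
```

Raising `speed_weight` favors faster models at the expense of accuracy, so the value would need to be tuned to the hardware budget and the amount of manual correction that is acceptable.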

Our team designed a distributed computing architecture based on a Ray cluster. It allows the pre-trained neural networks to run on any computer that meets the specified minimum requirements. This lets the customer make flexible use of existing computing power and, when necessary, expand the network simply by installing the required software on a new compute node, which is then added to the computing cluster automatically.
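The fan-out pattern that such an architecture relies on can be sketched with Ray's task API: a long recording is cut into chunks, each chunk is submitted as a remote task, and Ray schedules the tasks on whichever nodes currently belong to the cluster. The chunk length, the function names, and the placeholder body of `transcribe_chunk` are hypothetical; the real model call and the project's cluster configuration are not shown here. Ray is imported lazily so that the helper functions can be run without it installed.

```python
from typing import Optional


def split_into_chunks(duration: float, chunk_len: float = 30.0) -> list[tuple[float, float]]:
    """Cut [0, duration) into consecutive (start, end) windows in seconds."""
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_len, duration)
        chunks.append((start, end))
        start = end
    return chunks


def transcribe_chunk(path: str, window: tuple[float, float]) -> dict:
    """Placeholder for the speech-to-text model call (hypothetical)."""
    start, end = window
    return {"start": start, "end": end, "text": f"<text of {path} {start:.0f}-{end:.0f}s>"}


def transcribe_on_cluster(path: str, duration: float, address: Optional[str] = None) -> list[dict]:
    """Run one Ray task per chunk and return the results in chunk order."""
    import ray  # lazy import; requires `pip install ray`

    # address=None starts a local Ray instance; address="auto" connects to an
    # already running cluster. Extra machines join a running cluster with the
    # `ray start --address=<head-node-address>` command.
    ray.init(address=address, ignore_reinit_error=True)
    remote_transcribe = ray.remote(transcribe_chunk)
    futures = [
        remote_transcribe.remote(path, window)
        for window in split_into_chunks(duration)
    ]
    # ray.get on a list returns results in the same order as the futures.
    return ray.get(futures)


print(split_into_chunks(70.0))
```

With Ray installed, `transcribe_on_cluster("episode.wav", 70.0)` would run the three chunks as parallel tasks; the calling code stays the same whether the cluster has one node or many.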

Technologies

Languages

Python, JavaScript

Frontend

React, Material UI

Backend

FastAPI, Ray

ML

TensorFlow, Keras, Transformers

DB

PostgreSQL, Redis

Process

The project was managed with Scrum, an Agile framework, which allowed us to deliver a working prototype in the shortest possible time and then gradually extend the functionality to the level the customer required.

Team

2

Backend developers

1

Frontend developer

1

DevOps

1

Project Manager

1

ML developer

Results

Within a year, the platform allowed the customer to reduce staff by 20% and increase the amount of processed data by 200%. According to the customer, automating routine processes significantly increased profits.