An audio and video transcription service that automatically creates and translates subtitles and summarizes text.
Customer
Industry
Entertainment
Region
Russia
Client since
2020
Our client translates movies, TV series and educational videos. Details about the client cannot be disclosed under the terms of the NDA.
Challenge
The amount of information people generate is growing exponentially, and the number of audio and video files is growing along with it. The customer performed most audio transcription, text translation and audio summarization manually or with primitive tools, which meant hiring more and more employees. Profits, however, barely grew: staffing costs kept rising while competitors cut prices by introducing AI systems into their business processes.
The customer decided to automate the transcription, translation and summarization of audio and turned to us. The task was to develop a cloud-based platform that transcribes speech to text with timings and allows manual editing where needed. The server also had to summarize the content of audio and video files and automatically generate an announcement that does not reveal the main plot. Finally, the service had to be integrated with the translator API specified by the customer so that transcribed audio could be translated automatically.
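As an illustration of what "text with timings" can look like in practice, here is a minimal sketch that renders timed transcript segments as SubRip (SRT) subtitles. The `Segment` class and the function names are hypothetical and were chosen for this example; the case study does not state which subtitle format or data model the platform uses.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed phrase with start/end times in seconds (hypothetical structure)."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT text, numbering cues from 1."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

With a structure like this, a manual correction amounts to changing the `text`, `start` or `end` of a single segment and rendering the subtitles again.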
Solution
Our team chose React for the frontend and FastAPI for the backend. To run the computations for the neural networks that convert voice to text, we decided to use a distributed computing network built on Ray. For voice-to-text conversion, we decided to take several freely available pre-trained neural networks whose licenses permit commercial use, compare them, and choose the one with the best quality-to-performance ratio. Additional training of the networks was allowed if necessary.
Our team designed a distributed computing architecture based on a Ray cluster. The architecture allowed the pre-trained neural networks to run on any computer that met the specified minimum requirements. This approach let the customer make flexible use of existing computing power and, if necessary, expand the network simply by installing the required software on a new computing node, which would then be added to the computing cluster automatically.
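One way such a comparison could be set up is sketched below. It assumes that quality is measured as word error rate (WER) against a reference transcript and that performance is measured as a real-time factor (processing seconds per second of audio). Both metrics, the selection rule, and the model names are assumptions made for this example; the case study does not say how the candidates were actually scored.

```python
from dataclasses import dataclass


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # previous[j] = edit distance between the reference words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing from hypothesis
                current[j - 1] + 1,      # extra word in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


@dataclass
class Candidate:
    """Measured results for one pre-trained model (hypothetical structure)."""
    name: str
    wer: float               # lower is better
    real_time_factor: float  # processing seconds per second of audio


def pick_model(candidates: list[Candidate], max_rtf: float) -> Candidate:
    """Return the most accurate candidate that is fast enough."""
    fast_enough = [c for c in candidates if c.real_time_factor <= max_rtf]
    if not fast_enough:
        raise ValueError("no candidate meets the speed requirement")
    return min(fast_enough, key=lambda c: c.wer)
```

For example, with hypothetical measurements `Candidate("model_a", 0.08, 1.4)`, `Candidate("model_b", 0.11, 0.3)` and `Candidate("model_c", 0.15, 0.1)`, calling `pick_model(..., max_rtf=0.5)` returns `model_b`: `model_a` is more accurate but too slow for the stated limit.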
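The general pattern of fanning work out over a Ray cluster can be sketched as follows. This is a minimal illustration, not the project's actual code: `split_into_chunks`, `transcribe_chunk` and `transcribe_distributed` are hypothetical names, the transcription step is a placeholder for the real model call, and the 30-second chunk size is an arbitrary assumption.

```python
def split_into_chunks(duration_s: float, chunk_s: float) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of at most chunk_s seconds."""
    if duration_s < 0 or chunk_s <= 0:
        raise ValueError("duration must be >= 0 and chunk size must be > 0")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def transcribe_chunk(path: str, window: tuple[float, float]) -> dict:
    """Placeholder for the speech-to-text model call (hypothetical)."""
    start, end = window
    return {"start": start, "end": end,
            "text": f"<transcript of {path} {start:.0f}-{end:.0f}s>"}


def transcribe_distributed(path: str, duration_s: float,
                           chunk_s: float = 30.0, address=None) -> list[dict]:
    """Run transcribe_chunk for every window as a Ray task and collect the results."""
    import ray  # imported lazily so the helpers above work without Ray installed

    # address=None starts a local Ray instance; address="auto" connects to a running cluster.
    ray.init(address=address, ignore_reinit_error=True)
    remote_transcribe = ray.remote(transcribe_chunk)
    refs = [remote_transcribe.remote(path, window)
            for window in split_into_chunks(duration_s, chunk_s)]
    # ray.get returns the results in the same order as the submitted tasks.
    return ray.get(refs)
```

In a standard Ray setup, an additional machine joins a running cluster with the `ray start --address=<head-node-address>` command, after which the scheduler can place tasks on it without changes to the calling code. The sketch also assumes that every node can read the audio file at `path`, for example from shared storage.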
Technologies
Languages
Python, JavaScript
Frontend
React, Material UI
Backend
FastAPI, Ray
ML
TensorFlow, Keras, Transformers
DB
PostgreSQL, Redis
Process
The project was managed with Scrum, an Agile framework, which made it possible to deliver a working prototype in the shortest possible time and then gradually extend its functionality to what the customer required.
Team
2
Backend developers
1
Frontend developer
1
DevOps
1
Project Manager
1
ML developer
Results
The platform allowed the customer to reduce staff by 20% and increase the amount of processed data by 200% within a year. These results are quite impressive. According to the customer, automating routine processes significantly increased their profits.