AI-powered YouTube Stories

  • Ismriti Gupta
  • Oct 7, 2020
  • 1 min read


For short-form video creators, Google has launched Looking-to-Listen, a new audiovisual speech enhancement feature in YouTube Stories, currently available on iOS devices. Leveraging AI and machine learning, it lets creators record better selfie videos by automatically boosting their voices and reducing background noise.



While tremendous effort is invested in improving the quality of video captured with smartphone cameras, the quality of the audio in those videos is often overlooked. The 'enhance speech' feature is based on machine learning (ML) technology that uses both audio and visual signals to distinguish the voices of people in a video from one another and from background noise. To keep the result sounding natural, the feature fuses the enhanced speech with just 10% of the original background noise. YouTube Stories users can access the feature from the volume controls editing tool. After a video is recorded, its audio and visual features are processed by the speech separation model to produce the enhanced speech, and creators can then compare the original video with the enhanced version.
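The final blending step described above can be sketched in a few lines. This is a minimal illustration only, assuming both signals are float waveform arrays of the same length; the function name, signature, and the linear-mix formula are assumptions for illustration, not Google's actual implementation:

```python
import numpy as np

def blend_enhanced_audio(original: np.ndarray,
                         enhanced: np.ndarray,
                         residual: float = 0.10) -> np.ndarray:
    """Fuse enhanced speech with a small fraction of the original audio.

    Keeping ~10% of the original signal (hypothetical linear mix) retains a
    hint of ambient sound so the result does not feel unnaturally sterile.
    """
    return (1.0 - residual) * enhanced + residual * original

# Example: a silent "enhanced" track mixed with 10% of the original
original = np.ones(4, dtype=np.float32)   # stand-in original waveform
enhanced = np.zeros(4, dtype=np.float32)  # stand-in enhanced waveform
mixed = blend_enhanced_audio(original, enhanced)  # each sample -> 0.1
```

A simple linear crossfade like this is a common way to reintroduce a controlled amount of ambience after aggressive noise suppression.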



With various architectural optimizations and improvements, the team reduced Looking-to-Listen's running time from 10× real-time on a desktop to 0.5× real-time using only an iPhone processor. As a result, enhanced speech is available within seconds after a YouTube Stories recording finishes.


The search giant trained the ML model on a large collection of online videos to capture connections between speech and visual signals such as mouth movements and facial expressions, and tested the technology's performance across different recording conditions and on people with different appearances and voices.




©2020 by UNIFY.