For millions who can’t hear, lip reading offers a window into conversations that would be lost without it. But the practice is hard—and the results are often inaccurate.
copyright by www.sciencemag.org
Now, researchers are reporting a new artificial intelligence (AI) program that outperformed professional lip readers and the best AI to date, with just half the error rate of the previous best algorithm. If perfected and integrated into smart devices, the approach could put lip reading in the palm of everyone’s hands.
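To make "half the error rate" concrete: the passage above does not say how the error rate was measured, but transcription systems are commonly scored by word error rate, the number of word substitutions, deletions, and insertions needed to turn the system's output into the true transcript, divided by the length of the true transcript. The sketch below assumes that metric; the function name and the sample sentences are illustrative and do not come from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: the article does not name its metric; word error rate is
    used here only as a common way to score transcriptions.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    truth = "the cat sat on the mat"
    guess = "the cat sat on a mat"
    print(f"word error rate: {word_error_rate(truth, guess):.2f}")
```

Under this metric, halving the error rate means the system gets roughly half as many words wrong per word actually spoken.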
“It’s a fantastic piece of work,” says Helen Bear, a computer scientist at Queen Mary University of London who was not involved with the project.
Writing computer code that can read lips is maddeningly difficult. So in the new study scientists turned to a form of AI called machine learning, in which computers learn from data. They fed their system thousands of hours of videos along with transcripts, and had the computer solve the task for itself.
The researchers started with 140,000 hours of YouTube videos of people talking in diverse situations. Then, they designed a program that created clips a few seconds long with the mouth movement for each phoneme, or word sound, annotated. The program filtered out non-English
The process and the resulting data set—seven times larger than anything of its kind—are “important and valuable” for anyone else who wants to train similar systems to read lips, says Hassan Akbari, a computer scientist at Columbia University who was not involved in the research.
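The clip-building step described above, cutting long videos into clips a few seconds long with each phoneme annotated, can be sketched roughly as follows. This is not the researchers' program: the `Annotation` and `Clip` types, the `split_into_clips` function, and the 3-second limit are all assumptions made for illustration, and the sketch works on phoneme timestamps only, not on video frames.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical input format: (start_sec, end_sec, phoneme label).
Annotation = Tuple[float, float, str]


@dataclass
class Clip:
    start: float
    end: float
    phonemes: List[str]


def split_into_clips(annotations: List[Annotation],
                     max_seconds: float = 3.0) -> List[Clip]:
    """Group time-stamped phonemes into consecutive clips.

    A phoneme joins the current clip if the clip would still span at most
    max_seconds; otherwise it starts a new clip. A single phoneme longer
    than max_seconds ends up alone in an over-long clip.
    """
    clips: List[Clip] = []
    current: Optional[Clip] = None
    for start, end, phoneme in sorted(annotations):
        if current is not None and end - current.start <= max_seconds:
            current.end = max(current.end, end)
            current.phonemes.append(phoneme)
        else:
            if current is not None:
                clips.append(current)
            current = Clip(start, end, [phoneme])
    if current is not None:
        clips.append(current)
    return clips


if __name__ == "__main__":
    # Made-up timings for the sounds in "hello world".
    sounds = [(0.0, 0.5, "HH"), (0.5, 1.0, "AH"), (1.0, 2.5, "L"),
              (2.5, 3.5, "OW"), (3.5, 4.0, "W")]
    for clip in split_into_clips(sounds):
        print(clip)
```

Each resulting clip pairs a short stretch of video time with the sequence of sounds spoken in it, which is the kind of labeled example a learning system can train on.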
The process relies in part on neural networks,
After training, the researchers tested their system on 37 minutes of video it had not seen before. The