An unofficial blog that has been watching Google's attempts to move your operating system online since 2005. Not affiliated with Google.

April 26, 2010

YouTube Auto-Captioning for Classic Novels

YouTube's auto-captioning feature is impressive, even if the results are sometimes hilarious. "Auto-captioning combines some of the speech-to-text algorithms found in Google's Voice Search to automatically generate video captions when requested by a viewer. The video owner can also download the auto-generated captions, improve them, and upload the new version."

Converting speech to text is a difficult technological problem, especially if you can't train the speech recognition software. Here's a video that illustrates how YouTube's audio transcription performs on novels (also check the original video):

The results are terrible, but keep in mind that auto-captioning works best for speeches. There are many hilarious mistakes: "George Orwell" is recognized as "but it wasn't", "Lolita" is converted to "don't think so", "the hobbit" is recognized as "the hall", while "cold day" is converted to "cocaine".

And if that's not enough, try enabling auto-captioning for the video embedded above. "This goes on a infinite loop... the transcribe audio function applied to this version transforms entire non-sense phrases into single words," comments RequiemPipes.

{ Thanks, Richard. }


  1. Speech recognition is bad because for the last 40+ years, researchers have been trying a brute-force approach. It has gotten a little better as computers have gained faster chips and more memory. While an untrained system is usable for simple voice commands or a limited vocabulary, and a trained system for more extensive use, handling normal speech is still terrible. The long-awaited machine translation introduces further complications and is in even worse shape.

  2. I wonder if Google just made a poor choice of test material:

