Converting speech to text is a difficult technological problem, especially if you can't train the speech recognition software. Here's a video that illustrates how YouTube's audio transcription works for novels (also check the original video):
The results are terrible, but you should take into account that auto-captioning works best for speeches. There are many hilarious mistakes: "George Orwell" is recognized as "but it wasn't", "Lolita" is converted to "don't think so", "the hobbit" is recognized as "the hall", while "cold day" is converted to "cocaine".
And if that's not enough, try enabling auto-captioning for the video embedded above. "This goes on a infinite loop... the transcribe audio function applied to this version transforms entire non-sense phrases into single words," comments RequiemPipes.
{ Thanks, Richard. }
Speech recognition is bad because, for the last 40+ years, researchers have been trying a brute-force approach. It's gotten a little better as computers have gotten faster chips and more memory. While an untrained system is usable for simple voice commands or a small vocabulary, and a trained system for more extensive use, handling normal speech is still terrible. The long-awaited machine translation introduces further complications and is in even worse shape.
I wonder if Google just made a poor choice of test material:
http://www.youtube.com/watch?v=M2iD-oNqD_I