ProtoATT - a prototype system for automatic transcription of Czech broadcasts (2004)
Automatic transcription of spoken documents is a very computation-intensive task. State-of-the-art transcription systems employ a Viterbi decoder and Hidden Markov Models, and their transcription speed is strictly determined by:
• vocabulary size
• processor speed
• memory bandwidth
Czech is an inflective language, so a transcription system needs a large vocabulary to cover the spoken language well, and a large vocabulary directly reduces transcription speed. There are several ways to accelerate the transcription of a continuous multimedia stream without reducing the vocabulary size. The first is parallel Viterbi decoding, which is very hard to implement on current hardware. The second is temporal segmentation of the data, with the segments processed simultaneously by a cluster of computers. This work deals with the latter approach.
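The segmentation approach can be illustrated with a minimal sketch. It is not the ProtoATT implementation: the fixed segment length, the optional overlap, the thread pool standing in for cluster nodes, and the function names `split_into_segments`, `transcribe_segment` and `transcribe_parallel` are all assumptions made for illustration. A practical system would probably cut at pauses in speech rather than at fixed times, so that words are not split between segments.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(total_seconds, segment_seconds, overlap_seconds=0.0):
    """Return (start, end) time pairs that together cover the whole recording."""
    if segment_seconds <= overlap_seconds:
        raise ValueError("segment length must exceed the overlap")
    segments = []
    start = 0.0
    step = segment_seconds - overlap_seconds
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        segments.append((start, end))
        if end >= total_seconds:
            break  # the last segment already reaches the end of the recording
        start += step
    return segments


def transcribe_segment(segment):
    """Hypothetical stand-in for a recognizer running on one cluster node."""
    start, end = segment
    return f"[transcript of {start:.0f}s-{end:.0f}s]"


def transcribe_parallel(total_seconds, segment_seconds, workers=4):
    """Transcribe all segments concurrently and join the results in time order."""
    segments = split_into_segments(total_seconds, segment_seconds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the transcript stays chronological
        parts = list(pool.map(transcribe_segment, segments))
    return " ".join(parts)


if __name__ == "__main__":
    # A 10-minute broadcast cut into four 150-second segments
    print(transcribe_parallel(600, 150))
```

The point of the sketch is that the segments are independent of one another, so each can be decoded with the full vocabulary on a separate machine and the partial transcripts concatenated afterwards.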
Our system works with a vocabulary of 200,000 Czech words, and transcribing 10 minutes of broadcast takes 40 minutes on a 1.5 GHz PC. The recognition rate of our system is about 75% (80% to 85% in conditions ranging from noisy to clean).
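The timing figures above imply a real-time factor, which in turn gives a rough lower bound on the cluster size needed to keep up with a live stream. The calculation below is an idealized sketch: it assumes identical 1.5 GHz machines and ignores segmentation and communication overhead, and the helper names are hypothetical.

```python
import math


def real_time_factor(processing_minutes, audio_minutes):
    """Processing time divided by audio duration (1.0 means real time)."""
    return processing_minutes / audio_minutes


def min_nodes_for_real_time(rtf):
    """Smallest number of equal machines whose combined throughput reaches real time."""
    return math.ceil(rtf)


rtf = real_time_factor(40, 10)        # 40 minutes of computing for 10 minutes of audio
nodes = min_nodes_for_real_time(rtf)
print(rtf, nodes)                     # prints: 4.0 4
```

Under these assumptions, at least four such machines would be needed for the transcript to keep pace with the broadcast; any per-segment overhead would raise that number.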
More information: