ProtoATT - prototype of a system for automatic transcription of Czech broadcasts (2004)

Automatic transcription of spoken documents is a computationally very demanding task. State-of-the-art transcription systems employ a Viterbi decoder and Hidden Markov Models (HMMs), and their transcription speed is determined primarily by:

• vocabulary size
• processor speed
• memory bandwidth
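To illustrate why decoding cost grows with the size of the model, here is a minimal Viterbi decoder for a toy discrete HMM. It is a generic textbook sketch, not the ProtoATT decoder: a broadcast recognizer works with continuous acoustic features, word models built from phone HMMs and a language model, and typically prunes its search with beam thresholds. The function name `viterbi` and its argument layout are our own assumptions. The key step is the maximization over all predecessor states, which costs O(T·N²) for T frames and N states; in a large-vocabulary recognizer the number of states grows with the number of words.

```python
import math

NEG_INF = float("-inf")


def viterbi(init, trans, emit, observations):
    """Return the most likely state sequence and its log-probability.

    init[i]     -- P(state i at the first frame)
    trans[i][j] -- P(state j at t+1 | state i at t)
    emit[i][o]  -- P(observation symbol o | state i)
    The nested loops cost O(T * N^2) for T frames and N states.
    """
    def log(p):
        return math.log(p) if p > 0 else NEG_INF

    n = len(init)
    # score[j]: best log-probability of any path ending in state j at the current frame
    score = [log(init[j]) + log(emit[j][observations[0]]) for j in range(n)]
    backpointers = []
    for obs in observations[1:]:
        new_score, ptr = [], []
        for j in range(n):
            # Best predecessor i for state j (the "max" step of the Viterbi recursion)
            best_i = max(range(n), key=lambda i: score[i] + log(trans[i][j]))
            new_score.append(score[best_i] + log(trans[best_i][j]) + log(emit[j][obs]))
            ptr.append(best_i)
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final state
    state = max(range(n), key=lambda j: score[j])
    best_log_prob = score[state]
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_log_prob
```

For example, with initial probabilities [0.6, 0.4], transitions [[0.7, 0.3], [0.4, 0.6]] and emissions [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]], decoding the observations [0, 1, 2] yields the state path [0, 0, 1] with probability 0.01512.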

Czech is an inflective language, so a transcription system needs a large vocabulary to cover spoken language well, and a larger vocabulary directly decreases transcription speed. There are several ways to accelerate the transcription of a continuous multimedia stream without reducing the vocabulary size. The first is parallel Viterbi decoding, which is very hard to implement on current hardware. The second is temporal segmentation of the data and simultaneous processing of the segments by a cluster of computers. This work deals with the latter approach.
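The segmentation approach can be sketched as follows, under stated assumptions: the positions of pauses in the audio are assumed to be known already (for example from a speech/silence detector), `transcribe_segment` is a hypothetical placeholder for running the recognizer on one cluster node, and a thread pool stands in for dispatching work to remote machines and waiting for their results. All function names are illustrative, not taken from ProtoATT. Cutting at pauses rather than at fixed intervals keeps a word from being split between two nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_at_pauses(pause_times, total_seconds, max_segment_seconds):
    """Cut [0, total_seconds] into segments of at most max_segment_seconds.

    pause_times: times (s) of detected pauses; cutting there avoids splitting
    a word between two nodes. If no pause fits, a hard cut is made.
    """
    if max_segment_seconds <= 0:
        raise ValueError("max_segment_seconds must be positive")
    segments = []
    start = 0.0
    while total_seconds - start > max_segment_seconds:
        limit = start + max_segment_seconds
        # Latest pause that keeps this segment within the length limit
        candidates = [p for p in pause_times if start < p <= limit]
        cut = max(candidates) if candidates else limit
        segments.append((start, cut))
        start = cut
    segments.append((start, total_seconds))
    return segments


def transcribe_segment(node_id, segment):
    """Hypothetical placeholder: a real system would send the audio segment
    to cluster node `node_id`, run the recognizer there and return its text."""
    start, end = segment
    return f"[node {node_id}: {start:.0f}-{end:.0f} s]"


def transcribe_in_parallel(segments, n_nodes):
    """Process all segments simultaneously and join the transcripts in time order."""
    # Round-robin assignment of segments to nodes
    node_ids = [i % n_nodes for i in range(len(segments))]
    # Threads stand in for waiting on remote nodes; map() keeps input order
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        transcripts = list(pool.map(transcribe_segment, node_ids, segments))
    return " ".join(transcripts)
```

With pauses at 100, 215, 330, 445 and 560 s in a 600 s recording and a 120 s limit, `split_at_pauses` returns six segments, (0, 100), (100, 215), (215, 330), (330, 445), (445, 560) and (560, 600), which `transcribe_in_parallel` spreads round-robin over the nodes and joins back in their original order.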

Our system works with a vocabulary of 200,000 Czech words, and transcribing 10 minutes of broadcast takes 40 minutes on a 1.5 GHz PC. Its recognition rate is about 75% (80 to 85%, depending on whether conditions are noisy or clean).
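The timing figure above corresponds to a real-time factor (RTF) of 4. As a rough, idealized estimate, assuming equally long segments, nodes as fast as the 1.5 GHz PC, and no segmentation, transfer or merging overhead (none of which the figures above quantify), distributing the work over N cluster nodes divides the wall-clock time by N, so keeping pace with a continuous stream would need at least four such nodes:

```latex
\mathrm{RTF} = \frac{t_{\mathrm{proc}}}{t_{\mathrm{audio}}}
             = \frac{40\ \mathrm{min}}{10\ \mathrm{min}} = 4,
\qquad
t_{\mathrm{wall}} \approx \frac{\mathrm{RTF}\cdot t_{\mathrm{audio}}}{N},
\qquad
t_{\mathrm{wall}} \le t_{\mathrm{audio}} \iff N \ge \mathrm{RTF} = 4
```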

More information:

NOUZA, J., NEJEDLOVÁ, D., ŽĎÁNSKÝ, J., KOLORENČ, J.: Very Large Vocabulary Speech Recognition System for Automatic Transcription of Czech Broadcast. In: Proc. of ICSLP 2004, October 2004, Jeju Island, Korea, pp. 409-412, ISSN 1225-441X
ŽĎÁNSKÝ, J., DAVID, P., NOUZA, J.: An Improved Preprocessor for the Automatic Transcription of Broadcast News Audio Stream. In: Proc. of ICSLP 2004, October 2004, Jeju Island, Korea, pp. 1065-1068, ISSN 1225-441X
