
Voice dictation to a computer (2004)

In 2003, we presented a prototype of the first voice dictation system for Czech to the professional public. Its limitation was that text had to be dictated word by word, with a short pause between words. On the other hand, the system worked with a dictionary of the 400,000 most common words and word forms, which covers almost 99% of the vocabulary of the Czech language. The system also supported voice-controlled text formatting and correction of misrecognized words. In 2004, the system was extended further, especially in vocabulary size (600,000 words), speed, and recognition accuracy, which now stands at around 90-93%.
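
A coverage figure like the one above can be illustrated with a small sketch. It assumes coverage is measured as the share of running words in a held-out text that appear in a dictionary built from the most frequent word forms of a training text; the page does not say how the 99% figure was obtained, and the function names and toy sentences below are hypothetical.

```python
from collections import Counter


def build_vocabulary(tokens, size):
    """Return the `size` most frequent word forms found in `tokens`."""
    counts = Counter(tokens)
    return {word for word, _ in counts.most_common(size)}


def coverage(tokens, vocabulary):
    """Fraction of running words in `tokens` that are in `vocabulary`."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in vocabulary)
    return hits / len(tokens)


# Toy data (assumption: whitespace tokenization is good enough here).
train_tokens = "pes běží a kočka běží a pes spí".split()
test_tokens = "kočka spí a pták letí".split()

vocabulary = build_vocabulary(train_tokens, 3)
test_coverage = coverage(test_tokens, vocabulary)
```

With a real corpus, the dictionary size would be raised until the coverage on unseen text stops improving noticeably, which is one plausible reason for moving from 400,000 to 600,000 word forms.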

In 2004, a working prototype was also introduced that allows text to be dictated in fluent speech, i.e. in whole sentences. Because continuous speech recognition is a much more difficult and computationally demanding task, current computer technology allows the system to work with a dictionary of at most 100,000 words (on a computer with a 2.5 GHz processor). Recognition accuracy is around 80-90% of correctly recognized words, depending on the type of text. Recognition relies on a complex acoustic model and language model. To build them, about 40 hours of speech recordings from about 300 people were used, together with a large volume of electronic texts from which the optimal composition of the dictionary and the relationships between individual words were derived.
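
The share of correctly recognized words can be computed by aligning the recognizer output with the dictated text. The following is a minimal sketch, assuming the usual minimum-edit-distance alignment over words and counting the reference words that are matched; the page does not describe the exact scoring procedure, and the function names and the example sentence are hypothetical.

```python
def count_correct(reference, hypothesis):
    """Number of reference words matched in a minimum-edit-distance alignment."""
    # Each cell holds (edit_cost, -hits), so min() prefers the lowest cost
    # and, among equal costs, the alignment with the most matched words.
    prev = [(j, 0) for j in range(len(hypothesis) + 1)]
    for i, ref_word in enumerate(reference, start=1):
        curr = [(i, 0)]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost, neg_hits = prev[j - 1]
            if ref_word == hyp_word:
                diagonal = (cost, neg_hits - 1)  # match
            else:
                diagonal = (cost + 1, neg_hits)  # substitution
            deletion = (prev[j][0] + 1, prev[j][1])  # reference word missed
            insertion = (curr[j - 1][0] + 1, curr[j - 1][1])  # extra word
            curr.append(min(diagonal, deletion, insertion))
        prev = curr
    return -prev[-1][1]


def percent_correct(reference, hypothesis):
    """Percentage of reference words that were recognized correctly."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    return 100.0 * count_correct(reference, hypothesis) / len(reference)


# Toy example: one substituted word and one missing word.
reference = "the system recognizes fluent speech well".split()
hypothesis = "the system recognized fluent speech".split()
score = percent_correct(reference, hypothesis)
```

Here four of the six reference words are matched, so the score is about 67%. This measure ignores inserted words; a stricter word accuracy would also subtract insertions.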

Both systems can be used immediately by anyone. An individual user can also adapt the system (using several tens of words or sentences), which further improves its accuracy.

Example of continuous dictation

[Image: diktat.jpg]

Article from MF Dnes, 29.10.2004: "Klaus: Let's face the past". The system worked with a 73,000-word dictionary on a 2.6 GHz Pentium PC (response time of 1 s). The system also recorded the individual sentences as audio:

[first sentence] [second sentence] [third sentence] [fourth sentence] [fifth sentence] [sixth sentence] [seventh sentence] [eighth sentence]
