NAKI - Historical CRo Archive

Speech-To-Text Technology to Transcribe and Disclose 100,000+ Hours of Bilingual Documents from Historical Czech and Czechoslovak Radio Archive

The main goal of this project was to develop a complex platform that can transcribe, index, and make searchable the historical archive of Czech and Czechoslovak Radio. The archive covers 90 years of public broadcasting and contains hundreds of thousands of audio documents. The modular platform employs our large-vocabulary continuous speech recognition (LVCSR) system, which has to cope with two related languages: Czech and Slovak. Furthermore, it must deal with audio files of varying quality (e.g. recordings originally stored on matrices or tapes, data passed through analog and digital telephone lines, speech recorded during parliament or court sessions, etc.). The system includes speaker and language identification modules, a narrow-band signal detector, a music/song detector, and several other components that enhance transcription accuracy and support multi-option search. We evaluate the performance on broadcast news test sets grouped by decade. We show that, after acoustic and language model adaptation, word error rate (WER) values are in the range of 8-14% and differ little from the 1960s to the present. We also report results achieved on other types of documents (e.g. talk shows, political debates, public speeches, etc.), where the WER is higher but still acceptable for most search tasks.
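
For readers unfamiliar with the WER figures quoted above, the following is a minimal sketch of the standard word error rate definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is an illustration only. The function name `word_error_rate` and the sample sentences are assumptions made for this example, and the page does not state which scoring tool or text normalization the project actually used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Illustrative helper (hypothetical name), not the project's scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up sample sentences: one substitution and one deletion
# against a seven-word reference.
reference = "the archive covers ninety years of broadcasting"
hypothesis = "the archive cover ninety years broadcasting"
print(f"WER = {word_error_rate(reference, hypothesis):.1%}")
```

Under this definition, a WER of 8-14% corresponds to roughly one word error in every seven to twelve reference words.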

More information:

2014

Nouza, J., Červa, P., Žďánský, J., Blavka, K., Boháč, M., Silovský, J., Chaloupka, J., Kuchařová, M., Šeps, L., Málek, J., Rott, M.: Speech-To-Text Technology to Transcribe and Disclose 100,000+ Hours of Bilingual Documents from Historical Czech and Czechoslovak Radio Archive, In: Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014), Singapore, pp. 964-968, ISSN 2308-457X, 2014
Chaloupka, J., Nouza, J., Málek, J., Silovský, J.: Phone Speech Detection and Recognition in the Task of Historical Radio Broadcast Transcription, In: Proc. of Telecommunications and Signal Processing (TSP) conference, Berlin, Germany, pp. 433 – 436, ISBN: 978-80-214-4983-1, ISSN 1805-5435, 2014
Silovský, J., Nouza, J., Kuchařová, M.: Search for speaker identity in historical oral archives, In: An International Journal Multimedia Tools and Applications, pp. 1-20, ISSN 1380-7501, July 2014, DOI 10.1007/s11042-014-2067-2
Boháč, M., Blavka, K.: Using Suprasegmental Information in Recognized Speech Punctuation Completion, In 17th International Conference, TSD 2014, Springer-Verlag Berlin Heidelberg, pp. 555-562, ISSN 0302-9743, ISBN 978-331910815-5, DOI: 10.1007/978-3-319-10816-2_50, 2014
Škodová, S., Kuchařová, M.: Mluvené slovo v pořadech Českého rozhlasu [Spoken Word in Czech Radio Programmes]. SALi, 2014, vol. 5, no. 1, pp. 141–143. ISSN 1804-3240. Peer-reviewed journal Studie z aplikované lingvistiky (SALi)
Kuchařová, M., Škodová, S., Šeps, L., Boháč, M.: Study on Phrases Used for Semi-Automatic Text-Based Speakers’ Names Extraction in the Czech Radio Broadcasts News. In: Text, Speech and Dialogue, Lecture Notes in Computer Science, Springer Verlag Berlin, Volume 8655, 2014, pp. 416-423. ISBN 978-3-319-10815-5. Online ISBN 978-3-319-10816-2. Series ISSN 0302-9743. SCOPUS ISI.
Škodová, S.: Použití interpunkce v automatických přepisech mluveného slova [Use of Punctuation in Automatic Transcripts of Spoken Word]. Didaktické studie, thematic issue Syntax v teorii a praxi jazykového vyučování, 2013, vol. 5, no. 2, pp. 99–111. ISSN 1804-1221. Peer-reviewed journal
Pacovská, J.: Диалог на языке тела и психосоматическая фразеология [Dialogue in Body Language and Psychosomatic Phraseology]. In: Journal of Psycholinguistics, Institute of Linguistics of the Russian Academy of Sciences / Moscow Institute of Linguistics, Moscow, 2 (18), 2013, pp. 46-53. ISSN 2077-5911. (Included in the VAK list of Russian peer-reviewed scientific journals, No. 641; Rospechat subscription index 37152)

2013

Nouza, J., Cerva, P., Silovsky, J.: Adding Controlled Amount of Noise to Improve Recognition of Compressed and Spectrally Distorted Speech, In International Conference on Acoustics, Speech, and Signal Processing Mobile App - ICASSP 2013, Vancouver, Canada, pp. 8046-8050, ISBN 978-1-4799-0356-6, 2013
Nouza, J., Cerva, P., Silovsky, J.: Dealing with Bilingualism in Automatic Transcription of Historical Archive of Czech Radio, In ICIAP 2013 - International Workshop on Multimedia for Cultural Heritage MM4CH, Springer-Verlag Berlin Heidelberg, Italy, pp. 238-246, ISBN 978-3-642-41189-2, 2013
Chaloupka, J., Nouza, J., Kucharova, M.: Using Different Types of Multimedia Resources to Train System for Automatic Transcription of Czech Historical Oral Archives, In ICIAP 2013 - International Workshop on Multimedia for Cultural Heritage MM4CH, Springer-Verlag Berlin Heidelberg, Italy, pp. 228-237, ISBN 978-3-642-41189-2, 2013
Chaloupka, J., Nouza, J., Cerva, P., Malek, J.: Downdating lexicon and language model for automatic transcription of Czech historical spoken documents, In 16th International Conference, TSD 2013, Springer-Verlag Berlin Heidelberg, pp. 201-208, ISSN 0302-9743, 2013
Kucharova, M., Skodova, S., Seps, L., Labus, V., Nouza, J., Bohac, M.: On the Quantitative and Qualitative Speech Changes of the Czech Radio Broadcast News within Years 1969-2005, In 16th International Conference, TSD 2013, Springer-Verlag Berlin Heidelberg, pp. 360-368, ISSN 0302-9743, 2013
Lábus, V.: Nisa, or Nysa? Acta Onomastica LIII, pp. 207-218, ISSN 1211-4413, 2013

2012

Nouza, J., Blavka, K., Bohac, M., Cerva, P., Zdansky, J., Silovsky, J. and Prazak, J.: Voice Technology to Enable Sophisticated Access to Historical Audio Archive of the Czech Radio. In: Proc. of Multimedia for Cultural Heritage, vol. 247, Springer, Berlin Heidelberg, ISBN 978-3-642-27977-5, ISSN 1865-0929, pp. 27-38, 2012
Nouza, J., Blavka, K., Cerva, P., Zdansky, J., Silovsky, J., Bohac, M. and Prazak, J.: Making Czech Historical Radio Archive Accessible and Searchable for Wide Public. In: Journal of Multimedia, vol. 7, no. 2, Academy Publisher, pp. 159 – 169, ISSN 1796-2048, 2012
Nouza, J., Blavka, K., Žďánský, J., Červa, P., Silovský, J., Boháč, M., Chaloupka, J., Kuchařová, M., Šeps, L.: Large-Scale Processing, Indexing and Search System for Czech Audio-Visual Cultural Heritage Archives. In: Proc. of IEEE conf. on Multimedia Signal Processing (MMSP), Banff, Canada, pp. 337-342, ISBN 978-146734572-9, 2012
Boháč, M., Blavka, K., Kuchařová, M., Škodová, S.: Post-processing of the Recognized Speech for Web Presentation of Large Audio Archive, In: Proc. of Telecommunications and Signal Processing (TSP) conference, Prague, pp. 441 – 445, ISBN: 978-1-4673-1117-5, 2012
Boháč, M., Nouza, J., Blavka, K.: Investigation on Most Frequent Errors in Large-Scale Speech Recognition Applications. In: Proc. of Text, Speech and Dialogue (TSD). Springer Verlag Berlin Heidelberg, Series LNCS 7499, pp. 520-527, ISBN 978-3-642-32789-6, ISSN 0302-9743, 2012
Silovský, J., Červa, P., Žďánský, J., Nouza, J.: Study on Integration of Speaker Diarization with Speaker Adaptive Speech Recognition for Broadcast Transcription. In: Proc. of Interspeech 2012, Portland, USA, 2012
Škodová, S., Kuchařová, M., Šeps, L.: Discretion of Speech Units for the Text Post-processing Phase of Automatic Transcription (in the Czech Language), In: Proc. of Text, Speech and Dialogue (TSD). Springer Verlag Berlin Heidelberg, Series LNCS 7499, pp. 446-455, ISBN 978-3-642-32789-6, ISSN 0302-9743, 2012
Lábus, V.: Atyp v cihle aneb O jednom progresivním způsobu neologizace. In: Naše řeč, no. 4., pp. 187-197, ISSN 0027-8203, 2012

2011

Bohac, M., Blavka, K.: Automatic segmentation and annotation of audio archive documents, In Proc. of 10th IEEE International Workshop on Electronics, Control, Measurement and Signals (ECMS 2011), June 1-3 2011, Liberec, Czech Republic, pp. 61 - 66, ISBN 978-1-61284-395-7, 2011
Nouza, J., Blavka, K., Bohac, M., Cerva, P., Zdansky, J., Silovsky, J., Prazak, J.: Voice technology to enable sophisticated access to historical Czech Radio audio archive, In Proc. of International Workshop on Multimedia for Cultural Heritage (MM4CH 2011), Springer-Verlag, volume CCIS 247, May 3 2011, Modena, Italy, pp. 27–38, 2011
Cerva, P., Palecek, K., Silovsky, J., Nouza, J.: Using Unsupervised Feature-Based Speaker Adaptation for Improved Transcription of Spoken Archives, In proc. of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy 2011, pp. 2565 - 2568, ISSN 1990-9772, 2011
Nouza, J., Blavka, K., Bohac, M., Kucharova, M., Zdansky, J., Seps, L., Prazak, J.: System for Transcribing and Accessing Historical Archive of Czech Radio. In Proc. of 5th Language & Technology conference (LTC 2011), Poznan, Poland, pp. 585, November 2011
