
COMPARATIVE STUDY OF NEURAL NETWORK ARCHITECTURES FOR AN END-TO-END SPEECH RECOGNITION SYSTEM

Abstract

The paper addresses the problem of improving the architecture of an end-to-end neural network model for Russian speech recognition. The model under consideration combines an attention-based encoder-decoder model with a model based on connectionist temporal classification (CTC). The use of Highway Network layers, residual connections, and dense connections within the end-to-end model is studied. In addition, replacing the softmax activation function with the Gumbel-softmax function during decoding is investigated. The models are pre-trained on English as a non-target language by means of transfer learning and then trained on a small corpus of continuous Russian speech with a duration of 60 hours. The developed models demonstrate higher speech recognition accuracy than the baseline end-to-end model. Results of experiments on continuous Russian speech recognition are presented: the best model achieves a character error rate (share of incorrectly recognized characters) of 10.8% and a word error rate (share of incorrectly recognized words) of 29.1%.
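
The abstract does not say how the Gumbel-softmax function is applied at decoding time, so the following is only a minimal sketch of the standard Gumbel-softmax relaxation. It is not the authors' implementation. The function names, the temperature value, and the three example scores are all assumptions made for illustration. The sketch adds Gumbel(0, 1) noise to the decoder scores, divides by a temperature, and then applies an ordinary softmax.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw one relaxed sample: softmax((logits + Gumbel noise) / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    noisy = []
    for x in logits:
        # Gumbel(0, 1) noise is -log(-log(u)) with u uniform on (0, 1);
        # u is clamped away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((x + gumbel_noise) / temperature)
    return softmax(noisy)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, used only for repeatability
    scores = [2.0, 1.0, 0.1]  # hypothetical decoder scores for three characters
    print("softmax:       ", softmax(scores))
    print("gumbel-softmax:", gumbel_softmax(scores, temperature=0.5, rng=rng))
```

Lower temperatures push the sample closer to a one-hot vector, while higher temperatures make it closer to uniform. The value 0.5 used here is arbitrary and is not taken from the paper.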

Keywords
