Homograph recognition algorithm based on Euclidean metric
Abstract
Resolving homonymy-related ambiguity in the Chechen language has become especially pressing since the creation of speech synthesis systems. The main weakness of Chechen speech synthesizers is the misreading of homographs that differ only in vowel length, because vowel length is not marked in writing. Diphthongs cause similar problems, since they are written the same way as the monophthongs closest to them in sound. Improving the quality of synthesized Chechen speech therefore requires a program for automatic homograph recognition. To this end, the article treats the problem as one of word sense disambiguation (WSD). Algorithmic (supervised) methods based on a pre-annotated database were selected for the Chechen language. Such methods are the most common solutions for WSD, but they depend on large annotated corpora, which are unavailable for most of the world's languages, including Chechen. Chechen is a low-resource language, for which the most economical approach in terms of labor and time is a semi-supervised hybrid method of homograph recognition that combines algorithmic and statistical methods. The article presents the authors' algorithm for recognizing homographs from six adjacent words in a sentence. The method is implemented as a program. Preparing the input data for the algorithm involves manually labeling sentences with the senses of the homographs they contain. The program was evaluated using standard metrics, yielding an F1 score of 39 % and an accuracy of 45 %.
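The abstract does not spell out the algorithm's internals, so the following is only a minimal sketch of how a homograph could be disambiguated from six adjacent words with a Euclidean metric. It assumes a window of three words on each side, bag-of-words count vectors, and a nearest-centroid decision rule; the function names, the placeholder tokens, the homograph token `H`, and the sense labels `long` and `short` are all hypothetical and are not taken from the article.

```python
import math
from collections import Counter, defaultdict

WINDOW = 3  # assumption: three words on each side, six adjacent words in total


def context_words(tokens, index, window=WINDOW):
    """Return up to `window` words on each side of tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left + right


def train(labelled):
    """Build one mean context-count vector (centroid) per sense.

    `labelled` is an iterable of (tokens, homograph_index, sense) triples,
    i.e. manually labelled sentences.
    """
    sums = defaultdict(Counter)
    counts = Counter()
    for tokens, index, sense in labelled:
        sums[sense].update(context_words(tokens, index))
        counts[sense] += 1
    return {
        sense: {word: c / counts[sense] for word, c in vec.items()}
        for sense, vec in sums.items()
    }


def euclidean(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def predict(centroids, tokens, index):
    """Pick the sense whose centroid is closest to the context vector."""
    vec = Counter(context_words(tokens, index))
    return min(sorted(centroids), key=lambda s: euclidean(vec, centroids[s]))


# Placeholder data: "H" stands for the homograph, other tokens are dummies.
labelled = [
    ("a b c H d e f".split(), 3, "long"),
    ("a b x H d y f".split(), 3, "long"),
    ("p q r H s t u".split(), 3, "short"),
]
centroids = train(labelled)
guess = predict(centroids, "p q c H s t f".split(), 3)
```

In this toy setup the unseen sentence shares most of its context with the `short` example, so its vector lies nearer to that centroid. A real system would need far more labelled sentences per sense and possibly weighted or normalized features.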
A comparison with the results of other methods and models showed that the accuracy of the algorithm presented here is closest to that of algorithms based on the Lesk method. For English, the Lesk method achieved F1 scores of 41.1 % (simple Lesk) and 51.1 % (extended Lesk). Neural network methods achieve higher WSD accuracy for most languages; however, they require large data sets, which are not always available for low-resource languages, including Chechen.
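For readers who want to reproduce this kind of comparison, the sketch below shows one common way to compute accuracy and F1 from gold and predicted sense labels. The abstract does not state which F1 variant was used, so macro-averaging over senses is an assumption here, and the label values are placeholders rather than data from the article.

```python
def accuracy(gold, pred):
    """Share of homograph occurrences whose predicted sense matches the gold sense."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-sense F1 scores (assumed averaging scheme)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Placeholder labels for four occurrences of one homograph.
gold = ["long", "long", "short", "short"]
pred = ["long", "short", "short", "short"]
acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
```

Accuracy and F1 can diverge noticeably when senses are unevenly distributed, which is one reason both figures are usually reported side by side.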
Articles in current issue
- Structural and spectral properties of YAG:Nd, YAG:Ce and YAG:Yb nanocrystalline powders synthesized via modified Pechini method
- Computational prediction in the problem of stereo image identification
- Comparison of application results of two speckle methods for study multi-cycle fatigue of structural steel
- Laser-induced thermal effect on the electrical characteristics of photosensitive PbSe films
- An improved performance of RetinaNet model for hand-gun detection in custom dataset and real time surveillance video
- Solving the problem of preliminary partitioning of heterogeneous data into classes in conditions of limited volume
- Correction of single error bursts beyond the code correction capability using information sets
- A novel strategic trajectory-based protocol for enhancing efficiency in wireless sensor networks
- Automation of complex text CAPTCHA recognition using conditional generative adversarial networks
- Deep attention based Proto-oncogene prediction and Oncogene transition possibility detection using moments and position based amino acid features
- A method of storing vector data in compressed form using clustering
- Monocular depth estimation for 2D mapping of simulated environments
- Segmentation of muscle tissue in computed tomography images at the level of the L3 vertebra
- Providing operating modes for Coriolis vibration gyroscopes with low-Q resonators
- Collection and processing of environmental information in oil and gas production areas and solving other applied problems using active search methods (Review article)
- Using machine learning technologies to solve the problem of classifying infrasound background monitoring signals
- Study of the influence of the optical fiber output end shape on hydroacoustic processes in a liquid stimulated by microsecond pulses of Yb,Er:Glass laser radiation