ALGORITHM FOR AUTOMATIC SELECTION OF COLLOCATIONS FROM TEXT

Abstract

To improve the accuracy of an associative search system, an algorithm for the automatic selection of collocations from a corpus of natural-language texts is proposed. The algorithm computes an additive estimate for the bigrams (pairs of elements) of a text on the basis of a statistical approach and selects the most relevant bigrams using the Zipf distribution. Collocation extraction methods are analyzed on a random corpus of texts obtained from the Internet, using such association measures as the frequency of occurrence of bigrams in the text, the t-test, mutual information (MI) and χ², together with a grammatical filter and stop-word removal, followed by an evaluation of these measures. Applying the additive estimation method when constructing the Zipf distribution makes it possible to determine the region of correct collocations, which reduces the number of errors in the resulting collocation lists.
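The association measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name bigram_scores, the whitespace tokenization, and the use of the standard textbook forms of the t-score, pointwise MI and Pearson's χ² on a 2×2 contingency table are all assumptions made here for illustration. The grammatical filter, stop-word removal, additive estimation and Zipf-based selection are not shown, because the abstract does not specify them.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score adjacent word pairs with frequency, t-score, MI and chi-square.

    Hypothetical helper: the formulas are the common textbook variants,
    not necessarily the ones used in the paper.
    """
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)  # total number of bigram positions
    pair_counts = Counter(pairs)
    first_counts = Counter(w1 for w1, _ in pairs)    # w1 in first position
    second_counts = Counter(w2 for _, w2 in pairs)   # w2 in second position

    scores = {}
    for (w1, w2), o11 in pair_counts.items():
        c1 = first_counts[w1]
        c2 = second_counts[w2]
        expected = c1 * c2 / n  # expected count if w1 and w2 were independent

        # t-score, using the usual approximation variance ~ observed count
        t_score = (o11 - expected) / math.sqrt(o11)

        # pointwise mutual information in bits
        mi = math.log2(o11 / expected)

        # Pearson chi-square on the 2x2 contingency table
        o12 = c1 - o11            # w1 followed by something other than w2
        o21 = c2 - o11            # w2 preceded by something other than w1
        o22 = n - c1 - c2 + o11   # neither w1 first nor w2 second
        denom = c1 * c2 * (n - c1) * (n - c2)
        chi2 = n * (o11 * o22 - o12 * o21) ** 2 / denom if denom else 0.0

        scores[(w1, w2)] = {"freq": o11, "t": t_score, "mi": mi, "chi2": chi2}
    return scores


if __name__ == "__main__":
    text = "new york is big and new york is old but the city is big"
    ranked = sorted(bigram_scores(text.split()).items(),
                    key=lambda item: item[1]["t"], reverse=True)
    for bigram, measures in ranked[:3]:
        print(bigram, measures)
```

Counting the first word only in first position and the second word only in second position keeps every cell of the contingency table non-negative, which is why the sketch does not reuse plain unigram counts.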

Keywords
