For example,Бобцов

QUICK SEARCH METHOD FOR NODES OF A SEMANTIC NETWORK BY EXACT WORD FORMS MATCHING

Annotation

Development and usage of ontologies is an important part of modern text analysis application. When huge amount of text is analyzed, the lookup time in ontology becomes critical bottleneck. An approach to creation of search prefix tree of wordforms’ parts called x-gram is proposed. Division of the wordform into 3- and 4-gramms as well as phonetic and morphological syllable for Russian language is used. Every x-gram is represented by numerical index that allows its storage in the plain array. Resulting arrays are very sparse, so approach uses compactification to “insert” one array into another. When looking for a word, it is split into x-grams, the index for every x-gram is computed, consequent lookup is performed in constructed arrays, where each array corresponds to a single level of prefix tree. The developed program demonstrates the advantage of 36—50 % over Google dense hash-map in seek time and 12 % over Google sparse hash-map in memory consumption for set of Russian wordforms extracted from wellknown grammatical dictionary and Russian National Corpus. This approach is well suited for dictionary search in rarely changing wordform sets, such as ontology based on Russian Wiktionary.

Keywords

Articles in current issue