![Scientific and technical journal «Priborostroenie»](/images/mag-pr.png)
QUICK SEARCH METHOD FOR NODES OF A SEMANTIC NETWORK BY EXACT WORD FORMS MATCHING
Abstract
Developing and using ontologies is an important part of modern text analysis applications. When a huge amount of text is analyzed, lookup time in the ontology becomes a critical bottleneck. An approach is proposed that builds a search prefix tree over parts of word forms, called x-grams. Word forms are divided into 3-grams and 4-grams, as well as into phonetic and morphological syllables for the Russian language. Every x-gram is represented by a numerical index, which allows it to be stored in a plain array. The resulting arrays are very sparse, so the approach uses compaction to “insert” one array into another. To look up a word, it is split into x-grams, the index of each x-gram is computed, and successive lookups are performed in the constructed arrays, where each array corresponds to a single level of the prefix tree. The developed program demonstrates an advantage of 36 to 50% in seek time over the Google dense hash map and of 12% in memory consumption over the Google sparse hash map, for a set of Russian word forms extracted from a well-known grammatical dictionary and the Russian National Corpus. The approach is well suited to dictionary search over rarely changing sets of word forms, such as an ontology based on the Russian Wiktionary.
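The lookup scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes fixed-size 3-character chunks as x-grams (the syllable-based splits are not modeled), and it packs the sparse per-node arrays of each tree level into one shared array using first-fit row displacement with an owner check, which is a stand-in for the compaction step that the abstract does not specify in detail. The names `split_xgrams` and `PackedTrie` are hypothetical.

```python
from collections import defaultdict


def split_xgrams(word, n=3):
    """Cut a word form into consecutive chunks of n characters (the last may be shorter)."""
    return [word[i:i + n] for i in range(0, len(word), n)]


class PackedTrie:
    """Prefix tree over x-gram indices with one packed array per tree level (illustrative)."""

    def __init__(self, words, n=3):
        self.n = n
        self.gram_ids = {}            # x-gram -> numeric index
        children = [{}]               # node id -> {x-gram index: child node id}
        depth = [0]                   # node id -> tree level of the node
        self.final = [False]          # node id -> True if a word form ends here

        # Step 1: build an ordinary trie keyed by x-gram indices.
        for word in words:
            node = 0
            for gram in split_xgrams(word, n):
                gid = self.gram_ids.setdefault(gram, len(self.gram_ids))
                if gid not in children[node]:
                    children.append({})
                    depth.append(depth[node] + 1)
                    self.final.append(False)
                    children[node][gid] = len(children) - 1
                node = children[node][gid]
            self.final[node] = True

        # Step 2: overlay the sparse rows of each level into one shared array.
        # Each slot holds (owner node, child node); the owner resolves collisions.
        self.base = [0] * len(children)
        self.levels = defaultdict(list)
        for node, row in enumerate(children):
            if not row:
                continue
            slots = self.levels[depth[node]]
            offset = 0
            # First fit: shift the row until none of its keys hits an occupied slot.
            while any(offset + k < len(slots) and slots[offset + k] is not None
                      for k in row):
                offset += 1
            slots.extend([None] * (offset + max(row) + 1 - len(slots)))
            for k, child in row.items():
                slots[offset + k] = (node, child)
            self.base[node] = offset

    def __contains__(self, word):
        node = 0
        for level, gram in enumerate(split_xgrams(word, self.n)):
            gid = self.gram_ids.get(gram)
            if gid is None:
                return False          # x-gram never seen in the dictionary
            slots = self.levels.get(level, [])
            pos = self.base[node] + gid
            if pos >= len(slots) or slots[pos] is None or slots[pos][0] != node:
                return False          # slot is empty or belongs to another node
            node = slots[pos][1]
        return self.final[node]


trie = PackedTrie(["кошка", "кошки", "кот", "дом"])
print("кошка" in trie, "кош" in trie, "котка" in trie)
```

Each lookup costs one dictionary access and one array access per x-gram, so a word of length m needs about m/3 steps with this chunk size. Because rebuilding the packed arrays is expensive, a layout of this kind fits the rarely changing word form sets mentioned in the abstract.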
Articles in current issue
- INFOLOGICAL MODELING METHOD IN KNOWLEDGE ENGINEERING FOR SOLUTION OF COMPUTER-AIDED DESIGN PROBLEMS
- QUICK SEARCH METHOD FOR NODES OF A SEMANTIC NETWORK BY EXACT WORD FORMS MATCHING
- IMPLEMENTATION OF SANDBOX METHOD FOR POTENTIALLY MALICIOUS APPLICATIONS
- RESERVED SERVICE OF REQUESTS, CRITICAL TO WAITING DELAYS, IN TWO-LEVEL SYSTEMS
- ESTIMATION OF ROUTER STRUCTURAL PARAMETERS UNDER PRIORITY MANAGEMENT OF HETEROGENEOUS TRAFFIC WITH ARBITRARY DISTRIBUTION OF PACKET LENGTHS
- PROBABILITY DISTRIBUTION FOR THE TIME INTERVAL BETWEEN PACKETS IN CORPORATE COMPUTER NETWORK
- THE PROBLEM OF FALSE SPLITTING OF CPU CACHE MEMORY STRINGS IN MULTIPROCESSOR SYSTEMS
- AN APPROACH TO DESIGN OF FPGA-BASED SYSTEMS FOR STREAM DATA PROCESSING WITH CAPABILITY OF COMBINED DEBUGGING
- USING CYCLIC CORRECTIVE CODES IN RECURRENT CODE SCALES
- DESIGN OF COMPUTER MICROARCHITECTURE BASING ON PROBLEM-ORIENTED LANGUAGES
- ESTIMATING GEOMETRICAL PARAMETERS OF FLYING VEHICLE BY TRACKING KEY FEATURES OF THE VIDEO STREAM
- DESIGN OF COMPUTING PLATFORM FOR CYBER-PHYSICAL SYSTEMS