EVALUATION OF SEMANTIC SIMILARITY FOR SENTENCES IN NATURAL LANGUAGE BY MATHEMATICAL STATISTICS METHODS
Abstract
Subject of Research. The paper examines the structural organization of Wiktionary articles with a view to using Wiktionary as the basis for a semantic network. Wiktionary community reference materials, article templates, and article markup features are analyzed. The problem of numerically estimating the semantic similarity of structural elements in Wiktionary articles is considered. Existing software for estimating the semantic similarity of such elements is reviewed, the algorithms behind it are studied, and their advantages and disadvantages are shown. Methods. Methods of mathematical statistics were used to analyze the markup features of Wiktionary articles. A method for computing semantic similarity from statistical data on the compared structural elements is proposed. Main Results. We conclude that Wiktionary articles cannot be used directly as the source for a semantic network. We propose instead to find hidden similarity between article elements, and for that purpose we have developed an algorithm that calculates confidence coefficients for the claim that the sentences in each pair are semantically close. A study of the quantitative and qualitative characteristics of the developed algorithm has shown a major performance advantage over existing solutions, at the cost of a slightly higher error rate. Practical Relevance. The resulting algorithm may be useful in developing tools for automatic parsing of Wiktionary articles. The developed method could be used to compute semantic similarity for short natural-language text fragments in cases where performance requirements outweigh accuracy requirements.
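The abstract does not spell out the proposed algorithm, so the following is only an illustrative sketch of what a lightweight, statistics-based similarity score for two short sentences can look like. It uses a generic IDF-weighted cosine similarity over word counts, which is a stand-in technique and not the authors' method. The function names (`tokenize`, `idf_weights`, `similarity`) and the toy corpus are assumptions introduced for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def idf_weights(corpus):
    """Smoothed inverse document frequency for every word in the corpus.

    Hypothetical helper: the weighting scheme is an assumption, not taken
    from the paper.
    """
    n = len(corpus)
    doc_freq = Counter()
    for sentence in corpus:
        doc_freq.update(set(tokenize(sentence)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in doc_freq.items()}


def similarity(a, b, idf):
    """IDF-weighted cosine similarity of two sentences, in the range [0, 1]."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))

    def weight(w):
        # Words unseen in the corpus get a neutral weight of 1.0.
        return idf.get(w, 1.0)

    dot = sum(va[w] * vb[w] * weight(w) ** 2 for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum((c * weight(w)) ** 2 for w, c in va.items()))
    norm_b = math.sqrt(sum((c * weight(w)) ** 2 for w, c in vb.items()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus standing in for definition sentences from dictionary articles.
corpus = ["a small domestic cat", "a small domestic dog", "a large river"]
idf = idf_weights(corpus)
score = similarity(corpus[0], corpus[1], idf)
```

A score of this kind is cheap to compute because it needs only word counts, which matches the trade-off the abstract describes: speed is favored over accuracy, since purely lexical statistics miss synonyms and paraphrases.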