AUTOMATIC SUMMARIZATION OF WEB FORUMS AS SOURCES OF PROFESSIONALLY SIGNIFICANT INFORMATION
Abstract
Subject of Research. A modern specialist's competitive advantage lies in the widest possible coverage of information sources that yield relevant professionally significant information (PSI). Professional web forums occupy a significant place among these sources. The paper considers the problem of automatic summarization of forum texts, i.e. identification of those fragments that contain professionally relevant information.
Method. The research is based on statistical analysis of forum texts by means of machine learning. Six web forums devoted to technologies in various subject domains were selected for the study. The forums were annotated by experts. Using various machine learning methods, models were built that capture the functional relationship between the estimated quality characteristics of PSI extraction and the features of posts. The cumulative NDCG metric and its variance were used to assess the quality of the models.
Main Results. We have shown that query context plays an important role in assessing the efficiency of PSI extraction. We have identified the query contexts characteristic of PSI extraction; they reflect different interpretations of users' information needs and are denoted by the terms relevance and informational content. Rating scales for both have been designed in line with internationally accepted approaches. We have experimentally confirmed that the results of manual forum summarization by experts depend significantly on query context. We have shown that, in the overall assessment of PSI extraction efficiency, relevance is described rather well by a linear combination of features, whereas the assessment of informational content requires a nonlinear combination. In relevance assessment the leading role is played by keyword-related features, while in informational content assessment the characteristics of the post text as a whole come to the fore, together with features related to the structure of the thread as a text and as a social graph. We have shown that the efficiency of extracting informative posts depends only weakly on how keywords are assigned, whereas this dependence is substantial for the extraction of relevant posts. The keyword extraction method most effective for real applications has been identified. We have shown that for the extraction of relevant posts linear methods are more efficient than nonlinear ones, with the LDA model in between; for the extraction of informative posts linear and nonlinear methods are equally efficient, and the LDA model is considerably inferior to both. We have proposed a conceptual model that explains the obtained results.
Practical Relevance. The results can serve as a basis for creating new web forum summarization algorithms and for the appropriate application of existing ones, which would significantly reduce the time and resources users spend obtaining and studying up-to-date professionally significant information.
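The quality measure named in the abstract, NDCG and its variance, can be illustrated with a minimal sketch. The abstract does not give the exact formulation used, so everything here is an assumption made for illustration: the linear gain, the log2 position discount, the hypothetical function names `dcg` and `ndcg`, and the invented expert grades.

```python
import math
from statistics import mean, pvariance


def dcg(gains):
    """Discounted cumulative gain: each grade is divided by log2(rank + 1)."""
    # enumerate() is 0-based, so rank = i + 1 and the discount is log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(gains_in_ranked_order):
    """DCG of the model's ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(gains_in_ranked_order, reverse=True))
    return dcg(gains_in_ranked_order) / ideal if ideal > 0 else 0.0


# Hypothetical expert grades of posts, listed in the order a model ranked them,
# one list per forum thread (values are invented for illustration only).
threads = [
    [3, 2, 1, 0],  # model order matches the expert order
    [1, 3, 0, 2],  # model order deviates from the expert order
]

scores = [ndcg(t) for t in threads]
print("mean NDCG:", mean(scores))
print("variance of NDCG:", pvariance(scores))
```

A ranking that matches the expert order scores 1, and any deviation lowers the score. The variance across threads indicates how stable a model's quality is from one thread to another.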
Keywords
Articles in current issue
- ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION
- ANALOG-TO-DIGITAL CONVERSION OF SIGNALS WITH ANGULAR MANIPULATION FOR SOFTWARE DEFINED RADIO SYSTEMS
- QUANTUM-MECHANICAL MODELING OF SPATIAL AND BAND STRUCTURE OF Y3AL5O12 SCINTILLATION CRYSTAL
- STUDY OF INK LAYER BY METHOD OF ATTENUATED TOTAL REFLECTANCE SPECTROSCOPY
- RESEARCH OF THE ENTRANCE ANGLE EFFECT ON THE REFLECTANCE SPECTRA OF THE STAINLESS STEEL SURFACE OXIDIZED BY PULSED LASER RADIATION
- FEATURES OF MULTIPLEXED HOLOGRAMS RECORDING IN PHOTO-THERMO-REFRACTIVE GLASS
- SCALE FACTOR DETERMINATION METHOD OF ELECTRO-OPTICAL MODULATOR IN FIBER-OPTIC GYROSCOPE
- STUDY OF THE EFFECT OF ENDFACES POLISHING ANGLE FOR ANISOTROPIC WAVEGUIDES ON STATE CONVERSION OF LIGHT POLARIZATION
- SOLUTION OF SIGNAL UNCERTAINTY PROBLEM AT ANALYTICAL DESIGN OF CONSECUTIVE COMPENSATOR IN PIEZO ACTUATOR CONTROL
- ADAPTIVE SELECTION OF AUXILIARY OBJECTIVES IN MULTIOBJECTIVE EVOLUTIONARY ALGORITHMS
- AVAILABILITY RESEARCH OF REMOTE DEVICES FOR WIRELESS NETWORKS
- HIERARCHICAL ADAPTIVE ROOD PATTERN SEARCH FOR MOTION ESTIMATION AT VIDEO SEQUENCE ANALYSIS
- AUTHENTICATION ALGORITHM FOR PARTICIPANTS OF INFORMATION INTEROPERABILITY IN PROCESS OF OPERATING SYSTEM REMOTE LOADING ON THIN CLIENT
- GRAPH-BASED POST INCIDENT INTERNAL AUDIT METHOD OF COMPUTER EQUIPMENT
- ENVIRONMENTALLY FRIENDLY METHOD OF GASEOUS FUEL COMBUSTION WITH THE USE OF QUASI-OPTICAL MICROWAVE
- FINITE MARKOV CHAINS IN THE MODEL REPRESENTATION OF THE HUMAN OPERATOR ACTIVITY IN QUASI-FUNCTIONAL ENVIRONMENT
- EVALUATION OF ERRORS IN PARAMETERS DETERMINATION FOR THE EARTH HIGHLY ANOMALOUS GRAVITY FIELD
- MATHEMATICAL MODEL OF RR-TYPE MICROMECHANICAL GYRO CAPACITIVE COMB-TYPE SENSORS WITH ACCOUNT FOR VIBRATIONS
- NUMERICAL SIMULATION OF SHOCK WAVE REFRACTION ON INCLINED CONTACT DISCONTINUITY
- METHOD OF EQUIPMENT GRAPHIC REPRESENTATION IN THE PROCESS OF PREPRODUCTION ENGINEERING
- IDENTIFICATION PROPERTIES ENHANCEMENT ALGORITHM FOR PROBLEMS OF PARAMETERS ESTIMATION OF LINEAR REGRESSION MODEL
- EVALUATION OF DISTRIBUTION HISTOGRAMS FOR INCREMENT OF CHROMATICITY COORDINATES IN DISPLAY TECHNOLOGIES
- CONDUCTOMETRY BIOTESTING AS APPLIED TO VALUATION OF THE PRO- AND ANTIBACTERIAL PROPERTIES OF CATOLITES AND ANOLITES
- ON UNIFORMITY OF RASTER ILLUMINATION UNDER LASER SCANNING