![Scientific and technical journal of information technologies, mechanics and optics](/images/mag-ntv.png)
DOCUMENT REPRESENTATION FOR CLUSTERING OF SCIENTIFIC ABSTRACTS
Abstract
This paper addresses the clustering of narrow-domain short texts such as scientific abstracts. The work builds on observations made while improving a key phrase extraction algorithm: an extended stop-word list, built automatically for that task, considerably improved the quality of the key phrases extracted from scientific publications. The procedure for creating the stop-word list is described. The main objective is to investigate whether this stop-word list, together with information about the parts of speech of lexemes, can improve the quality and/or speed of clustering. In the latter case, the vocabulary used for document representation contains not every word that occurs in the collection, but only nouns and adjectives, or their sequences, encountered in the documents. Two baseline clustering algorithms are applied: k-means and hierarchical clustering (average agglomerative method). The results show that an extended stop-word list combined with an adjective-noun document representation improves both the quality and the speed of k-means clustering. For the average agglomerative method, the same setup may reduce clustering quality. Representing documents by adjective-noun sequences lowers clustering quality for both algorithms and is justified only when a considerable reduction in feature-space dimensionality is required.
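The reduced document representation described in the abstract can be illustrated with a minimal sketch. It assumes a toy stop-word list and a hand-written part-of-speech lexicon (`STOP_WORDS`, `POS_LEXICON`); both are hypothetical stand-ins, since the abstract does not give the automatically built list or the tagger used. The helper names `tokenize`, `build_vocabulary`, and `vectorize` are likewise illustrative only.

```python
from collections import Counter

# Hypothetical toy inputs: this stop-word list and POS lexicon are made up
# for illustration and are NOT the ones used in the paper.
STOP_WORDS = {"the", "of", "is", "a", "for", "we", "paper", "method", "proposed"}
POS_LEXICON = {
    "optical": "ADJ", "spectral": "ADJ", "neural": "ADJ", "genetic": "ADJ",
    "coherence": "NOUN", "tomography": "NOUN", "network": "NOUN",
    "algorithm": "NOUN", "paper": "NOUN", "method": "NOUN",
    "describes": "VERB", "proposed": "VERB", "trained": "VERB",
}

def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    return [t.strip(".,;:()").lower() for t in text.split() if t.strip(".,;:()")]

def build_vocabulary(docs, stop_words=frozenset(), allowed_pos=None):
    """Collect terms, skipping stop-words and, optionally, unwanted POS tags."""
    vocab = set()
    for doc in docs:
        for tok in tokenize(doc):
            if tok in stop_words:
                continue
            # Words missing from the lexicon get tag None and are dropped
            # whenever a POS filter is active.
            if allowed_pos is not None and POS_LEXICON.get(tok) not in allowed_pos:
                continue
            vocab.add(tok)
    return sorted(vocab)

def vectorize(doc, vocab):
    """Term-frequency vector of a document over a fixed vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]

docs = [
    "The paper describes spectral optical coherence tomography.",
    "A neural network is trained for the proposed genetic algorithm.",
]

full_vocab = build_vocabulary(docs)                                 # every word
reduced_vocab = build_vocabulary(docs, STOP_WORDS, {"NOUN", "ADJ"})  # filtered
vectors = [vectorize(d, reduced_vocab) for d in docs]

print(len(full_vocab), len(reduced_vocab))
```

On this toy input the filtered vocabulary is half the size of the full one, which is the kind of feature-space reduction the abstract refers to. The resulting vectors could then be passed to any k-means or average-linkage agglomerative implementation.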