![Scientific and Technical Journal of Information Technologies, Mechanics and Optics](/images/mag-ntv.png)
STATISTICAL METHOD OF TERM EXTRACTION FROM CHINESE TEXTS WITHOUT PRELIMINARY SEGMENTATION OF PHRASES
Abstract
Subject of Research. The paper considers the problem of automatic term extraction from natural language texts (text mining). One of the first-priority tasks in this area is the creation of a domain thesaurus. Well-established term extraction methods exist for alphabetic languages, for instance, latent semantic analysis. Applying these methods to hieroglyphic texts is difficult because words are not separated by spaces. Sentence segmentation in hieroglyphic languages is usually performed with dictionaries or with statistical methods, in particular the mutual information approach. Neither sentence segmentation methods nor term extraction methods reach 100 percent precision and recall on their own, and applying them one after the other only increases the number of errors. The aim of this work is to improve the precision and recall of domain term extraction from hieroglyphic texts. Method. The proposed method detects repeating sequences of two, three or four characters in each sentence and compares the occurrence frequencies of these sequences in a domain document collection and a contrast document collection. The research showed that simply ranking all possible character sequences extracts only frequently used terms satisfactorily. Filtering character sequences by the ratio of their frequencies in the domain and contrast collections made it possible to extract frequently used terms reliably and to find rare domain terms with satisfactory quality. Term extraction results for the “Network technologies” domain from a Chinese text are presented. A set of articles from the newspaper “Rénmín Rìbào” was used as the contrast collection, and satisfactory results were obtained.
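The method described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the punctuation set used to split sentences, the function names `split_sentences`, `count_ngrams` and `rank_candidates`, the `min_count` and `min_ratio` thresholds, and the add-one smoothing of contrast counts are all assumptions made for illustration. The sketch counts every sequence of two, three or four characters inside each sentence (never across a sentence boundary) and keeps the sequences whose relative frequency in the domain collection exceeds their relative frequency in the contrast collection by a chosen factor.

```python
import re
from collections import Counter

# Assumption: sentences and clauses end at common Chinese or ASCII
# punctuation; the abstract does not specify the delimiter set.
SENTENCE_BREAK = re.compile(r"[。！？；，、：\s,.!?;:]+")


def split_sentences(text):
    """Split raw text into punctuation-free character runs."""
    return [s for s in SENTENCE_BREAK.split(text) if s]


def count_ngrams(texts, sizes=(2, 3, 4)):
    """Count character sequences of the given lengths within each sentence."""
    counts = Counter()
    for text in texts:
        for sentence in split_sentences(text):
            for n in sizes:
                for i in range(len(sentence) - n + 1):
                    counts[sentence[i:i + n]] += 1
    return counts


def rank_candidates(domain_texts, contrast_texts, min_count=2, min_ratio=1.0):
    """Return (sequence, domain_count, ratio) tuples, highest ratio first.

    min_count and min_ratio are hypothetical thresholds, not values
    taken from the paper.
    """
    domain = count_ngrams(domain_texts)
    contrast = count_ngrams(contrast_texts)
    domain_total = sum(domain.values()) or 1
    contrast_total = sum(contrast.values()) or 1

    scored = []
    for gram, count in domain.items():
        if count < min_count:
            continue  # keep only sequences that repeat in the domain texts
        domain_freq = count / domain_total
        # Add-one smoothing (an assumption) avoids division by zero for
        # sequences that never occur in the contrast collection.
        contrast_freq = (contrast[gram] + 1) / (contrast_total + 1)
        ratio = domain_freq / contrast_freq
        if ratio >= min_ratio:
            scored.append((gram, count, ratio))
    return sorted(scored, key=lambda item: (-item[2], -item[1], item[0]))


if __name__ == "__main__":
    domain_docs = ["路由器转发数据包。路由器连接网络。", "交换机和路由器。"]
    contrast_docs = ["今天天气很好。我们去公园。"]
    for gram, count, ratio in rank_candidates(domain_docs, contrast_docs):
        print(gram, count, round(ratio, 3))
```

In this sketch, substrings of a genuine term (for example, two-character fragments of a three-character term) pass the filter together with the term itself; deciding which of the overlapping candidates to keep is left open here.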
Keywords
Permanent URL
Articles in current issue
- SUPERCOMPUTER SIMULATION OF CRITICAL PHENOMENA IN COMPLEX SOCIAL SYSTEMS
- COMPRESSION OF FEW-CYCLE OPTICAL PULSES AND UNIPOLAR PULSE GENERATION DUE TO COHERENT INTERACTION WITH NONLINEAR RESONANT MEDIUM
- LIDAR COMBINED SCANNING UNIT
- INKJET PRINTING OF HIGH REFRACTIVE STRUCTURES BASED ON TiO2 SOL
- EFFECT OF OPTICAL FIBER HYDROGEN LOADING ON THE INSCRIPTION EFFICIENCY OF CHIRPED BRAGG GRATINGS BY MEANS OF KrF EXCIMER LASER RADIATION
- ALGORITHM OF MULTIHARMONIC DISTURBANCE COMPENSATION IN LINEAR SYSTEMS WITH ARBITRARY DELAY: INTERNAL MODEL APPROACH
- LUMINESCENT PROPERTIES OF SILVER CLUSTERS FORMED BY ION EXCHANGE METHOD IN PHOTO-THERMO-REFRACTIVE GLASS
- DISTRIBUTION OF DISLOCATIONS IN AlN CRYSTALS GROWN ON EVAPORATING SiC SUBSTRATES
- SYNTHESIS OF THICK GALLIUM NITRIDE LAYERS BY METHOD OF MULTI-STAGE GROWTH ON SUBSTRATES WITH COLUMN STRUCTURE
- FEATURES OF MEASURING IN LIQUID MEDIA BY ATOMIC FORCE MICROSCOPY
- GAUSSIAN MIXTURE MODELS FOR ADAPTATION OF DEEP NEURAL NETWORK ACOUSTIC MODELS IN AUTOMATIC SPEECH RECOGNITION SYSTEMS
- FUZZY MAPPING IN DATA SONIFICATION SYSTEM OF WIRELESS SENSOR NETWORK
- INTEGRATED INFORMATION SYSTEM ARCHITECTURE PROVIDING BEHAVIORAL FEATURE
- AUTOMATING SELECTION OF OPTIMAL PACKET SCHEDULING DURING VOIP-TRAFFIC TRANSMISSION
- DYNAMIC AUTHORIZATION BASED ON THE HISTORY OF EVENTS
- FUNCTIONAL SURFACE MICROGEOMETRY PROVIDING THE DESIRED PERFORMANCE OF AN AIRCRAFT VIBRATION SENSOR
- EXPENSES FORECASTING MODEL IN UNIVERSITY PROJECTS PLANNING
- VIRTUAL CHANNEL SIMULATION MODEL
- ESTIMATION TECHNIQUE OF MECHANICAL PRODUCTS QUALITY LEVEL IN DESIGN PROCESS
- ANTIFUNGAL ACTIVITY OF ZnO, SiO2, Au AND Ag ACRYLIC NANOCOMPOSITES
- REDUNDANCY OF TRANSMISSIONS OVER THE AGGREGATED CHANNELS DIVIDED INTO GROUPS
- THE EFFECT OF TOPOLOGY ON TEMPORAL NETWORK DYNAMICS
- PREDICTION OF FLU EPIDEMIC PEAKS IN ST. PETERSBURG THROUGH POPULATION-BASED MATHEMATICAL MODELS