FORENSIC LINGUISTICS: AUTOMATIC WEB AUTHOR IDENTIFICATION
Abstract
The Internet allows users to post under a false name, on behalf of others, or anonymously. Individuals and criminal or terrorist organizations can therefore use the Internet for criminal purposes while hiding their identity to avoid prosecution. Existing approaches and algorithms for identifying the authors of web posts in Russian are not effective. Developing proven methods, techniques, and tools for author identification is therefore an important and challenging task. In this work, an algorithm and software for identifying the authors of web posts were developed. During the study, the effectiveness of several classification and feature selection algorithms was tested. The algorithm includes the following steps: 1) feature extraction; 2) feature discretization; 3) feature selection with Relief-f, the most effective algorithm tested (to find the feature set with the most discriminating power for each set of candidate authors and to maximize identification accuracy); 4) author identification with a model based on the Random Forest algorithm. This is the first time the Random Forest and Relief-f algorithms have been used to identify the author of a short text in Russian. An important preprocessing step is the discretization of continuous features, which had not previously been applied to improve the efficiency of author identification. The software outputs the top q authors with the highest probabilities of authorship. This approach is helpful for manual analysis in forensic linguistics, where the tool is used to narrow the set of candidate authors. In experiments with 10 candidate authors, the real author appeared in the top 3 in 90.02% of cases and in first place in 70.5% of cases.
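The discretization, feature selection, and top-q output steps can be illustrated with a minimal sketch. It is not the authors' software. The function names are hypothetical, equal-width binning is an assumption (the abstract does not say which discretization method was used), and the feature weighting is the basic single-neighbor Relief for nominal features, a simplified stand-in for Relief-f. Random Forest training is omitted, so the authorship probabilities passed to the last function are assumed to come from a separately trained classifier.

```python
def discretize_equal_width(column, n_bins=5):
    """Map continuous values to bin indices 0..n_bins-1 (equal-width bins, an assumption)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def relief_weights(rows, labels):
    """Basic Relief on discretized features: reward features that differ on the
    nearest sample of another author and agree on the nearest sample of the same author."""
    n_features = len(rows[0])
    weights = [0.0] * n_features

    def distance(a, b):
        # Hamming distance: number of features whose bins differ.
        return sum(x != y for x, y in zip(a, b))

    for i, (row, label) in enumerate(zip(rows, labels)):
        hits = [r for j, (r, l) in enumerate(zip(rows, labels)) if j != i and l == label]
        misses = [r for r, l in zip(rows, labels) if l != label]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda r: distance(row, r))
        miss = min(misses, key=lambda r: distance(row, r))
        for f in range(n_features):
            weights[f] += ((row[f] != miss[f]) - (row[f] != hit[f])) / len(rows)
    return weights


def select_features(weights, k):
    """Indices of the k features with the highest Relief weights."""
    order = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    return sorted(order[:k])


def top_q_authors(probabilities, q=3):
    """Return the q candidate authors with the highest authorship probabilities."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:q]


if __name__ == "__main__":
    # Toy data: feature 0 separates the two authors, feature 1 does not.
    rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
    labels = ["author_a", "author_a", "author_b", "author_b"]
    weights = relief_weights(rows, labels)
    print(select_features(weights, k=1))
    # Made-up probabilities standing in for a classifier's output.
    print(top_q_authors({"author_a": 0.2, "author_b": 0.5, "author_c": 0.3}, q=2))
```

Returning a ranked shortlist rather than a single label matches the use described above, where the tool narrows the set of candidate authors for manual analysis.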