FORENSIC LINGUISTICS: AUTOMATIC WEB AUTHOR IDENTIFICATION

Abstract

The Internet permits anonymity: users can post under a false name, on behalf of others, or with no name at all. Individuals and criminal or terrorist organizations can therefore use the Internet for criminal purposes while hiding their identity to avoid prosecution. Existing approaches and algorithms for identifying the authors of web posts in Russian are not effective, so developing proven methods, techniques and tools for author identification is an extremely important and challenging task. In this work, an algorithm and software for authorship identification of web posts were developed. During the study, the effectiveness of several classification and feature selection algorithms was tested. The algorithm consists of four steps: 1) feature extraction; 2) discretization of features; 3) feature selection with the Relief-f algorithm, which proved the most effective (it finds the feature set with the most discriminating power for each set of candidate authors and so maximizes the accuracy of author identification); 4) author identification with a model based on the Random Forest algorithm. This is the first time the Random Forest and Relief-f algorithms have been used to identify the author of a short text in Russian. An important step of author attribution is data preprocessing, namely the discretization of continuous features; it has not previously been applied to improve the efficiency of author identification. The software outputs the top q authors with the highest probabilities of authorship. This approach is helpful for manual analysis in forensic linguistics, where the developed tool is used to narrow the set of candidate authors. In experiments with 10 candidate authors, the real author appeared in the top 3 in 90.02% of cases and was ranked first in 70.5% of cases.
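The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software, and every function name and parameter in it is hypothetical. It assumes equal-width binning for the discretization step (the abstract does not say which scheme was used), a simplified Relief-F for already discretized features that visits every instance instead of sampling, and per-author probabilities supplied by some classifier. The Random Forest model itself is omitted.

```python
from collections import Counter
from typing import Dict, List, Sequence


def discretize_equal_width(values: Sequence[float], n_bins: int = 5) -> List[int]:
    """Map continuous feature values to integer bin indices 0..n_bins-1.

    Assumption: equal-width bins; the abstract does not name the scheme.
    """
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: everything falls into one bin
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def relieff_weights(X: List[List[int]], y: List[str], k: int = 1) -> List[float]:
    """Simplified Relief-F for discrete features (0/1 difference per feature).

    Every instance is visited once (no random sampling), so the result is
    deterministic. Higher weight means more discriminating power.
    """
    n, d = len(X), len(X[0])
    priors = {c: count / n for c, count in Counter(y).items()}
    weights = [0.0] * d

    def hamming(a: List[int], b: List[int]) -> int:
        return sum(1 for u, v in zip(a, b) if u != v)

    for i in range(n):
        for c in priors:
            # k nearest neighbours of instance i inside class c, excluding i.
            neighbours = sorted(
                (j for j in range(n) if j != i and y[j] == c),
                key=lambda j: hamming(X[i], X[j]),
            )[:k]
            if not neighbours:
                continue
            # Nearest hits lower the weight; nearest misses raise it,
            # scaled by the prior probability of the miss class.
            scale = -1.0 if c == y[i] else priors[c] / (1.0 - priors[y[i]])
            for j in neighbours:
                for f in range(d):
                    if X[i][f] != X[j][f]:
                        weights[f] += scale / (n * len(neighbours))
    return weights


def top_q_authors(probabilities: Dict[str, float], q: int = 3) -> List[str]:
    """Return the q candidate authors with the highest authorship probability."""
    return sorted(probabilities, key=probabilities.get, reverse=True)[:q]


def top_q_accuracy(predictions: List[Dict[str, float]],
                   true_authors: List[str], q: int) -> float:
    """Fraction of texts whose real author is among the top q candidates."""
    hits = sum(1 for probs, author in zip(predictions, true_authors)
               if author in top_q_authors(probs, q))
    return hits / len(true_authors)
```

In this sketch, `top_q_accuracy` with `q=3` and `q=1` corresponds to the two kinds of figures reported in the abstract (real author in the top 3, real author in first place). The feature weights from `relieff_weights` would be used to keep only the highest-ranked features before training the classifier.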

Keywords
