The paper is devoted to cross-domain web author attribution (identification), where a user's messages are obtained from several sources (websites). We focus on identifying a user of one website from that user's messages on another website. We found a stylistic difference between the texts of messages written by the same user on different websites. We also examined whether a single feature space can be formed for texts received from various sources while still providing sufficient accuracy of linguistic identification. Two subtasks were studied: 1) mixed sources: the training and test datasets both include messages from mixed sources (websites); 2) separated sources: the sources of the training and test datasets do not intersect, so the training dataset includes texts from one source and the test dataset includes texts from another. The experimental results showed an identification accuracy of 0.82 in the mixed-sources task and 0.74 in the separated-sources task. We conclude that texts created by one user on different websites differ stylistically, but that it is nevertheless possible to form a single feature space for text messages received from various websites while ensuring sufficient identification accuracy.
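
The two subtasks differ only in how the training and test datasets are built. The following Python sketch illustrates that difference under stated assumptions; it is not the authors' implementation. The `Message` record, the function names, the 25% test fraction, and the rule of keeping only test authors who also appear in the training data are all hypothetical choices made for illustration.

```python
import random
from collections import namedtuple

# Hypothetical record type: one message with its author and source website.
Message = namedtuple("Message", ["author", "source", "text"])


def mixed_split(messages, test_fraction=0.25, seed=0):
    """Mixed sources: shuffle all messages regardless of source,
    so both datasets are drawn from the pooled set of websites."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def separated_split(messages, train_source, test_source):
    """Separated sources: train on one website, test on another.
    Assumption: test messages from authors absent in training are dropped,
    since such authors could not be identified."""
    train = [m for m in messages if m.source == train_source]
    known_authors = {m.author for m in train}
    test = [
        m for m in messages
        if m.source == test_source and m.author in known_authors
    ]
    return train, test
```

With this setup, any classifier and feature set can be trained on the first returned list and scored on the second; comparing the two accuracies shows how much performance is lost when the test source is unseen during training.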

