CROSS-DOMAIN WEB AUTHOR IDENTIFICATION

Open database of scientific publications, ITMO University

Journal

SCIENTIFIC AND TECHNICAL JOURNAL OF INFORMATION TECHNOLOGIES, MECHANICS AND OPTICS

Vorobeva Alisa A., Pozvolenko Vitaliy A., Korobitsyna Anastasiya S., Sharafiev Azamat A.

UDC 004.89

Issue: 3 (115)

Abstract

The paper addresses cross-domain web author attribution (identification), in which a user's messages are obtained from several sources (web-sites). We focus on the problem of identifying a user of one web-site by his or her messages from another web-site. We found that there is a stylistic difference between the texts of messages created by one user on different web-sites. We also examined whether a single feature space can be formed for texts received from various sources while still providing sufficient accuracy of linguistic identification. Two subtasks were studied: 1) mixed sources: the training and test datasets both include messages from mixed sources (web-sites); 2) separated sources: the message sources of the training and test datasets do not intersect; the training dataset includes texts from one source, and the test dataset includes texts from another. The experiments showed an identification accuracy of 0.82 in the mixed sources task and 0.74 in the separated sources task. We conclude that, although one user's texts differ stylistically from web-site to web-site, it is possible to form a single feature space for text messages received from various web-sites that ensures sufficient identification accuracy.
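The two evaluation setups described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Message` record, the source names `"forum"` and `"blog"`, the toy messages, and the helper names `mixed_split`, `separated_split`, and `accuracy` are all hypothetical, and the feature extraction and classifier used in the paper are not shown here.

```python
import random
from typing import NamedTuple


class Message(NamedTuple):
    author: str   # label to be predicted
    source: str   # web-site the message was collected from (hypothetical names)
    text: str


def mixed_split(messages, test_fraction=0.25, seed=0):
    """Mixed sources: shuffle all messages and split them without
    regard to source, so both sets may contain texts from any web-site."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def separated_split(messages, train_source, test_source):
    """Separated sources: train on texts from one web-site and test on
    texts from another, so the sources of the two sets do not intersect."""
    train = [m for m in messages if m.source == train_source]
    test = [m for m in messages if m.source == test_source]
    return train, test


def accuracy(true_authors, predicted_authors):
    """Share of messages whose predicted author matches the true author."""
    correct = sum(t == p for t, p in zip(true_authors, predicted_authors))
    return correct / len(true_authors)


# Toy data, invented for illustration only.
MESSAGES = [
    Message("alice", "forum", "Honestly, I think this works fine."),
    Message("alice", "forum", "Honestly, no idea why it fails."),
    Message("alice", "blog", "In this post I describe the setup."),
    Message("alice", "blog", "In this post I compare two tools."),
    Message("bob", "forum", "lol same here, broke after update"),
    Message("bob", "forum", "lol nope, still broken"),
    Message("bob", "blog", "Today we look at the update process."),
    Message("bob", "blog", "Today we look at rollback options."),
]

if __name__ == "__main__":
    mixed_train, mixed_test = mixed_split(MESSAGES)
    sep_train, sep_test = separated_split(MESSAGES, "forum", "blog")
    print(len(mixed_train), len(mixed_test))
    print({m.source for m in sep_train}, {m.source for m in sep_test})
```

In the separated setup every test message comes from a web-site the model never saw during training, which is consistent with the lower accuracy reported for that subtask (0.74 against 0.82).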
