RuPersonaChat: a dialog corpus for personalizing conversational agents
Abstract
Personalization is one of the key ways to improve the performance of conversational agents. It improves the quality of user interaction and increases user satisfaction by making the agent’s responses more consistent, more specific, and more interesting. Training and testing personalized conversational agents requires specific datasets that contain facts about a persona together with the texts of that persona’s dialogues, in which the utterances use those facts. Several such datasets exist in English and Chinese; they contain an average of five facts per persona, and their dialogues are composed by crowdsourcing workers who repeatedly imitate different personas. This paper proposes a methodology for collecting an original dataset containing an extended set of facts about a persona and natural dialogues between personas. The new RuPersonaChat dataset is based on three recording scenarios: an interview, a short conversation, and a long conversation. It is the first dataset collected for dialogue agent personalization that includes both natural dialogues and extended persona descriptions. In addition, each persona utterance in the dataset is annotated with the persona facts from which it was generated. The proposed methodology for collecting an original corpus of test data allows language models to be tested on various tasks that arise in developing personalized dialogue agents. The collected dataset includes 139 dialogues and 2608 utterances. It was used to test answer and question generation models, and the best results were obtained with the Gpt3-large model (a perplexity of 15.7).
The dataset can be used to test a personalized dialogue agent’s ability to talk about itself to the interlocutor, to communicate using phatic speech, and to take the extended context into account when communicating with the user.
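The perplexity figure reported above can be read with the standard definition in mind: the exponential of the average negative log-likelihood that a model assigns to the reference tokens. The following is a minimal sketch of that definition only, not the evaluation code used for RuPersonaChat; the function name `perplexity` and the example log-probabilities are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the mean negative natural-log probability per token.

    token_log_probs: natural-log probabilities that a model assigned to
    each reference token (hypothetical input format for this sketch).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical example: a model that gives every reference token
# probability 1/16 has perplexity 16, up to floating-point rounding.
uniform_log_probs = [math.log(1 / 16)] * 10
print(perplexity(uniform_log_probs))
```

Under this definition, a lower value means the model finds the reference utterances less surprising on average.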