ViSL One-shot: generating Vietnamese sign language data set
Abstract
Developing methods for automatic recognition of objects in a video stream, and of sign language in particular, requires large amounts of video data for training. An established way to augment data for machine learning is to apply distortion and add noise. Linguistic gestures differ from other gestures in that a small change in posture can radically change the meaning of a gesture, which imposes specific requirements on data variability. The novelty of the proposed method is that, instead of distorting frames with affine image transformations, it vectorizes the signer's pose and then adds noise in the form of random deviations of skeletal elements. To implement controlled gesture variability, the pose is converted, using the MediaPipe library, into a vector format in which each vector corresponds to a skeletal element. The image of the figure is then restored from the vector representation. The advantage of this method is that gestures can be distorted in a controlled way that corresponds to real deviations in the signer's posture. The method was tested on a set of 60 words of Indian Sign Language (shared across the languages and dialects used in India), represented by 782 video fragments. For each word, the most representative gesture was selected and 100 variations were generated. The remaining, less representative gestures were used as test data. The resulting word-level classification and recognition model, based on a GRU-LSTM neural network, achieves an accuracy above 95%. The method validated in this way was then applied to a corpus of 4364 videos in Vietnamese Sign Language covering all three regions of Vietnam: Northern, Central and Southern.
A total of 436,400 data samples were generated, with 100 samples representing the meaning of each word. They can be used to develop and improve Vietnamese Sign Language recognition methods, since they provide many variations of each gesture with varying degrees of deviation from the standard form. A limitation of the proposed method is that its accuracy depends on the error of the MediaPipe library. The resulting video dataset can also be used for automatic sign language translation.
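The augmentation idea described in the abstract (vectorize the pose, perturb each skeletal element, restore the figure) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The six-joint toy skeleton `BONES`, the uniform angular noise, the 5-degree limit, and all function names are assumptions made for illustration. In practice the joint coordinates would come from MediaPipe landmarks rather than from a hand-written array.

```python
import numpy as np

# Hypothetical toy skeleton: (parent, child) joint index pairs, listed so that
# every parent joint is placed before its children. Joint 0 is the root.
BONES = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]


def to_bone_vectors(joints, bones=BONES):
    """Convert joint coordinates (N, 2) into one vector per skeletal element."""
    return np.array([joints[c] - joints[p] for p, c in bones])


def rotate(vectors, angles):
    """Rotate each 2D vector by its own angle, given in radians."""
    cos, sin = np.cos(angles), np.sin(angles)
    x, y = vectors[:, 0], vectors[:, 1]
    return np.stack([cos * x - sin * y, sin * x + cos * y], axis=1)


def from_bone_vectors(root, vectors, bones=BONES):
    """Restore joint coordinates by accumulating bone vectors from the root."""
    joints = np.zeros((len(bones) + 1, 2))
    joints[bones[0][0]] = root
    for (p, c), v in zip(bones, vectors):
        joints[c] = joints[p] + v
    return joints


def augment(joints, n_variations=100, max_angle_deg=5.0, seed=0):
    """Generate pose variations by randomly rotating each skeletal element.

    Each bone direction is perturbed independently, so bone lengths are
    preserved and the root joint stays fixed (an assumed noise model).
    """
    rng = np.random.default_rng(seed)
    vectors = to_bone_vectors(joints)
    limit = np.deg2rad(max_angle_deg)
    variations = []
    for _ in range(n_variations):
        angles = rng.uniform(-limit, limit, size=len(vectors))
        variations.append(from_bone_vectors(joints[0], rotate(vectors, angles)))
    return np.stack(variations)


# Example pose in normalized image coordinates (made-up values).
pose = np.array([[0.5, 0.9], [0.5, 0.6], [0.3, 0.6],
                 [0.2, 0.4], [0.7, 0.6], [0.8, 0.4]])
variations = augment(pose)
```

Unlike an affine transformation of the whole frame, this kind of perturbation acts on individual skeletal elements, so the size of the deviation can be bounded per element. A real pipeline would also need to render the perturbed skeleton back into an image, which this sketch omits.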