![Scientific and Technical Journal of Information Technologies, Mechanics and Optics](/images/mag-ntv.png)
AUDIO-VISUAL SPEECH PROCESSING AND ANALYSIS BASED ON SUBSPACE PROJECTIONS
Abstract
Subject of Research. The paper addresses the mutual reconstruction (transformation) of the acoustic and visual components (modalities) of speech. The audio recording of the voice is the acoustic component, and the parallel video recording of the speaker’s face is the visual component. Because the two modalities differ in physical nature, their joint analysis is difficult; reconstructing one modality from the other is a way to overcome these difficulties. Method. The proposed approach is based on Principal Component Analysis (PCA), Multiple Linear Regression (MLR), Partial Least Squares regression (PLS regression), and the K-means clustering algorithm. Attention is also paid to data preprocessing. Mel-frequency cepstral coefficients (MFCCs) serve as acoustic features, and twenty key points representing the mouth contour serve as visual features. Main Results. Experiments on reconstructing the mouth contour from MFCCs are presented. They were carried out on the VidTIMIT dataset of audio-visual recordings of English phrases. Four variants of the proposed approach were tested and evaluated: PCA-based and PLS-regression-based, each with and without clustering. Quantitative (objective) and qualitative (subjective) assessment confirmed the efficiency of the proposed approach. The implementation based on PLS regression with preliminary clustering gave the best results. Practical Relevance. The proposed approach can be used to develop bimodal biometric systems, voice-driven virtual “avatars”, mobile access control systems, and other human-computer interaction solutions. It is also shown that, when properly implemented, PCA and PLS significantly reduce the computational complexity of the reconstruction operation. In addition, the clustering step can be omitted to further increase processing speed at the cost of slightly lower reconstruction quality.
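The subspace-projection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only a PCA-plus-MLR variant without clustering, uses synthetic data in place of VidTIMIT, and assumes 13 MFCCs per frame and 40 visual values (twenty key points with x and y coordinates). The function names, subspace sizes, and noise level are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_pca(data, n_components):
    """Return (mean, basis); basis columns are the top principal directions."""
    mean = data.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components].T


def fit_reconstructor(audio, visual, n_audio=8, n_visual=6):
    """Fit PCA on each modality, then a linear map between the two subspaces."""
    a_mean, a_basis = fit_pca(audio, n_audio)
    v_mean, v_basis = fit_pca(visual, n_visual)
    a_scores = (audio - a_mean) @ a_basis
    v_scores = (visual - v_mean) @ v_basis
    # Multiple linear regression (least squares) from audio to visual scores.
    # Training scores are zero-mean, so no intercept term is needed.
    weights, *_ = np.linalg.lstsq(a_scores, v_scores, rcond=None)
    return {"a_mean": a_mean, "a_basis": a_basis,
            "v_mean": v_mean, "v_basis": v_basis, "weights": weights}


def reconstruct(model, audio):
    """Map acoustic feature frames to estimated mouth-contour coordinates."""
    a_scores = (audio - model["a_mean"]) @ model["a_basis"]
    v_scores = a_scores @ model["weights"]
    return model["v_mean"] + v_scores @ model["v_basis"].T


def demo(seed=0):
    """Train and evaluate on synthetic data sharing a low-dimensional latent."""
    rng = np.random.default_rng(seed)
    n_frames, n_mfcc, n_visual_dims, n_latent = 500, 13, 40, 3
    latent = rng.normal(size=(n_frames, n_latent))
    # ASSUMPTION: both modalities are noisy linear views of the same latent.
    audio = latent @ rng.normal(size=(n_latent, n_mfcc))
    audio += 0.05 * rng.normal(size=audio.shape)
    visual = latent @ rng.normal(size=(n_latent, n_visual_dims))
    visual += 0.05 * rng.normal(size=visual.shape)

    train, test = slice(0, 400), slice(400, None)
    model = fit_reconstructor(audio[train], visual[train])
    predicted = reconstruct(model, audio[test])

    rmse = np.sqrt(np.mean((predicted - visual[test]) ** 2))
    # Baseline: always predict the mean training contour.
    baseline = np.sqrt(np.mean((model["v_mean"] - visual[test]) ** 2))
    return rmse, baseline, predicted.shape


if __name__ == "__main__":
    rmse, baseline, shape = demo()
    print(f"reconstruction RMSE {rmse:.3f} vs mean-contour baseline {baseline:.3f}")
```

Because the regression operates on a few subspace coordinates rather than on the full feature vectors, the per-frame cost is a handful of small matrix products, which is consistent with the reduced computational complexity noted above. A clustered variant would, under the same assumptions, fit one such model per K-means cluster of the acoustic features and select the model by nearest centroid at reconstruction time.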