AUDIO-VISUAL SPEECH PROCESSING AND ANALYSIS BASED ON SUBSPACE PROJECTIONS

Journal: Scientific and Technical Journal of Information Technologies, Mechanics and Optics

Oleinik Andrei L

UDC 004.93

Issue: 2 (114)

Abstract

Subject of Research. The paper addresses the mutual reconstruction (transformation) of the acoustic and visual components (modalities) of speech. An audio recording of the voice represents the acoustic component, whereas the parallel video recording of the speaker’s face constitutes the visual component. Because these modalities differ in physical nature, analyzing them jointly is difficult. Reconstruction methods, which map one modality onto the other, can be used to overcome these difficulties.

Method. The proposed approach is based on Principal Component Analysis (PCA), Multiple Linear Regression (MLR), Partial Least Squares regression (PLS regression) and the K-means clustering algorithm. Attention is also paid to data preprocessing. Mel-frequency cepstral coefficients (MFCCs) are used as acoustic features, and twenty key points representing the mouth contour serve as visual features.

Main Results. Experiments on reconstructing the mouth contour from the MFCCs are presented. They were carried out on the VidTIMIT dataset of audio-visual phrase recordings in English. Four variants of the proposed approach were tested and evaluated: PCA and PLS regression, each with and without clustering. Quantitative (objective) and qualitative (subjective) assessment confirmed the effectiveness of the proposed approach. The implementation based on PLS regression with preliminary clustering gave the best results.

Practical Relevance. The proposed approach can be used to develop various bimodal biometric systems, voice-driven virtual “avatars”, mobile access control systems and other human-computer interaction solutions. It is also shown that, given a proper implementation, PCA and PLS significantly reduce the computational complexity of the reconstruction operation. In addition, the clustering step can be omitted to further increase processing speed at the cost of slightly lower reconstruction quality.
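The PCA-plus-regression variant named in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the function names, the subspace dimensions, and the synthetic data are assumptions made only for illustration, and the K-means clustering step and the PLS variant are omitted. Acoustic vectors stand in for 13 MFCCs per frame, and visual vectors stand in for twenty (x, y) key points flattened to 40 values.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return (mean, basis); basis has shape (n_features, n_components)."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def fit_reconstructor(A, V, n_audio=4, n_visual=4):
    """Fit PCA on each modality, then linear regression between subspaces.

    A: (n_frames, n_audio_features), V: (n_frames, n_visual_features).
    The subspace sizes are hypothetical defaults, not values from the paper.
    """
    mean_a, P_a = fit_pca(A, n_audio)
    mean_v, P_v = fit_pca(V, n_visual)
    Za = (A - mean_a) @ P_a              # acoustic subspace coordinates
    Zv = (V - mean_v) @ P_v              # visual subspace coordinates
    # Least-squares regression matrix mapping Za to Zv.
    W, *_ = np.linalg.lstsq(Za, Zv, rcond=None)
    return {"mean_a": mean_a, "P_a": P_a, "mean_v": mean_v, "P_v": P_v, "W": W}

def reconstruct(model, A_new):
    """Map acoustic features to reconstructed visual features."""
    Za = (A_new - model["mean_a"]) @ model["P_a"]
    return Za @ model["W"] @ model["P_v"].T + model["mean_v"]

if __name__ == "__main__":
    # Synthetic stand-in data: both modalities share a 4-D latent source.
    rng = np.random.default_rng(0)
    z = rng.normal(size=(600, 4))
    A = z @ rng.normal(size=(4, 13)) + 0.01 * rng.normal(size=(600, 13))
    V = z @ rng.normal(size=(4, 40)) + 0.01 * rng.normal(size=(600, 40))
    model = fit_reconstructor(A[:500], V[:500])
    V_hat = reconstruct(model, A[500:])
    rel_err = np.linalg.norm(V_hat - V[500:]) / np.linalg.norm(V[500:])
    print(f"relative reconstruction error: {rel_err:.4f}")
```

Because the regression operates on low-dimensional subspace coordinates, reconstruction at run time reduces to a few small matrix products, which is in line with the abstract's remark about reduced computational complexity.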
