Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features

Research Output

Extraction of relevant lip features is of continuing interest in the visual speech domain. End-to-end feature extraction can produce good results, but the resulting features are difficult for humans to comprehend and relate to. We present a new, lightweight feature extraction approach, motivated by human-centric, glimpse-based psychological research into facial barcodes. We demonstrate that these simple, easy-to-extract 3D geometric features (produced using Gabor-based image patches) can be used successfully for speech recognition with LSTM-based machine learning. The approach extracts low-dimensional lip parameters with a minimum of processing. One key difference between these Gabor-based features and other features, such as traditional discrete cosine transform (DCT) features or the currently popular convolutional neural network (CNN) features, is that they are human-centric: they can be visualised and analysed by humans, which makes the results easier to explain and visualise. The features can also be used for reliable speech recognition, as demonstrated on the Grid corpus. For overlapping speakers, our lightweight system achieved a recognition rate of over 82%, which compares well with less explainable features in the literature.
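The abstract does not spell out the extraction steps, so the following is only a minimal Python/NumPy sketch under stated assumptions, not the authors' pipeline. It illustrates the general idea of a Gabor filter as a lightweight, inspectable feature extractor. A grayscale mouth patch is filtered with a Gabor kernel tuned to horizontal structure (with `theta = pi/2` the cosine carrier varies vertically, so the filter responds to horizontal edges such as lip contours). The response is then thresholded, and the strong-response region is summarised by three numbers. The helper names (`gabor_kernel`, `filter2d`, `geometric_features`), the kernel parameters, the threshold, and the choice of height, width and area as the three measurements are all hypothetical and chosen for illustration only.

```python
import numpy as np

# Hypothetical illustration of Gabor-based geometric lip features,
# NOT the method from the paper.


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real (even) part of a 2-D Gabor kernel on a size x size grid (size odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates: the cosine carrier varies along x_r.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength + psi)
    kernel = envelope * carrier
    # Remove the DC component so uniform regions give (near-)zero response.
    return kernel - kernel.mean()


def filter2d(image, kernel):
    """'Same'-size 2-D correlation with edge-replicated padding (no SciPy needed)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)), mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    # Accumulate one shifted copy of the image per kernel tap.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def geometric_features(image, size=15, sigma=3.0, wavelength=8.0,
                       theta=np.pi / 2, rel_thresh=0.5):
    """Return hypothetical [height, width, area] of the strong Gabor-response region."""
    kernel = gabor_kernel(size, sigma, theta, wavelength)
    response = np.abs(filter2d(image, kernel))
    peak = response.max()
    if peak < 1e-9:  # featureless patch: nothing to measure
        return np.zeros(3)
    mask = response >= rel_thresh * peak
    rows, cols = np.nonzero(mask)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    area = mask.sum()
    return np.array([height, width, area], dtype=float)


# Synthetic 64x64 "mouth" patch: bright skin with a dark, wide, short opening.
patch = np.ones((64, 64))
patch[29:35, 8:56] = 0.0
print(geometric_features(patch))  # [height, width, area] of the strong-response region
```

Running a function like `geometric_features` on each frame of a video clip would give a (frames × 3) sequence, which is the kind of low-dimensional input an LSTM classifier can consume. Each number is a pixel measurement of a visible region, so it can be drawn over the image and checked by eye. In that sense, such features are easier to inspect than DCT coefficients or CNN activations.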

Type:

Article
Date:

03 December 2020
Publication Status:

Published
Publisher

MDPI AG
DOI:

10.3390/e22121367
Cross Ref:

10.3390/e22121367
Funders:

Engineering and Physical Sciences Research Council

http://researchrepository.napier.ac.uk/output/2708846

Citation

Zhang, X., Xu, Y., Abel, A. K., Smith, L. S., Watt, R., Hussain, A., & Gao, C. (2020). Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features. Entropy, 22(12), 1367. https://doi.org/10.3390/e22121367

Authors

Prof Amir Hussain

Professor
School of Computing Engineering and the Built Environment

0131 455 2239

A.Hussain@napier.ac.uk

Keywords

speech recognition; image processing; Gabor features; lip reading; explainable

Available Documents

pdf

Visual Speech Recognition With Lightweight Psychologically Motivated Gabor Features

2MB

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.