MIRACL-VC1: an RGB-D dataset for lipreading in English
- Ahmed Rekik CRNS
- Achraf Ben-Hamadou CRNS
- Walid Mahdi Sfax University
Abstract
MIRACL-VC1 is a lip-reading dataset that includes both depth and color images. It can be used in diverse research fields such as visual speech recognition, face detection, and biometrics. Fifteen speakers (five men and ten women) were positioned in the frustum of an MS Kinect sensor and uttered a set of ten words and ten phrases ten times each (see the table below). Each instance of the dataset consists of a synchronized sequence of color and depth images (both 640x480 pixels). The MIRACL-VC1 dataset contains a total of 3000 instances (15 speakers x 20 labels x 10 repetitions).
Dataset labels
ID | Words | ID | Phrases |
---|---|---|---|
1 | Begin | 1 | Stop navigation. |
2 | Choose | 2 | Excuse me. |
3 | Connection | 3 | I am sorry. |
4 | Navigation | 4 | Thank you. |
5 | Next | 5 | Good bye. |
6 | Previous | 6 | I love this game. |
7 | Start | 7 | Nice to meet you. |
8 | Stop | 8 | You are welcome. |
9 | Hello | 9 | How are you? |
10 | Well | 10 | Have a good time. |
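As an illustration of how the labels and the instance count fit together, here is a minimal Python sketch. The names `WORDS`, `PHRASES`, and `enumerate_instances` are hypothetical, and the speaker index (1 to 15) is only a counter for this example; it does not reflect the dataset's actual folder or file naming.

```python
# Labels as listed in the table above; IDs run from 1 to 10 in each column.
WORDS = {
    1: "Begin", 2: "Choose", 3: "Connection", 4: "Navigation", 5: "Next",
    6: "Previous", 7: "Start", 8: "Stop", 9: "Hello", 10: "Well",
}
PHRASES = {
    1: "Stop navigation.", 2: "Excuse me.", 3: "I am sorry.",
    4: "Thank you.", 5: "Good bye.", 6: "I love this game.",
    7: "Nice to meet you.", 8: "You are welcome.", 9: "How are you?",
    10: "Have a good time.",
}

N_SPEAKERS = 15      # five men and ten women
N_REPETITIONS = 10   # each word or phrase is uttered ten times


def enumerate_instances():
    """Yield one (speaker, kind, label_id, repetition) tuple per instance.

    The tuple layout is an assumption made for this sketch only.
    """
    for speaker in range(1, N_SPEAKERS + 1):
        for kind, labels in (("word", WORDS), ("phrase", PHRASES)):
            for label_id in sorted(labels):
                for repetition in range(1, N_REPETITIONS + 1):
                    yield speaker, kind, label_id, repetition


if __name__ == "__main__":
    instances = list(enumerate_instances())
    print(len(instances))
```

Counting the tuples gives 15 x 20 x 10 instances, which matches the total stated in the abstract.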
Download
The MIRACL-VC1 dataset is made available for research purposes only. To access the dataset, use the 15 links below; each one corresponds to the data of one speaker. For more information, do not hesitate to send an email to Ahmed Rekik (rekikamed@gmail.com).
It is recommended to use the following link to get access to the full data on Google Drive:
Download all in one zip file (about 5 GB)
Otherwise, you can use the following links:
F01
F02
F04
F05
F06
F07
F08
F09
F10
F11
M01
M02
M04
M07
M08
Citation
Details and results for visual speech recognition can be found in the following papers:
@article{rekik2016adaptive,
title = {An adaptive approach for lip-reading using image and depth data},
author = {Rekik, Ahmed and Ben-Hamadou, Achraf and Mahdi, Walid},
journal = {Multimedia Tools and Applications},
volume = {75},
number = {14},
pages = {8609--8636},
year = {2016},
publisher= {Springer}
}
@inproceedings{RekikICIAR14,
author = {Ahmed Rekik and Achraf {Ben-Hamadou} and Walid Mahdi},
title = {A New Visual Speech Recognition Approach for {RGB-D} Cameras},
booktitle = {Image Analysis and Recognition - 11th International Conference,
{ICIAR} 2014, Vilamoura, Portugal, October 22-24, 2014},
pages = {21--28},
year = {2014}
}
The folder architecture of the dataset is as follows: