MIRACL-VC1: an RGB-D dataset for lipreading in English

Abstract

Overview

MIRACL-VC1 is a lip-reading dataset that includes both depth and color images. It can be used in research fields such as visual speech recognition, face detection, and biometrics. Fifteen speakers (five men and ten women) were positioned in the frustum of an MS Kinect sensor and uttered, ten times each, a set of ten words and ten phrases (see the table below). Each instance of the dataset consists of a synchronized sequence of color and depth images (both 640x480 pixels). The MIRACL-VC1 dataset contains a total of 3000 instances.
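The total of 3000 instances follows directly from the recording protocol described above. A minimal sketch of that arithmetic is given here; the constant and function names are illustrative and not part of the dataset.

```python
# Counts taken from the dataset description; names are illustrative only.
NUM_SPEAKERS = 15      # five men and ten women
NUM_WORDS = 10         # word labels 1..10
NUM_PHRASES = 10       # phrase labels 1..10
NUM_REPETITIONS = 10   # each word/phrase is uttered ten times per speaker


def count_instances(speakers, words, phrases, repetitions):
    """Return (word_instances, phrase_instances, total_instances)."""
    word_instances = speakers * words * repetitions
    phrase_instances = speakers * phrases * repetitions
    return word_instances, phrase_instances, word_instances + phrase_instances


words_total, phrases_total, total = count_instances(
    NUM_SPEAKERS, NUM_WORDS, NUM_PHRASES, NUM_REPETITIONS
)
print(words_total, phrases_total, total)  # 1500 1500 3000
```

Under these counts, each speaker contributes 200 instances, split evenly between words and phrases.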

Dataset labels

ID   Words        ID   Phrases
1    Begin        1    Stop navigation.
2    Choose       2    Excuse me.
3    Connection   3    I am sorry.
4    Navigation   4    Thank you.
5    Next         5    Good bye.
6    Previous     6    I love this game.
7    Start        7    Nice to meet you.
8    Stop         8    You are welcome.
9    Hello        9    How are you?
10   Web          10   Have a good time.
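When working with the labels programmatically, the table can be expressed as two lookup dictionaries keyed by the numeric ID. The following is only a sketch: the names `WORDS`, `PHRASES`, and `label_text` are hypothetical, and the spellings "Begin" (ID 1) and "Web" (ID 10) are assumed to be the intended words.

```python
# Label tables transcribed from the dataset label list.
# "Begin" and "Web" are assumed spellings for word IDs 1 and 10.
WORDS = {
    1: "Begin", 2: "Choose", 3: "Connection", 4: "Navigation", 5: "Next",
    6: "Previous", 7: "Start", 8: "Stop", 9: "Hello", 10: "Web",
}

PHRASES = {
    1: "Stop navigation.", 2: "Excuse me.", 3: "I am sorry.",
    4: "Thank you.", 5: "Good bye.", 6: "I love this game.",
    7: "Nice to meet you.", 8: "You are welcome.", 9: "How are you?",
    10: "Have a good time.",
}


def label_text(kind, label_id):
    """Return the text for a label; `kind` is 'words' or 'phrases'."""
    tables = {"words": WORDS, "phrases": PHRASES}
    if kind not in tables:
        raise ValueError(f"unknown label kind: {kind!r}")
    return tables[kind][label_id]


print(label_text("words", 4))    # Navigation
print(label_text("phrases", 9))  # How are you?
```

Keeping words and phrases in separate tables mirrors the two ID columns above, where the same ID refers to a different utterance depending on the category.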

Download

The MIRACL-VC1 dataset is made available for research purposes only. The data can be accessed through 15 links, each corresponding to the data of one speaker. For more information, do not hesitate to send an email to Ahmed Rekik (rekikamed@gmail.com). It is recommended to use the following link to access the full dataset on Google Drive:

Download all in one zip file (about 5 GB)

Otherwise, you can use the following links, one per speaker:

F01 F02 F04 F05 F06 F07 F08 F09
F10 F11 M01 M02 M04 M07 M08

The sensor calibration file used to align the depth maps with the color images can be downloaded from the following link:

Download sensor calibration file

Citation

Details and results for visual speech recognition can be found in the following papers:
@article{rekik2016adaptive,
    title   = {An adaptive approach for lip-reading using image and depth data},
    author  = {Rekik, Ahmed and Ben-Hamadou, Achraf and Mahdi, Walid},
    journal = {Multimedia Tools and Applications},
    volume  = {75},
    number  = {14},
    pages   = {8609--8636},
    year    = {2016},
    publisher = {Springer}
}

@inproceedings{RekikICIAR14,
    author    = {Ahmed Rekik and Achraf {Ben-Hamadou} and Walid Mahdi},
    title     = {A New Visual Speech Recognition Approach for {RGB-D} Cameras},
    booktitle = {Image Analysis and Recognition - 11th International Conference,
                {ICIAR} 2014, Vilamoura, Portugal, October 22-24, 2014},
    pages     = {21--28},
    year      = {2014}
}

The folder architecture of the dataset is as follows:

[Figure: folder architecture of the dataset]