We have created a public database combining ground truth data for head pose, gaze and simplified user face models of 12 individuals.
The hardware consists of a magnetic sensor (the Flock of Birds by Ascension Technologies) for 3D pose and a camera. The sensor registers the head pose with respect to the transmitter. The camera is a standard Logitech webcam with a resolution of 1280×720 pixels working at 30 fps.
The relative position between the transmitter and the camera has been carefully calibrated, together with the positions of the fixation grid.
For each user, 8 sessions are recorded with controlled movements in both static-head and free-head-movement scenarios. Grids of 17 and 65 fixation points are recorded. For each fixation point, the best 10 frames are selected, and an image and the head pose are provided for each of these samples. In addition, a simplified head model is provided for each user.
The videos are provided in MPEG-4 format, re-encoded with a loss of approximately 1% with respect to the original recording. They have a resolution of 1280×720 pixels and were acquired at 30 frames per second. Every video is 10 seconds long and contains 300 frames.
Each video is associated with three ground-truth text files. One contains automatically annotated 2D facial points (the 2D ground truth), following a model of 54 facial landmarks. The other two contain the head pose (the 3D ground truth): one holds the head pose as originally recorded, and the other holds the same head pose sequence transformed so that the rotation in the initial frame is exactly zero. In the 3D ground truth, translations are given in millimeters and rotations in degrees; in the 2D ground truth, landmark positions are given in pixels.
The transformation consists of multiplying the pose of each frame by the inverse of the rotation matrix of the initial pose. Getting an exactly zero initial rotation is not feasible during recording, so applying this small transformation to every video is equivalent to moving the headband slightly at the start of each video until the sensor reads exactly zero rotation in the initial frame. The average deviation from zero in the original ground truth is 0.83°, 0.86°, and 1.05° in roll, yaw, and pitch, respectively.
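The zero-rotation transformation described above can be illustrated with a minimal sketch. It is not the authors' code and makes several assumptions: the head pose of each frame is already available as a 3×3 rotation matrix, the inverse of the initial rotation is applied on the left, and translations are left untouched. The names `zero_initial_rotation` and `rotation_about_z` are hypothetical, and the toy angles are invented for illustration. Converting the roll, yaw, and pitch values in the ground-truth files to matrices depends on an angle convention that is not stated here, so that step is omitted.

```python
import numpy as np


def zero_initial_rotation(rotations: np.ndarray) -> np.ndarray:
    """Re-express a sequence of rotation matrices relative to the first frame.

    rotations: array of shape (n_frames, 3, 3), one rotation matrix per frame.
    Returns an array of the same shape whose first matrix is the identity.
    """
    rotations = np.asarray(rotations, dtype=float)
    # For a rotation matrix, the inverse equals the transpose.
    r0_inv = rotations[0].T
    # Assumption: the inverse is applied on the left (R0^-1 @ R_t).
    # Matrix multiplication broadcasts over the leading frame axis.
    return r0_inv @ rotations


def rotation_about_z(angle_deg: float) -> np.ndarray:
    """Hypothetical helper: rotation by angle_deg degrees about the z axis."""
    a = np.radians(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Toy sequence: the head starts 0.9 degrees off zero, then turns further.
sequence = np.stack([rotation_about_z(d) for d in (0.9, 5.9, 10.9)])
normalized = zero_initial_rotation(sequence)
```

In this toy sequence, the first normalized matrix is the identity and the second is a 5 degree rotation about the z axis, which mirrors the effect of nudging the headband so that the sensor reads zero in the initial frame.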
For more detailed information about the database, please refer to:
Ion Martinikorena, Rafael Cabeza, Arantxa Villanueva, Sonia Porta, Introducing I2Head Database, PETMEI ’18, June 14–17, 2018, Warsaw, Poland © 2018 Copyright is held by the owner/author(s). ACM ISBN 978-1-4503-5789-0/18/06, https://doi.org/10.1145/3208031.3208033 (In press)
Download the database
The database is publicly available after registration. You can download it here.
This database is publicly available for research purposes. If you use it, please cite this paper:
- Ion Martinikorena, Rafael Cabeza, Arantxa Villanueva, Sonia Porta, Introducing I2Head Database, PETMEI ’18, June 14–17, 2018, Warsaw, Poland © 2018 Copyright is held by the owner/author(s). ACM ISBN 978-1-4503-5789-0/18/06, https://doi.org/10.1145/3208031.3208033 (In press)