We have been collecting datasets and conducting baseline and advanced personal-identification studies using biometric measurements. We are committed to releasing all collected data to eligible research groups, with appropriate controls to prevent online distribution outside the research community.
If you are interested in obtaining any of the biometric datasets described below, please follow these instructions:
- Download all applicable license agreements. Several of our datasets require more than one license agreement.
- Have the license agreement reviewed and signed by an individual authorized to make legal commitments in the name of your organization.
- For university licensees – we cannot accept licenses signed by students or postdoctoral scholars under any circumstances. We cannot accept licenses signed by faculty members unless they have been explicitly delegated the authority to make contracts on behalf of the institution. Your institution's legal or contracting office must review and execute the license.
- Return the properly signed license agreement via your INSTITUTIONAL e-mail address (we cannot accept license agreements sent through third party e-mail providers) to firstname.lastname@example.org. You may also fax requests to +1 574 631 9260, attention J. Dhar.
- Include in the e-mail/cover page the full name, title, address and phone number of the institution and institutional point of contact.
The Masked Physiological Monitoring (MPM) dataset contains 159 video recordings from 54 human subjects wearing protective face coverings. Each recording consists of a 1920x1080 resolution losslessly compressed RGB video recorded at 90 frames per second, with simultaneous PPG collected from two fingertip oximeters. Each recording lasts a minimum of 3 minutes, during which subjects converse, move their heads, and sit still, resulting in over 8 hours of data.
The Multi-Site Physiological Monitoring (MSPM) dataset consists of 103 sessions, each lasting just over 14 minutes on average, in which human subjects engage in a variety of activities. These activities were designed either to elicit interesting physiological phenomena (e.g., a breath hold to increase blood pressure) or to provide a challenging context for remote photoplethysmography (rPPG), such as an adversarial attack. Sessions were recorded in RGB from three different angles and in near-infrared zoomed in on the eyes, along with cardiac pulse at ten sites across the body, blood oxygenation, and blood pressure from a cuff-based monitor.
The UND WACV 2023 CYBORG Dataset contains (a) images of live (authentic) faces, (b) images of faces synthetically generated by deep learning-based generative adversarial networks, and (c) regions annotated by humans solving the synthetic face detection task, indicating features supporting their decisions.
This dataset contains modified samples from the Flickr-Faces-HQ (FFHQ) dataset, made available under the Creative Commons BY-NC-SA 4.0 license by NVIDIA Corporation (https://github.com/NVlabs/ffhq-dataset/blob/master/LICENSE.txt). Under that license, one may redistribute and adapt FFHQ samples for non-commercial purposes, as long as one (a) gives appropriate credit by citing the FFHQ creators' paper, (b) indicates any changes made, and (c) distributes any derivative works under the same license. In response to these requirements, we (a) cited the paper indicated at https://github.com/NVlabs/ffhq-dataset in the paper publishing the UND WACV 2023 CYBORG Dataset, (b) state that the modifications made to the original FFHQ samples consist of cropping each image around the detected face and rescaling the cropped sample to 224x224 pixel resolution, and (c) distribute the derivative work as the AAAI 2023 paper. The UND WACV 2023 CYBORG Dataset also contains modified samples distributed under the Creative Commons BY-NC 4.0 license (https://creativecommons.org/licenses/by-nc/4.0/legalcode).
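The preprocessing step described above (cropping each image around the detected face and rescaling to 224x224) might be sketched as below. This is illustrative only: the face detector, the bounding-box convention, and the nearest-neighbour rescale are all assumptions, since the dataset description does not specify the actual tooling.

```python
import numpy as np

def crop_and_resize(image: np.ndarray, box: tuple, size: int = 224) -> np.ndarray:
    """Crop an HxWx3 image to a face bounding box and rescale to size x size.

    `box` is (top, left, bottom, right), e.g. from any face detector; the
    detector itself is not named in the dataset description, so it is
    assumed here rather than prescribed.
    """
    top, left, bottom, right = box
    face = image[top:bottom, left:right]
    # Nearest-neighbour rescale; the original pipeline's interpolation
    # method is undocumented, so this stands in for it.
    rows = np.arange(size) * face.shape[0] // size
    cols = np.arange(size) * face.shape[1] // size
    return face[rows][:, cols]
```

Given a detected face box, the output is a fixed 224x224x3 crop regardless of the input resolution.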
According to the request of the licensor (https://github.com/tkarras/progressive_growing_of_gans), one is allowed to use any of the material in their own work, as long as appropriate credit is given to the creators by mentioning the title and author list of their paper: Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen, “Progressive Growing of GANs for Improved Quality, Stability, and Variation,” ICLR 2018.
The LivDet-Iris 2023 dataset contains images of live (authentic) irises and images of irises synthetically generated by deep learning-based generative adversarial networks. The primary goal of creating and sharing this dataset is to allow researchers to participate in the LivDet-Iris 2023 competition by delivering to the organizers the presentation attack detection scores associated with these images. After the LivDet-Iris 2023 competition concludes, this dataset may serve as a useful benchmark for comparing future solutions with those submitted to the competition.
All data is de-identified. Assembly of this data set was supported by the US National Institute of Standards and Technology.
The BVC-UNN-face data set was collected by the Biometrics Vision and Computing (BVC) group at the University of Nigeria. It includes a database of face images of Nigerians.
The BVC group through the University of Nigeria retains ownership and copyright of the BVC-UNN-face dataset. This data is distributed via the University of Notre Dame upon receipt of a properly executed copy of the license agreement.
For details on publishable images, please click here.
This dataset may be useful for studying accuracy differences across female / male demographics.
Data Type: RGB Face Video, Pulse waveforms and Heart rate, Approximate Download Size: 7 TB
This dataset overlaps with DDPM specifically for remote pulse detection. It consists of losslessly compressed RGB videos and ground truth pulse waveforms and heart rate (HR) for 86 subjects. The data was collected in an interview scenario with subjects freely moving, talking, and exhibiting facial expressions. Each video lasts around 10 minutes, recorded at 90 frames per second, giving several million visible-light frames at 1920x1080 resolution. Pulse data was interpolated to the video sampling rate, such that each frame has a waveform and HR pair. Subject metadata describing age, gender, race, and ethnicity is included. Predefined train, validation, and test splits are also included to allow comparison with results presented in the original paper. Download of this dataset requires an account on Globus.org.
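The per-frame alignment described above (interpolating pulse data to the 90 fps video rate, so each frame gets a waveform/HR pair) can be sketched as follows. A minimal sketch, assuming the oximeter waveform comes with its own timestamps; the function and variable names are illustrative, not from the dataset's tooling.

```python
import numpy as np

def align_pulse_to_frames(ppg_times, ppg_values, n_frames, fps=90.0):
    """Linearly interpolate an oximeter waveform onto video frame timestamps.

    ppg_times / ppg_values: raw pulse samples and their timestamps in seconds
    (sampled at the oximeter's own rate, assumed different from the video's).
    Returns one value per video frame; the same resampling applied to the HR
    series yields the per-frame (waveform, HR) pair the dataset describes.
    """
    frame_times = np.arange(n_frames) / fps
    return np.interp(frame_times, ppg_times, ppg_values)
```

Linear interpolation is only one reasonable choice here; the dataset description does not state which method was actually used.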
Data Type: Video with corresponding force plate data , Approximate Download Size: 20 GB
This dataset consists of videos of 89 female athletes performing 582 evaluative jumps for the purpose of predicting ACL injury risk. The dataset includes three videos from different angles for each jump as well as force plate data. For more information please see the detailed description HERE.
Data Type: Visible Face Images, Approximate Download Size: 4.2 GB
The TIM test includes a total of 675 images organized into 225 image triads. Each triad comprises two images of the same identity and one image of a different identity. The task is to select the odd one out (i.e., the image of the different identity). The images were sampled from the Good, the Bad, and the Ugly Challenge and show frontal views of faces with variation in illumination, expression, and subject appearance (hair, accessories). The ages of the subjects can be estimated by assuming that the photos were taken between 2004 and 2005.
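Running an algorithm on the triad task above reduces to picking, per triad, the image least similar to the other two. A minimal sketch, assuming the face matcher under test supplies some pairwise similarity score (the embedding and similarity function are placeholders, not part of the TIM protocol):

```python
from itertools import combinations

def pick_odd_one_out(embeddings, similarity):
    """Return the index (0, 1, or 2) of the presumed different identity.

    The pair with the highest similarity is assumed to share an identity;
    the remaining image is the odd one out. `similarity` is whatever
    pairwise score the system under test produces (assumed here).
    """
    i, j = max(combinations(range(3), 2),
               key=lambda p: similarity(embeddings[p[0]], embeddings[p[1]]))
    return ({0, 1, 2} - {i, j}).pop()
```

Accuracy over the 225 triads can then be compared against the 1/3 chance level.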
Data Type: RGB Face Video, NIR Face Video, LWIR Face Video, Pulse waveforms, Heart rate, Approximate Download Size: 12 TB
The Deception Detection and Physiological Monitoring (DDPM) dataset captures an interview scenario in which the interviewee attempts to deceive the interviewer on selected responses. The interviewee is recorded in RGB, near-infrared, and long-wave infrared, along with cardiac pulse, blood oxygenation, and audio. After collection, data were annotated for interviewer/interviewee, curated, ground-truthed, and organized into train/test parts for a set of canonical deception detection experiments. The dataset contains almost 13 hours of recordings of 70 subjects, and over 8 million visible-light, near-infrared, and thermal video frames, along with appropriate meta, audio, and pulse oximeter data. Download of this dataset requires an account on Globus.org.
Data Type: IR Iris Still, Approximate Download Size: 2.7 GB
The NDIris3D dataset contains a total of 6,850 images: 3,488 acquired by the LG4000 sensor and 3,362 acquired by the AD100 sensor, from the same 89 subjects, with and without textured contact lenses and under varying illumination setups for both sensors. The dataset may be used in research on iris presentation attack detection (especially related to the application of photometric stereo in PAD), or to assess the impact of contact lenses on matching performance. Note: the NDIris3D dataset is part of the larger LivDet-Iris 2020 test set. If you plan to test your algorithms with Notre Dame's part of the LivDet-Iris 2020 benchmark, use NDIris3D appropriately to allow fair comparisons with the LivDet competition winner (e.g., do not use NDIris3D in training).
Data Type: IR Iris Still, Approximate Download Size: 2.4 GB
This database offers iris images (with and without contact lenses) of the same eyes captured shortly after one another, with illumination coming from two different locations. 5,796 iris images in total were acquired by the LG IrisAccess 4000 sensor from 119 subjects. This set is divided into four subsets used in the experiments: (a) 1,800 images of irises wearing regular (with dot-like pattern) textured contact lenses, as shown in Fig. 6a in the WACV 2019 paper; (b) 864 images of irises wearing irregular (without dot-like pattern) textured contact lenses, as shown in Fig. 6b in the WACV 2019 paper; (c) 1,728 images of irises wearing clear contact lenses (without any visible pattern); and (d) 1,404 images of authentic irises without any contact lenses.
Data Type: Video, Approximate Download Size: 3.3 GB
The VBOLO dataset was collected in several sessions, at various checkpoints within public transportation facilities such as tunnels, bridges, and hallways. These capture environments include different camera mount heights and depression angles, illuminations, backgrounds, resolutions, pedestrian poses, and distractors. This dataset provides a good scenario for the facial ReID problem. It uses a small set of known individuals ("actors"), who move in and out of the surveillance cameras' fields of view, together with unknown persons denoted as "distractors". The "actors" change clothing randomly between each "appearance" in a camera's field of view.
Compared to a typical body-based ReID dataset, which has only a few images for each subject, the VBOLO dataset has a large number of annotations for each subject from consecutive video frames, which mimic a real scenario for surveillance tracking and detection. This is significantly challenging for matching, because: 1) faces change size significantly (e.g., from 12x12 to 150x150 pixels) and exhibit significant pose variations as well; and 2) the cameras supplying the probe and gallery images may have different resolutions and points of view.
Advances in image restoration and enhancement techniques have led to discussion about how such algorithms can be applied as a pre-processing step to improve automatic visual recognition. In principle, techniques like deblurring and super-resolution should yield improvements by de-emphasizing noise and increasing signal in an input image. But the historically divergent goals of computational photography and visual recognition communities have created a significant need for more work in this direction. To facilitate new research, we introduce a new benchmark dataset called UG^2, which contains three difficult real-world scenarios: uncontrolled videos taken by UAVs and manned gliders, as well as controlled videos taken on the ground.
Data Type: Synthetic Face Images, 3D Head Models, Approximate Download Size: 211 GB
The dataset contains two types of data:
1. A set of 3D head models (.abs files) and their corresponding 2D RGB registration images (.ppm files), obtained using a Konica-Minolta 'Vivid 910' 3D scanner, of real identities (subjects), male or female in gender and Caucasian or Asian in ethnicity.
2. A set of RGB face images (masked faces without context or background, 800x600 pixels in size) of fully synthetic subjects (identities) that do not exist in reality. The synthetic identities are generated by consistently sampling facial parts from face images of different real identities, male or female in gender and Caucasian or Asian in ethnicity.
Since all the identities in this dataset are synthetic (i.e., they do not exist), they can be used freely without privacy concerns. These synthetic face images were generated using Python and OpenGL, with minimal training, and can be used as (1) supplemental training data to train CNNs, or (2) additional distractor face images in the gallery for face verification experiments.