We have been collecting data sets and conducting baseline and advanced personal identification studies using biometric measurements. We are committed to releasing all data collated to eligible research groups, with appropriate controls to forbid on-line distribution outside the research community. Data is distributed using rsync.
If you are interested in obtaining any of the biometric datasets described below, please follow these instructions:
- Download all applicable license agreements. Several of our datasets require more than one license agreement.
- Have the license agreement reviewed and signed BY AN INDIVIDUAL AUTHORIZED TO MAKE LEGAL COMMITMENTS ON BEHALF OF YOUR INSTITUTION. WE CANNOT ACCEPT LICENSES SIGNED BY STUDENTS OR FACULTY MEMBERS. YOUR INSTITUTION'S LEGAL OFFICE MUST REVIEW AND EXECUTE THE LICENSE.
- Return the properly signed license agreement from your INSTITUTIONAL e-mail address (we cannot accept license agreements sent through third-party e-mail providers) to email@example.com, or by fax, attention D. Wright, to 1-574-631-9260.
- Include in the e-mail/cover page the full name, title, address and phone number of the institution and institutional point of contact.
The Deception Detection and Physiological Monitoring (DDPM) dataset captures an interview scenario in which the interviewee attempts to deceive the interviewer on selected responses. The interviewee is recorded in RGB, near-infrared, and long-wave infrared, along with cardiac pulse, blood oxygenation, and audio. After collection, data were annotated for interviewer/interviewee, curated, ground-truthed, and organized into train/test parts for a set of canonical deception detection experiments. The dataset contains almost 13 hours of recordings of 70 subjects, and over 8 million visible-light, near-infrared, and thermal video frames, along with the corresponding metadata, audio, and pulse oximeter data.
Data Type: IR Iris Still, Approximate Download Size: 2.7 GB
The NDIris3D dataset contains a total of 6,850 images: 3,488 images acquired by LG4000, and 3,362 images acquired by AD100 from the same 89 subjects with and without textured contact lenses, and for varying illumination setups in both LG4000 and AD100 sensors. The dataset may be used in research on iris presentation attack detection (especially related to application of photometric stereo in PAD), or to assess the impact of contact lenses on matching performance. Note: the NDIris3D dataset is part of the larger LivDet-Iris 2020 test set. If you plan to test your algorithms with Notre Dame’s part of the LivDet-Iris 2020 benchmark, use NDIris3D appropriately to make fair comparisons with the LivDet competition winner (e.g., do not use NDIris3D in training).
Data Type: IR Iris Still, Approximate Download Size: 2.4 GB
This database offers iris images (with and without contact lenses) of the same eyes, captured one shortly after another with illumination coming from two different locations. In total, 5,796 iris images were acquired by the LG IrisAccess 4000 sensor from 119 subjects. This set is divided into four subsets used in the experiments: (a) 1,800 images of irises wearing regular (with dot-like pattern) textured contact lenses, as shown in Fig. 6a in the WACV 2019 paper; (b) 864 images of irises wearing irregular (without dot-like pattern) textured contact lenses, as shown in Fig. 6b in the WACV 2019 paper; (c) 1,728 images of irises wearing clear contact lenses (without any visible pattern); and (d) 1,404 images of authentic irises without any contact lenses.
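The subset breakdown above can be checked against the stated total with a few lines of Python. This is only a sanity-check sketch: the dictionary keys are illustrative labels chosen here, not file or directory names from the dataset.

```python
# Subset sizes copied from the description above.
# The keys are illustrative labels, not names used in the dataset itself.
subsets = {
    "regular textured lenses": 1800,
    "irregular textured lenses": 864,
    "clear lenses": 1728,
    "no lenses": 1404,
}

stated_total = 5796
total = sum(subsets.values())

# The four subsets should account for every image in the set.
assert total == stated_total

# Print each subset's share of the whole set.
for name, count in subsets.items():
    print(f"{name}: {count} images ({count / total:.1%})")
```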
Data Type: Video
Approximate Download Size: 3.3 GB
The VBOLO dataset was collected in several sessions, at various checkpoints within public transportation facilities such as tunnels, bridges, and hallways. These capture environments include different camera mount heights and depression angles, illuminations, backgrounds, resolutions, pedestrian poses, and distractors. This dataset provides a good scenario for the facial ReID problem. It uses a small set of known individuals ("actors") who move in and out of the surveillance cameras' fields of view, together with unknown persons denoted as "distractors". The "actors" change clothing randomly between each "appearance" in a camera's field of view.
Compared to a typical body-based ReID dataset, which has only a few images for each subject, the VBOLO dataset has a large number of annotations for each subject from consecutive video frames, which mimics a real scenario for surveillance tracking and detection. Matching is significantly challenging because: 1) Faces change size significantly (e.g., from 12x12 to 150x150) and exhibit significant pose variations as well. 2) The cameras supplying the probe and gallery images may have different resolutions and points of view.
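To give a sense of the size variation quoted above, the following sketch computes the ratio between the smallest and largest face crops, and the resize factor each would need to reach a common matcher input size. The 112-pixel input side is a hypothetical value chosen for illustration only; it is not specified anywhere in the dataset description.

```python
# Face side lengths (in pixels) quoted in the description above.
min_side, max_side = 12, 150

# How much larger the biggest face is than the smallest.
linear_ratio = max_side / min_side
area_ratio = (max_side ** 2) / (min_side ** 2)

# Hypothetical matcher input size, assumed here for illustration only.
TARGET_SIDE = 112


def resize_factor(side: int, target: int = TARGET_SIDE) -> float:
    """Return the scale factor needed to bring a square crop to the target side."""
    return target / side


print(f"linear ratio: {linear_ratio}x, area ratio: {area_ratio}x")
print(f"smallest face must be upsampled by {resize_factor(min_side):.2f}x")
print(f"largest face must be downsampled by {resize_factor(max_side):.2f}x")
```

Under this assumption the smallest faces would be upsampled several times over while the largest would be slightly downsampled, so probe and gallery crops can carry very different amounts of real detail.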
Advances in image restoration and enhancement techniques have led to discussion about how such algorithms can be applied as a pre-processing step to improve automatic visual recognition. In principle, techniques like deblurring and super-resolution should yield improvements by de-emphasizing noise and increasing signal in an input image. But the historically divergent goals of computational photography and visual recognition communities have created a significant need for more work in this direction. To facilitate new research, we introduce a new benchmark dataset called UG^2, which contains three difficult real-world scenarios: uncontrolled videos taken by UAVs and manned gliders, as well as controlled videos taken on the ground.
Data Type: Synthetic Face Images, 3D Head Models
Approximate Download Size: 211 GB
The dataset contains two types of data:
1. A set of 3D head models (.abs files) and their corresponding 2D RGB registration images (.ppm files), obtained using a Konica-Minolta ‘Vivid 910’ 3D scanner, of real identities (subjects), either Male or Female in gender and Caucasian or Asian in ethnicity.
2. A set of RGB face images (masked faces without context or background, 800x600 in size) of fully synthetic subjects (identities) that do not exist in reality. The synthetic identities are generated by consistently sampling facial parts from face images of different real identities, either Male or Female in gender and Caucasian or Asian in ethnicity.
Since all the identities in this dataset are synthetic, i.e. they do not exist, they can be used freely without any privacy concerns. These synthetic face images were generated using Python and OpenGL, with minimal training, and can be used as (1) supplemental training data to train CNNs, or (2) additional distractor face images in the gallery for face verification experiments.