5 Million Face Images for Facial Recognition Model Training

Written by limarc | Published 2020/11/10
Tech Story Tags: face-recognition | facial-recognition | facial-recognition-tech | facial-detection | datasets | dataset | machine-learning | hackernoon-top-story


This article on face recognition datasets, originally written for Lionbridge AI, is one of my best-performing pieces. I'm happy to share it with the Hacker Noon community!
From mobile phone security and surveillance cameras to augmented reality and photography, the facial recognition branch of computer vision has a variety of useful applications. Depending on your specific project, you may need face images in different lighting conditions, faces that express different emotions, or annotated face images. From video frames annotated with facial keypoints to collections of real and fake face images, the datasets on this list vary in size and scope.

Where can I find Free Image Datasets for Facial Recognition Models?

We’ve compiled a list of the best free image datasets for face recognition, which together total over 5,000,000 face images and video frames. Ranging from GIFs and still frames taken from YouTube videos to thermal and 3D images, each dataset is different and suited to different projects and algorithms.
1. CelebA (CelebFaces Attributes Dataset)
For non-commercial research purposes only, this dataset from MMLAB contains over 200,000 celebrity images.
2. Face Detection in Images
A simple yet useful dataset, Face Detection in Images contains just over 500 images with approximately 1,100 faces already tagged with bounding boxes.
dataturks.com/projects/devika.mishra/face_detection3
3. Face Images with Marked Landmark Points
This dataset includes over 7,000 facial images with keypoints annotated on every image. The number of keypoints varies from image to image, up to a maximum of 15 per image. The keypoint data is provided in a separate CSV file.
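Because the keypoints live in a separate CSV and not every image has all 15 points, a loader has to tolerate missing values. The sketch below assumes a hypothetical layout (one row per image, paired `<name>_x` / `<name>_y` columns, empty cells for unannotated points); check the real file's header before using it. `parse_keypoints` is an illustrative helper, not part of the dataset.

```python
import csv
import io
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def parse_keypoints(csv_file) -> List[Dict[str, Point]]:
    """Return one {keypoint_name: (x, y)} dict per CSV row (one row per image).

    Assumed (hypothetical) layout: paired "<name>_x" / "<name>_y" columns,
    where an empty cell means the keypoint was not annotated on that image.
    """
    reader = csv.DictReader(csv_file)
    # Keypoint names are the "_x" column names with the suffix stripped.
    names = sorted(col[:-2] for col in reader.fieldnames if col.endswith("_x"))
    images = []
    for row in reader:
        points = {}
        for name in names:
            x = (row.get(name + "_x") or "").strip()
            y = (row.get(name + "_y") or "").strip()
            if x and y:  # keep only keypoints with both coordinates present
                points[name] = (float(x), float(y))
        images.append(points)
    return images


# Tiny inline sample standing in for the real CSV file.
sample = io.StringIO(
    "left_eye_center_x,left_eye_center_y,nose_tip_x,nose_tip_y\n"
    "66.0,39.0,44.4,57.1\n"
    "65.1,,48.2,56.0\n"
)
faces = parse_keypoints(sample)
print([len(points) for points in faces])  # [2, 1]
```

Returning a dict per image (rather than a fixed-length array) keeps "not annotated" distinct from a real coordinate, which matters when the keypoint count varies per image.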
4. Flickr-Faces-HQ (FFHQ)
Sourced from Flickr, this dataset contains 210,000 images: 70,000 original images, 70,000 images cropped to 1024 x 1024 pixels, and 70,000 cropped to 128 x 128 pixels.
5. Google Facial Expression Comparison Dataset
From Google AI comes the Google Facial Expression Comparison dataset, which includes 156,000 facial images. The images are grouped into triplets, and in each triplet the two images with the most similar facial expressions are annotated as the “most similar” pair. The annotation is thorough: each triplet was labeled by at least six separate human annotators.
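With several annotators per triplet, you typically need to combine their judgments into one label. The sketch below uses a simple plurality vote and assumes a hypothetical encoding in which each annotator's answer is the index (0, 1, or 2) of the odd-one-out image; the real dataset's file format may differ, and `most_similar_pair` is an illustrative helper. The default of six raters mirrors the minimum mentioned in the description.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def most_similar_pair(
    odd_one_out_votes: Sequence[int], min_raters: int = 6
) -> Optional[Tuple[int, ...]]:
    """Plurality vote over annotators for one image triplet.

    Each vote is the index (0, 1 or 2) of the image an annotator judged to be
    the odd one out; the other two images form that annotator's "most similar"
    pair. Returns None when there are too few votes or the top choice is tied.
    """
    if any(v not in (0, 1, 2) for v in odd_one_out_votes):
        raise ValueError("votes must be 0, 1 or 2")
    if len(odd_one_out_votes) < min_raters:
        return None
    ranked = Counter(odd_one_out_votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most popular choices
    odd = ranked[0][0]
    # The two images that were NOT voted odd one out form the similar pair.
    return tuple(i for i in range(3) if i != odd)


print(most_similar_pair([2, 2, 2, 1, 2, 0]))  # (0, 1)
print(most_similar_pair([0, 0, 0, 1, 1, 1]))  # None (tie)
```

Dropping tied triplets (instead of breaking ties arbitrarily) trades some data for cleaner labels; whether that trade is worth it depends on your project.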
6. Labeled Faces in the Wild (LFW)
Created by researchers at the University of Massachusetts, this dataset was originally made to study unconstrained face recognition. It contains over 13,000 images of over 5,700 people, along with helpful metadata in CSV format.
7. Real and Fake Face Detection
This dataset was made to train models to distinguish real face images from generated ones. It includes over 1,000 real face images and over 900 fake face images, with the fakes graded as easy, mid, or hard to detect.
8. Simpsons Faces
With images taken from seasons 25 to 28 of the popular American cartoon series The Simpsons, this dataset includes over 9,800 cropped faces of its characters.
9. Tufts Face Database
With over 100,000 images, the Tufts Face Database is divided into nine categories: computerized sketches, thermal, thermal cropped, three-dimensional, Lytro, 2D RGB around, 2D RGB emotion, night vision, and video.
10. UMDFaces
By far the largest dataset on this list, UMDFaces has over 367,000 face annotations of over 8,200 subjects in still images. It also includes over 3.7 million video frames of over 3,100 subjects, all annotated with facial keypoints. Note that this dataset is for non-commercial research use only.
11. UTKFace
The UTKFace dataset includes faces from a wide age range. The people in these images range from less than a year old to over 100 years old. The dataset includes over 20,000 face images with age, gender, and ethnicity annotations.
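To my knowledge, UTKFace stores its age, gender, and ethnicity labels in each image's filename, following the pattern `[age]_[gender]_[race]_[date&time].jpg`, with gender coded 0 (male) / 1 (female) and race coded 0 to 4 (White, Black, Asian, Indian, Others). The sketch below assumes that convention; verify the code tables against the dataset's own page before training on them. `parse_utkface_name` and `FaceLabel` are illustrative names, not part of the dataset.

```python
import os
from typing import NamedTuple, Optional

# Label codes assumed from UTKFace's published filename convention.
GENDERS = {0: "male", 1: "female"}
RACES = {0: "White", 1: "Black", 2: "Asian", 3: "Indian", 4: "Others"}


class FaceLabel(NamedTuple):
    age: int
    gender: str
    race: str


def parse_utkface_name(path: str) -> Optional[FaceLabel]:
    """Extract (age, gender, race) from a UTKFace-style filename.

    Returns None for names that don't follow the assumed pattern, so
    malformed files can be skipped instead of crashing a data loader.
    """
    # Drop directories and every extension (".jpg", ".jpg.chip.jpg", ...).
    stem = os.path.basename(path).split(".")[0]
    parts = stem.split("_")
    if len(parts) < 4:
        return None
    try:
        age, gender, race = int(parts[0]), int(parts[1]), int(parts[2])
    except ValueError:
        return None
    if gender not in GENDERS or race not in RACES:
        return None
    return FaceLabel(age, GENDERS[gender], RACES[race])


print(parse_utkface_name("UTKFace/25_0_1_20170116174525125.jpg.chip.jpg"))
# FaceLabel(age=25, gender='male', race='Black')
print(parse_utkface_name("39_1_20170116174525125.jpg.chip.jpg"))  # None
```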
12. WIDER FACE
This dataset contains over 10,000 images, each showing either a single person or multiple people. The images are grouped by setting, such as meetings, traffic, and parades.
13. Yale Face Database
The Yale Face Database contains 165 GIF images of 15 different subjects under a variety of lighting conditions. The subjects in the images display different emotions and expressions.
14. YouTube Faces with Facial Keypoints
This dataset is built from public YouTube videos of celebrities and totals 155,560 still frames. The videos have been cropped around each celebrity's face, and every frame of every video is annotated with facial keypoints.
Still looking for more datasets? Check out:
  1. https://hackernoon.com/tagged/datasets
  2. https://hackernoon.com/tagged/dataset

Written by limarc | Director of Content @ISNation & HackerNoon's Editorial Ambassador by day, VR Gamer and Anime Binger by night.
Published by HackerNoon on 2020/11/10