In November the National Institute of Standards and Technology, a government laboratory that sets measurement standards for all kinds of technological products, released an interagency report on the performance of facial recognition algorithms. It's most likely already outdated.
“Technology is evolving very quickly,” says Patrick Grother, the NIST scientist in charge of evaluating face and iris recognition algorithms.
According to a recent NIST report co-written by Grother, there has been an “industrial revolution” recently in facial recognition algorithms, to the point where some of them are extremely accurate. As a result, their use has been proliferating. Social media networks now recognize your face when a photo of it is uploaded. Windows computers and Apple phones allow users to unlock devices with their faces. Airports around the world scan crowds for wanted individuals and suspected terrorists. The technology is also spreading among national security services and law enforcement agencies.
The NIST is currently completing a report that analyzes how well these many different algorithms perform at recognizing people of different ages, races, and genders.
Grother will speak at a meeting of the Princeton ACM/IEEE on Thursday, March 14, at Princeton University’s computer science building, Room 105. Admission is free. For more information, call 908-285-1066 or visit www.princetonacm.acm.org.
Scientists have long suspected that some algorithms might be better at recognizing people of a lighter skin tone. A 2018 report from MIT’s Media Lab confirmed these suspicions when it tested facial recognition systems from Microsoft, IBM, and Megvii of China and found that the systems were less accurate at identifying gender in darker-skinned subjects.
The NIST’s study is more comprehensive. Grother says the lab evaluated as many as 200 algorithms from 60 or 70 developers — some were commercial systems, and others were made by universities.
“There has been a lot of coverage about this issue in the press and elsewhere going back at least a year now,” Grother says. He says he wanted to write a report that provided a scope of the work to be done, created metrics and methods for others to measure bias in face recognition, and gave empirical results for a broad range of algorithms.
The report was supposed to have been done by now, but the government shutdown delayed the effort.
Although the report is not yet complete, Grother could say a few things about what the study found. “It’s a complicated story,” he says. One finding was that some systems don’t recognize certain demographics as well as others. For example, a traveler going through an automated passport identification gate might fail to be recognized as themselves, a false negative.
Other software tended to produce false positives, that is, identifying a person as someone else, and did so at higher rates for certain demographic groups.
Another category of error was failure to detect that a face was present in the image at all.
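The first two of these error categories, false negatives and false positives, can be tallied per demographic group from verification outcomes. Here is a minimal sketch in Python; the field names and sample data are illustrative assumptions, not NIST’s actual evaluation format:

```python
# Sketch: per-group false positive and false negative rates
# from a list of hypothetical verification attempts.
from collections import defaultdict

def error_rates_by_group(attempts):
    """attempts: iterable of (group, genuine_pair, matched) tuples.
    genuine_pair: True if the two images show the same person.
    matched: True if the algorithm declared a match."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "genuine": 0, "impostor": 0})
    for group, genuine_pair, matched in attempts:
        c = counts[group]
        if genuine_pair:
            c["genuine"] += 1
            if not matched:
                c["fn"] += 1   # false negative: same person, no match
        else:
            c["impostor"] += 1
            if matched:
                c["fp"] += 1   # false positive: different people matched
    return {g: {"fnr": c["fn"] / c["genuine"] if c["genuine"] else 0.0,
                "fpr": c["fp"] / c["impostor"] if c["impostor"] else 0.0}
            for g, c in counts.items()}

# Illustrative data: (group, same-person pair?, algorithm said match?)
sample = [
    ("A", True, True), ("A", True, False), ("A", False, False),
    ("B", True, True), ("B", False, True), ("B", False, False),
]
print(error_rates_by_group(sample))
# group A shows a nonzero false negative rate; group B a nonzero false positive rate
```

A demographic effect of the kind the report describes would show up as one group’s rates sitting consistently above another’s across many such trials.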
Grother says he doesn’t want to say, before the report is complete, which demographics were affected. (Other research has indicated that some algorithms have problems with darker skin tones.)
Gender plays a role, too. Grother says that in general, women are harder to recognize than men, with some systems having higher rates of false positives and false negatives among women. But that’s not true across all algorithms.
The NIST has been testing facial recognition algorithms since 2000. Grother attributes the recent rise in quality to the adoption of a new kind of algorithm called a convolutional neural network, which uses machine learning to train facial recognition systems. “Error rates are now approaching zero,” Grother says.
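Systems built on convolutional networks typically reduce each face image to an embedding vector and declare a match when two embeddings are close enough. A minimal sketch of that comparison step, with made-up three-dimensional vectors and an arbitrary threshold standing in for a real model’s output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def is_match(emb1, emb2, threshold=0.8):
    # Declare the same identity when similarity clears the threshold.
    # Raising the threshold trades false positives for false negatives.
    return cosine_similarity(emb1, emb2) >= threshold

# Illustrative embeddings (real systems use hundreds of dimensions)
probe = [0.9, 0.1, 0.4]
enrolled_same = [0.85, 0.15, 0.38]
enrolled_other = [0.1, 0.9, 0.2]

print(is_match(probe, enrolled_same))   # True
print(is_match(probe, enrolled_other))  # False
```

The choice of threshold is where the false positive/false negative tradeoff discussed above lives: each deployment tunes it for its own tolerance for each kind of error.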
Some systems, however, are better than others. “Some recognition algorithms are very capable, and some are markedly less so,” Grother says. “It’s like golf. There’s a difference between Tiger Woods and other leading pro players, then there’s a gulf of difference between those guys and me playing golf. The same thing exists in facial recognition … technologies are not commoditized. You can’t pick one off the shelf and expect it to work. It’s not like pumping gasoline into your car. ‘Buyer beware’ is still alive and well in facial recognition.”
There is no single “best” algorithm, but the NIST has published data about which systems perform well in different measures and in different conditions.
Grother, who grew up in southwest England and studied at the University of London, has evaluated the performance of several types of image recognition systems for NIST, including iris, fingerprint, and tattoo identification.
As facial recognition systems proliferate and become more accurate, they are also becoming more capable of recognizing faces under different circumstances and from low-quality images. For example, a good algorithm could see a picture of a person that was taken in profile and match it to a head-on driver’s license photo. A camera attached to a video game console could identify different players seated at different angles around a living room.
These advances have allowed for a new use of the technology: solving child exploitation crimes. Law enforcement agencies have access to a database of known child pornography images. Several organizations have developed software to help investigators solve child exploitation cases, and now face recognition tools are being added to these programs. The hope is that since algorithms have become so advanced, they will be able to identify the children in the database and ultimately lead to the arrest of their abusers.
The proliferation of image recognition technology makes it all the more important for the companies that create algorithms to minimize errors related to demographics. Some developers are already tackling the problem, Grother says. “Demographic effects exist, and they need to be measured,” he says. “The issue needs to be addressed going forward.”