Putting People First: Implementing Biometrics Ethically
Biometrics are celebrated as one of the most accessible, ever-present, and authoritative ways of proving one’s identity. However, recent privacy abuses and security breaches involving biometric data have caused widespread concern, leading some to reject the approach entirely.
Biometric authentication brings significant convenience to the devices we carry with us and use at our desks and in our homes. From Apple’s Touch ID, which unlocks an iPhone and authorizes purchases, to Windows Hello, biometrics provide easy access to our locked devices. Consumers willingly adopt this technology for its utility and ease of use, with little concern for potential abuse of the underlying data. Location services, another high-value utility for consumers, expose similar risks: with just four pieces of location data about frequently visited places, a consumer’s identity can be determined.
While caution is well deserved, we believe there is a way to implement biometric authentication ethically. This paper identifies the specific risks associated with biometrics and offers possible implementations that mitigate those risks.
How Biometrics Work
Whether the biometric system uses a fingerprint scanner, facial scanner, iris scanner, or palm scanner, the basics of biometric authentication are the same and rely on the same common components: Enrolment, Templates, and Matching.
Enrolment
When registering with a biometric authentication system for the first time, the user must provide some biometric data, such as a picture of their face or a scan of a fingerprint: something that is unique to the individual and from which a recognition model can be built. Next, the system scans the biometric data to find the unique features that make up the image, in a process called Feature Identification. Finally, a Template is created: a compilation of measurement ratios between unique features in the biometric data, such as the ratio of the distance between the eyes to the length of the nose. In a secure process, the final step is to destroy the original biometric data so that only the template remains. It should be noted that not all systems destroy the original biometric data, and those systems are arguably less secure.
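To make the template idea concrete, here is a minimal sketch of ratio-based template creation. The landmark names and the `detect_landmarks` step it presumes are hypothetical; a real system would use a trained landmark model, but the scale-invariant ratio computation is the essential idea.

```python
import numpy as np

def distance(a, b):
    """Euclidean distance between two (x, y) landmark points."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def build_template(landmarks):
    """Build a template of measurement ratios between unique features.

    `landmarks` is a dict of named (x, y) points, e.g. produced by a
    hypothetical detect_landmarks(image) step during enrolment. Ratios
    (rather than raw distances) make the template independent of image
    resolution and distance from the camera.
    """
    eye_span = distance(landmarks["left_eye"], landmarks["right_eye"])
    nose_len = distance(landmarks["nose_bridge"], landmarks["nose_tip"])
    mouth_w = distance(landmarks["mouth_left"], landmarks["mouth_right"])
    return np.array([
        eye_span / nose_len,   # the example ratio from the text
        mouth_w / eye_span,
        nose_len / mouth_w,
    ])

# In a secure enrolment flow, the original image and landmarks are
# destroyed at this point, and only the template is kept.
```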
Templates
The templates from enrolment are stored in the system to be used later in matching. Templates are meant to provide both obfuscation of the biometric data and flexibility. Obfuscation occurs because the biometric data cannot be recreated from the template; the template contains only ratios, not the biometric data itself.
Because living beings are dynamic, biometric systems need to be flexible in recognizing individuals despite changes in facial hair, eyewear, lighting conditions, and distance from the camera. Templates provide this flexibility by allowing a subset of the total measured features to determine a positive match. For example, if a facial recognition model recorded 1,000 unique measurements of a person’s face, it is possible that only 100 of those measurements are needed to confirm a match. In basic terms, this means that half of a person’s face could be occluded and the biometric system would still be able to authenticate them.
Matching
In a matching workflow such as a login operation, the user’s biometric data is scanned and an active template is calculated. This active template is compared to the stored templates, and if a match is found, an authorization token is granted to log the user in. Note that in this example the image of the user’s face is never saved or compared directly; a template is created immediately and used for template signature matching.
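A sketch of this workflow, continuing the hypothetical ratio templates from the enrolment sketch above: matching tolerates disagreement on a fraction of features (the flexibility described earlier), and only an opaque token ever leaves the matcher. The tolerance and fraction values are illustrative.

```python
import secrets
import numpy as np

def matches(active, stored, tolerance=0.05, min_fraction=0.8):
    """Return True if enough measurements agree within tolerance.

    Comparing feature by feature and requiring only a fraction of
    features to agree gives the flexibility described above: a match
    can succeed even when some features are occluded or have changed.
    """
    agreement = np.abs(active - stored) <= tolerance * np.abs(stored)
    return agreement.mean() >= min_fraction

def authenticate(active_template, stored_templates):
    """Compare the active template against each enrolled template."""
    for user_id, stored in stored_templates.items():
        if matches(active_template, stored):
            # Grant an opaque session token; the templates themselves
            # are never returned to the caller.
            return user_id, secrets.token_urlsafe(32)
    return None, None
```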
What are the risks?
Security Risk
A security risk arises when a user’s biometric data is leaked, allowing an attacker to use the stolen data to log in as the user, either on the breached site or anywhere else that uses the same biometric method. An example of a high-risk implementation would be a facial recognition system that sends, and potentially saves, a facial scan of the user to a server to authenticate them (rather than destroying the scan locally after a template is created). In such a system, the facial scans of all users are exposed to the internet, and if compromised, an attacker might use these scans to log into other server-based facial recognition services.
Privacy Risk
A privacy risk arises when a user’s template is leaked, allowing an attacker to use the template to scan raw data and identify individuals. For example, if the face templates of a biometric system were leaked, an attacker could use them to scan videos and images to identify and surveil those individuals.
In a properly implemented system, the biometric data has been deleted, and only the templates are stored in a database or local storage for user authentication. Privacy breaches therefore occur when that database or storage device is compromised.
Functional Scope Creep Risk
Scope creep occurs when more attributes about an individual are revealed than the individual intended, such as age, biological gender, or sentiment derived from a facial verification. In facial and voice recognition biometrics, this risk is common because of the shared libraries used in the feature identification stage. Many high-performance feature extraction libraries were developed for gauging customer engagement and sentiment, so avoiding such tools can be difficult or even impossible. However, implementing them ethically, with specific attention to nullifying these extra outputs, is entirely possible. An example of improper use would be an employer deploying facial recognition to control physical access to an office building, but also extracting age data from the biometrics to make age-based decisions about the workforce.
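One defensive pattern is a thin wrapper that nullifies everything except the identity-relevant fields before a shared library’s output reaches the rest of the application. The field names below are hypothetical stand-ins for whatever a given library returns; the point is the allow-list.

```python
# Hypothetical result from a shared feature-extraction library, which may
# include attributes (age, gender, sentiment) beyond what we need.
ALLOWED_FIELDS = {"embedding"}  # identity features only

def scrub_analysis(raw_result: dict) -> dict:
    """Drop every attribute except those explicitly allowed.

    An allow-list (rather than a deny-list) means that new attributes
    added by a library update are discarded by default instead of
    leaking into the rest of the system.
    """
    return {k: v for k, v in raw_result.items() if k in ALLOWED_FIELDS}

result = scrub_analysis({
    "embedding": [0.12, 0.88, 0.45],
    "estimated_age": 42,        # nullified: never stored or logged
    "sentiment": "neutral",     # nullified
})
assert set(result) == {"embedding"}
```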
Racially biased models
One of the emerging topics of discussion around biometrics, especially facial recognition, is a reported bias against people of color. The most commonly cited culprit is the set of images the AI is trained on, which come primarily from white male faces. Facebook is a good example. In 2014 it published a paper boasting a facial recognition system with 97% accuracy; however, researchers who later reviewed the paper found that the system’s dataset consisted of 77% male and 80% white faces.
Due to these predominantly white, male training datasets, AI has a harder time discerning darker skin. In the case of facial identification used in law enforcement, this can yield false positives and, according to The Perpetual Line-Up, a study from Georgetown Law’s Center on Privacy & Technology, wrongly identify innocent people of color as criminals.
Mitigating Risks
Security Risks are addressed through secure handling of biometric data
A secure system must have mitigations in place to address the risk of biometric information breaches. Best practice is to handle all biometric data locally on the user’s hardware and never expose it to the internet. This means that both registration and template comparison must be done on the device, never on a server.
When local processing of biometric data is not possible, there are algorithmic approaches that secure the raw biometric data locally and send only derived data. Examples include fuzzy extractors, key binding, and cancelable biometrics.
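As an illustration of the cancelable-biometrics idea, here is a minimal sketch based on a secret random projection: the raw template is transformed with a user-specific matrix before leaving the device, so a leaked transformed template can be revoked by issuing a new transform. This is a simplified sketch of the concept, not a production scheme.

```python
import numpy as np

def make_transform(template_len, projected_len, seed):
    """Derive a user-specific random projection from a secret seed.

    The seed stays on the user's device. If the projected template
    leaks, re-enrolment with a new seed yields an unrelated template;
    the underlying biometric itself never has to change.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((projected_len, template_len))

def cancelable_template(raw_template, transform):
    """Project the raw template; only this projection leaves the device."""
    return transform @ np.asarray(raw_template)

# Matching is performed in the projected space: two projections of
# similar raw templates remain close, so the server never needs the
# raw biometric data at all.
```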
Privacy Risks are addressed with secure template storage
Making a system private means that the templates are stored in a way that is inaccessible to others, or that the data in the template has been hashed so it remains usable by the intended system but useless to anyone else. There are several options for this, including:
- Using secure local storage for the templates. The iPhone is a great example of this done right: in modern iPhones with Face ID or Touch ID, biometric data is processed by a dedicated secure processor (the Secure Enclave), and the templates are stored on that processor, inaccessible to the rest of the phone and the outside world. Because it cannot send biometric data or templates to another system, it can only answer yes or no as to whether a user is authenticated.
- Using securely hashed templates. This can be any hashing scheme that makes the original measurements impossible to recompute from the hash while still allowing template matching to occur (a sketch of one such scheme follows this list).
- If storing templates on a central server, the system should perform the matching on the server as well, and never send stored templates to a client for matching. This way, the list of template/ID pairs on the server is never exposed externally. Such a system still carries some risk, since an active template is exposed in transit from the client to the server; to reduce this risk, template data should only be transmitted over a TLS connection.
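Building on the projection idea above, the hashed-templates option can be illustrated with a simplified BioHash-style construction: project the template onto secret random directions and keep only the sign bits, then match by Hamming distance. The sign bits alone are not enough to recompute the original measurements. This is a sketch of the concept, not a vetted scheme; the seed, bit count, and threshold are illustrative.

```python
import numpy as np

def biohash(template, secret_seed, n_bits=128):
    """Hash a template into bits that support matching but not inversion.

    Projects the template onto secret random directions and keeps only
    the signs. Two scans of the same person still produce mostly
    identical bits, so matching works without storing the measurements.
    """
    rng = np.random.default_rng(secret_seed)
    directions = rng.standard_normal((n_bits, len(template)))
    return (directions @ np.asarray(template)) >= 0  # boolean bit vector

def hashes_match(bits_a, bits_b, max_differing_bits=12):
    """Match by Hamming distance: small differences are tolerated."""
    return int(np.sum(bits_a != bits_b)) <= max_differing_bits
```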
Racial bias is addressed with balanced data and model audits
One of the leading groups in this space, the Algorithmic Justice League, offers a free service to audit biometric recognition systems. Their benchmarks are built from a balanced dataset of light- and dark-skinned individuals from a variety of ethnic groups, making an audit one of the best ways to ensure that facial recognition AI serves racially diverse users.
Example Implementations
There are many ways to perform biometric authentication that are both secure and private. When building the biometric authentication for MyPass, the City of Austin used the following client/server design to keep users’ biometric data safe by design.
Step 1: Object detection is done in the browser.
The first step of any form of biometric authentication is to ingest a photo or scan and search that larger capture for the object used in authentication, otherwise known as the Region of Interest (ROI). The City of Austin’s application performed palm print authentication, so the first step was to detect a hand within a larger picture. To perform this in the browser, OpenCV was compiled to WebAssembly, and a Caffe model trained on hand images was used to perform the classification and identify the ROI.
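The City of Austin ran this step in the browser through OpenCV compiled to WebAssembly; the sketch below shows an equivalent flow in Python using OpenCV’s dnn module. The model file names are placeholders, and the output parsing assumes an SSD-style detector layout.

```python
import cv2

# Placeholder file names: any Caffe detection model trained on hands.
net = cv2.dnn.readNetFromCaffe("hand_detector.prototxt",
                               "hand_detector.caffemodel")

def find_roi(image, confidence_threshold=0.5):
    """Run the detector and return the highest-confidence hand ROI."""
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0, (300, 300))
    net.setInput(blob)
    detections = net.forward()  # SSD layout: (1, 1, N, 7)
    best = None
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence > confidence_threshold:
            # Detection coordinates are normalized; scale to pixels.
            x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * [w, h, w, h]).astype(int)
            if best is None or confidence > best[0]:
                best = (confidence, image[y1:y2, x1:x2])
    return None if best is None else best[1]
```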
Step 2: Image preprocessing is done in the browser.
Once the anticipated object is found, the image must be processed to correct for differences in rotation, colorspace, and illuminance. This processing was also done using OpenCV running in the browser.
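A sketch of these corrections using standard OpenCV calls (shown in Python; the browser version drives the same OpenCV primitives through WebAssembly). The rotation angle and output size are illustrative parameters.

```python
import cv2

def preprocess_roi(roi, angle_degrees=0.0, size=(256, 256)):
    """Normalize rotation, colorspace, and illumination of the ROI.

    angle_degrees would come from the detection step (e.g. the hand's
    orientation); here it is just a parameter for illustration.
    """
    # Correct rotation about the ROI centre.
    h, w = roi.shape[:2]
    rotation = cv2.getRotationMatrix2D((w / 2, h / 2), angle_degrees, 1.0)
    roi = cv2.warpAffine(roi, rotation, (w, h))

    # Normalize colorspace and size.
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, size)

    # Equalize illumination with CLAHE (adaptive histogram equalization).
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)
```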
Step 3: Calculating the abstract template is done in the browser.
Once the ROI has been identified and the image has been normalized, the unique features are identified and measured to create an abstract template from the biometric data. The images from the prior steps are then discarded.
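A sketch of this reduction, using ORB as a stand-in for whatever feature extractor a real system would use; the essential property is that the output is a fixed-length set of derived measurements rather than the image itself.

```python
import cv2
import numpy as np

def abstract_template(normalized_roi, n_features=256):
    """Reduce the normalized ROI to a fixed-length feature template."""
    orb = cv2.ORB_create(nfeatures=n_features)
    _, descriptors = orb.detectAndCompute(normalized_roi, None)
    if descriptors is None:
        return None
    # Aggregate per-keypoint descriptors into one fixed-length vector
    # so templates from different scans are directly comparable.
    template = descriptors.astype(np.float32).mean(axis=0)
    return template / np.linalg.norm(template)

# The image and intermediate data can now be discarded; only the
# template is kept for the matching step.
```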
Step 4: Finding a match is done on the server.
The abstract template is sent to the server for comparison with the templates on file. Because templates are represented as measurements stored in a matrix, two templates can be compared by calculating the dot product of their matrices. If the similarity threshold is met, the user is validated and an authorization token is returned to the browser to authenticate the requested service.
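A minimal sketch of that comparison: normalizing the templates turns the dot product into a cosine similarity score, which can be checked against a tuned threshold. The threshold below is illustrative; real systems tune it against false-accept and false-reject rates.

```python
import numpy as np

def is_match(active, stored, threshold=0.9):
    """Compare two templates by their normalized dot product.

    The score is 1.0 for identical templates and near 0 for unrelated
    ones, so a single threshold decides whether the user is validated.
    """
    active = np.asarray(active, dtype=float).ravel()
    stored = np.asarray(stored, dtype=float).ravel()
    score = active @ stored / (np.linalg.norm(active) * np.linalg.norm(stored))
    return score >= threshold
```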
By processing the user’s biometric data locally and sending only an abstract template to the server, the user’s security is maintained: no biometric data that could be used to log into other services is ever transmitted. Likewise, users’ privacy is protected by storing all templates on the server and never making them accessible through an API.
For anyone wanting a more detailed template to work from, this example project by the OpenCV team is excellent.
Creating Understanding with Users
Although there are ways to make biometric authentication secure and private by design, it is challenging to communicate that assurance of safety to users in a meaningful way. On the MyPass project, the team at the City of Austin did something unique: a setup wizard walked users through palm print authentication step by step. In the first step, the user was invited to verify that their biometric data was not being transmitted over the internet by turning off their device’s internet connection before completing the following steps. The wizard, which runs offline in the browser, then walked the user through image capture and processing. Once the palm print had been safely converted to an abstract template, the wizard deleted the image data, communicated this to the user, and let them know it was now safe to reconnect to the internet for template matching and authentication. This is intended to prove that the entire time the system was in possession of their biometric data (i.e., their palm print), the internet was disconnected, and thus that data could not have been transmitted off their device.
Conclusion
As with all technology, the way biometric authentication is implemented can affect its users for good or for harm. The number of recent abuses in the biometrics space has led some people to assume that all biometrics are inherently flawed and cannot be implemented without causing harm. However, as we have shown, careful implementation of biometric authentication can address ethical concerns over privacy and security, so long as the individual’s security is preserved through proper handling of biometric data, and their privacy is preserved by preventing biometric templates from being used against them in non-consensual mass surveillance.