LUNA SDK is a pure face recognition engine that enables efficient and accurate processing of faces in images and live video streams and can run on a wide range of devices.
LUNA SDK supports multiple platforms and comes in two editions: the Frontend edition and the Complete edition.
The Frontend edition is intended for lightweight solutions that do not need to implement face descriptor extraction and matching functions.
The Complete edition contains all the features of the Frontend edition and adds the face descriptor extraction and matching functions.
Supported software and hardware platforms differ depending on the editions described above.
LUNA SDK is subdivided into several facilities. Each facility is a set of classes dedicated to a common problem domain.
- Face detection facility is dedicated to face detection. It contains various face detector implementations and factories.
- Attribute estimation facility is dedicated to estimating various attributes of both images (such as blurriness, exposure, and transformation) and depicted faces (such as age, gender, and emotions).
- Descriptor processing facility is dedicated to face descriptor extraction and matching.
A face descriptor is a set of features that describe the face and are invariant to face transformation, size, and other parameters. Face descriptor matching makes it possible to judge, with a certain probability, whether two face images belong to the same person.
Face Detection Facility
The Face detection facility is responsible for quick and coarse detection tasks, like finding a face in an image.
The detection structure represents an image-space bounding rectangle of the detected object as well as the detection score.
The detection score is the measure of classification confidence and not the source image quality. While the score is related to quality (low-quality data generally results in a lower score), it is not a valid metric to estimate the visual quality of an image. Special estimators exist to fulfill this task.
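As an illustration of the structure described above, the following Python sketch models a detection as a bounding rectangle plus a confidence score and keeps only confident detections. The names `Rect`, `Detection`, and `filter_confident`, the [0, 1] score range, and the 0.5 threshold are assumptions made for this example; they are not the LUNA SDK API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Image-space bounding rectangle in pixels (hypothetical layout)."""
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class Detection:
    rect: Rect
    score: float  # classification confidence (assumed to lie in [0, 1]), not image quality


def filter_confident(detections, min_score=0.5):
    """Keep detections whose confidence score reaches min_score."""
    return [d for d in detections if d.score >= min_score]


detections = [
    Detection(Rect(10, 20, 100, 120), 0.98),
    Detection(Rect(300, 40, 30, 35), 0.21),
]
confident = filter_confident(detections, min_score=0.5)
```

Filtering by score only removes unlikely detections; judging visual quality would still be the job of a dedicated estimator.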
Face alignment is the process of detecting special key points (called “landmarks”) on a face. The LUNA SDK performs landmark detection simultaneously with the face detection since some of the landmarks are byproducts of that detection.
Face landmarks or key points are not used for the face recognition process in VisionLabs' face recognition engine. They are required at the preprocessing stage of an incoming face image to obtain the best possible shot of the face.
At the very minimum, just 5 landmarks are required: two for eyes, one for a nose tip and two for mouth corners. Using these coordinates, one may warp the source face image for further use with all other LUNA SDK embedded algorithms.
A more advanced 68-points face alignment method is also implemented and is a part of LUNA SDK.
A typical use case for 5 landmarks is face image warping for further use by other VisionLabs algorithms, including quality and attribute estimators as well as descriptor extractors.
Typical use cases for 68 landmarks are segmentation and head pose estimation.
Attribute Estimation Facility
The estimation facility is the only multi-purpose facility in LUNA SDK. It is designed as a collection of tools that help estimate various properties of images or depicted objects. These properties may be used to increase the precision of algorithms implemented by other LUNA SDK facilities or to resolve custom business tasks.
The face attributes estimator determines age and gender.
|Age (years)||Average error (years)||Age (years)||Average error (years)|
Average age estimation error per age group for cooperative conditions
Gender estimation precision is 99.8%.
The face image quality estimator predicts visual quality of an image. The estimator is optimized for facial images processing and detects the following defects:
- face image is blurred
- face image is underexposed (i.e., too dark)
- face image is overexposed (i.e., too light)
- face image color variation is low (i.e., image is monochrome or close to monochrome)
Eye estimator aims to determine:
- eye state: open, closed or occluded
- precise eye iris location and eyelid shape as an array of landmarks
Poor quality images or ones that depict occluded eyes (think eyewear, hair, gestures) fall into the “Occluded” category.
The head pose estimator is designed to determine camera-space head pose. Since 3D head translation is hard to determine reliably without camera-specific calibration, only the 3D rotation component is estimated.
There are two head pose estimation methods available:
- estimate with 68 face-aligned landmarks
- estimate with an input image in RGB format
|Range||-45°...+45°||<-45° or >+45°|
|Average prediction error (per axis)||Yaw||±2.7°||±4.6°|
|Average prediction error (per axis)||Pitch||±3.0°||±4.8°|
|Average prediction error (per axis)||Roll||±3.0°||±4.6°|
Head pose prediction precision
LUNA SDK implies the following coordinate system convention:
The gaze estimator is designed to determine gaze direction relative to the estimated head pose.
|Range||-25°...+25°||-45°...-25° or +25°...+45°|
|Average prediction error (per axis)||Yaw||±2.7°||±4.6°|
|Average prediction error (per axis)||Pitch||±3.0°||±4.8°|
Smile estimator predicts smile and mouth occlusion.
Emotions estimator determines whether a facial expression corresponds to a broad interpretation of the display of certain emotions:
Ethnicity estimator aims to determine a person’s ethnic group and/or race.
There are 4 types of ethnic groups and races the estimator is currently able to distinguish:
Approximate garbage score estimator (AGS) determines the suitability of an image for later face descriptor extraction and matching.
The glasses estimator detects eyewear in an image. It outputs the probabilities of prescription glasses and sunglasses.
Occlusion estimator determines whether the face is occluded by an object.
Child estimator determines whether the person is a child or not. We define “child” as a person who is younger than 18.
Warping is the process of face image normalization.
The purpose of the process is to:
- compensate image plane rotation (roll angle)
- center the image using eye positions
- properly crop the image
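The three goals listed above can be sketched as a single similarity transform computed from the two eye landmarks. This is a minimal illustration, not the warping algorithm used by LUNA SDK: the function name `make_warp`, the 250-pixel output size, and the two placement fractions are assumptions chosen for the example, and the code maps point coordinates rather than resampling pixels.

```python
import math


def make_warp(left_eye, right_eye, out_size=250, eye_dist_frac=0.4, eye_row_frac=0.4):
    """Build functions that map source pixel coordinates into a normalized crop.

    left_eye / right_eye are (x, y) positions as they appear in the image.
    All default values are illustrative assumptions.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    roll = math.atan2(dy, dx)                    # in-plane rotation of the eye line
    scale = eye_dist_frac * out_size / math.hypot(dx, dy)
    cx = (left_eye[0] + right_eye[0]) / 2        # midpoint between the eyes
    cy = (left_eye[1] + right_eye[1]) / 2
    cos_r, sin_r = math.cos(-roll), math.sin(-roll)

    def warp_point(point):
        x, y = point[0] - cx, point[1] - cy      # center on the eye midpoint
        xr = scale * (x * cos_r - y * sin_r)     # undo the roll and rescale
        yr = scale * (x * sin_r + y * cos_r)
        return (xr + out_size / 2, yr + out_size * eye_row_frac)

    def inside_crop(point):
        u, v = warp_point(point)                 # crop: keep only the output square
        return 0 <= u < out_size and 0 <= v < out_size

    return warp_point, inside_crop


# A face rotated by 45 degrees in the image plane.
warp_point, inside_crop = make_warp(left_eye=(100, 100), right_eye=(200, 200))
left = warp_point((100, 100))
right = warp_point((200, 200))
```

After the transform both eyes lie on the same horizontal row and are placed symmetrically around the vertical center line of the crop, whatever the original roll angle was.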
Descriptor Processing Facility
Face descriptor stores a compact vector of packed properties as well as some helper parameters that were used to extract these properties from the source face image. Together these parameters determine descriptor compatibility.
Not all descriptors are compatible with each other. It is impossible to batch and match incompatible descriptors.
We refer to all face descriptor comparison operations as face matching. The result of matching of two face descriptors is a distance between components of the corresponding parameter vectors. Thus, from a magnitude of this distance, we can judge with certain probability if two facial images depict the same real person. We call that probability a similarity score.
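To make the relationship between compatibility, distance, and similarity concrete, here is a small Python sketch. It assumes a Euclidean distance and an exponential mapping from distance to a score in (0, 1]; both choices, the `version` compatibility tag, and all names are hypothetical and stand in for whatever LUNA SDK does internally.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    version: int    # hypothetical stand-in for the helper parameters that decide compatibility
    values: tuple   # the feature vector


def distance(a, b):
    """Euclidean distance between two compatible descriptors."""
    if a.version != b.version or len(a.values) != len(b.values):
        raise ValueError("incompatible descriptors cannot be matched")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a.values, b.values)))


def similarity(dist, scale=1.0):
    """Illustrative monotone mapping: 1.0 at zero distance, decreasing toward 0."""
    return math.exp(-scale * dist)


first = Descriptor(version=1, values=(0.0, 0.0))
second = Descriptor(version=1, values=(3.0, 4.0))
score = similarity(distance(first, second))
```

The only property the sketch relies on is that a smaller distance yields a higher similarity score; a calibrated mapping would be derived from labeled data.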
Face descriptors in LUNA SDK have backend and mobile implementations. Backend versions offer higher accuracy while mobile are faster and have smaller model files.
Descriptor extractor is the entity responsible for descriptor extraction.
Descriptor extraction is one of the most computation-heavy operations of LUNA SDK. For this reason, threading must be considered.
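One common way to account for threading is to hand independent images to a bounded pool of workers. The sketch below shows the pattern with Python's standard `ThreadPoolExecutor`; `extract_descriptor` is a hypothetical placeholder for the expensive call, and whether a real extractor may be called concurrently, and how many workers pay off, is an assumption to verify against the SDK documentation and target hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_descriptor(warped_image):
    """Hypothetical stand-in for an expensive extraction call.

    Here it just averages each pixel row so that the example is runnable.
    """
    return tuple(sum(row) / len(row) for row in warped_image)


def extract_all(warped_images, max_workers=4):
    """Extract descriptors for many images using a bounded worker pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs
        return list(pool.map(extract_descriptor, warped_images))


images = [
    [[1, 3], [2, 2]],
    [[0, 0], [4, 4]],
]
descriptors = extract_all(images, max_workers=2)
```

Bounding the pool size keeps the number of simultaneous heavy computations predictable, which matters more than raw thread count for a compute-bound step.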
Descriptor extraction implementation supports execution on GPUs.
Descriptor matching is an operation when a pair (or more) previously extracted descriptors are compared to find their similarity score. With this information, it is possible to implement face search and other analytic applications.
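A face search built on matching can be as simple as comparing a query descriptor with every stored descriptor and keeping the closest ones. The following brute-force sketch assumes descriptors are plain tuples of floats compared by Euclidean distance; the function names and the gallery layout are hypothetical.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query, gallery, top_k=3):
    """Return the top_k closest gallery entries as (face_id, distance) pairs.

    gallery maps a face identifier to its descriptor vector.
    """
    scored = ((l2_distance(query, vector), face_id) for face_id, vector in gallery.items())
    return [(face_id, dist) for dist, face_id in heapq.nsmallest(top_k, scored)]


gallery = {
    "alice": (0.0, 0.0),
    "bob": (1.0, 0.0),
    "carol": (5.0, 5.0),
}
matches = search((0.9, 0.0), gallery, top_k=2)
```

The cost of this approach grows linearly with the gallery size, which is the motivation for indexing large descriptor batches.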
Descriptor indexing accelerates face descriptor matching by building a special index over a batch of face descriptors.
Descriptor indexing is not supported on embedded and 32-bit desktop platforms.
The Liveness Engine is a wrapper library with added functionality, which utilizes LUNA SDK building blocks to produce different solutions for liveness detection problems. You can combine multiple liveness checks, but keep in mind that excessive checks result in a poor user experience.
The Liveness Engine is responsible for determining whether a face detected in a still image or in an image sequence belongs to a living person. By “image sequence” we mean consecutive frames of a video stream from a camera or a video file.
Liveness types are implemented according to the inheritance architecture:
Basic liveness check
Basic liveness check types require a single video sequence for operation.
Each liveness type is inherited from the basic liveness class, which utilizes a generic execution cycle and performs common tasks such as:
- basic initialization
- face detection
- additional data extraction / calculation
- face tracking analysis
  - using detection rectangles
  - using landmark points
- state change monitoring
Each liveness test traces and analyzes a primary estimated attribute chosen for that liveness check. The result is positive if a user succeeds in the correct alteration of the attribute or negative if they fail to.
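The idea of tracking one primary attribute and reporting success once it changes as required can be sketched as a small per-frame state holder. The class below is a hypothetical illustration, not the Liveness Engine base class: the baseline is taken from the first frame, success means the attribute rose above the baseline by more than a threshold, and the frame limit is an assumed way to model failure.

```python
class AttributeLivenessCheck:
    """Tracks one attribute per frame and passes once it changes enough."""

    def __init__(self, threshold, max_frames=100):
        self.threshold = threshold
        self.max_frames = max_frames   # assumed time budget, in frames
        self.baseline = None
        self.frames_seen = 0
        self.result = None             # None = in progress, True = passed, False = failed

    def update(self, value):
        """Feed the attribute value of the next frame and return the current result."""
        if self.result is not None:
            return self.result         # the check has already finished
        self.frames_seen += 1
        if self.baseline is None:
            self.baseline = value      # the first frame defines the starting state
        if value - self.baseline > self.threshold:
            self.result = True
        elif self.frames_seen >= self.max_frames:
            self.result = False
        return self.result


# Example: a mouth-opening distance (arbitrary units) growing over four frames.
check = AttributeLivenessCheck(threshold=10, max_frames=5)
outcomes = [check.update(value) for value in (20, 22, 27, 31)]
```

Concrete checks would differ only in which attribute they feed into `update` and in how the threshold is chosen.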
Angle liveness check
Angle liveness checks additionally perform head pose estimation. Below are some of the examples of application of angle liveness.
a. nod scenario means smooth head tilt in a positive direction until the set threshold is exceeded.
b. head raise scenario requires smooth head tilt in a negative direction until the set threshold is exceeded.
c. left turn scenario requires smooth head rotation in a positive direction until the set threshold is exceeded.
d. right turn scenario requires smooth head rotation in a negative direction until the set threshold is exceeded.
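The four scenarios above share one pattern: an angle must move smoothly in a given direction until it passes a threshold. A minimal sketch follows, assuming per-frame angles in degrees, a sign convention passed in as `direction`, and a per-frame step limit as a simple stand-in for "smooth"; the function name and all parameter values are hypothetical.

```python
def angle_scenario_passed(angles_deg, threshold_deg, direction=1, max_step_deg=15.0):
    """Check whether a head angle moves smoothly past a threshold.

    angles_deg: per-frame pitch (nod, head raise) or yaw (left, right turn).
    direction: +1 for the positive-direction scenarios, -1 for the negative ones.
    max_step_deg: largest change allowed between consecutive frames (assumed).
    """
    if not angles_deg:
        return False
    start = previous = angles_deg[0]
    for angle in angles_deg[1:]:
        if abs(angle - previous) > max_step_deg:
            return False               # a jump between frames is not a smooth motion
        if direction * (angle - start) > threshold_deg:
            return True                # moved far enough in the required direction
        previous = angle
    return False


nod = angle_scenario_passed([0, 5, 11, 18, 26], threshold_deg=20, direction=1)
head_raise = angle_scenario_passed([0, -8, -16, -24], threshold_deg=20, direction=-1)
```

Measuring the change against the first frame, rather than against zero, lets the check start from whatever pose the user happens to be in.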
Mouth liveness check
Mouth liveness checks perform mouth landmarks analysis. In this scenario distance between mouth landmarks increases until the set threshold is exceeded (i.e., a user opens a mouth).
Eyes liveness check
Eye liveness checks perform eye state estimation and analysis. In this scenario a user should blink, i.e., both eyes are opened, closed and opened again simultaneously.
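The open, closed, open sequence described above is naturally expressed as a three-stage state machine. The sketch below assumes each frame provides a pair of eye states using the labels "open", "closed", and "occluded"; the function name and input format are hypothetical.

```python
def blink_detected(frames):
    """Return True if both eyes go open -> closed -> open over the frame sequence.

    frames: iterable of (left_state, right_state) pairs.
    """
    stage = 0  # 0: wait for both open, 1: wait for both closed, 2: wait for both open again
    for left, right in frames:
        both_open = left == right == "open"
        both_closed = left == right == "closed"
        if stage == 0 and both_open:
            stage = 1
        elif stage == 1 and both_closed:
            stage = 2
        elif stage == 2 and both_open:
            return True
    return False


blink = blink_detected([("open", "open"), ("closed", "closed"), ("open", "open")])
wink = blink_detected([("open", "open"), ("closed", "open"), ("open", "open")])
```

Frames in which the eyes disagree or are occluded simply leave the stage unchanged, so a wink or a hand passing over the face does not count as a blink.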
Eyebrows liveness check
Eyebrow liveness checks perform eyebrow landmarks analysis. This scenario requires increase of the distance between eyebrows and eyes landmarks (i.e., eyebrow rising) until the set threshold is exceeded.
Zoom liveness check
Zoom liveness checks analyze optical flow and perspective distortion. The idea is to look at the same face from multiple distances (by moving the camera closer to the face or further away) and predict whether the face belongs to a real person or is a planar spoof picture.
This liveness check is designed for mobile phones, and the results may be erroneous on other platforms.
Smile liveness check
This scenario requires the user to smile until the probability calculated by the neural network exceeds the set threshold.
Infrared liveness check
This scenario requires the user to appear in front of a NIR camera in a normal manner until the probability calculated by the neural network reaches the set threshold.
Unified liveness check
Unified liveness checks combine the previous types of algorithms, with the exception of the zoom and blink types, and perform additional calculation and analysis in order to detect fraud attempts.
Calculated and tracked entities:
- mouth landmarks distance
- eyebrow landmarks distance
- eye states (blinks)
- smile probability
Complex liveness check
These liveness tests require additional, non-standard data for analysis. Such data cannot be obtained with a common RGB camera, so these tests require complementary devices.
Depth liveness check
Depth liveness checks require a 16-bit depth matrix sensor, which transfers information concerning the distance of the surfaces of scene objects from a viewpoint in millimeters.
For correct operation the face should be positioned at a distance of 0.5 to 4.5 meters from the sensor.
This liveness test does not require any actions because it performs depth map face region of interest analysis using neural networks.
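Before any depth analysis, an application may want to verify that the face is within the stated 0.5 to 4.5 meter working range. The sketch below assumes the depth map is a row-major grid of 16-bit values in millimeters, that 0 marks a missing measurement, and that the median over the face rectangle is a reasonable distance estimate; these conventions and all names are assumptions for illustration only.

```python
from statistics import median

MIN_DISTANCE_MM = 500    # 0.5 m
MAX_DISTANCE_MM = 4500   # 4.5 m


def face_distance_mm(depth_map, face_rect):
    """Median depth (mm) inside the face rectangle, or None if no valid samples."""
    x, y, width, height = face_rect
    samples = [
        depth_map[row][col]
        for row in range(y, y + height)
        for col in range(x, x + width)
        if depth_map[row][col] > 0   # assumption: 0 marks a missing measurement
    ]
    return median(samples) if samples else None


def face_in_working_range(depth_map, face_rect):
    """True if the estimated face distance lies between 0.5 and 4.5 meters."""
    distance = face_distance_mm(depth_map, face_rect)
    return distance is not None and MIN_DISTANCE_MM <= distance <= MAX_DISTANCE_MM


near_map = [[1200] * 4 for _ in range(4)]
near_map[1][1] = 0                     # one missing sample inside the face region
near_ok = face_in_working_range(near_map, (1, 1, 2, 2))
```

Using the median rather than the mean keeps a few missing or outlying depth samples from skewing the distance estimate.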
Track Engine is a tool for face detection and tracking on multiple sources. It allows the user to pick the face images most suitable for facial recognition from a sequence of video frames.
Track Engine itself does not perform any facial recognition. Its purpose is to prepare required face data for LUNA SDK Descriptor Facility methods and LUNA PLATFORM.
The main interface for the Track Engine is Stream - an entity to which you submit video frames.
You can create multiple streams at once if required (in cases when you would like to track faces on multiple cameras). In each stream the engine detects faces and builds their tracks. Each face track has its own unique identifier. It is therefore possible to group face images belonging to the same person with their track IDs. Tracks may break from time to time either due to people leaving the surveillance area or due to challenging detection conditions (poor image quality, occlusions, extreme head poses, etc.).
Frames are submitted one by one, and each frame has its own unique ID.
Track Engine emits various events to inform you about what is happening. The events occur on a per-stream basis.
By implementing one or several observer interfaces it is possible to define custom processing logic in your application.
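The observer approach mentioned above can be sketched in a few lines. Everything in this example is hypothetical: `TrackObserver`, `ToyStream`, and the event names are illustrative stand-ins and do not reproduce the Track Engine interfaces.

```python
from abc import ABC, abstractmethod


class TrackObserver(ABC):
    """Hypothetical observer interface; not the Track Engine API."""

    @abstractmethod
    def on_track_event(self, stream_id, track_id, event):
        ...


class ToyStream:
    """Toy event source that notifies registered observers per stream."""

    def __init__(self, stream_id):
        self.stream_id = stream_id
        self._observers = []

    def add_observer(self, observer):
        self._observers.append(observer)

    def emit(self, track_id, event):
        for observer in self._observers:
            observer.on_track_event(self.stream_id, track_id, event)


class EventLog(TrackObserver):
    """Custom processing logic: record every event that arrives."""

    def __init__(self):
        self.events = []

    def on_track_event(self, stream_id, track_id, event):
        self.events.append((stream_id, track_id, event))


log = EventLog()
stream = ToyStream("camera-1")
stream.add_observer(log)
stream.emit(track_id=7, event="track_started")
stream.emit(track_id=7, event="track_ended")
```

Because each observer only receives callbacks, application logic such as logging, storage, or recognition requests stays separate from the tracking code.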
The Track Engine allows users to define custom recognition suitability criteria for face detections. That way one may alter the best shot selection logic and, therefore, specify which images will make it to the recognition phase.
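A custom suitability criterion combined with per-track grouping might look like the sketch below. The `FaceObservation` fields, the yaw limit, and the use of a separate `quality` value for ranking are all assumptions made for illustration; they do not describe how Track Engine selects best shots internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceObservation:
    track_id: int
    frame_id: int
    quality: float   # hypothetical quality estimate, higher is better
    yaw_deg: float   # hypothetical head yaw from a head pose estimator


def is_suitable(observation, max_abs_yaw_deg=30.0):
    """Custom criterion: reject faces turned too far away from the camera."""
    return abs(observation.yaw_deg) <= max_abs_yaw_deg


def best_shots(observations, suitable=is_suitable):
    """Pick the highest-quality suitable observation for each track ID."""
    best = {}
    for observation in observations:
        if not suitable(observation):
            continue
        current = best.get(observation.track_id)
        if current is None or observation.quality > current.quality:
            best[observation.track_id] = observation
    return best


observations = [
    FaceObservation(track_id=1, frame_id=10, quality=0.6, yaw_deg=5.0),
    FaceObservation(track_id=1, frame_id=11, quality=0.9, yaw_deg=50.0),  # rejected: yaw too large
    FaceObservation(track_id=1, frame_id=12, quality=0.8, yaw_deg=-10.0),
    FaceObservation(track_id=2, frame_id=13, quality=0.4, yaw_deg=0.0),
]
selected = best_shots(observations)
```

Swapping in a different `suitable` function changes which images reach the recognition phase without touching the grouping logic.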
Track Engine is multi-threaded. The number of threads is configurable and depends on the currently bound LUNA SDK settings.
LUNA SDK comes in multiple editions on all major platforms. Customers are free to pick the version that best suits their needs. In addition, a fine-grained licensing of individual features is possible upon request.