
Background: Artificial intelligence has made significant contributions to facial recognition and biometric identification and is now employed in a wide range of applications. Detecting facial spoofing, where an attacker attempts to pass as an authorized user to gain access to a system, remains difficult. Face recognition systems that resist spoofing attacks demand efficient and effective solutions, yet making a recognition system more stringent raises both false positives and false negatives, which makes such a system questionable for practical use. Over time, CNN-based architectures have overtaken the earlier prominent techniques for this task.

Objective: To analyse classifiers and identify their impact on spoof detection. The intent is not only to achieve the highest accuracy but also to find strategies that significantly reduce false positives and false negatives.

Methods: Face image spoofing detection is implemented in this paper by extracting face embeddings using the Local Binary Pattern (LBP) descriptor and the VGG16 CNN architecture. SVM, KNN, Decision Tree, and ensembles of these classifiers are used to classify real and spoof images.

Results: The three proposed models obtained test accuracies of 98%, 94.48%, and 99% on the custom dataset and 97%, 99%, and 100% on the NUAA photograph imposter dataset, while keeping FN and FP significantly low.

Conclusion: Human face images can be obtained from many sources through smart gadgets, which opens the door to spoof attacks. Although spoof detection methods exist, effective methods with high accuracy and low FN and FP are still required. The proposed ensemble techniques significantly outperform the existing classifiers, achieving high accuracy while keeping FN and FP low.


Introduction

Facial biometrics are increasingly used in a variety of commercial and industrial contexts and are becoming extremely popular for verifying user identities. Facial recognition is the task of identifying a person from their face by estimating and evaluating patterns of distinctive facial markers using biometric software. It has a variety of uses, including ATM payment validation, criminal identification, forensic investigation, automatic attendance, and other areas. Interest in human biometric-based automatic secure identification has grown markedly over the past ten years [1]. One of the primary drivers of this focus is the prevalence of flaws and frauds related to security and transactions in non-biometric systems, which are susceptible to being cracked due to built-in weaknesses [2], including, but not limited to, stolen cards and shared passwords.

Many challenges arise when facial recognition systems are used for biometric authentication, including the viewing angle at which the picture is taken, changing age and facial contours, facial hair, and lighting conditions. The most concerning issue is the spoofing attack on face authentication or recognition systems [3]. Spoofing attacks using fraudulent photographs, videos, or masks can result in incalculable privacy leaks and property loss. Such attacks can damage reputation and identity and lead to serious harms such as monetary loss, security breaches, and identity theft. With the advent of social media and the Internet, it has become easy to obtain pictures, videos, and even voices of individuals. As more people become socially active, attackers find it very easy to impersonate anyone, and identifying such attacks becomes a daunting task for facial recognition systems. Spoofing attacks occur when an attacker attempts to impersonate another user by presenting falsified images or videos. Sophisticated face authentication methods are therefore necessary to discern genuine from fake faces, guaranteeing that such systems are secure. Researchers have put considerable effort into tackling the problem by investigating several cues. Much research is ongoing on hardware-based recognition systems, such as the use of thermal cameras [4] and 3D cameras [5]; however, the high cost of these solutions remains a challenge despite their improved ability to distinguish fake from real faces.

Software-based approaches analyse liveness properties such as structural information, texture, liveness signs like eye blinking or mouth movement, and image quality, and it is against these approaches that the spoofing threats mentioned above arise. With the introduction of convolutional neural networks (CNN), advances in computer vision have led to novel algorithms for detecting face presentation attacks, and much research successfully applies CNNs to face spoofing detection.

Spoofing Significance

Authentication systems equipped with face identification and recognition have been a prominent target for spoofing attacks. In these situations, a forger may use someone else’s identity to bypass identification, so spoofing detection has become a required feature in every face recognition system. Mimicking human faces is now very easy with widely used free and paid tools such as Photoshop and 3D printing. Fig. 1 illustrates the spoofing problem. Attacks on the face can be broadly categorized into three groups: photo attacks, video attacks, and mask attacks. A photo attack occurs when an unauthorized user presents a photo to the verification system’s camera, either printed on paper or displayed on an electronic device. A video attack occurs when an unauthorized user replays a video of an authorized user, exploiting dynamic information to trick the face recognition software. A mask attack occurs when an unauthorized user dons a 3D mask of the genuine user and imitates the typical appearance of that face. Photo and video attacks can also be artificially modified; for example, the background of the image might be real while the foreground (the facial region) is manufactured by software simulation or manually substituted. The app ZAO, which swaps the face in a video for a different face supplied by the user, illustrates how far face manipulation technology has advanced and that many potential future types of attack remain unknown at this time.

Fig. 1. Sample images from the NUAA photo imposter database illustrating a face spoofing attack: the first 5 images are live faces of the user and are legitimate; the second 5 images are spoofed (printed photographs) and thus illegitimate.

The proposed spoofing detection approach is an ensemble of classification algorithms such as SVM, KNN and Decision Tree, built on simple handcrafted features and CNN-extracted information from images. We believe that combining different classifiers achieves substantially better performance because they capture distinctive characteristics, which enhances the robustness and generalization potential of the model. It can also reduce false negatives (FN) and false positives (FP), thus making such systems more widely acceptable.

Rationale

Recently, a plethora of face recognition-based biometric security systems have been deployed in various real-time systems thanks to advances in computer vision and face recognition technologies. Such applications have enhanced security levels but have also attracted malicious users intent on compromising these systems for their benefit, because, unlike other biometric attributes such as fingerprints and iris, an individual’s face image or live video is quite easy to acquire using smartphones equipped with high-quality digital cameras or by accessing their online social network profiles.

Experimental studies in AI and machine learning have significantly improved the performance of several computer vision systems for image classification and object recognition. DeepFace [6], DeepID [7], FaceNet [8] and ArcFace [9] are breakthroughs in facial recognition enabled by these advances. Although face recognition systems deployed across biometric security applications have achieved accuracies of 96% to 99%, they still suffer from high false positive and false negative rates and remain greatly vulnerable to increasingly frequent spoofing attacks. A more precise system would benefit many existing security establishments such as industries and airports.

The noteworthy contributions of this work are:

  1. Comparison of five different classification learning algorithms fitted on a generalized dataset and a custom-made dataset using LBP descriptors and CNN.
  2. Examining the impact of classification algorithm ensembles, and of different types of ensembles, in three distinct face spoofing models.
  3. Tweaking LBP and CNN to find the architecture that best extracts features under various conditions.
  4. Identifying the best combination of feature extraction algorithm and classifier to achieve high accuracy with low False Negatives (FN) and False Positives (FP) when detecting spoofed images.

The remainder of the paper is organized as follows. Section 2 provides an overview of the most recent research on face anti-spoofing detection. In Section 3, we provide a detailed explanation of the materials and methods used. Then, in Section 4, the proposed ensemble models along with the evaluation parameters are illustrated. In Section 5, the experimental study of the models is presented. A discussion of the experimental findings follows in Section 6. Section 7 provides conclusions and suggestions for further research.

Related Work

There is indeed a significant amount of literature on detecting face spoofing by extracting handcrafted features; with the rise of deep learning, however, researchers began replacing these features with CNN-based techniques. Reference [10] proposes the equilibrium difference local binary pattern (ED-LBP) as a new texture descriptor for the recognition of face textures. The Haar Cascade and Local Binary Pattern techniques were employed in [11] for face detection and identification, and the comparison revealed that while the accuracy of the Haar Cascade technique is high, its execution time is longer.

For identification purposes, biometrics can use physical or behavioural characteristics. Various alternatives have been investigated for years, including fingerprint [12], [13], hand geometry [14], palmprint [15], and voice [16]. Among them, the face stands out for its acceptability and recognition cost, proving to be one of the best options for a variety of applications, from high-security usage (e.g., border control and video surveillance in critical places) to low-security usage (such as social media and smartphone access control). Experimental studies have also been carried out on large-scale image classification using deep convolutional neural networks [17], [18].

In time-sensitive applications, a lightweight yet robust classifier that can recognise live faces in a fraction of the time is required. For texture-based facial liveness identification, [19] proposes a lightweight permuted Xceptio-Inception/Reduction CNN classifier. All frontal face images from the imposter datasets were normalized, and the extracted multi-colour-space LBP feature maps were given to the classifiers as inputs. On the CASIA-FASD, NUAA and FRAUD2 datasets, the proposed CNN method obtained accuracies of 94.35%, 99.98% and 100%, respectively.

Arora et al. [20] employ convolutional autoencoders to reduce the dimensionality of images, which is beneficial for detecting multiple types of spoofing attack; image features are extracted and classified using pre-trained encoder weights and a softmax classifier, while also addressing problems such as overfitting. In [21], Avinash et al. detected face liveness by proposing a CNN-based ensemble technique that employs a set of light CNN models. The spoofing detection approach of Vareto and Schwartz combines an ensemble of classification algorithms with basic handcrafted spatial and frequency domain characteristics: the LBP and HOG descriptors collect spatial information from video frames, whereas GLCM features are extracted via Fourier transforms, and the classification ensemble comprises Support Vector Machines, Partial Least Squares learning algorithms, and multi-layer perceptrons [22].

In [23], Parkin and Grinchuk suggested a new anti-spoofing network design that uses multi-modal image data to aggregate intra-channel information over many network levels. They transferred significant facial traits and employed an ensemble of models trained individually for different forms of spoofing attack to boost generalization capacity. Akhtar and Foresti utilized Naive Bayes, SVM, Quadratic Discriminant Analysis (QDA), and ensemble classifiers to discriminate between real and spoof face images using voting-based techniques, and further proposed as many as seven novel methods to identify discriminative image patches [24].

Wen et al. [25] suggested an Image Distortion Analysis (IDA)-based face spoof detection method that uses blurriness, specular reflection, color diversity and chromatic moment characteristics. An ensemble of multiple SVM classifiers is used and is extended to multi-frame face spoof recognition in videos through a voting-based technique.

Most state-of-the-art approaches thus involve sophisticated pipelines combining handcrafted features and CNN techniques. In this regard, we propose three different approaches for face spoof detection, which extract face embeddings using LBP and VGG16 CNN architectures and apply SVM, KNN, Decision Tree, and ensembles of classifiers. We also compare the ensembled classifiers against the individual classifiers on real and public databases.

Materials and Methods

This section describes the existing machine learning techniques used for the detection of spoofing attacks, namely LBP, CNN, Decision Tree, SVM and KNN. We also describe the benchmark data used for the experimental and analytical study.

Local Binary Pattern (LBP)

LBP was introduced by Ojala et al. [26] and is popularly used in image processing applications. It is a grey-scale invariant texture descriptor that labels each pixel by thresholding the region surrounding it against the centre value and interpreting the result as a binary number. Due to its computational ease, LBP can evaluate images in demanding real-time situations. For texture classification, the LBP technique gathers all occurrences of the LBP codes in an image into a histogram; simple histogram similarities are then computed to complete the classification. The binary pattern of an image is computed using (1), where gc is the centre pixel intensity and gp the adjacent pixel intensity; the threshold step function S is defined in (2).

$$\mathrm{LBP}(g_{p_x}, g_{p_y}) = \sum_{p=0}^{P-1} S(g_p - g_c)\cdot 2^p \tag{1}$$
$$S(x) = \begin{cases} 1 & \text{for } x \ge 0 \\ 0 & \text{for } x < 0 \end{cases} \tag{2}$$
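A minimal sketch of how an LBP texture histogram can be computed with scikit-image is given below; the radius, number of sampling points, and bin count are illustrative assumptions, not the exact settings used in this work.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, points=8, radius=1):
    """Encode each pixel by thresholding its neighbours against the centre
    value, as in (1) and (2), then summarize the codes as a histogram."""
    codes = local_binary_pattern(gray, P=points, R=radius, method="default")
    hist, _ = np.histogram(codes.ravel(), bins=2 ** points, range=(0, 2 ** points))
    return hist / (hist.sum() + 1e-7)   # normalized texture descriptor

# Toy usage on a random "image"
print(lbp_histogram(np.random.rand(64, 64)).shape)   # (256,)
```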

CNN with VGG16

VGG-16 is a convolutional neural network that is among the best-known computer vision architectures. In VGG16, ‘16’ denotes the 16 layers with weights: 13 convolutional layers and 3 dense (fully connected) layers, interleaved with 5 max-pooling layers. The input tensor of VGG16 is 224 × 224 with 3 RGB channels. It uses padding, max-pooling layers with a 2 × 2 filter and stride 2, and convolutional layers with 3 × 3 filters and stride 1. The stack of convolutional layers is followed by three Fully Connected (FC) layers, the first two of which have 4096 channels each, and a softmax output layer.
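For reference, face embeddings can be pulled from a pre-trained VGG16 by dropping its classification head, as in the hedged sketch below; the pooling choice and preprocessing call are standard Keras usage, not necessarily the exact configuration used in this study.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

# Convolutional base only: include_top=False drops the three FC layers,
# and global average pooling yields a 512-dimensional embedding per image.
extractor = VGG16(weights="imagenet", include_top=False,
                  pooling="avg", input_shape=(224, 224, 3))

def vgg16_embedding(rgb_224):
    """rgb_224: float array of shape (224, 224, 3) in the 0-255 range."""
    x = preprocess_input(rgb_224[np.newaxis, ...].astype("float32"))
    return extractor.predict(x, verbose=0)[0]         # shape (512,)

print(vgg16_embedding(np.random.rand(224, 224, 3) * 255).shape)
```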

Support Vector Machine

This supervised machine learning approach can be applied to regression problems and to linear and nonlinear classification using kernel functions [27]. The method plots every data point in n-dimensional space, with the value of each feature representing a particular coordinate, and seeks the hyperplane with the maximum margin, that is, the greatest separation between data points of the two classes. Using a kernel function, SVM maps the data into a higher-dimensional feature space in which a separating hyperplane is drawn. The kernel-based decision function used to find the hyperplane is given in (3).

$$f(x) = \sum_{i} a_i y_i\, K(x_i, x) \tag{3}$$

In (3), K represents the kernel function, and {xi, yi} refers to the training dataset with class labels yi belonging to {−1, 1}.
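As an illustration, the kernel expansion in (3) corresponds to the decision function of a trained RBF-kernel SVM; a minimal scikit-learn sketch with placeholder data is shown below (the feature arrays and label convention are assumptions).

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder embeddings; 0 = spoof, 1 = real are assumed labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 64))
y_train = rng.integers(0, 2, size=200)

# RBF-kernel SVM; its decision function realizes the expansion in (3):
# f(x) = sum_i a_i y_i K(x_i, x) + b
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print(svm.decision_function(X_train[:5]))
```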

KNN

K-Nearest Neighbor (KNN) is one of the simplest supervised machine learning techniques [28]. The KNN method places a new instance in the category it most resembles among the existing categories, presuming that the new case is comparable to previous cases. After storing all previous data, a new data point is classified based on a similarity score. Although the KNN technique can be applied to both classification and regression problems, it is more frequently applied to classification. For image classification, KNN can use the Euclidean distance (4) or the Manhattan distance (5) as the similarity function.

$$d_{\mathrm{Euclidean}}(p, q) = \sqrt{\sum_{i=1}^{N} (q_i - p_i)^2} \tag{4}$$
$$d_{\mathrm{Manhattan}}(p, q) = \sum_{i=1}^{N} |q_i - p_i| \tag{5}$$
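For illustration, scikit-learn's KNN classifier can be switched between the Euclidean distance of (4) and the Manhattan distance of (5) through its metric argument; the data below are placeholders.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X, y = rng.normal(size=(200, 64)), rng.integers(0, 2, size=200)

# Same k, two distance definitions: Euclidean (4) vs. Manhattan (5).
knn_euclid = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(X, y)
knn_manhat = KNeighborsClassifier(n_neighbors=3, metric="manhattan").fit(X, y)
print(knn_euclid.predict(X[:3]), knn_manhat.predict(X[:3]))
```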

Decision Tree

The decision tree is a supervised learning algorithm that can be used to address both regression and classification issues, though it is typically chosen for classification. In a decision tree, internal nodes represent the dataset’s features, branches represent the classification rules, and each leaf node represents an outcome. By altering the tree depth, the threshold that prevents additional node splitting, this tree-structured classifier can be tuned to a specific need.

Data Description

To analyse the performance of the proposed ensemble models, two different image datasets were considered. First, we examined the requirements and gathered face images using mobile devices and tablets, deliberately keeping picture quality at a reasonably low level, covering both legitimate captures and illegitimate captures derived from attacks of various sources and types (camera type, illumination, etc.); see Fig. 2 for samples from the custom picture database.

Fig. 2. Images from the custom dataset: (a) the leftmost picture is a real picture taken at standard resolution; (b) the second pair of pictures is captured from a mobile phone screen, where the quality degradation due to the screen resolution is visible; (c) printed pictures used for spoofing.

We also used the standard NUAA photograph imposter database [29] to evaluate the proposed ensemble technique. Fig. 3 shows samples from the NUAA dataset. The NUAA database contains photos of 15 users and 12,614 images in total, including genuine and fake face photographs; we used 2,002 of these images (1,041 real and 961 spoof). The NUAA photo imposter database is a widely used state-of-the-art dataset for experimental analysis, and according to Anjos et al. [30] it may be requested from the corresponding authors of [29] (http://parnec.nuaa.edu.cn/xtan/NUAAImposterDB_download.html). The database was built with a generic, unspecified webcam that recorded photo attacks and genuine accesses for 15 different identities. As Fig. 3 suggests, the database is split into three sessions with different lighting conditions; because not all participants took part in all three acquisition campaigns, the amount of data collected is unbalanced across sessions. Participants were instructed to maintain a neutral expression and to avoid eye blinks or head movements throughout all sessions, so that a genuine capture looked as much like a photograph as feasible. The webcam recorded roughly 25 seconds at a frame rate of 20, after which a selection of frames was made by hand for the picture database; the original video clips are not included in the distribution, which instead provides bitmap images of the manually selected frames. Attacks were created by first taking high-definition photographs of each subject with a Canon camera of unidentified type, framed so that the face occupied roughly two-thirds of the total photo area. The photographs were then printed at 6.8 × 10.2 cm (small) and 8.9 × 12.7 cm (larger) dimensions on 70 g white A4 paper using an HP colour printer.

Fig. 3. Sample pictures from the NUAA photo imposter dataset. The first cell shows a real image, followed by a fake image captured from the real one.

Proposed Methodology

To determine the optimal algorithm for identifying spoofed images and to identify a classifier that reduces false positives and false negatives, three algorithms and five classifiers were tested. The algorithms used were LBP (Local Binary Pattern), CNN-VGG16 (with HSV and YUV color spaces) and CNN-VGG16 (with YCr and LUV color spaces). The classifiers applied were SVM, KNN, Decision Tree, Ensemble with Voting, and Ensemble with Weighted Average. The proposed ensemble methodology is illustrated in Fig. 4.

Fig. 4. The proposed ensembled methodology.

LBP Based Model

This model uses DLIB for face detection and a non-local means denoising algorithm to remove noise from the detected face. The LBP operator is used to define texture features for the categorization of grayscale images: it transforms an image by thresholding the p neighbouring values of each pixel and converting the result to a binary number, after which a texture analysis computation is performed on the output image. Along with color images, grayscale images are frequently converted into multiple additional image variants for LBP computation, because the genuine texture may differ from that of the color images [18], [31].

On account of physical, psychological, and psycho-visual properties, we used the YCrCb color mode, created the local binary pattern for each of the three channels, and concatenated the results into a single feature vector. Applying LBP on each channel in this way yields better results than applying it to the image as a whole. The extracted LBP features are then passed to the various classifiers, which classify the images as real or spoof. Fig. 5 shows the algorithm flowchart for creating spoof-detection models using the LBP histogram. The LBP-based model is applied to the custom dataset for appropriate pattern recognition; the different grayscale image variants are shown in Fig. 6.

Fig. 5. Flowchart of the algorithm to create models for spoof detection using LBP histogram.

Fig. 6. LBP applied to various layers of YCbCr color space.
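A hedged sketch of the per-channel LBP step described above: the image is converted to YCrCb, an LBP histogram is computed on each channel, and the three histograms are concatenated into one feature vector (the LBP parameters and uniform mapping are illustrative assumptions, not the exact settings of this work).

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def ycrcb_lbp_feature(bgr, points=8, radius=1):
    """Concatenated LBP histograms over the Y, Cr and Cb channels."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    feats = []
    for channel in cv2.split(ycrcb):                            # Y, Cr, Cb
        codes = local_binary_pattern(channel, P=points, R=radius,
                                     method="uniform")          # points + 2 codes
        hist, _ = np.histogram(codes.ravel(), bins=points + 2,
                               range=(0, points + 2))
        feats.append(hist / (hist.sum() + 1e-7))
    return np.concatenate(feats)                                # single feature vector

print(ycrcb_lbp_feature(np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)).shape)
```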

CNN-VGG16 Using HSV and YUV Images

The CNN-VGG16 architecture extracts features from an image and encodes it. The training images are converted to the HSV and YUV colour modes to extract features, which are then concatenated, and all five classifiers are applied to distinguish spoof from real images. For this method we identified three sets of data: a training set containing both positive and negative features, a validation set to validate the results, and a test set. In this instance denoising did not add any value, so we omitted it for efficiency; however, we applied the Adaptive Moment Estimation (ADAM) optimization algorithm [32].
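A minimal sketch of the colour-space feature concatenation described above, assuming a pre-trained VGG16 base with average pooling as in the earlier sketch; the OpenCV conversion codes are standard, while the overall wiring is an assumption about this pipeline rather than its exact implementation.

```python
import cv2
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

base = VGG16(weights="imagenet", include_top=False, pooling="avg",
             input_shape=(224, 224, 3))

def hsv_yuv_embedding(bgr):
    """Concatenate VGG16 embeddings of the HSV and YUV renderings of a face."""
    feats = []
    for code in (cv2.COLOR_BGR2HSV, cv2.COLOR_BGR2YUV):
        img = cv2.resize(cv2.cvtColor(bgr, code), (224, 224))
        x = preprocess_input(img[np.newaxis, ...].astype("float32"))
        feats.append(base.predict(x, verbose=0)[0])             # (512,) each
    return np.concatenate(feats)                                # (1024,)

print(hsv_yuv_embedding(np.random.randint(0, 256, (300, 300, 3), dtype=np.uint8)).shape)
```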

CNN-VGG16 Using YCr and LUV Images

The CNN-VGG16 architecture is used here to fetch the features from the image. VGG16 passes the 224 × 224 input through its stack of convolutional layers, applying filters at each layer; Fig. 7 displays how VGG-16 applies various filters across 6 layers. Features are extracted from images in the YCr [33] and LUV colour spaces (where L stands for luminance and U and V represent the chromaticity values of colour images) [34]. The embeddings extracted for each colour space are then concatenated to form a comprehensive set of all extracted features. All classifiers were applied to create models, which were then used on the test sets to capture results.

Fig. 7. VGG16, 6 filters showing 3 channels for 2 images.

As shown in Fig. 8, after applying the YCR-based model (Fig. 8b) and the LUV-based model (Fig. 8d), different image variants are generated for texture classification.

Fig. 8. (a) and (c) are original images; (b) extended YCR variant; (d) extended LUV image.

Classifiers Used (SVM, KNN, DT, Ensemble-Voting and Ensemble-Weighted Average)

Ensemble Using Voting

We used SVM with the RBF kernel to build our ensemble with a voting procedure. Two parameters, C and gamma, must be considered when training an SVM with an RBF kernel. The parameter C, shared by all SVM kernels, trades off the misclassification of training samples against the simplicity of the decision surface.

A low value of C smooths the decision surface, whereas a high C aims to classify every training sample correctly. Gamma determines how much influence a single training example has: the larger the gamma, the closer other examples must be in order to be affected [35]. We found that C = 1e3 and gamma = 0.5 worked very well in our tests. Fig. 9 demonstrates the ensembling process.

Fig. 9. Flowchart demonstrating the creation of the ensembles.

$$K_{\mathrm{rbf}}(x, x') = \exp(-\gamma\, C^2), \qquad C = \lVert x - x' \rVert$$

K Nearest Neighbor (KNN)

Supervised learning with KNN was implemented following Guo et al. [36]. We used 3 neighbours.

Decision Tree (DT)

DT is a supervised learning technique; details are given in Section 3.6. We used a DT classifier with a maximum depth of 10. All three classifiers were then combined into an ensemble with maximum voting; the results are presented in Section 6.
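The three classifiers with the parameter values quoted above (RBF SVM with C = 1e3 and gamma = 0.5, KNN with 3 neighbours, decision tree with maximum depth 10) can be combined into a majority-voting ensemble as sketched below; the feature arrays are placeholders, not the paper's actual data.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier

rng = np.random.default_rng(2)
X, y = rng.normal(size=(300, 64)), rng.integers(0, 2, size=300)   # placeholder features

voting = VotingClassifier(
    estimators=[
        ("svm", SVC(kernel="rbf", C=1e3, gamma=0.5)),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
        ("dt", DecisionTreeClassifier(max_depth=10)),
    ],
    voting="hard",   # maximum (majority) voting over the three predictions
).fit(X, y)
print(voting.predict(X[:5]))
```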

Ensemble Using Weighted Average

We created weighted classifiers, giving higher weight to the classifiers that achieved better accuracy scores [37]. The accuracy score is calculated using (6).
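A hedged sketch of the weighted-average ensemble: each classifier's predicted class probabilities are weighted by its validation accuracy and averaged. The helper name and the exact weighting rule are assumptions consistent with the description above.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def weighted_average_predict(classifiers, X_val, y_val, X_test):
    """Weight each model's class probabilities by its validation accuracy.

    Each classifier must expose predict_proba (e.g., SVC(probability=True)).
    """
    weights = np.array([accuracy_score(y_val, clf.predict(X_val))
                        for clf in classifiers])
    weights = weights / weights.sum()
    probas = np.array([clf.predict_proba(X_test) for clf in classifiers])
    avg = np.tensordot(weights, probas, axes=1)     # weighted mean over models
    return avg.argmax(axis=1)                       # 0 = spoof, 1 = real (assumed)
```

scikit-learn's VotingClassifier with voting="soft" and per-model weights offers an equivalent off-the-shelf alternative.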

Evaluation Parameters

To evaluate the performance of the proposed image-processing methods for detecting spoofing attacks, state-of-the-art evaluation parameters are used, including Accuracy (6), Sensitivity (7), Specificity (8), Precision (9) and the AUC-ROC curve, complemented with error (misclassification) rates.

Accuracy

One of the most widely used metrics for classification performance is accuracy, which is measured as the proportion of correctly categorised samples to all samples [38]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

Sensitivity

A classifier’s true positive rate (TPR), hit rate, or recall measures the proportion of correctly identified positive samples to all positive samples and is calculated using (7) [38]:

$$\text{Sensitivity} = \text{Recall} = \frac{TP}{TP + FN} \tag{7}$$

Specificity

The ratio of correctly identified negative samples to all negative samples, as in (8), is used to describe specificity, true negative rate (TNR), or inverse recall [38]:

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{8}$$

Precision

Predictive values (positive and negative) reflect the performance of the prediction. Positive predictive value (PPV), or precision, represents the proportion of correctly classified positive samples to the total number of predicted positive samples, as indicated in (9) [38]:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{9}$$

F Measure

The F-measure, often known as the F1-score, is the harmonic mean of recall and precision [38]. F-measure values range from 0 to 1, and high values indicate strong classification performance. A weighted variant, representing the weighted harmonic mean of recall and precision, responds well to variations in data distributions.

Accuracy measures the total number of correctly predicted events, while sensitivity counts the number of accurate positive predictions. The ROC (Receiver Operating Characteristic) curve represents the relationship between TPR and FPR, and the AUC (Area Under the Curve) measures the probability that an input will be accurately classified, ranging from 0 to 1.
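For reference, the metrics in (6)-(9) and the F1 score follow directly from the confusion-matrix counts; the small helper below reproduces the LBP weighted-average figures from Table IX as a check (the function itself is an illustrative sketch, not the authors' code).

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, precision and F1 from raw counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)            # recall / TPR, eq. (7)
    specificity = tn / (tn + fp)            # TNR, eq. (8)
    precision   = tp / (tp + fp)            # PPV, eq. (9)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f1

# LBP weighted-average counts from Table IX: TP=172, TN=198, FP=4, FN=4
print(classification_metrics(172, 198, 4, 4))   # accuracy ≈ 0.98
```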

Experimental Analysis

Due to limited processing resources on local machines and our decision to use Python as the programming language, we trained our experiments on Google Colab. We split the whole dataset 80:20, meaning 80 per cent is used for training and 20 per cent for testing. Hyperparameter optimization is used to obtain the threshold values for SVM, KNN, and decision trees, and these values were finalized only after a series of experiments. Table I shows the number of real and spoof images used in the experiment; the same ratio is followed in both the custom-made and publicly available datasets.

Image type No. of images
REAL 1,041
SPOOF 961
TOTAL 2,002
Table I. Number of Real and Spoof images

We evaluated each model on both the real (custom) and publicly accessible datasets for performance analysis, and the confusion matrix and classification report were plotted. The accuracy of each method is further assessed through the Half Total Error Rate (HTER) and the Equal Error Rate (EER). The EER is the point where the false acceptance rate (FAR) and the false rejection rate (FRR) intersect; generally, the lower the EER, the higher the accuracy. FAR and FRR are two biometric measures used as evaluation parameters. FAR is the ratio of the number of false acceptances to the total number of imposter attempts and thus gives the probability that the model incorrectly accepts an unauthorized user; it is also known as the false match rate (FMR). FRR, in contrast, measures the probability that the model incorrectly rejects a genuine user: it is the ratio of the number of false rejections to the total number of genuine attempts. For instance, a FAR of 15% means that for every 100 imposter attempts, 15 are wrongly accepted, so an increasing FAR decreases the trustworthiness of the model; likewise, an FRR of 15% means 15 authorized users are rejected for every 100 attempts, so reducing FRR minimizes the rejection of authorized users.
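A hedged sketch of how EER and HTER can be derived from classifier scores with scikit-learn's ROC utilities; the variable names and the label convention (1 = real, 0 = spoof, higher score = more likely real) are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import roc_curve

def eer_hter(y_true, scores):
    """EER: point where FAR (=FPR) equals FRR (=1-TPR); HTER: (FAR+FRR)/2."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    fpr, tpr, thresholds = roc_curve(y_true, scores)    # higher score = "real"
    frr = 1 - tpr
    idx = np.argmin(np.abs(fpr - frr))                  # crossover point
    eer = (fpr[idx] + frr[idx]) / 2
    accept = scores >= thresholds[idx]
    far = accept[y_true == 0].mean()                    # imposters accepted
    frr_at_thr = (~accept)[y_true == 1].mean()          # genuine users rejected
    return eer, (far + frr_at_thr) / 2                  # (EER, HTER)

# Toy check on synthetic scores
rng = np.random.default_rng(4)
labels = rng.integers(0, 2, size=500)
scores = labels + rng.normal(scale=0.5, size=500)
print(eer_hter(labels, scores))
```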

Result and Discussion

We prepared and trained all the models separately with the custom-made and NUAA datasets. In the LBP model trained with the custom-made dataset, an accuracy of 97% is achieved on the test set when SVM is used as the classifier, and accuracies of 94% and 93% are achieved with KNN and Decision Tree, respectively. Table II shows the results for the LBP model in terms of accuracy, precision, sensitivity and F1 score. A low EER (equal error rate) value shows that the model produces more accurate results due to fewer false rejections and false acceptances. By analysing accuracy together with EER and HTER, we can see that the model attains better results when the SVM classifier is used for spoof image classification.

Classifier Class Acc (%) Sensitivity (%) Precision (%) F1 score (%) AUC EER HTER
SVM Real 97 96 98 97 0.96 0.03 0.03
Spoof 98 95 97
KNN Real 94 92 97 94 0.93 0.08 0.06
Spoof 96 90 93
DT Real 93 92 95 93 0.92 0.08 0.07
Spoof 94 90 92
Table II. LBP Model Results with Different Classifiers on Custom-Made Dataset

In the same fashion, when the LBP model is validated on the state-of-the-art NUAA photo imposter database, KNN shows slightly better results in terms of accuracy (96%) compared with SVM and Decision Tree. The accuracy is well supported by the obtained EER and HTER, which show that false rejections and false acceptances are quite low. Table III shows the result analysis for the Local Binary Pattern model on the NUAA dataset.

Classifier Class Accuracy (%) Sensitivity (%) Precision (%) F1 score (%) AUC EER HTER
SVM Real 95 94 94 94 0.95 0.05 0.04
Spoof 96 96 96
KNN Real 96 95 96 95 0.95 0.05 0.04
Spoof 97 96 97
DT Real 92 92 89 90 0.91 0.08 0.08
Spoof 92 94 94
Table III. LBP Model Results with Different Classifiers on NUAA Photo Imposter Database

In the CNN VGG-16 model trained with the custom-made dataset, an accuracy of 96% is achieved on the test set when the SVM classifier is used, and an accuracy of 90% with both KNN and Decision Tree. Table IV shows the results for the CNN model in terms of accuracy, precision, F1 score and sensitivity. By analysing accuracy together with EER and HTER, we can see that the model attains better results when the SVM classifier is used.

Classifier Class Acc. (%) Sensitivity (%) Precision (%) F1 score (%) AUC EER HTER
SVM Real 95.98 96 97 96 0.96 0.04 0.03
Spoof 96 95 96
KNN Real 90.4 92 90 91 0.90 0.07 0.09
Spoof 88 91 90
DT Real 90.4 93 89 91 0.90 0.06 0.09
Spoof 87 92 90
Table IV. CNN Model Results with Different Classifiers on Custom Made Dataset

When the CNN model was validated with the NUAA dataset, SVM showed better results in terms of accuracy, achieving 99% on the test set, and EER-HTER were also better with the SVM classifier. Table V shows the results for the CNN model with the NUAA dataset.

Classifier Class Acc (%) Sensitivity (%) Precision (%) F1 score (%) AUC EER HTER
SVM Real 98.5 99 99 99 0.98 0.01 0.01
Spoof 98 98 98
KNN Real 90.4 92 90 91 0.90 0.07 0.09
Spoof 88 91 90
DT Real 96 98 95 96 0.96 0.02 0.03
Spoof 95 97 96
Table V. CNN Model Results with Different Classifiers on NUAA Dataset

In the modified CNN-VGG16 model trained with the custom-made dataset, an accuracy of 95% is achieved on the test set when the SVM classifier is used. Table VI shows the results for this model in terms of accuracy, sensitivity, precision and F1 score. By analysing accuracy together with EER and HTER, we can see that the model obtained better results when the SVM classifier is used.

Classifier Class Accuracy (%) Sensitivity (%) Precision (%) F1 score (%) AUC EER HTER
SVM Real 94.6 97 94 95 0.94 0.03 0.05
Spoof 93 96 94
KNN Real 84.5 89 83 86 0.84 0.11 0.15
Spoof 80 87 83
DT Real 87 89 87 88 0.87 0.11 0.12
Spoof 85 87 86
Table VI. CNN-VGG16 HSV LUV Results with Different Classifiers on Custom Made Dataset

When validating this model with the NUAA dataset, SVM showed better results in terms of accuracy, achieving 99.5% on the test set, and EER-HTER were also better with the SVM classifier. Table VII shows the results for this model with the NUAA dataset.

Classifier Class Accuracy (%) Sensitivity (%) Precision (%) F1 score (%) AUC EER HTER
SVM Real 99.5 99 100 100 0.99 0.009 0.004
Spoof 100 99 99
KNN Real 97.5 98 97 98 0.97 0.01 0.02
Spoof 97 98 97
DT Real 96 97 95 96 0.95 0.02 0.04
Spoof 95 97 96
Table VII. CNN-VGG16 HSV LUV Results with Different Classifiers on NUAA Dataset

After running the models through the various individual classifiers, the effect of ensemble classifiers is tested on the custom-made dataset, using maximum-voting ensembling and weighted-average ensembling. The details are available in Table VIII.

Model Ensemble Class Acc. % Sensitivity % Precision % F1 score % AUC EER HTER
LBP based model Maximum Voting Real 97 95 99 97 0.96 0.04 0.03
Spoof 98 94 96
Weighted Average Real 98 98 98 98 0.97 0.01 0.02
Spoof 98 98 98
CNN-VGG16 HSV &YUV Maximum Voting Real 94 96 93 95 0.94 0.03 0.05
Spoof 93 96 94
Weighted Average Real 94 97 93 95 0.94 0.02 0.05
Spoof 92 97 94
CNN-VGG16 YCr and LUV Maximum Voting Real 99 99 99 99 0.98 0.009 0.011
Spoof 99 99 99
Weighted Average Real 99 99 99 99 0.99 0.007 0.009
Spoof 99 99 99
Table VIII. Ensembling Results on all Three Models with Custom Made Dataset

It is evident from Tables VIII and IX that the FP, FN, EER (Equal Error Rate) and HTER values are more reasonable when the weighted-average ensemble is chosen for all three models than with a single standalone classifier; FP and FN values are considerably lower in all three models when ensembling is used. In all three models, the crossover error rate (CER), the intersection point of the FRR (False Rejection Rate) and FAR (False Acceptance Rate), is significantly low, resulting in a highly accurate model for the correct prediction of spoof images. Models with low equal error rates are deemed to perform substantially better than models with higher equal error rates.

Model Classifiers and ensemble method used TP FP TN FN
LBP based model SVM 168 8 198 4
KNN 159 17 195 7
Decision tree 159 17 191 11
Maximum voting 166 10 199 3
Weighted average 172 4 198 4
CNN-VGG16 HSV and YUV SVM 183 9 200 7
KNN 168 16 193 22
Decision tree 166 14 195 24
Maximum voting 176 8 201 14
Weighted average 174 6 203 16
CNN-VGG16 YCr and LUV SVM 173 7 202 14
KNN 149 23 186 38
Decision tree 159 23 186 28
Maximum voting 730 8 824 10
Weighted average 731 6 826 9
Table IX. Positive and Negative Analysis of All Three Models with Custom Made Dataset

To visualize the performance of the models, the Receiver Operating Characteristic (ROC) curves with AUC for the weighted-average ensembles are plotted in Figs. 10-12, as these are among the most significant assessment metrics for testing the efficiency of any classification model. The ROC is a probability curve, whereas the AUC is a measure of separability. The AUC illustrates the capability of the ensemble model to differentiate between legitimate and illegitimate faces: a high AUC shows that the model predicts class 0 as 0 and class 1 as 1 effectively and efficiently. The curve represents the trade-off between TPR and FPR, with the x-axis showing FPR and the y-axis showing TPR.
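For reference, ROC curves like those in Figs. 10-12 can be produced from a model's test-set scores with scikit-learn and matplotlib, as sketched below on synthetic placeholder scores (the real curves come from the ensembles' outputs, which are not reproduced here).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Synthetic placeholder scores; replace with the ensemble's test-set outputs.
rng = np.random.default_rng(3)
y_test = rng.integers(0, 2, size=200)
y_score = np.clip(y_test + rng.normal(scale=0.4, size=200), 0, 1)

fpr, tpr, _ = roc_curve(y_test, y_score)
plt.plot(fpr, tpr, label=f"weighted average (AUC = {auc(fpr, tpr):.3f})")
plt.plot([0, 1], [0, 1], "--", label="chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```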

Fig. 10. Receiver operating characteristics curve of LBP based model weighted average.

Fig. 11. Receiver operating characteristics curve of CNN-VGG16 HSV and YUV weighted average.

Fig. 12. Receiver operating characteristics curve of CNN-VGG16 YCr and LUV weighted average.

As shown in Fig. 10, the LBP-based model effectively classifies legitimate and illegitimate faces when detecting and predicting spoof attacks. The AUC score is 0.979, which is close to 1, the ideal benchmark for separability between positive and negative classes.

The CNN-VGG16 HSV and YUV weighted-average model has an AUC score of 0.944. The ROC curve in Fig. 11 illustrates a fair classification between true positives and true negatives; the curves do not overlap, and the model separates the two classes considerably well.

As illustrated in Fig. 12, the CNN-VGG16 YCr and LUV weighted-average model performs best among all models, generating the highest AUC of 0.990. This confirms a clear distinction between true positive and false positive identifications and ensures that TP and FN are not obscured or overlapping, so the ensemble model has a significant measure of separability. A model performs better when the AUC is higher, and here the AUC is highest when ensembling is applied to the different models. Thus, it can be deduced that false positives and false negatives are significantly reduced and the models perform better.

Conclusion and Future Work

The intent of this work is to address the issue of spoofing attacks by analysing and developing ensemble models with SVM, KNN and Decision Tree classifiers over LBP- and CNN-based features. Initially, we analysed the different classification learning algorithms fitted on generalized and custom-made datasets using LBP descriptors and CNN for the identification of legitimate or illegitimate images. We observed that the accuracy obtained is substantially fair, but the false positive and false negative rates are also significantly high, leading to an unrealistic biometric system. We therefore enhanced the feature extraction by tweaking the Local Binary Pattern (using the YCrCb color space), CNN-VGG16 (with HSV and YUV color spaces) and CNN-VGG16 (with YCr and LUV color spaces) to produce better accuracy while keeping the CER (Crossover Error Rate), or EER (Equal Error Rate), significantly low. The three proposed ensemble models are applied to a custom-made database and the standard NUAA database, and state-of-the-art evaluation parameters are used to evaluate the performance of the methods.

In the three proposed models, tested on customized and publicly available datasets, the weighted-average ensemble technique outperformed the voting ensemble and the other solo classifiers. The three models achieved test accuracies of 98%, 94.48%, and 99% on the custom dataset, respectively, and 97%, 99%, and 100% on the NUAA photograph impostor dataset. It is noteworthy that the false rejection rate and false acceptance rate have been significantly reduced; attaining a significantly lower EER or CER makes biometric systems sustainable, reliable, and realistic. In future work, we will focus on analysing the custom-made dataset with other machine-learning algorithms.

References

  1. Jain AK, Flynn P, Ross AA. Handbook of Biometrics. New York, NY: Springer; 2008. doi: 10.1007/978-0-387-71041-9.
  2. Meadowcroft P. Card fraud–will PCI-DSS have the desired impact? Card Technol Today. Mar. 2008;20(3):10–1. doi: 10.1016/s0965-2590(08)70076-8.
  3. Nguyen HP, Retraint F, Morain-Nicolier F, Delahaies A. Face spoofing attack detection based on the behavior of noises. IEEE Global Conference on Signal and Information Processing (GlobalSIP), Dec. 2016. doi: 10.1109/globalsip.2016.7905815.
  4. York TW, MacAlister D. Electronic security system integration. Hosp Healthcare Secur. 2015;459–504. doi: 10.1016/b978-0-12-420048-7.00019-2.
  5. Zhou S, Xiao S. 3D face recognition: a survey. Human-Centric Comput Inform Sci. Nov. 2018;8(1). doi: 10.1186/s13673-018-0157-2.
  6. Taigman Y, Yang M, Ranzato M, Wolf L. DeepFace: closing the gap to human-level performance in face verification. 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2014. doi: 10.1109/cvpr.2014.220.
  7. Sun Y, Wang X, Tang X. Deeply learned face representations are sparse, selective, and robust. Dec. 2014. arXiv. doi: 10.48550/arxiv.1412.1265.
  8. Schroff F, Kalenichenko D, Philbin J. FaceNet: a unified embedding for face recognition and clustering. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015. doi: 10.1109/cvpr.2015.7298682.
  9. Deng J, Guo J, Yang J, Xue N, Cotsia I, Zafeiriou SP. ArcFace: additive angular margin loss for deep face recognition. IEEE Trans Pattern Anal Mach Intell. 2021. doi: 10.1109/tpami.2021.3087709.
  10. Shu X, Tang H, Huang S. Face spoofing detection based on chromatic ED-LBP texture feature. Nov. 2020;27(2):161–76. doi: 10.1007/s00530-020-00719-9.
  11. Shetty AB, Bhoomika D, Rebeiro J, Ramyashree. Facial recognition using Haar cascade and LBP classifiers. Glob Transit Proc. Aug. 2021;2(2). doi: 10.1016/j.gltp.2021.08.044.
  12. Hasan H, Abdul-Kareem S. Fingerprint image enhancement and recognition algorithms: a survey. Neural Comput Appl. Aug. 2012;23(6):1605–10. doi: 10.1007/s00521-012-1113-0.
  13. Marasco E, Ross A. A survey on antispoofing schemes for fingerprint recognition systems. ACM Comput Surv. Nov. 2014;47(2):1–36. doi: 10.1145/2617756.
  14. Eidan A. Hand biometrics: overview and user perception survey. Sep. 2013. doi: 10.1109/icoia.2013.6650265.
  15. Tamrakar D, Khanna P. Kernel discriminant analysis of blockwise Gaussian derivative phase pattern histogram for palmprint recognition. J Vis Commun Image Rep. Oct. 2016;40:432–48. doi: 10.1016/j.jvcir.2016.07.008.
  16. Yadav KS, Mukhedkar MM. Review on speech recognition. IJSE. 2013;1(2):61–70.
  17. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2015. arXiv:1409.1556 [cs.CV]. Available from: https://arxiv.org/pdf/1409.1556.pdf.
  18. Drimbarean A, Whelan PF. Experiments in colour texture analysis. Pattern Recogn Lett. Aug. 2001;22(10):1161–7. doi: 10.1016/s0167-8655(01)00058-7.
  19. Satapathy A, Livingston LMJ. A lite convolutional neural network built on permuted Xceptio-inception and Xceptio-reduction modules for texture based facial liveness recognition. Multimed Tools Appl. Nov. 2020;80(7):10441–72. doi: 10.1007/s11042-020-10181-4.
  20. Arora S, Bhatia MPS, Mittal V. A robust framework for spoofing detection in faces using deep learning. Vis Comput. Apr. 2021;38:2461–72. doi: 10.1007/s00371-021-02123-4.
  21. Shekhar S, Patel A, Haloi M, Salim A. An ensemble model for face liveness detection. Jan. 2022. arXiv. doi: 10.48550/arxiv.2201.08901.
  22. Vareto RH, Schwartz WR. Face spoofing detection via ensemble of classifiers toward low-power devices. Pattern Anal Applic. May 2021;24(2):511–21. doi: 10.1007/s10044-020-00937-x.
  23. Parkin A, Grinchuk O. Recognizing multi-modal face spoofing with face recognition networks. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Jun. 2019, pp. 1617–23. doi: 10.1109/CVPRW.2019.00204.
  24. Akhtar Z, Foresti GL. Face spoof attack recognition using discriminative image patches. J Electr Comput Eng. 2016;2016:1–14. doi: 10.1155/2016/4721849.
  25. Wen D, Han H, Jain AK. Face spoof detection with image distortion analysis. IEEE Trans Inf Foren Secur. Apr. 2015;10(4):746–61. doi: 10.1109/tifs.2015.2400395.
  26. Ojala T, Pietikainen M, Maenpaa T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell. Jul. 2002;24(7):971–87. doi: 10.1109/TPAMI.2002.1017623.
  27. Nicolaou N, Georgiou J. Detection of epileptic electroencephalogram based on permutation entropy and support vector machines. Expert Syst Appl. Jan. 2012;39(1):202–9. doi: 10.1016/j.eswa.2011.07.008.
  28. Lopez-Bernal D, Balderas D, Ponce P, Molina A. Education 4.0: teaching the basics of KNN, LDA and simple perceptron algorithms for binary classification problems. Future Internet. Jul. 2021;13(8):193. doi: 10.3390/fi13080193.
  29. Tan X, Li Y, Liu J, Jiang L. Face liveness detection from a single image with sparse low rank bilinear discriminative model. Comput Vis–ECCV 2010. 2010:504–17. doi: 10.1007/978-3-642-15567-3_37.
  30. Anjos A, Chingovska I, Marcel S. Anti-spoofing: face databases. Springer; Jan. 2014, pp. 1–13. doi: 10.1007/978-3-642-27733-7_9067-2.
  31. Kandaswamy U, Schuckers S, Adjeroh D. Comparison of texture analysis schemes under nonideal conditions. IEEE Trans Image Process. Aug. 2011;20(8):2260–75. doi: 10.1109/tip.2010.2101612.
  32. Kingma D, Ba J. Adam: a method for stochastic optimization. 2014. arXiv:1412.6980. doi: 10.48550/arXiv.1412.6980.
  33. Porebski A, Truong Hoang V, Vandenbroucke N, Hamad D. Combination of LBP bin and histogram selections for color texture classification. J Imaging. Jun. 2020;6(6):53. doi: 10.3390/jimaging6060053.
  34. Rahimzadeganasl A, Sertel E. Automatic building detection based on CIE LUV color space using very high resolution Pleiades images. May 2017. doi: 10.1109/siu.2017.7960711.
  35. Ruta D, Gabrys B. Classifier selection for majority voting. Inform Fusion. Mar. 2005;6(1):63–81. doi: 10.1016/j.inffus.2004.04.008.
  36. Guo G, Wang H, Bell D, Bi Y, Greer K. KNN model-based approach in classification. On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE, vol. 2888, 2003:986–96. doi: 10.1007/978-3-540-39964-3_62.
  37. Shahhosseini M, Hu G, Pham H. Optimizing ensemble weights and hyperparameters of machine learning models for regression problems. Mach Learn Appl. Jan. 2022;100251. doi: 10.1016/j.mlwa.2022.100251.
  38. Sokolova M, Japkowicz N, Szpakowicz S. Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. Lect Notes Comput Sci. 2006:1015–21. doi: 10.1007/11941439_114.