In recent years, significant advancements in face recognition have been witnessed thanks to the rapid development of artificial intelligence. Despite remarkable performance, predictions made by such techniques tend to be challenging to explain. Considering their wide applications in security-sensitive areas, it is essential to fully understand the decision-making process of AI-based face recognition techniques and make them more acceptable to society. Many studies have been dedicated to offering visual interpretations of face recognition systems’ decisions, such as generating similarity and dissimilarity saliency maps. One of the most promising approaches is based on the perturbation mechanism, which has demonstrated exceptional performance in highlighting similar regions between two matching face images. However, this type of method has been shown to be less effective in identifying dissimilar regions, which are particularly critical in the decision-making process for two nonmatching face images. Therefore, this study focuses on this specific problem of the perturbation-based mechanism when applied to the explainable face recognition task. In particular, we first thoroughly analyze the limitation of a perturbation-based method in generating dissimilarity saliency maps. Then, a new regularization technique is proposed to alleviate this problem, followed by experiments that validate its effectiveness.
KEYWORDS: Facial recognition systems, Visualization, Detection and tracking algorithms, Visual process modeling, Data modeling, Image processing, Image classification, Education and training, Decision making, Systems modeling
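For context, the perturbation mechanism referenced in this abstract can be sketched roughly as follows. This is a generic illustration, not the authors' exact method or their proposed regularization; the `embed` function stands in for an arbitrary face embedding network, and all mask parameters are hypothetical.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def perturbation_map(img_a, img_b, embed, num_masks=2000, grid=7, p=0.5, seed=0):
    """Occlude random low-resolution cells of img_a and record how the similarity
    to img_b changes. Positive values mark regions whose removal lowers the score
    (similar regions); negative values mark regions whose removal raises it
    (dissimilar regions)."""
    rng = np.random.default_rng(seed)
    h, w = img_a.shape[:2]
    base = cosine(embed(img_a), embed(img_b))
    heat = np.zeros((h, w))
    hits = np.zeros((h, w))
    for _ in range(num_masks):
        # Low-resolution random binary mask, upsampled to image size.
        cells = (rng.random((grid, grid)) < p).astype(np.float64)
        mask = np.kron(cells, np.ones((h // grid + 1, w // grid + 1)))[:h, :w]
        score = cosine(embed(img_a * mask[..., None]), embed(img_b))
        occluded = 1.0 - mask
        heat += (base - score) * occluded   # credit the similarity change to the occluded pixels
        hits += occluded
    return heat / np.maximum(hits, 1e-8)
```

In practice, the similar-region (positive) part of such a map tends to be much more stable than the dissimilar-region (negative) part, which is the limitation the abstract above addresses.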
Despite the significant progress in recent years, deep face recognition is often treated as a “black box” and has been criticized for lacking explainability. It becomes increasingly important to understand the characteristics and decisions of deep face recognition systems in order to make them more acceptable to the public. Explainable face recognition (XFR) refers to the problem of interpreting why a recognition model matches a probe face with one identity over others. Recent studies have explored the use of visual saliency maps as an explanation mechanism, but they often lack a deeper analysis in the context of face recognition. This paper starts by proposing a rigorous definition of XFR that focuses on the decision-making process of the deep recognition model. Based on that definition, a similarity-based RISE algorithm (S-RISE) is then introduced to produce high-quality visual saliency maps for a deep face recognition model. Furthermore, an evaluation approach is proposed to systematically validate the reliability and accuracy of general visual saliency-based XFR methods.
In recent years, the remarkable progress in facial manipulation techniques has raised social concerns due to their potential malicious usage and has received considerable attention from both industry and academia. While current deep learning-based face forgery detection methods have achieved promising results, their performance often degrades drastically when they are tested in non-trivial situations under realistic perturbations. This paper proposes to leverage the information in the frequency domain, particularly the phase spectrum, to better differentiate between deepfakes and authentic images. Specifically, a new augmentation method called degradation-based amplitude-phase switch (DAPS) is proposed, which disregards the sensitive amplitude spectrum of a forged facial image and enforces the detection network to focus on phase components during the training process. Extensive evaluation results from a realistic assessment framework show that the proposed augmentation method significantly improves the robustness of the two deepfake detectors analyzed and consistently outperforms other augmentation approaches under various perturbations.
Detecting manipulations in facial images and video has become an increasingly popular topic in the media forensics community. At the same time, deep convolutional neural networks have achieved exceptional results on deepfake detection tasks. Despite the remarkable progress, the performance of such detectors is often evaluated on benchmarks under constrained and non-realistic situations. In fact, current assessment and ranking approaches employed in related benchmarks or competitions are unreliable. The impact of conventional distortions and processing operations found in image and video processing workflows, such as compression, noise, and enhancement, is not sufficiently evaluated. This paper proposes a more rigorous framework to assess the performance of learning-based deepfake detectors in more realistic situations. This framework can serve as a broad benchmarking approach for both general model performance assessment and the ranking of proponents in a competition. In addition, a stochastic degradation-based data augmentation strategy driven by realistic processing operations is designed, which significantly improves the generalization ability of two deepfake detectors.