Detecting Sequential Deepfake Manipulation via Spectral Transformer With Pyramid Attention in Consumer IoT

Abstract

The Consumer Internet of Things (CIoT) has brought great convenience to everyday life. In CIoT, face images are indispensable for payment and for verifying a user's identity during a transaction. However, the misuse of deepfake face images in CIoT transactions is a growing problem that seriously violates the property and privacy of individuals. Moreover, with the proliferation of easily accessible facial editing applications, individuals can effortlessly alter facial components through sequential, multi-step manipulations. To address this issue, we propose a Spectral Transformer with Pyramid Attention (STPA) model to detect sequence permutations in manipulated facial images. Specifically, we introduce a pyramid attention module that integrates spatial and channel attention mechanisms to prioritize the face region over the background. In addition, a spectral Transformer extracts global and local features concurrently, which facilitates fine-grained extraction of the face forgery region. Comprehensive experiments show that the proposed method improves detection accuracy on the sequential deepfake manipulation task through this fine-grained feature extraction.
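To make the idea of combining channel and spatial attention concrete, the following is a minimal sketch in plain Python. It is not the STPA pyramid attention module: it assumes a generic, parameter-free gating scheme (sigmoid of average-pooled activations), omits the multi-scale pyramid and all learned weights, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by the sigmoid of its global average.

    feat is a nested list shaped [channels][height][width].
    Assumption: a real module would pass the pooled vector through
    learned layers before gating; that step is skipped here.
    """
    out = []
    for ch in feat:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gate = sigmoid(mean)
        out.append([[v * gate for v in row] for row in ch])
    return out


def spatial_attention(feat):
    """Scale each spatial location by the sigmoid of its cross-channel mean."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    mask = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feat[k][i][j] * mask[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def channel_spatial_attention(feat):
    """Apply channel gating first, then spatial gating (illustrative order)."""
    return spatial_attention(channel_attention(feat))
```

Because both gates lie strictly between 0 and 1, the sketch can only attenuate activations. Locations and channels with stronger average responses are attenuated less, which is the sense in which attention "prioritizes" one region over another.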

Publication
IEEE
Mingliang Gao
Associate Professor