Hybrid Convolutional-Recurrent Neural Networks (CNN-RNN) Model with Temporal Attention and Particle Swarm Optimization for Deepfake Video Detection
Jeremias C. Esperanza | Jean Fidelio E. Marquez | Ron Anthony A. Sy
Discipline: Artificial Intelligence
Abstract:
The rapid advancement of deepfake technology poses a growing threat to information integrity and online security. This research proposed an efficient deepfake video detection framework that integrates Convolutional Neural Networks (CNNs) for spatial feature extraction, Recurrent Neural Networks (RNNs) with a temporal attention mechanism for modeling sequential dependencies, and Particle Swarm Optimization (PSO) for hyperparameter tuning. The pipeline comprised frame extraction, face alignment, and per-frame feature extraction using a pre-trained CNN, followed by an RNN whose attention mechanism emphasizes the frames that carry critical temporal artifacts. PSO further improved model performance by optimizing key hyperparameters such as the learning rate and hidden dimensions. To evaluate the proposed model, a comparative analysis was conducted against existing deepfake detection methods, including XceptionNet, LSTM with frame-level features, and CNN-GRU without attention. The proposed CNN-RNN model with temporal attention and PSO outperformed these baselines and showed improved generalization and reliability, particularly in reducing false negatives. These results indicate that it is a robust candidate for real-world media forensics and platform integrity applications.
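The two mechanisms named in the abstract, temporal attention over per-frame features and PSO-based hyperparameter search, can be illustrated with a minimal sketch. This is not the authors' implementation. It uses only the Python standard library, and every name in it (temporal_attention, pso_minimize, toy_validation_loss, SEARCH_BOUNDS), the PSO coefficients, and the search ranges are assumptions made for illustration. In particular, toy_validation_loss is a hypothetical stand-in for training the CNN-RNN and returning its validation loss, and the attention scores come from a fixed weight vector rather than learned parameters.

```python
import math
import random


def temporal_attention(frame_features, score_weights):
    """Softmax-weighted average of per-frame feature vectors."""
    # One scalar relevance score per frame (dot product with a weight vector).
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in frame_features]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    pooled = [sum(a * f[i] for a, f in zip(weights, frame_features)) for i in range(dim)]
    return pooled, weights


def pso_minimize(objective, bounds, n_particles=12, n_iters=40,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO over a box-constrained search space."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(p) for p in pos]       # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Move, then clamp the particle back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val


def toy_validation_loss(params):
    """Hypothetical stand-in for 'train the model, return validation loss'."""
    log10_lr, hidden_dim = params
    return (log10_lr + 3.0) ** 2 + ((hidden_dim - 256.0) / 256.0) ** 2


# Assumed search space: log10(learning rate) in [-5, -1], hidden size in [64, 512].
SEARCH_BOUNDS = [(-5.0, -1.0), (64.0, 512.0)]

if __name__ == "__main__":
    features = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]  # three frames, two features each
    pooled, weights = temporal_attention(features, [1.0, 1.0])
    print("attention weights:", weights)
    best, loss = pso_minimize(toy_validation_loss, SEARCH_BOUNDS)
    print("best (log10 lr, hidden dim):", best, "toy loss:", loss)
```

In a real setting the objective would train and validate the detector for each candidate, which is expensive, so the swarm size and iteration count would likely be kept small. The hidden dimension is searched here as a continuous value and would need rounding to an integer before building a network.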
ISSN 3082-3684 (Online)
ISSN 3082-3676 (Print)