Hybrid Convolutional-Recurrent Neural Networks (CNN-RNN) Model with Temporal Attention and Particle Swarm Optimization for Deepfake Video Detection

Jeremias C. Esperanza, Jean Fidelio E. Marquez, Ron Anthony A. Sy

Abstract:

The rapid advancement of deepfake technology poses a growing threat to information integrity and online security. To address this threat, this research proposes an efficient deepfake video detection framework that combines Convolutional Neural Networks (CNNs) for spatial feature extraction, Recurrent Neural Networks (RNNs) with a temporal attention mechanism for modeling sequential dependencies across frames, and Particle Swarm Optimization (PSO) for hyperparameter tuning. The pipeline extracts frames from each video, aligns the faces, and encodes each face with a pre-trained CNN; an RNN then processes the resulting feature sequence, with attention emphasizing the frames that carry critical temporal artifacts. PSO further improves performance by optimizing key hyperparameters such as the learning rate and the hidden dimension. To evaluate the proposed model, it was compared against existing deepfake detection methods: XceptionNet, an LSTM over frame-level features, and a CNN-GRU without attention. The proposed CNN-RNN model with temporal attention and PSO outperformed these baselines, showing improved generalization and reliability, particularly in reducing false negatives, which makes it a robust solution for real-world media forensics and platform integrity.
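The abstract names two mechanisms without spelling them out: temporal attention, which lets the RNN weight some frames more heavily than others, and PSO, which searches hyperparameters such as the learning rate and hidden dimension. The dependency-free Python below is a minimal sketch of both under stated assumptions, not the authors' implementation. It assumes dot-product attention scoring with a learned vector `w` (the paper's exact scoring function is not given here), a standard global-best PSO with an inertia weight, and a hypothetical `surrogate_val_loss` that stands in for training the CNN-RNN and measuring validation loss. All function names, bounds, and coefficient values are illustrative.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(hidden_states, w):
    """Pool per-frame RNN hidden states into one clip-level vector.

    Each hidden state h_t is scored as w . h_t (assumed dot-product scoring),
    the scores are softmax-normalised over time into weights alpha_t, and the
    clip representation is sum_t alpha_t * h_t. Returns (context, weights).
    """
    scores = [sum(h_d * w_d for h_d, w_d in zip(h, w)) for h in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states)) for d in range(dim)]
    return context, alphas


def surrogate_val_loss(params):
    """Hypothetical stand-in for 'train the CNN-RNN, return validation loss'.

    params = [log10(learning_rate), hidden_dim]; the bowl's minimum at
    lr = 10**-3.5 and hidden_dim = 128 is invented for illustration.
    """
    log_lr, hidden = params[0], round(params[1])  # hidden size must be an integer
    return (log_lr + 3.5) ** 2 + ((hidden - 128) / 64) ** 2


def pso_minimize(objective, bounds, n_particles=10, n_iters=50,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO with inertia weight; positions are clipped to bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = momentum + pull toward own best + pull toward swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    frames = [[0.1, 0.4], [0.9, 0.2], [0.3, 0.8]]   # toy hidden states, 3 frames
    ctx, weights = temporal_attention_pool(frames, w=[1.0, -0.5])
    print("attention weights:", [round(a, 3) for a in weights])
    best, loss = pso_minimize(surrogate_val_loss, bounds=[(-5.0, -2.0), (32.0, 256.0)])
    print(f"lr ~ {10 ** best[0]:.2e}, hidden = {round(best[1])}, loss = {loss:.4f}")
```

Searching the learning rate on a log10 scale keeps PSO's linear velocity updates meaningful across orders of magnitude, and rounding inside the objective is one simple way to handle an integer hyperparameter such as the hidden size. In a real pipeline each objective call would train and validate the model, so the swarm size and iteration count would need to be chosen with that cost in mind.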

Isabela State University Linker: Journal of Engineering, Computing and Technology, vol. 2, no. 1 (2025), hosted on Philippine E-Journals.