Deepfake Speech Detection: Identifying AI-Generated and Real Human Voices Using Hybrid Convolutional Neural Network and Long Short-Term Memory Model
Marc P. Laureta | John Maynardk M. Atienza | John Lemuel B. Tapel
Discipline: Artificial Intelligence
Abstract:
This study explored deepfake audio detection using English and Tagalog datasets to enhance multilingual speech classification. The rise of synthetic media, particularly deepfake audio, raises concerns about misinformation, security, and authenticity. To address this, the researchers developed a web-based detection system using a hybrid Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) model, which captured both spatial and temporal features for accurate classification. The approach used Mel spectrograms as input, convolutional layers to extract spatial patterns, and an LSTM network to model temporal dependencies. Trained on an augmented dataset of over 176,000 samples and fine-tuned using TensorFlow, the model achieved 98.65% accuracy, with a precision of 98.60% and a recall of 98.76%. The system employed class weighting to address class imbalance and used mixed-precision training for efficiency. Its architecture included Conv2D layers with Batch Normalization and MaxPooling, followed by TimeDistributed Dense layers and an LSTM for sequential modeling. Regularization and training callbacks were used to optimize performance, which was evaluated using accuracy, precision, recall, F1-score, and a confusion matrix. Results confirmed the model's efficacy in distinguishing real from AI-generated voices, which can help mitigate risks from synthetic speech. Future work may broaden dataset diversity and optimize system responsiveness for wider real-world implementation.
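The class weighting and evaluation metrics named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the abstract does not state which weighting scheme was used, so the "balanced" heuristic below (weight = n_samples / (n_classes * class_count)) is an assumption, as are the function names, the label convention (1 = AI-generated, 0 = real), and the toy labels.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Assumed 'balanced' heuristic: n_samples / (n_classes * class_count).

    Rarer classes receive larger weights so they contribute more to the loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels only (hypothetical): 1 = AI-generated, 0 = real human voice.
    y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

    print(balanced_class_weights(y_true))
    print(binary_metrics(y_true, y_pred))
```

In a Keras workflow, a dictionary of this shape is what would typically be passed as the `class_weight` argument to `model.fit`; whether the study did so is not stated in the abstract.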
ISSN 3082-3684 (Online)
ISSN 3082-3676 (Print)