Deepfake Speech Detection: Identifying AI-Generated and Real Human Voices Using Hybrid Convolutional Neural Network and Long Short-Term Memory Model

Marc P. Laureta, John Maynardk M. Atienza, John Lemuel B. Tapel

Abstract:

This study explored deepfake audio detection using English and Tagalog datasets to enhance multilingual speech classification. The rise of synthetic media, particularly deepfake audio, raises concerns about misinformation, security, and authenticity. To address this, the researchers developed a web-based detection system built on a hybrid Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) model, which captured spatial and temporal features for accurate classification. The approach leveraged Mel spectrograms, convolutional layers for spatial patterns, and LSTM networks for temporal dependencies. Trained on an augmented dataset of over 176,000 samples and fine-tuned using TensorFlow, the model achieved 98.65% accuracy, with a precision of 98.60% and a recall of 98.76%. Training employed class weighting to address class imbalance and mixed precision for efficiency. The architecture comprised Conv2D layers with Batch Normalization and MaxPooling, followed by TimeDistributed Dense layers and an LSTM for sequential modeling. Regularization and callbacks were used to optimize training, and performance was evaluated using accuracy, precision, recall, F1-score, and a confusion matrix. The results confirmed the model's efficacy in distinguishing real from AI-generated voices, helping to mitigate risks from synthetic speech. Future work may broaden dataset diversity and improve system responsiveness for wider real-world deployment.
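The class weighting and evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code. The abstract does not say which weighting scheme was used, so the commonly used "balanced" heuristic (n_samples / (n_classes * class_count)) is assumed here. The label encoding (1 = AI-generated, 0 = real), the toy labels, and all function names are hypothetical. The last line computes the F1-score implied by the reported precision and recall, assuming both refer to the same positive class.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Assumed 'balanced' heuristic: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical toy labels (1 = AI-generated, 0 = real); not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

weights = balanced_class_weights(y_true)  # the minority class gets the larger weight
metrics = binary_metrics(y_true, y_pred)

# F1 implied by the precision (98.60%) and recall (98.76%) reported in the abstract.
implied_f1 = f1_from(0.9860, 0.9876)
```

On the reported figures, the implied F1-score is approximately 98.68%. In a Keras workflow, a weight dictionary of this shape is typically passed to the `class_weight` argument of `Model.fit`; whether the authors did so is not stated in the abstract.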

Isabela State University Linker: Journal of Engineering, Computing and Technology, vol. 2 no. 1 (2025), Philippine E-Journals