Isabela State University Linker: Journal of Engineering, Computing and Technology, vol. 2 no. 1 (2025)

Deepfake Speech Detection: Identifying AI-Generated and Real Human Voices Using Hybrid Convolutional Neural Network and Long Short-Term Memory Model

Marc P. Laureta | John Maynardk M. Atienza | John Lemuel B. Tapel

Discipline: Artificial Intelligence

 

Abstract:

This study explored deepfake audio detection using English and Tagalog datasets to enhance multilingual speech classification. The rise of synthetic media, particularly deepfake audio, raises concerns about misinformation, security, and authenticity. To address this, the researchers developed a web-based detection system built on a hybrid Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) model, which captured both spatial and temporal features for classification. The approach used Mel spectrograms, with convolutional layers capturing spatial patterns and LSTM networks capturing temporal dependencies. Trained on an augmented dataset of over 176,000 samples and fine-tuned using TensorFlow, the model achieved 98.65% accuracy, with a precision of 98.60% and a recall of 98.76%. Training used class weighting to address class imbalance and mixed-precision training for efficiency. The architecture comprised Conv2D layers with Batch Normalization and MaxPooling, followed by TimeDistributed Dense layers and an LSTM for sequential modeling. Regularization and training callbacks were applied to improve performance, which was evaluated using accuracy, precision, recall, F1-score, and a confusion matrix. The results indicate that the model distinguishes real from AI-generated voices effectively, which can help mitigate risks from synthetic speech. Future work may broaden dataset diversity and improve system responsiveness for wider real-world deployment.
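The class weighting and evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The function names, the toy labels, the choice of label 1 for AI-generated audio, and the "balanced" weighting formula (n_samples / (n_classes * class_count)) are all assumptions made for illustration, since the abstract does not state which weighting scheme was used.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Assumed scheme: weight = n_samples / (n_classes * class_count).

    Rarer classes get larger weights, so their errors count more in the loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Build the 2x2 confusion counts and derive the summary metrics."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


if __name__ == "__main__":
    # Hypothetical toy labels: 1 = AI-generated, 0 = real human voice.
    y_true = [1, 1, 1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 1, 1, 0, 0, 1]
    print(balanced_class_weights(y_true))
    print(binary_metrics(y_true, y_pred))
    # F1 implied by the precision and recall reported in the abstract.
    print(round(f1_score(0.9860, 0.9876), 4))
```

Applied to the reported precision (98.60%) and recall (98.76%), the harmonic mean gives an F1-score of roughly 98.68%. In a Keras workflow, a dictionary like the one returned by `balanced_class_weights` is the kind of value typically passed as the `class_weight` argument of `Model.fit`.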


