Multimodal and Large Language Model Approaches in Cybersecurity: A Systematic Review
Abstract
The rapid evolution of cyber threats demands increasingly sophisticated defensive mechanisms. In recent years, Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have gained traction as valuable tools across multiple cybersecurity domains, offering capabilities that extend well beyond traditional rule-based and classical machine learning approaches. This systematic review provides a detailed analysis of 55 research papers published between 2019 and 2025, examining the application of LLMs and multimodal AI across eight key cybersecurity domains: vulnerability detection, malware analysis, phishing detection, network intrusion detection, cyber threat intelligence, security operations, penetration testing, and deepfake detection. We present a unified taxonomy that categorizes these approaches by their architectural type, covering encoder-only models (BERT variants), decoder-only models (GPT family), and multimodal architectures, as well as by their application domains. Our comparative analysis shows that while LLMs demonstrate strong capabilities in code comprehension, threat classification, and automated security analysis, notable challenges persist in areas such as hallucination, adversarial robustness, and the dual-use nature of these technologies. We further examine the security vulnerabilities present in LLMs themselves, including prompt injection and jailbreaking attacks. This review identifies open research gaps and proposes future directions, including agentic AI workflows, privacy-preserving security models, and the development of domain-specific foundation models for cybersecurity.
This work is licensed under a Creative Commons Attribution 4.0 International License.