AUTOMATING CYBER THREAT INTELLIGENCE EXTRACTION USING NATURAL LANGUAGE PROCESSING TECHNIQUES

Authors

  • Amjad Jumani, Lecturer, Faculty of Science and Technology, Ilma University, Karachi
  • Amber Baig, Department of Computer Science, Faculty of Engineering, Science & Technology, Isra University, Hyderabad
  • Engr. Dr. Shamim Akhtar, Adjunct Professor, Department of Information Systems and Cybersecurity, University of Commonwealth Caribbean
  • Muhammad Shahmir Shamim, Student, University of California Irvine
  • Hira Zaheer, UET Lahore
  • Areej Changaiz, MSCS Computer Science, MYU University

DOI:

https://doi.org/10.71146/kjmr498

Keywords:

Cyber Threat Intelligence, Natural Language Processing, BERT, Entity Recognition, Information Extraction, Transformer Models, Cybersecurity, Threat Detection, Text Mining, Dependency Parsing

Abstract

The increasing frequency and complexity of cyberattacks have made it clear that organizations need real-time, actionable, and scalable Cyber Threat Intelligence (CTI) capabilities. The classical approach to CTI collection and analysis relies heavily on manual review of raw, unstructured text such as threat reports, blogs, and advisories, and cannot keep pace with current cybersecurity threats. This study introduces a hybrid Natural Language Processing (NLP) framework that combines state-of-the-art transformer models, namely fine-tuned BERT architectures, with syntactic dependency parsing and domain-specific rule-based post-processing to automate CTI extraction. A custom-labeled dataset of more than 5,000 cybersecurity documents was created to train the system to extract key threat entities such as malware names, CVEs, IP addresses, threat actors, and TTPs. Experimental comparisons show that the proposed system substantially outperforms the BiLSTM-CRF and traditional CRF baselines, achieving an F1-score of 0.90 in entity recognition. Error analysis also showed that the syntactic and rule-based enhancements substantially reduced entity fragmentation and false positives. The paper further investigates how preprocessing, data source quality, and entity linking to external knowledge bases contribute to effective CTI extraction. The findings demonstrate the promise of advanced NLP methods for making CTI processing more accurate, faster, and more scalable in support of proactive cybersecurity defense.
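
To make the rule-based post-processing idea concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not state the actual rules, so the patterns, the function names (`extract_cves`, `extract_ipv4`, `merge_adjacent_spans`), and the one-character merge threshold are all assumptions chosen for illustration. It shows two things rules of this kind might plausibly do: pick out well-structured indicators (CVE identifiers, IPv4 addresses) and merge fragmented entity spans of the kind a token classifier can produce.

```python
import ipaddress
import re

# Hypothetical patterns for illustration only; the paper's rules are not
# described in the abstract.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_cves(text):
    """Return unique CVE identifiers in order of first appearance, upper-cased."""
    seen = []
    for match in CVE_PATTERN.finditer(text):
        cve = match.group(0).upper()
        if cve not in seen:
            seen.append(cve)
    return seen


def extract_ipv4(text):
    """Return unique, syntactically valid IPv4 addresses in order of first appearance."""
    seen = []
    for match in IPV4_CANDIDATE.finditer(text):
        candidate = match.group(0)
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # looks like an address but an octet is invalid
        if candidate not in seen:
            seen.append(candidate)
    return seen


def merge_adjacent_spans(spans):
    """Merge same-label spans that touch or are separated by one character.

    `spans` is a list of (start, end, label) tuples sorted by start, a
    simplified stand-in for fragmented token-classifier output.
    """
    merged = []
    for start, end, label in spans:
        if merged and merged[-1][2] == label and start - merged[-1][1] <= 1:
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]), label)
        else:
            merged.append((start, end, label))
    return merged
```

A purely pattern-based rule like the IPv4 one will also match four-part version strings, which is one reason such rules are usually paired with a learned model and contextual checks rather than used alone.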

Published

2025-06-25

Section

Engineering and Technology

How to Cite

AUTOMATING CYBER THREAT INTELLIGENCE EXTRACTION USING NATURAL LANGUAGE PROCESSING TECHNIQUES. (2025). Kashf Journal of Multidisciplinary Research, 2(06), 184-201. https://doi.org/10.71146/kjmr498
