AUTOMATING CYBER THREAT INTELLIGENCE EXTRACTION USING NATURAL LANGUAGE PROCESSING TECHNIQUES
DOI:
https://doi.org/10.71146/kjmr498
Keywords:
Cyber Threat Intelligence, Natural Language Processing, BERT, Entity Recognition, Information Extraction, Transformer Models, Cybersecurity, Threat Detection, Text Mining, Dependency Parsing
Abstract
The growing volume and complexity of cyberattacks have made it clear that organizations must prioritize real-time, actionable, and scalable Cyber Threat Intelligence (CTI) strategies. The classical approach to CTI collection and analysis, which relies heavily on manual processing of raw, unstructured text such as threat reports, blogs, and advisories, cannot keep pace with current cybersecurity threats. This study introduces a hybrid Natural Language Processing (NLP) framework that combines state-of-the-art transformer models, namely fine-tuned BERT architectures, with syntactic dependency parsing and domain-specific rule-based post-processing to automate CTI extraction. A dataset of more than 5,000 cybersecurity documents was built with a custom annotation scheme that allows the system to extract key threat entities such as malware names, CVEs, IP addresses, threat actors, and TTPs. Experimental comparisons show that the proposed system substantially outperforms the existing BiLSTM-CRF and traditional CRF baselines, achieving an F1-score of 0.90 in entity recognition. Error analysis also showed that the syntactic and rule-based enhancements markedly reduced entity fragmentation and false positives. The paper also investigates how preprocessing, data source quality, and entity linking to external knowledge bases can improve CTI extraction. The findings demonstrate the promise of advanced NLP methods for making threat intelligence processing more accurate, faster, and more scalable in support of proactive cybersecurity defense.
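To make the rule-based side of such a pipeline concrete, the following is a minimal Python sketch of pattern-based extraction for two of the entity types named in the abstract, CVE identifiers and IP addresses. It is not the authors' rule set: the function name `extract_indicators`, the regular expressions, and the output layout are assumptions chosen for illustration, and a system like the one described would combine rules of this kind with the predictions of a fine-tuned BERT tagger rather than rely on them alone.

```python
import ipaddress
import re

# Hypothetical patterns for illustration; not taken from the paper.
# CVE identifiers: "CVE-", a 4-digit year, then a sequence number of 4+ digits.
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
# Loose IPv4 candidate: four dot-separated groups of 1-3 digits.
IPV4_CANDIDATE_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_indicators(text):
    """Return CVE IDs and valid IPv4 addresses found in free text."""
    # Normalize case and deduplicate CVE identifiers.
    cves = sorted({m.group(0).upper() for m in CVE_RE.finditer(text)})

    ips = []
    for match in IPV4_CANDIDATE_RE.finditer(text):
        candidate = match.group(0)
        try:
            # Reject candidates such as 999.1.1.1 that only look like addresses.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        if candidate not in ips:
            ips.append(candidate)

    return {"CVE": cves, "IP": ips}


if __name__ == "__main__":
    sample = (
        "The actor exploited cve-2021-44228 and connected to 203.0.113.7; "
        "the string 999.1.1.1 is not a valid address."
    )
    print(extract_indicators(sample))
```

Validating candidates with the standard-library `ipaddress` module, instead of a stricter regular expression, keeps the pattern readable while still filtering out one common class of false positives. Dotted version strings such as `1.2.3.4` would still pass this check, which is the kind of ambiguity that contextual models are better placed to resolve.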
License
Copyright (c) 2025 Amjad Jumani, Amber Baig, Engr. Dr. Shamim Akhtar, Muhammad Shahmir Shamim, Hira Zaheer, Areej Changaiz (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.