DETECTING PHISHING ATTACKS IN CYBERSECURITY USING MACHINE LEARNING WITH DATA PREPROCESSING AND FEATURE ENGINEERING
DOI:
https://doi.org/10.71146/kjmr335
Keywords:
Phishing Detection, Email Security, Ensemble Learning, Fraud Detection, Spam Filtering
Abstract
Phishing attacks are among the most persistent cybersecurity threats and evolve rapidly to bypass traditional security measures. Because email is widely used for sensitive communications, reliable detection of phishing attempts is critical. This study evaluates several machine learning models for classifying phishing emails on a dataset of 39,000 samples. To improve accuracy, we apply preprocessing steps including feature engineering, vectorization, and class balancing with SMOTE (Synthetic Minority Over-sampling Technique). We compare Random Forest, XGBoost, Logistic Regression, Naïve Bayes, and AdaBoost using precision, recall, F1-score, and accuracy. The results show that ensemble learning techniques, particularly XGBoost and Random Forest, significantly outperform the other models, achieving accuracy as high as 99.00%. These findings reinforce the importance of advanced classification techniques and data preprocessing in phishing detection. Beyond its academic implications, this research contributes to strengthening email security, mitigating financial losses, and protecting personal data from cyber threats. Future work could integrate deep learning models and real-time detection systems to further improve accuracy and adaptability.
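Two of the steps named in the abstract, class balancing with SMOTE and the four evaluation metrics, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `smote_oversample` and `classification_metrics`, the neighbour count `k`, and the toy feature vectors and labels are all assumptions made up for illustration. In practice a library implementation of SMOTE would normally be applied to the vectorized training split only.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea).
    Hypothetical helper for illustration only."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1-score and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy minority-class feature vectors (invented values, not from the study).
minority = [[0.9, 0.8], [1.0, 1.0], [0.8, 1.1], [1.1, 0.9]]
new_points = smote_oversample(minority, n_new=6)

# Toy labels: 1 = phishing, 0 = legitimate (invented for illustration).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = classification_metrics(y_true, y_pred)
print(len(new_points), scores)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling adds no information beyond the existing minority region. This is one reason to report precision and recall alongside accuracy.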
License
Copyright (c) 2025 Sohaib Latif, Saher Pervaiz (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.