DETECTING PHISHING ATTACKS IN CYBERSECURITY USING MACHINE LEARNING WITH DATA PREPROCESSING AND FEATURE ENGINEERING

Sohaib Latif; Saher Pervaiz

doi:10.71146/kjmr335

Authors

Sohaib Latif, Department of Computer Science and Software Engineering, Grand Asian University, Sialkot, 51310, Pakistan.
Saher Pervaiz, Department of Computer Science, The University of Chenab, Gujrat, 50700, Pakistan.

Keywords:

Phishing Detection, Email Security, Ensemble Learning, Fraud Detection, Spam Filtering

Abstract

Phishing attacks are one of the most persistent cybersecurity threats, evolving rapidly to bypass traditional security measures. Given the widespread use of email for sensitive communications, detecting phishing attempts has become more critical than ever. This study explores the effectiveness of multiple machine learning models in classifying phishing emails using a dataset of 39,000 samples. To enhance accuracy, we employ preprocessing techniques such as feature engineering, vectorization, and class balancing with SMOTE (Synthetic Minority Over-sampling Technique). Our analysis compares various models, including Random Forest, XGBoost, Logistic Regression, Naïve Bayes, and AdaBoost, evaluating their performance using precision, recall, F1-score, and accuracy metrics. The results demonstrate that ensemble learning techniques, particularly XGBoost and Random Forest, significantly outperform other models, achieving accuracy rates as high as 99.00%. These findings reinforce the importance of advanced classification techniques and data preprocessing in phishing detection. Beyond academic implications, our research contributes to strengthening email security, mitigating financial losses, and protecting personal data from cyber threats. Future work could focus on integrating deep learning models and real-time detection systems to further improve accuracy and adaptability.
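Two steps named in the abstract, class balancing with SMOTE and evaluation by precision, recall, F1-score, and accuracy, can be illustrated with a small standard-library sketch. This is not the authors' implementation, which the abstract does not describe in detail. The function names `smote_like_oversample` and `binary_metrics`, the parameter defaults, and the toy inputs are assumptions made purely for illustration, and the first function shows only the core interpolation idea behind SMOTE rather than a full library implementation.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points for the minority class.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours, which is
    the core idea of SMOTE. Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

In a pipeline like the one the abstract outlines, oversampling would normally be applied only to the training split after vectorization, so that synthetic samples never leak into the data used to compute the reported metrics. That ordering is a general practice and an assumption here, not a detail stated in the abstract.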