Improving Spam Detection for German Users: A Machine Learning Approach to German Email Classification

Kashif Iqbal; Muhammad Khalid; Shamim Akhtar; Sajid Yasin; Noor Ahmed; Aqsa Shahid

doi:10.71146/kjmr487

Authors

Kashif Iqbal, Computer Science Department, Greenwich University, Karachi, Pakistan
Muhammad Khalid, Computer Science Department, Greenwich University, Karachi, Pakistan
Shamim Akhtar, Faculty of Engineering Science and Technology, IQRA University, Karachi
Sajid Yasin, Computer Science Department, Greenwich University, Karachi, Pakistan
Noor Ahmed, Computer Science, SZABIST, Street, Karachi, 10587, Sindh, Pakistan
Aqsa Shahid, Department of Computer Science & Software Engineering, Ziauddin University, Karachi, Pakistan

Keywords:

German email classification, ham and spam classifier, text classification, spam detection, automatic email sorting

Abstract

The proliferation of unsolicited and potentially harmful emails has necessitated the development of robust email classification systems. This study focuses on the classification of German-language emails using the CODEAALTAG dataset, a comprehensive collection of both legitimate (ham) and unwanted (spam) emails.

By leveraging this dataset, we apply various machine learning algorithms—including Naive Bayes, Support Vector Machines (SVM), Random Forests, and deep learning models—to accurately distinguish between ham and spam emails. The CODEAALTAG dataset is meticulously curated and features a wide array of attributes, including content-based features, header information, and technical metadata.
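As a rough illustration of the simplest classifier named above, the following sketch trains a multinomial Naive Bayes model with add-one (Laplace) smoothing on bag-of-words counts. The four German messages are invented for this example and are not drawn from CODEAALTAG. The function names `tokenize`, `train`, and `predict` are hypothetical, and the whitespace tokenizer is an assumption; the features and preprocessing actually used in the study are not described on this page.

```python
import math
from collections import Counter, defaultdict

# Toy, made-up German messages for illustration only (NOT from CODEAALTAG).
TRAIN = [
    ("Gewinnen Sie jetzt einen Gutschein gratis", "spam"),
    ("Jetzt gratis Kredit sichern", "spam"),
    ("Besprechung morgen um zehn Uhr", "ham"),
    ("Anbei das Protokoll der Besprechung", "ham"),
]


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately naive tokenizer)."""
    return text.lower().split()


def train(examples):
    """Collect per-class word counts, per-class document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior under add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Smoothed log likelihood of the token given the label.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict(model, "Gratis Gutschein jetzt"))
print(predict(model, "Protokoll der Besprechung"))
```

A real system would replace the toy list with labelled emails and would likely add header and metadata features alongside the word counts, as the abstract suggests.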

We evaluate the performance of these classification techniques using standard metrics such as accuracy, precision, recall, and F1-score. Our findings indicate that advanced feature selection methods and ensemble learning approaches significantly enhance classification accuracy.
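The four metrics mentioned above can be computed from the confusion counts of a binary classifier. The sketch below is a minimal illustration that treats spam as the positive class; the function name `binary_metrics` and the label lists are hypothetical and do not reproduce any result from the study.

```python
def binary_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    tn = sum(t != positive and p != positive for t, p in pairs)  # ham passed through

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration: 3 spam and 5 ham messages.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this made-up case one spam message is missed and one ham message is wrongly flagged, so accuracy is 0.75 while precision, recall, and F1 are each 2/3. For spam filtering, precision is often watched closely because a false positive hides a legitimate email from the user.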

The results demonstrate the efficacy of the CODEAALTAG dataset in training and validating high-performance email classifiers, thereby contributing to improved email security and user experience. This study underscores the importance of specialized datasets like CODEAALTAG in advancing the field of email filtering and provides valuable insights for future research and development in spam detection technologies.

Published

2025-06-01

Issue

Vol. 2 No. 06 (2025): Jun 2025

Section

Engineering and Technology

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

KJMR publishes all articles as open access under CC BY 4.0, allowing anyone to share and adapt the work, even commercially, with proper credit, a license link, and clear notice of changes, without implying endorsement. Authors retain copyright while granting the journal non-exclusive publishing and archiving rights and may self-archive without embargo. Third-party material requires proper permission, and the journal ensures long-term free access through its website and archiving partners.

How to Cite

Improving Spam Detection for German Users: A Machine Learning Approach to German Email Classification. (2025). Kashf Journal of Multidisciplinary Research, 2(06), 81-99. https://doi.org/10.71146/kjmr487