Improving Spam Detection for German Users: A Machine Learning Approach to German Email Classification
DOI:
https://doi.org/10.71146/kjmr487
Keywords:
Deutsch E-Mail Klassifizierung, ham und spam Klassifikator, Deutsch E-Mail classification, Textklassifikation, Spam-Erkennung, Automatische E-Mail-Sortierung
Abstract
The proliferation of unsolicited and potentially harmful emails has necessitated the development of robust email classification systems. This study focuses on the classification of German-language emails using the CODEAALTAG dataset, a comprehensive collection of both legitimate (ham) and unwanted (spam) emails.
Using this dataset, we apply several machine learning algorithms, including Naive Bayes, Support Vector Machines (SVM), Random Forests, and deep learning models, to distinguish between ham and spam emails. The CODEAALTAG dataset is curated and covers a wide range of attributes, including content-based features, header information, and technical metadata.
We evaluate the performance of these classification techniques using standard metrics such as accuracy, precision, recall, and F1-score. Our findings indicate that advanced feature selection methods and ensemble learning approaches significantly enhance classification accuracy.
The results demonstrate the efficacy of the CODEAALTAG dataset in training and validating high-performance email classifiers, thereby contributing to improved email security and user experience. This study underscores the importance of specialized datasets like CODEAALTAG in advancing the field of email filtering and provides valuable insights for future research and development in spam detection technologies.
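The four evaluation metrics named in the abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: it assumes spam is treated as the positive class, and the function name `binary_metrics` and the toy labels are hypothetical, invented only for illustration.

```python
def binary_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for a binary task.

    Assumption: `positive` marks the class of interest (here spam);
    every other label is treated as negative (ham).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    # Guard each ratio against a zero denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, made up for illustration (not taken from CODEAALTAG).
y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]
metrics = binary_metrics(y_true, y_pred)
```

In spam filtering, precision and recall are usually read together: a false positive sends a legitimate email to the spam folder, while a false negative lets spam through, so accuracy alone can hide the trade-off between the two.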
License
Copyright (c) 2025 Kashif Iqbal, Muhammad Khalid, Shamim Akhtar, Sajid Yasin, Noor Ahmed, Aqsa Shahid (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.