ADVANCING NAMED ENTITY RECOGNITION FOR URDU: A COMPARATIVE STUDY OF MACHINE LEARNING AND DEEP LEARNING APPROACHES

Authors

  • Muhammad Ali Hassan, Department of Computer Science, University of Southern Punjab, Multan
  • Talha Farooq Khan, Department of Computer Science, University of Southern Punjab, Multan
  • Mubasher Malik, Department of Computer Science, University of Southern Punjab, Multan
  • Muhammad Sabir, Department of Computer Science, University of Southern Punjab, Multan
  • Abdul Haseeb Qureshi, Department of Computer Science, University of Southern Punjab, Multan

DOI:

https://doi.org/10.71146/kjmr536

Keywords:

Urdu Named Entity Recognition (NER), Machine Learning, Deep Learning, BiLSTM-GRU, mBERT, XLM-RoBERTa, Conditional Random Field (CRF), Logistic Regression, Support Vector Machine (SVM), Sequence Labeling, Low-Resource Languages, Natural Language Processing (NLP), Text Classification, Urdu Language Processing

Abstract

This paper evaluates both classical Machine Learning (ML) and state-of-the-art Deep Learning (DL) methods for Named Entity Recognition (NER) in Urdu, a low-resource language. The work compares Conditional Random Fields (CRF), Logistic Regression, Support Vector Machines (SVM), BiLSTM+GRU, mBERT, and XLM-RoBERTa on a cross-domain dataset of more than 1 million tokens annotated with eight entity classes. Performance is measured with standard metrics: precision, recall, F1-score, and accuracy. Among the ML models, CRF achieved the best results, with an F1-score of 0.9899 and an accuracy of 97%, while Logistic Regression and SVM lagged behind. The deep learning models outperformed the traditional approaches. Our proposed hybrid BiLSTM+GRU technique outperforms existing state-of-the-art techniques on Urdu NER, achieving an F1-score of up to 0.997, followed closely by XLM-RoBERTa and mBERT with F1-scores of 0.9969 and 0.996, respectively. A novel contribution of this paper is an in-house annotated corpus, together with training and testing the models on naturally ordered, domain-specific Urdu text. Our results indicate that transformer-based and hybrid recurrent models perform very well on under-resourced NER tasks when clean, domain-specific data is available. This work paves the way for building real-world NLP applications for under-resourced languages.
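The metrics named in the abstract can be illustrated with a minimal sketch. It assumes token-level, micro-averaged scoring in which the non-entity tag "O" is excluded from precision and recall; the abstract does not state whether the paper scores tokens or whole entities, or how it averages across the eight classes. The function name `token_prf` and the tag names in the example are hypothetical and are not taken from the paper.

```python
def token_prf(gold, pred, outside="O"):
    """Token-level precision, recall, F1, and accuracy for sequence labels.

    Assumption: micro-averaged over all entity tags, with the `outside`
    tag ignored for precision/recall. This is an illustrative scheme,
    not necessarily the one used in the paper.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # True positives: entity tokens whose predicted tag matches the gold tag.
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != outside)
    pred_entities = sum(1 for p in pred if p != outside)
    gold_entities = sum(1 for g in gold if g != outside)

    precision = tp / pred_entities if pred_entities else 0.0
    recall = tp / gold_entities if gold_entities else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Accuracy counts every token, including those tagged `outside`.
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    accuracy = correct / len(gold) if gold else 0.0
    return precision, recall, f1, accuracy


# Hypothetical five-token sentence with made-up tags.
gold = ["PER", "O", "LOC", "O", "ORG"]
pred = ["PER", "O", "O", "LOC", "ORG"]
precision, recall, f1, accuracy = token_prf(gold, pred)
```

Because accuracy also counts the "O" tokens, which usually dominate NER data, it can diverge from F1 on the same predictions; that is one reason both figures are typically reported together.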

Published

2025-07-15

Issue

Section

Engineering and Technology

How to Cite

ADVANCING NAMED ENTITY RECOGNITION FOR URDU: A COMPARATIVE STUDY OF MACHINE LEARNING AND DEEP LEARNING APPROACHES. (2025). Kashf Journal of Multidisciplinary Research, 2(07), 75-86. https://doi.org/10.71146/kjmr536
