CONVOLUTIONAL NEURAL NETWORK ARCHITECTURES FOR ROBUST IMAGE FEATURE REPRESENTATION IN RETRIEVAL SYSTEMS
DOI: https://doi.org/10.71146/kjmr840

Keywords: Image descriptors, GoogleNet, Feature extraction, Image retrieval, Image classification

Abstract
Machine learning techniques have become fundamental to modern image classification and retrieval systems, enabling the extraction of meaningful and discriminative visual features from large image collections. In this research, we propose an efficient framework for generating image descriptors using deep Convolutional Neural Network (CNN) architectures, specifically GoogleNet, Inception V3, and DenseNet-201. These pre-trained models are used to capture multi-level visual characteristics, including fine texture patterns, shape cues, and high-level object semantics. To further strengthen the descriptive power of the extracted features, information from the three color channels is encoded and integrated, improving retrieval accuracy while preserving computational efficiency and response time.
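The channel-fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the randomly generated arrays stand in for the convolutional activations a pre-trained network would produce on each color channel, and the function names (`channel_descriptor`, `fused_descriptor`) are hypothetical.

```python
import numpy as np

def channel_descriptor(feature_maps):
    """Global-average-pool a (C, H, W) stack of feature maps into a
    C-dimensional vector, then L2-normalize it."""
    v = feature_maps.mean(axis=(1, 2))
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def fused_descriptor(maps_r, maps_g, maps_b):
    """Concatenate the per-channel descriptors into one image signature,
    mirroring the three-channel encoding-and-integration step."""
    return np.concatenate(
        [channel_descriptor(m) for m in (maps_r, maps_g, maps_b)]
    )

rng = np.random.default_rng(0)
# Stand-ins for CNN feature maps computed on the R, G, and B channels.
maps = [rng.random((64, 7, 7)) for _ in range(3)]
sig = fused_descriptor(*maps)
print(sig.shape)  # (192,)
```

In practice the stand-in arrays would be replaced by activations taken from an intermediate layer of GoogleNet, Inception V3, or DenseNet-201; the pooling-and-concatenation pattern is unchanged.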
As images propagate through the hierarchical layers of the CNNs, increasingly abstract feature maps are produced, forming distinctive feature signatures that represent the visual content. These signatures are then reorganized into a newly constructed feature matrix designed to encode spatial relationships, chromatic properties, and latent structural patterns within the image. This enriched representation provides a more holistic description of image content, making it suitable for content-based image retrieval (CBIR) and visual similarity analysis.
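The retrieval side of the pipeline, in which per-image signatures are stacked into a feature matrix and ranked by similarity to a query, can be sketched as below. The helper names and the use of cosine similarity are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def build_feature_matrix(descriptors):
    """Stack per-image signatures row-wise into the retrieval matrix,
    row-normalized so that dot products equal cosine similarity."""
    M = np.vstack(descriptors)
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def retrieve(query, matrix, k=3):
    """Return indices of the k database images most similar to the query."""
    q = query / np.linalg.norm(query)
    sims = matrix @ q          # cosine similarities against every row
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(1)
db = [rng.random(192) for _ in range(10)]  # toy database of signatures
M = build_feature_matrix(db)
idx = retrieve(db[4], M, k=3)
print(idx[0])  # the query's own signature ranks first: 4
```

Querying with a signature already in the database returns that image first, which is the usual sanity check for a CBIR similarity index.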
To validate the effectiveness and generalization capability of the proposed method, experiments were conducted on four widely recognized benchmark datasets: Corel-1K, CIFAR-10, 17-Flowers, and ZuBuD. The evaluation considered both retrieval precision and computational performance across datasets with varying resolutions, object categories, and scene complexities. Experimental results indicate that all three CNN architectures produce discriminative descriptors; however, DenseNet-201 consistently delivered superior performance on the CIFAR-10 dataset, which includes diverse object classes and varying image scales. Its densely connected architecture promotes improved feature reuse and gradient flow, leading to higher classification accuracy and more robust retrieval outcomes compared to GoogleNet and Inception V3. Overall, the proposed CNN-based descriptor generation framework demonstrates strong potential for scalable and accurate image retrieval applications in multimedia databases and intelligent visual search systems.
License
Copyright (c) 2026 Salahuddin, Muhammad Faisal Sohail, Hassan Raza Khan, Muhammad Tanveer Meeran (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
