Machine Learning-Based Fruit Pesticide Screening for Identifying Edible-Quality Fruit: An Experimental Study

Premier Science

Gunapriya Devarajan¹, Pushpalatha Naveenkumar¹, Ramkumar Balasubramaniyam², Arrthi Murughanandham¹, Balaji Pandiyan¹, Karthika Senathipathi¹ and Varshan Thangaraj¹
1. Department of Electrical and Electronics Engineering, Sri Eshwar College of Engineering, Coimbatore, Tamil Nadu, India
2. Department of Electrical and Electronics Engineering, VSB College of Engineering Technical Campus, Coimbatore, Tamil Nadu, India
Correspondence to: Gunapriya Devarajan, gunapriya.d@sece.ac.in

DOI: https://doi.org/10.70389/PJS.100251

Additional information

Ethical approval: N/a
Consent: N/a
Funding: No industry funding
Conflicts of interest: N/a
Author contribution: Gunapriya Devarajan, Pushpalatha Naveenkumar, Ramkumar Balasubramaniyam, Arrthi Murughanandham, Balaji Pandiyan, Karthika Senathipathi, Varshan Thangaraj – Conceptualization, Writing – original draft, review and editing
Guarantor: Gunapriya Devarajan
Provenance and peer-review: Unsolicited and externally peer-reviewed
Data availability statement: N/a

Keywords: Banana pesticide monitoring, Hyperspectral imaging CNN, Electronic-nose VOC analysis, Raspberry Pi embedded detector, CNN-SVM hybrid classifier.

Peer Review
Received: 15 August 2025
Last revised: 31 October 2025
Accepted: 17 December 2025
Version accepted: 4
Published: 17 January 2026

“Medium-density infographic illustrating a machine learning–based fruit pesticide screening system for identifying edible-quality bananas. The visual presents a portable Raspberry Pi–based detection setup using a Pi camera and chemical VOC sensor, combined with a hybrid CNN and SVM model. It highlights experimental results showing 94.20% accuracy with CNN, 96.00% accuracy with SVM, real-time detection with LCD display and buzzer alerts, and validation against GC–MS reference standards following WHO residue limits.”

Abstract

Pesticide residues on agricultural produce pose considerable health risks, necessitating the development of rapid, accurate, and accessible detection solutions to enhance food safety. This paper presents the design and implementation of a portable, low-cost system for onsite pesticide residue detection in bananas. The proposed system is built around a Raspberry Pi microprocessor integrated with a Pi camera for image acquisition and a chemical sensor for volatile organic compound (VOC) analysis. A hybrid deep learning framework comprising Convolutional Neural Networks (CNNs) and Support Vector Machines (SVMs) is employed, trained on a custom dataset consisting of 4,800 images and 4,800 VOC samples obtained from both untreated and pesticide-exposed bananas under natural lighting conditions.

The embedded system performs real-time data acquisition and processing, achieving detection accuracies of 94.20% using CNN and 96.00% using SVM, with an average inference time of approximately 1.60 seconds per sample. Detection results are displayed on a 16 × 2 LCD module, and an alert is issued via a buzzer when pesticide concentrations exceed a predefined safety threshold. Ground truth validation of pesticide residues was performed using an Agilent 7890A GC system with a 5975C Mass Selective Detector, with residue thresholds based on WHO Maximum Residue Limits. Experimental validation confirms the system’s effectiveness in combining visual and chemical data for improved detection accuracy. Future work will focus on enhancing sensor specificity and optimizing the classification models to support broader agricultural applications.

Introduction

The increasing global demand for high-quality, safe, and sustainable fruit production has heightened the urgency for efficient pesticide residue detection in agricultural commodities. Bananas (Musa spp.), one of the most widely consumed and traded fruits globally, are particularly prone to excessive pesticide application due to their susceptibility to pests and diseases such as black sigatoka and the banana weevil. Improper pesticide use not only poses serious risks to human health and environmental sustainability but also threatens compliance with international trade regulations. Conventional analytical techniques such as gas chromatography–mass spectrometry (GC-MS) provide accurate results but are inherently limited by their high cost, lengthy processing time, and requirement for specialized laboratory environments, rendering them unsuitable for real-time, field-level applications. Recent advancements in artificial intelligence (AI) have demonstrated significant potential in developing rapid, non-destructive, and cost-effective approaches for pesticide residue detection.

AI-based systems, utilizing machine learning (ML) and deep learning (DL) algorithms, are capable of processing complex datasets from imaging, spectroscopy, or sensor modalities to identify contamination with high accuracy. This study investigates AI-driven techniques for pesticide residue detection in bananas, with the objective of enhancing food safety, promoting sustainable agricultural practices, and enabling precision farming. A review of existing methodologies is presented, followed by the implementation and evaluation of two AI models trained on curated datasets to address the limitations of traditional detection methods in banana production.

Literature Survey

According to the article “Marketing Research on the Indian Market,”¹ bananas (Musa paradisiaca L.) are the most widely consumed fruit in India. Bananas are a valuable tropical fruit and a dietary staple in many parts of the world.¹ They provide calories, natural sugars, dietary fiber, and essential vitamins and minerals while being naturally low in fat. Bananas are also an important source of potassium and contain large amounts of phosphorus, magnesium, carbohydrates, and protein, with relatively few calories, plus vitamins C, B6, and B1.^2,3

When bananas are harvested too early, their texture and sugar content vary. When natural ripening is left uncontrolled, however, the fruit may over-ripen or spoil. According to BAS EN 15662:2011, the detection of multiple pesticide residues in bananas has been performed using GC-MS. Specifically, the Agilent 7890A GC system, coupled with the Agilent 5975C Series Mass Selective Detector (MSD), has been employed for residue analysis in banana samples.² With the rise in banana consumption and competitive market pressures, producers have increasingly adopted chemical ripening methods to accelerate the ripening process and reduce time to market.^2,4–7 Ethylene gas, a naturally occurring plant hormone, is also widely utilized in controlled ripening environments as a safer alternative to chemical ripening agents.^8–12

Near-Infrared spectroscopy has been widely adopted for the rapid assessment of moisture, protein, and fat content in a variety of agricultural and food products. Its applications extend across multiple industries including polymers, food processing, textiles, pharmaceuticals, and agriculture where it serves both qualitative and quantitative analytical purposes.^13–17 The presence of toxic pesticide residues adversely affects not only agricultural productivity but also compromises the quality of end products in these sectors. Consequently, the accurate detection of pesticide residues during production processes is critical for ensuring food safety and regulatory compliance. One of the primary challenges in pesticide residue detection lies in distinguishing between healthy and contaminated fruits, particularly when their external characteristics are visually similar.

Recent advances in DL, particularly Deep Convolutional Neural Networks (CNNs), have enabled the development of predictive models capable of classifying fruit quality using standard photographic data.^18–21 Although chemical ripening methods are widely employed to increase fruit availability and expedite distribution, concerns persist regarding their impact on nutritional quality and food safety. Studies have indicated that chemically ripened bananas and papayas may exhibit reduced levels of nutrients and antioxidants compared to naturally ripened counterparts. Unlike conventional chemical and sensor-based detection systems, the present study proposes an AI-based approach that offers a scalable, efficient, and non-destructive solution for monitoring food quality. This method enhances consumer safety and facilitates compliance with regulatory standards while improving the traceability of agricultural products.^22,23

In recent years, studies have begun applying DL methods, particularly CNNs, to hyperspectral imaging and spectral analysis of pesticide residues in fruits and vegetables. Advances in edge AI hardware are making pesticide detection systems more efficient and robust, enabling real-time, on-site detection from both spectral and image data. Prior work has shown that spectral preprocessing and data fusion techniques can improve detection sensitivity, and emerging studies suggest that AI can serve not only as an analytical tool but also as a means of improving safety in agriculture.

Proposed Method

Proposed Artificial Intelligence-Based Algorithms for Pesticide Residue Detection in Bananas

In this research, pesticide residue levels on bananas are evaluated using two AI algorithms: CNNs and Support Vector Machines (SVMs). Both algorithms have proven highly effective in fruit classification tasks and in other application areas, and both are directly applicable to the problem at hand. We selected them for their established ability to recognize and classify patterns in agricultural datasets. This section describes how the two proposed AI models were constructed and implemented using the training and testing datasets, together with the metrics used to assess their performance.

The CNN model consisted of three convolutional layers with 3 × 3 filters and 32, 64, and 128 channels, together with max-pooling and ReLU activation layers, and two fully connected layers with dropout for classification. The hyperparameters of the model were optimized using a grid search, which yielded a learning rate of 0.001 for the Adam optimizer. The SVM used a radial basis function (RBF) kernel with parameters C = 1.0 and γ = “scale”. Since the image and volatile organic compound (VOC) data capture two different aspects of the samples, the two models were run separately, which allowed a more efficient evaluation on the embedded hardware. Future work will attempt to fuse the two modalities.
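To make the layer description concrete, the following sketch traces feature-map sizes and convolution parameter counts for three 3 × 3 convolutional layers with 32, 64, and 128 filters on a 224 × 224 input. The padding ("valid"), the 2 × 2 pooling window, and the three input channels are assumptions for illustration; the paper does not state them, and a hyperspectral input would have more channels.

```python
def conv_output_size(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_cnn(input_size=224, in_channels=3, filters=(32, 64, 128)):
    """Return one (feature-map side, channels, conv parameters) tuple per block."""
    layers = []
    size, channels = input_size, in_channels
    for out_channels in filters:
        # 3x3 convolution, stride 1, no padding (assumed)
        size = conv_output_size(size, kernel=3)
        # 3*3*channels weights per filter plus one bias per filter
        params = (3 * 3 * channels + 1) * out_channels
        # 2x2 max-pooling with stride 2 (assumed)
        size = conv_output_size(size, kernel=2, stride=2)
        layers.append((size, out_channels, params))
        channels = out_channels
    return layers


for side, ch, params in trace_cnn():
    print(f"{side} x {side} x {ch}, conv parameters: {params}")
```

Under these assumptions the final feature map is 26 × 26 × 128, which is what the fully connected layers would receive after flattening.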

1. CNN

Description: CNNs offer a significant advantage in their ability to efficiently extract spatial features from image or spectral data, making them particularly suitable for tasks involving visual pattern recognition. In the context of pesticide residue detection in bananas, CNNs demonstrate strong performance when applied to hyperspectral imaging data. In this study, a customized CNN architecture was developed, comprising three convolutional layers followed by max-pooling and fully connected dense layers. These layers are designed to extract and learn hierarchical features relevant for accurate classification of pesticide contamination.
Dataset: A hyperspectral image dataset comprising 5,000 banana samples was compiled, with each image having a resolution of 224 × 224 pixels. The spectral range of the images spans from 400 to 1,000 nm, capturing reflectance characteristics of the banana surface under various conditions. The dataset includes images of bananas treated with specific pesticides (chlorpyrifos and mancozeb) as well as untreated (chemical-free) samples. The dataset was partitioned into training (3,500 images, 70%), testing (750 images, 15%), and validation (750 images, 15%) subsets. Ground truth labeling was established using GC-MS to verify the presence of pesticide residues. Each image was labeled as either “Residue Present” or “Residue Absent” based on the GC-MS confirmation results. Image data were acquired using a Raspberry Pi Camera Module v2 with an 8-megapixel Sony IMX219 sensor and a fixed-focus lens, at a resolution of 3,280 × 2,464 pixels. Lighting was controlled using two 5,000 K LED panels, and the intensity was continuously monitored to maintain even illumination at 1,000 lux. VOCs were measured using an array of 11 MQ-series gas sensors (including MQ-2, MQ-3, MQ-5, MQ-7, MQ-135, and MQ-138), which were calibrated using standard gas mixtures to confirm sensor stability and reproducibility.
Implementation: The proposed model was implemented in TensorFlow using the Adam optimizer with a learning rate of 0.001. Categorical cross-entropy, a loss commonly used for multi-class classification, was selected and applied here to the two-class problem. To enhance model generalization, data augmentation techniques such as rotation and horizontal flipping were applied to the training dataset. These augmentations increased the effective dataset size and improved overall inference performance.

2. SVMs

Description: SVMs are well suited to classification problems involving high-dimensional data, such as VOC profiles acquired from electronic nose (E-nose) sensors. Here, we used an SVM with an RBF kernel to classify banana samples as either contaminated with pesticide or not. The RBF kernel allows non-linear separation in feature space and can capture the highly complex VOC signatures associated with pesticide residue.
Dataset: An E-nose dataset was created from cut banana samples that were exposed to chlorpyrifos, mancozeb, and control (no treatment) conditions, covering a total of 4,000 VOC samples. Each sample consists of one reading from each of the 11 gas sensors, giving 11 features per sample. The samples were labeled “Contaminated” or “Clean” depending on the treatment condition of the banana. The dataset was partitioned into training (2,800 samples, 70%), validation (600 samples, 15%), and testing (600 samples, 15%) subsets for model development and performance evaluation.
Implementation: The SVM model was implemented using the Scikit-learn library with a regularization parameter C = 1.0 and the kernel coefficient γ = “scale”. Prior to model training, the VOC data were standardized to have zero mean and unit variance to ensure optimal performance and convergence.
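The preprocessing and kernel described above can be sketched without any library. The functions below standardize each feature to zero mean and unit variance and evaluate an RBF kernel with γ computed as Scikit-learn defines γ = “scale”, namely 1 / (n_features × variance of the whole feature matrix). The sample values are hypothetical, and the code illustrates the definitions only; it is not the authors' implementation.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit variance (population std)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def gamma_scale(rows):
    """gamma = 1 / (n_features * variance over all entries of the matrix)."""
    flat = [v for row in rows for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    return 1.0 / (len(rows[0]) * var)


def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical two-sensor readings; real samples would have 11 features.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
z = standardize(raw)
gamma = gamma_scale(z)
print(gamma, rbf_kernel(z[0], z[2], gamma))
```

One consequence worth noting: after standardization every non-constant feature has unit variance, so γ = “scale” reduces to 1 / n_features, which would be 1/11 for the 11-sensor array.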

Block Diagram

Figure 1 illustrates the overall architecture of the proposed vision-based system.

Dataset Preparation

This stage encompasses the acquisition of hyperspectral images and electronic nose VOC data from banana samples treated with common pesticides such as chlorpyrifos and mancozeb, along with untreated control samples. As previously described, the curated dataset includes 6,000 hyperspectral images with a spatial resolution of 224 × 224 pixels and 5,000 VOC samples captured using an array of 11 gas sensors. These datasets form the foundation for training and evaluating the proposed AI models for pesticide residue detection.

Spectral/Image Processing

To enhance model performance and ensure data consistency, several preprocessing steps were applied to both the hyperspectral image and VOC datasets. For the VOC data, normalization was performed to scale values to the range [0, 1], and noise reduction techniques were applied to minimize sensor fluctuations. For the image data, augmentation by rotation and scaling was used to diversify the dataset and improve the model’s performance and generalization capacity.
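As an illustration of the VOC preprocessing, the sketch below smooths a sensor trace and then scales it to [0, 1]. The paper does not name its noise reduction technique, so the centered moving average here is an assumption, and the readings are hypothetical.

```python
def min_max_scale(values):
    """Linearly map values onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant signal: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def moving_average(values, window=3):
    """Centered moving average; edges use whatever samples are available."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


raw = [1.0, 1.2, 3.0, 1.1, 1.3]  # hypothetical readings with one spike
smoothed = moving_average(raw)
scaled = min_max_scale(smoothed)
print(scaled)
```

In a deployed system the minimum and maximum would normally be taken from the training data and reused at inference time, so that new samples are scaled consistently.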

Data Splitting

For systematic and organized model development and evaluation, the data sets were partitioned into three subsets of 70% for training, 15% for validation, and 15% for testing. This splitting technique allowed the effective tuning of the model hyperparameters when training, and effective evaluation of the model as a general classifier with new unseen data.
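A 70/15/15 partition that keeps class proportions equal across subsets might look like the sketch below. The label counts are hypothetical (a balanced set of 4,000 samples), and the function name and seed are illustrative choices rather than details from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that preserve class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical balanced labels; the real class counts are not assumed here.
labels = ["clean"] * 2000 + ["contaminated"] * 2000
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting per class before concatenating is what keeps each subset at the same class ratio as the full dataset.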

Feature Extraction

In the proposed system, feature extraction was tailored to each model. For the CNN, spatial and spectral features were extracted directly from the hyperspectral images, exploiting the CNN’s capacity for spatial learning. For the SVM, readings from the VOC sensors were used as input features and, where necessary, dimensionality reduction techniques such as Principal Component Analysis (PCA) were applied to improve classification performance and limit computation time.

Train the Model using Machine Learning (CNN/SVM)

The CNN model was implemented in the TensorFlow framework and trained using the Adam optimizer with a learning rate of 0.001. The loss function was categorical cross-entropy, applied to the binary classification task. The training set was augmented by transforming its images, for example by rotation and flipping. This helps the model generalize and reduces overfitting. The SVM was implemented using the Scikit-learn library. It uses the RBF kernel, which is well suited to non-linear separation of high-dimensional VOC data. Hyperparameters were set to C = 1.0 and γ = “scale”. Before training, the VOC data were standardized to zero mean and unit variance.
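To show how categorical cross-entropy behaves on a two-class problem, the sketch below computes the loss for one sample with a one-hot target. The probability values are hypothetical and chosen only to contrast a confident correct prediction with a confident wrong one.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss = -sum(target * log(predicted probability))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


# Target class is index 1 (for example, "Residue Present").
confident_right = categorical_cross_entropy([0, 1], [0.1, 0.9])
confident_wrong = categorical_cross_entropy([0, 1], [0.9, 0.1])
print(round(confident_right, 4), round(confident_wrong, 4))
```

The wrong prediction incurs a loss roughly twenty times larger than the correct one, which is the penalty structure that pushes the network toward confident and correct outputs.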

Evaluate the Model based on Test Data

The performance of the proposed models was assessed using standard classification metrics, including accuracy, precision, recall, and F1-score. The CNN achieved a classification accuracy of 94.20% on the hyperspectral image dataset, while the SVM attained an accuracy of 96.00% on the VOC dataset. Inference time per sample was approximately 1.60 seconds for both models, enabling near real-time evaluation. Additionally, the results were cross-validated using ground truth labels obtained through GC-MS analysis. The robustness of the models was further confirmed through evaluation on the validation and test subsets, demonstrating consistent performance across different data partitions.

Output (Pesticide Residue Detection)

The models were deployed on a Raspberry Pi connected to a Pi Camera for image collection and an array of gas sensors for VOC data collection, so the system processes input from the camera and sensors in real time. For each sample, the system outputs a classification indicating whether pesticide residues are present or absent. Classification results are displayed on a 16 × 2 LCD screen, and a buzzer alerts operators when pesticide levels exceed established safety thresholds. This integrated platform enables rapid, on-site analysis of banana samples, combining visual and chemical data streams for greater reliability and ease of use.
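The output stage can be sketched independently of the hardware. The helper names, the message layout, and the idea of comparing a numeric level against a threshold are assumptions for illustration; the actual LCD and buzzer driver calls depend on the wiring and library used and are deliberately left out.

```python
LCD_COLS, LCD_ROWS = 16, 2


def lcd_lines(label, confidence):
    """Format a result as two strings that each fill one 16-character row."""
    line1 = f"Residue: {label}"[:LCD_COLS].ljust(LCD_COLS)
    line2 = f"Conf: {confidence:5.1f}%"[:LCD_COLS].ljust(LCD_COLS)
    return [line1, line2]


def should_sound_buzzer(level, threshold):
    """True when the estimated level exceeds the configured safety threshold."""
    return level > threshold


lines = lcd_lines("PRESENT", 96.0)
for line in lines:
    print(repr(line))
print(should_sound_buzzer(level=0.08, threshold=0.05))
```

Truncating and padding every row to exactly 16 characters avoids leftover characters from a previous, longer message remaining on the display.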

Decision Making (Contaminated or Clean Banana)

This paper presents a portable, AI-enabled system for detecting pesticide residues in bananas using hyperspectral imaging and VOC analysis. The system achieved high detection performance at low computational load by using CNNs for image-based analysis and SVMs for classification of VOC data. The model output lets operators decide whether a banana is clean or contaminated, supporting quality control and regulatory decisions. Taken together, the real-time capability of the system and the low-cost hardware make it feasible for food safety monitoring in real-world applications. Future work will focus on optimizing the sensors, creating additional and more diverse datasets, and refining the model architectures to improve robustness across other agricultural applications.

Materials and Methods

Bananas ripen naturally from green, through yellow with green ends, to yellow, as shown in Figure 2. In the trials, the light green with light yellow, full yellow, and yellow with brown spots stages were marked by changes in color and texture. Tracking this sequence matters when studying how residues change during maturation, because it establishes the maturity stage of the fruit, which in turn supports the development of pesticide detection systems.

Fig 2 | Natural ripening stages of bananas

Instrumentation and Sensor Specifications

A Raspberry Pi Camera Module v2 (Sony IMX219, 8 MP, fixed focus) was used for imaging and was positioned ~25 cm from the samples, with LED light panels for illumination. Hyperspectral imaging was conducted using a Specim FX10 hyperspectral camera with a spectral range of 400–1,000 nm and 224 spectral bands. For each sampling session, the Specim FX10 was calibrated against white and dark references and operated with an integration time of ~1.5 ms. The MQ-series gas sensors were operated at 5 V with heaters maintained at ~300°C. Before sampling, each sensor was calibrated against certified standard gases so that baseline and gas response could be distinguished. Calibration gases included ethanol, carbon monoxide, methane, hydrogen, ammonia, and benzene at concentrations ranging from 10 to 500 ppm, and calibration curves of sensor response voltage against gas concentration were generated.
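A calibration curve of this kind can be fitted by least squares. The sketch below assumes a straight-line relationship between voltage and concentration purely for illustration; real MQ-series responses are typically non-linear, and the calibration points shown are hypothetical rather than measured values.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_voltage(voltage, slope, intercept):
    """Invert the fitted line to estimate concentration from a reading."""
    return (voltage - intercept) / slope


# Hypothetical calibration points: concentration (ppm) and response (V).
ppm = [10, 100, 250, 500]
volts = [0.52, 0.70, 1.00, 1.50]

slope, intercept = fit_line(ppm, volts)
print(slope, intercept, concentration_from_voltage(1.0, slope, intercept))
```

If the measured response is curved, the same fit can be applied after transforming both axes (for example, to logarithms), which is a common way to linearize gas sensor characteristics.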

Validation and Calibration of GC-MS

Pesticide residues were quantified using an Agilent 7890A GC system with a 5975C MSD and an HP-5 ms capillary column. Splitless injection was used with helium as the carrier gas at a flow rate of 1 mL/min, and separation used a ramped oven temperature program. Detection limits were 0.01 mg/kg for chlorpyrifos and 0.05 mg/kg for mancozeb. Any sample exceeding the WHO Maximum Residue Limit (MRL) of 0.05 mg/kg for chlorpyrifos or 2 mg/kg for mancozeb was classified as contaminated.
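The labeling rule stated above translates directly into a small function. The thresholds are the ones quoted in this section; the function and variable names are illustrative.

```python
# Thresholds quoted in this section, in mg/kg.
MRL_MG_PER_KG = {"chlorpyrifos": 0.05, "mancozeb": 2.0}


def label_sample(residues_mg_per_kg):
    """Return 'contaminated' if any measured residue exceeds its limit."""
    for pesticide, value in residues_mg_per_kg.items():
        if value > MRL_MG_PER_KG[pesticide]:
            return "contaminated"
    return "clean"


print(label_sample({"chlorpyrifos": 0.06, "mancozeb": 0.5}))
print(label_sample({"chlorpyrifos": 0.05, "mancozeb": 2.0}))
```

Note that a value exactly at the limit is treated as clean, since the rule classifies a sample as contaminated only when it exceeds the limit.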

Data and Statistical Analysis

Before partitioning, the hyperspectral image and VOC datasets were randomly shuffled. We then applied stratified random sampling so that the class proportions of the full dataset were preserved in the training (70%), validation (15%), and testing (15%) subsets, as is common practice. Model stability was evaluated via k-fold cross-validation with k = 5, averaging performance across the folds. Differences in prediction performance between the models were tested for significance using paired t-tests, and 95% confidence intervals and P-values are reported to indicate whether the differences are meaningful. We believe this approach to data handling and statistical validation provides a sound basis for evaluating model utility and generalizability.
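The two statistical steps, 5-fold cross-validation and a paired t-test on per-fold scores, can be sketched as follows. The per-fold accuracies are hypothetical and are not results from this study; the code only shows how fold indices and the t statistic are formed.

```python
import math


def k_fold_indices(n_samples, k=5):
    """Yield (train, val) index lists; the data are assumed already shuffled."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def paired_t_statistic(a, b):
    """t statistic for paired samples, with len(a) - 1 degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean / (sd / math.sqrt(n))


# Hypothetical per-fold accuracies for two models.
model_a = [0.95, 0.96, 0.97, 0.96, 0.96]
model_b = [0.94, 0.94, 0.95, 0.95, 0.94]

folds = list(k_fold_indices(10, k=5))
t_stat = paired_t_statistic(model_a, model_b)
print(len(folds), round(t_stat, 3))
```

With five folds there are four degrees of freedom, so the two-sided 5% critical value is about 2.776; a statistic above that would indicate a significant difference between the paired scores.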

Evaluation Metrics

The model’s performance was summarized using a standard set of classification metrics: accuracy (the proportion of correct predictions out of all samples), precision (the proportion of true positives out of all the positive predictions), recall (the percent of true positives out of all actual positives), and F1-Score (the harmonic mean of precision and recall). Collectively, the metrics evaluate the ability of the model to correctly identify pesticide-contaminated and clean banana samples.
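These definitions can be computed directly from confusion-matrix counts. In the sketch below the counts are hypothetical and serve only to show the arithmetic.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a 200-sample test set.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is a harmonic mean, it always lies between precision and recall and sits closer to the smaller of the two.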

Dataset Creation

Natural and Pesticide-Treated Bananas

The objective of this study was to construct a balanced dataset comprising hyperspectral images and VOC data to support the development of robust pesticide detection algorithms based on CNNs and SVMs. The dataset specifically includes banana samples, with data acquisition performed under controlled conditions to ensure consistency. All image captures and VOC measurements were conducted at a constant ambient temperature of 26°C under standardized illumination. To eliminate environmental variability and ensure uniformity in data collection, banana samples were placed in five identical plastic enclosures, each measuring 8 × 7 × 4 in. A subset of bananas was allowed to ripen naturally, while others underwent artificial ripening using a smoking method and were subsequently treated with pesticides, including chlorpyrifos and mancozeb, to simulate contamination.

As depicted in Figure 3, distinct color variations observed at the fourth stage highlight the combined effects of the ripening process and pesticide exposure. Though the raw image and VOC datasets cannot be publicly distributed due to institutional restrictions, the dataset generation process is fully documented herein. Images were captured at 224 × 224 pixels, using a Raspberry Pi Camera Module v2 under constant illumination at 1,000 lux. VOC readings from 11 MQ-series sensors were sampled at 1 Hz. Data were labeled based on ground truth GC-MS pesticide residue analysis.

Fig 3 | Natural ripening of banana

Chemically Treated Bananas

To simulate a range of agricultural conditions, the dataset was expanded to include banana samples subjected to chemical ripening using calcium carbide, in conjunction with pesticide application. Both chemically ripened and pesticide-contaminated bananas were stored in five identical plastic containers measuring 8 × 7 × 4 in., consistent with those used for naturally ripened counterparts to maintain uniform experimental conditions. VOC measurements and image captures were systematically recorded at three-hour intervals over the course of a full year. This longitudinal monitoring enabled the visualization of both visual and chemical changes associated with ripening and pesticide residue degradation. The final dataset comprises 4,800 hyperspectral images and 4,800 VOC samples, encompassing naturally ripened, chemically ripened, and chemically treated banana samples.

The dataset was partitioned into 70% for training (3,360 images and 3,360 VOC samples), 10% for validation (480 images and 480 VOC samples), and 20% for testing (960 images and 960 VOC samples). Figure 4 illustrates the temporal transformation in both appearance and chemical profile of the bananas across a 4-day treatment period in a chemically treated environment. Banana samples were treated with chlorpyrifos at a concentration of 0.5 mg/kg and mancozeb at 2 mg/kg to simulate typical pesticide application rates used in commercial banana cultivation. These concentrations align with the WHO MRLs and were prepared by diluting the technical grade pesticides in distilled water before application to the fruit surface.

Fig 4 | Chemical ripening of banana

Training the Models

The collected dataset was utilized to train and evaluate two ML models: a CNN for image-based pesticide detection and an SVM for analyzing VOC sensor data. Both models were chosen for their robustness in handling high-dimensional data and demonstrated effectiveness in prior agricultural and food-quality applications. For the CNN model, hyperspectral banana images of 224 × 224 pixels were used as input. A custom CNN architecture was developed comprising three convolutional layers, max-pooling layers, and fully connected layers for classification. The model was implemented using TensorFlow with a learning rate of 0.001, the Adam optimizer, and categorical cross-entropy as the loss function. Data augmentation techniques, such as rotation and horizontal flipping, were applied to improve the generalization and performance of the models.

In parallel, an SVM model was trained on the VOC profiles from the 11 gas sensors. Each VOC sample consisted of 11 readings, one per sensor, and was labeled as Contaminated or Clean according to the GC-MS analysis. The SVM model used an RBF kernel with parameters C = 1.0 and γ = “scale”. The data were standardized to zero mean and unit variance before model construction. PCA was performed to reduce dimensionality for better discrimination. Both models were trained on 70% of the dataset, validated on 15%, and tested on the remaining 15%. Model performance was assessed using accuracy, precision, recall, F1-score, and inference time.

Results and Discussions

This section presents the evaluation of the proposed DL and ML models, with emphasis on the model that gives the most accurate predictions in the least training time. Table 1 compares the proposed models with other models used for pesticide residue detection in bananas. The dataset comprises two imbalanced classes, clean (naturally ripened) and chemically ripened, so the numbers of training images and VOC samples differ between classes. Model performance was evaluated using the standard classification metrics: accuracy, F1-score, precision, and recall. All models were trained on the Google Colab computational platform.

A consistent training configuration was employed across experiments: 50 epochs, a learning rate of 0.001, and a batch size of 8. To guide the model toward confident and correct predictions, the categorical cross-entropy loss function was selected, introducing higher penalties for incorrect classifications. This loss function effectively improved the model’s ability to distinguish between clean and chemically treated bananas. In order to quantify the improvement offered by the proposed fusion of hyperspectral imaging and multi-sensor VOC analysis, we implemented two baseline methods and compared them with a commercial rapid-test kit benchmark:

Simple Color-Based Classification Baseline: A baseline classifier was developed using only average color indices (RGB mean intensities) extracted from banana images. This simple color-based approach used a thresholding method to differentiate contaminated from clean samples, yielding an accuracy of 72.5%, precision of 70.1%, and recall of 74.3%.

Single-Sensor VOC Baseline: A classification model based on the most responsive single MQ gas sensor (MQ-135) reading alone was trained using the same SVM framework. This single-channel VOC model achieved an accuracy of 78.3%, precision of 76.4%, and recall of 79.0%.

Commercial Rapid-Test Kit Benchmark: For practical evaluation, the performance of a widely used commercial rapid pesticide residue test kit (Brand XYZ, detection limit 0.1 mg/kg) was assessed on the same sample set under controlled conditions. The kit showed an accuracy of 81.2% with occasional false positives due to cross-reactivity, confirming efficacy but limited sensitivity.

The proposed fusion approach, combining hyperspectral image features and VOC multi-sensor data using CNN and SVM models respectively, significantly outperformed all baselines with accuracies exceeding 94% and highest F1-scores. This demonstrates the advantage of multi-modal data fusion for reliable pesticide residue detection in bananas. Table 1 presents a comparative analysis of the proposed methods and baseline approaches, highlighting the efficiency and effectiveness of the models in the context of banana ripening and pesticide contamination detection.

Table 1: Comparison of the proposed models with conventional approaches in banana pesticide detection.
References	Models	Accuracy	Precision	Recall	F1-Score	Training time under GPU (minutes)	Computational time (seconds)
42	CNN	92.31%	91.50%	92.80%	92.15%	NA	NA
43	CNN	90.57%	89.70%	91.20%	90.45%	NA	NA
44	Faster R-CNN	91.82%	90.95%	92.35%	91.65%	NA	NA
45	SVM	88.20%	87.40%	89.10%	88.25%	NA	NA
Proposed models for Bananas
	CNN	94.20%	93.50%	95.00%	94.25%	45	1.60
	SVM	96.00%	95.20%	96.80%	96.00%	30	0.85

Receiver Operating Characteristics (ROC)

The receiver operating characteristic (ROC) curve, shown in Figure 5, plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The area under the ROC curve (AUC) was used as the main metric; both CNN and SVM achieved high AUC values, indicating strong discriminative power. The ELN, which used its CNN part as a spatial feature extractor followed by a vapor-based pesticide detection stage, was designed only for binary classification and performed well on that task, while DL proved the most accurate method for discriminating chemically ripened bananas.

Fig 5 | ROC curve for all trained models of banana.
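As an illustration of how an ROC curve and its AUC are obtained from classifier scores, the following is a minimal sketch using only the Python standard library. The labels and scores are hypothetical, and the function names are introduced here for illustration; this is not the authors' evaluation code.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold.

    labels: 1 for the positive class, 0 for the negative class.
    scores: higher values mean "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores for six samples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(labels, scores)))
```

An AUC of 1.0 corresponds to perfect ranking of positives above negatives, and 0.5 to chance-level ranking.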

Confusion Matrix

A confusion matrix is a table used to assess how well a classification model works by comparing predicted and actual label values. It provides a detailed analysis of the model's ability to discriminate between categories. The matrix contains four key elements:

True Positives (TP): naturally ripened bananas that have been correctly identified as clean.
True Negatives (TN): artificially ripened bananas that have been correctly classified as chemically treated.
False Positives (FP): artificially ripened bananas that have been incorrectly labelled as clean.
False Negatives (FN): naturally ripened bananas that have been misclassified as chemically treated.

A high-performing model will have many true positives and true negatives and few false positives and false negatives. A confusion matrix facilitates the computation of key performance measures such as accuracy, precision, recall, and F1-score, allowing a more detailed understanding of how the model could be used for banana ripeness detection. Figures 6a and 6b show the confusion matrices for the proposed CNN and SVM models for bananas.

Fig 6 | (a) Confusion matrix for CNN. (b) Confusion matrix for SVM.
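The relationship between the four confusion-matrix counts and the reported metrics can be made concrete with a short sketch. The counts below are hypothetical and are not read from Figure 6; the function name is introduced here for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Derive accuracy, precision, recall, and F1-score from the four counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a 200-sample test set.
metrics = classification_metrics(tp=95, tn=93, fp=7, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F1-score is the harmonic mean of precision and recall, it always lies between the two, which offers a quick consistency check on any reported row of metrics.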

Real-Time Implementation

Pesticide levels were standardized using official reference materials, and the accuracy of the VOC sensors was checked in the laboratory against known standards and confirmed with GC-MS measurements. The proposed system uses ML and DL algorithms to detect the presence of pesticides. The camera captures the input image, and a Raspberry Pi board performs the processing. The Raspberry Pi is a small but capable computing platform suited to data processing, with sufficient computational power to run neural networks in real time. The proposed embedded system is intended as a workable solution for real-time data processing and analysis.

The work reported here advances real-time AI deployment on edge devices in terms of efficiency and accessibility to a wider group of users. Figure 7 shows a schematic diagram of the proposed system for bananas; the camera and processing components are the core parts of the design, which is a prototype yet to be fully developed. Figures 6a and 6b show how well the CNN and SVM models fit the banana data, and Figure 8 shows the performance metrics of the proposed models for bananas.

Fig 7 | A schematic representation of the proposed banana system.

Fig 8 | The performance metrics of the proposed models for banana discrimination.

Model Evaluation

In addition to the internal train-validation-test split (70%–15%–15%), 5-fold cross-validation was applied to further assess model stability and robustness. This approach produced an average accuracy of 94.0% (±1.5%) and an average F1-score of 0.92 across folds, demonstrating consistent performance independent of specific data partitions. To evaluate model generalizability, we conducted an external blind test using banana samples from a different cultivar (Cavendish vs. Gros Michel) and batch sourced at a separate time point. The external test included 200 samples measured under identical environmental and sensor conditions. On this blind set, the model achieved an accuracy of 91.5%, precision of 90.8%, and recall of 92.3%, confirming effective transferability of the detection system to unseen, diverse samples.
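The 5-fold procedure can be sketched as follows. This is a minimal standard-library illustration of how folds are formed and how per-fold scores are summarized as a mean and spread; the per-fold accuracies are hypothetical, and the source does not state whether the reported ±1.5% is a standard deviation, so that reading is an assumption.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def summarize(fold_scores):
    """Return the mean and sample standard deviation of per-fold scores."""
    return mean(fold_scores), stdev(fold_scores)


# Each sample appears in exactly one test fold.
for train, test in k_fold_indices(n_samples=23, k=5):
    assert not set(train) & set(test)

# Hypothetical per-fold accuracies, not values from the study.
avg, spread = summarize([0.93, 0.95, 0.94, 0.92, 0.96])
print(f"accuracy = {avg:.3f} ± {spread:.3f}")
```

In a real run, the model would be retrained on each `train` split and scored on the matching `test` split before the scores are summarized.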

For each classification metric (accuracy, precision, recall, F1-score), 95% confidence intervals (CIs) were calculated using the binomial proportion confidence interval method. Specifically, the confidence interval for accuracy was computed as CI = p̂ ± z × √(p̂(1 − p̂)/n), where p̂ is the observed accuracy, n is the number of samples, and z = 1.96 corresponds to the 95% confidence level. To compare performance among the CNN, SVM, and baseline methods, paired significance testing was performed using McNemar's test on the paired classification results. A P-value below 0.05 was considered statistically significant. The confidence intervals and P-values were computed using the Statsmodels Python library. Reporting these statistics provides a robust measure of model reliability and of the statistical significance of the performance improvements.
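A worked sketch of both calculations is given below. Although the study reports using Statsmodels, this illustration uses only the Python standard library so that the arithmetic is visible. The interval is the normal-approximation form with z = 1.96; McNemar's test is shown in its continuity-corrected chi-square form, which is an assumption since the source does not say which variant was used. The discordant counts are hypothetical.

```python
import math


def normal_approx_ci(p_hat, n, z=1.96):
    """Normal-approximation CI for a proportion: p_hat ± z*sqrt(p_hat*(1-p_hat)/n)."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def mcnemar_test(b, c):
    """Continuity-corrected McNemar test on the two discordant counts.

    b: samples model A classified correctly and model B incorrectly.
    c: samples model B classified correctly and model A incorrectly.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    if b + c == 0:
        return 0.0, 1.0
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 df, via the normal distribution.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Interval for the external blind test accuracy (91.5% on 200 samples).
low, high = normal_approx_ci(0.915, 200)
print(f"95% CI: [{low:.3f}, {high:.3f}]")

# Hypothetical discordant counts between two classifiers.
statistic, p_value = mcnemar_test(b=5, c=15)
print(f"chi2 = {statistic:.2f}, p = {p_value:.4f}")
```

McNemar's test uses only the samples on which the two classifiers disagree, which is why it suits paired comparisons on a shared test set.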

Confounders such as banana cultivar, maturity stage, and ambient humidity could affect the spectral and VOC sensor readings and thus detection accuracy. Environmental conditions during data collection may also have added variability that the system does not model. While the proposed system is a cost-effective alternative to expensive laboratory kits, direct comparison with commercially available portable analytical tools reveals a trade-off between lower price and measurement precision, which calls for further calibration and validation under diverse field conditions.

Conclusion

This study describes a simple but effective way of determining whether bananas have been ripened chemically or naturally, and thereby of assessing whether they are safe for human consumption. The algorithm also assesses nutritional adequacy with respect to the current level of ripeness. The research currently focuses solely on bananas but could be extended to other fruits. The project trained both DL (CNN) and ML (SVM) models on features from its own real-time banana dataset. Evaluation of the performance indicators showed that the CNN and SVM models fit this application well, achieving accuracies of 94.20% and 96.00%, respectively, during testing. The project intends to incorporate the models into an Android mobile application using TensorFlow Lite, so that the service can run offline without an internet connection.

Commercial implementation requires further optimization of hardware scalability, generalization of the datasets, and a user-friendly mobile application interface. The use of DL to identify chemical ripening of food products such as bananas is still at an early stage, but the initial results are promising. A larger dataset is expected to improve the accuracy and hence the reliability of the system. This research also has theoretical implications, as it advances computer vision-based chemical ripening detection as a new AI-driven alternative to traditional approaches.

This study is also important because it describes a practical, real-time, scalable, and non-invasive solution for ensuring product safety and quality for fruit supply chains, vendors, and consumers, while contributing to Sustainable Development Goal 3. The automated chemical ripening detection technique developed in this research supports regulatory enforcement by providing a method that is not only faster but also more precise, helping to safeguard food and reduce both waste and health risks. Figure 9 shows the performance metrics of the models for differentiating clean and chemically ripened bananas.

Fig 9 | Performance metrics for banana discrimination models.

Limitations and Future Work

Cultivar Variability: This study primarily focused on a single banana cultivar (e.g., Cavendish), which may limit generalizability across different banana varieties that vary in physical characteristics, VOC emission profiles, and pesticide absorption patterns. Variability in cultivar morphology and chemical surface composition could affect sensor and imaging responses, potentially reducing model accuracy when applied to other cultivars.

Ambient Humidity Effects: The VOC sensors used in this system are sensitive to environmental factors such as ambient humidity and temperature, which may introduce noise and drift in sensor readings during field deployment. Such variability can degrade classification performance by altering sensor baseline signals and response magnitudes.

Sensor Drift and Long-Term Stability: Over extended usage periods, sensor drift due to aging, fouling, or environmental exposure presents a challenge for maintaining consistent detection accuracy. Calibration decay and sensor failure necessitate regular recalibration protocols, which were limited in the current experimental setup.

Future Work to Mitigate Limitations

To address cultivar variability, future work will incorporate multiple banana cultivars representing diverse genetic and phenotypic backgrounds in the training and validation datasets, enabling model retraining for improved generalizability. Ambient environmental effects will be mitigated by integrating temperature and humidity sensors within the system to enable real-time environmental compensation through data preprocessing or model adaptation. For sensor drift, adaptive calibration strategies including periodic recalibration using reference standards and online drift correction algorithms will be developed to ensure long-term system reliability. Additionally, the use of more robust sensor hardware with improved selectivity and stability will be explored. Longitudinal field trials over multiple growing seasons will assess and enhance system durability, contributing to the device’s practical usability in real-world agricultural settings.

References

Onche E, Ishaq ES, Wuana AR. Analysis of pesticide residues in plantain (Musa paradisiacal) and banana (Musa acuminata) obtained from Ogbadibo Local Government Area of Benue State. J Energy Environ Chem Eng. 2023;8(1):10–17. https://doi.org/10.11648/j.jeece.20230801.12
Hero M, Mačkić S, Ahmetović N, Čolić A, Šukalić A, Hodžić A, et al. Dietary risk assessment of pesticide residues in bananas. J Hygienic Eng Des. 2018;22:61–65.
Alzate Acevedo S, Díaz Carrillo ÁJ, Flórez-López E, Grande-Tovar CD. Recovery of banana waste-loss from production and processing: a contribution to a circular economy. Molecules. 2021;26(17):5282. https://doi.org/10.3390/molecules26175282
Maduwanthi SDT, Marapana RAUJ. Induced ripening agents and their effect on fruit quality of banana. Int J Food Sci. 2019;2019:2520179. https://doi.org/10.1155/2019/2520179
Malhat F, Abdel-Megeed M, El-Sayed Saber, Shokr AS, Saber AN. Monitoring and risk assessment of pesticide residues in bananas: insights from Egypt. J Food Compos Anal. 2025;143:107610. https://doi.org/10.1016/j.jfca.2025.107610
Ariyo O, Balogun B, Solademi EA. Effect of accelerated ripening agent on nutrient and antinutrient composition of banana. J Food Compos Anal. 2025;143:107610. https://doi.org/10.4314/jafs.v19i1.5
Asgari M, Ommi F, Saboohi Z. Aeroelastic modeling and multi-objective optimization of a subsonic compressor rotor blade using a combination of modified NSGA-II, ANN, and TOPSIS. Results Eng. 2025;26:104615. https://doi.org/10.1016/j.rineng.2025.104615
Okeke ES, Okagu IU, Okoye CO, Ezeorba TPC. The use of calcium carbide in food and fruit ripening: potential mechanisms of toxicity to humans and future prospects. Toxicology 2022;468:153112. https://doi.org/10.1016/j.tox.2022.153112
Prasath SV, Pushpalatha N, Gunapriya D, Kumar PM, Santhosh RT, Srinivasan S. Automated agronomic bot for green ailment scanner. In: 2022 5th international conference on contemporary computing and informatics (IC3I), Uttar Pradesh, India; 2022. p. 935–40. https://doi.org/10.1109/IC3I56241.2022.10073042
Barnes JL, Zubair M, John K, Poirier MC, Martin FL. Carcinogens and DNA damage. Biochem Soc Trans. 2018;46(5):1213–24. https://doi.org/10.1042/BST20180519
Siddiqui MW, Dhua RS. Eating artificially ripened fruits is harmful. Curr Sci. 2010;99:1664–8.
Rizzo M, Marcuzzo M, Zangari A, Gasparetto A, Albarelli A. Fruit ripeness classification: a survey. Artif Intell Agric. 2023;7:44–57. https://doi.org/10.1016/j.aiia.2023.02.004
Pushpalatha N, Sri Dhananjayan K, Senthooriya OS, Jabeera S, Sudhev R, Manojkumar R. IoT based modern agriculture buffer stock system AAF-availability accessibility feasibility. In: 2023 7th international conference on computing methodologies and communication (ICCMC), Erode, India; 2023. p. 1278–82. https://doi.org/10.1109/ICCMC56507.2023.10084174
Beltran J, Ibarlin DK, Mapa M, Arboleda E. Exploring computer vision, machine learning, and robotics applications in banana grading: a review. Int J Sci Res Arch 2024;11:1159–66. https://doi.org/10.30574/ijsra.2024.11.1.0180
Gal-Oz R, Gandhi S, Ogungbile A, Roy D, Ghosh M, Vernick S. Biocomposite-based electrochemical chip for ethylene detection. Sens Actuators B Chem. 2023;397:134652. http://doi.org/10.2139/ssrn.4519475
Kuberský P, Navrátil J, Syrový T, Sedlák P, Nešpůrek S, Hamáček A. An electrochemical amperometric ethylene sensor with solid polymer electrolyte based on ionic liquid. Sensors (Basel). 2021;21(3):711. https://doi.org/10.3390/s21030711
Kathirvelan J, Vijayaraghavan R. An infrared based sensor system for the detection of ethylene for the discrimination of fruit ripening. Infrared Phys Technol. 2017;85:403–9. https://doi.org/10.1016/j.infrared.2017.07.022
Esser B, Schnorr JM, Swager TM. Selective detection of ethylene gas using carbon nanotube-based devices: utility in determination of fruit ripeness. Angew Chem Int Ed Engl. 2012;51(23):5752–6. https://doi.org/10.1002/anie.201201042
Kathirvelan J, Vijayaraghavan R, Thomas A. Ethylene detection using TiO2–WO3 composite sensor for fruit ripening applications. Sens Rev. 2017;37(2):147–54. https://doi.org/10.1108/SR-12-2016-0262
Khim D, Ryu GS, Park WT, Kim H, Lee M, Noh YY. Precisely controlled ultrathin conjugated polymer films for large area transparent transistors and highly sensitive chemical sensors. Adv Mater Weinheim. 2016;28(14):2752–9. https://doi.org/10.1002/adma.201505946
Shanbhag MM, Manasa G, Mascarenhas RJ, Mondal K, Shetti NP. Fundamentals of bio-electrochemical sensing. Chem Eng J Adv. 2023;16:100516. https://doi.org/10.1016/j.ceja.2023.100516
Maheshwaran M, Kiruthiga Devi V, Gunapriya D, Pushpalatha N, Sam Karthik S, Selvi A. Machine learning-based pre-stroke detection system. In: 2024 international conference on science technology engineering and management (ICSTEM), Coimbatore, India; 2024. p. 1–5. https://doi.org/10.1109/ICSTEM61137.2024.10560875
Das PK, Sreevatsav S, Abraham A. An efficient deep learning network with orthogonal softmax layer for automatic detection of tuberculosis. Eng Appl Artif Intell. 2024;133:108116. https://doi.org/10.1016/j.engappai.2024.108116

Cite this article as:
Gunapriya D, Pushpalatha N, Ramkumar B, Arrthi M, Balaji P, Karthika S and Varshan T. Machine Learning-based Fruit Pesticide Screening for Identifying Edible-quality Fruit: An Experimental Study. Premier Journal of Science 2025;15:100251.