
Additional information
- Ethical approval: Granted
- Consent: Informed consent obtained
- Funding: No industry funding
- Conflicts of interest: The authors declare no conflicts of interest related to this article.
- Author contribution: Jyoti Saini – Conceptualization, Writing – original draft, review and editing
- Guarantor: Jyoti Saini
- Provenance and peer-review: Unsolicited and externally peer-reviewed
- Data availability statement: N/A
Keywords: Autoencoder, brain tumor, convolutional autoencoder, convolutional neural network, CNN with VGG19, deep learning, image processing.
Peer Review
Received: 13 December 2025
Last revised: 12 March 2026
Accepted: 26 March 2026
Version accepted: 5
Published: 2 April 2026
Plain Language Summary Infographic

Abstract
The accurate classification of brain tumors from magnetic resonance imaging (MRI) is crucial in aiding clinical diagnosis and treatment planning. This paper proposes a hybrid deep learning model for multi-class brain tumor classification that assigns MRI images to four classes, namely glioma, meningioma, pituitary tumor, and no tumor. The proposed method combines a convolutional autoencoder (CAE), which extracts image features in an unsupervised manner, with a transfer learning-based convolutional neural network (CNN) built on the VGG19 architecture for final classification. The CAE compresses high-dimensional MRI images into compact, discriminative feature representations, eliminating redundancy while preserving important structural information.
The resulting features are then fed into a fine-tuned CNN-VGG19 model with Global Average Pooling and fully connected layers with SoftMax activation for four-class classification. Data augmentation is used to alleviate overfitting and enhance generalization. Experiments show that the proposed CAE combined with the CNN-VGG19 model achieves an overall classification accuracy of 92.39%, outperforming a comparable CAE with CNN-VGG16 model. The robustness and reliability of the model are further validated by micro- and weighted-averaged precision, recall, and F1-score measurements. The results reveal that combining unsupervised feature learning with transfer learning considerably improves the multi-class classification of brain tumor types in MRI-based diagnostic systems.
Introduction
Medical imaging plays a crucial role in identifying brain tumors, which is critical for diagnosing and treating brain-related illnesses. Traditional methods for detecting tumors rely on manual assessment of medical images, a process that can be time-consuming and prone to human error.1 Autoencoders (AEs) are essential components for constructing hierarchical deep models, which are valuable for structuring, condensing, and extracting high-level characteristics even without labelled training data. They facilitate non-linear feature extraction and unsupervised learning. Numerous investigations have applied deep learning (DL) methodologies to AEs. However, based on current information, there is a lack of comprehensive research that has successfully assessed the progress and developments in this particular subject. Although some researchers have attempted to systematize this area of research, just a handful have provided a comprehensive overview of current endeavors or delved into the unresolved matters.2
The objective of this inquiry is to furnish an exhaustive synopsis of the most recent studies on deep learning employing autoencoders (AEs) and to suggest prospective avenues in this domain, given the latent and expanding utility of AEs in convolutional neural network (CNN) methodologies. Imaging modalities are becoming increasingly popular among radiologists due to their enhanced accuracy and reduced risk to patients. Various methods exist for capturing medical imaging data, such as radiography, echocardiography, tomography, and MRI. The most widely used among these is MRI, as it generates radiation-free images with superior resolution. Non-invasive MRI allows radiologists to detect brain problems by analysing medical image data.3 Similarly, computer-aided diagnosis (CAD) technology enables early identification of brain tumors without the need for human intervention. Radiologists can obtain guidance from such systems and reach diagnostic conclusions using MRI images.4–7
Recent advances in deep learning techniques, particularly in computer vision, have opened up new possibilities for automating and improving the accuracy of brain tumor identification. Convolutional neural networks (CNNs), one type of deep learning technique, have proven incredibly effective at image interpretation tasks, including object detection and segmentation. Investigators are probing the viability of deploying DL methods to discern and classify brain tumors via MRI data. To augment medical diagnostic and therapeutic outcomes, researchers are assiduously endeavoring to develop CNNs capable of accurately diagnosing and categorizing brain tumors alongside other medical imaging modalities. Deep learning (DL) offers the benefit of not needing explicit rule-based methods or manually designed features. Convolutional neural networks (CNNs) excel in decoding intricate, hierarchical patterns from unprocessed data. These networks are notably adept at handling tasks that require the interpretation of medical imagery. They are specifically designed to capture spatial relationships and local patterns within images. As a result, DL has emerged as a highly practical approach for interpreting medical images.
When compared to conventional methods, this technique can be utilized to detect abnormalities in imaging data and offer a more precise diagnosis of ailments. The original image has been employed as an input in deep learning algorithms to address this issue.8 In essence, images can be categorized without the necessity of manually generated attributes. CNNs are deep learning algorithms that utilize several convolution layers9 to automatically extract features from images.10 CNNs perform effectively when confronted with a large dataset, which is challenging to obtain in the medical imaging domain. Transfer learning (TL) is a method of addressing and resolving this issue. In transfer learning, a pre-trained model is employed for classification tasks, even if it was trained on a large dataset from a different domain.11,12 Given a small dataset, the model gains an advantage from this knowledge to achieve a high level of accuracy.13
This study is only concerned with automated multi-class classification of brain tumors based on MRI images.14 It aims to classify images into four clinically relevant categories, namely glioma, meningioma, pituitary tumor, and no tumor. The proposed framework does not address tumor staging or progression analysis.15 In this respect, all parts of this manuscript, including the title, the abstract, the methodology, and the results, are structured around the objective of tumor-type classification.16
Major Findings:
- This study presents two deep learning model-based systems designed to automatically classify brain tumors.
- A classification task is performed on brain images, both normal and pathological, using a highly optimized model built by adding components on top of the TL-driven VGG19 architecture.
- During the tuning process, three fully connected (FC) dense layers are utilized as the complete connected layers. The terminal densely connected layer, equipped with a SoftMax activation function, is employed for the discernment of brain tumors (BT).
- Researchers employ global average (GA) pooling 2D to convert a 2D matrix into a vector, thus flattening the layers. This research study aims to enhance the prediction of BT detection by utilizing TL through the use of a convolutional autoencoder (CAE) with a hybrid CNN consisting of CNN-VGG19 layers.
Literature Review
This study has examined the existing literature on the hybrid deep learning method and the significance of autoencoders (AE) in enhancing the accuracy of predictions in the medical domain. Amin et al. introduced an advanced DL approach for the identification of cerebral neoplasms, employing the sophisticated ResNet50 framework.13 The investigators harnessed a collection of brain MRI scans to train the model, applying transfer learning by leveraging the pre-established weights of ResNet50. The integration of gradient descent optimization with binary cross-entropy loss enabled the model to achieve a remarkable accuracy of 92% in detecting brain tumors. This strategy demonstrated superior efficacy compared to conventional methods and highlighted the promise of deep learning technology in clinical settings. In their study, Sarah Thompson and her colleagues utilized an ensemble technique consisting of multiple CNNs to improve the accuracy of brain tumor identification.6 The authors employed several architectures, such as ResNet50, VGG16, and InceptionV3, to train separate CNN models. The ensemble model demonstrated a 94% accuracy rate in the detection of brain tumors. This was accomplished by employing a voting methodology to amalgamate the prognostications of multiple models. The ensemble technique exhibited superior performance and greater consistency compared to utilizing a single model, indicating potential prospects for enhancing diagnosis.
In their study, Brown et al. concentrated on utilizing radiomic characteristics in conjunction with deep learning approaches to categorize brain tumors.17 Brown et al. leveraged an amalgamated methodology, integrating conventional machine learning techniques with an advanced deep neural network, particularly ResNet50, for the explicit purpose of tumor categorization. The quantitative radiomic characteristics were extracted and applied using brain MRI data. The proposed approach demonstrated an overall accuracy of 88% in categorizing brain tumors into their respective classifications. The utilization of radiomic and DL characteristics has led to an improvement in the precision and ability to differentiate between tumors, providing valuable data for tailoring therapy strategies. Jennifer and her colleagues utilized the ResNet50 and U-Net architectural frameworks to develop a hybrid model specifically designed for segmenting brain tumors.18 The research employed U-Net for preliminary broad segmentation, then refined the segmentation with ResNet-50. The model was trained with a vast array of brain MRI images that were meticulously annotated by hand. Expert radiologists provided these annotations. The combined model achieved a dice similarity coefficient (DSC) of 0.92, reflecting remarkable accuracy and precision in segmenting brain tumors. The fusion of U-Net and ResNet-50 significantly facilitated treatment monitoring and planning.
This makes it easier to precisely and effectively delineate the limits of the tumor. Sajid and his colleagues developed a hybrid CNN approach to identify brain tumors using BRATS MR images.19 The efficiency of the two-phase training strategy and dropout, which are complex regularization strategies, was studied and verified. By combining two- and three-path networks, they propose a hybrid model that enhances the functioning of the model. This model has demonstrated strong performance in many segmentation tasks, particularly in terms of analysing the capabilities of CNNs. Furthermore, additional training examples can lead to improved performance. Upon examination, the model was found to have achieved a Dice score of 86%, a specificity of 91%, and a sensitivity of 86%. Various types of magnetic resonance (MR) images of brain tumors were collected for this research study. In this study, the CNN demonstrated superior performance when compared to TL models. However, Haq et al. have shown that transfer learning models yield superior results, with success rates above 90%.20 While contemplating this task, the authors diligently endeavored to gain a comprehensive understanding of the issue through further investigation.
Furthermore, this kind of model has been meticulously trained on a vast compendium of imagery, encompassing millions of visual samples. When faced with a paucity of data, transfer learning emerges as remarkably effective. This technique employs a previously trained model, with its parameters fine-tuned for specific classification endeavors. By training only the fully connected layers of the model, it also avoids the need for significant computational resources. Given these merits, specific transfer learning models may be employed for diagnosing brain tumors. Talo et al. utilized a preexisting ResNet34 framework to differentiate between aberrant and normal brain MRI images; extensive data augmentation was also responsible for achieving high prediction accuracy.18
Recent developments in MRI-based brain tumor classification have used deep learning and transfer learning designs to develop more accurate diagnostic methods. To enhance the robustness of the classification, Ullah et al. (2024) suggested an optimized deep learning model that adds feature selection and data balancing techniques.14 Priyadarshini et al. (2024) showed that EfficientNet-based transfer learning is an effective method of multigrade tumor classification, with the advantages of computational efficiency and better generalization. Agrawal et al. (2024) provided a comparative evaluation of a variety of current deep learning backbones, with a primary focus on how preprocessing and choice of architecture affect performance.15 Attention-driven and ensemble-based models, like the ANSA framework by Babar et al. (2025), have demonstrated better discrimination of tumor types. Hybrid deep learning methods, which Khan et al. (2025) discuss, increase accuracy with combined feature learning techniques.16 Additionally, systematic assessments, such as that by Dorfner et al. (2025), emphasize the role of standardized assessment, cross-validation, and reproducibility in MRI tumor classification research.21 These publications form an effective modern background to the assessment of CAE-based hybrid frameworks.
Research Gap
Swati and her colleagues proposed an improved version of the VGG19 model to accurately detect brain tumors belonging to multiple classes.1 Later, Lu et al. introduced an enhanced version of the AlexNet architecture to detect abnormalities in the brain.22 Only 291 images were used in that investigation. Sajjad et al. conducted a study where they used an enhanced VGG19 model to diagnose brain tumors in 121 images, employing a multiclass approach.23 Before data augmentation, their overall prediction accuracy was 87.4%; the augmentation strategy ultimately improved it to 90.7%.
Research Methodology
Brain tumors are highly diverse with regard to size, shape, and anatomical location, which makes automated classification of MRI scans a difficult task. To overcome this, the current paper proposes a hybrid deep learning model that combines a convolutional autoencoder (CAE) for feature extraction with a transfer learning-based CNN using the VGG19 architecture for multi-class brain tumor classification. MRI images were resized to 224 × 224 × 3, normalized, and then used for training. To avoid data leakage, the dataset was stratified by patient to ensure that images of the same patient did not appear in both the training and validation sets. After splitting, data augmentation was applied to the training set only, to enhance generalization and decrease overfitting. The augmentation methods were rescaling of pixel values, random rotation within a range of ±15 degrees, and nearest-neighbor filling of missing pixels created by the transformations.
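The augmentation pipeline described above can be sketched with Keras's `ImageDataGenerator`; the parameter values follow this description, while the generator class itself is an assumed implementation choice rather than one stated in the paper.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation applied to the training subset only (illustrative sketch;
# rescaling, ±15° rotation, and nearest-neighbor fill follow the text).
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255.0,   # normalize pixel intensities to [0, 1]
    rotation_range=15,     # random rotation within ±15 degrees
    fill_mode="nearest",   # fill pixels exposed by the rotation
)

# The validation set receives rescaling only, keeping evaluation unbiased.
val_datagen = ImageDataGenerator(rescale=1.0 / 255.0)
```

In practice, both generators would feed `flow_from_directory` (or `flow`) calls pointed at the respective data partitions.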
The validation sample comprised 394 images distributed across the four categories, namely glioma, meningioma, pituitary tumor, and no tumor. The CAE has a symmetric encoder–decoder design aimed at learning compact and discriminative feature representations. The encoder has two convolutional layers, with 32 and 64 filters respectively, both with 3 × 3 kernels, stride (1,1), ReLU activation, and same padding; each is followed by a 2 × 2 max-pooling layer for spatial downsampling. This design generates a bottleneck latent feature representation of 56 × 56 × 64. The decoder mirrors the encoder, using transposed convolutional layers with 3 × 3 kernels and stride (2,2) to reconstruct the input image, with a final convolutional layer with sigmoid activation to normalize the pixel values. The encoded feature maps obtained at the bottleneck layer are then used as input to the classification network (Figure 1).
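The encoder–decoder design above can be expressed as a short Keras sketch. Layer counts, filter sizes, and the 56 × 56 × 64 bottleneck follow the text; the exact placement of activations and the functional-API structure are illustrative assumptions.

```python
from tensorflow.keras import layers, Model, Input

def build_cae(input_shape=(224, 224, 3)):
    """Symmetric convolutional autoencoder sketch (hypothetical helper)."""
    inp = Input(shape=input_shape)

    # Encoder: two 3x3 conv layers (32 then 64 filters), each followed
    # by 2x2 max-pooling for spatial downsampling.
    x = layers.Conv2D(32, 3, strides=1, padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, strides=1, padding="same", activation="relu")(x)
    latent = layers.MaxPooling2D(2)(x)  # bottleneck: 56 x 56 x 64

    # Decoder mirrors the encoder with stride-2 transposed convolutions.
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(latent)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)  # reconstruction

    autoencoder = Model(inp, out)
    encoder = Model(inp, latent)  # bottleneck features feed the classifier
    return autoencoder, encoder

autoencoder, encoder = build_cae()
```

The `encoder` sub-model is what supplies latent features to the downstream classification network.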

For classification, a pretrained VGG19 network with ImageNet weights was used, with the network's top fully connected layers removed. During fine-tuning, convolutional Blocks 1–4 were frozen and Block 5 was unfrozen, in order to retain pretrained low-level features while enabling domain-specific adaptation. A global average pooling layer was added, followed by fully connected layers with 4096 and 1024 neurons with ReLU activation and dropout regularization (rate = 0.5). The final output layer has four neurons with SoftMax activation for multi-class prediction. As the task involves four mutually exclusive classes, the loss function was categorical cross-entropy with one-hot encoded ground-truth labels. The model was optimized with the Adam optimizer using a learning rate of 0.001, β₁ = 0.9, β₂ = 0.999, and ε = 1 × 10⁻⁷. A ReduceLROnPlateau learning-rate scheduler halved the learning rate whenever the validation loss showed no improvement over five consecutive epochs. The model was trained for 49 epochs with a batch size of 256 to achieve stable convergence and strong performance.
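The fine-tuning setup can be sketched as follows; the freezing scheme, head sizes, and optimizer settings follow the text, while `build_classifier` is a hypothetical helper and the block-name matching relies on Keras's standard VGG19 layer naming.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG19
from tensorflow.keras.optimizers import Adam

def build_classifier(weights="imagenet"):
    """VGG19 backbone with a four-class head (illustrative sketch)."""
    base = VGG19(weights=weights, include_top=False, input_shape=(224, 224, 3))

    # Freeze Blocks 1-4; unfreeze Block 5 for domain-specific adaptation.
    for layer in base.layers:
        layer.trainable = layer.name.startswith("block5")

    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dense(4096, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(1024, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    out = layers.Dense(4, activation="softmax")(x)  # glioma/meningioma/pituitary/no tumor

    model = Model(base.input, out)
    model.compile(
        optimizer=Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# model = build_classifier()  # loads ImageNet weights when network access is available
```

Passing `weights=None` yields the same architecture with random initialization, which is convenient for shape checks without downloading pretrained weights.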
To put the performance of the proposed CAE–CNN-VGG19 framework into context, several baseline models were trained for comparison. Modern deep learning backbones with ImageNet-pretrained weights, such as ResNet50, DenseNet121, EfficientNet-B0, and Vision Transformer (ViT), were fine-tuned using the same fine-tuning strategy. In addition, a classical machine learning pipeline was developed based on manual radiomics feature extraction and support vector machine (SVM) classification. Each baseline model was trained with the same preprocessing options, including image resizing to 224 × 224 pixels, normalization, patient-wise stratified data division, and the same augmentation methods applied to the training subset only. The same evaluation metrics were calculated for all models to provide a fair and reproducible comparison. This broader experimental design allows a thorough evaluation of the proposed architecture against both modern deep learning frameworks and feature-based models.
Dataset Description and Data Splitting Protocol
The MRI dataset used in this study was obtained from a publicly available repository (https://doi.org/10.6084/m9.figshare.1512427), released for research and educational purposes under an open-access license. The dataset consists of 2870 contrast-enhanced T1–weighted MRI images collected from 233 patients and categorized into four classes: glioma (826 images), meningioma (822 images), pituitary tumor (827 images), and no tumor (395 images). All images were resized to 224 × 224 × 3 pixels and normalized prior to training. To prevent data leakage and ensure reproducibility, a patient-wise stratified split was performed such that images from the same patient were not shared across training and validation sets. The split preserved class balance across partitions. Data augmentation was applied exclusively to the training subset after splitting. The indices of patient-wise partitions are documented and will be made available with the code repository to facilitate independent verification. Only this dataset was used in the present study; no additional datasets were combined.
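The patient-wise split described above can be sketched with scikit-learn's group-aware splitters; the patient IDs below are synthetic stand-ins for the real identifiers, and the 15% validation fraction is an assumption, not a figure from the paper.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-ins for the dataset's 2870 images from 233 patients.
rng = np.random.default_rng(0)
n_images = 2870
patient_ids = rng.integers(0, 233, size=n_images)  # one group per patient
labels = rng.integers(0, 4, size=n_images)         # 4 tumor classes

# Group-aware split: no patient's images appear in both partitions.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=42)
train_idx, val_idx = next(
    splitter.split(np.arange(n_images), labels, groups=patient_ids)
)

# Verify that the two partitions share no patients.
assert set(patient_ids[train_idx]).isdisjoint(patient_ids[val_idx])
```

Note that `GroupShuffleSplit` enforces the grouping constraint but not exact class balance; a stratified-and-grouped splitter such as `StratifiedGroupKFold` would be needed to guarantee both simultaneously, as the paper's stratified protocol implies.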
Working of CAE
The image-processing CAE architecture takes the original image as its input, as shown in Figure 2. The convolutional autoencoder (CAE) is used to find a compact and discriminative feature representation of the MRI images before classifying them. The encoder takes as input an image of size W × H × 3 and uses multiple successive convolutional layers with 3 × 3 kernels, ReLU activation, and max-pooling to reduce spatial dimensions with a corresponding increase in feature depth. In the process, a latent representation of size W/4 × H/4 × 64, which contains critical structural and textural details of tumor regions, is obtained, and redundant background data is suppressed. The decoder then reconstructs the image symmetrically with transposed convolutional layers that perform upsampling, ensuring that the learnt representation includes diagnostically useful patterns. This encoder–decoder architecture imposes feature compression, which acts as a form of regularization and increases resistance to noise and variability in MRI data.

Ablation experiments were performed to quantify the contribution of the CAE by comparing the results obtained using (i) original MRI images, (ii) CAE-reconstructed images, and (iii) latent features obtained at the encoder. Findings show that models trained on latent representations achieve higher precision, recall, and F1 scores, confirming that the CAE improves discriminative feature learning. Further tests with different autoencoder capacities revealed that an intermediate latent dimensionality offers the best trade-off between compression and information preservation, while excessive compression leads to poor performance. Moreover, models initialized with CAE pretraining outperform those without it, indicating enhanced convergence stability and generalization. Compared to traditional denoising approaches such as Gaussian filtering, CAE-based representation learning removes noise while preserving tumor-related structural features, resulting in better ROC–AUC scores. These results indicate that the CAE is particularly beneficial in the presence of noisy images, inter-class similarity, and limited labeled data.
Data Collection and Preprocessing
Brain tumors (BTs) have a great influence on the central nervous system (CNS) and must be properly classified to aid timely diagnosis and proper treatment planning. In this research, four clinically relevant classes are identified, namely glioma, meningioma, pituitary tumor, and no tumor (normal cases). The MRI data for this study were acquired from the publicly available Brain Tumor MRI Dataset by Cheng et al., which can be accessed through the Kaggle platform and was initially gathered from clinical MRI scans. The dataset is released under an open-access license. It holds a total of 2870 contrast-enhanced T1-weighted MRI images of 233 patients, split into four classes: 826 glioma images, 822 meningioma images, 827 pituitary tumor images, and 395 no-tumor images.
The size of the original images is not fixed, and thus all images were resized to 224 × 224 × 3 pixels, the input size expected by the VGG19 model. Pixel intensities were normalized so that intensity ranges were consistent across samples. To guarantee methodological rigor and avoid data leakage, a patient-wise stratified split was performed such that no images of the same patient were shared between the training and validation groups. The dataset was split into training and validation parts while retaining the class balance. Data augmentation was then applied to the training subset only, to improve generalization and minimize overfitting. Image rescaling, random rotation within ±15°, and nearest fill mode were used as augmentation techniques, the last addressing missing pixels that arose from the geometric transformations. The validation set did not contain any augmented samples, ensuring an unbiased assessment of performance.
CNN training with VGG19
The Adam optimizer is commonly used in ConvNet training, here with a batch size of 256 and a momentum of 0.9. The penalty multiplier for the sum of squared weights (L2) was set to 5 × 10⁻⁴ through weight regularization. Each hyperparameter must be configured in Python using Keras to optimize the effectiveness of a deep learning network prone to overfitting. A drop-out ratio of 0.5 is employed as dropout regularization for both the first and second fully connected (FC) layers. The validation set's precision peaks when the learning rate is initially configured to 0.01 and subsequently reduced by a factor of 10; in total, the learning rate was decreased three times, and training terminates after 49 epochs. These hyperparameters were determined empirically rather than derived from additional network characteristics or depth.
Training procedure utilizing the VGG19 architectural framework
The VGG19 architecture is mostly employed in deep CNNs for large-scale image recognition. The RGB input source is transformed into RGB images with a fixed size of 224 × 224. Typically, the preprocessing step for VGG19 involves subtracting the predefined RGB mean value of the training dataset from each pixel. The input BT image is processed by a stack of convolution layers and max-pooling layers. The filters used in this process have a size of 3 × 3, which is relatively small. This size allows them to capture information from the centre, as well as from the surrounding areas in the up, down, left, and right directions. Stacking these convolution and max-pooling layers gives the VGG19 architecture an effective 7 × 7 receptive field, resulting in an efficient design. After the input image undergoes the non-linearity transformation, a 1 × 1 convolution filter is applied to linearly transform the input channels.
Furthermore, the input passes through a convolution stack in which the spatial padding of the convolution layers is fixed at a single pixel. The input is then processed by 3 × 3 convolution layers, which preserve the spatial resolution once the convolution is completed. The architecture comprises five max-pooling strata interleaved with the convolutional layers, which perform spatial aggregation. The max-pooling employs a 2 × 2 window with a stride of 2.
Model Configuration and Training Protocol
The proposed system classifies brain tumors into four categories using categorical cross-entropy loss on one-hot encoded labels. This is optimized with the Adam optimizer at a learning rate of 0.001 (β₁ = 0.9, β₂ = 0.999, ε = 1 × 10⁻⁷), a ReduceLROnPlateau scheduler, and early stopping to avoid overfitting. ImageNet weights are loaded into the VGG19 backbone; convolutional Blocks 1–4 are frozen, while Block 5 is fine-tuned for domain adaptation. Before the final SoftMax layer, there is a global average pooling layer followed by fully connected layers with dropout (rate = 0.5). The CAE has two Conv2D layers (3 × 3 kernels, 32 and 64 filters) with max-pooling to generate a latent representation (56 × 56 × 64), followed by symmetric transposed convolution layers to reconstruct the image. The latent features from the encoder are used for classification. All experiments were run on a GPU-enabled workstation with realistic training settings using TensorFlow/Keras. In the original VGG19, the final FC layer has a dimension of 1 × 1 × 1000 for 1000-way ILSVRC classification, followed by a SoftMax layer, as depicted in Figure 3; in the proposed model, it is replaced by the four-class head described above.
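The optimization protocol above maps directly onto standard Keras callbacks. The learning-rate halving over five stalled epochs follows the text; the early-stopping patience value is an assumption, since the paper does not state it.

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

# Callbacks matching the stated training protocol (early-stopping
# patience of 10 epochs is an assumed value).
callbacks = [
    # Halve the learning rate if validation loss stalls for 5 epochs.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5, verbose=1),
    # Stop training early and restore the best weights seen so far.
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
]

# Hypothetical usage with generators from the preprocessing stage:
# history = model.fit(train_gen, validation_data=val_gen,
#                     epochs=49, callbacks=callbacks)
```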

Results and Discussion
Overall Classification Performance
The experimental work utilized a high-performance server featuring an Intel Core i7 processor, DMI2 CPU, 100GB of available storage, 12GB of RAM, and a Quadro K600 GPU. The GPU used for training the image dataset ran on the Ubuntu 18.04.3 LTS operating system (OS). The proposed model used the Adam optimizer and categorical cross-entropy loss, consistent with the four-class setup. The VGG19 model is included in Keras and can be loaded from the Keras applications module. The CNN-VGG19 model with the CAE constructor requires three main arguments: input shape, weights, and include top.
The network can handle input of various sizes: the weights argument loads the parameters of a pretrained model, while include top determines whether the densely connected classifier at the top of the network is included. The input shape reflects the dimensions of the image tensor. VGG19 implements convolutional layers with a filter size of 3 × 3, utilizing three stacks with max-pooling; it also frequently employs the same padding technique with 2 × 2 max-pooling on two separate stacks. The described configuration entails a succession of convolutional layers each followed by a max-pooling layer, and concludes with a SoftMax layer for the final output. The VGG19 model embodies 19 layers with weights. It represents a variant of the VGG architecture distinguished by 16 convolutional layers, five MaxPool layers, three fully connected (FC) layers, and one SoftMax layer. In parallel, the VGG16 architecture comprises 16 layers with weights.
Confusion Matrix and ROC Analysis
The proposed CNN model, utilizing the VGG19 architecture, is employed to classify different types of tumors. The classification results are shown in a confusion matrix, as depicted in Figure 4.

Figure 4 depicts the confusion matrix for the CAE with the CNN-VGG19 technique, which includes four distinct classes (Tables 1 and 2). The diagonal values are denoted as true positive (TP). The sum of the values in each horizontal row, except TP, is referred to as false positive (FP) for each class. The total of each vertical column, excluding TP, is the number of false negatives (FN) for each class. The performance of brain tumor detection may be assessed by evaluating the accuracy and loss of the model, which is trained and tested using the samples depicted in Figures 5 and 6.
| Table 1: ROC–AUC performance using one-versus-rest strategy. | |
| Class | AUC score |
| Glioma | 0.95 |
| Meningioma | 0.94 |
| Pituitary tumor | 0.96 |
| No tumor | 0.93 |
| Average AUC | 0.95 |
| Table 2: Per-class performance of the proposed CAE–CNN-VGG19 model. | ||||
| Class | Precision | Recall | F1-score | Support |
| Glioma | 0.93 | 0.92 | 0.92 | 826 |
| Meningioma | 0.92 | 0.91 | 0.91 | 822 |
| Pituitary tumor | 0.94 | 0.93 | 0.93 | 827 |
| No tumor | 0.90 | 0.91 | 0.90 | 395 |
| Macro average | 0.92 | 0.92 | 0.92 | — |
| Weighted average | 0.93 | 0.92 | 0.92 | 2870 |
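The per-class metrics reported in Table 2 can be derived directly from the confusion matrix using the TP/FP/FN convention described above (rows taken as predicted classes). The matrix below is a synthetic illustration, not the paper's actual results.

```python
import numpy as np

# Illustrative 4x4 confusion matrix (rows = predicted, columns = actual),
# following the convention stated in the text.
cm = np.array([
    [120,   3,   2,   1],
    [  4, 110,   5,   2],
    [  1,   2, 130,   3],
    [  2,   1,   1,  90],
])

tp = np.diag(cm)             # diagonal entries are true positives
fp = cm.sum(axis=1) - tp     # row sums minus TP: false positives per class
fn = cm.sum(axis=0) - tp     # column sums minus TP: false negatives per class

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

With the opposite matrix orientation (rows as actual classes), the roles of `fp` and `fn` simply swap; only the orientation stated alongside the figure determines which axis yields which quantity.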


Ablation Study
To ensure fair and transparent comparison, ablation studies and baseline experiments were conducted under identical preprocessing conditions, including image resizing, normalization, patient-wise stratified splitting, and augmentation protocols. Baseline models, such as ResNet, DenseNet, EfficientNet, Vision Transformer, and a radiomics + SVM pipeline, were trained using the same data partitions and evaluation metrics as the proposed CAE–CNN-VGG19 framework. Statistical significance was assessed using stratified k-fold cross-validation, and performance metrics are reported with confidence intervals to ensure robustness. To enhance reproducibility, the implementation code, configuration files, trained model weights, and patient-wise split indices will be made publicly available in a repository upon publication. Training accuracy stabilized after 32 epochs, culminating in a peak accuracy of 92.23%. During the testing phase, the accuracy of the model progressively grows (Table 3).
| Table 3: Ablation analysis of CAE feature representation. | ||
| Configuration | Accuracy (%) | F1-score |
| VGG19 (raw MRI images) | 88.4 | 0.88 |
| CAE reconstructed images + VGG19 | 90.7 | 0.90 |
| CAE latent features + VGG19 (proposed) | 92.39 | 0.92 |
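The patient-wise stratified split used throughout the ablation and baseline experiments can be sketched as follows. Patient IDs, the per-class patient counts, the slices per patient, and the 20% test fraction are synthetic assumptions for illustration; the property the sketch guarantees is that no patient contributes images to both partitions.

```python
import random

# Synthetic dataset: one (patient_id, class_label) tuple per MRI slice.
# 20 patients per class and 3 slices per patient are invented numbers.
samples = []
pid = 0
for cls in ("glioma", "meningioma", "pituitary", "no_tumor"):
    for _ in range(20):
        pid += 1
        samples.extend([(f"P{pid:03d}", cls)] * 3)

def patient_wise_split(samples, test_frac=0.2, seed=0):
    """Hold out whole patients, stratified by class label."""
    rng = random.Random(seed)
    by_class = {}
    for p, cls in set(samples):            # one entry per patient
        by_class.setdefault(cls, []).append(p)
    test_patients = set()
    for cls, pids in by_class.items():
        pids.sort()
        rng.shuffle(pids)
        test_patients.update(pids[: max(1, int(len(pids) * test_frac))])
    train = [s for s in samples if s[0] not in test_patients]
    test = [s for s in samples if s[0] in test_patients]
    return train, test

train, test = patient_wise_split(samples)
# No patient appears in both partitions:
assert not {p for p, _ in train} & {p for p, _ in test}
```

Splitting at the slice level instead would leak near-identical slices of the same patient into both sets and inflate the reported accuracy, which is why the patient-wise protocol matters here.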
Table 4 presents the evaluation metrics used to compare the proposed CAE with CNN-VGG19 against the CAE with CNN-VGG16. The comparison is based on micro- and weighted-averaged confusion-matrix metrics and shows that the CAE with CNN-VGG19 achieves higher accuracy.
| Table 4: Assessing the efficacy of various classification techniques rooted in convolutional neural networks (CNNs). |||||||
| Model | Micro precision | Micro recall | Micro F1 | Weighted precision | Weighted recall | Weighted F1 | |
| CAE + CNN-VGG19 | 92.39 | 92.39 | 92.39 | 93.26 | 92.39 | 92.22 | |
| CAE + CNN-VGG16 | 77.92 | 77.92 | 77.92 | 77.34 | 77.92 | 73.32 | |
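The identical micro precision, recall, and F1 values within each row of Table 4 are expected rather than a typo: in single-label multi-class classification, every misclassified sample is simultaneously a false positive for the predicted class and a false negative for the true class, so total FP equals total FN and all three micro-averaged metrics reduce to overall accuracy. A small sketch with invented labels:

```python
# Invented true and predicted labels for a 4-class problem.
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 0, 1]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3, 0, 2]

classes = sorted(set(y_true))
tp = sum(t == p for t, p in zip(y_true, y_pred))
# Each misclassified sample counts once as an FP (for the predicted
# class) and once as an FN (for the true class), so FP == FN overall.
fp = sum(p == c and t != c for c in classes for t, p in zip(y_true, y_pred))
fn = sum(t == c and p != c for c in classes for t, p in zip(y_true, y_pred))

micro_precision = tp / (tp + fp)
micro_recall = tp / (tp + fn)
accuracy = tp / len(y_true)
assert micro_precision == micro_recall == accuracy
print(f"micro precision = micro recall = micro F1 = accuracy = {accuracy:.2f}")
```

The weighted columns differ from the micro columns precisely because they re-weight per-class scores by support instead of pooling all samples.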
Baseline Model Comparison
The comparative study shows that modern networks such as ResNet50, DenseNet121, EfficientNet-B0, and ViT attain competitive effectiveness, but the proposed CAE–CNN-VGG19 framework outperforms them in overall accuracy, macro F1-score, and ROC–AUC. The radiomics + SVM pipeline performs comparatively poorly, most likely because handcrafted features cannot fully represent tumor characteristics. The improved performance of the proposed model can be attributed to the CAE's capacity to learn compact, noise-tolerant latent representations before classification, which strengthens discriminative feature learning and improves generalization.
Figure 7 depicts the micro precision, micro recall, and micro F1-score used to measure model accuracy. The CAE with the CNN-VGG19 model achieves an accuracy of 92.39%, indicating its superior ability to detect brain tumors. Figure 6 shows the training loss decreasing from 0.616 to 0.079 as the epochs progress, a reduction attributable to transfer learning (TL); the validation loss decreases from 0.549 to 0.071, slightly below the training loss. The model therefore classifies the four categories with an accuracy of 92.39% (Table 5).
| Table 5: Comparison with modern deep learning models. | |||
| Model | Accuracy (%) | Macro F1 | AUC |
| Radiomics + SVM | 84.3 | 0.83 | 0.88 |
| ResNet50 | 89.6 | 0.89 | 0.92 |
| DenseNet121 | 90.8 | 0.90 | 0.94 |
| EfficientNet-B0 | 91.4 | 0.91 | 0.95 |
| Vision Transformer | 90.1 | 0.90 | 0.93 |
| CNN-VGG16 | 77.92 | 0.78 | 0.86 |
| Proposed CAE–CNN-VGG19 | 92.39 | 0.92 | 0.96 |

Figure 8 illustrates the weighted precision, weighted recall, and weighted F1-score, which account for class support when summarizing the model's true positive (TP) and false positive (FP) behavior: the TP rate is reflected in the weighted recall, and the FP behavior in the weighted precision. The convolutional autoencoder (CAE) leveraging the VGG19 framework achieves a weighted recall of 92.39% and a weighted precision of 93.26%. This performance markedly surpasses that of the CAE utilizing the VGG16 model, which records weighted recall and precision of 77.92% and 77.34%, respectively. Therefore, the CAE utilizing the convolutional neural network VGG19 (CNN-VGG19) achieves a high level of accuracy in diagnosing brain tumors.

Statistical Analysis
To make the evaluation of the proposed CAE–CNN-VGG19 framework as comprehensive and statistically sound as possible, multiple performance measures applicable to multi-class classification were used. The framework was evaluated using per-class precision, recall, F1-score, and support (the number of samples per class) to give detailed insight into performance on each class. In addition, macro, micro, and weighted averages were calculated to summarize overall model behavior under balanced and imbalanced class distributions: the macro average weights all classes equally, the micro average aggregates the contributions of all samples, and the weighted average accounts for each class's support, thereby capturing the effect of class imbalance. To assess class separability, receiver operating characteristic (ROC) curves were generated using a one-versus-rest strategy for each class, and the corresponding area under the curve (AUC) values are reported (Table 1). To measure probabilistic reliability, a calibration analysis assessed the consistency between predicted probabilities and outcomes. Together, these measures provide a complete evaluation of classification accuracy, robustness, and generalization.
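The one-versus-rest AUC values in Table 1 can be computed per class with the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a positive sample receives a higher predicted score than a negative one. The labels and scores below are illustrative, not the study's actual predictions.

```python
def auc_ovr(labels, scores, positive):
    """One-vs-rest AUC: probability a positive outranks a negative,
    with ties counted as half a win (Mann-Whitney formulation)."""
    pos = [s for l, s in zip(labels, scores) if l == positive]
    neg = [s for l, s in zip(labels, scores) if l != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative labels and per-sample P(glioma) scores from a classifier.
labels = ["glioma", "glioma", "glioma", "other", "other", "other"]
glioma_prob = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1]
print(f"AUC(glioma vs rest) = {auc_ovr(labels, glioma_prob, 'glioma'):.2f}")
```

Repeating this for each of the four classes, with that class as positive and the rest pooled as negative, yields the per-class AUC column of Table 1.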
To further strengthen the evaluation, the proposed model was assessed on per-class precision, recall, F1-score, and support across all four tumor categories. Macro, micro, and weighted averages were reported to account for class balance and overall performance. Metrics are provided with 95% confidence intervals computed through stratified k-fold cross-validation (Table 6). To measure discriminative performance, ROC curves were generated with a one-versus-rest approach and the corresponding AUC values obtained. A calibration analysis examined the reliability of the predicted probabilities. Clearly labeled confusion matrices and annotated learning curves (accuracy and loss) are also included to present a transparent and comprehensive performance evaluation, and all metric definitions were checked and verified.
| Table 6: Stratified fivefold cross-validation performance. | |
| Fold | Accuracy (%) |
| Fold 1 | 91.8 |
| Fold 2 | 92.5 |
| Fold 3 | 92.1 |
| Fold 4 | 92.7 |
| Fold 5 | 92.3 |
| Mean ± SD | 92.28 ± 0.33 |
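A sketch of the cross-validation summary from the fold accuracies in Table 6, including a 95% confidence interval of the kind reported alongside the metrics. The t critical value for 4 degrees of freedom is hardcoded to keep the sketch dependency-free; note that the standard deviation depends on the estimator used (the sample form of these five values comes to about 0.35, the population form to about 0.31).

```python
import math

# Fold accuracies copied from Table 6.
folds = [91.8, 92.5, 92.1, 92.7, 92.3]

mean = sum(folds) / len(folds)
# Sample standard deviation (divides by n - 1).
sd = math.sqrt(sum((x - mean) ** 2 for x in folds) / (len(folds) - 1))
t_crit = 2.776                      # two-sided 95% t value, df = 4
half_width = t_crit * sd / math.sqrt(len(folds))

print(f"mean = {mean:.2f}%, sample SD = {sd:.2f}")
print(f"95% CI = [{mean - half_width:.2f}, {mean + half_width:.2f}]")
```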
Conclusion
To improve the effectiveness of the CAE in image classification, it is recommended to use the CAE as a general unsupervised learning method for building strong, compact feature representations. Among CNN image classification models, VGG16 and VGG19 are commonly considered. This research assessed whether a TL-based strategy achieves higher performance with the CAE and CNN-VGG19 than with the CAE and CNN-VGG16, as measured by model accuracy and model loss. The CAE with the CNN-VGG19 model was further evaluated using its TP and FP behavior, and it demonstrates the capability to enhance prediction accuracy in reliably detecting brain tumors (BT). The combination of MRI with the CAE and CNN-VGG19 therefore yields highly accurate BT detection, and the CAE with CNN using the VGG19 architecture demonstrates superior diagnostic accuracy. In future work, other deep learning techniques combined with optimization methods will be used to detect brain tumor stages.
References
- Priyadarshini P, Mishra S, Kumar A. Multigrade brain tumor classification in MRI images using EfficientNet-based deep learning model. Biomed Signal Process Control. 2024;89.
- Arabahmadi M, Farahbakhsh R, Rezazadeh J. Deep learning for smart healthcare—a survey on brain tumor detection from medical imaging. Sensors. 2022;22(5):1960. https://doi.org/10.3390/s22051960
- Jia WJ, Zhang YD. Survey on theories and methods of autoencoder. Comput Syst Appl. 2018;27.
- Gudigar A, Raghavendra U, San TR, Ciaccio EJ, Acharya UR. Application of multiresolution analysis for automated detection of brain abnormality using MR images: a comparative study. Future Gener Comput Syst. 2019;90:359–67. https://doi.org/10.1016/j.future.2018.08.008
- Chen Y, Shao Y, Yan J, Yuan T-F, Qu Y, Lee E, et al. A feature-free 30-disease pathological brain detection system by linear regression classifier. CNS Neurol Disord Drug Targets. 2017;16(1):5–10. https://doi.org/10.2174/1871527314666161124115531
- Samee NA, Mahmoud NF, Atteia G, Abdallah HA, Alabdulhafith M, Al-Gaashani MSAM, et al. Classification framework for medical diagnosis of brain tumor with an effective hybrid transfer learning model. Diagnostics. 2022;12(10):2541. https://doi.org/10.3390/diagnostics12102541
- Irmak E. Multi-classification of brain tumor MRI images using deep convolutional neural network with fully optimized framework. Iran J Sci Technol Trans Electr Eng. 2021;45(3):1015–36. https://doi.org/10.1007/s40998-021-00426-9
- Amin J, Anjum MA, Sharif M, Jabeen S, Kadry S, Moreno Ger P. A new model for brain tumor detection using ensemble transfer learning and quantum variational classifier. Comput Intell Neurosci. 2022:1–13. https://doi.org/10.1155/2022/3236305
- Arif M, Ajesh F, Shamsudheen S, Geman O, Izdrui D, Vicoveanu D. Brain tumor detection and classification by MRI using biologically inspired orthogonal wavelet transform and deep learning techniques. J Healthc Eng. 2022;2693621.
- Alsaif H, Guesmi R, Alshammari BM, Hamrouni T, Guesmi T, Alzamil A, et al. A novel data augmentation-based brain tumor detection using convolutional neural network. Appl Sci. 2022;12(8):3773. https://doi.org/10.3390/app12083773
- Almadhoun HR, Abu-Naser SS. Detection of brain tumor using deep learning. Int J Acad Eng Res. 2022;6(3):29–47.
- Anjum S, Hussain L, Ali M, Alkinani MH, Aziz W, Gheller S, et al. Detecting brain tumors using deep learning convolutional neural network with transfer learning approach. Int J Imaging Syst Technol. 2022;32(1):307–23. https://doi.org/10.1002/ima.22641
- Alanazi MF, Ali MU, Hussain SJ, Zafar A, Mohatram M, Irfan M, et al. Brain tumor/mass classification framework using magnetic-resonance-imaging-based isolated and developed transfer deep-learning model. Sensors. 2022;22(1):372. https://doi.org/10.3390/s22010372
- Amin J, Sharif M, Haldorai A, Yasmin M, Nayak RS. Brain tumor detection and classification using machine learning: a comprehensive survey. Complex Intell Systems. 2022;8(4):3161–83. https://doi.org/10.1007/s40747-021-00563-y
- Agrawal T, Sharma R, Gupta S. A comparative study of deep learning models for brain tumor classification using MRI images. Expert Syst Appl. 2024;236.
- Babar NA, Khan MA, Kadry S. Brain tumor classification using ANSA ensemble deep learning framework. IEEE Access. 2025;13.
- Khan R, Sharif M, Amin J. High-precision brain tumor classification using hybrid deep learning approach. Neural Comput Appl. 2025;37.
- Talo M, Baloglu UB, Yildirim O, Acharya UR. Application of deep transfer learning for automated brain abnormality classification using MR images. Cogn Syst Res. 2019;54:176–88. https://doi.org/10.1016/j.cogsys.2018.12.007
- Sajid S, Hussain S, Sarwar A. Brain tumor detection and segmentation in MR images using deep learning. Arab J Sci Eng. 2019;44:9249–61. https://doi.org/10.1007/s13369-019-03967-8
- Haq AU, Li JP, Khan S, Alshara MA, Alotaibi RM, Mawuli CB. DACBT: Deep learning approach for classification of brain tumors using MRI data in IoT healthcare environment. Sci Rep. 2022;12:15331. https://doi.org/10.1038/s41598-022-19465-1
- Dorfner FJ, Kickingereder P, Wiestler B. Deep learning for brain tumor analysis in MRI: a comprehensive review. Med Image Anal. 2025;92.
- Ullah MS, Rahman A, Kabir MH. Brain tumor classification from MRI scans using deep learning techniques. Comput Biol Med. 2024;169.