Machine Learning Approaches to Injury Risk Prediction in Sport


Iftikhar Khan¹, Hafsa Ali², Menahil Faheem Siddiqui³ and Gopika Mohanakumaran Nair Geetha⁴
1. FMH College of Medicine and Dentistry, Lahore, Pakistan
2. Sindh Medical College, Jinnah Sindh Medical University, Karachi, Pakistan
3. Karachi Medical and Dental College, Karachi, Pakistan
4. Independent Research, Sheffield, UK
Correspondence to: Iftikhar Khan, iffykhandir@gmail.com

DOI: https://doi.org/10.70389/PJCS.100012

Additional information

Ethical approval: N/a
Consent: N/a
Funding: No industry funding
Conflicts of interest: N/a
Author contribution: Iftikhar Khan, Hafsa Ali, Menahil Faheem Siddiqui and Gopika Mohanakumaran Nair Geetha – Conceptualization, Writing – original draft, review and editing
Guarantor: Iftikhar Khan
Provenance and peer-review: Unsolicited and externally peer-reviewed
Data availability statement: N/a

Keywords: Injury risk prediction, Machine learning models, Sports analytics, Athlete monitoring, Random Forest and xgboost.

Peer Review
Received: 12 May 2025
Last revised: 12 July 2025
Accepted: 12 July 2025
Version accepted: 5
Published: 28 August 2025

Plain Language Summary Infographic

A professional infographic summarizing how machine learning predicts sports injury risk. It highlights Random Forest and XGBoost as leading models, with deep learning and hybrid approaches also noted. Key predictive features include training load, sleep quality, and previous injury history. The design uses athlete icons, sports symbols, and machine learning graphics with vibrant colors to illustrate strengths, limitations, and ethical considerations of ML in sports science

Abstract

Injury prediction has emerged as a priority within sports science and performance management to limit athlete downtime and enhance performance results. Machine learning (ML) has rapidly transformed the field by enabling the detection of complex patterns and risk factors from diverse data sources related to athletes. This narrative review provides an overview of the current knowledge on ML methods, key results, and the challenges encountered when applying them to injury risk prediction in various sports. This is among the first review articles to bring together and examine how ML is used for predicting injuries across various sports and different types of models. This review offers practical guidance for sports scientists and clinicians looking to incorporate simple-to-understand ML models into their athlete monitoring and injury prevention efforts. A thorough literature search was conducted using databases such as PubMed, Scopus, and Google Scholar, with keywords including “injury prediction,” “machine learning,” “sports analytics,” and “athlete monitoring.”

Tree-based methods, especially Random Forests and XGBoost variants, consistently performed best, effectively managing non-linear and multi-factorial inputs. Deep and hybrid models remain promising, particularly for multi-modal data sets; however, their poor interpretability limits widespread use. Training load, sleep quality, and previous injury status were key predictive features. However, unbalanced datasets, inconsistent injury definitions, and broad prediction windows undermine the generalizability and clinical relevance of current approaches. Although the accuracy measures reported for some models are very high, their utility in real-world settings is limited by small sample sizes and high methodological heterogeneity. For ML to be applied in practice, standardized data protocols need to be instituted, the explainability of injury risk models must be enhanced, and ethical frameworks for data use must be established.

Highlights

Machine learning (ML) techniques, particularly Random Forest and XGBoost, are leading tools for predicting the risk of sports injuries.
Critical predictors across studies include training load, sleep quality, and a history of previous injuries.
Deep learning and hybrid models offer advanced capabilities but require improved interpretability for clinical adoption.
Future progress requires standardized methodologies, ethical data practices, and athlete-centered model validation.

Introduction

Children and young athletes participate in sports at high rates, which makes sport a leading cause of injury in this group. A cross-sectional study and a review article indicate that around 20% of school children miss one day of school due to a sports injury, and about one in three seek medical attention for a sports injury.^1,2 To prevent these injuries, a four-step model called Mechelen’s Model has been used for over 25 years. The model involves monitoring injury rates, identifying risk factors, developing prevention strategies, and assessing their effectiveness. Another key approach is conducting Randomized Controlled Trials (RCTs) to examine and understand the efficacy of prevention methods. Case-control and cohort study designs can also be employed. The most efficient strategy is therefore the one that can be readily adopted and produces realistic and efficient results.¹ Sports injuries have profound consequences for athletes. Injured athletes are prone to mental health issues, including disordered eating, anxiety, depression, and suicidal thoughts. Furthermore, sports injuries can lead to career termination and negative behavior towards colleagues and family members, and can disturb athletes’ physical and emotional well-being.³

The traditional approach to preventing sports injuries incorporates evidence from public health and medical research. Research is usually designed around RCTs, with reviews built on them. Two further approaches are also used, namely “evidence-based practice” and “practice-based evidence.” Evidence-based practice involves making decisions in a specific situation based on available data and research. Conversely, practice-based evidence involves addressing everyday medical emergencies, rather than implementing interventions and evaluating their effectiveness.⁴ The traditional method has certain drawbacks, however. One is the “research-practice gap,” where the solutions proposed for a specific injury assume an ideal situation that is rarely observed in practice. Another is that practitioners trained to prevent or manage sports injuries have different professional experiences (full-time professional coaches vs. part-time volunteer community coaches), which disrupts the implementation of research-based solutions.⁴

In modern times, AI is having a substantial positive impact on the prevention of sports injuries. Combined with machine learning (ML), it can draw on data about injury history, training loads and techniques, body measurements, health history, and genetic history to create training programs tailored to each individual, thereby reducing injury risk and optimizing athlete performance. Moreover, AI is evolving to forecast future data from current data, alerting athletes and those working with them to potential injuries and health emergencies that may impair performance. This goal is being met through wearable devices that provide immediate feedback for decision-making during sports. AI also supports automation in training, performance tracking, and injury prevention by utilizing chatbots and motion sensors to guide athletes through exercises.⁵ In the following sections, we describe the methodology used to identify and select relevant studies on ML applications in sports injury prediction. The Methods detail our systematic search strategy, inclusion criteria, and evidence appraisal approach. The Results summarize findings across various ML models, sports, and performance metrics. Finally, the Discussion critically evaluates the models’ strengths and limitations, practical challenges, and ethical considerations, and provides future research directions.

Methodology

This review has been conducted in accordance with the PRISMA 2020 guidelines for systematic reviews.⁶ A comprehensive literature search was performed using multiple databases, including PubMed, Scopus, Google Scholar, and IEEE Xplore. The following Boolean search strategy was used across databases:

(“injury prediction” OR “injury risk”) AND (“machine learning” OR “artificial intelligence”) AND (“sports” OR “athletes” OR “sports analytics” OR “athlete monitoring”)

Searches were restricted to articles published between January 2010 and March 2025, ensuring the inclusion of contemporary ML methodologies. Language filters were applied to include studies published in English and Spanish only. Inclusion criteria encompassed observational studies evaluating ML techniques for sports injury prediction. No restrictions were applied regarding the age, gender, level of play (from amateur to professional), or type of injury of the athletes. Studies were selected in a multi-step process: initial title and abstract screening was followed by full-text assessment for eligibility. Discrepancies were resolved by consensus among reviewers. Figure 1 provides a PRISMA 2020 flow diagram adapted to the study selection process. A total of 40 articles were ultimately included after screening 380 records.

Fig 1 | PRISMA-style flow diagram of study selection process
Note: This PRISMA diagram is adapted for use in a narrative review.

Evidence Appraisal

To assess the certainty of evidence in the included studies, we applied the GRADE-Narrative approach, adapted for narrative reviews.⁷ This method evaluates the quality of evidence across key outcomes based on study limitations, inconsistency, indirectness, imprecision, and publication bias. Each outcome was rated as high, moderate, low, or very low certainty of evidence. Studies were grouped by ML model type and sport-specific application, and quality was synthesized narratively. The results of this grading are summarized in Table 1, which reports the certainty of evidence for each ML model applied in sports injury prediction. Factors considered include risk of bias, consistency, imprecision, and potential publication bias.

Table 1: Summary of evidence certainty by ML model (GRADE-narrative assessment).
ML Model	Evidence Base	Risk of Bias	Consistency	Imprecision	Publication Bias	Certainty of Evidence
Random Forest (RF)	5 studies	Moderate	Moderate	Serious	Suspected	Low
XGBoost	3 studies	Moderate	Low	Serious	Suspected	Very Low
Support Vector Machine (SVM)	4 studies	Low	Moderate	Moderate	Undetected	Moderate
Deep Learning (DL)	2 studies	Serious	Low	Serious	Likely	Very Low
Hybrid/Ensemble	3 studies	Moderate	Moderate	Moderate	Suspected	Low

ML Techniques in Injury Prediction – ML Methods Overview

Supervised Learning

This section outlines the key ML paradigms and variables used in injury risk prediction models. ML models can be broadly categorized into supervised, unsupervised, and DL approaches. Supervised learning involves labeled datasets and includes algorithms such as RFs, SVMs, and logistic regression.^8,9 Unsupervised learning, such as clustering and principal component analysis, identifies hidden patterns in unlabeled data.² DL models, including neural networks and recurrent architectures, are particularly effective for processing complex, multimodal inputs like video or sensor data.^10,11 Each model was selected based on its fit for the dataset characteristics, interpretability, and performance in prior literature.

Another type of ML is DL. This is based on neural network models, which are inspired by the principles of how the human brain functions. Unlike other types of ML, DL can extract features directly from raw data (images and videos). DL relies on backpropagation, an effective technique that computes how the model’s internal parameters should change so that they can be adjusted through a process known as gradient descent. This enables the training of deep neural networks even with massive datasets.¹¹ In many domains, DL has enhanced performance. For instance, DL models now outperform conventional ML techniques in human activity recognition, such as assessing movement using wearable sensors or videos. These advancements have been particularly evident in fields that utilize computer vision and inertial measurement units, resulting in more accurate identification and analysis of human movement. Additionally, recurrent neural networks (RNNs), which retain information from past inputs for future predictions, have also demonstrated significant improvements.¹¹
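As a rough illustration of the gradient descent step described above, the following sketch fits a single sigmoid unit, the simplest possible case; backpropagation applies the same chain rule through many layers. The data, the learning rate, and the function names are hypothetical and are not taken from any of the reviewed studies.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real value to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_loss(xs, ys, w, b):
    """Average log loss of a one-feature sigmoid unit with weight w and bias b."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

def train_sigmoid_unit(xs, ys, lr=0.5, epochs=2000):
    """Fit w and b by batch gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # Chain rule: d(loss)/d(w) = (p - y) * x and d(loss)/d(b) = (p - y)
            grad_w += (p - y) * x
            grad_b += (p - y)
        # Move each parameter a small step against its gradient
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical data: standardized weekly training load and injury label (1 = injured)
loads = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
injured = [0, 0, 0, 0, 1, 0, 1, 1]

w, b = train_sigmoid_unit(loads, injured)
print("weight:", round(w, 3), "bias:", round(b, 3))
print("log loss before training:", round(mean_log_loss(loads, injured, 0.0, 0.0), 3))
print("log loss after training:", round(mean_log_loss(loads, injured, w, b), 3))
```

A deep network repeats this update for every weight in every layer, which is why large datasets and substantial compute are typically needed.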

Each algorithm applied in the literature was selected based on the nature of available data, the dimensionality of features, and the need for interpretability. For instance, RF and XGBoost were often favored for their capacity to handle non-linear data and imbalanced datasets. At the same time, SVMs excelled in smaller sample contexts due to their margin maximization strategy. However, DL, despite offering the best performance on multimodal inputs such as video or GPS data, often lacks transparency and requires large datasets to prevent overfitting.

Results

This review aimed to identify ML strategies to predict the risk of injury in multiple sports. Most studies have focused on sports with a high risk of injury, such as football (soccer), rugby, and basketball. We identified six studies reporting the use of ML in predicting football (soccer) injuries. The studies (Table 2) by Anne Hecksteden et al., Nikki Rommers et al., Diogo Nuno Freitas et al., Iñaki Ruiz-Pérez et al., Jon L. Oliver et al., and Reza Saberisani et al. included 88, 734, 34, 206, 355, and 25 football players, respectively. Four studies were prospective, and one was a longitudinal design.^1–6,8 Two studies on basketball players, by Juri Taborri et al. and Susanne Jauhiainen et al., were analyzed, reporting on 39 basketball players and 314 basketball and floorball players, respectively.^9,10 Furthermore, one retrospective study by Arie-Willem de Leeuw et al. included 14 volleyball players.¹¹ The remaining two studies involved 122 and 880 general athletes, as reported by Maria Henriquez et al. and Susanne Jauhiainen et al., respectively.^11,12

Table 2: Summary of all the ML studies with details about sample size, ML model used, and performance metric.
Author	Sport	Sample Size	Study Design	ML Models	Performance Metrics	Key Findings
Anne Hecksteden et al.	Football	88	Not specified	Gradient Boosting	ROC AUC	Gradient boosting was used; performance was evaluated via the ROC AUC.
Nikki Rommers et al.	Football	734	Prospective	XGBoost	F1-score	High-performing XGBoost model; used F1-score.
Diogo Nuno Freitas et al.	Football	34	Not specified	SVMs, FNNs, AdaBoost	ROC AUC	Reported 74.22% overall accuracy, 71.43% sensitivity, 74.19% specificity.
Iñaki Ruiz-Pérez et al.	Football	206	Longitudinal	Decision Tree, AD Tree, SVMs	ROC AUC, F-score	A combination of tree-based and SVM Models was used.
Jon L. Oliver et al.	Football	355	Not specified	Decision Tree	ROC AUC	Applied decision trees to predict injury.
Reza Saberisani et al.	Football	25	Not specified	Decision Tree	ROC AUC	Small sample study using decision trees.
Juri Taborri et al.	Basketball	39	Not specified	Landing Error Score System	F1-score	Injury risk is classified using biomechanics.
Susanne Jauhiainen et al.	Basketball & Floorball	314	Not specified	RF, Logistic Regression	ROC AUC (0.98)	Very high performance noted (AUC 0.98).
Arie-Willem de Leeuw et al.	Volleyball	14	Retrospective	Subgroup Discovery	Not specified	Small sample, subgroup-based pattern mining.
Maria Henriquez et al.	Mixed Athletes	122	Not specified	RF	ROC AUC (0.689)	Used ROC AUC; mainly false positives reported.
Susanne Jauhiainen et al.	Mixed Athletes	880	Not specified	RF, SVM	ROC AUC	Applied multiple ML models in a large cohort.

Among the assessed models, high efficacy was consistently observed in the RF and XGBoost models. RF models can handle features with a range of distributions because they make no formal distributional assumptions. They can also manage multimodal data, allowing for the interpretation of meaningful relationships between features and outcome variables. Maria Henriquez et al. used the area under the receiver operating characteristic curve (ROC AUC) to evaluate the performance of their RF models.¹¹ The final ROC AUC was 0.689 (68.90%), with errors primarily resulting from false positives rather than false negatives.¹¹ Susanne Jauhiainen et al. also used RF, and the training ROC AUC values were high (AUC 0.98). An ROC AUC of 1 indicates perfect discrimination between injured and uninjured athletes, while a value of 0.5 implies a purely random prediction.¹⁰ The AUC for RF models ranged from 0.78 to 0.98, suggesting good model performance.^10,11,13 The SVM also demonstrated high performance, with an AUC ranging from 0.85 to 0.96.^3,9,10 XGBoost was also regarded as a high-performing model, as evidenced by studies that reported a precision of 84% for injury prediction.² One study reported that SVM, Feedforward Neural Networks (FNNs), and Adaptive Boosting (AdaBoost) showed good accuracy in detecting injuries, with a sensitivity of 71.43%, a specificity of 74.19%, and an overall accuracy of 74.22%.³
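To make the metrics reported above concrete, the sketch below computes ROC AUC as the share of injured/uninjured pairs that a model ranks correctly, along with sensitivity, specificity, and accuracy at a fixed threshold. The labels, risk scores, and the 0.5 threshold are invented for illustration and do not reproduce any study's data.

```python
def roc_auc(y_true, scores):
    """Share of (injured, uninjured) pairs in which the injured athlete
    receives the higher risk score; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # injured athletes correctly flagged
        "specificity": tn / (tn + fp),   # uninjured athletes correctly cleared
        "accuracy": (tp + tn) / len(pairs),
    }

# Hypothetical cohort: 3 injured (1) and 5 uninjured (0) athletes
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

auc = roc_auc(y_true, scores)
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold
metrics = confusion_metrics(y_true, y_pred)
print("ROC AUC:", round(auc, 3))
print(metrics)
```

Because AUC depends only on the ordering of scores, it does not change with the threshold, whereas sensitivity and specificity do.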

Discussion

The current review aims to analyze the integration of ML in sports such as football, volleyball, basketball, and floorball to predict injuries that players may experience.

Model Performance: A comparative summary of strengths, limitations, and optimal applications of various ML models is provided in Table 3. These models estimate the likelihood of injury, enabling coaches to adjust training intensity and techniques to prevent potential injuries. Gradient boosting and AdaBoost are also ML algorithms that are transparent and simpler to implement.¹⁴

Table 3: Comparative appraisal of ML models in sports injury prediction.
ML Model	Strengths	Limitations	Best Use Case/Sport	Performance Metrics	Interpretability
RF	Robust to noise; handles non-linear, multi-factorial data; works with unbalanced datasets.	Less interpretable (“black-box”); performance can drop with high-dimensional irrelevant variables.	Football, multi-sport datasets	AUC: 0.78–0.98	Low
XGBoost	High efficiency; excels with imbalanced data; regularization helps prevent overfitting.	Requires extensive parameter tuning; computationally intensive on large datasets.	Youth football, elite training groups	Precision: ~84%	Medium
SVM	Excellent for small, high-dimensional datasets; strong generalization with kernel tricks	Low scalability; complex kernel configurations reduce transparency	Basketball, volleyball	AUC: 0.85–0.96	Medium
DL	Powerful with multimodal inputs (video, GPS, sensor); learns from raw data automatically.	Requires large labeled datasets, high training cost, and poor interpretability	Wearables, elite teams with rich data	Varies (often high but dataset-dependent)	Low
Hybrid/Ensemble Models	Combines the strengths of multiple algorithms; resilient to noise and overfitting	Difficult to interpret which component drives predictions; higher computational demands	Research settings: diverse sports	Accuracy: ~74.2%, Sensitivity: ~71.4%, Specificity: ~74.2%	Low–Medium

Challenges

Several challenges remain. For instance, some studies had a very small sample size, ranging from 14 to 122.^1,3,8,11 Furthermore, interpreting and explaining model outputs is a significant barrier; coaches and doctors may be discouraged from adopting tools they do not understand. Technical experts would be required to decipher the results, which could lead to increased costs. However, emerging tools like SHapley Additive exPlanations (SHAP) provide actionable and interpretable output that is easier to understand.² SHAP is a model explainability framework that can be integrated with ML to offer insight into the model decision process.¹⁵ Furthermore, measures such as AUC may not be the best option for assessing ML performance, as they only assess how well a model separates a binary outcome, that is, whether an individual is injured or not. Conversely, other measures, such as the Brier score and logarithmic loss, assess the accuracy of the predicted probability of injury.¹⁶ Presently, many ML models are also regarded as ‘black boxes’, making them less transparent and impeding independent evaluation of model performance, uses, and comprehensibility.¹⁵
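The contrast between ranking-based and probability-based measures can be sketched as follows. Both hypothetical models below rank athletes identically, so their AUC would be the same, yet the Brier score and logarithmic loss penalize the one that systematically inflates risk. All values are illustrative assumptions.

```python
import math

def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)

def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of the observed outcomes."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical outcomes (1 = injured) and two sets of predicted probabilities
y_true = [1, 0, 1, 0, 0]
calibrated = [0.80, 0.20, 0.70, 0.30, 0.10]
inflated = [0.99, 0.60, 0.98, 0.70, 0.50]  # same ordering, higher risk for everyone

for name, probs in [("calibrated", calibrated), ("inflated", inflated)]:
    print(name,
          "Brier:", round(brier_score(y_true, probs), 4),
          "log loss:", round(log_loss(y_true, probs), 4))
```

Lower values are better for both measures; a practitioner acting on the inflated model would see most uninjured athletes flagged at 50% risk or more.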

Many studies have employed cost-sensitive models for detecting injuries, an approach adopted because injury cases are far outnumbered by non-injury cases.³ However, some studies did not use this approach.^2,8,11 Other studies only considered the first injury, not the multiple injuries sustained throughout the season.² Moreover, from a practical perspective, the large number of confounding variables limits how well ML models can capture the actual risk of injury. For example, an athlete might have a very high risk of injury, but not get injured due to the lack of playtime, while other players with a low risk of injury might face harm due to confounding factors.¹¹ GPS-based models can be inaccessible for many practitioners in applied sports settings due to their high cost (250 euros per unit).⁴ Data availability also poses a significant challenge, as ML depends on the nature and features of the data and needs a constant supply of it, which can be costly.¹⁵ Newer studies have suggested that wearable devices and mobile applications can replace older laboratory-based motion data collection methods.^3,10
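One common way to make a model cost-sensitive is to weight each class inversely to its frequency so that the rare injury cases contribute as much to the loss as the many non-injury cases. The sketch below uses the widely used heuristic weight = n_samples / (n_classes × class count); it is an assumption for illustration and not necessarily the scheme used in the cited studies.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

def weighted_log_loss(y_true, probs, weights, eps=1e-15):
    """Log loss in which each athlete's error is scaled by its class weight."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        w = weights[y]
        total += -w * (math.log(p) if y == 1 else math.log(1 - p))
        weight_sum += w
    return total / weight_sum

# Hypothetical season: 18 athletes without injury (0), 2 with injury (1)
labels = [0] * 18 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)

# A model that predicts 10% risk for everyone looks fine unweighted,
# but the weighted loss exposes how badly it treats the injured athletes.
flat_probs = [0.10] * len(labels)
uniform = {0: 1.0, 1: 1.0}
print("unweighted:", round(weighted_log_loss(labels, flat_probs, uniform), 4))
print("weighted:  ", round(weighted_log_loss(labels, flat_probs, weights), 4))
```

With these weights, each class carries the same total weight, so missing an injury costs roughly nine times as much as a false alarm in this example.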

Comparative Appraisal of ML Models

RF: RF models are robust to noise and missing values, making them suitable for real-world sports data. They consistently rank high in AUC scores (0.78–0.98), particularly in football and multi-sport datasets. However, their “black-box” nature hinders clinical acceptance due to low interpretability. Moreover, their performance may degrade in high-dimensional datasets with many irrelevant variables unless feature selection is optimized.

XGBoost: XGBoost, known for its computational efficiency and superior handling of imbalanced datasets, achieved precision scores of up to 84%. It is particularly effective in football and elite youth cohorts. However, it demands extensive parameter tuning and may overfit on smaller datasets.

SVMs: SVMs demonstrated strong performance, with AUCs ranging from 0.85 to 0.96, in small to mid-sized datasets. Their strength lies in handling high-dimensional spaces; however, the kernel trick complicates both interpretability and scalability. They are best suited for controlled settings, such as basketball or volleyball.

DL: DL models, such as RNNs and FNNs, have demonstrated high accuracy, particularly when processing video or wearable sensor data. However, their need for large annotated datasets and poor explainability limits their broader use. These models are more promising in elite teams with access to rich, continuous monitoring.

Hybrid Models and Ensembles: Techniques combining SVM, RF, and AdaBoost showed balanced performance (e.g., 74.2% overall accuracy), benefiting from ensemble robustness. However, computational load and difficulty in identifying which component drives prediction hinder real-time application. Table 2 summarizes the performance metrics of these ML models in prior studies and contextualizes their real-world usability in sports injury prediction.

Ethical Considerations

Ethical considerations are also essential when designing, governing, and implementing machine-based models, including factors such as honesty, truthfulness, transparency, privacy, and safety. AI-related ethical issues must be disclosed to health professionals and athletes beforehand to ensure compliance with ethical standards.¹⁵ The introduction of biases is another direct ethical problem that needs to be taken into consideration, as these biases can lead to incorrect decisions. Biased algorithms show a greater tendency to discriminate based on race, which is a crucial factor in healthcare and its delivery. For example, a heart-related mortality algorithm by the American Heart Association showed that if two patients present with similar symptoms, but one is White and the other is Black, the prediction indicates that the White patient is at higher risk, encouraging doctors to allot more resources to the White patient. Such issues are even more serious if they go undetected by medical professionals, who would then be unable to stop algorithms from learning or integrating such bias.^17,18 Major security challenges also pose a threat; multiple new research studies have acknowledged the susceptibility of ML systems to adversarial ML attacks. These attacks have been observed on medical systems that employ ML.¹⁹

There are also concerns regarding the dehumanization of clinical decision-making due to healthcare professionals’ over-reliance on algorithms. This could lead to physicians neglecting patients’ values and past experiences when deciding on an intervention and relying solely on algorithms. It might also limit intervention choices.¹⁸ Transparent reporting standards, such as TRIPOD-AI and PROBAST-AI, are being developed to enhance ethical compliance in model development and validation.²⁰

Research Gaps and Implications for Practice

Despite promising results, current research exhibits several gaps. Most studies rely on single-center datasets with small sample sizes, which limits their external validity and reproducibility. There is also a lack of standardized injury definitions and uniform data collection protocols, which hinders meta-analysis and model comparison. Additionally, few studies account for recurrent injuries or dynamic player conditions across a season. This review offers novelty by consolidating and critically examining ML approaches across a diverse range of team sports, including underrepresented fields like floorball. We also uniquely highlight model interpretability, real-world adoption barriers, and ethical dimensions that are often overlooked in prior reviews. Our integration of emerging concepts such as SHAP, adversarial ML, and federated learning adds forward-looking value to the current literature.

Using the GRADE-Narrative method, we observed that most studies offer very low to low certainty of evidence, mainly due to small sample sizes, heterogeneity in ML models, lack of standardized injury definitions, and high risk of publication bias. For example, while RF and XGBoost models showed high predictive metrics, the absence of external validation in most studies reduced overall confidence. Studies applying SVMs and hybrid models demonstrated slightly higher consistency in reporting; however, the limited interpretability of DL models and inconsistent follow-up protocols in the included research further downgraded certainty ratings.

Future Directions

To enhance practical relevance, future research should prioritize multicenter, longitudinal studies that represent diverse athletic populations. Techniques like federated learning can be employed to aggregate data from multiple sources without compromising privacy.²¹ Transfer learning may also allow knowledge sharing across different sports or populations with limited data.²² Explainable AI methods, such as SHAP and LIME, should be integrated to improve understanding and facilitate real-world adoption by coaches and sports medicine professionals.²
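SHAP builds on Shapley values from cooperative game theory. The sketch below computes them exactly for a tiny, hand-written risk function by enumerating all feature coalitions; the SHAP library relies on faster approximations for real models. The risk function, its coefficients, and the feature names are hypothetical and not fitted to any data.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in names}
        return predict(x)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def toy_risk(x):
    """Hypothetical injury risk score with one interaction term."""
    risk = 0.05 + 0.10 * x["high_load"] + 0.15 * x["previous_injury"]
    risk += 0.20 * x["high_load"] * x["poor_sleep"]  # load is riskier when sleep is poor
    return risk

athlete = {"high_load": 1, "poor_sleep": 1, "previous_injury": 1}
reference = {"high_load": 0, "poor_sleep": 0, "previous_injury": 0}

phi = shapley_values(toy_risk, athlete, reference)
for feature, contribution in phi.items():
    print(feature, round(contribution, 3))
print("sum of contributions:", round(sum(phi.values()), 3))
print("risk above reference:", round(toy_risk(athlete) - toy_risk(reference), 3))
```

The contributions add up to the difference between the athlete's predicted risk and the reference risk, which is the property that lets a coach read each value as that feature's share of the prediction.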

Practitioners should note that ML tools are most effective when used in conjunction with expert knowledge and experience.^23–25 For maximum benefit, interdisciplinary collaboration is crucial, and end-users must be trained to interpret model outputs effectively. The translation of model predictions into actionable training and rehabilitation strategies will determine the actual impact of ML on athlete health.^26,27 Injury prediction in sports presents unique challenges that necessitate the development of tailored ML models. No single model universally outperforms others; rather, model selection should be context-dependent, balancing accuracy with interpretability and resource availability. Future studies should focus on comparative validations across different sports using unified datasets to establish model benchmarks.^28–30

Conclusion

In this review, we showed that ML is proving to be a transformative force in sports injury prediction, offering promising solutions for early detection, athlete monitoring, and performance optimization. This review emphasized that tree-based models such as RF and XGBoost currently lead the field due to their adaptability, ability to handle nonlinear data, and overall robustness. DL models, while powerful for multimodal data such as videos and sensor outputs, remain constrained by interpretability issues that limit their clinical applicability. Despite promising accuracy metrics, real-world implementation remains challenging. Common barriers include small sample sizes, heterogeneous methodologies, inconsistent injury definitions, and limited generalizability.³¹

Furthermore, ethical considerations, particularly data privacy, transparency, and informed consent, must be embedded within future ML applications in sports. Cost constraints, lack of standard data protocols, and a reliance on ‘black-box’ algorithms further reduce model trust and adoption by practitioners and coaches. To move forward, future research should focus on developing explainable models using frameworks such as SHAP, standardizing injury definitions, and building multicenter datasets for improved model validation and generalization. Collaboration between sports scientists, medical professionals, data analysts, and ethicists will be critical in transforming ML tools into practical, athlete-centered solutions. Finally, integrating ML into routine athlete care must prioritize accuracy, interpretability, fairness, and ethical use, ensuring that technological innovation genuinely serves to protect and enhance the well-being and performance of athletes.

References

Emery CA, Pasanen K. Current trends in sport injury prevention. Best Pract Res Clin Rheumatol. 2019;33(1):3–15. https://doi.org/10.1016/j.berh.2019.02.009
Emery CA, Tyreman H. Sport participation, sport injury, risk factors and sport safety practices in Calgary and area junior high schools. Paediatr Child Health. 2009;14(7):439–44.
Tranaeus U, Gledhill A, Johnson U, Podlog L, Wadey R, Wiese Bjornstal D, et al. 50 years of research on the psychology of sport injury: a consensus statement. Sports Med. 2024;54(7):1733–48. https://doi.org/10.1007/s40279-024-02045-w
Tee JC, McLaren SJ, Jones B. Sports injury prevention is complex: we need to invest in better processes, not singular solutions. Sports Med. 2020;50(4):689–702. https://doi.org/10.1007/s40279-019-01232-4
Reis FJ, Alaiti RK, Vallio CS, Hespanhol L. Artificial intelligence and machine-learning approaches in sports: concepts, applications, challenges, and future perspectives. Brazil J Phys Ther. 2024;28:101083. https://doi.org/10.1016/j.bjpt.2024.101083
Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Int J Surg. 2021;88:105906.
Murad MH, Mustafa RA, Schünemann HJ, Sultan S, Santesso N. Rating the certainty in evidence in the absence of a single estimate of effect. Evid Based Med. 2017;22(3):85–7. https://doi.org/10.1136/ebmed-2017-110668
Huang X. Predictive models: regression, decision trees, and clustering. Appl Comput Eng. 2024;79:124–33. https://doi.org/10.54254/2755-2721/79/20241551
Vargas M, Biggs D, Larraín T, Alvear A, Pedemonte JC, de Anestesiología R. Inteligencia artificial en medicina: Métodos de modelamiento (Parte I). Rev Chil Anest. 2022;51(5):527–34. https://doi.org/10.25237/revchilanestv5129061230
Cust EE, Sweeting AJ, Ball K, Robertson S. Machine and deep learning for sport-specific movement recognition: a systematic review of model development and performance. J Sports Sci. 2019;37(5):568–600. https://doi.org/10.1080/02640414.2018.1521769
Grossberg S. Recurrent neural networks. Scholarpedia. 2013;8(2):1888. https://doi.org/10.4249/scholarpedia.188
Hecksteden A, Schmartz GP, Egyptien Y, Aus der Fünten K, Keller A, Meyer T. Forecasting football injuries by combining screening, monitoring and machine learning. Sci Med Football. 2023;7(3):214–28. https://doi.org/10.1080/24733938.2022.2095006
Rommers N, Rössler R, Verhagen E, Vandecasteele F, Verstockt S, Vaeyens R, et al. A machine learning approach to assess injury risk in elite youth football players. Med Sci Sports Exerc. 2020;52(8):1745–51. https://doi.org/10.1249/MSS.0000000000002305
Freitas DN, Mostafa SS, Caldeira R, Santos F, Fermé E, Gouveia ÉR, et al. Predicting noncontact injuries of professional football players using machine learning. Dwyer D, editor. PLoS One. 2025;20(1):e0315481.
Ruiz-Pérez I, López-Valenciano A, Hernández-Sánchez S, Puerta-Callejón JM, De Ste Croix M, Sainz de Baranda P, et al. A field-based approach to determine soft tissue injury risk in elite futsal using novel machine learning techniques. Front Psychol. 2021;12:610210. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7892460/
Oliver JL, Ayala F, De Ste Croix MBA, Lloyd RS, Myer GD, Read PJ. Using machine learning to improve our understanding of injury risk and prediction in elite male youth football players. J Sci Med Sport. 2020;23(11):1044–8.
Char DS, Shah NH, Magnus D. Implementing machine learning in health care – addressing ethical challenges. N Engl J Med. 2018;378(11):981–3. https://doi.org/10.1056/NEJMp1714229
O’Reilly-Shah VN, Gentry KR, Walters AM, Zivot J, Anderson CT, Tighe PJ. Bias and ethical considerations in machine learning and the automation of perioperative risk assessment. Br J Anaesth. 2020;125(6):843–6. https://doi.org/10.1016/j.bja.2020.07.040
Rasheed K, Qayyum A, Ghaly M, Al-Fuqaha A, Razi A, Qadir J. Explainable, trustworthy, and ethical machine learning for healthcare: a survey. Comput Biol Med. 2022;149:106043. https://doi.org/10.1016/j.compbiomed.2022.106043
Collins GS, Dhiman P, Andaur Navarro CL, Ma J, Hooft L, Reitsma JB, et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open. 2021;11:e048008. https://doi.org/10.1136/bmjopen-2020-048008
Ng D, Lan X, Yao MM, Chan WP, Feng M. Federated learning: a collaborative effort to achieve better medical imaging models for individual sites that have small labelled datasets. Quant Imaging Med Surg. 2021;11(2):852–7. https://doi.org/10.21037/qims-20-595
Hosna A, Merry E, Gyalmo J, Alom Z, Aung Z, Azim MA. Transfer learning: a friendly introduction. J Big Data. 2022;9(1):102. https://doi.org/10.1186/s40537-022-00652-w
Saberisani R, Barati AH, Zarei M, Santos P, Gorouhi A, Ardigò LP, et al. Prediction of football injuries using GPS-based data in Iranian professional football players: a machine learning approach. Front Sports Act Living. 2025;7:1425180. https://doi.org/10.3389/fspor.2025.1425180
Taborri J, Molinaro L, Santospagnuolo A, Vetrano M, Vulpiani MC, Rossi S. A machine-learning approach to measure the anterior cruciate ligament injury risk in female basketball players. Sensors. 2021;21(9):3141. https://doi.org/10.3390/s21093141
Jauhiainen S, Kauppi JP, Krosshaug T, Bahr R, Bartsch J, Äyrämö S. Predicting ACL injury using machine learning on data from an extensive screening test battery of 880 female elite athletes. Am J Sports Med. 2022;50(11):2917–24. https://doi.org/10.1177/03635465221112095
de Leeuw AW, van der Zwaard S, van Baar R, Knobbe A. Personalized machine learning approach to injury monitoring in elite volleyball players. Eur J Sport Sci. 2022;22(4):511–20. https://doi.org/10.1080/17461391.2021.1887369
Henriquez M, Sumner J, Faherty M, Sell T, Bent B. Machine learning to predict lower extremity musculoskeletal injury risk in student athletes. Front Sports Act Living. 2020;2:576655. https://doi.org/10.3389/fspor.2020.576655
Jauhiainen S, Kauppi JP, Leppänen M, Pasanen K, Parkkari J, Vasankari T, et al. New machine learning approach for detection of injury risk factors in young team sport athletes. Int J Sports Med. 2020;42(02):175–82. https://doi.org/10.1055/a-1231-5304
Rossi A, Pappalardo L, Cintia P, Iaia FM, Fernàndez J, Medina D. Effective injury forecasting in soccer with GPS training data and machine learning. PLoS One. 2018;13(7):e0201264. https://doi.org/10.1371/journal.pone.0201264
Amendolara A, Pfister D, Settelmayer M, Shah M, Wu V,
Donnelly S, et al. An overview of machine learning applications in sports injury prediction. Cureus. 2023;15:e46170. Available from: https://assets.cureus.com/uploads/review_article/pdf/177498/20231029-9676-1s7wljl.pdf
Van Eetvelde H, Mendonça LD, Ley C, Seil R, Tischer T. Machine learning methods in sport injury prediction and prevention: a systematic review. J Exp Orthop. 2021;8(1):27. https://doi.org/10.1186/s40634-021-00346-x

Cite this article as:
Khan I, Ali H, Siddiqui MF and Geetha GMN. Machine Learning Approaches to Injury Risk Prediction in Sport. Premier Journal of Computer Science 2025;4:100012