Revised Strengthening the reporting of cohort, cross-sectional and case-control studies in surgery (STROCSS) Guideline: An update for the age of Artificial Intelligence


Riaz A. Agha¹, Ginimol Mathew², Rasha Rashid³, Ahmed Kerwan⁴, Ahmed Al-Jabir⁵, Catrin Sohrabi², Thomas Franchi⁶, Maria Nicola⁷, Maliha Agha¹; STROCSS Group

¹Premier Science, London, UK
²Royal Free London NHS Foundation Trust, London, UK
³Imperial College School of Medicine, London, UK
⁴Harvard T.H. Chan School of Public Health, Boston, USA
⁵University College London Hospital, London, UK
⁶Wellington Regional Hospital, Te Whatu Ora Capital Coast and Hutt Valley, Wellington, New Zealand
⁷Imperial College London, London, UK

Correspondence to: Riaz Agha, Premier Science, riaz@premierscience.com

DOI: https://doi.org/10.70389/PJS.100081

STROCSS Group Contributors

Achilleas Thoma, McMaster University, Canada
Alessandro Coppola, Sapienza University of Rome, Italy
Andrew J Beamish, Swansea Bay University Health Board, Swansea University, UK
Ashraf Noureldin, Almana Hospital, Khobar, Saudi Arabia
Ashwini Rao, Manipal Academy of Higher Education Manipal, India
Baskaran Vasudevan, MIOT Hospital, Chennai, India
Ben Challacombe, Guy’s and St Thomas’ Hospitals, UK
C S Pramesh, Tata Memorial Hospital, Homi Bhabha National Institute and National Cancer Grid, India
Duilio Pagano, IRCCS-ISMETT – UPMC Italy, Italy
Frederick Heaton Millham, Harvard Medical School, USA
Gaurav Roy, Cactus Communications Pvt Ltd, India
Huseyin Kadioglu, Saglik Bilimleri Universitesi, Turkiye
Iain James Nixon, NHS Lothian, UK
Indraneil Mukherjee, Staten Island University Hospital Northwell Health, USA
James Anthony McCaul, Queen Elizabeth University Hospital Glasgow and Institute for Cancer Therapeutics University of Bradford, UK
James Ngu, Changi General Hospital, Singapore
Joerg Albrecht, Cook County Health, USA
Juan Gomez Rivas, Hospital Clinico San Carlos, Madrid, Spain
K Veena L Karanth, District Hospital Udupi, India
Kandiah Raveendran, Fatimah Hospital, Malaysia
M Hammad Ather, Aga Khan University, Pakistan
Mangesh A. Thorat, Centre for Cancer Screening, Prevention and Early Diagnosis, Wolfson Institute of Population Health, Queen Mary University of London, London, UK; Breast Services, Homerton University Hospital, London, UK
Mohammad Bashashati, Dell Medical School, UT Austin, USA
Mushtaq Chalkoo, Government Medical College, Srinagar, Kashmir, India
Oliver J. Muensterer, Dr. von Hauner Children’s Hospital, LMU Medical Center, Munich, Germany
Patrick Bradley, Nottingham University Hospital, UK
Prabudh Goel, All India Institute of Medical Sciences, New Delhi, India
Prathamesh Pai, P D Hinduja Hospital, Khar, India
Priya Shinde, Homerton University Hospital, UK
Priya Ranganathan, Tata Memorial Centre, India
Raafat Yahia Afifi Mohamed, Cairo University, Egypt
Richard David Rosin, University of the West Indies Barbados, Barbados
Roberto Cammarata, Fondazione Policlinico Campus Biomedico, Italy
Roberto Coppola, Campus Bio Medico University, Italy
Rolf Wynn, UiT The Arctic University of Norway, Norway
Salim Surani, Texas A&M University, USA
Salvatore Giordano, University of Turku, Finland
Samuele Massarut, Centro di Riferimento Oncologico Aviano IRCCS, Italy
Shahzad G. Raja, Harefield Hospital, UK
Somprakas Basu, All India Institute of Medical Sciences Rishikesh, India
Syed Ather Enam, Aga Khan University, Pakistan
Teo Nan Zun, Changi General Hospital, Singapore
Todd Manning, Bendigo Health and Monash University, Australia
Veeru Kasivisvanathan, University College London, UK
Vincenzo La Vaccara, Fondazione Policlinico Campus Bio-Medico di Roma, Italy
Zubing Mei, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, China

Additional information

Ethical approval: N/A
Consent: N/A
Funding: None
Conflicts of interest: The authors have no financial, consultative, institutional, or other relationships that might lead to bias or a conflict of interest.
Author contribution: R.A.A.: conceptualisation and study design, supervision of the Delphi process, data interpretation, manuscript drafting and critical revision, approval of the final manuscript. A.K., A.A.-J., C.S., T.F., G.M., M.N., R.R., M.A., R.A.A.: participation in study design, generation of Delphi survey materials, data collection and analysis, contribution to drafting of new checklist items, manuscript writing and revision, approval of the final manuscript.
Guarantor: Riaz A Agha
Provenance and peer-review: Unsolicited and externally peer-reviewed.
Data availability statement: The Delphi survey data that informed this guideline (individual expert ratings and comments) are confidential and not publicly available, in accordance with the consensus process protocol. All relevant aggregated results are reported in this article.

Keywords: STROCSS guideline update, artificial intelligence in surgery, Delphi consensus STROCSS, AI transparency and ethics, AI reporting standards.

Peer-review
Received: 20 May 2025
Revised: 22 May 2025
Accepted: 23 May 2025
Published: 23 May 2025

Abstract

Introduction: Artificial intelligence (AI) is rapidly transforming healthcare and scientific publishing, and reporting guidelines must be updated to account for this advance. The STROCSS 2025 update adds a new AI-focused domain to promote transparency, reproducibility, and ethical integrity in surgical observational studies involving AI.

Methods: A Delphi consensus exercise was conducted to update the STROCSS guideline. A panel of 49 surgical and scientific experts was invited to rate proposed new items. In Round 1, participants scored each item on a 9-point Likert scale and provided feedback. Items not meeting consensus were to be revised or discarded.

Results: The first-round response rate was 94% (46/49). Ratings were analysed for agreement levels, and consensus was reached on all six proposed AI-related items. A revised STROCSS checklist incorporating these new AI-related items is presented. Authors are now expected to disclose AI involvement not only in patient care but also in manuscript preparation, as exemplified by this paper.

Conclusion: The STROCSS 2025 guideline provides an up-to-date framework for surgical observational studies in the era of AI. Through a robust consensus process, we have added specific reporting criteria for AI to ensure that any use of artificial intelligence in a surgical observational study is clearly documented, explained and discussed, including with respect to bias and ethics. This update will help maintain the quality, transparency, and clinical relevance of surgical observational studies, ultimately improving their educational value and trustworthiness for the surgical community.

Highlights

The STROCSS 2025 update introduces a new Artificial Intelligence (AI) domain (checklist items 5a–5f) to ensure transparency in surgical observational studies where AI is involved.
The revised guideline was developed via a one-round Delphi consensus exercise among 49 international experts, with 94% (46/49) participating and showing strong agreement on all new AI-related items.
Six new checklist items cover identification of AI use, detailed reporting of AI methods, data and validation, bias mitigation, and ethical considerations in surgical observational studies.
In line with emerging publication standards, the authors used a generative AI tool for language editing of this manuscript and have transparently declared this use, exemplifying the new recommendations for AI disclosure.

Introduction

The concept of AI dates back to Turing’s seminal question “Can machines think?” in 1950.¹ The official birth of AI as a field is traced to the 1956 Dartmouth conference led by John McCarthy, where the term “artificial intelligence” was introduced and participants conjectured that aspects of learning and intelligence could be simulated by machines.² In the decades since, AI has moved from theory to real-world applications. The global AI market was valued at approximately $638 billion in 2024,³ and AI is projected to contribute up to $15.7 trillion to the global economy by 2030.⁴ This rapid growth (illustrated in Figure 1) is driven by breakthroughs in machine learning, big data, cloud computing and computational power.

Figure 1: Projected growth of the global artificial-intelligence market. Source: PwC Global AI Study, 2024.

In medicine and surgery, AI applications are increasingly prevalent. Early successes of medical AI have been seen in diagnostic specialties; for example, AI-driven image analysis in radiology and pathology has achieved impressive accuracy, often exceeding human performance in detecting subtle findings.⁵ In surgical disciplines, AI is being explored for enhancing preoperative planning, intraoperative guidance (such as robotic surgery and real-time decision support), and postoperative outcome prediction.⁵ These technologies promise to augment the surgeon’s capabilities and personalise patient care. However, with this promise comes new responsibility: clinicians and researchers must ensure that when AI is involved in patient management, it is reported transparently and with sufficient detail to appraise its validity and safety.

The Strengthening the Reporting of Cohort, Cross-sectional and Case-Control Studies in Surgery (STROCSS) guideline, originally introduced in 2017⁶ and last updated in 2024,⁷ required further revision to address AI-related reporting urgently.⁸ Developed by Agha et al., the original STROCSS checklist aimed to improve the clarity, consistency, and educational value of cohort studies in surgery.⁶ Subsequent updates in 2019,⁹ 2021¹⁰ and 2024⁷ expanded and refined the criteria in response to feedback and evolving best practices. These guidelines have significantly improved the reporting quality of surgical observational studies, although adherence by authors and journals has varied.¹¹ Prior research in this area identified significant reporting deficiencies among 92 case series that met the inclusion criteria.¹² These included failure to use standardised definitions (57%), missing or selective data (66%), lack of transparency or incomplete reporting (70%), not stating whether alternative study designs were considered (11%), and other issues (52%).¹²

Despite the growing presence of AI in healthcare, the STROCSS 2024 checklist had a gap: it contained no specific items addressing how to report the use of AI in a surgical observational study. Omitting such details could lead to under-reporting of critical information. For other study designs, the need for AI-specific reporting guidelines has been recognised; for instance, the CONSORT-AI and SPIRIT-AI extensions have been published to guide the reporting of clinical trials and trial protocols involving AI.⁸,¹³⁻¹⁵ To ensure that surgical observational studies keep pace with these developments, an update to the STROCSS guideline was imperative. This 2025 update focuses on integrating AI-related reporting standards into the established STROCSS structure.

In this paper, we describe the methods and outcomes of the STROCSS 2025 guideline update. We introduce a new domain of checklist items dedicated to AI and explain their rationale. We also discuss the importance of these additions in the context of transparency, bias mitigation, and reproducibility, which are crucial for maintaining trust in both surgical observational studies and AI systems. In keeping with the principle of transparency, we also document our own use of AI during the preparation of this manuscript, as recommended by emerging editorial policies.¹¹ The updated STROCSS 2025 guideline will help authors of surgical observational studies provide clear and accountable descriptions when AI is part of patient care or of report generation. Ultimately, this will enhance the value of surgical observational studies as scholarly contributions in an era where AI is becoming an integral part of healthcare.

Methods

Guideline development approach

The STROCSS 2025 update was developed through a Delphi consensus exercise, consistent with the approach used in prior STROCSS updates.⁶,¹⁶ The STROCSS Group steering committee held an initial meeting to brainstorm important updates to the guideline. The senior author (RA) proposed AI as a timely and critical area for update. AI-specific items were then drafted, edited and approved for presentation to a Delphi panel of experts. Invitations were sent by email to 49 experts in surgery, medicine and related fields. Invitees were provided with a summary of the proposed new items (focused on AI reporting) and asked to participate in the consensus exercise.

In Round 1, panellists rated each proposed checklist item on a 1–9 Likert scale (where 1 = strongly disagree, 9 = strongly agree) to indicate their agreement that the item should be included in the updated guideline. Participants could also provide free-text comments suggesting modifications or giving justifications. We included six candidate items (labelled 5a to 5f) in the domain of “Artificial Intelligence”, drafted on the basis of a preliminary literature review of AI reporting recommendations and input from the guideline authors. After Round 1, responses were analysed for consensus. An item was defined as achieving consensus for inclusion if ≥70% of respondents rated it 7–9 (agree) and <15% rated it 1–3 (disagree). This threshold was established a priori, in line with common Delphi methodology.⁶ Items that met consensus were provisionally accepted.
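The two-part inclusion rule above can be written as a short function. The sketch below is only an illustration, assuming Python 3.9+; the function names (`band`, `band_shares`, `reaches_consensus`) are hypothetical and not part of the STROCSS materials, and the actual analysis was done in Microsoft Excel. It groups each 1–9 rating into the three bands reported in Table 1 and then applies both thresholds together.

```python
from collections import Counter
from typing import Iterable

BANDS = ("disagree", "neutral", "agree")


def band(score: int) -> str:
    """Map a 1-9 Likert score to the band used in the consensus analysis."""
    if not 1 <= score <= 9:
        raise ValueError(f"Likert score out of range: {score}")
    if score <= 3:
        return "disagree"
    if score <= 6:
        return "neutral"
    return "agree"


def band_shares(scores: Iterable[int]) -> dict[str, float]:
    """Return the percentage of respondents falling in each band."""
    counts = Counter(band(s) for s in scores)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no ratings supplied")
    return {b: 100.0 * counts[b] / n for b in BANDS}


def reaches_consensus(scores: Iterable[int],
                      agree_min: float = 70.0,
                      disagree_max: float = 15.0) -> bool:
    """A priori rule: at least 70% rate 7-9 AND fewer than 15% rate 1-3."""
    shares = band_shares(scores)
    return shares["agree"] >= agree_min and shares["disagree"] < disagree_max
```

For example, a hypothetical set of 46 ratings made up of 33 scores of 9, 8 scores of 5 and 5 scores of 2 gives 71.7% agreement and 10.9% disagreement, so `reaches_consensus` returns `True`. Checking both thresholds matters: an item with 70% agreement but 30% disagreement would be rejected.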

Data collection and analysis

The Delphi round was conducted via an online survey platform (Google Forms). Responses were collected anonymously, with panellists identified only by a study ID for tracking response rates. Quantitative data from Likert ratings were exported to Microsoft Excel for calculation of descriptive statistics. For each item, we computed the percentage of respondents who rated it in the high agreement range (7–9), moderate agreement range (4–6), and low agreement range (1–3). These are presented as consensus metrics. Table 1 summarises the score distribution for each new item (5a–5f) in the final round of Delphi.

Throughout the process, participants were encouraged to be critical and to ensure that each item added real value to the checklist. The high response rate (46 of 49 invited experts, i.e. 94%) and the detailed comments provided indicate robust engagement from the expert panel. All data collected in the Delphi surveys were handled confidentially and were used solely for the purposes of this guideline development.

Integration into the STROCSS checklist

After the Delphi process, the steering committee finalised the phrasing of each new item (5a–5f) based on the panel’s preferred wording. The new domain was inserted into the STROCSS checklist as Section 5, entitled “Artificial Intelligence”, following the Abstract section (Section 4) and preceding the former Introduction section (now renumbered as Section 6 in the 2025 checklist). This renumbering maintains a logical flow: the checklist first addresses Title, Keywords, Highlights and Abstract (Sections 1–4), then the presence of any AI element (Section 5), then the Introduction (Section 6), and so forth. The remaining STROCSS 2024 items were retained with minimal or no changes, aside from renumbering (e.g. item 5a “Introduction” in STROCSS 2024 is now item 6a in STROCSS 2025).
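The renumbering described above amounts to a simple shift of section numbers. As a sketch only, assuming that every section from 5 onwards moves down by exactly one place (the text says “and so forth” but does not list every section), a hypothetical helper for converting a STROCSS 2024 item label to its 2025 equivalent could look like this:

```python
import re

# Hypothetical label format: a section number followed by optional sub-item letters, e.g. "5a".
_ITEM_LABEL = re.compile(r"(\d+)([a-z]*)")


def renumber_2024_to_2025(label: str) -> str:
    """Map a STROCSS 2024 item label to its assumed STROCSS 2025 label.

    Assumes Sections 1-4 are unchanged and every later section shifts by one
    to make room for the new Section 5 (Artificial Intelligence).
    """
    match = _ITEM_LABEL.fullmatch(label.strip().lower())
    if match is None:
        raise ValueError(f"not a checklist item label: {label!r}")
    section, suffix = int(match.group(1)), match.group(2)
    if section >= 5:
        section += 1  # e.g. old Section 5 (Introduction) becomes Section 6
    return f"{section}{suffix}"


if __name__ == "__main__":
    for old in ("1a", "4", "5a", "5b"):
        print(old, "->", renumber_2024_to_2025(old))
```

Applied to “5a”, the helper returns “6a”, matching the example in the text, while labels in Sections 1–4 are returned unchanged.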

The final STROCSS 2025 checklist thus contains a total of 50 items (up from 44 items in STROCSS 2024) spanning all domains of a surgical observational study. Table 2 provides the verbatim wording of the six new AI-focused items (5a–5f). These items are intended for authors preparing surgical observational studies: if an AI tool or algorithm was involved in the study in any manner, the author should address each of these points in the appropriate section of the report. If no AI was involved, these items are simply marked “not applicable.” In the revised checklist document (available as supplementary material and on the STROCSS website), the new items are highlighted for ease of adoption by authors and journal editors.

Results

Response rate

Forty-six of the 49 invited experts participated in the Delphi consensus exercise, a participation rate of 94%. Their specialties and countries are shown in Figures 2 and 3 below.

Characteristics of participants

**Figure 2: Bar chart showing the specialties in which responding participants practise.**

**Figure 3: Bar chart showing the countries of responding participants.**

Delphi consensus outcomes

Table 1 below shows the Delphi consensus scores for the new AI-related checklist items (Section 5 “Artificial Intelligence”). Each value is the percentage of Delphi panel participants giving a score in that range on the 9-point Likert scale for the item during the final round. Consensus for inclusion was defined a priori as ≥70% of respondents scoring 7–9 and <15% scoring 1–3. Every item met both criteria; agreement ranged from 71.7% (item 5f, the narrowest margin) to 97.8% (item 5a).

Table 1: Delphi consensus scores for new AI-related checklist items
Item	Summary of item	1–3 (Disagree) [%]	4–6 (Neutral) [%]	7–9 (Agree) [%]
5	AI usage declaration	2.2% (1/46)	2.2% (1/46)	95.7% (44/46)
5a	Purpose and Scope of AI Use	2.2% (1/46)	0% (0/46)	97.8% (45/46)
5b	AI Tool(s) and Configuration	8.7% (4/46)	13.0% (6/46)	78.3% (36/46)
5c	Data Inputs and Safeguards	8.7% (4/46)	6.5% (3/46)	84.8% (39/46)
5d	Human Oversight and Verification	6.5% (3/46)	4.3% (2/46)	89.1% (41/46)
5e	Bias, Ethics and Regulatory Compliance	8.7% (4/46)	8.7% (4/46)	82.6% (38/46)
5f	Reproducibility and Transparency	10.9% (5/46)	17.4% (8/46)	71.7% (33/46)
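As a consistency check, the counts in Table 1 can be converted back to percentages and tested against the a priori rule. The sketch below transcribes the counts from Table 1 (46 respondents); the names `TABLE_1` and `check_item` are illustrative only, and rounding to one decimal place is assumed to match the table’s convention.

```python
# Band counts (disagree 1-3, neutral 4-6, agree 7-9) transcribed from Table 1.
TABLE_1: dict[str, tuple[int, int, int]] = {
    "5": (1, 1, 44),
    "5a": (1, 0, 45),
    "5b": (4, 6, 36),
    "5c": (4, 3, 39),
    "5d": (3, 2, 41),
    "5e": (4, 4, 38),
    "5f": (5, 8, 33),
}
N_RESPONDENTS = 46


def check_item(counts: tuple[int, int, int],
               n: int = N_RESPONDENTS) -> tuple[list[float], bool]:
    """Recompute band percentages and apply the a priori consensus rule."""
    disagree, neutral, agree = counts
    if disagree + neutral + agree != n:
        raise ValueError("band counts do not sum to the number of respondents")
    # Round to one decimal place, assumed to match the table's convention.
    percentages = [round(100 * c / n, 1) for c in counts]
    # Thresholds are applied to the unrounded proportions.
    consensus = 100 * agree / n >= 70 and 100 * disagree / n < 15
    return percentages, consensus


if __name__ == "__main__":
    for item, counts in TABLE_1.items():
        percentages, ok = check_item(counts)
        print(item, percentages, "consensus" if ok else "no consensus")
```

Every item passes both thresholds. The check also shows that 36/46 rounds to 78.3% for the agree band of item 5b, and that item 5f sits closest to the 70% cut-off (33/46, 71.7%).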

Following consensus, the AI items were formally added to the STROCSS checklist. Free-text comments from some contributors led to minor changes: stating whether the AI was integrated with any other systems (added to item 5b), acknowledging the limitations of AI use (added to item 5d) and enabling attempts at independent replication of the query/input (added to item 5f). The finalised wording of each item is shown in Table 2. Briefly, these items require authors to: 5) declare whether any AI was used in the research and manuscript development; 5a) state the purpose and scope of AI use; 5b) identify the AI tool(s) and their configuration (vendor, model, version, date of use, relevant parameters and deployment); 5c) describe the data provided to the AI and the safeguards applied; 5d) describe human oversight and verification of AI outputs and acknowledge the limitations of AI; 5e) report steps taken to address bias, ethics and regulatory compliance; and 5f) support reproducibility and transparency by providing prompts, code, logs or model cards where possible.

No changes were made to the core content of other sections of the checklist (Title, Abstract, Introduction, etc.) aside from renumbering due to the insertion of the new section. One minor addition was an explanatory note in the checklist introduction: authors are advised that if AI was not involved in their study, they may skip Section 5, but if AI contributed to diagnosis, management, or even manuscript preparation, the relevant items in Section 5 should be addressed. This ensures that the checklist remains adaptable to all surgical observational studies, whether or not AI is a factor.

**Figure 4: Delphi consensus results graph for new AI-related checklist item 5.**

**Figure 5: Delphi consensus results graph for new AI-related checklist item 5a: Purpose and Scope of AI Use.**

**Figure 6: Delphi consensus results graph for new AI-related checklist item 5b: AI Tool(s) and Configuration.**

**Figure 7: Delphi consensus results graph for new AI-related checklist item 5c: Data Inputs and Safeguards.**

**Figure 8: Delphi consensus results graph for new AI-related checklist item 5d: Human Oversight and Verification.**

**Figure 9: Delphi consensus results graph for new AI-related checklist item 5e: Bias, Ethics and Regulatory Compliance.**

**Figure 10: Delphi consensus results graph for new AI-related checklist item 5f: Reproducibility and Transparency.**

Table 2 below shows the new STROCSS 2025 checklist items (Section 5: Artificial Intelligence). Each item should be addressed in the manuscript, if applicable. “AI” refers to any artificial intelligence or machine-learning system relevant to the study. These items are intended to ensure transparency and reproducibility when AI is part of a surgical observational study.

Table 2. New STROCSS 2025 checklist items
Item (AI Domain)	Checklist item description
5. AI usage declaration	Declaration of whether any AI was used in the research and manuscript development. If no, proceed to item 6. If yes, proceed to item 5a.
5a. Purpose and Scope of AI Use	– Precisely state why AI was employed (e.g. development of research questions, language drafting, statistical analysis/summarisation, image annotation, etc). – Was generative AI utilised and if so, how? – Clarify the stage(s) of the reporting workflow affected (planning, writing, revisions, figure creation). – Confirmation that the author(s) take responsibility for the integrity of the content affected/generated
5b. AI Tool(s) and Configuration	– Name each system (vendor, model, major version/date). – State the date it was used. – Specify relevant parameters (e.g. prompt length, plug-ins, fine-tuning, temperature). – Declare whether the tool operated locally (on-premises) or via a cloud API, and any integrations with other systems.
5c. Data Inputs and Safeguards	– Describe categories of data provided to the AI (patient text, de-identified images, literature abstracts). – Confirm that all inputs were de-identified and compliant with GDPR/HIPAA. – Note any institutional approvals or data-sharing agreements obtained.
5d. Human Oversight and Verification	– Identify the supervising author(s) who reviewed every AI output. – Detail the process for fact-checking and clinical accuracy checks. – State whether any AI-generated text/figures were edited or discarded. – Acknowledge the limitations of AI and its use.
5e. Bias, Ethics and Regulatory Compliance	– Outline steps taken to detect and mitigate algorithmic bias (e.g. cross-checking against under-represented populations). – Affirm adherence to relevant ethical frameworks. – Disclose any conflicts of interest or financial ties to AI vendors.
5f. Reproducibility and Transparency	– Provide the exact prompts or code snippets (as supplementary material if lengthy). – Supply version-controlled logs or model cards where possible. – If applicable, state the repository, hyperlink or digital object identifier (DOI) where AI-generated artefacts can be accessed, enabling attempts at independent replication of the query/input.
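Taken together, items 5–5f define a structured disclosure. The sketch below shows one way an author or journal might capture those answers in machine-readable form; it is a hypothetical illustration assuming Python 3.9+, and the class and field names (`AIToolRecord`, `AIDisclosure`, `missing_items`) are not an official STROCSS schema. It follows the branching in item 5: if no AI was used, nothing further is required; otherwise it flags any of items 5a–5f left unanswered. Treating item 5f as answered when at least one tool records where its prompts or logs can be found is an additional assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class AIToolRecord:
    """One AI system used in the study (item 5b); field names are hypothetical."""
    vendor: str
    model: str
    version: str
    date_used: date
    deployment: str  # e.g. "on-premises" or "cloud API"
    parameters: dict[str, object] = field(default_factory=dict)  # e.g. {"temperature": 0.2}
    integrations: list[str] = field(default_factory=list)  # other linked systems
    prompts_location: Optional[str] = None  # item 5f: supplementary file, repository or DOI


@dataclass
class AIDisclosure:
    """Answers to Section 5 for one manuscript; a hypothetical structure."""
    ai_used: bool  # item 5
    purpose: str = ""  # item 5a
    tools: list[AIToolRecord] = field(default_factory=list)  # item 5b
    data_inputs: str = ""  # item 5c
    oversight: str = ""  # item 5d
    bias_and_ethics: str = ""  # item 5e

    def missing_items(self) -> list[str]:
        """Return the Section 5 items that are still unanswered."""
        if not self.ai_used:
            return []  # item 5 answered "no": proceed to item 6
        answers = {
            "5a": self.purpose,
            "5b": self.tools,
            "5c": self.data_inputs,
            "5d": self.oversight,
            "5e": self.bias_and_ethics,
            # Assumption: 5f counts as answered if any tool records where prompts/logs are kept.
            "5f": [t for t in self.tools if t.prompts_location],
        }
        return [item for item, value in answers.items() if not value]
```

For instance, a record with `ai_used=True` and only a purpose filled in would report items 5b to 5f as missing, which a journal could use to prompt authors before peer review.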

The above items (5a–5f) now form an integral part of the STROCSS 2025 checklist. An author writing up a surgical observational study is expected to incorporate this information into the relevant sections of the manuscript. For example, item 5a would typically be covered in the introduction (where the setting and tools used are described), items 5b–5d might appear in the methods or results (detailing which AI was used and how its outputs were checked), and items 5e–5f are likely to be addressed in the discussion (reflecting on biases and ethical considerations). Structuring the reporting in this way gives readers a clear understanding of what AI was used, why it was used, how it functioned, and what its limitations are in the context of the study. This level of detail is crucial for interpreting the study’s findings, especially as AI algorithms can significantly influence clinical outcomes.

Discussion

The STROCSS 2025 guideline represents a proactive evolution of surgical observational study standards in response to the growing influence of AI in healthcare. Compared with the STROCSS 2024 update, which primarily refined existing sections, the defining feature of STROCSS 2025 is the introduction of an entirely new domain dedicated to artificial intelligence. This addition marks a significant broadening of the checklist’s scope, acknowledging that cohort, cross-sectional and case-control studies may now involve not only human clinicians and patients, but also AI tools as part of the diagnostic or therapeutic narrative. By explicitly addressing AI, the updated guideline aims to enhance transparency and reproducibility in surgical observational studies. This aligns with broader efforts in medical research to improve the reporting of AI.⁸

Transparency in reporting AI is the overarching theme of the new domain. Just as STROCSS champions transparency in clinical reporting, we recognise that AI algorithms must not become “black boxes” in study reports. Items 5 and 5a ensure that authors explicitly declare the use of AI and its purpose, preventing scenarios where AI’s involvement might be obscured or assumed. This is analogous to disclosing a diagnostic test or a surgical device: readers deserve to know if AI was behind a key decision or outcome. Transparency is also reinforced by items 5b (tool identification and configuration) and 5c (data inputs and safeguards), which compel authors to provide enough technical detail for readers to grasp what the AI tool actually is and what data it was given. These items promote reproducibility: a future researcher or clinician reading the study should be able to identify the same AI tool, understand the context of its use, and thereby judge whether the study’s findings are credible and transferable to other settings.

The emphasis on bias mitigation and ethical considerations (item 5e) addresses increasing concerns about AI in medicine. AI systems, especially those based on machine learning, can inadvertently carry biases from their training data. If not reported, such biases could lead to misinterpretation of a study’s findings; for example, an AI diagnostic tool might perform poorly in certain demographic groups, which would be highly relevant if the study population includes those groups. By asking authors to discuss AI biases and limitations, STROCSS 2025 aligns with the ethical principle of “do no harm” in publishing. It forces a moment of reflection: the author must consider what AI might have missed or where it might be wrong. This practice can help mitigate over-reliance on AI and encourages authors to validate AI outputs with clinical judgement. More broadly, it contributes to the literature on AI by documenting real-world challenges and failures, not just successes, thereby reducing publication bias in favour of positive AI results.

Future directions for STROCSS and AI in surgical observational studies may include further refinements as the technology evolves. We expect that, as more studies are published under the STROCSS 2025 guideline, a body of examples will accumulate, illustrating how authors have implemented these items. We will monitor uptake of the AI domain, for instance by tracking whether authors have difficulty obtaining certain information about proprietary AI tools. If so, this might spur collaboration between clinicians and AI developers to improve transparency (e.g. requiring companies to provide model details when their AI is used in published studies). Additionally, future updates might consider AI used in reviewing studies. The academic community is actively discussing standards for disclosing AI assistance in manuscript preparation, and this STROCSS update addresses that aspect directly: item 5 asks authors to declare any AI use in research and manuscript development, and items 5a–5f capture how it was used. A formal guideline for reporting the use of generative AI in scientific writing may well emerge; until then, we have set a precedent by openly stating our use of an AI language model for editing this paper.

It is worth reflecting on the limitations of our guideline update process. First, our Delphi panel, while diverse, was limited to 49 invitees with 46 responders. Important perspectives, such as those of patients or regulators, were not directly represented. Patients especially might have views on how they want AI usage reported in observational studies (perhaps desiring even more clarity on consent and privacy). Including patient representatives and other stakeholders in future guideline efforts could be valuable. Second, the AI domain items are somewhat general and meant to apply across all types of AI. AI in surgery can range from simple diagnostic apps to complex autonomous robots, so not every item will fit every scenario perfectly. We attempted to strike a balance with broad wording, but some studies may require interpretation of how to apply an item. We will rely on the judgement of authors, reviewers, and editors to implement these guidelines sensibly on a case-by-case basis. Third, as with any consensus-based guideline, there is a degree of subjectivity in what was included and how it is worded. Some readers may feel that an important AI-related item is missing. We welcome feedback from the surgical community, as the STROCSS guideline is meant to be iterative: future revisions (beyond 2025) can expand or adjust the AI domain as needed.

One immediate challenge is dissemination and training. Introducing six new items means authors must be educated about them. We plan to disseminate the STROCSS 2025 checklist through the EQUATOR Network website, the Premier Science Journals, and presentations at surgical conferences. We will also encourage journals to require STROCSS 2025 adherence for observational study submissions, as journal endorsement greatly drives usage. Experience from previous STROCSS iterations showed that compliance increases when journal editors mandate the checklist and when authors see the benefit in improved clarity of their reporting. We anticipate a similar positive impact: clearer reporting of observational studies involving AI, which in turn will make it easier for readers to learn from those studies or even reproduce aspects of them (for instance, by using the same AI tool in a similar patient population). Ultimately, better-reported studies can feed into higher-level evidence; a well-documented study in which AI successfully detected a rare complication could spur larger studies or inspire others to use that AI tool.

During the preparation of this guideline manuscript, we made use of generative AI as a writing aid. Specifically, the tool was used in the later stages to assist with polishing language. No content generation (ideas or drafting of sections) was delegated to AI; it was employed similarly to a grammar/style assistant under close human oversight. We mention this to practise what we preach: transparency about AI usage. As journals and publishers, as well as COPE and WAME, increasingly require disclosure of AI assistance,¹³,¹⁴ we demonstrate that such disclosure is feasible and can be done without undermining the credibility of the work. The final content was rigorously verified by all authors to eliminate any potential AI-introduced errors (such as incorrect references or “hallucinated” facts). We found that using AI in this limited capacity did improve efficiency in editing, but human expertise remained essential for the substance and accuracy of the guideline. This experience underscores a broader point: AI can be a valuable tool in medical writing and research, but it must be applied responsibly and transparently.

Conclusion

Through a structured Delphi consensus, and in response to the rapid expansion of artificial intelligence in healthcare, we have updated the STROCSS guideline to produce STROCSS 2025, a comprehensive reporting guideline for surgical observational studies in the age of AI. The new AI-focused domain (items 5a–5f) fills a critical gap by ensuring that any use of AI in a study is reported transparently, with details of its implementation, validation and ethical considerations. This update preserves the familiar structure of the STROCSS checklist while integrating modern considerations, enabling authors to produce observational studies that are both up to date and rigorously reported. By following STROCSS 2025, clinicians and researchers will improve the clarity and reliability of observational studies, facilitating knowledge sharing and ultimately enhancing patient care. As surgical practice increasingly intersects with advanced technologies, STROCSS 2025 will help maintain the integrity and educational value of observational studies, ensuring they remain a cornerstone of the surgical literature in the years to come.

References

1. Turing AM. Computing machinery and intelligence. Mind. 1950;59(236):433-460.
https://doi.org/10.1093/mind/LIX.236.433

2. Dartmouth College. Artificial Intelligence (AI) coined at Dartmouth [Internet]. [cited 2025 May 18]. Available from: https://home.dartmouth.edu/about/artificial-intelligence-ai-coined-dartmouth

3. Precedence Research. Artificial Intelligence (AI) market size to hit USD 3,680.47 bn by 2034 [Internet]. [cited 2025 May 18]. Available from: https://www.precedenceresearch.com/artificial-intelligence-market

4. PricewaterhouseCoopers. PwC’s global Artificial Intelligence study: sizing the prize [Internet]. [cited 2025 May 18]. Available from: https://www.pwc.com/gx/en/issues/artificial-intelligence/publications/artificial-intelligence-study.html

5. McCartney J. AI is poised to “revolutionize” surgery. Bulletin of the American College of Surgeons. 2023;108(6) [cited 2025 May 19]. Available from: https://www.facs.org/for-medical-professionals/news-publications/news-and-articles/bulletin/2023/june-2023-volume-108-issue-6/ai-is-poised-to-revolutionize-surgery/

6. Agha RA, Borrelli MR, Vella Baldacchino M, Thavayogan R, Orgill DP. The STROCSS statement: Strengthening the Reporting of Cohort Studies in Surgery. International Journal of Surgery. 2017;46:198-202.
https://doi.org/10.1016/j.ijsu.2017.08.586

7. Rashid R, Sohrabi C, Kerwan A, Franchi T, Mathew G, Nicola M, Agha RA. The STROCSS 2024 guideline: strengthening the reporting of cohort, cross-sectional, and case-control studies in surgery. International Journal of Surgery. 2024;110(6):3151-3165.
https://doi.org/10.1097/JS9.0000000000001268

8. Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. BMJ. 2020;370:m3164.
https://doi.org/10.1136/bmj.m3164

9. Agha R, Abdall-Razak A, Crossley E, Dowlut N, Iosifidis C, Mathew G, for the STROCSS Group. The STROCSS 2019 Guideline: Strengthening the Reporting of Cohort Studies in Surgery. International Journal of Surgery. 2019;72:156-165.
https://doi.org/10.1016/j.ijsu.2019.11.002

10. Mathew G, Agha R, Albrecht J, Goel P, Mukherjee I, Pai P, D’Cruz AK, Nixon IJ, Roberto K, Enam SA, Basu S, Muensterer OJ, Giordano S, Pagano D, Machado-Aranda D, Bradley PJ, Bashashati M, Thoma A, Afifi RY, Johnston M, Challacombe B, Ngu JC, Chalkoo M, Raveendran K, Hoffman JR, Kirshtein B, Lau WY, Thorat MA, Miguel D, Beamish AJ, Roy G, Healy D, Ather HM, Raja SG, Mei Z, Manning TG, Kasivisvanathan V, Rivas JG, Coppola R, Ekser B, Karanth VL, Kadioglu H, Valmasoni M, Noureldin A; STROCSS Group. STROCSS 2021: Strengthening the reporting of cohort, cross-sectional and case-control studies in surgery. International Journal of Surgery 2021;96:106165.
https://doi.org/10.1016/j.ijsu.2021.106165

11. Agha RA, Borrelli MR, Farwana R, Kusu-Orkar T, Millip MC, Thavayogan R, Garner J, Darhouse N, Orgill DP. Impact of the PROCESS guideline on the reporting of surgical case series: A before and after study. International Journal of Surgery. 2017;45:92-97.

12. Agha R, Fowler A, Lee S, Gundogan B, Whitehurst B, Sagoo H, Jeong KJL, Altman D, Orgill D. A Systematic Review of the Methodological and Reporting Quality of Case Series in Surgery. British Journal of Surgery. 2016;103(10):1253-8.
https://doi.org/10.1002/bjs.10235

13. Zielinski C, Winker MA, Aggarwal R, Ferris LE, Heinemann M, Lapeña JF, et al. Chatbots, generative AI, and scholarly manuscripts: WAME recommendations on chatbots and generative artificial intelligence in relation to scholarly publications [Internet]. World Association of Medical Editors; 2023 May 31 [cited 2025 May 19]. Available from: https://wame.org/page3.php?id=106
Also available from: https://doi.org/10.25100/cm.v54i3.5868

14. COPE Council. COPE position: authorship and AI. Committee on Publication Ethics; 2023 [cited 2025 May 19]. Available from: https://doi.org/10.24318/cCVRZBms

15. Science journals set new authorship guidelines for AI-generated text [Internet]. U.S. Department of Health and Human Services; [cited 2025 May 18]. Available from: https://factor.niehs.nih.gov/2023/3/feature/2-artificial-intelligence-ethics

16. Nasa P, Jain R, Juneja D. Delphi methodology in healthcare research: How to decide its appropriateness. World J Methodol. 2021 Jul 20;11(4):116-29.
https://doi.org/10.5662/wjm.v11.i4.116

Appendix

STROCSS 2025 Guideline Checklist
Topic	Item	Item Description	Page Number
Title	1	The word ‘cohort’, ‘cross-sectional’ or ‘case-control’ is included; temporal design of study is stated (e.g. retrospective or prospective); the focus of the study is clearly stated (e.g. population, setting, disease, exposure/intervention, outcome etc.). STROCSS 2025 guidelines apply to all observational studies (e.g. cohort, cross-sectional, case-control etc.)
Highlights	2	Include three to five bullet points that summarise the key findings of the study; provide a brief background to the study, the key results and clinical relevance
Abstract	3a	Provide a structured abstract that includes the following headings: Background; Methods; Results; Conclusions
	3b	Background – Briefly describe: relevant context; scientific rationale for this study; aims and objectives
	3c	Methods – Briefly describe: type of study design (e.g. cohort, case-control, cross-sectional etc.); specification of study design (e.g. retro-/prospective, single/multi-centred etc.); all patient groups involved, including control group, if applicable; exposure/interventions (e.g. type, operators, recipients, dates and time frames etc.); outcome measures, explicitly stating primary and secondary outcome(s), where appropriate; statistical methods of assessment used, where applicable
	3d	Results – Briefly describe: summary data; principal findings with qualitative descriptions; statistical findings and their significance, where appropriate
	3e	Conclusion – Describe key conclusions briefly; refer to implications for clinical practice and public health; describe the need for and direction of future research; include a concise statement that encapsulates the significance of the research and its contribution to the field
Keywords	4	Include three to six keywords that identify what is covered in the study (e.g. patient population, diagnosis, or surgical intervention); include study type as a keyword (e.g. cohort study, cross-sectional study, case-control study etc.); include surgical speciality as one of the keywords; include study location as one of the keywords
Artificial Intelligence (AI) (some journals may prefer this in the methods and/or acknowledgments section and it should also be declared in the cover letter)	5	Declaration of whether any AI was used in the research and manuscript development. If no, proceed to item 6. If yes, proceed to item 5a.
	5a	Purpose and Scope of AI Use – Precisely state why AI was employed (e.g. development of research questions, language drafting, statistical analysis/summarisation, image annotation etc.); state whether generative AI was used and, if so, how; clarify the stage(s) of the reporting workflow affected (planning, writing, revisions, figure creation); confirm that the author(s) take responsibility for the integrity of the content affected/generated
	5b	AI Tool(s) and Configuration – Name each system (vendor, model, major version/date); state the date(s) it was used; specify relevant parameters (e.g. prompt length, plug-ins, fine-tuning, temperature); declare whether the tool operated locally (on-premises) or via a cloud API, and any integrations with other systems
	5c	Data Inputs and Safeguards – Describe categories of data provided to the AI (e.g. patient text, de-identified images, literature abstracts); confirm that all inputs were de-identified and compliant with GDPR/HIPAA; note any institutional approvals or data-sharing agreements obtained
	5d	Human Oversight and Verification – Identify the supervising author(s) who reviewed every AI output; detail the process for fact-checking and clinical accuracy checks; state whether any AI-generated text/figures were edited or discarded; acknowledge the limitations of AI and its use
	5e	Bias, Ethics and Regulatory Compliance – Outline steps taken to detect and mitigate algorithmic bias (e.g. cross-checking against under-represented populations); affirm adherence to relevant ethical frameworks; disclose any conflicts of interest or financial ties to AI vendors
	5f	Reproducibility and Transparency – Provide the exact prompts or code snippets (as supplementary material if lengthy); supply version-controlled logs or model cards where possible; if applicable, state the repository, hyperlink or digital object identifier (DOI) where AI-generated artefacts can be accessed, enabling attempts at independent replication of the query/input
Introduction	6a	Introduction – By referencing key literature throughout, comprehensively describe: relevant background and scientific rationale for study; aims and objectives; research question and hypotheses, where appropriate; potential impact of research on future clinical practice; economic relevance of study to society
	6b	Guideline citation – At the end of the introduction, refer to the STROCSS 2025 publication by stating: ‘This cohort/cross-sectional/case-control study has been reported in line with the STROCSS guidelines [include citation]’
Methods: Study Design	7a	Study design – State the type of study design (e.g. cohort, cross-sectional, case-control etc.); describe other key elements of study design (e.g. retro-/prospective, single/multi-centred etc.); specify the duration of the study, including start and end dates
	7b	Setting and timeframe of research – Comprehensively describe: specific geographical location; nature of institution (e.g. primary/secondary/tertiary care setting, district general hospital/teaching hospital, public/private, low-resource setting etc.); timeline for study, including dates for recruitment, exposure, follow-up, data collection etc.; any deviations from the initial study design plan or changes to the timeline during the research, with reasons and implications stated
	7c	Study groups – Total number of participants; number of groups; number of participants in each group; detail exposure/intervention allocated to each group; inclusion and exclusion criteria with clear definitions
	7d	Subgroup analysis – Comprehensively describe: how subgroups were defined; planned subgroup analyses; methods used to examine subgroups and their interactions
	7e	Follow up – If applicable, comprehensively describe: time, length, frequency, location and methods of follow-up (e.g. mail, telephone, with whom etc.); any specific long-term surveillance requirements (e.g. imaging surveillance of endovascular aneurysm repair); any specific post-operative instructions (e.g. post-operative medications, targeted physiotherapy etc.)
Methods: Participant Recruitment	8a	Recruitment – Comprehensively describe: period of recruitment; methods of recruitment to each patient group (e.g. all at once, in batches, continuously until the desired sample size is reached etc.); sources of recruitment (e.g. physician referral, study website, social media, posters etc.); any monetary/non-monetary incentives offered to participants to encourage involvement, clarifying the nature of any incentives provided; any challenges encountered during the recruitment processes, including how they were addressed
	8b	Sample size – Comprehensively describe: analysis to determine the optimal sample size for the study, accounting for population/effect size; power calculations with justification for the chosen statistical power, where appropriate; margin of error calculation; any associated ethical considerations
Methods: Intervention and Outcomes	9a	Pre-intervention considerations – Comprehensively describe any preoperative patient optimisation: lifestyle optimisation (e.g. weight loss, smoking cessation, glycaemic control etc.); medical optimisation (e.g. medication review, treating hypothermia/-volemia/-tension, ICU care etc.); procedural optimisation (e.g. nil by mouth, enema etc.); other (e.g. psychological support, physiotherapy etc.)
	9b	Intervention – Comprehensively describe: type of intervention and reasoning (e.g. pharmacological, surgical, physiotherapy, psychological etc.); aim of intervention (e.g. preventative/therapeutic); total cost of performing the intervention; degree of novelty of intervention; any learning required for intervention; prevalence or frequency at which the intervention is performed; concurrent treatments (e.g. antibiotics, analgesia, antiemetics, VTE prophylaxis etc.); manufacturer and model details, where appropriate
	9c	Intra-intervention considerations – Using figures and other media to illustrate wherever appropriate, comprehensively describe: details pertaining to administration of intervention (e.g. anaesthetic, positioning, location, preparation, equipment needed, devices, sutures, operative techniques, operative time etc.); for pharmacological therapies, the formulation, dosages, routes, strength and durations; for surgery, any post-operative instruction (e.g. when to remove staples or sutures); the degree of novelty for a surgical technique/device (e.g. ‘first in human’)
	9d	Operator details – Comprehensively describe: requirement for additional training; learning curve for technique, including how it was evaluated (e.g. number of cases required to reach a defined level of proficiency); relevant training, specialisation and operator’s experience (e.g. average number of the relevant procedures performed annually); any institutional support that was provided to operators to facilitate their training
	9e	Setting of intervention – Comprehensively describe: setting in which the intervention was performed; level of experience the centre has in performing the intervention
	9f	Quality control – Comprehensively describe: measures taken to reduce inter-operator variability (e.g. regular team meetings, calibration exercises); measures taken to ensure consistency in other aspects of intervention delivery (e.g. data collection); measures taken to ensure quality in intervention delivery
	9g	Post-intervention considerations – Comprehensively describe: post-operative instructions and care (e.g. avoid heavy lifting, dietary restrictions etc.); follow-up measures; future surveillance requirements (e.g. blood tests, imaging etc.); how patient engagement with post-intervention instructions will be encouraged and monitored; if applicable, the criteria for patient discharge from the medical facility
	9h	Definition of outcomes – Define primary outcomes, including validation with full reference to relevant studies, where applicable; define secondary outcomes, where appropriate; describe methods or instruments used to measure each outcome, with full reference given if validated; describe follow-up period for outcome assessment, divided by group
	9i	Statistics – Comprehensively describe: statistical tests and statistical package(s)/software used; rationale behind the statistical tests/software of choice; confounders and their control, if known; analysis approach (e.g. intention to treat/per protocol); any subgroup analyses; level of statistical significance; how the results of the statistical analyses are presented (e.g. p-values, confidence intervals, point estimates etc.)
Results	10a	Participants – Comprehensively describe: with reasons, the flow of participants (recruitment, non-participation, cross-over and withdrawal), using a figure to illustrate where appropriate; population demographics (e.g. age, gender, relevant socio-economic features, prognostic features etc.); any significant numerical differences across groups; if applicable, the longitudinal changes in participant flow/demographics over time
	10b	Participant comparison – Include a table comparing baseline characteristics of cohort groups, with statistical data included; in a concise manner, highlight the principal, significant findings; describe any group matching, with methods
	10c	Outcomes – Comprehensively describe: clinician-assessed and patient-reported outcomes (e.g. questionnaires with quality-of-life scales) for each group; expected versus attained outcomes, as assessed by the clinician; primary and secondary outcomes, as previously defined (item 9h); details of when the outcomes were recorded (e.g. at how many months/years post-operatively); relevant photographs and imaging, which are desirable; any confounding factors, stating which were adjusted for and how; any changes to interventions, with rationale and diagram, if appropriate. NB: reference relevant literature to inform expected outcomes
	10d	Tolerance – Comprehensively describe: assessment of tolerability of exposure/intervention within patient groups; methods of measuring tolerance/adherence; if applicable, specific patient perspectives; whether these results will have an impact on the long-term applicability of the findings in clinical practice; loss to follow-up (fraction and percentage), with reasons
	10e	Complications – Comprehensively describe: adverse events, classified according to the Clavien-Dindo classification*; timing of adverse events; precautionary measures taken to prevent complications (e.g. antibiotic or venous thromboembolism prophylaxis); management of adverse events (e.g. blood transfusion, wound care, revision surgery etc.); if applicable, whether the complication was reported to the national agency/pharmaceutical company; if applicable, whether any complications were discussed locally and the impact of such discussions (e.g. during team morbidity and mortality meetings); state explicitly if there were no complications/adverse outcomes. *Dindo D, Demartines N, Clavien P-A. Classification of Surgical Complications: A New Proposal with Evaluation in a Cohort of 6336 Patients and Results of a Survey. Ann Surg. 2004;240(2):205-213
	10f	Key results – Describe: key findings, supported by relevant raw data and corresponding statistical analyses with significance
Discussion	11a	Principal findings – By referencing key, relevant literature throughout, comprehensively describe: summary of key findings and conclusions; rationale behind conclusions drawn; comparison to current gold standard of care, current guidelines or similar research; implications of findings for future clinical practice and guidelines; relevant hypothesis generation
	11b	Strengths and limitations – Comprehensively describe: strengths of the study; weaknesses and limitations of the study; measures taken to overcome the limitations, if applicable; potential impact on results and their interpretation; assessment and management of bias; deviations from protocol, with reasons stated
	11c	Relevance and implications – Comprehensively describe: relevance of findings; potential implications for future clinical practice and guidelines; measures that can be taken to enhance the quality of research studies; need for and direction of future research
Conclusion	12	Conclusions – Summarise key conclusions concisely; outline scope for and direction of future research
Additional information	13a	Registration – In accordance with the Declaration of Helsinki*, state the unique research registration number and where it was registered, with a hyperlink to the registry entry (this can be obtained from ResearchRegistry.com, ClinicalTrials.gov, ISRCTN etc.). N.B. All retrospective studies should be registered before submission, and it should be stated that the research was retrospectively registered. *‘Every research study involving human subjects must be registered in a publicly accessible database before recruitment of the first subject’
	13b	Ethical approval – State explicitly whether ethical approval was needed or not; reason(s) why ethical approval was/was not needed; name of the body giving ethical approval and approval number
	13c	Informed consent – State explicitly whether informed consent was obtained or not; state reason(s) why informed consent was/was not obtained; state the nature of consent (e.g. verbal, written, digital/virtual); the authors must provide evidence of consent, where applicable, and if requested by the journal; consent should be provided for both the original intervention/procedure and publication of the study. If consent was not provided by the patient themselves, explain why (e.g. death of the patient and consent provided by next of kin). If the patient or family members were untraceable, document the tracing efforts undertaken
	13d	Protocol – Give details of protocol (a priori or otherwise) including how to access it (e.g. web address, DOI etc.); give details of protocol registration (e.g. protocol registration number, protocol registry’s name etc.); if published in a journal, cite and provide a full reference; if applicable, detail any amendments made to the original protocol, giving reasons why the changes were made
Declarations	14a	Contributorship – Acknowledge any patient and/or public and/or professional involvement in research; report the extent of involvement of each contributor, specifically stating what they contributed to (e.g. patient recruitment, defining research outcomes, dissemination of results etc.)
	14b	Conflicts of interest – Conflicts of interest, if any, are described
	14c	Funding – Sources of funding (e.g. grant details), if any, are clearly stated; role of funder stated; guarantor named
	14d	Data sharing statement – Explicitly state whether or not the datasets generated during the study are available on request

Cite this article as:
Agha RA, Mathew G, Rashid R, Kerwan A, Al-Jabir A, Sohrabi C, Franchi T, Nicola M, Agha M; STROCSS Group. Revised Strengthening the reporting of cohort, cross-sectional and case-control studies in surgery (STROCSS) Guideline: An update for the age of Artificial Intelligence. Premier Journal of Science. 2025;10:100081