Data collection & patient recruitment

The Microb-AI-ome partners comprise five expert clinical centres for CRC screening: three in Ireland (MMUH, SVUH Dublin and Mercy Hospital Cork) and the AP-HP and ANGH hospitals in France.

During the course of the project, the Irish programme (BowelScreen) will recruit n=1,500 individuals who are FIT+ve and undergoing colonoscopy. The French partner will recruit through the AP-HP hospital network and the ANGH organisation (Association Nationale des Hépatogastroentérologues des Hôpitaux généraux). These organisations cover a wide network of high-volume endoscopy centres distributed all over France and will recruit approx. n=2,500 individuals who are FIT+ve or FIT–ve and undergoing colonoscopy during opportunistic screening.

Following informed consent, clinical data, nutritional data, and faecal samples will be collected. Colonoscopy findings will be categorised as cancer, polyps (high and low risk), normal colonoscopy or other conditions (Crohn's disease, diverticular disease, irritable bowel syndrome). All patient data will be handled in accordance with the GDPR.

The clinical, colonoscopic and pathological data will subsequently be associated with the metadata (WP3) to determine the effectiveness of the microbiome signal (Microb-AIs, CRC Profiler) in clinical practice (CRC Stratifier) as a primary tool for CRC screening.

Data Management Plan

With Microb-AI-Net and the linked CRC Profiler apps controllable through the CRC Stratifier software, we will democratise the use of research data and the application of AI-enhanced stratification models while adhering, by design and architecture, to the FAIR principles. By being compatible with AI and by depositing the Microb-AIs in its app store, we will also contribute significantly to open research, to sustainability and to continuously updated cyber security. Digital Object Identifiers will ensure that trained classification models and the results of evaluation/analysis pipelines, including intermediate results, remain persistently identifiable. We will follow the community-driven AIMe minimal reporting standard for AI in medicine.

By month 6 of the project, we will have created a joint data management plan (DMP) with DMPonline.


The ultimate exploitation goal of Microb-AI-ome is to reach a point of self-sufficiency beyond project runtime. In an optimal scenario, we might be able to inspire a novel market with European players at the forefront by turning the GDPR-based privacy requirements into a commercially exploitable advantage.

We will seek constructive feedback at every stage to support informed decision-making. At the beginning of the project we will identify all the relevant stakeholder groups, movers and shakers, and end-users to understand their needs and expectations.

For each stakeholder group, the appropriate means of communication (e.g. social media, direct communication, videos) will be selected so that the key messages are conveyed with the most effective strategies. A stakeholder matrix will list all the target groups, with descriptions of the engagement and analysis strategies used to investigate their concerns and evidence needs.

Ethics Framework

Ethics and Data Protection are at the centre of Microb-AI-ome and the prime rationale behind its federated architecture. While this architecture enables Microb-AI-ome to overcome major data protection and ethical roadblocks, key activities of the project still involve the processing of special categories of personal data such as data concerning health, genetic data and possibly data revealing ethnic origin as well as the development and application of machine learning methods on such data.

Therefore, the highest scrutiny regarding privacy and security measures throughout the whole data processing life cycle is imperative. Of equal importance is the consideration of the ethical aspects of these activities and of the involvement of humans, and patients in particular, as research subjects.

An Ethical and Human Rights Impact Assessment Framework will be elaborated in Microb-AI-ome.


Around mid-term, dissemination and exploitation actions will be prepared.

A certification implementation plan for the use of CRC stratification tools, a training plan for the clinical users of the CRC Stratifier, as well as Microb-AI-ome PR videos for stakeholders will be available.



Engagement with stakeholders will be crucial for this project to ensure implementation of project output, results, and recommendations. Workshops and webinars will be organised on specific topics to provide specific stakeholders with more in-depth information and allow for dedicated interaction and direct feedback.

Three workshops will be organised to disseminate key project results, to gather feedback and external experience, and to account for potentially improved microbiome data mining tools.

Two workshops will be organised for clinicians and tumour boards to introduce the first CRC Stratifier prototype and gather first-hand feedback from potential users and, at the end of the project, to educate the clinicians and tumour board members regarding the final CRC Stratifier application in screening practice and cancer stratification.

The third workshop will introduce the CRC Stratifier to regulatory bodies in order to have it integrated with national & international guidelines.

Training Plan

In order to validate the CRC Stratifier for its ability to successfully stratify subjects by CRC status/risk, partners UCC and INRAE will develop a training programme to enable clinicians to operate the federated database network clients (Microb-AI-Net) and the clinical CRC Stratifier software on site, and to provide feedback to the developers at GND and UHAM.

The clinicians will then apply the CRC Stratifier to subject data. Another part of the validation will revolve around comparing the developed federated models (Microb-AIs) to their centralised counterparts to ensure equivalent performance as well as studying how the federated models fare when dealing with imbalanced data distributions, in particular with heterogeneously distributed confounding factors.
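The comparison between a federated model and its centralised counterpart can be illustrated with a minimal sketch. It assumes a nearest-centroid classifier as a deliberately simple stand-in (it is not the Microb-AI architecture), and all site names, feature values and labels are hypothetical toy data. Each site shares only per-class feature sums and counts; the coordinator merges them. One site holds no CRC cases at all, which mimics a heterogeneous class distribution.

```python
def local_class_stats(samples):
    """Runs at a site: per-class feature sums and counts (no raw rows leave)."""
    stats = {}
    for features, label in samples:
        sums, count = stats.get(label, ([0.0] * len(features), 0))
        stats[label] = ([s + f for s, f in zip(sums, features)], count + 1)
    return stats


def merge_stats(all_stats):
    """Runs at the coordinator: add up sums and counts, then divide."""
    merged = {}
    for stats in all_stats:
        for label, (sums, count) in stats.items():
            prev_sums, prev_count = merged.get(label, ([0.0] * len(sums), 0))
            merged[label] = (
                [a + b for a, b in zip(prev_sums, sums)],
                prev_count + count,
            )
    return {label: [s / n for s in sums] for label, (sums, n) in merged.items()}


def predict(centroids, features):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist2(centroids[label]))


# Hypothetical toy data: (features, label). site_b holds no "crc" cases.
site_a = [([0.9, 0.1], "crc"), ([0.8, 0.2], "crc"), ([0.2, 0.7], "control")]
site_b = [([0.1, 0.9], "control"), ([0.3, 0.8], "control"), ([0.2, 0.6], "control")]

# Federated: only aggregates are exchanged between sites and coordinator.
federated = merge_stats([local_class_stats(site_a), local_class_stats(site_b)])
# Centralised reference: the same statistics computed on pooled raw data.
centralised = merge_stats([local_class_stats(site_a + site_b)])
```

Because the merge weights every site by its sample counts, the federated centroids match the centralised ones up to floating-point rounding even though the classes are unevenly spread. Averaging the site centroids without count weights would not have this property, which is the kind of discrepancy the validation is meant to detect.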


CRC Profiler Apps

We will develop and implement a collection of federated CRC Profiler apps covering all steps of a patient stratification workflow, from normalisation and batch-effect correction to AI model learning and application.

Further, the individual CRC Profiler apps will be chained into pipelines connected to the federated database network Microb-AI-Net and become available for clinicians as CRC Stratifier software.
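As an illustration of what one federated step in such a pipeline might look like, the following minimal sketch performs a z-score normalisation across sites. It is an assumption-laden example, not the CRC Profiler implementation: the function names, site names and abundance values are hypothetical. Each site reports only a count, a sum and a sum of squares, from which the coordinator derives the global mean and standard deviation.

```python
from math import sqrt


def local_summary(values):
    """Runs at a site: return aggregates only, never the raw values."""
    return {
        "n": len(values),
        "sum": sum(values),
        "sum_sq": sum(v * v for v in values),
    }


def aggregate(summaries):
    """Runs at the coordinator: combine site aggregates into mean and std."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return mean, sqrt(max(variance, 0.0))


def normalise(values, mean, std):
    """Runs at a site again, using the broadcast global parameters."""
    return [(v - mean) / std for v in values]


# Hypothetical relative abundances of one taxon at three sites.
sites = {
    "site_a": [0.10, 0.30, 0.20],
    "site_b": [0.50, 0.70],
    "site_c": [0.40],
}

summaries = [local_summary(values) for values in sites.values()]
global_mean, global_std = aggregate(summaries)
normalised = {
    name: normalise(values, global_mean, global_std)
    for name, values in sites.items()
}
```

The normalised values stay at their site; only the three aggregates per site and the two global parameters cross site boundaries. A later pipeline step, such as model training, could consume `normalised` locally in the same fashion.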


We will organise a mid-way hackathon for medical data scientists to test our Microb-AI-Net APIs, to enable future community-driven CRC Profiler app development and to account for upcoming novel sequencing technologies and/or computational microbiome profiling technologies.

We will make Microb-AI-ome known in international study groups and cancer consortia.

To this end, we will mainly rely on social media platforms.

The Hackathon will be announced in due time.

Validation Results

The performance of the CRC Stratifier will be evaluated on a validation cohort comprising >2,000 samples obtained from individuals who underwent colonoscopy screening.

By combining the Microb-AI-Net database network with the federated Microb-AIs, we can ensure that no privacy-sensitive patient data will leave the legally safe harbours of the local/national CRC databases, while still allowing robust and validated data-driven AI models to be developed. We will develop, deploy, and assess the new CRC Profiler AI apps as certifiable medical diagnostic devices and integrate them into corresponding software for clinical practice (CRC Stratifier).

Microb-AI-ome will enable earlier and more precise assessment of the risk of developing CRC, so that patients can be stratified more efficiently for colonoscopy (>20% increased specificity) and other follow-up surveillance methods.
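For reference, specificity is the fraction of individuals without disease who are correctly classified as negative, TN / (TN + FP). The sketch below shows how a specificity gain could be computed against colonoscopy findings. The labels and predictions are hypothetical toy values, not project results, and since the text does not state whether the targeted increase is absolute or relative, the sketch reports both.

```python
def sensitivity(y_true, y_pred):
    """True positives divided by all individuals with disease."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """True negatives divided by all individuals without disease."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tn / (tn + fp)


# Hypothetical toy data: 1 = lesion found at colonoscopy, 0 = normal.
y_true = [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
baseline = [1, 1, 1, 1, 0, 1, 0, 1, 0, 1]   # e.g. an existing screening test
candidate = [1, 1, 0, 1, 0, 0, 0, 1, 0, 1]  # e.g. a microbiome-based stratifier

spec_base = specificity(y_true, baseline)
spec_cand = specificity(y_true, candidate)

absolute_gain = spec_cand - spec_base      # difference in proportions
relative_gain = spec_cand / spec_base - 1  # ratio-based increase

print(f"baseline specificity:  {spec_base:.3f}")
print(f"candidate specificity: {spec_cand:.3f}")
print(f"absolute gain: {absolute_gain:.3f}, relative gain: {relative_gain:.3f}")
```

Reporting sensitivity alongside specificity, as the helper above allows, guards against a specificity gain that is bought by missing true cases.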