Research
Computational Pathology
Main Contact: Omar S. M. El Nahhas, MSc
Since 2018, our lab has mapped cancer biomarkers to phenotypes observed in histopathology images of solid tumors using deep learning. We developed deep learning methods to predict genetic alterations, clinical outcomes, and treatment responses directly from routinely available histology slides. Our open-source methods have been validated on tens of thousands of patients across a broad range of solid tumors, with emphasis on, but not limited to, colorectal, gastric, liver, and breast cancer. Our mission is to extend these methods by incorporating concepts from the rapidly evolving field of AI, such as foundation models and explainability methods, to contribute new insights to cancer biology and democratize access to predictive and prognostic biomarkers.
Our computational pathology core team focuses on:
- Biomarker discovery: Advancing state-of-the-art deep learning methods to enable the development of novel predictive and prognostic biomarkers by effectively learning complex spatial patterns in histology data.
- Explainability and biological interpretation: Developing methods to shed light on the black-box decision-making of deep learning models using associative, generative, and counterfactual concepts on histology data.
- Large-scale validation of genotype-phenotype mapping in human cancer: Collecting histology data from clinical trials and real-world cohorts to evaluate the robustness of models in real clinical scenarios.
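Slide-level prediction without pixel-level labels is commonly framed as attention-based multiple-instance learning: tile embeddings from one whole-slide image are pooled by learned attention into a single slide representation that feeds a classifier. The following is a minimal forward-pass sketch of that idea; all weights, dimensions, and names are illustrative and do not reflect our actual pipeline.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_mil_forward(tile_features, W_attn, w_score, w_clf):
    """Aggregate tile-level features into one slide-level prediction.

    tile_features: (n_tiles, d) embeddings of image tiles from one slide.
    Returns (slide-level probability, attention weights over tiles).
    """
    # Per-tile attention scores (gated attention variants omitted for brevity).
    h = np.tanh(tile_features @ W_attn)      # (n_tiles, k)
    a = softmax(h @ w_score)                 # (n_tiles,), sums to 1
    slide_embedding = a @ tile_features      # attention-weighted pooling, (d,)
    logit = slide_embedding @ w_clf
    prob = 1.0 / (1.0 + np.exp(-logit))      # e.g. probability of a biomarker
    return prob, a

# Illustrative random tiles and weights (a real model learns these).
rng = np.random.default_rng(0)
n_tiles, d, k = 50, 32, 8
tiles = rng.normal(size=(n_tiles, d))
prob, attn = attention_mil_forward(
    tiles,
    rng.normal(size=(d, k)),
    rng.normal(size=k),
    rng.normal(size=d),
)
print(round(float(prob), 3), attn.shape)
```

The attention weights double as a built-in explainability signal: tiles with high weights are the regions the model relied on, which can be inspected by a pathologist.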
Deep Learning in Radiology
Main Contact: Dr. Marta Ligero Hernandez
We use deep learning to obtain clinically relevant representations from routine radiology images. Our focus is on uncovering subtle patterns in images that enable a better understanding of disease and forecast clinical outcomes. In contrast to classical handcrafted radiomics methods, we aim to develop end-to-end deep learning methods that do not rely on voxel-level annotations, such as weakly supervised and self-supervised models.

Our radiology projects are focused on:
- Foundation models for Radiology: Applying self-supervised models to heterogeneous, large patient cohorts to capture representative features of patients’ diseases.
- Radiology-based prognostic and predictive biomarkers: Developing radiology-based deep learning models for prognosis and response prediction to support established clinical decision aids, such as risk-scoring systems.
- Multimodal radiology-based methods: Integrating multiple radiology imaging acquisitions and modalities to improve diagnosis and disease characterization.
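Self-supervised pretraining of the kind used for such foundation models is often instantiated as contrastive learning over paired augmented views of the same scan. The sketch below shows a SimCLR-style NT-Xent loss purely as an illustration; the random embeddings and the temperature value are placeholders, not our trained models.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss for two augmented views of n scans.

    z1, z2: (n, d) embeddings; row i of z1 and row i of z2 form a positive pair.
    """
    z = np.concatenate([z1, z2], axis=0)              # (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity space
    sim = z @ z.T / temperature                       # (2n, 2n) similarity matrix
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = z1.shape[0]
    # Index of each row's positive partner: i <-> i + n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# Stand-in embeddings; a real encoder would produce these from augmented scans.
rng = np.random.default_rng(1)
z1, z2 = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
loss = nt_xent_loss(z1, z2)
print(float(loss))
```

Minimizing this loss pulls the two views of each patient's scan together while pushing apart views of different patients, which is what yields reusable representations without any labels.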
Natural Language Processing
Main Contact: Dr. Isabella Wiest
A large proportion of clinical data is encoded as natural language in text format. We use and develop large language models to extract information from this unstructured data and generate text based on specific input data, e.g., medical guidelines.

Our language-based projects include:
- Information extraction: Building and evaluating local, privacy-preserving tools to transform unstructured data into structured data, thereby allowing us to automate processes such as parsing large textual datasets for scientific use or for practical uses such as quality control in the hospital.
- Text generation with augmented knowledge: Building and evaluating large language models with additional knowledge through in-context learning and retrieval-augmented generation. We develop deep learning-based methods to add context and up-to-date, quality-controlled information to scientific large language models, which can then be used as scientific or clinical decision aids in multiple clinical scenarios of cancer medicine and beyond.
- Bias evaluation: Developing frameworks to systematically evaluate LLMs to understand and quantify bias and hallucination and develop strategies to mitigate risks.
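The retrieval-augmented generation workflow above can be sketched as: embed the query, rank knowledge passages by similarity, and prepend the best matches to the prompt before generation. The toy hash-based `embed` function and the guideline snippets below are stand-ins for a real sentence encoder and a curated, quality-controlled knowledge base.

```python
import numpy as np

def embed(text, dim=64):
    """Toy bag-of-words hash embedding; a real system uses a sentence encoder."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[sum(tok.encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query (cosine similarity)."""
    q = embed(query)
    scores = [float(q @ embed(p)) for p in passages]
    top = sorted(range(len(passages)), key=lambda i: -scores[i])[:k]
    return [passages[i] for i in top]

def build_prompt(query, passages):
    """Prepend retrieved context so the model answers from vetted sources."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Illustrative guideline snippets, not a real knowledge base.
guidelines = [
    "Microsatellite instability testing is recommended in colorectal cancer.",
    "Adjuvant chemotherapy decisions depend on tumor stage.",
    "MRI is the preferred modality for rectal cancer staging.",
]
prompt = build_prompt("When is MSI testing recommended?", guidelines)
print(prompt)
```

Because retrieval runs against a local document store, this pattern also supports the privacy-preserving, on-premise deployment described above: no patient text needs to leave the hospital.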
Deep Learning for Genomics
Main Contact: Michaela Unger, MSc
Cancer genomes can be analyzed with next-generation sequencing techniques, after which traditional bioinformatics methods are used to identify clinical targets specific to individual patients. In our work, we aim to develop deep learning methods that capture high-level properties of cancer genomes and identify new patterns of DNA alterations that elude detection by classical handcrafted methods.
Our deep learning genomics projects include:
- Genomic biomarkers: Utilizing deep learning to predict clinical endpoints from sequencing data, comparing our methods to established pipelines. Through this, we aim to identify molecular biomarkers with prognostic and predictive value, advancing personalized treatment strategies.
- DNA Patterns and Disease: Identifying subtle DNA alteration patterns linked to clinical outcomes to deepen our understanding of disease biology, for example, identifying patient subtypes with DNA damage signatures accumulated through well-characterized biological processes such as homologous recombination deficiency.
- Foundational Cancer Genome Models: Creating foundational cancer genome models for broad applications, establishing meaningful representations of cancer genomes, unbiased by existing human knowledge.
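Inputs to models of DNA alteration patterns often summarize a genome as counts of single-base substitutions in their trinucleotide context, the representation underlying mutational signatures such as those left by homologous recombination deficiency. Below is a toy version of that featurization; the reference sequence and variants are made up for illustration.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def trinucleotide_context(ref_genome, pos, alt):
    """Return a mutation key like 'A[C>T]G' for a substitution at pos.

    By convention the mutated base is reported on the pyrimidine (C/T) strand.
    """
    ref = ref_genome[pos]
    left, right = ref_genome[pos - 1], ref_genome[pos + 1]
    if ref in "AG":  # reverse-complement onto the pyrimidine strand
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
        left, right = right.translate(COMPLEMENT), left.translate(COMPLEMENT)
    return f"{left}[{ref}>{alt}]{right}"

genome = "TACGGTCA"               # toy reference sequence
variants = [(2, "T"), (5, "A")]   # (position, alternate base), illustrative
counts = Counter(trinucleotide_context(genome, p, a) for p, a in variants)
print(dict(counts))
```

Counting all 96 such categories across a tumor's variants gives a fixed-length vector, a common hand-off point between sequencing pipelines and downstream learning models.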
Multimodal Deep Learning
Main Contact: TBC
With the advent of multimodal transformer models and other deep learning approaches, we now have the capability to develop models that parse different types of information simultaneously, such as images plus genomic data or images plus text. We are developing multimodal systems to address previously intractable questions in cancer research and oncology, such as improving outcome predictions through the synergy between different data types, or understanding disease processes that are only partly reflected in any single data type.

Our multimodal AI focus areas include:
- Explainability through multimodality: Systematically using multimodal deep learning to uncover features in human tumors that only make sense given additional information from orthogonal information sources.
- Multimodal decision aids for oncologists: Developing large language model-centered tools that enable oncologists to automatically access multiple types of information and integrate them into a single prediction.
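One simple multimodal pattern, late fusion, projects each modality's embedding into a shared space and concatenates the results before a joint prediction head. The sketch below uses random weights and made-up dimensions purely for illustration; it is not one of our models.

```python
import numpy as np

def late_fusion_predict(image_feat, genomic_feat, W_img, W_gen, w_head):
    """Project each modality into a shared space, concatenate, and classify."""
    h_img = np.maximum(image_feat @ W_img, 0.0)    # ReLU projection of imaging
    h_gen = np.maximum(genomic_feat @ W_gen, 0.0)  # ReLU projection of genomics
    fused = np.concatenate([h_img, h_gen])         # late fusion by concatenation
    logit = fused @ w_head
    return 1.0 / (1.0 + np.exp(-logit))            # e.g. outcome probability

# Illustrative dimensions: 128-d image embedding, 40-d genomic profile.
rng = np.random.default_rng(2)
d_img, d_gen, d_shared = 128, 40, 16
p = late_fusion_predict(
    rng.normal(size=d_img),
    rng.normal(size=d_gen),
    rng.normal(size=(d_img, d_shared)),
    rng.normal(size=(d_gen, d_shared)),
    rng.normal(size=2 * d_shared),
)
print(float(p))
```

Ablating one modality in such a model (zeroing its branch) is one way to measure how much each data type contributes to a prediction, which ties multimodality back to explainability.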
Swarm Learning
Main Contact: Dr. Oliver Lester Saldanha
The vast majority of medical data is stored in a decentralized way, yet it typically has to be centralized to train deep learning systems. We use decentralized deep learning techniques such as swarm learning, in which data remains local at each institution and does not have to be shared in order to assemble multicentric datasets. Specifically, we build networks of partners across the world to enable the training of deep learning models on multicentric data without exchanging any raw data.
We are currently working on numerous projects that use swarm learning in a clinical context. The data used ranges from histopathology whole-slide images, CT and MRI scans, and surgical video sequences to single-cell analyses. These projects are funded through multiple research consortia, including ODELIA, SWAG, and DECADE.
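The core mechanic, local training with parameter exchange instead of data exchange, can be illustrated with federated averaging: each site runs gradient steps on its own private data, and only the model parameters are averaged across sites. The sketch below uses a linear model and synthetic site data; it is a simplification, not the actual swarm protocol, which additionally handles peer coordination without a central server.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, steps=20):
    """A few local gradient steps on one site's private data (linear regression)."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(3)
true_w = np.array([1.5, -2.0])   # ground-truth weights behind the synthetic data

# Three sites, each holding private data that never leaves the institution.
sites = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    sites.append((X, y))

w = np.zeros(2)
for _ in range(10):
    # Each site trains locally; only the parameters are exchanged and averaged.
    local_ws = [local_sgd(w.copy(), X, y) for X, y in sites]
    w = np.mean(local_ws, axis=0)
print(np.round(w, 2))  # converges close to true_w without pooling any raw data
```

The privacy benefit is that the only thing crossing institutional boundaries is the parameter vector `w`, never patient-level data, which is what makes multicentric training feasible under data-protection constraints.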