Datasets
We strive to make all of our tools, protocols and datasets available to the community.
Cancer histology images
Human solid tumors are made up of many different tissue types. Image analysis pipelines often start by classifying these regions (such as tumor, stroma, and necrosis). The following labeled, quality-controlled sets of images can be used to train tissue classifiers:
- 5000 labeled images of colorectal cancer tissue in # classes (from this paper): download
- 100,000 labeled images of colorectal cancer tissue in # classes (from this paper): download
- 1,000,000 images of colorectal cancer tissue in # classes: download
- ~12k images for tumor detection in colorectal and gastric cancer (512x512 px at 0.5 µm/px, from this paper): download
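When combining tiles from different sources, it helps to check that they cover the same physical area. The following is a minimal sketch of that arithmetic, using the 512x512 px at 0.5 µm/px figures from the list above. The function names and the 0.25 µm/px source resolution in the example are illustrative assumptions, not part of any dataset.

```python
def tile_edge_um(edge_px: int, um_per_px: float) -> float:
    """Physical edge length of a square tile, in micrometers."""
    return edge_px * um_per_px


def resampled_edge_px(edge_px: int, source_um_per_px: float,
                      target_um_per_px: float) -> int:
    """Edge length in pixels after resampling a tile to a target resolution.

    The tile covers the same tissue area before and after resampling;
    only the pixel grid changes.
    """
    return round(edge_px * source_um_per_px / target_um_per_px)


if __name__ == "__main__":
    # A 512 px tile at 0.5 um/px spans 256 um of tissue.
    print(tile_edge_um(512, 0.5))
    # Hypothetical example: a 1024 px tile scanned at 0.25 um/px
    # becomes 512 px when resampled to 0.5 um/px.
    print(resampled_edge_px(1024, 0.25, 0.5))
```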
After tissue of interest has been detected in whole slide images, deep learning classifiers can extract clinically meaningful information from the images. The following datasets can be used to train such classifiers:
- ~400k image patches of microsatellite instable (MSI) vs. microsatellite stable (MSS) colorectal and gastric cancer (from this paper): download. The patches are derived from the TCGA data set at http://portal.gdc.cancer.gov.
- Image patches of all colorectal cancer (CRC) whole slide images from the TCGA database, cut into tiles of 512x512 px for subsequent deep learning analysis. Only the manually annotated tumor region was processed. Patient names are preserved in the dataset: https://zenodo.org/record/3784345. Corresponding genetic information is available at https://cbioportal.org. Original data credit: http://portal.gdc.cancer.gov.
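Because patient information is preserved in the tiles, training and test sets can be split by patient rather than by tile, so that tiles from one patient never end up on both sides. The sketch below assumes that each tile filename contains a TCGA patient barcode of the form `TCGA-XX-XXXX`. The example filenames are hypothetical, so check the actual naming in the download before relying on this pattern.

```python
import random
import re
from collections import defaultdict

# TCGA patient barcode: project, two-character site code, four-character participant code.
PATIENT_RE = re.compile(r"TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}")


def patient_id(filename):
    """Return the TCGA patient barcode found in a filename, or None."""
    match = PATIENT_RE.search(filename)
    return match.group(0) if match else None


def group_by_patient(filenames):
    """Map each patient barcode to the list of tile filenames it owns."""
    groups = defaultdict(list)
    for name in filenames:
        pid = patient_id(name)
        if pid is not None:  # skip files without a recognizable barcode
            groups[pid].append(name)
    return dict(groups)


def split_by_patient(filenames, test_fraction=0.2, seed=0):
    """Split tiles into train and test lists with no patient in both."""
    groups = group_by_patient(filenames)
    patients = sorted(groups)
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [f for p in patients if p not in test_patients for f in groups[p]]
    test = [f for p in patients if p in test_patients for f in groups[p]]
    return train, test


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    tiles = [
        "blk-AAAA-TCGA-AA-3555-01Z-00-DX1.png",
        "blk-BBBB-TCGA-AA-3555-01Z-00-DX1.png",
        "blk-CCCC-TCGA-CM-4746-01Z-00-DX1.png",
        "blk-DDDD-TCGA-G4-6304-01Z-00-DX1.png",
    ]
    train, test = split_by_patient(tiles, test_fraction=0.34)
    print(len(train), len(test))
```

Splitting at the patient level gives a more honest estimate of how a classifier performs on unseen patients than a random split over tiles.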
Generated images
Deep generative adversarial networks can generate realistic histology images:
- 2500 image tiles of generated colorectal cancer tissue, 256x256 px
Protocols
- The “Aachen Protocol” for data preprocessing in deep learning histology image analysis: https://zenodo.org/record/3694994
Source code
- Deep learning-based prediction of molecular alterations pan-cancer: https://github.com/jnkather/DeepHistology
- Deep learning for detecting virus presence in cancer images: https://github.com/jnkather/VirusFromHE
- Deep learning for detecting MSI in gastrointestinal cancer, original code: https://github.com/jnkather/MSIfromHE