Commit ad4b38a

fedorov authored and gitbook-bot committed
GITBOOK-408: change request with no subject merged in GitBook
1 parent c40e319 commit ad4b38a

6 files changed: +59 -35 lines changed

.gitbook/assets/image (52).png

15.6 KB

README.md

Lines changed: 3 additions & 3 deletions
@@ -20,7 +20,7 @@ layout:
 **Would you rather discuss your questions in an meeting with an expert from the IDC team? Book a 1-on-1 support session here:** [**https://tinyurl.com/idc-help-request**](https://tinyurl.com/idc-help-request)
 {% endhint %}

-[**NCI Imaging Data Commons** **(IDC)**](https://imaging.datacommons.cancer.gov) is a cloud-based environment containing publicly available cancer imaging data co-located with analysis and exploration tools and resources. IDC is a node within the broader NCI [Cancer Research Data Commons (CRDC)](https://datacommons.cancer.gov/) infrastructure that provides secure access to a large, comprehensive, and expanding collection of cancer research data.
+[**NCI Imaging Data Commons** **(IDC)**](https://imaging.datacommons.cancer.gov) is a cloud-based environment containing publicly available cancer imaging data co-located with analysis and exploration tools. IDC is a node within the broader NCI [Cancer Research Data Commons (CRDC)](https://datacommons.cancer.gov/) infrastructure that provides secure access to a large, comprehensive, and expanding collection of cancer research data.

 <figure><img src=".gitbook/assets/image.png" alt=""><figcaption><p>IDC data release v20 summary; see live dashboard <a href="https://lookerstudio.google.com/reporting/04cf5976-4ea0-4fee-a749-8bfd162f2e87/page/p_s7mk6eybqc">here</a>.</p></figcaption></figure>

@@ -38,9 +38,9 @@ IDC is as much about data as it is about what you can do with the data! We maint

 * **exploration**: start with the [IDC Portal](https://portal.imaging.datacommons.cancer.gov/explore/) to get an idea of the data available
 * **visualization**: examine images and image-derived annotations and analysis results from the convenience of your browser using integrated OHIF, VolView and Slim open source viewers
-* **programmatic access**: use [`idc-index` python package](https://github.com/ImagingDataCommons/idc-index) we provide to perform search, download and other operations programmatically
+* **programmatic access**: use [`idc-index` python package](https://github.com/ImagingDataCommons/idc-index) to perform search, download and other operations programmatically
 * **cohort building**: use rich and extensive metadata to build subsets of data programmatically using `idc-index` or BigQuery SQL
-* **download**: use your favorite S3 API client or `idc-index`, to efficiently fetch any of the IDC files from our public buckets
+* **download**: use your favorite S3 API client or `idc-index` to efficiently fetch any of the IDC files from our public buckets
 * **analysis**: conveniently access IDC files and metadata from the tools that are cloud-native, such as Google Colab or Looker; fetch IDC data directly into 3D Slicer using [SlicerIDCBrowser extension](https://github.com/ImagingDataCommons/SlicerIDCBrowser/)

 {% hint style="info" %}
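
Note on the README.md hunk above: the "programmatic access", "cohort building" and "download" bullets could be illustrated with a short example. The following is only a minimal sketch, not the documented workflow. It assumes that the `idc-index` package exposes `IDCClient` with `sql_query` and `download_from_selection` methods, that its metadata table is queried under the name `index`, and that the columns `collection_id`, `Modality`, `SeriesInstanceUID` and `series_size_MB` exist; the helper names `cohort_query` and `fetch_cohort` are hypothetical.

```python
def cohort_query(collection_id: str, modality: str) -> str:
    """Build a SQL query selecting series of one modality from one collection.

    Table and column names are assumptions about the idc-index metadata table.
    """
    # Accept only simple identifiers, because the values are placed
    # directly into the SQL text below.
    for value in (collection_id, modality):
        if not value.replace("_", "").replace("-", "").isalnum():
            raise ValueError(f"unexpected characters in {value!r}")
    return (
        "SELECT SeriesInstanceUID, series_size_MB "
        "FROM index "
        f"WHERE collection_id = '{collection_id}' AND Modality = '{modality}'"
    )


def fetch_cohort(collection_id: str, modality: str, download_dir=None):
    """Run the cohort query and optionally download the matching series."""
    # Imported lazily so that the query builder above works without the
    # package installed (pip install idc-index).
    from idc_index import IDCClient

    client = IDCClient()
    cohort = client.sql_query(cohort_query(collection_id, modality))
    if download_dir is not None:
        client.download_from_selection(
            seriesInstanceUID=cohort["SeriesInstanceUID"].tolist(),
            downloadDir=download_dir,
        )
    return cohort
```

A call such as `fetch_cohort("nlst", "CT")` would return the matching series as a table without downloading anything; passing `download_dir` would additionally fetch the files. The collection identifier here is only an illustration.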

SUMMARY.md

Lines changed: 7 additions & 7 deletions
@@ -21,13 +21,6 @@
 * [Requesting GCP cloud credits](introduction/requesting-gcp-cloud-credits.md)
 * [Requesting AWS cloud credits](introduction/requesting-aws-cloud-credits.md)

-## Tutorials
-
-* [Portal tutorial](tutorials/portal-tutorial.md)
-* [Python notebook tutorials](https://github.com/ImagingDataCommons/IDC-Tutorials)
-* [Slide microscopy](tutorials/slide-microscopy/README.md)
-* [Using QuPath for visualization](tutorials/slide-microscopy/qpath-for-sm-visualization.md)
-
 ## Data

 * [Introduction](data/introduction.md)
@@ -47,6 +40,13 @@
 * [Data release notes](data/data-release-notes.md)
 * [Data known issues](data/data-known-issues.md)

+## Tutorials
+
+* [Portal tutorial](tutorials/portal-tutorial.md)
+* [Python notebook tutorials](https://github.com/ImagingDataCommons/IDC-Tutorials)
+* [Slide microscopy](tutorials/slide-microscopy/README.md)
+* [Using QuPath for visualization](tutorials/slide-microscopy/qpath-for-sm-visualization.md)
+
 ## DICOM

 * [Introduction to DICOM](dicom/introduction.md)

core-functions-of-idc.md

Lines changed: 3 additions & 1 deletion
@@ -2,7 +2,9 @@

 ## Easy and efficient access to public cancer imaging data

-We ingest and distribute datasets from variety of sources and contributors, primarily focusing on large data collection initiatives sponsored by US National Cancer Institute. At this time, we do not have resources to prioritize receipt of the imaging data from individual PIs (but we are encouraging submissions of annotations for existing IDC data!). Nevertheless, if you feel you might have a compelling dataset, please email us at [[email protected]](https://mail.google.com/mail/?view=cm\&fs=1\&tf=1\&[email protected]).
+We ingest and distribute datasets from variety of sources and contributors, primarily focusing on large data collection initiatives sponsored by US National Cancer Institute.&#x20;
+
+At this time, we do not have resources to prioritize receipt of the imaging data from individual PIs (but we are encouraging submissions of annotations/analysis results for existing IDC data!). Nevertheless, if you feel you might have a compelling dataset, please email us at [[email protected]](https://mail.google.com/mail/?view=cm\&fs=1\&tf=1\&[email protected]).

 On ingestion, we harmonize images and image-derived data into DICOM format for interoperability, whenever data is represented in a non-DICOM format.

data/introduction.md

Lines changed: 35 additions & 13 deletions
@@ -1,24 +1,46 @@
 # Introduction

-{% hint style="info" %}
-Check out the [IDC Getting started tutorial](https://github.com/ImagingDataCommons/IDC-Examples/tree/master/notebooks/getting\_started) for a quick introduction into data organization and main features of our repository!
+## Data sources

-IDC data is replicated as [a public dataset in the Google Marketplace](https://console.cloud.google.com/marketplace/product/bigquery-public-data/nci-idc-data). You can see the summary dashboard of the dataset [here](https://datastudio.google.com/u/0/reporting/04cf5976-4ea0-4fee-a749-8bfd162f2e87).
-{% endhint %}
+Most of the data in IDC is received from the data collection initiatives/projects supported by US National Cancer Institute. Whenever source images or image-derived data is not in the DICOM format, it is harmonized into DICOM as part of the ingestion.&#x20;
+
+IDC sources of data include:
+
+* [The Cancer Imaging Archive (TCIA) (ongoing)](https://www.cancerimagingarchive.net/)
+* all DICOM files from the public collections are mirrored in IDC
+* a subset of digital pathology collections and analysis results harmonized from vendor-specific representation (as available from TCIA) into DICOM Slide Microscopy (SM) format&#x20;
+* [Childhood Cancer Data Initiative (CCDI) (ongoing)](https://www.cancer.gov/research/areas/childhood/childhood-cancer-data-initiative)
+* digital pathology slides harmonized into DICOM SM
+* [Genomic Data Commons (GDC)](https://portal.gdc.cancer.gov/)
+* The Cancer Genome Atlas (TCGA) slides harmonized into DICOM SM
+* [Human Tumor Atlas Network (HTAN)](https://humantumoratlas.org/)
+* release 1 of the HTAN data harmonized into DICOM SM
+* [National Library of Medicine Visible Human Project](https://www.nlm.nih.gov/research/visible/visible_human.html)
+* v1 of the Visible Human images harmonized into DICOM MR/CT/XC
+* [Genotype-Tissue Expression Project (GTex)](https://commonfund.nih.gov/GTEx)
+* digital pathology slides harmonized into DICOM SM
+
+## Data provenance

-Currently, IDC is hosting data from the following data repositories:
+Whenever IDC replicates data from a publicly available source, we include the reference to the origin:

-* publicly available radiology collections and analysis results collections (in DICOM format) from The Cancer Imaging Archive (TCIA)
-* whole slide pathology images (in [DICOM-TIFF format](../dicom/dicom-tiff-dual-personality-files.md)) collected by
-* The Cancer Genome Atlas (TCGA)
-* [Clinical Proteomic Tumor Analysis Consortium (CPTAC)](https://proteomics.cancer.gov/programs/cptac)
-* [National Lung Screening Trial (NLST)](https://www.cancer.gov/types/lung/research/nlst)
-* fluorescence images (in [DICOM-TIFF format](../dicom/dicom-tiff-dual-personality-files.md)) collected by the [Human Tumor Atlas Network (HTAN)](https://humantumoratlas.org/)
+* from the IDC Portal Explore page, click on the "i" icon next to the collection in the collections list&#x20;
+
+<figure><img src="../.gitbook/assets/image (52).png" alt=""><figcaption></figcaption></figure>
+
+* `source_doi` metadata column contains Digital Object Identifier (DOI) at the granularity of the individual files and is available both via [python `idc-index` package](https://github.com/ImagingDataCommons/idc-index) and BigQuery interfaces

 {% hint style="info" %}
-If you would like us to prioritize an existing public collection, which is not currently included in the IDC offering, please start the discussion on our [forum](https://discourse.canceridc.dev/c/data/8)!
+Whenever source data is harmonized into DICOM, the DOI will correspond to a Zenodo entry for the result of harmonization, which in turn will reference the location where data can be accessed in the native format (if available). As an example, IDC NLM-Visible-Human-Project collection refers to this DOI that describes the dataset resulting from the original dataset harmonized into DICOM [https://doi.org/10.5281/zenodo.12690049](https://doi.org/10.5281/zenodo.12690049), which in turn references the [NLM Visible Human project page](https://www.nlm.nih.gov/research/visible/visible_human.html) containing information on accessing the original files collected by the project.
 {% endhint %}

 Check out [Data release notes](data-release-notes.md) for information about the collections added in the individual IDC data releases.

-In the following pages we discuss how to access datasets hosted by IDC and their organization.
+## Data ingestion process
+
+Simplified workflow for IDC data ingestion is summarized in the following diagram.
+
+{% embed url="https://docs.google.com/presentation/d/1UVpNVyVy3xIYLDnm4rtgAUmSu-uKQo5krekI9DSMT8o/edit?slide=id.g2fbbb94d529_0_76#slide=id.g2fbbb94d529_0_76" %}
+IDC data ingestion workflow
+{% endembed %}
+
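
Note on the "Data provenance" section added in data/introduction.md: the file-level `source_doi` column could be illustrated by summarizing DOIs per collection. The sketch below is an assumption-laden illustration rather than the IDC interface. It uses an in-memory SQLite table as a stand-in for the real metadata; only the `source_doi` column name and the Zenodo DOI come from the text, while the `files` table, the `collection_id` and `file_id` columns, the collection identifiers and the placeholder DOI are hypothetical.

```python
import sqlite3

# Hypothetical file-level rows: (collection_id, file_id, source_doi).
# Only the Zenodo DOI is taken from the documentation text; the rest is made up.
ROWS = [
    ("nlm_visible_human_project", "file-0001", "10.5281/zenodo.12690049"),
    ("nlm_visible_human_project", "file-0002", "10.5281/zenodo.12690049"),
    ("example_collection", "file-0003", "10.0000/placeholder-doi"),
]


def dois_by_collection(rows):
    """Count files per (collection_id, source_doi) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (collection_id TEXT, file_id TEXT, source_doi TEXT)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT collection_id, source_doi, COUNT(*) AS n_files "
        "FROM files "
        "GROUP BY collection_id, source_doi "
        "ORDER BY collection_id, source_doi"
    ).fetchall()
    conn.close()
    return result


def doi_url(doi):
    """Turn a bare DOI into its resolver URL."""
    return f"https://doi.org/{doi}"


for collection_id, doi, n_files in dois_by_collection(ROWS):
    print(collection_id, n_files, doi_url(doi))
```

The same `GROUP BY` pattern should carry over to the real metadata interfaces, provided the table and column names are adjusted to whatever those interfaces actually expose.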

frequently-asked-questions.md

Lines changed: 11 additions & 11 deletions
@@ -39,29 +39,29 @@ Please cite the latest paper from the IDC team. Please also make sure you acknow

 IDC and TCIA are partners in providing FAIR data for cancer imaging researchers. While some of the functions between the two resources are similar, there are also key differences. The table below provides a summary of similarities and differences.

-| Function | **IDC** | TCIA |
-| ----------------------------------------------------------------------- | ------------------------------------------------ | ---- |
-| De-identification | no, IDC can only host data already de-identified | yes |
-| Cloud-based data co-located with compute resources | yes | no |
-| Conversion of pathology images and image-derived data into DICOM format | yes | no |
-| Private data collections | no | yes |
-| Public data collections | yes | yes |
-| Version control of the data | [yes](data/data-versioning.md) | no |
+| Function | **IDC** | TCIA |
+| ----------------------------------------------------------------------- | ------------------------------------------------ | ------- |
+| De-identification | no, IDC can only host data already de-identified | yes |
+| Cloud-based data co-located with compute resources | yes | no |
+| Conversion of pathology images and image-derived data into DICOM format | yes | no |
+| Private data collections | no | yes |
+| Public data collections | yes | yes |
+| Version control of the data | [yes](data/data-versioning.md) | partial |

 ## Where do I learn more about other components of CRDC?

 The main website for the Cancer Research Data Commons (CRDC) is [https://datacommons.cancer.gov/](https://datacommons.cancer.gov)

 ## What about non-imaging data that accompanies IDC collections?

-Clinical data that was shared by the submitters is available for a number of imaging collections in IDC. Please see [this tutorial](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/advanced\_topics/clinical\_data\_intro.ipynb) on how to search that data and how to link clinical data with imaging metadata!
+Clinical data that was shared by the submitters is available for a number of imaging collections in IDC. Please see [this tutorial](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/advanced_topics/clinical_data_intro.ipynb) on how to search that data and how to link clinical data with imaging metadata!

 Many of the imaging collections are also accompanied by the genomics or proteomics data. CRDC [Cancer Data Aggregator (CDA)](https://cda.readthedocs.io/en/latest/) provides the API to locate such related datasets.

 ## I want to search IDC content using an attribute not available in the portal

 IDC Portal gives you access to just a small subset of the metadata accompanying IDC images. If you want to learn more about what is available, you have several options:

-* [this notebook](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/getting\_started/part2\_searching\_basics.ipynb) from our Getting Started tutorial series explains how to use [`idc-index`](https://github.com/ImagingDataCommons/idc-index) - a python package that aims to simplify access to IDC data
-* [this more advanced notebook](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/getting\_started/part3\_exploring\_cohorts.ipynb) will help you get started with searching IDC metadata in BigQuery, which gives you access to all of the DICOM metadata extracted from IDC-hosted files
+* [this notebook](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/getting_started/part2_searching_basics.ipynb) from our Getting Started tutorial series explains how to use [`idc-index`](https://github.com/ImagingDataCommons/idc-index) - a python package that aims to simplify access to IDC data
+* [this more advanced notebook](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/getting_started/part3_exploring_cohorts.ipynb) will help you get started with searching IDC metadata in BigQuery, which gives you access to all of the DICOM metadata extracted from IDC-hosted files
 * if you are not comfortable writing queries or coding in pyhon, you can use [this DataStudio dashboard](https://datastudio.google.com/reporting/ab96379c-e134-414f-8996-188e678f1b70/page/KHtxB) to search using some of the attributes that are not available through the portal. You can also [extend this dashboard](cookbook/data-studio/cohort-dashboard.md) to include additional attributes.
