{"metadata":{"image":[],"title":"","description":""},"api":{"url":"","auth":"required","settings":"","results":{"codes":[]},"params":[]},"next":{"description":"","pages":[]},"title":"Import CDS data","type":"basic","slug":"import-cds-data","excerpt":"","body":"[block:callout]\n{\n  \"type\": \"info\",\n  \"title\": \"\",\n  \"body\": \"Latest update of CDS data on the CGC brings one new CDS dataset: \\n* [PetaGene](https://www.petagene.com) - Practice Dataset, Decreases The Size Of Genomic Data\\n\\nSee the [details and history of CDS data updates](page:cds-data) on the CGC.\"\n}\n[/block]\n\n[block:callout]\n{\n  \"type\": \"info\",\n  \"body\": \"If you are working with the PLCO dataset, [download the PLCO manifest file](https://cgc-public.s3.amazonaws.com/manifest/PLCO-20210624.txt).\"\n}\n[/block]\n## About the CDS\n\nThe [Cancer Data Service (CDS)](https://datacommons.cancer.gov/repository/cancer-data-service) is a data repository under the NCI's Cancer Research Data Commons (CRDC) infrastructure for storing cancer research data generated by NCI funded programs. Its data is stored in the [Database for Genotypes and Phenotypes (dbGaP)](https://www.ncbi.nlm.nih.gov/gap/) database provided by National Center for Biotechnology Information (NCBI). 
CDS hosts datasets that contain controlled access data, with access permissions being controlled by dbGaP.\n\nCDS data can be imported to the CGC in the following ways:\n\n* [Using the integrated Cancer Data Service Explorer on the CGC](#section-import-cds-data-using-the-cancer-data-service-explorer)\n* [Using a manifest file generated through SRA Run Selector](#section-import-cds-data-using-a-manifest-file)\n[block:callout]\n{\n  \"type\": \"warning\",\n  \"title\": \"\",\n  \"body\": \"If you are trying to import data classified as *controlled* from CDS to the CGC, you need to be logged in to the CGC using an [eRA Commons account](https://docs.cancergenomicscloud.org/docs/sign-up-for-the-cgc#section-register-via-an-external-account) with access to controlled data.\"\n}\n[/block]\nPlease note that the Seven Bridges team strives to keep the data available for import on the CGC aligned with CDS data updates. However, CDS updates are not instantly available for import on the CGC. Find out the [currently available release](doc:cds-data#section-currently-available-release-on-the-cgc) of CDS data on the CGC and see the complete [history of updates](doc:cds-data#section-update-history).\n\n## Import CDS data using the Cancer Data Service Explorer \n\nCancer Data Service Explorer is an integrated dataset file explorer on the CGC that allows you to filter and select the exact data that you want to analyze further, and then perform a seamless import into a project on the CGC using the steps described below:\n1. While on the CGC's main dashboard, on the main menu bar click **Data** > **Cancer Data Service Explorer**.\n2. Click **Explore files**. File explorer opens.\n3. 
Use the search boxes and filters in the left pane to select the data that you want to analyze further.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/3397fdd-cds-fe-1.png\",\n        \"cds-fe-1.png\",\n        1131,\n        644,\n        \"#333\"\n      ]\n    }\n  ]\n}\n[/block]\n4. Once you have selected your set of data, click **Copy to project** in the top-right corner. Copy dialog opens.\n5. In the **Select project** dropdown select a project that you want to export the files to. If you want to import data to a new project, click **Create new project**. If the import contains controlled data, such data can only be imported in a [controlled project](doc:projects-on-the-cgc#section-controlled-data-projects).\n6. (Optional) Once you have selected a project, enter [file tags](https://docs.cancergenomicscloud.org/docs/tag-your-files) in the **Add tags** field.\n7. In the **Resolve naming conflicts** dropdown select the action to be taken if a file with the same name already exists in the target project.\n8. Click **Copy**. Your files will be exported to the selected project.\n\n## Import CDS data using a manifest file \n\nThe process of importing CDS data to the Cancer Genomics Cloud (CGC) using a manifest file generated through [SRA Run Selector](https://www.ncbi.nlm.nih.gov/Traces/study/) consists of the following two stages:\n\n* Searching for data and downloading a manifest file from [SRA Run Selector](https://www.ncbi.nlm.nih.gov/Traces/study/).\n* Importing files to the CGC based on the downloaded manifest file.\n\n### Download CDS manifest files from SRA\n\nManifest files contain information about the data you want to import in the second stage of this process.\n\nTo download a manifest file:\n1. Open the [SRA Run Selector](https://www.ncbi.nlm.nih.gov/Traces/study/).\n2. In the **Accession** field enter the accession of your choice and click **Search**. The list of search results opens.\n3. 
(Optional) In the **Filters List** section on the left, select the criteria to narrow down the result list.\n\nNow you can proceed to do the following:\n\n* Download the manifest file _for the entire set of data_ returned for the accession:\n\n1. In the **Select** section, click **Metadata** in the **Total** table row. This downloads the manifest file for _all_ data for the accession.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/e1a9412-cds-integration-1.png\",\n        \"cds-integration-1.png\",\n        1187,\n        700,\n        \"#333\"\n      ]\n    }\n  ]\n}\n[/block]\n* Select specific items from the list and download manifest file _for the selected items only_:\n\n1. Scroll down to see the list of items that match the search and filtering criteria.\n2. Check the boxes next to items you want to select.\n3. In the **Select** section, click **Metadata** in the **Selected** table row. This downloads the manifest file _for the selected data only_.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/bb4fd5b-cds-integration-2.png\",\n        \"cds-integration-2.png\",\n        1190,\n        700,\n        \"#333\"\n      ]\n    }\n  ]\n}\n[/block]\n### Import CDS data to the CGC\n\nWhen you have downloaded a manifest file from [SRA](https://www.ncbi.nlm.nih.gov/Traces/study/), follow the steps below to import the data to the CGC:\n1. Navigate to a project on the CGC or [create](doc:create-a-project) one. If the manifest file includes references to controlled data, such data can only be imported in a [controlled project](doc:projects-on-the-cgc#section-controlled-data-projects).\n2. Once in the project, click the **Files** tab.\n3. Click **Add files** > **Import from a manifest file**.\n4. In the **Import files from** dropdown, select **Cancer Data Service (CDS)**.\n5. 
Click **Browse files** and select the manifest file from your local machine, or drag and drop the file onto the marked area. Alternatively, if you have already [uploaded](doc:upload-to-the-cgc) your generated manifest file to a project, click **Select manifest from project** and select the file.\n6. (Optional) In the **Add tags** field add the keywords (tags) that describe the imported items.\n7. **Resolve naming conflicts** - Select the action to be taken if a naming conflict occurs. Available actions are **Skip** (default option) and **Auto Rename**. Read more about [naming conflicts resolution](doc:upload-from-an-ftp-server#section-resolving-naming-conflicts).\n9. Click **Import**. The file import process starts and you are taken to the **Files** tab.","updates":[],"order":9,"isReference":false,"hidden":false,"sync_unique":"","link_url":"","link_external":false,"_id":"5fdcdd1ea0f9e2003f6a1cb4","createdAt":"2020-12-18T16:47:26.179Z","user":"5767bc73bb15f40e00a28777","category":{"sync":{"isSync":false,"url":""},"pages":["56268a69b1c2630d00b112b0","56268a85c2781f0d00364bbc","56268a92c2781f0d00364bbe","5637e0a0cfaa870d00cdeb6a","5637e0c3fbe1c50d008cb06a","5637e164f7e3990d007b2c41"],"title":"BRING DATA TO THE 
CGC","slug":"bring-your-private-data","order":9,"from_sync":false,"reference":false,"_id":"55faf932a8a7770d00c2c0bf","version":"55faf11ba62ba1170021a9aa","__v":6,"createdAt":"2015-09-17T17:32:34.286Z","project":"55faf11ba62ba1170021a9a7"},"version":{"version":"1.0","version_clean":"1.0.0","codename":"","is_stable":true,"is_beta":true,"is_hidden":false,"is_deprecated":false,"categories":["55faf11ca62ba1170021a9ab","55faf8f4d0e22017005b8272","55faf91aa62ba1170021a9b5","55faf929a8a7770d00c2c0bd","55faf932a8a7770d00c2c0bf","55faf94b17b9d00d00969f47","55faf958d0e22017005b8274","55faf95fa8a7770d00c2c0c0","55faf96917b9d00d00969f48","55faf970a8a7770d00c2c0c1","55faf98c825d5f19001fa3a6","55faf99aa62ba1170021a9b8","55faf99fa62ba1170021a9b9","55faf9aa17b9d00d00969f49","55faf9b6a8a7770d00c2c0c3","55faf9bda62ba1170021a9ba","5604570090ee490d00440551","5637e8b2fbe1c50d008cb078","5649bb624fa1460d00780add","5671974d1b6b730d008b4823","5671979d60c8e70d006c9760","568e8eef70ca1f0d0035808e","56d0a2081ecc471500f1795e","56d4a0adde40c70b00823ea3","56d96b03dd90610b00270849","56fbb83d8f21c817002af880","573c811bee2b3b2200422be1","576bc92afb62dd20001cda85","5771811e27a5c20e00030dcd","5785191af3a10c0e009b75b0","57bdf84d5d48411900cd8dc0","57ff5c5dc135231700aed806","5804caf792398f0f00e77521","58458b4fba4f1c0f009692bb","586d3c287c6b5b2300c05055","58ef66d88646742f009a0216","58f5d52d7891630f00fe4e77","59a555bccdbd85001bfb1442","5a2a81f688574d001e9934f5","5b080c8d7833b20003ddbb6f","5c222bed4bc358002f21459a","5c22412594a2a5005cc9e919","5c41ae1c33592700190a291e","5c8a525e2ba7b2003f9b153c","5cbf14d58c79c700ef2b502e","5db6f03a6e187c006f667fa4","5f894c7d3b0894006477ca01","6176d5bf8f59c6001038c2f7"],"_id":"55faf11ba62ba1170021a9aa","releaseDate":"2015-09-17T16:58:03.490Z","createdAt":"2015-09-17T16:58:03.490Z","project":"55faf11ba62ba1170021a9a7","__v":48},"project":"55faf11ba62ba1170021a9a7","__v":0,"parentDoc":null}
[block:callout]
{
  "type": "info",
  "title": "",
  "body": "The latest update of CDS data on the CGC brings one new CDS dataset: \n* [PetaGene](https://www.petagene.com) - Practice Dataset, Decreases The Size Of Genomic Data\n\nSee the [details and history of CDS data updates](page:cds-data) on the CGC."
}
[/block]

[block:callout]
{
  "type": "info",
  "body": "If you are working with the PLCO dataset, [download the PLCO manifest file](https://cgc-public.s3.amazonaws.com/manifest/PLCO-20210624.txt)."
}
[/block]
## About the CDS

The [Cancer Data Service (CDS)](https://datacommons.cancer.gov/repository/cancer-data-service) is a data repository within the NCI's Cancer Research Data Commons (CRDC) infrastructure that stores cancer research data generated by NCI-funded programs. Its data is stored in the [database of Genotypes and Phenotypes (dbGaP)](https://www.ncbi.nlm.nih.gov/gap/), provided by the National Center for Biotechnology Information (NCBI). CDS hosts datasets that contain controlled-access data, and dbGaP controls the access permissions.

CDS data can be imported to the CGC in the following ways:

* [Using the integrated Cancer Data Service Explorer on the CGC](#section-import-cds-data-using-the-cancer-data-service-explorer)
* [Using a manifest file generated through SRA Run Selector](#section-import-cds-data-using-a-manifest-file)
[block:callout]
{
  "type": "warning",
  "title": "",
  "body": "To import data classified as *controlled* from CDS to the CGC, you need to be logged in to the CGC using an [eRA Commons account](https://docs.cancergenomicscloud.org/docs/sign-up-for-the-cgc#section-register-via-an-external-account) with access to controlled data."
}
[/block]
The Seven Bridges team strives to keep the data available for import on the CGC aligned with CDS data updates. However, CDS updates are not instantly available for import on the CGC.
Find out the [currently available release](doc:cds-data#section-currently-available-release-on-the-cgc) of CDS data on the CGC and see the complete [history of updates](doc:cds-data#section-update-history).

## Import CDS data using the Cancer Data Service Explorer

Cancer Data Service Explorer is an integrated dataset file explorer on the CGC. It lets you filter and select the exact data that you want to analyze further, and then import that data into a project on the CGC using the steps described below:
1. On the CGC's main dashboard, click **Data** > **Cancer Data Service Explorer** in the main menu bar.
2. Click **Explore files**. The file explorer opens.
3. Use the search boxes and filters in the left pane to select the data that you want to analyze further.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/3397fdd-cds-fe-1.png",
        "cds-fe-1.png",
        1131,
        644,
        "#333"
      ]
    }
  ]
}
[/block]
4. Once you have selected your set of data, click **Copy to project** in the top-right corner. The copy dialog opens.
5. In the **Select project** dropdown, select the project that you want to copy the files to. If you want to import data to a new project, click **Create new project**. If the import contains controlled data, such data can only be imported into a [controlled project](doc:projects-on-the-cgc#section-controlled-data-projects).
6. (Optional) Once you have selected a project, enter [file tags](https://docs.cancergenomicscloud.org/docs/tag-your-files) in the **Add tags** field.
7. In the **Resolve naming conflicts** dropdown, select the action to be taken if a file with the same name already exists in the target project.
8. Click **Copy**. Your files will be copied to the selected project.
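
Step 7 asks how to handle incoming files whose names already exist in the target project. The sketch below illustrates the two kinds of behavior such a setting typically offers: skipping the incoming file, or renaming it so both copies are kept. It is not the CGC's implementation. The function names, the action strings, and the `_N_` prefix scheme are hypothetical and chosen only for illustration; this page does not state the actual renaming rule.

```python
def resolve_name(name, existing, action="skip"):
    """Return the name to store an incoming file under, or None to skip it.

    `existing` is the set of file names already present in the target project.
    The action strings and the prefix scheme are illustrative assumptions.
    """
    if name not in existing:
        return name  # no conflict, keep the original name
    if action == "skip":
        return None  # leave the existing file untouched
    if action == "auto_rename":
        # Hypothetical scheme: prepend an increasing counter until the name is free.
        counter = 1
        while f"_{counter}_{name}" in existing:
            counter += 1
        return f"_{counter}_{name}"
    raise ValueError(f"unknown action: {action}")


def copy_files(incoming, existing, action="skip"):
    """Simulate copying `incoming` names into a project that holds `existing`."""
    project = set(existing)
    copied = []
    for name in incoming:
        target = resolve_name(name, project, action)
        if target is not None:
            project.add(target)
            copied.append(target)
    return copied
```

For example, `copy_files(["a.bam", "b.bam"], {"a.bam"}, "skip")` copies only `b.bam`, while the same call with `"auto_rename"` keeps both incoming files by giving the conflicting one a new name.
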
## Import CDS data using a manifest file

Importing CDS data to the Cancer Genomics Cloud (CGC) using a manifest file generated through [SRA Run Selector](https://www.ncbi.nlm.nih.gov/Traces/study/) consists of two stages:

* Searching for data and downloading a manifest file from [SRA Run Selector](https://www.ncbi.nlm.nih.gov/Traces/study/).
* Importing files to the CGC based on the downloaded manifest file.

### Download CDS manifest files from SRA

Manifest files contain information about the data you want to import in the second stage of this process.

To download a manifest file:
1. Open the [SRA Run Selector](https://www.ncbi.nlm.nih.gov/Traces/study/).
2. In the **Accession** field, enter the accession of your choice and click **Search**. The list of search results opens.
3. (Optional) In the **Filters List** section on the left, select criteria to narrow down the result list.

Now you can do one of the following:

* Download the manifest file _for the entire set of data_ returned for the accession:

1. In the **Select** section, click **Metadata** in the **Total** table row. This downloads the manifest file for _all_ data for the accession.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/e1a9412-cds-integration-1.png",
        "cds-integration-1.png",
        1187,
        700,
        "#333"
      ]
    }
  ]
}
[/block]
* Select specific items from the list and download the manifest file _for the selected items only_:

1. Scroll down to see the list of items that match the search and filtering criteria.
2. Check the boxes next to the items you want to select.
3. In the **Select** section, click **Metadata** in the **Selected** table row. This downloads the manifest file _for the selected data only_.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/bb4fd5b-cds-integration-2.png",
        "cds-integration-2.png",
        1190,
        700,
        "#333"
      ]
    }
  ]
}
[/block]
### Import CDS data to the CGC

When you have downloaded a manifest file from [SRA](https://www.ncbi.nlm.nih.gov/Traces/study/), follow the steps below to import the data to the CGC:
1. Navigate to a project on the CGC or [create](doc:create-a-project) one. If the manifest file includes references to controlled data, such data can only be imported into a [controlled project](doc:projects-on-the-cgc#section-controlled-data-projects).
2. Once in the project, click the **Files** tab.
3. Click **Add files** > **Import from a manifest file**.
4. In the **Import files from** dropdown, select **Cancer Data Service (CDS)**.
5. Click **Browse files** and select the manifest file from your local machine, or drag and drop the file onto the marked area. Alternatively, if you have already [uploaded](doc:upload-to-the-cgc) your generated manifest file to a project, click **Select manifest from project** and select the file.
6. (Optional) In the **Add tags** field, add keywords (tags) that describe the imported items.
7. Under **Resolve naming conflicts**, select the action to be taken if a naming conflict occurs. Available actions are **Skip** (default option) and **Auto Rename**. Read more about [naming conflicts resolution](doc:upload-from-an-ftp-server#section-resolving-naming-conflicts).
8. Click **Import**. The file import process starts and you are taken to the **Files** tab.
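
Before selecting the manifest in step 5, it can help to check what the downloaded file actually contains. The following is a minimal sketch that counts the rows and unique run accessions in a manifest. It rests on assumptions that this page does not confirm: that the manifest is delimited text (comma- or tab-separated) with a header row, and that run accessions sit in a column named `Run`. The function names and the sample rows are hypothetical; check your own file and adjust `id_column` if it differs.

```python
import csv
import io


def read_manifest(text):
    """Parse manifest text into a list of row dicts.

    Assumes a header row and either tab- or comma-separated columns.
    """
    lines = text.splitlines()
    header = lines[0] if lines else ""
    delimiter = "\t" if "\t" in header else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def summarize(rows, id_column="Run"):
    """Return (row count, sorted unique IDs, number of rows missing an ID)."""
    ids = [(row.get(id_column) or "").strip() for row in rows]
    missing = sum(1 for value in ids if not value)
    unique = sorted({value for value in ids if value})
    return len(rows), unique, missing


if __name__ == "__main__":
    # Made-up sample rows; for a real file, read its contents with open(...).read().
    sample = "Run,Assay Type,Bytes\nSRR0000001,WGS,1024\nSRR0000002,WGS,2048\n"
    total, runs, missing = summarize(read_manifest(sample))
    print(f"{total} rows, {len(runs)} unique runs, {missing} rows without a run ID")
```

If the number of unique runs is lower than the number of rows, or some rows have no run ID, it is worth reviewing the selection in SRA Run Selector before importing.
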