
Run an analysis using Data Cruncher


## Overview

Data Cruncher lets you enter and execute Python, R or Julia code to perform further analyses on your data on the CGC. This page explains how to access Data Cruncher from a project on the CGC, set up an analysis and execute code within it. To run the analysis, you need execute permissions in the project.

### [ 1 ] Access Data Cruncher

1. Open the desired project on the CGC.
This project should contain the data that you want to analyze further using Data Cruncher.
2. From the project's dashboard, click the **Interactive Analysis** tab.
The list of available interactive analysis tools opens.
3. On the **Data Cruncher** card, click **Open**.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/b427f41-cruncher_card.png",
        "cruncher_card.png",
        293,
        441,
        "#eeebec"
      ]
    }
  ]
}
[/block]
This takes you to the Data Cruncher home page. If you have previous analyses, they are listed on this page.

### [ 2 ] Create and set up your analysis

1. In the top-right corner, click **Create new analysis**.
The **Create new analysis** wizard is displayed.
2. On the first screen, name your analysis in the **Analysis name** field.
3. Click **Next**.
4. Select the instance for the analysis.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/1c223d6-cruncher_quickstart_1.png",
        "cruncher_quickstart_1.png",
        560,
        380,
        "#6393a4"
      ]
    }
  ]
}
[/block]
The **Instance type** list displays the available instances along with their disk size, number of vCPUs and memory (shown in brackets). The default instance is **c3.2xlarge**, which has **160 GB** of SSD storage, **8 vCPUs** and **15 GB** of RAM.

<a name="instance-inactivity" style="color: #474a54; text-decoration: none;">**Suspend time**</a> is the period of analysis inactivity after which the instance is stopped automatically. An analysis is considered inactive when both of the following are true:
* No files have been modified or created under the **Files** tab (in the `/sbgenomics/workspace` directory if you are using the Terminal).
* There are no running kernels.

When the suspend time elapses, the analysis is stopped along with the instance, and all files that meet the criteria for automatic saving or have been selected to be saved as project files are saved. Files that do not meet the criteria and are not manually saved to the project will be lost. The minimum suspend time is 15 minutes.

5. Click **Next**.
6. Define the automatic saving criteria:
 * **Ignore the following file types** - Files that have the listed extensions will never be automatically saved when the analysis is stopped. To specify multiple extensions, separate them with commas, e.g. `.zip, .log`.
 * **Ignore files larger than** - Files bigger than the specified size will not be automatically saved when the analysis is stopped.
7. Click **Start the analysis**.
The CGC starts acquiring an adequate instance for your analysis, which may take a few minutes. Analysis initialization goes through the following stages:
    * **Allocating the instance for your analysis** - Obtain an instance from the cloud infrastructure provider.
    * **Preparing the allocated instance** - Load the required software onto the instance.
    * **Doing the final setup of the analysis environment** - Apply the final settings and initialize the analysis environment.

Once an instance is ready, you will be notified.
[block:callout]
{
  "type": "info",
  "body": "If you don't have execute permissions in the project where the analysis is being created, the button is labelled **Create the analysis**. This allows you to create the analysis in draft state with the defined settings, but not execute it."
}
[/block]

### [ 3 ] Start the analysis

Once the CGC has acquired an instance for your analysis, you can open the editor and run your analysis.

1. Click **Open in editor**.
The Data Cruncher editor opens in a new window, offering three sections on the landing tab:
 * **Notebook** - select whether to create a **Python 2**, **Python 3**, **R** or **Julia** notebook. A notebook is the central element of a Data Cruncher analysis, where you can enter your code and also store equations, visualizations and explanatory text.
 * **Console** - select any of the **Python 2**, **Python 3**, **R** or **Julia** options if you prefer to run your code interactively in a kernel.
 * **Other** - this section offers the following options:
    * **Text Editor** - used to create any text-based file that you want to have or use during your analysis. For example, if you need to add a JSON file to your analysis files, you can select this option, enter or paste the JSON content and save the file with a `.json` extension.
    * **Terminal** - a familiar way of interacting with the system that brings the functionality of a Linux shell into the Data Cruncher analysis environment.
2. Under **Notebook**, select one of the available options (**Python 2**, **Python 3**, **R** or **Julia**).
3. Your notebook is now ready. You can start entering code in the first blank cell at the top.

## Where to go from here?

To get started with the Data Cruncher editor, read the [Editor quick reference](doc:editor-quick-reference).
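
As an illustration of the automatic saving criteria defined in step 6 of section [ 2 ], the following Python sketch shows how the two rules could combine to decide whether a file is saved when the analysis stops. It is only a model of the behavior described on this page, not the CGC's actual implementation. The function names are hypothetical, and the details that extensions are compared case-insensitively, that only the final extension is checked, and that sizes are expressed in bytes are assumptions.

```python
from pathlib import PurePosixPath


def parse_ignored_extensions(raw):
    """Turn a comma-separated value such as '.zip, .log' into a set of extensions."""
    return {ext.strip().lower() for ext in raw.split(",") if ext.strip()}


def would_be_auto_saved(name, size_bytes, ignored_extensions, max_size_bytes):
    """Apply both criteria: skip ignored extensions, then skip oversized files."""
    # Assumption: only the final extension of the file name is compared.
    suffix = PurePosixPath(name).suffix.lower()
    if suffix in ignored_extensions:
        return False
    # "Files bigger than the specified size" are skipped, so equal size is kept.
    return size_bytes <= max_size_bytes


if __name__ == "__main__":
    ignored = parse_ignored_extensions(".zip, .log")
    limit = 500 * 1024 * 1024  # hypothetical 500 MB threshold, in bytes
    # Made-up file names and sizes, for illustration only.
    workspace_files = [
        ("results.csv", 2_000_000),
        ("run.log", 40_000),
        ("archive.zip", 10_000_000),
        ("aligned.bam", 900 * 1024 * 1024),
    ]
    for name, size in workspace_files:
        decision = "saved" if would_be_auto_saved(name, size, ignored, limit) else "skipped"
        print(f"{name}: {decision}")
```

Under these assumptions, a file is skipped as soon as either criterion matches. Any file you need that would be skipped has to be saved to the project manually before the analysis stops.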