Smart Variant Filtering


## Overview

Smart Variant Filtering (SVF) uses machine learning algorithms trained on features from the existing Genome In A Bottle (GIAB) variant-called samples (HG001-HG005) to perform variant filtering (classification).

Smart Variant Filtering increases the precision of called SNVs by up to 0.2% by removing false positives, while keeping the overall F-score 0.12-0.27% higher than existing solutions. Indel precision increases by up to 7.8%, and the F-score increases by 0.1 to 3.2%.

## Access the Smart Variant Filtering public project

To access the Smart Variant Filtering public project:

1. Click **Public projects** in the top navigation bar.
2. Select **Smart Variant Filtering**, as shown below.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/88fa1de-smf-cgc.png",
        "smf-cgc.png",
        589,
        278,
        "#dddfe2"
      ],
      "border": true
    }
  ]
}
[/block]
You'll be taken to the main dashboard of the Smart Variant Filtering public project.

## Use the Smart Variant Filtering public project

All Seven Bridges Platform users automatically have copy permissions for this project. This means you can copy the available data to your own projects on the Platform to execute analyses.

You can either:

* **Copy the entire project** - Start from the copied project and use the available apps to filter a VCF file.
* **Select and copy a subset of the data to your own project** - Use the selected data within your own analyses in your project.

## Copy the entire project

1. Access the public project by selecting **Smart Variant Filtering** from **Public projects** in the top navigation bar.
2. Click **Copy this project**, next to the project's title, as shown below.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/8be4583-copied-public-project.png",
        "copied-public-project.png",
        1389,
        663,
        "#ecf0ee"
      ],
      "border": true
    }
  ]
}
[/block]
3. In the pop-up window, name your copy of the project and select a billing group.
4. Once you've customized the details, click **Copy** to copy the entire project.

You'll be redirected to the dashboard of your cloned project when it is ready, as shown below.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/0616e21-cgc-svf-copied-project.png",
        "cgc-svf-copied-project.png",
        1473,
        776,
        "#e7ecec"
      ],
      "border": true
    }
  ]
}
[/block]
The sections below describe the options available once the project is copied.

### Filter a VCF file

1. Click the **Apps** tab.
2. Click the run icon next to the **Smart Variant Filtering** tool.
3. Click **Select files** next to a file input and choose the files as follows (all input files are available after copying the public project):
  * **Model for filtering SNVs or table to perform learning** - choose `model_7_features_snv.sav`.
  * **Model for filtering indels or table to perform learning** - choose `model_7_features_indel.sav`.
  * **VCF to be filtered** - choose the VCF file that you want to filter.
4. Click **Run**.

Once the task is completed, the output file, a filtered VCF file created by the tool, will be available in the **Output** column.

### Filter large VCF files

To filter large VCF files, use the **Apply Smart Variant Filtering Parallel** workflow, which parallelizes the filtering per chromosome. All required input files are available in your project after copying the public project.

1. Click the **Apps** tab.
2. Click the run icon next to the **Apply Smart Variant Filtering Parallel** workflow, which will create a draft task.
3. Click **Select files** next to a file input and choose the files as follows:
  * **dbsnp** - choose `dbsnp_147.tab.vcf.gz`
  * **genome_bed_file_for_scatter** - choose `human_g1k_v37_decoy.breakpoints.bed`
  * **indel_model** - choose `model_6_features_indel.sav`
  * **reference** - choose `human_g1k_v37_decoy.fasta`
  * **snv_model** - choose `model_6_features_snv.sav`
  * **vcf** - choose the VCF that you want to filter.
4. Click **Run**.

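The per-chromosome parallelization used by this workflow can be pictured with a short Python sketch. This is only an illustration of the scatter/gather idea, not the workflow's actual implementation: the function names are hypothetical, and `keep_minimum_quality` is a simple QUAL threshold standing in for the trained SNV and indel models.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


def split_by_chromosome(vcf_lines):
    """Return (header_lines, {chrom: [record_lines]}) in first-seen chromosome order."""
    header, groups = [], OrderedDict()
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
        elif line.strip():
            chrom = line.split("\t", 1)[0]  # CHROM is the first VCF column
            groups.setdefault(chrom, []).append(line)
    return header, groups


def keep_minimum_quality(records, min_qual=30.0):
    """Hypothetical stand-in for a trained model: keep records with QUAL >= min_qual."""
    return [r for r in records if float(r.split("\t")[5]) >= min_qual]


def filter_in_parallel(vcf_lines, filter_fn=keep_minimum_quality):
    """Scatter records per chromosome, filter each group, then gather the results."""
    header, groups = split_by_chromosome(vcf_lines)
    with ThreadPoolExecutor() as pool:
        # Executor.map yields results in the same order as the input groups.
        filtered = list(pool.map(filter_fn, groups.values()))
    merged = list(header)
    for records in filtered:
        merged.extend(records)
    return merged
```

Because each chromosome is filtered independently, the groups can be processed concurrently and concatenated afterwards. Note that the merged output is grouped by chromosome in first-seen order.
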
Once the task is completed, the output file, a filtered VCF file created by the workflow, will be available in the **Output** column.

### Train a model

To train a model that will be used for filtering a VCF, use the **Smart Variant Filtering** tool and provide it with tables that contain the features. All required input files are available in your project after copying the public project.

1. Click the **Apps** tab.
2. Click the run icon next to the **Smart Variant Filtering** tool.
3. Click **Select files** next to a file input and choose the files as follows:
  * **Model for filtering SNVs or table to perform learning** - choose `annotated_HG003_oslo_exome.tab.vcf_SNVs.table`
  * **Model for filtering indels or table to perform learning** - choose `annotated_HG003_oslo_exome.tab.vcf_indels.table`
  * **VCF to be filtered** - choose the VCF file that you want to filter.
4. Click the **Define App Settings** tab and specify the machine learning algorithms:
  * **Machine learning algorithm for SNVs and its params** - enter the classifier followed by its parameter set as comma-separated values (e.g. `MLP,250,logistic,sgd`)
  * **Machine learning algorithm for Indels and its params** - enter the classifier followed by its parameter set as comma-separated values (e.g. `MLP,250,logistic,sgd`)
5. Click **Run**.

Once the task is completed, the output file will be available in the **Output** column. The result is a trained model for both SNVs and indels.

#### Supported classifiers and parameter sets

The currently supported classifiers and their parameters are listed in the table below.
[block:parameters]
{
  "data": {
    "h-0": "Classifier",
    "h-1": "Parameter set",
    "0-0": "`ADA`",
    "0-1": "`n_estimators,learning_rate,algorithm`",
    "1-0": "`KNN`",
    "1-1": "`neighbors,algorithms,p_distance`",
    "2-0": "`SVM`",
    "2-1": "`C,kernels`",
    "3-0": "`RF`",
    "3-1": "`n_estimators,criterion`",
    "4-0": "`QD`",
    "4-1": "`tol`",
    "5-0": "`MLP`",
    "5-1": "`hidden_layer_sizes,activation,solver`"
  },
  "cols": 2,
  "rows": 6
}
[/block]

### Train a model, filter variants and test the results

The entire process of training a model, applying a variant filter and benchmarking the results can be done by running the **Smart Variant Filtering - Train, filter and test** workflow. All required input files are available in your project after copying the public project.

1. Click the **Apps** tab.
2. Click the run icon next to the **Smart Variant Filtering - Train, filter and test** workflow.
3. Click **Select files** next to a file input and choose the files as follows:
  * **dbsnp** - choose `dbsnp_147.tab.vcf.gz`
  * **genome_bed_file** - choose `genome_bed_filehuman_g1k_v37_decoy.breakpoints.bed`
  * **indel_tables** - choose the following files:
    * `annotated_ERR17432.tab.150x.vcf_indels.table`
    * `annotated_HG001-NA12878-50x.tab.vcf_indels.table`
    * `annotated_HG003.tab.hs37d5.60x.1.converted.vcf_indels.table`
    * `annotated_HG004.tab.hs37d5.60x.1.converted.vcf_indels.table`
    * `annotated_HG005.tab.150424_S1.vcf_indels.table`
    * `annotated_NA12878_CEPH_30x_ERR194147.tab.vcf_indels.table`
    * `annotated_NA12878_V2.tab.5_Robot_1_R.vcf_indels.table`
  * **reference** - choose `human_g1k_v37_decoy.fasta`
  * **region_bed_for_vcf_benchmark** - choose `HG002_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-22_v.3.3.2_highconf_noinconsistent.bed`
  * **sdf_template** - choose `1000g_v37_phase2.sdf.zip`
  * **snv_tables** - choose the following files:
    * `annotated_ERR17432.tab.150x.vcf_SNVs.table`
    * `annotated_HG001-NA12878-50x.tab.vcf_SNVs.table`
    * `annotated_HG003.tab.hs37d5.60x.1.converted.vcf_SNVs.table`
    * `annotated_HG004.tab.hs37d5.60x.1.converted.vcf_SNVs.table`
    * `annotated_HG005.tab.150424_S1.vcf_SNVs.table`
    * `annotated_NA12878_CEPH_30x_ERR194147.tab.vcf_SNVs.table`
    * `annotated_NA12878_V2.tab.5_Robot_1_R.vcf_SNVs.table`
  * **truth_vcf** - choose `HG002_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-22_v.3.3.2_highconf_triophased.vcf`
  * **truthset_bedfile** - choose `HG002_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-22_v.3.3.2_highconf_noinconsistent.bed`
  * **vcf** - choose `HG002-NA24385-50x.vcf`
4. Click **Run**.

Once the task is completed, the results are trained models for SNVs and indels, a filtered VCF, and precision/recall figures measured against the truth set VCF.

## Use a subset of the data

Instead of cloning the entire project, you can select and copy a subset of the data.

1. Access the public project by selecting **Smart Variant Filtering** from **Public projects** in the top navigation bar. You'll be taken to the project dashboard of the Smart Variant Filtering public project, as shown below.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/4b1bd62-cgc-svf-copied-project.png",
        "cgc-svf-copied-project.png",
        1473,
        776,
        "#e7ecec"
      ],
      "border": true
    }
  ]
}
[/block]
2. Click the **Files** tab in the upper left corner. This will take you to the **Files** page for the Smart Variant Filtering project, as shown below.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/f910cfd-cgc-copied-files-project.png",
        "cgc-copied-files-project.png",
        1458,
        773,
        "#1f578f"
      ],
      "border": true
    }
  ]
}
[/block]
3. Filter the files or search them by:
  * **Keywords** - Use the search bar at the top of the page to find files by entering the file name or notes associated with a file.
  * **Metadata fields** - Next to the search bar, you will see drop-down menus for the metadata fields **Investigation**, **File extension**, and **Sample ID**. Selecting a particular metadata value from one of these menus displays only files that match the value.
4. Choose specific files by selecting the checkbox in front of each file name.
5. Select as many files as you need and click **Copy to**.
6. Select a project from the drop-down menu.

You can now use the Smart Variant Filtering files you've added to your personal project in your own analyses.
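
As a closing note on the classifier settings described under **Train a model**: a value such as `MLP,250,logistic,sgd` can be read as a classifier name followed by its parameters. The Python sketch below shows one way to check such a string before starting a task. It assumes the values appear in the order listed in the supported classifiers table. The function name is hypothetical, and this page does not document how the tool itself parses the setting.

```python
# Parameter names per classifier, copied from the supported classifiers table.
# Assumption: values in the setting follow the order shown in that table.
PARAMETER_NAMES = {
    "ADA": ["n_estimators", "learning_rate", "algorithm"],
    "KNN": ["neighbors", "algorithms", "p_distance"],
    "SVM": ["C", "kernels"],
    "RF": ["n_estimators", "criterion"],
    "QD": ["tol"],
    "MLP": ["hidden_layer_sizes", "activation", "solver"],
}


def parse_classifier_setting(setting):
    """Split e.g. 'MLP,250,logistic,sgd' into (classifier, {parameter: raw value})."""
    parts = [part.strip() for part in setting.split(",")]
    classifier, values = parts[0], parts[1:]
    if classifier not in PARAMETER_NAMES:
        raise ValueError(f"unsupported classifier: {classifier!r}")
    names = PARAMETER_NAMES[classifier]
    if len(values) != len(names):
        raise ValueError(
            f"{classifier} expects {len(names)} values ({','.join(names)}), "
            f"got {len(values)}"
        )
    # Values are kept as strings; converting them to numbers is left to the caller.
    return classifier, dict(zip(names, values))


classifier, params = parse_classifier_setting("MLP,250,logistic,sgd")
print(classifier, params)
```

A check like this catches a misspelled classifier or a missing value before the task is submitted.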