{"_id":"5d0242ea954bad00320b4248","project":"55faf11ba62ba1170021a9a7","version":{"_id":"55faf11ba62ba1170021a9aa","project":"55faf11ba62ba1170021a9a7","__v":45,"createdAt":"2015-09-17T16:58:03.490Z","releaseDate":"2015-09-17T16:58:03.490Z","categories":["55faf11ca62ba1170021a9ab","55faf8f4d0e22017005b8272","55faf91aa62ba1170021a9b5","55faf929a8a7770d00c2c0bd","55faf932a8a7770d00c2c0bf","55faf94b17b9d00d00969f47","55faf958d0e22017005b8274","55faf95fa8a7770d00c2c0c0","55faf96917b9d00d00969f48","55faf970a8a7770d00c2c0c1","55faf98c825d5f19001fa3a6","55faf99aa62ba1170021a9b8","55faf99fa62ba1170021a9b9","55faf9aa17b9d00d00969f49","55faf9b6a8a7770d00c2c0c3","55faf9bda62ba1170021a9ba","5604570090ee490d00440551","5637e8b2fbe1c50d008cb078","5649bb624fa1460d00780add","5671974d1b6b730d008b4823","5671979d60c8e70d006c9760","568e8eef70ca1f0d0035808e","56d0a2081ecc471500f1795e","56d4a0adde40c70b00823ea3","56d96b03dd90610b00270849","56fbb83d8f21c817002af880","573c811bee2b3b2200422be1","576bc92afb62dd20001cda85","5771811e27a5c20e00030dcd","5785191af3a10c0e009b75b0","57bdf84d5d48411900cd8dc0","57ff5c5dc135231700aed806","5804caf792398f0f00e77521","58458b4fba4f1c0f009692bb","586d3c287c6b5b2300c05055","58ef66d88646742f009a0216","58f5d52d7891630f00fe4e77","59a555bccdbd85001bfb1442","5a2a81f688574d001e9934f5","5b080c8d7833b20003ddbb6f","5c222bed4bc358002f21459a","5c22412594a2a5005cc9e919","5c41ae1c33592700190a291e","5c8a525e2ba7b2003f9b153c","5cbf14d58c79c700ef2b502e"],"is_deprecated":false,"is_hidden":false,"is_beta":true,"is_stable":true,"codename":"","version_clean":"1.0.0","version":"1.0"},"category":{"_id":"57bdf84d5d48411900cd8dc0","version":"55faf11ba62ba1170021a9aa","__v":0,"project":"55faf11ba62ba1170021a9a7","sync":{"url":"","isSync":false},"reference":false,"createdAt":"2016-08-24T19:41:01.302Z","from_sync":false,"order":29,"slug":"api-hub","title":"API 
Hub"},"user":"566590c83889610d0008a253","__v":0,"parentDoc":null,"updates":[],"next":{"pages":[],"description":""},"createdAt":"2019-06-13T12:34:50.713Z","link_external":false,"link_url":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":5,"body":"##Introduction\n\nThe API rate limit is a limit to the number of calls you can send to the CGC public API within a predefined time frame. That limit is 1000 requests within 5 minutes. After this limit is reached, no further calls are accepted by the API server, until the 5 minute interval ends.\n\nAll rate limit information is returned to the user in the following HTTP headers:\n\n1. The header `X-RateLimit-Limit` represents the rate limit - currently this is 1000 requests per five minutes.\n2. The header `X-RateLimit-Remaining` represents your remaining number of calls before hitting the limit.\n3. The header `X-RateLimit-Reset` - represents the time in Unix timestamp when the limit will be reset\n\nTo learn how you can write optimized code and make the most of the CGC public API regardless of the rate limit, please consult the examples below.\n\nEach of the examples will first illustrate what an un-optimized code can look like and what makes it less ideal given the rate limit as well as a concrete recommendation on how you can optimize it. 
\n\n  * [Submitting tasks for execution](#section-submitting-tasks-for-execution)\n  * [Copying files between projects](#section-copying-files-between-projects)\n  * [Importing files from a volume](#section-importing-files-from-a-volume)\n  * [Updating file metadata](#section-updating-file-metadata)\n  * [Deleting multiple files](#section-deleting-multiple-files)\n  * [Exporting files to a volume](#section-exporting-files-to-a-volume)\n  * [Setting maximum pagination limit in queries](#section-setting-maximum-pagination-limit-in-queries)\n  * [Finding project by a name](#section-finding-project-by-name)\n\nNote that these examples assume that you are using the [Python client ](doc:api-python-library) for the CGC public API.\n\n##Submitting tasks for execution\n\nThere are two methods for starting multiple tasks on CGC:\n\n1. Submit tasks one by one inside a loop.\n2. Submit a batch task. A batch task can consist of many child tasks and can be created with a single API call, whereas submitting tasks inside a loop requires one API call for each task. Using batch tasks is therefore the recommended approach.\n\n###Not optimized for rate limit\n\nIn the example below, we iterate over samples and submit tasks one by one (in this case we run Salmon for RNA-seq analysis). 
This example assumes that we have already grouped our input FASTQ files by sample in a dictionary.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"for sample_id, fastq_files in samples.items():\\n     \\n    inputs = {\\n        \\\"reads\\\": fastq_files,\\n        \\\"transcriptome_fasta_or_salmon_index_archive\\\": index_file,\\n        \\\"gtf\\\": gene_map_file\\n    }\\n     \\n    task = api.tasks.create(\\n        name=\\\"Salmon {}\\\".format(sample_id),\\n        project=project,\\n        app=salmon_workflow.id,\\n        inputs=inputs\\n    )\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n###Optimized for rate limit\n\nIn this example, which is optimized for the rate limit, we run Salmon as a batch task for all input files at once, using `sample_id` as the batching criterion. This example assumes that the `sample_id` metadata field is already set for all input files.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"inputs = {\\n    \\\"reads\\\": fastq_files,\\n    \\\"transcriptome_fasta_or_salmon_index_archive\\\": index_file,\\n    \\\"gtf\\\": gene_map_file\\n}\\n \\nbatch_task = api.tasks.create(\\n    name=\\\"Salmon batch\\\",\\n    project=project,\\n    app=salmon_workflow.id,\\n    inputs=inputs,\\n    batch_input=\\\"reads\\\",\\n    batch_by={\\n        \\\"type\\\": \\\"CRITERIA\\\",\\n        \\\"criteria\\\": [\\\"metadata.sample_id\\\"]\\n    }\\n)\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n##Copying files between projects\n\nInstead of copying individual files, which makes one API call per file, we recommend using a bulk API call. This way, you can copy up to 100 files with a single API call.\n\n###Not optimized for rate limit\n\nCopying individual files requires two API calls for each file: one to find the file by name, and another to copy it.
We recommend using the bulk API call instead.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"for name in source_file_names:\\n    f = api.files.query(project=src_project, names=[name])[0]\\n    f.copy(project=target_project)\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n###Optimized for rate limit\n\nUsing a bulk API call, you can copy up to 100 files at once.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"def bulk_copy_files(files_to_copy, target_project):\\n    \\\"Copies files in batches of size 100\\\"\\n     \\n    final_responses = {}\\n     \\n    for i in range(0, len(files_to_copy), 100):\\n         \\n        files = [f for f in files_to_copy[i:i + 100]]\\n        responses = api.actions.bulk_copy_files(files, target_project.id)\\n         \\n        for fileid, response in responses.items():\\n            if response['status'] != 'OK':\\n                raise Exception(\\n                    \\\"Error copying {}: {}\\\".format(fileid, response)\\n                )\\n                 \\n        final_responses.update(responses)\\n     \\n    return final_responses\\n     \\n     \\nfiles_to_copy = list(\\n    api.files.query(\\n        project=src_project,\\n        names=source_files,\\n        limit=100\\n    ).all()\\n)\\n \\nresponses = bulk_copy_files(files_to_copy, target_project)\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n##Importing files from a volume\n\nThe CGC API allows you to import files from a volume in bulk rather than one by one. Using the bulk API feature, you can import up to 100 files per call.\n\n###Not optimized for rate limit\n\nImporting individual files requires two API calls for each file: one to find the file by name, and another to import it.
We recommend using the bulk API call instead.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"for f in files_to_import:\\n     \\n    imported_file = api.imports.submit_import(\\n        volume=volume,\\n        project=dest_project,\\n        location='christian_demo_files/' + f,\\n        overwrite=True\\n    )\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n###Optimized for rate limit\n\nUsing the bulk API feature, you first query all files that need to be imported and then use one API call to import up to 100 files.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"import time\\n \\ndef bulk_import_files(file_names, volume, location, project, overwrite=True, chunk_size=100):\\n    \\\"Imports list of files from volume in bulk\\\"\\n \\n    def is_running(response):\\n        if not response.valid or response.resource.error:\\n            raise Exception(\\n                '\\\\n'.join([\\n                    str(response.resource.error),\\n                    response.resource.error.message,\\n                    response.resource.error.more_info\\n                ]))\\n        return response.resource.state not in [\\\"COMPLETED\\\", \\\"FAILED\\\", \\\"ABORTED\\\"]\\n     \\n    final_responses = []\\n     \\n    # import files in batches of 100 each\\n    for i in range(0, len(file_names), chunk_size):\\n         \\n        # set up list of dictionaries with import requests\\n        imports = [\\n            {\\n                'volume': volume,\\n                'location': location + '/' + fn,\\n                'project': project,\\n                'name': fn,\\n                'overwrite': overwrite\\n            }\\n            for fn in file_names[i:i + chunk_size]\\n        ]\\n \\n        # initiate bulk import of batch and wait until finished\\n        responses = api.imports.bulk_submit(imports)\\n        while any([is_running(r) for r in responses]):\\n            time.sleep(10)\\n            responses = api.imports.bulk_get([r.resource for r in responses])\\n             \\n        final_responses.extend(responses)\\n         \\n    return final_responses\\n \\n \\nresponses = bulk_import_files(\\n    file_names=files_to_import,\\n    volume=volume,\\n    location=\\\"christian_demo_files\\\",\\n    project=dest_project\\n)\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n\n##Updating file metadata\n\nMetadata for multiple files can be set using a bulk API call instead of one call per file. Setting metadata for the files is typically required before they can be provided as input to a CWL workflow.\n\nIn the examples below, we assume that there is a list of FASTQ files for a specific sample and we want to set both the `sample_id` and `paired_end` metadata fields for all of them.\n\n###Not optimized for rate limit\n\nIn this example, which is not optimized for the rate limit, we iterate over all FASTQ files and set metadata for each file individually.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"# set metadata for forward read files\\nfor fastq in forward_reads:\\n    fastq.metadata['sample_id'] = 'my-sample'\\n    fastq.metadata['paired_end'] = '1'\\n    fastq.save()\\n \\n# set metadata for reverse read files\\nfor fastq in reverse_reads:\\n    fastq.metadata['sample_id'] = 'my-sample'\\n    fastq.metadata['paired_end'] = '2'\\n    fastq.save()\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n###Optimized for rate limit\n\nAn optimal way to update metadata for multiple files is to use a bulk API call and update metadata for up to 100 files per call.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"def bulk_update_metadata(files, metadata, replace=False):\\n    \\\"\\\"\\\"Updates metadata for list of files in bulk. Input lists must be ordered\\n    pairs, i.e.
the first element in list 'files' corresponds to the first element\\n    in list 'metadata' etc.\\\"\\\"\\\"\\n     \\n    final_responses = []\\n     \\n    # process in batches of 100\\n    for i in range(0, len(files), 100):\\n \\n        files_chunk = [f for f in files[i:i + 100]]\\n        md_chunk = [md for md in metadata[i:i + 100]]\\n \\n        # make sure metadata attribute is set for all files before update;\\n        # avoids lazy fetching of metadata for each file in subsequent loop\\n        md_missing = any([not f.field('metadata') for f in files_chunk])\\n        if md_missing:\\n            files_chunk = api.files.bulk_get(files_chunk)\\n \\n        # set metadata for each file\\n        for t in zip(files_chunk, md_chunk):\\n            t[0].metadata = t[1]\\n \\n        # update or replace existing metadata\\n        if replace:\\n            responses = api.files.bulk_update(files_chunk)\\n        else:\\n            responses = api.files.bulk_edit(files_chunk)\\n \\n        # check for errors\\n        for r in responses:\\n            if not r.valid:\\n                raise Exception(\\n                    '\\\\n'.join([str(r.error), r.error.message, r.error.more_info])\\n                )\\n \\n        final_responses.extend(responses)\\n         \\n    return final_responses\\n \\nmetadata = []\\nfor fastq in forward_reads:\\n    metadata.append({\\\"sample_id\\\": \\\"my-sample\\\", \\\"paired_end\\\": \\\"1\\\"})\\nfor fastq in reverse_reads:\\n    metadata.append({\\\"sample_id\\\": \\\"my-sample\\\", \\\"paired_end\\\": \\\"2\\\"})\\n   \\nresponses = bulk_update_metadata(forward_reads + reverse_reads, metadata)\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n\n##Deleting multiple files\n\nThis example shows how to delete multiple files with the API rate limit in mind.
The optimal way to delete multiple files is via a bulk API call, which can delete up to 100 files per request.\n\n###Not optimized for rate limit\n\nFetch and delete files one by one using a loop.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"for fn in source_file_names:\\n    f = api.files.query(project=src_project, names=[fn])[0]\\n    f.delete()\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n###Optimized for rate limit\n\nFetch all files at once and then use a bulk API call to delete them in batches of 100 files or fewer.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"def bulk_delete_files(files_to_delete, chunk_size=100):\\n    \\\"Deletes files in bulk, 100 files per API call (max)\\\"\\n     \\n    final_responses = []\\n     \\n    for i in range(0, len(files_to_delete), chunk_size):\\n         \\n        files = [f for f in files_to_delete[i:i + chunk_size]]\\n        responses = api.files.bulk_delete(files)\\n         \\n        for idx, r in enumerate(responses):\\n            if not r.valid:\\n                raise Exception(\\n                    '\\\\n'.join([\\n                        str(r.error) + \\\": \\\" + r.error.message,\\n                        r.error.more_info,\\n                        files[idx].name\\n                    ]))\\n         \\n        final_responses.extend(responses)\\n         \\n    return final_responses\\n \\nfiles_to_delete = list(\\n    api.files.query(\\n        project=src_project,\\n        names=source_file_names,\\n        limit=100\\n    ).all())\\n \\nresponses = bulk_delete_files(files_to_delete)\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n\n##Exporting files to a volume\n[block:callout]\n{\n  \"type\": \"info\",\n  \"body\": \"When exporting a file from the CGC to an attached volume, export is possible only to a volume that is in the same location (cloud provider and region) as the project from which the file is being exported.\"\n}\n[/block]\nHere the goal is to export
files from a CGC project to a volume (cloud bucket). Again, CGC bulk API calls should be used to reduce the overall number of API calls. Note that the examples below make use of the `copy_only` export feature, which requires `advance_access` to be activated when initializing the API.\n\n###Not optimized for rate limit\n\nIn this example, files are fetched and exported in a loop, one by one.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"for name in source_file_names:\\n     \\n    f = api.files.query(project=src_project, names=[name])[0]\\n     \\n    export = api.exports.submit_export(\\n        file=f,\\n        volume=volume,\\n        location=\\\"christian_demo_files/\\\" + f.name,\\n        overwrite=True,\\n        copy_only=False\\n    )\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n###Optimized for rate limit\n\nFetch and export files in bulk.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"import time\\n \\ndef bulk_export_files(files, volume, location, overwrite=True, copy_only=False, chunk_size=100):\\n    \\\"Exports list of files to volume in bulk\\\"\\n \\n    def is_running(response):\\n        if not response.valid:\\n            raise Exception(\\n                '\\\\n'.join([\\n                    str(response.error),\\n                    response.error.message,\\n                    response.error.more_info\\n                ]))\\n        return response.resource.state not in [\\\"COMPLETED\\\", \\\"FAILED\\\", \\\"ABORTED\\\"]\\n     \\n    final_responses = []\\n \\n    # export files in batches of 100 files each\\n    for i in range(0, len(files), chunk_size):\\n         \\n        # set up list of dictionaries with export requests\\n        exports = [\\n            {\\n                'file': f,\\n                'volume': volume,\\n                'location': location + '/' + f.name,\\n                'overwrite': overwrite\\n            }\\n            for f in files[i:i + chunk_size]\\n        ]\\n \\n        # initiate bulk export of this batch and wait until finished\\n        responses = api.exports.bulk_submit(exports, copy_only=copy_only)\\n        while any([is_running(r) for r in responses]):\\n            time.sleep(10)\\n            responses = api.exports.bulk_get([r.resource for r in responses])\\n             \\n        final_responses.extend(responses)\\n         \\n    return final_responses\\n \\nfiles_to_export = list(\\n    api.files.query(\\n        project=src_project,\\n        names=source_file_names,\\n        limit=100\\n    ).all())\\n \\nresponses = bulk_export_files(\\n    files=files_to_export,\\n    volume=volume,\\n    location='christian_demo_files',\\n    copy_only=False\\n)\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n##Setting maximum pagination limit in queries\n\nSeveral API calls allow setting a pagination limit on the number of results returned. Changing the default pagination limit (50) to its allowed maximum value (100) cuts the number of required API calls in half when iterating over the entire result set of a query.\n\nPagination limits can be set for various API calls, but we recommend setting it for the following queries, as they tend to return the largest result sets:\n\n  * `api.files.query()`\n  * `api.projects.query()`\n  * `api.tasks.query()`\n  * `task.get_batch_children()`\n\n###Not optimized for rate limit\n\nHere is an example of a project query that uses the default pagination limit of 50.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"for project in api.projects.query().all():\\n    print(project)\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n###Optimized for rate limit\n\nIn the example below, the limit is set to its allowed maximum value of 100.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"for project in api.projects.query(limit=100).all():\\n    print(project)\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n\n##Finding project by name\n\n###Not optimized for rate limit\n\nIterate over all projects and compare names.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"project = [\\n    p for p in api.projects.query().all()\\n    if p.name == project_name\\n][0]\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]\n###Optimized for rate limit\n\nUse the `name` query parameter to restrict results. Because the query parameter performs a partial match, comparing names is still required to ensure an exact match.\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"project = [\\n    p for p in api.projects.query(name=project_name).all()\\n    if p.name == project_name\\n][0]\",\n      \"language\": \"python\"\n    }\n  ]\n}\n[/block]","excerpt":"","slug":"api-rate-limit","type":"basic","title":"API rate limit"}