Bring Nextflow apps to the CGC
To get access to this feature, please contact our Support Team.
Overview
Nextflow is one of the most popular standards for describing reproducible scientific workflows that use software containers. If you already have tools or execution pipelines that are described in Nextflow, this page will provide details on how you can push them to the CGC and run your analysis at scale, using the full potential and benefits of the Seven Bridges execution environment.
Prerequisites
- An account on the CGC.
- sbpack installed. For more details on what sbpack can do, how to install it and its main use cases, see About sbpack below.
- Docker image containing the app you want to run and its dependencies, available in a registry that is accessible by the CGC (such as the CGC Image Registry). The full path of the image in the registry must be properly referenced in the app's Nextflow description. See how to create and upload a Docker image and make sure to edit the Nextflow code to use the image.
About sbpack
The primary use of sbpack is to provide an easy way to upload (sbpack) and download (sbpull) apps to/from any Seven Bridges powered platform. Since it is a command-line tool, it can be particularly useful as a part of continuous development and integration pipelines for bioinformatics apps, as it allows seamless automated deployment of new app versions to Seven Bridges environments. It works with apps described using the following workflow description standards:
- Common Workflow Language (CWL). Apart from enabling the standard app pull and push flows, sbpack also provides advanced functionalities such as resolution of linked processes, schemadefs, $includes and $imports.
- Nextflow. Adapts, prepares and pushes Nextflow apps for execution in Seven Bridges environments using a special sbpack_nf command.
- Workflow Description Language (WDL). Uses the sbpack_wdl command to convert and push WDL apps for execution in Seven Bridges environments.
To install sbpack, use the standard install method through pip:
pip install sbpack
Procedure
Publishing a Nextflow app for use on the CGC consists of the following two stages:
- Initial app conversion. In this step, your Nextflow app will be converted to a format that is executable on the CGC, but some benefits and functionalities, such as preselection of only specific files to keep as outputs in order to reduce storage costs, will remain unavailable as they are not supported by Nextflow. This is why it is strongly recommended to go through the next step.
- Optimizing the converted app for execution in Seven Bridges environments. The app that has been initially converted now contains an additional configuration file that you will use to define CGC-specific options and fully optimize it for use in the Seven Bridges execution environment. Once the optimized configuration is prepared, the app configuration is pushed to the CGC again.
Initial app conversion
This step adapts the Nextflow app for execution on the CGC. It is performed by executing the sbpack_nf command in the following format:
sbpack_nf --profile PROFILE --appid APPID --workflow-path WORKFLOW_PATH --entrypoint file_name.nf
In the command above, replace the placeholders as follows:
- PROFILE is the CGC profile containing the CGC API endpoint and authentication token, as set in the Seven Bridges credentials file.
- APPID specifies the identifier of the app on the CGC, in the {user}/{project}/{app_id} format. The {user} part is your CGC username. The {project} part is the project to which you want to push the app and {app_id} is the ID you want to assign to the app. For example, the full app ID can be rfranklin/my-new-project/my-nextflow-app. If the specified app ID does not exist, it will be created. If it exists, a new revision (version) of the app will be created.
- WORKFLOW_PATH needs to be replaced with the path where the Nextflow app files are located on your local machine.
- file_name.nf should be replaced with the name of the actual .nf file containing your app's Nextflow code.
Here is a sample of the command:
sbpack_nf --profile cgc --appid rfranklin/nextflow-project/test-app --workflow-path /Users/rfranklin/apps/nextflow/demo --entrypoint app.nf
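For scripted deployments, the same invocation can be assembled in code. The following is a minimal Python sketch and not part of sbpack itself: the helper names parse_app_id and build_sbpack_nf_command are hypothetical, and the sketch uses only the flags shown in the command format above, plus the --dump-sb-app flag listed in the sbpack_nf command reference.

```python
import shlex


def parse_app_id(app_id):
    """Split an app ID of the form {user}/{project}/{app_id} into its parts."""
    parts = app_id.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            "App ID must have the form {user}/{project}/{app_id}, got: %r" % app_id
        )
    return tuple(parts)


def build_sbpack_nf_command(profile, app_id, workflow_path, entrypoint,
                            dump_only=False):
    """Return the sbpack_nf argument list for the initial conversion step."""
    parse_app_id(app_id)  # fail early on a malformed app ID
    command = [
        "sbpack_nf",
        "--profile", profile,
        "--appid", app_id,
        "--workflow-path", workflow_path,
        "--entrypoint", entrypoint,
    ]
    if dump_only:
        # Convert locally without pushing the app to the CGC.
        command.append("--dump-sb-app")
    return command


command = build_sbpack_nf_command(
    profile="cgc",
    app_id="rfranklin/nextflow-project/test-app",
    workflow_path="/Users/rfranklin/apps/nextflow/demo",
    entrypoint="app.nf",
)
# Print the command for review. To actually run it, pass the list to
# subprocess.run(command, check=True).
print(shlex.join(command))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.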
Once executed successfully, this command will convert the Nextflow app for use on the CGC and push it to the CGC project specified as the value of the --appid argument. The local directory specified as the value of --workflow-path will now contain additional nextflow_schema.json and sb_nextflow_schema.yaml files. The sb_nextflow_schema.yaml file contains configuration parameters that can be adjusted and optimized for execution on the CGC.
Optionally, to avoid pushing the app to the CGC at this stage and perform optimizations for the Seven Bridges execution environment beforehand, use the --dump-sb-app flag at the end of the command. For a full list of available arguments to the sbpack_nf command, see the sbpack_nf command reference.
Optimizing the converted app for execution in Seven Bridges environments
When you have performed the initial conversion step, the generated sb_nextflow_schema.yaml file is important as it contains configuration parameters that will help you optimize the app for execution on the CGC. The file consists of the following major sections:
- The initial section that includes general app information and the documentation content describing the app (if any):
  - app_content: Contains details about the app's package and Nextflow file:
    - code_package: CGC ID of the file that contains the Nextflow code. This is replaced by the git_pull key if the --no-package option was used to set a git repository as the source of the app's code. See the sbpack_nf command reference for details.
    - entrypoint: Path to the file containing the Nextflow code, relative to the root directory of the ZIP file defined in code_package. Usually main.nf.
    - executor_version: Version of the Nextflow executor you want to use to run your code (e.g. 21.10.5). If not specified, version 22.04.4 will be used.
  - class: Defines the type of workflow description language used for the app. The value will always be nextflow for Nextflow apps.
  - cwlVersion: Defines the version of CWL used to describe the app. The value will always be None for Nextflow apps.
  - doc: The Markdown-formatted text describing the app.
- The inputs section that defines details of the app inputs.
- The outputs section that defines details of app outputs.
- The requirements section that defines app execution requirements such as initial working directory.
Configuring inputs
Each of the app inputs that is present in the inputs section contains the following basic details:
- id: Unique identifier of the input.
- inputBinding: Defines the mapping of the input to the command line of the app that is being executed. If inputBinding is omitted, the input is made available in the execution environment, but is not passed on to the Nextflow executor.
  - prefix: The command line argument that takes the value provided on the input. Inputs that have inputBinding.prefix defined will have their value passed on to the Nextflow executor via the command line. For example, if you provide a value for a Nextflow param named input_file, prefix would be defined as --input_file.
- default: The default value for the input. If the input value is not set on task execution, this default value is taken and passed on to the executor as defined with inputBinding.prefix.
- label: Text description of the input.
- sbg:toolDefaultValue: Default value of the input in the Nextflow workflow. Value provided here is not used in execution and is descriptive (for information purpose) only.
- sbg:fileTypes: Comma separated (with spaces) value of file extensions that are used in the file picker when setting up tasks. For example: sbg:fileTypes: "FASTQ, FASTQ.GZ".
- type: The type of value expected on the input.
To accommodate the transition between Nextflow and the Seven Bridges execution environment, the sb_nextflow_schema.yaml file will always contain an additional input whose ID is auxiliary_files. This input contains the list of files that are not added as explicit workflow inputs but are required for workflow execution. To enable proper execution on the CGC, please do not remove this input from sb_nextflow_schema.yaml. Learn more about available types of inputs.
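To review at a glance how the inputs in a schema will be handled, the rules above can be sketched in a few lines of Python. This is only an illustration: summarize_inputs is a hypothetical helper, the schema fragment is made up, and it is written as the dictionary a YAML parser would return for the file.

```python
def summarize_inputs(schema):
    """Group input IDs by whether they are passed to the Nextflow executor."""
    passed_to_nextflow = []
    environment_only = []
    for item in schema.get("inputs", []):
        binding = item.get("inputBinding") or {}
        if binding.get("prefix"):
            # A prefix means the value is passed on the command line.
            passed_to_nextflow.append(item["id"])
        else:
            # Without a prefix the input is available but not passed on.
            environment_only.append(item["id"])
    all_ids = passed_to_nextflow + environment_only
    return {
        "passed_to_nextflow": passed_to_nextflow,
        "environment_only": environment_only,
        "has_auxiliary_files": "auxiliary_files" in all_ids,
    }


# Hypothetical schema fragment used only for this example.
schema = {
    "inputs": [
        {"id": "input_file", "inputBinding": {"prefix": "--input_file"}},
        {"id": "auxiliary_files"},
    ]
}
summary = summarize_inputs(schema)
print(summary)
```

A check like has_auxiliary_files can serve as a guard before pushing an edited configuration, since that input must not be removed.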
Example: File input
Due to the way Nextflow treats file inputs, when an app is converted and the sb_nextflow_schema.yaml file is created, file inputs are treated as strings, as shown in the code below:
type:
- string
To make the app work properly on the CGC, this needs to be changed as follows:
type:
- File
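When a workflow has many file inputs, the same change can be applied in bulk. The sketch below is a hypothetical helper (use_file_type is not part of sbpack) that works on the schema after it has been loaded into a Python dictionary; loading and saving the YAML file is left out, and the input IDs are examples only.

```python
def use_file_type(schema, file_input_ids):
    """Switch the listed inputs from the string type to the File type, in place."""
    for item in schema.get("inputs", []):
        if item.get("id") not in file_input_ids:
            continue
        types = item.get("type", [])
        if isinstance(types, str):
            types = [types]
        # Keep every other entry (for example 'null' for optional inputs) as is.
        item["type"] = ["File" if t == "string" else t for t in types]
    return schema


# Example inputs; only in_transcriptome is expected to receive a file.
schema = {
    "inputs": [
        {"id": "in_transcriptome", "type": ["string"]},
        {"id": "outdir", "type": ["string", "null"]},
    ]
}
use_file_type(schema, {"in_transcriptome"})
print(schema["inputs"][0]["type"])
```

Inputs that are not listed, such as a plain string parameter, keep their original type.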
Configuring outputs
Besides making the app executable on the CGC, you should also configure app outputs so that only files matching the defined criteria are produced and saved. This extends the standard Nextflow behavior, which does not offer strict output location selection. To achieve this and further configure your app outputs, use the following configuration parameters in the outputs section of the sb_nextflow_schema.yaml file:
- id: Unique identifier of the output. You can change this value to provide a more adequate and descriptive one if necessary.
- outputBinding: Defines the glob expression or pattern that will be used to select the output directory.
  - glob: The glob expression that defines the items to keep as outputs on the output port.
- type: The type of output value.
Example: Configuring a hard-coded output directory name
The sb_nextflow_schema.yaml file always contains one automatically generated app output:
outputs:
- id: nf_workdir
  outputBinding:
    glob: work
  type: Directory
To configure the output to fetch the directory in which your app produces its output files, replace the values as follows:
outputs:
- id: output_dir
  outputBinding:
    glob: 'outputs'
  type: Directory
In the example above, replace output_dir with an ID that describes your output and replace outputs with the directory where the app you are executing outputs its results.
Example: Configuring a dynamic output directory name
Apart from hard-coding the name of your output directory, you can also use the sb_nextflow_schema.yaml file to set the name of the output directory by defining it in an app input, provided that the tool itself supports the option of defining the output directory name using the corresponding input argument and its value. The first step is to define the input that takes the output directory name (in the inputs section of sb_nextflow_schema.yaml):
- id: outdir
  inputBinding:
    prefix: --outdir
  type:
  - string
  - 'null'
Once you have defined the input, define an output, where the output directory glob will be a variable that gets the value defined in the input above.
outputs:
- id: output_directory
  outputBinding:
    glob: $(inputs.outdir)
  type: Directory
The $(inputs.outdir) value is a variable that will be replaced with the actual value entered in the outdir input when the app is executed.
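The effect of such a variable can be previewed with a small Python sketch. This only imitates the substitution for simple $(inputs.<id>) references so you can check a configuration locally; it is not how the platform evaluates expressions, and the names preview_glob and directory_output are hypothetical.

```python
import re

# Matches simple references of the form $(inputs.<id>).
INPUT_REFERENCE = re.compile(r"\$\(inputs\.([A-Za-z_][A-Za-z0-9_]*)\)")


def preview_glob(glob, input_values):
    """Show what a glob would look like once input references are filled in."""
    def substitute(match):
        return str(input_values[match.group(1)])
    return INPUT_REFERENCE.sub(substitute, glob)


def directory_output(output_id, glob):
    """Build one entry for the outputs section of the schema."""
    return {"id": output_id, "outputBinding": {"glob": glob}, "type": "Directory"}


output = directory_output("output_directory", "$(inputs.outdir)")
# 'results' stands in for the value a user would enter in the outdir input.
print(preview_glob(output["outputBinding"]["glob"], {"outdir": "results"}))
```

A hard-coded glob such as outputs contains no reference and is returned unchanged by preview_glob.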
Configuring requirements
The requirements section is primarily used for two execution-related parameters:
- Setting input staging (making input files available in the app's working directory)
- Setting instances that are used for app executions on the CGC
Setting input staging
Files that are named as inputs to a tool are not, by default, in the tool's working directory. In most apps this access is sufficient, since most tools only need to read their input files, process the data contained in them, and write new output files on the basis of this data to their working directory. However, in some cases an app might require input files to be placed directly in its working directory. If this is the case with your app, modify the requirements section in the sb_nextflow_schema.yaml file as follows:
requirements:
- class: InitialWorkDirRequirement
  listing:
  - $(inputs.auxiliary_files)
  - $(inputs.in_transcriptome)
Entries under listing define files and directories that will be made available in the app's working directory before the command is executed. The files and directories are usually defined as variables named after their respective input IDs, one of which, $(inputs.auxiliary_files), is automatically generated and added in the conversion step.
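If you maintain the configuration with a script, staging entries can be added programmatically. The following Python sketch assumes the schema has already been loaded into a dictionary; stage_inputs is a hypothetical helper, and in_transcriptome is just the example input ID used above.

```python
def stage_inputs(schema, input_ids):
    """Add $(inputs.<id>) entries to the InitialWorkDirRequirement listing."""
    requirements = schema.setdefault("requirements", [])
    for requirement in requirements:
        if requirement.get("class") == "InitialWorkDirRequirement":
            break
    else:
        # No staging requirement yet, so create an empty one.
        requirement = {"class": "InitialWorkDirRequirement", "listing": []}
        requirements.append(requirement)
    listing = requirement.setdefault("listing", [])
    for input_id in input_ids:
        entry = "$(inputs.%s)" % input_id
        if entry not in listing:  # avoid duplicate entries
            listing.append(entry)
    return schema


schema = {
    "requirements": [
        {"class": "InitialWorkDirRequirement",
         "listing": ["$(inputs.auxiliary_files)"]},
    ]
}
stage_inputs(schema, ["in_transcriptome"])
print(schema["requirements"][0]["listing"])
```

The automatically generated $(inputs.auxiliary_files) entry is left in place, and calling the helper twice with the same ID does not add a second entry.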
Another useful option is creation of a file directly in the working directory. This is done by defining entryname and entry keys in the InitialWorkDirRequirement class, as follows:
requirements:
- class: InitialWorkDirRequirement
  listing:
  - entryname: input_nanoseq.csv
    entry: |
      ${
          // Build the CSV only if no CSV file was provided as an input.
          if (inputs.auxiliary_files && !inputs.in_csv_file){
              // Header row, terminated like every data row below.
              var content = 'group,barcode\\n';
              for (var i = 0; i < inputs.auxiliary_files.length; i++){
                  if (inputs.auxiliary_files[i].metadata['barcode']){
                      var barcode = inputs.auxiliary_files[i].metadata['barcode'];
                  }
                  else {
                      var barcode = '';
                  }
                  if (inputs.auxiliary_files[i].metadata['group']){
                      var group = inputs.auxiliary_files[i].metadata['group'];
                  }
                  else {
                      var group = '';
                  }
                  // One row per input file: group,barcode
                  content = content.concat(group,',',barcode,'\\n');
              }
              return content
          }
          else {
              return ''
          }
      }
In the example code above, entryname defines the name of the file generated in the working directory, which is input_nanoseq.csv, while entry contains a JavaScript expression that populates the generated file by getting barcode and group metadata values from input files and concatenating them in a single CSV file. The expression can be defined to match your needs and intended use. Read more about dynamic expressions in tool descriptions or see some of the most common expression examples in our JavaScript Cookbook.
Setting execution instances
Another useful option that is available for configuration in the hints section is the definition of the computation instance used for app execution on the CGC. This is also done by defining key-value pairs as follows:
hints:
- class: sbg:AWSInstanceType
  value: c4.8xlarge;ebs-gp2;2000
In this case, the workflow uses a c4.8xlarge instance with 2000 GB of attached EBS storage. The value consists of the following three parts (separated by ;):
- Instance type, e.g. c4.8xlarge.
- Attached disk type: always ebs-gp2 for all instances with EBS storage.
- Disk size in GB.
See the list of AWS instances that are available for task execution on the CGC.
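Because the value packs three fields into one string, a small helper can make it easier to build and check. The Python sketch below is an illustration under the format described above; parse_instance_value and instance_hint are hypothetical names, and no check is made that the instance type is actually available on the CGC.

```python
from collections import namedtuple

InstanceSpec = namedtuple("InstanceSpec", "instance_type disk_type disk_size_gb")


def parse_instance_value(value):
    """Split 'type;disk-type;size' into its three parts."""
    parts = value.split(";")
    if len(parts) != 3:
        raise ValueError("Expected three parts separated by ';', got: %r" % value)
    instance_type, disk_type, disk_size = parts
    return InstanceSpec(instance_type, disk_type, int(disk_size))


def instance_hint(instance_type, disk_size_gb, disk_type="ebs-gp2"):
    """Build one entry for the hints section of the schema."""
    return {
        "class": "sbg:AWSInstanceType",
        "value": "%s;%s;%d" % (instance_type, disk_type, disk_size_gb),
    }


spec = parse_instance_value("c4.8xlarge;ebs-gp2;2000")
print(spec.instance_type, spec.disk_type, spec.disk_size_gb)
print(instance_hint("c4.8xlarge", 2000))
```

Converting the disk size to an integer catches typos such as a missing or non-numeric size before the configuration is pushed.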
Pushing the optimized app configuration to the CGC
When you are done with changes to the sb_nextflow_schema.yaml file, push the optimized app configuration to the CGC. As we are just making configuration changes to an app that has already been pushed to the CGC, this can be done using the regular sbpack command in the following format:
sbpack <profile-name> <app_id> <config_file>
In the command above, <profile-name> refers to the CGC profile containing the CGC API endpoint and authentication token, as set in the Seven Bridges credentials file. The <app_id> parameter specifies the identifier of the app on the CGC. Use the same <app_id> you used in the initial conversion step. Finally, <config_file> is the sb_nextflow_schema.yaml file in which you made app execution optimizations. The final command should be, for example:
sbpack cgc-profile rfranklin/nextflow-project/test-app sb_nextflow_schema.yaml
This pushes the modified app configuration to the CGC and creates a new revision (version) of the app. Once this is done, you are ready to run a task using the app.
Copying Nextflow apps between projects on the CGC
When an app is on the CGC, you can copy it and use it on other CGC projects. To copy Nextflow apps between projects, use the sbcopy command that is a part of the sbpack utility:
sbcopy [--profile PROFILE] --appid APPID --projectid PROJECTID
The command takes the following arguments:
- PROFILE: refers to the CGC profile containing the CGC API endpoint and authentication token, as set in the Seven Bridges credentials file.
- APPID: specifies the identifier of the app on the CGC. Takes the form {user}/{project}/{app_id}. The {user} part is your CGC username. The {project} part is the source project where the app is located and {app_id} is the ID of the app you want to copy; for example rfranklin/my-new-project/my-nextflow-app.
- PROJECTID: is the identifier of the destination project where the app will be copied. Takes the form of {user}/{project}.
The final command should be, for example:
sbcopy --profile cgc-profile --appid rfranklin/nextflow-project/test-app --projectid jsmith/my-nextflow-project
Note that Nextflow app copies made through the standard visual interface or API methods instead of using sbcopy will still point to the originally pushed code package and the original project where it is located. This might cause failures due to lack of permissions, if users who need to run the copied instances of the app aren't added to the project where the original code package is located. To avoid this, please use sbcopy to copy Nextflow apps between projects on the CGC, as described above.
Updating already converted and optimized apps
If you have already converted your app, made optimizations in the sb_nextflow_schema.yaml file, and pushed the app to the CGC, all subsequent updates to the app's Nextflow code and the process of propagating the update to the CGC are quite straightforward. If the updates you made do not require changes to manually configured parameters in the sb_nextflow_schema.yaml file (such as inputs, outputs, requirements, etc.), create a new code package by running a command in the following format:
sbpack_nf --profile PROFILE_NAME --appid APPID --workflow-path WORKFLOW_PATH --entrypoint ENTRYPOINT --sb-schema SB_SCHEMA
This command is almost the same as the initial app conversion step, but differs in the additional --sb-schema argument. This argument allows you to provide and reuse an existing sb_nextflow_schema.yaml configuration file where you have already made optimizations (configuration of inputs, outputs, requirements, etc.) for the execution of your app on the CGC. The command will generate a new code package based on your updated Nextflow code provided through --workflow-path and --entrypoint and the YAML or JSON configuration file provided through --sb-schema, and push the updated app to the CGC creating a new revision (version).
sbpack_nf command reference
Here is a list describing all available arguments of the sbpack_nf command, which is used to convert and push Nextflow apps for execution on the CGC.
Argument | Required | Description |
---|---|---|
-h, --help | | Shows the list of all arguments and their corresponding explanations. |
--profile PROFILE | | CGC profile containing the CGC API endpoint and authentication token, as set in the Seven Bridges credentials file. If you are using the default profile, this parameter can be omitted. |
--appid APPID | Required | The ID of the Nextflow app once it is pushed to the CGC. Takes the form {user}/{project}/{app_id}. The {user} part is your CGC username. The {project} part is the project to which you want to push the app and {app_id} is the ID you want to assign to the app, for example rfranklin/my-new-project/my-nextflow-app. |
--workflow-path WORKFLOW_PATH | Required | Path to the main workflow directory (the local directory where the app's files are located). |
--entrypoint ENTRYPOINT | Required | Path to the file that contains the app's Nextflow code, relative to the main workflow directory defined in --workflow-path. |
--sb-package-id SB_PACKAGE_ID | | ID of an already uploaded package. If you have already converted and pushed the app to the CGC, it has its own code package ID, as shown in the code_package key in the sb_nextflow_schema.yaml file. When the package ID is provided, the conversion script will skip the upload step and thus take less time to execute. |
--sb-doc SB_DOC | | Path to the app description document written in Markdown. The document is meant to provide additional details about the app and will be shown when viewing app details on the CGC. If not provided, README.md will be used if available in the same directory where the entrypoint file is located. |
--sb-schema SB_SCHEMA | | Path to an existing sb_nextflow_schema file in JSON or YAML format. This allows you to use an existing configuration file where you have already made optimizations (configuration of inputs, outputs, requirements, etc.) for the execution of your app on the CGC. |
--dump-sb-app | | Dumps the converted app to a local file without pushing it to the CGC. Using this option will enable you to convert the app and generate the sb_nextflow_schema.yaml file to make configuration optimizations prior to pushing the app to the CGC. |
--no-package | | Doesn't push the app's code package to the CGC, but references a git repository containing the app's code instead. Git repository address is specified as the git_pull key in the sb_nextflow_schema.yaml file, instead of code_package. For example: git_pull: https://git.domain.com/repository. The value for the git_pull key is the URL you would normally use to clone the repository to your local environment. |
--json | | Creates the sb_nextflow_schema file in JSON format instead of the default YAML. |
Important notes for executing Nextflow apps on the CGC
- Workflows are executed in Local mode. Make sure your workflow can run in Local mode before porting it to the CGC.
- Use of Docker is required. See how to create and upload a Docker image containing your app and make sure to edit the Nextflow code to use the newly created image. If the Docker image is not specified for a process, a default alpine image will be used.
- Default Docker image is enforced using the Nextflow executor -with-docker parameter. Note that this is a setting that can't be changed and it will override the docker.container value in nextflow.config. As an example of good general practice, we advise specifying a container for each individual process. If a workflow does have the image specified in nextflow.config, the nextflow.config file can be slightly modified, so that the setting does not get ignored on the CGC. For example:
process.container = 'nfcore/atacseq:1.2.1'
needs to be replaced with:
process {
withName: '!foo' { container = 'nfcore/atacseq:1.2.1'}
}
- Execution is done in a dedicated working directory and all the Nextflow work is done in the work directory inside the working directory. Avoid using and relying on hard-coded paths in workflows.
- If you need to explicitly enable DSL version 1, add nextflow.enable.dsl=1 at the beginning of your application's code or follow these instructions from the official Nextflow documentation.
Differences between running CWL and Nextflow apps on the CGC
Executions on the CGC normally result in separate jobs (steps) being created for each tool in the workflow. When a Nextflow pipeline is executed on the CGC, a single execution job will be created, regardless of the number of tools within the pipeline. This "one app, one job" approach results in the following differences compared to CWL app executions:
- Memoization is not available for Nextflow apps. As memoization relies on using previous job outputs to skip identical jobs in new executions, it is not useful in situations when there is only one job in an app execution.
- Handling of spot instance interruption can't be used with Nextflow apps. If a spot instance gets terminated when a Nextflow app is running, the app would have to be rerun on an on-demand instance from the beginning, which would result in increased costs instead of savings.
- Task stats and logs are organized differently. As task statistics and logs are usually shown and organized per job, Nextflow apps will have cumulative stats for the single job in the execution, while logs will be saved in a single folder.