Input file options

Stage input

The input staging functionality allows you to make a tool's input files available in the tool's working directory.

Files that are named as inputs to a tool are not, by default, placed in the tool's working directory. For most workflows this is sufficient, since most tools only need to read their input files, process the data in them, and write new output files to their working directory. However, some tools require their input files to be in the working directory, and this is where the Stage Input option is used.

Staging an input in a CWL v1.x app

To stage an input in a CWL v1.x app (that is, to make files provided through the input port available directly in the tool's working directory), use the File Requirements option as follows:

  1. Open a tool in the tool editor.
  2. Scroll down to the File Requirements section.
  3. Click Add and select File. An item is added to File Requirements and a new menu appears on the right, with Writable, File Name and File Content input fields.
  4. Click </> next to the File Content input field.
  5. Type the following expression: $(inputs.<input_id>), where <input_id> is the ID of the input port you want to stage. You can also insert the inputs.<input_id> structure within brackets by expanding the inputs object on the far right in the expression editor, and double-clicking the desired input port. The input type needs to be File, Directory or an array of Files or Directories.
  6. Optionally, specify whether to Copy or Link the input files by setting the Writable toggle to Yes for Copy or No for Link (see the difference between Copy and Link below).
  7. Optionally, if the File Content expression returns a File or a Directory, a different name for the staged input can be specified in the File Name field. Note that if the staged input is an array of Files or Directories, the File Name field is ignored.
  8. Click Save. Staging is now configured, and the input files or directories provided on this input port during execution will be available in the tool's working directory.
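
As a rough illustration of what these steps configure, the sketch below builds a CWL fragment as a JavaScript object and prints it as JSON. It assumes that a File Requirements item corresponds to the InitialWorkDirRequirement defined in the CWL v1.x specification, with File Content mapped to entry, File Name to entryname, and Writable to writable. The helper stagingRequirement and the port ID input_file are hypothetical, and the description the editor actually saves may differ.

```javascript
// Hypothetical helper: builds the CWL fragment that, as an assumption,
// corresponds to a single File Requirements item in the tool editor.
function stagingRequirement(inputId, writable, fileName) {
  const entry = {
    entry: "$(inputs." + inputId + ")", // File Content expression
    writable: writable                   // Yes (true) = Copy, No (false) = Link
  };
  if (fileName) {
    entry.entryname = fileName;          // optional File Name field
  }
  return { class: "InitialWorkDirRequirement", listing: [entry] };
}

// Stage the input port "input_file" as a link, keeping its original name.
console.log(JSON.stringify(stagingRequirement("input_file", false), null, 2));
```

Passing true as the second argument corresponds to setting Writable to Yes (Copy), and a third argument corresponds to filling in the File Name field.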

Staging an input in a CWL sbg:draft-2 app

Stage Input allows you to modify the way files appear in a tool's working directory in two ways:

  • Copy files that are input to a tool to that tool's working directory. This makes the files directly available in the working directory. Copying takes longer and takes up more disk space than Linking, so this option is most often used when the tool needs to modify the input file. Otherwise, the Link option is more convenient.
  • Link the files input to a tool to that tool's working directory, using a symbolic link. This option is used in the following circumstances:
    • To pass the files through the tool, and report them as the tool's outputs, without actually modifying the files.
    • To simplify the relative path of the input files to the tool. If you need to include the file paths of a tool's input files as arguments passed to the tool, the path will be simpler if there is a link from the input files to the tool's working directory.
    • To make files available in the working directory in case the tool expects them to be there.

Please note that while a tool on the CGC cannot create output files outside its own working directory, it may write other files – such as config files – outside the working directory. This means that you don't need to modify a tool that is configured to write files to a different directory, as long as you do not wish to treat those files as outputs.

Files that are copied or linked using the Stage Input option will not be produced as the tool's outputs unless output port(s) are created specifically for them.

Consider the following diagram:

In this example, files are input to Tool B from Tool A. If Tool B needs the input files to be available in its working directory, you need to use the Stage Input setting on the input_file input port on Tool B.

To copy the input files from Tool A to the working directory of Tool B, in order to modify them:

  1. Open Tool B in the Tool Editor.
  2. On the Inputs tab, click the tool description for the input port input_file.
  3. Click Stage Input and select Copy from the dropdown menu.

To create symbolic links to the input files in the working directory of Tool B, in order to pass them through along with other output files without modifying them:

  1. Open Tool B in the Tool Editor.
  2. On the Inputs tab, click the tool description for the input port input_file.
  3. Click Stage Input and select Link from the dropdown menu.

Since the Link option only creates a symbolic link, it is generally faster than Copy. However, Copy is required in certain cases. Besides cases where the tool writes data to files produced by other tools, you also need to select Copy when the tool you are configuring adds the output files of the previous tool in the workflow to an archive. To archive the files, you need actual copies of them in the tool's working directory.

To see how the Stage Input option is used in an actual workflow on the CGC, consider VarScan2 Workflow from BAM, a public workflow that contains a tool named SAMtools Index FASTA. This tool indexes a sequence in FASTA format and outputs a FAI index file. It is also able to output the FASTA file that has been provided as its input.

To output the FASTA file that is used as the input file of SAMtools Index FASTA, configure the input_fasta_file input port as Stage Input > Link. This creates a symbolic link to the input FASTA file in the working directory of SAMtools Index FASTA, and the tool passes the FASTA file as its output, along with the generated index file.

Load Content

In the context of an input port for files, the $self object is a JavaScript object referring to the file objects, which can be used in the Value field (see the documentation on dynamic expressions in tool descriptions for details).

Checking the box marked Load Content on the Inputs tab adds an additional property to $self, namely contents, which holds the first 64 KB of the file contents.
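
The following is a minimal sketch of how an expression might use the contents property. The $self object here is a stand-in defined only so that the snippet runs on its own; on the platform it would be supplied by the execution environment. The helper name firstLine and the sample file contents are hypothetical.

```javascript
// Stand-in for the file object that would be available as $self
// when Load Content is checked (the values are made up for illustration).
const $self = {
  path: "/path/to/samples.txt",
  contents: "#version=1\nsample_a\nsample_b\n"
};

// Hypothetical helper: returns the first line of the loaded contents,
// or an empty string if no contents were loaded.
function firstLine(file) {
  if (!file || typeof file.contents !== "string") {
    return "";
  }
  return file.contents.split("\n")[0];
}

console.log(firstLine($self)); // #version=1
```

Because only the first 64 KB of the file is loaded, an expression of this kind should not assume that contents holds the whole file.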

Secondary files

This field is present if the input type is File or Array of files. It allows you to define the extension to be appended to the name of the input file to get the name of the secondary file that is to be loaded automatically along with the input file on this input port. For example, if the input file is input.bam and you enter .bai in the Secondary File field, the resulting file path will be input.bam.bai. If you want to remove an extension from the input file name before adding the secondary file extension, add a caret ^ for each extension you want to remove. For example, if the input file is input.bam and you enter ^.bai in the Secondary File field, then the secondary file path will be input.bai.
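
The naming rule above can be sketched as a small function. The helper secondaryFileName is hypothetical and only mirrors the rule as described here; it is not a function provided by the platform. The third call, with two carets, is an additional illustration that is not taken from the examples above.

```javascript
// Hypothetical helper: derives the secondary file name from the input
// file name and the value entered in the Secondary File field.
function secondaryFileName(fileName, pattern) {
  let base = fileName;
  let suffix = pattern;
  // Each leading caret removes one extension from the input file name.
  while (suffix.startsWith("^")) {
    suffix = suffix.slice(1);
    const dot = base.lastIndexOf(".");
    if (dot > 0) {
      base = base.slice(0, dot);
    }
  }
  // The remaining suffix is appended to what is left of the name.
  return base + suffix;
}

console.log(secondaryFileName("input.bam", ".bai"));             // input.bam.bai
console.log(secondaryFileName("input.bam", "^.bai"));            // input.bai
console.log(secondaryFileName("reference.fasta.gz", "^^.dict")); // reference.dict
```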

If you are using CWL version sbg:draft-2, a secondary file can be defined for an input port only if the Include in command line switch is set to Yes. If you do not want to include this input port in the command line, set the Value field to return no value by populating it with an expression such as:

{
    return "";
}