About app input and output types

Overview

A key factor in successfully wrapping apps for use on the CGC is properly understanding and configuring app inputs, outputs, and their types. A well-designed set of inputs and outputs makes the app easier and faster to run, and easier to automate through the API or one of the API libraries.

The types of app inputs and outputs available on the CGC correspond to CWL types, but they also apply to converted Nextflow and WDL apps. They are classified as follows:

  • Primitive types:
    • null
    • boolean
    • int
    • long
    • float
    • double
    • string
  • Special types:
    • Any
    • File
    • Directory
  • Complex types:
    • record
    • array
    • enum

Primitive types

These types correspond to their counterpart data types in most well-known programming languages. The following table explains each of the types:

Type      Description
null      No value
boolean   A binary value
int       32-bit signed integer
long      64-bit signed integer
float     Single precision (32-bit) IEEE 754 floating-point number
double    Double precision (64-bit) IEEE 754 floating-point number
string    Unicode character sequence

Here is an example of how these would be configured when defining an input schema for an app:

inputs:
  - id: use_index_file
    type: boolean
    inputBinding:
      position: 1
      prefix: -f
  - id: output_file_name
    type: string
    inputBinding:
      position: 3
      prefix: -o
  - id: threads
    type: int
    inputBinding:
      position: 2
      prefix: -t
  - id: index_file
    type: File?
    inputBinding:
      prefix: -i
      position: 4

The code above represents a section of a YAML file where different app inputs are defined. This will be a part of the complete app description written in CWL, or a part of the configuration file when optimizing Nextflow or WDL apps for use on the Platform. For details about the parameters available under inputBinding, please refer to the CWL documentation.
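
To illustrate how these bindings translate into a command line, the sketch below shows one possible set of input values. The values are hypothetical, and <base_command> stands in for the tool's actual command, which is not part of the example above.

```yaml
# Hypothetical input values for the inputs defined above
use_index_file: true
output_file_name: result.txt
threads: 4
# index_file is optional (File?) and is left out here

# Arguments are ordered by inputBinding.position, and a boolean set to
# true contributes only its prefix, so the command line would be
# expected to look like:
#   <base_command> -f -t 4 -o result.txt
```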

Primitive types can also be defined for app outputs:

outputs:
  - id: id_tumor
    outputSource:
      - gatk_collectreadcounts_tumor/entity_id
    type: int
  - id: entity_id_normal
    outputSource:
      - gatk_collectreadcounts_normal/entity_id
    type: string?

Special types

The available special types are Any, File, and Directory.

Any

The Any type validates for any non-null value.
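
As a minimal sketch, an input that accepts any non-null value could be declared as shown below. The id extra_settings is a hypothetical name chosen for this illustration.

```yaml
inputs:
  # Hypothetical input: accepts a string, number, File, array, and so on,
  # but a null value does not validate against Any
  - id: extra_settings
    type: Any
```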

File

File is one of the most common input and output types in bioinformatics analyses.

File inputs

File inputs have a number of properties that provide metadata about the file. Some of the most important ones are:

  • path: The local path to the file within the execution environment. When running tasks through the CGC API, set path as the platform File ID.
  • basename: The name of the file without any leading directory path. 
  • secondaryFiles: A list of additional files or directories that are associated with the primary file and must be transferred alongside the primary file.

Here is an example of a File input inside a YAML file containing the app description:

inputs:
  - id: bam_file
    type: 'File?'
    secondaryFiles:
      - .bai
    inputBinding:
      prefix: --file=
      separate: false
      position: 1
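
To complement the input definition, here is a sketch of a value that could be supplied to the bam_file input. The paths are hypothetical; when running tasks through the API, path would hold the file ID instead, as noted in the property list above.

```yaml
# Hypothetical value for the bam_file input
bam_file:
  class: File
  path: /path/to/sample.bam
  # The .bai pattern is appended to the primary file name,
  # so the index is expected at sample.bam.bai
  secondaryFiles:
    - class: File
      path: /path/to/sample.bam.bai
```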

For more details about file inputs and available properties, please refer to the CWL documentation.

File outputs

Files are also the most common app output type. File outputs can be produced in two ways:

  1. By getting the output file directly from an output port in a tool (node) in a workflow, by specifying outputSource in the <node_id>/<output_id> format: 
outputs:
  - id: out_tumor
    outputSource: sbg_group_outputs_tumor/out_file
    type: File
  2. By using a glob expression that matches the needed files in a tool:
outputs:
  - id: out_archive
    outputBinding:
      glob: 'samples.tar.gz'
    type: File

File outputs can also be configured as arrays.

Directory

This input and output type represents a directory that is passed to the app or provided as an output of the app.

Directory inputs

When used as an input type, Directory has several properties that provide additional data about the input:

  • path: The local path to the directory prior to app execution.
  • basename: The base name of the directory, without any leading directory path.
  • listing: List of files or subdirectories contained in this directory.

Here is an example of a directory input inside a YAML file containing the app description:

inputs:
  - id: samples
    type: Directory
    basename: "samples"
    inputBinding:
      prefix: --samples=
      separate: false
      position: 2
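
A value for this input could be provided as in the following sketch. The path is hypothetical and only illustrates the shape of a Directory value.

```yaml
# Hypothetical value for the samples input
samples:
  class: Directory
  path: /path/to/samples
```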

Directory outputs

To configure a directory output, use the following syntax:

  1. If you are getting the output directory from an output port in a tool (node) in a workflow, specify outputSource in the <node_id>/<output_id> format: 
outputs:
  - id: normalized_samples
    type: Directory
    outputSource: sbg_group_outputs_tumor/out_dir
  2. If you are configuring a directory output in a command-line tool, use a glob expression that matches the needed directory:
outputs:
  - id: normalized_samples
    type: Directory
    outputBinding:
      glob: 'samples'

For more details about directory inputs and outputs and their available properties, please refer to the CWL documentation.

Complex types

Array

Arrays are used to provide multiple values in a single input or output parameter. The items in an array can belong to a primitive type or to another type, such as File.

Array inputs

To define an input array in an app description, use one of the following two options:

  1. Define an input whose type is array, then define data types that can be present in the array.
inputs:
  - id: samples
    type:
      type: array
      items: File
      inputBinding:
        prefix: -F
    inputBinding:
      position: 2
  2. Add brackets ([]) after the type name to indicate that the input parameter is an array of that type:
inputs:
  - id: sample_ids
    type: string[]
    inputBinding:
      prefix: -C=
      itemSeparator: ","
      separate: false
      position: 4
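
The two definitions above produce different command-line fragments. The sketch below uses hypothetical file names and IDs, and the expected fragments follow the CWL binding rules for arrays; file paths are shortened for readability.

```yaml
# Hypothetical values for the two array inputs defined above
samples:
  - class: File
    path: sample_a.bam
  - class: File
    path: sample_b.bam
sample_ids:
  - S1
  - S2
  - S3

# Expected command-line fragments:
#   samples    -> -F sample_a.bam -F sample_b.bam   (prefix repeated per item)
#   sample_ids -> -C=S1,S2,S3                       (items joined by itemSeparator)
```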

Array outputs

To create an output that will produce an array of values, use the following syntax:

outputs:
  output:
    type:
      type: array
      items: File
    outputBinding:
      glob: '*.txt'

This specific output will return all files that match the *.txt glob.

Record

Record inputs

Records are complex input types that are used to combine multiple arguments (fields) in a single input. They are useful when additional information needs to be passed along with primary data in an input. Here's how records are defined in an app description:

inputs:
  - id: record_input
    type:
      - 'null'
      - type: record
        fields:
          file:
            type: File
            inputBinding:
              prefix: -f
          sample_id:
            type: string
            inputBinding:
              prefix: -s
        name: record_input
    inputBinding:
      position: 0
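
A value for this record input groups both fields under a single key, as in the sketch below. The file path and sample ID are hypothetical.

```yaml
# Hypothetical value for the record_input input
record_input:
  file:
    class: File
    path: /path/to/sample.bam
  sample_id: sample_01
```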

Enum

Enum inputs

An enum consists of a set of predefined values (symbols). When used as an input, an enum is defined as follows:

inputs:
  - id: format
    type:
      - 'null'
      - type: enum
        symbols:
          - bam
          - sam
          - bam_mapped
          - sam_mapped
          - fastq
    inputBinding:
      position: 2
      prefix: '--format'
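
When a value is supplied for an enum input, it must be one of the listed symbols. A minimal sketch of such a value:

```yaml
# Must match one of the symbols declared for the input;
# a value outside the list (for example "cram") would fail validation
format: bam
```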

Input optionality

Inputs can be either required or optional. When an input is required, a value must be provided for the app to be executed. An optional input, on the other hand, can be left without a value (more precisely, its value is null) and the app can still be executed normally. There are two ways to define an input as optional in the app description:

  1. By adding a question mark next to the type definition:
inputs:
  - id: threads
    type: int?
    inputBinding:
      position: 2
      prefix: -t

In this example, type is defined as int?, meaning that the input type is integer, while the question mark defines the input as optional.

  2. By adding 'null' alongside the actual type of the input, as shown in the example below:
inputs:
  - id: threads
    type:
      - int
      - 'null'
    inputBinding:
      position: 2
      prefix: -t

In this example, the type key contains an array of two values, where the first one is int defining the actual type of the input, and the second one is 'null' which specifies that the input is optional.
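
Both forms are equivalent and accept the same values. As a sketch, the two input objects below (shown as separate YAML documents, with a hypothetical thread count) are both valid for the optional threads input.

```yaml
# Variant 1: a value is supplied, so "-t 8" is added to the command line
threads: 8
---
# Variant 2: the value is null, so the -t argument is left out entirely
threads: null
```

Omitting the threads key altogether is expected to behave the same way as the second variant.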