About app input and output types
Overview
A key factor in successfully wrapping apps for use on the CGC is a proper understanding and configuration of app inputs, outputs, and their types. A well-configured set of inputs and outputs makes the app easier and faster to run, and easier to use in automated scenarios through the API or one of the API libraries.
Available types of app inputs and outputs on the CGC correspond to CWL types, but also apply to converted Nextflow and WDL apps. They are classified as follows:
- Primitive types:
  - null
  - boolean
  - int
  - long
  - float
  - double
  - string
- Special types:
  - Any
  - File
  - Directory
- Complex types:
  - record
  - array
  - enum
Primitive types
These types correspond to their counterparts in most widely used programming languages. The following table describes each type:
Type | Description |
---|---|
null | No value |
boolean | A binary value |
int | 32-bit signed integer |
long | 64-bit signed integer |
float | Single precision (32-bit) IEEE 754 floating-point number |
double | Double precision (64-bit) IEEE 754 floating-point number |
string | Unicode character sequence |
Here is an example of how these would be configured when defining an input schema for an app:
    inputs:
      # Boolean flag: the prefix is added only when the value is true
      - id: use_index_file
        type: boolean
        inputBinding:
          position: 1
          prefix: -f
      - id: output_file_name
        type: string
        inputBinding:
          position: 3
          prefix: -o
      - id: threads
        type: int
        inputBinding:
          position: 2
          prefix: -t
      # Optional file input (note the trailing question mark)
      - id: index_file
        type: File?
        inputBinding:
          prefix: -i
          position: 4
The code above represents a section of a YAML file where different app inputs are defined. This will be a part of the complete app description written in CWL, or a part of the configuration file when optimizing Nextflow or WDL apps for use on the Platform. For details about the parameters available under inputBinding, please refer to the CWL documentation.
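To make the effect of these bindings more concrete, here is a minimal sketch of an input object, that is, the values a user might supply for the inputs defined above. All values and the file path are hypothetical placeholders, and the command-line fragment in the comments assumes the general CWL rule that bound arguments are ordered by their position value.

```yaml
# Hypothetical input values for the example inputs defined above.
use_index_file: true           # boolean: when true, only the prefix (-f) is added
threads: 8                     # int: added as "-t 8"
output_file_name: result.txt   # string: added as "-o result.txt"
index_file:                    # File? is optional, so this entry could be left out
  class: File
  path: /path/to/index.idx     # placeholder path

# Arguments are sorted by inputBinding.position, so the bound part of the
# command line would look roughly like this:
#   -f -t 8 -o result.txt -i /path/to/index.idx
```

Note that the order of keys in the input object does not matter. Only the position values in the input definitions determine the order of arguments.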
Primitive types can also be defined for app outputs:
    outputs:
      - id: id_tumor
        outputSource:
          - gatk_collectreadcounts_tumor/entity_id
        type: int
      - id: entity_id_normal
        outputSource:
          - gatk_collectreadcounts_normal/entity_id
        type: string?
Special types
Available special types are Any, File, and Directory.
Any
The Any type validates for any non-null value.
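As a minimal sketch, an input of type Any could be declared as shown below. The input ID extra_config is a hypothetical name chosen only for illustration.

```yaml
inputs:
  - id: extra_config   # hypothetical input name
    type: Any          # accepts a string, number, File, array, or record, but not null
```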
File
File is one of the most common input and output types in bioinformatics analyses.
File inputs
File inputs have a number of properties that provide metadata about the file. Here are some of the most important ones:
- path: The local path to the file within the execution environment. When running tasks through the CGC API, set path to the platform File ID.
- basename: The name of the file without any leading directory path.
- secondaryFiles: A list of additional files or directories that are associated with the primary file and must be transferred alongside the primary file.
Here is an example of a File input inside a YAML file containing the app description:
    inputs:
      - id: bam_file
        type: 'File?'
        # Index file expected next to the BAM file (suffix appended to its name)
        secondaryFiles:
          - .bai
        inputBinding:
          prefix: --file=
          separate: false
          position: 1
For more details about file inputs and available properties, please refer to CWL documentation.
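For illustration, a value for the bam_file input above could look like the following sketch when written as a plain CWL input object. The paths are hypothetical placeholders, and the way the platform resolves secondary files at run time may differ from this generic form.

```yaml
# Hypothetical value for the bam_file input defined above.
bam_file:
  class: File
  path: /path/to/sample.bam          # placeholder; a platform file ID when using the CGC API
  secondaryFiles:
    - class: File
      path: /path/to/sample.bam.bai  # matches the ".bai" suffix pattern in the definition

# Because separate is false, the prefix and the file path are joined
# into a single argument with no space between them.
```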
File outputs
Files are also the most common app output type. File outputs can be produced in two ways:
- By getting the output file directly from an output port of a tool (node) in a workflow, by specifying outputSource in the <node_id>/<output_id> format:
    outputs:
      - id: out_tumor
        outputSource: sbg_group_outputs_tumor/out_file
        type: File
- By using a glob expression that matches the needed files in a tool:
    outputs:
      - id: out_archive
        outputBinding:
          glob: 'samples.tar.gz'
        type: File
File outputs can also be configured as arrays.
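A minimal sketch of such an array output is shown below, using the bracket shorthand for array types. The output ID out_reports and the *.html pattern are hypothetical and serve only as an illustration.

```yaml
outputs:
  - id: out_reports      # hypothetical output name
    type: File[]         # array of files
    outputBinding:
      glob: '*.html'     # collects every file in the working directory that matches
```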
Directory
This input and output type represents a directory that is passed to the app or provided as an output of the app.
Directory inputs
When used as an input type, Directory has several properties that provide additional data about the input:
- path: The local path to the directory prior to app execution.
- basename: The base name of the directory, without any leading directory path.
- listing: List of files or subdirectories contained in this directory.
Here is an example of a directory input inside a YAML file containing the app description:
    inputs:
      - id: samples
        type: Directory
        basename: "samples"
        inputBinding:
          prefix: --samples=
          separate: false
          position: 2
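As a sketch, a value for the samples input above could be supplied like this in a plain CWL input object. The path is a hypothetical placeholder.

```yaml
# Hypothetical value for the samples input defined above.
samples:
  class: Directory
  path: /path/to/samples   # placeholder path

# Because separate is false, the prefix and the directory path are joined
# into a single argument with no space between them.
```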
Directory outputs
To configure a directory output, use the following syntax:
- If you are getting the output directory from an output port of a tool (node) in a workflow, specify outputSource in the <node_id>/<output_id> format:
    outputs:
      - id: normalized_samples
        type: Directory
        outputSource: sbg_group_outputs_tumor/out_dir
- If you are configuring a directory output in a command-line tool, use a glob expression that matches the needed directory:
    outputs:
      - id: normalized_samples
        type: Directory
        outputBinding:
          glob: 'samples'
For more details about directory inputs and available properties, please refer to the CWL documentation.
Complex types
Array
Arrays are used to provide multiple values in a single input or output parameter. All items in an array share the declared item type, which can be a primitive type or a special type such as File.
Array inputs
To define an input array in an app description, use one of the following two options:
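- Define an input whose type is array, then define the data type of the items that can be present in the array:
    inputs:
      - id: samples
        type:
          type: array
          items: File
          # Binding applied to each item of the array
          inputBinding:
            prefix: -F
        # Binding applied to the array as a whole
        inputBinding:
          position: 2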
- Add brackets [] after the type name to indicate that the input parameter is an array of that type:
    inputs:
      - id: sample_ids
        type: string[]
        inputBinding:
          prefix: -C=
          itemSeparator: ","
          separate: false
          position: 4
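The two approaches produce different argument shapes. The sketch below shows hypothetical values for both array inputs defined above, with the expected arguments noted in comments. It assumes standard CWL binding behavior; the file paths and sample IDs are placeholders.

```yaml
# Hypothetical values for the two array inputs defined above.
samples:
  - class: File
    path: /path/to/a.bam
  - class: File
    path: /path/to/b.bam
sample_ids:
  - s1
  - s2
  - s3

# Expected argument shapes under standard CWL binding rules:
#   samples    -> -F /path/to/a.bam -F /path/to/b.bam   (prefix repeated for each item)
#   sample_ids -> -C=s1,s2,s3                           (items joined by itemSeparator)
```

Use a binding on the array items when the tool expects a repeated flag, and itemSeparator when the tool expects a single delimited list.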
Array outputs
To create an output that will produce an array of values, use the following syntax:
    outputs:
      output:
        type:
          type: array
          items: File
        outputBinding:
          glob: '*.txt'
This specific output will return all files that match the *.txt glob.
Record
Record inputs
Records are complex input types that are used to combine multiple arguments (fields) in a single input. They are useful when additional information needs to be passed along with primary data in an input. Here's how records are defined in an app description:
    inputs:
      - id: record_input
        type:
          - 'null'
          - type: record
            fields:
              file:
                type: File
                inputBinding:
                  prefix: -f
              sample_id:
                type: string
                inputBinding:
                  prefix: -s
            name: record_input
        inputBinding:
          position: 0
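As a sketch, a value for the record input above groups both fields under a single key. The file path and sample ID are hypothetical placeholders.

```yaml
# Hypothetical value for the record_input input defined above.
record_input:
  file:
    class: File
    path: /path/to/sample.bam   # placeholder path, bound with the -f prefix
  sample_id: sample_01          # bound with the -s prefix
```

Because the record type also allows 'null', the whole record_input entry can be omitted.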
Enum
Enum inputs
An enum consists of a set of predefined values (symbols). When used as an input, an enum is defined as follows:
    inputs:
      - id: format
        type:
          - 'null'
          - type: enum
            symbols:
              - bam
              - sam
              - bam_mapped
              - sam_mapped
              - fastq
        inputBinding:
          position: 2
          prefix: '--format'
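A value supplied to an enum input has to be exactly one of the declared symbols. The sketch below uses the format input defined above; the rejected value is a hypothetical example.

```yaml
format: bam_mapped   # valid: one of the symbols declared for the enum
# format: cram       # would fail validation: "cram" is not a declared symbol
```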
Input optionality
Inputs can either be required or optional. When an input is required, a value must be provided to it in order for the app to be executed. On the other hand, an optional input may not have a value (more precisely, the value of the input is null) but still allow the app to be executed normally. There are two ways to define an input as optional in the app description:
- By adding a question mark next to the type definition:
    inputs:
      - id: threads
        type: int?
        inputBinding:
          position: 2
          prefix: -t
In this example, type is defined as int?, meaning that the input type is integer, while the question mark defines the input as optional.
- By adding null as an additional type alongside the actual type of the input, as shown in the example below:
    inputs:
      - id: threads
        type:
          - int
          - 'null'
        inputBinding:
          position: 2
          prefix: -t
In this example, the type key contains an array of two values, where the first one is int, defining the actual type of the input, and the second one is 'null', which specifies that the input is optional.
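Both forms behave the same way at run time. The sketch below shows three hypothetical input objects for the optional threads input, written as separate YAML documents, and assumes the standard CWL rule that a null value adds nothing to the command line.

```yaml
# Value provided: the bound argument is "-t 4".
threads: 4
---
# Explicit null: treated as "no value", so no argument is added.
threads: null
---
# Key omitted entirely: equivalent to null for an optional input.
{}
```

If threads were declared as a required int, the second and third input objects would fail validation before the app starts.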