Cloud storage providers come with their own interfaces, features and terminology. At a certain level, though, they all view resources as data objects organized in repositories. Authentication and operations are commonly defined on those objects and repositories, and while each cloud provider might call these things different names and apply different parameters to them, their basic behavior is the same.
The CGC mediates access to these repositories using volumes. A volume is associated with a particular cloud storage repository that you have enabled the CGC to read from (and, optionally, to write to). Currently, volumes may be created using two types of cloud storage repository: Amazon Web Services (AWS) S3 buckets and Google Cloud Storage (GCS) buckets.
A volume enables you to treat the cloud repository associated with it as external storage for the CGC. You can 'import' files from the volume to the CGC to use them as inputs for computation. Similarly, you can write files from the CGC to your cloud storage by 'exporting' them to your volume. Please note that export to a volume is available only via the API (including API client libraries), and through the Seven Bridges CLI.
The following diagram shows the relationship between a user's AWS repositories, bucket1 and bucket2, and a volume, from which you can import and export the contents of the buckets to a CGC project.
To create a volume on the CGC, you should provide the following information:
A volume name consists of 3-32 letters of the English alphabet, numbers, and underscores (_). Each of your volumes must have a unique name.
The access mode of a volume is either read-only (
RO) or read-write (
Each volume will have its access mode set by default to read-only.
Read-only volumes can only be used to make files available to the CGC for reading; for example, by importing files into projects. Read-write volumes' contents can be modified by operations performed with the Volumes API. Note that regardless of access mode, the owner of the volume is still the only CGC account that can issue operations on that volume. However, it is a good practice to avoid configuring your input buckets as read-write, as a safeguard against honest mistakes.
To access the cloud storage provider on your behalf, the CGC needs to know the type of storage provider it should talk to, and be configured to interface with it.
As each cloud storage provider supports slightly different features and authorization schemas, the configuration will be different for each storage type. Depending on your selected storage provider and your intended use of the volume, there might be additional properties you may want to set to further modify the way the CGC accesses it.
For more details on each supported storage type and its corresponding configuration, see the following pages:
A prefix is a string that can be set on a volume to refer to a specific location on a single bucket. This feature can be used to replicate the folder or grouping structure inside a bucket in your cloud storage account. As such, a prefix is useful if you have grouped files devoted to distinct projects within a single bucket.
For example, on Amazon S3, folders can also be assigned specific access permissions in the bucket policy itself, or an IAM user can be configured to have read-only or read-write access per each folder. When you assign your volume a prefix as well as a bucket, you limit the volume's operations within the specified folder on the bucket.
For example, if you set the
prefix for a volume to
"a10", and import a file with
location set to
"test.fastq" from the volume to the Platform, then the object that will be referred to by the newly-created alias will be
Once a volume has been configured, you can make its data objects available for computation on the CGC, and copy files to the volume from the CGC.
These operations can be carried out using the Volumes API. All calls are listed in the Volumes section of the API documentation.
Updated over 1 year ago