File archiving overview

Storing large amounts of data in the cloud is a convenient way to keep it available for computation. However, instant availability is not always our primary concern.

Sometimes we have to deal with "cold" data: files that are not required for processing and are unlikely to be accessed for a long time, but must nonetheless remain available to comply with local and federal laws, best-practice guidelines, and internal processes. Input and output files belonging to completed analyses are a typical example.

Data that will not be used for some time (typically more than three months) can be moved into archival storage. Archived files are billed at a significantly lower price than data that is always available, which makes archiving a good solution for infrequently accessed files.
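
As a rough illustration of the "more than three months" rule of thumb, the sketch below flags files that have not been accessed within a 90-day window as archive candidates. The `FileRecord` type, its `last_accessed` field, and the exact 90-day threshold are hypothetical; the Platform's own file metadata and archiving criteria may differ.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical threshold approximating "more than three months".
ARCHIVE_AFTER = timedelta(days=90)


@dataclass
class FileRecord:
    # Hypothetical file record; not the Platform's actual metadata model.
    name: str
    last_accessed: date


def archive_candidates(files, today, threshold=ARCHIVE_AFTER):
    """Return the names of files not accessed for longer than `threshold`."""
    return [f.name for f in files if today - f.last_accessed > threshold]


files = [
    FileRecord("sample1.bam", date(2015, 1, 10)),  # idle for 172 days
    FileRecord("sample2.vcf", date(2015, 6, 20)),  # idle for 11 days
]
print(archive_candidates(files, today=date(2015, 7, 1)))  # ['sample1.bam']
```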

The CGC currently offers Amazon Glacier as the archiving back-end. For up-to-date pricing information on the storage services that the CGC supports, please refer to the official Amazon Glacier pricing page.

Cost Savings

Depending on the storage service used, storing data in an archive typically costs around a third as much as storing data that is always available.

As with all other costs for user-uploaded data hosted on the Platform, the CGC passes the charges that we incur for archiving data directly to the customer without markup.

In July 2015, Amazon S3 in the US East region charged $0.0275-0.03 per gigabyte of standard storage per month. A gigabyte of data stored in Amazon's archiving facility, Glacier, in the same region was billed at $0.007 per month. Keeping data in Glacier rather than S3 would therefore have reduced monthly storage costs by approximately 75%.
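
The savings percentage can be recomputed directly from the quoted July 2015 prices. This sketch only covers the per-gigabyte storage rate and ignores any request, retrieval, or early deletion fees; the function name is illustrative.

```python
# July 2015 US East prices quoted above, in USD per GB per month.
S3_RATE_RANGE = (0.0275, 0.03)
GLACIER_RATE = 0.007


def monthly_savings_fraction(standard_rate, archive_rate):
    """Fraction of the monthly storage bill saved by archiving instead."""
    return 1 - archive_rate / standard_rate


for s3_rate in S3_RATE_RANGE:
    saved = monthly_savings_fraction(s3_rate, GLACIER_RATE)
    print(f"S3 at ${s3_rate}/GB: {saved:.1%} saved")
# S3 at $0.0275/GB: 74.5% saved
# S3 at $0.03/GB: 76.7% saved
```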

In addition to data hosting charges, Amazon Glacier may charge archival, restoration, or early deletion fees. If you incur any of these costs, we will pass them on to you without markup.

However, if archived data is kept for several months and accessed only rarely, these fees are not expected to reduce the projected cost savings significantly.
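
The reasoning above can be checked with a simple net-savings estimate: storage savings grow with every month the data stays archived, while the extra fees are paid only occasionally. In this sketch, `one_time_fees` is a hypothetical lump sum standing in for archival, restoration, or early deletion charges; the $10 figure is an assumption for illustration, not an actual Amazon price.

```python
def net_savings(size_gb, months, standard_rate, archive_rate, one_time_fees):
    """Storage savings over `months`, minus one-time archive-related fees.

    Rates are in USD per GB per month. `one_time_fees` is a hypothetical
    lump sum representing archival, restoration or early deletion charges.
    """
    return size_gb * months * (standard_rate - archive_rate) - one_time_fees


# Hypothetical example: 500 GB archived, with an assumed $10 in fees,
# using the July 2015 rates of $0.03 (S3) and $0.007 (Glacier) per GB-month.
for months in (1, 6):
    print(months, round(net_savings(500, months, 0.03, 0.007, one_time_fees=10.0), 2))
# 1 1.5
# 6 59.0
```

With a short retention period the fees consume most of the savings, while over six months they become a small fraction of the total.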

Limitations of Archiving

Moving data to and from archival storage is not instantaneous. Depending on the type of archival storage used, archiving or restoring large files can take anywhere from several hours to a day or more.

Archived files cannot be used as task inputs, downloaded, or visualized in the Genome Browser, and their content cannot be accessed in any other way. An archived file must first be restored before it can be used.
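
The restriction can be modeled as a small state check: only files that are fully available can be used. The state names below are hypothetical, and the sketch assumes, conservatively, that a file still in the middle of being archived or restored is not usable either; the Platform's actual behavior during those transitions is not described here.

```python
from enum import Enum


class ArchiveState(Enum):
    # Hypothetical states; moving between them can take hours or longer.
    AVAILABLE = "available"
    ARCHIVING = "archiving"
    ARCHIVED = "archived"
    RESTORING = "restoring"


# Operations named in the text: task input, download, Genome Browser view.
OPERATIONS = {"task_input", "download", "visualize"}


def can_use(state, operation):
    """Return True if `operation` is allowed for a file in `state`.

    Assumption: only AVAILABLE files are usable; transitional states are
    treated like ARCHIVED.
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    return state is ArchiveState.AVAILABLE


print(can_use(ArchiveState.ARCHIVED, "download"))   # False
print(can_use(ArchiveState.AVAILABLE, "download"))  # True
```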