Upload a custom Python program using a Dockerfile

Objective

This tutorial will explain how to create a Docker image containing two custom Python programs, using a Dockerfile to build the image. You will then learn how to push the image to the CGC image registry.

The scripts used in the tutorial are:

  • transcribe.py - Takes a file containing a DNA nucleotide sequence as input and transcribes it into RNA. The output file produced by the program is rna.txt.
  • translate.py - Takes mRNA as input and translates it into a peptide. The output file produced by the program is peptide.txt.
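To make the two steps concrete, here is a minimal sketch of what transcription and translation involve. It is not the code shipped in the tutorial archive: the function names are hypothetical, the input is assumed to be a coding-strand DNA string (so transcription reduces to replacing T with U), and translation is assumed to start at the first base and stop at the first stop codon, using the standard genetic code.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Return the RNA for a coding-strand DNA string (assumption: T -> U only)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon ('*')."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


rna = transcribe("ATGGCCTAA")
peptide = translate(rna)
```

In this sketch the output of transcribe feeds directly into translate, mirroring how rna.txt from the first program serves as input to the second.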

📘

Learn more about Dockerfiles.

Prerequisites

For this tutorial, you will need:

  1. An account on the CGC.
  2. One of the following machines:


1. Create a project on the CGC

To create a project:

  1. Click Projects in the top navigation bar and choose Create a project.
  2. Name the project dna2protein.
  3. Click Create.
    You have now created the new project.

2. Build the image

You will first need to download the package containing the programs and the Dockerfile, and then create a Docker image called dna2protein containing the programs.

To do this:

  1. Click here to download the archive containing the programs and the Dockerfile.
  2. Open up a terminal.

📘

Depending on your operating system, first make sure that Docker is started:

  • Mac OS 10.10.3 Yosemite or newer: run Docker for Mac and start a terminal of your choice.
  • Mac OS 10.8 Mountain Lion or newer: run Docker Machine, either by opening the Docker Quickstart Terminal or by using the command docker-machine start default.
  • Windows 7 or 8: run Docker Quickstart Terminal.
  • Windows 10: run Docker for Windows and start a terminal of your choice.
  • Linux: skip this step.
  3. Use the cd command to navigate to the folder where the downloaded archive is located, for example:
cd /home/rfranklin/Downloads
  4. Unzip the downloaded file by typing:
unzip dna2protein-tutorial.zip -d ../dna2protein
  5. Navigate to the folder containing the programs and the Dockerfile:
cd ../dna2protein/dna2protein-tutorial

You are now ready to build the image.

  6. Build the image from the Dockerfile:
docker build -t dna2protein .

The image is built based on the instructions in the Dockerfile, which is supplied in the downloaded archive.
When the build process is over, you will have a newly created image containing the Python programs.

  7. To test the programs, enter the following command, which runs the image and opens a bash shell inside a container:
docker run -ti dna2protein bash

Then display the help messages:

transcribe.py -h

This command should return:

Translates a DNA input test into a RNA
 
positional arguments:
  dna            DNA input file to transcribe
 
optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose
  --version      show program's version number and exit

You can do the same with translate.py:

translate.py -h

This should return:

Transcribe the provided mRNA into a peptide.
 
positional arguments:
  mRNA           mRNA to transcribe
 
optional arguments:
  -h, --help     show this help message and exit
  --verbose, -v  Run in verbose mode (default: False)
  --version      show program's version number and exit

You have now successfully created and tested a Docker image containing the software.

  8. Enter the following command to leave the container:
exit
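The manual check above can also be run without opening an interactive shell. The following is a minimal sketch, assuming the image is named dna2protein as in the build step and that both programs are on the PATH inside the container (as the interactive session suggests). The helper names help_command and smoke_test are hypothetical; the sketch relies only on docker run with the --rm flag, which removes the container once the command exits.

```python
import subprocess

IMAGE = "dna2protein"  # image name used in the build step
PROGRAMS = ("transcribe.py", "translate.py")


def help_command(program, image=IMAGE):
    """Return the argv list that prints a program's help in a throwaway container."""
    return ["docker", "run", "--rm", image, program, "-h"]


def smoke_test(image=IMAGE, runner=subprocess.run):
    """Run `-h` for each program and return the names that exited with an error."""
    failures = []
    for program in PROGRAMS:
        result = runner(help_command(program, image))
        if result.returncode != 0:
            failures.append(program)
    return failures
```

Calling smoke_test() requires Docker to be running and the image to be built. An empty list means both help messages were printed and both programs exited normally.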

3. Push the image to the CGC image registry

To push the image to the CGC image registry, you first have to give it a repository name that follows this naming convention: cgc-images.sbgenomics.com/<username>/<repository_name>:<tag>. Note that <username> must be your username on the CGC, while <repository_name> must be at least 3 characters long and can only contain lowercase letters, numbers, ., - and _. Learn more about repository names in the CGC image registry.
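The naming convention can be checked before tagging. Below is a minimal sketch that validates only the repository-name rules stated above (at least 3 characters; lowercase letters, numbers, ., - and _) and assembles the full image reference. The function name image_reference is hypothetical, rfranklin is a placeholder username, and the registry may enforce further rules that are not covered here.

```python
import re

REGISTRY = "cgc-images.sbgenomics.com"
# At least 3 characters; lowercase letters, digits, '.', '-' and '_' only.
REPOSITORY_RE = re.compile(r"[a-z0-9._-]{3,}")


def image_reference(username, repository, tag):
    """Build <registry>/<username>/<repository_name>:<tag>, checking the repository name."""
    if not REPOSITORY_RE.fullmatch(repository):
        raise ValueError(
            f"invalid repository name {repository!r}: use at least 3 characters "
            "from lowercase letters, numbers, '.', '-' and '_'"
        )
    return f"{REGISTRY}/{username}/{repository}:{tag}"


# Placeholder username; substitute your own CGC username.
reference = image_reference("rfranklin", "dna2protein", "v0.5.4.dev")
```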

  1. Enter the following command to specify the repository name:
docker tag dna2protein cgc-images.sbgenomics.com/<username>/dna2protein:v0.5.4.dev

The tag at the end of the repository name (v0.5.4.dev) indicates the version of the software we are wrapping for use on the CGC, which is currently 0.5.4.dev.

  2. Log in to the CGC image registry (cgc-images.sbgenomics.com):
docker login cgc-images.sbgenomics.com

❗️

You should enter your authentication token in response to the password prompt, not your CGC password.

  3. Finally, push the image to the CGC image registry:
docker push cgc-images.sbgenomics.com/<username>/dna2protein:v0.5.4.dev
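The three commands in this section can be collected into a small script. This is a sketch under stated assumptions rather than an official workflow: push_commands and run_all are hypothetical helper names, the local image is assumed to be dna2protein, and docker login still prompts interactively for your username and authentication token.

```python
import subprocess

REGISTRY = "cgc-images.sbgenomics.com"


def push_commands(username, local_image="dna2protein", tag="v0.5.4.dev"):
    """Return the tag, login and push commands from this section as argv lists."""
    remote = f"{REGISTRY}/{username}/{local_image}:{tag}"
    return [
        ["docker", "tag", local_image, remote],
        ["docker", "login", REGISTRY],
        ["docker", "push", remote],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; check=True stops at the first failure."""
    for argv in commands:
        runner(argv, check=True)
```

For example, run_all(push_commands("<username>")) with your CGC username in place of <username> would execute the same sequence as the steps above.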

Once the process has been completed, use the Tool Editor to provide a description of the programs on the CGC.