Reproducibility with AWS – NGS2015

Leigh Sheneman is a PhD student in computer science at MSU, working on evolution and learning with digital organisms, with plans to apply this to real-world organisms in the future!


Start an EC2 instance; a medium-sized m3.medium is fine. Log in, then update and install packages. We need the packages for the software we will run with the eel-pond protocols:

Discussion about an interesting package name:

apt-get -y install libncurses5-dev

Titus: text window graphics from 70s games, likely needed for samtools tview? Everyone: Ahhhhh (understanding)

We’re going to make a public AMI, for times when we want to share and distribute an image to colleagues for collaboration.

Go to EC2 console.




An AMI is an efficient way to capture the OS and software: you can terminate the instance, keep the AMI, and get charged only a fraction of the cost (about $0.10 per GB per month) rather than keeping the instance running. A snapshot captures a volume of data, whereas an image captures the OS filesystem.

Change the permissions so that you are not the only one with access, since we want to make the image public.



It takes some time for the image to become public, so wait a bit before sharing the AMI ID.

Important: this image was created in the ‘N. Virginia’ region, so it is only visible in the ‘N. Virginia’ region. There are other ways to share between regions.
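In class we did all of this by clicking through the EC2 console. For anyone who prefers the command line, here is a minimal sketch of the same steps with the AWS CLI. This is a swapped-in approach, not what was shown in the workshop, and it assumes the `aws` tool is installed and configured. The instance ID, image name, and function names are hypothetical placeholders.

```shell
# Sketch only: all IDs and names below are hypothetical placeholders.
REGION="us-east-1"                   # N. Virginia
INSTANCE_ID="i-0123456789abcdef0"    # placeholder instance ID
AMI_NAME="ngs2015-example"           # placeholder image name

# Create an AMI from an instance; prints the new image ID.
create_ami() {    # args: instance-id image-name
    aws ec2 create-image --region "$REGION" \
        --instance-id "$1" --name "$2" \
        --query ImageId --output text
}

# Grant launch permission to everyone, which makes the AMI public.
make_ami_public() {    # args: image-id
    aws ec2 modify-image-attribute --region "$REGION" \
        --image-id "$1" --launch-permission "Add=[{Group=all}]"
}

# Copy an AMI into another region; prints the ID of the copy.
copy_ami_to_region() {    # args: image-id destination-region new-name
    aws ec2 copy-image --source-region "$REGION" --source-image-id "$1" \
        --region "$2" --name "$3" \
        --query ImageId --output text
}

# Example usage (commented out so that nothing runs by accident):
# ami_id=$(create_ami "$INSTANCE_ID" "$AMI_NAME")
# aws ec2 wait image-available --region "$REGION" --image-ids "$ami_id"
# make_ami_public "$ami_id"
# copy_ami_to_region "$ami_id" us-west-2 "$AMI_NAME"
```

Two things to keep in mind: by default `create-image` reboots the instance while it takes the image, and a copied AMI does not inherit the launch permissions of the original, so the copy would need to be made public separately in its own region.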

Class discussion about the costs of hosting and sharing images associated with publications. Who pays? If reviewers of papers need the images, how does that work? It is easy to share the data and software associated with the analyses for a study: we can provide all the instructions, data, and software we want. But no one has figured out a realistic and sustainable management framework for computing resources for scientific studies.

Reproducibility is a concern, but there are no incentives for scientists to provide data and transparent analyses via methods like AWS AMIs to demonstrate reproducibility. If this were required for publication, there would likely be more funding resources available and everyone would do this instead of a select few. For now, people can provide things like this, but who is really going out and checking other people’s data, code, and software, besides reviewers and a few colleagues?

Create Volume

Make sure the availability zone (e.g. us-east-1e) matches that of the instance. If not, use the pull-down menu to select it:




Then attach a new 100GB volume to the instance. Log out of ssh and log back in. Run the commands to format and mount the disk:


In the above list, /dev/xvda1 is the system disk; we attached /dev/xvdf.
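The exact commands are not reproduced in these notes. A minimal sketch of formatting and mounting a freshly attached volume might look like the following, assuming an ext4 filesystem and a hypothetical mount point of /mnt/data (neither is confirmed by the notes). The function name is made up for illustration, and `sudo` is only needed if you are not already root.

```shell
DEVICE="/dev/xvdf"         # the volume we attached
MOUNT_POINT="/mnt/data"    # hypothetical mount point

# Format a NEW, empty volume and mount it. mkfs erases everything on the device.
format_and_mount() {    # args: device mount-point
    sudo mkfs -t ext4 "$1"     # create an ext4 filesystem (assumed type)
    sudo mkdir -p "$2"         # make the mount point if it does not exist
    sudo mount "$1" "$2"       # mount the new filesystem
    df -h "$2"                 # show the mounted disk and its free space
}

# Example usage (commented out because mkfs is destructive):
# lsblk                                      # list the attached block devices
# format_and_mount "$DEVICE" "$MOUNT_POINT"
```

Only run the format step on a brand-new volume; on a volume that already holds data it would wipe it.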

See the Amazon Elastic Compute Cloud (EC2) documentation for details on AMIs, volumes, snapshots, and instances.

If you were creating an image for someone else, you would follow the steps above, where we took an image of the OS and a snapshot of a volume.

Now, (power pose) we will load someone else’s snapshot (it’s really our snapshot, but same idea). First, we have to launch an instance from the AMI; m3.medium is fine:


Then, create a volume from the snapshot to add to the running instance.


The volume is available to attach:


Under “Actions”, attach volume and select the running instance (should pop up once you start typing).
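The console steps for creating a volume from a snapshot and attaching it can also be sketched with the AWS CLI. Again, this is an alternative to what we did in class, and the IDs and function names are hypothetical placeholders. The availability zone has to match that of the running instance.

```shell
# Sketch only: all IDs below are hypothetical placeholders.
REGION="us-east-1"                    # N. Virginia
SNAPSHOT_ID="snap-0123456789abcdef0"  # placeholder snapshot ID
INSTANCE_ID="i-0123456789abcdef0"     # placeholder instance ID
ZONE="us-east-1e"                     # must match the instance's zone

# Create a volume from a snapshot; prints the new volume ID.
create_volume_from_snapshot() {    # args: snapshot-id availability-zone
    aws ec2 create-volume --region "$REGION" \
        --snapshot-id "$1" --availability-zone "$2" \
        --query VolumeId --output text
}

# Attach a volume to a running instance under the given device name.
attach_volume() {    # args: volume-id instance-id device
    aws ec2 attach-volume --region "$REGION" \
        --volume-id "$1" --instance-id "$2" --device "$3"
}

# Example usage (commented out so that nothing runs by accident):
# vol_id=$(create_volume_from_snapshot "$SNAPSHOT_ID" "$ZONE")
# aws ec2 wait volume-available --region "$REGION" --volume-ids "$vol_id"
# attach_volume "$vol_id" "$INSTANCE_ID" /dev/sdf
```

On instance types like m3.medium, a volume attached as /dev/sdf typically shows up inside the instance as /dev/xvdf.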

Log in, then mount the volume (do not format the new volume, because it contains the data!), and it is there!!
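A minimal sketch of mounting a volume that already holds data, with no format step. The device name /dev/xvdf and the mount point /mnt/data are assumptions, and the function name is made up for illustration.

```shell
DEVICE="/dev/xvdf"         # assumed device name of the attached volume
MOUNT_POINT="/mnt/data"    # hypothetical mount point

# Mount a volume that already contains a filesystem and data.
mount_existing() {    # args: device mount-point
    sudo file -s "$1"       # reports the filesystem type; plain "data" would mean none
    sudo mkdir -p "$2"      # make the mount point if it does not exist
    sudo mount "$1" "$2"    # mount only; formatting here would erase the data
    ls "$2"                 # the files from the snapshot should be listed
}

# Example usage:
# mount_existing "$DEVICE" "$MOUNT_POINT"
```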


Creating an S3 bucket to share files

If you want to host files for others to download, the cost is $0.10/GB per month.


Then, you can get the link for people to download:

curl -O
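For a command-line version of the bucket steps, here is a minimal sketch with the AWS CLI. It differs from what we did in class: instead of a public link, it prints a time-limited presigned URL, which works even when public access to the bucket is blocked. The bucket name, file name, and function name are hypothetical.

```shell
BUCKET="my-example-bucket"    # hypothetical; bucket names must be globally unique
FILE="results.tar.gz"         # hypothetical file to share

# Create a bucket, upload one file, and print a download link.
share_file() {    # args: bucket local-file
    key=$(basename "$2")                     # store the file under its base name
    aws s3 mb "s3://$1" >&2                  # make the bucket (fails if it already exists)
    aws s3 cp "$2" "s3://$1/$key" >&2        # upload the file
    aws s3 presign "s3://$1/$key" --expires-in 604800    # link valid for 7 days
}

# Example usage (commented out so that nothing runs by accident):
# url=$(share_file "$BUCKET" "$FILE")
# curl -o "$FILE" "$url"    # what the person downloading would run
```

The progress messages from the first two commands go to stderr so that only the link is captured in `url`.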

About Lisa Johnson

PhD candidate at UC Davis in Molecular, Cellular, and Integrative Physiology