Workshop on Genomics (Week 2, Day 2) – Metagenomics

Interesting argument for the importance of social media: it allows rapid analysis of data for diagnosing public health-related issues before publications or reports can be written and submitted. It also leaves a public record of conversations and advice that might help people who are learning.

Dr. Nick Loman:

Metagenomics Lab 1 Exercise:

Using MetaPhlAn (marker gene method) and MEGAN (lowest common ancestor parameters)

Datasets from article:

Screenshot from 2014-01-21 14:06:12

Paired-end Illumina sequencing data, with sample ID and species names in data files.

ubuntu@ip-10-151-69-171:~/shotgun_metagenomics$ seqtk

Usage:   seqtk <command> <arguments>
Version: 1.0-r32

Command: seq       common transformation of FASTA/Q
         comp      get the nucleotide composition of FASTA/Q
         sample    subsample sequences
         subseq    extract subsequences from FASTA/Q
         trimfq    trim FASTQ using the Phred algorithm

         hety      regional heterozygosity
         mutfa     point mutate FASTA at specified positions
         mergefa   merge two FASTA/Q files
         randbase  choose a random base from hets
         cutN      cut sequence at long N
         listhet   extract the position of each het

Usage: seqtk sample [-s seed=11] <in.fa> <frac>|<number>

Subsample 1000, 10000, 100000, and 1000000 sequences from the dataset (shown here for 1000):

%seqtk sample -s 1234 2638-H-STEC_1_final.fastq.gz 1000 > 2638_1000_1.fastq
%seqtk sample -s 1234 2638-H-STEC_2_final.fastq.gz 1000 > 2638_1000_2.fastq
%cat 2638_1000_1.fastq 2638_1000_2.fastq > 2638_1000.fastq
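
The three commands above can be repeated for each depth with a small loop. This is only a minimal sketch, assuming the same input file names as above; the function name `subsample_pair` and the variables `in_prefix` and `out_prefix` are hypothetical names introduced here. Reusing one seed for both mate files is what keeps the read pairs matched.

```shell
#!/bin/sh
# Sketch: subsample both mates at one depth with a shared seed, then
# concatenate them as in the commands above.
set -eu

in_prefix="2638-H-STEC"   # input prefix taken from the commands above
out_prefix="2638"         # hypothetical output prefix
seed=1234                 # same seed for both mates keeps pairs matched

subsample_pair() {
    n="$1"
    for mate in 1 2; do
        seqtk sample -s "$seed" "${in_prefix}_${mate}_final.fastq.gz" "$n" \
            > "${out_prefix}_${n}_${mate}.fastq"
    done
    cat "${out_prefix}_${n}_1.fastq" "${out_prefix}_${n}_2.fastq" \
        > "${out_prefix}_${n}.fastq"
}

# Run only when seqtk and the input files are actually present.
if command -v seqtk >/dev/null 2>&1 && [ -f "${in_prefix}_1_final.fastq.gz" ]; then
    for n in 1000 10000 100000; do
        subsample_pair "$n"
    done
fi
```

Each depth then produces one combined file (for example `2638_10000.fastq`) that can be profiled the same way as the 1000-read file.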

ubuntu@ip-10-151-69-171:~/shotgun_metagenomics$ metaphlan.py 2638_1000.fastq --bowtie2db ~/software/metaphlan/bowtie2db/mpa --bt2_ps sensitive-local --nproc 8
k__Bacteria     100.0
k__Bacteria|p__Proteobacteria   100.0
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria    100.0
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales       100.0
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae 100.0
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Escherichia  100.0
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia_unclassified 100.0
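
Each output line is a `|`-delimited lineage followed by a relative abundance in percent. To pull out a single rank, a filter along these lines may help. It is a minimal sketch that assumes the profile was saved to a file and that the two columns are separated by whitespace, as in the output above; the function name `rank_rows` and the file name `profile.txt` are hypothetical.

```shell
#!/bin/sh
# Sketch: print clade name and abundance for one rank of a MetaPhlAn-style profile.
set -eu

rank_rows() {
    # $1 = rank letter (k, p, c, o, f, g, or s); $2 = profile file
    awk -v r="$1" '
        {
            n = split($1, clades, "[|]")        # split the lineage into levels
            if (index(clades[n], r "__") == 1)  # deepest level has the wanted rank
                print substr(clades[n], 4) "\t" $2
        }' "$2"
}

# Example use (assumes the profile was redirected to profile.txt):
if [ -f profile.txt ]; then
    rank_rows s profile.txt
fi
```

Only rows whose deepest level matches the requested rank are printed, so `rank_rows g profile.txt` would list genera without repeating the species rows beneath them.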

Using more reads changes the taxonomic assignments (presumably making them more accurate):
Screenshot from 2014-01-21 15:10:11

MEGAN (MEtaGenome ANalyzer)

MEGAN is a GUI software package, available under the “Other” menu of the AWS virtual Ubuntu instance.
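
The lowest common ancestor idea that MEGAN relies on can be illustrated on lineage strings in the same `|`-delimited style as the MetaPhlAn output. This is only a sketch of the concept, not MEGAN's implementation: MEGAN works on alignment hits and applies its own thresholds, which are omitted here. The function name `lca` is hypothetical.

```shell
#!/bin/sh
# Sketch: lowest common ancestor of several lineages, one lineage per line on stdin.
set -eu

lca() {
    awk '
        NR == 1 { depth = split($1, anc, "[|]"); next }   # first lineage is the reference
        {
            n = split($1, cur, "[|]")
            if (n < depth) depth = n                      # cannot share more levels than exist
            for (i = 1; i <= depth; i++)
                if (cur[i] != anc[i]) { depth = i - 1; break }
        }
        END {
            out = ""
            for (i = 1; i <= depth; i++) {
                if (i > 1) out = out "|"
                out = out anc[i]
            }
            print out
        }'
}
```

For example, feeding it one lineage ending in `g__Escherichia` and another ending in `g__Shigella`, both under `f__Enterobacteriaceae`, prints the shared lineage down to `f__Enterobacteriaceae`: a read hitting both genera equally well would be placed at the family level rather than assigned to either genus.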

Screenshot from 2014-01-21 16:18:39


About Lisa Cohen

PhD student at UC Davis.