Metagenomics Classifier: Kraken

The disease detector.

Overview

One common analysis performed on newly sequenced data is the assignment of taxonomic labels to short-read sequences, usually obtained through metagenomic studies.

This pipeline is based on the nf-core/vipr pipeline.

Workflow

User Journey

As you can see, this pipeline shares the same structure as the others; using it takes only a few simple steps:

  1. once you have selected your pipeline of interest, upload your FASTQ files;

  2. select the parameters;

  3. run the pipeline;

then you will be redirected to the results page.

As you can see, the pipeline page is divided into two parts: the first gives an overview of the pipeline itself, where you can name your analysis and view its estimated cost, and the second deals with data and parameters.

Let's take a look at the parameters.

The first step is to upload your FASTQ files, which contain the short reads; then, upload your metadata file containing:

  • sample name;

  • run;

  • short reads;

  • long reads.
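A metadata sheet with these four fields could be checked before upload with a short script like the one below. This is a minimal sketch that assumes a comma-separated file with the hypothetical column names `sample`, `run`, `short_reads` and `long_reads`; the platform's real metadata format may differ.

```python
import csv
import io

# Hypothetical column names: the platform's actual metadata headers may differ.
REQUIRED_COLUMNS = ["sample", "run", "short_reads", "long_reads"]


def read_metadata(text):
    """Parse a CSV metadata sheet and check that every required column is present."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"metadata is missing columns: {missing}")
    rows = []
    for row in reader:
        # An empty long_reads cell is stored as None (short-read-only sample).
        row["long_reads"] = row["long_reads"] or None
        rows.append(row)
    return rows


# Hypothetical example sheet with two short-read-only samples.
EXAMPLE = """sample,run,short_reads,long_reads
S1,run1,S1_R1.fastq.gz,
S2,run1,S2_R1.fastq.gz,
"""

for entry in read_metadata(EXAMPLE):
    print(entry["sample"], entry["run"], entry["short_reads"], entry["long_reads"])
```

Failing early on a missing column is cheaper than discovering the problem after the upload has finished.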

For quality control, simply switch the option on if you want it to be performed.

For Adapter Trimming, the program used is skewer.
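To illustrate what adapter trimming does, the sketch below removes a 3' adapter from a single read using exact matching only. It is a simplified stand-in, not skewer's algorithm (skewer tolerates mismatches in the adapter); the function name, the minimum overlap of 3 and the adapter used in the example (a commonly seen Illumina adapter prefix) are illustrative choices.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a full adapter, or a partial adapter prefix, from the 3' end of a read.

    Exact matching only: a simplified illustration, not skewer's error-tolerant method.
    """
    # Case 1: the whole adapter occurs in the read, so cut at its start.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends inside the adapter; find the longest read suffix
    # equal to an adapter prefix, down to min_overlap bases.
    for length in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if length > 0 and read.endswith(adapter[:length]):
            return read[: len(read) - length]
    return read


ADAPTER = "AGATCGGAAGAGC"
print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # full adapter
print(trim_3prime_adapter("ACGTACGTAGATC", ADAPTER))             # partial adapter
```

The minimum overlap avoids trimming one or two bases that match the adapter by chance.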

For Decontamination, the reads are aligned against the hg19 human reference genome, and decont is the program that removes the reads of human (host) DNA.
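Conceptually, decontamination keeps only the reads that did not align to the host reference. The sketch below shows that final filtering step on FASTQ text, assuming the IDs of the reads that aligned to hg19 are already known from an alignment. It does not reproduce decont itself, and `parse_fastq` and `remove_host_reads` are hypothetical helpers.

```python
def parse_fastq(text):
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # The read ID is the header without '@', up to the first whitespace.
        yield header[1:].split()[0], seq, qual


def remove_host_reads(fastq_text, host_ids):
    """Keep only the records whose read ID is not in host_ids."""
    return [rec for rec in parse_fastq(fastq_text) if rec[0] not in host_ids]


# Hypothetical input: read2 is assumed to have aligned to hg19.
FASTQ = """@read1
ACGTACGT
+
IIIIIIII
@read2 extra
TTTTGGGG
+
IIIIIIII
"""

kept = remove_host_reads(FASTQ, host_ids={"read2"})
print([read_id for read_id, _, _ in kept])
```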

Kraken is the tool used for metagenomic classification of the reads.

For Assembly, Polishing & Alignment:

  • Tadpole is used for de novo assembly;

  • ViPR TOOLS is used for assembly polishing;

  • BWA maps the reads back to the assembled contigs;

  • lofreq is used both for low-frequency variant calling and for variant calling on the realigned files;

  • SnpEff is used for variant annotation on VCF files.
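The core idea behind low-frequency variant calling with lofreq is to ask whether the number of reads showing an alternate base at a position is larger than sequencing errors alone would explain, using each base's quality score as its error probability. The sketch below illustrates that idea with a Poisson-binomial tail probability. It is a simplification: it treats every error as producing the alternate base and ignores other error sources and any multiple-testing correction that lofreq may apply. The function names and the 0.01 threshold are assumptions.

```python
def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def poisson_binomial_tail(probs, k):
    """P(X >= k) for X = number of successes in independent Bernoulli(p_i) trials."""
    dist = [1.0]  # dist[j] = P(exactly j successes among the trials seen so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, pj in enumerate(dist):
            nxt[j] += pj * (1 - p)  # this base is not an error
            nxt[j + 1] += pj * p    # this base is an error
        dist = nxt
    return sum(dist[k:])


def call_variant(base_qualities, alt_count, alpha=0.01):
    """Return (p_value, is_variant): can errors alone explain alt_count alternate bases?"""
    errors = [phred_to_error(q) for q in base_qualities]
    p_value = poisson_binomial_tail(errors, alt_count)
    return p_value, p_value < alpha


# 1,000 Q30 bases (error rate 0.001) cover the site; 10 show the alternate base,
# i.e. an allele frequency of 1%.
p, is_variant = call_variant([30] * 1000, alt_count=10)
print(f"p = {p:.2e}, variant: {is_variant}")
```

Because the test uses per-base qualities rather than a fixed frequency cutoff, a 1% allele can be called at high coverage with good base qualities.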

Default Parameters Set

QC:

  • FastQC: switch on.

Adapter Trimming:

  • Program for Short Reads: skewer.

Decontamination:

  • Reference genome: hg19;

  • Program for decontamination: decont.

Metagenomic classification:

  • Classification tool: kraken.

Assembly, Polishing and Alignment:

  • Variant Calling Program: lofreq;

  • Variant Annotation Program: SnpEff.
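For scripted or reproducible runs it can help to write the Default Parameters Set down as data. The dictionary below mirrors the defaults listed above; the key names and the `with_overrides` helper are hypothetical and do not correspond to a documented platform API.

```python
# Hypothetical keys mirroring the Default Parameters Set; the platform sets these
# through its interface, and the names it uses internally may differ.
DEFAULT_PARAMS = {
    "qc": {"fastqc": True},
    "adapter_trimming": {"short_reads_program": "skewer"},
    "decontamination": {"reference_genome": "hg19", "program": "decont"},
    "metagenomic_classification": {"tool": "kraken"},
    "assembly_polishing_alignment": {
        "variant_calling_program": "lofreq",
        "variant_annotation_program": "SnpEff",
    },
}


def with_overrides(defaults, overrides):
    """Return a copy of the defaults with selected per-section values replaced."""
    merged = {section: dict(values) for section, values in defaults.items()}
    for section, values in overrides.items():
        merged.setdefault(section, {}).update(values)
    return merged


# Example: the same defaults with quality control switched off.
params = with_overrides(DEFAULT_PARAMS, {"qc": {"fastqc": False}})
print(params["qc"])
```

Copying each section before updating keeps the shared defaults unchanged between runs.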

Results

Once you have selected the dataset, chosen the pipeline, and set all the parameters, you can start your analysis with the Run Analysis box, which shows the upload progress of your data. You will then be redirected to a page where you can keep track of which jobs are In Progress and which are Completed, and start a new analysis.

Clicking on your JobName opens a page where you can monitor all the processes involved in your analysis.

Now select the Results box on the right to look at the demo results obtained with the Default Parameters Set:

Sequence Counts

Sequence counts for each sample. Duplicate read counts are an estimate only.
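To make the distinction between total and duplicate reads concrete, the sketch below counts reads and exact duplicate sequences in FASTQ text. Unlike the estimate shown in the report, it counts exact matches over all reads; the function name and the toy input are illustrative.

```python
from collections import Counter


def sequence_counts(fastq_text):
    """Count total, unique and duplicate reads (exact sequence matches) in FASTQ text."""
    lines = fastq_text.strip().splitlines()
    sequences = lines[1::4]  # the sequence is the 2nd line of every 4-line record
    counts = Counter(sequences)
    total = len(sequences)
    unique = len(counts)
    return {"total": total, "unique": unique, "duplicates": total - unique}


# Toy input: r1 and r2 carry the same sequence.
FASTQ = """@r1
ACGTACGT
+
IIIIIIII
@r2
ACGTACGT
+
IIIIIIII
@r3
GGGGCCCC
+
IIIIIIII
"""

print(sequence_counts(FASTQ))  # {'total': 3, 'unique': 2, 'duplicates': 1}
```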

GC Content

GC content (or guanine-cytosine content) is the percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C). It expresses the proportion of G and C bases out of all four bases, the other two being:

  • adenine and thymine in DNA,

  • adenine and uracil in RNA.

This module measures the GC content across the whole length of each sequence in a file and compares it to a modelled normal distribution of GC content.
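The per-read calculation and the summary used for the comparison can be sketched as follows. This is a minimal illustration, not FastQC's code: the function names are hypothetical, symbols other than A, C, G, T and U (such as N) are ignored by assumption, and the normal distribution is simply described by the mean and standard deviation of the observed values.

```python
from collections import Counter
from statistics import mean, pstdev


def gc_content(seq):
    """Percentage of G or C among the A/C/G/T/U bases of a read (N and others ignored)."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        return 0.0
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def gc_distribution(sequences):
    """Histogram of per-read GC (rounded to whole percent), plus mean and SD.

    Requires at least one sequence (mean/pstdev fail on an empty list).
    """
    values = [gc_content(s) for s in sequences]
    histogram = Counter(round(v) for v in values)
    return histogram, mean(values), pstdev(values)


hist, gc_mean, gc_sd = gc_distribution(["ATGC", "GGCC"])
print(dict(hist), gc_mean, gc_sd)
```

A sample whose histogram departs strongly from a normal curve with that mean and SD may contain contamination or a mix of organisms with different GC content.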

Hg19 Alignment Stats

This graph is produced by the decont tool.

Kraken Results

This graph is produced by Kraken, a sequence classification tool whose main features are:

  • accuracy, comparable to the best sequence classification techniques;

  • speed, which far exceeds that of other classifiers and of abundance estimation programs. This speed advantage derives from the use of exact-match database queries of k-mers, rather than inexact alignment of sequences.

Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. It is used in analyses that aim to detect the presence of an organism, even at very low abundance.
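The exact-match k-mer approach described above can be sketched in a few lines. In Kraken's design, the database maps each k-mer to the lowest common ancestor (LCA) of the genomes that contain it, and a read is assigned to the leaf of the highest-scoring root-to-leaf path through the taxa its k-mers hit, with ties resolved by the LCA of the tied leaves. The toy taxonomy, genomes and k = 4 below are made up for illustration (Kraken's default k is 31), and the sketch omits the indexing techniques Kraken uses to make lookups fast.

```python
from collections import Counter
from functools import reduce

# Toy taxonomy (hypothetical IDs): child -> parent; the root (1) is its own parent.
PARENT = {1: 1, 2: 1, 10: 2, 11: 2}


def path_to_root(taxon):
    """List of taxa from `taxon` up to and including the root."""
    path = [taxon]
    while PARENT[taxon] != taxon:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa."""
    ancestors = set(path_to_root(a))
    return next(t for t in path_to_root(b) if t in ancestors)


def build_db(genomes, k):
    """Map every k-mer to the LCA of all genomes that contain it."""
    db = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            db[kmer] = lca(db[kmer], taxon) if kmer in db else taxon
    return db


def classify(read, db, k):
    """Return the taxon assigned to a read, or None if no k-mer is in the database."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        taxon = db.get(read[i:i + k])  # exact-match lookup, no alignment
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None  # unclassified
    # Score each hit taxon by summing the hits along its path to the root.
    scores = {t: sum(hits[a] for a in path_to_root(t)) for t in hits}
    best = max(scores.values())
    tied = [t for t, s in scores.items() if s == best]
    return reduce(lca, tied)  # ties between leaves go to their LCA


K = 4
GENOMES = {10: "ACGTACGTTT", 11: "ACGTACGAAA"}  # two toy species in genus 2
DB = build_db(GENOMES, K)

print(classify("TACGTTT", DB, K))   # 10
print(classify("CGTTACGA", DB, K))  # 2 (tie between species 10 and 11)
print(classify("GGGGG", DB, K))     # None
```

Because each k-mer lookup is a dictionary access, the cost per read grows with read length rather than with the number of reference genomes.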

Clicking on the Interactive Graphs option also lets you view your results as interactive graphs.

Finally, the Export box lets you download the results of your analysis as a PDF file.
