Date: Feb 6, 2024
Author: Shirley Li, xue.li37@tufts.edu
Class ID: Sp24-IDGH-1001-1-Bioinformatics
Canvas link: https://canvas.tufts.edu/courses/55751
Introduction to Metagenomics - Session 2
Learning Objectives
NCBI Database Proficiency: Develop skills to efficiently locate and interpret data on the NCBI database, including navigating to specific BioProject and SRA experiment pages.
Data Retrieval from Published Papers: Gain the ability to identify and extract relevant raw data and metadata from published scientific papers.
Metagenomic Sequencing Platforms: Learn about different sequencing platforms by analyzing their data characteristics, specifically focusing on Illumina and Nanopore technologies.
Taxonomic Analysis: Acquire practical experience in assigning taxonomic labels to sequencing reads using Kraken2 on Tufts Galaxy, and in converting and visualizing these labels with Krona.
Data Comparison and Interpretation: Enhance skills in comparing visualized data with NCBI SRA information and drawing conclusions about sample composition.
Exercise 2: NCBI Database Exploration
Objective:
Engage in a hands-on exercise to explore the NCBI database using specific SRA run IDs. Your task will involve navigating various sections of the database and applying your understanding of sequencing technologies to hypothetical research scenarios.
Instructions:
Utilize the SRA run ID to search the NCBI website. Explore the corresponding SRA, BioSample, and BioProject sections related to this SRA run.
Assignment Completion: Choose one SRA run and document your findings in the provided Google Spreadsheet: Exercise Spreadsheet.
A screenshot of the spreadsheet
Questions for Analysis
a. Mars Soil Sample Analysis: If you obtained a soil sample from Mars for identifying microorganisms and assembling their genomes, which sequencing technology would be optimal? Consider factors like the detection of novel organisms and the precision required for genome assembly. Discuss your choice, focusing on read length, accuracy, and cost implications.
b. Gut Microbiome Study: In researching the impact of dietary changes on the gut microbiome, what type of sample would you collect, and which sequencing technology would be most suitable? Provide your rationale for this choice.
Additional Resources: Hints for these questions can be found here.
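The run ID lookup from the first instruction can also be scripted. The sketch below only builds a query URL for NCBI's E-utilities ESearch endpoint; it does not send the request. The function name `sra_search_url` is a hypothetical helper introduced for illustration, and the run ID is the one used later in this session.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def sra_search_url(run_id: str) -> str:
    """Return an ESearch URL that looks up `run_id` in the SRA database.

    Hypothetical helper for illustration; it only builds the URL.
    """
    params = {"db": "sra", "term": run_id, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = sra_search_url("ERR12302112")
print(url)
```

Opening the printed URL in a browser returns the matching SRA record identifiers. The BioSample and BioProject pages are still easiest to reach by following the links on the NCBI website, as described above.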
Exercise 3: Taxonomy Assignment and Interpretation
Objective:
Use Kraken2 for taxonomy assignment and visualize the results with a Krona plot. Interpret and present the results.
Instructions:
[!NOTE] The tools we will use for this analysis are:
Download and Extract Reads in FASTA/Q
Kraken2
filter
sort
Log in to your Galaxy account.
Name the history “Session 2 Metagenomics-ERR12302112” by double-clicking “Unnamed history”.
Now let’s start the analysis:
Under tools on the far left of the page, search for Download and Extract Reads in FASTA/Q format from NCBI SRA, run the tool with the following parameters:
- Accession: ERR12302112
- Click Execute
Kraken2: assign taxonomic labels to sequencing reads with the following parameters:
- Single or paired end: Single
- Input Sequences: the output from the last step. Ex: 1.ERR12302112 (fastq-dump)
- Click Create Report, then set Print a report with aggregrate counts/clade to file to Yes
- Select a Kraken2 database: Minikraken2 v2
Note that this step creates two output files: the per-read classification and the report.
Filter data on any column using simple expressions with the following parameters:
- Filter: the report output from the last step. Ex: Report: Kraken2 on data 1
- With following condition: c4=="S"
This keeps only the rows whose fourth column (the rank code) is S, which stands for species.
Sort data in ascending or descending order with the following parameters:
- Sort Dataset: the output file from filter. Ex: Filter on data 2
- with flavor: Numerical sort
- everything in: Descending order
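For reference, the filter and sort steps can be reproduced outside Galaxy in a few lines of Python. This is a minimal sketch, not what Galaxy runs internally. It assumes the standard Kraken2 report layout (tab-separated: percentage, clade reads, direct reads, rank code, NCBI taxonomy ID, name) and assumes the sort is applied to the first column, the percentage. The report lines below are made up for illustration, and `top_species` is a hypothetical helper name.

```python
import csv
import io

# Made-up lines in Kraken2 report layout (tab-separated):
# percent, clade reads, direct reads, rank code, NCBI taxid, name
SAMPLE_REPORT = (
    "40.00\t400\t400\tU\t0\tunclassified\n"
    "60.00\t600\t10\tR\t1\troot\n"
    "55.00\t550\t20\tD\t2\t  Bacteria\n"
    "12.50\t125\t125\tS\t1280\t      Staphylococcus aureus\n"
    "30.00\t300\t300\tS\t562\t      Escherichia coli\n"
    "5.00\t50\t50\tS\t1423\t      Bacillus subtilis\n"
)


def top_species(report_text, n=3):
    """Keep species-level rows (rank code "S") and sort them by percentage, descending."""
    rows = csv.reader(io.StringIO(report_text), delimiter="\t")
    species = [
        (float(row[0]), row[5].strip())   # (percentage, species name)
        for row in rows
        if len(row) >= 6 and row[3] == "S"  # same test as c4=="S" in Galaxy
    ]
    species.sort(key=lambda item: item[0], reverse=True)
    return species[:n]


for percent, name in top_species(SAMPLE_REPORT):
    print(f"{percent:6.2f}  {name}")
```

The rank-code test mirrors the Galaxy filter condition, and the descending numerical sort puts the most abundant species first.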
Take a look at the output file, the first few lines should be like this:
In-class assignment:
Divide into two or three teams. Each team should select one SRA run from the provided Google Spreadsheet, then repeat the steps outlined above to identify the three most prevalent species. Research one or two of these species using Google, and check whether your findings are consistent with the sample description. Each team will be given five minutes to present their findings. An example report can be found here.
[!WARNING]
Make sure you create a fresh history and give it a distinct name for this analysis.
Click the “+” button on the top right to create a new history.
Exercise 4: Taxonomy Visualization
Objectives:
This exercise uses Krona to create interactive visualizations of taxonomic data, highlighting the tool’s effectiveness in representing complex hierarchical structures. It also involves a comparison with the Krona view on NCBI, assessing differences in visualization techniques and data representation.
Instructions:
Switch back to the session “Session 2 Metagenomics-ERR12302112”.
Krakentools: Convert kraken report file to krona text file with the following parameters:
Kraken report file: The report output from Kraken2. Ex: Report: Kraken2 on data 1
This will generate an output called “Krakentools: Convert kraken report file on data 2”
Visualize with Krona: Visualize any hierarchical data with the following parameters:
Select input file: Krakentools: Convert kraken report file on data 2.
This will generate an output called “Krona on data 5: HTML”
Compare your Krona plot with the one on NCBI SRA. Link is here.
Click Show Krona View
NCBI uses Sequence Taxonomic Analysis Tool (STAT), a scalable k-mer-based tool for fast assessment of taxonomic diversity intrinsic to submissions, independent of metadata.
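To see what the report-to-Krona conversion step does conceptually, the sketch below turns a Kraken2 report into Krona text lines, where each line is a read count followed by the tab-separated lineage. It is a simplified illustration, not the KrakenTools implementation: it assumes the report indents taxon names by two spaces per level, uses the direct read count (third column), and does not add the rank prefixes that the real tool may emit. The report lines are made up and `report_to_krona_lines` is a hypothetical name.

```python
# Made-up lines in Kraken2 report layout (tab-separated):
# percent, clade reads, direct reads, rank code, NCBI taxid, indented name
SAMPLE_REPORT = (
    "14.50\t145\t145\tU\t0\tunclassified\n"
    "85.50\t855\t5\tR\t1\troot\n"
    "85.00\t850\t50\tD\t2\t  Bacteria\n"
    "80.00\t800\t0\tP\t1224\t    Pseudomonadota\n"
    "80.00\t800\t800\tS\t562\t      Escherichia coli\n"
)


def report_to_krona_lines(report_text):
    """Convert report text to Krona text lines: count, then the lineage, tab-separated."""
    lineage = []  # taxon names on the path from the top level to the current row
    lines = []
    for raw in report_text.splitlines():
        fields = raw.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        direct_reads = int(fields[2])
        name_field = fields[5]
        name = name_field.strip()
        # Assumption: two leading spaces per taxonomic level.
        depth = (len(name_field) - len(name_field.lstrip(" "))) // 2
        del lineage[depth:]   # drop deeper taxa left over from earlier rows
        lineage.append(name)
        if direct_reads > 0:  # only taxa with reads assigned directly to them
            lines.append("\t".join([str(direct_reads)] + lineage))
    return lines


for line in report_to_krona_lines(SAMPLE_REPORT):
    print(line)
```

Using direct rather than clade counts avoids counting the same read at several levels, since Krona sums the child counts itself when it draws the chart.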
References
https://bisonnet.bucknell.edu/files/2021/05/Kraken2-Help-Sheet.pdf
https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-020-00900-2
https://jddtonline.info/index.php/jddt/article/view/5433
https://www.sciencedirect.com/science/article/pii/S094450132200194X?via%3Dihub
https://benlangmead.github.io/aws-indexes/k2