Bioinformatics Tutorial Library

Hands-on tutorials using real public datasets. Every tutorial runs end-to-end with copy-paste commands, actual output examples, and figures from the original analyses.

30 Tutorials Real NCBI/ENA Datasets HPC-Ready Scripts

Transcriptomics

📈
Bash R Intermediate
RNA-seq Analysis with a Genome

QC, HISAT2 splice-aware mapping, featureCounts read quantification, and differential expression with DESeq2.

Arabidopsis atrx-1 mutant — PRJNA348194
~120 min HISAT2, featureCounts, DESeq2
Start Tutorial
🌍
Bash R Intermediate
De Novo Transcriptomics: Trinity + DESeq2

Assemble a transcriptome without a reference genome using Trinity, quantify with Salmon, and run differential expression with DESeq2. For non-model organisms.

Coral bleaching RNA-seq — Matz et al. 2013 (PRJNA189030)
~150 min Trinity, Salmon, tximport, DESeq2
Start Tutorial
📉
R Intermediate
DESeq2 & edgeR: Differential Expression Deep Dive

Design matrices, Wald vs LRT tests, multi-factor paired designs, batch correction, edgeR QLF, GSEA with fgsea, and publication-quality volcano/heatmap figures.

Human airway smooth muscle — Himes et al. 2014 (PRJNA229998)
~120 min DESeq2, edgeR, limma, fgsea
Start Tutorial
📊
R Intermediate
WGCNA Gene Co-expression Networks

Build weighted gene correlation networks, detect co-expression modules, and relate modules to traits.

Maize Ligule Development (Johnston et al. 2014)
~90 min WGCNA, ggplot2
Start Tutorial
🧪
Python Intermediate
Single-Cell RNA-seq with Scanpy (Python)

QC, normalization, HVG selection, PCA, UMAP, Leiden clustering, marker gene identification, cell type annotation, doublet detection with Scrublet, and pseudo-bulk DE.

Human PBMCs 3k — 10x Genomics (healthy donor)
~120 min Scanpy, AnnData, Scrublet, pydeseq2
Start Tutorial
🧬
Bash R Advanced
Single-Cell RNA-seq: Cell Ranger + Seurat

10x Chromium data processing with Cell Ranger, then clustering, annotation and visualization with Seurat.

C. robusta tunicate — SRR8111691-4
~150 min Cell Ranger, Seurat
Start Tutorial

Chromatin & Epigenomics

🧪
Bash R Intermediate
ChIP-seq: Peak Calling & Motif Analysis

Bowtie2 alignment, Picard deduplication, MACS2 peak calling, IDR reproducibility, deepTools QC, HOMER motif discovery, ChIPseeker annotation, and DiffBind differential binding.

ENCODE CTCF ChIP-seq K562 — ENCSR000EGM
~150 min Bowtie2, MACS2, HOMER, deepTools, DiffBind
Start Tutorial
📒
Bash Intermediate
ATAC-seq Chromatin Accessibility

Bowtie2 alignment, organelle read removal, MACS2 peak calling, and bigWig generation for IGV visualization.

Arabidopsis BRM chromatin — PRJNA351855
~90 min Bowtie2, MACS2, samtools
Start Tutorial

Variant Calling, GWAS & Cancer Genomics

🧬
Bash Advanced
GATK Best Practices SNP Calling

Full GATK4 pipeline: BWA-MEM alignment, BQSR, HaplotypeCaller, joint genotyping, and VQSR filtering.

A. halleri populations — PRJEB18647
~180 min BWA, GATK4, samtools
Start Tutorial
🔍
Bash Python Intermediate
SnpEff & SnpSift: Variant Annotation & Filtering

Build custom databases for any organism, annotate VCFs with functional impact, understand HIGH/MODERATE/LOW tiers, fix common pitfalls, SnpSift filter expressions, and automated pipeline scripts.

Rice 3000 Genomes — PRJEB6180 (O. sativa RAP-DB)
~120 min SnpEff, SnpSift, bcftools
Start Tutorial
🏗
Bash Python Intermediate
Universal Variant Filtering Pipeline

VQSR for human WGS, hard filters for non-model organisms & small cohorts, RNA-seq variant filters, bcftools filtering for any caller, genotype-level filtering, automated Python pipeline, and Ti/Tv validation.

Works with any VCF — GATK, FreeBayes, DeepVariant, Strelka2
~100 min GATK4 VQSR, bcftools, vcftools
Start Tutorial
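
As a rough illustration of the Ti/Tv validation step named in this card, the sketch below counts transitions (A<->G, C<->T) and transversions from REF/ALT pairs. The function name `titv_ratio` and the input shape are assumptions made for this example, not taken from the tutorial; in practice the pairs would come from a VCF parser or from `bcftools stats`. A ratio near 2 is often quoted for human whole-genome SNP sets, so a much lower value may hint at excess false positives.

```python
# Minimal Ti/Tv sketch. Input format (list of (REF, ALT) strings) is an
# assumption for illustration; real pipelines would read these from a VCF.
BASES = set("ACGT")
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(snps):
    """Return transitions / transversions for single-base substitutions."""
    ti = 0
    tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, multi-base records, and non-ACGT symbols.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


example_snps = [
    ("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"),  # four transitions
    ("A", "C"), ("G", "T"),                          # two transversions
    ("AT", "A"),                                     # indel, ignored
]
print(titv_ratio(example_snps))
```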
📊
Bash R Advanced
GWAS: Genome-Wide Association Study

PLINK2 QC, population stratification PCA, kinship matrix, GEMMA linear mixed model, Manhattan & QQ plots, LD clumping, and biomaRt gene annotation.

Arabidopsis 1001 Genomes flowering time — Atwell et al. 2010
~150 min PLINK2, GEMMA, qqman, biomaRt
Start Tutorial
🧬
Bash Python Advanced
Somatic Variant Calling: Mutect2 Tumor-Normal

GATK Mutect2 with matched normal, Panel of Normals, contamination estimation, FilterMutectCalls, VEP annotation, and COSMIC mutational signature analysis with SigProfiler.

TCGA-A8-A08B breast cancer WES — GDC Data Portal
~180 min GATK Mutect2, VEP, SigProfilerExtractor
Start Tutorial

Genome Assembly & Annotation

🏗
Bash Intermediate
Genome Assembly: k-mer Analysis to Assembly

GenomeScope k-mer profiling to estimate size and ploidy, then de novo assembly with SPAdes (prokaryote) or Canu (eukaryote).

Jellyfish k-mer + SPAdes/Canu workflows
~120 min Jellyfish, GenomeScope, SPAdes, Canu
Start Tutorial
🏷
Bash Advanced
Genome Annotation with MAKER2

Evidence-based gene prediction using MAKER2, training AUGUSTUS with BUSCO genes, and evaluating models with AED scores.

H. glycines soybean cyst nematode genome
~180 min MAKER2, AUGUSTUS, BUSCO, SNAP
Start Tutorial
🧱
Bash Intermediate
Secreted Protein Prediction: SignalP & TMHMM

Predict signal peptides, transmembrane domains, and subcellular localization using deep learning tools on nematode proteomes.

Plant-parasitic nematode proteome
~60 min SignalP 6.0, TMHMM, DeepLoc 2.0
Start Tutorial

Metagenomics & Microbiome

🌿
Bash Intermediate
Amplicon Metagenomics with QIIME2

16S amplicon workflow: DADA2 denoising, ASV table, taxonomy, alpha/beta diversity, and PERMANOVA statistical testing.

Arabidopsis root microbiome — PRJEB15671
~120 min QIIME2, DADA2, q2-diversity
Start Tutorial
🌍
Bash Python Advanced
MAG Binning: MetaBat2 + CheckM + GTDB-Tk

MEGAHIT co-assembly, Bowtie2 multi-sample coverage, MetaBat2 binning, CheckM quality, GTDB-Tk taxonomy, dRep dereplication, and Prokka gene annotation.

HMP human gut metagenome — PRJNA46333
~180 min MEGAHIT, MetaBat2, CheckM, GTDB-Tk, dRep
Start Tutorial
🦠
Bash Intermediate
Taxonomic Classification with Kraken2

Build a custom Kraken2 database, classify shotgun metagenomic reads, and visualize results with Pavian/Krona.

Nematode clade RNA-seq — custom DB build
~90 min Kraken2, Bracken, Krona
Start Tutorial

Comparative Genomics & Evolution

🔗
Bash Advanced
Gene Orthology & Synteny Analysis

OrthoFinder for ortholog calling between multiple genomes, then i-ADHoRe for synteny block detection and circular synteny plots.

Nematode genomes (G. pallida, H. glycines, M. hapla)
~120 min OrthoFinder, i-ADHoRe, Opscan
Start Tutorial
🔬
Bash Advanced
Positive Selection with PAML / CodeML

OrthoFinder orthogroups to codon alignments, PAML CodeML site models, and likelihood ratio tests to identify positively selected genes.

E. coli + Shigella genomes (5 genomes)
~150 min OrthoFinder, PAML, pal2nal, clustalo
Start Tutorial
📈
R Intermediate Live Sheets
Linkage Map Construction

Build a genetic map from scratch with live interactive Excel-style spreadsheets showing every formula: recombination frequency, LOD score, Kosambi vs Haldane mapping functions, marker ordering, and linkage group assignment. Every calculation is transparent and editable in the browser.

Soybean RIL population — R/qtl built-in; ASMap package
~120 min R/qtl, ASMap, JoinMap
Start Tutorial
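
The quantities listed in this card can be sketched in a few lines. The example below is a minimal illustration, not the tutorial's own spreadsheet logic: it converts a recombination frequency r to map distance with the Haldane and Kosambi functions, and computes a two-point LOD score under the simplifying assumption that recombinant and non-recombinant gametes can be counted directly (as in a backcross). All function names are hypothetical.

```python
import math


def haldane_cm(r):
    """Haldane map distance in cM (assumes no crossover interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM (allows for some interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def lod_score(recombinants, total):
    """Two-point LOD: log10 likelihood at r = k/n versus r = 0.5.

    Assumes directly countable gametes; other cross types need a
    different likelihood.
    """
    r = recombinants / total
    if r >= 0.5:
        return 0.0  # no evidence for linkage
    log_lik = (total - recombinants) * math.log10(1 - r)
    if recombinants > 0:
        log_lik += recombinants * math.log10(r)
    return log_lik + total * math.log10(2)


print(round(haldane_cm(0.1), 2), round(kosambi_cm(0.1), 2))
print(round(lod_score(10, 100), 2))
```

For small r the two mapping functions nearly agree; they diverge as r approaches 0.5, which is why the choice matters mostly for sparse maps.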
🏠
R Intermediate
QTL Mapping Across Populations

F2, RIL, NAM, and GWAS populations using R/qtl2 interval mapping, composite interval mapping, LOD threshold permutation, raw data walkthrough in Excel, GAPIT mixed model, multi-environment QTL, and MAS marker identification.

Hyper F2 mouse cross — Broman et al. (R/qtl2 package)
~150 min R/qtl2, GAPIT3, Excel
Start Tutorial
🔁
Bash Python R Advanced
Whole Genome Duplication & Polyploidy Analysis

WGD concepts explained, Ks molecular clock plots, Gaussian peak fitting, MCScanX synteny blocks, WGDI dot plots, subgenome phasing for allopolyploids, fractionation bias, and multi-species Ks comparison.

Arabidopsis thaliana TAIR10 & Brassica napus Darmor-bzh
~150 min wgd, MCScanX, WGDI, SubPhaser
Start Tutorial
🔬
R Intermediate
Label-Free Proteomics: MaxQuant & Perseus

MaxQuant LFQ database search, Perseus statistical workflow, MNAR imputation, limma moderated t-test, volcano plots, GO enrichment, and STRING protein interaction network.

UPS1 spike-in benchmark — Cox et al. 2014 (PRIDE PXD000279)
~120 min MaxQuant, Perseus, limma, STRINGdb
Start Tutorial

Phylogenetics Deep Dive (8-Module Series)

🌎
Bash Advanced
Module 1 — Substitution Models, ML & Bayesian MCMC

Site-heterogeneous models (C60, PMSF, CAT-GTR), PhyloBayes, MrBayes MCMC diagnostics, stepping-stone marginal likelihood, and model adequacy testing.

Opisthokonta 248-gene phylogenomics — Whelan et al. 2017 (PRJNA385600)
~120 min IQ-TREE2, PhyloBayes, MrBayes
Start Module 1
📈
Bash R Advanced
Module 2 — Gene/Species Trees, Networks & Topology Tests

Concordance factors, ASTRAL coalescent, phylogenetic networks (SNaQ, HyDe), AU/SH/KH topology tests, and long branch attraction mitigation.

Angiosperm plastome gene trees — Wickett et al. 2014 (PRJNA196530)
~120 min ASTRAL, IQ-TREE2, PhyloNet, SNaQ
Start Module 2
🕐
Bash Advanced
Module 3 — Divergence Time Estimation

Strict & relaxed molecular clocks, fossil calibration strategy, BEAST2 node dating, tip dating with fossilized birth-death, MCMCTree, and treePL.

Mammal timetree — dos Reis et al. 2012 (32 nuclear genes, 16 calibrations)
~150 min BEAST2, MCMCTree, treePL, Tracer
Start Module 3
🧬
R Bash Advanced
Module 4 — Ancestral States & Molecular Selection

Ancestral state reconstruction (Mk, BM, OU, BioGeoBEARS), ancestral sequence inference, dN/dS, PAML branch-site models, HyPhy BUSTED/MEME, and convergent evolution.

Passerine beak morphology — Cooney et al. 2017 (259 species phenome)
~120 min phytools, BioGeoBEARS, PAML, HyPhy
Start Module 4
📈
R Bash Advanced
Module 5 — Diversification & Macroevolution

Lineage diversification (BAMM, RPANDA, ClaDS), state-dependent diversification (BiSSE, HiSSE, SecSSE), trait diversification (QuaSSE), mass extinction detection (CoMET, PyRate), and adaptive radiation.

Cetacean body-size evolution — Slater et al. 2017 (87 species, BAMM landmark dataset)
~120 min BAMM, RPANDA, diversitree, PyRate
Start Module 5
🔗
Bash Advanced
Module 6 — Comparative Genomics & Synteny

Whole-genome alignment (minimap2, MUMmer, Cactus), synteny (MCScanX, GENESPACE), WGD Ks plots, CAFE5 gene family evolution, chromosomal rearrangements, and conserved regulatory elements (phastCons, phyloP).

Brassicaceae chromosome evolution — Brassica napus genome (GCA_000686985.2)
~150 min MUMmer, MCScanX, GENESPACE, CAFE5
Start Module 6
🌎
Bash R Advanced
Module 7 — Population Genomics & Phylogeography

PSMC/MSMC demographic inference, D-statistics introgression, Bayesian phylogeography (BEAST2), species delimitation (BPP, mPTP), phylogenetic comparative methods, co-phylogenetics, and ancient DNA (mapDamage).

Brown bear phylogeography — Barlow et al. 2018 (PRJEB22764, 64 genomes)
~150 min PSMC, BEAST2, BPP, mPTP, mapDamage
Start Module 7
🚀
Bash R Advanced
Module 8 — Workflows, Simulation & Tree Visualization

Snakemake/Nextflow phylogenomics pipelines, RevBayes custom models, simulation for validation (Seq-Gen, SimPhy, SLiM, msprime), and publication-quality visualization (ggtree, iTOL, toytree).

1KP thousand-plant transcriptome — Wickett et al. 2014 (workflow benchmark dataset)
~120 min Snakemake, RevBayes, ggtree, msprime
Start Module 8

Population Genomics

🧬
Bash R Intermediate
VCF Filtering & Population Stats

Filter multi-sample VCFs with VCFtools (missingness, MAF, LD), compute basic stats (Tajima's D, π, Fst), and run PCA with PLINK for a quick population structure overview.

Arabidopsis thaliana 1001 genomes — PRJNA273563
~90 min VCFtools, PLINK 2.0, bcftools
Start Tutorial
🌎
Bash R Intermediate
Population Structure with ADMIXTURE

LD pruning, cross-validation to choose K, ADMIXTURE ancestry proportion inference, and publication-quality structure bar plots with ggplot2.

Human HGDP panel — Bergström et al. 2020 (929 individuals, 52 populations)
~90 min ADMIXTURE, PLINK 2.0, ggplot2
Start Tutorial
🕐
Bash Python Advanced
Demographic History: PSMC & SMC++

Reconstruct historical effective population size (Ne) trajectories from individual genome sequences using PSMC (pairwise) and SMC++ (multi-sample, more recent history).

Snow leopard whole-genome — Prum et al. / PRJNA304161
~120 min PSMC, SMC++, bcftools, samtools
Start Tutorial
🔬
Bash R Advanced
Fst & Selective Sweep Detection

Per-site and windowed Fst, Tajima's D, nucleotide diversity (π), neutrality tests, composite likelihood ratio sweeps (SweeD), and extended haplotype homozygosity (XP-EHH, iHS) with rehh.

Maize domestication sweeps — Hufford et al. 2012 (PRJNA178613)
~120 min VCFtools, SweeD, rehh, ggplot2
Start Tutorial
👥
Bash R Advanced
Introgression: ABBA-BABA & D-statistics

Patterson's D-statistics, f3 and f4 statistics, f-branch introgression fractions, ADMIXTOOLS2 graph fitting, and fd statistics for detecting gene flow between populations.

Neanderthal introgression — Meyer et al. 2012 / modern human SGDP panel
~120 min ADMIXTOOLS2, Dsuite, R
Start Tutorial
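
For orientation, Patterson's D compares counts of two site patterns on a four-taxon tree (((P1, P2), P3), O): D = (ABBA - BABA) / (ABBA + BABA). The sketch below is a toy version that assumes one allele per population per site; the function name and input format are hypothetical, and tools such as Dsuite or ADMIXTOOLS2 work from allele frequencies and add block-jackknife significance tests.

```python
def patterson_d(sites):
    """Compute Patterson's D from per-site alleles (P1, P2, P3, outgroup).

    Each site is a 4-item sequence such as "AGGA". The outgroup allele is
    treated as ancestral (A); the other allele is derived (B).
    """
    abba = 0
    baba = 0
    for p1, p2, p3, out in sites:
        if len({p1, p2, p3, out}) != 2:
            continue  # keep biallelic sites only
        if p1 == out and p2 == p3 and p2 != out:
            abba += 1  # P2 and P3 share the derived allele
        elif p2 == out and p1 == p3 and p1 != out:
            baba += 1  # P1 and P3 share the derived allele
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


# Six ABBA sites, two BABA sites, one uninformative site.
toy_sites = ["AGGA"] * 6 + ["GAGA"] * 2 + ["AAGA"]
print(patterson_d(toy_sites))
```

A positive D points to excess allele sharing between P2 and P3, a negative D to excess sharing between P1 and P3; a value near zero is what incomplete lineage sorting alone would give.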
📈
Bash R Intermediate
Population Graph & Migration with TreeMix

Build maximum-likelihood population trees with migration edges, determine the optimal number of migration events, and visualize population graphs with residual plots.

Human HGDP populations — Pickrell & Pritchard 2012 (TreeMix original paper dataset)
~90 min TreeMix, PLINK 2.0, R optM
Start Tutorial

Advanced Genetics & Epigenomics

🌎
Bash R Advanced Live Sheets
Population Genetics: Fst, π, Tajima's D

Per-site and windowed Fst, nucleotide diversity (π), Tajima's D, AMOVA, and selection scans (iHS, XP-EHH). Every formula shown in a live interactive spreadsheet. Covers vcftools, PLINK, and rehh pipelines.

Rice 3K diversity panel — 3K RGP; Chr04 selection sweep region
~120 min vcftools, PLINK, selscan, rehh, R
Start Tutorial
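
Two of the statistics in this card can be written out compactly. The sketch below assumes the input is a list of derived-allele counts per segregating site in a sample of n chromosomes (a hypothetical format chosen for brevity); it returns π summed over sites, so divide by the window length for a per-base value. The constants follow Tajima's (1989) definition of D.

```python
import math


def nucleotide_diversity(allele_counts, n):
    """Mean pairwise differences, summed over sites, for n chromosomes."""
    return sum(2 * k * (n - k) / (n * (n - 1)) for k in allele_counts)


def tajimas_d(allele_counts, n):
    """Tajima's D from per-site allele counts in a sample of n chromosomes."""
    counts = [k for k in allele_counts if 0 < k < n]  # segregating sites only
    s = len(counts)
    if n < 4 or s == 0:
        raise ValueError("need n >= 4 and at least one segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    pi = nucleotide_diversity(counts, n)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Excess of rare variants (all singletons) versus intermediate frequencies.
print(tajimas_d([1] * 5, 10))
print(tajimas_d([5] * 5, 10))
```

The first call gives a negative D (as expected after a sweep or expansion) and the second a positive D (as expected under balancing selection or population structure).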
🧲
Bash R Intermediate Live Sheets
Haplotype Phasing: SHAPEIT4 & WhatsHap

Statistical phasing (SHAPEIT4), read-backed phasing (WhatsHap), long-read HiFiasm phasing, haplotype block detection, IDR phase quality evaluation, and breeding applications. D-prime LD calculator included.

Rice 3K chr04 multi-sample VCF; PacBio HiFi example
~120 min SHAPEIT4, WhatsHap, HiFiasm, PLINK2
Start Tutorial
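
A D-prime calculation of the kind this card mentions might look like the sketch below. It assumes phased haplotype frequencies are already available: `p_ab` is the frequency of the haplotype carrying allele A at the first locus and allele B at the second, and `p_a`, `p_b` are the marginal allele frequencies. The function name and return layout are illustrative only.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci from haplotype frequencies."""
    d = p_ab - p_a * p_b
    # D' scales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


print(ld_stats(0.5, 0.5, 0.5))  # complete LD
print(ld_stats(0.3, 0.5, 0.5))  # weak LD
```

D' reaches 1 whenever at least one of the four haplotypes is absent, while r² reaches 1 only when the two loci carry the same information, so the two measures can disagree sharply for rare alleles.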
📈
Bash R Advanced Live Sheets
CNV Detection from WGS

Read-depth (CNVnator), split-read/PE (Manta), and cohort-level (SMOOVE/LUMPY) CNV calling. Depth normalisation, GC bias correction, caller merging, duphold genotyping, and genome-wide visualisation. Live depth-ratio calculator included.

WGS 30x diploid sample; multi-sample rice cohort
~120 min CNVnator, Manta, SMOOVE, duphold, R
Start Tutorial
🏵
Bash R Advanced Live Sheets
Advanced GWAS: Mixed Models & Fine-mapping

LD pruning, population stratification PCA, genomic inflation λ, GEMMA LMM, GAPIT3 MLM, Bonferroni vs FDR thresholds (live calculator), Manhattan and QQ plots, SuSiE credible sets, and multi-trait mvLMM.

Rice 3K panel — IRRI 3K RGP genotypes + agronomic traits
~150 min PLINK2, GEMMA, GAPIT3, susieR, R
Start Tutorial
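
To make the Bonferroni versus FDR contrast in this card concrete, here is a small sketch with hypothetical helper names. Bonferroni divides alpha by the number of tests; the Benjamini-Hochberg procedure instead finds the largest ranked p-value p(k) that satisfies p(k) <= (k/m) * alpha and declares everything at or below it significant.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff controlling the family-wise error rate."""
    return alpha / n_tests


def bh_threshold(p_values, alpha=0.05):
    """Largest p-value rejected by Benjamini-Hochberg (0.0 if none pass)."""
    m = len(p_values)
    cutoff = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / m * alpha:
            cutoff = p  # keep the largest p-value that passes its rank cutoff
    return cutoff


# Toy p-values, not from any real association scan.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonf = bonferroni_threshold(len(toy_p))
bh = bh_threshold(toy_p)
print(bonf, sum(p <= bonf for p in toy_p))
print(bh, sum(p <= bh for p in toy_p))
```

On this toy list Bonferroni keeps one hit and Benjamini-Hochberg keeps two. With millions of correlated SNPs the gap is usually wider, which is part of why GWAS thresholds are often set from the effective number of independent tests rather than the raw marker count.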
🧪
Bash R Advanced Live Sheets
Epigenomics: ATAC-seq & ChIP-seq

Full ATAC-seq and ChIP-seq pipeline: Tn5 shift, fragment-size QC, TSS enrichment, MACS3 peak calling, IDR reproducibility, HOMER/MEME motif analysis, DESeq2 differential accessibility, and deepTools heatmaps. Live FRiP QC calculator.

ENCODE ATAC-seq; ENCODE H3K27ac ChIP-seq
~150 min MACS3, HOMER, MEME, deepTools, DESeq2
Start Tutorial
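
FRiP (fraction of reads in peaks) is the number of mapped reads overlapping any called peak divided by the total number of mapped reads. The sketch below is a deliberately naive illustration using BED-style half-open intervals; the function name and tuple layout are assumptions, and on real data the counting would normally be done with tools such as bedtools or featureCounts rather than a nested loop.

```python
def frip(reads, peaks):
    """Fraction of reads overlapping at least one peak.

    reads, peaks: lists of (chrom, start, end) with half-open coordinates.
    """
    if not reads:
        raise ValueError("no reads supplied")
    peaks_by_chrom = {}
    for chrom, start, end in peaks:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))
    in_peaks = 0
    for chrom, start, end in reads:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < p_end and p_start < end
               for p_start, p_end in peaks_by_chrom.get(chrom, [])):
            in_peaks += 1
    return in_peaks / len(reads)


toy_peaks = [("chr1", 100, 200), ("chr2", 50, 80)]
toy_reads = [
    ("chr1", 90, 110),   # overlaps chr1 peak
    ("chr1", 200, 250),  # starts exactly at the peak end, no overlap
    ("chr1", 150, 160),  # inside chr1 peak
    ("chr2", 10, 40),    # upstream of chr2 peak
    ("chr2", 70, 120),   # overlaps chr2 peak
]
print(frip(toy_reads, toy_peaks))
```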

Nature Methods & Protocols 2024–2025

🧬
Python Bash Intermediate Nature Methods 2024
FastOMA: Orthology Inference at Scale

Infer orthologs and hierarchical orthologous groups (HOGs) across 1,000+ species in hours. Uses pyHam for HOG analysis and single-copy orthologs for species tree inference with IQ-TREE.

UniProt plant proteomes (Arabidopsis, Rice, Maize)
~40 min FastOMA, pyHam, MAFFT, IQ-TREE2
Start Tutorial
🤖
Python Advanced Nature Methods 2024
scGPT: Foundation Model for Single-Cell Multi-Omics

GPT-style transformer pre-trained on 33M single-cell profiles. Zero-shot cell annotation, multi-batch integration, gene regulatory network inference from attention weights, and perturbation response prediction.

PBMC 3k + CELLxGENE pre-trained weights
~60 min scGPT, Scanpy, PyTorch, HuggingFace
Start Tutorial
🌏
Python Intermediate Nature Methods 2024
SpatialData: Universal Spatial Omics Framework

Unified Zarr-based format for Visium, Xenium, MERFISH, CODEX, and Slide-seq. Multi-slide alignment, spatially variable gene detection with Moran's I, neighbourhood enrichment, and ligand-receptor analysis with Squidpy.

10x Visium mouse brain; 10x Xenium demo
~50 min SpatialData, spatialdata-io, Squidpy, Scanpy
Start Tutorial
📈
Bash Python R Nature Methods 2024 (LRGASP)
Long-Read RNA-Seq: Transcript Discovery & DTU

Full-length isoform sequencing with Oxford Nanopore and PacBio HiFi. Dorado basecalling, Minimap2 splice-aware alignment, IsoQuant/FLAMES/StringTie2 transcript assembly, and differential transcript usage with DEXSeq/Swish.

LRGASP benchmark data (ONT + PacBio HiFi)
~60 min Dorado, Minimap2, IsoQuant, FLAMES, DEXSeq
Start Tutorial
🏅
Python Intermediate Method of the Year 2024
Spatial Proteomics: CODEX, IMC & Phenocycler

Analyse multiplexed tissue imaging data — 40–100 proteins with single-cell spatial resolution. DeepCell Mesmer cell segmentation, scimap phenotyping, neighbourhood enrichment, cell-cell proximity analysis, and spatial scatter visualisation.

CODEX FFPE tissue (tumour microenvironment)
~55 min DeepCell, scimap, AnnData, Scanpy
Start Tutorial
🔗
R Advanced Seurat + Signac
Single-Cell Multiome: Joint ATAC + RNA Analysis

Simultaneously profile chromatin accessibility and gene expression in the same cell. Cell Ranger ARC, Seurat WNN integration, MACS2 per-cluster peak calling, chromVAR TF motif activity, and LinkPeaks cis-regulatory inference.

10x Multiome PBMC (10x Genomics demo)
~60 min Seurat, Signac, chromVAR, MACS2, JASPAR2020
Start Tutorial
Python Bash R AI-Assisted
Vibe Science ✨ — Claude, Codex & Gemini CLI for Bio

Stop hand-writing boilerplate. Use Claude, OpenAI Codex, and Gemini CLI to generate pipelines, debug cryptic errors, and analyse genomes at superhuman speed. Includes real prompts, ethical guardrails, and commentary on how this is changing science.

All bioinformatics use cases
~35 min Claude CLI, Codex CLI, Gemini CLI
Start Tutorial