Tutorial Library
Hands-on tutorials using real public datasets. Every tutorial runs end-to-end with copy-paste commands, actual output examples, and figures from the original analyses.
Transcriptomics
QC, HISAT2 splice-aware mapping, featureCounts read quantification, and differential expression with DESeq2.
Arabidopsis atrx-1 mutant — PRJNA348194

Assemble a transcriptome without a reference genome using Trinity, quantify with Salmon, and run differential expression with DESeq2. For non-model organisms.
Coral bleaching RNA-seq — Matz et al. 2013 (PRJNA189030)

Design matrices, Wald vs LRT tests, multi-factor paired designs, batch correction, edgeR QLF, GSEA with fgsea, and publication-quality volcano/heatmap figures.
Human airway smooth muscle — Himes et al. 2014 (PRJNA229998)

Build weighted gene correlation networks, detect co-expression modules, and relate modules to traits.
Maize Ligule Development (Johnston et al. 2014)

QC, normalization, HVG selection, PCA, UMAP, Leiden clustering, marker gene identification, cell type annotation, doublet detection with Scrublet, and pseudo-bulk DE.
Human PBMCs 3k — 10x Genomics (healthy donor)

10x Chromium data processing with Cell Ranger, then clustering, annotation and visualization with Seurat.
C. robusta tunicate — SRR8111691-4

Chromatin & Epigenomics
Bowtie2 alignment, Picard deduplication, MACS2 peak calling, IDR reproducibility, deepTools QC, HOMER motif discovery, ChIPseeker annotation, and DiffBind differential binding.
ENCODE CTCF ChIP-seq K562 — ENCSR000EGM

Bowtie2 alignment, organelle read removal, MACS2 peak calling, and bigWig generation for IGV visualization.
Arabidopsis BRM chromatin — PRJNA351855

Variant Calling, GWAS & Cancer Genomics
Full GATK4 pipeline: BWA-MEM alignment, BQSR, HaplotypeCaller, joint genotyping, and VQSR filtering.
A. halleri populations — PRJEB18647

Build custom databases for any organism, annotate VCFs with functional impact, understand the HIGH/MODERATE/LOW impact tiers, fix common pitfalls, and write SnpSift filter expressions and automated pipeline scripts.
Rice 3000 Genomes — PRJEB6180 (O. sativa RAP-DB)

VQSR for human WGS, hard filters for non-model organisms & small cohorts, RNA-seq variant filters, bcftools filtering for any caller, genotype-level filtering, automated Python pipeline, and Ti/Tv validation.
Works with any VCF — GATK, FreeBayes, DeepVariant, Strelka2

PLINK2 QC, population stratification PCA, kinship matrix, GEMMA linear mixed model, Manhattan & QQ plots, LD clumping, and biomaRt gene annotation.
Arabidopsis 1001 Genomes flowering time — Atwell et al. 2010

GATK Mutect2 with matched normal, Panel of Normals, contamination estimation, FilterMutectCalls, VEP annotation, and COSMIC mutational signature analysis with SigProfiler.
TCGA-A8-A08B breast cancer WES — GDC Data Portal

Genome Assembly & Annotation
GenomeScope k-mer profiling to estimate genome size and ploidy, then de novo assembly with SPAdes (prokaryotes) or Canu (eukaryotes).
Jellyfish k-mer + SPAdes/Canu workflows

Evidence-based gene prediction using MAKER2, training AUGUSTUS with BUSCO genes, and evaluating models with AED scores.
H. glycines soybean cyst nematode genome

Predict signal peptides, transmembrane domains, and subcellular localization using deep learning tools on nematode proteomes.
Plant-parasitic nematode proteome

Metagenomics & Microbiome
16S amplicon workflow: DADA2 denoising, ASV table, taxonomy, alpha/beta diversity, and PERMANOVA statistical testing.
Arabidopsis root microbiome — PRJEB15671

MEGAHIT co-assembly, Bowtie2 multi-sample coverage, MetaBAT2 binning, CheckM quality assessment, GTDB-Tk taxonomy, dRep dereplication, and Prokka gene annotation.
HMP human gut metagenome — PRJNA46333

Build a custom Kraken2 database, classify shotgun metagenomic reads, and visualize results with Pavian/Krona.
Nematode clade RNAseq — custom DB build

Comparative Genomics & Evolution
OrthoFinder for ortholog calling between multiple genomes, then i-ADHoRe for synteny block detection and circular synteny plots.
Nematode genomes (G. pallida, H. glycines, M. hapla)

OrthoFinder orthogroups to codon alignments, PAML CodeML site models, and likelihood ratio tests to identify positively selected genes.
E. coli + Shigella genomes (5 genomes)

Build a genetic map from scratch with live interactive Excel-style spreadsheets showing every formula: recombination frequency, LOD score, Kosambi vs Haldane mapping functions, marker ordering, and linkage group assignment. Every calculation is transparent and editable in the browser.
Soybean RIL population — R/qtl built-in; ASMap package

F2, RIL, NAM, and GWAS populations using R/qtl2 interval mapping, composite interval mapping, LOD threshold permutation, raw data walkthrough in Excel, GAPIT mixed model, multi-environment QTL, and MAS marker identification.
Hyper F2 mouse cross — Broman et al. (R/qtl2 package)

WGD concepts explained, Ks molecular clock plots, Gaussian peak fitting, MCScanX synteny blocks, WGDI dot plots, subgenome phasing for allopolyploids, fractionation bias, and multi-species Ks comparison.
Arabidopsis thaliana TAIR10 & Brassica napus Darmor-bzh

MaxQuant LFQ database search, Perseus statistical workflow, MNAR imputation, limma moderated t-test, volcano plots, GO enrichment, and STRING protein interaction network.
UPS1 spike-in benchmark — Cox et al. 2014 (PRIDE PXD000279)

Phylogenetics Deep Dive (8-Module Series)
Site-heterogeneous models (C60, PMSF, CAT-GTR), PhyloBayes, MrBayes MCMC diagnostics, stepping-stone marginal likelihood, and model adequacy testing.
Opisthokonta 248-gene phylogenomics — Whelan et al. 2017 (PRJNA385600)

Concordance factors, ASTRAL coalescent, phylogenetic networks (SNaQ, HyDe), AU/SH/KH topology tests, and long branch attraction mitigation.
Angiosperm plastome gene trees — Wickett et al. 2014 (PRJNA196530)

Strict & relaxed molecular clocks, fossil calibration strategy, BEAST2 node dating, tip dating with fossilized birth-death, MCMCTree, and treePL.
Mammal timetree — dos Reis et al. 2012 (32 nuclear genes, 16 calibrations)

Ancestral state reconstruction (Mk, BM, OU, BioGeoBEARS), ancestral sequence inference, dN/dS, PAML branch-site models, HyPhy BUSTED/MEME, and convergent evolution.
Passerine beak morphology — Cooney et al. 2017 (259 species phenome)

Lineage diversification (BAMM, RPANDA, ClaDS), state-dependent diversification (BiSSE, HiSSE, SecSSE), trait diversification (QuaSSE), mass extinction detection (CoMET, PyRate), and adaptive radiation.
Cetacean body-size evolution — Slater et al. 2017 (87 species, BAMM landmark dataset)

Whole-genome alignment (minimap2, MUMmer, Cactus), synteny (MCScanX, GENESPACE), WGD Ks plots, CAFE5 gene family evolution, chromosomal rearrangements, and conserved regulatory elements (phastCons, phyloP).
Brassicaceae chromosome evolution — Brassica napus genome (GCA_000686985.2)

PSMC/MSMC demographic inference, D-statistics introgression, Bayesian phylogeography (BEAST2), species delimitation (BPP, mPTP), phylogenetic comparative methods, co-phylogenetics, and ancient DNA (mapDamage).
Brown bear phylogeography — Barlow et al. 2018 (PRJEB22764, 64 genomes)

Snakemake/Nextflow phylogenomics pipelines, RevBayes custom models, simulation for validation (Seq-Gen, SimPhy, SLiM, msprime), and publication-quality visualization (ggtree, iTOL, toytree).
1KP thousand-plant transcriptome — Wickett et al. 2014 (workflow benchmark dataset)

Population Genomics
Filter multi-sample VCFs with VCFtools (missingness, MAF, LD), compute basic stats (Tajima's D, π, Fst), and run PCA with PLINK for a quick overview of population structure.
Arabidopsis thaliana 1001 genomes — PRJNA273563

LD pruning, cross-validation to choose K, ADMIXTURE ancestry proportion inference, and publication-quality structure bar plots with ggplot2.
Human HGDP panel — Bergström et al. 2020 (929 individuals, 52 populations)

Reconstruct historical effective population size (Ne) trajectories from individual genome sequences using PSMC (pairwise) and SMC++ (multi-sample, more recent history).
Snow leopard whole-genome — Prum et al. / PRJNA304161

Per-site and windowed Fst, Tajima's D, π, neutrality tests, composite likelihood ratio sweep scans (SweeD), and extended haplotype homozygosity (XP-EHH, iHS) with rehh.
Maize domestication sweeps — Hufford et al. 2012 (PRJNA178613)

Patterson's D-statistics, f3 and f4 statistics, f-branch introgression fractions, ADMIXTOOLS2 graph fitting, and fd statistics for detecting gene flow between populations.
Neanderthal introgression — Meyer et al. 2012 / modern human SGDP panel

Build maximum-likelihood population trees with migration edges, determine the optimal number of migration events, and visualize population graphs with residual plots.
Human HGDP populations — Pickrell & Pritchard 2012 (TreeMix original paper dataset)

Advanced Genetics & Epigenomics
Per-site and windowed Fst, nucleotide diversity (π), Tajima's D, AMOVA, and selection scans (iHS, XP-EHH). Every formula is shown in a live interactive spreadsheet. Covers VCFtools, PLINK, and rehh pipelines.
Rice 3K diversity panel — 3K RGP; Chr04 selection sweep region

Statistical phasing (SHAPEIT4), read-backed phasing (WhatsHap), long-read HiFiasm phasing, haplotype block detection, IDR phase quality evaluation, and breeding applications. D-prime LD calculator included.
Rice 3K chr04 multi-sample VCF; PacBio HiFi example

Read-depth (CNVnator), split-read/paired-end (Manta), and cohort-level (SMOOVE/LUMPY) CNV calling. Depth normalisation, GC bias correction, caller merging, duphold genotyping, and genome-wide visualisation. Live depth-ratio calculator included.
WGS 30x diploid sample; multi-sample rice cohort

LD pruning, population stratification PCA, genomic inflation λ, GEMMA LMM, GAPIT3 MLM, Bonferroni vs FDR thresholds (live calculator), Manhattan and QQ plots, SuSiE credible sets, and multi-trait mvLMM.
Rice 3K panel — IRRI 3K RGP genotypes + agronomic traits

Full ATAC-seq and ChIP-seq pipeline: Tn5 shift, fragment-size QC, TSS enrichment, MACS3 peak calling, IDR reproducibility, HOMER/MEME motif analysis, DESeq2 differential accessibility, and deepTools heatmaps. Live FRiP QC calculator included.
ENCODE ATAC-seq; ENCODE H3K27ac ChIP-seq

Nature Methods & Protocols 2024–2025
Infer orthologs and hierarchical orthologous groups (HOGs) across 1,000+ species in hours. Uses pyHam for HOG analysis and single-copy orthologs for species tree inference with IQ-TREE.
UniProt plant proteomes (Arabidopsis, Rice, Maize)

GPT-style transformer pre-trained on 33M single-cell profiles. Zero-shot cell annotation, multi-batch integration, gene regulatory network inference from attention weights, and perturbation response prediction.
PBMC 3k + CELLxGENE pre-trained weights

Unified Zarr-based format for Visium, Xenium, MERFISH, CODEX, and Slide-seq. Multi-slide alignment, spatially variable gene detection with Moran's I, neighbourhood enrichment, and ligand-receptor analysis with Squidpy.
10x Visium mouse brain; 10x Xenium demo

Full-length isoform sequencing with Oxford Nanopore and PacBio HiFi. Dorado basecalling, Minimap2 splice-aware alignment, IsoQuant/FLAMES/StringTie2 transcript assembly, and differential transcript usage with DEXSeq/Swish.
LRGASP benchmark data (ONT + PacBio HiFi)

Analyse multiplexed tissue imaging data — 40–100 proteins with single-cell spatial resolution. DeepCell Mesmer cell segmentation, scimap phenotyping, neighbourhood enrichment, cell-cell proximity analysis, and spatial scatter visualisation.
CODEX FFPE tissue (tumour microenvironment)

Simultaneously profile chromatin accessibility and gene expression in the same cell. Cell Ranger ARC, Seurat WNN integration, MACS2 per-cluster peak calling, chromVAR TF motif activity, and LinkPeaks cis-regulatory inference.
10x Multiome PBMC (10x Genomics demo)

Stop hand-writing boilerplate. Use Claude, OpenAI Codex, and Gemini CLI to generate pipelines, debug cryptic errors, and speed up genome analysis. Includes real prompts, ethical guardrails, and commentary on how this is changing science.
All bioinformatics use cases