Bioinformatics News & Preprints

Tool updates, field breakthroughs, and curated preprints from bioRxiv and arXiv — everything happening in computational biology right now.

🛠️

Tool Updates

Major releases and version updates across the bioinformatics stack

R / Single-Cell

2024

Seurat v5: Sketch Analysis & Bridge Integration

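Seurat v5 rewrites the internals to support sketch-based analysis (analysing millions of cells without loading them fully into memory, via the BPCells on-disk backend), Bridge Integration for mapping scATAC-seq and other single-modality data onto scRNA-seq references through a multiome "bridge" dataset, and a layer-based Assay5 object managed with IntegrateLayers() and JoinLayers(). It is the biggest architectural change since v3.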

  • BPCells on-disk matrix backend — 10M+ cells on a laptop
  • Sketch-based UMAP and clustering on representative subsets
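  • Bridge Integration maps single-modality queries (e.g. scATAC-seq) onto references via multiome data; WNN remains the method for joint multimodal analysis
  • SCTransform v2 (vst.flavor = "v2") is now the default flavour of SCTransform()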
Seurat v5 docs
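Sketch-based analysis starts by choosing a representative subset of cells, and Seurat's SketchData() samples cells by leverage score by default. The numpy sketch below illustrates that idea with exact statistical leverage scores from a thin SVD; it is not Seurat's code (Seurat approximates the scores with random sketching for speed), and sketch_cells is a hypothetical helper name.

```python
import numpy as np

def leverage_scores(X: np.ndarray, k: int) -> np.ndarray:
    """Rank-k statistical leverage of each row (cell) of a cells x genes matrix."""
    # Centre each gene so the SVD captures variation rather than the mean.
    Xc = X - X.mean(axis=0, keepdims=True)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    # Leverage of cell i = squared norm of row i of the top-k left singular vectors.
    return np.sum(U[:, :k] ** 2, axis=1)

def sketch_cells(X: np.ndarray, n_cells: int, k: int = 10, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: sample cells without replacement, weighted by leverage."""
    rng = np.random.default_rng(seed)
    scores = leverage_scores(X, k)
    return rng.choice(X.shape[0], size=n_cells, replace=False, p=scores / scores.sum())

# Toy data: 1,000 "cells" x 50 "genes" of log-transformed counts.
rng = np.random.default_rng(1)
X = np.log1p(rng.poisson(2.0, size=(1000, 50)).astype(float))
sketch = sketch_cells(X, n_cells=100)
print(sketch.shape)
```

Leverage sampling favours cells that are unusual in the top principal directions, so rare populations are more likely to survive the downsampling than under uniform sampling.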
R / Bioconductor

Oct 2024

Bioconductor 3.20 — ~2,300 Software Packages, R 4.4

The October 2024 release adds 73 new software packages and ships updated versions of core frameworks.

  • DESeq2 1.46 — faster LRT, improved convergence
  • scran 1.34 — improved batch-aware normalisation
  • SingleCellExperiment 1.28 — lazy sparse support
  • Spatial packages: MerfishData, SpatialExperiment 1.16
  • edgeR 4.4 — quasi-likelihood improvements
Release notes
Workflows

2024–2025

nf-core Passes 100 Pipelines — All DSL2, Containerised

The nf-core community now maintains over 100 peer-reviewed, CI-tested Nextflow DSL2 pipelines with Docker/Singularity support.

  • nf-core/rnaseq v3.14 — STAR/Salmon, STAR/RSEM and HISAT2 alignment, plus Salmon pseudo-alignment
  • nf-core/scrnaseq v2.7 — Cell Ranger, STARsolo, Alevin-fry and kallisto|bustools support
  • nf-core/sarek v3.4 — germline + somatic variant calling
  • nf-core/atacseq v2.1 — ENCODE3-compliant ATAC pipeline
  • nf-core/spatialvi — 10x Visium spatial transcriptomics analysis (new!)
Browse nf-core
Workflows

2024

Nextflow 24.x: Fusion FS, Wave Containers & Seqera Platform

Nextflow 24, combined with Seqera's Fusion and Wave services and the Seqera Platform, makes cloud-scale workflows considerably simpler to run and test.

  • Fusion — virtual file system with native S3/GCS access (no staging)
  • Wave — build and provision containers on-demand at runtime
  • Seqera Platform — cloud monitoring, resource optimisation, collaboration
  • nf-test — a separate framework for unit-testing Nextflow modules and pipelines, now adopted across nf-core
Nextflow blog
Python / Single-Cell

2024

Scanpy 1.10 & scverse Ecosystem — GPU-Accelerated UMAP

Scanpy 1.10 and the broader scverse ecosystem reach maturity with performance and interoperability improvements.

  • rapids-singlecell — GPU-accelerated PCA, UMAP and clustering with a Scanpy-compatible API
  • AnnData 0.10 zarr backend — lazy loading of 10M+ cell datasets
  • Muon for multi-modal data (RNA+ATAC+protein)
  • scvi-tools 1.1 — SCVI, SCANVI, totalVI, MULTIVI updates
  • Squidpy 1.5 — faster spatial graph and niche analyses
Scanpy release notes
Variant Calling

2024–2025

GATK 4.6: STR Calling, DRAGEN-GATK Hybrid Mode Production-Ready

The Broad's latest GATK release ships significant improvements across the variant calling stack.

  • HaplotypeCaller — improved short tandem repeat (STR) genotyping
  • Mutect2 — updated artifact filtering for FFPE and low-purity samples
  • CNVSomaticPairWorkflow — streamlined somatic CNV detection
  • DRAGEN-GATK mode — open-source implementation of DRAGEN accuracy features (DRAGstr STR model, BQD and FRD genotyping)
  • New Funcotator annotation bundles for hg38
GATK releases
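HaplotypeCaller reports per-genotype likelihoods in the VCF as Phred-scaled PL values, normalised so the most likely genotype has PL 0, and GQ is the gap between the two smallest PLs, capped at 99. The sketch below computes PL and GQ for a biallelic diploid site from a deliberately simple pileup model (independent reads, fixed error rate, an error flips ref to alt or vice versa); that model is an assumption for illustration, not HaplotypeCaller's pair-HMM likelihoods.

```python
import math

def genotype_pls(n_ref: int, n_alt: int, err: float = 0.01):
    """Phred-scaled likelihoods for genotypes 0/0, 0/1, 1/1 (VCF order)."""
    log10_liks = []
    for alt_copies in (0, 1, 2):
        # P(a read shows alt): drawn from an alt chromosome and read correctly,
        # or from a ref chromosome and misread (simplified binary error model).
        f = alt_copies / 2
        p_alt = f * (1 - err) + (1 - f) * err
        # The binomial coefficient is the same for every genotype, so it cancels.
        log10_liks.append(n_alt * math.log10(p_alt) + n_ref * math.log10(1 - p_alt))
    best = max(log10_liks)
    # PL = -10 * log10(L / L_max), rounded; the best genotype gets PL 0.
    return [round(-10 * (l - best)) for l in log10_liks]

def genotype_quality(pls):
    """GQ = gap between the two lowest PLs, capped at 99."""
    ordered = sorted(pls)
    return min(ordered[1] - ordered[0], 99)

pls = genotype_pls(n_ref=6, n_alt=5)
print(pls, genotype_quality(pls))
```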
Sequencing / Assembly

2025

Oxford Nanopore R10.4.1: Q30 Long-Read Accuracy Standard

ONT's R10.4.1 flow cells with recent Dorado super-accuracy (SUP) basecalling models now routinely give modal simplex read accuracy in the Q20s (above 99%), and duplex reads reach Q30 (99.9%) or better, approaching Illumina for SNP and small-indel detection. Key milestones:

  • Q20+ simplex and Q30+ duplex reads with Kit 14 chemistry (400 bases/s) on R10.4.1 pores
  • 5mCpG methylation calling integrated into Dorado
  • PromethION 2 Solo — compact benchtop device running two PromethION flow cells
  • Long-read-only clinical WGS pipelines now viable
ONT accuracy page
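Accuracy figures such as Q20 and Q30 are Phred-scaled: Q = -10·log10(p_error), so Q20 means 99% and Q30 means 99.9% per-base accuracy. When summarising a whole read, a common convention (assumed here) is to average per-base error probabilities rather than Q values, because a few low-quality bases dominate the error rate. The helper names are illustrative.

```python
import math

def q_to_error(q: float) -> float:
    """Phred quality to per-base error probability."""
    return 10 ** (-q / 10)

def error_to_q(p: float) -> float:
    """Per-base error probability to Phred quality."""
    return -10 * math.log10(p)

def mean_read_q(quals):
    """Read-level Q computed from the mean per-base error probability."""
    mean_err = sum(q_to_error(q) for q in quals) / len(quals)
    return error_to_q(mean_err)

for q in (20, 30):
    print(f"Q{q}: accuracy {1 - q_to_error(q):.4%}")

# Nine Q40 bases and one Q10 base: averaging the Q values overstates read quality.
quals = [40] * 9 + [10]
print(sum(quals) / len(quals), round(mean_read_q(quals), 1))
```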
Assembly

2024–2025

Hifiasm 0.20: T2T Assembly with HiFi + Ultra-Long ONT

Hifiasm's ultra-long (UL) mode jointly assembles PacBio HiFi and ultra-long ONT reads into haplotype-resolved, near telomere-to-telomere (T2T) diploid assemblies, bringing T2T-quality assembly closer to population scale.

  • Hi-C phasing mode for trio-free haplotype separation
  • Centromere and telomere assembly via UL-ONT spanning
  • Benchmarked on 25 HG002 assemblies — best contiguity of any assembler
  • Works with PacBio Revio + ONT PromethION combo data
Hifiasm GitHub
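Contiguity comparisons between assemblers are usually summarised with N50: the length L such that contigs of length L or longer cover at least half of the total assembly. A minimal sketch, assuming contig lengths have already been extracted (for example from hifiasm's GFA output); the lengths below are toy values.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx statistic (N50 by default) of a list of contig lengths."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until the target fraction is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    return 0  # only reachable if fraction > 1

contigs = [100, 80, 60, 40, 20]  # toy contig lengths in Mb
print(nx(contigs), nx(contigs, 0.9))
```

N50 rewards a few very long contigs, which is why T2T-level assemblies, where whole chromosome arms are single contigs, stand out on this metric.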
Alignment

2024

STAR 2.7.11 + STARsolo: The New Default for RNA-Seq + scRNA-Seq

STAR 2.7.11 consolidates its position as the standard RNA-seq aligner with enhanced STARsolo for single-cell quantification.

  • STARsolo now matches Cell Ranger output for 10x Chromium v2/v3
  • Supports 10x Multiome, ATAC barcode demultiplexing
  • Velocity-compatible spliced/unspliced output
  • 2-pass alignment now default in nf-core/rnaseq
STAR releases
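For 10x Chromium v3, read 1 carries a 16 bp cell barcode followed by a 12 bp UMI (v2 uses a 10 bp UMI), and STARsolo matches barcodes against a whitelist, optionally allowing one mismatch. The sketch below shows that logic in simplified form: a barcode with one mismatch is accepted only when exactly one whitelist entry matches, and unique UMIs are then counted per cell and gene. It is an illustration, not STARsolo's algorithm (which also uses base qualities to resolve multiple matches and collapses near-identical UMIs); all function names are hypothetical.

```python
from collections import defaultdict

BASES = "ACGT"

def split_r1(seq: str, cb_len: int = 16, umi_len: int = 12):
    """Split a 10x v3 read 1 into (cell barcode, UMI); use umi_len=10 for v2."""
    return seq[:cb_len], seq[cb_len:cb_len + umi_len]

def correct_barcode(cb: str, whitelist: set):
    """Exact whitelist match, else a unique match at Hamming distance 1, else None."""
    if cb in whitelist:
        return cb
    hits = set()
    for i, original in enumerate(cb):
        for base in BASES:
            if base != original:
                candidate = cb[:i] + base + cb[i + 1:]
                if candidate in whitelist:
                    hits.add(candidate)
    return hits.pop() if len(hits) == 1 else None

def count_umis(records, whitelist):
    """records: iterable of (read1_sequence, gene). Returns {(cell, gene): unique UMIs}."""
    umis = defaultdict(set)
    for r1, gene in records:
        cb, umi = split_r1(r1)
        cell = correct_barcode(cb, whitelist)
        if cell is not None:  # reads with uncorrectable barcodes are dropped
            umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}

CELL = "AAAACCCCGGGGTTTT"
whitelist = {CELL, "ACGT" * 4}
reads = [
    (CELL + "A" * 12, "GAPDH"),
    (CELL[:-1] + "A" + "A" * 12, "GAPDH"),  # barcode with one mismatch, same UMI
    (CELL + "C" * 12, "GAPDH"),             # same cell, new UMI
    ("T" * 16 + "G" * 12, "ACTB"),          # barcode not correctable: dropped
]
print(count_umis(reads, whitelist))
```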

🌍

Field News

Major breakthroughs, consortia milestones, and AI advances in genomics

AI / Structural Bio

2024

AlphaFold 3: Proteins, DNA, RNA & Ligands in One Model

DeepMind's AlphaFold 3 (Abramson et al., Nature 2024), developed with Isomorphic Labs, extends structure prediction from proteins to their complexes with DNA, RNA, small-molecule ligands and ions, enabling drug-target complex prediction, RNA structure modelling, and protein-DNA binding prediction within a single model. Model weights are available on request for non-commercial academic research.

Nature paper
AI / Genomics

2025

Evo 2: 40B Parameter DNA Language Model Across All Life

Arc Institute's Evo 2, developed with NVIDIA, is trained on 9.3 trillion DNA base pairs drawn from genomes spanning bacteria, archaea, phages, and eukaryotes. It generates genome-scale DNA sequences, scores variant effects zero-shot from sequence likelihoods, and supports guided sequence design. Weights, code, and training data are openly released.

Arc Institute
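Zero-shot variant scoring with a DNA language model compares the model's log-likelihood of the alternate sequence with that of the reference; a more negative delta suggests a more disruptive change. The sketch below shows only that scoring recipe, with a toy order-2 Markov model standing in for the language model; the model class, training sequence and function names are assumptions for illustration and have nothing to do with Evo 2's weights or API.

```python
import math
from collections import Counter

class MarkovDNA:
    """Toy order-k Markov model over A/C/G/T, a stand-in for a DNA language model."""

    def __init__(self, k: int = 2, pseudocount: float = 1.0):
        self.k = k
        self.pseudo = pseudocount
        self.ctx_counts = Counter()
        self.pair_counts = Counter()

    def fit(self, seqs):
        for s in seqs:
            for i in range(self.k, len(s)):
                ctx, nxt = s[i - self.k:i], s[i]
                self.ctx_counts[ctx] += 1
                self.pair_counts[(ctx, nxt)] += 1
        return self

    def log_likelihood(self, seq: str) -> float:
        """Sum of log P(base | previous k bases), with pseudocount smoothing."""
        total = 0.0
        for i in range(self.k, len(seq)):
            ctx, nxt = seq[i - self.k:i], seq[i]
            num = self.pair_counts[(ctx, nxt)] + self.pseudo
            den = self.ctx_counts[ctx] + 4 * self.pseudo
            total += math.log(num / den)
        return total

def variant_delta(model, ref_seq: str, pos: int, alt: str) -> float:
    """log P(alt sequence) - log P(ref sequence) for a single-base substitution."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return model.log_likelihood(alt_seq) - model.log_likelihood(ref_seq)

model = MarkovDNA(k=2).fit(["ATG" * 35])
ref = "ATGATGATGATG"
delta = variant_delta(model, ref, pos=4, alt="C")
print(round(delta, 2))
```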
Single-Cell Atlas

2025–2026

Human Cell Atlas: 50 Million Cells, 35 Organs

The HCA consortium has mapped over 50 million cells across 35 tissue types — the most comprehensive cellular atlas of the human body ever assembled. Data is freely available via the HCA Data Portal and CZ CELLxGENE Discover. Used as the gold-standard annotation reference for dozens of single-cell tools.

HCA Portal
Pangenomics

2024–2025

HPRC Pangenome: 94-Haplotype Graph Reference

The Human Pangenome Reference Consortium's first release (Liao et al., Nature 2023) built pangenome graphs from 94 haplotype-resolved assemblies of 47 individuals representing global diversity, and its second data release expands the panel to more than 200 individuals. Minigraph-Cactus builds the graph and vg Giraffe aligns short reads to it, reducing reference bias in variant calling and GWAS, especially for non-European populations.

HPRC Portal
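Pangenome graphs like the HPRC's are distributed in GFA, where S lines hold segment sequences and path (P) or walk (W) lines list the segments each haplotype traverses. A minimal sketch, assuming GFA 1.0 P lines and blunt (0M) overlaps only; real Minigraph-Cactus output also uses W lines and is far larger. It spells each haplotype's sequence from its path, reverse-complementing segments marked "-".

```python
COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(s: str) -> str:
    return s.translate(COMP)[::-1]

def spell_paths(gfa_text: str) -> dict:
    """Return {path_name: sequence} from the S and P lines of a GFA 1.0 string."""
    segments, paths = {}, {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "P":
            paths[fields[1]] = fields[2].split(",")
    spelled = {}
    for name, steps in paths.items():
        parts = []
        for step in steps:
            seg_id, orient = step[:-1], step[-1]
            seq = segments[seg_id]
            parts.append(seq if orient == "+" else revcomp(seq))
        spelled[name] = "".join(parts)  # assumes no overlap between segments
    return spelled

# A toy bubble: two haplotypes that differ by a SNP (segment 2 vs segment 3).
gfa = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tA",
    "S\t3\tG",
    "S\t4\tTTCA",
    "L\t1\t+\t2\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M",
    "L\t3\t+\t4\t+\t0M",
    "P\thap1\t1+,2+,4+\t*",
    "P\thap2\t1+,3+,4+\t*",
])
print(spell_paths(gfa))
```

The bubble (segments 2 and 3) is how a graph reference records a variant without privileging either allele, which is the source of the reduced reference bias.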
Spatial Omics

Nature Methods 2024

Spatial Proteomics: Nature Methods Method of the Year 2024

Nature Methods named spatial proteomics its Method of the Year 2024, reflecting how routine subcellular-resolution protein mapping in intact tissue has become with platforms such as CODEX (now PhenoCycler), imaging mass cytometry (IMC), and MIBI-TOF. Combined with spatial transcriptomics (Xenium, Visium HD), spatial multi-omics is rapidly maturing.

Nature Methods editorial
AI / Gene Regulation

2024

Nucleotide Transformers & HyenaDNA: Long-Range Sequence Models

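Multiple groups released DNA foundation models that learn genomic context directly from sequence: the Nucleotide Transformer (InstaDeep with NVIDIA and TU Munich) and HyenaDNA (Stanford), which replaces attention with Hyena long convolutions to reach contexts of up to 1 million nucleotides. Once fine-tuned on experimental labels, these models predict regulatory element activity, variant effects, and chromatin states from sequence alone, with no new ChIP-seq or ATAC-seq needed at inference time.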

bioRxiv preprint
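One practical difference between these models is tokenisation: the Nucleotide Transformer splits sequence into non-overlapping 6-mers, while HyenaDNA uses single-nucleotide tokens, so a fixed token budget covers roughly six times more sequence for the former, at coarser resolution. A simplified sketch of both schemes, assuming leftover bases and 6-mers containing N fall back to single-nucleotide tokens; the real tokenisers also add special tokens.

```python
def kmer_tokens(seq: str, k: int = 6):
    """Non-overlapping k-mers; trailing bases or k-mers containing N become single bases."""
    tokens, i = [], 0
    while i < len(seq):
        chunk = seq[i:i + k]
        if len(chunk) == k and "N" not in chunk:
            tokens.append(chunk)
            i += k
        else:
            tokens.append(seq[i])
            i += 1
    return tokens

def char_tokens(seq: str):
    """Single-nucleotide tokens, as used by character-level models."""
    return list(seq)

seq = "ACGTACGTACGTAN"
print(kmer_tokens(seq))
print(len(kmer_tokens(seq)), len(char_tokens(seq)))
```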

📄

Latest Preprints: bioRxiv & arXiv

Curated high-impact preprints in computational biology and genomics

Nature Biotechnology · scRNA-Seq

2022

scvi-tools: Probabilistic Deep Learning for Single-Cell Omics

Formalises the VAE-based framework (Gayoso et al.) for scRNA-seq integration, deconvolution, and DE. Covers SCVI, SCANVI (semi-supervised), totalVI (CITE-seq), and MULTIVI (multi-modal). Widely used for large atlas integration (>100K cells).

Nature Biotechnology
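At the core of scVI is a decoder that outputs, for each cell and gene, the parameters of a negative binomial (optionally zero-inflated) distribution over observed counts. The function below evaluates the negative binomial log-probability in the mean / inverse-dispersion parameterisation (mean mu, inverse dispersion theta) that scVI-style models use; it is a standalone illustration, not scvi-tools' own code.

```python
import math

def nb_log_prob(x: int, mu: float, theta: float) -> float:
    """log NB(x; mean=mu, inverse dispersion=theta); variance = mu + mu**2 / theta.

    Assumes mu > 0 and theta > 0.
    """
    log_theta_mu = math.log(theta + mu)
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * (math.log(theta) - log_theta_mu)
        + x * (math.log(mu) - log_theta_mu)
    )

# Sanity check: probabilities over a wide range of counts sum to ~1.
total = sum(math.exp(nb_log_prob(x, mu=3.0, theta=2.0)) for x in range(500))
print(round(total, 6))
```

As theta grows the distribution approaches a Poisson with the same mean, so theta directly controls how much extra-Poisson variability the model allows per gene.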
Nature · AI + Bio

2023

Geneformer: Transfer Learning in Single-Cell Gene Expression Space

Context-aware transformer pre-trained on ~30M single-cell transcriptomes (Genecorpus-30M; Theodoris et al.). Fine-tuned for chromatin dynamics, gene network dynamics, and in-silico perturbation (gene deletion) predictions, showing that pretrain-then-fine-tune transfer learning, familiar from NLP, also pays off in transcriptomics, especially when labelled data are scarce.

Nature paper
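Geneformer represents each cell with a rank value encoding: each gene's expression, normalised by the cell's total counts, is divided by that gene's non-zero median across the pretraining corpus, and genes are ordered from highest to lowest normalised value, with the ordered gene IDs forming the token sequence (truncated to the model's input size, 2,048 in the original model). The sketch below implements that encoding on toy values; the gene names and medians are illustrative assumptions.

```python
def rank_value_encode(counts: dict, gene_medians: dict, max_len: int = 2048):
    """Return gene tokens ordered by median-normalised expression (highest first).

    counts: raw counts for one cell, {gene: count}
    gene_medians: per-gene non-zero median of total-count-normalised expression
    """
    total = sum(counts.values())
    scores = {}
    for gene, c in counts.items():
        # Skip unexpressed genes and genes without a corpus median.
        if c > 0 and gene in gene_medians:
            scores[gene] = (c / total) / gene_medians[gene]
    ranked = sorted(scores, key=lambda g: scores[g], reverse=True)
    return ranked[:max_len]

cell = {"ACTB": 500, "GAPDH": 300, "SOX2": 5, "NANOG": 0, "XIST": 20}
medians = {"ACTB": 0.2, "GAPDH": 0.1, "SOX2": 0.0005, "NANOG": 0.0004, "XIST": 0.01}
print(rank_value_encode(cell, medians))
```

Dividing by the corpus median pushes ubiquitously high housekeeping genes down the ranking and lifts genes that are unusually high in this particular cell, such as the transcription factor in the toy example.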
Nature Methods · ML + Genomics

2021

Enformer: Predicting Gene Expression from 196kb DNA Context

DeepMind's Enformer (Avsec et al.) predicts thousands of CAGE, DNase/ATAC-seq, and ChIP-seq tracks directly from ~196 kb (196,608 bp) of raw DNA sequence. Once trained, it scores regulatory variant effects in silico without any new experimental assay, a major advance for variant annotation and functional genomics.

Nature Methods
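Enformer takes a 196,608 bp one-hot encoded window and predicts tracks for 896 output bins of 128 bp, covering only the central 114,688 bp. A sketch of the coordinate bookkeeping needed to score a variant, assuming 0-based coordinates, a window centred on the variant, and A/C/G/T channel order; the helper names are hypothetical and the model call itself is omitted.

```python
import numpy as np

SEQ_LEN = 196_608   # input length in bp
BIN_SIZE = 128      # bp per output bin
N_BINS = 896        # output bins after cropping
CROP_BP = (SEQ_LEN - N_BINS * BIN_SIZE) // 2  # bp cropped from each side

def one_hot(seq: str) -> np.ndarray:
    """One-hot encode A/C/G/T as columns; N and other characters become all-zero rows."""
    lookup = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = lookup.get(base)
        if j is not None:
            out[i, j] = 1.0
    return out

def window_for_variant(pos: int):
    """0-based [start, end) of an input window centred on `pos`.

    Callers must handle windows that run off the chromosome ends.
    """
    start = pos - SEQ_LEN // 2
    return start, start + SEQ_LEN

def output_bin(pos: int, window_start: int):
    """Index of the output bin containing `pos`, or None if it lies in the cropped flanks."""
    offset = pos - window_start - CROP_BP
    if 0 <= offset < N_BINS * BIN_SIZE:
        return offset // BIN_SIZE
    return None

start, end = window_for_variant(1_000_000)
print(start, end, CROP_BP, output_bin(1_000_000, start))
```

A variant effect is then typically the difference between predictions for the reference and alternate windows, summarised over bins and tracks of interest.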
Nature Genetics · Population Genomics

2023

GLIMPSE2: Population-Scale Low-Coverage WGS Imputation

GLIMPSE2 (Rubinacci et al.) imputes 0.1–1× low-coverage WGS to genotype quality approaching high-coverage calls using a haplotype reference panel, and scales to biobank-size panels (demonstrated with 150,119 UK Biobank genomes). Orders of magnitude faster than BEAGLE for biobank-scale cohorts, it makes large population studies affordable on limited sequencing budgets.

GLIMPSE2 docs
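Why imputation is essential at these depths follows from a Poisson model of read depth, a standard simplifying assumption (no sequencing errors, alleles sampled equally): at mean coverage c a site has no reads with probability e^(-c), and a heterozygous site shows both alleles with probability (1 - e^(-c/2))^2, since each allele's reads are Poisson with mean c/2. The back-of-the-envelope arithmetic below is not part of GLIMPSE2.

```python
import math

def p_uncovered(c: float) -> float:
    """P(no reads at a site) under Poisson(c) depth."""
    return math.exp(-c)

def p_het_both_alleles(c: float) -> float:
    """P(a heterozygous site shows both alleles); each allele's reads are Poisson(c/2)."""
    return (1 - math.exp(-c / 2)) ** 2

for c in (0.1, 0.5, 1.0, 30.0):
    print(f"{c:>5}x  uncovered={p_uncovered(c):.3f}  het seen={p_het_both_alleles(c):.4f}")
```

At 0.1× about 90% of sites have no reads at all, so the genotype at most sites has to come from haplotypes shared with the reference panel rather than from the sample's own reads.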
bioRxiv · Epigenomics

2024

ArchR 2.0: Scalable Single-Cell ATAC-Seq Analysis Framework

ArchR 2.0 (Granja et al.) rewrites the HDF5-backed ATAC analysis framework with improved Arrow file format, faster iterative LSI, and tighter integration with Seurat and Signac. Handles 1M+ cells, provides trajectory analysis, and links peaks to genes via co-accessibility and co-expression.

ArchR docs
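Iterative LSI repeats a basic step: TF-IDF normalise the cell-by-feature (peak or tile) accessibility matrix, then take a truncated SVD. The numpy sketch below shows one generic round of that step using one common log TF-IDF variant; ArchR's own implementation differs in detail (its TF-IDF variant, feature selection between iterations, sparse matrices), so treat this as an illustration of the technique only.

```python
import numpy as np

def tfidf(counts: np.ndarray, scale: float = 1e4) -> np.ndarray:
    """cells x features count (or binary) matrix -> log TF-IDF matrix."""
    # Term frequency: normalise each cell by its total accessible features.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    # Inverse document frequency: down-weight features accessible in many cells.
    df = (counts > 0).sum(axis=0)
    idf = counts.shape[0] / np.maximum(df, 1)
    return np.log1p(tf * idf * scale)

def lsi(counts: np.ndarray, n_dims: int = 10, drop_first: bool = True) -> np.ndarray:
    """Truncated SVD of the TF-IDF matrix; returns cell embeddings."""
    X = tfidf(counts)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    emb = U[:, :n_dims] * s[:n_dims]
    # The first component often tracks sequencing depth, so it is commonly dropped.
    return emb[:, 1:] if drop_first else emb

rng = np.random.default_rng(0)
peaks = (rng.random((200, 500)) < 0.05).astype(float)  # sparse toy binary matrix
emb = lsi(peaks, n_dims=11)
print(emb.shape)
```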
bioRxiv · Protein + Structure

2024–2025

ESM3: Multimodal Protein Language Model (Sequence, Structure, Function)

EvolutionaryScale's ESM3 (Hayes et al.) jointly models protein sequence, 3D structure, and functional annotations in a single generative model. It generated esmGFP, a fluorescent protein with only 58% sequence identity to the closest known fluorescent protein. Weights for the smaller 1.4B-parameter ESM3-open model are released for non-commercial use.

ESM3 blog
F1000Research · RNA-Seq

2021

satuRn vs DEXSeq: Differential Transcript Usage Benchmark

Benchmarks by Gilis et al. show that satuRn's quasi-binomial GLM approach controls the false discovery rate well while scaling to large bulk and single-cell datasets where DEXSeq becomes computationally prohibitive. This makes it a strong default for differential transcript usage (DTU) analysis.

satuRn Bioconductor
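satuRn models transcript usage (a transcript's counts out of its gene's total counts) with a quasi-binomial GLM, whose dispersion parameter absorbs variability beyond the binomial. A minimal two-group sketch of that idea in pure Python: group-wise usage proportions (the maximum-likelihood fit of a binomial GLM with a group indicator), a Pearson-based dispersion estimate, and a Wald z-test on the log-odds ratio. It is an illustration only; satuRn additionally moderates dispersions with empirical Bayes and uses its own inference, and quasibinomial_dtu is a hypothetical name.

```python
import math

def quasibinomial_dtu(tx_a, gene_a, tx_b, gene_b):
    """Two-group DTU test for one transcript.

    tx_*: transcript counts per sample; gene_*: total gene counts per sample.
    Assumes usage is strictly between 0 and 1 in both groups.
    Returns (log-odds ratio B vs A, dispersion, Wald z, two-sided p-value).
    """
    groups = [(tx_a, gene_a), (tx_b, gene_b)]
    # Binomial GLM with one indicator per group: MLE = pooled usage per group.
    props = [sum(t) / sum(n) for t, n in groups]
    # Pearson chi-square over all samples divided by residual df (samples - 2 params).
    pearson, n_samples = 0.0, 0
    for (t_list, n_list), p in zip(groups, props):
        for t, n in zip(t_list, n_list):
            pearson += (t - n * p) ** 2 / (n * p * (1 - p))
            n_samples += 1
    phi = pearson / (n_samples - 2)
    # Variance of each group's estimated logit, inflated by the dispersion.
    var = sum(phi / (sum(n) * p * (1 - p)) for (_, n), p in zip(groups, props))
    lor = math.log(props[1] / (1 - props[1])) - math.log(props[0] / (1 - props[0]))
    z = lor / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return lor, phi, z, p_value

lor, phi, z, p = quasibinomial_dtu(
    tx_a=[20, 25, 18], gene_a=[100, 110, 90],
    tx_b=[60, 55, 70], gene_b=[100, 100, 120],
)
print(f"log-OR={lor:.2f} phi={phi:.2f} z={z:.2f} p={p:.2e}")
```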
Nature · Metagenomics

2024

Unified Human Gastrointestinal Genome (UHGG) v2: 286,000 Genomes

The UHGG v2 catalogue (Almeida et al.) compiles 286,000 gut microbial genomes, clustered into species-level representatives, from 42,000 human samples, making it the most comprehensive human gut reference to date. It enables strain-level tracking, resistome profiling, and metabolic pathway reconstruction from shotgun metagenomics.

UHGG v2 Portal
Nature Genetics & JRSS-B · GWAS

2020

PolyFun + SuSiE: Fine-Mapping 95% Credible Sets at GWAS Scale

PolyFun (Weissbrod et al.) uses functional annotations to estimate prior causal probabilities for GWAS fine-mapping, while SuSiE (Wang et al.) models multiple causal variants as a sum of single effects and reports a 95% credible set for each effect. PolyFun can run either SuSiE or FINEMAP as its fine-mapping engine, and the PolyFun + SuSiE combination is a widely used post-GWAS fine-mapping pipeline.

PolyFun GitHub
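SuSiE is built from single-effect regressions: assuming exactly one causal variant, each variant's posterior inclusion probability (PIP) is proportional to its prior probability times its Bayes factor, and a 95% credible set is the smallest set of variants whose PIPs sum to at least 0.95. The sketch below computes one such single effect from GWAS summary z-scores using Wakefield's approximate Bayes factor, with per-variant priors standing in for PolyFun's functionally informed priors. The prior effect variance W = 0.04, the toy z-scores and the function names are assumptions; full SuSiE fits several single effects iteratively and accounts for LD.

```python
import math

def log_wakefield_abf(z: float, se: float, w: float = 0.04) -> float:
    """log approximate Bayes factor (causal vs null) from a z-score and its standard error."""
    v = se ** 2
    r = w / (v + w)
    return 0.5 * math.log(1 - r) + 0.5 * z * z * r

def single_effect_pips(zs, ses, priors, w=0.04):
    """Posterior probability that each variant is the single causal one."""
    logs = [math.log(p) + log_wakefield_abf(z, s, w) for z, s, p in zip(zs, ses, priors)]
    m = max(logs)  # log-sum-exp for numerical stability
    weights = [math.exp(l - m) for l in logs]
    total = sum(weights)
    return [x / total for x in weights]

def credible_set(pips, level=0.95):
    """Smallest set of variant indices whose PIPs sum to at least `level`."""
    order = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    chosen, cum = [], 0.0
    for i in order:
        chosen.append(i)
        cum += pips[i]
        if cum >= level:
            break
    return sorted(chosen)

zs = [1.2, 5.8, 5.5, 0.3, 2.0]
ses = [0.02] * 5
uniform = [0.2] * 5
informed = [0.05, 0.8, 0.05, 0.05, 0.05]  # e.g. variant 1 overlaps a functional annotation
print(credible_set(single_effect_pips(zs, ses, uniform)))
print(credible_set(single_effect_pips(zs, ses, informed)))
```

With uniform priors the two strongly associated variants share the signal and both enter the credible set; an informative prior on one of them can shrink the set to a single variant, which is the effect functionally informed fine-mapping aims for.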