Epigenomics: ATAC-seq & ChIP-seq

End-to-end ATAC-seq and ChIP-seq analysis: QC, alignment, peak calling (MACS3), IDR reproducibility, motif analysis (HOMER/MEME), differential accessibility (DESeq2), and bigWig normalisation. Live FRiP calculator and peak signal spreadsheets included.

~150 min Intermediate-Advanced Live spreadsheets MACS3 / HOMER / DESeq2
Byte

ATAC-seq vs ChIP-seq: Core Differences

FeatureATAC-seqChIP-seq
What it measuresChromatin accessibility (open vs closed chromatin)Binding of a specific protein (transcription factor, histone mark) to DNA
How it worksTn5 transposase cuts and tags open chromatin; closed chromatin is inaccessibleImmunoprecipitation of cross-linked protein-DNA; sequencing of enriched fragments
Starting material50,000-100,000 fresh nuclei; works from fresh/frozen tissueMillions of fixed cells; requires antibody of interest
Controls neededNo IP control needed; mitochondrial DNA % is a QC proxyInput DNA control (matched, same chromatin without IP) REQUIRED
Peak typesTwo types: nucleosome-free regions (NFR, <200 bp) and nucleosomal (200-1000 bp)TF binding: narrow peaks (~100-300 bp); histone marks: broad peaks (1-50 kb)
Key QC metricFRiP (fraction of reads in peaks) >20%; TSS enrichment >7FRiP >1% (TF) or >5% (histone); input-normalised peak enrichment
Byte
The most common ATAC-seq mistake: using bulk cells that include too many non-relevant cell types. ATAC-seq measures accessibility in the AVERAGE across all cells. If your sample is 80% stromal cells and 20% immune cells, your peaks reflect the stromal epigenome overwhelmingly. For heterogeneous tissues: use single-cell ATAC-seq (scATAC-seq) or FACS-sort the cell type of interest before nuclei extraction.

Step 1 — QC and Alignment

Bash -- ATAC-seq and ChIP-seq alignment pipeline
## ================================================================
## ATAC-seq / ChIP-seq alignment pipeline
## Tools: Trim Galore, Bowtie2, samtools, Picard, deepTools
## ================================================================

SAMPLE="sample_ATAC_rep1"
R1="${SAMPLE}_R1.fastq.gz"
R2="${SAMPLE}_R2.fastq.gz"
REF_INDEX="bowtie2_index/genome"   ## pre-built bowtie2 index
CHROMSIZES="genome.chrom.sizes"

## Step 1: Adapter trimming
## ATAC-seq uses Nextera adapters; ChIP-seq typically uses TruSeq
trim_galore \
  --paired \
  --nextera \              ## Nextera adapters for ATAC-seq
  --cores 8 \
  --fastqc \
  -o trimmed/ \
  ${R1} ${R2}
## For ChIP-seq: remove --nextera (TruSeq adapters auto-detected)

## Step 2: Alignment with Bowtie2
## --very-sensitive: more accurate; -X 2000: allow up to 2 kb insert (nucleosomal)
bowtie2 \
  -p 16 \
  --very-sensitive \
  -X 2000 \
  -x ${REF_INDEX} \
  -1 trimmed/${SAMPLE}_R1_val_1.fq.gz \
  -2 trimmed/${SAMPLE}_R2_val_2.fq.gz \
  2> ${SAMPLE}_bowtie2.log | \
  samtools view -bS -q 30 | \   ## MAPQ >= 30 (unique mappers only)
  samtools sort -@ 8 \
  -o ${SAMPLE}_sorted.bam
samtools index ${SAMPLE}_sorted.bam

## Alignment rate check
grep "overall alignment rate" ${SAMPLE}_bowtie2.log
## Expect: ATAC-seq >85%, ChIP-seq >80%

## Step 3: Remove PCR duplicates
picard MarkDuplicates \
  INPUT=${SAMPLE}_sorted.bam \
  OUTPUT=${SAMPLE}_dedup.bam \
  METRICS_FILE=${SAMPLE}_dup_metrics.txt \
  REMOVE_DUPLICATES=true \     ## set false to only mark; true to remove
  VALIDATION_STRINGENCY=LENIENT

samtools index ${SAMPLE}_dedup.bam

## Duplication rate: expect ATAC-seq 20-40%; ChIP-seq 10-50%
grep "Unknown Library" ${SAMPLE}_dup_metrics.txt | head -2

## Step 4: ATAC-seq ONLY -- remove mitochondrial reads
## Mitochondrial DNA is highly accessible and inflates peak numbers
samtools view -h ${SAMPLE}_dedup.bam | \
  grep -v "chrM\|chrMT\|MT" | \
  samtools sort -O bam \
  -o ${SAMPLE}_noMT.bam
samtools index ${SAMPLE}_noMT.bam

MT_reads=$(samtools view -c -f 1 ${SAMPLE}_dedup.bam chrM 2>/dev/null)
total_reads=$(samtools view -c -F 4 ${SAMPLE}_dedup.bam)
echo "MT reads fraction: $(echo "scale=3; ${MT_reads}/${total_reads}" | bc)"
## Expect: MT fraction < 20% for ATAC-seq; > 30% suggests poor nuclei prep

## Step 5: ATAC-seq ONLY -- shift reads by +4 / -5 bp (Tn5 cut correction)
## The Tn5 transposase cuts and inserts sequencing adapters at both ends of
## an accessible region. The actual cut site is offset by 4 bp (+ strand)
## and 5 bp (- strand) from the read start. deepTools alignmentSieve corrects this.
alignmentSieve \
  --ATACshift \                  ## apply +4/-5 Tn5 shift
  --bam ${SAMPLE}_noMT.bam \
  --outFile ${SAMPLE}_shifted.bam \
  --numberOfProcessors 8
samtools sort -O bam -@ 8 -o ${SAMPLE}_shifted_sorted.bam ${SAMPLE}_shifted.bam
samtools index ${SAMPLE}_shifted_sorted.bam

## Step 6: Generate normalised bigWig for visualisation (IGV, UCSC)
bamCoverage \
  --bam ${SAMPLE}_shifted_sorted.bam \
  --outFileName ${SAMPLE}.bw \
  --normalizeUsing RPGC \        ## reads per genomic content (depth-normalised)
  --effectiveGenomeSize 2300000000 \  ## rice genome effective size
  --extendReads \                ## extend PE reads to fragment length
  --binSize 10 \
  --numberOfProcessors 16
Download:

Step 2 — ATAC-seq Quality Control: Fragment Size and TSS Enrichment

ATAC-seq has two unique QC metrics that have no equivalent in ChIP-seq: the fragment size distribution and the TSS enrichment score.

QC MetricWhat it showsGood valueBad value
Fragment size distributionNucleosomal ladder: peaks at ~200 bp (mono), ~400 bp (di), ~600 bp (tri-nucleosomal). NFR <150 bp should be the dominant peak.Clear ladder with NFR peak dominant (>50% of reads)No ladder (bad transposition) or only large fragments (over-digestion or poor nuclei)
TSS enrichmentRatio of read density at TSSs vs flanking regions. Active genes have accessible chromatin around their TSSs.>7 for human/mouse; >4 for most organisms<3: poor signal-to-noise; nuclei were too fragmented or sample had low viability
FRiP scoreFraction of reads in peaks: reads_in_peaks / total_mapped_reads>20% for ATAC; >1% for TF ChIP<5%: sample failed or wrong antibody (ChIP)
Mitochondrial fraction% of reads mapping to chrM<20%>40%: nuclei preparation was poor; cytoplasmic contamination
R -- Fragment size distribution and TSS enrichment plot
## ================================================================
## ATAC-seq QC: fragment size distribution and TSS enrichment
## Using ATACseqQC package (Bioconductor)
## ================================================================
## BiocManager::install("ATACseqQC")
library(ATACseqQC); library(ggplot2)

BAM <- "sample_ATAC_rep1_shifted_sorted.bam"
GENOME <- "BSgenome.Osativa.IRGSP1"  ## or hg38, mm10, etc.

## Fragment size distribution
fragSize <- fragSizeDist(BAM, bamFile.labels="ATAC-rep1")
## A well-prepared ATAC-seq sample shows a characteristic nucleosomal ladder:
## ~100 bp (NFR), ~200 bp (mono-nucleosomal), ~400 bp (di-nucleosomal), etc.

## TSS enrichment score with ATACseqQC
library(TxDb.Hsapiens.UCSC.hg38.knownGene)  ## or equivalent for your organism
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
seqlev <- paste0("chr", c(1:22, "X", "Y"))

## Get TSS positions
txs <- transcripts(txdb, filter=list(tx_chrom=seqlev))
pt <- SSreg(BAM, txs=txs, genome=GENOME, outPath="TSS_enrichment",
            seqlev=seqlev)
## Output: TSS_enrichment.pdf with enrichment profile

## Quick manual TSS enrichment with deepTools
## computeMatrix reference-point \
##   --referencePoint TSS \
##   --beforeRegionStartLength 2000 \
##   --afterRegionStartLength 2000 \
##   --binSize 10 \
##   --skipZeros \
##   -S sample_ATAC.bw \
##   -R genes.bed \
##   -o tss_matrix.gz

## plotHeatmap -m tss_matrix.gz -out tss_heatmap.png \
##   --colorMap RdBu_r --sortUsing sum
Download:

Step 3 — Peak Calling with MACS3

MACS3 (Model-based Analysis of ChIP-seq) identifies genomic regions with significantly enriched read coverage. The key concept is the fold enrichment: reads at a peak vs the local background and genome-wide background.

MACS3 Peak Significance Formula
lambda_local = max(lambda_BG, lambda_1k, lambda_5k, lambda_10k)
Poisson p-value = P(X >= observed_count | lambda_local)
Fold enrichment = observed_depth / lambda_local
lambda_BGGenome-wide background: total_reads / genome_size * window_size lambda_1k/5k/10kLocal background estimated from 1 kb, 5 kb, 10 kb windows around the peak. This corrects for local copy number variations and repetitive regions. lambda_localMaximum of all lambda estimates -- uses the most conservative (highest) background, minimising false positives FE = 1No enrichment vs background FE > 5Strong peak signal; typical threshold for high-quality TF ChIP-seq and ATAC-seq
Bash -- MACS3 peak calling for ATAC-seq and ChIP-seq
## ================================================================
## MACS3 peak calling
## ATAC-seq: no control, --nomodel, --shift -100 --extsize 200
## ChIP-seq TF: with input control, narrow peaks
## ChIP-seq histone: broad peaks with --broad flag
## ================================================================

## --- ATAC-seq peak calling ---
macs3 callpeak \
  -t sample_ATAC_rep1_shifted_sorted.bam \
  -f BAMPE \             ## paired-end BAM with fragment coordinates
  --nomodel \            ## no model building (Tn5 shift already applied)
  --nolambda \           ## disable local lambda correction for ATAC
  --keep-dup all \       ## duplicates already removed in alignment step
  -g 3.8e8 \             ## effective genome size (rice: 3.8e8; human: 2.7e9)
  --outdir macs3_atac/ \
  -n sample_ATAC_rep1 \
  -B \                   ## output bedGraph for visualisation
  -q 0.05 \              ## q-value (FDR) threshold for peak calling
  2> macs3_atac/sample_ATAC_rep1_macs3.log

## Peak output files:
## _peaks.narrowPeak: peak coordinates + summit + -log10(q-value) + fold enrichment
## _summits.bed: single-base summit positions (best for motif analysis)
## _control_lambda.bdg, _treat_pileup.bdg: bedGraph tracks

echo "ATAC peaks called: $(wc -l < macs3_atac/sample_ATAC_rep1_peaks.narrowPeak)"

## --- ChIP-seq TF narrow peaks (with input control) ---
macs3 callpeak \
  -t chip_TF_rep1.bam \
  -c chip_input_rep1.bam \   ## matched input control (REQUIRED for ChIP)
  -f BAMPE \
  -g 2.7e9 \
  --outdir macs3_chip_TF/ \
  -n TF_rep1 \
  -q 0.01 \                  ## stricter threshold for ChIP-seq
  --call-summits            ## output summit positions for TF binding

## --- ChIP-seq histone broad peaks (H3K27me3, H3K36me3 etc.) ---
macs3 callpeak \
  -t chip_H3K27me3_rep1.bam \
  -c chip_input_rep1.bam \
  -f BAMPE \
  --broad \                  ## broad peak calling mode
  --broad-cutoff 0.1 \       ## q-value for broad region
  -g 2.7e9 \
  --outdir macs3_chip_broad/ \
  -n H3K27me3_rep1

## Step: filter blacklisted regions (known ENCODE artefact regions)
## Download from: https://github.com/Boyle-Lab/Blacklist
bedtools intersect \
  -v \                       ## -v = keep entries NOT overlapping blacklist
  -a macs3_atac/sample_ATAC_rep1_peaks.narrowPeak \
  -b hg38_blacklist_v2.bed > sample_ATAC_rep1_peaks_filtered.narrowPeak

echo "Peaks after blacklist filter: $(wc -l < sample_ATAC_rep1_peaks_filtered.narrowPeak)"
Download:

Step 4 — IDR: Irreproducible Discovery Rate

IDR (Li et al. 2011) is the ENCODE standard for measuring reproducibility between replicates. It ranks peaks by signal in both replicates and tests whether the top-ranked peaks are consistently high in both -- a truly reproducible peak should be highly ranked in both replicates.

Bash -- IDR reproducibility between ATAC-seq replicates
## ================================================================
## IDR (Irreproducible Discovery Rate) analysis
## Requires: idr package (pip install idr)
## Input: narrowPeak files from two replicates, sorted by -log10(p-value)
## ================================================================

## Sort peaks by p-value (column 8 of narrowPeak = -log10 q-value)
sort -k8,8rn macs3_atac/sample_ATAC_rep1_peaks.narrowPeak \
  > rep1_sorted.narrowPeak
sort -k8,8rn macs3_atac/sample_ATAC_rep2_peaks.narrowPeak \
  > rep2_sorted.narrowPeak

## Run IDR
idr \
  --samples rep1_sorted.narrowPeak rep2_sorted.narrowPeak \
  --input-file-type narrowPeak \
  --rank p.value \          ## rank by p-value (most common for ATAC-seq)
  --output-file idr_output.txt \
  --output-file-type narrowPeak \
  --idr-threshold 0.05 \    ## IDR < 0.05 = reproducible peak
  --plot \                  ## generate IDR diagnostic plot
  --log-output-file idr.log

## Count IDR-passing peaks
IDR_PEAKS=$(wc -l < idr_output.txt)
echo "IDR-reproducible peaks (IDR < 0.05): ${IDR_PEAKS}"
## Expect: for a good ATAC-seq experiment, 60-80% of peaks in each replicate
## should pass IDR

## IDR output column 12 = IDR score; filter at IDR < 0.05 (local IDR)
awk '$12 < 0.05' idr_output.txt > idr_reproducible_peaks.bed
echo "High-reproducibility peaks: $(wc -l < idr_reproducible_peaks.bed)"

## Create consensus peak set (union of IDR peaks from both replicates)
## Standard: extend peaks to 500 bp centred on summit for consistency
awk 'BEGIN{OFS="\t"} {
  summit = $2 + $10;  ## $10 = summit offset from peak start
  start = summit - 250;
  end   = summit + 250;
  if (start < 0) start = 0;
  print $1, start, end, $4, $5
}' idr_reproducible_peaks.bed | \
  sort -k1,1 -k2,2n | \
  bedtools merge -i - > consensus_peaks_500bp.bed

echo "Consensus peak set: $(wc -l < consensus_peaks_500bp.bed) peaks"
Download:

Step 5 — Motif Analysis with HOMER and MEME

Bash -- De novo and known motif enrichment analysis
## ================================================================
## Motif analysis: HOMER de novo + MEME suite
## Input: peak summit BED file (100-200 bp centred on summit)
## ================================================================

PEAKS="consensus_peaks_500bp.bed"
GENOME_FASTA="genome.fasta"
GENOME_DIR="homer_genome/"   ## pre-prepared by HOMER: prepareGenome.pl

## Step 1: HOMER de novo motif finding
## HOMER searches for enriched 8-12 bp sequences in peaks vs background
findMotifsGenome.pl \
  ${PEAKS} \
  ${GENOME_DIR} \           ## genome prepared with HOMER prepareGenome.pl
  motif_results/ \
  -size 200 \               ## sequence window around peak summit
  -mask \                   ## mask repeat regions
  -p 16 \                   ## threads
  -len 8,10,12              ## motif lengths to search

## Output: motif_results/homerResults.html -- interactive motif report
## Top de novo motifs with E-values, % targets, % background, known matches
## motif_results/knownResults.txt -- enrichment of known TF motifs from JASPAR

## Step 2: MEME-ChIP for comprehensive motif analysis
## First extract sequences from peaks
bedtools getfasta \
  -fi ${GENOME_FASTA} \
  -bed ${PEAKS} \
  -fo peak_sequences.fasta

## Run MEME-ChIP
meme-chip \
  -oc meme_results/ \
  -db /path/to/JASPAR2022_CORE_plants_non-redundant.meme \  ## motif database
  -meme-p 16 \
  -meme-nmotifs 15 \
  -meme-minw 6 -meme-maxw 20 \
  peak_sequences.fasta

## Step 3: Motif scanning (find all instances of a known motif in peaks)
## Scan for all JASPAR TF binding sites in your peak set
fimo \
  --oc fimo_results/ \
  --thresh 1e-4 \
  JASPAR2022_CORE_plants.meme \
  peak_sequences.fasta
## Output: fimo_results/fimo.tsv -- all significant motif hits
Download:

Step 6 — Differential Accessibility with DESeq2

R -- Differential accessibility analysis with DESeq2
## ================================================================
## Differential accessibility (ATAC-seq) using DESeq2
## Same approach works for differential ChIP-seq binding
## ================================================================
## BiocManager::install(c("DESeq2", "chromVAR", "GenomicRanges"))
library(DESeq2); library(dplyr); library(ggplot2)

## Step 1: Count reads in consensus peaks for each sample
## Use featureCounts (Subread package) or bedtools multicov
## featureCounts -a consensus_peaks_500bp.saf -F SAF -p -T 8 \
##   -o peak_counts.txt rep1.bam rep2.bam treatment1.bam treatment2.bam

## Load count matrix
counts <- read.table("peak_counts.txt", header=TRUE, skip=1, row.names=1)
count_matrix <- as.matrix(counts[, 6:ncol(counts)])  ## columns 6+ = sample counts
colnames(count_matrix) <- gsub(".*/(.*)\\.bam", "\\1", colnames(count_matrix))

## Sample metadata
meta <- data.frame(
  sample    = colnames(count_matrix),
  condition = c("control","control","treatment","treatment"),
  row.names = colnames(count_matrix)
)

## Step 2: Run DESeq2
dds <- DESeqDataSetFromMatrix(
  countData = count_matrix,
  colData   = meta,
  design    = ~condition
)
## Filter low-count peaks
dds <- dds[rowSums(counts(dds) >= 10) >= 2, ]
dds <- DESeq(dds)

## Step 3: Extract results
res <- results(dds,
               contrast = c("condition","treatment","control"),
               alpha    = 0.05)
res_df <- as.data.frame(res) %>%
  tibble::rownames_to_column("peak_id") %>%
  filter(!is.na(padj)) %>%
  arrange(padj)

cat("Gained peaks (treatment > control, FC>2, FDR<5%):",
    sum(res_df$padj < 0.05 & res_df$log2FoldChange > 1, na.rm=TRUE), "\n")
cat("Lost peaks (control > treatment, FC<0.5, FDR<5%):",
    sum(res_df$padj < 0.05 & res_df$log2FoldChange < -1, na.rm=TRUE), "\n")

## Step 4: MA plot
ggplot(res_df, aes(baseMean, log2FoldChange,
                   colour=padj < 0.05 & abs(log2FoldChange) > 1)) +
  geom_point(size=0.5, alpha=0.6) +
  scale_colour_manual(values=c("FALSE"="#adb5bd","TRUE"="#e63946"),
                      labels=c("Not sig","Significant"), name=NULL) +
  scale_x_log10() +
  geom_hline(yintercept=c(-1,0,1), linetype=c("dashed","solid","dashed"),
             colour=c("blue","grey50","red"), linewidth=0.5) +
  labs(x="Mean normalised count", y="Log2 fold change (treatment / control)",
       title="Differential chromatin accessibility (DESeq2)",
       subtitle=paste0(sum(res_df$padj < 0.05 & res_df$log2FoldChange > 1, na.rm=TRUE),
                       " gained peaks; ",
                       sum(res_df$padj < 0.05 & res_df$log2FoldChange < -1, na.rm=TRUE),
                       " lost peaks")) +
  theme_bw(base_size=12)
ggsave("differential_accessibility_MA.png", width=8, height=6, dpi=200)
Download:

Step 7 — Visualisation: Heatmaps and Genome Browser Tracks

Bash + R -- deepTools heatmap and profile plot
## ================================================================
## Heatmap and average profile plots using deepTools
## ================================================================

## computeMatrix: compute coverage in windows around each peak
computeMatrix reference-point \
  --referencePoint center \
  --beforeRegionStartLength 2000 \
  --afterRegionStartLength 2000 \
  --binSize 25 \
  --skipZeros \
  --missingDataAsZero \
  -S sample_ATAC_control_rep1.bw \
     sample_ATAC_control_rep2.bw \
     sample_ATAC_treatment_rep1.bw \
     sample_ATAC_treatment_rep2.bw \
  -R consensus_peaks_500bp.bed \
  -o atac_heatmap_matrix.gz \
  -p 16

## plotHeatmap: sorted by signal, with colour scale
plotHeatmap \
  -m atac_heatmap_matrix.gz \
  --out atac_peak_heatmap.png \
  --colorMap RdYlBu_r \
  --zMax 5 --zMin -1 \
  --sortUsing sum \
  --samplesLabel Control_R1 Control_R2 Treatment_R1 Treatment_R2 \
  --regionsLabel "Consensus peaks" \
  --dpi 200

## plotProfile: average signal across all peaks
plotProfile \
  -m atac_heatmap_matrix.gz \
  --out atac_avg_profile.png \
  --plotType lines \
  --perGroup \
  --colors "#2166ac" "#4393c3" "#d6604d" "#b2182b" \
  --samplesLabel Control_R1 Control_R2 Treatment_R1 Treatment_R2

## ================================================================
## R: volcano plot for differential accessibility
## ================================================================
library(ggplot2); library(ggrepel)
res_df$label <- ifelse(res_df$padj < 0.001 & abs(res_df$log2FoldChange) > 2,
                        res_df$peak_id, "")
ggplot(res_df, aes(log2FoldChange, -log10(padj),
                   colour=padj < 0.05 & abs(log2FoldChange) > 1)) +
  geom_point(size=0.8, alpha=0.7) +
  geom_text_repel(aes(label=label), size=2.5, max.overlaps=20) +
  scale_colour_manual(values=c("FALSE"="#adb5bd","TRUE"="#e63946"),
                      labels=c("Not sig","FDR<5%, FC>2"), name=NULL) +
  geom_vline(xintercept=c(-1,1), linetype="dashed", colour="grey40") +
  geom_hline(yintercept=-log10(0.05), linetype="dashed", colour="grey40") +
  labs(x="Log2 fold change", y="-log10(adjusted p-value)",
       title="Differential ATAC-seq peaks: Volcano plot") +
  theme_bw(base_size=12)
ggsave("differential_accessibility_volcano.png", width=8, height=6, dpi=200)
Download:

Common Pitfalls

ProblemSymptomFix
Low FRiP in ATAC-seqFRiP < 5%; too many background readsMost likely cause: poor nuclei isolation (cytoplasmic contamination). Repeat nuclei prep. Alternatively: insufficient sequencing depth or wrong cell type being assayed.
Mitochondrial reads dominateMT fraction > 40%Mitochondria are highly accessible. Improve nuclei isolation protocol: use IGEPAL-CA630 (0.1%) or NP-40 to lyse cells more cleanly. Add 0.1% digitonin. Always remove MT reads before peak calling.
No nucleosomal ladderFragment size distribution is flat; no peaks at 200, 400 bpOver-transposition (too much Tn5) or degraded DNA. Optimise Tn5 concentration (typically 2.5 uL per 50K cells). Check DNA integrity on Bioanalyzer before library prep.
ChIP-seq peaks without input controlThousands of peaks in gene-poor regions; peaks in blacklist regionsALWAYS use a matched input control for ChIP-seq. Input-less peak calling (--nolambda) is only appropriate for ATAC-seq.
IDR below 50% passingLess than half of peaks in each replicate are IDR-reproducibleCheck individual replicate quality (FRiP, TSS enrichment). Replicates may have batch effects (different cell passages, different operators). Consider whether the biological condition is inherently variable.
Differential analysis confounded by read depthSamples with more reads always have more "gained" peaksAlways normalise: DESeq2 uses size factors computed from the count matrix. Use quantile normalisation (chromVAR) for large-scale comparisons. Never compare raw peak counts between samples.