ChIP-Seq & ATAC-Seq Epigenomics

From raw reads to biological insight — alignment, peak calling with MACS2/3, differential binding with DiffBind, motif discovery with HOMER, and signal visualization with deepTools.

~80 min Intermediate Bash / R / Python
Byte

1 Overview

ChIP-Seq identifies where proteins (transcription factors, histones) bind DNA. ATAC-Seq maps open chromatin regions. Both produce genome-wide signal tracks analyzed similarly.

AssayWhat It MeasuresPeak TypeTypical Peaks
ChIP-Seq (TF)Transcription factor binding sitesNarrow (sharp)5,000–50,000
ChIP-Seq (Histone)Histone modifications (H3K4me3, H3K27ac)Broad (wide)10,000–100,000
ATAC-SeqOpen chromatin / accessibilityNarrow + some broad30,000–150,000
CUT&RUNProtein-DNA binding (low background)Narrow or broadSimilar to ChIP
CUT&TagHistone marks (low cell input)Narrow or broadSimilar to ChIP
Standard Epigenomics Pipeline
  1. QC & Trim — FastQC, fastp
  2. Align — Bowtie2 (ChIP) or BWA-MEM2 (ATAC)
  3. Filter — remove duplicates, MT reads, low MAPQ
  4. Peak Call — MACS2/3
  5. Visualize — deepTools bigWig, heatmaps
  6. Differential — DiffBind or DESeq2 on peak counts
  7. Motifs — HOMER, MEME-ChIP
  8. Annotate — ChIPseeker, GREAT

2 QC & Alignment

Bash
# QC and trim
fastp -i sample_R1.fq.gz -I sample_R2.fq.gz \
  -o trimmed_R1.fq.gz -O trimmed_R2.fq.gz \
  --html qc_report.html -w 8

# Build Bowtie2 index (if not done)
bowtie2-build reference.fasta bt2_index

# Align with Bowtie2 (ChIP-Seq standard)
bowtie2 -x bt2_index \
  -1 trimmed_R1.fq.gz -2 trimmed_R2.fq.gz \
  --very-sensitive -p 16 --no-mixed --no-discordant \
  --maxins 1000 \
  | samtools sort -@ 8 -o aligned.bam
samtools index aligned.bam

# Check alignment stats
samtools flagstat aligned.bam
# Expect: >80% mapping rate for ChIP, >90% for ATAC
Download:

3 BAM Filtering

Remove duplicates, mitochondrial reads, multi-mappers, and blacklist regions.

Bash
# Remove duplicates with Picard
picard MarkDuplicates \
  I=aligned.bam O=dedup.bam M=dup_metrics.txt \
  REMOVE_DUPLICATES=true

# Alternative: samtools markdup
samtools markdup -r aligned.bam dedup.bam

# Filter: MAPQ>=30, proper pairs, no MT reads
samtools view -b -q 30 -f 2 -F 1804 dedup.bam \
  | grep -v "chrM" > filtered.bam
samtools index filtered.bam

# Remove ENCODE blacklist regions (human/mouse)
bedtools intersect -v -abam filtered.bam \
  -b blacklist.bed > clean.bam
samtools index clean.bam

# Check: how many usable reads?
samtools view -c clean.bam
# ChIP-Seq: aim for >10M reads. ATAC-Seq: >25M reads.
Download:
Mnemonic

"Dup → MAPQ → Mito → Blacklist" — DMMB

Filter in this order: Duplicates → MAPQ threshold → Mitochondrial → Blacklist regions. Skipping any step = noisy peaks.

4 Peak Calling with MACS2/3

Bash
# Narrow peaks (TF ChIP-Seq or ATAC-Seq)
macs3 callpeak \
  -t treatment.bam -c input.bam \
  -f BAMPE -g hs \
  -n my_experiment \
  --outdir peaks/ \
  -q 0.05 \
  --keep-dup all

# Broad peaks (histone marks: H3K36me3, H3K27me3)
macs3 callpeak \
  -t treatment.bam -c input.bam \
  -f BAMPE -g hs \
  -n histone_broad \
  --outdir peaks/ \
  --broad --broad-cutoff 0.1

# Common genome sizes: hs=human, mm=mouse, ce=C.elegans, dm=Drosophila
# For other organisms, calculate: effective genome size
# python -c "print(sum(l for l in genome_lengths if l > 0) * 0.8)"

# Key output files:
# peaks/my_experiment_peaks.narrowPeak  — peak coordinates + scores
# peaks/my_experiment_summits.bed       — peak summit positions
# peaks/my_experiment_treat_pileup.bdg  — signal track (bedGraph)
Download:
ParameterDefaultWhen to Change
-ghsSet to your organism's effective genome size
-f BAMPEAUTOUse BAMPE for paired-end to use fragment size
-q0.05More stringent: 0.01. More permissive for histone: 0.1
--broadoffEnable for histone marks: H3K36me3, H3K27me3, H3K9me3
--nomodelmodel onUse for ATAC-Seq with --shift -75 --extsize 150
MACS3 Summary Output # Command: macs3 callpeak -t treatment.bam -c input.bam -f BAMPE -g hs # Fragment size = 180 # Total tags in treatment: 25,456,789 # Total tags in control: 22,345,678 # Paired peaks: 23,456 # Peak with FDR < 0.05: 18,234

5 ATAC-Seq Specific Steps

ATAC-Seq has unique preprocessing requirements due to the Tn5 transposase insertion offset.

Bash
# ATAC-Seq: Tn5 offset correction (+4bp/−5bp)
# alignmentSieve from deepTools handles this:
alignmentSieve -b filtered.bam \
  --ATACshift -o shifted.bam -p 16
samtools sort shifted.bam -o shifted_sorted.bam
samtools index shifted_sorted.bam

# Peak calling for ATAC-Seq (NO input control)
macs3 callpeak -t shifted_sorted.bam \
  -f BAMPE -g hs \
  --nomodel --shift -75 --extsize 150 \
  -n atac_peaks --outdir atac_peaks/ \
  -q 0.01 --keep-dup all --call-summits

# Fragment size distribution (QC check)
# Expect: nucleosome-free (<150bp), mono-nucleosome (150-300bp), etc.
samtools view -f 2 filtered.bam | awk '{print abs($9)}' | \
  sort -n | uniq -c | awk '{print $2"\t"$1}' > fragment_sizes.txt
Download:
Byte key
ATAC-Seq QC Checks:
  • Fragment size distribution — should show clear nucleosome ladder (peaks at ~200, ~400, ~600 bp)
  • TSS enrichment — signal should pile up at transcription start sites (enrichment >5 = good)
  • FRiP (Fraction of Reads in Peaks) — should be >20% for good ATAC-Seq
  • Mitochondrial reads — should be <20% after filtering (high MT = failed experiment)

6 Differential Binding with DiffBind

R
library(DiffBind)

# Create sample sheet (CSV)
# SampleID, Condition, Replicate, bamReads, Peaks, PeakCaller
samples <- read.csv("samplesheet.csv")

# Load data
dba_obj <- dba(sampleSheet=samples)

# Count reads in consensus peak set
dba_obj <- dba.count(dba_obj, minOverlap=2)

# Normalize
dba_obj <- dba.normalize(dba_obj)

# Set up contrast
dba_obj <- dba.contrast(dba_obj, categories=DBA_CONDITION)

# Differential analysis (uses DESeq2 under the hood)
dba_obj <- dba.analyze(dba_obj, method=DBA_DESEQ2)

# Get results
results <- dba.report(dba_obj, th=0.05, fold=1)
write.csv(as.data.frame(results), "differential_peaks.csv")

# Visualization
dba.plotMA(dba_obj)
dba.plotVolcano(dba_obj)
dba.plotHeatmap(dba_obj, contrast=1, correlations=FALSE)
dba.plotPCA(dba_obj, DBA_CONDITION)
Download:
Thumb Rule — Replicates: Minimum 2 replicates per condition, ideally 3+. DiffBind needs replicates to estimate variance. Without replicates, you can only do overlap analysis (not statistical differential binding).

7 Motif Discovery

Bash
# HOMER — most popular for ChIP/ATAC motif analysis
# Find motifs enriched in peaks vs random background
findMotifsGenome.pl peaks.bed hg38 motif_output/ \
  -size 200 -mask -p 16

# Known motif enrichment only (faster)
findMotifsGenome.pl peaks.bed hg38 motif_output/ \
  -size 200 -mask -p 16 -nomotif

# MEME-ChIP — comprehensive motif analysis suite
meme-chip -meme-maxw 20 -meme-nmotifs 10 \
  -oc meme_output peak_sequences.fasta

# Annotate peaks with closest motif occurrence
annotatePeaks.pl peaks.bed hg38 -m motif_output/homerResults/motif1.motif \
  > peaks_with_motif.txt
Download:
HOMER Known Motif Results (example TF ChIP) Rank Motif Name p-value % Target % Background 1 CTCF(Zf)/... 1e-2345 45.2% 3.1% 2 BORIS(Zf)/... 1e-1234 38.7% 5.2% 3 AP-1(bZIP)/... 1e-456 22.1% 8.9% 4 GATA(Zf)/... 1e-234 15.3% 4.5%

8 deepTools Visualization

Bash
# Generate normalized bigWig signal tracks
bamCoverage -b clean.bam -o signal.bw \
  --normalizeUsing RPKM --binSize 10 \
  --extendReads 200 -p 16

# Compute signal matrix around genes (TSS ± 3kb)
computeMatrix reference-point \
  -S treatment.bw control.bw \
  -R genes.bed \
  --referencePoint TSS \
  -a 3000 -b 3000 \
  -o matrix.gz -p 16

# Profile plot (line plot showing signal around TSS)
plotProfile -m matrix.gz -o tss_profile.png \
  --perGroup --colors blue grey \
  --plotTitle "ChIP signal at TSS"

# Heatmap
plotHeatmap -m matrix.gz -o tss_heatmap.png \
  --colorMap RdYlBu_r --whatToShow "heatmap and colorbar" \
  --sortUsing mean

# Correlation between samples (QC)
multiBamSummary bins -b sample1.bam sample2.bam sample3.bam \
  -o multibam.npz -p 16
plotCorrelation -in multibam.npz --corMethod pearson \
  --whatToPlot heatmap -o correlation.png

# Fingerprint plot (QC: enrichment check)
plotFingerprint -b treatment.bam input.bam \
  -o fingerprint.png -p 16
Download:

9 Peak Annotation

R
library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# Read peaks
peaks <- readPeakFile("peaks.narrowPeak")

# Annotate peaks to genomic features
peakAnno <- annotatePeak(peaks, TxDb=txdb, annoDb="org.Hs.eg.db",
                          level="gene")

# Visualize feature distribution
plotAnnoPie(peakAnno)
plotAnnoBar(peakAnno)
plotDistToTSS(peakAnno)

# Gene Ontology enrichment on peak-associated genes
genes <- as.data.frame(peakAnno)$geneId
library(clusterProfiler)
ego <- enrichGO(gene=genes, OrgDb=org.Hs.eg.db, ont="BP",
                readable=TRUE, pAdjustMethod="BH", qvalueCutoff=0.05)
dotplot(ego, showCategory=15)
Download:

10 AI Prompt Guide

Epigenomics Analysis for Your Data
I have [ChIP-Seq / ATAC-Seq / CUT&RUN / CUT&Tag] data. Organism: [ORGANISM], genome build: [BUILD, e.g., hg38, mm10]. Number of samples: [N] treatment, [N] control/input, [N] replicates each. Target: [TF name / histone mark, e.g., H3K27ac]. Sequencing: [single-end / paired-end], [READ_LENGTH] bp. Please write a complete pipeline that: 1. Trims and QCs reads with fastp 2. Aligns with Bowtie2 3. Filters BAM (duplicates, MAPQ, MT, blacklist) 4. Calls peaks with MACS3 (narrow/broad as appropriate) 5. Generates bigWig tracks and TSS heatmaps with deepTools 6. Runs differential binding with DiffBind 7. Finds enriched motifs with HOMER 8. Annotates peaks with ChIPseeker Wrap in SLURM scripts for [CORES] CPUs, [RAM] GB.

11 Common Errors & Troubleshooting

MACS: "0 peaks called" or very few peaks

Check: (1) enough reads after filtering? (2) correct -g genome size? (3) correct -f format (BAMPE for PE)? (4) is the ChIP actually enriched? Check fingerprint plot.

DiffBind: "No consensus peaks"

Set minOverlap=1 to include peaks found in at least one sample. Default requires overlap across majority of replicates. Also check that peak files are non-empty and in the right format (narrowPeak or bed).

ATAC-Seq: Most reads map to mitochondria

Common problem — MT DNA is open chromatin. A good experiment should have <30% MT reads. If >50%, the library prep may need optimization. Filter MT reads post-alignment: samtools view -b input.bam | grep -v chrM.

HOMER: "Genome not found"

Install the genome: perl /path/to/homer/configureHomer.pl -install hg38. For non-model organisms, provide your own FASTA with -preparsedDir.

12 Organism-Specific Notes

Plants
  • No standard blacklist regions — create your own from input controls
  • Larger genomes need more sequencing depth (>30M reads for ATAC)
  • Use effective genome size for your species (calculate from assembly)
  • HOMER: install custom genome with -preparsedDir
Animals
  • ENCODE blacklists available for human (hg38) and mouse (mm10)
  • Well-established protocols and peak count expectations
  • CUT&RUN/CUT&Tag: use SEACR instead of MACS for peak calling (designed for low-background)
Microbes
  • Small genomes — fewer peaks but highly concentrated signal
  • No nucleosomes in prokaryotes (no ATAC-Seq) — use ChIP for TF binding
  • For yeast: MACS with -g 12e6

13 All Epigenomics Tools

MACS2/3

Gold standard peak caller

Peaks
SEACR

CUT&RUN/Tag peak caller (low bg)

Peaks
HOMER

Motif discovery + peak annotation

Motifs
MEME-ChIP

Comprehensive motif analysis suite

Motifs
DiffBind

Differential binding analysis (R)

Diff
deepTools

Signal viz: heatmaps, profiles, QC

Viz
ChIPseeker

Peak annotation + enrichment (R)

Annotation
GREAT

Regulatory region enrichment (web)

Enrichment
Genrich

ATAC-Seq peak caller (handles replicates)

ATAC
ChromHMM

Chromatin state segmentation

States
nf-core/chipseq

Full Nextflow pipeline

Pipeline
nf-core/atacseq

Full Nextflow ATAC pipeline

Pipeline

Mnemonics & Thumb Rules

Mnemonic

"Align → Filter → Peaks → Motifs → Diff" — AFPMD

The ChIP/ATAC pipeline in order: Align reads, Filter BAM, call Peaks, find Motifs, test Differential binding.

Thumb Rule

Narrow vs Broad Peaks

TF binding + H3K4me3 + H3K27ac = narrow. H3K36me3 + H3K27me3 + H3K9me3 = broad. ATAC-Seq = mostly narrow. Using the wrong mode = missed peaks.

Mnemonic

"FRiP = Fraction of Reads in Peaks"

FRiP >1% for ChIP, >20% for ATAC. Low FRiP = poor enrichment or too few peaks. High FRiP = great data quality.

Thumb Rule — Sequencing Depth: TF ChIP: 20M reads. Histone ChIP: 40M reads. ATAC-Seq: 50M reads (25M usable after filtering). Input control: match treatment depth.

Summary

Byte
You can now:
  1. Align and filter ChIP-Seq and ATAC-Seq reads
  2. Call narrow and broad peaks with MACS2/3
  3. Handle ATAC-Seq specifics (Tn5 shift, nucleosome-free fragments)
  4. Find differential binding with DiffBind
  5. Discover enriched motifs with HOMER and MEME
  6. Visualize signal with deepTools heatmaps and profiles
  7. Annotate peaks to genes with ChIPseeker