ChIP-Seq & ATAC-Seq Epigenomics

1 Overview

ChIP-Seq identifies where proteins (transcription factors, histones) bind DNA. ATAC-Seq maps open chromatin regions. Both produce genome-wide signal tracks analyzed similarly.

Assay	What It Measures	Peak Type	Typical Peaks
ChIP-Seq (TF)	Transcription factor binding sites	Narrow (sharp)	5,000–50,000
ChIP-Seq (Histone)	Histone modifications (H3K4me3, H3K27ac)	Broad (wide)	10,000–100,000
ATAC-Seq	Open chromatin / accessibility	Narrow + some broad	30,000–150,000
CUT&RUN	Protein-DNA binding (low background)	Narrow or broad	Similar to ChIP
CUT&Tag	Histone marks (low cell input)	Narrow or broad	Similar to ChIP

Standard Epigenomics Pipeline

QC & Trim — FastQC, fastp
Align — Bowtie2 (ChIP) or BWA-MEM2 (ATAC)
Filter — remove duplicates, MT reads, low MAPQ
Peak Call — MACS2/3
Visualize — deepTools bigWig, heatmaps
Differential — DiffBind or DESeq2 on peak counts
Motifs — HOMER, MEME-ChIP
Annotate — ChIPseeker, GREAT

2 QC & Alignment

Bash

# QC and trim
fastp -i sample_R1.fq.gz -I sample_R2.fq.gz \
  -o trimmed_R1.fq.gz -O trimmed_R2.fq.gz \
  --html qc_report.html -w 8

# Build Bowtie2 index (if not done)
bowtie2-build reference.fasta bt2_index

# Align with Bowtie2 (ChIP-Seq standard)
bowtie2 -x bt2_index \
  -1 trimmed_R1.fq.gz -2 trimmed_R2.fq.gz \
  --very-sensitive -p 16 --no-mixed --no-discordant \
  --maxins 1000 \
  | samtools sort -@ 8 -o aligned.bam
samtools index aligned.bam

# Check alignment stats
samtools flagstat aligned.bam
# Expect: >80% mapping rate for ChIP, >90% for ATAC

Download:

3 BAM Filtering

Remove duplicates, mitochondrial reads, multi-mappers, and blacklist regions.

Bash

# Remove duplicates with Picard
picard MarkDuplicates \
  I=aligned.bam O=dedup.bam M=dup_metrics.txt \
  REMOVE_DUPLICATES=true

# Alternative: samtools markdup
samtools markdup -r aligned.bam dedup.bam

# Filter: MAPQ>=30, proper pairs, no MT reads
samtools view -b -q 30 -f 2 -F 1804 dedup.bam \
  | grep -v "chrM" > filtered.bam
samtools index filtered.bam

# Remove ENCODE blacklist regions (human/mouse)
bedtools intersect -v -abam filtered.bam \
  -b blacklist.bed > clean.bam
samtools index clean.bam

# Check: how many usable reads?
samtools view -c clean.bam
# ChIP-Seq: aim for >10M reads. ATAC-Seq: >25M reads.

Download:

Mnemonic

"Dup → MAPQ → Mito → Blacklist" — DMMB

Filter in this order: Duplicates → MAPQ threshold → Mitochondrial → Blacklist regions. Skipping any step = noisy peaks.

4 Peak Calling with MACS2/3

Bash

# Narrow peaks (TF ChIP-Seq or ATAC-Seq)
macs3 callpeak \
  -t treatment.bam -c input.bam \
  -f BAMPE -g hs \
  -n my_experiment \
  --outdir peaks/ \
  -q 0.05 \
  --keep-dup all

# Broad peaks (histone marks: H3K36me3, H3K27me3)
macs3 callpeak \
  -t treatment.bam -c input.bam \
  -f BAMPE -g hs \
  -n histone_broad \
  --outdir peaks/ \
  --broad --broad-cutoff 0.1

# Common genome sizes: hs=human, mm=mouse, ce=C.elegans, dm=Drosophila
# For other organisms, calculate: effective genome size
# python -c "print(sum(l for l in genome_lengths if l > 0) * 0.8)"

# Key output files:
# peaks/my_experiment_peaks.narrowPeak  — peak coordinates + scores
# peaks/my_experiment_summits.bed       — peak summit positions
# peaks/my_experiment_treat_pileup.bdg  — signal track (bedGraph)

Download:

Parameter	Default	When to Change
`-g`	hs	Set to your organism's effective genome size
`-f BAMPE`	AUTO	Use `BAMPE` for paired-end to use fragment size
`-q`	0.05	More stringent: 0.01. More permissive for histone: 0.1
`--broad`	off	Enable for histone marks: H3K36me3, H3K27me3, H3K9me3
`--nomodel`	model on	Use for ATAC-Seq with `--shift -75 --extsize 150`

MACS3 Summary Output # Command: macs3 callpeak -t treatment.bam -c input.bam -f BAMPE -g hs # Fragment size = 180 # Total tags in treatment: 25,456,789 # Total tags in control: 22,345,678 # Paired peaks: 23,456 # Peak with FDR < 0.05: 18,234

5 ATAC-Seq Specific Steps

ATAC-Seq has unique preprocessing requirements due to the Tn5 transposase insertion offset.

Bash

# ATAC-Seq: Tn5 offset correction (+4bp/−5bp)
# alignmentSieve from deepTools handles this:
alignmentSieve -b filtered.bam \
  --ATACshift -o shifted.bam -p 16
samtools sort shifted.bam -o shifted_sorted.bam
samtools index shifted_sorted.bam

# Peak calling for ATAC-Seq (NO input control)
macs3 callpeak -t shifted_sorted.bam \
  -f BAMPE -g hs \
  --nomodel --shift -75 --extsize 150 \
  -n atac_peaks --outdir atac_peaks/ \
  -q 0.01 --keep-dup all --call-summits

# Fragment size distribution (QC check)
# Expect: nucleosome-free (<150bp), mono-nucleosome (150-300bp), etc.
samtools view -f 2 filtered.bam | awk '{print abs($9)}' | \
  sort -n | uniq -c | awk '{print $2"\t"$1}' > fragment_sizes.txt

Download:

ATAC-Seq QC Checks:

Fragment size distribution — should show clear nucleosome ladder (peaks at ~200, ~400, ~600 bp)
TSS enrichment — signal should pile up at transcription start sites (enrichment >5 = good)
FRiP (Fraction of Reads in Peaks) — should be >20% for good ATAC-Seq
Mitochondrial reads — should be <20% after filtering (high MT = failed experiment)

6 Differential Binding with DiffBind

R

library(DiffBind)

# Create sample sheet (CSV)
# SampleID, Condition, Replicate, bamReads, Peaks, PeakCaller
samples <- read.csv("samplesheet.csv")

# Load data
dba_obj <- dba(sampleSheet=samples)

# Count reads in consensus peak set
dba_obj <- dba.count(dba_obj, minOverlap=2)

# Normalize
dba_obj <- dba.normalize(dba_obj)

# Set up contrast
dba_obj <- dba.contrast(dba_obj, categories=DBA_CONDITION)

# Differential analysis (uses DESeq2 under the hood)
dba_obj <- dba.analyze(dba_obj, method=DBA_DESEQ2)

# Get results
results <- dba.report(dba_obj, th=0.05, fold=1)
write.csv(as.data.frame(results), "differential_peaks.csv")

# Visualization
dba.plotMA(dba_obj)
dba.plotVolcano(dba_obj)
dba.plotHeatmap(dba_obj, contrast=1, correlations=FALSE)
dba.plotPCA(dba_obj, DBA_CONDITION)

Download:

Thumb Rule — Replicates: Minimum 2 replicates per condition, ideally 3+. DiffBind needs replicates to estimate variance. Without replicates, you can only do overlap analysis (not statistical differential binding).

7 Motif Discovery

Bash

# HOMER — most popular for ChIP/ATAC motif analysis
# Find motifs enriched in peaks vs random background
findMotifsGenome.pl peaks.bed hg38 motif_output/ \
  -size 200 -mask -p 16

# Known motif enrichment only (faster)
findMotifsGenome.pl peaks.bed hg38 motif_output/ \
  -size 200 -mask -p 16 -nomotif

# MEME-ChIP — comprehensive motif analysis suite
meme-chip -meme-maxw 20 -meme-nmotifs 10 \
  -oc meme_output peak_sequences.fasta

# Annotate peaks with closest motif occurrence
annotatePeaks.pl peaks.bed hg38 -m motif_output/homerResults/motif1.motif \
  > peaks_with_motif.txt

Download:

HOMER Known Motif Results (example TF ChIP) Rank Motif Name p-value % Target % Background 1 CTCF(Zf)/... 1e-2345 45.2% 3.1% 2 BORIS(Zf)/... 1e-1234 38.7% 5.2% 3 AP-1(bZIP)/... 1e-456 22.1% 8.9% 4 GATA(Zf)/... 1e-234 15.3% 4.5%

8 deepTools Visualization

Bash

# Generate normalized bigWig signal tracks
bamCoverage -b clean.bam -o signal.bw \
  --normalizeUsing RPKM --binSize 10 \
  --extendReads 200 -p 16

# Compute signal matrix around genes (TSS ± 3kb)
computeMatrix reference-point \
  -S treatment.bw control.bw \
  -R genes.bed \
  --referencePoint TSS \
  -a 3000 -b 3000 \
  -o matrix.gz -p 16

# Profile plot (line plot showing signal around TSS)
plotProfile -m matrix.gz -o tss_profile.png \
  --perGroup --colors blue grey \
  --plotTitle "ChIP signal at TSS"

# Heatmap
plotHeatmap -m matrix.gz -o tss_heatmap.png \
  --colorMap RdYlBu_r --whatToShow "heatmap and colorbar" \
  --sortUsing mean

# Correlation between samples (QC)
multiBamSummary bins -b sample1.bam sample2.bam sample3.bam \
  -o multibam.npz -p 16
plotCorrelation -in multibam.npz --corMethod pearson \
  --whatToPlot heatmap -o correlation.png

# Fingerprint plot (QC: enrichment check)
plotFingerprint -b treatment.bam input.bam \
  -o fingerprint.png -p 16

Download:

9 Peak Annotation

R

library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# Read peaks
peaks <- readPeakFile("peaks.narrowPeak")

# Annotate peaks to genomic features
peakAnno <- annotatePeak(peaks, TxDb=txdb, annoDb="org.Hs.eg.db",
                          level="gene")

# Visualize feature distribution
plotAnnoPie(peakAnno)
plotAnnoBar(peakAnno)
plotDistToTSS(peakAnno)

# Gene Ontology enrichment on peak-associated genes
genes <- as.data.frame(peakAnno)$geneId
library(clusterProfiler)
ego <- enrichGO(gene=genes, OrgDb=org.Hs.eg.db, ont="BP",
                readable=TRUE, pAdjustMethod="BH", qvalueCutoff=0.05)
dotplot(ego, showCategory=15)

Download:

10 AI Prompt Guide

Epigenomics Analysis for Your Data

I have [ChIP-Seq / ATAC-Seq / CUT&RUN / CUT&Tag] data. Organism: [ORGANISM], genome build: [BUILD, e.g., hg38, mm10]. Number of samples: [N] treatment, [N] control/input, [N] replicates each. Target: [TF name / histone mark, e.g., H3K27ac]. Sequencing: [single-end / paired-end], [READ_LENGTH] bp. Please write a complete pipeline that: 1. Trims and QCs reads with fastp 2. Aligns with Bowtie2 3. Filters BAM (duplicates, MAPQ, MT, blacklist) 4. Calls peaks with MACS3 (narrow/broad as appropriate) 5. Generates bigWig tracks and TSS heatmaps with deepTools 6. Runs differential binding with DiffBind 7. Finds enriched motifs with HOMER 8. Annotates peaks with ChIPseeker Wrap in SLURM scripts for [CORES] CPUs, [RAM] GB.

11 Common Errors & Troubleshooting

MACS: "0 peaks called" or very few peaks

Check: (1) enough reads after filtering? (2) correct -g genome size? (3) correct -f format (BAMPE for PE)? (4) is the ChIP actually enriched? Check fingerprint plot.

DiffBind: "No consensus peaks"

Set minOverlap=1 to include peaks found in at least one sample. Default requires overlap across majority of replicates. Also check that peak files are non-empty and in the right format (narrowPeak or bed).

ATAC-Seq: Most reads map to mitochondria

Common problem — MT DNA is open chromatin. A good experiment should have <30% MT reads. If >50%, the library prep may need optimization. Filter MT reads post-alignment: samtools view -b input.bam | grep -v chrM.

HOMER: "Genome not found"

Install the genome: perl /path/to/homer/configureHomer.pl -install hg38. For non-model organisms, provide your own FASTA with -preparsedDir.

12 Organism-Specific Notes

Plants

No standard blacklist regions — create your own from input controls
Larger genomes need more sequencing depth (>30M reads for ATAC)
Use effective genome size for your species (calculate from assembly)
HOMER: install custom genome with -preparsedDir

Animals

ENCODE blacklists available for human (hg38) and mouse (mm10)
Well-established protocols and peak count expectations
CUT&RUN/CUT&Tag: use SEACR instead of MACS for peak calling (designed for low-background)

Microbes

Small genomes — fewer peaks but highly concentrated signal
No nucleosomes in prokaryotes (no ATAC-Seq) — use ChIP for TF binding
For yeast: MACS with -g 12e6

13 All Epigenomics Tools

MACS2/3

Gold standard peak caller

Peaks

SEACR

CUT&RUN/Tag peak caller (low bg)

Peaks

HOMER

Motif discovery + peak annotation

Motifs

MEME-ChIP

Comprehensive motif analysis suite

Motifs

DiffBind

Differential binding analysis (R)

Diff

deepTools

Signal viz: heatmaps, profiles, QC

Viz

ChIPseeker

Peak annotation + enrichment (R)

Annotation

GREAT

Regulatory region enrichment (web)

Enrichment

Genrich

ATAC-Seq peak caller (handles replicates)

ATAC

ChromHMM

Chromatin state segmentation

States

nf-core/chipseq

Full Nextflow pipeline

Pipeline

nf-core/atacseq

Full Nextflow ATAC pipeline

Pipeline

Mnemonics & Thumb Rules

Mnemonic

"Align → Filter → Peaks → Motifs → Diff" — AFPMD

The ChIP/ATAC pipeline in order: Align reads, Filter BAM, call Peaks, find Motifs, test Differential binding.

Thumb Rule

Narrow vs Broad Peaks

TF binding + H3K4me3 + H3K27ac = narrow. H3K36me3 + H3K27me3 + H3K9me3 = broad. ATAC-Seq = mostly narrow. Using the wrong mode = missed peaks.

Mnemonic

"FRiP = Fraction of Reads in Peaks"

FRiP >1% for ChIP, >20% for ATAC. Low FRiP = poor enrichment or too few peaks. High FRiP = great data quality.

Thumb Rule — Sequencing Depth: TF ChIP: 20M reads. Histone ChIP: 40M reads. ATAC-Seq: 50M reads (25M usable after filtering). Input control: match treatment depth.

Summary

You can now:

Align and filter ChIP-Seq and ATAC-Seq reads
Call narrow and broad peaks with MACS2/3
Handle ATAC-Seq specifics (Tn5 shift, nucleosome-free fragments)
Find differential binding with DiffBind
Discover enriched motifs with HOMER and MEME
Visualize signal with deepTools heatmaps and profiles
Annotate peaks to genes with ChIPseeker

Phylogenetics All Tutorials

On this page

1 Overview

Standard Epigenomics Pipeline

2 QC & Alignment

3 BAM Filtering

4 Peak Calling with MACS2/3

5 ATAC-Seq Specific Steps

6 Differential Binding with DiffBind

7 Motif Discovery

8 deepTools Visualization

9 Peak Annotation

10 AI Prompt Guide

Epigenomics Analysis for Your Data

11 Common Errors & Troubleshooting

MACS: "0 peaks called" or very few peaks

DiffBind: "No consensus peaks"

ATAC-Seq: Most reads map to mitochondria

HOMER: "Genome not found"

12 Organism-Specific Notes

Plants

Animals

Microbes

13 All Epigenomics Tools

MACS2/3

SEACR

HOMER

MEME-ChIP

DiffBind

deepTools

ChIPseeker

GREAT

Genrich

ChromHMM

nf-core/chipseq

nf-core/atacseq

Mnemonics & Thumb Rules

Summary