From raw reads to biological insight — alignment, peak calling with MACS2/3, differential binding with DiffBind, motif discovery with HOMER, and signal visualization with deepTools.
ChIP-Seq identifies where proteins (transcription factors, histones) bind DNA. ATAC-Seq maps open chromatin regions. Both produce genome-wide signal tracks analyzed similarly.
| Assay | What It Measures | Peak Type | Typical Peaks |
|---|---|---|---|
| ChIP-Seq (TF) | Transcription factor binding sites | Narrow (sharp) | 5,000–50,000 |
| ChIP-Seq (Histone) | Histone modifications (H3K4me3, H3K27ac) | Broad (wide) | 10,000–100,000 |
| ATAC-Seq | Open chromatin / accessibility | Narrow + some broad | 30,000–150,000 |
| CUT&RUN | Protein-DNA binding (low background) | Narrow or broad | Similar to ChIP |
| CUT&Tag | Histone marks (low cell input) | Narrow or broad | Similar to ChIP |
# QC and trim fastp -i sample_R1.fq.gz -I sample_R2.fq.gz \ -o trimmed_R1.fq.gz -O trimmed_R2.fq.gz \ --html qc_report.html -w 8 # Build Bowtie2 index (if not done) bowtie2-build reference.fasta bt2_index # Align with Bowtie2 (ChIP-Seq standard) bowtie2 -x bt2_index \ -1 trimmed_R1.fq.gz -2 trimmed_R2.fq.gz \ --very-sensitive -p 16 --no-mixed --no-discordant \ --maxins 1000 \ | samtools sort -@ 8 -o aligned.bam samtools index aligned.bam # Check alignment stats samtools flagstat aligned.bam # Expect: >80% mapping rate for ChIP, >90% for ATAC
Remove duplicates, mitochondrial reads, multi-mappers, and blacklist regions.
# Remove duplicates with Picard picard MarkDuplicates \ I=aligned.bam O=dedup.bam M=dup_metrics.txt \ REMOVE_DUPLICATES=true # Alternative: samtools markdup samtools markdup -r aligned.bam dedup.bam # Filter: MAPQ>=30, proper pairs, no MT reads samtools view -b -q 30 -f 2 -F 1804 dedup.bam \ | grep -v "chrM" > filtered.bam samtools index filtered.bam # Remove ENCODE blacklist regions (human/mouse) bedtools intersect -v -abam filtered.bam \ -b blacklist.bed > clean.bam samtools index clean.bam # Check: how many usable reads? samtools view -c clean.bam # ChIP-Seq: aim for >10M reads. ATAC-Seq: >25M reads.
"Dup → MAPQ → Mito → Blacklist" — DMMB
Filter in this order: Duplicates → MAPQ threshold → Mitochondrial → Blacklist regions. Skipping any step = noisy peaks.
# Narrow peaks (TF ChIP-Seq or ATAC-Seq) macs3 callpeak \ -t treatment.bam -c input.bam \ -f BAMPE -g hs \ -n my_experiment \ --outdir peaks/ \ -q 0.05 \ --keep-dup all # Broad peaks (histone marks: H3K36me3, H3K27me3) macs3 callpeak \ -t treatment.bam -c input.bam \ -f BAMPE -g hs \ -n histone_broad \ --outdir peaks/ \ --broad --broad-cutoff 0.1 # Common genome sizes: hs=human, mm=mouse, ce=C.elegans, dm=Drosophila # For other organisms, calculate: effective genome size # python -c "print(sum(l for l in genome_lengths if l > 0) * 0.8)" # Key output files: # peaks/my_experiment_peaks.narrowPeak — peak coordinates + scores # peaks/my_experiment_summits.bed — peak summit positions # peaks/my_experiment_treat_pileup.bdg — signal track (bedGraph)
| Parameter | Default | When to Change |
|---|---|---|
-g | hs | Set to your organism's effective genome size |
-f BAMPE | AUTO | Use BAMPE for paired-end to use fragment size |
-q | 0.05 | More stringent: 0.01. More permissive for histone: 0.1 |
--broad | off | Enable for histone marks: H3K36me3, H3K27me3, H3K9me3 |
--nomodel | model on | Use for ATAC-Seq with --shift -75 --extsize 150 |
ATAC-Seq has unique preprocessing requirements due to the Tn5 transposase insertion offset.
# ATAC-Seq: Tn5 offset correction (+4bp/−5bp)
# alignmentSieve from deepTools handles this:
alignmentSieve -b filtered.bam \
--ATACshift -o shifted.bam -p 16
samtools sort shifted.bam -o shifted_sorted.bam
samtools index shifted_sorted.bam
# Peak calling for ATAC-Seq (NO input control)
macs3 callpeak -t shifted_sorted.bam \
-f BAMPE -g hs \
--nomodel --shift -75 --extsize 150 \
-n atac_peaks --outdir atac_peaks/ \
-q 0.01 --keep-dup all --call-summits
# Fragment size distribution (QC check)
# Expect: nucleosome-free (<150bp), mono-nucleosome (150-300bp), etc.
samtools view -f 2 filtered.bam | awk '{print abs($9)}' | \
sort -n | uniq -c | awk '{print $2"\t"$1}' > fragment_sizes.txt
library(DiffBind)
# Create sample sheet (CSV)
# SampleID, Condition, Replicate, bamReads, Peaks, PeakCaller
samples <- read.csv("samplesheet.csv")
# Load data
dba_obj <- dba(sampleSheet=samples)
# Count reads in consensus peak set
dba_obj <- dba.count(dba_obj, minOverlap=2)
# Normalize
dba_obj <- dba.normalize(dba_obj)
# Set up contrast
dba_obj <- dba.contrast(dba_obj, categories=DBA_CONDITION)
# Differential analysis (uses DESeq2 under the hood)
dba_obj <- dba.analyze(dba_obj, method=DBA_DESEQ2)
# Get results
results <- dba.report(dba_obj, th=0.05, fold=1)
write.csv(as.data.frame(results), "differential_peaks.csv")
# Visualization
dba.plotMA(dba_obj)
dba.plotVolcano(dba_obj)
dba.plotHeatmap(dba_obj, contrast=1, correlations=FALSE)
dba.plotPCA(dba_obj, DBA_CONDITION)
# HOMER — most popular for ChIP/ATAC motif analysis # Find motifs enriched in peaks vs random background findMotifsGenome.pl peaks.bed hg38 motif_output/ \ -size 200 -mask -p 16 # Known motif enrichment only (faster) findMotifsGenome.pl peaks.bed hg38 motif_output/ \ -size 200 -mask -p 16 -nomotif # MEME-ChIP — comprehensive motif analysis suite meme-chip -meme-maxw 20 -meme-nmotifs 10 \ -oc meme_output peak_sequences.fasta # Annotate peaks with closest motif occurrence annotatePeaks.pl peaks.bed hg38 -m motif_output/homerResults/motif1.motif \ > peaks_with_motif.txt
# Generate normalized bigWig signal tracks bamCoverage -b clean.bam -o signal.bw \ --normalizeUsing RPKM --binSize 10 \ --extendReads 200 -p 16 # Compute signal matrix around genes (TSS ± 3kb) computeMatrix reference-point \ -S treatment.bw control.bw \ -R genes.bed \ --referencePoint TSS \ -a 3000 -b 3000 \ -o matrix.gz -p 16 # Profile plot (line plot showing signal around TSS) plotProfile -m matrix.gz -o tss_profile.png \ --perGroup --colors blue grey \ --plotTitle "ChIP signal at TSS" # Heatmap plotHeatmap -m matrix.gz -o tss_heatmap.png \ --colorMap RdYlBu_r --whatToShow "heatmap and colorbar" \ --sortUsing mean # Correlation between samples (QC) multiBamSummary bins -b sample1.bam sample2.bam sample3.bam \ -o multibam.npz -p 16 plotCorrelation -in multibam.npz --corMethod pearson \ --whatToPlot heatmap -o correlation.png # Fingerprint plot (QC: enrichment check) plotFingerprint -b treatment.bam input.bam \ -o fingerprint.png -p 16
library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
# Read peaks
peaks <- readPeakFile("peaks.narrowPeak")
# Annotate peaks to genomic features
peakAnno <- annotatePeak(peaks, TxDb=txdb, annoDb="org.Hs.eg.db",
level="gene")
# Visualize feature distribution
plotAnnoPie(peakAnno)
plotAnnoBar(peakAnno)
plotDistToTSS(peakAnno)
# Gene Ontology enrichment on peak-associated genes
genes <- as.data.frame(peakAnno)$geneId
library(clusterProfiler)
ego <- enrichGO(gene=genes, OrgDb=org.Hs.eg.db, ont="BP",
readable=TRUE, pAdjustMethod="BH", qvalueCutoff=0.05)
dotplot(ego, showCategory=15)
Check: (1) enough reads after filtering? (2) correct -g genome size? (3) correct -f format (BAMPE for PE)? (4) is the ChIP actually enriched? Check fingerprint plot.
Set minOverlap=1 to include peaks found in at least one sample. Default requires overlap across majority of replicates. Also check that peak files are non-empty and in the right format (narrowPeak or bed).
Common problem — MT DNA is open chromatin. A good experiment should have <30% MT reads. If >50%, the library prep may need optimization. Filter MT reads post-alignment: samtools view -b input.bam | grep -v chrM.
Install the genome: perl /path/to/homer/configureHomer.pl -install hg38. For non-model organisms, provide your own FASTA with -preparsedDir.
-preparsedDir-g 12e6Gold standard peak caller
PeaksCUT&RUN/Tag peak caller (low bg)
PeaksMotif discovery + peak annotation
MotifsComprehensive motif analysis suite
MotifsDifferential binding analysis (R)
DiffSignal viz: heatmaps, profiles, QC
VizPeak annotation + enrichment (R)
AnnotationRegulatory region enrichment (web)
EnrichmentATAC-Seq peak caller (handles replicates)
ATACChromatin state segmentation
StatesFull Nextflow pipeline
PipelineFull Nextflow ATAC pipeline
Pipeline"Align → Filter → Peaks → Motifs → Diff" — AFPMD
The ChIP/ATAC pipeline in order: Align reads, Filter BAM, call Peaks, find Motifs, test Differential binding.
Narrow vs Broad Peaks
TF binding + H3K4me3 + H3K27ac = narrow. H3K36me3 + H3K27me3 + H3K9me3 = broad. ATAC-Seq = mostly narrow. Using the wrong mode = missed peaks.
"FRiP = Fraction of Reads in Peaks"
FRiP >1% for ChIP, >20% for ATAC. Low FRiP = poor enrichment or too few peaks. High FRiP = great data quality.