Recombination frequency → LOD scores → Kosambi/Haldane distances → marker ordering → final map.
A genetic linkage map is a diagram of markers arranged in order along each chromosome, with distances measured in centiMorgans (cM). Over short intervals, 1 cM corresponds to roughly a 1% chance of recombination between two points per meiosis. Unlike a physical map, which measures distance in base pairs, a genetic map measures how often crossing-over occurs.
| Term | Definition | Units |
|---|---|---|
| Recombination frequency (r) | Proportion of offspring with a recombinant genotype between two markers | Fraction 0–0.5 |
| LOD score | Logarithm of odds that two markers are linked vs. unlinked (r=0.5) | Dimensionless; threshold ≥ 3.0 |
| centiMorgan (cM) | Genetic distance unit: 1 cM = 1% chance of recombination between two markers per meiosis. Calculated from r via a mapping function. | cM (or Morgans, M = cM/100) |
| Mapping function | Formula that converts r to cM, correcting for double crossovers (Haldane, Kosambi) | — |
| Linkage group | Set of markers that all show pairwise LOD ≥ 3 with at least one other marker in the group = one chromosome | — |
The starting point for any linkage analysis is estimating the recombination frequency (r) between every pair of markers. For a RIL or F2 population, this is done by counting genotype combinations in the offspring.
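For one pair of markers in a RIL population, the calculation needs four genotype counts: the two parental classes (AA and BB) and the two recombinant classes (AB and BA). The recombination frequency is the number of recombinant offspring divided by the total number of offspring scored at both markers.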
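As a minimal sketch of that count-based estimate, the R snippet below uses made-up counts (the `counts` vector and its values are hypothetical, not data from this tutorial). It also applies the Haldane-Waddington relation R = 2r / (1 + 2r), which is an addition here: it assumes the RILs were produced by selfing, and converts the observed recombinant fraction into a per-meiosis frequency.

```r
# Hypothetical counts for two markers scored in a RIL population.
# Each name gives the genotype at marker 1 then marker 2
# (A = parent 1 allele, B = parent 2 allele).
counts <- c(AA = 42, AB = 8, BA = 6, BB = 44)

n_total       <- sum(counts)
n_recombinant <- counts[["AB"]] + counts[["BA"]]

# Observed proportion of recombinant lines
R_obs <- n_recombinant / n_total

# Selfed RILs pass through several rounds of meiosis, so R_obs overstates the
# per-meiosis frequency. Inverting R = 2r / (1 + 2r) gives r = R / (2 * (1 - R)).
r_meiosis <- R_obs / (2 * (1 - R_obs))

cat(sprintf("Recombinant lines: %d of %d\n", n_recombinant, n_total))
cat(sprintf("Observed recombinant fraction R = %.3f\n", R_obs))
cat(sprintf("Per-meiosis r (selfed RILs)     = %.4f\n", r_meiosis))
```

For a backcross or doubled-haploid population the observed fraction can be used as r directly; only the last conversion step is specific to selfed RILs.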
The LOD score tests whether two markers are significantly linked (r < 0.5) versus independent (r = 0.5). It is the log10 ratio of two likelihoods: the probability of observing the data if the markers are linked at frequency r, versus the probability if they are unlinked.
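The likelihood ratio can be sketched in a few lines of R. This version assumes the simplest case, where each offspring counts as one informative meiosis (a backcross-like binomial model); the function name `lod_score` and the example counts are hypothetical, and mapping software uses population-specific likelihoods instead.

```r
# Two-point LOD score under a binomial model:
#   LOD = log10( r^k * (1 - r)^(n - k) / 0.5^n )
# where k = recombinant offspring and n = total offspring.
lod_score <- function(n_recombinant, n_total, r = n_recombinant / n_total) {
  r <- min(r, 0.5)  # values above 0.5 carry no evidence for linkage
  n_parental <- n_total - n_recombinant

  # k * log10(p), with the 0 * log10(0) case defined as 0
  term <- function(k, p) if (k == 0) 0 else k * log10(p)

  loglik_linked   <- term(n_recombinant, r) + term(n_parental, 1 - r)
  loglik_unlinked <- n_total * log10(0.5)
  loglik_linked - loglik_unlinked
}

lod_tight    <- lod_score(14, 100)  # 14 recombinants in 100: well above 3
lod_unlinked <- lod_score(50, 100)  # r = 0.5: the two likelihoods are equal
lod_small    <- lod_score(0, 20)    # no recombinants in only 20 offspring

print(round(c(tight = lod_tight, unlinked = lod_unlinked, small = lod_small), 2))
```

The third call shows why population size matters: even with zero recombinants, 20 offspring give a LOD of only 20 × log10(2), about 6.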
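Recombination frequency r is not additive and is not linearly proportional to map distance: an even number of crossovers between two markers restores the parental combination and goes uncounted, so r underestimates distance for loosely linked markers and can never exceed 0.5. Two mapping functions correct for this. Haldane assumes crossovers occur independently (no interference); Kosambi allows for interference, so it gives slightly shorter distances than Haldane for the same r.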
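The two standard formulas are short enough to write out directly. The sketch below is plain base R; the function names `haldane_cM` and `kosambi_cM` are chosen here for illustration and are not part of any package.

```r
# Haldane:  d = -50 * ln(1 - 2r)               (no interference)
# Kosambi:  d =  25 * ln((1 + 2r) / (1 - 2r))  (allows for interference)
# Both return distance in cM and are defined for 0 <= r < 0.5.
haldane_cM <- function(r) -50 * log(1 - 2 * r)
kosambi_cM <- function(r)  25 * log((1 + 2 * r) / (1 - 2 * r))

r <- c(0.01, 0.05, 0.10, 0.20, 0.30, 0.40)

distances <- data.frame(
  r          = r,
  naive_cM   = 100 * r,                    # uncorrected: 1% recombination = 1 cM
  haldane_cM = round(haldane_cM(r), 2),
  kosambi_cM = round(kosambi_cM(r), 2)
)
print(distances)
```

At small r all three columns nearly agree; the correction only becomes large as r approaches 0.5, where both functions grow without bound.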
Before ordering markers, you must assign them to linkage groups (= chromosomes). Two markers belong to the same linkage group if their LOD score exceeds the threshold (typically LOD ≥ 3) and their recombination frequency is below a maximum (typically r ≤ 0.35–0.40). The threshold combination LOD=3 / r=0.35 is the JoinMap default.
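One way to picture grouping is as single-linkage clustering: any pair that passes both thresholds is joined, and a group is everything reachable through such pairs. The base-R sketch below assumes that reading; the matrices, marker names, and the helper names `set_pair` and `form_groups` are hypothetical, and real packages implement grouping with their own algorithms.

```r
# Hypothetical pairwise estimates for five markers (symmetric matrices).
markers <- c("m1", "m2", "m3", "m4", "m5")
rf  <- matrix(0.5, 5, 5, dimnames = list(markers, markers))
lod <- matrix(0.0, 5, 5, dimnames = list(markers, markers))

set_pair <- function(m, a, b, value) { m[a, b] <- value; m[b, a] <- value; m }
rf <- set_pair(rf, "m1", "m2", 0.05); lod <- set_pair(lod, "m1", "m2", 12.0)
rf <- set_pair(rf, "m2", "m3", 0.20); lod <- set_pair(lod, "m2", "m3", 4.5)
rf <- set_pair(rf, "m1", "m3", 0.24); lod <- set_pair(lod, "m1", "m3", 2.6)  # fails LOD
rf <- set_pair(rf, "m4", "m5", 0.10); lod <- set_pair(lod, "m4", "m5", 9.0)

form_groups <- function(rf, lod, min_lod = 3, max_rf = 0.35) {
  linked <- lod >= min_lod & rf <= max_rf   # pairs passing both thresholds
  diag(linked) <- FALSE
  group <- setNames(rep(NA_integer_, nrow(rf)), rownames(rf))
  next_id <- 0L
  for (start in rownames(rf)) {
    if (!is.na(group[[start]])) next        # already assigned
    next_id <- next_id + 1L
    queue <- start
    while (length(queue) > 0) {             # breadth-first walk over linked pairs
      current <- queue[1]
      queue <- queue[-1]
      if (!is.na(group[[current]])) next
      group[[current]] <- next_id
      neighbours <- names(which(linked[current, ]))
      queue <- c(queue, neighbours[is.na(group[neighbours])])
    }
  }
  group
}

groups <- form_groups(rf, lod)
print(groups)
```

Here m1 and m3 fail the LOD threshold as a pair but still land in the same group because both are linked to m2.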
Once markers are assigned to linkage groups, they must be ordered along each chromosome. The simplest criterion picks the order that minimises the sum of adjacent recombination frequencies (SARF), which in practice also gives the shortest total map length in cM.
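For a handful of markers the SARF criterion can be checked by brute force. The sketch below enumerates every order of four hypothetical markers and scores each one; the matrix values and the helper names `sarf` and `permutations` are made up for illustration. Exhaustive search grows factorially, so this is only a teaching device, which is why heuristics such as MSTmap are used for real datasets.

```r
# Hypothetical pairwise recombination fractions for four markers.
markers <- c("a", "b", "c", "d")
rf <- matrix(c(0.00, 0.22, 0.08, 0.30,
               0.22, 0.00, 0.15, 0.09,
               0.08, 0.15, 0.00, 0.23,
               0.30, 0.09, 0.23, 0.00),
             nrow = 4, byrow = TRUE, dimnames = list(markers, markers))

# Sum of adjacent recombination fractions for one candidate order
sarf <- function(order, rf) {
  sum(rf[cbind(order[-length(order)], order[-1])])
}

# All permutations of a vector (recursive; fine for small n only)
permutations <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (rest in permutations(x[-i])) out[[length(out) + 1]] <- c(x[i], rest)
  }
  out
}

orders <- permutations(markers)
scores <- vapply(orders, sarf, numeric(1), rf = rf)
best   <- orders[[which.min(scores)]]

cat("Best order:", paste(best, collapse = " - "),
    " SARF =", round(min(scores), 2), "\n")
```

Every order and its mirror image score the same, which is why the orientation of a linkage group has to be fixed from outside information.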
######################################################################
## STEP 6: R pipeline for linkage map construction
## Tools: R/qtl (read cross, check data) + ASMap (fast MSTmap algorithm)
## ASMap (https://cran.r-project.org/package=ASMap) wraps the MSTmap
## algorithm, a fast marker ordering method that scales to large SNP
## datasets (thousands of markers).
##
## Install: install.packages(c("qtl", "ASMap", "ggplot2", "dplyr", "gtools"))
######################################################################
library(qtl)
library(ASMap)
library(ggplot2)
## ================================================================
## FORMAT: R/qtl CSV format
## ================================================================
## Row 1: marker names (columns) and phenotype column names
## Row 2: chromosome assignment (use NA until map is built)
## Row 3: position in cM (use NA until map is built)
## Rows 4+: individual ID + genotype codes
##
## Genotype codes for RIL:
## A = homozygous parent 1 (AA)
## B = homozygous parent 2 (BB)
## - = missing data
##
## Genotype codes for F2:
## A = AA, H = AB (heterozygous), B = BB, - = missing
## Example: read a RIL cross from CSV
## cross <- read.cross("csv",
## file = "myRIL_genotypes.csv",
## genfile = NULL,
## na.strings = c("-", "NA"),
## genotypes = c("A","H","B"))
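## Example data. Neither R/qtl nor ASMap ships a dataset named "soybean",
## so this walkthrough borrows ASMap's doubled-haploid example cross (mapDH)
## and keeps the object name `soybean` used in the steps below.
## Replace it with your own cross from read.cross().
data(mapDH, package = "ASMap")
soybean <- mapDH
print(summary(soybean))
## mstmap.cross() accepts cross classes "bc", "dh", "riself" and "bcsft", and
## reads line IDs from the phenotype column named by id ("Genotype" by default).
## Other designs must be converted first, e.g. an F6 population:
## cross <- convert2bcsft(cross, BC.gen=0, F.gen=6)
## or RILs by selfing: cross <- convert2riself(cross)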
## ================================================================
## STEP 6a: Data quality checks before mapping
## ================================================================
## 1. Check for missing data: markers with > 20% missing genotypes
## are unreliable and should be removed.
## 2. Check for duplicate markers (identical genotype vectors)
## 3. Check for segregation distortion (marker deviates from the expected
## 50:50 or 25:50:25 ratio; may indicate selection)
## Per-marker percentage of individuals genotyped
## (summary(cross) only reports overall missingness, so count per marker)
geno_freq <- 100 * ntyped(soybean, "mar") / nind(soybean)
low_geno <- names(geno_freq[geno_freq < 80]) ## < 80% genotyped = problematic
cat("Markers with >20% missing data:", length(low_geno), "\n")
if (length(low_geno) > 0) print(low_geno)
## Remove problematic markers
if (length(low_geno) > 0) {
soybean <- drop.markers(soybean, low_geno)
cat("Cleaned cross:", totmar(soybean), "markers remaining\n")
}
## Check for duplicate markers (same genotype vector = remove one)
## dup_marks <- findDupMarkers(soybean, exact.only=TRUE)
## soybean <- drop.markers(soybean, unlist(dup_marks))
## Segregation distortion chi-sq test
gt <- geno.table(soybean)
seg_dist <- gt[gt$P.value < 0.001,] ## markers distorted at p < 0.001
cat("Markers with segregation distortion (p<0.001):", nrow(seg_dist), "\n")
## ================================================================
## STEP 6b: Linkage group formation using ASMap
## ================================================================
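## mstmap.cross: fast minimum-spanning-tree (MSTmap) grouping and ordering
## p.value = 1e-6 : strict grouping threshold (roughly LOD 6; ASMap default)
## p.value = 1e-3 : permissive (roughly LOD 3; may suit smaller populations)
## The p-value/LOD correspondence is approximate and depends on population size.
## dist.fun = "kosambi" : Kosambi mapping function (default; recommended)
## trace = TRUE : write the detailed MSTmap log to a file (MSToutput.txt)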
soybean_map <- mstmap.cross(
soybean,
bychr = FALSE, ## FALSE: let ASMap find linkage groups automatically
trace = TRUE,
p.value = 1e-4, ## LOD ≈ 4 threshold for linkage grouping
noMap.dist = 15, ## isolated marker sets farther than this (cM) from the rest are split off
noMap.size = 0, ## largest marker set treated as isolated (0 = turn this check off)
anchor = FALSE,
detectBadData = TRUE, ## flag likely genotyping errors automatically
dist.fun = "kosambi" ## mapping function
)
## ================================================================
## STEP 6c: Examine the genetic map
## ================================================================
## summary: number of chromosomes, markers per LG, total length
summary(soybean_map)
## Total map length
total_len <- sum(chrlen(soybean_map))
cat("Total map length:", round(total_len,1), "cM\n")
cat("Number of linkage groups:", nchr(soybean_map), "\n")
cat("Markers per linkage group:\n")
print(nmar(soybean_map))
## ================================================================
## STEP 6d: Visualise the linkage map
## ================================================================
## plotMap: standard R/qtl map plot
png("linkage_map.png", width=1200, height=600, res=120)
plotMap(soybean_map,
main = "Genetic Linkage Map",
show.marker.names = FALSE) ## set TRUE if few markers
dev.off()
## ================================================================
## STEP 6e: Export map positions for downstream QTL analysis
## ================================================================
## Save the genetic map as a CSV
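## R/qtl cross objects have no $gmap element; pull.map() returns the map
## as a named list with one position vector per linkage group.
map_list <- pull.map(soybean_map)
map_positions <- do.call(rbind, lapply(names(map_list), function(chr) {
  positions <- map_list[[chr]]
  data.frame(marker=names(positions), chr=chr,
             pos_cM=round(as.numeric(positions),3),
             row.names=NULL, stringsAsFactors=FALSE)
}))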
write.csv(map_positions, "genetic_map_positions.csv", row.names=FALSE)
cat("Map saved: genetic_map_positions.csv\n")
head(map_positions, 10)
######################################################################
## STEP 7: Publication-quality linkage map visualisation
## Creates a chromosome-stick plot with marker tick marks,
## chromosome lengths, and grouped by linkage group
######################################################################
library(ggplot2)
library(dplyr)
## Load the exported map positions (or use from soybean_map above)
map_df <- read.csv("genetic_map_positions.csv")
## Ensure chromosomes are ordered numerically
map_df$chr <- factor(map_df$chr, levels=gtools::mixedsort(unique(map_df$chr)))
## Per-chromosome summary for drawing chromosome sticks
chr_summary <- map_df %>%
group_by(chr) %>%
summarise(
total_cM = max(pos_cM),
n_markers = n(),
.groups = "drop"
)
cat("Chromosome summary:\n")
print(chr_summary)
## ----------------------------------------------------------------
## Plot: chromosome sticks with marker ticks
## ----------------------------------------------------------------
ggplot() +
## Chromosome backbone (thick vertical line)
geom_segment(data=chr_summary,
aes(x=chr, xend=chr, y=0, yend=total_cM),
linewidth=3, color="#2c3e50", lineend="round") +
## Marker tick marks
geom_segment(data=map_df,
aes(x=as.numeric(chr)-0.18,
xend=as.numeric(chr)+0.18,
y=pos_cM, yend=pos_cM),
linewidth=0.35, color="#e74c3c", alpha=0.8) +
## Chromosome labels at top
geom_text(data=chr_summary,
aes(x=chr, y=-3, label=paste0("LG", chr)),
size=3, fontface="bold", vjust=1) +
## Chromosome length label at bottom
geom_text(data=chr_summary,
aes(x=chr, y=total_cM+3,
label=paste0(round(total_cM,0), " cM")),
size=2.5, color="#7f8c8d", vjust=0) +
scale_y_reverse(name="Map position (cM)") +
scale_x_discrete(name=NULL) +
labs(
title = "Genetic Linkage Map",
subtitle = paste0(nrow(map_df), " markers — ",
nlevels(map_df$chr), " linkage groups — ",
round(sum(chr_summary$total_cM),0), " cM total")
) +
theme_minimal(base_size=12) +
theme(
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
panel.grid = element_blank(),
plot.title = element_text(face="bold")
)
ggsave("linkage_map_publication.png", width=10, height=7, dpi=200)
cat("Saved: linkage_map_publication.png\n")
| Problem | Symptom | Diagnosis & Fix |
|---|---|---|
| Too many linkage groups | More groups than expected chromosomes | LOD threshold is too high. Lower from LOD=6 to LOD=3 in mstmap(p.value=1e-3). Also check for missing data: markers with >30% missing inflate estimated r and break groups. |
| Too few linkage groups | Markers from different chromosomes grouped together | LOD threshold is too low OR maximum r threshold too high. Try LOD=4 / r≤0.35. Also check for contamination — a mixed-up sample with wrong parental alleles forces distant markers to look linked. |
| Inverted marker order | LOD profile or QTL peaks on wrong side of chromosome | The orientation of a linkage group (left-right) is arbitrary. Flip the chromosome in a map list: map$chr1 <- max(map$chr1) - rev(map$chr1). For an R/qtl cross object, flip.order(cross, chr) reverses the marker order directly. Compare to a physical map or known marker anchors to choose the correct orientation. |
| Very long gaps in the map | One interval >30–40 cM while others are <10 cM | Missing markers in that region. Check if any markers failed QC that span the gap. Add more markers (GBS, KASP) targeting that chromosomal region. Gaps >50 cM are unreliable for QTL interval mapping — QTL confidence intervals will be inflated. |
| Segregation distortion hotspot | Region of chromosome with consistently skewed allele frequencies | May be a real biological signal (locus affecting viability/fertility). Do NOT remove distorted markers — they provide important positional information. JoinMap handles distorted markers in its maximum likelihood framework. Report the distortion in your manuscript. |
| Double crossover artefacts | Single individual shows recombination in two very close intervals | Likely a genotyping error, not a true double crossover. R/qtl calc.errorlod() computes genotyping-error LOD scores and top.errorlod() lists the suspect genotypes. Set the error rate to 0.01–0.05, e.g. calc.genoprob(error.prob=0.01), to downweight suspect genotypes in QTL analysis. |
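The flip expression from the "Inverted marker order" row can be checked on a toy example. The named vector `lg` and the helper `flip_positions` below are hypothetical; the point is only that reversing a named position vector and subtracting from the maximum keeps marker names attached to the right positions and preserves interval lengths.

```r
# Hypothetical linkage group: marker positions in cM, named by marker.
lg <- c(m1 = 0.0, m2 = 4.2, m3 = 11.7, m4 = 30.5)

# rev() reverses values and names together; subtracting from the maximum
# re-zeroes the map at what used to be the far end.
flip_positions <- function(pos) max(pos) - rev(pos)

flipped <- flip_positions(lg)
print(flipped)

# Interval lengths are unchanged, only their order is reversed
print(rbind(original = diff(lg), flipped = rev(diff(flipped))))
```

Flipping twice returns the original map, so the operation is safe to apply while comparing against a physical map.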