The definitive guide to AI-assisted bioinformatics. Four tools, one workflow, zero excuses for writing boilerplate at 2am. We added LabClaw — autonomous AI agent skills from Stanford & Princeton that run your entire lab without sleeping.
You know how "vibe coding" means describing what you want in plain English and letting an AI write the code? Vibe Science is the same thing but for bioinformatics — and it is, frankly, ridiculous how well it works.
Large language model CLIs and autonomous agent frameworks have become genuinely transformative tools for computational biologists in 2024–2026. This tutorial covers four of the best tools across the spectrum — from simple chat CLIs to fully autonomous lab agents:
The reasoning champion. Best for code review, long-doc QA, methods sections. Think calm postdoc who's read every paper.
Code generation powerhouse. Will write your GATK pipeline and ask if you want it in Nextflow too. Types 200wpm.
The dark horse. 1 million token context means it can read your whole GTF file. Free tier exists. Google handed science a superpower.
206 agentic skills. Runs your full biomedical stack autonomously. Not just a chat CLI — a lab operating system.
| Tool | Developer | Context | Best Bioinformatics Use | Free? | Autonomy Level |
|---|---|---|---|---|---|
| Claude CLI | Anthropic | 200K tokens | Code review, methods QA, long-doc analysis | API key | Interactive |
| Codex / GPT-4o | OpenAI | 128K tokens | Pipeline generation, boilerplate, scripting | API key | Interactive |
| Gemini CLI | Google | 1M tokens | Whole-genome queries, multi-file analysis | Free tier ✓ | Interactive |
| 🦞 LabClaw | Stanford + Princeton | Agent-based | Full autonomous biomedical workflows, 206 skills | MIT License ✓ | Autonomous ⚡ |
A typical Vibe Science workflow — from raw data to published result — and where each AI tool slots in:
VIBE SCIENCE WORKFLOW — END-TO-END
Each step is AI-assisted. You remain the scientist. The AI is your tireless co-pilot.
🦞 LABCLAW — AUTONOMOUS AGENT PIPELINE
# Install via npm (Node.js required)
npm install -g @anthropic-ai/claude-code

# Or via pip wrapper
pip install anthropic

# Set your API key (get one at console.anthropic.com)
export ANTHROPIC_API_KEY="sk-ant-..."

# Verify
claude --version
# Install Codex CLI
npm install -g @openai/codex

# Or use the OpenAI Python SDK directly
pip install openai

# Set API key (platform.openai.com)
export OPENAI_API_KEY="sk-..."

# Verify
codex --version
# Install Gemini CLI (Google AI SDK)
npm install -g @google/gemini-cli

# Or Python
pip install google-generativeai

# Set API key (aistudio.google.com — has a FREE tier!)
export GEMINI_API_KEY="AIza..."

# Verify
gemini --version

# Gemini 1.5 Pro has a free tier with a 1M context window
# You can literally paste a GTF file and ask questions about it
# Step 1: Install OpenClaw agent runtime (Python 3.9+)
pip install openclaw

# Or clone directly
git clone https://github.com/openclaw/openclaw
cd openclaw
pip install -e .

# Step 2: Clone LabClaw skill library
git clone https://github.com/wu-yc/LabClaw
cd LabClaw

# Step 3: Configure your LLM backend (OpenAI, Anthropic, or local)
export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-ant-..."

# Step 4: One-message install — send this to your OpenClaw agent:
# "Fetch and install the LabClaw skill library into this workspace"
# The agent auto-downloads all 206 skills.

# Step 5: Verify skills loaded
python -c "import labclaw; print(labclaw.list_skills())"
Put your `export` lines in your ~/.bashrc or a .env file (and add that .env to .gitignore). Leaked keys get abused within minutes. Your PI will not be amused when they get a $4,000 OpenAI bill from a cryptominer in Crimea.
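If you load keys in scripts, fail loudly when one is missing rather than sending an empty string to an API. A minimal sketch (the helper name `require_api_key` is mine, not part of any of these CLIs):

```python
import os


def require_api_key(var: str) -> str:
    """Fetch an API key from the environment, failing loudly if unset.

    Keeps keys out of source files entirely -- the script reads whatever
    you exported in ~/.bashrc or loaded from a git-ignored .env file.
    """
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"{var} is not set. Export it in your shell or .env file; "
            "never hardcode keys in scripts you might commit."
        )
    return key
```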
Claude is particularly good at reasoning about complex biology, reading long methods sections, reviewing code logic, and generating well-commented scripts. Think of it as a very calm postdoc who has read every paper ever published.
# Ask Claude a direct question
claude "What does a TSS enrichment score of 1.2 mean for my ATAC-seq QC?"

# Pipe a file to Claude for analysis
cat alignment_stats.txt | claude "Summarise these alignment stats and flag anything concerning"

# Ask Claude to write a script
claude "Write a Python script that reads a VCF file with cyvcf2, filters to PASS variants with MAF > 0.01, and outputs a tidy TSV with columns: chrom, pos, ref, alt, gene, consequence, AF"

# Have Claude review your existing script
claude "Review this DESeq2 script for statistical mistakes" < deseq2_analysis.R
# It's 2am. STAR alignment crashed. You have no idea why.
# Old you: Stack Overflow for 90 minutes
# New you:
cat star_log.final.out | claude "Why did my STAR alignment fail and how do I fix it? I am running human GRCh38 with splice junctions. Here is the log:"

# Claude will calmly identify that your --genomeSAindexNbases was wrong for your genome size
# and give you the exact corrected command.
# You will feel both grateful and slightly embarrassed.
# Start an interactive session
claude

# Then inside the session:
# > I have 48 RNA-seq samples, 3 conditions (control, drug_A, drug_B),
# > 16 replicates each, human samples. I want to do differential expression
# > analysis. Walk me through the best approach step by step.

# Claude will ask clarifying questions, recommend DESeq2 vs edgeR based on
# your design, warn you about batch effects, and generate the full R script.
# This used to take a week to figure out from papers. Now: 8 minutes.
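Before handing a design like this to any model, it's worth sanity-checking the sample sheet yourself. A minimal sketch (the `check_design` helper is hypothetical, not from any tool above):

```python
from collections import Counter


def check_design(samples: list[tuple[str, str]]) -> dict[str, int]:
    """Count replicates per condition and flag unbalanced designs.

    `samples` is a list of (sample_id, condition) pairs, e.g. parsed
    from a sample sheet. Returns per-condition replicate counts.
    """
    counts = Counter(cond for _, cond in samples)
    if len(set(counts.values())) > 1:
        print("Warning: unbalanced design:", dict(counts))
    return dict(counts)
```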
Codex is a code-generation powerhouse. If Claude is the thoughtful postdoc, Codex is the caffeine-fuelled grad student who types 200wpm and somehow makes it work. Ask it to generate boilerplate — it thrives.
# Generate a complete variant calling pipeline
codex "Write a complete Snakemake pipeline for variant calling:
- Input: paired-end WGS FASTQ files in a samples/ directory
- Steps: FastQC -> Trimmomatic -> BWA-MEM2 -> samtools sort/markdup -> GATK HaplotypeCaller -> VQSR -> bcftools filter
- Output: annotated VCF per sample + multiqc report
- Use config.yaml for sample sheet and reference genome paths
- Include proper wildcard handling and temp() for intermediate files"

# What you get: a complete, runnable Snakemake workflow
# What this used to take: 2-3 days of debugging wildcards
from openai import OpenAI
client = OpenAI()
# Use the API programmatically in your analysis notebooks
def ask_codex(prompt, system="You are an expert bioinformatics engineer."):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content
# Example: generate a parsing function on the fly
code = ask_codex("""
Write a Python function parse_blast_xml(filepath) that:
1. Parses BLAST XML output using BioPython
2. Returns a pandas DataFrame with columns:
query_id, subject_id, pident, length, evalue, bitscore, query_start,
query_end, subject_start, subject_end
3. Filters to evalue < 1e-5 and pident > 70
""")
print(code)
# Paste or exec() the output — instant working parser
# All those tasks you've been putting off for months:
codex "Write an argparse CLI wrapper for my RNA-seq pipeline with flags:
--input-dir, --output-dir, --genome, --threads, --dry-run, --config, --resume.
Include --help text for each flag."

codex "Convert my hardcoded paths in analysis.py to a config.yaml system using pydantic for validation"

codex "Write unit tests for my VCF parsing functions using pytest with 3 synthetic VCF fixtures"

# Every bioinformatician has a list of 20 scripts they need to write.
# Codex writes all 20. You drink your coffee.
Pro tip: show the model a real example line instead of describing the format. "Here is one line from my file: chr1\t12345\tSNV\tA\tG\t.\tPASS\tDP=40;AF=0.35. Write a parser for this format." Works every time.
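For that example line, the kind of parser you get back looks roughly like this (a sketch: the field names are inferred from that single example line, not from a formal spec):

```python
def parse_variant_line(line: str) -> dict:
    """Parse one tab-separated variant line of the form:
    chr1<TAB>12345<TAB>SNV<TAB>A<TAB>G<TAB>.<TAB>PASS<TAB>DP=40;AF=0.35
    """
    chrom, pos, vtype, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")
    # INFO field is semicolon-separated key=value pairs
    info_dict = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    return {
        "chrom": chrom,
        "pos": int(pos),
        "type": vtype,
        "ref": ref,
        "alt": alt,
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "DP": int(info_dict.get("DP", 0)),
        "AF": float(info_dict.get("AF", 0.0)),
    }
```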
Google's Gemini CLI is the dark horse. The 1-million token context window in Gemini 1.5 Pro changes the game entirely — you can feed it entire genomes' worth of annotation and have a conversation about them. Oh, and there's a free tier. Google just casually handed science a superpower.
gemini --version

# One-shot question
gemini "Explain UMAP dimensionality reduction to a biologist who understands PCA but has never heard of topology"

# File analysis — this is where Gemini DESTROYS the competition
gemini -f my_deseq2_results.csv "Which genes are most consistently upregulated across all comparisons? Are there any obvious pathway themes? Flag anything that looks like a batch effect artifact."

# Multi-file analysis
gemini -f sample_metadata.csv -f counts_matrix.tsv -f qc_report.html \
  "Review my experimental design for confounds and suggest the optimal DESeq2 design formula"
# Gemini can literally read your entire annotation file
# A human GTF is ~1.5 GB but the text of relevant fields is much smaller
# Extract what you need and feed it in:

# Get all gene annotations for a chromosome
grep "^chr17" gencode.v44.annotation.gtf | grep -v "^#" | head -50000 > chr17_genes.gtf

# Ask Gemini to analyse it
cat chr17_genes.gtf | gemini "How many protein-coding genes are on chr17? What fraction have at least 5 transcripts? List the top 10 by transcript count."

# Compare two annotation files
gemini -f gencode_v44.gene_summary.txt -f ensembl_v110.gene_summary.txt \
  "What are the key differences between these two genome annotations? Which has more non-coding RNAs? Are there genes in one missing from the other?"
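Questions like "how many transcripts per protein-coding gene" are also cheap to answer locally, which gives you a ground truth to check the model against. A stdlib-only sketch, assuming GENCODE-style attribute formatting (`key "value"; key "value";`):

```python
import re
from collections import defaultdict


def count_transcripts_per_gene(gtf_lines):
    """Count transcripts per protein-coding gene from GTF lines.

    Assumes GENCODE-style attribute strings: key "value"; key "value"; ...
    """
    per_gene = defaultdict(int)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        if attrs.get("gene_type") == "protein_coding":
            per_gene[attrs.get("gene_name", attrs.get("gene_id", "?"))] += 1
    return dict(per_gene)
```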
import google.generativeai as genai
import pandas as pd
genai.configure(api_key="YOUR_GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")
# Load your DESeq2 results
df = pd.read_csv("deseq2_results.csv")
# Significant genes (padj < 0.05, |log2FC| > 1), ranked by absolute fold change
sig = df[(df.padj < 0.05) & (df.log2FoldChange.abs() > 1)]
top50 = sig.loc[sig.log2FoldChange.abs().nlargest(50).index]
# Automated biological interpretation
prompt = f"""
I have DESeq2 differential expression results comparing cancer vs normal tissue.
Here are the top 50 significant genes (padj < 0.05, |log2FC| > 1):
{top50[['gene_name', 'log2FoldChange', 'padj']].to_string()}
Please:
1. Identify major biological pathways involved
2. Highlight any known oncogenes or tumour suppressors
3. Suggest the most relevant Gene Ontology terms to test for enrichment
4. Flag any surprising findings
"""
response = model.generate_content(prompt)
print(response.text)
# This replaces about 45 minutes of manual literature lookup
# Your PI asks "so what do these genes do?"
# You already know because Gemini told you 3 minutes ago
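Even with a 1M-token window, a 50,000-row table can overflow the budget if you paste everything. A rough pre-flight trim helps; this sketch uses the crude ~4-characters-per-token heuristic (an assumption -- for exact budgeting you would use the provider's own tokenizer):

```python
def fit_to_context(text: str, max_tokens: int = 1_000_000,
                   chars_per_token: int = 4) -> str:
    """Trim a big text blob to roughly fit a model's context window.

    Uses the rough heuristic of ~4 characters per token. Keeps the head
    of the text (usually headers + top rows) and marks the truncation.
    """
    budget = max_tokens * chars_per_token
    if len(text) <= budget:
        return text
    return text[:budget] + "\n[...truncated to fit context window...]"
```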
🦞 Stanford & Princeton · MIT License · 2025
LabClaw (from Le Cong Lab at Stanford and Mengdi Wang Lab at Princeton) is a skill library for autonomous biomedical AI agents. It provides 206 composable skills across biology, drug discovery, clinical research, and literature — all wired into the OpenClaw agent runtime so an agent can chain them autonomously without human prompting for each step.
Here’s what a LabClaw-powered autonomous scRNA-seq analysis looks like. You send one message. The agent handles the rest:
# You send ONE message to your OpenClaw agent:
→ USER: "Run a complete scRNA-seq analysis on the 10x data in /data/pbmc_10x/.
Cluster the cells, annotate them, and find differentially expressed genes
between CD4 T cells and CD8 T cells. Save all plots to /results/."
# The agent:
# 1. Loads the LabClaw tooluniverse-single-cell skill
# 2. Runs Cell Ranger count if raw FASTQs are present
# 3. Imports into AnnData / Seurat automatically
# 4. Applies QC filters (adaptive thresholds)
# 5. Normalises, scales, PCA → UMAP
# 6. Clusters with Leiden (tests multiple resolutions)
# 7. Annotates cell types using SingleR / CellTypist
# 8. Runs FindMarkers for CD4 vs CD8
# 9. Generates volcano plots, UMAP PDFs, marker heatmaps
# 10. Writes a structured summary report
# All without another message from you.
# You get back:
← AGENT: "Analysis complete. Found 14 cell types across 8,423 cells.
Top DE genes CD4 vs CD8: CD4, IL7R, CCR7 (up in CD4);
CD8A, GZMB, PRF1 (up in CD8). Results saved to /results/.
Report: /results/analysis_summary.html"
# Each LabClaw skill is a structured YAML/Python definition
# Format: Overview → When to Use → Capabilities → Examples
skill_name: tooluniverse-single-cell-rna
domain: biology
when_to_use:
  - "User has scRNA-seq data (10x, Smart-seq2, Parse, etc.)"
  - "User wants QC, clustering, annotation, or DE analysis"
  - "Any mention of Seurat, Scanpy, Cell Ranger, or AnnData"
capabilities:
  - Load CellRanger output or h5ad/RDS files
  - QC filtering with adaptive thresholds
  - Normalisation (SCTransform, scran, log-normalise)
  - "Dimensionality reduction: PCA, UMAP, tSNE"
  - "Clustering: Leiden, Louvain"
  - "Cell type annotation: SingleR, CellTypist, scGPT"
  - "Differential expression: FindMarkers, DESeq2 pseudobulk"
  - "Visualisation: UMAP, violin, dot, heatmap, volcano"
examples:
  - "Analyse PBMC 10x data and find all immune cell types"
  - "Compare CD4 vs CD8 T cells in tumour microenvironment"
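The `when_to_use` section is essentially keyword routing: the agent matches your message against skill triggers before loading the skill. A simplified sketch of that idea (my own illustration, not LabClaw's actual dispatch logic):

```python
def match_skills(message: str, skills: dict[str, list[str]]) -> list[str]:
    """Return skill names whose trigger keywords appear in the message.

    `skills` maps skill_name -> list of trigger keywords, a simplified
    stand-in for the when_to_use section of a skill definition.
    """
    msg = message.lower()
    return [name for name, triggers in skills.items()
            if any(t.lower() in msg for t in triggers)]
```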
# Step 1: Let LabClaw run the whole pipeline autonomously
# (send one message to the OpenClaw agent as above)
# Agent produces: results/de_genes_all_celltypes.csv (50,000 rows)

# Step 2: Use Gemini's 1M context to interpret the ENTIRE output
cat results/de_genes_all_celltypes.csv | gemini \
  "This is the complete differential expression result across all cell types
   in a tumour microenvironment scRNA-seq experiment.
   1. What are the top biological themes per cell type?
   2. Are there any consistently dysregulated pathways across multiple cell types?
   3. What transcription factors likely drive these signatures?
   4. Suggest the 3 most publishable follow-up experiments."

# LabClaw did the work. Gemini interpreted 50,000 rows in one shot.
# Your job: critically evaluate the output and design experiments.
# Welcome to 2026 bioinformatics.
Good prompts are the new bioinformatics skill. Here are patterns that work reliably across all three CLIs:
You are a senior computational biologist with 15 years of experience in
RNA-seq analysis, familiar with Bioconductor, Nextflow, and GATK best
practices. You prioritise reproducibility, statistical rigour, and clear
documentation.

[Your actual question here]
Write a Python function that [does X]. Requirements:
- Use only standard library + pandas + numpy + [specific lib]
- Include type hints
- Include a docstring with Args and Returns sections
- Include 2 example calls in an if __name__ == '__main__': block
- Handle FileNotFoundError gracefully
Output only code, no explanation.
I am running [tool/script] and getting this error:

[PASTE FULL ERROR TRACEBACK]

Here is the relevant code/command:

[PASTE CODE]

My environment: [OS, Python/R version, key package versions]

What is the root cause and what is the exact fix?
Write a script to [do X] and explain each major step with inline comments. I understand [concept A] but not [concept B], so please explain [concept B] in the comment where it first appears. Target audience: graduate student with wet-lab background learning bioinformatics.
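Templates like these are worth wrapping in a small helper so every question you send is complete -- forgetting the environment info is the most common way to get a useless answer. A minimal sketch (the function name is mine):

```python
def debug_prompt(error: str, code: str, env: str) -> str:
    """Assemble a debugging prompt following the template above,
    so no section gets forgotten in the heat of a broken pipeline."""
    return (
        f"I am running a script and getting this error:\n\n{error}\n\n"
        f"Here is the relevant code/command:\n\n{code}\n\n"
        f"My environment: {env}\n\n"
        "What is the root cause and what is the exact fix?"
    )
```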
Let's be concrete. Here are production-quality prompts and the type of output you should expect:
I have 10x Chromium scRNA-seq data (Cell Ranger output) from 6 human PBMC
samples (3 healthy donors, 3 COVID-19 patients). Write a complete Seurat v5
analysis script in R that:
1. Loads all 6 samples and creates a merged Seurat object
2. QC filtering: nFeature_RNA 200-5000, percent.mt < 15, nCount_RNA < 30000
3. SCTransform normalisation (not NormalizeData)
4. PCA → Harmony batch correction (batch = "donor") → UMAP
5. Leiden clustering (resolution 0.5 and 0.8)
6. Automated cell type annotation using SingleR with HumanPrimaryCellAtlas
7. DGE between COVID vs healthy within each major cell type using FindMarkers
8. Save: Seurat RDS, UMAP PDF, top 10 markers per cluster as CSV
9. Set seed(42) for reproducibility
Include a sessionInfo() at the end.
Write a Nextflow DSL2 pipeline for bulk RNA-seq analysis:
- Inputs: FASTQ files from a samplesheet CSV (sample_id, fastq_1, fastq_2, condition)
- Tools: FastQC, TrimGalore, STAR (2-pass), featureCounts, DESeq2
- Use containers: specify Docker images for each process
- Publish: QC reports, BAM files, count matrix, DESeq2 results
- Include a nextflow.config with resource profiles for 'standard' and 'slurm'
Follow nf-core coding style conventions.
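The samplesheet that prompt describes is trivial to generate programmatically, and generating it beats hand-editing CSVs for 48 samples. A sketch using only the stdlib (the helper name is mine; the column order comes from the prompt above):

```python
import csv
import io


def write_samplesheet(rows, handle) -> None:
    """Write a Nextflow-style samplesheet CSV with the columns the
    prompt above specifies: sample_id, fastq_1, fastq_2, condition.

    `rows` is an iterable of dicts keyed by those column names;
    `handle` is any writable text file object.
    """
    writer = csv.DictWriter(
        handle, fieldnames=["sample_id", "fastq_1", "fastq_2", "condition"]
    )
    writer.writeheader()
    writer.writerows(rows)
```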
Honestly? Debugging is where AI CLIs shine brightest. Bioinformatics errors are notoriously cryptic. Your AI co-pilot has seen every error message ever posted on Biostars and GitHub Issues.
# Classic scenario: Snakemake wildcard error you can't parse
# Error: "WildcardError: Wildcards in input files cannot be determined from output files"
# Old approach: Read Snakemake docs for 1 hour, still confused
# New approach:
cat Snakefile | claude "I'm getting a WildcardError in Snakemake.
Here is my Snakefile. Identify the rule causing the issue and explain
why the wildcard constraint is failing. Show me the corrected rule."
# Claude spots that you used {sample} in input but {samples} in output.
# One character. You lost 2 hours to one character.
# Claude found it in 4 seconds.
# You are both humbled and liberated.
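That particular bug class -- a wildcard present in `input` but not in `output` -- is also mechanically checkable before you ever hit the error. A quick sketch of the check (my own helper, not part of Snakemake):

```python
import re


def wildcard_mismatch(input_pattern: str, output_pattern: str) -> set[str]:
    """Return wildcards used in the input pattern but absent from the
    output pattern -- the situation behind Snakemake's
    'Wildcards in input files cannot be determined from output files'."""
    extract = lambda s: set(re.findall(r"{(\w+)}", s))
    return extract(input_pattern) - extract(output_pattern)
```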
# Paste your error into the terminal
# Using the R httr2 package to call the API:
library(httr2)
debug_with_claude <- function(error_msg, code_context) {
  req <- request("https://api.anthropic.com/v1/messages") |>
    req_headers(
      "x-api-key" = Sys.getenv("ANTHROPIC_API_KEY"),
      "anthropic-version" = "2023-06-01",
      "content-type" = "application/json"
    ) |>
    req_body_json(list(
      model = "claude-opus-4-5",
      max_tokens = 1024,
      messages = list(list(
        role = "user",
        content = paste0(
          "I have this R error in my bioinformatics analysis:\n\n",
          error_msg,
          "\n\nContext code:\n\n", code_context,
          "\n\nWhat is the root cause and exact fix?"
        )
      ))
    ))
  resp <- req_perform(req)
  cat(resp_body_json(resp)$content[[1]]$text)
}
# Usage: just call it when something breaks
tryCatch(
  expr = { result <- DESeq(dds) },
  error = function(e) debug_with_claude(conditionMessage(e), deparse(dds))
)