SingleM Taxonomic Profiling Skill
Overview
SingleM is a tool for profiling shotgun metagenomes (short- and long-read) by targeting 20 amino acid "window" sequences within single-copy marker genes. It generates GTDB-based taxonomic profiles and is particularly strong at handling novel lineages.
The primary subcommand for taxonomic profiling is singlem pipe.
Skill corresponds to SingleM v[RELEASE_TAG].
Installation
Conda (recommended)
conda create -c conda-forge -c bioconda --override-channels \
--name singlem 'singlem>=[RELEASE_TAG]'
conda activate singlem
# Download reference data (metapackage) — required after conda install
singlem data --output-directory /path/to/metapackage
Docker (includes reference data — no separate data download needed)
docker pull wwood/singlem:[RELEASE_TAG]
# Run pipe directly:
docker run -v `pwd`:`pwd` wwood/singlem:[RELEASE_TAG] pipe \
--sequences `pwd`/my.fastq.gz -p `pwd`/my.profile.csv --threads 4
Singularity/Apptainer
singularity pull docker://wwood/singlem:[RELEASE_TAG]
singularity run -B `pwd`:`pwd` singlem_[RELEASE_TAG].sif pipe \
--sequences `pwd`/my.fastq.gz -p `pwd`/my.profile.csv --threads 4
Core Concepts
- OTU table: The intermediate output of singlem pipe. Contains per-marker-gene OTU sequences with their coverage/abundance across samples.
- Taxonomic profile (condensed profile): The final output summarising community composition. Generated from the OTU table via the condense algorithm, which uses trimmed means and expectation maximisation across 59 marker genes.
- Coverage: The expected per-base coverage of a genome with that OTU sequence. Derived from num_hits. The default minimum coverage to report in a taxonomic profile is 0.35× for reads, 0.1× for genomes.
- GTDB taxonomy: SingleM uses GTDB taxonomy strings (e.g. Root; d__Bacteria; p__Proteobacteria; ...).
Generating a Taxonomic Profile
Basic usage — paired-end short reads
singlem pipe \
--forward sample_R1.fastq.gz \
--reverse sample_R2.fastq.gz \
--taxonomic-profile sample.profile.tsv \
--threads 8
Single-end or unpaired reads
singlem pipe \
--sequences sample.fastq.gz \
-p sample.profile.tsv \
--threads 8
(-p is the short form of --taxonomic-profile)
Long reads (Nanopore ≥R10.4.1 or PacBio HiFi)
singlem pipe \
--sequences sample_nanopore.fastq.gz \
-p sample.profile.tsv \
--threads 8
Long reads use the same interface; SingleM auto-detects read length.
Multiple samples — combined in one run
singlem pipe \
--forward S1_R1.fq.gz S2_R1.fq.gz \
--reverse S1_R2.fq.gz S2_R2.fq.gz \
--otu-table all_samples.otu_table.csv \
--taxonomic-profile all_samples.profile.tsv \
--threads 16
For >100 samples, run each individually and combine OTU tables with singlem summarise.
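The per-sample workflow described above can be sketched as a small dry-run helper. This is only an illustration: the function name print_batch_commands, the *_R1.fq.gz / *_R2.fq.gz naming scheme, and the combined.otu_table.csv output name are assumptions, and file names are assumed to contain no whitespace. It prints the commands instead of running them, using only flags that appear elsewhere in this document.

```shell
# Hypothetical helper: print one `singlem pipe` command per read pair found in
# a directory, followed by a single `singlem summarise` command that combines
# the resulting per-sample OTU tables. Nothing is executed here.
print_batch_commands() {
  reads_dir="$1"
  threads="${2:-8}"
  tables=""
  for r1 in "$reads_dir"/*_R1.fq.gz; do
    [ -e "$r1" ] || continue                  # the glob matched nothing
    r2="${r1%_R1.fq.gz}_R2.fq.gz"
    sample="$(basename "$r1" _R1.fq.gz)"
    if [ ! -e "$r2" ]; then
      echo "skipping $sample: no reverse reads found" >&2
      continue
    fi
    printf 'singlem pipe --forward %s --reverse %s --otu-table %s.otu_table.csv --threads %s\n' \
      "$r1" "$r2" "$sample" "$threads"
    tables="$tables $sample.otu_table.csv"    # remember the table for the combine step
  done
  if [ -n "$tables" ]; then
    printf 'singlem summarise --input-otu-tables%s --output-otu-table combined.otu_table.csv\n' "$tables"
  fi
}
```

One way to use it is to redirect the output to a file (for example print_batch_commands reads/ > run_all.sh), inspect it, and then run it with sh or hand the lines to a job scheduler.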
Genome / assembly input
# Single genome
singlem pipe \
--genome-fasta-files genome.fna \
-p genome.profile.tsv
# Many genomes from a directory
singlem pipe \
--genome-fasta-directory /path/to/genomes/ \
--genome-fasta-extension fna \
-p genomes.profile.tsv \
--threads 16
# From a file listing genome paths
singlem pipe \
--genome-fasta-list genomes.txt \
-p genomes.profile.tsv \
--threads 16
Genome mode uses different defaults: a lower --min-taxon-coverage (0.1 rather than 0.35) and a higher --min-orf-length (300 bp).
Output Options
Also save an OTU table (--otu-table)
Useful for alpha/beta diversity metrics, ordination, and inspecting raw data (e.g. which marker genes fired, which OTU sequences were found). Compatible with singlem summarise and singlem appraise.
singlem pipe \
--forward sample_R1.fastq.gz \
--reverse sample_R2.fastq.gz \
--otu-table sample.otu_table.csv \
--taxonomic-profile sample.profile.tsv \
--threads 8
Save an archive OTU table (--archive-otu-table) — recommended for long-term archiving
The archive OTU table stores additional information (full sequence context, alignment data) needed to regenerate results without re-running the pipeline. It is the right format for two important downstream modes:
- singlem condense: re-derive the taxonomic profile from the archive OTU table (e.g. with different --min-taxon-coverage settings) without re-running pipe
- singlem renew: re-assign taxonomy against an updated metapackage without re-running pipe
singlem pipe \
--forward sample_R1.fastq.gz \
--reverse sample_R2.fastq.gz \
--archive-otu-table sample.archive.otu_table.json.gz \
--taxonomic-profile sample.profile.tsv \
--threads 8
# Later: re-derive profile with different coverage threshold
singlem condense \
--input-archive-otu-tables sample.archive.otu_table.json.gz \
--taxonomic-profile sample_recondensed.profile.tsv \
--min-taxon-coverage 0.1
# Later: re-assign taxonomy with a newer metapackage
singlem renew \
--archive-otu-table sample.archive.otu_table.json.gz \
--taxonomic-profile sample_updated.profile.tsv \
--metapackage /path/to/new_metapackage
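Because condense works from the archive OTU table, comparing several coverage thresholds does not require re-running pipe. The sketch below prints one condense command per threshold. The function name print_condense_sweep and the mincov_ output naming are hypothetical; the flags are the ones used in the condense example above.

```shell
# Hypothetical helper: print (not run) one `singlem condense` command for each
# --min-taxon-coverage value given after the archive OTU table path.
print_condense_sweep() {
  archive="$1"
  shift
  prefix="${archive%.archive.otu_table.json.gz}"   # e.g. "sample"
  for cov in "$@"; do
    printf 'singlem condense --input-archive-otu-tables %s --taxonomic-profile %s.mincov_%s.profile.tsv --min-taxon-coverage %s\n' \
      "$archive" "$prefix" "$cov" "$cov"
  done
}
```

For example, print_condense_sweep sample.archive.otu_table.json.gz 0.1 0.35 prints two commands, each writing its profile to a separate file.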
Key Options
| Option | Description | Default |
|---|---|---|
| --forward / -1 / --reads / --sequences | Forward or unpaired reads (FASTA/FASTQ, gzipped ok) | required |
| --reverse / -2 | Reverse reads for paired-end | — |
| --taxonomic-profile / -p | Output taxonomic profile (TSV) | not set |
| --otu-table | Output OTU table (CSV) | not set |
| --threads | Number of CPU threads | 1 |
| --metapackage | Path to reference metapackage | default system metapackage |
| --min-taxon-coverage | Min coverage to report in profile | 0.35 (reads), 0.1 (genomes) |
| --assignment-method | Taxonomy assignment algorithm for OTUs | smafa_naive_then_diamond |
| --genome-fasta-files | Input genome FASTA(s) | — |
| --genome-fasta-directory / -d | Directory of genome FASTAs | — |
| --genome-fasta-extension | Extension for genome FASTAs | fna |
| --genome-fasta-list | File listing genome paths | — |
Output Format
Taxonomic profile (-p / --taxonomic-profile)
Tab-separated file with columns:
sample coverage taxonomy
sample1 5.23 Root; d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Pseudomonadaceae; g__Pseudomonas; s__Pseudomonas aeruginosa
sample1 3.10 Root; d__Bacteria; p__Firmicutes_A; ...
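A profile in this layout is easy to inspect with standard tools. The sketch below lists only the rows resolved to species level, highest coverage first. It assumes the three tab-separated columns and header row shown above and a "; " separator between ranks; the function name top_species is hypothetical.

```shell
# Hypothetical helper: print sample, coverage and species name for every
# profile row whose deepest rank is a species (s__), sorted by coverage.
top_species() {
  profile="$1"
  awk -F '\t' '
    NR == 1 { next }                          # skip the header row
    {
      n = split($3, ranks, "; *")             # Root; d__...; ...; s__...
      if (ranks[n] ~ /^s__/)                  # deepest rank is a species
        printf "%s\t%s\t%s\n", $1, $2, substr(ranks[n], 4)
    }
  ' "$profile" | sort -t "$(printf '\t')" -k2,2nr
}
```

Rows that stop at a higher rank (such as the phylum-level row in the example) are skipped, so the listed coverages will generally not add up to the total coverage of the sample.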
OTU table (--otu-table)
CSV with columns: gene, sample, sequence, num_hits, coverage, taxonomy
Important Caveats
- Use raw reads, not quality-trimmed reads. Quality trimming (e.g. Trimmomatic) can shorten reads below 100 bp, making them unusable. Adapter trimming is fine but unnecessary.
- Do not use assembled contigs as read input. Use --genome-fasta-files for assemblies/MAGs; --sequences/--forward is for raw reads only.
- Reference data required. After conda install, run singlem data before using pipe. Docker images include reference data.
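To check whether a read set has been shortened by earlier trimming, the share of reads below the 100 bp mark mentioned above can be counted before profiling. This is a minimal sketch, assuming a gzipped FASTQ with unwrapped four-line records; the function name short_read_fraction and its optional cutoff argument are hypothetical.

```shell
# Hypothetical helper: report how many reads in a gzipped FASTQ are shorter
# than a cutoff (default 100 bp). Assumes each record spans exactly four lines.
short_read_fraction() {
  fastq_gz="$1"
  min_len="${2:-100}"
  gzip -dc "$fastq_gz" | awk -v min="$min_len" '
    NR % 4 == 2 {                             # sequence line of each record
      total++
      if (length($0) < min) short++
    }
    END {
      if (total == 0) { print "no reads found"; exit 1 }
      printf "%d of %d reads (%.1f%%) are shorter than %d bp\n", short, total, 100 * short / total, min
    }'
}
```

A large share of short reads suggests the file was quality-trimmed, in which case the untrimmed reads are the better input.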
Downstream Analysis
Convert profile to other formats (e.g. BIOM, Kraken-style)
singlem summarise \
--input-taxonomic-profiles sample.profile.tsv \
--output-taxonomic-profile-krona sample.krona.html
Estimate fraction of reads that are bacterial/archaeal (prokaryotic) rather than eukaryotic/phage/etc
singlem pipe \
--forward sample_R1.fq.gz --reverse sample_R2.fq.gz \
-p sample.profile.tsv --threads 8
singlem prokaryotic_fraction \
--forward sample_R1.fq.gz --reverse sample_R2.fq.gz \
-p sample.profile.tsv \
> sample.prokaryotic_fraction.tsv
Re-profile with updated reference database (no re-running pipe)
Requires that the original run saved an --archive-otu-table.
singlem renew \
--archive-otu-table sample.archive.otu_table.json.gz \
--taxonomic-profile sample_updated.profile.tsv \
--metapackage /path/to/new_metapackage
Combine OTU tables from multiple separate runs
singlem summarise \
--input-otu-tables s1.otu_table.csv s2.otu_table.csv s3.otu_table.csv \
--output-otu-table combined.otu_table.csv
Assess what fraction of a metagenome's prokaryotes have an associated genome/MAG
singlem pipe --sequences raw.fq.gz --otu-table metagenome.otu_table.csv
singlem pipe --genome-fasta-files my-genomes/*.fasta --otu-table genomes.otu_table.csv
singlem appraise \
--metagenome-otu-tables metagenome.otu_table.csv \
--genome-otu-tables genomes.otu_table.csv
Phage Profiling (Lyrebird)
For dsDNA phage profiling, use the lyrebird command with the same interface:
# Download lyrebird reference data
lyrebird data --output-directory /path/to/lyrebird_metapackage
lyrebird pipe \
--forward sample_R1.fq.gz \
--reverse sample_R2.fq.gz \
-p sample.phage_profile.tsv \
--threads 8
Lyrebird uses >500 phage marker genes and vConTACT3-based taxonomy (not GTDB).
Quick Reference — Most Common Commands
# 1. Download reference data (once, after conda install)
singlem data --output-directory ~/singlem_metapackage
# 2. Profile paired-end metagenome (save archive OTU table for future re-use)
singlem pipe \
--forward sample_R1.fq.gz \
--reverse sample_R2.fq.gz \
--archive-otu-table sample.archive.otu_table.json.gz \
--taxonomic-profile sample.profile.tsv \
--threads 16
# 3. View profile
cat sample.profile.tsv
# 4. Convert to Krona chart
singlem summarise \
--input-taxonomic-profiles sample.profile.tsv \
--output-taxonomic-profile-krona sample.krona.html
Citation
If you use SingleM, please cite:
Ben J. Woodcroft et al. Comprehensive taxonomic identification of microbial species in metagenomic data using SingleM and Sandpiper. Nat Biotechnol (2025). https://doi.org/10.1038/s41587-025-02738-1