SingleM Taxonomic Profiling Skill

Overview

SingleM is a tool for profiling shotgun metagenomes (short- and long-read) by targeting 20 amino acid "window" sequences within single-copy marker genes. It generates GTDB-based taxonomic profiles and is particularly strong at handling novel lineages.

The primary subcommand for taxonomic profiling is singlem pipe.

Skill corresponds to SingleM v[RELEASE_TAG].

Installation

Conda (recommended)

conda create -c conda-forge -c bioconda --override-channels \
  --name singlem 'singlem>=[RELEASE_TAG]'
conda activate singlem

# Download reference data (metapackage) — required after conda install
singlem data --output-directory /path/to/metapackage

Docker (includes reference data — no separate data download needed)

docker pull wwood/singlem:[RELEASE_TAG]
# Run pipe directly:
docker run -v `pwd`:`pwd` wwood/singlem:[RELEASE_TAG] pipe \
  --sequences `pwd`/my.fastq.gz -p `pwd`/my.profile.csv --threads 4

Singularity/Apptainer

singularity pull docker://wwood/singlem:[RELEASE_TAG]
singularity run -B `pwd`:`pwd` singlem_[RELEASE_TAG].sif pipe \
  --sequences `pwd`/my.fastq.gz -p `pwd`/my.profile.csv --threads 4

Core Concepts

OTU table: The intermediate output of singlem pipe. Contains per-marker-gene OTU sequences with their coverage/abundance across samples.
Taxonomic profile (condensed profile): The final output summarising community composition. Generated from the OTU table via the condense algorithm, which uses trimmed means and expectation maximisation across 59 marker genes.
Coverage: The expected per-base coverage of a genome with that OTU sequence. Derived from num_hits. The default minimum coverage to report in a taxonomic profile is 0.35× for reads, 0.1× for genomes.
GTDB taxonomy: SingleM uses GTDB taxonomy strings (e.g. Root; d__Bacteria; p__Proteobacteria; ...).

Generating a Taxonomic Profile

Basic usage — paired-end short reads

singlem pipe \
  --forward sample_R1.fastq.gz \
  --reverse sample_R2.fastq.gz \
  --taxonomic-profile sample.profile.tsv \
  --threads 8

Single-end or unpaired reads

singlem pipe \
  --sequences sample.fastq.gz \
  -p sample.profile.tsv \
  --threads 8

(-p is the short form of --taxonomic-profile)

Long reads (Nanopore ≥R10.4.1 or PacBio HiFi)

singlem pipe \
  --sequences sample_nanopore.fastq.gz \
  -p sample.profile.tsv \
  --threads 8

Long reads use the same interface; SingleM auto-detects read length.

Multiple samples — combined in one run

singlem pipe \
  --forward S1_R1.fq.gz S2_R1.fq.gz \
  --reverse S1_R2.fq.gz S2_R2.fq.gz \
  --otu-table all_samples.otu_table.csv \
  --taxonomic-profile all_samples.profile.tsv \
  --threads 16

For >100 samples, run each individually and combine OTU tables with singlem summarise.

Genome / assembly input

# Single genome
singlem pipe \
  --genome-fasta-files genome.fna \
  -p genome.profile.tsv

# Many genomes from a directory
singlem pipe \
  --genome-fasta-directory /path/to/genomes/ \
  --genome-fasta-extension fna \
  -p genomes.profile.tsv \
  --threads 16

# From a file listing genome paths
singlem pipe \
  --genome-fasta-list genomes.txt \
  -p genomes.profile.tsv \
  --threads 16

Genome mode uses different defaults: higher --min-taxon-coverage (0.1) and --min-orf-length (300 bp).

Output Options

Also save an OTU table (`--otu-table`)

Useful for alpha/beta diversity metrics, ordination, and inspecting raw data (e.g. which marker genes fired, which OTU sequences were found). Compatible with singlem summarise and singlem appraise.

singlem pipe \
  --forward sample_R1.fastq.gz \
  --reverse sample_R2.fastq.gz \
  --otu-table sample.otu_table.csv \
  --taxonomic-profile sample.profile.tsv \
  --threads 8

Save an archive OTU table (`--archive-otu-table`) — recommended for long-term archiving

The archive OTU table stores additional information (full sequence context, alignment data) needed to regenerate results without re-running the pipeline. It is the right format for two important downstream modes:

singlem condense — re-derive the taxonomic profile from the archive OTU table (e.g. with different --min-taxon-coverage settings) without re-running pipe
singlem renew — re-assign taxonomy against an updated metapackage without re-running pipe

singlem pipe \
  --forward sample_R1.fastq.gz \
  --reverse sample_R2.fastq.gz \
  --archive-otu-table sample.archive.otu_table.json.gz \
  --taxonomic-profile sample.profile.tsv \
  --threads 8

# Later: re-derive profile with different coverage threshold
singlem condense \
  --input-archive-otu-tables sample.archive.otu_table.json.gz \
  --taxonomic-profile sample_recondensed.profile.tsv \
  --min-taxon-coverage 0.1

# Later: re-assign taxonomy with a newer metapackage
singlem renew \
  --archive-otu-table sample.archive.otu_table.json.gz \
  --taxonomic-profile sample_updated.profile.tsv \
  --metapackage /path/to/new_metapackage

Key Options

Option	Description	Default
`--forward` / `-1` / `--reads` / `--sequences`	Forward or unpaired reads (FASTA/FASTQ, gzipped ok)	required
`--reverse` / `-2`	Reverse reads for paired-end	—
`--taxonomic-profile` / `-p`	Output taxonomic profile (TSV)	not set
`--otu-table`	Output OTU table (CSV)	not set
`--threads`	Number of CPU threads	1
`--metapackage`	Path to reference metapackage	default system metapackage
`--min-taxon-coverage`	Min coverage to report in profile	0.35 (reads), 0.1 (genomes)
`--assignment-method`	Taxonomy assignment algorithm for OTUs	`smafa_naive_then_diamond`
`--genome-fasta-files`	Input genome FASTA(s)	—
`--genome-fasta-directory` / `-d`	Directory of genome FASTAs	—
`--genome-fasta-extension`	Extension for genome FASTAs	`fna`
`--genome-fasta-list`	File listing genome paths	—

Output Format

Taxonomic profile (`-p` / `--taxonomic-profile`)

Tab-separated file with columns:

sample    coverage    taxonomy
sample1   5.23        Root; d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Pseudomonadaceae; g__Pseudomonas; s__Pseudomonas aeruginosa
sample1   3.10        Root; d__Bacteria; p__Firmicutes_A; ...

OTU table (`--otu-table`)

CSV with columns: gene, sample, sequence, num_hits, coverage, taxonomy

Important Caveats

Use raw reads, not quality-trimmed reads. Quality trimming (e.g. Trimmomatic) can shorten reads below 100 bp, making them unusable. Adapter trimming is fine but unnecessary.
Do not use assembled contigs as read input. Use --genome-fasta-files for assemblies/MAGs; --sequences/--forward is for raw reads only.
Reference data required. After conda install, run singlem data before using pipe. Docker images include reference data.

Downstream Analysis

Convert profile to other formats (e.g. BIOM, Kraken-style)

singlem summarise \
  --input-taxonomic-profiles sample.profile.tsv \
  --output-taxonomic-profile-krona sample.krona.html

Estimate fraction of reads that are bacterial/archaeal (prokaryotic) rather than eukaryotic/phage/etc

singlem pipe \
  --forward sample_R1.fq.gz --reverse sample_R2.fq.gz \
  -p sample.profile.tsv --threads 8

singlem prokaryotic_fraction \
  --forward sample_R1.fq.gz --reverse sample_R2.fq.gz \
  -p sample.profile.tsv \
  > sample.prokaryotic_fraction.tsv

Re-profile with updated reference database (no re-running pipe)

Requires that the original run saved an --archive-otu-table.

singlem renew \
  --archive-otu-table sample.archive.otu_table.json.gz \
  --taxonomic-profile sample_updated.profile.tsv \
  --metapackage /path/to/new_metapackage

Combine OTU tables from multiple separate runs

singlem summarise \
  --input-otu-tables s1.otu_table.csv s2.otu_table.csv s3.otu_table.csv \
  --output-otu-table combined.otu_table.csv

Assess how much of a metagenomes's prokaryotes have an associated genome/MAG

singlem pipe --sequences raw.fq.gz --otu-table metagenome.otu_table.csv
singlem pipe --genome-fasta-files my-genomes/*.fasta --otu-table genomes.otu_table.csv
singlem appraise \
  --metagenome-otu-tables metagenome.otu_table.csv \
  --genome-otu-tables genomes.otu_table.csv

Phage Profiling (Lyrebird)

For dsDNA phage profiling, use the lyrebird command with the same interface:

# Download lyrebird reference data
lyrebird data --output-directory /path/to/lyrebird_metapackage

lyrebird pipe \
  --forward sample_R1.fq.gz \
  --reverse sample_R2.fq.gz \
  -p sample.phage_profile.tsv \
  --threads 8

Lyrebird uses >500 phage marker genes and vConTACT3-based taxonomy (not GTDB).

Quick Reference — Most Common Commands

# 1. Download reference data (once, after conda install)
singlem data --output-directory ~/singlem_metapackage

# 2. Profile paired-end metagenome (save archive OTU table for future re-use)
singlem pipe \
  --forward sample_R1.fq.gz \
  --reverse sample_R2.fq.gz \
  --archive-otu-table sample.archive.otu_table.json.gz \
  --taxonomic-profile sample.profile.tsv \
  --threads 16

# 3. View profile
cat sample.profile.tsv

# 4. Convert to Krona chart
singlem summarise \
  --input-taxonomic-profiles sample.profile.tsv \
  --output-taxonomic-profile-krona sample.krona.html

Citation

If you use SingleM, please cite:

Ben J. Woodcroft et al. Comprehensive taxonomic identification of microbial species in metagenomic data using SingleM and Sandpiper. Nat Biotechnol (2025). https://doi.org/10.1038/s41587-025-02738-1

ナビゲーション

Skillsとは？

リンク

SingleM Taxonomic Profiling Skill

SingleM Taxonomic Profiling Skill

Overview

Installation

Conda (recommended)

Docker (includes reference data — no separate data download needed)

Singularity/Apptainer

Core Concepts

Generating a Taxonomic Profile

Basic usage — paired-end short reads

Single-end or unpaired reads

Long reads (Nanopore ≥R10.4.1 or PacBio HiFi)

Multiple samples — combined in one run

Genome / assembly input

Output Options

Also save an OTU table (`--otu-table`)

Save an archive OTU table (`--archive-otu-table`) — recommended for long-term archiving

Key Options

Output Format

Taxonomic profile (`-p` / `--taxonomic-profile`)

OTU table (`--otu-table`)

Important Caveats

Downstream Analysis

Convert profile to other formats (e.g. BIOM, Kraken-style)

Estimate fraction of reads that are bacterial/archaeal (prokaryotic) rather than eukaryotic/phage/etc

Re-profile with updated reference database (no re-running pipe)

Combine OTU tables from multiple separate runs

Assess how much of a metagenomes's prokaryotes have an associated genome/MAG

Phage Profiling (Lyrebird)

Quick Reference — Most Common Commands

Citation

関連スキル(🔧 開発ツール)

ナビゲーション

Skillsとは？

リンク

SingleM Taxonomic Profiling Skill

SingleM Taxonomic Profiling Skill

Overview

Installation

Conda (recommended)

Docker (includes reference data — no separate data download needed)

Singularity/Apptainer

Core Concepts

Generating a Taxonomic Profile

Basic usage — paired-end short reads

Single-end or unpaired reads

Long reads (Nanopore ≥R10.4.1 or PacBio HiFi)

Multiple samples — combined in one run

Genome / assembly input

Output Options

Also save an OTU table (--otu-table)

Save an archive OTU table (--archive-otu-table) — recommended for long-term archiving

Key Options

Output Format

Taxonomic profile (-p / --taxonomic-profile)

OTU table (--otu-table)

Important Caveats

Downstream Analysis

Convert profile to other formats (e.g. BIOM, Kraken-style)

Estimate fraction of reads that are bacterial/archaeal (prokaryotic) rather than eukaryotic/phage/etc

Re-profile with updated reference database (no re-running pipe)

Combine OTU tables from multiple separate runs

Assess how much of a metagenomes's prokaryotes have an associated genome/MAG

Phage Profiling (Lyrebird)

Quick Reference — Most Common Commands

Citation

関連スキル(🔧 開発ツール)

Also save an OTU table (`--otu-table`)

Save an archive OTU table (`--archive-otu-table`) — recommended for long-term archiving

Taxonomic profile (`-p` / `--taxonomic-profile`)

OTU table (`--otu-table`)