name: DuckDB Backup description: Comprehensive DuckDB database backup toolkit supporting both file-based (cp) and native (ATTACH+COPY) backup approaches. Use when you need to backup DuckDB databases locally or to remote storage, create daily scheduled backups, verify backup integrity, or manage backup retention policies.
DuckDB Backup Skill
Quick Start
Two backup methods, choose based on your needs:
File-Based Backup (cp method) - Fast for local backups
python3 scripts/backup_duckdb.py \
--db data/awsntp.duckdb \
--backup data/backup-20251223.duckdb \
--method cp
Native DuckDB Backup (attach method) - For remote/cloud backups
python3 scripts/backup_duckdb.py \
--db data/awsntp.duckdb \
--backup data/backup-20251223.duckdb \
--method attach
With Automatic Timestamp Naming
python3 scripts/backup_duckdb.py \
--db data/awsntp.duckdb \
--backup data/backups/ \
--method cp \
--timestamp
# Creates: data/backups/backup-20251223-153045.duckdb
Features
- Two backup methods: cp (fast, local) and attach (flexible, cloud-ready)
- Timestamped backups: Automatic unique naming with YYYYMMDD-HHMMSS format
- Error handling: Comprehensive logging and error reporting
- Scheduled backups: cron/Task Scheduler integration for daily backups
- Verification: Tools to verify backup integrity and table counts
- Retention policies: Scripts to manage backup history and cleanup
When to Use
File-Based (cp method):
- Daily local backups (fastest approach)
- Pre-materialization safety snapshots
- Development/testing environments
- When speed is critical
- Local machine backups
Native DuckDB (attach method):
- Cloud/remote storage backups
- Cross-environment transfers
- Database in active use scenarios
- When you need portable backups
- Complex backup scenarios
Command Reference
python3 scripts/backup_duckdb.py --db <source> --backup <target> [options]
Arguments:
--db(required): Path to source DuckDB database--backup(required): Backup target (file path for cp, directory for attach with --timestamp)--method(optional): Backup method -cp(default) orattach--timestamp(optional): Add YYYYMMDD-HHMMSS timestamp to filename
Exit codes:
0: Backup successful1: Backup failed (check logs)
Examples
Example 1: One-time Local Backup
python3 scripts/backup_duckdb.py \
--db ~/LocalRepos/awsntpdagster/data/awsntp.duckdb \
--backup ~/LocalRepos/awsntpdagster/data/backup-20251223.duckdb \
--method cp
Example 2: Daily Timestamped Backups to Folder
python3 scripts/backup_duckdb.py \
--db ~/LocalRepos/awsntpdagster/data/awsntp.duckdb \
--backup ~/LocalRepos/awsntpdagster/data/backups \
--method cp \
--timestamp
Example 3: Scheduled Daily Backup (cron)
Add to crontab:
0 2 * * * python3 /path/to/backup_duckdb.py --db /path/to/awsntp.duckdb --backup /path/to/backups/ --method cp --timestamp >> /var/log/duckdb_backup.log 2>&1
Example 4: Backup with Retention Cleanup
#!/bin/bash
# Backup and keep only 7 most recent
BACKUP_DIR="/path/to/backups"
python3 scripts/backup_duckdb.py \
--db data/awsntp.duckdb \
--backup "$BACKUP_DIR" \
--method cp \
--timestamp
# Keep only 7 most recent backups
ls -t "$BACKUP_DIR"/backup-*.duckdb | tail -n +8 | xargs rm -f
Backup Strategy Selection
For 975 MB DuckDB database (awsntp.duckdb):
| Scenario | Method | Speed | Best For |
|---|---|---|---|
| Daily backup before asset materialization | cp | ~1-2s | Production safety |
| Weekly archive to external drive | cp | ~1-2s | Local storage |
| Cloud backup to S3 | attach | ~30-60s | Remote storage |
| Backup during active queries | attach | N/A | Concurrent access |
Advanced Usage
Verify Backup Integrity
# Check backup exists and is readable
duckdb data/backup-20251223-153045.duckdb \
"SELECT COUNT(*) as table_count FROM information_schema.tables;"
Compare Original and Backup Sizes
ls -lh data/awsntp.duckdb data/backup-*.duckdb | awk '{print $5, $9}'
List All Backups Chronologically
ls -lt data/backups/backup-*.duckdb | head -10
Troubleshooting
Issue: "duckdb: command not found"
- Install DuckDB CLI or update PATH
- Use full path to duckdb binary if attach method needed
Issue: "database is locked"
- Ensure no active connections to source database
- attach method can work around this (doesn't lock database)
Issue: "out of disk space"
- Check available disk space:
df -h - Use attach method for cloud storage
- Implement retention cleanup scripts
Issue: Slow backup
- For large databases, cp method is fastest
- Schedule backups during off-peak hours
- Consider incremental backup strategies
Integration with Your Pipeline
Pre-materialization Backup
# Backup before running expensive assets
python3 scripts/backup_duckdb.py \
--db data/awsntp.duckdb \
--backup data/pre-materialization-backup.duckdb \
--method cp
# Then run your asset materialization
dagster asset materialize -m awsntpdagster.definitions --select awsntp_features_merged_v6
Scheduled Daily Backup
For your awsntpdagster project:
macOS/Linux crontab:
0 2 * * * cd /Users/zhaoliang/LocalRepos/awsntpdagster && python3 src/awsntpdagster/scripts/backup_duckdb.py --db data/awsntp.duckdb --backup data/backups/ --method cp --timestamp
Reference
For detailed backup strategy guide, scheduling patterns, and retention policies, see:
- backup_strategies.md - When to use each method, cron setup, verification, troubleshooting
Performance Notes
- cp method: ~1-2 seconds for 975 MB database (your awsntp.duckdb)
- attach method: ~30-60 seconds for 975 MB database
- Both methods: Minimal CPU usage
- Disk space required: 1x original database size