---
name: matlab-use-duckdb
description: "Generates MATLAB code for DuckDB database operations using Database Toolbox. Use when connecting to DuckDB (in-memory or file-based), querying CSV/Parquet/JSON files with SQL, creating development databases, preprocessing out-of-memory data, using duckdb(), installing DuckDB extensions, or using DuckDB as an analytical engine in MATLAB."
license: MathWorks BSD-3-Clause
metadata:
  author: MathWorks
  version: "1.0"
---
# MATLAB Database Toolbox Interface to DuckDB
Use when working with DuckDB databases from MATLAB using Database Toolbox. DuckDB is an embedded analytical database engine that ships with Database Toolbox starting in R2026a. It enables SQL-based analytics on files, out-of-memory data preprocessing, and portable development databases — all without external database server configuration.
## When to Use This Skill
- Connecting to a DuckDB database (in-memory or file-based)
- Creating a new DuckDB database file for development workflows
- Querying CSV, Parquet, or JSON files directly with SQL
- Preprocessing large data that doesn't fit in memory before importing into MATLAB
- Using DuckDB as an analytical engine for filtering, aggregation, joins, or sorting
- Installing and using DuckDB extensions
- User mentions keywords: DuckDB, duckdb, analytical engine, embedded database, parquet, CSV analytics, in-memory database, portable database, development database, out-of-memory preprocessing
## When NOT to Use
- Connecting to MySQL, PostgreSQL, SQLite, or other external databases — use their native interfaces or JDBC/ODBC
- Data fits in memory and only needs standard MATLAB operations — use `readtable`/`readmatrix` directly
- Object-relational mapping — use ORM (`ormread`/`ormwrite` with `Mappable` classes)
- MongoDB, Cassandra, or Neo4j — use their dedicated Database Toolbox interfaces
## What Is DuckDB and Why Does Database Toolbox Ship It?
DuckDB is an embedded, serverless analytical database engine. Unlike MySQL or PostgreSQL, it requires no server, no configuration, and runs in-process within MATLAB.
**Why it ships with Database Toolbox (R2026a+):**
- Zero-config database — `conn = duckdb()` gives you a full SQL engine instantly.
- Analytical engine for files — Query CSV, Parquet, and JSON files directly with SQL without loading them into memory.
- Out-of-memory preprocessing — Filter, aggregate, join, and sort datasets larger than memory, then bring only results into MATLAB.
- Portable development databases — `.duckdb` or `.db` files work on any machine with Database Toolbox. No database setup needed.
- AI agent advantage — An agent's SQL knowledge directly translates to powerful analytical queries.
DuckDB does NOT replace MATLAB's file I/O (`readtable`, etc.). It is a performant alternative when data exceeds memory or when SQL operations are more natural than MATLAB table operations.
## Critical Rules
### Connection
- ALWAYS use `duckdb()` to connect — not `database()`, not JDBC, not ODBC.
- ALWAYS verify with `isopen(conn)` and close with `close(conn)`.
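The connection rules above can be sketched as follows; the `onCleanup` guard is an optional MATLAB idiom for guaranteeing cleanup, not part of the DuckDB interface:

```matlab
% Open an in-memory DuckDB connection
conn = duckdb();
cleaner = onCleanup(@() close(conn));   % closes the connection even if an error is thrown

% Verify the connection before using it
if ~isopen(conn)
    error("duckdb:connect", "DuckDB connection failed to open");
end

answer = fetch(conn, "SELECT 42 AS answer");
```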
### API Surface
- All standard functions work: `sqlread`, `fetch`, `execute`, `sqlwrite`, `sqlfind`, `sqlinnerjoin`, `sqlouterjoin`, `commit`, `rollback`.
- DuckDB does NOT support `databasePreparedStatement`. Use `execute` or `sqlwrite` instead.
- Use `ExcludeDuplicates` via `databaseImportOptions` when reading from database tables (with `sqlread`). For direct file queries (`read_csv`/`read_parquet` via `fetch`), use `SELECT DISTINCT` in SQL.
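A hedged sketch of the two deduplication routes described above; the `sales` table, `sales.csv` file, and `region` column are hypothetical names:

```matlab
conn = duckdb("dev.duckdb");

% Route 1: reading a database table — dedupe via import options
opts = databaseImportOptions(conn, "sales");
opts.ExcludeDuplicates = true;
salesTbl = sqlread(conn, "sales", opts);

% Route 2: querying a file directly — dedupe in the SQL itself
uniqueRegions = fetch(conn, "SELECT DISTINCT region FROM read_csv('sales.csv')");

close(conn);
```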
### File Queries
- ALWAYS use `fetch` (not `sqlread`) for file queries — they require SQL syntax like `SELECT * FROM read_csv('file.csv')`.
- ALWAYS use single quotes for file paths inside SQL: `read_csv('data.csv')`.
## Decision Framework
### Which connection mode should I use?
| Goal | Connection | Why |
|---|---|---|
| Analytical queries on files | `duckdb()` | No persistence needed; query files directly |
| Temporary workspace | `duckdb()` | Fast, discarded on close |
| Portable development database | `duckdb("mydata.duckdb")` | Creates a `.duckdb` or `.db` file; works on any machine |
| Open existing database | `duckdb("existing.db")` | Read/write access to a pre-existing `.db` or `.duckdb` file |
| Read-only shared database | `duckdb("shared.duckdb", ReadOnly=true)` | Prevents accidental writes |
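The connection modes in the table can be sketched as follows; the file names are placeholders:

```matlab
% In-memory: fast scratch workspace, discarded on close
memConn = duckdb();

% File-based: creates dev.duckdb if it does not exist, opens it otherwise
fileConn = duckdb("dev.duckdb");

% Read-only: safe for databases shared with other users or processes
roConn = duckdb("shared.duckdb", ReadOnly=true);

close(memConn); close(fileConn); close(roConn);
```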
### When should I use DuckDB vs. MATLAB file I/O?
| Scenario | Recommendation |
|---|---|
| Small data, simple operations | `readtable` / `readmatrix` |
| Data exceeds memory, needs filtering/aggregation | DuckDB (preprocess in SQL, analyze in MATLAB) |
| Query across multiple CSV/Parquet files | DuckDB with glob patterns |
| Portable development database | DuckDB file-based connection |
| MATLAB-specific analysis (signal processing, ML) | Preprocess in DuckDB, analyze in MATLAB |
## Common Patterns
### Pattern 1: Analytical Engine on Files

```matlab
conn = duckdb();
result = fetch(conn, "SELECT region, SUM(revenue) as total " + ...
    "FROM read_parquet('sales.parquet') " + ...
    "GROUP BY region ORDER BY total DESC");
close(conn);
```
### Pattern 2: Out-of-Memory Preprocessing

```matlab
conn = duckdb();
summary = fetch(conn, "SELECT date, AVG(value) as avg_val " + ...
    "FROM read_csv('huge_dataset.csv') " + ...
    "WHERE status = 'valid' " + ...
    "GROUP BY date ORDER BY date");
close(conn);
% summary fits in memory — continue with MATLAB analysis
```
### Pattern 3: Development Database

```matlab
conn = duckdb("dev.duckdb");
sqlwrite(conn, "experiments", experimentData);
rf = rowfilter("score");
results = sqlread(conn, "experiments", RowFilter=rf.score > 0.8);
close(conn);
```
### Pattern 4: Multi-File Query with Glob

```matlab
conn = duckdb();
data = fetch(conn, "SELECT * FROM read_parquet('data/year=2024/*.parquet') " + ...
    "WHERE category = 'A'");
close(conn);
```
### Pattern 5: Extensions

```matlab
conn = duckdb();
execute(conn, "INSTALL httpfs");
execute(conn, "LOAD httpfs");
data = fetch(conn, "SELECT * FROM read_parquet('https://example.com/data.parquet') LIMIT 1000");
close(conn);
```
For detailed examples, see:
- File analytics and out-of-memory preprocessing: `reference/cards/file-analytics.md`
- Development database workflows: `reference/cards/development-database.md`
- DuckDB extensions: `reference/cards/extensions.md`
## Common Mistakes
```matlab
% WRONG — using database() or JDBC to connect to DuckDB
conn = database("", "", "", "org.duckdb.DuckDBDriver", "jdbc:duckdb:");
% CORRECT
conn = duckdb();

% WRONG — using sqlread for file queries (it expects a table name)
data = sqlread(conn, "read_csv('data.csv')");
% CORRECT — use fetch with SQL
data = fetch(conn, "SELECT * FROM read_csv('data.csv')");

% WRONG — double quotes for file paths in SQL
data = fetch(conn, "SELECT * FROM read_csv(""data.csv"")");
% CORRECT — single quotes
data = fetch(conn, "SELECT * FROM read_csv('data.csv')");

% WRONG — loading a huge file into MATLAB and then filtering
data = readtable("huge.parquet"); filtered = data(data.val > 100, :);
% CORRECT — let DuckDB filter on disk
conn = duckdb();
filtered = fetch(conn, "SELECT * FROM read_parquet('huge.parquet') WHERE val > 100");
close(conn);

% WRONG — using databasePreparedStatement (not supported)
pstmt = databasePreparedStatement(conn, "INSERT INTO t VALUES(?, ?)");
% CORRECT — use sqlwrite
sqlwrite(conn, "t", data);
```
## Checklist
Before finalizing DuckDB code, verify:
- Connected with `duckdb()` or `duckdb("file.duckdb")` / `duckdb("file.db")` — not `database()` or JDBC
- `isopen(conn)` checked after connection
- File queries use `fetch` with SQL (not `sqlread`)
- File paths in SQL use single quotes
- Out-of-memory data preprocessed in DuckDB before importing to MATLAB
- No `databasePreparedStatement` usage (not supported)
- `close(conn)` called when done
## Troubleshooting
**Issue:** `duckdb` function not found
- Solution: Requires R2026a+ with Database Toolbox. Check with `ver('database')`.
**Issue:** `sqlread` errors with a file query
- Solution: Use `fetch(conn, "SELECT * FROM read_csv('file.csv')")` — `sqlread` expects table names only.
**Issue:** Permission denied on a ReadOnly connection
- Solution: Reconnect without `ReadOnly`: `conn = duckdb("file.duckdb")`.
**Issue:** Out of memory when querying a large file
- Solution: Add `WHERE`, `GROUP BY`, `LIMIT`, or aggregation in SQL to reduce the result size before it enters MATLAB.
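For example, a sketch that pushes the reduction into SQL so only the summarized result reaches MATLAB; `huge.parquet` and its columns are hypothetical:

```matlab
conn = duckdb();

% Filter, aggregate, and cap the row count on disk rather than in MATLAB memory
daily = fetch(conn, "SELECT date, COUNT(*) AS n, AVG(value) AS avg_val " + ...
    "FROM read_parquet('huge.parquet') " + ...
    "WHERE value > 0 " + ...
    "GROUP BY date " + ...
    "ORDER BY date " + ...
    "LIMIT 1000");

close(conn);
```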
**Issue:** File path not found in `read_csv`/`read_parquet`
- Solution: Paths are relative to `pwd`. Use absolute paths or verify with `dir('file.csv')`.
**Issue:** Extension install fails
- Solution: Requires internet access for the first install (cached afterward). See https://duckdb.org/docs/current/core_extensions/overview.
**Issue:** Type mismatch on `sqlwrite`
- Solution: DuckDB supports rich types (ARRAY, LIST, STRUCT, MAP). Use `sqlfind(conn, "tableName")` to check column types.
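A minimal sketch of inspecting a table's metadata before writing to it, assuming an `experiments` table (as in Pattern 3) already exists in the database file:

```matlab
conn = duckdb("dev.duckdb");

% sqlfind returns a table of matching database tables, including column information
info = sqlfind(conn, "experiments");
disp(info)

close(conn);
```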
Copyright 2026 The MathWorks, Inc.