---
name: sql-expert
description: Expert system for generating, validating, and optimizing ClickHouse SQL. Use this when the user needs data, queries, or analysis.
metadata:
  author: System
disable-slash-command: true
---
## 🚨 CRITICAL RULE: MANDATORY VALIDATION
You MUST call `validate_sql(sql)` for every new query you generate.

Context Note: Historical validation steps are pruned to save tokens, but this does NOT excuse you from validating new queries in the current turn. Always validate before executing.
## 1. Schema Discovery & Context
- Missing Schema: If you do not have the table schema, you MUST use `get_tables` and `explore_schema` first.
- Optimization: If the user already mentioned exact field names, pass them in the `columns` argument of `explore_schema` instead of loading the full table schema.
- Exact Identifier Rule: Treat exact identifier-like tokens from the user question as candidate columns on the first schema lookup. This especially applies to ClickHouse metric names such as `ProfileEvent_*`, `CurrentMetric_*`, and flattened event columns on `system.*` tables.
- Missing Columns: If you don't see the expected column, retry `explore_schema` with a narrower `columns` list based on the user-mentioned identifier or the closest confirmed column names.
- Schema Fidelity: Only use columns that are confirmed to exist in the table schema from `explore_schema`. Do not assume standard columns exist if they are not in the tool output.
- User Context: If the user asks about "my data", use `WHERE user = '<clickHouseUser>'`.
- System Tables: For queries on `system.*` tables (e.g., `system.query_log`, `system.parts`, `system.merges`), defer to the `clickhouse-system-queries` skill - it contains table-specific patterns, predicates, and resource metrics that this skill does not cover. For `system.query_log`, do not generate SQL until `references/system-query-log.md` has been loaded via `skill_resource`, and do not call `search_query_log` for chart/time-series requests.
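As an illustration of the user-context rule, a minimal sketch of a "my data" query — the table `analytics.events` and its columns are hypothetical, while the `'<clickHouseUser>'` placeholder comes from the rule above:

```sql
-- "my data" request: scope the query to the current ClickHouse user.
SELECT count() AS my_events
FROM analytics.events                 -- hypothetical fully qualified table
WHERE user = '<clickHouseUser>'       -- placeholder substituted with the actual user
  AND event_date >= today() - 7       -- bounded time window per the optimization rules
```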
## 2. Syntax Rules (The Grammar)
- Tables: ALWAYS use fully qualified names (e.g., `database.table`).
- Semicolons: NEVER include a trailing semicolon (`;`).
- Enums: Use exact string literals for Enum columns.
- Safety: ALWAYS use `LIMIT` for data exploration queries.
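The syntax rules above can be illustrated with a minimal sketch; the table `analytics.page_views` and its columns (`status`, `url`) are hypothetical:

```sql
-- Fully qualified table, exact Enum literal, bounded exploration,
-- and no trailing semicolon.
SELECT
    event_date,
    url,
    status                            -- Enum column
FROM analytics.page_views             -- hypothetical database.table
WHERE status = 'published'            -- exact string literal for the Enum value
LIMIT 100                             -- safety limit for exploration
```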
## 3. Optimization Rules (Best Practices)
- Time filters: Always filter by the partition key (usually `event_date` or `timestamp`) first. Use bounded time windows (e.g., last 24h, 7 days) unless the user asks for all history.
- Primary Keys (CRITICAL): ClickHouse indexes are sparse. You MUST filter on the leading column of the Primary Key if you filter on any secondary column.
  - Bad: `WHERE event_time > now() - 1h` (if the PK is `(event_date, event_time)`, this scans everything).
  - Good: `WHERE event_date >= toDate(now() - 1h) AND event_time > now() - 1h` (uses the index and handles the midnight crossover).
- Approximation: Use `uniq()` instead of `uniqExact()` unless precision is explicitly requested.
- Joins: Put the smaller table on the RIGHT. Use `GLOBAL IN` only for distributed queries.
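Putting these rules together, a sketch of an optimized query — the table and column names (`analytics.page_views`, `user_id`) are hypothetical, and the primary key is assumed to be `(event_date, event_time)`:

```sql
-- Bounded 7-day window, leading PK column filtered first,
-- approximate distinct count instead of uniqExact().
SELECT
    event_date,
    uniq(user_id) AS daily_users      -- cheaper approximate cardinality
FROM analytics.page_views
WHERE event_date >= toDate(now() - INTERVAL 7 DAY)   -- leading PK / partition key first
  AND event_time >= now() - INTERVAL 7 DAY           -- secondary PK column
GROUP BY event_date
ORDER BY event_date
LIMIT 100
```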
## 4. Execution Workflow
- Generate: Create the SQL following the rules above.
- Validate (MANDATORY): Call `validate_sql(sql)`.
  - If invalid: Read the error, fix the SQL, and retry (max 3 attempts).
- Decide Action:
  - Visualization: IF the user wants a chart, DO NOT execute. Pass the SQL to the visualization skill logic.
  - Data: IF the user wants answers (lists, counts), call `execute_sql(sql)`.
  - Code Only: IF the user asks to "write SQL", just output the code block.