name: optimize-clickhouse-sql description: Optimize slow queries, analyze SQL performance, and collect evidence for expensive workloads. metadata: author: System show-in-sql-editor-quick-action: true
SQL Optimization Skill
Workflow is evidence-driven: collect evidence with tools, then recommend based on evidence only.
Pre-flight Check
- HAS SQL: Conversation contains a SQL query -> Go to WORKFLOW step 2 (Collect Evidence).
- HAS QUERY_ID: Conversation contains query_id -> Go to WORKFLOW step 2 (Call
collect_sql_optimization_evidenceimmediately). - DISCOVERY REQUEST: User asks to optimize the slowest/heaviest queries but does not provide SQL/query_id -> Go to WORKFLOW step 1 (Discovery).
- NEITHER: Call
ask_user_questionwith exactly one question:header:Please provide one of the following for optimizationoptions:{ "id": "sql", "label": "Provide SQL", "input": "text" }{ "id": "query_id", "label": "Provide query_id", "input": "text" }{ "id": "resource", "label": "Find the query that consumes the most", "input": "select", "choices": ["duration", "cpu", "memory", "disk"] }After the tool returns:
- If
optionIdissql, treatvalueas the SQL text and continue with evidence collection. - If
optionIdisquery_id, treatvalueas the query_id and continue with evidence collection. - If
optionIdisresource, treatvalueas the ranking metric and run discovery for the top 1 query in the last 1 day before continuing.
Discovery
- Prefer
search_query_logfor discovery fromsystem.query_log(slowest, most expensive, user-scoped, database-scoped, text-scoped, etc.). - If
search_query_logcannot express the request, then load theclickhouse-system-queriesskill, immediately callskill_resourceforreferences/system-query-log.md, and follow that reference strictly. - Do NOT write ad-hoc SQL against
system.query_logfrom this skill whensearch_query_logcan satisfy the request. - Extract
query_idfrom the discovery results for the next step (evidence collection).
Time Filtering
time_window: Relative minutes from now (e.g., 60 = last hour).time_range: Absolute range{ from: "ISO date", to: "ISO date" }.- When calling
collect_sql_optimization_evidenceafter discovery, you MUST pass the same time_window or time_range used in discovery.
Mode Selection
- Default
collect_sql_optimization_evidenceto light mode for the first pass. - Prefer omitting the
modeargument entirely unless full detail is required. - Use
mode: "full"only when the user explicitly asks for detailed/raw evidence or the light pass is insufficient. - Do not choose
fulljust because the request says "optimize", "analyze", or "investigate".
Workflow
- Discovery (if needed): Prefer
search_query_logto find candidates. If the request exceeds the tool's schema, then loadclickhouse-system-queries, loadreferences/system-query-log.mdviaskill_resource, and use that reference. Extractquery_idfrom the results. - Collect Evidence: Call
collect_sql_optimization_evidencewith query_id (preferred) or sql (and same time params if coming from discovery). - Analyze: Review evidence for optimization opportunities.
- Recommendations: Rank by Impact/Risk/Effort. Prefer low-risk query rewrites first.
- Validate: Use
validate_sqlfor any proposed SQL changes. Add inline comments (-- comment) to highlight key changes.
Table Schema Evidence
- Use table_schema fields: columns, engine, partition_key, primary_key, sorting_key, secondary_indexes.
- When
optimization_targetis present, treat it as the real local-table schema behind aDistributedtable and base key/index recommendations on it. - Suggest secondary indexes only when evidence shows frequent WHERE filters on selective columns and the index type fits the predicate.
- Use
minmaxfor range predicates on sorted columns. - Use
setfor low-cardinality equality filters. - Use
bloom_filterfor high-cardinality equality filters (e.g., trace_id, user_id). - Use
tokenbf_v1for frequent token-based text search.
- Use
Rules
- Do NOT recommend based on assumptions. If evidence is missing, collect it with tools.
- If tools return NO meaningful evidence, output only a brief 3-5 sentence message explaining what's missing.
- Always validate proposed SQL with
validate_sqlbefore recommending. - If discovery results include both query text and query_id, prefer query_id to avoid truncation issues.
- If the SQL appears incomplete (truncated/ellipsized/ends mid-clause), use
query_idinstead of sql. - When both
query_idand SQL are available, preferquery_idto reduce tokens and avoid truncation issues.