---
user-invocable: false
description: >
  Analyze exported evaluation results from Copilot Studio's Evaluate tab. The
  user provides a CSV file exported from the Copilot Studio UI; this skill
  parses it, identifies failures, and proposes YAML fixes. No API access or
  published agent required — just the exported CSV.
allowed-tools: Read, Glob, Grep, Edit
context: fork
agent: copilot-studio-test
---
# Analyze Copilot Studio Evaluation Results
Analyze evaluation results exported from the Copilot Studio UI as CSV.
## Phase 1: Get Results
1. Ask the user for the CSV file path if not already provided. The file is
   typically exported from Copilot Studio's Evaluate tab and named
   `Evaluate <agent name> <date>.csv` in their Downloads folder.

2. Read the CSV file. The in-product evaluation CSV has these columns (a
   parsing sketch follows the table):
   | Column             | Meaning                                                                                      |
   | ------------------ | -------------------------------------------------------------------------------------------- |
   | `question`         | The test utterance                                                                           |
   | `expectedResponse` | Expected response (may be empty)                                                             |
   | `actualResponse`   | What the agent responded                                                                     |
   | `testMethodType_1` | Eval method (e.g., `GeneralQuality`)                                                         |
   | `result_1`         | `Pass` or `Fail`                                                                             |
   | `passingScore_1`   | Score threshold (may be empty)                                                               |
   | `explanation_1`    | Why it passed/failed (e.g., "Seems relevant; Seems incomplete; Knowledge sources not cited") |

   The `_1` suffix indicates the first eval method. There may be additional
   methods (`_2`, `_3`, etc.) with the same column pattern.
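To make the column layout concrete, here is a minimal Python sketch of how such an export could be parsed. It assumes exactly the columns listed above; `load_eval_results` is an illustrative name, not part of this skill or any Copilot Studio API.

```python
import csv

def load_eval_results(csv_path):
    """Parse an Evaluate-tab CSV export into one record per test case."""
    rows = []
    # utf-8-sig tolerates the BOM that UI exports sometimes include.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for raw in csv.DictReader(f):
            methods = []
            n = 1
            # Collect every _N column group until one is missing.
            while f"testMethodType_{n}" in raw:
                methods.append({
                    "method": raw[f"testMethodType_{n}"],
                    "result": raw.get(f"result_{n}", ""),
                    "passingScore": raw.get(f"passingScore_{n}", ""),
                    "explanation": raw.get(f"explanation_{n}", ""),
                })
                n += 1
            rows.append({
                "question": raw.get("question", ""),
                "expectedResponse": raw.get("expectedResponse", ""),
                "actualResponse": raw.get("actualResponse", ""),
                "methods": methods,
            })
    return rows
```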
## Phase 2: Analyze Results
1. Focus on failed evaluations (`result_1` = `Fail`, or any `result_N` = `Fail`).

2. For each failure, use the `explanation` column to understand the issue
   (see the triage sketch after this list):

   - "Question not answered" — The agent couldn't handle the question. Check
     if there's a matching topic or knowledge source.
   - "Knowledge sources not cited" — The agent responded but didn't cite
     sources. Check knowledge source configuration and
     `SearchAndSummarizeContent` nodes.
   - "Seems incomplete" — The response was partial. Check topic flow for early
     exits, missing branches, or incomplete `SendActivity` messages.
   - Error messages in `actualResponse` (e.g.,
     `GenAIToolPlannerRateLimitReached`) — These are runtime errors, not
     authoring issues. Flag them to the user as transient failures to retry.
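A hedged sketch of that triage, building on the records returned by `load_eval_results` above. The bucket names are invented for illustration, and the runtime-error marker string is an assumption; match on whatever explanations and errors your exports actually contain.

```python
def triage_failures(rows):
    """Group failing evaluations by the likely kind of issue."""
    buckets = {"unanswered": [], "uncited": [], "incomplete": [],
               "runtime": [], "other": []}
    for row in rows:
        for m in row["methods"]:
            if m["result"] != "Fail":
                continue
            text = m["explanation"].lower()
            # Assumed marker string; extend with other error codes you encounter.
            if "GenAIToolPlannerRateLimitReached" in row["actualResponse"]:
                buckets["runtime"].append(row)      # transient; suggest a retry
            elif "not answered" in text:
                buckets["unanswered"].append(row)   # missing topic or knowledge source
            elif "not cited" in text:
                buckets["uncited"].append(row)      # citation/knowledge configuration
            elif "incomplete" in text:
                buckets["incomplete"].append(row)   # topic flow gaps
            else:
                buckets["other"].append(row)
    return buckets
```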
## Phase 3: Propose Fixes
1. For each failure, identify the relevant YAML file(s):

   - Auto-discover the agent: `Glob: **/agent.mcs.yml`
   - Find the relevant topic by matching the test utterance against trigger
     phrases and model descriptions (see the matching sketch after this list)
   - Read the topic file to understand the current flow
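Trigger matching in Copilot Studio is model-based, so it can't be reproduced exactly offline; the sketch below is only a rough lexical shortlist. The `**/*.yml` pattern and the scoring are assumptions; adjust them to the project's actual topic file layout.

```python
import re
from pathlib import Path

def find_candidate_topics(utterance, repo_root="."):
    """Shortlist topic YAML files whose text overlaps a failing utterance.

    Rough lexical scoring only; read the top candidates to confirm which
    topic's trigger phrases or model description actually match.
    """
    words = set(re.findall(r"\w+", utterance.lower()))
    scored = []
    for path in Path(repo_root).glob("**/*.yml"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        score = sum(1 for w in words if w in text)
        if score:
            scored.append((score, path))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [path for _, path in scored]
```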
2. Propose specific YAML changes to fix each failure. Present them to the user
   as a summary:

   - Which test(s) failed and why
   - Which file(s) need changes
   - What the proposed change is (show the diff; see the sketch after this list)
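One way to produce the "show the diff" part of the summary is to render the proposed file content against the current content with Python's standard `difflib`. The function name and summary shape here are illustrative, not prescribed by the skill.

```python
import difflib

def render_proposed_change(path, current_text, proposed_text):
    """Render a unified diff of a proposed YAML change for the summary."""
    return "".join(difflib.unified_diff(
        current_text.splitlines(keepends=True),
        proposed_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))
```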
3. Wait for user decision. The user can:

   - Accept all — apply all proposed changes
   - Accept partially — apply only some changes (ask which ones)
   - Reject — discard proposed changes and discuss alternative approaches
4. Apply accepted changes using the Edit tool. After applying, remind the user
   to push and publish again before re-running evaluations.