---
user-invocable: false
description: >
  Analyze exported evaluation results from Copilot Studio's Evaluate tab. The
  user provides a CSV file exported from the Copilot Studio UI; this skill
  parses it, identifies failures, and proposes YAML fixes. No API access or
  published agent required — just the exported CSV.
allowed-tools: Read, Glob, Grep, Edit
context: fork
agent: copilot-studio-test
---
# Analyze Copilot Studio Evaluation Results
Analyze evaluation results exported from the Copilot Studio UI as CSV.
## Phase 1: Get Results
1. Ask the user for the CSV file path if not already provided. The file is
   typically exported from Copilot Studio's Evaluate tab and named
   `Evaluate <agent name> <date>.csv` in their Downloads folder.

2. Read the CSV file. The in-product evaluation CSV has these columns (a
   parsing sketch follows the table):
   | Column             | Meaning                                                                                      |
   | ------------------ | -------------------------------------------------------------------------------------------- |
   | `question`         | The test utterance                                                                           |
   | `expectedResponse` | Expected response (may be empty)                                                             |
   | `actualResponse`   | What the agent responded                                                                     |
   | `testMethodType_1` | Eval method (e.g., `GeneralQuality`)                                                         |
   | `result_1`         | `Pass` or `Fail`                                                                             |
   | `passingScore_1`   | Score threshold (may be empty)                                                               |
   | `explanation_1`    | Why it passed/failed (e.g., "Seems relevant; Seems incomplete; Knowledge sources not cited") |

   The `_1` suffix indicates the first eval method. There may be additional
   methods (`_2`, `_3`, etc.) with the same column pattern.
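To make the column layout concrete, here is a minimal Python sketch of how such an export could be parsed. It assumes exactly the columns listed above; `load_eval_results` is an illustrative name, not part of this skill or any Copilot Studio API.

```python
import csv

def load_eval_results(csv_path):
    """Parse an Evaluate-tab CSV export into one record per test case."""
    rows = []
    # utf-8-sig tolerates the BOM that UI exports sometimes include.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for raw in csv.DictReader(f):
            methods = []
            n = 1
            # Collect every _N column group until one is missing.
            while f"testMethodType_{n}" in raw:
                methods.append({
                    "method": raw[f"testMethodType_{n}"],
                    "result": raw.get(f"result_{n}", ""),
                    "passingScore": raw.get(f"passingScore_{n}", ""),
                    "explanation": raw.get(f"explanation_{n}", ""),
                })
                n += 1
            rows.append({
                "question": raw.get("question", ""),
                "expectedResponse": raw.get("expectedResponse", ""),
                "actualResponse": raw.get("actualResponse", ""),
                "methods": methods,
            })
    return rows
```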
## Phase 2: Analyze Results
1. Focus on failed evaluations (`result_1` = `Fail`, or any `result_N` = `Fail`).

2. For each failure, use the `explanation` column to understand the issue
   (see the triage sketch after this list):

   - "Question not answered" — The agent couldn't handle the question. Check
     if there's a matching topic or knowledge source.
   - "Knowledge sources not cited" — The agent responded but didn't cite
     sources. Check knowledge source configuration and
     `SearchAndSummarizeContent` nodes.
   - "Seems incomplete" — The response was partial. Check topic flow for early
     exits, missing branches, or incomplete `SendActivity` messages.
   - Error messages in `actualResponse` (e.g.,
     `GenAIToolPlannerRateLimitReached`) — These are runtime errors, not
     authoring issues. Flag them to the user as transient failures to retry.
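A hedged sketch of that triage, building on the records returned by `load_eval_results` above. The bucket names are invented for illustration, and the runtime-error marker string is an assumption; match on whatever explanations and errors your exports actually contain.

```python
def triage_failures(rows):
    """Group failing evaluations by the likely kind of issue."""
    buckets = {"unanswered": [], "uncited": [], "incomplete": [],
               "runtime": [], "other": []}
    for row in rows:
        for m in row["methods"]:
            if m["result"] != "Fail":
                continue
            text = m["explanation"].lower()
            # Assumed marker string; extend with other error codes you encounter.
            if "GenAIToolPlannerRateLimitReached" in row["actualResponse"]:
                buckets["runtime"].append(row)      # transient; suggest a retry
            elif "not answered" in text:
                buckets["unanswered"].append(row)   # missing topic or knowledge source
            elif "not cited" in text:
                buckets["uncited"].append(row)      # citation/knowledge configuration
            elif "incomplete" in text:
                buckets["incomplete"].append(row)   # topic flow gaps
            else:
                buckets["other"].append(row)
    return buckets
```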
## Phase 3: Propose Fixes
1. For each failure, identify the relevant YAML file(s):

   - Auto-discover the agent: `Glob: **/agent.mcs.yml`
   - Find the relevant topic by matching the test utterance against trigger
     phrases and model descriptions (see the matching sketch after this list)
   - Read the topic file to understand the current flow
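Trigger matching in Copilot Studio is model-based, so it can't be reproduced exactly offline; the sketch below is only a rough lexical shortlist. The `**/*.yml` pattern and the scoring are assumptions; adjust them to the project's actual topic file layout.

```python
import re
from pathlib import Path

def find_candidate_topics(utterance, repo_root="."):
    """Shortlist topic YAML files whose text overlaps a failing utterance.

    Rough lexical scoring only; read the top candidates to confirm which
    topic's trigger phrases or model description actually match.
    """
    words = set(re.findall(r"\w+", utterance.lower()))
    scored = []
    for path in Path(repo_root).glob("**/*.yml"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        score = sum(1 for w in words if w in text)
        if score:
            scored.append((score, path))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [path for _, path in scored]
```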
2. Propose specific YAML changes to fix each failure. Present them to the user
   as a summary:

   - Which test(s) failed and why
   - Which file(s) need changes
   - What the proposed change is (show the diff; see the sketch after this list)
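One way to produce the "show the diff" part of the summary is to render the proposed file content against the current content with Python's standard `difflib`. The function name and summary shape here are illustrative, not prescribed by the skill.

```python
import difflib

def render_proposed_change(path, current_text, proposed_text):
    """Render a unified diff of a proposed YAML change for the summary."""
    return "".join(difflib.unified_diff(
        current_text.splitlines(keepends=True),
        proposed_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))
```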
3. Wait for user decision. The user can:

   - Accept all — apply all proposed changes
   - Accept partially — apply only some changes (ask which ones)
   - Reject — discard proposed changes and discuss alternative approaches
4. Apply accepted changes using the Edit tool. After applying, remind the user
   to push and publish again before re-running evaluations.