name: "rag-implementation" description: "RAG Implementation Workflow workflow skill. Use this skill when the user needs RAG (Retrieval-Augmented Generation) implementation workflow covering embedding selection, vector database setup, chunking strategies, retrieval optimization, and practical evaluation of retrieval quality before prompt tuning or handoff." version: "0.0.1" category: "ai-agents" tags:

"rag-implementation"
"rag"
"retrieval-augmented"
"generation"
"implementation"
"embedding"
"chunking"
"retrieval"
"omni-enhanced" complexity: "advanced" risk: "safe" tools:
"codex-cli"
"claude-code"
"cursor"
"gemini-cli"
"opencode" source: "omni-team" author: "Omni Skills Team" date_added: "2026-04-15" date_updated: "2026-04-19" source_type: "omni-curated" maintainer: "Omni Skills Team" family_id: "rag-implementation" family_name: "RAG Implementation Workflow" variant_id: "omni" variant_label: "Omni Curated" is_default_variant: true derived_from: "skills/rag-implementation" upstream_skill: "skills/rag-implementation" upstream_author: "sickn33" upstream_source: "community" upstream_pr: "79" upstream_head_repo: "diegosouzapw/awesome-omni-skills" upstream_head_sha: "6bf093920a93e68fa8263cf6ee767d7407989d56" curation_surface: "skills_omni" enhanced_origin: "omni-skills-private" source_repo: "diegosouzapw/awesome-omni-skills" replaces:
"rag-implementation"

RAG Implementation Workflow

Overview

This skill curates the upstream rag-implementation workflow into an execution-focused guide for designing, reviewing, and troubleshooting Retrieval-Augmented Generation systems.

Use it when the task is not just “add a vector database,” but to make concrete decisions about:

whether RAG is the right solution at all
whether managed file search is sufficient or custom indexing is required
how to chunk, enrich, and index documents safely
how to choose dense, lexical, hybrid, and filtered retrieval patterns
how to evaluate retrieval quality separately from generation quality
how to debug failures such as missed passages, stale citations, duplicate chunks, and weak grounding

Preserve the upstream intent: this remains a practical implementation workflow covering embedding selection, vector storage, chunking, and retrieval optimization. The enhancement adds stronger activation boundaries, clearer quality gates, and more operational troubleshooting.

When to Use This Skill

Use this skill when one or more of these are true:

The system must answer from external documents rather than model memory.
The source corpus changes often enough that prompt-only approaches become stale.
The user requires citations, provenance, or document-grounded answers.
The corpus is large enough that manual prompt stuffing is not realistic.
Retrieval quality, filtering, freshness, or tenant isolation are part of the implementation scope.
The team needs to compare managed retrieval with a custom vector pipeline.

Do not default to this skill when:

The knowledge is small, static, and can fit directly in a prompt.
The task is mainly style adaptation or behavior shaping rather than external knowledge access.
There is no searchable corpus yet.
Latency or complexity budgets do not allow retrieval, indexing, and evaluation overhead.
A deterministic search/index system already exists and only answer formatting is needed.

Fast routing

If the situation is...	Prefer...
Small static instructions, no corpus, no citations needed	Prompt-only solution
File-grounded Q&A with limited customization needs	Managed file search
Tenant-aware retrieval, custom ranking, custom ingestion, or compliance constraints	Custom RAG pipeline
Knowledge is stable but behavior needs adaptation	Fine-tuning or task-specific prompting

Operating Table

Decision area	What to inspect	Practical guidance
RAG vs alternatives	Corpus size, freshness, citation need, latency budget	If freshness and provenance matter, RAG is usually justified. If not, first test prompt-only or managed retrieval.
Managed retrieval vs custom pipeline	ACL needs, custom chunking, reranking, observability, compliance	Use managed retrieval for speed. Choose custom indexing when you need tenant isolation, custom metadata filters, custom ranking, or strict ingestion control.
Chunking strategy	Document type, section boundaries, tables, code blocks, FAQs, policy text	Preserve semantic units first. Avoid one fixed chunk size for every corpus. Store structural metadata with every chunk.
Retrieval mode	Query type, corpus language, identifiers, jargon, versioned content	Dense retrieval is not always enough. Prefer hybrid or filtered retrieval for SKU-like terms, versions, legal text, or keyword-heavy corpora.
Embedding/index choice	Corpus scale, latency, operational maturity, filter complexity	Choose based on workload tradeoffs, not fashion. Index choice affects recall, cost, reindex behavior, and debugging.
Evaluation	Gold queries, expected passages, citation correctness, abstention behavior	Evaluate retrieval first, then generation. A fluent answer does not prove the right chunks were retrieved.
Safety	Provenance, ACL metadata, prompt injection exposure, stale content	Treat retrieval as a trust boundary. Restrict scope, keep citations, and abstain when evidence is weak or conflicting.

For compact decision support, use:

references/rag-decision-matrix.md
examples/rag-evaluation-example.md

Workflow

Phase 1: Requirements and activation check

Inputs

user task
target corpus or planned corpus
answer quality expectations
latency, cost, and compliance constraints

Actions

Confirm whether RAG is actually needed.
Identify whether freshness, provenance, or grounded answers are required.
Define what the system must return: answer only, answer plus citations, or structured diagnostics.
Decide whether managed file search can satisfy the use case before proposing a custom stack.

Outputs

a clear activation decision
initial architecture direction: prompt-only, managed retrieval, or custom RAG
explicit success criteria

Exit criteria

The team can state why RAG is required.
The team knows what counts as a correct answer and what evidence must accompany it.

Phase 2: Corpus preparation

Inputs

source documents
update cadence
ownership and access rules

Actions

inventory source types: policies, manuals, tickets, code, FAQs, tables, transcripts, product docs
remove duplicate or superseded content where possible
normalize encoding and extraction quality
assign metadata needed later for filtering and auditability

Recommended metadata per chunk

source URI or document id
title or section label
version or effective date
ingestion timestamp
language
tenant, team, or ACL scope where applicable
document type
parent section lineage

Outputs

clean corpus ready for chunking and indexing
metadata schema for retrieval and security controls

Exit criteria

Operators can trace any chunk back to its source.
Access-control and freshness metadata exist before indexing.

Phase 3: Chunking and enrichment

Inputs

normalized documents
document-type inventory

Actions

chunk by semantic boundaries first: headings, sections, paragraphs, FAQ items, code units, table neighborhoods
use overlap only where it preserves context across boundaries
keep surrounding structural cues that help retrieval and citation
test different chunking strategies on real questions instead of adopting universal token defaults

Document-specific heuristics

Policies / legal / compliance text: preserve clause and section boundaries; attach effective date and policy id.
Technical docs: keep headings, version tags, API names, and nearby examples together.
Code or config docs: chunk by function, class, command, or config block; avoid splitting syntax from explanation.
FAQs / support articles: one question-answer pair per chunk is often clearer than broad paragraph chunks.
Tables: keep the caption, header row meaning, and nearby explanatory text with extracted values where possible.

Outputs

chunked corpus with structural metadata

Exit criteria

A reviewer can inspect a chunk and still understand what document section it came from.
Chunk boundaries do not destroy the meaning needed for retrieval.

Phase 4: Embedding and index design

Inputs

chunked corpus
query patterns
scale and latency targets

Actions

choose embedding approach appropriate to corpus and query language
choose storage/index approach based on workload, not vendor preference
decide whether metadata filtering, hybrid retrieval, reranking, or database-native indexing are required
document re-embedding and reindex triggers before launch

Common decision factors

corpus size and growth rate
latency target
lexical search importance for identifiers and exact terms
need for metadata filters and ACL enforcement
operational tolerance for running a separate search service
observability and debugging needs

Reindex or re-embed when

the chunk schema changes
key metadata fields are added or corrected
a major document refresh lands
the embedding model changes
retrieval quality regresses on a stable test set

Outputs

documented embedding and index plan

Exit criteria

The team can explain why this storage/index path fits the workload.
Reindex triggers are known in advance, not discovered during incidents.

Phase 5: Retrieval design

Inputs

indexed corpus
representative user queries

Actions

start with the simplest retrieval path that matches the corpus
test dense retrieval, lexical retrieval, or hybrid retrieval against real queries
apply metadata filters for tenant, freshness, product/version, language, or document scope
tune top-k only after inspecting what is being returned
consider reranking when initial recall is acceptable but final ordering is weak

Rules of thumb

Use dense retrieval for semantic similarity and paraphrased questions.
Use lexical signals when exact identifiers, product names, versions, or policy labels matter.
Use hybrid retrieval when either semantic-only or keyword-only search misses relevant evidence.
Use metadata filters as both quality controls and security controls.

Outputs

retrieval policy for query handling
logging requirements for top results, scores, filters, and citations

Exit criteria

The team can inspect retrieved chunks and explain why they were selected.
The system can narrow retrieval scope safely using metadata.

Phase 6: Grounded answer generation

Inputs

retrieved chunks
answer policy

Actions

instruct the model to answer from retrieved evidence when grounding is required
require citations or source references when the use case depends on provenance
define abstention behavior when retrieval is weak, missing, or contradictory
prefer structured outputs for diagnostics, eval runs, or review workflows

Minimum answer policy

answer only from retrieved evidence when the task requires grounding
cite the supporting source or section when feasible
state uncertainty or ask a follow-up when the evidence is insufficient
do not silently fill missing facts from general model knowledge in a supposedly grounded workflow

Outputs

grounded answer contract
operator-visible diagnostic format if needed

Exit criteria

The answer behavior makes grounding failures visible rather than hiding them behind fluent prose.

Phase 7: Offline evaluation

Inputs

gold query set
expected documents or passages
generated answers with citations

Actions

evaluate retrieval quality separately from answer quality
record whether the right document or passage appeared in the retrieved set
verify citation correctness and unsupported claims
group failures into buckets before changing prompts or models

Useful evaluation dimensions

retrieval hit rate or recall proxy on expected documents/passages
citation correctness
groundedness or unsupported-claim rate
answer usefulness to the user task
abstention quality when evidence is weak

Outputs

failure buckets tied to retrieval, chunking, metadata, ranking, or generation behavior

Exit criteria

The team knows whether the main problem is ingestion/retrieval or answer generation.
Prompt tuning is not used to hide indexing defects.

See examples/rag-evaluation-example.md for a worked mini-evaluation.

Phase 8: Production monitoring and maintenance

Inputs

live queries
retrieval logs
corpus refresh events

Actions

monitor retrieval misses, stale citations, empty-result rates, and filter behavior
audit tenant or ACL scoping regularly
track corpus drift and reindex triggers
review examples where users report “hallucination” to confirm whether the real issue was retrieval failure

Outputs

maintenance plan for refresh, reindex, and incident review

Exit criteria

The team can detect degradation caused by corpus changes, not just model changes.

Troubleshooting

1. Relevant document exists but is not retrieved

Likely causes

chunk boundaries split the answer from its heading or context
semantic-only retrieval misses exact identifiers
metadata filters are too narrow
stale index or incomplete ingestion

Checks

inspect top-k retrieved chunks and their metadata
test the same query with and without filters
test dense-only versus hybrid retrieval
verify the document was actually indexed in the expected version

Fixes

rework chunking to preserve semantic units
add lexical or hybrid retrieval
correct filters or metadata
reindex the missing or updated content

2. Answer cites the wrong section or wrong version

Likely causes

overlapping chunks produce near-duplicate candidates
superseded content remains searchable
ranking favors semantically similar but outdated text

Checks

compare cited chunk metadata with effective date and version
inspect for duplicate or superseded documents in the index
review whether freshness metadata exists and is used

Fixes

deduplicate or retire old content
filter by version/effective date where appropriate
store stronger provenance metadata and expose it in the answer

3. Answers are repetitive or contain duplicated evidence

Likely causes

overlap is too high
many near-identical chunks from the same source dominate retrieval
top-k is too large for the query

Checks

inspect neighboring retrieved chunks for near-duplicates
compare answer quality at lower top-k values

Fixes

reduce unnecessary overlap
deduplicate chunk candidates before generation
tune top-k based on query class, not a universal default

4. Dense retrieval fails on SKUs, codes, policy IDs, or version numbers

Likely causes

lexical specificity matters more than semantic similarity
identifiers were normalized or lost during ingestion

Checks

run exact-term tests against representative identifier queries
confirm identifiers remain present in chunk text and metadata

Fixes

add lexical or hybrid retrieval
preserve identifiers in chunk text and metadata
add filters for product, version, or document type

5. Users report hallucinations, but the real issue is bad retrieval

Likely causes

answer generation is blamed before retrieval logs are inspected
weak evidence is still passed to the model as if it were sufficient

Checks

review retrieved chunks before reviewing prompt wording
verify whether cited evidence truly supports the answer
test abstention behavior on weak-retrieval cases

Fixes

enforce grounded-answer policy
require citations in evaluation runs
adjust retrieval, chunking, or ranking before changing prompts

6. Cross-tenant or unauthorized content appears in results

Likely causes

missing ACL metadata
retrieval performed without required filters
shared index without proper scope enforcement

Checks

inspect metadata fields on returned chunks
confirm filters are applied server-side where required
review ingestion pipeline for missing tenant or ACL attributes

Fixes

add mandatory scope metadata to every chunk
require retrieval filters for tenant/access boundaries
reindex content after metadata correction

7. Retrieved content is stale or superseded

Likely causes

corpus refresh does not trigger reindexing
old and new versions coexist without ranking or filtering rules

Checks

compare source freshness with index freshness
inspect effective dates and ingestion timestamps

Fixes

define explicit reindex triggers
filter or rank by freshness where the use case requires it
retire superseded content from searchable scope

8. Retrieved passages contain prompt injection or untrusted instructions

Likely causes

the system treats retrieved text as trusted instructions instead of untrusted evidence
unreviewed external content is indexed without policy boundaries

Checks

inspect whether the prompt clearly separates system policy from retrieved content
review the provenance and trust class of indexed sources

Fixes

treat retrieved text as data, not instructions
limit indexing of untrusted sources or isolate them by policy
require the model to ground answers in evidence rather than obey document-embedded commands

Additional Resources

references/rag-decision-matrix.md — compact architecture and retrieval decision matrix for implementation planning
examples/rag-evaluation-example.md — worked example that separates retrieval defects from generation defects

Primary guidance to verify before final implementation decisions:

OpenAI Embeddings guide: https://platform.openai.com/docs/guides/embeddings
OpenAI Retrieval guide: https://platform.openai.com/docs/guides/retrieval
OpenAI File Search guide: https://platform.openai.com/docs/guides/tools-file-search
OpenAI structured output guidance: https://platform.openai.com/docs/guides/text?api-mode=responses

If exact documentation URLs move, re-check current official docs before merge or handoff.

Related Skills

Use a different or adjacent skill when the task shifts to:

prompt engineering without external retrieval
search relevance tuning for a non-LLM application
evaluation framework design beyond a lightweight gold-set review
agent orchestration where retrieval is one tool among many
data governance or redaction workflows before indexing sensitive corpora

ナビゲーション

Skillsとは？

リンク

rag-implementation

RAG Implementation Workflow

Overview

When to Use This Skill

Fast routing

Operating Table

Workflow

Phase 1: Requirements and activation check

Phase 2: Corpus preparation

Phase 3: Chunking and enrichment

Phase 4: Embedding and index design

Phase 5: Retrieval design

Phase 6: Grounded answer generation

Phase 7: Offline evaluation

Phase 8: Production monitoring and maintenance

Troubleshooting

1. Relevant document exists but is not retrieved

2. Answer cites the wrong section or wrong version

3. Answers are repetitive or contain duplicated evidence

4. Dense retrieval fails on SKUs, codes, policy IDs, or version numbers

5. Users report hallucinations, but the real issue is bad retrieval

6. Cross-tenant or unauthorized content appears in results

7. Retrieved content is stale or superseded

8. Retrieved passages contain prompt injection or untrusted instructions

Additional Resources

Related Skills

関連スキル(🔧 開発ツール)