Two-Sided Search and Recsys Planning
Version 0.1.0
Marketplace Engineering
April 2026
Note:
This document is mainly for agents and LLMs to follow when maintaining,
generating, or refactoring codebases. Humans may also find it useful,
but the guidance here is optimized for automation and consistency in AI-assisted workflows.
Abstract
A planning, design, and diagnostic guide for search and recommendation systems in two-sided trust marketplaces built on OpenSearch. It contains 57 rules across 10 categories, ordered by cascade impact on the retrieval lifecycle, from user-intent framing and product-surface architecture through OpenSearch index and query design, CDC ingestion, embedding-model selection, retrieval strategy, ranking, search-plus-recs blending, measurement, PII scrubbing, and the instrumentation-and-dashboard layer that turns measurement into ongoing decision making. It includes two playbooks, one for planning a new retrieval system from scratch and one for diagnosing an existing one, plus explicit living-artefact conventions (decisions log, golden set, gotchas) so that context accumulates across sessions, releases, and team changes. It serves as the precursor to the companion marketplace-personalisation skill and defines an explicit hand-off rule.
Table of Contents
- Problem Framing and User Intent — CRITICAL
  - 1.1 Audit Live Query Logs Before Designing — CRITICAL (prevents designing for imagined users)
  - 1.2 Distinguish Transactional from Exploratory Intent — CRITICAL (prevents conversion loss on transactional sessions)
  - 1.3 Map Queries to Intent Classes Before Touching Retrieval — CRITICAL (prevents retrieval-strategy mismatch with user goal)
  - 1.4 Reject the One-Search-For-Everything Temptation — CRITICAL (prevents system-wide compromise)
  - 1.5 Separate Known-Item Search from Discovery — CRITICAL (prevents recall loss on known-item queries)
  - 1.6 Treat No-Search as a First-Class Choice — CRITICAL (prevents forcing retrieval where browse is correct)
- Surface Taxonomy and Architecture — CRITICAL
  - 2.1 Avoid Mono-Stack Retrieval — CRITICAL (prevents single-point-of-failure in retrieval)
  - 2.2 Declare a Fallback Owner per Surface at Architecture Time — CRITICAL (prevents fallback gaps on new surfaces)
  - 2.3 Design for Cold Start from Day One — CRITICAL (prevents new-listing discovery failure)
  - 2.4 Map Each Surface to a Retrieval Primitive Deliberately — CRITICAL (prevents architectural drift across surfaces)
  - 2.5 Route Surfaces to Search, Recs, or Hybrid Deliberately — CRITICAL (prevents ad-hoc routing drift)
  - 2.6 Split Candidate Generation from Ranking — CRITICAL (enables independent tuning of retrieval and ranking)
- Index Design and Mapping — HIGH
  - 3.1 Design Mappings Conservatively Because Reindex Is Expensive — HIGH (avoids full reindex downtime)
  - 3.2 Match Index-Time and Query-Time Analyzers — HIGH (prevents tokenisation mismatch at query time)
  - 3.3 Separate Searchable Fields from Display Fields — HIGH (reduces index storage and query cost)
  - 3.4 Stream Listing Updates via CDC, Not Periodic Full Re-Import — HIGH (reduces index staleness from hours to seconds)
  - 3.5 Use Index Templates to Enforce Consistency — HIGH (prevents mapping drift across indices)
  - 3.6 Use keyword and text as Multi-Fields — HIGH (enables exact match and full-text on one field)
  - 3.7 Use Language Analyzers for Language-Sensitive Fields — HIGH (enables language-aware stemming and stopwords)
- Planning and Improvement Methodology — HIGH
  - 4.1 Audit Before You Build — Gate Work on Instrumentation Readiness — HIGH (prevents building on broken telemetry)
  - 4.2 Build a Golden Query Set as the First Artefact — HIGH (enables offline regression detection)
  - 4.3 Find the Bottleneck Before Optimising — HIGH (prevents work on non-bottleneck layers)
  - 4.4 Freeze and Version the Golden Set per Evaluation Cycle — HIGH (enables comparable evaluations across releases)
  - 4.5 Hand Off to the Personalisation Skill When the Bottleneck Is Personalisation — HIGH (prevents duplicated planning effort)
  - 4.6 Maintain a Decisions Log as Living Context — HIGH (prevents lost context across team changes)
- Query Understanding — MEDIUM-HIGH
  - 5.1 Build Autocomplete on a Separate Index — MEDIUM-HIGH (prevents autocomplete latency from blocking main search)
  - 5.2 Classify Queries Before Routing — MEDIUM-HIGH (enables intent-aware routing)
  - 5.3 Curate Synonyms by Domain Intent — MEDIUM-HIGH (enables domain-specific recall)
  - 5.4 Normalise Queries Before Anything Else — MEDIUM-HIGH (prevents unicode and whitespace misses)
  - 5.5 Use Fuzzy Matching for Typo Tolerance — MEDIUM-HIGH (prevents recall loss on typos)
  - 5.6 Use Language Analyzers for Stemming and Stopwords — MEDIUM-HIGH (enables stemming and stopword removal)
- Retrieval Strategy — MEDIUM-HIGH
  - 6.1 Choose the Embedding Model Deliberately Before Hybrid Search — MEDIUM-HIGH (avoids full re-embedding on model change)
  - 6.2 Combine BM25 and KNN via Hybrid Search — MEDIUM-HIGH (enables semantic plus lexical recall)
  - 6.3 Paginate with search_after for Deep Result Sets — MEDIUM-HIGH (prevents deep-pagination memory cost)
  - 6.4 Run Expensive Signals in rescore — MEDIUM-HIGH (reduces scoring cost on full candidate set)
  - 6.5 Use bool Structure Deliberately — MEDIUM-HIGH (prevents ambiguous clause semantics)
  - 6.6 Use filter Clauses for Exact Matches — MEDIUM-HIGH (enables query result caching)
- Relevance and Ranking — MEDIUM-HIGH
  - 7.1 Apply Diversity at Rank Time, Not Retrieval — MEDIUM-HIGH (preserves retrieval recall for diversity)
  - 7.2 Deploy Learning to Rank Only After Golden Set and Judgments Exist — MEDIUM-HIGH (prevents premature LTR complexity)
  - 7.3 Normalise Scores Across Retrieval Primitives — MEDIUM-HIGH (enables comparable hybrid ranking)
  - 7.4 Tune BM25 Parameters Last, Not First — MEDIUM-HIGH (prevents premature micro-optimisation)
  - 7.5 Use function_score for Business Signals — MEDIUM-HIGH (enables explainable business ranking)
- Search and Recommender Blending — MEDIUM
  - 8.1 Combine Search and Personalisation Scores with Normalised Weights — MEDIUM (enables comparable hybrid ranking)
  - 8.2 Keep Hybrid Blending Explainable — MEDIUM (enables blending debugging and tuning)
  - 8.3 Never Return Zero Results — MEDIUM (prevents dead-end sessions)
  - 8.4 Use Search Alone When Intent Is Specific — MEDIUM (prevents noise on precision-oriented queries)
- Measurement and Experimentation — MEDIUM
  - 9.1 Define Session Success per Surface — MEDIUM (enables surface-specific measurement)
  - 9.2 Run Interleaving as a Cheap A/B Proxy — MEDIUM (reduces experiment sample-size cost)
  - 9.3 Track NDCG, MRR and Zero-Result Rate — MEDIUM (enables ranking-quality measurement)
  - 9.4 Track Reformulation Rate as a Failure Signal — MEDIUM (enables implicit query-failure detection)
  - 9.5 Use Click Models for Implicit Relevance Judgments — MEDIUM (enables scalable judgment collection)
- Instrumentation, Dashboards and Decision Triggers — MEDIUM
  - 10.1 Alert on Decision-Triggering Metrics, Not Just Error Rates — MEDIUM (enables early quality regression detection)
  - 10.2 Build a Search Health Dashboard with Threshold Lines — MEDIUM (enables at-a-glance quality monitoring)
  - 10.3 Log Every Query with Full Context for Counterfactual Replay — MEDIUM (enables post-hoc query debugging)
  - 10.4 Run a Weekly Search-Quality Review Ritual — MEDIUM (enables calendar-driven decision making)
  - 10.5 Scrub PII from Query Logs Before Warehouse Ingestion — MEDIUM (prevents GDPR exposure in analytics)
  - 10.6 Track Ranking Stability as a Churn Metric — MEDIUM (enables leading-indicator detection)
References
- https://docs.opensearch.org/latest/query-dsl/compound/bool/
- https://docs.opensearch.org/latest/query-dsl/query-filter-context/
- https://docs.opensearch.org/latest/query-dsl/rescore/
- https://docs.opensearch.org/latest/analyzers/
- https://docs.opensearch.org/latest/analyzers/custom-analyzer/
- https://docs.opensearch.org/latest/analyzers/language-analyzers/index/
- https://docs.opensearch.org/latest/analyzers/language-analyzers/english/
- https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/index/
- https://opensearch.org/blog/building-effective-hybrid-search-in-opensearch-techniques-and-best-practices/
- https://opensearch.org/blog/multilingual-search/
- https://docs.aws.amazon.com/opensearch-service/latest/developerguide/learning-to-rank.html
- https://aws.amazon.com/blogs/big-data/hybrid-search-with-amazon-opensearch-service/
- https://www.manning.com/books/relevant-search
- https://opensourceconnections.com/blog/2019/12/11/what-is-a-relevant-search-result/
- https://eugeneyan.com/writing/recsys-llm/
- https://www.kdd.org/kdd2018/accepted-papers/view/real-time-personalization-using-embeddings-for-search-ranking-at-airbnb
- https://pubsonline.informs.org/doi/10.1287/mksc.2022.0238
- https://www.pinecone.io/learn/offline-evaluation/
- https://developers.google.com/machine-learning/guides/rules-of-ml
- https://careersatdoordash.com/blog/homepage-recommendation-with-exploitation-and-exploration/
- https://docs.opensearch.org/latest/field-types/
- https://docs.opensearch.org/latest/search-plugins/searching-data/paginate/
- https://sre.google/sre-book/embracing-risk/
- https://sbert.net/examples/sentence_transformer/domain_adaptation/README.html
- https://eugeneyan.com/writing/system-design-for-discovery/
- https://lantern.splunk.com/Security/UCE/Foundational_Visibility/Compliance/Detecting_Personally_Identifiable_Information_(PII)_in_log_data_for_GDPR_compliance
Source Files
This document was compiled from individual reference files. For detailed editing or extension, work from the files listed below:
| File | Description |
|---|---|
| references/_sections.md | Category definitions and impact ordering |
| assets/templates/_template.md | Template for creating new rules |
| SKILL.md | Quick reference entry point |
| metadata.json | Version and reference URLs |