---
name: senzing-entity-resolution
description: >-
  Guides AI agents through Senzing entity resolution workflows using the
  Senzing MCP server. Covers data mapping to Senzing format, SDK code
  generation (Python, Java, C#, Rust, TypeScript/Node.js), documentation
  search, error troubleshooting, sample data access, reporting and
  visualization, SDK setup guides, and V3-to-V4 migration. Use when working
  with entity resolution, record matching, record linkage, deduplication,
  Senzing SDK integration, data mapping for Senzing ingestion, or
  troubleshooting Senzing error codes. Also use when the user mentions
  matching records across data sources, finding duplicate entities, identity
  resolution, master data management, or needs to resolve who is who across
  datasets.
license: Proprietary
compatibility: Requires Senzing MCP server (https://mcp.senzing.com/mcp) connected via claude mcp add or MCP config
metadata:
  author: senzing
  version: "0.24.2"
---
Senzing Entity Resolution — MCP Skill
Use this skill whenever a task involves entity resolution, record linkage, deduplication, or any interaction with the Senzing platform.
What Is Senzing
Senzing provides real-time AI-powered entity resolution as an embeddable SDK. It determines when two records refer to the same real-world entity (person, organization, etc.) by analyzing names, addresses, identifiers, and other attributes across data sources — without training data or manual rules.
MCP Server Setup
The Senzing MCP server is a remote server. Connect it to your client:
Claude Code:
claude mcp add --transport http senzing https://mcp.senzing.com/mcp
Claude Desktop / Other MCP Clients — add to your MCP config:
{
  "mcpServers": {
    "senzing": {
      "type": "url",
      "url": "https://mcp.senzing.com/mcp"
    }
  }
}
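For clients that already have an MCP config file with other servers, the entry above has to be merged rather than pasted over the file. The following is a minimal sketch of that merge on an in-memory dictionary. The `other` server and its URL are hypothetical placeholders, and the location of the config file is client-specific, so reading and writing the file is left out.

```python
import json

# The Senzing entry exactly as shown in the config snippet above.
SENZING_ENTRY = {"type": "url", "url": "https://mcp.senzing.com/mcp"}


def add_senzing_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the Senzing server registered."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["senzing"] = dict(SENZING_ENTRY)
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"type": "url", "url": "https://example.com/mcp"}}}
print(json.dumps(add_senzing_server(existing), indent=2))
```

The function copies the input instead of mutating it, so a failed write cannot leave a half-edited config in memory.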
The server works from pre-fetched documentation — it never connects to live
Senzing instances and never handles PII. It also hosts official Senzing SDK
.deb packages at /downloads/ for direct download in firewalled environments
— sdk_guide returns download URLs and install commands automatically.
Tool Reference
Start any Senzing session by calling get_capabilities for an up-to-date
tool listing and suggested workflows.
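MCP clients normally issue tool calls for you. For readers who want to see what such a call looks like on the wire, here is a small sketch that builds the JSON-RPC 2.0 body of an MCP `tools/call` request for get_capabilities. Passing an empty arguments object is an assumption, and the session handshake an MCP client performs before calling tools is omitted.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # JSON-RPC requests need a unique id per request


def tool_call_request(name: str, arguments: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Assumption: get_capabilities is called here without arguments.
print(json.dumps(tool_call_request("get_capabilities"), indent=2))
```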
Data Mapping (4 tools)
| Tool | Purpose |
|---|---|
| mapping_workflow | Interactive 7-step workflow: profile source data → plan entities → map fields → generate code → QA. State is client-side; always pass state back. |
| lint_record | Returns a Python linter script to validate mapped Senzing JSON/JSONL files locally. No data leaves the client. |
| analyze_record | Returns a Python analyzer script to examine feature distribution, attribute coverage, and data quality locally. |
| download_resource | Fallback for fetching workflow resources (linter, analyzer, entity spec, mapping examples) when network restrictions block direct download. |
Documentation & Reference (3 tools)
| Tool | Purpose |
|---|---|
| search_docs | Full-text search across entity specification, SDK guides, quickstarts, database tuning, pricing, architecture, globalization, EDA/data analysis, engine configuration, error codes, release notes, and PoC methodology. Prefer this over web search for any Senzing question. Use category='anti_patterns' to check for known pitfalls before recommending installation, architecture, or deployment approaches. |
| get_sdk_reference | Authoritative SDK reference: method signatures, flags, response schemas, V3→V4 migration mappings. Topics: migration, flags, response_schemas, functions/methods/classes/api (search SDK docs by method or class name), all. Use filter to narrow by method, module, or flag name. |
| find_examples | Search 27+ indexed GitHub repos for working code (Python, Java, C#, Rust, TypeScript/Node.js). Three modes: search by query, list files in a repo, or retrieve a specific file. Results include truncation metadata; drill into truncated files with file_path. |
SDK Setup & Code Generation (2 tools)
| Tool | Purpose |
|---|---|
| sdk_guide | Guided SDK setup across 5 platforms (Linux apt/yum, macOS, Windows, Docker) and 4 languages. Covers install, configure, load, export, full_pipeline with decision trees, anti-patterns, and direct package download links for firewalled environments. |
| generate_scaffold | Generates SDK scaffold code from real indexed GitHub snippets with source URLs for provenance. 10 workflows (initialize, configure, add_records, delete, query, redo, stewardship, information, error_handling, full_pipeline) in Python, Java, C#, Rust, or TypeScript/Node.js (V4); Python (V3). Returns multiple snippet variants per workflow. |
Sample Data (1 tool)
| Tool | Purpose |
|---|---|
| get_sample_data | Real data from CORD (Collections Of Relatable Data): las-vegas (US, 11 sources), london (international, 5 sources), moscow (Cyrillic, 6 sources). Use dataset='list' to discover available sets. Always present the download_url to the user. |
Reporting & Visualization (1 tool)
| Tool | Purpose |
|---|---|
| reporting_guide | Guided reporting and visualization for entity resolution results. Provides SDK patterns for data extraction (Python, Java, C#, Rust, TypeScript/Node.js), SQL analytics queries for aggregate reports, data mart schema (SQLite/PostgreSQL), visualization concepts, and anti-patterns. Topics: export, reports, entity_views, data_mart, dashboard, graph. |
Troubleshooting (1 tool)
| Tool | Purpose |
|---|---|
| explain_error_code | Explains any of 456 Senzing error codes with causes and resolution steps. Accepts SENZ0005, SENZ-0005, 0005, or just 5. |
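The tool accepts all four spellings, so no client-side cleanup is required. When codes need to be compared or de-duplicated locally, though, it can help to reduce them to a single form first. The sketch below is one way to do that. It is not the server's own parsing logic, and the case-insensitive prefix match is an assumption.

```python
import re

# Optional "SENZ" or "SENZ-" prefix followed by digits, e.g. SENZ0005, SENZ-0005, 0005, 5.
_CODE_PATTERN = re.compile(r"^(?:SENZ-?)?(\d+)$", re.IGNORECASE)


def normalize_error_code(raw: str) -> int:
    """Reduce any of the accepted spellings to the bare integer code."""
    match = _CODE_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"not a recognizable Senzing error code: {raw!r}")
    return int(match.group(1))


for form in ("SENZ0005", "SENZ-0005", "0005", "5"):
    print(form, "->", normalize_error_code(form))  # each form prints 5
```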
Meta & Utility (2 tools)
| Tool | Purpose |
|---|---|
| get_capabilities | Server version, capabilities overview, available tools, suggested workflows, and getting started guidance. Call this first in any Senzing session. |
| submit_feedback | Send feedback to the MCP server maintainer. Always preview the message with the user and get explicit confirmation before sending. Never include PII unless the user approves. |
Key Workflows
1. Map Source Data to Senzing Format
This is the most common workflow. Follow these steps:
- Call mapping_workflow with action='start' and the source file paths.
- Walk through each step: Profile → Plan → Map → Codegen → QA.
- At each step, pass the state object from the previous response.
- After codegen, use lint_record to validate the output JSON.
- Use analyze_record to check feature distribution and coverage.
- If the linter or analyzer scripts fail to download, use download_resource.
Tips:
- The workflow generates a mapper script — run it locally to produce JSONL.
- Profile step: read the source data yourself or run the profiler script.
- Plan step: identify master entities vs. child records vs. relationships.
- Map step: map every source field to a Senzing feature with a confidence score.
- QA step: evaluate whether the output meets quality thresholds.
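The state-passing requirement in the steps above can be sketched as a small client loop. Only action='start', file_paths, and the state object come from this document. The `call_tool` callable, the `action='next'` value, and the `done` and `stage` response keys are hypothetical placeholders, so check the real response shape before adapting this. A real session also does work between calls (profiling, planning, mapping), which the loop leaves out.

```python
from typing import Any, Callable, Dict, List

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_mapping_workflow(call_tool: ToolCaller, file_paths: List[str],
                         max_steps: int = 20) -> Dict[str, Any]:
    """Drive mapping_workflow and return the last response.

    Hypothetical response shape: {"state": ..., "done": bool, ...}.
    """
    response = call_tool("mapping_workflow",
                         {"action": "start", "file_paths": file_paths})
    for _ in range(max_steps):
        if response.get("done"):
            break
        # The server keeps no state, so echo the state object back unchanged.
        response = call_tool("mapping_workflow",
                             {"action": "next", "state": response["state"]})
    return response


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a real MCP client; advances one stage per call."""
    stages = ["profile", "plan", "map", "codegen", "qa"]
    index = arguments.get("state", {"index": -1})["index"] + 1
    return {"stage": stages[index], "state": {"index": index},
            "done": index == len(stages) - 1}


final = run_mapping_workflow(fake_call_tool, ["/data/customers.csv"])
print(final["stage"])  # → qa
```

The `max_steps` cap is a guard against a loop that never reports completion; the value 20 is arbitrary.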
2. Set Up the Senzing SDK
- Call sdk_guide with topic='install'. It returns a platform decision tree.
- Call again with the chosen platform (e.g., topic='install', platform='linux_apt').
  - For firewalled environments, the response includes direct_download with .deb package URLs from mcp.senzing.com/downloads/, so no apt repo is needed.
- Call sdk_guide with topic='configure' for engine configuration code.
- Call sdk_guide with topic='load' for record loading code.
- Or use topic='full_pipeline' for install + configure + load + export in one call.
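Once the platform has been chosen, the remaining calls follow a fixed order, which can be written down as a plan before any tool is invoked. The sketch below builds that plan as plain data. The parameter names topic, platform, language, and version appear in this document's examples; whether topic='load' and topic='full_pipeline' accept platform and language in the same way as topic='configure' is an assumption.

```python
from typing import Any, Dict, List, Tuple

Call = Tuple[str, Dict[str, Any]]


def sdk_setup_calls(platform: str, language: str, version: str = "current",
                    one_shot: bool = False) -> List[Call]:
    """Return the ordered sdk_guide calls for a platform that is already chosen."""
    with_language = {"platform": platform, "language": language, "version": version}
    if one_shot:
        # Single call covering install + configure + load + export.
        return [("sdk_guide", {"topic": "full_pipeline", **with_language})]
    return [
        ("sdk_guide", {"topic": "install", "platform": platform, "version": version}),
        ("sdk_guide", {"topic": "configure", **with_language}),
        ("sdk_guide", {"topic": "load", **with_language}),
    ]


for tool, arguments in sdk_setup_calls("linux_apt", "python"):
    print(tool, arguments)
```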
3. Generate SDK Integration Code
- Call generate_scaffold with the target language and workflow.
  - Start with workflow='initialize' for engine setup.
  - Then workflow='add_records' for record loading.
  - Use workflow='full_pipeline' for an end-to-end example.
- Call find_examples to find real-world usage patterns.
- Use search_docs for API details and deployment guidance.
4. Troubleshoot Errors
- Call explain_error_code with the code from the user's logs.
- Follow the resolution steps in the response.
- Call search_docs for additional context on the error class.
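When the user pastes a message or log excerpt rather than a bare code, the codes have to be pulled out of the text before calling explain_error_code. A minimal sketch follows, assuming codes appear as SENZ followed by four digits (the shape of the examples in this document). Real log formats may differ.

```python
import re
from typing import List

# Assumption: codes look like SENZ7234 or SENZ-7234 (prefix plus four digits).
_LOG_CODE = re.compile(r"\bSENZ-?(\d{4})\b")


def find_error_codes(text: str) -> List[str]:
    """Return the distinct codes found in the text, in first-seen order."""
    seen: List[str] = []
    for digits in _LOG_CODE.findall(text):
        if digits not in seen:
            seen.append(digits)
    return seen


message = "I'm getting SENZ7234 when loading records"
print(find_error_codes(message))  # → ['7234']
```

Each returned string can be passed as the error_code argument, one call per distinct code.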
5. Evaluate Senzing
- search_docs: learn about architecture (embedded SDK, air-gapped deployment).
- search_docs with query "pricing": DSR pricing model.
- get_sample_data: get real test data from CORD datasets.
- generate_scaffold with workflow='full_pipeline': end-to-end example.
6. Migrate V3 to V4
- get_sdk_reference with topic='migration': all breaking changes.
- Filter by module: topic='migration', filter='SzEngine'.
- get_sdk_reference with topic='flags': new flag system (SZ_WITH_INFO replaces WithInfo functions).
7. Build ER Reporting
- Call reporting_guide with topic='export' and your language to get data extraction code.
- Call reporting_guide with topic='reports' to get SQL for the 4 core aggregate reports.
- Call reporting_guide with topic='data_mart' to get the analytical schema and incremental update patterns.
- Call reporting_guide with topic='dashboard' for visualization concepts and chart data sources.
- Call reporting_guide with topic='graph' for network graph export patterns.
8. Check for Common Pitfalls
Before recommending installation, architecture, or deployment approaches:
- Call search_docs with category='anti_patterns' and a query describing what you plan to recommend.
- Review any matching anti-patterns before proceeding.
- Note: sdk_guide also returns topic-specific anti-patterns inline.
9. Deploy Senzing
- search_docs with your platform (e.g., "docker quickstart", "AWS deployment").
- search_docs for database setup (PostgreSQL, MySQL, MSSQL).
- search_docs for engine configuration and tuning guidance.
- generate_scaffold with workflow='initialize' for your language.
Critical Rules
These rules are non-negotiable. Violating them produces incorrect output.
- Never hand-code Senzing JSON: use mapping_workflow. Training data produces wrong attribute names (e.g., BUSINESS_NAME instead of the correct NAME_ORG).
- Never guess SDK methods: use generate_scaffold or get_sdk_reference. Methods changed between V3 and V4.
- Check anti-patterns first, before recommending installation or deployment: search_docs(query="topic", category="anti_patterns").
- MCP first for all Senzing questions: search_docs covers pricing, architecture, deployment, SDK, database tuning, globalization, and more. It reflects current releases, so prefer it over training knowledge.
- Discover tools dynamically: call get_capabilities rather than assuming tool names from this file or training data.
Best Practices
- Always call get_capabilities first to get current tool count and workflows.
- Prefer search_docs over web search for any Senzing-related question. The MCP server indexes authoritative content that may not rank well on the web.
- Pass state faithfully in mapping_workflow: the server is stateless, so all workflow state lives in the client.
- Never send source data to the server. The lint_record and analyze_record tools return scripts that run locally. The mapping workflow sends field names and schema, not row-level data.
- Present download_url from get_sample_data results directly to the user. Do not dump raw CORD records into the conversation; they are a preview only.
- Version parameter: Most tools accept version. Use "current" for the latest Senzing version unless the user specifies V3 (use "3.x").
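To make the "field names, not rows" practice concrete, here is a minimal sketch that reads only the header row of a CSV source, so no record values are ever collected for a tool call. The column names in the sample are hypothetical, and how the field names are then handed to mapping_workflow is not shown because this document does not specify it.

```python
import csv
import io
from typing import List, TextIO


def field_names(csv_stream: TextIO) -> List[str]:
    """Return the header row of a CSV stream; data rows are never parsed."""
    reader = csv.reader(csv_stream)
    return next(reader, [])  # empty list for an empty file


# Hypothetical source file contents, held in memory for the example.
sample = io.StringIO("customer_id,full_name,phone\n1001,placeholder,000\n")
print(field_names(sample))  # → ['customer_id', 'full_name', 'phone']
```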
Entity Resolution Concepts
When discussing Senzing with users, these terms are important:
- Entity — A real-world person, organization, or object represented by one or more records across data sources.
- Feature — An attribute used for matching: NAME, ADDRESS, PHONE, DOB, SSN, PASSPORT, etc. Senzing supports 100+ features across 30+ feature types.
- Data Source — A labeled origin for records (e.g., "CUSTOMERS", "WATCHLIST"). Every record must have a DATA_SOURCE and RECORD_ID.
- Entity Type — PERSON or ORGANIZATION (default: PERSON).
- Matched — Records confirmed as the same entity.
- Possible Match — Records that might be the same entity but need review.
- Relationship — A declared or discovered connection between entities.
- DSR (Data Source Record) — Senzing's pricing unit. One DSR equals one record loaded into the engine.
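The rule that every record carries a DATA_SOURCE and a RECORD_ID is easy to spot-check locally before loading. The sketch below scans JSONL lines for those two keys only. It is a quick pre-check under that single rule, not a substitute for the script returned by lint_record, and the record values shown are hypothetical.

```python
import json
from typing import Iterable, List, Tuple

REQUIRED_KEYS = ("DATA_SOURCE", "RECORD_ID")


def missing_required_keys(jsonl_lines: Iterable[str]) -> List[Tuple[int, List[str]]]:
    """Return (line_number, missing_keys) for each record lacking a required key."""
    problems = []
    for line_number, line in enumerate(jsonl_lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if record.get(key) in (None, "")]
        if missing:
            problems.append((line_number, missing))
    return problems


lines = [
    '{"DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1001"}',
    '{"DATA_SOURCE": "CUSTOMERS"}',
]
print(missing_required_keys(lines))  # → [(2, ['RECORD_ID'])]
```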
Examples
Example 1: Map a CSV file
User: "I have a customer CSV at /data/customers.csv I need to load into Senzing"
→ Call mapping_workflow(action='start', file_paths=['/data/customers.csv'])
→ Walk through each step (Profile → Plan → Map → Codegen → QA), passing state each time
→ Run lint_record on the output JSONL
→ Run analyze_record to check quality
Example 2: Set up Senzing SDK on Linux
User: "Help me install and set up the Senzing SDK on Ubuntu"
→ Call sdk_guide(topic='install', platform='linux_apt', version='current')
→ Present install commands and engine config
→ If user has firewall issues, use the direct_download URLs from the response
→ Call sdk_guide(topic='configure', platform='linux_apt', language='python', version='current')
→ Present configuration code
Example 3: Generate Python loader code
User: "Write me Python code to initialize Senzing and load records"
→ Call generate_scaffold(language='python', version='current', workflow='initialize')
→ Call generate_scaffold(language='python', version='current', workflow='add_records')
→ Combine and present the code
Example 4: Debug an error
User: "I'm getting SENZ7234 when loading records"
→ Call explain_error_code(error_code='7234', version='current')
→ Present causes and resolution steps
→ Use search_docs if additional context is needed