dsync-migration -- Compiled Rules

Auto-generated from 10 rule files. Do not edit directly.

Phase: prereq

Verify Migration Input Documents

Step 1 | Impact: CRITICAL

Verify Migration Input Documents

All four migration planning documents must exist and contain the expected sections before proceeding. This prevents running dsync against an incomplete or incorrect migration plan.

Inputs

Project directory containing docs/ folder

Procedure

Check that all four files exist:
- docs/cosmos-db-migration-assessment.md
- docs/access-patterns.md
- docs/volumetrics.md
- docs/schema_and_access_patterns_conversion_plan.md
Validate each document contains minimum required content:

cosmos-db-migration-assessment.md must contain:
- Source database name and server
- List of tables with row counts
- Data type inventory
- Identified constraints and relationships (foreign keys)
access-patterns.md must contain:
- At least one documented query pattern
- Read/write ratio estimates
- Identified hot paths or frequently accessed entities
volumetrics.md must contain:
- Row counts per table
- Estimated data size per table
- Growth rate projections (if available)
schema_and_access_patterns_conversion_plan.md must contain:
- SQL table to Cosmos DB container mapping
- Partition key selection for each container
- Embedding vs referencing decisions
- Any denormalization or schema transformation rules
If any document is missing or incomplete, report exactly which documents and sections are missing.

Success Criteria

All four files exist
Each contains the minimum required sections listed above
Agent has parsed and can reference the table-to-container mapping and partition key selections

Failure Modes

Missing file: Tell the user which file is missing and that Steps 1-4 must be completed first.
Empty or stub file: Flag it as incomplete and list the missing sections.
Conflicting information: If volumetrics row counts don't match assessment row counts, flag the discrepancy.

Next Step

Proceed to 02-prereq-sql-connection.

Verify SQL Server Connectivity

Step 2 | Impact: CRITICAL

Verify SQL Server Connectivity

Confirm that the agent can connect to the source SQL Server instance and enumerate the tables that will be migrated.

Inputs

SQL Server connection details: host, port, username, password, database name
Table list from docs/cosmos-db-migration-assessment.md

Procedure

Obtain the SQL Server connection string from the user. Accept any of these formats:
- Explicit parameters: host, port, user, password, database
- Connection string: Server=<host>,<port>;Database=<db>;User Id=<user>;Password=<pass>;
- Docker container name + credentials (if running locally)
Test connectivity by executing a simple query:
```
SELECT @@VERSION;
```

Enumerate all user tables in the database:

SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
ORDER BY TABLE_SCHEMA, TABLE_NAME;

Cross-reference the enumerated tables against the table list in the migration assessment document. Flag any discrepancies:
- Tables in the assessment but not in the database
- Tables in the database but not in the assessment (these may be intentionally excluded)

For each table in scope, capture the row count:

SELECT '[schema].[table]' AS TableName, COUNT(*) AS RowCount
FROM [schema].[table];

Store these row counts -- they will be used in Step 08 for validation.

Success Criteria

SQL Server connection succeeds
All tables listed in the migration assessment exist in the database
Row counts are captured for every in-scope table

Failure Modes

Connection refused: Check that SQL Server is running, port is correct, and firewall allows the connection.
Authentication failure: Verify username/password. For Docker containers, check the SA_PASSWORD environment variable.
Database not found: Confirm the database name matches exactly (case-sensitive on some configurations).
Table missing: A table listed in the assessment doesn't exist -- the assessment may be out of date.

Next Step

Proceed to 03-config-generate-connector.

Phase: config

Generate dsync Connector Configuration

Step 3 | Impact: CRITICAL

Generate dsync Connector Configuration

dsync uses two separate components for SQL Server -> Cosmos DB NoSQL migrations:

SQL Batch connector (source): configured via a YAML config file
Cosmos DB NoSQL connector (sink): a separate Java/gRPC process

Inputs

SQL Server connection details (from Step 02)
Cosmos DB NoSQL account URL and read-write key
Schema conversion plan (docs/schema_and_access_patterns_conversion_plan.md)

Procedure

Obtain the Cosmos DB target connection details from the user:
- Cosmos DB NoSQL account URL (e.g., https://<account>.documents.azure.com:443/)
- Cosmos DB primary read-write key
- Target database and container names (must be pre-created)

Generate the SQL Batch connector YAML config file (dsync-sqlbatch-config.yml):

# dsync SQL Batch Connector Configuration
# Docs: https://docs.adiom.io/reference/connectors/sql-batch

id: sqlserver-source
driver: sqlserver
connectionstring: "sqlserver://${SQL_USER}:${SQL_PASSWORD}@${SQL_HOST}:${SQL_PORT}?database=${SQL_DATABASE}"
mappings:
  # One mapping entry per SQL table to migrate.
  # Repeat this block for each table in scope.
  - namespace: "dbo.Products"
    query: "SELECT ProductID, Name, ProductNumber, Color, ListPrice FROM SalesLT.Product"
    partitionquery: "SELECT ProductID FROM SalesLT.Product WHERE ProductID % 4 = 0 ORDER BY ProductID"
    cols: [ProductID]
    limit: 5000

  - namespace: "dbo.SalesOrderHeader"
    query: "SELECT SalesOrderID, OrderDate, CustomerID, TotalDue FROM SalesLT.SalesOrderHeader"
    partitionquery: "SELECT SalesOrderID FROM SalesLT.SalesOrderHeader WHERE SalesOrderID % 4 = 0 ORDER BY SalesOrderID"
    cols: [SalesOrderID]
    limit: 5000

  # Add additional mappings for each table from the schema conversion plan.
  # See Step 04 for the complete table-to-container mapping procedure.

Key fields explained:

driver: Use sqlserver for SQL Server sources
connectionstring: ADO.NET-style connection string; use env vars for credentials
namespace: Logical name for this dataset (used in --namespace remapping)
query: SQL SELECT that returns all columns to migrate
partitionquery: Returns partition boundary values for parallel reads (must match cols, sorted ascending, no duplicates)
cols: Primary key column(s) used for partitioning and row identification
limit: Batch size per read operation

Write the config file to the project directory as dsync-sqlbatch-config.yml.

Do NOT hardcode credentials in the config file. Use environment variables:

export SQL_USER="sa"
export SQL_PASSWORD="<your-password>"
export SQL_HOST="localhost"
export SQL_PORT="1433"
export SQL_DATABASE="AdventureWorksLT"

Note: The Cosmos DB NoSQL destination is NOT configured in this file. It runs as a separate gRPC connector process (configured in Step 06).

Success Criteria

dsync-sqlbatch-config.yml is written to disk
Connection string uses environment variable references (not hardcoded credentials)
Every in-scope table has a mapping entry with query, partitionquery, cols, and limit
The driver field is set to sqlserver

Failure Modes

Missing Cosmos DB credentials: Prompt the user to provide the account URL and key (needed in Step 06).
Invalid YAML: Ensure proper indentation and quoting of connection strings.
Missing partition query: Every mapping MUST include a partitionquery for parallel reads.
Credentials in plaintext: Warn the user if SQL credentials are written directly into the YAML file.

Reference

Next Step

Proceed to 04-config-table-mapping.

Configure Table-to-Container Mapping

Step 4 | Impact: HIGH

Configure Table-to-Container Mapping

Define how each SQL Server table maps to a Cosmos DB container, including column-to-property mappings and any denormalization or embedding rules.

Inputs

Schema conversion plan (docs/schema_and_access_patterns_conversion_plan.md)
SQL Server table schema (from Step 02)
dsync connector config (from Step 03)

Procedure

Read the table-to-container mapping from the schema conversion plan. For each SQL table, determine:
- Target Cosmos DB container name
- Whether the table is embedded into another container or stands alone
- Which columns map to which Cosmos DB properties
- Any column renames or type transformations
For tables that are embedded (denormalized into a parent document):
- Identify the parent container
- Define the embedding path (e.g., order.lineItems[])
- Specify the join key between parent and child tables
For tables that are referenced (separate containers with ID references):
- Map the foreign key column to the reference property
- Note that referential integrity is the application's responsibility in Cosmos DB
Handle data type mappings:
- SQL DATETIME -> ISO 8601 string or Unix timestamp (per conversion plan)
- SQL MONEY/DECIMAL -> number (note precision considerations)
- SQL UNIQUEIDENTIFIER -> string
- SQL VARBINARY -> base64 string or omit (per conversion plan)
- SQL BIT -> boolean
- SQL NVARCHAR/VARCHAR -> string
Update the dsync config file with these mappings.

Success Criteria

Every in-scope SQL table has a defined target container or embedding path
Column-to-property mappings are complete
Data type transformations are specified where needed

Failure Modes

Unmapped table: A table in scope has no mapping in the conversion plan. Ask the user for guidance.
Circular embedding: Table A embeds B which embeds A. Flag this as an error.
Data type incompatibility: Flag any SQL types that have no clean Cosmos DB equivalent.

Next Step

Proceed to 05-config-partition-key.

Configure Partition Keys

Step 5 | Impact: CRITICAL

Configure Partition Keys

Set the partition key for each Cosmos DB container based on the schema conversion plan. Partition key selection directly determines scalability, query performance, and cost.

Inputs

Schema conversion plan (docs/schema_and_access_patterns_conversion_plan.md)
Access patterns (docs/access-patterns.md)
dsync connector config (from Steps 03-04)

Procedure

For each target Cosmos DB container, read the partition key from the schema conversion plan.
Validate each partition key choice against best practices:
- High cardinality: The key should have many unique values (thousands+). Flag if the chosen key has fewer than 100 distinct values in the source data.
- Even distribution: No single partition key value should hold a disproportionate amount of data. Query the source SQL table to check distribution:
```
SELECT [partition_key_column], COUNT(*) AS cnt
FROM [schema].[table]
GROUP BY [partition_key_column]
ORDER BY cnt DESC;
```
- Query alignment: The most frequent queries from access-patterns.md should filter on the partition key to avoid cross-partition queries.
If the conversion plan specifies a hierarchical partition key, configure it with levels ordered broad-to-narrow (e.g., /tenantId, /category, /id).
If the conversion plan specifies a synthetic partition key (e.g., combining two columns), add the transformation rule to the dsync config.
Update the dsync config file with partition key settings for each container.

Success Criteria

Every target container has a partition key configured
Partition key choices align with the conversion plan
No obvious low-cardinality or hot-partition risks

Failure Modes

Low cardinality key: The chosen key has very few distinct values. Warn the user and suggest alternatives.
Hot partition detected: One key value dominates (>20% of rows). Flag this with the actual distribution data.
Missing partition key in plan: The conversion plan doesn't specify a key for a container. Ask the user.

Next Step

Proceed to 06-run-execute-migration.

Phase: run

Execute the dsync Migration

Step 6 | Impact: CRITICAL

Execute the dsync Migration

Run dsync to migrate data from SQL Server to Cosmos DB NoSQL. This requires two processes:

The Cosmos DB NoSQL sink connector (Java/gRPC process)
The dsync engine with the SQL Batch source config

Inputs

Completed SQL Batch config file dsync-sqlbatch-config.yml (from Steps 03-05)
dsync binary available on the system
Cosmos DB NoSQL account URL and read-write key

Procedure

Step A: Verify dsync is installed

dsync --help

If dsync is not installed, install it using one of these methods:

# Option 1: Homebrew (macOS)
brew install adiom-data/homebrew-tap/dsync

# Option 2: Docker
docker pull markadiom/dsync

# Option 3: Build from source (requires Go)
git clone https://github.com/adiom-data/dsync.git
cd dsync
go build

Step B: Build and start the Cosmos DB NoSQL sink connector

The Cosmos DB NoSQL connector is a separate Java process that receives data via gRPC and writes to Cosmos DB using the optimized Java SDK for bulk operations.

# Set Cosmos DB credentials as environment variables
export COSMOS_URL="https://<your-account>.documents.azure.com:443/"
export COSMOS_KEY="<your-read-write-key>"

# Option 1: Using Docker (recommended)
docker network create dsync-net 2>/dev/null || true

docker run -d \
  --network dsync-net \
  --name cosmosnosqlconnector \
  -e OTEL_SDK_DISABLED=true \
  markadiom/cosmosnosqlconnector 8089 "$COSMOS_URL" "$COSMOS_KEY"

# Option 2: Using locally built JAR (requires Java 21+)
# cd dsync/java && mvn clean install
# java -jar target/cosmos-connector-1-jar-with-dependencies.jar 8089 "$COSMOS_URL" "$COSMOS_KEY" &

Wait a few seconds for the connector to start, then verify it is running:

docker logs cosmosnosqlconnector 2>&1 | tail -5

Step C: Execute dsync with the SQL Batch source

# Using local dsync binary:
dsync \
  --progress \
  --logfile dsync.log \
  --namespace "dbo.Products:TargetDB.ProductsContainer,dbo.SalesOrderHeader:TargetDB.OrdersContainer" \
  "sqlbatch --config=dsync-sqlbatch-config.yml" \
  grpc://localhost:8089 --insecure

# Using Docker (if source is also in Docker):
docker run \
  --network dsync-net \
  --name dsync \
  -v "$(pwd)/dsync-sqlbatch-config.yml:/config.yml:ro" \
  -p 8080:8080 \
  markadiom/dsync \
  --progress \
  --web-host 0.0.0.0 \
  --logfile /tmp/dsync.log \
  --namespace "dbo.Products:TargetDB.ProductsContainer" \
  "sqlbatch --config=/config.yml" \
  grpc://cosmosnosqlconnector:8089 --insecure

Key flags explained:

--progress: Show real-time progress in terminal
--logfile dsync.log: Write detailed logs to file
--namespace "SRC:DST": Remap source namespace to target Database.Container (comma-separated for multiple tables)
--insecure: Required because the gRPC connection to the Cosmos NoSQL connector is not TLS-encrypted
--web-host 0.0.0.0: (Optional) Enable web UI for progress monitoring on port 8080

Step D: Monitor output

The agent should run this command and capture both stdout and stderr. Watch for:

Migration started confirmation
Per-table progress indicators (rows migrated, percentage)
Completion message with summary statistics
Any error messages or warnings

Success Criteria

Cosmos DB NoSQL connector process is running and accepting connections on port 8089
dsync process exits with status code 0
Output contains a completion summary with row counts
No fatal errors in stderr

Failure Modes

dsync not found: Provide installation instructions specific to the user's platform (see Step A).
Cosmos NoSQL connector fails to start: Check that COSMOS_URL and COSMOS_KEY are correct. Verify the Cosmos DB account exists and is accessible.
Config file invalid: dsync will report config validation errors. Verify YAML syntax and all required fields in dsync-sqlbatch-config.yml.
Connection error to source: SQL Server may have dropped the connection. Verify connectivity (Step 02). Check that the SQL Server host is reachable from where dsync is running.
Connection error to gRPC sink: Ensure the Cosmos NoSQL connector is running on port 8089. Check docker logs cosmosnosqlconnector.
Throughput exceeded (429): Cosmos DB is throttling writes. Increase provisioned RUs or reduce dsync parallelism.
Timeout: Large tables may take a long time. Ensure the command is not killed by shell timeout. Use nohup or screen for long-running migrations.

Reference

dsync GitHub
SQL Batch Connector Docs
DynamoDB to Cosmos DB NoSQL Quickstart (architecture reference for Cosmos NoSQL sink)
Cosmos NoSQL Connector Build Instructions

Next Step

Proceed to 07-run-monitor-progress.

Monitor Migration Progress

Step 7 | Impact: HIGH

Monitor Migration Progress

Track the dsync migration while it runs, handle transient errors, and decide whether to retry or escalate.

Inputs

Running dsync process (from Step 06)
Expected row counts per table (from Step 02)

Procedure

While dsync is running, monitor its output for:
- Progress indicators: Tables being processed, rows migrated, percentage complete
- Warnings: Non-fatal issues like data type coercions or skipped rows
- Errors: Connection drops, throttling, or data validation failures
For transient errors (connection drops, 429 throttling):
- dsync may auto-retry depending on configuration
- If dsync exits with a retryable error, wait 30 seconds and re-run the same command
- Maximum 3 retries before escalating to the user
For data errors (e.g., a row violates Cosmos DB constraints):
- Log the specific row/table that failed
- Continue with remaining tables if possible
- Report all data errors to the user after migration completes
Capture the final migration summary, which should include:
- Total rows migrated per table/container
- Total time elapsed
- Any errors or skipped rows

Success Criteria

dsync migration completes (exit code 0)
Migration summary is captured and displayed to the user
Any warnings or non-fatal errors are reported

Failure Modes

Repeated 429 errors: Cosmos DB throughput is insufficient. Advise the user to scale up RUs temporarily.
Consistent timeout on large table: Suggest breaking the migration into smaller batches.
dsync hangs: Check system resources (memory, disk). Kill and retry with reduced parallelism.

Next Step

Proceed to 08-validate-row-counts.

Phase: validate

Validate Row Counts

Step 8 | Impact: CRITICAL

Validate Row Counts

Compare row counts between the source SQL Server tables and the target Cosmos DB containers to confirm all data was migrated.

Inputs

Source row counts (captured in Step 02)
Cosmos DB connection details
Table-to-container mapping (from Step 04)

Procedure

For each source SQL table, query the current row count:
```
SELECT COUNT(*) FROM [schema].[table];
```
For the corresponding Cosmos DB container, query the document count:
```
SELECT VALUE COUNT(1) FROM c
```
Or use the container metrics if available via SDK/REST API.
Compare source and target counts. Build a summary table:

Source Table Source Rows Target Container Target Docs Match
SalesLT.Customer 847 Customers 847 YES
SalesLT.Product 295 Products 295 YES
For tables that were embedded into parent documents, the container document count will NOT match the source row count. Instead, verify:
- The parent container doc count matches the parent table row count
- A sample parent document contains the expected number of embedded child items
Flag any mismatches with the exact delta.

Source Table	Source Rows	Target Container	Target Docs	Match
SalesLT.Customer	847	Customers	847	YES
SalesLT.Product	295	Products	295	YES

Success Criteria

All standalone table row counts match exactly
Embedded table counts are verified via parent document inspection
Zero unexplained missing rows

Failure Modes

Count mismatch: Re-run the dsync migration for the specific table/container that has a delta.
Cosmos DB count query slow: For large containers, COUNT(1) can be expensive. Use container metrics instead.
Source data changed during migration: If the source is live, rows may have been added/deleted during migration. Note the timestamp and accept small deltas.

Next Step

Proceed to 09-validate-data-integrity.

Validate Data Integrity

Step 9 | Impact: HIGH

Validate Data Integrity

Spot-check individual records to confirm data was migrated correctly, including data types, values, and relationships.

Inputs

SQL Server connection (from Step 02)
Cosmos DB connection
Table-to-container mapping (from Step 04)

Procedure

For each migrated table/container pair, select 3-5 sample records from the SQL source:
```
SELECT TOP 5 * FROM [schema].[table] ORDER BY NEWID();
```
For each sample record, query the corresponding document in Cosmos DB using the primary key / partition key:
```
SELECT * FROM c WHERE c.id = '<id-value>'
```
Compare field-by-field:
- String values match exactly
- Numeric values match (accounting for precision differences in MONEY/DECIMAL types)
- DateTime values are correctly formatted
- NULL values in SQL are either absent or null in Cosmos DB
- Boolean values (BIT -> bool) are correctly mapped
For embedded documents, verify:
- Child records appear as nested arrays in the parent document
- The count of embedded items matches the source child row count for that parent
- Child field values match the source
For reference-based relationships, verify:
- The reference ID in the child document matches a valid parent document ID
- Cross-referencing works in both directions

Success Criteria

All sampled records match between source and target
Data types are correctly transformed
Embedded and referenced relationships are intact

Failure Modes

Value mismatch: Log the exact field, source value, and target value. May indicate a data type transformation error in the dsync config.
Missing fields: A column exists in SQL but not in the Cosmos DB document. Check the column mapping in Step 04.
Encoding issues: Special characters (unicode, accents) not preserved. Check encoding settings in the dsync config.

Next Step

Proceed to 10-validate-cosmos-queries.

Validate Cosmos DB Queries

Step 10 | Impact: HIGH

Validate Cosmos DB Queries

Run the access pattern queries from the migration planning documents against Cosmos DB to confirm the migrated data supports the application's query needs.

Inputs

Access patterns (docs/access-patterns.md)
Cosmos DB connection
Schema conversion plan (docs/schema_and_access_patterns_conversion_plan.md)

Procedure

Extract each documented access pattern from docs/access-patterns.md. For each pattern:
- Translate the original SQL query into a Cosmos DB SQL API query
- Ensure the query filters on the partition key where possible (to avoid cross-partition scans)
Execute each query against Cosmos DB and verify:
- The query returns results (not empty)
- The result count is plausible given the source data
- The query completes within a reasonable time
For each query, capture the RU charge from the response headers:
```
x-ms-request-charge: <RU value>
```
Log these for the user to review. High RU queries may indicate:
- Missing or incorrect partition key filter
- Need for a composite index
- Cross-partition fan-out
Build a summary table:

Access Pattern Cosmos DB Query Results RU Cost Status
Get customer by ID SELECT * FROM c WHERE c.id = '123' 1 doc 3.5 RU OK
List orders by customer SELECT * FROM c WHERE c.customerId = '123' 5 docs 12.1 RU OK
Flag any access pattern that:
- Returns zero results when data is expected
- Has an RU cost above 100 (likely a cross-partition scan)
- Takes more than 5 seconds to execute

Access Pattern	Cosmos DB Query	Results	RU Cost	Status
Get customer by ID	`SELECT * FROM c WHERE c.id = '123'`	1 doc	3.5 RU	OK
List orders by customer	`SELECT * FROM c WHERE c.customerId = '123'`	5 docs	12.1 RU	OK

Success Criteria

All documented access patterns return expected results
No access pattern has an unexpectedly high RU cost
The migrated data fully supports the application's query needs

Failure Modes

Query returns no results: Check the partition key filter and document structure. The schema transformation may have renamed or restructured the queried fields.
High RU cost: Add a composite index or adjust the query to include the partition key.
Query syntax error: Cosmos DB SQL is a subset of ANSI SQL. Features like JOINs, GROUP BY, and subqueries work differently. Translate accordingly.

Migration Complete

If all validations pass, the migration is complete. Report the final summary to the user:

Total tables/containers migrated
Total rows/documents migrated
Any warnings or issues encountered
RU cost summary for access pattern queries
Connection details for the Cosmos DB target

ナビゲーション

Skillsとは？

リンク

dsync-migration -- Compiled Rules

dsync-migration -- Compiled Rules

Phase: prereq

Verify Migration Input Documents

Verify Migration Input Documents

Inputs

Procedure

Success Criteria

Failure Modes

Next Step

Verify SQL Server Connectivity

Verify SQL Server Connectivity

Inputs

Procedure

Success Criteria

Failure Modes

Next Step

Phase: config

Generate dsync Connector Configuration

Generate dsync Connector Configuration

Inputs

Procedure

Success Criteria

Failure Modes

Reference

Next Step

Configure Table-to-Container Mapping

Configure Table-to-Container Mapping

Inputs

Procedure

Success Criteria

Failure Modes

Next Step

Configure Partition Keys

Configure Partition Keys

Inputs

Procedure

Success Criteria

Failure Modes

Next Step

Phase: run

Execute the dsync Migration

Execute the dsync Migration

Inputs

Procedure

Step A: Verify dsync is installed

Step B: Build and start the Cosmos DB NoSQL sink connector

Step C: Execute dsync with the SQL Batch source

Step D: Monitor output

Success Criteria

Failure Modes

Reference

Next Step

Monitor Migration Progress

Monitor Migration Progress

Inputs

Procedure

Success Criteria

Failure Modes

Next Step

Phase: validate

Validate Row Counts

Validate Row Counts

Inputs

Procedure

Success Criteria

Failure Modes

Next Step

Validate Data Integrity

Validate Data Integrity

Inputs

Procedure

Success Criteria

Failure Modes

Next Step

Validate Cosmos DB Queries

Validate Cosmos DB Queries