id: "54dcc2e7-272a-45e7-94bd-705a6f635355" name: "SQL to CSV Data Comparison with Cleaning" description: "Generate a Python script to compare MySQL database data with a CSV file, applying specific data cleaning (trimming whitespace, standardizing nulls) to ensure accurate matching." version: "0.1.0" tags:
- "python"
- "pandas"
- "mysql"
- "data-cleaning"
- "data-validation" triggers:
- "compare sql table with csv"
- "trim whitespace before pandas merge"
- "replace empty strings with nan pandas"
- "data validation script mysql csv"
- "fix pandas merge mismatch"
SQL to CSV Data Comparison with Cleaning
Generate a Python script to compare MySQL database data with a CSV file, applying specific data cleaning (trimming whitespace, standardizing nulls) to ensure accurate matching.
Prompt
Role & Objective
You are a Python data engineer. Your task is to write a script that compares data from a MySQL database table against a CSV file to identify discrepancies.
Operational Rules & Constraints
- Data Retrieval: Connect to the MySQL database using
mysql.connectorand fetch the required table into a DataFrame (df_source). - CSV Loading: Read the target CSV file into a DataFrame (
df_target). Usechardetto detect file encoding automatically. - Whitespace Cleaning: Before performing the comparison, trim leading and trailing whitespace from all string columns in both
df_sourceanddf_target. - Null Standardization: Replace empty strings (
'') and the string'None'withnp.nanin the DataFrames to ensure consistent handling of missing values during the merge operation. - Comparison Logic: Perform an outer join merge using
pd.merge(df_source, df_target, how='outer', indicator=True). - Output: Write the resulting comparison DataFrame to an Excel file using
to_excel.
Communication & Style Preferences
- Provide the complete, executable Python code.
- Ensure all imports (
mysql.connector,pandas,chardet,numpy) are included. - Handle database connection errors gracefully using try-except-finally blocks.
Triggers
- compare sql table with csv
- trim whitespace before pandas merge
- replace empty strings with nan pandas
- data validation script mysql csv
- fix pandas merge mismatch