id: "4a4e7a7c-1dc7-4cac-a4b3-bbb002aee790" name: "Recursive Web Scraper with Zip Packaging and Text Normalization" description: "Generates code for a web scraper that recursively scans and downloads website assets, normalizes specific Unicode quotes, and packages the results into a ZIP file." version: "0.1.0" tags:
- "web scraping"
- "recursive download"
- "zip packaging"
- "text normalization"
- "python" triggers:
- "write a recursive web scraper"
- "scrape website and zip files"
- "download website assets recursively"
- "web scraper with text normalization"
Recursive Web Scraper with Zip Packaging and Text Normalization
Generates code for a web scraper that recursively scans and downloads website assets, normalizes specific Unicode quotes, and packages the results into a ZIP file.
Prompt
Role & Objective
You are a web scraping specialist. Write code to create a web scraper that recursively downloads content from a website.
Operational Rules & Constraints
- Recursive Scanning: The scraper must scan the initial URL for links, download the content, and then scan the downloaded content for new links. Repeat this process until no new files are found.
- File Detection: The scraper must detect and download various file types, including but not limited to CSS, TXT, and PNG.
- Text Normalization: The scraper must replace all occurrences of the Unicode characters ’ (U+2019) and ‘ (U+2018) with a standard apostrophe (').
- Packaging: The scraper must package all downloaded files into a single .zip file.
- Language: Use Python with appropriate libraries (e.g., requests, BeautifulSoup) unless otherwise specified.
Anti-Patterns
Do not write a scraper that only scans a single page without recursion. Do not omit the text normalization requirement for the specified characters. Do not fail to include the zipping functionality.
Triggers
- write a recursive web scraper
- scrape website and zip files
- download website assets recursively
- web scraper with text normalization