name: betterclaw
description: Use this skill when working with the betterclaw library. Triggers when the user mentions betterclaw or imports from it.
BetterClaw
What this is
BetterClaw is a Python library for web scraping and data extraction. It aims to make extracting data from websites efficient and easy, and it is designed to cover common scraping tasks such as processing different types of content, rotating user agents, and working around anti-scraping measures.
Installation
pip install betterclaw
Key concepts
The most important APIs and patterns in BetterClaw include:
- betterclaw.Client: The main entry point for making HTTP requests and extracting data.
- betterclaw.Parser: Used to parse HTML and XML content.
- betterclaw.RotationPolicy: Defines how user agents are rotated to avoid anti-scraping measures.
Example:
from betterclaw import Client
client = Client()
response = client.get("https://www.example.com")
print(response.text)
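The rotation idea behind RotationPolicy can be illustrated without the library. The sketch below is not BetterClaw's API: the class RoundRobinUserAgents, its next_headers method, and the placeholder agent strings are all hypothetical, and the source does not say how RotationPolicy is actually configured. It only shows one simple strategy, round-robin cycling through a fixed pool.

```python
from itertools import cycle


class RoundRobinUserAgents:
    """Hypothetical helper (not part of BetterClaw): cycles through User-Agent strings."""

    def __init__(self, user_agents):
        if not user_agents:
            raise ValueError("at least one user agent is required")
        self._pool = cycle(user_agents)

    def next_headers(self):
        # Return request headers carrying the next User-Agent in the rotation.
        return {"User-Agent": next(self._pool)}


# Placeholder agent strings, used only for illustration.
rotation = RoundRobinUserAgents(["agent-a/1.0", "agent-b/1.0"])
first = rotation.next_headers()
second = rotation.next_headers()
third = rotation.next_headers()  # wraps around to the first entry
```

Round-robin is only one option. A random choice per request would be an equally small change, and which strategies BetterClaw itself supports should be checked against its documentation.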
Correct usage patterns
When using BetterClaw, wrap requests in a try/except block and catch RequestException so that a failed request is reported instead of crashing the program:
from betterclaw import Client
from betterclaw.exceptions import RequestException
client = Client()
try:
    response = client.get("https://www.example.com")
    print(response.text)
except RequestException as e:
    print(f"An error occurred: {e}")
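Catching the exception is often combined with retrying transient failures. The following is a minimal sketch of that pattern in plain Python, under stated assumptions: call_with_retries and flaky_fetch are hypothetical names, and nothing here is provided by BetterClaw. With the library, the fetch argument might be a small function that calls client.get, with retry_on set to RequestException.

```python
import time


def call_with_retries(fetch, attempts=3, base_delay=0.5,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fetch() up to `attempts` times, backing off exponentially between failures."""
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller handle the error
            sleep(base_delay * 2 ** attempt)


# Stand-in for a flaky request (hypothetical); it fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# Record the delays instead of actually sleeping, so the demo runs instantly.
delays = []
result = call_with_retries(flaky_fetch, retry_on=(ConnectionError,), sleep=delays.append)
```

Passing the sleep function as a parameter keeps the helper easy to test, and limiting retry_on to specific exception types avoids retrying on programming errors.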
Common mistakes to avoid
- Not handling exceptions and errors properly
- Not rotating user agents, leading to IP blocks
- Not checking the library's documentation for updates and changes
File and folder conventions
- Configuration files should be named betterclaw.cfg and placed in the root directory of the project.
- Log files should be named betterclaw.log and placed in the logs directory.
- User-defined parsers and rotation policies should be placed in separate modules and imported as needed.
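The naming conventions above can be followed with the standard library alone. The sketch below assumes that betterclaw.cfg uses INI syntax, which the source does not state. The [client] section, the timeout key, the logger name, and the function load_project_settings are hypothetical and chosen only for illustration.

```python
import configparser
import logging
import tempfile
from pathlib import Path


def load_project_settings(project_root):
    """Read betterclaw.cfg from the project root and log to logs/betterclaw.log."""
    root = Path(project_root)
    config = configparser.ConfigParser()
    config.read(root / "betterclaw.cfg")  # a missing file yields an empty config

    log_dir = root / "logs"
    log_dir.mkdir(exist_ok=True)
    logger = logging.getLogger("betterclaw_project")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.FileHandler(log_dir / "betterclaw.log"))
    return config, logger


# Demo in a throwaway directory standing in for the project root.
with tempfile.TemporaryDirectory() as tmp:
    # Assumed INI layout; the section and key are illustrative only.
    (Path(tmp) / "betterclaw.cfg").write_text("[client]\ntimeout = 10\n")
    config, logger = load_project_settings(tmp)
    timeout = config.getint("client", "timeout")
    logger.info("loaded settings, timeout=%s", timeout)
    # Close the handler so the log file is flushed and the directory can be removed.
    for handler in logger.handlers[:]:
        handler.close()
        logger.removeHandler(handler)
    log_text = (Path(tmp) / "logs" / "betterclaw.log").read_text()
```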