---
name: gallery-scraper
description: Bulk download images from login-protected gallery websites using an attached browser session. Use when asked to scrape, download, or save images from authenticated gallery pages, extract full-size images from thumbnails, or batch download from multi-page galleries.
permissions:
  - exec: "Runs local download commands for the URL list gathered from the attached browser session."
  - file_write: "Creates URL lists and downloaded image files in the user-approved output directory."
  - network: "Uses the attached browser session and direct image downloads against the user-approved gallery domain."
---
# Gallery Scraper

Bulk download images from authenticated gallery websites via browser relay.

## Safety Boundaries
- Do not access gallery sites or user accounts that the user has not explicitly attached and authorized.
- Do not download beyond the selected gallery, profile, or page range without confirmation.
- Do not store cookies, tokens, or hidden form values in local output files.
- Do not keep retrying blocked downloads indefinitely; surface rate limits or auth failures instead.
## Prerequisites

- User must have Chrome with the OpenClaw Browser Relay extension installed
- User must be logged into the target site
- User must attach the browser tab (click the relay toolbar button; badge shows ON)

## Workflow
### 1. Attach Browser Tab

Ask the user to:

- Log into the gallery site in Chrome
- Navigate to the target gallery/profile page
- Click the OpenClaw Browser Relay toolbar button (badge shows ON)
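Before probing for images, it can help to confirm the relay is attached to the intended tab. A minimal evaluate-based sanity check (nothing site-specific assumed):

```js
// Confirm the attached tab is the gallery page before scraping.
() => ({
  url: location.href,
  title: document.title,
  imageCount: document.images.length
})
```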
### 2. Discover Image URL Pattern

Most gallery sites store full-size URLs in data attributes. Common patterns:
```js
// Probe common selectors for full-size image URLs via browser evaluate.
() => {
  const patterns = [
    'img[data-max]',        // data-max attribute
    'img[data-src]',        // lazy-load pattern
    'img[data-full]',       // full-size pattern
    'a[data-lightbox] img', // lightbox galleries
    '.gallery-item img'     // generic gallery
  ];
  for (const sel of patterns) {
    const imgs = document.querySelectorAll(sel);
    if (imgs.length > 0) {
      return {
        selector: sel,
        count: imgs.length,
        sample: imgs[0].outerHTML.substring(0, 200)
      };
    }
  }
  return null;
}
```
### 3. Extract Full-Size URLs

Once the pattern is identified, extract all URLs:

```js
// For the data-max pattern (common)
() => Array.from(document.querySelectorAll('img[data-max]'))
  .map(img => img.dataset.max)

// For thumbnail→full conversion (replace a path segment)
() => Array.from(document.querySelectorAll('.gallery img'))
  .map(img => img.src.replace('/thumb/', '/full/'))
```
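Data attributes sometimes hold relative paths, and thumbnail grids often repeat images, so it is worth normalizing and deduplicating before writing `urls.txt`. A sketch against the data-max pattern (swap in whichever selector step 2 found):

```js
// Resolve relative URLs against the page and drop duplicates.
() => {
  const urls = Array.from(document.querySelectorAll('img[data-max]'))
    .map(img => new URL(img.dataset.max, location.href).href);
  return [...new Set(urls)];
}
```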
### 4. Handle Pagination

Check for multiple pages:

```js
// List pagination links with their text and targets.
() => {
  const pagination = document.querySelectorAll('.pagination a, [class*="page"] a');
  return Array.from(pagination).map(a => ({ text: a.textContent, href: a.href }));
}
```
Navigate to each page via the attached tab and collect URLs, or, for server-rendered pages, fetch them directly as sketched below.
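A sketch of direct fetching, assuming the pages are same-origin (so session cookies ride along with `credentials: 'include'`), server-rendered, and use the data-max pattern; JS-rendered pages need the iframe approach in 4b instead:

```js
// Collect image URLs from every pagination target without navigating away.
async () => {
  const pageUrls = [location.href, ...new Set(
    Array.from(document.querySelectorAll('.pagination a')).map(a => a.href)
  )];
  const all = [];
  for (const url of pageUrls) {
    const html = await (await fetch(url, { credentials: 'include' })).text();
    const doc = new DOMParser().parseFromString(html, 'text/html');
    for (const img of doc.querySelectorAll('img[data-max]')) {
      all.push(new URL(img.dataset.max, url).href); // resolve relative paths
    }
  }
  return [...new Set(all)];
}
```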
### 4b. Batch Scrape Multiple Galleries (iframe trick)

When you need multiple galleries quickly and can't automate CDP, you can load each gallery in a hidden iframe and extract the data-max URLs. Note this only works when the galleries share the attached tab's origin; cross-origin iframes block contentDocument access.
```js
// Load each gallery in an off-screen iframe, poll for images, collect URLs.
async () => {
  const urls = [
    'https://site.example/galleries/view/123',
    'https://site.example/galleries/view/456'
  ];
  const results = [];
  for (const url of urls) {
    const iframe = document.createElement('iframe');
    iframe.style.position = 'fixed';
    iframe.style.left = '-9999px';
    iframe.style.width = '800px';
    iframe.style.height = '600px';
    iframe.src = url;
    document.body.appendChild(iframe);
    // Wait for the iframe to load, capped at 20 s.
    await new Promise((resolve, reject) => {
      const t = setTimeout(() => reject(new Error('timeout load')), 20000);
      iframe.onload = () => { clearTimeout(t); resolve(); };
    });
    // Poll up to 20 s for lazy-rendered images inside the iframe.
    const doc = iframe.contentDocument;
    const start = Date.now();
    let imgs = [];
    while (Date.now() - start < 20000) {
      imgs = Array.from(doc.querySelectorAll('img[data-max]')).map(i => i.dataset.max);
      if (imgs.length) break;
      await new Promise(r => setTimeout(r, 500));
    }
    results.push({ id: url.split('/').pop(), urls: imgs });
    iframe.remove();
  }
  return results;
}
```
### 5. Check CDN Access

Test whether the CDN requires authentication or just a Referer header:

```bash
# Test direct access
curl -I "CDN_URL" 2>/dev/null | head -3

# Test with Referer
curl -I -H "Referer: https://SITE_DOMAIN/" "CDN_URL" 2>/dev/null | head -3
```

A 200 on the first probe means the CDN is open; a 403 that turns into a 200 once the Referer is set means hotlink protection only, which the download commands below handle.
### 6. Bulk Download

Collect the URLs into a text file, then download in parallel:

```bash
# Create the output directory
mkdir -p ~/Downloads/gallery_name
cd ~/Downloads/gallery_name

# Download with Referer header, at most 8 parallel jobs.
# (wait -n requires bash 4.3+; use the Python fallback below otherwise)
while IFS= read -r url; do
  filename=$(basename "$url")
  curl -s -H "Referer: https://SITE_DOMAIN/" -o "$filename" "$url" &
  [ "$(jobs -r | wc -l)" -ge 8 ] && wait -n
done < urls.txt
wait
```
A Python ThreadPoolExecutor fallback avoids the shell-quoting and `wait -n` portability issues:
```python
import os
import requests
from concurrent.futures import ThreadPoolExecutor

outdir = os.path.expanduser('~/Downloads/gallery_name')
os.makedirs(outdir, exist_ok=True)
headers = {'Referer': 'https://SITE_DOMAIN/', 'User-Agent': 'Mozilla/5.0'}

with open('urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

def download(url):
    filename = os.path.join(outdir, os.path.basename(url))
    # Skip files already downloaded, so re-runs resume where they stopped.
    if os.path.exists(filename) and os.path.getsize(filename) > 0:
        return
    r = requests.get(url, headers=headers, timeout=60)
    r.raise_for_status()
    with open(filename, 'wb') as out:
        out.write(r.content)

# ex.map re-raises worker exceptions instead of silently swallowing them.
with ThreadPoolExecutor(max_workers=8) as ex:
    list(ex.map(download, urls))
```
## Handling Lock Buttons

Some galleries have "lock" buttons to reveal hidden content. Look for:
```js
// Find lock/unlock buttons
() => {
  const locks = document.querySelectorAll(
    '[class*="lock"], [class*="unlock"], ' +
    'button[title*="lock"], .premium-unlock'
  );
  return Array.from(locks).map(el => ({
    tag: el.tagName,
    class: el.className,
    text: el.innerText?.substring(0, 30)
  }));
}
```
Click each lock button before extracting URLs.
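A sketch that clicks everything the finder above matched, pausing between clicks so revealed content can render (the one-second delay is an assumption; tune per site):

```js
// Click each lock/unlock control with a short pause between clicks.
async () => {
  const locks = document.querySelectorAll(
    '[class*="lock"], [class*="unlock"], button[title*="lock"], .premium-unlock'
  );
  for (const el of locks) {
    el.click();
    await new Promise(r => setTimeout(r, 1000)); // assumed delay
  }
  return locks.length;
}
```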
## Output Organization

Optionally organize by gallery:

```bash
# Derive a gallery-specific folder name from the selected URL
mkdir -p "gallery_<id>"
```
## Troubleshooting

- 403 Forbidden: add a Referer header or extract cookies from the browser
- Rate limited: reduce the number of parallel downloads and add delays
- Missing images: check for JavaScript-loaded content; scroll injection may be needed (see the sketch below)
- Login required for CDN: extract session cookies via `document.cookie`; per the safety boundaries, do not write them to output files
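For the lazy-load case, a minimal scroll-injection sketch: keep scrolling until the page height stops growing, then re-run the extraction from step 3.

```js
// Scroll to the bottom in steps so lazy-loaded images get fetched.
async () => {
  let lastHeight = 0;
  while (document.body.scrollHeight > lastHeight) {
    lastHeight = document.body.scrollHeight;
    window.scrollTo(0, lastHeight);
    await new Promise(r => setTimeout(r, 800)); // let new images load
  }
  return lastHeight;
}
```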