name: xbs-booksource-workflow
description: Build, debug, and maintain 香色闺阁/香色书源 JSON and XBS files. Use when users ask to create or fix book sources, convert between xbs and json, debug empty parsed fields, validate XPath and JS parser rules, or produce import-ready XBS outputs for the app.
XBS Booksource Workflow
Overview
Implement or fix text book sources for 香色闺阁 (StandarReader 2.56.1 only) with a deterministic workflow: analyze target pages, generate rule JSON, convert to XBS, and validate with real HTML samples.
Workflow
- Inspect source format and goal.
- Build or repair JSON rules.
- Validate selectors with real HTML.
- Convert JSON to XBS.
- Verify roundtrip and provide checksum.
Step 1: Inspect Input and Site
- Confirm whether input is `.json` or `.xbs`.
- Legacy source import-first rule:
  - if input is `.xbs`, run `python tools/scripts/xbs_tool.py import-fix -i <input.xbs> -o <fixed.json> --to-xbs <fixed.xbs> --report <fix_report.json>`
  - continue all checks/conversion with `<fixed.json>` / `<fixed.xbs>`
- For new source rules, fetch and inspect:
  - search page
  - detail page
  - chapter list page
  - chapter content page
- Hard gate before conversion (must pass): `python tools/scripts/check_xiangse_schema.py <source.json>`
  - If this fails, do NOT convert to XBS first; fix schema first.
- StandarReader 2.56.1 editor-compatibility gate (new): `python tools/scripts/xbs_tool.py check-editor -i <source.json>`
  - If high-risk items are flagged (such as a new schema wrapper or a non-string `requestFilters`), produce an `editor_safe` build first and test that: `python tools/scripts/xbs_tool.py profile -i <input.json> -o <editor_safe.json> --profile editor_safe`
- To batch-repair legacy book sources: `python tools/scripts/xbs_tool.py normalize-2561 -i <json_or_dir> --rebuild-xbs --report <report.json>`
Step 2: Build or Repair JSON Rules
Minimum required actions:
- `searchBook`
- `bookDetail`
- `chapterList`
- `chapterContent`
Required action fields:
- `actionID`
- `parserID`
- `responseFormatType`
- `requestInfo`
香色 (Xiangse) schema hard constraints (mandatory, to keep rules from drifting):
- Top-level must be: `{ "<sourceAlias>": { ... } }` (source config nested under alias key).
- Source config must use: `sourceName` / `sourceUrl` / `sourceType` / `enable` / `weight`
- `sourceType` must be exactly `"text"` (Xiangse 2.56.1 hard constraint).
- Forbidden legacy top-level keys: `bookSourceName` / `bookSourceUrl` / `bookSourceGroup` / `httpUserAgent`
- `requestInfo` `@js` runtime must use `config` / `params` / `result`.
- Forbidden runtime/transport patterns in `requestInfo`:
  - `java.getParams()`
  - `method:` (use `POST`)
  - `data:` (use `httpParams`)
  - `headers:` (use `httpHeaders`)
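The structural constraints above can be expressed as a small checker. This is a minimal sketch only: the function name `checkSourceShape` and its messages are hypothetical, and it is not the logic of `check_xiangse_schema.py`, which remains the actual gate.

```javascript
// Illustrative checker for the constraints listed above.
// Hypothetical helper; not the implementation of check_xiangse_schema.py.
function checkSourceShape(doc) {
  var forbidden = ['bookSourceName', 'bookSourceUrl', 'bookSourceGroup', 'httpUserAgent'];
  var required = ['sourceName', 'sourceUrl', 'sourceType', 'enable', 'weight'];
  var has = Object.prototype.hasOwnProperty;
  var errors = [];
  var keys = Object.keys(doc);
  for (var k = 0; k < keys.length; k++) {
    var alias = keys[k];
    var cfg = doc[alias];
    // Legacy keys at the top level mean the config is not nested under an alias.
    if (forbidden.indexOf(alias) !== -1) {
      errors.push('forbidden legacy top-level key: ' + alias);
      continue;
    }
    if (!cfg || typeof cfg !== 'object') {
      errors.push(alias + ': source config must be an object under the alias key');
      continue;
    }
    for (var r = 0; r < required.length; r++) {
      if (!has.call(cfg, required[r])) errors.push(alias + ': missing ' + required[r]);
    }
    if (has.call(cfg, 'sourceType') && cfg.sourceType !== 'text') {
      errors.push(alias + ': sourceType must be exactly "text"');
    }
    // weight must be a string holding an integer in 1..9999, never a number.
    if (has.call(cfg, 'weight') &&
        !(typeof cfg.weight === 'string' && /^[1-9][0-9]{0,3}$/.test(cfg.weight))) {
      errors.push(alias + ': weight must be a string in "1".."9999"');
    }
  }
  return errors;
}
```

An empty array means the shape passed these particular checks; it says nothing about action-level rules.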
Prefer:
- `parserID: DOM`
- `requestInfo: @js:` (avoid legacy JS callback fields unless required)
- `weight` must be integer-like string in `"1"`..`"9999"` (default `"9999"`)
- Priority semantics: larger `weight` means higher source priority
- If using template `requestInfo`, support placeholders: `%@result`, `%@keyWord`, `%@pageIndex`, `%@offset`, `%@filter`
- For chapter pagination requests, prioritize:
  - `params.lastResponse.nextPageUrl`
  - then `result`
  - then `params.queryInfo.url`
- When next-page URLs are relative, resolve with `params.responseUrl` first, then `config.host`
- Use `moreKeys` when needed:
  - `requestFilters` for category/sort filters
  - `removeHtmlKeys` for html/script cleanup
  - `skipCount` for list head trimming
  - `pageSize` or `maxPage` for pagination control
- For paged `chapterContent`, set `maxPage` in both:
  - `chapterContent.validConfig`
  - `chapterContent.moreKeys`
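The pagination priority and relative-URL resolution described above might look like the following inside a `requestInfo` script. It is a sketch under assumptions: `pickPageUrl` and `resolveUrl` are hypothetical names, `config.host` is assumed to include the scheme, and `../` segments are not handled.

```javascript
// Pick the URL for a chapter pagination request.
// Priority: params.lastResponse.nextPageUrl, then result, then params.queryInfo.url.
function pickPageUrl(config, params, result) {
  var last = params && params.lastResponse;
  var query = params && params.queryInfo;
  var url = (last && last.nextPageUrl) ||
            (typeof result === 'string' && result) ||
            (query && query.url) || '';
  return resolveUrl(String(url), params && params.responseUrl, config && config.host);
}

// Resolve a relative URL against responseUrl first, then host.
// Written without new URL() so it stays old-engine-safe.
function resolveUrl(url, responseUrl, host) {
  if (!url || /^https?:\/\//i.test(url)) return url;
  var base = responseUrl || host || '';
  var m = /^(https?:\/\/[^\/?#]+)([^?#]*)/i.exec(base);
  if (!m) return url; // base has no scheme: leave the URL untouched
  var origin = m[1];
  if (url.indexOf('//') === 0) return origin.split('//')[0] + url; // protocol-relative
  if (url.charAt(0) === '/') return origin + url;                  // root-relative
  var dir = m[2].replace(/[^\/]*$/, '');                           // directory of the base path
  return origin + (dir || '/') + url;
}
```

For example, a `nextPageUrl` of `2_2.html` with a `responseUrl` of `https://example.com/book/1/2.html` resolves to `https://example.com/book/1/2_2.html`.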
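Field notes (2026-03, deqixs):
- For Chinese search keywords, prefer `GET + encodeURIComponent(params.keyWord)`; some clients encode Chinese `POST` parameters inconsistently.
- Search must handle two response shapes:
  - a fuzzy keyword returns a list page;
  - an exact keyword may return the detail page directly (no list).
- When `searchBook.list` has already matched a single book node, take `detailUrl` from the list-context relative field first; do not let a page-level canonical/meta value override it.
- To cover the direct-to-detail case, add a fallback `url` field (canonical/meta) that later `bookDetail` / `chapterList` / `chapterContent.requestInfo` can fall back to.
- `bookWorld` category pages request a single page by `pageIndex` by default; do not enable `nextPageUrl` plus a huge `maxPage` for automatic paging by default, because it easily times out or hangs.
- When cleaning URLs in `requestInfo`, avoid the error-prone regex form `replace(/\//g,'/')`; prefer `String(u).split('\\/').join('/')`.
- For the content chain, first decide whether the site uses a two-hop API:
  - If the site exchanges a token for the body through a chain such as `chapter.js.php -> ajax2.php`, go through the API first; do not default to `webView`.
  - Use `webView + webViewJs` backfill only when the API chain is not viable.
- For deqixs-style APIs, explicitly inject these headers on the second-hop request:
  - `X-Requested-With: XMLHttpRequest`
  - `Referer: <current chapter URL>`
  - Otherwise the site commonly answers `仅支持网页端访问` ("web access only") or `不支持该客户端访问` ("this client is not supported").
- `chapterContent.content` must not "return '' as soon as chapterToken is detected":
  - Mixed responses (token script + JSON) get discarded by mistake, which yields `status=1` with an empty body.
  - Always try JSON parsing first, then fall back to regex extraction of `content`.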
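The "JSON first, regex fallback" rule for mixed responses can be sketched as follows. The helper names are hypothetical, and the only field assumed is the `content` key mentioned above; the real response layout of a given site may differ.

```javascript
// Parse JSON without throwing.
function tryParse(s) {
  try { return JSON.parse(s); } catch (e) { return null; }
}

// Return obj.content when it is a string, otherwise ''.
function contentOf(obj) {
  return obj && typeof obj === 'object' && typeof obj.content === 'string' ? obj.content : '';
}

// Extract the chapter body from a response that may mix a token script and JSON.
function extractContent(resStr) {
  var text = String(resStr || '');
  // 1) Whole response is JSON.
  var found = contentOf(tryParse(text));
  if (found) return found;
  // 2) JSON embedded after other text: try each '{' up to the last '}'.
  var end = text.lastIndexOf('}');
  var idx = text.indexOf('{');
  while (idx !== -1 && idx < end) {
    found = contentOf(tryParse(text.slice(idx, end + 1)));
    if (found) return found;
    idx = text.indexOf('{', idx + 1);
  }
  // 3) Regex fallback on the "content" string literal.
  var m = /"content"\s*:\s*"((?:[^"\\]|\\.)*)"/.exec(text);
  if (m) {
    var s = tryParse('"' + m[1] + '"'); // unescape \n, \uXXXX, etc.
    return typeof s === 'string' ? s : m[1];
  }
  return '';
}

// Turn JSON-escaped "\/" back into "/" without a regex.
function unescapeSlashes(u) {
  return String(u).split('\\/').join('/');
}
```

Note that the presence of a token script never short-circuits the function; an empty string is returned only after all three attempts fail.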
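Field notes (2026-03, 66shuba):
- For API-first sites, run the whole chain through the JSON APIs; do not rely primarily on the detail/reading page DOM.
- Common chain:
  - search: `/api/novel/search`
  - detail: `/api/novel/detail/{bookId}`
  - catalog: `/api/novel/catalog/{bookId}`
  - chapter: `/api/novel/chapter/{bookId}/{chapterId}` (VIP uses `vip-chapter`)
- If `searchBook.list` is a nested card structure (`CardList -> Body -> ItemData`), flatten it in `list || @js` and filter on `ItemType===0`.
- `chapterList.list` must filter out sentinel chapters (such as `C=-10000`). General rule: keep only chapters whose id is a positive integer.
- For `chapterContent` requests, derive `bookId`/`chapterId` from the read URL and then build the API URL, which reduces the impact of URL shape differences between upstream and downstream steps.
- When validating the body, add placeholder detection:
  - If the response has `code=0` but `content` is a notice such as `网络开小差了,请稍后再试` ("network hiccup, please retry later"), treat it as an upstream failure, not as a successful rule.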
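The sentinel filter and placeholder detection above could be sketched like this. It assumes the chapter id lives in a field named `C` (inferred from the `C=-10000` example) and uses an arbitrary 100-character cut-off for "short notice" bodies; both are assumptions, and the function names are hypothetical.

```javascript
// Keep only chapters whose id is a positive integer.
// Assumes the id field is `C`, as in the C=-10000 sentinel example.
function keepRealChapters(items) {
  var out = [];
  for (var i = 0; i < items.length; i++) {
    var id = items[i] && items[i].C;
    if (/^[1-9][0-9]*$/.test(String(id))) out.push(items[i]);
  }
  return out;
}

// Known upstream notices that must not count as chapter text.
var PLACEHOLDER_NOTICES = ['网络开小差了,请稍后再试'];

// True when the body is empty or is a short upstream notice.
// The 100-character cut-off is an assumed heuristic.
function isPlaceholderBody(content) {
  var text = String(content || '').replace(/\s+/g, '');
  if (!text) return true;
  if (text.length > 100) return false;
  for (var i = 0; i < PLACEHOLDER_NOTICES.length; i++) {
    if (text.indexOf(PLACEHOLDER_NOTICES[i]) !== -1) return true;
  }
  return false;
}
```

A validation step would then fail the chapter when `isPlaceholderBody` returns true, even if the API reported `code=0`.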
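Field notes (2026-03, sudugu):
- Chapter pages often put "previous chapter / table of contents / next page (or next chapter)" in one shared `prenext` structure:
  - Do not rely only on text matching such as `contains(text(),'下一页')`;
  - Prefer the link in the fixed right-hand position (for example `span[2]/a/@href`), then apply a JS guard.
- Recommended general guard for body pagination:
  - Parse the current URL and the candidate URL as `/{bookId}/{chapterId}(-{page})?.html`
  - Continue paging only when `bookId` matches, `chapterId` matches, and `nextPage > currentPage`
  - In every other case (next chapter, table of contents, back to detail) return empty; never follow the link.
- Do not guess URL shapes for table-of-contents pagination or category pagination:
  - A table-of-contents page may be `p-2.html#dir`
  - A category page may be `/xuanhuan/2.html` (not `/xuanhuan/p-2.html`)
  - Always build from a real next-page link or an already verified template.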
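The same-chapter pagination guard above can be sketched as below. It assumes numeric ids and treats a URL without a `-{page}` suffix as page 0, which is an assumption about how the site numbers its first page; the function names are hypothetical.

```javascript
// Parse /{bookId}/{chapterId}(-{page})?.html; returns null for any other shape.
function parseChapterUrl(url) {
  var m = /\/(\d+)\/(\d+)(?:-(\d+))?\.html(?:[?#].*)?$/.exec(String(url || ''));
  if (!m) return null;
  // Assumption: a missing page suffix means the first page, counted as 0.
  return { bookId: m[1], chapterId: m[2], page: m[3] ? parseInt(m[3], 10) : 0 };
}

// Return candidateUrl only when it is a later page of the same chapter.
function guardNextPage(currentUrl, candidateUrl) {
  var cur = parseChapterUrl(currentUrl);
  var next = parseChapterUrl(candidateUrl);
  if (!cur || !next) return '';
  if (cur.bookId !== next.bookId || cur.chapterId !== next.chapterId) return '';
  return next.page > cur.page ? candidateUrl : '';
}
```

Links to the next chapter, the table of contents, or the detail page either fail to parse or fail the id comparison, so the guard returns an empty string and paging stops.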
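Field notes (2026-03, shuhaoxs):
- The on-site search entry may be taken over by an external site (for example, the home page jumps straight to `rrssk`):
  - First verify whether the site's own `index.php?action=*` exposes a usable search API;
  - If most of them return `404`, conclude that there is no stable on-site search API.
- If the external search is a chain of "encrypted keyword + encrypted result links + front-end decrypt-and-redirect" (`toUrl`/`openUrl`), do not use it as the primary search by default:
  - Especially when decryption depends on `CryptoJS + UA + dynamic page variables`, the client runtime may not be able to reproduce it.
- When the external chain is unstable, provide a working fallback first:
  - Change `searchBook` to "category page traversal + keyword field filtering";
  - State clearly that this is fallback search and note the precision limits in the delivery notes.
- `bookDetail` should read the page's `og:*` metadata first (`book_name/author/category/status/update_time/latest_chapter_name/url/image`), then fall back to DOM fields.
- If `chapterList` uses `index.php?action=loadChapterPage&id={aid}&page={n}`:
  - Handle both "an out-of-range page repeats the last page" and "a short book repeats page 1";
  - Do not keep paging just because `data.length > 0`;
  - Validate the current page's range by `chapterorder` (100 per page: `1-100/101-200/...`) and decide `list` and `nextPageUrl` from that.
- `chapterContent.nextPageUrl` keeps using the same-chapter pagination guard:
  - Allow only `/book/{aid}-{cid}-{p}.html` where `aid`/`cid` match and `p` increases;
  - On "next chapter / table of contents / back to detail", always return empty.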
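The `chapterorder` range check for `loadChapterPage` responses could be sketched as follows. It assumes each item carries a numeric `chapterorder` and that a full in-range page is the only signal worth paging on; `validateChapterPage` is a hypothetical name.

```javascript
// Keep only items whose chapterorder falls inside the expected range for `page`,
// and report whether another page is worth requesting.
function validateChapterPage(items, page, pageSize) {
  var size = pageSize || 100;          // 100 per page, per the note above
  var low = (page - 1) * size + 1;     // e.g. page 2 -> 101
  var high = page * size;              // e.g. page 2 -> 200
  var list = [];
  for (var i = 0; i < items.length; i++) {
    var order = parseInt(items[i] && items[i].chapterorder, 10);
    // NaN fails both comparisons, so malformed items are dropped.
    if (order >= low && order <= high) list.push(items[i]);
  }
  // A repeated last page or repeated page 1 filters down to an empty list,
  // so data.length > 0 alone never triggers more paging.
  return { list: list, hasNext: list.length === size };
}
```

The design choice here is conservative: a partially filled page is treated as the last page.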
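Field notes (2026-03, bxwx):
- The search API has a site-level rate limit: consecutive `POST /search.html` requests return a page saying `搜索间隔为30秒,请稍后在试!` ("search interval is 30 seconds, please try again later").
  - This is upstream throttling, not a parse failure;
  - Debugging and validation need throttling/retry (at least 30 seconds apart) or a local fixture.
- Search form fields must match exactly:
  - `searchtype=all`
  - `369koolearn=<keyword>`
- `bookDetail`'s `og:novel:read_url` often points straight at the table-of-contents page (`/dir/{aid}/{bid}.htm`), not at a reading entry:
  - `chapterList.requestInfo` must accept both `/b/{aid}/{bid}/` and `/dir/{aid}/{bid}.htm` as input.
- The real category path is `/bsort{n}/` (for example `/bsort1/`); do not apply another site's `/xuanhuan/2.html` template.
  - `bookWorld` is handled as a single page by default (`maxPage=1`) to avoid paging blindly into empty or error pages.
- Category list nodes can use `#newscontent .l ul li` directly:
  - title: `span.s2 a`
  - author: `span.s4`
  - category: `span.s1` (strip `[]`)
  - update time: `span.s5`
- Body pagination keeps using the same-chapter guard:
  - On the last page, `#pager_next` often jumps to the next chapter (not the next page);
  - Allow only `/b/{bookId}/{chapterId}(_{page})?.html` with an increasing page number.
- The body must be cleaned of pagination hints and promotional tails (common wording):
  - “这章没有结束…下一页继续阅读…”
  - “小主子…下一页继续阅读…”
  - “喜欢…请大家收藏…更新速度全网最快”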
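Field notes (2026-03, libahao):
- In 香色闺阁, XPath for `list` child fields always starts with `//`; `./...` and `.//...` are forbidden.
  - Error-prone forms: `./td[2]/a/text()`, `./td[3]/a/text()`, `./td[2]/a/@href`
  - Recommended: `//td[2]/a/text()`, `//td[3]/a/text()`, `//td[2]/a/@href`
- When the target field can be read directly, do not write complex concatenation JS:
  - For example, `detailUrl`/`url` can use `//td[2]/a/@href` directly, which avoids noise from unnecessary URL-absolutizing logic.
- List fields must not grab the whole row text and split it afterwards:
  - Do not bind `status/desc/wordCount` to the full `tr` text;
  - Use narrow "column to field" XPath, one column per field.
- Normalize whitespace on every text field by default:
  - `String(result || '').replace(/\s+/g, ' ').trim()`
  - For category names, also strip square brackets: `.replace(/[\[\]]/g, '')`
- When the category page has no `<img>`, the cover may be derived from `detailUrl`:
  - `/book/{aid}_{bid}/` -> `/data/image/{bid}.jpg`
  - This strategy is meant for `bookWorld.cover` and noticeably reduces empty category covers.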
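The normalization and cover-derivation rules above fit in a few small helpers. This sketch assumes numeric `aid`/`bid` values and returns a site-relative cover path; the function names are hypothetical.

```javascript
// Collapse runs of whitespace and trim both ends.
function normalizeText(result) {
  return String(result || '').replace(/\s+/g, ' ').trim();
}

// Category names: strip square brackets, then normalize whitespace.
function normalizeCategory(result) {
  return normalizeText(String(result || '').replace(/[\[\]]/g, ''));
}

// Derive the cover from detailUrl: /book/{aid}_{bid}/ -> /data/image/{bid}.jpg
// Assumes aid and bid are numeric; returns '' when the URL has another shape.
function coverFromDetailUrl(detailUrl) {
  var m = /\/book\/(\d+)_(\d+)\/?/.exec(String(detailUrl || ''));
  return m ? '/data/image/' + m[2] + '.jpg' : '';
}
```

Returning an empty string for unrecognized URLs keeps the field empty rather than pointing at a guessed image.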
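Field notes (2026-03, 17k):
- Import stability (no crashes) takes priority over forcing a complex chain to work:
  - Never embed very long obfuscated WAF JS into `requestInfo`/`content` as the primary chain.
  - When a WAF or risk-control page appears, prefer an API or a request chain that can be reproduced reliably; downgrade unstable chains to a workable alternative.
- Encrypted bodies must pass a decryption-success check:
  - If the chapter response contains `content[].encrypt=1`, getting only `title` does not count as success.
  - Before delivery, confirm that `chapterContent.content` is non-empty plaintext (not a ciphertext string, not an empty string).
- Category support is a mandatory delivery check:
  - Do not omit `bookWorld` or the category filter (`requestFilters`) design;
  - If the site does not support full categories, state the reason and the fallback strategy in the delivery notes.
- Put the WeChat official account information in the delivery notes, not in `sourceName`:
  - `sourceName` keeps only the "site name + version" meaning.
  - `delivery_notes` must contain: `公众号:好用的软件站`.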
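Field notes (2026-03, StandarReader 2.56.1 crash on edit/save):
- If the crash log shows `-[__NSCFNumber length]`, check the type of `weight` first:
  - It must be an integer string (such as `"9999"`), not a number.
- A source that imports is not necessarily editable and saveable; run a separate save regression:
  - Open the editor and save without changing anything
  - Change 1 character and save
  - Change 1 rule field and save
- If saving crashes, locate the offending field cluster with A/B variants (A0/A1/A2/A3):
  - `python tools/scripts/xbs_tool.py build-ab -i <input.json> -d <out_dir> --prefix <name> --to-xbs`
- Goals of the `editor_safe` profile:
  - Keep the `bookWorld` category capability
  - Normalize `requestFilters` to a string
  - Downgrade high-risk structures (such as a `validConfig` JSON string or complex object fields at the top level)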
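Field notes (2026-03, StandarReader 2.56.1 crash on import):
- If the crash log shows:
  - `NSInvalidArgumentException`
  - `-[__NSArrayI allKeys]`
  - then the client is reading an array as if it were a dictionary.
- Confirmed high-risk trigger: `bookWorld.categories` in array form.
- Hard constraints for import artifacts:
  - `bookWorld` uses a category map (`bookWorld.{category name}`); the `bookWorld.categories` array is forbidden.
  - `enable` is always output as `1/0`.
  - `responseFormatType` is always lowercase (`html/json/xml/text`).
  - A `requestInfo` object is converted to a string template or to `@js:return {...};`.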
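Three of the import constraints above (category map, `enable` as `1/0`, lowercase `responseFormatType`) can be illustrated with a small normalizer. This is a sketch, not what `import-fix` or `normalize-2561` actually does; in particular, the legacy array shape `[{ name: ..., ...rule }]` is an assumption, and `normalizeForImport` is a hypothetical name.

```javascript
// Shallow-copy an object, optionally skipping one key.
function copyObject(src, skipKey) {
  var out = {};
  for (var k in src) {
    if (Object.prototype.hasOwnProperty.call(src, k) && k !== skipKey) out[k] = src[k];
  }
  return out;
}

// Return a normalized copy of one source config; the input is not mutated.
function normalizeForImport(cfg) {
  var out = copyObject(cfg);
  // enable -> numeric 1/0
  out.enable = (cfg.enable === true || cfg.enable === 1 || cfg.enable === '1') ? 1 : 0;

  // responseFormatType -> lowercase on the four core actions
  var actions = ['searchBook', 'bookDetail', 'chapterList', 'chapterContent'];
  for (var i = 0; i < actions.length; i++) {
    var action = cfg[actions[i]];
    if (action && typeof action.responseFormatType === 'string') {
      out[actions[i]] = copyObject(action);
      out[actions[i]].responseFormatType = action.responseFormatType.toLowerCase();
    }
  }

  // bookWorld.categories array -> map keyed by category name.
  // Assumed legacy item shape: { name: '<category>', ...rule fields }.
  var world = cfg.bookWorld;
  if (world && Object.prototype.toString.call(world.categories) === '[object Array]') {
    var map = copyObject(world, 'categories');
    for (var j = 0; j < world.categories.length; j++) {
      var cat = world.categories[j];
      if (cat && cat.name) map[cat.name] = copyObject(cat, 'name');
    }
    out.bookWorld = map;
  }
  return out;
}
```

Converting a `requestInfo` object into a string template is left out because the target template depends on the individual rule.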
Step 3: Selector Validation (Critical)
Validate selectors against saved HTML (e.g., xmllint --html --xpath ...).
香色闺阁 compatibility highlights:
- If `listLengthOnlyDebug > 0` but fields are empty, change child selectors from `.//...` to `//...`.
- This parser may not reliably honor relative XPath context under `list`.
- In 香色 runtime, avoid `./...` in list child fields as well; prefer `//...` for stability.
- If runtime context is unclear, use JS debug return shape:
  - `return {"config": config, "params": params, "result": result};`
- Remember `result` changes by stage:
  - in `requestInfo`: upstream URL or `nextPageUrl`
  - in parse stage: previous-layer parsed value
- If chapter body is rendered by JS (`document.writeln(base64...)`), decode in `content || @js:` and add guarded `nextPageUrl` for same-chapter pagination.
- Guarded next-page rule is mandatory:
  - parse current URL + candidate next URL as `/baidu/{aid}/{cid}(_{page})?.html`
  - continue only if `aid` same, `cid` same, and `nextPage > currentPage`
  - never guess `_1/_2` blindly when no hard evidence exists
- If old client fails with `content || @js:` (script stripped in DOM), switch to legacy-compatible parsing:
  - `chapterContent.parserID = JS`
  - `chapterContent.responseFormatType = ''` (plain string)
  - decode body in `responseJavascript(config, params, resStr)` from raw response text.
- Use old-engine-safe JS for compatibility:
  - prefer `var` + `function`
  - avoid `new URL()`, optional chaining, nullish coalescing
See detailed pitfalls: references/xiangse-parser-pitfalls.md.
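For bodies written by `document.writeln(...)` with a base64 payload, decoding can be done in old-engine-safe JS without `atob`. The sketch below makes two assumptions: the payload is the first quoted base64 literal inside the `document.writeln(` call, and the decoded bytes are UTF-8. Real pages may wrap or split the payload differently, and the function names are hypothetical.

```javascript
var B64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode base64 to a UTF-8 string using only var/function and ES5 built-ins.
function base64ToUtf8(b64) {
  var clean = String(b64).replace(/[^A-Za-z0-9+\/]/g, ''); // drop padding and whitespace
  var bits = 0;
  var bitCount = 0;
  var encoded = '';
  for (var i = 0; i < clean.length; i++) {
    bits = (bits << 6) | B64_ALPHABET.indexOf(clean.charAt(i));
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      var octet = (bits >> bitCount) & 0xff;
      bits = bits & ((1 << bitCount) - 1); // keep only the unread low bits
      encoded += '%' + (octet < 16 ? '0' : '') + octet.toString(16);
    }
  }
  try {
    return decodeURIComponent(encoded); // percent-encoded bytes -> UTF-8 text
  } catch (e) {
    return ''; // invalid UTF-8: treat as a failed decode
  }
}

// Pull the first quoted base64 literal out of a document.writeln(...) call.
function decodeWrittenBody(resStr) {
  var m = /document\.writeln\([^'"]*['"]([A-Za-z0-9+\/=]+)['"]/.exec(String(resStr || ''));
  return m ? base64ToUtf8(m[1]) : '';
}
```

Because it works on the raw response string, the same function can be called from `content || @js:` or from a legacy `responseJavascript(config, params, resStr)` callback.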
Step 4: Convert Between JSON and XBS
Preferred (cross-platform, including Windows/Termux):
- `python tools/scripts/xbs_tool.py json2xbs -i <input.json> -o <output.xbs>`
- `python tools/scripts/xbs_tool.py xbs2json -i <input.xbs> -o <output.json>`
- `python tools/scripts/xbs_tool.py roundtrip -i <input.json> -p <output_prefix>`
- `python tools/scripts/xbs_tool.py import-fix -i <input.xbs|input.json> -o <fixed.json> [--to-xbs <fixed.xbs>] [--report <fix_report.json>]`
- `python tools/scripts/xbs_tool.py check-editor -i <input.json>`
- `python tools/scripts/xbs_tool.py simulate-live -i <input.xbs|input.json> --engine auto --webview-timeout 25 --keyword 都市 --book-index 0 --chapter-index 0 --report <simulate_report.json>`
- `python tools/scripts/xbs_tool.py simulate-fixture -i <input.xbs|input.json> --engine auto --webview-timeout 25 --fixtures <fixtures_dir_or_map> --report <simulate_fixture_report.json>`
- `python tools/scripts/xbs_tool.py profile -i <input.json> -o <editor_safe.json> --profile editor_safe`
- `python tools/scripts/xbs_tool.py build-ab -i <input.json> -d <out_dir> --prefix <name> --to-xbs`
- `python tools/scripts/xbs_tool.py normalize-2561 -i <json_or_dir> --rebuild-xbs --report <report.json>`
- Note: `json2xbs`/`roundtrip` auto-run schema guard; conversion aborts on schema mismatch.
- `simulate-live`/`simulate-fixture` auto-run `import-fix -> schema_check -> editor_check -> 4-step simulation`:
  - steps: `searchBook` / `bookDetail` / `chapterList` / `chapterContent`
  - engine: `auto|http|webview` (default `auto`)
  - pass gate: four steps all pass
  - anti-bot (`403/429/challenge`) returns `blocked` (not parser fail)
- If absolutely needed, bypass with `--skip-schema-check` (not recommended for delivery artifacts).
- Windows out-of-the-box constraints (new):
  - Default runner priority:
    - `XBSREBUILD_BIN`
    - bundled `tools/bin/windows/xbsrebuild.exe`
    - PATH
    - `XBSREBUILD_ROOT` + `go run`
    - sibling `../xbsrebuild` + `go run`
    - bundled `tools/vendor/xbsrebuild` + `go run`
  - Windows needs no Go by default (the bundled EXE converts directly)
  - No dependency on an external sibling repository by default; even without `../xbsrebuild`, the tool falls back to the vendored source inside the repository.
  - Optional entry points:
    - CMD: `json2xbs.cmd / xbs2json.cmd / roundtrip_check.cmd`
    - PowerShell: `json2xbs.ps1 / xbs2json.ps1 / roundtrip_check.ps1`
  - For first-time troubleshooting, run this first: `python tools/scripts/xbs_tool.py doctor`
Fallback:
- `xbsrebuild xbs2json/json2xbs`
- Python fallback implementing XXTEA + appended plain-length tail
XXTEA details are documented in references/xbs-xxtea-format.md.
Step 5: Output Contract
When delivering results, always provide:
- absolute path to JSON
- absolute path to XBS
- SHA256 of XBS
- simulate report path and verdict (`pass/fail/blocked`)
- runtime engine evidence (`steps.*.runtime_engine`)
- for WebView sources: include `steps.*.webview_trace` summary (navigation/injection/filter/failure)
- save-regression result (save unchanged / save after rename / save after field change)
- brief debug note if any compatibility workaround was applied
- schema check result (`PASS/FAIL`) and command used.
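The absolute path and SHA256 entries of the contract can be produced with a short Node.js script. This runs on the delivery machine, not inside the app's rule runtime, and `describeArtifact` is a hypothetical helper name.

```javascript
var crypto = require('crypto');
var fs = require('fs');
var path = require('path');

// Return the absolute path and hex SHA256 digest of a delivered file.
function describeArtifact(filePath) {
  var abs = path.resolve(filePath);
  var digest = crypto.createHash('sha256').update(fs.readFileSync(abs)).digest('hex');
  return { path: abs, sha256: digest };
}
```

Calling `describeArtifact` on the generated `.xbs` file yields the two values to paste into the delivery note; the same call works for the JSON file.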
Do/Don't
Do:
- Keep `enable` as numeric `1/0`.
- Keep `weight` as string integer (recommend `"9999"` for highest priority).
- Set `lastModifyTime` as Unix seconds string (not date text), e.g. `"1772463417"`.
- Keep rules minimal and testable.
- Verify with at least one real query and one real chapter.
- For chapter pagination, verify at least two chapter samples:
  - one truly paged chapter (should continue to `_1`, etc.)
  - one non-paged chapter (must stop, no fake `_1`)
- For media links in chapter content, return object with dynamic headers when required:
  - `{"url": result, "httpHeaders": {...}}`
Don't:
- Keep legacy callback fields by default (`requestJavascript`, `responseJavascript`, `requestFunction`, `responseFunction`), unless required for old-client compatibility.
- Assume `.//` behaves correctly in app runtime.