Install
openclaw skills install rss-sitemap

Discover website URLs, feed entries, and latest publications by checking sitemap.xml, sitemaps.xml, atom.xml, and rss.xml before crawling a specific site. Use when Codex needs to find the most recent posts, articles, or publications from a named website or domain; needs to search, crawl, scrape, monitor, or enumerate site content; and should prefer the site's own sitemap, Atom feed, or RSS feed over blind link crawling.

Use this skill to bootstrap site discovery from the site's own machine-readable indexes before doing general crawling. For any task that targets a specific website, first look for sitemap, Atom, and RSS resources and use them to find the latest publications or guide the crawl.
Identify the target site origin, for example https://example.com.

Run the preprocessing script with the exec tool when Node.js 18+ is available. exec is the shell tool name; do not require a separate bash tool:
node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --site https://example.com --output /tmp/rss-sitemap.json
Check these paths on the site origin:
- /sitemap.xml
- /sitemaps.xml
- /atom.xml
- /rss.xml

Check /robots.txt for Sitemap: directives and include those sitemap URLs.

Use scripts/preprocess-rss-sitemap.js for deterministic pre-crawl discovery. It has no npm dependencies and uses Node's built-in fetch, so it requires Node.js 18 or newer for URL fetching.
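The path probing and robots.txt check described above can be sketched as two small pure functions. This is an illustration only, not the code inside preprocess-rss-sitemap.js; the names `candidateUrls` and `sitemapsFromRobots` are hypothetical, and the sketch assumes the robots.txt body has already been fetched as text.

```javascript
// Sketch only: helper names are hypothetical, not taken from preprocess-rss-sitemap.js.
const WELL_KNOWN_PATHS = ["/sitemap.xml", "/sitemaps.xml", "/atom.xml", "/rss.xml"];

// Build the candidate index URLs for a site, using only its origin.
function candidateUrls(site) {
  const origin = new URL(site).origin;
  return WELL_KNOWN_PATHS.map((path) => origin + path);
}

// Collect the URLs named by "Sitemap:" directives in a robots.txt body.
function sitemapsFromRobots(robotsText, site) {
  const found = [];
  for (const line of robotsText.split(/\r?\n/)) {
    const match = /^\s*sitemap\s*:\s*(\S+)/i.exec(line);
    if (!match) continue;
    try {
      // Resolve relative values against the site; absolute URLs pass through.
      found.push(new URL(match[1], site).href);
    } catch {
      // Ignore values that are not valid URLs.
    }
  }
  return [...new Set(found)]; // de-duplicate, keeping first-seen order
}

const robots = "User-agent: *\nDisallow: /admin\nSitemap: https://example.com/news-sitemap.xml\n";
console.log(candidateUrls("https://example.com/blog"));
console.log(sitemapsFromRobots(robots, "https://example.com"));
```

Keeping both helpers free of network calls makes them easy to reuse when the fetch itself has to go through a browser or scraping tool instead of Node's fetch.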
Common commands:
node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --site https://example.com
node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --url https://example.com/sitemap.xml --url https://example.com/feed.xml
node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --file ./sitemap.xml --file ./feed.xml
node skills/rss-sitemap/scripts/preprocess-rss-sitemap.js --site https://example.com --max-depth 2 --output /tmp/rss-sitemap.json
The script outputs JSON with:
- resources: probed XML or robots resources, HTTP status, content type, detected kind, and entry count.
- entries: normalized sitemap URLs, RSS items, or Atom entries with source provenance.

For latest-publication requests, sort entries by the best available date:
- pubDate
- updated
- published
- lastmod

If entries do not include dates, prefer RSS or Atom feed order before sitemap order because feeds usually list newest content first.
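The date-based ordering above might look like the following sketch. The entry shape is an assumption: the field names `pubDate`, `updated`, `published`, and `lastmod` mirror the list above, while `kind` and `url` are hypothetical properties used here for illustration and may not match the script's actual JSON. The sketch also assumes the date fields are checked in the order they are listed.

```javascript
// Sketch only: the entry field names below are assumptions about the JSON shape.
const DATE_FIELDS = ["pubDate", "updated", "published", "lastmod"];

// Return the first parseable date on an entry as epoch milliseconds, or null.
function bestDate(entry) {
  for (const field of DATE_FIELDS) {
    const ms = Date.parse(entry[field] ?? "");
    if (!Number.isNaN(ms)) return ms;
  }
  return null;
}

// Newest first; undated entries go last, with feed entries ahead of sitemap URLs.
// Ties keep their original order because Array.prototype.sort is stable.
function sortLatest(entries) {
  const feedRank = (e) => (e.kind === "rss" || e.kind === "atom" ? 0 : 1);
  return [...entries].sort((a, b) => {
    const da = bestDate(a);
    const db = bestDate(b);
    if (da !== null && db !== null) return db - da;
    if (da !== null) return -1;
    if (db !== null) return 1;
    return feedRank(a) - feedRank(b);
  });
}

const sample = [
  { url: "https://example.com/a", kind: "sitemap", lastmod: "2024-01-01" },
  { url: "https://example.com/b", kind: "rss", pubDate: "Tue, 02 Jan 2024 10:00:00 GMT" },
  { url: "https://example.com/c", kind: "sitemap" },
  { url: "https://example.com/d", kind: "atom" },
];
console.log(sortLatest(sample).map((e) => e.url));
```

Copying the array before sorting leaves the script's original output order available, which matters when feed order is the only recency signal.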
If the script fails because the site blocks requests, needs JavaScript, or requires authentication, use the available web scraping/search/browser tools for fetching, then apply the same parsing and crawl strategy.
Required tools:
- exec enabled for host script execution.

For sitemaps:
- Treat <sitemapindex> as a list of nested sitemaps; recursively fetch each <loc>.
- Treat <urlset> as crawlable page URLs; extract <loc> and keep useful metadata such as <lastmod>, <changefreq>, and <priority> when present.

For RSS feeds:
- Parse each <item> with title, link, guid, pubDate, and description when present.
- Use link as the crawl URL; fall back to guid only if it is URL-like.

For Atom feeds:
- Parse each <entry> with title, id, updated, published, summary, and link.
- Prefer <link rel="alternate" href="...">; otherwise use the first URL-like href.
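When the script cannot run and the XML has to be parsed by hand, the sitemap, RSS, and Atom rules above could be applied roughly as follows. This is a minimal regex-based sketch for illustration, not the script's implementation: every function name is hypothetical, it extracts crawl URLs only (no metadata), it does not decode XML entities such as `&amp;`, and a real XML parser would be more robust.

```javascript
// Sketch only: regex-based extraction with hypothetical helper names.

// Text of the first <name>...</name> inside a chunk (CDATA unwrapped), or null.
function tagText(xml, name) {
  const m = new RegExp(`<${name}(?:\\s[^>]*)?>([\\s\\S]*?)</${name}>`, "i").exec(xml);
  if (!m) return null;
  return m[1].replace(/^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/, "$1").trim();
}

// All <name>...</name> blocks in a document.
function blocks(xml, name) {
  return xml.match(new RegExp(`<${name}(?:\\s[^>]*)?>[\\s\\S]*?</${name}>`, "gi")) ?? [];
}

const isUrlLike = (s) => typeof s === "string" && /^https?:\/\//i.test(s);

// RSS: use <link>; fall back to <guid> only when it looks like a URL.
function rssItemUrl(item) {
  const link = tagText(item, "link");
  if (isUrlLike(link)) return link;
  const guid = tagText(item, "guid");
  return isUrlLike(guid) ? guid : null;
}

// Atom: prefer <link rel="alternate" href="...">; otherwise the first URL-like href.
function atomEntryUrl(entry) {
  const links = entry.match(/<link\b[^>]*>/gi) ?? [];
  const href = (tag) => (/\bhref\s*=\s*["']([^"']*)["']/i.exec(tag) ?? [])[1];
  const alternate = links.find((tag) => /\brel\s*=\s*["']alternate["']/i.test(tag));
  if (alternate && isUrlLike(href(alternate))) return href(alternate);
  return links.map(href).find(isUrlLike) ?? null;
}

// Classify a document by its root element and extract URLs accordingly.
// For "sitemapindex", the returned URLs are nested sitemaps to fetch next.
function extract(xml) {
  const clean = (list) => list.filter(Boolean);
  if (/<sitemapindex[\s>]/i.test(xml)) {
    return { kind: "sitemapindex", urls: clean(blocks(xml, "sitemap").map((b) => tagText(b, "loc"))) };
  }
  if (/<urlset[\s>]/i.test(xml)) {
    return { kind: "urlset", urls: clean(blocks(xml, "url").map((b) => tagText(b, "loc"))) };
  }
  if (/<rss[\s>]/i.test(xml)) {
    return { kind: "rss", urls: clean(blocks(xml, "item").map(rssItemUrl)) };
  }
  if (/<feed[\s>]/i.test(xml)) {
    return { kind: "atom", urls: clean(blocks(xml, "entry").map(atomEntryUrl)) };
  }
  return { kind: "unknown", urls: [] };
}

const atomSample = `<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>Hi</title>
<link rel="self" href="https://example.com/feed/1.atom"/>
<link rel="alternate" href="https://example.com/posts/1"/></entry></feed>`;
console.log(extract(atomSample));
```

Returning the detected kind alongside the URLs lets a caller decide whether to recurse (for a sitemap index) or to treat the URLs as crawl targets.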