Install
openclaw skills install nutrition-provider-r2

Crawl the Vietnam nutrition provider page-by-page with scrapling-official and upload each raw provider record to Cloudflare R2 after fetch. Use this skill for provider-specific ingestion where the canonical data comes from the provider's foods or prepared-dish JSON APIs, records from one page may upload in parallel with stable object keys, and each page must still respect a minimum 60-second crawl-plus-upload window before the next page begins.
This skill is a provider-specific orchestration wrapper around scrapling-official.
Its job is to take the payload that scrapling-official fetched, split each canonical page into individual provider records, and upload those records to Cloudflare R2.
It does not replace scrapling-official as the crawler.
Use it when the target is one of the Vietnam nutritional portal lookup listings and the job is to crawl the listing page-by-page with scrapling-official, split each canonical page into individual provider records, and upload those raw records to Cloudflare R2.
Do not normalize provider records into a custom nutrition schema. Preserve provider fields, response bodies, pagination clues, and raw linked payloads exactly as obtained whenever possible.
This skill depends on scrapling-official for crawling. If scrapling-official is not installed or not set up yet, stop and tell the user to install and configure that skill first. Let scrapling-official own crawl execution, endpoint discovery, rendering mode, and fetch escalation, and follow scrapling-official's fetch escalation strategy exactly: start with get, then move to fetch if needed, then stealthy-fetch only when the earlier modes fail or protection requires it. Do not substitute another crawler when scrapling-official is missing.

Before crawling, read {baseDir}/references/source-notes.md for the default source URL, pagination clues, and stop conditions, and confirm the R2 credentials are set: R2_ACCOUNT_ID, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET.

Use scrapling-official to inspect the provider page and determine which payload actually contains the canonical records for the current request. Prefer the JSON endpoint when scrapling-official discovers it, instead of the outer HTML shell:

gia-tri-dinh-duong-thuc-pham exposes records from GET /api/fe/foodNatunal/getPageFoodData with params like page=1&pageSize=15&energy=0 and UI filters for name, category, and energy.
gia-tri-dinh-duong-mon-an exposes records from GET /api/fe/tool/getPageFoodData with params like page=1&pageSize=15 and UI filters for name and energy, with additional filters visible in the UI such as group and region; let scrapling-official discover the exact live request params.

Upload exactly what scrapling-official fetched for that page without normalizing item fields. When scrapling-official fetches JSON from an XHR endpoint, store that JSON body unchanged. When scrapling-official can fetch the canonical JSON payload, treat raw.data as the list of provider records for that page, keying each record by data._id first and falling back to code, then run:

uv run {baseDir}/scripts/upload_page_to_r2.py --extract-foods --page-index <n> --skip-existing

The flag name --extract-foods is retained for compatibility, but it may also be used for prepared-dish page payloads because both current source types return data arrays. To upload a single already-split record, pass --food-id.

Fall back to the rendered HTML listing only when scrapling-official cannot reach the canonical payload directly. Do not upload the HTML shell as the primary dataset.

Let scrapling-official handle the actual pagination requests; it is responsible for extracting or fetching the correct provider payload. Record crawl metadata for every upload: source_url, fetched_at, page_index, content_type, and storage_key. When a crawl covers both provider sources, confirm scrapling-official can access both.

Use page-sequential crawling with record-level upload concurrency. Each page must respect a minimum 60-second crawl-plus-upload window before page N+1 begins, and do not start page N+1 until page N has finished all uploads.
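To make the pacing rule concrete, here is a minimal Python sketch of the page loop. fetch_page and upload_record are hypothetical stand-ins for the scrapling-official fetch and the R2 helper invocation, and the worker count is an arbitrary choice:

import time
from concurrent.futures import ThreadPoolExecutor

MIN_PAGE_WINDOW_S = 60  # minimum crawl-plus-upload window per page

def crawl(fetch_page, upload_record, total_pages: int) -> None:
    """Page-sequential crawl; records within one page upload concurrently."""
    for page_index in range(1, total_pages + 1):
        started = time.monotonic()
        payload = fetch_page(page_index)   # delegated to scrapling-official
        records = payload["raw"]["data"]   # canonical records for this page
        # All uploads for page N must finish before page N+1 may begin.
        with ThreadPoolExecutor(max_workers=8) as pool:
            for _ in pool.map(lambda rec: upload_record(page_index, rec), records):
                pass  # iterating surfaces any upload exception
        # Enforce the minimum 60-second window before the next page starts.
        elapsed = time.monotonic() - started
        if page_index < total_pages and elapsed < MIN_PAGE_WINDOW_S:
            time.sleep(MIN_PAGE_WINDOW_S - elapsed)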
Required environment variables:

R2_ACCOUNT_ID
R2_ACCESS_KEY_ID
R2_SECRET_ACCESS_KEY
R2_BUCKET
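A minimal preflight check for those variables, assuming these exact names (the helper may also validate them itself):

import os
import sys

REQUIRED = ("R2_ACCOUNT_ID", "R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY", "R2_BUCKET")
missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    # Stop before crawling rather than failing mid-upload.
    sys.exit("Missing required R2 environment variables: " + ", ".join(missing))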
Optional environment variables used by the helper when --key is not passed:

R2_PREFIX, default raw
SOURCE_NAME, default nutrition-provider
RUN_ID, default current UTC timestamp in YYYY-MM-DDTHH-MM-SSZ

When supporting both provider sources, do not reuse the same storage namespace for both in the same crawl run.
Use SOURCE_NAME=viendinhduong-foods for gia-tri-dinh-duong-thuc-pham and SOURCE_NAME=viendinhduong-dishes for gia-tri-dinh-duong-mon-an, or pass --source-name explicitly per crawl job.

Wrap the provider payload with minimal crawl metadata only when needed for storage traceability:
{
  "source_url": "https://viendinhduong.vn/api/fe/foodNatunal/getPageFoodData?page=1&pageSize=15&energy=0",
  "page_index": 1,
  "fetched_at": "2026-03-15T10:00:00Z",
  "content_type": "application/json",
  "raw": {
    "data": [],
    "current_page": 1,
    "per_page": 15,
    "total": 853
  }
}
The foods endpoint currently returns page-level JSON with top-level keys data, current_page, per_page, and total. Each food item currently includes _id, code, name_vi, name_en, category, categoryEn, nutrition, and energy.
The prepared-dish endpoint currently returns page-level JSON with top-level keys current_page, data, first_page_url, from, last_page, last_page_url, links, next_page_url, path, per_page, prev_page_url, to, and total. Each dish item currently includes _id, category_id, code, description, dish_components, food_area_id, image, name_vi, name_en, nutritional_components, total_energy, category_name, category_name_en, and category_description.
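Purely as an illustration, the foods item could be modeled in Python like this; the field list follows the currently observed shape above, which the provider does not guarantee to keep stable:

from typing import Any, TypedDict

class FoodItem(TypedDict):
    # One foods-endpoint record as currently observed; field availability
    # is not guaranteed by the provider.
    _id: str
    code: str
    name_vi: str
    name_en: str
    category: str
    categoryEn: str
    nutrition: list[Any]  # raw provider nutrition entries, kept unnormalized
    energy: float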
Use those richer raw objects only as the source page payloads to split into per-record uploads.
Recommended per-record upload shape:
{
  "source_url": "https://viendinhduong.vn/api/fe/foodNatunal/getPageFoodData?page=1&pageSize=15&energy=0",
  "page_index": 1,
  "fetched_at": "2026-03-15T10:00:00Z",
  "content_type": "application/json",
  "raw": {
    "_id": "6877a6b660d6c84e9bd5cca4",
    "code": "10001",
    "name_vi": "Sữa bò tươi",
    "name_en": "Milk cow, fresh (Fluid)",
    "category": "Sữa và sản phẩm chế biến",
    "categoryEn": "Milk and processed products",
    "nutrition": [],
    "energy": 74
  }
}
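A sketch of producing that per-record shape from the page wrapper shown earlier, under the _id-first, code-fallback keying rule. split_page and record_id are hypothetical helper names, not functions of upload_page_to_r2.py:

def record_id(item: dict) -> str:
    # Prefer the provider _id; fall back to code when _id is absent.
    return str(item.get("_id") or item["code"])

def split_page(page_wrapper: dict) -> list[tuple[str, dict]]:
    """Split one wrapped page payload into (record_id, upload_object) pairs."""
    # Copy only the crawl metadata; leave every provider field untouched.
    meta = {key: page_wrapper[key] for key in
            ("source_url", "page_index", "fetched_at", "content_type")}
    return [(record_id(item), {**meta, "raw": item})
            for item in page_wrapper["raw"]["data"]]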
Use uv run {baseDir}/scripts/upload_page_to_r2.py.
The helper supports two modes:
Explicit key mode: pass --key with the full object key.
Generated key mode: the helper derives the key from R2_PREFIX, SOURCE_NAME, RUN_ID, --page-index, and optional --food-id.

Generated keys follow this layout:
raw/<source>/<run_id>/page-0001/food-6877a6b660d6c84e9bd5cca4.json
raw/<source>/<run_id>/failures/page-0001.json

For this skill, per-record upload is the default and expected mode.
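A sketch of how keys matching that layout could be derived from the environment defaults; build_key is a hypothetical name, and the helper's real derivation may differ:

import os
from datetime import datetime, timezone

def build_key(page_index: int, food_id: str | None = None) -> str:
    prefix = os.environ.get("R2_PREFIX", "raw")
    source = os.environ.get("SOURCE_NAME", "nutrition-provider")
    run_id = os.environ.get(
        "RUN_ID", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ"))
    page = f"page-{page_index:04d}"
    if food_id is None:
        # Failure capture: raw/<source>/<run_id>/failures/page-0001.json
        return f"{prefix}/{source}/{run_id}/failures/{page}.json"
    # Per-record object: raw/<source>/<run_id>/page-0001/food-<food_id>.json
    return f"{prefix}/{source}/{run_id}/{page}/food-{food_id}.json"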
Pass --extract-foods when the input is a full canonical page JSON payload from either supported source.
Pass --food-id only when uploading a single already-split record object.
Pass --skip-existing when reruns are possible.

Examples:
uv run {baseDir}/scripts/upload_page_to_r2.py \
  --input tmp/page-0001.json \
  --page-index 1 \
  --extract-foods \
  --skip-existing

uv run {baseDir}/scripts/upload_page_to_r2.py \
  --input tmp/food-10001.json \
  --page-index 1 \
  --food-id 6877a6b660d6c84e9bd5cca4 \
  --skip-existing

cat tmp/food-10001.json | uv run {baseDir}/scripts/upload_page_to_r2.py \
  --page-index 1 \
  --food-id 10001 \
  --skip-existing \
  --content-type application/json

uv run {baseDir}/scripts/upload_page_to_r2.py \
  --input tmp/food-10001.json \
  --key raw/viendinhduong/2026-03-15T10-00-00Z/page-0001/food-10001.json
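The helper's internals are not documented here, but as a hedged sketch, a --skip-existing behavior can be implemented against R2's S3-compatible API with a head-then-put, for example via boto3 pointed at the account endpoint:

import os
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    endpoint_url=f"https://{os.environ['R2_ACCOUNT_ID']}.r2.cloudflarestorage.com",
    aws_access_key_id=os.environ["R2_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["R2_SECRET_ACCESS_KEY"],
)

def upload_if_absent(bucket: str, key: str, body: bytes) -> bool:
    # Skip the upload when the object key already exists, so reruns
    # never overwrite previously captured raw records.
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return False
    except ClientError as err:
        if err.response["Error"]["Code"] != "404":
            raise
    s3.put_object(Bucket=bucket, Key=key, Body=body,
                  ContentType="application/json")
    return True

Head-then-put is not atomic, but for append-only raw capture with stable keys the race is harmless: the worst case is re-writing an identical object.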
Only for debug or failure capture:
uv run {baseDir}/scripts/upload_page_to_r2.py \
  --input tmp/page-0001.json \
  --page-index 1 \
  --failed
For this provider target, use {baseDir}/references/source-notes.md.