Yandex Archive Scraper

v1.0.0

Search and extract data from Yandex.Archive (Яндекс.Архив) — metric books, newspapers, directories. Bypasses bot protection via Scrapling.

by Flo (@flobo3)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw to install flobo3/yandex-archive-scraper.

Prompt preview: Install & Setup
Install the skill "Yandex Archive Scraper" (flobo3/yandex-archive-scraper) from ClawHub.
Skill page: https://clawhub.ai/flobo3/yandex-archive-scraper
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install yandex-archive-scraper

ClawHub CLI


npx clawhub@latest install yandex-archive-scraper
Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (Yandex.Archive scraping, bypassing bot protection) align with the included Python scripts and declared dependencies (scrapling, playwright, etc.). The code only targets Yandex.Archive URLs and extracts site-specific JSON/HTML.
Instruction Scope
SKILL.md and README instruct installing the listed Python packages and using StealthyFetcher to fetch archive pages. The runtime instructions and scripts stay focused on constructing search URLs, fetching pages, and parsing results; they do not read unrelated files or environment variables.
Install Mechanism
The package is instruction-first and contains code files but no formal install spec. README suggests pip installing several packages and running 'playwright install chromium' — this will download browser binaries and execute third-party code (scrapling, browserforge). That is expected for a scraper but increases runtime footprint and risk from third-party packages.
Credentials
The skill requests no environment variables, no credentials, and accesses no system config paths. The lack of secret access is proportionate to a public-web scraping task.
Persistence & Privilege
Skill does not request always:true, does not attempt to modify other skills or agent-wide settings, and requires no persistent credentials. Autonomous invocation is allowed (platform default) but not combined with additional privileges.
Assessment
This skill appears internally consistent with its stated purpose: it fetches and parses Yandex.Archive pages and uses Scrapling/Playwright to avoid bot protections. Before installing, consider:

  • Bypassing bot protection may violate Yandex's terms of service or local law; make sure you have the right to scrape the target.
  • Installing Playwright downloads browser binaries, and the listed Python packages are third-party code that will run on your system; audit or sandbox the environment and verify package sources (PyPI project pages, authors).
  • Run this skill in an isolated environment (container/VM) if you are concerned about third-party dependencies.
  • No secrets are required by the skill itself, but if you modify it to integrate other services, re-evaluate the requested credentials.

If you want, I can list the exact package pages to review or suggest safer alternatives (site APIs, manual downloads, or permissioned data access).


Latest: vk972tx8bx13vn889fk42dtpeq18434rb
112 downloads · 0 stars · 1 version
Updated 3w ago
v1.0.0 · MIT-0

yandex-archive-scraper

A powerful skill for searching and extracting data from Yandex.Archive (Яндекс.Архив) using Scrapling to bypass bot protection and Cloudflare Turnstile.

Features

  • Converts natural language queries into optimized Yandex.Archive search URLs.
  • Uses Scrapling (StealthyFetcher) to bypass Yandex bot protection.
  • Extracts search results (document titles, text snippets, and direct links).
  • Supports pagination to collect multiple pages of results.
  • Can search across all three Yandex.Archive indexes:
    • archive (Архивы) — Metric books, revision lists (ревизские сказки), confessional records.
    • mass_media (Периодика) — Old newspapers (e.g., "Senate Gazette", "Provincial Gazette").
    • directories (Справочники) — Address calendars, lists of residents, memorial books.
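The first feature above, turning a free-text query into a search URL, can be sketched as follows. Note that the base URL and parameter names (`text`, `index`, `page`) are assumptions for illustration; the skill's bundled scripts define the real URL scheme, so check them before relying on this.

```python
from urllib.parse import urlencode

# Hypothetical base URL and parameter names -- the real scheme may differ.
BASE_URL = "https://yandex.ru/archive/search"
VALID_INDEXES = {"archive", "mass_media", "directories"}

def build_search_url(query: str, index: str = "archive", page: int = 1) -> str:
    """Turn a free-text query into a Yandex.Archive-style search URL."""
    if index not in VALID_INDEXES:
        raise ValueError(f"unknown index: {index!r}")
    # urlencode percent-encodes the Cyrillic query as UTF-8
    params = {"text": query, "index": index, "page": page}
    return f"{BASE_URL}?{urlencode(params)}"

print(build_search_url("Александр Пушкин Москва", index="mass_media"))
```

The same builder covers pagination: incrementing `page` yields the URL for each subsequent results page.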

Tools

yandex_archive_search

Search Yandex.Archive based on a natural language query. Parameters:

  • query (string): The search query (e.g., "Александр Пушкин Москва").
  • index (string, optional): The index to search in. Options: archive (default), mass_media, directories.
  • max_pages (integer, optional): Maximum number of pages to scrape (default 1).

Requirements

  • scrapling
  • playwright
  • curl_cffi
  • patchright
  • msgspec
  • browserforge
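Per the security-scan notes above, setup amounts to installing these packages and the Chromium binary that Playwright drives. A typical sequence (package names taken from the list above):

```shell
# Install the Python dependencies listed above
pip install scrapling playwright curl_cffi patchright msgspec browserforge

# Download the Chromium binary Playwright drives (a large download;
# third-party code will run on your machine -- see the security notes above)
playwright install chromium
```

As the assessment recommends, consider running these steps inside a container or VM if you want to isolate the third-party dependencies.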

