Website Content Scraped into Obsidian

v0.1.2

Fetch social media content and save to Obsidian. Supports Twitter/X, Reddit, GitHub, HackerNews, Bilibili, Weibo, Xiaohongshu and 30+ platforms via bb-browse...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for glassmarbles/claw-social-feed.

Install the skill "Website Content Scraped into Obsidian" (glassmarbles/claw-social-feed) from ClawHub.
Skill page: https://clawhub.ai/glassmarbles/claw-social-feed
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Canonical install target

openclaw skills install glassmarbles/claw-social-feed

ClawHub CLI


npx clawhub@latest install claw-social-feed

Security Scan

VirusTotal: Benign
OpenClaw: Benign (medium confidence)

Purpose & Capability
Name/description, config.yaml, platforms.md and scripts/fetch_save.py all align: the skill calls bb-browser to fetch posts, filters/tags them, and writes markdown files to an Obsidian vault. There are no unrelated credentials or unexplained binaries requested.
Instruction Scope
SKILL.md directs installing and using bb-browser and instructs the agent to create a scheduled sync (cron job). The code itself reads/writes config.yaml, a local state file, and writes files into a user-specified vault path — all expected for this purpose. Two things to note: (1) SKILL.md promises the agent will create cron jobs (system scheduling changes) — that is beyond mere file I/O and should be consented to, and (2) the workflow relies on bb-browser --openclaw to reuse the OpenClaw browser session, which means any logged-in session/data in that browser could be used by bb-browser adapters.
Install Mechanism
The repository contains no automated install spec; the instructions ask the user to install bb-browser via npm (global). This is a typical approach but requires Node/npm and a global install; there is no opaque download or embedded binary. No files in the skill perform arbitrary remote downloads during install.
Credentials
The skill declares no required environment variables or credentials and the Python script does not read secrets from env vars. It does probe common paths (home/.nvm, /usr/local/bin) to locate bb-browser and will read/write files under the user's home directory (config.yaml, state file, and the specified Obsidian vault). These accesses are proportionate to the stated purpose, but because bb-browser uses the OpenClaw browser session, it may access any web sessions/cookies present in that browser — that is a functional requirement but a privacy consideration.
Persistence & Privilege
always is false (normal). However SKILL.md states the agent will create cron jobs to enable scheduled syncs. Creating/modifying crontab entries is a system-level action outside the script itself; users should explicitly approve such changes. The skill does not claim or request permanent privileged presence beyond that.
Assessment
This skill appears to do what it says: it runs bb-browser to fetch posts, filters/tags them, and writes .md files into an Obsidian vault. Before installing or running it:

  1. Be prepared to install bb-browser (npm global) and ensure Node.js is acceptable on your machine.
  2. Review and set vault_base in config.yaml so files go where you expect.
  3. Run with --dry-run/--verbose first to observe behavior.
  4. The skill's scheduled sync flow will create cron jobs if the agent follows SKILL.md; only allow that if you want automatic system-level cron modifications.
  5. Be aware that bb-browser --openclaw reuses the OpenClaw browser session (cookies/logins). If you do not want adapters to access logged-in sessions (bookmarks/notifications/private feeds), avoid reusing the browser session or log out of those sites first.

Finally, inspect the config and the .claw-social-feed-state.json after a run so you understand what was fetched and when.

Like a lobster shell, security has layers — review code before you run it.

latest vk97fh2edrt55zs7xnp0rsbekd183qwry
142 downloads · 0 stars · 3 versions
Updated 1mo ago
v0.1.2
MIT-0

claw-social-feed

Fetch social media timelines into Obsidian vaults. Multi-platform, incremental sync, smart filtering, auto-tagging.

Core dependency: bb-browser (via --openclaw flag to reuse the OpenClaw browser session). Supports 36 platforms via bb-browser adapters — see references/platforms.md.

Workflow

User config (config.yaml)
      │
      ▼
fetch_save.py
      │
      ├── Dedup accounts
      ├── Read state.json (last fetch cursor)
      │
      ▼
bb-browser site <platform>/<cmd> --openclaw --json
      │
      ▼
Filter → Tag → Write to Obsidian
      │
      ▼
Update state.json
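
The bb-browser call in the middle of this workflow can be sketched in Python as follows. This is a minimal illustration, not the code in scripts/fetch_save.py: the function names and the extra_args hook are hypothetical, and only the command shape `bb-browser site <platform>/<cmd> --openclaw --json` comes from the diagram above.

```python
import json
import subprocess


def build_command(platform, cmd, extra_args=()):
    """Build the argv list for one bb-browser adapter call."""
    # extra_args is a hypothetical hook for adapter arguments (for example a
    # username); how the real script passes them is not documented here.
    return ["bb-browser", "site", f"{platform}/{cmd}", *extra_args,
            "--openclaw", "--json"]


def fetch_posts(platform, cmd, extra_args=()):
    """Run the adapter and parse its standard output as JSON."""
    result = subprocess.run(
        build_command(platform, cmd, extra_args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Passing the command as a list rather than a shell string avoids quoting problems when a username contains unusual characters.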

Quick Start

1. Install bb-browser

# Requires Node.js 18+
npm install -g bb-browser

# Verify
bb-browser --version

2. Configure accounts

Edit config.yaml:

accounts:
  - platform: twitter
    username: your_target_handle
  - platform: hackernews
    username: your_username

vault_base: ~/Documents/Obsidian Vault/SocialFeed

fetch:
  count: 20

filters:
  min_text_length: 30
  skip_retweet_no_comment: true
  skip_link_only: true
  blocked_keywords: []

tagging:
  enabled: true
  keywords:
    AI / LLM / GPT / Claude: AI
    Python / JavaScript / Rust: coding

3. Run

python3 scripts/fetch_save.py --verbose

4. Check output

Content lands in vault_base/@username/ — one .md file per post, with Obsidian YAML frontmatter (platform, author, date, URL, likes, tags).
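
As a rough sketch of that layout, the snippet below builds the per-post path and a frontmatter block with the fields listed above. The helper names, the use of a post id as the file name, and the keys of the post dictionary are assumptions for illustration; the real script may name files and fields differently.

```python
from pathlib import Path


def post_path(vault_base, username, post_id):
    """Return vault_base/@username/<post_id>.md with ~ expanded."""
    # Using the post id as the file name is an assumption.
    return Path(vault_base).expanduser() / f"@{username}" / f"{post_id}.md"


def render_note(post):
    """Render one post as YAML frontmatter followed by the post text."""
    tags = ", ".join(post.get("tags", []))
    lines = [
        "---",
        f"platform: {post['platform']}",
        f"author: {post['author']}",
        f"date: {post['date']}",
        f"url: {post['url']}",
        f"likes: {post['likes']}",
        f"tags: [{tags}]",
        "---",
        "",
        post["text"],
        "",
    ]
    return "\n".join(lines)
```

Values are written as plain YAML scalars, so an author name containing characters such as `: ` or `#` would need quoting in a fuller implementation.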


Config Reference

accounts

accounts:
  - platform: twitter
    username: dotey

  • platform: must match a bb-browser supported platform (see references/platforms.md)
  • username: the platform-native user identifier
  • Deduplication: platform + username must be unique within the list
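
The deduplication rule can be illustrated with a short sketch. It assumes accounts are dictionaries with platform and username keys and that the comparison is exact; dedup_accounts is a hypothetical name, not necessarily what the script uses.

```python
def dedup_accounts(accounts):
    """Drop repeated (platform, username) pairs, keeping the first of each."""
    seen = set()
    unique = []
    for account in accounts:
        # Exact, case-sensitive comparison is an assumption.
        key = (account["platform"], account["username"])
        if key not in seen:
            seen.add(key)
            unique.append(account)
    return unique
```

The same username on two different platforms counts as two distinct accounts, because the key is the pair.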

filters

Field                     Type   Default   Description
min_text_length           int    30        Skip posts below this character count
skip_retweet_no_comment   bool   true      Skip retweets with no original comment
skip_link_only            bool   true      Skip posts that are links/images with little text
blocked_keywords          list   []        Skip posts containing any of these keywords
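
One way these four filters could combine is sketched below. The post keys (text, is_retweet, comment) are hypothetical, and "little text" is simplified to "nothing left once URLs are removed"; the script's actual thresholds and field names may differ.

```python
import re

URL_RE = re.compile(r"https?://\S+")


def should_keep(post, filters):
    """Return True if the post passes every configured filter."""
    text = post.get("text", "").strip()
    if len(text) < filters.get("min_text_length", 30):
        return False
    if (filters.get("skip_retweet_no_comment", True)
            and post.get("is_retweet") and not post.get("comment")):
        return False
    # Simplification: treat a post as link-only when no text remains
    # after stripping URLs.
    if filters.get("skip_link_only", True) and not URL_RE.sub("", text).strip():
        return False
    lowered = text.lower()
    if any(kw.lower() in lowered for kw in filters.get("blocked_keywords", [])):
        return False
    return True
```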

tagging

Tags are assigned by case-insensitive keyword matching. Each key lists synonyms separated by /; if any synonym matches, the tag on the right is applied:

tagging:
  enabled: true
  keywords:
    AI / LLM / 大模型: AI
    skill / Skills: skill
    Python / JavaScript: coding
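
The matching rule might look like the following sketch, which uses plain substring matching. That choice is an assumption: it handles terms such as 大模型 that have no word boundaries, but it also means a short keyword like AI matches inside longer words. match_tags is a hypothetical name.

```python
def match_tags(text, keywords):
    """Return the tags whose synonym list has a case-insensitive match in text."""
    lowered = text.lower()
    tags = []
    for synonyms, tag in keywords.items():
        # "AI / LLM / 大模型" becomes ["ai", "llm", "大模型"]
        terms = [term.strip().lower() for term in synonyms.split("/")]
        if any(term and term in lowered for term in terms) and tag not in tags:
            tags.append(tag)
    return tags
```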

fetch.count

fetch:
  count: 20  # default 20, max 100

twitter/tweets returns ~20 tweets newest-first by default. For scheduled syncs, set count to 50–100 so that posts from high-frequency accounts are not missed between sync intervals.
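
A small sketch of how the default and the maximum could be applied when reading the config. Clamping out-of-range values, rather than rejecting them, is an assumption, and resolve_count is a hypothetical helper.

```python
def resolve_count(config, default=20, maximum=100):
    """Read fetch.count, fall back to the default, and clamp to 1..maximum."""
    count = config.get("fetch", {}).get("count", default)
    return max(1, min(int(count), maximum))
```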


Incremental Sync

state.json tracks the last-fetched timestamp per account. On re-run:

  1. Skips posts with created_at ≤ last_fetch
  2. Saves only new content
  3. Updates last_fetch timestamp
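
The three steps above can be sketched as a single function. It assumes ISO 8601 timestamps with an explicit offset and advances the cursor to the newest saved post; the real script may store a different format or record the run time instead. parse_ts and select_new are hypothetical names.

```python
from datetime import datetime


def parse_ts(value):
    # Assumes ISO 8601 with an explicit offset, e.g. 2024-05-01T09:00:00+00:00.
    return datetime.fromisoformat(value)


def select_new(posts, last_fetch):
    """Return (posts newer than last_fetch, updated last_fetch value)."""
    cutoff = parse_ts(last_fetch) if last_fetch else None
    fresh = [
        post for post in posts
        if cutoff is None or parse_ts(post["created_at"]) > cutoff
    ]
    # With no new posts the cursor stays where it was.
    newest = max((parse_ts(post["created_at"]) for post in fresh), default=cutoff)
    return fresh, (newest.isoformat() if newest else None)
```

Because the comparison is strict, a post whose created_at equals last_fetch is skipped, which matches step 1.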

Missed-run compensation: if a cron job missed a run (e.g., machine was off), the next run will backfill content within catchup_window_days (default 3 days).

To force re-fetch an account: delete its entry in state.json or delete the corresponding .md files.
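
If you prefer not to edit the state file by hand, removing an entry could be scripted along these lines. The sketch assumes the state file is a JSON object keyed by account; the key format is not documented here, so inspect the file first and pass the key exactly as it appears. forget_account is a hypothetical helper.

```python
import json
from pathlib import Path


def forget_account(state_path, key):
    """Remove one account's entry from the state file; return True if it existed."""
    path = Path(state_path)
    state = json.loads(path.read_text(encoding="utf-8"))
    existed = key in state
    if existed:
        del state[key]
        path.write_text(
            json.dumps(state, indent=2, ensure_ascii=False), encoding="utf-8"
        )
    return existed
```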


Scheduled Sync

To enable automatic sync, ask the agent:

"Sync every morning at 9am" or "Sync every Monday at 8am"

The agent will create a cron job that runs in isolated mode with incremental sync — no duplicates.
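
For reference, the two example requests correspond to standard five-field cron schedules. The helper below is only an illustration of that mapping; how the agent actually registers the job is not specified here, and cron_expression is a hypothetical name.

```python
def cron_expression(hour, minute=0, weekday=None):
    """Build a five-field cron schedule.

    weekday uses 0-6 with 0 = Sunday; None means every day.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    if weekday is not None and not 0 <= weekday <= 6:
        raise ValueError("weekday must be 0-6")
    day_of_week = "*" if weekday is None else str(weekday)
    # Fields: minute, hour, day of month, month, day of week
    return f"{minute} {hour} * * {day_of_week}"
```

"Sync every morning at 9am" maps to cron_expression(9), and "Sync every Monday at 8am" maps to cron_expression(8, weekday=1).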


Troubleshooting

bb-browser: command not found
The script tries to locate bb-browser automatically. If it still fails, confirm that the npm global bin directory is on your PATH, or install bb-browser with npm install -g bb-browser.

twitter/search returns webpack module error
Use twitter/tweets instead of twitter/search. This is a known bb-browser adapter compatibility issue.

Platform returns 401 Unauthorized
The OpenClaw browser needs to be logged into that platform. Open the site manually in the browser, log in once, then retry.

File already exists but want to re-fetch
Delete the corresponding entry in state.json or delete the .md files for that account.
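
To debug the bb-browser lookup from the first entry, a sketch like the following shows one plausible search order: PATH first, then /usr/local/bin and nvm-managed Node versions under the home directory. The exact directories the script probes are not listed here, so treat both the order and the find_bb_browser name as assumptions.

```python
import shutil
from pathlib import Path


def find_bb_browser(extra_dirs=None):
    """Return the path to bb-browser, or None if it cannot be found."""
    found = shutil.which("bb-browser")
    if found:
        return found
    candidates = [Path("/usr/local/bin")]
    # nvm installs each Node version under ~/.nvm/versions/node/<version>/bin
    nvm_root = Path.home() / ".nvm" / "versions" / "node"
    if nvm_root.is_dir():
        candidates += sorted(nvm_root.glob("*/bin"), reverse=True)
    candidates += [Path(d) for d in (extra_dirs or [])]
    for directory in candidates:
        binary = directory / "bb-browser"
        if binary.is_file():
            return str(binary)
    return None
```

If this returns None, the npm global bin directory is probably missing from PATH; npm prefix -g prints the prefix that npm uses for global installs.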
