Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

餐厅推荐交叉验证 (Restaurant Recommendation Cross-Validation)

Cross-reference restaurant recommendations from Xiaohongshu (小红书) and Dianping (大众点评) to validate restaurant quality and consistency. Use when querying restaurant recommendations by geographic location (city/district) to get validated insights from both platforms. Automatically fetches ratings, review counts, and analyzes consistency across platforms to provide trustworthy recommendations with confidence scores.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
2 · 890 · 1 current installs · 1 all-time installs
by leon@liyang2016
Security Scan
VirusTotal
Suspicious
View report →
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The code and documentation match the stated purpose: fetching data from Dianping and Xiaohongshu, fuzzy-matching restaurants, and computing consistency/recommendation scores. Libraries (requests, bs4, thefuzz, Playwright) and matching/sentiment logic are appropriate to that goal.
Instruction Scope
SKILL.md and IMPLEMENTATION.md explicitly instruct the agent to perform web scraping, use persistent authenticated browser sessions, rotate proxies, and store cookies/sessions locally. This goes beyond a simple read-only lookup: it instructs actions that can log in as a user (cookies), maintain persistent authenticated sessions, and mimic human browsing (Playwright). Those instructions also push the operator toward anti-scraping workarounds (residential proxies, user-agent rotation), which raises legal/compliance and operational risk.
Install Mechanism
No formal registry install spec in the skill meta, but repository includes setup.sh that installs Python deps and downloads Playwright browsers. Installing Playwright and pip packages is standard for such tooling, but setup.sh should be reviewed before running. There are no obscure external download URLs in the provided files, but Playwright will download browser binaries from upstream.
Credentials
The skill declares no required env vars, but it persistently stores authenticated sessions (cookies/localStorage) under a sessions/ directory and expects proxies (proxy_list) to be configured. That is proportional to scraping functionality, but it creates a risk: sensitive session cookies or proxy credentials may be stored in plain files (scripts/config.py or sessions/) and could be accidentally committed/published. The skill's docs even guide publishing; there are no explicit safeguards (e.g., .gitignore) shown to prevent leaking session data or credentials.
Persistence & Privilege
The skill includes a session manager that persists login state and claims to auto-login and maintain sessions for 1–2 weeks. While always:false, the agent can be invoked autonomously; combined with persistent authenticated sessions, this means the skill can make authenticated requests on behalf of the user without re-authentication. This increases the blast radius if credentials or sessions are leaked or if the skill is misused.
What to consider before installing
- Legal/ToS: Both Dianping and Xiaohongshu explicitly prohibit scraping in their terms of service; using the "real" scraping mode may violate platform terms and local law. Use it only for personal research, and accept the legal risk.
- Session cookies: The skill saves browser sessions and cookies locally (sessions/ and session_state.json). Do NOT run setup.sh or log in on shared or cloud machines. Before publishing or sharing the repo, remove sessions/ and any files containing credentials, and add them to .gitignore.
- Secrets in scripts/config.py: If you must use proxies with authentication, avoid writing credentials into repository files; prefer environment variables or a secure secret store, and do not publish them.
- Setup/install scripts: Inspect setup.sh and any install steps before running. They install Python packages and download Playwright browsers (normal for this tooling), but running them grants the code filesystem and network access on your machine.
- Mock/server-only mode on servers: The repo includes a simulated mock-data (server) version; use that on headless or shared servers to avoid login/cookie persistence.
- Network destinations: The docs recommend residential proxy providers; review any third-party service terms and avoid sending credentials or session files to unfamiliar hosts.
- Blast radius: Run the skill in an isolated VM or on a local machine, not on production or shared servers. If you plan to publish, remove any sessions/ directories and credentials first.


Current version: v1.0.0
Tags: china · chinese · dianping · food · latest · recommendation · restaurant · xiaohongshu


SKILL.md

Restaurant Review Cross-Check

Cross-reference restaurant data from Xiaohongshu and Dianping to provide validated recommendations.

Quick Start

Query restaurants by location and cuisine type:

# Basic query
crosscheck-restaurants "上海静安区" "日式料理"

# With filters
crosscheck-restaurants "北京朝阳区" "火锅" --min-rating 4.5 --min-reviews 100

Workflow

1. Data Collection

Query both platforms simultaneously:

Dianping:

  • Fetch restaurants matching location + cuisine
  • Extract: name, rating, review_count, price_range, address, tags

Xiaohongshu:

  • Search notes/posts matching location + cuisine
  • Extract: restaurant_name, engagement_metrics (likes/saves), sentiment_score
  • Note: Xiaohongshu data requires scraping, as the platform has no public API

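The fields listed above can be sketched as typed records. These class and field names are illustrative assumptions for clarity, not the skill's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DianpingRestaurant:
    # Fields extracted from a Dianping listing
    name: str
    rating: float            # star rating, 0-5
    review_count: int
    price_range: str         # e.g. "¥100-150"
    address: str
    tags: List[str] = field(default_factory=list)

@dataclass
class XiaohongshuMention:
    # Fields extracted from Xiaohongshu notes about a restaurant
    restaurant_name: str
    likes: int
    saves: int
    sentiment_score: float   # 0-1, from NLP analysis
```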
2. Data Matching

Match restaurants across platforms using fuzzy matching:

  • Restaurant name similarity (Levenshtein distance)
  • Location proximity (address matching)
  • Handle name variations (e.g., "银座寿司" vs "银座寿司静安店")

See scripts/match_restaurants.py for matching logic.
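A minimal version of this matching step, using the standard-library difflib as a stand-in for the repo's actual logic. The containment shortcut and the 0.8/0.6 thresholds are assumptions for illustration, not the skill's real tuning:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Branch suffixes ("银座寿司" vs "银座寿司静安店") drag the plain edit-
    # distance ratio down, so treat containment of one string in the
    # other as a full match before falling back to SequenceMatcher.
    if a and b and (a in b or b in a):
        return 1.0
    return SequenceMatcher(None, a, b).ratio()

def is_same_restaurant(dp_name: str, dp_addr: str,
                       xhs_name: str, xhs_addr: str,
                       name_threshold: float = 0.8,
                       addr_threshold: float = 0.6) -> bool:
    # Require both name similarity and address proximity to agree.
    return (similarity(dp_name, xhs_name) >= name_threshold
            and similarity(dp_addr, xhs_addr) >= addr_threshold)
```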

3. Consistency Analysis

Calculate consistency score based on:

  • Rating correlation (0-1): Correlation between platform ratings
  • Engagement validation (0-1): Do high ratings correlate with high engagement?
  • Sentiment alignment (0-1): Do user sentiments align across platforms?

Formula: consistency_score = (rating_corr * 0.5) + (engagement_val * 0.3) + (sentiment_align * 0.2)
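The formula above, as a runnable sketch:

```python
def consistency_score(rating_corr: float,
                      engagement_val: float,
                      sentiment_align: float) -> float:
    # Weighted sum of the three components, each expected in [0, 1].
    return rating_corr * 0.5 + engagement_val * 0.3 + sentiment_align * 0.2
```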

4. Recommendation Score

Calculate final recommendation score:

recommendation_score = (
    (dianping_rating * 0.4) +
    (xhs_engagement_normalized * 0.3) +
    (consistency_score * 0.3)
) * 10

Output: 0-10 scale, where >8.0 = high confidence recommendation
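A direct transcription of the formula. Note that dianping_rating must already be normalized to [0, 1] (e.g. stars / 5.0) for the output to stay on the 0-10 scale; that normalization step is an assumption, as the formula above does not spell it out:

```python
def recommendation_score(dianping_rating: float,
                         xhs_engagement_normalized: float,
                         consistency_score: float) -> float:
    # All three inputs in [0, 1]; result is on a 0-10 scale,
    # where > 8.0 means a high-confidence recommendation.
    return (dianping_rating * 0.4
            + xhs_engagement_normalized * 0.3
            + consistency_score * 0.3) * 10
```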

Output Format

📍 [Location] [Cuisine Type] Restaurant Recommendations

1. [Restaurant Name]
   🏆 Recommendation score: X.X/10
   ⭐ Dianping: X.X (Xk reviews)
   💬 Xiaohongshu: X.X⭐ (X notes)
   📍 Address: [Address]
   💰 Per person: ¥[Price]
   ✅ Consistency: [High/Medium/Low] - [Brief explanation]

   📊 Platform comparison:
   - Dianping tags: [Tags]
   - Xiaohongshu keywords: [Keywords]

   ⚠️ Note: [Any discrepancies or warnings]

[Continue for top 5-10 restaurants...]

Thresholds

  • Min rating: 4.0/5.0 (configurable)
  • Min reviews: 50 on Dianping, 20 notes on Xiaohongshu (configurable)
  • Max results: Top 10 restaurants by recommendation score
  • High consistency: Score > 0.7
  • Medium consistency: Score 0.5-0.7
  • Low consistency: Score < 0.5 (flag for manual review)
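The consistency bands above can be mapped to labels with a small helper (the label strings are my wording, not the skill's):

```python
def consistency_label(score: float) -> str:
    # Maps a consistency score to the documented bands; anything
    # below 0.5 should additionally be flagged for manual review.
    if score > 0.7:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"
```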

API & Data Sources

Dianping

  • Method: Web scraping (Dianping API requires business partnership)
  • Base URL: https://www.dianping.com
  • Rate limiting: 1 request/2 seconds minimum
  • Anti-scraping: Use residential proxies, rotate user agents

See scripts/fetch_dianping.py for implementation.
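A minimal politeness sketch covering the rate-limit floor and user-agent rotation above. The UA strings are placeholders, and scripts/fetch_dianping.py remains the authoritative implementation:

```python
import itertools
import time

# Placeholder user agents; real scraping would draw from a larger pool.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

class PoliteFetcher:
    """Enforces a minimum interval between requests and rotates UAs."""

    def __init__(self, min_interval: float = 2.0):
        self.min_interval = min_interval
        self._last = 0.0
        self._ua_cycle = itertools.cycle(USER_AGENTS)

    def next_request_headers(self) -> dict:
        # Block until at least min_interval has elapsed since the
        # previous call, then return headers carrying the next UA.
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
        return {"User-Agent": next(self._ua_cycle)}
```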

Xiaohongshu

  • Method: Web scraping (no public API)
  • Base URL: https://www.xiaohongshu.com
  • Rate limiting: 1 request/3 seconds minimum
  • Authentication: Cookies required for full access

See scripts/fetch_xiaohongshu.py for implementation.

Configuration

Edit scripts/config.py to set:

DEFAULT_THRESHOLDS = {
    "min_rating": 4.0,
    "min_dianping_reviews": 50,
    "min_xhs_notes": 20,
    "max_results": 10
}

PROXY_CONFIG = {
    "use_proxy": True,
    "proxy_list": ["http://proxy1:port", "http://proxy2:port"]
}

Error Handling

  • No matches found: Suggest broader search terms or nearby areas
  • Platform timeout: Retry with exponential backoff, max 3 attempts
  • Rate limiting detected: Pause for 60 seconds, rotate proxy
  • Low confidence results: Flag results with consistency < 0.5 for manual review
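The timeout policy above (exponential backoff, max 3 attempts) can be sketched as follows; the injectable sleep parameter is for testability and is not part of the skill:

```python
import time

def retry_with_backoff(fn, max_attempts: int = 3,
                       base_delay: float = 1.0, sleep=time.sleep):
    # Retry fn with exponentially growing delays (1s, 2s, 4s, ...);
    # re-raise the error after the final failed attempt.
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```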

Advanced Features

Sentiment Analysis

NLP is applied to Xiaohongshu posts to extract:

  • Food quality mentions
  • Service quality mentions
  • Atmosphere mentions
  • Price/value mentions

See references/sentiment_analysis.md for methodology.
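A toy version of the mention extraction above. The keyword lists are illustrative guesses, not the lexicon from references/sentiment_analysis.md:

```python
# Illustrative keyword lists mapping review text to the four mention
# categories; a real implementation would use a fuller lexicon or a
# trained model.
MENTION_KEYWORDS = {
    "food_quality": ["好吃", "味道", "新鲜", "难吃"],
    "service": ["服务", "态度", "上菜"],
    "atmosphere": ["环境", "氛围", "装修"],
    "price_value": ["性价比", "价格", "人均", "贵"],
}

def extract_mentions(text: str) -> set:
    # Return every category whose keywords appear in the text.
    return {category
            for category, keywords in MENTION_KEYWORDS.items()
            if any(kw in text for kw in keywords)}
```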

Fuzzy Matching

Handle restaurant name variations:

  • Chain stores (e.g., "海底捞火锅" vs "海底捞静安店")
  • Abbreviations (e.g., "鼎泰丰" vs "鼎泰丰上海店")
  • Translation differences

Uses thefuzz library for similarity scoring.

Dependencies

pip install requests beautifulsoup4 pandas numpy thefuzz selenium lxml

See scripts/requirements.txt for complete list.

Troubleshooting

Issue: Xiaohongshu returns empty results

  • Solution: Check if cookies expired, re-authenticate

Issue: Dianping blocks requests

  • Solution: Reduce request rate, rotate proxies

Issue: Poor matching between platforms

  • Solution: Adjust similarity threshold in match_restaurants.py
