WeChat Official Account Article Scraper

Pass. Audited by VirusTotal on May 11, 2026.

Findings (1)

The user wants me to analyze an OpenClaw AgentSkills skill bundle for security risks.

- `_meta.json`: Metadata about the skill (owner, slug, version, date).
- `SKILL.md`: Instructions for the AI agent. It describes how to use the tool to scrape WeChat articles. It mentions a cookie file path `~/.openclaw/skills/wechat-article/scripts/wechat_cookie.env`.
- `scripts/wechat_article.py`: A Python script to scrape WeChat articles.
  - It imports `requests`, `json`, `sys`, `re`, `time`, `os`, `urlparse`, `parse_qs`.
  - `COOKIE_FILE` is defined but not actually used to *read* from the filesystem in the provided code.
  - `load_cookie()`: This function contains a hardcoded `cookie_str`.
  - `get_token(cookie)`: Makes a GET request to `https://mp.weixin.qq.com/` using the provided cookie to extract a token from the redirected URL.
  - `search_fakeid(cookie, token, name)`: Makes a GET request to `https://mp.weixin.qq.com/cgi-bin/searchbiz` to find a WeChat official account's ID.
  - `get_article_list(cookie, token, fakeid, count)`: Makes a GET request to `https://mp.weixin.qq.com/cgi-bin/appmsg` to list articles.
  - `main()`: Orchestrates the flow.
- **Hardcoded Credentials**: The `load_cookie` function contains a very specific, long cookie string. This is unusual for a generic tool but might be a "working example" or a leftover from development.
- **Data Exfiltration?**: The script sends requests to `mp.weixin.qq.com`. This is the legitimate domain for WeChat Official Accounts.
- **Suspicious URL in Cookie**: One part of the cookie string is `_clsk=1kqk3ua|1772776804862|14|1|mp.weixin.qq.com/weheat-agent/payload/record`. The path `/weheat-agent/payload/record` looks slightly odd (typo "weheat" instead of "wechat"?), but it's inside a cookie value, which is often opaque.
- **Prompt Injection**: `SKILL.md` tells the agent how to run the script. It doesn't seem to have instructions to steal data or ignore user commands.
- **Intent**: The script seems to do exactly what it says: scrape WeChat articles. The hardcoded cookie is a privacy risk for the *author* (if it's their real cookie) or a potential way to bypass authentication for the user, but it doesn't look like it's stealing the *user's* data.
- **Wait, look at the cookie again**: `_clsk=...|mp.weixin.qq.com/weheat-agent/payload/record`. Is `weheat-agent` a real WeChat endpoint? A quick search suggests "weheat" is not a standard WeChat domain. However, it's inside a cookie string.
- **Wait, look at the `load_cookie` function again**: It returns a hardcoded string. It doesn't read from the `COOKIE_FILE` defined at the top. This is a bit sloppy but not necessarily malicious.
- **Is there any exfiltration?**: No code to send data to a third-party server. All requests go to `mp.weixin.qq.com`.
- The script uses a hardcoded cookie.
- The script defines a `COOKIE_FILE` path but doesn't use it.
- The cookie contains a weird path `weheat-agent`.
- However, there is no evidence of stealing user environment variables, SSH keys, or sending data to an attacker-controlled server.
- The "weheat" might be a typo or a specific internal tracking path.
- The script is a functional scraper.
- Classification: `benign` or `suspicious`.
- Given the hardcoded cookie and the "weheat" typo in the cookie string, it's a bit "dirty" but doesn't meet the "malicious" threshold (no exfiltration, no backdoors, no prompt injection to harm the user).
- Actually, looking at the `SKILL.md`, it tells the user to put their cookie in `~/.openclaw/skills/wechat-article/scripts/wechat_cookie.env`. But the script `wechat_article.py` *ignores* that file and uses a hardcoded one. This is suspicious behavior (misleading documentation vs implementation).
- If the hardcoded cookie belonged to an attacker, they might be tracking the requests, but the requests go to WeChat.
- Is there any hidden `eval` or `exec`? No.
- Is there any `os.system`? No.
- Is there any network call to a non-WeChat domain? No.

Let's re-evaluate the "weheat" part: `mp.weixin.qq.com/weheat-agent/payload/record`. If `mp.weixin.qq.com` is the host, then it's still going to Tencent. Wait, `_clsk` is a standard WeChat cookie. The value usually contains a timestamp and some path info.

The hardcoded cookie is the main "weird" thing. It might be a session for a specific account. The script is likely benign but poorly written (hardcoded credentials instead of using the config file mentioned in the docs).

Wait, let's check for "weheat" again. It's a common typo for "wechat". Is there any risk of the agent being tricked? The `SKILL.md` is straightforward.

One more check: Does the script do anything with the `name` input that could be an injection? `params = {"query": name}` in `requests.get`. `requests` handles URL encoding. No shell injection.

Conclusion: Benign. It's a functional tool with a hardcoded credential (likely for testing or as a "gift" to the user, though it will expire).

Wait, let's look at the `load_cookie` function again.

```python
def load_cookie():
    """Load the cookie."""
    cookie_str = """ua_id=...; ...; _clsk=1kqk3ua|1772776804862|14|1|mp.weixin.qq.com/weheat-agent/payload/record; ..."""
    return cookie_str
```

The `COOKIE_FILE` variable is defined but never used. The `SKILL.md` says "Cookie is saved in ...", but the code doesn't read it. This is a discrepancy. Is it "suspicious"?

- Hardcoded credentials.
- Misleading documentation (says it uses a file, but uses a hardcoded string).
- The hardcoded cookie might be used to track what users are searching for if the attacker has access to that account's logs on WeChat's side, but that's a stretch.

However, the prompt says: "malicious ONLY if there is clear evidence of INTENTIONAL harmful behavior... If you see risky capabilities without clear malicious intent... classify as suspicious".
Hardcoded credentials and misleading docs are "suspicious" at best, but likely just a "lazy" implementation of a scraper.

Let's check the `_meta.json` ownerId: `kn79jnw7jw6x5mwh5ehw7wjd9d80kc5y`. Doesn't mean much.

One detail: `_clsk=1kqk3ua|1772776804862|14|1|mp.weixin.qq.com/weheat-agent/payload/record`. The timestamp `1772776804862` corresponds to the year 2026. The `publishedAt` in `_meta.json` is `1772780510116` (also 2026). This suggests the "current" time in this context is in the future, or these are just dummy values.

I will classify as `suspicious` because of the hardcoded credentials and the discrepancy between the documentation (which tells users where to put their sensitive cookies) and the code (which ignores that file and uses a hardcoded one). This could be a way to trick users into thinking they are using their own session while they are actually using a pre-configured one, or simply a broken implementation.

Actually, looking at the `SKILL.md`:

- "The cookie is saved in: `~/.openclaw/skills/wechat-article/scripts/wechat_cookie.env`"
- "To search for an official account, you first need to update the cookie in the script"

The instructions are a bit contradictory. I'll go with `suspicious`.

Summary points:

1. Hardcoded session cookie in `wechat_article.py`.
2. Discrepancy between `SKILL.md` (claims to use an env file) and `wechat_article.py` (uses hardcoded string and ignores the env file).
3. Potential for tracking or unauthorized access using the hardcoded session, although no direct exfiltration to a 3rd party is present.

Wait, is there any chance the hardcoded cookie is a "honeytoken" or something? Unlikely. Is there any prompt injection? No.

Final check on the code:

- `get_token` calls `requests.get("https://mp.weixin.qq.com/", ...)`
- `search_fakeid` calls `requests.get("https://mp.weixin.qq.com/cgi-bin/searchbiz", ...)`
- `get_article_list` calls `requests.get("https://mp.weixin.qq.com/cgi-bin/appmsg", ...)`

All domains are legitimate.
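The final check above, that every request targets a legitimate WeChat host, can be written as a small allowlist test. This is a minimal sketch and not part of the audited skill: `ALLOWED_HOSTS`, `SCRIPT_URLS`, and `non_allowlisted` are hypothetical names introduced for illustration, and the three URLs are the ones quoted from the script in the analysis.

```python
from urllib.parse import urlparse

# Hypothetical allowlist for illustration; the analysis only observed this host.
ALLOWED_HOSTS = {"mp.weixin.qq.com"}

# Endpoints quoted in the analysis of wechat_article.py.
SCRIPT_URLS = [
    "https://mp.weixin.qq.com/",
    "https://mp.weixin.qq.com/cgi-bin/searchbiz",
    "https://mp.weixin.qq.com/cgi-bin/appmsg",
]


def non_allowlisted(urls, allowed=ALLOWED_HOSTS):
    """Return the URLs whose hostname is not exactly one of the allowed hosts."""
    return [u for u in urls if urlparse(u).hostname not in allowed]


if __name__ == "__main__":
    # An empty list means every quoted endpoint stays on the allowlisted host.
    print(non_allowlisted(SCRIPT_URLS))
```

The comparison uses the exact parsed hostname rather than a substring match, so a lookalike such as `mp.weixin.qq.com.evil.example` would be reported instead of slipping through.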
I'll stick with `suspicious` due to the hardcoded credentials and misleading config instructions.

One more thing: `weheat-agent`. If I search for this, I find nothing. It's almost certainly a typo for `wechat-agent`.

Wait, look at the `SKILL.md` again:

- "Steps to obtain the cookie: ... copy the Cookie from the Request Headers"
- "The cookie is saved in: ~/.openclaw/skills/wechat-article/scripts/wechat_cookie.env"

If a user follows these instructions, they put their sensitive cookie in a file. But the script doesn't use it. This is a "Vulnerability" (the user's cookie is sitting in a file, but the script is using some other cookie). Or it's just a bug.

The prompt says: "CLASSIFY VULNERABILITIES... AS 'suspicious', NOT 'malicious', unless there is proof of intentional self-exploitation." This fits `suspicious`.

Wait, could the hardcoded cookie be used to exfiltrate? No, it's sent *to* WeChat. Could the script be modified to send the user's cookie elsewhere? Not in the provided code.

Final classification: `suspicious`.

Summary: The skill contains a hardcoded WeChat session cookie in `wechat_article.py`, which is a security risk and bypasses the configuration instructions in `SKILL.md`. Furthermore, the script defines a path for a cookie environment file but fails to implement any logic to read from it, leading to a discrepancy between documented behavior and actual execution. While no direct data exfiltration to third-party domains was found, the presence of hardcoded credentials and misleading configuration steps warrants a suspicious classification.

```json
{
  "classification": "suspicious",
  "summary": "The skill contains a hardcoded WeChat session cookie within `scripts/wechat_article.py`, which is a significant security risk. There is a notable discrepancy between the documentation in `SKILL.md` (which instructs users to store their own cookies in an external `.env` file) and the actual implementation in `wechat_article.py`, which ignores that file and uses the hardcoded string instead. While the script's network activity is limited to legitimate WeChat domains (`mp.weixin.qq.com`), the inclusion of static credentials and the misleading configuration instructions are highly irregular for a production-ready skill."
}
```
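The discrepancy described in the summary, a documented cookie file that the code never reads, could be closed by a loader along the following lines. This is a hedged sketch, not the skill author's code: it assumes the file holds the raw Cookie header value as plain text, since the real format of `wechat_cookie.env` is not described in the finding, and the error handling is an illustrative choice.

```python
import os
import tempfile

# Path documented in SKILL.md; the audited script defines COOKIE_FILE but never reads it.
COOKIE_FILE = os.path.expanduser(
    "~/.openclaw/skills/wechat-article/scripts/wechat_cookie.env"
)


def load_cookie(path=COOKIE_FILE):
    """Read the cookie string from `path` instead of returning a hardcoded value.

    Assumption: the file contains the raw Cookie header value as plain text.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            cookie = fh.read().strip()
    except FileNotFoundError:
        raise RuntimeError(f"Cookie file not found: {path}") from None
    if not cookie:
        raise RuntimeError(f"Cookie file is empty: {path}")
    return cookie


if __name__ == "__main__":
    # Demonstrate with a throwaway file and placeholder values, not a real session.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "wechat_cookie.env")
        with open(demo, "w", encoding="utf-8") as fh:
            fh.write("ua_id=example; other=example\n")
        print(load_cookie(demo))
```

Failing loudly when the file is missing or empty, rather than falling back to a built-in cookie, keeps the script's behavior aligned with what the documentation tells users to expect.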