Install

openclaw skills install blog-topic-research

Research and propose long-tail blog topics backed by real, verifiable user demand. The skill mines candidates from Google Suggest / autocomplete, People Also Ask, Reddit, Stack Overflow, GitHub issues, vendor community forums, and vendor changelogs. It captures every signal as a citable URL with verbatim evidence text, classifies each topic by post format, optionally runs a cannibalization check against an existing backlog, and outputs a structured list with proof URLs, problem summaries, confirmed fixes, version context, and question variants the writer can use as the article's FAQ block and LSI spread. Trigger when the user says "research blog topics", "find topics with real demand", "expand the editorial backlog", "research N long-tail topics", or any variant of growing a content pipeline with verified candidates.

The skill generates topic candidates for a blog with documented user demand. It exists to fight hallucinated SEO ideas: every topic it proposes must point to a URL that proves someone is asking about it.
research <N> topics [for cluster <C>] [--append-to <path>]

- N - number of topics to return (default 50; cap 100)
- cluster - if the blog has a cluster taxonomy, restrict to one cluster the user names
- --append-to <path> - after presenting results, ask the user before appending accepted topics as JSON to the given path (a backlog file, a CSV, whatever the blog uses)

The skill is content-only: it does no scraping of its own. It drives the agent's WebFetch and WebSearch tools to fetch sources, and (optionally) shells out to a Python similarity script for the cannibalization step.
For every topic the skill emits, it captures:
| Field | What it is |
|---|---|
| topic | Full title shaped like a long-tail query |
| cluster | A bucket the user defines for their blog (e.g. n8n, databases, react-hooks) |
| format | One of how-to-fix, how-to-connect, how-to-automate, x-vs-y, what-is, use-case, listicle, migration, release-recap |
| demand_signals[] | One or more, each with type, url, evidence (verbatim text), strength (1-3) |
| signal_score | Sum of strength across all signals; topic accepted only if >=3 |
| primary_sources[] | At least 1 vendor doc / GitHub issue / official changelog URL |
| keywords[] | Primary keyword + 3-5 LSI variants extracted from source text |
| commentary | 1-2 sentences on what makes this topic specific (no fluff) |
| problem_summary | 1-2 sentences distilling the symptom + trigger from the highest-engagement signal's body, in factual writer-voice (no marketing). Lets the writer skip re-fetching to figure out what the problem actually is. |
| confirmed_fixes[] | Each {kernel, source}: a short fix kernel (one phrase, e.g. "set N8N_PAYLOAD_SIZE_MAX=16000000", "downgrade crewai to 0.113") plus the source URL where that fix is reported. Empty list if no fix is documented yet (still-open issues count). The writer expands kernels into prose and re-verifies. |
| version_context | String like "n8n 1.65+", "Cursor 0.42 only", "introduced in CrewAI 0.114", or null if no version qualifier applies. |
| question_variants[] | 2-4 paraphrases of the topic that real users actually post (lifted from PAA, autocomplete depth-2, sibling forum titles). Feed the writer's FAQ block + LSI keyword spread directly. Not invented - every variant must trace to a captured signal or autocomplete completion. |
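The field table above maps naturally onto a plain record. A minimal sketch (a hypothetical shape for illustration, not a mandated schema; field names follow the table):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DemandSignal:
    type: str       # paa / autocomplete / reddit / stackoverflow / github_issue / forum / vendor_doc / trends
    url: str        # where the evidence lives
    evidence: str   # verbatim text copied from the source
    strength: int   # 1-3 per the strength tier table

@dataclass
class Topic:
    topic: str
    cluster: str
    format: str
    demand_signals: list = field(default_factory=list)
    primary_sources: list = field(default_factory=list)
    keywords: list = field(default_factory=list)
    commentary: str = ""
    problem_summary: str = ""
    confirmed_fixes: list = field(default_factory=list)  # [{"kernel": ..., "source": ...}]
    version_context: Optional[str] = None
    question_variants: list = field(default_factory=list)

    @property
    def signal_score(self) -> int:
        # sum of strength across all signals; topic accepted only if >= 3
        return sum(s.strength for s in self.demand_signals)

    @property
    def accepted(self) -> bool:
        return self.signal_score >= 3
```

Note the empty defaults ("", [], None): a sparsely filled record is valid, a fabricated one is not.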
Hard rules:
- The evidence field is copied byte-for-byte from the source (PAA question text, GitHub issue title, Reddit post title, forum thread title).
- Titles must be concrete: a version number, an error string, a named integration pair (X to Y), or a named edge case. Reject vague titles like "n8n tutorial" or "what is automation".
- problem_summary, confirmed_fixes[], version_context, and question_variants[] are derived from fetched bodies - never hallucinated. If a body doesn't mention a fix, confirmed_fixes[] stays empty. If no version is named, version_context is null. The writer treats these as a verified scaffold and still re-fetches at least one primary source for currency.
- If the skill cannot find N validated topics, it returns what it has and reports the shortfall.

A signal is valid only if its type is one of these, and the linked URL contains the verbatim evidence text:
| Type | What counts |
|---|---|
| paa | "People Also Ask" question on a Google SERP. URL = the parent query SERP. Evidence = the PAA question text. |
| autocomplete | Google Suggest entry triggered by typing a partial query. Evidence = the suggested completion. |
| reddit | A Reddit post asking the question or a close variant on a relevant subreddit. Evidence = the post title. |
| stackoverflow | A Stack Overflow question with the same intent. Evidence = the question title. |
| github_issue | A GitHub issue on the tool's repo describing the problem (open or closed). Evidence = the issue title. |
| forum | A community-forum thread (vendor or third-party). Evidence = the thread title. |
| vendor_doc | A vendor docs page that exists because users ask the question. Evidence = the page heading. |
| trends | A Google Trends rising query. Evidence = the query string + the rising-percent label from Trends. |
Strength tiers (used to compute signal_score):
| Type | 1 (exists) | 2 (engaged) | 3 (heavy) |
|---|---|---|---|
| paa | appears once | appears across >=2 parent SERPs | appears + "More questions" expansion shows >=4 follow-ups on same intent |
| autocomplete | direct suggestion | suggestion at >=4-word depth | suggestion at >=6-word depth |
| reddit | post exists | >=10 comments OR >=20 upvotes | >=50 comments OR >=100 upvotes |
| stackoverflow | question exists | score >=3 OR views >=500 | score >=10 OR views >=2000 |
| github_issue | issue exists | >=3 reactions OR >=5 comments | >=10 reactions OR >=20 comments OR linked from changelog |
| forum | thread exists | >=5 replies | >=20 replies OR pinned / marked solution |
| vendor_doc | page exists | page in main nav | dedicated FAQ entry or "Common errors" section |
| trends | rising query | rising >=100% | rising >=500% or labelled "Breakout" |
Engagement counts are read from the source page at fetch time. Record the count in the signal entry so the user can audit (e.g. [strength=3, 14 reactions]).
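As one concrete instance of the tier table, the github_issue row can be sketched as a threshold function (thresholds copied from the table; the linked-from-changelog flag is assumed to be a boolean the agent sets when it finds such a link):

```python
def github_issue_strength(reactions: int, comments: int,
                          linked_from_changelog: bool = False) -> int:
    """Map GitHub-issue engagement to the 1-3 strength tiers from the table."""
    if reactions >= 10 or comments >= 20 or linked_from_changelog:
        return 3  # heavy
    if reactions >= 3 or comments >= 5:
        return 2  # engaged
    return 1      # the issue merely exists
```

The other signal types follow the same shape with their own columns.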
Does NOT count:
The user supplies the source list for their blog (or asks the skill to suggest one). Generic source templates per cluster type:
For each tool the blog covers, mine its vendor forum, changelog, and GitHub issues (per the signal types above), plus:

- for developer-tool clusters, r/<language> and r/<framework>
- for vertical clusters (e.g. ecommerce, customer-support, marketing-ops), the matching subreddits (r/ecommerce, r/customerservice, etc.)

If the user has a specific cluster list, ask them for the source URLs before mining; if they don't, suggest a list and let them edit it.
If the user has an existing backlog file (backlog.json, posts.csv, an RSS feed of published posts, whatever), ask for the path and a way to enumerate titles. The cannibalization step needs an embedding cache built from these existing titles.
If a Python embedding script is available, the user can build a cache (a JSON file mapping each existing title to its OpenAI text-embedding-3-small vector) and point the cannibalization step at it. Typical cost is ~$0.02 per 1M tokens.
If no Python / no cache, fall back to Jaccard-only cannibalization (cheap, less accurate; surfaces near-identical titles but misses synonym dupes).
Also build a token-set per existing title (lowercase, alphanumeric, stopwords stripped) for the Jaccard prefilter used in Step 4.
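The token-set build and the Jaccard prefilter might look like this (the stopword list here is illustrative, not the skill's canonical one):

```python
import re

# Illustrative stopword list - extend for a real deployment.
STOPWORDS = {"a", "an", "the", "to", "in", "for", "of", "with", "how", "is", "on"}

def token_set(title: str) -> frozenset:
    """Lowercase, keep alphanumerics (plus $ _ . for code tokens), strip stopwords."""
    tokens = re.findall(r"[a-z0-9$_.]+", title.lower())
    return frozenset(t for t in tokens if t not in STOPWORDS)

def jaccard(a: frozenset, b: frozenset) -> float:
    """Intersection-over-union of two token sets; 1.0 = identical titles."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

This cheap prefilter catches near-identical titles before any embedding call is spent on them.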
If the user has cluster weights (e.g. "30% n8n, 20% AI coding, ..."), apply them: subtract current backlog counts per cluster, then distribute N proportionally to the largest deficits.
If cluster <C> was specified, allocate all N to that cluster.
If no cluster taxonomy at all, treat the whole blog as one cluster and aim for format diversity instead (see Step 3 format matrix).
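A sketch of the deficit-proportional allocation, assuming weights are fractions summing to 1 (the rounding strategy here - remainders handed to the largest deficits - is one reasonable choice, not prescribed by the skill):

```python
def allocate(n: int, weights: dict, backlog_counts: dict) -> dict:
    """Distribute n topics across clusters proportionally to backlog deficits.

    weights: target share per cluster, e.g. {"n8n": 0.3, "ai-coding": 0.2, ...}
    backlog_counts: current queued-topic counts per cluster.
    """
    total_backlog = sum(backlog_counts.get(c, 0) for c in weights)
    target_total = total_backlog + n
    # deficit = how far each cluster is below its target share after adding n
    deficits = {
        c: max(0.0, w * target_total - backlog_counts.get(c, 0))
        for c, w in weights.items()
    }
    deficit_sum = sum(deficits.values()) or 1.0
    alloc = {c: int(n * d / deficit_sum) for c, d in deficits.items()}
    # hand out integer-rounding leftovers to the largest deficits
    leftover = n - sum(alloc.values())
    for c in sorted(deficits, key=deficits.get, reverse=True)[:leftover]:
        alloc[c] += 1
    return alloc
```

A cluster already over its target share receives nothing; a freshly added cluster absorbs most of the batch.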
For each cluster, walk its source list. For each source URL:
- Fetch it with WebFetch (or WebSearch for SERP-derived signals).
- Pull candidate problem strings out of the body (lines containing Error:, TypeError:, Traceback, exception class names, stack-frame headers).
- Google-side seeds: use WebSearch with these patterns and parse for autocomplete + PAA. Run the full set per cluster, not just the troubleshooting one - long-tail diversity is what stops the corpus drifting into all-how-to-fix:
| Pattern | Format target |
|---|---|
| <tool> <error string>, <tool> not working, <tool> stuck, <tool> fails | how-to-fix |
| <tool> how to <verb>, <tool> connect to <other tool> | how-to-connect / how-to-automate |
| <tool> vs <competitor>, <tool> or <competitor> | x-vs-y |
| what is <feature>, <feature> explained, how does <feature> work | what-is |
| build <outcome> with <tool>, <tool> for <vertical> (for ecommerce, for SaaS, for marketing, for customer support), <tool> agent for <task> | use-case |
| best <tool category>, top <N> <tool category>, free <tool>, <tool> alternatives, <tool> templates for <vertical> | listicle |
| migrate from <tool A> to <tool B>, switch from <tool A> to <tool B>, <tool A> to <tool B> migration | migration |
| what's new in <tool>, <tool> changelog, <tool> <recent-version> features, <tool> release notes | release-recap |
Repeat the matrix per tool / framework / vertical in the cluster.
Per-format source pointers (in addition to the per-cluster source list above):
- use-case - vendor template galleries are the highest-signal source. Each gallery entry is a verified user-demand signal (people search the named outcome). Reddit threads like "how do I [outcome] with [tool]" with many comments count too. Drop any seed whose only signal is an existing SEO blog's listicle - that's a competitor signal, not a user-demand signal.
- listicle - Google autocomplete depth-2 on best <category> and top <N> <category>; competitor roundup SERPs (look at what the top 3 results list). Reddit posts asking "what's the best X for Y" with many comments score high.
- migration - Reddit threads with titles starting moving from / switching from; Stack Overflow questions about exporting + re-importing between two tools; vendor migration docs (the doc exists because users ask).
- release-recap - the vendor changelogs already in the source list, but mine for specific versions shipped in the last 90 days. A release-recap post for <tool> <version> is only worth writing if (a) that version is current or one minor behind, and (b) the changelog has at least one user-facing entry, not just internal refactors. Older recaps go stale fast.

Each extracted query becomes a candidate. The source is its first demand signal.
For each candidate that survived Step 3, append one more qualifier word and re-query Google Suggest. Keep any deeper completion that returns a new variant. This is what gets you actual long-tail vs mid-tail - one autocomplete pass surfaces "how to fix n8n webhook error", a second surfaces "how to fix n8n webhook error 404 after restart".
Common qualifiers to try (try several, keep what returns a real completion):
error, not working, after update, stuck, slow, timeout, free, limit, vs <known competitor>, <current year>, self-hosted, cloud, docker
Cap at 2 recursive passes to bound runtime. Each new completion inherits the parent's first signal and must still pass Step 4 on its own.
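The capped two-pass expansion can be sketched with the suggest lookup injected as a function (in the skill it is backed by WebSearch; the lookup mechanics are abstracted away here so only the recursion logic is shown):

```python
def expand_long_tail(seed: str, qualifiers: list, suggest, max_depth: int = 2) -> set:
    """Recursively append qualifiers and keep completions that return new variants.

    suggest(query) returns the list of autocomplete completions for a query.
    Capped at max_depth recursive passes to bound runtime.
    """
    found = set()
    frontier = {seed}
    for _ in range(max_depth):
        next_frontier = set()
        for query in frontier:
            for q in qualifiers:
                for completion in suggest(f"{query} {q}"):
                    if completion not in found and completion != seed:
                        found.add(completion)       # new long-tail variant
                        next_frontier.add(completion)  # dig one level deeper next pass
        frontier = next_frontier
    return found
```

Each returned variant inherits the parent's first signal and still has to pass Step 4 on its own.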
For each candidate that survived Step 3b, build the writer-facing scaffold by re-reading the body of the highest-engagement demand signal (the one that scored 2 or 3 in the strength table - usually a high-reaction GitHub issue, pinned forum thread, or popular SO question). If you already fetched the body in Step 3.4, reuse it; otherwise fetch now.
Extract four fields:
problem_summary - 1-2 sentences in writer-voice describing the symptom + trigger. Factual, no marketing copy. Example: "n8n's HTTP Request node returns 401 when an OAuth2 credential's access token has expired and the refresh-token grant is missing the offline_access scope." Pull verbs and nouns from the body; don't paraphrase the title.
confirmed_fixes[] - list of {kernel, source} entries. Each kernel is one short phrase capturing the action (env var to set, version to downgrade to, setting toggle, code edit). The source URL is where the fix is reported (forum reply, vendor doc, GitHub commit, changelog entry). Walk the thread replies, accepted-answer block, vendor "common errors" page. Skip noise: ignore "have you tried restarting" or fixes contradicted by later replies. If the issue is genuinely still open with no working fix, leave confirmed_fixes[] empty - the writer will frame the post around "what's known so far" rather than fabricating a fix.
version_context - extract any version-qualified phrase the body or thread uses to scope the problem ("after upgrade to n8n 1.65", "Cursor 0.42 only", "introduced in CrewAI 0.114", "Power Automate desktop V2"). If multiple versions are named, pick the one most closely tied to the failure. If the body is version-agnostic, set null.
question_variants[] - 2-4 near-paraphrases of the topic that surface in PAA boxes, autocomplete depth-2 completions, sibling forum thread titles, or SO related-questions. Each variant must be a real captured string from a source - do not invent paraphrases. These feed the writer's FAQ block and LSI keyword spread.
If any of these can't be filled honestly from the body, leave the field at its empty default ("", [], null) rather than fabricating. A sparsely-filled scaffold is more useful than a hallucinated one - the writer can re-research, but cannot un-trust a polluted blob.
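Version-context extraction is fundamentally a reading task for the agent, but a regex sketch shows the shape of what gets pulled (patterns are illustrative and will miss phrasings real bodies use):

```python
import re

def extract_version_context(body: str):
    """Return the first version-qualified phrase found, or None if the body is version-agnostic."""
    # "after upgrading to n8n 1.65", "introduced in CrewAI 0.114", ...
    m = re.search(r"(?:after upgrad\w+ to|introduced in)\s+([A-Za-z][\w-]*\s+v?\d[\w.]*)", body)
    if m:
        return m.group(1)
    # "Cursor 0.42 only"
    m = re.search(r"([A-Za-z][\w-]*\s+v?\d[\w.]*\s+only)", body)
    if m:
        return m.group(1)
    return None  # version-agnostic -> version_context stays null
```

Returning None when nothing matches is the point: a null version_context is an honest answer, not a failure.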
For each candidate:
Cluster - derived from the tool / framework / vertical named in the topic. If the topic spans two clusters, pick the one with the higher-strength demand signal.
Format - match phrasing (evaluate in order; first match wins):
1. migrate from X to Y / switch from X to Y / move from X to Y -> migration
2. what's new in <tool> <version> / <tool> changelog / <tool> release notes / <tool> v<N> features -> release-recap
3. best X for Y / top N X / free X / X alternatives / most popular X -> listicle
4. X vs Y / X or Y -> x-vs-y
5. <error string> / not working / fix / fails / broken -> how-to-fix
6. connect X to Y / integration / integrate X with Y -> how-to-connect
7. build <outcome> with <tool> / <outcome> using <tool> / <tool> for <vertical/use-case> -> use-case
8. automate X / <tool> workflow for Y -> how-to-automate
9. what is X / X explained / how does X work -> what-is

Ambiguity rules: prefer use-case over how-to-automate when the title names a concrete artefact ("daily Slack digest", "invoice extraction agent"); prefer how-to-automate for generic processes ("automate email triage"). Prefer listicle for >=3 options and x-vs-y for exactly 2.
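The first-match-wins ordering can be sketched as an ordered rule list (the regexes are simplified stand-ins for the phrasings above, not exhaustive):

```python
import re

# Ordered rules; first regex that matches the lowercased title wins.
FORMAT_RULES = [
    ("migration",       r"\b(migrate|migration|switch from|move from)\b"),
    ("release-recap",   r"\b(what'?s new|changelog|release notes)\b"),
    ("listicle",        r"\b(best|top \d+|alternatives|most popular)\b"),
    ("x-vs-y",          r"\bvs\.?\b"),
    ("how-to-fix",      r"\b(error|not working|fix|fails|broken)\b"),
    ("how-to-connect",  r"\b(connect|integrat\w+)\b"),
    ("use-case",        r"\b(build|using)\b"),
    ("how-to-automate", r"\b(automate|workflow)\b"),
    ("what-is",         r"\b(what is|explained|how does)\b"),
]

def classify_format(title: str) -> str:
    t = title.lower()
    for fmt, pattern in FORMAT_RULES:
        if re.search(pattern, t):
            return fmt
    return "what-is"  # fallback; real candidates should hit a rule
```

Keeping the rules in one ordered list makes the precedence (migration before listicle before x-vs-y, etc.) explicit and auditable.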
Cannibalization - three-layer check against the existing-titles cache built in Step 1:
- Token-dupe: exact token-set match against an existing title -> drop under cannibalization-token-dupe.
- Jaccard prefilter: heavy token overlap with an existing title's token set -> drop under cannibalization-jaccard (near-identical titles share most of their token core - e.g. a shared set like [40-second, make] flags it).
- Semantic: cosine similarity between the candidate's embedding and each cached title embedding -> drop the closest matches under cannibalization-semantic; a middle REVIEW band is surfaced as a dedup_review line instead of being dropped.

If no embedding cache exists, run Jaccard only and append a cannibalization: jaccard-only note to the summary footer so the user knows the check was reduced.

Score signals - assign each signal 1-3 per the strength tier table. If signal_score < 3, fetch one more SERP and look for additional signals (PAA, second forum thread, SO question, vendor doc). If still < 3, drop under low-signal-score.
Specificity floor - title must be >=7 words OR contain a concrete qualifier (version number, error code/string, named integration pair, named edge case). If neither holds, try to rewrite the title using a qualifier from the source text; if that's not possible, drop under low-specificity.
Primary source - find at least one vendor doc / GitHub issue / official changelog URL the writer can cite. If none, drop.
Keywords - extract the primary keyword (the topic's core noun phrase) plus 3-5 LSI variants from the source text. Do not invent variants.
Print one block per accepted topic in this exact shape so the user can scan or pipe it:
[01/50] cluster=<cluster> format=how-to-fix priority=2
TOPIC: How to fix the n8n "Cannot read properties of undefined" error in Code node
slug: n8n-cannot-read-properties-undefined-code-node
keywords: n8n Code node error, undefined property, JavaScript error, item access, $json
commentary: Specific to mistakes when accessing $json on the wrong item; vendor docs do not show the failure mode.
problem: The n8n Code node throws "Cannot read properties of undefined" when the script accesses $json on an item index that doesn't exist (typically the second iteration of a for-loop reading $items[1].json with only one input item).
fixes:
- guard with $items[i]?.json before accessing (https://github.com/n8n-io/n8n/issues/<id>#issuecomment-<n>)
- use $input.item.json instead of $items[i] when iterating (https://docs.n8n.io/code/builtin/data/)
version_context: n/a
question_variants:
- "n8n Code node Cannot read properties of undefined reading 'json'"
- "n8n JavaScript error TypeError undefined json"
- "Code node loop fails when item missing"
demand: (signal_score=4)
- github_issue: https://github.com/n8n-io/n8n/issues/<id> ("Code node throws Cannot read properties of undefined") [strength=3, 14 reactions]
- reddit: https://www.reddit.com/r/n8n/comments/<id> ("Help: Code node error when looping over items") [strength=1]
sources:
- https://docs.n8n.io/code/builtin/data/ (n8n docs - Built-in data variables)
If confirmed_fixes is empty (open issue with no working fix), print fixes: (none documented yet - frame as 'what's known so far') instead of an empty list. If version_context is null, print version_context: n/a. If question_variants is empty, omit the line entirely (rare but valid).
If the topic landed in the REVIEW band of the cannibalization check, append a dedup_review line so the user can decide knowingly:
dedup_review: cosine=0.81 vs "n8n Code node returns undefined when reading $json" (backlog-queued)
Then a summary footer:
Requested: 50
Validated: <X>
Dropped: <Y> (cannibalization-jaccard: <a1>, cannibalization-semantic: <a2>, cannibalization-token-dupe: <a3>, low-signal-score: <b>, no primary source: <c>, off-cluster: <d>, low-specificity: <e>)
Cluster mix: <cluster1>=__ <cluster2>=__ ...
Format mix: how-to-fix=__ how-to-connect=__ how-to-automate=__ x-vs-y=__ what-is=__ use-case=__ listicle=__ migration=__ release-recap=__
If --append-to <path> was passed: print the diff (count + cluster mix changes), ask Append <X> topics to <path>? [y/N]. On y, write each topic as JSON to the target path with:
{
"id": "<slug>",
"topic": "<title>",
"cluster": "<cluster>",
"format": "<format>",
"priority": 100,
"status": "queued",
"tags": ["<format-tag>", "<tool-tag>"],
"notes": "<commentary>",
"added_at": "<YYYY-MM-DD>",
"research_proof": {
"demand_signals": [...],
"primary_sources": [...],
"keywords": [...],
"problem_summary": "<1-2 factual sentences from the highest-engagement signal's body>",
"confirmed_fixes": [
{"kernel": "<short fix phrase>", "source": "<url>"},
{"kernel": "<short fix phrase>", "source": "<url>"}
],
"version_context": "<version qualifier or null>",
"question_variants": ["<variant 1>", "<variant 2>", "<variant 3>"]
},
"published_slug": null,
"published_at": null
}
The research_proof blob is preserved on the backlog entry while status = "queued" so a downstream writer skill can read it later (saves re-research). When the topic ships, strip the proof blob to keep the backlog file from ballooning at scale; the published post carries the citations.
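The queued-entry write and the publish-time proof strip might be sketched as follows (function names are hypothetical; the skill's actual append mechanics may differ):

```python
import json
from datetime import date

def queue_entries(topics: list, today: str) -> list:
    """Normalize accepted topics into backlog entries, preserving research_proof while queued."""
    entries = []
    for t in topics:
        entry = dict(t)
        entry.setdefault("status", "queued")
        entry.setdefault("added_at", today)
        entries.append(entry)
    return entries

def strip_proof_on_publish(entry: dict) -> dict:
    """When the topic ships, drop the proof blob so the backlog file doesn't balloon."""
    slim = dict(entry)
    slim.pop("research_proof", None)
    return slim

def append_to_backlog(path: str, topics: list) -> None:
    """Read-modify-write the JSON backlog file (the ask-first confirmation happens upstream)."""
    try:
        with open(path) as f:
            backlog = json.load(f)
    except FileNotFoundError:
        backlog = []
    backlog.extend(queue_entries(topics, date.today().isoformat()))
    with open(path, "w") as f:
        json.dump(backlog, f, indent=2)
```

The split between queue_entries and append_to_backlog keeps the normalization testable without touching the filesystem.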
Hard rules (recap):

- The evidence string is copied byte-for-byte. No paraphrase. Length cap 120 chars; truncate with ... if longer.
- Topics below signal_score 3 are dropped under low-signal-score. The strength tier table is the only authority - do not invent your own scoring.
- Titles that fail the specificity floor are dropped under low-specificity. Long-tail is proven by what's in the title, not by hand-waving "this is specific".
- Never fabricate search-volume numbers - no invented "5400 / mo" figure; demand is proven by citable URLs.
- With --append-to <path> the skill adds accepted topics to a backlog as queued; the user picks what gets written next.

Quick reference:

- Command: research <N> topics [for cluster <C>] [--append-to <path>]
- Writer scaffold: problem_summary, confirmed_fixes[], version_context, question_variants[]. Empty defaults are valid; never fabricate.
- Acceptance gates: signal_score >= 3, specificity floor, >=1 primary source, real keywords.
- On --append-to <path>, confirm with the user, then write to the backlog file with the proof blob preserved.