ZeeLin Patent Retriever

v0.1.2

Team ZeeLin’s production-grade patent evidence retrieval skill for Google Patents BigQuery. Converts natural-language research intent into auditable multi-ro...

byYuwen Yang@yangyuwen-bri
Security Scan

  • VirusTotal: Benign
  • OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description claim BigQuery patent retrieval; the bundle accordingly requires python3 and Google Cloud credentials (GOOGLE_APPLICATION_CREDENTIALS, GOOGLE_CLOUD_PROJECT) and includes BigQuery client code. These requirements are expected and proportionate.
Instruction Scope
SKILL.md instructs installing listed Python deps and running the included scripts (seed -> build_plan -> execute_plan). The scripts only read/write local files, build SQL queries for BigQuery, and echo planned/effective filters. They do not reference unrelated system paths or remote endpoints beyond Google Cloud.
Install Mechanism
There is no remote download/install hook in the skill bundle. Dependencies are declared in requirements.txt (python-dotenv, google-cloud-bigquery, jsonschema) which is appropriate for a Python BigQuery client tool.
Credentials
Only GOOGLE_APPLICATION_CREDENTIALS and GOOGLE_CLOUD_PROJECT are required, which are the normal credentials for BigQuery access. The only additional behavior is optional loading of a local .env (scripts/config.py), which is a convenience and not a hidden exfiltration channel.
Persistence & Privilege
The skill is not always-enabled, does not modify other skills, and does not request persistent elevated platform privileges. It writes outputs to local result files as expected for a retrieval tool.
Assessment
This bundle appears coherent for querying Google Patents via BigQuery. Before running:

  • Follow least privilege for the GOOGLE_APPLICATION_CREDENTIALS you provide (BigQuery query access only, plus storage if needed) to limit blast radius.
  • Remember that BigQuery queries can incur cost; inspect and test queries with low limits first.
  • Avoid committing credentials (.env) to the repository; the scripts load a local .env if present.
  • Review the generated SQL in a safe environment if you have sensitive billing or data constraints.
  • Validate outputs and JSON schemas before using results for downstream decisions.


Runtime requirements

Bins: python3
Env: GOOGLE_APPLICATION_CREDENTIALS, GOOGLE_CLOUD_PROJECT
License: MIT-0

ZeeLin Patent Retriever

Team ZeeLin skill for Google Patents retrieval via BigQuery. This skill performs patent retrieval and structured output generation only. It does not provide legal conclusions.

30-Second Quickstart Card

Purpose:

  • Fetch, deduplicate, and structure patent evidence from Google Patents BigQuery for downstream analysis.

Required env:

  • GOOGLE_APPLICATION_CREDENTIALS
  • GOOGLE_CLOUD_PROJECT

Run this:

python3 -m pip install -r requirements.txt
RUN_ID="quick_$(date +%Y%m%d_%H%M%S)"; RUN_DIR="results/${RUN_ID}"; mkdir -p "$RUN_DIR"
python3 scripts/patent_search.py --keywords "ai sentiment analysis" --limit 80 --output "$RUN_DIR/seed_raw.json"
python3 scripts/build_query_plan.py --topic "Public Opinion + AI" --keywords "public opinion ai sentiment" --task-id "$RUN_ID" --seed-raw "$RUN_DIR/seed_raw.json" --concept-output "$RUN_DIR/concept_scan.json" --plan-output "$RUN_DIR/query_plan.json"
python3 scripts/patent_search_plan.py --plan "$RUN_DIR/query_plan.json" --output-raw "$RUN_DIR/retriever_raw.json" --output-retriever "$RUN_DIR/retriever_result.json" --min-results 20

Expected outputs:

  • $RUN_DIR/concept_scan.json
  • $RUN_DIR/query_plan.json
  • $RUN_DIR/retriever_raw.json
  • $RUN_DIR/retriever_result.json

If it fails:

  • Missing env vars: configure Google credentials first.
  • Too few results: keep filters and increase limits/expansion rounds before relaxing constraints.

1. Execution Rules

  1. Use the three-stage flow by default: seed -> build_plan -> execute_plan.
  2. Default minimum result count is 20 unless the user explicitly requests another value.
  3. If the user specifies hard constraints (year, country, assignee, inventor, IPC/CPC), they must be applied in query_plan.json (filters) before execution.
  4. Before execution, echo planned filters. After execution, echo effective filters, result size, and output file paths.
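Rule 4's pre-execution echo could be sketched as below. The `query_rounds[*].filters` plan layout is taken from the Step 3 patch example; the helper name is hypothetical:

```python
import json
from pathlib import Path

def echo_planned_filters(plan_path: str) -> None:
    """Print each round's planned filters before execution (rule 4)."""
    plan = json.loads(Path(plan_path).read_text(encoding="utf-8"))
    for i, r in enumerate(plan.get("query_rounds", []), start=1):
        # Echo the filters exactly as they will be submitted.
        print(f"round {i} planned filters: {r.get('filters', {})}")
```

After execution, the same pattern applies to effective filters, result size, and output paths (section 6).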

2. Pre-Run Checks

Required environment variables:

  • GOOGLE_APPLICATION_CREDENTIALS
  • GOOGLE_CLOUD_PROJECT

Install dependencies:

python3 -m pip install -r requirements.txt

Optional environment check:

python3 - <<'PY'
import os
required = ["GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_CLOUD_PROJECT"]
missing = [k for k in required if not os.getenv(k)]
print({"ok": not missing, "missing": missing})
PY

3. Capability Boundary and Parameter Sources

3.1 Supported filter dimensions

  • Text: keywords_all / keywords_any / keywords_anchor_any / keywords_not
  • Taxonomy: ipc_prefix_any / cpc_prefix_any
  • Entities: assignee_any / inventor_any
  • Geography: country_in
  • Date ranges: pub_date_from / pub_date_to / filing_date_from / filing_date_to

Field source: query_plan.json (schema: schemas/query_plan.schema.json).
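For orientation, a hypothetical filters object combining several of these dimensions might look like the following; the field names come from the list above, while the values are purely illustrative:

```python
# Illustrative filters object for one query round.
# Dates use the YYYYMMDD integer form shown in section 3.3.
filters = {
    "keywords_all": ["sentiment"],
    "keywords_any": ["public opinion", "emotion recognition"],
    "keywords_not": ["game"],
    "ipc_prefix_any": ["G06F"],
    "assignee_any": ["Tencent"],
    "country_in": ["US", "CN"],
    "pub_date_from": 20210101,
    "pub_date_to": 20231231,
}
```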

3.2 Default behavior for missing inputs

  • min_results: default 20
  • Country unspecified: default US,CN,WO,EP,JP,KR
  • Date range unspecified: default years_back=8
  • Keywords missing: ask for clarification and do not run
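These defaults could be resolved with a small helper like the sketch below; the function name and the parameter dict are assumptions for illustration, not part of the bundle:

```python
DEFAULT_COUNTRIES = ["US", "CN", "WO", "EP", "JP", "KR"]

def resolve_defaults(params: dict) -> dict:
    """Fill section 3.2 defaults; refuse to run without keywords."""
    if not params.get("keywords"):
        # Keywords missing: ask for clarification and do not run.
        raise ValueError("keywords are required; ask the user for clarification")
    resolved = dict(params)
    resolved.setdefault("min_results", 20)
    resolved.setdefault("country_in", DEFAULT_COUNTRIES)
    # Only apply the relative window when no explicit date range was given.
    if "pub_date_from" not in resolved and "pub_date_to" not in resolved:
        resolved.setdefault("years_back", 8)
    return resolved
```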

3.3 Year-to-date mapping rules

  • Single year (e.g. 2021) => from=20210101, to=20211231
  • Year range (e.g. 2021-2023) => from=20210101, to=20231231
  • Relative window (e.g. “last N years”) => use --years-back N
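A minimal sketch of these mapping rules, assuming year expressions arrive as plain strings:

```python
import re

def year_spec_to_dates(spec: str):
    """Map a year expression to (pub_date_from, pub_date_to) integers.

    Relative windows return None: pass --years-back N on the command
    line instead of a date range.
    """
    m = re.fullmatch(r"(\d{4})-(\d{4})", spec)
    if m:  # year range, e.g. "2021-2023"
        return int(m.group(1) + "0101"), int(m.group(2) + "1231")
    if re.fullmatch(r"\d{4}", spec):  # single year, e.g. "2021"
        return int(spec + "0101"), int(spec + "1231")
    if re.fullmatch(r"last \d+ years?", spec):  # relative window
        return None
    raise ValueError(f"unrecognized year expression: {spec!r}")
```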

4. Standard Flow (Command Templates)

Create a run directory first. RUN_DIR is exported so the inline Python script in Step 3 can read it from the environment:

export RUN_ID="run_$(date +%Y%m%d_%H%M%S)"
export RUN_DIR="results/${RUN_ID}"
mkdir -p "$RUN_DIR"

Step 1: Seed retrieval

python3 scripts/patent_search.py \
  --keywords "<keywords>" \
  --limit 80 \
  --output "$RUN_DIR/seed_raw.json"

Step 2: Build query plan

python3 scripts/build_query_plan.py \
  --topic "<topic>" \
  --keywords "<keywords>" \
  --task-id "$RUN_ID" \
  --years-back 8 \
  --country-in "US,CN,WO,EP,JP,KR" \
  --seed-raw "$RUN_DIR/seed_raw.json" \
  --concept-output "$RUN_DIR/concept_scan.json" \
  --plan-output "$RUN_DIR/query_plan.json"

Step 3: Apply explicit user constraints (critical)

When the user explicitly requests country/year/assignee filters, patch query_plan.json before execution.

python3 - <<'PY'
import json
import os
from pathlib import Path

plan_path = Path(os.environ["RUN_DIR"]) / "query_plan.json"
plan = json.loads(plan_path.read_text(encoding="utf-8"))

# Example override: 2021-2023 + US + keyword constraints
for r in plan.get("query_rounds", []):
    f = r.setdefault("filters", {})
    f["country_in"] = ["US"]
    f["pub_date_from"] = 20210101
    f["pub_date_to"] = 20231231
    f.setdefault("keywords_any", [])
    f["keywords_any"] = list(dict.fromkeys(f["keywords_any"] + ["sentiment", "public opinion", "risk"]))

plan_path.write_text(json.dumps(plan, ensure_ascii=False, indent=2), encoding="utf-8")
print({"updated": str(plan_path)})
PY

Step 4: Execute planned retrieval

python3 scripts/patent_search_plan.py \
  --plan "$RUN_DIR/query_plan.json" \
  --output-raw "$RUN_DIR/retriever_raw.json" \
  --output-retriever "$RUN_DIR/retriever_result.json" \
  --min-results 20

Step 5: Validate outputs

python3 scripts/schema_check.py --input "$RUN_DIR/concept_scan.json" --schema schemas/concept_scan.schema.json
python3 scripts/schema_check.py --input "$RUN_DIR/query_plan.json" --schema schemas/query_plan.schema.json
python3 scripts/schema_check.py --input "$RUN_DIR/retriever_result.json" --schema schemas/retriever_result.schema.json

5. Natural Language to Parameter Mapping Examples

Example A:

  • User input: Find US patents on AI public-opinion early warning from 2021 to 2023, at least 30 results
  • Mapping:
    • topic="AI public opinion early warning"
    • keywords="ai public opinion early warning sentiment"
    • Plan override: country_in=["US"], pub_date_from=20210101, pub_date_to=20231231
    • Execution arg: --min-results 30

Example B:

  • User input: Search multimodal emotion recognition patents in CN/JP/KR over the last 5 years, focus on Tencent and ByteDance
  • Mapping:
    • --years-back 5
    • country_in=["CN","JP","KR"]
    • assignee_any=["Tencent","ByteDance"]
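These constraints would be patched into query_plan.json before execution, using the same `query_rounds[*].filters` layout as Step 3; the helper name is hypothetical:

```python
import json
from pathlib import Path

def apply_example_b(plan_path: Path) -> None:
    """Patch Example B's country and assignee constraints into every round."""
    plan = json.loads(plan_path.read_text(encoding="utf-8"))
    for r in plan.get("query_rounds", []):
        f = r.setdefault("filters", {})
        f["country_in"] = ["CN", "JP", "KR"]
        f["assignee_any"] = ["Tencent", "ByteDance"]
    plan_path.write_text(json.dumps(plan, ensure_ascii=False, indent=2), encoding="utf-8")
```

The relative window (`--years-back 5`) is passed to build_query_plan.py directly, so it needs no plan patch.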

6. Post-Execution Response Template (required)

Retrieval completed.
Effective filters:
- Countries: ...
- Publication date range: ...
- Filing date range: ...
- Keywords (any/all/not): ...
- Assignee/Inventor filters: ...

Results:
- Patent count: ...
- Country distribution: ...
- Latest publication date: ...

Files:
- concept_scan: ...
- query_plan: ...
- retriever_raw: ...
- retriever_result: ...

7. Common Failures and Recovery

  • Missing environment variables: instruct user to configure Google credentials first.
  • Insufficient retrieval volume:
    1. Keep constraints, increase per-round limits.
    2. Increase expansion rounds.
    3. If still insufficient, ask whether to relax country/date constraints.
  • Cost risk: prioritize narrower date windows and country scopes before broad scans.
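Recovery step 1 (keep constraints, raise per-round limits) could be scripted as below; this assumes each query round carries a `limit` field, which is not confirmed by the bundle, and uses 80 (the seed default) as a fallback:

```python
import json
from pathlib import Path

def raise_round_limits(plan_path: Path, factor: int = 2) -> None:
    """Keep all filters intact and multiply each round's result limit."""
    plan = json.loads(plan_path.read_text(encoding="utf-8"))
    for r in plan.get("query_rounds", []):
        r["limit"] = r.get("limit", 80) * factor
    plan_path.write_text(json.dumps(plan, ensure_ascii=False, indent=2), encoding="utf-8")
```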

8. Output Contract

Required output files:

  • concept_scan.json
  • query_plan.json
  • retriever_raw.json
  • retriever_result.json

retriever_result.json minimum requirements:

  • patents count >= min_results (default 20)
  • each item includes publication_number and title
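A quick contract check, assuming retriever_result.json stores its items under a top-level `patents` key (a sketch, not part of the bundle; schema_check.py remains the authoritative validator):

```python
import json
from pathlib import Path

def check_retriever_result(path: str, min_results: int = 20) -> list:
    """Return a list of contract violations; an empty list means OK."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    patents = data.get("patents", [])
    problems = []
    if len(patents) < min_results:
        problems.append(f"only {len(patents)} patents; need >= {min_results}")
    for i, p in enumerate(patents):
        for field in ("publication_number", "title"):
            if not p.get(field):
                problems.append(f"patents[{i}] missing {field}")
    return problems
```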

9. References

  • Methodology: references/methodology.md
  • Quick examples: examples/quickstart.md
