Ielts Extractor

v1.1.0

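Automatically extracts multi-page reading passages and their questions from Cambridge IELTS PDFs, handles two-column layouts, and saves the result as structured JSON.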

0 stars · 289 downloads · 1 current · 1 all-time
by Xuanyu Chen (@lava-chen)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for lava-chen/ielts-extractor.

Install the skill "Ielts Extractor" (lava-chen/ielts-extractor) from ClawHub.
Skill page: https://clawhub.ai/lava-chen/ielts-extractor
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Canonical install target

openclaw skills install lava-chen/ielts-extractor

ClawHub CLI

npx clawhub@latest install ielts-extractor

Security Scan

VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)

Purpose & Capability
The name/description promise: extract multi-page Cambridge IELTS passages and questions, handle two-column layout, and save structured JSON. The included Python (extract_article.py) only locates a passage and concatenates page text (no question parsing, no JSON output, and the two-column helper is defined but not used). References describe image extraction with PyMuPDF and JSON schema, but those behaviors are not implemented in the code.
Instruction Scope
SKILL.md describes locating passages, extracting multi-page text, handling two-column layout, extracting question groups and options, and saving JSON to specific ielts-tracker paths. The runtime instructions reference writing files into project directories (ielts-tracker/.../public/images/) and expect question extraction, but the actual code does not perform those writes or parse questions. Instructions don't request unrelated credentials or environment variables.
Install Mechanism
No install spec (instruction-only) — lower risk — but the code imports pdfplumber and references fitz/PyMuPDF in docs. Dependencies are not declared; runtime may fail or require installing third-party packages. No external downloads or suspicious URLs are present.
Credentials
The skill requests no environment variables, no credentials, and no config paths. However, SKILL.md and references expect writing output files into project paths which could overwrite local files if run with file-system access; this is expected for a data-extraction tool but worth noting.
Persistence & Privilege
always is false and the skill is user-invocable. It does not request permanent agent inclusion, modify other skills, or ask for elevated privileges.
What to consider before installing
This skill is internally inconsistent rather than obviously malicious: its description and docs promise full passage+question extraction and JSON output, but the shipped code only extracts concatenated article text and prints it. Before using, ask the author to (1) implement or remove question-parsing and JSON write logic, (2) declare/install dependencies (pdfplumber, and optionally PyMuPDF/fitz), and (3) confirm where files will be written to avoid overwriting local project files. Test the script on non-sensitive sample PDFs in a sandbox or disposable directory to verify behavior. If you don't trust the author, do not run the code on sensitive systems or give it write access to important directories.

Like a lobster shell, security has layers — review code before you run it.

latestvk975qyb5gf47bb334easv11f1182fxrr
289 downloads
0 stars
7 versions
Updated 1mo ago
v1.1.0
MIT-0

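IELTS Test Data Extraction Skill

Overview

Extracts reading passages and questions from Cambridge IELTS PDFs.

Trigger

Use when the user asks to "extract IELTS test questions" (提取雅思试题).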

Workflow (4 steps)

1. Locate the PDF and page number

  • Search for "Test X" + "READING PASSAGE Y"
  • Record the starting page number
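
A minimal sketch of this lookup, assuming the text of each PDF page is already available as a list of strings (for example from pdfplumber's page.extract_text(), the library the shipped script imports). The function name find_passage_start and the idea of tracking the most recent "Test N" heading are illustrative assumptions, not the skill's own code.

```python
import re

# Headings as they are described in step 1; matching is case-sensitive on purpose.
TEST_RE = re.compile(r"\bTest\s+(\d+)\b")
PASSAGE_RE = re.compile(r"\bREADING\s+PASSAGE\s+(\d+)\b")


def find_passage_start(page_texts, test_no, passage_no):
    """Return the 0-based index of the page where the passage starts, or None.

    page_texts is a list with one string per PDF page (hypothetical input).
    The most recent "Test N" heading seen so far decides which test a
    "READING PASSAGE Y" heading belongs to.
    """
    current_test = None
    for index, text in enumerate(page_texts):
        tests = TEST_RE.findall(text)
        if tests:
            # A page may mention several tests (e.g. a contents page);
            # keep the last one and let later pages correct it.
            current_test = int(tests[-1])
        passages = [int(n) for n in PASSAGE_RE.findall(text)]
        if current_test == test_no and passage_no in passages:
            return index
    return None
```

Returning a 0-based index keeps the result directly usable for list slicing; add 1 if a printed page number is wanted.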

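2. Extract the passage

  • Extract every consecutive page of the passage, not just the first
  • Handle the two-column layout
  • Check the word count: 1500-2500 words per passage

See: references/pdf-extraction.md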
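
One possible way to cover these three checks, sketched under assumptions: each page is given as a list of word dictionaries with "text", "x0" and "top" keys (the shape pdfplumber's page.extract_words() returns), and the two columns are split at the horizontal midpoint of the page. The function names and the midpoint rule are hypothetical; references/pdf-extraction.md may describe a different approach.

```python
def order_two_column_words(words, page_width):
    """Return word texts in reading order: whole left column, then right column.

    Assumption: a word belongs to the left column when its x0 is left of
    the page midpoint. Rounding "top" groups words on one line; real PDFs
    may need a looser tolerance.
    """
    middle = page_width / 2
    left = [w for w in words if w["x0"] < middle]
    right = [w for w in words if w["x0"] >= middle]

    def reading_order(column):
        return sorted(column, key=lambda w: (round(w["top"]), w["x0"]))

    return [w["text"] for w in reading_order(left) + reading_order(right)]


def join_pages(pages_of_words, page_width):
    """Concatenate consecutive pages into one passage string."""
    parts = []
    for words in pages_of_words:
        parts.extend(order_two_column_words(words, page_width))
    return " ".join(parts)


def word_count_in_range(text, low=1500, high=2500):
    """Check the passage length against the range given in step 2."""
    return low <= len(text.split()) <= high
```

A count outside the range is a hint that a page was skipped or that pages from a neighbouring passage were included.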

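3. Extract the questions

  • Group questions by question set
  • Choose the correct question type
  • Include the complete options (A-D for every single-choice question)

See: references/question-types.md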
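
The option rule in step 3 could be checked with something like the sketch below. The dictionary layout (keys "questions", "number", "options", "label") is a hypothetical stand-in, since the real schema lives in references/json-format.md; only the "type" key and the multiple-choice-single value come from this page.

```python
REQUIRED_SINGLE_CHOICE_LABELS = ["A", "B", "C", "D"]


def check_question_group(group):
    """Return a list of problems found in one question group.

    Only single-choice groups are checked: every question must carry
    exactly the options A, B, C, D in that order.
    """
    problems = []
    if group.get("type") == "multiple-choice-single":
        for question in group.get("questions", []):
            labels = [option["label"] for option in question.get("options", [])]
            if labels != REQUIRED_SINGLE_CHOICE_LABELS:
                problems.append(
                    f"question {question.get('number')}: options {labels} are not A-D"
                )
    return problems
```

An empty list means the group passed; any entry names the question whose options are incomplete.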

4. Save the JSON

  • Store the passage text in the content field
  • Make sure the options are formatted correctly

See: references/json-format.md
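
A hedged sketch of the save step follows. Only the content field is taken from this page; the function name save_passage, the overwrite flag, and any other keys in the dictionary are assumptions. Refusing to overwrite by default reflects the scan's note that writing into project paths could replace local files.

```python
import json
from pathlib import Path


def save_passage(passage, path, overwrite=False):
    """Write one passage dictionary to a JSON file and return its path.

    Raises ValueError when the "content" field is missing or empty, and
    FileExistsError when the target exists and overwrite is False.
    """
    if not str(passage.get("content", "")).strip():
        raise ValueError("passage needs a non-empty 'content' field")
    path = Path(path)
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} already exists")
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps any non-ASCII text readable in the file.
    path.write_text(
        json.dumps(passage, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path
```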

Data files

ielts-tracker/data/tests/cambridge-{4,5,6}/test-{1-4}/test.json
ielts-tracker/ielts-app/public/images/
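
The test.json pattern above can be expanded into concrete paths as sketched here. This reads test-{1-4} as tests 1 through 4, which is an assumption about the shorthand; the helper name expected_json_paths is hypothetical.

```python
from pathlib import Path


def expected_json_paths(root="ielts-tracker/data/tests", books=(4, 5, 6), tests=(1, 2, 3, 4)):
    """List every test.json path implied by the pattern, book by book."""
    return [
        Path(root) / f"cambridge-{book}" / f"test-{test}" / "test.json"
        for book in books
        for test in tests
    ]
```

Comparing this list with what exists on disk shows which tests still need to be extracted.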

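Question type quick reference

Question type                 type
Heading matching              matching-headings
Judgment (Yes/No/Not Given)   yes-no-not-given
Single choice                 multiple-choice-single
Multiple choice               multiple-answer
Table                         table-completion
Fill in the blank             fill-blank-summary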
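
To keep type values consistent when building the JSON, the six identifiers listed above could be held in an enumeration, as in this sketch. The class and function names are illustrative; only the string values come from the quick reference.

```python
from enum import Enum


class QuestionType(str, Enum):
    """The six type identifiers from the quick reference."""

    MATCHING_HEADINGS = "matching-headings"
    YES_NO_NOT_GIVEN = "yes-no-not-given"
    MULTIPLE_CHOICE_SINGLE = "multiple-choice-single"
    MULTIPLE_ANSWER = "multiple-answer"
    TABLE_COMPLETION = "table-completion"
    FILL_BLANK_SUMMARY = "fill-blank-summary"


def parse_question_type(value):
    """Normalise a string and return the matching QuestionType.

    Raises ValueError for anything that is not one of the six identifiers.
    """
    return QuestionType(value.strip().lower())
```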
