Install
openclaw skills install tra-extract-text
Extract readable text, Markdown, HTML, JSON, or XML content from web pages using the trafilatura CLI tool, with optional metadata and output formatting.
pip install trafilatura
trafilatura -u URL --markdown
trafilatura -u URL --output-format txt
trafilatura -u URL --markdown > output.md
trafilatura -u URL --output-format txt > output.txt
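The redirect pattern above can be wrapped in a small guard so that a missing install or an empty extraction does not go unnoticed. This is a minimal sketch, not part of the skill itself: `extract_markdown` and the `TRAFILATURA_BIN` override are hypothetical names introduced here, and the empty-file check assumes that a failed extraction may produce no output rather than a useful exit status.

```shell
# Hypothetical helper: save a page as Markdown and fail loudly on empty output.
# TRAFILATURA_BIN is an assumed override (handy for testing); it defaults to
# the real `trafilatura` command.
extract_markdown() {
    url=$1
    out=$2
    bin=${TRAFILATURA_BIN:-trafilatura}

    # Stop early if the CLI is not installed.
    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "$bin not found; try: pip install trafilatura" >&2
        return 127
    fi

    # Same redirect as in the usage lines above.
    "$bin" -u "$url" --markdown > "$out" || return 1

    # Treat an empty file as a failed extraction and clean it up.
    if [ ! -s "$out" ]; then
        echo "no text extracted from $url" >&2
        rm -f "$out"
        return 2
    fi
}
```

A call such as `extract_markdown "https://example.com/post" post.md` would then leave `post.md` behind only when some text was actually extracted.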
| Option | Description |
|---|---|
| -u, --URL | Target URL |
| --markdown | Output as Markdown |
| --output-format txt | Output as plain text (default) |
| --html | Output as HTML |
| --json | Output as JSON |
| --xml | Output as XML |
| -o, --output-dir | Write results to a directory instead of stdout |
| --with-metadata | Include metadata (title, author, date) |
| --license | Show license info |
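To make the relationship between format flags and output files concrete, the following sketch maps a format name to its flag and to a file extension. The helpers `format_flag`, `format_ext`, and `save_page`, as well as the `TRAFILATURA_BIN` override, are hypothetical names used only for illustration; only the `--markdown`, `--html`, `--json`, `--xml`, and `--with-metadata` flags from the table are used.

```shell
# Hypothetical helper: print the CLI flag for a format name.
format_flag() {
    case "$1" in
        markdown|html|json|xml) printf '%s\n' "--$1" ;;
        *) echo "unsupported format: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print a file extension for a format name.
format_ext() {
    case "$1" in
        markdown) echo md ;;
        *) echo "$1" ;;
    esac
}

# Save URL ($1) in format ($2) with metadata, writing to "<base>.<ext>" ($3 is base).
save_page() {
    url=$1
    fmt=$2
    base=$3
    flag=$(format_flag "$fmt") || return 1
    "${TRAFILATURA_BIN:-trafilatura}" -u "$url" "$flag" --with-metadata \
        > "$base.$(format_ext "$fmt")"
}
```

For instance, `save_page "https://example.com/post" json post` would write `post.json`, while an unknown format name is rejected before any download is attempted.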
Extract a Medium article to Markdown:
trafilatura -u "https://medium.com/example/article" --markdown
Extract and save:
trafilatura -u "https://news.example.com/article" --markdown > article.md
Extract with metadata:
trafilatura -u "https://example.com/post" --markdown --with-metadata
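When several articles need saving, the single-URL commands above can be looped over a list of URLs, with each file name derived from its URL. This is a rough sketch under stated assumptions: `url_to_filename`, `save_all`, the `TRAFILATURA_BIN` override, and the one-URL-per-line list file are all hypothetical and not part of trafilatura or the skill.

```shell
# Hypothetical helper: turn a URL into a flat Markdown file name, e.g.
# https://news.example.com/article -> news-example-com-article.md
url_to_filename() {
    slug=$(printf '%s' "$1" | sed \
        -e 's|^[a-zA-Z]*://||' \
        -e 's|[^A-Za-z0-9]|-|g' \
        -e 's|--*|-|g' \
        -e 's|^-||' \
        -e 's|-$||')
    printf '%s.md\n' "$slug"
}

# Extract every URL listed in file $1 (one per line) into directory $2.
save_all() {
    list=$1
    outdir=${2:-.}
    while IFS= read -r url; do
        [ -n "$url" ] || continue    # skip blank lines
        # stdin is redirected so the command cannot consume the URL list.
        "${TRAFILATURA_BIN:-trafilatura}" -u "$url" --markdown \
            > "$outdir/$(url_to_filename "$url")" < /dev/null
    done < "$list"
}
```

With a file `urls.txt` holding one URL per line, `save_all urls.txt articles` would write one `.md` file per URL into an existing `articles` directory.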