tra-extract-text

v1.0.0

Extract the main content of web pages as plain text, Markdown, HTML, JSON, or XML using the trafilatura CLI, optionally including page metadata.

1 version · MIT-0
by Jay @goog

Install

openclaw skills install tra-extract-text

tra-extract-text

Extract text from web pages using the trafilatura command-line tool.

Installation

pip install trafilatura

Usage

Basic text extraction (Markdown)

trafilatura -u URL --markdown

Extract raw text (no formatting)

trafilatura -u URL --output-format txt

Output to file

trafilatura -u URL --markdown > output.md
trafilatura -u URL --output-format txt > output.txt
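The same redirect-to-file pattern can be scripted. The following is a minimal Python sketch, assuming trafilatura is installed and on PATH; the helper names `build_command` and `save_markdown` are hypothetical and not part of this skill or of trafilatura.

```python
import subprocess
from pathlib import Path


def build_command(url: str) -> list[str]:
    """Return the argument list for a Markdown extraction of `url`."""
    return ["trafilatura", "-u", url, "--markdown"]


def save_markdown(url: str, destination: str) -> Path:
    """Run the CLI and write whatever it prints on stdout to `destination`."""
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(
        build_command(url), capture_output=True, text=True, check=True
    )
    path = Path(destination)
    path.write_text(result.stdout, encoding="utf-8")
    return path
```

Calling `save_markdown("https://example.com/post", "output.md")` would then mirror the first shell command above; the URL and file name are placeholders.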

CLI Options

Option                 Description
-u, --URL              Target URL to download and extract
--markdown             Output as Markdown
--output-format txt    Output as plain text (default)
--html                 Output as HTML
--json                 Output as JSON
--xml                  Output as XML
-o, --output-dir       Write results to files in the given directory instead of stdout
--with-metadata        Include metadata (title, author, date)
--license              Show license info
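When the output format is chosen at runtime, the format flags from the table above can be looked up rather than hard-coded. This is a small illustrative sketch: `FORMAT_FLAGS` and `trafilatura_args` are hypothetical names, and only the four format flags listed in the table are covered.

```python
# Format flags copied from the options table; any other name is rejected.
FORMAT_FLAGS = {
    "markdown": "--markdown",
    "html": "--html",
    "json": "--json",
    "xml": "--xml",
}


def trafilatura_args(url, output_format="markdown", with_metadata=False):
    """Build the argument list for one extraction run (nothing is executed)."""
    if output_format not in FORMAT_FLAGS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    args = ["trafilatura", "-u", url, FORMAT_FLAGS[output_format]]
    if with_metadata:
        # Ask the CLI to include title, author, and date in the output.
        args.append("--with-metadata")
    return args
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with URLs that contain `&` or `?`.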

Examples

Extract a Medium article to markdown:

trafilatura -u "https://medium.com/example/article" --markdown

Extract and save:

trafilatura -u "https://news.example.com/article" --markdown > article.md

Extract with metadata:

trafilatura -u "https://example.com/post" --markdown --with-metadata
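If the metadata is needed programmatically, combining `--json` with `--with-metadata` and parsing the result may be more convenient than reading Markdown. The sketch below assumes the CLI prints a single JSON object whose keys include `title`, `author`, and `date` (names inferred from the option description, so check them against real output); `summarize_metadata` and the sample string are hypothetical.

```python
import json


def summarize_metadata(json_output: str) -> dict:
    """Pick the title, author, and date out of one JSON record."""
    record = json.loads(json_output)
    # .get() yields None for any field the page did not provide.
    return {key: record.get(key) for key in ("title", "author", "date")}


# Hand-written sample standing in for real CLI output.
sample = '{"title": "Example post", "author": "A. Writer", "text": "Body text"}'
print(summarize_metadata(sample))
```

Missing fields come back as `None` rather than raising, which suits pages that expose only partial metadata.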
