Extracts and returns plain text from PDF, Word (.docx), and Excel (.xlsx/.xls) bid documents for analysis, search, or summarisation.

Install

openclaw skills install @ezhencacao-dotcom/bid-reader

bid-reader Skill

Overview

A lightweight skill to extract readable text from bid and tender documents in PDF, Word (.docx), and Excel (.xlsx/.xls) formats. It can be invoked from the OpenClaw UI or other agents to quickly pull the full textual content of a file for analysis, search, or summarisation.

Usage

```text
bid-read <file-path>
```
  • <file-path> should be an absolute or workspace-relative path to a .pdf, .docx, .xlsx, or .xls document.
  • The skill prints the extracted plain text to stdout, which OpenClaw captures and returns to the caller.
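
OpenClaw captures stdout for you, but a script running outside OpenClaw can capture it the same way. The following is a minimal sketch that assumes bid-read is on the PATH and that workspace-relative paths resolve against a workspace root you pass in. The function names and the workspace and command parameters are hypothetical:

```python
import subprocess
from pathlib import Path


def resolve_doc_path(file_path: str, workspace: str) -> str:
    """Return an absolute path: absolute inputs pass through unchanged,
    relative inputs are joined onto the (assumed) workspace root."""
    p = Path(file_path)
    return str(p if p.is_absolute() else Path(workspace) / p)


def read_bid(file_path: str, workspace: str, command=("bid-read",)) -> str:
    """Run bid-read on one document and return what it printed to stdout.

    `command` is a hypothetical parameter so the executable can be swapped
    out (e.g. for testing). check=True raises CalledProcessError when the
    command exits with a non-zero status.
    """
    path = resolve_doc_path(file_path, workspace)
    result = subprocess.run(
        [*command, path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,
    )
    return result.stdout
```

For example, read_bid("bids/tender.pdf", workspace="/home/user/workspace") would return the extracted text as a string. Both paths are placeholders.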

Example

```bash
bid-read /home/zhenxing/投标文件/招投标项目1/13.上海联通/投标文件.pdf
```

The command returns the full text of the PDF, ready for further processing (e.g., keyword search, summarisation).
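
Because the output is plain text, a keyword search needs nothing beyond the standard library. The sketch below is hedged: keyword_hits is a hypothetical helper. It assumes you already hold the captured stdout of bid-read, for example from subprocess.run(["bid-read", path], capture_output=True, text=True).stdout, and it reports the lines where each keyword occurs:

```python
import re


def keyword_hits(text: str, keywords: list[str]) -> dict[str, list[int]]:
    """Map each keyword to the 1-based line numbers where it appears.

    Matching is case-insensitive. Keywords are passed through re.escape so
    characters such as '(' or '.' are matched literally, not as regex syntax.
    """
    hits: dict[str, list[int]] = {kw: [] for kw in keywords}
    lines = text.splitlines()
    for kw in keywords:
        pattern = re.compile(re.escape(kw), re.IGNORECASE)
        for lineno, line in enumerate(lines, start=1):
            if pattern.search(line):
                hits[kw].append(lineno)
    return hits
```

The same code works for Chinese keywords, where IGNORECASE simply has no effect. A keyword that PDF extraction splits across a line break will be missed. Searching the whole text instead of line by line avoids that, at the cost of losing line numbers.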

Manual Installation

Copy the skill folder into your workspace under skills/bid-reader, then install the required Python packages:

```bash
# Run from the workspace root
pip install -r skills/bid-reader/requirements.txt
```

The skill is then available to agents as the bid-read command.
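
To confirm that the install worked, you can check that each dependency is importable. This sketch assumes requirements.txt covers the packages named under Implementation Details. Note that python-docx is imported as docx. REQUIRED_MODULES and missing_modules are hypothetical names:

```python
import importlib.util

# Import names for the libraries listed under Implementation Details.
# The exact contents of requirements.txt are an assumption here.
REQUIRED_MODULES = ("pdfplumber", "docx", "pandas", "openpyxl", "xlrd")


def missing_modules(modules=REQUIRED_MODULES) -> list[str]:
    """Return the top-level import names that cannot be found."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All bid-reader dependencies are importable.")
```

find_spec only locates a module without importing it, so the check is fast and has no side effects.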

Implementation Details

  • PDF: Uses pdfplumber to extract text page‑by‑page.
  • Word: Uses python-docx to read paragraphs.
  • Excel: Uses pandas to read all sheets (via openpyxl for .xlsx and xlrd for .xls) and concatenate cell values.
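
The three strategies above can be sketched as one dispatcher keyed on the file extension. This is a minimal sketch that assumes the listed libraries. It is not the skill's actual source: the function names, the "# <sheet name>" separator between sheets, and raising ValueError for unsupported types are all assumptions.

```python
from pathlib import Path


def extract_pdf(path: str) -> str:
    import pdfplumber  # imported lazily so other formats work without it

    with pdfplumber.open(path) as pdf:
        # extract_text() can yield None/empty for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_docx(path: str) -> str:
    import docx  # provided by the python-docx package

    document = docx.Document(path)
    # document.paragraphs covers body paragraphs only; text inside
    # Word tables is not included.
    return "\n".join(p.text for p in document.paragraphs)


def extract_excel(path: str) -> str:
    import pandas as pd

    # sheet_name=None loads every sheet into a dict of DataFrames; pandas
    # uses openpyxl for .xlsx and xlrd for .xls by default.
    sheets = pd.read_excel(path, sheet_name=None, header=None)
    parts = []
    for name, df in sheets.items():
        parts.append(f"# {name}")  # hypothetical sheet separator
        for row in df.itertuples(index=False):
            cells = [str(c).strip() for c in row if pd.notna(c) and str(c).strip()]
            if cells:
                parts.append(" ".join(cells))
    return "\n".join(parts)


def extract_text(path: str) -> str:
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return extract_pdf(path)
    if suffix == ".docx":
        return extract_docx(path)
    if suffix in (".xlsx", ".xls"):
        return extract_excel(path)
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Imports are deferred into each function, so an environment that only handles PDFs does not need pandas installed.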

Limitations

  • Only .pdf, .docx, .xlsx, and .xls are supported. Other formats, including legacy .doc files, are ignored.
  • Large files may take a few seconds to process.
  • Tables are flattened into whitespace‑separated rows; complex formatting is not preserved.
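
To see why formatting is lost, here is a pure-Python illustration of whitespace-separated flattening. It assumes that empty cells are simply dropped. flatten_table is hypothetical and only mirrors the behaviour described above:

```python
def flatten_table(rows):
    """Flatten a table (a list of rows) into whitespace-separated lines.

    Empty cells (None or blank strings) are dropped, so column alignment,
    merged cells and borders are lost: only the cell text survives.
    """
    lines = []
    for row in rows:
        cells = [str(c).strip() for c in row if c is not None and str(c).strip()]
        if cells:  # fully empty rows disappear
            lines.append(" ".join(cells))
    return "\n".join(lines)


table = [
    ["Item", "Qty", "Unit price"],
    ["Router", 2, None],
    [None, None, None],
    ["Switch", 10, 1500],
]
print(flatten_table(table))
```

This prints three lines: "Item Qty Unit price", "Router 2", and "Switch 10 1500". The Router row loses its empty Unit price cell, so its remaining values no longer line up with the header. The two-word header "Unit price" also becomes indistinguishable from two separate columns.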

Future Enhancements

  • Add OCR fallback for scanned PDFs (e.g., via pytesseract).
  • Support selective page or sheet extraction.
  • Provide a JSON output mode with structural metadata.
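
The selective-extraction and JSON items are not implemented yet. One possible shape for them is sketched below. It is hypothetical: the "1-3,5" page-selection format, the function names, and the JSON field names are assumptions, not an existing interface.

```python
import json


def parse_page_spec(spec: str) -> list[int]:
    """Parse a page selection such as "1-3,5" into sorted 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"Invalid range: {part}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(p < 1 for p in pages):
        raise ValueError("Page numbers start at 1")
    return sorted(pages)


def to_json(source: str, page_texts: dict[int, str]) -> str:
    """Serialise per-page text together with minimal structural metadata."""
    doc = {
        "source": source,
        "pages": [{"page": n, "text": t} for n, t in sorted(page_texts.items())],
    }
    # ensure_ascii=False keeps non-Latin text (e.g. Chinese) readable
    return json.dumps(doc, ensure_ascii=False, indent=2)
```

Duplicates and overlapping ranges collapse into one sorted list, so "4, 2,2-3" selects pages 2, 3 and 4.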