Byted Tos Doc Process

v1.0.0

Generates pre-signed URLs for Bytedance TOS `doc-preview` processing to preview and convert documents to PDF, images (PNG/JPG), or HTML, and to export page r...


by @volcengine-skills

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the prompt below exactly and paste it into OpenClaw to install volcengine-skills/byted-tos-doc-process.


Install the skill "Byted Tos Doc Process" (volcengine-skills/byted-tos-doc-process) from ClawHub.
Skill page: https://clawhub.ai/volcengine-skills/byted-tos-doc-process
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line


Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI


openclaw skills install byted-tos-doc-process

ClawHub CLI


npx clawhub@latest install byted-tos-doc-process

Security Scan

VirusTotal

Pending


OpenClaw

Suspicious

high confidence

ℹ

Purpose & Capability

The skill's name/description match the provided scripts and SKILL.md: all files implement Bytedance/Volcengine TOS 'doc-preview' workflows (PDF/image/HTML/batch). However the registry metadata lists no required environment variables/primary credential while the SKILL.md and all scripts clearly require TOS credentials (TOS_ACCESS_KEY, TOS_SECRET_KEY, TOS_ENDPOINT, TOS_REGION) plus typical config (TOS_BUCKET, TOS_OBJECT_KEY). This metadata omission is inconsistent and could mislead users about required secrets.

✓

Instruction Scope

The runtime instructions and scripts consistently describe generating pre-signed URLs via the tos SDK and performing HTTP requests to fetch previews or trigger server-side saves. They operate on the stated service (TOS) and only reference expected files/headers (e.g., x-tos-total-page) and decode an HTML token via URL-safe base64 for HTML previews. There are no instructions to read unrelated system files or exfiltrate data outside TOS endpoints.

✓

Install Mechanism

There is no install spec — risk is low. The repository includes a minimal requirements.txt (only 'tos') and example scripts. Nothing in the manifest attempts to download or execute arbitrary remote archives. The only external dependency is the public 'tos' Python package.

Credentials

The skill needs sensitive credentials and config (TOS_ACCESS_KEY, TOS_SECRET_KEY, TOS_ENDPOINT, TOS_REGION, and commonly TOS_BUCKET/TOS_OBJECT_KEY). That is proportionate to the task. The concern is that the registry metadata and declared 'primary credential' fields do not reflect these requirements (they're listed as none), which is a mismatch that could hide the need to supply secrets. Also the skill will accept full AK/SK credentials — users should prefer short-lived STS tokens and least-privilege keys.

✓

Persistence & Privilege

The skill does not request 'always: true' and does not attempt to modify other skills or system-wide agent settings. It performs normal local file writes for downloaded previews and may ask TOS to save processed results back to a bucket (via x-tos-save-bucket/object) — both are expected for the stated functionality.

Scan Findings in Context

[base64-block] expected: The repository and SKILL.md/README include a long URL-safe base64 token example used to illustrate HTML-preview parsing and decoding. This is expected for the feature (scripts parse and urlsafe-base64-decode tokens) and is not, by itself, an injection attempt — but long embedded tokens can trigger heuristics.

What to consider before installing

- The scripts require TOS credentials and configuration (TOS_ACCESS_KEY, TOS_SECRET_KEY, TOS_ENDPOINT, TOS_REGION, and typically TOS_BUCKET and TOS_OBJECT_KEY). The registry metadata incorrectly omitted these requirements; assume you must supply them.
- These are sensitive credentials. Prefer short-lived STS credentials (TOS_SECURITY_TOKEN) and least-privilege keys (read-only for preview, or a narrowly scoped write permission if using save-to-bucket). Do not provide full account keys unless necessary.
- Review the included scripts yourself (they ship with the skill). They generate pre-signed URLs, make HTTP requests to TOS, save files locally, and optionally ask TOS to write converted outputs back to a bucket. Make sure output paths and save buckets are what you expect.
- The README/SKILL.md include large base64 tokens as examples for HTML-preview decoding. These are sample data used by the parsing logic and not necessarily malicious, but confirm that any real tokens/URLs you use are legitimate.
- Verify that the 'tos' Python package is the official SDK from a trusted source (PyPI) before installing it with pip.
- Run the skill in a controlled environment (an isolated VM or container) the first time, and avoid exposing high-privilege keys. If you test with production credentials, consider rotating them afterwards.

If you want to proceed, provide minimally privileged credentials or an STS token, and double-check the TOS_BUCKET/TOS_OBJECT_KEY values. If you do not control or trust the skill's origin, treat it as untrusted code and review it thoroughly before supplying secrets.



83 downloads · 0 stars · 1 version · Updated 4w ago

MIT-0

Bytedance TOS Document Process Skill

This skill provides document processing functions for files in Bytedance's TOS via the doc-preview feature, implemented by generating pre-signed URLs with the Volcengine TOS SDK.

Note: This approach is necessary because the SDK's get_object method does not directly support doc_* keyword arguments. All document processing parameters must be passed as query parameters in a pre-signed URL.
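Because the doc-preview parameters travel in the query string rather than as SDK arguments, a quick sanity check is to parse a generated URL back and confirm the `x-tos-*` keys survived signing. This is a minimal standard-library sketch; the helper name `doc_preview_params` and the example URL are hypothetical. In the example the signature key is capitalized (`X-Tos-Signature`), so the case-sensitive lowercase `x-tos-` filter skips it.

```python
from urllib.parse import parse_qs, urlsplit


def doc_preview_params(signed_url: str) -> dict:
    """Return the lowercase x-tos-* parameters carried in a signed URL's query string."""
    query = parse_qs(urlsplit(signed_url).query)
    # parse_qs maps each key to a list of values; keep the first value of each key.
    # startswith() is case-sensitive, so capitalized keys like "X-Tos-Signature" are excluded.
    return {key: values[0] for key, values in query.items() if key.startswith("x-tos-")}


# Hypothetical URL for illustration; a real one comes from client.pre_signed_url(...).signed_url
url = ("https://example-bucket.example.com/report.docx"
       "?x-tos-process=doc-preview&x-tos-doc-dst-type=pdf&X-Tos-Signature=abc123")
print(doc_preview_params(url))
# → {'x-tos-process': 'doc-preview', 'x-tos-doc-dst-type': 'pdf'}
```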

Quick Start

1. Client Initialization

import os
from typing import Optional
from urllib.request import urlopen

import tos
from tos.enum import HttpMethodType

def create_client() -> Optional[tos.TosClientV2]:
    """Initializes a TosClientV2 from environment variables; returns None on failure."""
    try:
        # ... (full implementation in scripts)
        return tos.TosClientV2(
            ak=os.getenv('TOS_ACCESS_KEY'),
            sk=os.getenv('TOS_SECRET_KEY'),
            endpoint=os.getenv('TOS_ENDPOINT'),
            region=os.getenv('TOS_REGION'),
            security_token=os.getenv('TOS_SECURITY_TOKEN'),  # optional, for STS credentials
        )
    except Exception as e:
        print(f"Error initializing client: {e}")
        return None

client = create_client()
if client is None:
    raise SystemExit("Could not create the TOS client; check the TOS_* environment variables.")

2. Basic Workflow (Pre-signed URL)

# (Assumes 'client' is initialized and 'bucket_name', 'object_key' are set)

# 1. Preview document as a PDF and save locally
try:
    # Build query params for doc-preview
    pdf_params = {
        "x-tos-process": "doc-preview",
        "x-tos-doc-dst-type": "pdf"
    }
    presigned_pdf = client.pre_signed_url(
        HttpMethodType.Http_Method_Get,
        bucket_name,
        object_key,
        query=pdf_params
    )
    
    # Download the content from the pre-signed URL
    with urlopen(presigned_pdf.signed_url) as response, open("local_preview.pdf", "wb") as f_out:
        f_out.write(response.read())
    print("PDF preview saved to local_preview.pdf")

except Exception as e:
    print(f"Error converting to PDF: {e}")

# 2. Preview page 3 as a PNG image
try:
    png_params = {
        "x-tos-process": "doc-preview",
        "x-tos-doc-dst-type": "png",
        "x-tos-doc-page": "3",
        "x-tos-doc-image-dpi": "150"
    }
    presigned_png = client.pre_signed_url(
        HttpMethodType.Http_Method_Get,
        bucket_name,
        object_key,
        query=png_params
    )
    with urlopen(presigned_png.signed_url) as response, open("page_3.png", "wb") as f_out:
        f_out.write(response.read())
    print("Page 3 saved as page_3.png")

except Exception as e:
    print(f"Error converting to PNG: {e}")

# 3. Get the total page count from the x-tos-total-page response header
#    (this sends a GET request for the PDF preview; only the response headers are used)
try:
    presigned_count = client.pre_signed_url(
        HttpMethodType.Http_Method_Get,
        bucket_name,
        object_key,
        query={"x-tos-process": "doc-preview", "x-tos-doc-dst-type": "pdf"}
    )
    with urlopen(presigned_count.signed_url) as response:
        total_pages = response.headers.get("x-tos-total-page")
        print(f"Document has {total_pages} pages.")
except Exception as e:
    print(f"Error getting page count: {e}")
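The Quick Start reads each response fully into memory and makes a separate request just to read the page count. A possible refinement, sketched below, streams the body to disk with `shutil.copyfileobj` and reads `x-tos-total-page` from the same response, so one request yields both. The helper name `download_preview` is hypothetical, and it assumes the header is present on the conversion response you download (as in step 3 above); it returns None when the header is missing.

```python
import shutil
from typing import Optional
from urllib.request import urlopen


def download_preview(signed_url: str, dest_path: str) -> Optional[int]:
    """Stream a doc-preview response to dest_path; return x-tos-total-page if present."""
    with urlopen(signed_url) as response, open(dest_path, "wb") as f_out:
        # copyfileobj copies in fixed-size chunks, so large PDFs are never held in memory at once
        shutil.copyfileobj(response, f_out)
        total = response.headers.get("x-tos-total-page")
    # Header values are strings; convert when numeric, otherwise report "unknown" as None
    return int(total) if total is not None and total.isdigit() else None
```

Usage, with `presigned_pdf` from step 1: `total_pages = download_preview(presigned_pdf.signed_url, "local_preview.pdf")`.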

Core Operations

All document processing works by generating a pre-signed URL whose query string carries x-tos-process="doc-preview" plus the relevant x-tos-doc-* parameters.
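To avoid retyping these keys for every request, the query dict can be assembled by a small helper. This is a minimal sketch: `build_doc_preview_query` is a hypothetical name, it only knows the keys used in this document (the official TOS documentation is authoritative for the full set), and it assumes page numbers are 1-based.

```python
from typing import Optional

# Destination types named in this document; consult the official TOS docs for the full list
DST_TYPES = {"pdf", "png", "jpg", "html"}


def build_doc_preview_query(dst_type: str, page: Optional[int] = None,
                            dpi: Optional[int] = None) -> dict:
    """Build the dict passed as query=... to client.pre_signed_url (hypothetical helper)."""
    if dst_type not in DST_TYPES:
        raise ValueError(f"unsupported x-tos-doc-dst-type: {dst_type!r}")
    query = {"x-tos-process": "doc-preview", "x-tos-doc-dst-type": dst_type}
    if page is not None:
        if page < 1:  # assumption: page numbering starts at 1
            raise ValueError("page numbers start at 1")
        query["x-tos-doc-page"] = str(page)
    if dpi is not None:
        query["x-tos-doc-image-dpi"] = str(dpi)
    return query
```

For example, `build_doc_preview_query("png", page=3, dpi=150)` produces the same dict as `png_params` in the Quick Start.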

1. Convert to PDF (`x-tos-doc-dst-type='pdf'`)

Converts an entire document into a single PDF file.

# See Quick Start example

2. Convert to Image (`x-tos-doc-dst-type='png'` or `'jpg'`)

Converts a specific page of a document into an image.

# See Quick Start example
# Use query params like "x-tos-doc-page", "x-tos-doc-image-dpi", etc.
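Since each image request converts a single page, exporting a whole document locally means one request per page, bounded by the `x-tos-total-page` header. The sketch below only plans those requests. `page_image_queries` and the `page_N.ext` filename pattern are hypothetical, and it assumes 1-based page numbers. For server-side export of a page range, the batch mode in section 4 avoids the per-page round trips.

```python
def page_image_queries(total_pages: int, dst_type: str = "png", dpi: int = 150) -> list:
    """Return (page, query, local_filename) tuples covering every page of a document."""
    jobs = []
    for page in range(1, total_pages + 1):  # assumes 1-based page numbering
        query = {
            "x-tos-process": "doc-preview",
            "x-tos-doc-dst-type": dst_type,
            "x-tos-doc-page": str(page),
            "x-tos-doc-image-dpi": str(dpi),
        }
        jobs.append((page, query, f"page_{page}.{dst_type}"))
    return jobs
```

Each tuple's `query` can be passed to `client.pre_signed_url(..., query=query)` and the result downloaded to its filename, exactly as in the Quick Start PNG example.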

3. Convert to HTML (`x-tos-doc-dst-type='html'`)

Fetches a temporary HTML page containing a token for the final preview URL. This requires a second step to parse the HTML and decode the token.

# Step 1: Get the HTML content via a pre-signed URL
html_params = {"x-tos-process": "doc-preview", "x-tos-doc-dst-type": "html"}
presigned_html = client.pre_signed_url(HttpMethodType.Http_Method_Get, bucket_name, object_key, query=html_params)

with urlopen(presigned_html.signed_url) as response:
    html_content = response.read().decode('utf-8')

# Step 2: Parse and decode (see scripts/doc_preview_html_url.py for full logic)
# ... logic to extract and base64-decode the token ...
# final_url = decode_preview_url(token)
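The decoding half of step 2 can be sketched with the standard library alone. How the token is embedded in the HTML is not described here, so this sketch covers only the decode, assuming the token is URL-safe base64 that decodes to UTF-8 text and may have had its trailing `=` padding stripped. `decode_preview_token` is a hypothetical name, not the function in scripts/doc_preview_html_url.py.

```python
import base64


def decode_preview_token(token: str) -> str:
    """URL-safe base64-decode a token, restoring '=' padding that may have been stripped."""
    token = token.strip()
    # Base64 input length must be a multiple of 4; append any missing padding
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")
```

Restoring the padding first lets the same function handle both padded and unpadded tokens, because already-padded input gets zero extra `=` characters.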

4. Batch Export Pages (`image-mode=1`)

Exports a range of pages as images directly to a TOS bucket.

import json

# Batch-export query params: "image-mode", "start-page", "end-page", "x-tos-save-bucket", "x-tos-save-object"
batch_params = {
    "x-tos-process": "doc-preview",
    "x-tos-doc-dst-type": "jpg",
    "image-mode": "1",
    "start-page": "2",
    "end-page": "5",
    "x-tos-save-bucket": "output-bucket",  # destination bucket for the exported images
    "x-tos-save-object": "exported/page_{Page}.jpg"  # {Page} is a placeholder
}
presigned_batch = client.pre_signed_url(HttpMethodType.Http_Method_Get, bucket_name, object_key, query=batch_params)

# Request the URL to run the export; the response body contains JSON metadata about the batch job
with urlopen(presigned_batch.signed_url) as response:
    batch_result = json.loads(response.read().decode("utf-8"))
print(batch_result)
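The batch parameters above are easy to get subtly wrong (an inverted page range, or a save key without the placeholder), so a builder that validates them before signing can save a failed request. This is a sketch using only the keys shown above; `build_batch_export_query` is hypothetical, the 1-based page check is an assumption, and so is the rule that `save_object` must contain `{Page}` (the premise being that without it every page would target the same object key).

```python
def build_batch_export_query(dst_type: str, start_page: int, end_page: int,
                             save_bucket: str, save_object: str) -> dict:
    """Build batch-export query params with basic validation (hypothetical helper)."""
    if not 1 <= start_page <= end_page:  # assumes 1-based page numbering
        raise ValueError("expected 1 <= start_page <= end_page")
    if "{Page}" not in save_object:
        # Assumption: without the placeholder, every page would be written to the same key
        raise ValueError("save_object should contain the {Page} placeholder")
    return {
        "x-tos-process": "doc-preview",
        "x-tos-doc-dst-type": dst_type,
        "image-mode": "1",
        "start-page": str(start_page),
        "end-page": str(end_page),
        "x-tos-save-bucket": save_bucket,
        "x-tos-save-object": save_object,
    }
```

Calling `build_batch_export_query("jpg", 2, 5, "output-bucket", "exported/page_{Page}.jpg")` reproduces the `batch_params` dict from the example.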

Authorization

Authentication is handled by tos.TosClientV2. Provide credentials via environment variables.

Required Environment Variables

TOS_ACCESS_KEY
TOS_SECRET_KEY
TOS_ENDPOINT
TOS_REGION

Optional for STS

TOS_SECURITY_TOKEN
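Because `os.getenv` silently returns None for unset variables, a missing credential otherwise surfaces later as a less obvious client or signing error. A minimal sketch that checks the required variables up front and names any that are missing; `load_tos_settings` is a hypothetical helper whose dict keys match the keyword arguments used for `tos.TosClientV2` in Client Initialization.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = ("TOS_ACCESS_KEY", "TOS_SECRET_KEY", "TOS_ENDPOINT", "TOS_REGION")


def load_tos_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect TOS settings, failing fast with the names of any missing required variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    settings = {
        "ak": env["TOS_ACCESS_KEY"],
        "sk": env["TOS_SECRET_KEY"],
        "endpoint": env["TOS_ENDPOINT"],
        "region": env["TOS_REGION"],
    }
    # The STS token is optional; include it only when it is set
    if env.get("TOS_SECURITY_TOKEN"):
        settings["security_token"] = env["TOS_SECURITY_TOKEN"]
    return settings
```

Usage: `client = tos.TosClientV2(**load_tos_settings())`. Accepting an explicit mapping keeps the check testable without touching the real environment.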

Best Practices

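Error Handling: Wrap HTTP requests in try...except blocks that catch urllib.error.HTTPError (the server returned an error status) and URLError (the request could not be completed, e.g. a network failure). Catch HTTPError first, since it is a subclass of URLError.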
Parameter Reference: Refer to REFERENCE.md for a mapping of doc_preview_params.py arguments to x-tos-* query keys and to the official TOS documentation for authoritative details.
HTML Preview: Be aware of the two-step process and the custom domain requirement for recent buckets.
Total Pages Header: The x-tos-total-page header is a convenient way to get the page count.
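The error-handling advice above can look like the following sketch. `fetch_or_none` and its 60-second default timeout are hypothetical choices. It distinguishes an error status returned by TOS (whose response body may explain the failure) from a request that never completed, and it catches `HTTPError` before `URLError` because of the subclass relationship.

```python
from typing import Optional
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_or_none(signed_url: str, timeout: float = 60) -> Optional[bytes]:
    """Fetch a pre-signed URL; on HTTP or network errors, report why and return None."""
    try:
        with urlopen(signed_url, timeout=timeout) as response:
            return response.read()
    except HTTPError as e:
        # Must come before URLError, which it subclasses; the error body may explain the failure
        print(f"TOS returned HTTP {e.code}: {e.read()[:200]!r}")
    except URLError as e:
        # The request never produced an HTTP response (DNS, connection, or local file errors)
        print(f"Request failed: {e.reason}")
    return None
```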

Additional Resources

For detailed parameters, see REFERENCE.md.
For end-to-end examples, see WORKFLOWS.md.
For executable Python examples, see the scripts/ directory.
For the definitive list of all processing parameters, always consult the official Volcengine TOS Document Preview documentation.
