Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Office To Md V2

v0.1.0

Convert PDF, DOC, DOCX, and PPTX office documents to Markdown, supporting legacy .doc files with text extraction and basic formatting preservation.

2 stars · 1.2k downloads · 4 current · 4 all-time
Security Scan
VirusTotal
Suspicious
OpenClaw
Benign
medium confidence
Purpose & Capability
The name/description (convert office docs to Markdown) matches the included code and docs. Converters for PDF, DOCX, legacy DOC, and PPTX are present and the declared npm dependencies (mammoth, pdf-parse, turndown, word-extractor) align with the functionality.
Instruction Scope
SKILL.md and the wrapper instruct the agent to run the included Node scripts and read/write files (expected). However, the PPTX converter and examples also perform shell operations: creating a Python script in /tmp, executing python3, trying to pip3 install python-pptx at runtime, running unzip and rm -rf on temporary directories, and using child_process.execSync. These steps expand the runtime scope beyond pure Node module behavior and should be reviewed before running on sensitive hosts.
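One way to act on this finding is to search the skill's source for the shell-related operations named here before running it. The following is a minimal sketch and not part of the skill: the `findRiskyCalls` helper and its pattern list are hypothetical, cover only the operations mentioned in this scan, and the sample input is made up for illustration.

```javascript
// Hypothetical helper: flags the shell-related operations named in the scan.
// The pattern list is an assumption and is not exhaustive.
const RISKY_PATTERNS = [
  { label: 'child_process.execSync', regex: /\bexecSync\s*\(/ },
  { label: 'runtime pip install', regex: /\bpip3?\s+install\b/ },
  { label: 'recursive delete', regex: /\brm\s+-rf\b/ },
  { label: 'python3 invocation', regex: /\bpython3\b/ },
  { label: 'unzip invocation', regex: /\bunzip\b/ },
];

// Returns one finding per matching pattern per line: { line, label, text }.
function findRiskyCalls(source) {
  const findings = [];
  source.split('\n').forEach((text, index) => {
    for (const { label, regex } of RISKY_PATTERNS) {
      if (regex.test(text)) {
        findings.push({ line: index + 1, label, text: text.trim() });
      }
    }
  });
  return findings;
}

// Example input: a made-up snippet, not taken from the skill's code.
const sample = [
  "const { execSync } = require('child_process');",
  "execSync('pip3 install python-pptx');",
  "execSync(`rm -rf ${tmpDir}`);",
].join('\n');

for (const finding of findRiskyCalls(sample)) {
  console.log(`${finding.line}: ${finding.label}`);
}
```

A text search like this only narrows down where to look; it does not replace reading the flagged lines in context.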
Install Mechanism
There is no formal install spec in the registry (instruction-only), but package.json and office-to-md/package.json rely on standard npm packages from npmjs.org (expected). The PPTX path includes a runtime pip3 install call (pip3 install python-pptx) invoked via execSync; that performs an external download from PyPI at runtime, which is functional but increases risk compared to declarative dependency installation.
Credentials
The skill does not request environment variables, credentials, or config paths. Example/test code references absolute paths under /root/.openclaw/workspace and a specific local .doc path (likely sample/test artifacts) but no secrets are required.
Persistence & Privilege
The skill does not request always:true, does not modify other skills, and exports a callable function. It writes temporary files (/tmp/extract_pptx.py) and output .md files next to inputs (expected behavior for a converter), but it does not assert permanent elevated privileges or persistent system-wide changes.
Assessment
This skill appears to do what it says (converting office files to Markdown). Before installing or running it:

1. Review the PPTX converter's behavior and make sure you are comfortable with it: it writes a Python script to /tmp and may attempt to run `pip3 install python-pptx` automatically. If you prefer control, install python-pptx yourself in a controlled environment instead of letting the skill run pip.
2. Run the skill in an isolated environment (sandbox or container) until you trust it, because it executes shell commands (python3, unzip, pip3, rm -rf) and spawns child processes.
3. Update any hard-coded absolute paths in the examples/tests to avoid accidental access to host-specific paths.
4. Inspect package.json, then run `npm install` in a safe workspace to fetch Node dependencies from npmjs.org.

If you need higher assurance, ask the author for the source repository, or run a code review and dependency scans before using the skill with sensitive files.

Like a lobster shell, security has layers — review code before you run it.

latest vk9796hbz142wfextqs0j9evs6d816xnb
1.2k downloads
2 stars
1 version
Updated 7h ago
v0.1.0
MIT-0

Office to Markdown Converter Skill (v2)

Description

Convert office documents (PDF, DOC, DOCX, PPTX) to Markdown format. This skill uses the word-extractor library for legacy .doc support and can be called from OpenClaw through the exec tool or the exported wrapper function.

When to Use

  • When you need to extract text from office documents
  • When you want to convert documents to readable Markdown format
  • When analyzing document content in OpenClaw
  • Specifically when dealing with legacy .doc format files

Supported Formats

  • PDF (.pdf): Text extraction using pdf-parse
  • Word (.docx): Formatting preservation using mammoth + turndown
  • Legacy Word (.doc): Text extraction using word-extractor (supports Chinese encoding)
  • PowerPoint (.pptx): Basic text extraction using python-pptx
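The format list above amounts to a lookup from file extension to conversion library. A minimal sketch of that lookup follows; the `CONVERTERS` table and `detectFormat` helper are hypothetical names used for illustration and are not taken from the skill's code.

```javascript
const path = require('node:path');

// Extension-to-library map taken from the Supported Formats list.
const CONVERTERS = {
  '.pdf': 'pdf-parse',
  '.docx': 'mammoth + turndown',
  '.doc': 'word-extractor',
  '.pptx': 'python-pptx',
};

// Hypothetical helper: returns { ext, library }, or null when unsupported.
function detectFormat(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const library = CONVERTERS[ext];
  return library ? { ext, library } : null;
}

console.log(detectFormat('/path/to/legacy.DOC'));
console.log(detectFormat('/path/to/notes.txt'));
```

Lowercasing the extension before the lookup lets files such as `REPORT.PDF` resolve the same way as `report.pdf`.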

Dependencies

  • Node.js with npm packages: pdf-parse, mammoth, turndown, word-extractor
  • Python 3 with python-pptx (optional; needed only for PPTX conversion)
  • OpenClaw exec tool permission

Installation

1. Copy the skill to your workspace:

cp -r /root/.openclaw/workspace/office-to-md-v2/office-to-md /path/to/your/workspace/

2. Install dependencies:

cd /path/to/your/workspace/office-to-md
npm install

3. For PPTX support (optional):

pip3 install python-pptx

Usage in OpenClaw

Method 1: Direct exec call

// Convert any supported document
const result = await exec(
  'node /path/to/office-to-md/openclaw-skill.js /path/to/document.doc',
  { workdir: '/path/to/workspace', timeout: 60000 }
);

if (result.exitCode === 0) {
  console.log('✅ Document converted successfully');
  // Output file: /path/to/document.md
} else {
  console.error('❌ Conversion failed:', result.stderr);
}

Method 2: Using the wrapper function

// Import the converter
const { convertOfficeToMarkdown } = require('/path/to/office-to-md/openclaw-skill.js');

// Convert document
const conversionResult = await convertOfficeToMarkdown('/path/to/document.pdf');
if (conversionResult.success) {
  console.log(`Output: ${conversionResult.outputPath}`);
  console.log(`Preview: ${conversionResult.preview}`);
} else {
  console.error(`Error: ${conversionResult.error}`);
}

Method 3: Complete OpenClaw integration function

async function convertDocumentToMarkdown(filePath) {
  // Validate file exists
  try {
    await read(filePath);
  } catch (error) {
    return { success: false, error: `File not found: ${filePath}` };
  }
  
  // Check file extension (take only the final ".ext" of the path)
  const extMatch = filePath.toLowerCase().match(/\.[^/.]+$/);
  const ext = extMatch ? extMatch[0] : '';
  const supported = ['.pdf', '.doc', '.docx', '.pptx'];
  if (!supported.includes(ext)) {
    return { 
      success: false, 
      error: `Unsupported file type. Supported: ${supported.join(', ')}` 
    };
  }
  
  // Convert using the skill
  const cmd = `node /path/to/office-to-md/openclaw-skill.js "${filePath}"`;
  const result = await exec(cmd, { 
    workdir: '/path/to/workspace',
    timeout: 120000 // 2 minutes for large files
  });
  
  if (result.exitCode === 0) {
    const outputPath = filePath.replace(/\.[^/.]+$/, '.md');
    return {
      success: true,
      outputPath: outputPath,
      message: `Converted to: ${outputPath}`
    };
  } else {
    return {
      success: false,
      error: result.stderr || 'Conversion failed'
    };
  }
}

// Usage example
const result = await convertDocumentToMarkdown('/path/to/document.doc');
if (result.success) {
  const markdown = await read(result.outputPath);
  console.log(markdown.substring(0, 1000));
}
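Method 3 places the file path inside double quotes in the command string, so a path containing `"`, `$`, or a backtick could be misread by the shell. The sketch below shows one common way to quote arguments defensively. It assumes the exec tool passes the command to a POSIX shell, which the source does not state, and `shellQuote` and `buildConvertCommand` are hypothetical helpers.

```javascript
// Hypothetical helper: wraps a value in single quotes for a POSIX shell.
// An embedded single quote is closed, escaped, and reopened: ' becomes '\''
function shellQuote(value) {
  const escaped = String(value).replace(/'/g, "'\\''");
  return "'" + escaped + "'";
}

// Builds the same command as Method 3, with both paths quoted.
// The script path is the placeholder used throughout this document.
function buildConvertCommand(filePath) {
  const script = '/path/to/office-to-md/openclaw-skill.js';
  return `node ${shellQuote(script)} ${shellQuote(filePath)}`;
}

console.log(buildConvertCommand('/path/to/Bob\'s "final" $report.doc'));
```

Inside single quotes a POSIX shell performs no expansion, which is why only the single quote itself needs special handling.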

Examples

Example 1: Convert and analyze a document

// Convert a .doc file and analyze its content
const docPath = '/path/to/document.doc';
const convertResult = await exec(
  `node /path/to/office-to-md/openclaw-skill.js "${docPath}"`,
  { workdir: '/path/to/workspace' }
);

if (convertResult.exitCode === 0) {
  const mdPath = docPath.replace(/\.[^/.]+$/, '.md'); // swap only the final extension
  const content = await read(mdPath);
  
  // Analyze the content
  const wordCount = content.trim().split(/\s+/).filter(Boolean).length;
  const lines = content.split('\n').length;
  const hasChinese = /[\u4e00-\u9fff]/.test(content);
  
  console.log(`Document analysis:`);
  console.log(`- Word count: ${wordCount}`);
  console.log(`- Lines: ${lines}`);
  console.log(`- Contains Chinese: ${hasChinese}`);
  console.log(`- Preview: ${content.substring(0, 200)}...`);
}

Example 2: Batch conversion

// Convert multiple documents of different formats
const documents = [
  '/path/to/report.pdf',
  '/path/to/legacy.doc',
  '/path/to/modern.docx',
  '/path/to/presentation.pptx'
];

const results = [];
for (const doc of documents) {
  console.log(`Converting ${doc}...`);
  const result = await exec(
    `node /path/to/office-to-md/openclaw-skill.js "${doc}"`,
    { workdir: '/path/to/workspace', timeout: 90000 }
  );
  
  const success = result.exitCode === 0;
  results.push({
    file: doc,
    success: success,
    error: success ? null : result.stderr
  });
  
  console.log(success ? '✅ Success' : '❌ Failed');
}

// Summary
const successful = results.filter(r => r.success).length;
console.log(`\nConversion summary: ${successful}/${results.length} successful`);

API Reference

convertOfficeToMarkdown(filePath)

Returns a Promise that resolves to:

{
  success: boolean,
  outputPath?: string,
  markdown?: string,
  preview?: string,
  fileType?: string,
  message?: string,
  stats?: {
    lines: number,
    characters: number,
    words: number
  },
  error?: string,
  stack?: string
}
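The result shape includes `stats` and `preview` fields, but the source does not document how they are computed. The following sketch shows one plausible derivation from the Markdown text. The `buildStats` helper is hypothetical, and the 200-character preview length is an assumption borrowed from Example 1.

```javascript
// Hypothetical helper: derives the `stats` and `preview` fields from Markdown.
// How the skill actually computes them is not documented; this is an assumption.
function buildStats(markdown, previewLength = 200) {
  const trimmed = markdown.trim();
  return {
    stats: {
      lines: markdown.split('\n').length,
      characters: markdown.length,
      // Splitting trimmed text avoids counting empty leading/trailing tokens.
      words: trimmed === '' ? 0 : trimmed.split(/\s+/).length,
    },
    preview: markdown.slice(0, previewLength),
  };
}

const { stats, preview } = buildStats('# Title\n\nHello converted world\n');
console.log(stats, preview);
```

Whitespace-based word counts undercount text written without spaces, such as Chinese, so the `words` figure is best read as approximate for such documents.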

Configuration

Timeout Settings

  • Small files (<1MB): 30 seconds
  • Medium files (1-10MB): 60 seconds
  • Large files (>10MB): 120 seconds
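These tiers can be turned into a small helper that picks the `timeout` value passed to exec. This is a sketch only: `timeoutForSize` is a hypothetical name, and treating a file of exactly 10 MB as "medium" is an assumption, since the list leaves that boundary open.

```javascript
const MB = 1024 * 1024;

// Hypothetical helper: maps a file size in bytes to the timeouts listed above,
// returned in milliseconds as used by the exec examples in this document.
function timeoutForSize(bytes) {
  if (bytes < 1 * MB) return 30000;    // small: under 1 MB
  if (bytes <= 10 * MB) return 60000;  // medium: 1 to 10 MB (boundary is an assumption)
  return 120000;                       // large: over 10 MB
}

console.log(timeoutForSize(500 * 1024));
console.log(timeoutForSize(25 * MB));
```

In Node.js the size could come from `fs.statSync(filePath).size`, with the result passed as the `timeout` option shown in the usage examples.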

Memory Limits

  • Default Node.js memory limit is sufficient for most documents
  • For very large files, you may need to increase memory:
    node --max-old-space-size=4096 openclaw-skill.js large-file.doc
    

Troubleshooting

Common Issues

  1. "File not found"

    • Check file path and permissions
    • Use absolute paths for reliability
  2. "Unsupported file type"

    • Ensure file has correct extension
    • Check if file is actually the claimed format
  3. Conversion errors with .doc files

    • The file may be corrupted or in an unusual format
    • Try opening in Word and saving as .docx first
  4. Chinese text appears as gibberish

    • word-extractor should handle Chinese encoding automatically
    • If issues persist, the file may use unusual encoding
  5. Timeout errors

    • Increase timeout for large files
    • Check system resources
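For issue 2, one way to check whether a file is actually the claimed format is to compare its first bytes against well-known file signatures. The sketch below is not part of the skill, and `sniffContainer` is a hypothetical helper. It only identifies the container: DOCX and PPTX are both ZIP archives, and the OLE signature is shared by other legacy Office formats, so a match narrows the possibilities rather than proving the exact type.

```javascript
// Well-known file signatures ("magic bytes") for the supported formats.
const PDF_MAGIC = Buffer.from('%PDF-', 'latin1');
const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]); // DOCX and PPTX containers
const OLE_MAGIC = Buffer.from([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]); // legacy DOC

// Hypothetical helper: classifies a buffer holding the first bytes of a file.
// Returns 'pdf', 'zip', 'ole', or 'unknown'.
function sniffContainer(header) {
  const startsWith = (magic) =>
    header.length >= magic.length &&
    header.subarray(0, magic.length).equals(magic);

  if (startsWith(PDF_MAGIC)) return 'pdf';
  if (startsWith(ZIP_MAGIC)) return 'zip';
  if (startsWith(OLE_MAGIC)) return 'ole';
  return 'unknown';
}

console.log(sniffContainer(Buffer.from('%PDF-1.7\n')));
console.log(sniffContainer(Buffer.from('plain text')));
```

A `.doc` file that sniffs as 'zip' is likely a renamed .docx, and one that sniffs as 'unknown' may be RTF or plain text saved with a .doc extension.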

Debug Mode

Enable debug logging by setting the DEBUG environment variable:

DEBUG=office-to-md node openclaw-skill.js document.doc

Performance

  • PDF: Fast, depends on file size
  • DOCX: Fast to medium, good formatting preservation
  • DOC: Medium, requires binary parsing
  • PPTX: Slow, requires Python and external library

Limitations

  • Images in documents are not extracted
  • Complex formatting may not be fully preserved
  • Tables may convert imperfectly to Markdown
  • Very old or corrupted .doc files may fail
  • Password-protected files are not supported

Changelog

v2.0.0 (2026-02-15)

  • Added full .doc support using word-extractor
  • Fixed ESM compatibility issues with pptConverter
  • Added comprehensive OpenClaw integration
  • Improved Chinese text extraction
  • Added structured output with statistics

v1.0.0 (Initial)

  • Basic PDF, DOCX, PPTX support
  • Simple conversion without .doc support

License

This skill is provided as-is. The underlying libraries have their own licenses:

  • pdf-parse: MIT
  • mammoth: BSD-2-Clause
  • turndown: MIT
  • word-extractor: MIT
  • python-pptx: MIT
