Skill PDF Orgnizer

Data & APIs

Organizes PDFs by extracting metadata and classifying into topics, renaming files, and sorting into topic-based folders using AI analysis.

Install

openclaw skills install skill-pdf-orgnizer

PDF Organizer Skill

Description

AI-powered PDF organization tool that automatically categorizes and organizes PDF files by topic using GPT analysis.

Features

AI-Powered Content Analysis: Uses OpenAI/Kimi API to extract titles, authors, and journal names
Automatic Topic Classification: Classifies PDFs into 10 standard categories (Technology, Finance, Health, Science, Education, Business, Entertainment, Politics, Sports, Other)
Smart File Naming: Renames files in Title_Author_Journal.pdf format using only underscores
Hierarchical Organization: Creates topic-based folders with subtopic subfolders
Batch Processing: Efficiently processes multiple PDFs at once
Incremental Mode: Only processes new/modified files
Error Handling: Gracefully handles corrupted PDFs and API failures
Detailed Logging: Tracks all operations with comprehensive statistics

Requirements

Python 3.8+
OpenAI API key or Kimi API key
PDF files to organize

Usage

Configure API key in config.json
Place PDF files in input_pdfs/ folder
Run: python pdf_organizer.py
Organized files appear in organized_pdfs/ folder

Project Structure

pdf_organizer.py          # Main entry point
modules/
  ├── pdf_reader.py       # PDF text extraction
  ├── content_analyzer.py  # OpenAI/Kimi API integration
  ├── folder_manager.py    # Folder creation and management
  └── file_mover.py        # File operations
config.json              # Configuration file
requirements.txt          # Python dependencies
README.md                # Documentation
setup.py                # Setup/initialization script

Configuration Options

openai_api_key: Your API key (required)
input_folder: Folder containing PDFs to organize
output_folder: Destination folder for organized PDFs
model: GPT model to use (gpt-3.5-turbo or moonshot-v1-8k)
api_provider: API provider ("openai" or "kimi")
dry_run: Preview mode without moving files (true/false)
incremental: Only process new/modified files (true/false)
max_chars: Maximum characters to analyze from each PDF
custom_topics: Custom topic mappings

Example Output

Organized files are named: Title_Author_Journal.pdf Example: Visual_SLAM_What_Are_the_Current_Trends_and_What_to_Expect_Ali_Tourani,_Hriday_Bavle,_Jose_Luis_Sanchez-Lopez,_and_Holger_Voos_Sensors.pdf