Image To PPT Pro

将任意图片或幻灯片截图复刻成完全可编辑的PPTX文件，像素级还原布局、颜色、文字与图形元素。

Audits

Pass

ClawScanReview

Agentic behavior and permission review.

Static analysisPass

Pattern checks against bundled files.

VirusTotalPass

Multi-engine malware detections and file reputation.

Install

openclaw skills install image-to-ppt-pro

Workflow Overview

Step -1  Image Type Classification  ← Required! Determines Strategy A or B
    ↓
    ├─── Strategy A: Pure Code Replication      (Images with flat geometric shapes)
    │     Step 0    Perspective Correction      ← Required for photos; skip for screenshots
    │     Step 1    Data Extraction             ← Colors + OCR text + shape recognition
    │     Step 2    Visual Planning             ← Define regions, record coordinates
    │     Step 3    Code Implementation         ← Code each block
    │     Step 3.5  Pre-flight Check            ← ★ Paper verification before execution
    │     Step 4    Generate pptx
    │     Step 5    Visual QA + Correction Loop (max 3 iterations)
    │     Step 6    Delivery
    │
    └─── Strategy B: Mathematical Approximation (Images with 3D/lighting/curves that can be geometrically decomposed)
          Step 0    Perspective Correction      ← Required for photos; skip for screenshots
          Step 1    Data Extraction             ← Colors + OCR text + layer decomposition
          Step B2   Layer Planning              ← Decompose complex graphics into overlay layers
          Step B3   Code Implementation         ← Transparency + math coordinates + multi-shape overlay
          Step 3.5  Pre-flight Check            ← ★ Paper verification before execution
          Step 4    Generate pptx
          Step 5    Visual QA + Correction Loop (max 3 iterations)
          Step 6    Delivery

Core difference between paths: Strategy A directly replicates flat graphics; Strategy B doesn't pursue pixel-perfect replication but uses geometric overlays + transparency + math coordinates to approximate complex visuals, trading visual fidelity for 100% editability.

Value of Step 3.5: After writing the script, perform paper verification before execution to intercept out-of-bounds, overlaps, and text overflow issues, significantly reducing Step 5 correction iterations.

⚠️ Global Golden Rules (Every line of code must follow)

Rule 1: Text margin must be 0

All addText calls, whether for titles, body text, or node text, must include margin: 0.

// ✅ Correct
slide.addText("text", { x:1, y:1, w:3, h:0.5, margin: 0, ... });

// ❌ Wrong: Missing margin: 0, pptxgenjs default padding causes text position offset
slide.addText("text", { x:1, y:1, w:3, h:0.5, ... });

Rule 2: Text box coordinates must exactly match shape coordinates

For text inside shapes, the text box's x/y/w/h must be exactly identical to the underlying shape, with no offset.

// ✅ Correct: Shape and text coordinates are identical
slide.addShape(pres.shapes.ROUNDED_RECTANGLE, { x:1.4, y:0.95, w:2.4, h:0.42, ... });
slide.addText("text", { x:1.4, y:0.95, w:2.4, h:0.42, margin:0, align:"center", valign:"middle" });

// ❌ Wrong: Text box doesn't match shape coordinates, causes text offset or overflow
slide.addShape(pres.shapes.ROUNDED_RECTANGLE, { x:1.4, y:0.95, w:2.4, h:0.42, ... });
slide.addText("text", { x:1.5, y:1.0,  w:2.2, h:0.35, ... });

Rule 3: Text box dimensions locked, no auto-expansion allowed

Text box w and h must be explicitly set to match the corresponding shape dimensions. If text doesn't fit, reduce font size or adjust line breaks, never rely on text box auto-expansion.

Rule 4: Drawing order must be background → foreground

Background color
Large background rectangles (header, footer, content area base)
Content graphics (node shapes, connection lines, decorations)
Text inside graphics (add text immediately after each shape, don't wait for all shapes)

Rule 5: Text direction must match original image

Judge text direction first when viewing the image, then write code:

Condition	Text Direction	Code
Text box width > height, text reads left→right normally	Horizontal (default)	No direction attribute needed
Text box height > width × 3, text top→bottom, each character upright	Vertical (Chinese)	`vert: "eaVert"`
Entire text block rotated 90° or 270°	Rotated horizontal	`rotate: 270` (or 90)

⚠️ Two wrong approaches for vertical text:

// ❌ Wrong: Using narrow text box as vertical, text becomes horizontal stacked
slide.addText("Group Data Platform", { x:0, y:1, w:0.3, h:2.0, ... });
// Result: Text horizontal, insufficient width, characters overlap

// ❌ Wrong: Using rotate for Chinese vertical, characters lie on side
slide.addText("Group Data Platform", { x:0, y:1, w:0.3, h:2.0, rotate: 270, ... });
// Result: Entire text block tilted, characters lying sideways

// ✅ Correct: vert: "eaVert" for upright Chinese vertical
slide.addText("Group Data Platform", {
  x: 0, y: 1, w: 0.4, h: 2.0,   // w sufficient for single character width (~0.3-0.5")
  fontSize: 14, bold: true, color: "FFFFFF",
  fontFace: "Microsoft YaHei",
  align: "center", valign: "middle",
  margin: 0,
  vert: "eaVert",                 // ← Key attribute for Chinese vertical
});

Step -1: Image Type Classification (Required)

After viewing image, answer these two questions to determine the path.

Classification 1: Strategy A?

All conditions met → Use Strategy A:

Condition	Description
All graphics are flat basic shapes	Rectangles, rounded rectangles, diamonds, ellipses, lines, arrows
No lighting gradients, no transparency layers	Each element solid fill, clear color boundaries
No curve-arranged elements	All elements arranged in rows/columns or flowcharts

Typical scenarios: Flowcharts, architecture diagrams, org charts, data tables. Proceed directly to Step 0, no need to inform user.

Classification 2: Strategy B?

Any of these signals present, but can be geometrically decomposed → Use Strategy B:

Complex visual signal (any)	And can be geometrically decomposed (all must satisfy)
3D perspective / isometric graphics	Graphics can be approximated with triangles, ellipses, rectangles
Glow / fan-shaped beams / gradient backgrounds	Layering differences can simulate depth with transparency
Decorative elements arranged along circular arcs	Can calculate coordinates with parametric equations
Multi-layer stacked 3D effects	Can use "large shape + white cover" to achieve cutout rings

Decision mnemonic: "Flatten" the graphic in your mind—if after flattening it can be reconstructed with triangles + ellipses + rectangles + transparency combinations, use Strategy B.

Typical scenarios: Circular supply chain platform diagrams, funnel + glow combinations, dashboard schematics.

Inform user:

"This image contains [specific description], will use 'Mathematical Approximation' strategy: overlay geometric shapes + simulate lighting layers with transparency, all elements fully editable, visual fidelity ~75-85% (abandoning pixel-level lighting details, preserving overall visual structure)."

Strategy B: Mathematical Approximation (Detailed Steps)

Core idea: Don't pursue pixel-perfect replication, decompose complex visual graphics into several "geometric layers", use transparency differences to simulate lighting layers, use parametric equations to calculate coordinates of curve-arranged elements, trading for 100% editability.

Practical verification: For complex circular supply chain platform diagrams, can actually achieve:

Fan-shaped beams → 5 triangles with different transparencies
3D cylinder → 3-4 nested ellipses + white ellipse cutout
Arc decorative ring → 52 small rectangles arranged by parametric equations
3D tower → 9 rectangle columns with different heights + transparencies
All code generated, zero images, 100% editable

Step B2: Layer Planning

When viewing image, decompose complex graphics from bottom to top into geometric layers, each corresponding to an approximation method:

Layer	Original Effect	Approximation Method
L1	Background glow / fan beams	N triangles, same color different transparencies, expanding from center outward
L2	Circular platform base color	Large ellipse (semi-transparent) + small white ellipse (cutout inner ring)
L3	Decorative elements along arc	Small rectangles, coordinates calculated with ellipse parametric equations
L4	Central column / tower structure	Rectangle group, center highest and darkest, gradually shorter and more transparent toward sides
L5	Leader lines pointing to labels	Polyline (vertical line + horizontal line, two LINE segments拼接)
L6	All text labels	addText, coordinates precisely matching original positions

Layer Record Table (fill before writing code):

Layer	Description	Shape Type	Count	Color	Transparency Range	Key Dimensions
L1	Background fan beams	triangle	5	F4CACA	55-78%	Cover corresponding fan sector
L2	Large ellipse platform	ellipse×3	3	F8DADA	20-38%	cx=5.5 rx=3.5 ry=1.0
L3	Arc decorative ring	rectangle	52	D85D5D	0%	0.08×0.15"
L4	Central tower	rectangle	9	B50E17	0-92%	w=0.05"
L5	Leader lines	line×2	N groups	888888	0%	—
L6	Text labels	text	N	A64F4F	0%	Positioned by OCR

Step B3: Code Implementation (Three Core Techniques)

Technique 1: Transparency Control (Simulate Lighting Layers)

transparency is the most core parameter in Strategy B, range 0 (fully opaque) ~ 100 (fully transparent). Larger values are more transparent, opposite of intuition, requires special attention.

// ── Fan-shaped beams: multiple triangles, same color, different transparencies ──────────────────────────
// Closer to center = less transparent (smaller value), farther out = more transparent (larger value)

slide.addShape(pres.shapes.TRIANGLE, {   // Central triangle, 22% opaque
  x: 3.5, y: 1.0, w: 3.0, h: 4.0,
  fill: { color: "F4CACA", transparency: 78 },
  line: { color: "F4CACA", width: 0 },   // ← Must be width:0, otherwise hard edges
});
slide.addShape(pres.shapes.TRIANGLE, {   // Left triangle, 34% opaque
  x: 1.5, y: 1.5, w: 3.0, h: 3.5,
  fill: { color: "F4CACA", transparency: 66 },
  line: { color: "F4CACA", width: 0 },
});
slide.addShape(pres.shapes.TRIANGLE, {   // Outermost triangle, 45% opaque
  x: 0.0, y: 1.5, w: 3.0, h: 3.5,
  fill: { color: "F4CACA", transparency: 55 },
  line: { color: "F4CACA", width: 0 },
});

// ── Multi-layer ellipse overlay + white cutout (circular platform) ─────────────────────────────────────
slide.addShape(pres.shapes.OVAL, {       // Large ellipse, platform base color
  x: 1.5, y: 3.0, w: 7.0, h: 2.0,
  fill: { color: "F8DADA", transparency: 20 },
  line: { color: "F8DADA", width: 0 },
});
slide.addShape(pres.shapes.OVAL, {       // White ellipse, covers inner ring → forms ring shape
  x: 2.5, y: 3.2, w: 5.0, h: 1.4,
  fill: { color: "FFFFFF", transparency: 0 },
  line: { color: "FFFFFF", width: 0 },
});
slide.addShape(pres.shapes.OVAL, {       // Thin ring outline
  x: 2.3, y: 3.15, w: 5.4, h: 1.5,
  fill: { color: "F5CACA", transparency: 38 },
  line: { color: "D85D5D", width: 1.5 },
});

Transparency Quick Reference:

transparency value	Visual Effect	Typical Use
0	Fully opaque (solid)	Main structure, accent colors
20-30	Slightly transparent, color saturated	Platform base color, main areas
40-55	Semi-transparent, layered feel	Mid-layer glow
60-75	Quite transparent, outline still clear	Outer glow
85-92	Very transparent, nearly invisible	Far beams, decorative columns

Technique 2: Math Coordinate Calculation (Elements arranged along ellipse path)

// ── N small rectangles evenly arranged along ellipse path (arc decorative ring) ─────────────────────────────
// cx/cy = ellipse center (inches), rx/ry = horizontal/vertical radius (inches)
// ew/eh = width/height of each small element (inches)

const cx = 5.5,  cy = 4.5;   // Ellipse center
const rx = 3.2,  ry = 0.6;   // Ellipse radius
const N  = 52;                // Total elements
const ew = 0.08, eh = 0.15;  // Small rectangle dimensions

for (let i = 0; i < N; i++) {
  const theta = (2 * Math.PI * i) / N;          // Even angles, 0 → 2π
  const x = cx + rx * Math.cos(theta) - ew / 2; // Element top-left x
  const y = cy + ry * Math.sin(theta) - eh / 2; // Element top-left y
  const color = i % 3 === 0 ? "D85D5D" : "EE8D8D"; // Alternating colors

  slide.addShape(pres.shapes.RECTANGLE, {
    x, y, w: ew, h: eh,
    fill: { color },
    line: { color, width: 0 },
  });
}

Center and radius estimation method:

cx = Ellipse horizontal center (inches) = center pixel x ÷ image width px × 10
cy = Ellipse vertical center (inches) = center pixel y ÷ image height px × 5.625
rx = Ellipse horizontal radius (inches) = radius pixels ÷ image width px × 10
ry = Ellipse vertical radius (inches) = radius pixels ÷ image height px × 5.625

Technique 3: Gradient Column Structure (Center highest and darkest, gradually shorter and more transparent toward sides)

// ── Central tower: 9 columns, center highest and most solid, gradually shorter and more transparent toward sides ─────────────────────
const baseY  = 3.93;  // Column bottom y (inches, shared by all columns)
const colW   = 0.05;  // Column width (inches)

// [x coordinate, height, transparency]  Center column transparency=0 (solid)
const columns = [
  [4.944, 0.82, 92],
  [5.034, 1.00, 92],
  [5.124, 1.18, 92],
  [5.214, 1.55, 92],
  [5.304, 2.15,  0],   // ← Center main column, fully opaque
  [5.349, 2.36,  0],   // ← Second highest column, fully opaque
  [5.484, 1.55, 92],
  [5.574, 1.00, 92],
  [5.664, 0.82, 92],
];

columns.forEach(([x, h, transp]) => {
  const color = transp === 0 ? "B50E17" : "D76666";
  slide.addShape(pres.shapes.RECTANGLE, {
    x: x - colW / 2,
    y: baseY - h,       // Column grows upward from bottom
    w: colW,
    h,
    fill: { color, transparency: transp },
    line: { color, width: 0 },
  });
});

Polyline Leaders (L-shaped, connecting graphics to text labels)

// ── Leader = vertical segment + horizontal segment, two LINE segments拼接 ──────────────────────────────
// anchorX/Y = Leader start point on graphic
// labelX/Y  = Label position

const anchorX = 1.25, anchorY = 4.12;
const labelY  = 2.70, labelX  = 1.00;

// Vertical line: from start point up to label height
slide.addShape(pres.shapes.LINE, {
  x: anchorX, y: labelY, w: 0, h: anchorY - labelY,
  line: { color: "888888", width: 0.75 },
});
// Horizontal line: from vertical top horizontally to label
slide.addShape(pres.shapes.LINE, {
  x: labelX, y: labelY, w: anchorX - labelX, h: 0,
  line: { color: "888888", width: 0.75 },
});

Strategy B QA Focus

After Step 5 generates preview image, in addition to Strategy A general checks, additionally check:

Check Item	Judgment Criteria
Transparency layers natural	Glow naturally attenuates from center outward, no abrupt jumps
Triangles have hard edges	Triangle edges show dark outline (is `line.width` 0)
Ellipse path elements closed and even	Arc decorative ring fully closed, no obvious uneven spacing
Cutout effect clean	White cover ellipse edges aligned, no color gaps showing
Column structure symmetrical	Tower left-right symmetrical about central axis
Leader corner continuous	Vertical line endpoint and horizontal line start coordinates precisely connected

Step 0: Perspective Correction (Required for photos, skip for screenshots)

Judgment: Edges not parallel, shot from side, corners not right angles → Must correct.

# scripts/correct_perspective.py
from PIL import Image
import numpy as np

def order_points(pts):
    pts = np.array(pts, dtype="float32")
    s, diff = pts.sum(axis=1), np.diff(pts, axis=1).flatten()
    return np.array([pts[np.argmin(s)], pts[np.argmin(diff)],
                     pts[np.argmax(s)], pts[np.argmax(diff)]], dtype="float32")

def _find_coeffs(pa, pb):
    matrix = []
    for p1, p2 in zip(pa, pb):
        matrix.append([p1[0], p1[1], 1, 0, 0, 0, -p2[0]*p1[0], -p2[0]*p1[1]])
        matrix.append([0, 0, 0, p1[0], p1[1], 1, -p2[1]*p1[0], -p2[1]*p1[1]])
    A = np.matrix(matrix, dtype=float)
    B = np.array(pb).reshape(8)
    return np.array(np.dot(np.linalg.inv(A.T * A) * A.T, B)).reshape(8)

def perspective_correct(src_path, corners, dst_path, w=1920, h=1080):
    img = Image.open(src_path)
    src = order_points(corners)
    dst = np.array([[0,0],[w-1,0],[w-1,h-1],[0,h-1]], dtype="float32")
    img.transform((w,h), Image.PERSPECTIVE, _find_coeffs(dst,src), Image.BICUBIC).save(dst_path)
    print(f"✓ Correction complete → {dst_path}")

perspective_correct(
    src_path="/mnt/user-data/uploads/your_photo.jpg",
    corners=[[120,85],[1800,60],[1820,980],[100,1000]],  # ← Replace after viewing image with view tool
    dst_path="/home/claude/corrected.jpg"
)

python scripts/correct_perspective.py
# Use view tool to confirm corrected.jpg has straight edges before continuing

All subsequent steps use corrected.jpg after correction.

Step 1: Data Extraction (Three items: colors + text + shapes)

1-A Color Extraction (Program runs)

# scripts/extract_colors.py
import sys
from PIL import Image

def sample(img, x1, y1, x2, y2, n=8):
    x1,y1,x2,y2 = int(x1)+5,int(y1)+5,int(x2)-5,int(y2)-5
    if x2<=x1 or y2<=y1: x2,y2=x1+1,y1+1
    xs=[x1+(x2-x1)*i//(n-1) for i in range(n)]
    ys=[y1+(y2-y1)*i//(n-1) for i in range(n)]
    cols=[img.getpixel((x,y))[:3] for x in xs for y in ys]
    r=sorted(c[0] for c in cols)[len(cols)//2]
    g=sorted(c[1] for c in cols)[len(cols)//2]
    b=sorted(c[2] for c in cols)[len(cols)//2]
    return f"{r:02X}{g:02X}{b:02X}"

img_path = sys.argv[1] if len(sys.argv)>1 else "/mnt/user-data/uploads/your_image.jpg"
img = Image.open(img_path).convert("RGB")
W, H = img.size

print(f"Image size: {W} × {H} px")
print(f"Coordinate conversion: x\" = px_x × {10/W:.5f}   y\" = px_y × {5.625/H:.5f}")
print(f"                      w\" = px_w × {10/W:.5f}   h\" = px_h × {5.625/H:.5f}\n")

regions = {
    "Background":     (W*.4,  H*.4,  W*.6,  H*.6),
    "Header":         (W*.01, H*.01, W*.99, H*.13),
    "Title Text":     (W*.03, H*.02, W*.65, H*.11),
    "Content Area":   (W*.05, H*.18, W*.95, H*.82),
    "Footer":         (W*.01, H*.87, W*.99, H*.99),
}
print(f"{'Region':<12} {'hex':<8}  Sampling range (px)")
print("-" * 52)
for name,(x1,y1,x2,y2) in regions.items():
    print(f"{name:<12} #{sample(img,x1,y1,x2,y2)}    ({int(x1)},{int(y1)})→({int(x2)},{int(y2)})")
print("\n# Single point sampling: r,g,b=img.getpixel((x,y))[:3]; print(f'{r:02X}{g:02X}{b:02X}')")

python scripts/extract_colors.py /mnt/user-data/uploads/your_image.jpg

Record all hex values, copy directly in Step 3, no visual estimation.

1-B Text Extraction (OCR)

# scripts/extract_text.py
import sys, subprocess
from PIL import Image

def ensure_deps():
    try:
        import pytesseract
        pytesseract.get_tesseract_version()
        return pytesseract
    except Exception:
        print("Installing tesseract-ocr...")
        subprocess.run(["apt-get","install","-y","-q",
                        "tesseract-ocr","tesseract-ocr-chi-sim"], check=True)
        subprocess.run([sys.executable,"-m","pip","install","pytesseract",
                        "--break-system-packages","-q"], check=True)
        import pytesseract
        return pytesseract

def ocr(img, x1, y1, x2, y2, scale=2, lang="chi_sim+eng"):
    crop=img.crop((int(x1),int(y1),int(x2),int(y2)))
    crop=crop.resize((crop.width*scale,crop.height*scale),Image.LANCZOS)
    raw=tess.image_to_string(crop,lang=lang).strip()
    return "\n".join(l for l in raw.splitlines() if l.strip())

tess=ensure_deps()
img_path=sys.argv[1] if len(sys.argv)>1 else "/mnt/user-data/uploads/your_image.jpg"
img=Image.open(img_path).convert("RGB")
W,H=img.size

print("=== Full Image Scan ===")
print(tess.image_to_string(img,lang="chi_sim+eng").strip())

regions={"Header":(W*.02,H*.01,W*.88,H*.13),"Footer Left":(W*.02,H*.87,W*.38,H*.99),
         "Footer Center":(W*.38,H*.87,W*.72,H*.99),"Footer Right":(W*.72,H*.87,W*.98,H*.99)}
print("\n=== Region Extraction ===")
for name,(x1,y1,x2,y2) in regions.items():
    print(f"\n[{name}] {repr(ocr(img,x1,y1,x2,y2))}")
print("\n# Custom: ocr(img, x1, y1, x2, y2, scale=2)")

python scripts/extract_text.py /mnt/user-data/uploads/your_image.jpg

OCR uses image as final authority, mainly for quickly obtaining long text and numbers, avoiding manual typing errors.

1-C Shape Recognition + Text Direction Recognition (Visual, compare with references/shapes.md)

Open references/shapes.md, compare and identify each graphic and text label, record to table:

ID	Position Description	Shape Constant	Arrow	Fill Color	Border Color	Inner Text	Text Direction
S1	Header background	`RECTANGLE`	No	8B1A1A	Same as fill	—	—
S2	Left flow node 1	`ROUNDED_RECTANGLE`	No	D8EAF5	A0C4E0	"Start"	Horizontal
S3	Node 1→Node 2 connector	`LINE`	Yes, triangle	888888	—	—	—
T1	Left area label	`RECTANGLE`	No	Dark blue	—	"Group Data Platform"	Vertical
…	…	…	…	…	…	…	…

Text direction judgment (confirm one by one when viewing image):

Observe each text label:
  Text box width > height       → Horizontal (default, no vert needed)
  Text box height > width × 3   → Vertical, characters upright → use vert: "eaVert"
  Entire text block tilted 90°/270°    → Rotated horizontal → use rotate: 270

Special attention to these common vertical labels (narrow strip text on left/right sides of images are almost always vertical):

Hierarchy area labels: "Group Data Platform", "Data Consumption", "Data Lake", "Data Source", etc.
Right side vertical description strips: "Data Services", "Data Governance and Control", etc.

Connector arrow judgment:

Thin line with solid/hollow small triangle at end → LINE + endArrowType: "triangle" or "open"
Thick filled arrow shape → RIGHT_ARROW / DOWN_ARROW etc. shape constants
Pure connector without arrow → LINE, don't set endArrowType

Step 2: Visual Layout Planning

Eyes on image, divide slide into large regions, record inch coordinates for each block using conversion formula.

Coordinate system: Top-left origin, unit inches. Slide = 10" × 5.625".

Conversion formula (coefficients printed in Step 1-A, use directly):

x" = pixel_x  × (10    / image width px)
y" = pixel_y  × (5.625 / image height px)
w" = pixel_w × (10    / image width px)
h" = pixel_h × (5.625 / image height px)

Region Record Table (fill before writing code):

|Region|x"|y"|w"|h"|Notes| |-|-|-|-|-| |Header|0|0|10|0.65|Top edge aligned| |Footer|0|5.0|10|0.625|Bottom edge aligned| |Left content area|0.2|0.75|…|…|Estimate| |…|…|…|…|…|…|

Step 3: Code Implementation

Each element's color from Step 1-A, text from Step 1-B, shape constant from Step 1-C, coordinates from Step 2. No reliance on memory or guesswork, all values have clear sources.

After writing each region, immediately mentally compare with original image before continuing to next block.

Script Template

// create_slide.js
const pptxgen = require("pptxgenjs");
let pres = new pptxgen();
pres.layout = 'LAYOUT_16x9';
let slide = pres.addSlide();

// ── Layer 1: Background ──────────────────────────────────────────────────────────
slide.background = { color: "FFFFFF" };  // ← Background color from Step 1-A

// ── Layer 2: Large background rectangles ───────────────────────────────────────────────────

// Header
slide.addShape(pres.shapes.RECTANGLE, {
  x: 0, y: 0, w: 10, h: 0.65,
  fill: { color: "8B1A1A" }, line: { color: "8B1A1A", width: 0 }
});
// Header text (coordinates exactly match header, margin: 0)
slide.addText("[Step 1-B OCR result]", {
  x: 0, y: 0, w: 10, h: 0.65,
  fontSize: 22, bold: true, color: "FFFFFF",
  fontFace: "Microsoft YaHei",
  align: "left", valign: "middle",
  margin: 0,   // ← Rule 1: must be 0
});

// Footer
slide.addShape(pres.shapes.RECTANGLE, {
  x: 0, y: 5.0, w: 10, h: 0.625,
  fill: { color: "8B1A1A" }, line: { color: "8B1A1A", width: 0 }
});
slide.addText("[Step 1-B OCR result]", {
  x: 0, y: 5.0, w: 10, h: 0.625,
  fontSize: 13, bold: true, color: "FFFFFF",
  fontFace: "Microsoft YaHei",
  align: "left", valign: "middle",
  margin: 0,
});

// ── Layer 3: Content graphics + Layer 4: Graphic text (alternate drawing, add text immediately after each shape) ──

// Example: Flow node
slide.addShape(pres.shapes.ROUNDED_RECTANGLE, {  // ← Shape constant from Step 1-C
  x: 1.4, y: 0.95, w: 2.4, h: 0.42,
  fill: { color: "D8EAF5" },
  line: { color: "A0C4E0", width: 1 },
  rectRadius: 0.05
});
slide.addText("Node text", {
  x: 1.4, y: 0.95, w: 2.4, h: 0.42,  // ← Rule 2: exactly matches shape above
  fontSize: 11, color: "333333",
  fontFace: "Microsoft YaHei",
  align: "center", valign: "middle",
  margin: 0,  // ← Rule 1
  wrap: true,
});

// Example: Connector (with arrow)
slide.addShape(pres.shapes.LINE, {
  x: 2.6, y: 1.37, w: 0, h: 0.25,    // Vertical line: w=0,h>0
  line: {
    color: "888888", width: 1.5,
    endArrowType: "triangle",          // ← Step 1-C: use triangle/open if arrow present, omit if no arrow
  }
});

pres.writeFile({ fileName: "/mnt/user-data/outputs/output.pptx" });
console.log("Done!");

NODE_PATH=/home/claude/.npm-global/lib/node_modules node create_slide.js

Step 3.5: Pre-flight Check (Required before execution, do not skip)

Core principle: Don't execute immediately after writing script. Do paper verification first—"render" script coordinates in mind, compare with original image, fix issues on the spot, then submit for execution. This step intercepts most overlap, out-of-bounds, and text overflow issues before execution.

Check A: Automated Out-of-bounds + Overlap Scan

Run scripts/preflight.py, automatically parse all element coordinates in script and report issues:

python scripts/preflight.py create_slide.js

Output explanation:

❌ Out of bounds: Elements with x+w > 10 or y+h > 5.625 → Must fix before execution
⚠️ Suspected overlap: Non-parent-child rectangle intersections between two elements → Manual confirmation if reasonable
📋 Coordinate summary: All elements sorted by y, convenient for comparing region proportions with original image

Any ❌ out of bounds must be fixed and re-run until no ❌ before entering Step 4.

Check B: Text Capacity Manual Verification

For all addText in script, verify one by one if text fits.

Horizontal Text

Required width = character count × (fontSize / 72 × 1.1) inches
Required height = line count × (fontSize / 72 × 1.5) inches
Requirement: w ≥ required width, h ≥ required height

Quick reference (fontSize = 12pt):

Content	Min w	Min h
4 Chinese characters, single line	0.74"	0.25"
6 Chinese characters, single line	1.10"	0.25"
4 Chinese characters, two lines	0.74"	0.45"
10 English characters	0.80"	0.25"

Vertical Text (`vert: "eaVert"`)

Required width = fontSize / 72 × 1.4 inches (single column character width)
Required height = character count × (fontSize / 72 × 1.3) inches
Requirement: w ≥ required width, h ≥ required height

Quick reference (fontSize = 14pt):

Vertical characters	Min w	Min h
4 chars	0.27"	1.05"
6 chars	0.27"	1.57"
8 chars	0.27"	2.10"

Text doesn't fit → Fix before execution: Reduce fontSize, expand w/h, or use breakLine for manual line breaks.

Check C: Region-by-Region Proportion Verification (Compare with original image)

Use coordinate summary table from preflight.py output, compare with original image for proportion verification:

1. Horizontally slice original image into major horizontal regions (header, content area, footer…)
2. Visually estimate each region's percentage of total slide height
3. Compare with corresponding element's h/5.625 in summary table, see if close to visual percentage
4. Regions with deviation > 15% need re-estimation of y and h

Focus on right-side truncation check:
  → Rightmost element's x+w ≤ 9.8" (leave 0.2" safety margin)
  → When right side has vertical label bar, confirm its x doesn't exceed 10 - label width

Check D: Parent-Child Container Relationship Verification

For nested regions (large background box containing multiple child elements), check all child elements are within parent box range:

Child element requirements:
  child.x        ≥ parent.x
  child.x + child.w ≤ parent.x + parent.w
  child.y        ≥ parent.y
  child.y + child.h ≤ parent.y + parent.h

Most common violation: Last column of nested grid x+w exceeds parent box right boundary. Fix method:

// Recalculate grid column width, ensure last column doesn't exceed bounds
const cellW = (parentW - padLeft - padRight) / cols;
const cellX = (i) => parentX + padLeft + i * cellW;

Pass Criteria

All satisfied before entering Step 4:

Item	Pass Criteria
✅ No out of bounds	preflight.py has no ❌ output
✅ Text fits	All text boxes w/h satisfy capacity calculation
✅ Region proportions reasonable	Each region height deviation from original visual ≤ 15%
✅ Child elements within parent	All nested elements don't exceed parent box boundary
✅ Overlaps confirmed	preflight.py ⚠️ warnings all manually confirmed reasonable

Fix unpassed items directly in script, re-run preflight.py after fix, until all pass.

Step 4: Implementation Methods for Various Element Types

4.1 Various Nodes (Shape + text must have identical coordinates)

// Rectangle node (processing step)
slide.addShape(pres.shapes.RECTANGLE, { x:1.0, y:1.0, w:2.0, h:0.5,
  fill:{color:"F5F5F5"}, line:{color:"AAAAAA",width:1} });
slide.addText("Processing Step", { x:1.0, y:1.0, w:2.0, h:0.5,
  fontSize:11, color:"333333", fontFace:"Microsoft YaHei",
  align:"center", valign:"middle", margin:0, wrap:true });

// Diamond node (decision branch)
slide.addShape(pres.shapes.DIAMOND, { x:1.0, y:2.0, w:2.0, h:0.6,
  fill:{color:"FFF2CC"}, line:{color:"CCAA00",width:1} });
slide.addText("Yes?", { x:1.0, y:2.0, w:2.0, h:0.6,
  fontSize:11, color:"333333", fontFace:"Microsoft YaHei",
  align:"center", valign:"middle", margin:0, wrap:true });

// Ellipse node (start/end)
slide.addShape(pres.shapes.OVAL, { x:1.2, y:0.3, w:1.6, h:0.45,
  fill:{color:"D5E8D4"}, line:{color:"82B366",width:1} });
slide.addText("Start", { x:1.2, y:0.3, w:1.6, h:0.45,
  fontSize:11, color:"333333", fontFace:"Microsoft YaHei",
  align:"center", valign:"middle", margin:0 });

4.2 Connectors (Choose arrow type based on Step 1-C)

// Connector with arrow (solid triangle at end)
slide.addShape(pres.shapes.LINE, { x:2.0, y:1.5, w:0, h:0.3,
  line:{ color:"888888", width:1.5, endArrowType:"triangle" } });

// Connector with arrow (open arrow at end)
slide.addShape(pres.shapes.LINE, { x:2.0, y:1.5, w:0, h:0.3,
  line:{ color:"888888", width:1.5, endArrowType:"open" } });

// Pure connector without arrow
slide.addShape(pres.shapes.LINE, { x:2.0, y:1.5, w:0, h:0.3,
  line:{ color:"CCCCCC", width:1 } });

endArrowType options: "triangle" (solid), "open" (hollow), "stealth" (stealth), "diamond", "oval", "none"

4.3 Dual-Theme Nodes (Main title + subtitle)

slide.addShape(pres.shapes.ROUNDED_RECTANGLE, { x:0.55, y:1.62, w:4.1, h:0.55,
  fill:{color:"E8E0F0"}, line:{color:"C0A8D8",width:1}, rectRadius:0.05 });
slide.addText([
  { text:"Main Title", options:{ breakLine:true } },
  { text:"Subtitle description, can be smaller", options:{ fontSize:7.5, color:"666666" } }
], { x:0.55, y:1.62, w:4.1, h:0.55,
  fontSize:10, color:"444444", fontFace:"Microsoft YaHei",
  align:"center", valign:"middle", margin:0, wrap:true });

4.4 Annotation Bubbles

slide.addShape(pres.shapes.RECTANGULAR_CALLOUT, { x:1.0, y:0.5, w:2.5, h:0.6,
  fill:{color:"FFFDE7"}, line:{color:"CCAA00",width:1} });
slide.addText("Annotation text", { x:1.0, y:0.5, w:2.5, h:0.6,
  fontSize:10, color:"333333", fontFace:"Microsoft YaHei",
  align:"center", valign:"middle", margin:0, wrap:true });

4.5 Mixed Style Text (Links + normal)

slide.addText([
  { text:"Link text", options:{ color:"1155CC", underline:{style:"sng"} } },
  { text:" and ",    options:{ color:"333333" } },
  { text:"Another link", options:{ color:"1155CC", underline:{style:"sng"} } },
  { text:" is description.", options:{ color:"333333" } }
], { x:5.55, y:3.3, w:4.2, h:0.35,
  fontSize:12, fontFace:"Microsoft YaHei", margin:0 });

4.6 Vertical Text Labels (Standard for sidebar area labels)

Applicable scenarios: Narrow strip area labels on left or right side of image, such as "Group Data Platform", "Data Lake", "Data Services", etc.

// ── Vertical text: vert: "eaVert" (each character upright, read top to bottom) ───────────────────

// Vertical label with background (most common form)
slide.addShape(pres.shapes.RECTANGLE, {
  x: 0, y: 0.65, w: 0.4, h: 3.5,        // Width ~0.35-0.5", height fills area
  fill: { color: "1F4E79" }, line: { color: "1F4E79", width: 0 }
});
slide.addText("Group Data Platform", {
  x: 0, y: 0.65, w: 0.4, h: 3.5,        // ← Exactly matches shape
  fontSize: 14, bold: true, color: "FFFFFF",
  fontFace: "Microsoft YaHei",
  align: "center", valign: "middle",
  margin: 0,
  vert: "eaVert",                         // ← Chinese vertical, characters upright
});

// Pure vertical text without background
slide.addText("Data Services", {
  x: 9.2, y: 1.0, w: 0.35, h: 2.0,
  fontSize: 12, color: "2E75B6",
  fontFace: "Microsoft YaHei",
  align: "center", valign: "middle",
  margin: 0,
  vert: "eaVert",
});

Dimension rules:

w (width) = Single character width + small margin, approximately fontSize / 72 * 1.4 (inches)
- fontSize 12pt → w ≈ 0.28", suggest 0.35"
- fontSize 14pt → w ≈ 0.33", suggest 0.4"
- fontSize 16pt → w ≈ 0.38", suggest 0.45"
h (height) = Actual height of corresponding region, converted from image

4.7 Correct Handling of Compressed Horizontal Labels

Applicable scenarios: Horizontal region labels at top or bottom of image (such as "Data Consumption", "Data Source"), text is normal horizontal but text box is narrow.

// ── Correct: Horizontal labels need sufficient width, don't compress to achieve vertical effect ──────────────────────

// ❌ Wrong: Width too narrow, text forced to wrap into "vertical column", actually horizontal compression
slide.addText("Data Consumption", {
  x: 0, y: 0, w: 0.3, h: 0.8,    // w too narrow, each character auto-wraps
  fontSize: 16, ...
});

// ✅ Correct: Give sufficient width to accommodate all text, if space insufficient reduce font size
slide.addText("Data Consumption", {
  x: 0, y: 0, w: 0.8, h: 0.8,    // w sufficient for horizontal text
  fontSize: 14, bold: true, color: "FFFFFF",
  fontFace: "Microsoft YaHei",
  align: "center", valign: "middle",
  margin: 0,
  wrap: false,                     // ← Prohibit wrapping, ensure single line display
});

// ✅ If region really narrow must wrap, use breakLine for manual line break control
slide.addText([
  { text: "Data", options: { breakLine: true } },
  { text: "Consumption" }
], {
  x: 0, y: 0, w: 0.45, h: 0.8,
  fontSize: 14, bold: true, color: "FFFFFF",
  fontFace: "Microsoft YaHei",
  align: "center", valign: "middle",
  margin: 0,
});

Horizontal vs Vertical Quick Reference:

Original Effect	Characters upright?	Correct approach
Text left to right, normal reading	✅ Upright	Horizontal, no `vert` needed
Text top to bottom, each character upright	✅ Upright	Vertical, add `vert: "eaVert"`
Entire text block rotated (horizontal but on side)	❌ On side	Add `rotate: 270`
Text wraps into column but characters horizontal (compressed)	✅ Upright but squeezed	Horizontal + manual `breakLine` + appropriate `w`

const slideW = 10, slideH = 5.625;

// Horizontal center
const x = (slideW - w) / 2;

// Node bottom center (connector start point)
const lineX = nodeX + nodeW / 2;
const lineY = nodeY + nodeH;

// n elements evenly spaced (total range rangeW, start x0, each width itemW)
const gap = (rangeW - n * itemW) / (n + 1);
const itemX = (i) => x0 + gap + i * (itemW + gap);

Step 5: Visual QA + Correction Loop (Required, max 3 iterations)

Generate Preview Image

python /mnt/skills/public/pptx/scripts/office/soffice.py \
  --headless --convert-to pdf /mnt/user-data/outputs/output.pptx

rm -f /home/claude/slide-*.jpg
pdftoppm -jpeg -r 150 /home/claude/output.pdf /home/claude/slide
ls -1 /home/claude/slide*.jpg

Use view tool to view preview image, compare side-by-side with original image, execute following two-phase checks.

Phase 1: Layout Compliance

Check Item	Judgment Criteria
Text out of box	Text exceeds graphic boundary or truncated
Text box exceeds graphic	Text box dimensions larger than underlying shape, text appears outside shape
Vertical text correct	Sidebar label characters upright (`eaVert`); horizontal compressed text not mistakenly made vertical
Horizontal text compressed	Horizontal label `w` sufficient, text not overlapping or misaligned due to insufficient width
Elements unexpectedly overlapping	Text covered by graphics, or graphics block other text
Connector arrow direction	Arrow direction matches original image (start/end reversed?)
Arrangement orderly	Peer elements aligned, evenly spaced, no obvious misalignment
Spacing reasonable	Adjacent element spacing ≥ 0.1", no edge touching
Page margins	Content distance from slide edge ≥ 0.2" (header/footer excepted)

Phase 2: Fidelity to Original Image

Check Item	Judgment Criteria
🎨 Colors	Background, graphic fill, text color match original image
🔷 Shape types	Rectangle/rounded/diamond/ellipse etc. match original image
📏 Proportions	Element width/height ratios, region proportions match original image
↔️ Arrow types	Solid/hollow/thick arrow shapes match original image
🔤 Font size hierarchy	Title/body/annotation font size relationships match original image
↔️ Alignment	Text left/center/right alignment matches original image

Correction Loop Control (Strictly Enforced)

Correction counter = 0

LOOP:
  Generate preview → Phase 1 check → Phase 2 check

  if no issues → Deliver, end

  if correction counter >= 3:
    Deliver directly, note: "Completed 3 correction rounds, following issues unresolved: [list]"
    End

  Modify create_slide.js corresponding lines
  Regenerate pptx and preview
  Correction counter += 1
  → Continue LOOP

Explain each round: ① Layout issues found ② Fidelity issues found ③ Which code modified ④ Round N/3

Step 6: Common Errors and Fixes

Error Phenomenon	Cause	Fix
Vertical label characters on side	Used `rotate: 270` instead of `vert`	Use `vert: "eaVert"` instead
Vertical label characters overlapping	Text box `w` too narrow without `vert`	Add `vert: "eaVert"`, set `w` to 0.35-0.5"
Horizontal label text wraps into column	Text box `w` too narrow, text forced to wrap	Increase `w` for single line, or use `breakLine` for manual line break control
Horizontal label added `vert` causing chaos	Misjudged as vertical	Remove `vert`, ensure `w` wide enough
Text position offset	Forgot `margin: 0`	Add `margin: 0` to all addText
Text exceeds graphic boundary	Text box dimensions don't match shape	Align text box x/y/w/h exactly with shape
Text expanded text box	Didn't set `wrap: true` and fixed dimensions	Add `wrap: true`, appropriately reduce font size
Arrow direction reversed	`endArrowType` on wrong side	Swap `begin/endArrowType`, or adjust x/y/w/h direction
Color display wrong	Added `#` before hex	Remove `#`: `"FF0000"` ✅
Garbled/Chinese characters as blocks	Font not set	Add `fontFace: "Microsoft YaHei"`
ROUNDED_RECTANGLE corner has white edge	Text box covers corner	Text box same coordinates as shape + `margin:0`
Diamond text crooked	valign not set	Add `align:"center", valign:"middle"`
Color sampling deviation	Sampled border/shadow pixels	Move sampling coordinates to region center, 10px from edge

Strategy B Specific

Error Phenomenon	Cause	Fix
Triangle/shape has obvious hard edge outline	`line.width` not set to 0	Add `line: { color: same as fill, width: 0 }`
Glow layers reversed (outer dark inner light)	`transparency` value order wrong	Outer `transparency` smaller (more opaque) → swap values
Arc decorative ring not closed	N value insufficient or angle range wrong	Confirm `theta` from 0 to `2 * Math.PI`, appropriately increase N
Arc element position not on ellipse	cx/cy/rx/ry estimation wrong	Recalculate center and radius from pixel coordinates
Cutout white ellipse has color gap	White ellipse coverage insufficient	Slightly expand white ellipse w/h, ensure complete inner ring coverage
Column structure asymmetrical	x coordinate calculation error	Use center column as benchmark, mirror symmetric offset left-right
Leader corner has gap	Vertical and horizontal line endpoint coordinates not continuous	Vertical line start y = horizontal line y; horizontal line start x = vertical line x

References

pptxgenjs API: references/pptxgenjs-cheatsheet.md
Shape recognition and constants quick reference: references/shapes.md