Image To PPT Pro

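Recreates any image or slide screenshot as a fully editable PPTX file, reproducing layout, colors, text, and graphic elements with pixel-level fidelity.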

Audits

Pass

Install

openclaw skills install image-to-ppt-pro

Workflow Overview

Step -1  Image Type Classification  ← Required! Determines Strategy A or B
    ↓
    ├─── Strategy A: Pure Code Replication      (Images with flat geometric shapes)
    │     Step 0    Perspective Correction      ← Required for photos; skip for screenshots
    │     Step 1    Data Extraction             ← Colors + OCR text + shape recognition
    │     Step 2    Visual Planning             ← Define regions, record coordinates
    │     Step 3    Code Implementation         ← Code each block
    │     Step 3.5  Pre-flight Check            ← ★ Paper verification before execution
    │     Step 4    Generate pptx
    │     Step 5    Visual QA + Correction Loop (max 3 iterations)
    │     Step 6    Delivery
    │
    └─── Strategy B: Mathematical Approximation (Images with 3D/lighting/curves that can be geometrically decomposed)
          Step 0    Perspective Correction      ← Required for photos; skip for screenshots
          Step 1    Data Extraction             ← Colors + OCR text + layer decomposition
          Step B2   Layer Planning              ← Decompose complex graphics into overlay layers
          Step B3   Code Implementation         ← Transparency + math coordinates + multi-shape overlay
          Step 3.5  Pre-flight Check            ← ★ Paper verification before execution
          Step 4    Generate pptx
          Step 5    Visual QA + Correction Loop (max 3 iterations)
          Step 6    Delivery

Core difference between paths: Strategy A directly replicates flat graphics; Strategy B doesn't pursue pixel-perfect replication but uses geometric overlays + transparency + math coordinates to approximate complex visuals, trading visual fidelity for 100% editability.

Value of Step 3.5: After writing the script, perform paper verification before execution to intercept out-of-bounds, overlaps, and text overflow issues, significantly reducing Step 5 correction iterations.


⚠️ Global Golden Rules (Every line of code must follow)

Rule 1: Text margin must be 0

All addText calls, whether for titles, body text, or node text, must include margin: 0.

// ✅ Correct
slide.addText("text", { x:1, y:1, w:3, h:0.5, margin: 0, ... });

// ❌ Wrong: Missing margin: 0, pptxgenjs default padding causes text position offset
slide.addText("text", { x:1, y:1, w:3, h:0.5, ... });

Rule 2: Text box coordinates must exactly match shape coordinates

For text inside shapes, the text box's x/y/w/h must be exactly identical to the underlying shape, with no offset.

// ✅ Correct: Shape and text coordinates are identical
slide.addShape(pres.shapes.ROUNDED_RECTANGLE, { x:1.4, y:0.95, w:2.4, h:0.42, ... });
slide.addText("text", { x:1.4, y:0.95, w:2.4, h:0.42, margin:0, align:"center", valign:"middle" });

// ❌ Wrong: Text box doesn't match shape coordinates, causes text offset or overflow
slide.addShape(pres.shapes.ROUNDED_RECTANGLE, { x:1.4, y:0.95, w:2.4, h:0.42, ... });
slide.addText("text", { x:1.5, y:1.0,  w:2.2, h:0.35, ... });

Rule 3: Text box dimensions locked, no auto-expansion allowed

Text box w and h must be explicitly set to match the corresponding shape dimensions. If text doesn't fit, reduce font size or adjust line breaks, never rely on text box auto-expansion.

Rule 4: Drawing order must be background → foreground

  1. Background color
  2. Large background rectangles (header, footer, content area base)
  3. Content graphics (node shapes, connection lines, decorations)
  4. Text inside graphics (add text immediately after each shape, don't wait for all shapes)

Rule 5: Text direction must match original image

Judge text direction first when viewing the image, then write code:

|Condition|Text Direction|Code|
|-|-|-|
|Text box width > height, text reads left→right normally|Horizontal (default)|No direction attribute needed|
|Text box height > width × 3, text top→bottom, each character upright|Vertical (Chinese)|vert: "eaVert"|
|Entire text block rotated 90° or 270°|Rotated horizontal|rotate: 270 (or 90)|

⚠️ Two wrong approaches for vertical text:

// ❌ Wrong: Using narrow text box as vertical, text becomes horizontal stacked
slide.addText("Group Data Platform", { x:0, y:1, w:0.3, h:2.0, ... });
// Result: Text horizontal, insufficient width, characters overlap

// ❌ Wrong: Using rotate for Chinese vertical, characters lie on side
slide.addText("Group Data Platform", { x:0, y:1, w:0.3, h:2.0, rotate: 270, ... });
// Result: Entire text block tilted, characters lying sideways

// ✅ Correct: vert: "eaVert" for upright Chinese vertical
slide.addText("Group Data Platform", {
  x: 0, y: 1, w: 0.4, h: 2.0,   // w sufficient for single character width (~0.3-0.5")
  fontSize: 14, bold: true, color: "FFFFFF",
  fontFace: "Microsoft YaHei",
  align: "center", valign: "middle",
  margin: 0,
  vert: "eaVert",                 // ← Key attribute for Chinese vertical
});

Step -1: Image Type Classification (Required)

After viewing image, answer these two questions to determine the path.


Classification 1: Strategy A?

All conditions met → Use Strategy A:

|Condition|Description|
|-|-|
|All graphics are flat basic shapes|Rectangles, rounded rectangles, diamonds, ellipses, lines, arrows|
|No lighting gradients, no transparency layers|Each element solid fill, clear color boundaries|
|No curve-arranged elements|All elements arranged in rows/columns or flowcharts|

Typical scenarios: Flowcharts, architecture diagrams, org charts, data tables. Proceed directly to Step 0, no need to inform user.


Classification 2: Strategy B?

Any of these signals present, but can be geometrically decomposed → Use Strategy B:

|Complex visual signal (any)|And can be geometrically decomposed (all must satisfy)|
|-|-|
|3D perspective / isometric graphics|Graphics can be approximated with triangles, ellipses, rectangles|
|Glow / fan-shaped beams / gradient backgrounds|Layering differences can simulate depth with transparency|
|Decorative elements arranged along circular arcs|Can calculate coordinates with parametric equations|
|Multi-layer stacked 3D effects|Can use "large shape + white cover" to achieve cutout rings|

Decision mnemonic: "Flatten" the graphic in your mind—if after flattening it can be reconstructed with triangles + ellipses + rectangles + transparency combinations, use Strategy B.

Typical scenarios: Circular supply chain platform diagrams, funnel + glow combinations, dashboard schematics.

Inform user:

"This image contains [specific description], will use 'Mathematical Approximation' strategy: overlay geometric shapes + simulate lighting layers with transparency, all elements fully editable, visual fidelity ~75-85% (abandoning pixel-level lighting details, preserving overall visual structure)."


Strategy B: Mathematical Approximation (Detailed Steps)

Core idea: Rather than pursuing pixel-perfect replication, decompose complex graphics into several "geometric layers", simulate lighting layers with transparency differences, and compute the coordinates of curve-arranged elements with parametric equations, in exchange for 100% editability.

Verified in practice: for a complex circular supply-chain platform diagram, this approach achieved:

  • Fan-shaped beams → 5 triangles with different transparencies
  • 3D cylinder → 3-4 nested ellipses + white ellipse cutout
  • Arc decorative ring → 52 small rectangles arranged by parametric equations
  • 3D tower → 9 rectangle columns with different heights + transparencies
  • All code generated, zero images, 100% editable

Step B2: Layer Planning

When viewing image, decompose complex graphics from bottom to top into geometric layers, each corresponding to an approximation method:

|Layer|Original Effect|Approximation Method|
|-|-|-|
|L1|Background glow / fan beams|N triangles, same color different transparencies, expanding from center outward|
|L2|Circular platform base color|Large ellipse (semi-transparent) + small white ellipse (cutout inner ring)|
|L3|Decorative elements along arc|Small rectangles, coordinates calculated with ellipse parametric equations|
|L4|Central column / tower structure|Rectangle group, center highest and darkest, gradually shorter and more transparent toward sides|
|L5|Leader lines pointing to labels|Polyline (vertical line + horizontal line, two LINE segments joined)|
|L6|All text labels|addText, coordinates precisely matching original positions|

Layer Record Table (fill before writing code):

|Layer|Description|Shape Type|Count|Color|Transparency Range|Key Dimensions|
|-|-|-|-|-|-|-|
|L1|Background fan beams|triangle|5|F4CACA|55-78%|Cover corresponding fan sector|
|L2|Large ellipse platform|ellipse×3|3|F8DADA|20-38%|cx=5.5 rx=3.5 ry=1.0|
|L3|Arc decorative ring|rectangle|52|D85D5D|0%|0.08×0.15"|
|L4|Central tower|rectangle|9|B50E17|0-92%|w=0.05"|
|L5|Leader lines|line×2|N groups|888888|0%| |
|L6|Text labels|text|N|A64F4F|0%|Positioned by OCR|

Step B3: Code Implementation (Three Core Techniques)

Technique 1: Transparency Control (Simulate Lighting Layers)

transparency is the core parameter of Strategy B. Its range is 0 (fully opaque) to 100 (fully transparent): larger values are more transparent, which is the opposite of an opacity setting and needs special attention.
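Because the scale runs opposite to opacity, it can help to think in opacity and convert. The following is a minimal Python sketch; the helper name `opacity_to_transparency` is hypothetical and is not part of pptxgenjs or of this skill's scripts.

```python
def opacity_to_transparency(opacity_percent: float) -> int:
    """Convert an opacity percentage (100 = solid) into a transparency value (0 = solid)."""
    if not 0 <= opacity_percent <= 100:
        raise ValueError("opacity_percent must be between 0 and 100")
    return round(100 - opacity_percent)


# Opacity values quoted in the fan-beam comments of this section
for opacity in (22, 34, 45):
    print(f"{opacity}% opaque -> transparency: {opacity_to_transparency(opacity)}")
```

The three results correspond to the transparency values 78, 66, and 55 used for the fan-beam triangles.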

// ── Fan-shaped beams: multiple triangles, same color, different transparencies ──────────────────────────
// Note: in this sample the central triangle is the most transparent (78) and the outer ones are less so (66, 55); tune the values against the original image

slide.addShape(pres.shapes.TRIANGLE, {   // Central triangle, 22% opaque
  x: 3.5, y: 1.0, w: 3.0, h: 4.0,
  fill: { color: "F4CACA", transparency: 78 },
  line: { color: "F4CACA", width: 0 },   // ← Must be width:0, otherwise hard edges
});
slide.addShape(pres.shapes.TRIANGLE, {   // Left triangle, 34% opaque
  x: 1.5, y: 1.5, w: 3.0, h: 3.5,
  fill: { color: "F4CACA", transparency: 66 },
  line: { color: "F4CACA", width: 0 },
});
slide.addShape(pres.shapes.TRIANGLE, {   // Outermost triangle, 45% opaque
  x: 0.0, y: 1.5, w: 3.0, h: 3.5,
  fill: { color: "F4CACA", transparency: 55 },
  line: { color: "F4CACA", width: 0 },
});

// ── Multi-layer ellipse overlay + white cutout (circular platform) ─────────────────────────────────────
slide.addShape(pres.shapes.OVAL, {       // Large ellipse, platform base color
  x: 1.5, y: 3.0, w: 7.0, h: 2.0,
  fill: { color: "F8DADA", transparency: 20 },
  line: { color: "F8DADA", width: 0 },
});
slide.addShape(pres.shapes.OVAL, {       // White ellipse, covers inner ring → forms ring shape
  x: 2.5, y: 3.2, w: 5.0, h: 1.4,
  fill: { color: "FFFFFF", transparency: 0 },
  line: { color: "FFFFFF", width: 0 },
});
slide.addShape(pres.shapes.OVAL, {       // Thin ring outline
  x: 2.3, y: 3.15, w: 5.4, h: 1.5,
  fill: { color: "F5CACA", transparency: 38 },
  line: { color: "D85D5D", width: 1.5 },
});

Transparency Quick Reference:

|transparency value|Visual Effect|Typical Use|
|-|-|-|
|0|Fully opaque (solid)|Main structure, accent colors|
|20-30|Slightly transparent, color saturated|Platform base color, main areas|
|40-55|Semi-transparent, layered feel|Mid-layer glow|
|60-75|Quite transparent, outline still clear|Outer glow|
|85-92|Very transparent, nearly invisible|Far beams, decorative columns|

Technique 2: Math Coordinate Calculation (Elements arranged along ellipse path)

// ── N small rectangles evenly arranged along ellipse path (arc decorative ring) ─────────────────────────────
// cx/cy = ellipse center (inches), rx/ry = horizontal/vertical radius (inches)
// ew/eh = width/height of each small element (inches)

const cx = 5.5,  cy = 4.5;   // Ellipse center
const rx = 3.2,  ry = 0.6;   // Ellipse radius
const N  = 52;                // Total elements
const ew = 0.08, eh = 0.15;  // Small rectangle dimensions

for (let i = 0; i < N; i++) {
  const theta = (2 * Math.PI * i) / N;          // Even angles, 0 → 2π
  const x = cx + rx * Math.cos(theta) - ew / 2; // Element top-left x
  const y = cy + ry * Math.sin(theta) - eh / 2; // Element top-left y
  const color = i % 3 === 0 ? "D85D5D" : "EE8D8D"; // Every third element uses the darker accent color

  slide.addShape(pres.shapes.RECTANGLE, {
    x, y, w: ew, h: eh,
    fill: { color },
    line: { color, width: 0 },
  });
}

Center and radius estimation method:

  • cx = Ellipse horizontal center (inches) = center pixel x ÷ image width px × 10
  • cy = Ellipse vertical center (inches) = center pixel y ÷ image height px × 5.625
  • rx = Ellipse horizontal radius (inches) = radius pixels ÷ image width px × 10
  • ry = Ellipse vertical radius (inches) = radius pixels ÷ image height px × 5.625
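The estimation rules above can be sketched in Python as follows. This is an illustrative helper only: the function name `ellipse_px_to_inches` and the sample pixel measurements are assumptions, chosen so that the result matches the cx/cy/rx/ry constants used in the Technique 2 code.

```python
SLIDE_W_IN = 10.0    # slide width in inches (16:9 layout)
SLIDE_H_IN = 5.625   # slide height in inches


def ellipse_px_to_inches(center_px, radii_px, image_size_px):
    """Scale an ellipse measured in image pixels to slide inches."""
    cx_px, cy_px = center_px
    rx_px, ry_px = radii_px
    img_w, img_h = image_size_px
    sx = SLIDE_W_IN / img_w   # inches per horizontal pixel
    sy = SLIDE_H_IN / img_h   # inches per vertical pixel
    return {
        "cx": cx_px * sx,
        "cy": cy_px * sy,
        "rx": rx_px * sx,
        "ry": ry_px * sy,
    }


# Hypothetical measurements taken from a 1920 × 1080 source image
ellipse = ellipse_px_to_inches((1056, 864), (614.4, 115.2), (1920, 1080))
print({k: round(v, 3) for k, v in ellipse.items()})
```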

Technique 3: Gradient Column Structure (Center highest and darkest, gradually shorter and more transparent toward sides)

// ── Central tower: 9 columns, center highest and most solid, gradually shorter and more transparent toward sides ─────────────────────
const baseY  = 3.93;  // Column bottom y (inches, shared by all columns)
const colW   = 0.05;  // Column width (inches)

// [x coordinate, height, transparency]  Center column transparency=0 (solid)
const columns = [
  [4.944, 0.82, 92],
  [5.034, 1.00, 92],
  [5.124, 1.18, 92],
  [5.214, 1.55, 92],
  [5.304, 2.15,  0],   // ← Center main column, fully opaque
  [5.349, 2.36,  0],   // ← Tallest column, fully opaque
  [5.484, 1.55, 92],
  [5.574, 1.00, 92],
  [5.664, 0.82, 92],
];

columns.forEach(([x, h, transp]) => {
  const color = transp === 0 ? "B50E17" : "D76666";
  slide.addShape(pres.shapes.RECTANGLE, {
    x: x - colW / 2,
    y: baseY - h,       // Column grows upward from bottom
    w: colW,
    h,
    fill: { color, transparency: transp },
    line: { color, width: 0 },
  });
});

Polyline Leaders (L-shaped, connecting graphics to text labels)

// ── Leader = vertical segment + horizontal segment, two LINE segments joined ──────────────────────────────
// anchorX/Y = Leader start point on graphic
// labelX/Y  = Label position

const anchorX = 1.25, anchorY = 4.12;
const labelY  = 2.70, labelX  = 1.00;

// Vertical line: from start point up to label height
slide.addShape(pres.shapes.LINE, {
  x: anchorX, y: labelY, w: 0, h: anchorY - labelY,
  line: { color: "888888", width: 0.75 },
});
// Horizontal line: from vertical top horizontally to label
slide.addShape(pres.shapes.LINE, {
  x: labelX, y: labelY, w: anchorX - labelX, h: 0,
  line: { color: "888888", width: 0.75 },
});

Strategy B QA Focus

After Step 5 generates the preview image, check the following in addition to the general Strategy A checks:

|Check Item|Judgment Criteria|
|-|-|
|Transparency layers natural|Glow attenuates naturally from center outward, no abrupt jumps|
|Triangles free of hard edges|No dark outline along triangle edges (confirm line.width is 0)|
|Ellipse path elements closed and even|Arc decorative ring fully closed, no obvious uneven spacing|
|Cutout effect clean|White cover ellipse edges aligned, no color gaps showing|
|Column structure symmetrical|Tower left-right symmetrical about central axis|
|Leader corner continuous|Vertical line endpoint and horizontal line start coordinates precisely connected|

Step 0: Perspective Correction (Required for photos, skip for screenshots)

Judgment: Edges not parallel, shot from side, corners not right angles → Must correct.

# scripts/correct_perspective.py
from PIL import Image
import numpy as np

def order_points(pts):
    pts = np.array(pts, dtype="float32")
    s, diff = pts.sum(axis=1), np.diff(pts, axis=1).flatten()
    return np.array([pts[np.argmin(s)], pts[np.argmin(diff)],
                     pts[np.argmax(s)], pts[np.argmax(diff)]], dtype="float32")

def _find_coeffs(pa, pb):
    matrix = []
    for p1, p2 in zip(pa, pb):
        matrix.append([p1[0], p1[1], 1, 0, 0, 0, -p2[0]*p1[0], -p2[0]*p1[1]])
        matrix.append([0, 0, 0, p1[0], p1[1], 1, -p2[1]*p1[0], -p2[1]*p1[1]])
    A = np.array(matrix, dtype=float)          # 8×8 system: two rows per corner
    B = np.array(pb, dtype=float).reshape(8)
    return np.linalg.solve(A, B)                # solve directly instead of using the discouraged np.matrix

def perspective_correct(src_path, corners, dst_path, w=1920, h=1080):
    img = Image.open(src_path)
    src = order_points(corners)
    dst = np.array([[0,0],[w-1,0],[w-1,h-1],[0,h-1]], dtype="float32")
    img.transform((w,h), Image.PERSPECTIVE, _find_coeffs(dst,src), Image.BICUBIC).save(dst_path)
    print(f"✓ Correction complete → {dst_path}")

perspective_correct(
    src_path="/mnt/user-data/uploads/your_photo.jpg",
    corners=[[120,85],[1800,60],[1820,980],[100,1000]],  # ← Replace after viewing image with view tool
    dst_path="/home/claude/corrected.jpg"
)

Run the script:

python scripts/correct_perspective.py
# Use the view tool to confirm corrected.jpg has straight edges before continuing

All subsequent steps use corrected.jpg after correction.


Step 1: Data Extraction (Three items: colors + text + shapes)

1-A Color Extraction (Program runs)

# scripts/extract_colors.py
import sys
from PIL import Image

def sample(img, x1, y1, x2, y2, n=8):
    x1,y1,x2,y2 = int(x1)+5,int(y1)+5,int(x2)-5,int(y2)-5
    if x2<=x1 or y2<=y1: x2,y2=x1+1,y1+1
    xs=[x1+(x2-x1)*i//(n-1) for i in range(n)]
    ys=[y1+(y2-y1)*i//(n-1) for i in range(n)]
    cols=[img.getpixel((x,y))[:3] for x in xs for y in ys]
    r=sorted(c[0] for c in cols)[len(cols)//2]
    g=sorted(c[1] for c in cols)[len(cols)//2]
    b=sorted(c[2] for c in cols)[len(cols)//2]
    return f"{r:02X}{g:02X}{b:02X}"

img_path = sys.argv[1] if len(sys.argv)>1 else "/mnt/user-data/uploads/your_image.jpg"
img = Image.open(img_path).convert("RGB")
W, H = img.size

print(f"Image size: {W} × {H} px")
print(f"Coordinate conversion: x\" = px_x × {10/W:.5f}   y\" = px_y × {5.625/H:.5f}")
print(f"                      w\" = px_w × {10/W:.5f}   h\" = px_h × {5.625/H:.5f}\n")

regions = {
    "Background":     (W*.4,  H*.4,  W*.6,  H*.6),
    "Header":         (W*.01, H*.01, W*.99, H*.13),
    "Title Text":     (W*.03, H*.02, W*.65, H*.11),
    "Content Area":   (W*.05, H*.18, W*.95, H*.82),
    "Footer":         (W*.01, H*.87, W*.99, H*.99),
}
print(f"{'Region':<12} {'hex':<8}  Sampling range (px)")
print("-" * 52)
for name,(x1,y1,x2,y2) in regions.items():
    print(f"{name:<12} #{sample(img,x1,y1,x2,y2)}    ({int(x1)},{int(y1)})→({int(x2)},{int(y2)})")
print("\n# Single point sampling: r,g,b=img.getpixel((x,y))[:3]; print(f'{r:02X}{g:02X}{b:02X}')")

Run the script:

python scripts/extract_colors.py /mnt/user-data/uploads/your_image.jpg

Record all hex values, copy directly in Step 3, no visual estimation.


1-B Text Extraction (OCR)

# scripts/extract_text.py
import sys, subprocess
from PIL import Image

def ensure_deps():
    try:
        import pytesseract
        pytesseract.get_tesseract_version()
        return pytesseract
    except Exception:
        print("Installing tesseract-ocr...")
        subprocess.run(["apt-get","install","-y","-q",
                        "tesseract-ocr","tesseract-ocr-chi-sim"], check=True)
        subprocess.run([sys.executable,"-m","pip","install","pytesseract",
                        "--break-system-packages","-q"], check=True)
        import pytesseract
        return pytesseract

def ocr(img, x1, y1, x2, y2, scale=2, lang="chi_sim+eng"):
    crop=img.crop((int(x1),int(y1),int(x2),int(y2)))
    crop=crop.resize((crop.width*scale,crop.height*scale),Image.LANCZOS)
    raw=tess.image_to_string(crop,lang=lang).strip()
    return "\n".join(l for l in raw.splitlines() if l.strip())

tess=ensure_deps()
img_path=sys.argv[1] if len(sys.argv)>1 else "/mnt/user-data/uploads/your_image.jpg"
img=Image.open(img_path).convert("RGB")
W,H=img.size

print("=== Full Image Scan ===")
print(tess.image_to_string(img,lang="chi_sim+eng").strip())

regions={"Header":(W*.02,H*.01,W*.88,H*.13),"Footer Left":(W*.02,H*.87,W*.38,H*.99),
         "Footer Center":(W*.38,H*.87,W*.72,H*.99),"Footer Right":(W*.72,H*.87,W*.98,H*.99)}
print("\n=== Region Extraction ===")
for name,(x1,y1,x2,y2) in regions.items():
    print(f"\n[{name}] {repr(ocr(img,x1,y1,x2,y2))}")
print("\n# Custom: ocr(img, x1, y1, x2, y2, scale=2)")

Run the script:

python scripts/extract_text.py /mnt/user-data/uploads/your_image.jpg

The image remains the final authority over OCR output. OCR is mainly for quickly capturing long text and numbers and avoiding manual typing errors.


1-C Shape Recognition + Text Direction Recognition (Visual, compare with references/shapes.md)

Open references/shapes.md, compare and identify each graphic and text label, record to table:

|ID|Position Description|Shape Constant|Arrow|Fill Color|Border Color|Inner Text|Text Direction|
|-|-|-|-|-|-|-|-|
|S1|Header background|RECTANGLE|No|8B1A1A|Same as fill| | |
|S2|Left flow node 1|ROUNDED_RECTANGLE|No|D8EAF5|A0C4E0|"Start"|Horizontal|
|S3|Node 1→Node 2 connector|LINE|Yes, triangle| |888888| | |
|T1|Left area label|RECTANGLE|No|Dark blue| |"Group Data Platform"|Vertical|

Text direction judgment (confirm one by one when viewing image):

Observe each text label:
  Text box width > height       → Horizontal (default, no vert needed)
  Text box height > width × 3   → Vertical, characters upright → use vert: "eaVert"
  Entire text block tilted 90°/270°    → Rotated horizontal → use rotate: 270
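The judgment rule above can be written down as a small Python sketch. The function name `text_direction_options` is hypothetical; it returns the extra addText options implied by the rule, and returns None for the in-between case (taller than wide but not three times taller), which the rule does not cover and which needs a manual look at the image.

```python
def text_direction_options(w, h, block_rotated=False):
    """Map a text box's width/height (inches) to the direction options from the rule above."""
    if block_rotated:
        return {"rotate": 270}        # use 90 instead if the block is tilted the other way
    if h > w * 3:
        return {"vert": "eaVert"}     # vertical text, characters upright
    if w > h:
        return {}                     # horizontal default, no direction attribute
    return None                       # not covered by the rule: decide by eye


print(text_direction_options(0.4, 2.0))                      # narrow sidebar label
print(text_direction_options(3.0, 0.5))                      # ordinary horizontal label
print(text_direction_options(1.0, 2.0))                      # ambiguous proportions
print(text_direction_options(0.4, 2.0, block_rotated=True))  # whole block rotated
```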

Special attention to these common vertical labels (narrow strip text on left/right sides of images are almost always vertical):

  • Hierarchy area labels: "Group Data Platform", "Data Consumption", "Data Lake", "Data Source", etc.
  • Right side vertical description strips: "Data Services", "Data Governance and Control", etc.

Connector arrow judgment:

  • Thin line with solid/hollow small triangle at end → LINE + endArrowType: "triangle" (solid) or "arrow" (open)
  • Thick filled arrow shape → RIGHT_ARROW / DOWN_ARROW etc. shape constants
  • Pure connector without arrow → LINE, don't set endArrowType

Step 2: Visual Layout Planning

Eyes on image, divide slide into large regions, record inch coordinates for each block using conversion formula.

Coordinate system: Top-left origin, unit inches. Slide = 10" × 5.625".

Conversion formula (coefficients printed in Step 1-A, use directly):

x" = pixel_x  × (10    / image width px)
y" = pixel_y  × (5.625 / image height px)
w" = pixel_w × (10    / image width px)
h" = pixel_h × (5.625 / image height px)
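As a sketch of how the conversion formula can fill one row of the Region Record Table, the Python helper below turns a pixel bounding box into inch coordinates. The name `px_box_to_inches` and the sample header box are assumptions for illustration, not part of the skill's scripts.

```python
SLIDE_W_IN = 10.0    # slide width in inches
SLIDE_H_IN = 5.625   # slide height in inches


def px_box_to_inches(box_px, image_size_px, ndigits=3):
    """Convert a pixel box (x1, y1, x2, y2) into slide x/y/w/h in inches."""
    x1, y1, x2, y2 = box_px
    img_w, img_h = image_size_px
    sx = SLIDE_W_IN / img_w
    sy = SLIDE_H_IN / img_h
    return {
        "x": round(x1 * sx, ndigits),
        "y": round(y1 * sy, ndigits),
        "w": round((x2 - x1) * sx, ndigits),
        "h": round((y2 - y1) * sy, ndigits),
    }


# Hypothetical header band measured on a 1920 × 1080 screenshot
print(px_box_to_inches((0, 0, 1920, 125), (1920, 1080)))
```

Rounding to three decimals keeps the recorded values short while staying well inside the tolerance of the visual checks.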

Region Record Table (fill before writing code):

|Region|x"|y"|w"|h"|Notes|
|-|-|-|-|-|-|
|Header|0|0|10|0.65|Top edge aligned|
|Footer|0|5.0|10|0.625|Bottom edge aligned|
|Left content area|0.2|0.75|…|…|Estimate|
|…|…|…|…|…|…|


Step 3: Code Implementation

Each element's color from Step 1-A, text from Step 1-B, shape constant from Step 1-C, coordinates from Step 2. No reliance on memory or guesswork, all values have clear sources.

After writing each region, immediately mentally compare with original image before continuing to next block.

Script Template

// create_slide.js
const pptxgen = require("pptxgenjs");
let pres = new pptxgen();
pres.layout = 'LAYOUT_16x9';
let slide = pres.addSlide();

// ── Layer 1: Background ──────────────────────────────────────────────────────────
slide.background = { color: "FFFFFF" };  // ← Background color from Step 1-A

// ── Layer 2: Large background rectangles ───────────────────────────────────────────────────

// Header
slide.addShape(pres.shapes.RECTANGLE, {
  x: 0, y: 0, w: 10, h: 0.65,
  fill: { color: "8B1A1A" }, line: { color: "8B1A1A", width: 0 }
});
// Header text (coordinates exactly match header, margin: 0)
slide.addText("[Step 1-B OCR result]", {
  x: 0, y: 0, w: 10, h: 0.65,
  fontSize: 22, bold: true, color: "FFFFFF",
  fontFace: "Microsoft YaHei",
  align: "left", valign: "middle",
  margin: 0,   // ← Rule 1: must be 0
});

// Footer
slide.addShape(pres.shapes.RECTANGLE, {
  x: 0, y: 5.0, w: 10, h: 0.625,
  fill: { color: "8B1A1A" }, line: { color: "8B1A1A", width: 0 }
});
slide.addText("[Step 1-B OCR result]", {
  x: 0, y: 5.0, w: 10, h: 0.625,
  fontSize: 13, bold: true, color: "FFFFFF",
  fontFace: "Microsoft YaHei",
  align: "left", valign: "middle",
  margin: 0,
});

// ── Layer 3: Content graphics + Layer 4: Graphic text (alternate drawing, add text immediately after each shape) ──

// Example: Flow node
slide.addShape(pres.shapes.ROUNDED_RECTANGLE, {  // ← Shape constant from Step 1-C
  x: 1.4, y: 0.95, w: 2.4, h: 0.42,
  fill: { color: "D8EAF5" },
  line: { color: "A0C4E0", width: 1 },
  rectRadius: 0.05
});
slide.addText("Node text", {
  x: 1.4, y: 0.95, w: 2.4, h: 0.42,  // ← Rule 2: exactly matches shape above
  fontSize: 11, color: "333333",
  fontFace: "Microsoft YaHei",
  align: "center", valign: "middle",
  margin: 0,  // ← Rule 1
  wrap: true,
});

// Example: Connector (with arrow)
slide.addShape(pres.shapes.LINE, {
  x: 2.6, y: 1.37, w: 0, h: 0.25,    // Vertical line: w=0,h>0
  line: {
    color: "888888", width: 1.5,
    endArrowType: "triangle",          // ← Step 1-C: use "triangle" (solid) or "arrow" (open) if an arrowhead is present, omit if none
  }
});

pres.writeFile({ fileName: "/mnt/user-data/outputs/output.pptx" })
  .then(() => console.log("Done!"));  // writeFile returns a Promise; log only after the file is written

Run the script:

NODE_PATH=/home/claude/.npm-global/lib/node_modules node create_slide.js

Step 3.5: Pre-flight Check (Required before execution, do not skip)

Core principle: Don't execute immediately after writing script. Do paper verification first—"render" script coordinates in mind, compare with original image, fix issues on the spot, then submit for execution. This step intercepts most overlap, out-of-bounds, and text overflow issues before execution.


Check A: Automated Out-of-bounds + Overlap Scan

Run scripts/preflight.py, automatically parse all element coordinates in script and report issues:

python scripts/preflight.py create_slide.js

Output explanation:

  • ❌ Out of bounds: Elements with x+w > 10 or y+h > 5.625 → Must fix before execution
  • ⚠️ Suspected overlap: Non-parent-child rectangle intersections between two elements → Manual confirmation if reasonable
  • 📋 Coordinate summary: All elements sorted by y, convenient for comparing region proportions with original image

Any ❌ out of bounds must be fixed and re-run until no ❌ before entering Step 4.
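The contents of scripts/preflight.py are not shown here, so the following Python sketch only illustrates the two checks described above on a hand-written element list. All function names and the sample elements are hypothetical. An element that sits entirely inside another one (including a text box identical to its shape, per Rule 2) is treated as parent-child and not reported.

```python
SLIDE_W, SLIDE_H = 10.0, 5.625   # slide size in inches
EPS = 1e-9                        # tolerance for floating-point sums


def out_of_bounds(el):
    """True if the element crosses the right or bottom slide edge."""
    return el["x"] + el["w"] > SLIDE_W + EPS or el["y"] + el["h"] > SLIDE_H + EPS


def contains(outer, inner):
    """True if inner lies entirely within outer (parent-child relationship)."""
    return (inner["x"] >= outer["x"] - EPS
            and inner["y"] >= outer["y"] - EPS
            and inner["x"] + inner["w"] <= outer["x"] + outer["w"] + EPS
            and inner["y"] + inner["h"] <= outer["y"] + outer["h"] + EPS)


def intersects(a, b):
    """True if the two rectangles share a region of non-zero area."""
    return (a["x"] < b["x"] + b["w"] - EPS and b["x"] < a["x"] + a["w"] - EPS
            and a["y"] < b["y"] + b["h"] - EPS and b["y"] < a["y"] + a["h"] - EPS)


def suspected_overlaps(elements):
    """Return name pairs that intersect without one containing the other."""
    pairs = []
    for i, a in enumerate(elements):
        for b in elements[i + 1:]:
            if intersects(a, b) and not contains(a, b) and not contains(b, a):
                pairs.append((a["name"], b["name"]))
    return pairs


elements = [
    {"name": "header",      "x": 0.0, "y": 0.0,  "w": 10.0, "h": 0.65},
    {"name": "node_a",      "x": 1.4, "y": 0.95, "w": 2.4,  "h": 0.42},
    {"name": "node_a_text", "x": 1.4, "y": 0.95, "w": 2.4,  "h": 0.42},
    {"name": "node_b",      "x": 3.5, "y": 1.2,  "w": 2.0,  "h": 0.5},
    {"name": "label",       "x": 8.5, "y": 2.0,  "w": 2.0,  "h": 0.5},
]

print("Out of bounds:", [e["name"] for e in elements if out_of_bounds(e)])
print("Suspected overlaps:", suspected_overlaps(elements))
```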


Check B: Text Capacity Manual Verification

For all addText in script, verify one by one if text fits.

Horizontal Text

Required width = character count × (fontSize / 72 × 1.1) inches
Required height = line count × (fontSize / 72 × 1.5) inches
Requirement: w ≥ required width, h ≥ required height

Quick reference (fontSize = 12pt):

|Content|Min w|Min h|
|-|-|-|
|4 Chinese characters, single line|0.74"|0.25"|
|6 Chinese characters, single line|1.10"|0.25"|
|4 Chinese characters, two lines|0.74"|0.50"|
|10 English characters|0.80"|0.25"|
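The horizontal capacity formula can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, `char_count` is taken to mean the number of full-width (Chinese) characters on the longest line, and the 1.1 and 1.5 factors are the ones given in the formula above.

```python
def required_size_horizontal(char_count, font_size_pt, line_count=1):
    """Return (required width, required height) in inches for horizontal text."""
    width = char_count * (font_size_pt / 72 * 1.1)
    height = line_count * (font_size_pt / 72 * 1.5)
    return width, height


def text_fits(w, h, char_count, font_size_pt, line_count=1):
    """True if a text box of w × h inches satisfies both capacity requirements."""
    need_w, need_h = required_size_horizontal(char_count, font_size_pt, line_count)
    return w >= need_w and h >= need_h


need_w, need_h = required_size_horizontal(6, 12)
print(f"6 characters at 12pt need w >= {need_w:.2f} in, h >= {need_h:.2f} in")
print(text_fits(2.4, 0.42, char_count=9, font_size_pt=11))   # roomy node box
print(text_fits(0.3, 0.8, char_count=4, font_size_pt=16))    # box far too narrow
```

When `text_fits` returns False, the remedies are the ones listed in this check: reduce fontSize, enlarge w/h, or break the line manually.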

Vertical Text (vert: "eaVert")

Required width = fontSize / 72 × 1.4 inches (single column character width)
Required height = character count × (fontSize / 72 × 1.35) inches
Requirement: w ≥ required width, h ≥ required height

Quick reference (fontSize = 14pt):

|Vertical characters|Min w|Min h|
|-|-|-|
|4 chars|0.27"|1.05"|
|6 chars|0.27"|1.57"|
|8 chars|0.27"|2.10"|

Text doesn't fit → Fix before execution: Reduce fontSize, expand w/h, or use breakLine for manual line breaks.


Check C: Region-by-Region Proportion Verification (Compare with original image)

Use coordinate summary table from preflight.py output, compare with original image for proportion verification:

1. Horizontally slice original image into major horizontal regions (header, content area, footer…)
2. Visually estimate each region's percentage of total slide height
3. Compare with corresponding element's h/5.625 in summary table, see if close to visual percentage
4. Regions with deviation > 15% need re-estimation of y and h

Focus on right-side truncation check:
  → Rightmost element's x+w ≤ 9.8" (leave 0.2" safety margin)
  → When right side has vertical label bar, confirm its x doesn't exceed 10 - label width
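A minimal Python sketch of the proportion comparison follows. It assumes the 15% threshold means 15 percentage points of slide height (the text does not say whether the deviation is absolute or relative), and the function name and sample regions are hypothetical.

```python
SLIDE_H = 5.625  # slide height in inches


def proportion_report(regions, threshold=0.15):
    """regions: list of (name, planned_h_inches, visually_estimated_fraction).

    Returns (name, planned_fraction, deviation, needs_re_estimation) tuples.
    """
    report = []
    for name, h, visual_fraction in regions:
        planned_fraction = h / SLIDE_H
        deviation = abs(planned_fraction - visual_fraction)
        report.append((name, round(planned_fraction, 3),
                       round(deviation, 3), deviation > threshold))
    return report


# Hypothetical plan: the content area was planned much shorter than it looks
regions = [
    ("Header",  0.65,  0.12),
    ("Content", 3.0,   0.75),
    ("Footer",  0.625, 0.11),
]
for row in proportion_report(regions):
    print(row)
```

Regions flagged True are the ones whose y and h should be re-estimated before execution.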

Check D: Parent-Child Container Relationship Verification

For nested regions (large background box containing multiple child elements), check all child elements are within parent box range:

Child element requirements:
  child.x        ≥ parent.x
  child.x + child.w ≤ parent.x + parent.w
  child.y        ≥ parent.y
  child.y + child.h ≤ parent.y + parent.h

Most common violation: Last column of nested grid x+w exceeds parent box right boundary. Fix method:

// Recalculate grid column width, ensure last column doesn't exceed bounds
const cellW = (parentW - padLeft - padRight) / cols;
const cellX = (i) => parentX + padLeft + i * cellW;
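The containment requirements and the grid fix can be verified together. The Python sketch below mirrors the column-width calculation above; the function names are hypothetical, and for simplicity each cell is assumed to span the parent's full height.

```python
def within_parent(child, parent, eps=1e-9):
    """Check the four child-element requirements against the parent box."""
    return (child["x"] >= parent["x"] - eps
            and child["x"] + child["w"] <= parent["x"] + parent["w"] + eps
            and child["y"] >= parent["y"] - eps
            and child["y"] + child["h"] <= parent["y"] + parent["h"] + eps)


def grid_cells(parent, cols, pad_left=0.0, pad_right=0.0):
    """Split the parent's padded width into equal columns."""
    cell_w = (parent["w"] - pad_left - pad_right) / cols
    return [
        {"x": parent["x"] + pad_left + i * cell_w, "y": parent["y"],
         "w": cell_w, "h": parent["h"]}
        for i in range(cols)
    ]


parent = {"x": 0.5, "y": 1.0, "w": 9.0, "h": 2.0}
cells = grid_cells(parent, cols=4, pad_left=0.1, pad_right=0.1)
print(all(within_parent(c, parent) for c in cells))

# A hand-placed last column that overshoots the parent's right edge
bad_cell = {"x": 8.0, "y": 1.0, "w": 2.0, "h": 2.0}
print(within_parent(bad_cell, parent))
```

Deriving the column width from the parent width, rather than fixing it by hand, is what keeps the last column inside the right boundary.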

Pass Criteria

All satisfied before entering Step 4:

|Item|Pass Criteria|
|-|-|
|✅ No out of bounds|preflight.py has no ❌ output|
|✅ Text fits|All text boxes w/h satisfy capacity calculation|
|✅ Region proportions reasonable|Each region height deviation from original visual ≤ 15%|
|✅ Child elements within parent|All nested elements don't exceed parent box boundary|
|✅ Overlaps confirmed|preflight.py ⚠️ warnings all manually confirmed reasonable|

Fix unpassed items directly in script, re-run preflight.py after fix, until all pass.


Step 4: Implementation Methods for Various Element Types

4.1 Various Nodes (Shape + text must have identical coordinates)

// Rectangle node (processing step)
slide.addShape(pres.shapes.RECTANGLE, { x:1.0, y:1.0, w:2.0, h:0.5,
  fill:{color:"F5F5F5"}, line:{color:"AAAAAA",width:1} });
slide.addText("Processing Step", { x:1.0, y:1.0, w:2.0, h:0.5,
  fontSize:11, color:"333333", fontFace:"Microsoft YaHei",
  align:"center", valign:"middle", margin:0, wrap:true });

// Diamond node (decision branch)
slide.addShape(pres.shapes.DIAMOND, { x:1.0, y:2.0, w:2.0, h:0.6,
  fill:{color:"FFF2CC"}, line:{color:"CCAA00",width:1} });
slide.addText("Yes?", { x:1.0, y:2.0, w:2.0, h:0.6,
  fontSize:11, color:"333333", fontFace:"Microsoft YaHei",
  align:"center", valign:"middle", margin:0, wrap:true });

// Ellipse node (start/end)
slide.addShape(pres.shapes.OVAL, { x:1.2, y:0.3, w:1.6, h:0.45,
  fill:{color:"D5E8D4"}, line:{color:"82B366",width:1} });
slide.addText("Start", { x:1.2, y:0.3, w:1.6, h:0.45,
  fontSize:11, color:"333333", fontFace:"Microsoft YaHei",
  align:"center", valign:"middle", margin:0 });

4.2 Connectors (Choose arrow type based on Step 1-C)

// Connector with arrow (solid triangle at end)
slide.addShape(pres.shapes.LINE, { x:2.0, y:1.5, w:0, h:0.3,
  line:{ color:"888888", width:1.5, endArrowType:"triangle" } });

// Connector with arrow (open arrowhead at end)
slide.addShape(pres.shapes.LINE, { x:2.0, y:1.5, w:0, h:0.3,
  line:{ color:"888888", width:1.5, endArrowType:"arrow" } });

// Pure connector without arrow
slide.addShape(pres.shapes.LINE, { x:2.0, y:1.5, w:0, h:0.3,
  line:{ color:"CCCCCC", width:1 } });

endArrowType options: "triangle" (solid), "arrow" (open), "stealth", "diamond", "oval", "none"

4.3 Dual-Theme Nodes (Main title + subtitle)

slide.addShape(pres.shapes.ROUNDED_RECTANGLE, { x:0.55, y:1.62, w:4.1, h:0.55,
  fill:{color:"E8E0F0"}, line:{color:"C0A8D8",width:1}, rectRadius:0.05 });
slide.addText([
  { text:"Main Title", options:{ breakLine:true } },
  { text:"Subtitle description, can be smaller", options:{ fontSize:7.5, color:"666666" } }
], { x:0.55, y:1.62, w:4.1, h:0.55,
  fontSize:10, color:"444444", fontFace:"Microsoft YaHei",
  align:"center", valign:"middle", margin:0, wrap:true });

4.4 Annotation Bubbles

slide.addShape(pres.shapes.RECTANGULAR_CALLOUT, { x:1.0, y:0.5, w:2.5, h:0.6,
  fill:{color:"FFFDE7"}, line:{color:"CCAA00",width:1} });
slide.addText("Annotation text", { x:1.0, y:0.5, w:2.5, h:0.6,
  fontSize:10, color:"333333", fontFace:"Microsoft YaHei",
  align:"center", valign:"middle", margin:0, wrap:true });

4.5 Mixed Style Text (Links + normal)

slide.addText([
  { text:"Link text", options:{ color:"1155CC", underline:{style:"sng"} } },
  { text:" and ",    options:{ color:"333333" } },
  { text:"Another link", options:{ color:"1155CC", underline:{style:"sng"} } },
  { text:" is description.", options:{ color:"333333" } }
], { x:5.55, y:3.3, w:4.2, h:0.35,
  fontSize:12, fontFace:"Microsoft YaHei", margin:0 });

4.6 Vertical Text Labels (Standard for sidebar area labels)

Applicable scenarios: Narrow strip area labels on left or right side of image, such as "Group Data Platform", "Data Lake", "Data Services", etc.

// ── Vertical text: vert: "eaVert" (each character upright, read top to bottom) ───────────────────

// Vertical label with background (most common form)
slide.addShape(pres.shapes.RECTANGLE, {
  x: 0, y: 0.65, w: 0.4, h: 3.5,        // Width ~0.35-0.5", height fills area
  fill: { color: "1F4E79" }, line: { color: "1F4E79", width: 0 }
});
slide.addText("Group Data Platform", {
  x: 0, y: 0.65, w: 0.4, h: 3.5,        // ← Exactly matches shape
  fontSize: 14, bold: true, color: "FFFFFF",
  fontFace: "Microsoft YaHei",
  align: "center", valign: "middle",
  margin: 0,
  vert: "eaVert",                         // ← Chinese vertical, characters upright
});

// Pure vertical text without background
slide.addText("Data Services", {
  x: 9.2, y: 1.0, w: 0.35, h: 2.0,
  fontSize: 12, color: "2E75B6",
  fontFace: "Microsoft YaHei",
  align: "center", valign: "middle",
  margin: 0,
  vert: "eaVert",
});

Dimension rules:

  • w (width) = Single character width + small margin, approximately fontSize / 72 * 1.7 (inches)

    • fontSize 12pt → w ≈ 0.28", suggest 0.35"
    • fontSize 14pt → w ≈ 0.33", suggest 0.4"
    • fontSize 16pt → w ≈ 0.38", suggest 0.45"
  • h (height) = Actual height of corresponding region, converted from image


4.7 Correct Handling of Compressed Horizontal Labels

Applicable scenarios: horizontal region labels at the top or bottom of the image (such as "Data Consumption" or "Data Source"), where the text reads normally left to right but the text box is narrow.

// ── Correct: Horizontal labels need sufficient width, don't compress to achieve vertical effect ──────────────────────

// ❌ Wrong: Width too narrow, text forced to wrap into "vertical column", actually horizontal compression
slide.addText("Data Consumption", {
  x: 0, y: 0, w: 0.3, h: 0.8,    // w too narrow, each character auto-wraps
  fontSize: 16,                   // remaining options omitted
});

// ✅ Correct: Give sufficient width to accommodate all text, if space insufficient reduce font size
slide.addText("Data Consumption", {
  x: 0, y: 0, w: 0.8, h: 0.8,    // w sufficient for horizontal text
  fontSize: 14, bold: true, color: "FFFFFF",
  fontFace: "Microsoft YaHei",
  align: "center", valign: "middle",
  margin: 0,
  wrap: false,                     // ← Prohibit wrapping, ensure single line display
});

// ✅ If region really narrow must wrap, use breakLine for manual line break control
slide.addText([
  { text: "Data", options: { breakLine: true } },
  { text: "Consumption" }
], {
  x: 0, y: 0, w: 0.45, h: 0.8,
  fontSize: 14, bold: true, color: "FFFFFF",
  fontFace: "Microsoft YaHei",
  align: "center", valign: "middle",
  margin: 0,
});

Horizontal vs Vertical Quick Reference:

Original Effect | Characters upright? | Correct approach
--- | --- | ---
Text left to right, normal reading | ✅ Upright | Horizontal, no vert needed
Text top to bottom, each character upright | ✅ Upright | Vertical, add vert: "eaVert"
Entire text block rotated (horizontal but on side) | ❌ On side | Add rotate: 270
Text wraps into column but characters horizontal (compressed) | ✅ Upright but squeezed | Horizontal + manual breakLine + appropriate w

// Common coordinate formulas (slide size in inches)
const slideW = 10, slideH = 5.625;

// Horizontal center
const x = (slideW - w) / 2;

// Node bottom center (connector start point)
const lineX = nodeX + nodeW / 2;
const lineY = nodeY + nodeH;

// n elements evenly spaced (total range rangeW, start x0, each width itemW)
const gap = (rangeW - n * itemW) / (n + 1);
const itemX = (i) => x0 + gap + i * (itemW + gap);
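
The formulas above can be wrapped into small reusable helpers. The sketch below is one possible arrangement: the function names `centerX`, `bottomCenter`, and `evenlySpacedX` are hypothetical (they are not part of pptxgenjs), and all values are assumed to be in inches.

```javascript
// Hypothetical layout helpers (not part of pptxgenjs); all units are inches.
const slideW = 10;

// Left edge that horizontally centers an element of width w.
function centerX(w, totalW = slideW) {
  return (totalW - w) / 2;
}

// Bottom-center point of a node, usable as a connector start point.
function bottomCenter(node) {
  return { x: node.x + node.w / 2, y: node.y + node.h };
}

// Left edges of n items of width itemW spread evenly across rangeW, starting at x0.
function evenlySpacedX(x0, rangeW, n, itemW) {
  const gap = (rangeW - n * itemW) / (n + 1);
  if (gap < 0) {
    throw new RangeError("items do not fit: reduce n or itemW");
  }
  return Array.from({ length: n }, (_, i) => x0 + gap + i * (itemW + gap));
}

console.log(evenlySpacedX(0.5, 9, 3, 2)); // [ 1.25, 4, 6.75 ]
```

Throwing when the gap is negative surfaces an impossible layout while the script is still being written, rather than leaving it to show up as overlapping shapes in the preview.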

Step 5: Visual QA + Correction Loop (Required, max 3 iterations)

Generate Preview Image

python /mnt/skills/public/pptx/scripts/office/soffice.py \
  --headless --convert-to pdf /mnt/user-data/outputs/output.pptx

rm -f /home/claude/slide-*.jpg
pdftoppm -jpeg -r 150 /home/claude/output.pdf /home/claude/slide
ls -1 /home/claude/slide*.jpg

Use the view tool to open the preview image, compare it side by side with the original image, and run the following two-phase check.


Phase 1: Layout Compliance

Check Item | Judgment Criteria
--- | ---
Text out of box | Text exceeds graphic boundary or truncated
Text box exceeds graphic | Text box dimensions larger than underlying shape, text appears outside shape
Vertical text correct | Sidebar label characters upright (eaVert); horizontal compressed text not mistakenly made vertical
Horizontal text compressed | Horizontal label w sufficient, text not overlapping or misaligned due to insufficient width
Elements unexpectedly overlapping | Text covered by graphics, or graphics block other text
Connector arrow direction | Arrow direction matches original image (start/end reversed?)
Arrangement orderly | Peer elements aligned, evenly spaced, no obvious misalignment
Spacing reasonable | Adjacent element spacing ≥ 0.1", no edge touching
Page margins | Content distance from slide edge ≥ 0.2" (header/footer excepted)
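
The numeric criteria in this phase (overlap, spacing ≥ 0.1", page margin ≥ 0.2") can also be checked on the coordinates before looking at the preview. The following is a minimal sketch under stated assumptions: `checkLayout` and the `{ name, x, y, w, h }` box format are hypothetical, the slide is assumed to be 10" × 5.625", and the header/footer exception is not handled. Pass only peer shapes, since a text box that deliberately shares coordinates with its shape would be reported as an overlap.

```javascript
// Hypothetical checker for peer elements; boxes are { name, x, y, w, h } in inches.
const SLIDE = { w: 10, h: 5.625 };
const MIN_GAP = 0.1;     // minimum spacing between adjacent elements
const MIN_MARGIN = 0.2;  // minimum distance from the slide edge
const EPS = 1e-9;        // tolerance for floating-point rounding

function checkLayout(boxes, slide = SLIDE) {
  const issues = [];
  for (const b of boxes) {
    const margin = Math.min(b.x, b.y, slide.w - (b.x + b.w), slide.h - (b.y + b.h));
    if (margin < MIN_MARGIN - EPS) {
      issues.push(`${b.name}: page margin ${margin.toFixed(2)}" is below ${MIN_MARGIN}"`);
    }
  }
  for (let i = 0; i < boxes.length; i++) {
    for (let j = i + 1; j < boxes.length; j++) {
      const a = boxes[i], b = boxes[j];
      // Signed gaps: a negative value means the projections overlap on that axis.
      const gapX = Math.max(b.x - (a.x + a.w), a.x - (b.x + b.w));
      const gapY = Math.max(b.y - (a.y + a.h), a.y - (b.y + b.h));
      if (gapX < -EPS && gapY < -EPS) {
        issues.push(`${a.name} / ${b.name}: overlap`);
      } else {
        const dist = Math.hypot(Math.max(gapX, 0), Math.max(gapY, 0));
        if (dist < MIN_GAP - EPS) {
          issues.push(`${a.name} / ${b.name}: spacing ${dist.toFixed(2)}" is below ${MIN_GAP}"`);
        }
      }
    }
  }
  return issues;
}

const issues = checkLayout([
  { name: "A", x: 0.5, y: 1, w: 2, h: 1 },
  { name: "B", x: 2.55, y: 1, w: 2, h: 1 },  // only 0.05" to the right of A
  { name: "C", x: 5, y: 1, w: 2, h: 1 },
]);
console.log(issues); // one entry, reporting the A / B spacing
```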

Phase 2: Fidelity to Original Image

Check Item | Judgment Criteria
--- | ---
🎨 Colors | Background, graphic fill, text color match original image
🔷 Shape types | Rectangle/rounded/diamond/ellipse etc. match original image
📏 Proportions | Element width/height ratios, region proportions match original image
↔️ Arrow types | Solid/hollow/thick arrow shapes match original image
🔤 Font size hierarchy | Title/body/annotation font size relationships match original image
↔️ Alignment | Text left/center/right alignment matches original image

Correction Loop Control (Strictly Enforced)

Correction counter = 0

LOOP:
  Generate preview → Phase 1 check → Phase 2 check

  if no issues → Deliver, end

  if correction counter >= 3:
    Deliver directly, note: "Completed 3 correction rounds, following issues unresolved: [list]"
    End

  Modify create_slide.js corresponding lines
  Regenerate pptx and preview
  Correction counter += 1
  → Continue LOOP

For every round, report: ① layout issues found ② fidelity issues found ③ which code was modified ④ the round number (N/3)
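
The loop control above can be expressed as a short driver. This is only a sketch of the control flow: `runCorrectionLoop` is a hypothetical name, and the `check` and `fix` callbacks are assumptions standing in for "generate preview and run both phases" and "modify create_slide.js and regenerate".

```javascript
// Hypothetical driver for the correction loop. Both callbacks are caller-supplied:
//   check()     regenerates the preview and returns an array of issue strings
//   fix(issues) edits create_slide.js and regenerates the pptx
function runCorrectionLoop(check, fix, maxRounds = 3) {
  let rounds = 0;
  for (;;) {
    const issues = check();
    if (issues.length === 0) {
      return { rounds, unresolved: [], note: "" };
    }
    if (rounds >= maxRounds) {
      const note = `Completed ${maxRounds} correction rounds, ` +
        `following issues unresolved: ${issues.join("; ")}`;
      return { rounds, unresolved: issues, note };
    }
    fix(issues);
    rounds += 1;
    console.log(`Round ${rounds}/${maxRounds}: addressed ${issues.length} issue(s)`);
  }
}

// Example: one issue disappears after each fix.
const pending = ["text overflow in node 2", "arrow reversed"];
const result = runCorrectionLoop(() => [...pending], () => pending.shift());
console.log(result.rounds, result.unresolved.length); // 2 0
```

Because the counter is tested after the check, a slide that still has issues after the third fix gets one final check and is then delivered with the unresolved list, matching the "max 3 iterations" rule.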


Step 6: Common Errors and Fixes

Symptom | Cause | Fix
--- | --- | ---
Vertical label characters lie on their side | Used rotate: 270 instead of vert | Use vert: "eaVert" instead
Vertical label characters overlap | Text box w too narrow and vert not set | Add vert: "eaVert", set w to 0.35-0.5"
Horizontal label text wraps into a column | Text box w too narrow, text forced to wrap | Increase w to fit a single line, or use breakLine for manual line break control
Horizontal label garbled after adding vert | Misjudged as vertical | Remove vert, ensure w is wide enough
Text position offset | Forgot margin: 0 | Add margin: 0 to all addText calls
Text exceeds graphic boundary | Text box dimensions don't match shape | Align text box x/y/w/h exactly with shape
Text stretches the text box | wrap: true not set together with fixed dimensions | Add wrap: true, reduce font size as needed
Arrow direction reversed | endArrowType on the wrong end | Swap beginArrowType/endArrowType, or adjust x/y/w/h direction
Colors display incorrectly | Added # before the hex value | Remove #: "FF0000"
Chinese characters garbled or shown as blocks | Font not set | Add fontFace: "Microsoft YaHei"
ROUNDED_RECTANGLE corner has a white edge | Text box covers the corner | Give the text box the same coordinates as the shape, plus margin:0
Diamond text off-center | valign not set | Add align:"center", valign:"middle"
Color sampling deviation | Sampled border/shadow pixels | Move sampling coordinates to the region center, at least 10px from the edge
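
A few of the rows above are mechanical enough to catch before generating the file. The sketch below is an illustration only: `lintOptions` is a hypothetical helper, not a pptxgenjs function, and it covers just three rows (leading `#` in colors, missing `margin: 0`, missing `fontFace`).

```javascript
// Hypothetical lint for an options object passed to slide.addText or slide.addShape.
// Covers only three rows of the table; the rule set is an illustration.
function lintOptions(opts, kind = "text") {
  const warnings = [];
  const colors = [
    opts.color,
    opts.fill && opts.fill.color,
    opts.line && opts.line.color,
  ];
  for (const c of colors) {
    if (typeof c === "string" && c.startsWith("#")) {
      warnings.push(`color "${c}" has a leading #; use "${c.slice(1)}"`);
    }
  }
  if (kind === "text") {
    if (opts.margin !== 0) {
      warnings.push("margin is not 0; text position may be offset");
    }
    if (!opts.fontFace) {
      warnings.push("fontFace not set; Chinese characters may render as blocks");
    }
  }
  return warnings;
}

console.log(lintOptions({
  x: 1, y: 1, w: 2, h: 0.5,
  color: "#FF0000", fontFace: "Microsoft YaHei",
}));
// two warnings: the leading # and the missing margin: 0
```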

Strategy B Specific

Symptom | Cause | Fix
--- | --- | ---
Triangle/shape has an obvious hard-edged outline | line.width not set to 0 | Add line: { color: same as fill, width: 0 }
Glow layers reversed (outer dark, inner light) | transparency values in the wrong order | The outer layer has the smaller transparency (more opaque); swap the values so the outer layer is the more transparent one
Arc decorative ring not closed | N too small or angle range wrong | Confirm theta runs from 0 to 2 * Math.PI, increase N as needed
Arc elements not positioned on the ellipse | cx/cy/rx/ry estimated incorrectly | Recalculate center and radii from pixel coordinates
Cutout white ellipse leaves a color gap | White ellipse coverage insufficient | Slightly expand the white ellipse w/h to cover the inner ring completely
Column structure asymmetrical | x coordinate calculation error | Use the center column as the reference and mirror the offsets left and right
Leader corner has a gap | Vertical and horizontal line endpoints not continuous | Vertical line start y = horizontal line y; horizontal line start x = vertical line x
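
For the arc rows above, computing every position from one parametric formula keeps the elements on the ellipse and closes the ring. The following is a minimal sketch: `ellipsePoints` and `ringDots` are hypothetical helper names, and `cx`, `cy`, `rx`, `ry` are assumed to have already been converted from pixel coordinates to inches.

```javascript
// Hypothetical helper: n points evenly spaced on an ellipse, theta from 0 up to 2π.
// (cx, cy) is the center and rx, ry are the radii, all in inches.
function ellipsePoints(cx, cy, rx, ry, n) {
  return Array.from({ length: n }, (_, i) => {
    const theta = (2 * Math.PI * i) / n;
    return { x: cx + rx * Math.cos(theta), y: cy + ry * Math.sin(theta) };
  });
}

// Bounding boxes for small dots centered on those points.
// Shapes are positioned by their top-left corner, hence the dot / 2 shift.
function ringDots(cx, cy, rx, ry, n, dot) {
  return ellipsePoints(cx, cy, rx, ry, n).map((p) => ({
    x: p.x - dot / 2,
    y: p.y - dot / 2,
    w: dot,
    h: dot,
  }));
}

const dots = ringDots(5, 2.8, 3, 1.2, 48, 0.06);
console.log(dots.length); // 48
```

Each returned box can be spread into the options of a `slide.addShape` call. Raising `n` closes visible gaps between neighboring dots, and shifting by half the dot size is what keeps each dot centered on the curve rather than hanging off it.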

References

  • pptxgenjs API: references/pptxgenjs-cheatsheet.md
  • Shape recognition and constants quick reference: references/shapes.md