Install
openclaw skills install image-to-ppt-pro将任意图片或幻灯片截图复刻成完全可编辑的PPTX文件,像素级还原布局、颜色、文字与图形元素。
openclaw skills install image-to-ppt-proStep -1 Image Type Classification ← Required! Determines Strategy A or B
↓
├─── Strategy A: Pure Code Replication (Images with flat geometric shapes)
│ Step 0 Perspective Correction ← Required for photos; skip for screenshots
│ Step 1 Data Extraction ← Colors + OCR text + shape recognition
│ Step 2 Visual Planning ← Define regions, record coordinates
│ Step 3 Code Implementation ← Code each block
│ Step 3.5 Pre-flight Check ← ★ Paper verification before execution
│ Step 4 Generate pptx
│ Step 5 Visual QA + Correction Loop (max 3 iterations)
│ Step 6 Delivery
│
└─── Strategy B: Mathematical Approximation (Images with 3D/lighting/curves that can be geometrically decomposed)
Step 0 Perspective Correction ← Required for photos; skip for screenshots
Step 1 Data Extraction ← Colors + OCR text + layer decomposition
Step B2 Layer Planning ← Decompose complex graphics into overlay layers
Step B3 Code Implementation ← Transparency + math coordinates + multi-shape overlay
Step 3.5 Pre-flight Check ← ★ Paper verification before execution
Step 4 Generate pptx
Step 5 Visual QA + Correction Loop (max 3 iterations)
Step 6 Delivery
Core difference between paths: Strategy A directly replicates flat graphics; Strategy B doesn't pursue pixel-perfect replication but uses geometric overlays + transparency + math coordinates to approximate complex visuals, trading visual fidelity for 100% editability.
Value of Step 3.5: After writing the script, perform paper verification before execution to intercept out-of-bounds, overlaps, and text overflow issues, significantly reducing Step 5 correction iterations.
All addText calls, whether for titles, body text, or node text, must include margin: 0.
// ✅ Correct
slide.addText("text", { x:1, y:1, w:3, h:0.5, margin: 0, ... });
// ❌ Wrong: Missing margin: 0, pptxgenjs default padding causes text position offset
slide.addText("text", { x:1, y:1, w:3, h:0.5, ... });
For text inside shapes, the text box's x/y/w/h must be exactly identical to the underlying shape, with no offset.
// ✅ Correct: Shape and text coordinates are identical
slide.addShape(pres.shapes.ROUNDED_RECTANGLE, { x:1.4, y:0.95, w:2.4, h:0.42, ... });
slide.addText("text", { x:1.4, y:0.95, w:2.4, h:0.42, margin:0, align:"center", valign:"middle" });
// ❌ Wrong: Text box doesn't match shape coordinates, causes text offset or overflow
slide.addShape(pres.shapes.ROUNDED_RECTANGLE, { x:1.4, y:0.95, w:2.4, h:0.42, ... });
slide.addText("text", { x:1.5, y:1.0, w:2.2, h:0.35, ... });
Text box w and h must be explicitly set to match the corresponding shape dimensions. If text doesn't fit, reduce font size or adjust line breaks, never rely on text box auto-expansion.
Judge text direction first when viewing the image, then write code:
| Condition | Text Direction | Code |
|---|---|---|
| Text box width > height, text reads left→right normally | Horizontal (default) | No direction attribute needed |
| Text box height > width × 3, text top→bottom, each character upright | Vertical (Chinese) | vert: "eaVert" |
| Entire text block rotated 90° or 270° | Rotated horizontal | rotate: 270 (or 90) |
⚠️ Two wrong approaches for vertical text:
// ❌ Wrong: Using narrow text box as vertical, text becomes horizontal stacked
slide.addText("Group Data Platform", { x:0, y:1, w:0.3, h:2.0, ... });
// Result: Text horizontal, insufficient width, characters overlap
// ❌ Wrong: Using rotate for Chinese vertical, characters lie on side
slide.addText("Group Data Platform", { x:0, y:1, w:0.3, h:2.0, rotate: 270, ... });
// Result: Entire text block tilted, characters lying sideways
// ✅ Correct: vert: "eaVert" for upright Chinese vertical
slide.addText("Group Data Platform", {
x: 0, y: 1, w: 0.4, h: 2.0, // w sufficient for single character width (~0.3-0.5")
fontSize: 14, bold: true, color: "FFFFFF",
fontFace: "Microsoft YaHei",
align: "center", valign: "middle",
margin: 0,
vert: "eaVert", // ← Key attribute for Chinese vertical
});
After viewing image, answer these two questions to determine the path.
All conditions met → Use Strategy A:
| Condition | Description |
|---|---|
| All graphics are flat basic shapes | Rectangles, rounded rectangles, diamonds, ellipses, lines, arrows |
| No lighting gradients, no transparency layers | Each element solid fill, clear color boundaries |
| No curve-arranged elements | All elements arranged in rows/columns or flowcharts |
Typical scenarios: Flowcharts, architecture diagrams, org charts, data tables. Proceed directly to Step 0, no need to inform user.
Any of these signals present, but can be geometrically decomposed → Use Strategy B:
| Complex visual signal (any) | And can be geometrically decomposed (all must satisfy) |
|---|---|
| 3D perspective / isometric graphics | Graphics can be approximated with triangles, ellipses, rectangles |
| Glow / fan-shaped beams / gradient backgrounds | Layering differences can simulate depth with transparency |
| Decorative elements arranged along circular arcs | Can calculate coordinates with parametric equations |
| Multi-layer stacked 3D effects | Can use "large shape + white cover" to achieve cutout rings |
Decision mnemonic: "Flatten" the graphic in your mind—if after flattening it can be reconstructed with triangles + ellipses + rectangles + transparency combinations, use Strategy B.
Typical scenarios: Circular supply chain platform diagrams, funnel + glow combinations, dashboard schematics.
Inform user:
"This image contains [specific description], will use 'Mathematical Approximation' strategy: overlay geometric shapes + simulate lighting layers with transparency, all elements fully editable, visual fidelity ~75-85% (abandoning pixel-level lighting details, preserving overall visual structure)."
Core idea: Don't pursue pixel-perfect replication, decompose complex visual graphics into several "geometric layers", use transparency differences to simulate lighting layers, use parametric equations to calculate coordinates of curve-arranged elements, trading for 100% editability.
Practical verification: For complex circular supply chain platform diagrams, can actually achieve:
When viewing image, decompose complex graphics from bottom to top into geometric layers, each corresponding to an approximation method:
| Layer | Original Effect | Approximation Method |
|---|---|---|
| L1 | Background glow / fan beams | N triangles, same color different transparencies, expanding from center outward |
| L2 | Circular platform base color | Large ellipse (semi-transparent) + small white ellipse (cutout inner ring) |
| L3 | Decorative elements along arc | Small rectangles, coordinates calculated with ellipse parametric equations |
| L4 | Central column / tower structure | Rectangle group, center highest and darkest, gradually shorter and more transparent toward sides |
| L5 | Leader lines pointing to labels | Polyline (vertical line + horizontal line, two LINE segments拼接) |
| L6 | All text labels | addText, coordinates precisely matching original positions |
Layer Record Table (fill before writing code):
| Layer | Description | Shape Type | Count | Color | Transparency Range | Key Dimensions |
|---|---|---|---|---|---|---|
| L1 | Background fan beams | triangle | 5 | F4CACA | 55-78% | Cover corresponding fan sector |
| L2 | Large ellipse platform | ellipse×3 | 3 | F8DADA | 20-38% | cx=5.5 rx=3.5 ry=1.0 |
| L3 | Arc decorative ring | rectangle | 52 | D85D5D | 0% | 0.08×0.15" |
| L4 | Central tower | rectangle | 9 | B50E17 | 0-92% | w=0.05" |
| L5 | Leader lines | line×2 | N groups | 888888 | 0% | — |
| L6 | Text labels | text | N | A64F4F | 0% | Positioned by OCR |
transparency is the most core parameter in Strategy B, range 0 (fully opaque) ~ 100 (fully transparent). Larger values are more transparent, opposite of intuition, requires special attention.
// ── Fan-shaped beams: multiple triangles, same color, different transparencies ──────────────────────────
// Closer to center = less transparent (smaller value), farther out = more transparent (larger value)
slide.addShape(pres.shapes.TRIANGLE, { // Central triangle, 22% opaque
x: 3.5, y: 1.0, w: 3.0, h: 4.0,
fill: { color: "F4CACA", transparency: 78 },
line: { color: "F4CACA", width: 0 }, // ← Must be width:0, otherwise hard edges
});
slide.addShape(pres.shapes.TRIANGLE, { // Left triangle, 34% opaque
x: 1.5, y: 1.5, w: 3.0, h: 3.5,
fill: { color: "F4CACA", transparency: 66 },
line: { color: "F4CACA", width: 0 },
});
slide.addShape(pres.shapes.TRIANGLE, { // Outermost triangle, 45% opaque
x: 0.0, y: 1.5, w: 3.0, h: 3.5,
fill: { color: "F4CACA", transparency: 55 },
line: { color: "F4CACA", width: 0 },
});
// ── Multi-layer ellipse overlay + white cutout (circular platform) ─────────────────────────────────────
slide.addShape(pres.shapes.OVAL, { // Large ellipse, platform base color
x: 1.5, y: 3.0, w: 7.0, h: 2.0,
fill: { color: "F8DADA", transparency: 20 },
line: { color: "F8DADA", width: 0 },
});
slide.addShape(pres.shapes.OVAL, { // White ellipse, covers inner ring → forms ring shape
x: 2.5, y: 3.2, w: 5.0, h: 1.4,
fill: { color: "FFFFFF", transparency: 0 },
line: { color: "FFFFFF", width: 0 },
});
slide.addShape(pres.shapes.OVAL, { // Thin ring outline
x: 2.3, y: 3.15, w: 5.4, h: 1.5,
fill: { color: "F5CACA", transparency: 38 },
line: { color: "D85D5D", width: 1.5 },
});
Transparency Quick Reference:
| transparency value | Visual Effect | Typical Use |
|---|---|---|
| 0 | Fully opaque (solid) | Main structure, accent colors |
| 20-30 | Slightly transparent, color saturated | Platform base color, main areas |
| 40-55 | Semi-transparent, layered feel | Mid-layer glow |
| 60-75 | Quite transparent, outline still clear | Outer glow |
| 85-92 | Very transparent, nearly invisible | Far beams, decorative columns |
// ── N small rectangles evenly arranged along ellipse path (arc decorative ring) ─────────────────────────────
// cx/cy = ellipse center (inches), rx/ry = horizontal/vertical radius (inches)
// ew/eh = width/height of each small element (inches)
const cx = 5.5, cy = 4.5; // Ellipse center
const rx = 3.2, ry = 0.6; // Ellipse radius
const N = 52; // Total elements
const ew = 0.08, eh = 0.15; // Small rectangle dimensions
for (let i = 0; i < N; i++) {
const theta = (2 * Math.PI * i) / N; // Even angles, 0 → 2π
const x = cx + rx * Math.cos(theta) - ew / 2; // Element top-left x
const y = cy + ry * Math.sin(theta) - eh / 2; // Element top-left y
const color = i % 3 === 0 ? "D85D5D" : "EE8D8D"; // Alternating colors
slide.addShape(pres.shapes.RECTANGLE, {
x, y, w: ew, h: eh,
fill: { color },
line: { color, width: 0 },
});
}
Center and radius estimation method:
cx = Ellipse horizontal center (inches) = center pixel x ÷ image width px × 10cy = Ellipse vertical center (inches) = center pixel y ÷ image height px × 5.625rx = Ellipse horizontal radius (inches) = radius pixels ÷ image width px × 10ry = Ellipse vertical radius (inches) = radius pixels ÷ image height px × 5.625// ── Central tower: 9 columns, center highest and most solid, gradually shorter and more transparent toward sides ─────────────────────
const baseY = 3.93; // Column bottom y (inches, shared by all columns)
const colW = 0.05; // Column width (inches)
// [x coordinate, height, transparency] Center column transparency=0 (solid)
const columns = [
[4.944, 0.82, 92],
[5.034, 1.00, 92],
[5.124, 1.18, 92],
[5.214, 1.55, 92],
[5.304, 2.15, 0], // ← Center main column, fully opaque
[5.349, 2.36, 0], // ← Second highest column, fully opaque
[5.484, 1.55, 92],
[5.574, 1.00, 92],
[5.664, 0.82, 92],
];
columns.forEach(([x, h, transp]) => {
const color = transp === 0 ? "B50E17" : "D76666";
slide.addShape(pres.shapes.RECTANGLE, {
x: x - colW / 2,
y: baseY - h, // Column grows upward from bottom
w: colW,
h,
fill: { color, transparency: transp },
line: { color, width: 0 },
});
});
// ── Leader = vertical segment + horizontal segment, two LINE segments拼接 ──────────────────────────────
// anchorX/Y = Leader start point on graphic
// labelX/Y = Label position
const anchorX = 1.25, anchorY = 4.12;
const labelY = 2.70, labelX = 1.00;
// Vertical line: from start point up to label height
slide.addShape(pres.shapes.LINE, {
x: anchorX, y: labelY, w: 0, h: anchorY - labelY,
line: { color: "888888", width: 0.75 },
});
// Horizontal line: from vertical top horizontally to label
slide.addShape(pres.shapes.LINE, {
x: labelX, y: labelY, w: anchorX - labelX, h: 0,
line: { color: "888888", width: 0.75 },
});
After Step 5 generates preview image, in addition to Strategy A general checks, additionally check:
| Check Item | Judgment Criteria |
|---|---|
| Transparency layers natural | Glow naturally attenuates from center outward, no abrupt jumps |
| Triangles have hard edges | Triangle edges show dark outline (is line.width 0) |
| Ellipse path elements closed and even | Arc decorative ring fully closed, no obvious uneven spacing |
| Cutout effect clean | White cover ellipse edges aligned, no color gaps showing |
| Column structure symmetrical | Tower left-right symmetrical about central axis |
| Leader corner continuous | Vertical line endpoint and horizontal line start coordinates precisely connected |
Judgment: Edges not parallel, shot from side, corners not right angles → Must correct.
# scripts/correct_perspective.py
from PIL import Image
import numpy as np
def order_points(pts):
pts = np.array(pts, dtype="float32")
s, diff = pts.sum(axis=1), np.diff(pts, axis=1).flatten()
return np.array([pts[np.argmin(s)], pts[np.argmin(diff)],
pts[np.argmax(s)], pts[np.argmax(diff)]], dtype="float32")
def _find_coeffs(pa, pb):
matrix = []
for p1, p2 in zip(pa, pb):
matrix.append([p1[0], p1[1], 1, 0, 0, 0, -p2[0]*p1[0], -p2[0]*p1[1]])
matrix.append([0, 0, 0, p1[0], p1[1], 1, -p2[1]*p1[0], -p2[1]*p1[1]])
A = np.matrix(matrix, dtype=float)
B = np.array(pb).reshape(8)
return np.array(np.dot(np.linalg.inv(A.T * A) * A.T, B)).reshape(8)
def perspective_correct(src_path, corners, dst_path, w=1920, h=1080):
img = Image.open(src_path)
src = order_points(corners)
dst = np.array([[0,0],[w-1,0],[w-1,h-1],[0,h-1]], dtype="float32")
img.transform((w,h), Image.PERSPECTIVE, _find_coeffs(dst,src), Image.BICUBIC).save(dst_path)
print(f"✓ Correction complete → {dst_path}")
perspective_correct(
src_path="/mnt/user-data/uploads/your_photo.jpg",
corners=[[120,85],[1800,60],[1820,980],[100,1000]], # ← Replace after viewing image with view tool
dst_path="/home/claude/corrected.jpg"
)
python scripts/correct_perspective.py
# Use view tool to confirm corrected.jpg has straight edges before continuing
All subsequent steps use
corrected.jpgafter correction.
# scripts/extract_colors.py
import sys
from PIL import Image
def sample(img, x1, y1, x2, y2, n=8):
x1,y1,x2,y2 = int(x1)+5,int(y1)+5,int(x2)-5,int(y2)-5
if x2<=x1 or y2<=y1: x2,y2=x1+1,y1+1
xs=[x1+(x2-x1)*i//(n-1) for i in range(n)]
ys=[y1+(y2-y1)*i//(n-1) for i in range(n)]
cols=[img.getpixel((x,y))[:3] for x in xs for y in ys]
r=sorted(c[0] for c in cols)[len(cols)//2]
g=sorted(c[1] for c in cols)[len(cols)//2]
b=sorted(c[2] for c in cols)[len(cols)//2]
return f"{r:02X}{g:02X}{b:02X}"
img_path = sys.argv[1] if len(sys.argv)>1 else "/mnt/user-data/uploads/your_image.jpg"
img = Image.open(img_path).convert("RGB")
W, H = img.size
print(f"Image size: {W} × {H} px")
print(f"Coordinate conversion: x\" = px_x × {10/W:.5f} y\" = px_y × {5.625/H:.5f}")
print(f" w\" = px_w × {10/W:.5f} h\" = px_h × {5.625/H:.5f}\n")
regions = {
"Background": (W*.4, H*.4, W*.6, H*.6),
"Header": (W*.01, H*.01, W*.99, H*.13),
"Title Text": (W*.03, H*.02, W*.65, H*.11),
"Content Area": (W*.05, H*.18, W*.95, H*.82),
"Footer": (W*.01, H*.87, W*.99, H*.99),
}
print(f"{'Region':<12} {'hex':<8} Sampling range (px)")
print("-" * 52)
for name,(x1,y1,x2,y2) in regions.items():
print(f"{name:<12} #{sample(img,x1,y1,x2,y2)} ({int(x1)},{int(y1)})→({int(x2)},{int(y2)})")
print("\n# Single point sampling: r,g,b=img.getpixel((x,y))[:3]; print(f'{r:02X}{g:02X}{b:02X}')")
python scripts/extract_colors.py /mnt/user-data/uploads/your_image.jpg
Record all hex values, copy directly in Step 3, no visual estimation.
# scripts/extract_text.py
import sys, subprocess
from PIL import Image
def ensure_deps():
try:
import pytesseract
pytesseract.get_tesseract_version()
return pytesseract
except Exception:
print("Installing tesseract-ocr...")
subprocess.run(["apt-get","install","-y","-q",
"tesseract-ocr","tesseract-ocr-chi-sim"], check=True)
subprocess.run([sys.executable,"-m","pip","install","pytesseract",
"--break-system-packages","-q"], check=True)
import pytesseract
return pytesseract
def ocr(img, x1, y1, x2, y2, scale=2, lang="chi_sim+eng"):
crop=img.crop((int(x1),int(y1),int(x2),int(y2)))
crop=crop.resize((crop.width*scale,crop.height*scale),Image.LANCZOS)
raw=tess.image_to_string(crop,lang=lang).strip()
return "\n".join(l for l in raw.splitlines() if l.strip())
tess=ensure_deps()
img_path=sys.argv[1] if len(sys.argv)>1 else "/mnt/user-data/uploads/your_image.jpg"
img=Image.open(img_path).convert("RGB")
W,H=img.size
print("=== Full Image Scan ===")
print(tess.image_to_string(img,lang="chi_sim+eng").strip())
regions={"Header":(W*.02,H*.01,W*.88,H*.13),"Footer Left":(W*.02,H*.87,W*.38,H*.99),
"Footer Center":(W*.38,H*.87,W*.72,H*.99),"Footer Right":(W*.72,H*.87,W*.98,H*.99)}
print("\n=== Region Extraction ===")
for name,(x1,y1,x2,y2) in regions.items():
print(f"\n[{name}] {repr(ocr(img,x1,y1,x2,y2))}")
print("\n# Custom: ocr(img, x1, y1, x2, y2, scale=2)")
python scripts/extract_text.py /mnt/user-data/uploads/your_image.jpg
OCR uses image as final authority, mainly for quickly obtaining long text and numbers, avoiding manual typing errors.
Open references/shapes.md, compare and identify each graphic and text label, record to table:
| ID | Position Description | Shape Constant | Arrow | Fill Color | Border Color | Inner Text | Text Direction |
|---|---|---|---|---|---|---|---|
| S1 | Header background | RECTANGLE | No | 8B1A1A | Same as fill | — | — |
| S2 | Left flow node 1 | ROUNDED_RECTANGLE | No | D8EAF5 | A0C4E0 | "Start" | Horizontal |
| S3 | Node 1→Node 2 connector | LINE | Yes, triangle | 888888 | — | — | — |
| T1 | Left area label | RECTANGLE | No | Dark blue | — | "Group Data Platform" | Vertical |
| … | … | … | … | … | … | … | … |
Text direction judgment (confirm one by one when viewing image):
Observe each text label:
Text box width > height → Horizontal (default, no vert needed)
Text box height > width × 3 → Vertical, characters upright → use vert: "eaVert"
Entire text block tilted 90°/270° → Rotated horizontal → use rotate: 270
Special attention to these common vertical labels (narrow strip text on left/right sides of images are almost always vertical):
Connector arrow judgment:
LINE + endArrowType: "triangle" or "open"RIGHT_ARROW / DOWN_ARROW etc. shape constantsLINE, don't set endArrowTypeEyes on image, divide slide into large regions, record inch coordinates for each block using conversion formula.
Coordinate system: Top-left origin, unit inches. Slide = 10" × 5.625".
Conversion formula (coefficients printed in Step 1-A, use directly):
x" = pixel_x × (10 / image width px)
y" = pixel_y × (5.625 / image height px)
w" = pixel_w × (10 / image width px)
h" = pixel_h × (5.625 / image height px)
Region Record Table (fill before writing code):
|Region|x"|y"|w"|h"|Notes| |-|-|-|-|-| |Header|0|0|10|0.65|Top edge aligned| |Footer|0|5.0|10|0.625|Bottom edge aligned| |Left content area|0.2|0.75|…|…|Estimate| |…|…|…|…|…|…|
Each element's color from Step 1-A, text from Step 1-B, shape constant from Step 1-C, coordinates from Step 2. No reliance on memory or guesswork, all values have clear sources.
After writing each region, immediately mentally compare with original image before continuing to next block.
// create_slide.js
const pptxgen = require("pptxgenjs");
let pres = new pptxgen();
pres.layout = 'LAYOUT_16x9';
let slide = pres.addSlide();
// ── Layer 1: Background ──────────────────────────────────────────────────────────
slide.background = { color: "FFFFFF" }; // ← Background color from Step 1-A
// ── Layer 2: Large background rectangles ───────────────────────────────────────────────────
// Header
slide.addShape(pres.shapes.RECTANGLE, {
x: 0, y: 0, w: 10, h: 0.65,
fill: { color: "8B1A1A" }, line: { color: "8B1A1A", width: 0 }
});
// Header text (coordinates exactly match header, margin: 0)
slide.addText("[Step 1-B OCR result]", {
x: 0, y: 0, w: 10, h: 0.65,
fontSize: 22, bold: true, color: "FFFFFF",
fontFace: "Microsoft YaHei",
align: "left", valign: "middle",
margin: 0, // ← Rule 1: must be 0
});
// Footer
slide.addShape(pres.shapes.RECTANGLE, {
x: 0, y: 5.0, w: 10, h: 0.625,
fill: { color: "8B1A1A" }, line: { color: "8B1A1A", width: 0 }
});
slide.addText("[Step 1-B OCR result]", {
x: 0, y: 5.0, w: 10, h: 0.625,
fontSize: 13, bold: true, color: "FFFFFF",
fontFace: "Microsoft YaHei",
align: "left", valign: "middle",
margin: 0,
});
// ── Layer 3: Content graphics + Layer 4: Graphic text (alternate drawing, add text immediately after each shape) ──
// Example: Flow node
slide.addShape(pres.shapes.ROUNDED_RECTANGLE, { // ← Shape constant from Step 1-C
x: 1.4, y: 0.95, w: 2.4, h: 0.42,
fill: { color: "D8EAF5" },
line: { color: "A0C4E0", width: 1 },
rectRadius: 0.05
});
slide.addText("Node text", {
x: 1.4, y: 0.95, w: 2.4, h: 0.42, // ← Rule 2: exactly matches shape above
fontSize: 11, color: "333333",
fontFace: "Microsoft YaHei",
align: "center", valign: "middle",
margin: 0, // ← Rule 1
wrap: true,
});
// Example: Connector (with arrow)
slide.addShape(pres.shapes.LINE, {
x: 2.6, y: 1.37, w: 0, h: 0.25, // Vertical line: w=0,h>0
line: {
color: "888888", width: 1.5,
endArrowType: "triangle", // ← Step 1-C: use triangle/open if arrow present, omit if no arrow
}
});
pres.writeFile({ fileName: "/mnt/user-data/outputs/output.pptx" });
console.log("Done!");
NODE_PATH=/home/claude/.npm-global/lib/node_modules node create_slide.js
Core principle: Don't execute immediately after writing script. Do paper verification first—"render" script coordinates in mind, compare with original image, fix issues on the spot, then submit for execution. This step intercepts most overlap, out-of-bounds, and text overflow issues before execution.
Run scripts/preflight.py, automatically parse all element coordinates in script and report issues:
python scripts/preflight.py create_slide.js
Output explanation:
❌ Out of bounds: Elements with x+w > 10 or y+h > 5.625 → Must fix before execution⚠️ Suspected overlap: Non-parent-child rectangle intersections between two elements → Manual confirmation if reasonable📋 Coordinate summary: All elements sorted by y, convenient for comparing region proportions with original imageAny ❌ out of bounds must be fixed and re-run until no ❌ before entering Step 4.
For all addText in script, verify one by one if text fits.
Required width = character count × (fontSize / 72 × 1.1) inches
Required height = line count × (fontSize / 72 × 1.5) inches
Requirement: w ≥ required width, h ≥ required height
Quick reference (fontSize = 12pt):
| Content | Min w | Min h |
|---|---|---|
| 4 Chinese characters, single line | 0.74" | 0.25" |
| 6 Chinese characters, single line | 1.10" | 0.25" |
| 4 Chinese characters, two lines | 0.74" | 0.45" |
| 10 English characters | 0.80" | 0.25" |
vert: "eaVert")Required width = fontSize / 72 × 1.4 inches (single column character width)
Required height = character count × (fontSize / 72 × 1.3) inches
Requirement: w ≥ required width, h ≥ required height
Quick reference (fontSize = 14pt):
| Vertical characters | Min w | Min h |
|---|---|---|
| 4 chars | 0.27" | 1.05" |
| 6 chars | 0.27" | 1.57" |
| 8 chars | 0.27" | 2.10" |
Text doesn't fit → Fix before execution: Reduce fontSize, expand w/h, or use breakLine for manual line breaks.
Use coordinate summary table from preflight.py output, compare with original image for proportion verification:
1. Horizontally slice original image into major horizontal regions (header, content area, footer…)
2. Visually estimate each region's percentage of total slide height
3. Compare with corresponding element's h/5.625 in summary table, see if close to visual percentage
4. Regions with deviation > 15% need re-estimation of y and h
Focus on right-side truncation check:
→ Rightmost element's x+w ≤ 9.8" (leave 0.2" safety margin)
→ When right side has vertical label bar, confirm its x doesn't exceed 10 - label width
For nested regions (large background box containing multiple child elements), check all child elements are within parent box range:
Child element requirements:
child.x ≥ parent.x
child.x + child.w ≤ parent.x + parent.w
child.y ≥ parent.y
child.y + child.h ≤ parent.y + parent.h
Most common violation: Last column of nested grid x+w exceeds parent box right boundary. Fix method:
// Recalculate grid column width, ensure last column doesn't exceed bounds
const cellW = (parentW - padLeft - padRight) / cols;
const cellX = (i) => parentX + padLeft + i * cellW;
All satisfied before entering Step 4:
| Item | Pass Criteria |
|---|---|
| ✅ No out of bounds | preflight.py has no ❌ output |
| ✅ Text fits | All text boxes w/h satisfy capacity calculation |
| ✅ Region proportions reasonable | Each region height deviation from original visual ≤ 15% |
| ✅ Child elements within parent | All nested elements don't exceed parent box boundary |
| ✅ Overlaps confirmed | preflight.py ⚠️ warnings all manually confirmed reasonable |
Fix unpassed items directly in script, re-run preflight.py after fix, until all pass.
// Rectangle node (processing step)
slide.addShape(pres.shapes.RECTANGLE, { x:1.0, y:1.0, w:2.0, h:0.5,
fill:{color:"F5F5F5"}, line:{color:"AAAAAA",width:1} });
slide.addText("Processing Step", { x:1.0, y:1.0, w:2.0, h:0.5,
fontSize:11, color:"333333", fontFace:"Microsoft YaHei",
align:"center", valign:"middle", margin:0, wrap:true });
// Diamond node (decision branch)
slide.addShape(pres.shapes.DIAMOND, { x:1.0, y:2.0, w:2.0, h:0.6,
fill:{color:"FFF2CC"}, line:{color:"CCAA00",width:1} });
slide.addText("Yes?", { x:1.0, y:2.0, w:2.0, h:0.6,
fontSize:11, color:"333333", fontFace:"Microsoft YaHei",
align:"center", valign:"middle", margin:0, wrap:true });
// Ellipse node (start/end)
slide.addShape(pres.shapes.OVAL, { x:1.2, y:0.3, w:1.6, h:0.45,
fill:{color:"D5E8D4"}, line:{color:"82B366",width:1} });
slide.addText("Start", { x:1.2, y:0.3, w:1.6, h:0.45,
fontSize:11, color:"333333", fontFace:"Microsoft YaHei",
align:"center", valign:"middle", margin:0 });
// Connector with arrow (solid triangle at end)
slide.addShape(pres.shapes.LINE, { x:2.0, y:1.5, w:0, h:0.3,
line:{ color:"888888", width:1.5, endArrowType:"triangle" } });
// Connector with arrow (open arrow at end)
slide.addShape(pres.shapes.LINE, { x:2.0, y:1.5, w:0, h:0.3,
line:{ color:"888888", width:1.5, endArrowType:"open" } });
// Pure connector without arrow
slide.addShape(pres.shapes.LINE, { x:2.0, y:1.5, w:0, h:0.3,
line:{ color:"CCCCCC", width:1 } });
endArrowTypeoptions:"triangle"(solid),"open"(hollow),"stealth"(stealth),"diamond","oval","none"
slide.addShape(pres.shapes.ROUNDED_RECTANGLE, { x:0.55, y:1.62, w:4.1, h:0.55,
fill:{color:"E8E0F0"}, line:{color:"C0A8D8",width:1}, rectRadius:0.05 });
slide.addText([
{ text:"Main Title", options:{ breakLine:true } },
{ text:"Subtitle description, can be smaller", options:{ fontSize:7.5, color:"666666" } }
], { x:0.55, y:1.62, w:4.1, h:0.55,
fontSize:10, color:"444444", fontFace:"Microsoft YaHei",
align:"center", valign:"middle", margin:0, wrap:true });
slide.addShape(pres.shapes.RECTANGULAR_CALLOUT, { x:1.0, y:0.5, w:2.5, h:0.6,
fill:{color:"FFFDE7"}, line:{color:"CCAA00",width:1} });
slide.addText("Annotation text", { x:1.0, y:0.5, w:2.5, h:0.6,
fontSize:10, color:"333333", fontFace:"Microsoft YaHei",
align:"center", valign:"middle", margin:0, wrap:true });
slide.addText([
{ text:"Link text", options:{ color:"1155CC", underline:{style:"sng"} } },
{ text:" and ", options:{ color:"333333" } },
{ text:"Another link", options:{ color:"1155CC", underline:{style:"sng"} } },
{ text:" is description.", options:{ color:"333333" } }
], { x:5.55, y:3.3, w:4.2, h:0.35,
fontSize:12, fontFace:"Microsoft YaHei", margin:0 });
Applicable scenarios: Narrow strip area labels on left or right side of image, such as "Group Data Platform", "Data Lake", "Data Services", etc.
// ── Vertical text: vert: "eaVert" (each character upright, read top to bottom) ───────────────────
// Vertical label with background (most common form)
slide.addShape(pres.shapes.RECTANGLE, {
x: 0, y: 0.65, w: 0.4, h: 3.5, // Width ~0.35-0.5", height fills area
fill: { color: "1F4E79" }, line: { color: "1F4E79", width: 0 }
});
slide.addText("Group Data Platform", {
x: 0, y: 0.65, w: 0.4, h: 3.5, // ← Exactly matches shape
fontSize: 14, bold: true, color: "FFFFFF",
fontFace: "Microsoft YaHei",
align: "center", valign: "middle",
margin: 0,
vert: "eaVert", // ← Chinese vertical, characters upright
});
// Pure vertical text without background
slide.addText("Data Services", {
x: 9.2, y: 1.0, w: 0.35, h: 2.0,
fontSize: 12, color: "2E75B6",
fontFace: "Microsoft YaHei",
align: "center", valign: "middle",
margin: 0,
vert: "eaVert",
});
Dimension rules:
w (width) = Single character width + small margin, approximately fontSize / 72 * 1.4 (inches)
h (height) = Actual height of corresponding region, converted from image
Applicable scenarios: Horizontal region labels at top or bottom of image (such as "Data Consumption", "Data Source"), text is normal horizontal but text box is narrow.
// ── Correct: Horizontal labels need sufficient width, don't compress to achieve vertical effect ──────────────────────
// ❌ Wrong: Width too narrow, text forced to wrap into "vertical column", actually horizontal compression
slide.addText("Data Consumption", {
x: 0, y: 0, w: 0.3, h: 0.8, // w too narrow, each character auto-wraps
fontSize: 16, ...
});
// ✅ Correct: Give sufficient width to accommodate all text, if space insufficient reduce font size
slide.addText("Data Consumption", {
x: 0, y: 0, w: 0.8, h: 0.8, // w sufficient for horizontal text
fontSize: 14, bold: true, color: "FFFFFF",
fontFace: "Microsoft YaHei",
align: "center", valign: "middle",
margin: 0,
wrap: false, // ← Prohibit wrapping, ensure single line display
});
// ✅ If region really narrow must wrap, use breakLine for manual line break control
slide.addText([
{ text: "Data", options: { breakLine: true } },
{ text: "Consumption" }
], {
x: 0, y: 0, w: 0.45, h: 0.8,
fontSize: 14, bold: true, color: "FFFFFF",
fontFace: "Microsoft YaHei",
align: "center", valign: "middle",
margin: 0,
});
Horizontal vs Vertical Quick Reference:
| Original Effect | Characters upright? | Correct approach |
|---|---|---|
| Text left to right, normal reading | ✅ Upright | Horizontal, no vert needed |
| Text top to bottom, each character upright | ✅ Upright | Vertical, add vert: "eaVert" |
| Entire text block rotated (horizontal but on side) | ❌ On side | Add rotate: 270 |
| Text wraps into column but characters horizontal (compressed) | ✅ Upright but squeezed | Horizontal + manual breakLine + appropriate w |
const slideW = 10, slideH = 5.625;
// Horizontal center
const x = (slideW - w) / 2;
// Node bottom center (connector start point)
const lineX = nodeX + nodeW / 2;
const lineY = nodeY + nodeH;
// n elements evenly spaced (total range rangeW, start x0, each width itemW)
const gap = (rangeW - n * itemW) / (n + 1);
const itemX = (i) => x0 + gap + i * (itemW + gap);
python /mnt/skills/public/pptx/scripts/office/soffice.py \
--headless --convert-to pdf /mnt/user-data/outputs/output.pptx
rm -f /home/claude/slide-*.jpg
pdftoppm -jpeg -r 150 /home/claude/output.pdf /home/claude/slide
ls -1 /home/claude/slide*.jpg
Use view tool to view preview image, compare side-by-side with original image, execute following two-phase checks.
| Check Item | Judgment Criteria |
|---|---|
| Text out of box | Text exceeds graphic boundary or truncated |
| Text box exceeds graphic | Text box dimensions larger than underlying shape, text appears outside shape |
| Vertical text correct | Sidebar label characters upright (eaVert); horizontal compressed text not mistakenly made vertical |
| Horizontal text compressed | Horizontal label w sufficient, text not overlapping or misaligned due to insufficient width |
| Elements unexpectedly overlapping | Text covered by graphics, or graphics block other text |
| Connector arrow direction | Arrow direction matches original image (start/end reversed?) |
| Arrangement orderly | Peer elements aligned, evenly spaced, no obvious misalignment |
| Spacing reasonable | Adjacent element spacing ≥ 0.1", no edge touching |
| Page margins | Content distance from slide edge ≥ 0.2" (header/footer excepted) |
| Check Item | Judgment Criteria |
|---|---|
| 🎨 Colors | Background, graphic fill, text color match original image |
| 🔷 Shape types | Rectangle/rounded/diamond/ellipse etc. match original image |
| 📏 Proportions | Element width/height ratios, region proportions match original image |
| ↔️ Arrow types | Solid/hollow/thick arrow shapes match original image |
| 🔤 Font size hierarchy | Title/body/annotation font size relationships match original image |
| ↔️ Alignment | Text left/center/right alignment matches original image |
Correction counter = 0
LOOP:
Generate preview → Phase 1 check → Phase 2 check
if no issues → Deliver, end
if correction counter >= 3:
Deliver directly, note: "Completed 3 correction rounds, following issues unresolved: [list]"
End
Modify create_slide.js corresponding lines
Regenerate pptx and preview
Correction counter += 1
→ Continue LOOP
Explain each round: ① Layout issues found ② Fidelity issues found ③ Which code modified ④ Round N/3
| Error Phenomenon | Cause | Fix |
|---|---|---|
| Vertical label characters on side | Used rotate: 270 instead of vert | Use vert: "eaVert" instead |
| Vertical label characters overlapping | Text box w too narrow without vert | Add vert: "eaVert", set w to 0.35-0.5" |
| Horizontal label text wraps into column | Text box w too narrow, text forced to wrap | Increase w for single line, or use breakLine for manual line break control |
Horizontal label added vert causing chaos | Misjudged as vertical | Remove vert, ensure w wide enough |
| Text position offset | Forgot margin: 0 | Add margin: 0 to all addText |
| Text exceeds graphic boundary | Text box dimensions don't match shape | Align text box x/y/w/h exactly with shape |
| Text expanded text box | Didn't set wrap: true and fixed dimensions | Add wrap: true, appropriately reduce font size |
| Arrow direction reversed | endArrowType on wrong side | Swap begin/endArrowType, or adjust x/y/w/h direction |
| Color display wrong | Added # before hex | Remove #: "FF0000" ✅ |
| Garbled/Chinese characters as blocks | Font not set | Add fontFace: "Microsoft YaHei" |
| ROUNDED_RECTANGLE corner has white edge | Text box covers corner | Text box same coordinates as shape + margin:0 |
| Diamond text crooked | valign not set | Add align:"center", valign:"middle" |
| Color sampling deviation | Sampled border/shadow pixels | Move sampling coordinates to region center, 10px from edge |
| Error Phenomenon | Cause | Fix |
|---|---|---|
| Triangle/shape has obvious hard edge outline | line.width not set to 0 | Add line: { color: same as fill, width: 0 } |
| Glow layers reversed (outer dark inner light) | transparency value order wrong | Outer transparency smaller (more opaque) → swap values |
| Arc decorative ring not closed | N value insufficient or angle range wrong | Confirm theta from 0 to 2 * Math.PI, appropriately increase N |
| Arc element position not on ellipse | cx/cy/rx/ry estimation wrong | Recalculate center and radius from pixel coordinates |
| Cutout white ellipse has color gap | White ellipse coverage insufficient | Slightly expand white ellipse w/h, ensure complete inner ring coverage |
| Column structure asymmetrical | x coordinate calculation error | Use center column as benchmark, mirror symmetric offset left-right |
| Leader corner has gap | Vertical and horizontal line endpoint coordinates not continuous | Vertical line start y = horizontal line y; horizontal line start x = vertical line x |
references/pptxgenjs-cheatsheet.mdreferences/shapes.md