english-polish

Advanced English polish skill for non-native writing: nativization, style consistency, and diff output. CLI scripts: detector.py and polish.py.

Install

openclaw skills install @muchenhengxin/english-polish

english-polish — Professional English Polish & Nativization Skill

Overview

Polish English text for native-level fluency while preserving the author's voice, argument structure, and analytical depth. Designed for technical/analytical non-fiction writing by non-native speakers.

Core principle: We are not translating Chinese to English. We are nativizing English that was written by a non-native speaker — making it flow naturally while preserving every fact, judgment, and nuance.

When to Use

  1. Non-native English manuscript chapters (such as the 算力简史, "A Brief History of Computing Power", project)
  2. English-language analysis/reports written with Chinese syntax patterns
  3. Any English output intended for native English-speaking readers

Never Use For

  • Creative writing (poetry, fiction, dialogue-heavy narrative)
  • Academic papers requiring formal citation style preservation
  • Translation from scratch (this skill polishes existing English; it does not translate)

CLI Tools

This skill ships with two Python scripts: detector.py automates Phase 0 detection, and polish.py automates Phase 1-3 polishing.

Installation

# Prerequisites
python3 -c "import re"  # stdlib, no deps needed

Detector: detector.py

# Check a file (default: plain text)
python3 scripts/detector.py check document.md

# HTML report (with score bar + table)
python3 scripts/detector.py check document.md --format html

# ANSI-colored terminal report
python3 scripts/detector.py check document.md --format ansi

# Batch scan a directory + JSON report
python3 scripts/detector.py batch /path/to/chapters/

# HTML batch (one HTML file per chapter)
python3 scripts/detector.py batch /path/to/chapters/ --format html --output /tmp/reports/

# Interactive mode (type or paste text)
python3 scripts/detector.py interactive

# List all 19 pattern categories
python3 scripts/detector.py list-patterns

Output formats:

  • text (default): Plain text report (same format as the sample below)
  • html: HTML report with score bar, severity colors, and pattern table
  • ansi: Terminal-colored report for quick scanning

Output sample:

============================================================
PHASE 0 DETECTOR REPORT — chapter-09.md
============================================================
  Word count:          4820
  Total issues (wtd):  24.4
  Density (/200 words): 1.0
  Score:               4.9/5.0
  Severity:            Native-level

  Detected patterns (5 categories):

  [Noun Stacking (5+ consec noun/mod)] ×8 (weighted: 5.6)
    → "capital expenditure on cloud infrastructure"
    → "chip design competition is fierce"

  ['Not only...but also' overuse] ×2 (weighted: 1.2)
    → "Not only...but also"

  ['Should' false obligation] ×3 (weighted: 2.1)
    → "China should"
    → "Companies should"

  ...

Polisher: polish.py

# Check + report (no file modification)
python3 scripts/polish.py check document.md

# Colorized check output
python3 scripts/polish.py check document.md --format ansi

# Polish file (creates output + diff report)
python3 scripts/polish.py polish document.md --output output.md

# Polish with HTML summary report
python3 scripts/polish.py polish document.md --output output.md --format html

# Style tweaks only (skip Phase 3 structural changes)
python3 scripts/polish.py polish document.md --style-only

# Batch polish a directory
python3 scripts/polish.py batch /path/to/chapters/

Polish output includes:

  • Phase 1: Grammar & mechanics fixes (articles, prepositions, tenses)
  • Phase 2: Fluency/Chinglish pattern replacement (19 categories)
  • Phase 3: Style enhancement (sentence rhythm, word choice)
  • Diff: side-by-side or inline before/after
  • Report: score improvement, changes summary
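
The inline before/after diff listed above can be approximated with the standard library. This is a minimal sketch, not polish.py's actual implementation: it swaps in Python's difflib in place of the external diff command, and the function name inline_diff and the sample sentences are hypothetical.

```python
import difflib


def inline_diff(before: str, after: str, name: str = "chapter.md") -> str:
    """Return a unified (inline) diff between original and polished text."""
    diff_lines = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"{name} (original)",
        tofile=f"{name} (polished)",
        lineterm="",  # input lines carry no trailing newline, so emit none
    )
    return "\n".join(diff_lines)


# Hypothetical two-line chapter; only the first line changes.
original = "We make a decision on capex.\nCosts fell in 2023."
polished = "We decide on capex.\nCosts fell in 2023."
print(inline_diff(original, polished))
```

Lines prefixed with "-" come from the original, lines prefixed with "+" from the polished text, and unchanged lines appear as context with a leading space.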

Makefile (CI Automation)

# Quick test suite only
make check

# Full test + book baseline + regression check
make test

# Run full-book baseline and compare with saved reference
make baseline

# Save current baseline as reference for future comparisons
make save-baseline

# Clean temp files
make clean

make baseline runs the detector on the full book and compares against the saved reference. If the average score changes by more than ±0.1, it flags the change for review.
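
The ±0.1 rule described above amounts to a tolerance check on the average score. The sketch below illustrates that rule only. The function names, the sample scores, and the idea that the reference is a single average are assumptions; the actual layout of baseline-reference.json is not documented here.

```python
from typing import List


def average_score(chapter_scores: List[float]) -> float:
    """Mean detector score across chapters (hypothetical helper)."""
    if not chapter_scores:
        raise ValueError("need at least one chapter score")
    return sum(chapter_scores) / len(chapter_scores)


def score_drifted(current_avg: float, reference_avg: float,
                  tolerance: float = 0.1) -> bool:
    """True when the average moved by more than the tolerance, up or down."""
    return abs(current_avg - reference_avg) > tolerance


# Hypothetical run: saved reference average 4.5, three freshly scored chapters.
current = average_score([4.9, 4.7, 4.8])
if score_drifted(current, reference_avg=4.5):
    print(f"Average score changed to {current:.2f}: flag for review")
```

An improvement beyond the tolerance is flagged as well as a regression, which matches the "±" wording: either direction means the saved reference no longer describes the book.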

Custom Configuration (config/)

Extend the skill without modifying code:

# Copy and customize
cp config/default.json config/user.json

user.json supports adding custom detection patterns and polish replacement rules:

{
  "user_custom_patterns": [
    {
      "key": "Z_my_custom_pattern",
      "label": "My Custom Pattern",
      "weight": 1.0,
      "patterns": ["\\bmy regex pattern here\\b"],
      "examples": ["Example text"],
      "notes": "What to look for"
    }
  ],
  "user_custom_replacements": [
    {
      "type": "filler",
      "old": "my custom phrase",
      "new": "better phrase",
      "enabled": true
    },
    {
      "type": "weak_verb",
      "old": "perform an evaluation",
      "new": "evaluate",
      "enabled": true
    }
  ]
}
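
To make the two config sections concrete, here is a minimal sketch of how such entries could be consumed. It is an assumption about behavior, not the scripts' actual loader: the helper names, the case-insensitive matching, and the literal case-sensitive replacement are all illustrative choices, and the embedded JSON is a trimmed hypothetical user.json.

```python
import json
import re

# Hypothetical user.json content, trimmed to the fields used below.
USER_JSON = r"""
{
  "user_custom_patterns": [
    {"key": "Z_perform_evaluation", "label": "Perform an evaluation",
     "weight": 1.0, "patterns": ["\\bperform an evaluation\\b"]}
  ],
  "user_custom_replacements": [
    {"type": "weak_verb", "old": "perform an evaluation",
     "new": "evaluate", "enabled": true},
    {"type": "filler", "old": "to a certain extent",
     "new": "partly", "enabled": false}
  ]
}
"""


def weighted_hits(text: str, config: dict) -> float:
    """Sum weight * match count over every custom detection pattern."""
    total = 0.0
    for entry in config.get("user_custom_patterns", []):
        for pattern in entry["patterns"]:
            matches = re.findall(pattern, text, flags=re.IGNORECASE)
            total += entry["weight"] * len(matches)
    return total


def apply_replacements(text: str, config: dict) -> str:
    """Apply enabled rules as literal, case-sensitive substitutions."""
    for rule in config.get("user_custom_replacements", []):
        if rule.get("enabled", False):
            text = text.replace(rule["old"], rule["new"])
    return text


config = json.loads(USER_JSON)
sample = "We perform an evaluation every quarter, to a certain extent."
print(weighted_hits(sample, config))       # 1.0
print(apply_replacements(sample, config))  # We evaluate every quarter, to a certain extent.
```

The disabled filler rule leaves "to a certain extent" untouched, which shows how "enabled": false lets a rule stay in the file without being applied. Note the doubled backslash in the JSON: "\\b" decodes to the regex word boundary \b.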

Architecture

Three-Phase Polish Pipeline

Phase 0: Detection ────→ Score (0-5)
                             │
                     ┌──────┴──────┐
                     │             │
               score < 3      score ≥ 3
                     │             │
              Phase 1:        Phase 2:
           Full Grammar    Light Polish
             + Syntax       (patterns
             Restructure    only)
                     │             │
              Phase 2:        Phase 3:
           Chinglish      Style-only
           Replacement   (word-level)
                     │             │
              Phase 3:           │
           Style Pass     (skip Phase 1)
                     │             │
               Diff + Report

19 Pattern Categories (A-S)

| Code | Pattern | Weight | Example |
|------|---------|--------|---------|
| A | Subject Dangling | 1.0 | "For chip design, competition is fierce..." |
| B | Redundant "As For" / "Regarding" | 0.8 | "As for the energy cost, it is..." |
| C | Passive Chain Stacking | 1.0 | "It can be seen that..." |
| D | "Make + abstract noun" | 0.8 | "make a decision" → "decide" |
| E | "Not only...but also" overuse | 0.6 | default connector overuse |
| F | Noun Stacking (5+ consec noun/mod) | 0.7 | "Data center GPU market growth prediction models" |
| G | "Should" false obligation | 0.7 | "China should develop its own chip..." |
| H | "The" overuse for generic | 0.5 | "the computing power is..." |
| I | Chinese filler expressions | 0.7 | "to a certain extent", "in the process of" |
| J | "This is because" structure | 0.6 | Chinese-pattern explanatory |
| K | Redundant modifiers | 0.6 | "actively strengthen", "continuously improve" |
| L | Weak verb + nominalization | 0.7 | "conduct research" → "research" |
| M | Emphasis inversion (Chinese '是...的') | 0.7 | "It is X that..." → "X..." |
| N | List cadence enumeration | 0.5 | "First...Second...Finally" aggregation |
| O | Concessive cluster (虽然...但是) | 0.8 | "Although...but..." double conjunction |
| P | Topic-comment fronting | 0.7 | "As for X... / When it comes to..." |
| Q | False inclusive "We" | 0.5 | "We can see/observe/conclude..." |
| R | Logical sawtooth contrast | 0.5 | "On one hand...on the other hand..." |
| S | List summarizer | 0.5 | "All these factors contribute to..." |
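
A few of these categories lend themselves to simple regex detection. The sketch below covers four of them (E, J, O, Q) with labels and weights taken from the table. The regexes themselves are assumptions written for illustration and are not the patterns detector.py uses.

```python
import re

# Code -> (label, weight, illustrative regex). Labels and weights follow the
# pattern table; the regexes are assumptions made for this sketch.
PATTERNS = {
    "E": ("'Not only...but also' overuse", 0.6,
          r"\bnot only\b[^.!?]*?\bbut also\b"),
    "J": ("'This is because' structure", 0.6, r"\bthis is because\b"),
    "O": ("Concessive cluster", 0.8, r"\b(?:al)?though\b[^.!?]*?,\s*but\b"),
    "Q": ("False inclusive 'We'", 0.5, r"\bwe can (?:see|observe|conclude)\b"),
}


def detect(text: str) -> dict:
    """Return {category code: match count} for categories that matched."""
    counts = {}
    for code, (_label, _weight, regex) in PATTERNS.items():
        hits = len(re.findall(regex, text, flags=re.IGNORECASE))
        if hits:
            counts[code] = hits
    return counts


def weighted_total(counts: dict) -> float:
    """Sum of count * weight, the 'weighted issues' figure in reports."""
    return sum(PATTERNS[code][1] * n for code, n in counts.items())


sample = ("Although the chip is fast, but it is expensive. "
          "This is because yields are low. "
          "We can see that demand is not only strong but also growing.")
counts = detect(sample)
print(counts)
print(round(weighted_total(counts), 1))
```

Each regex stays within a single sentence by excluding ".", "!" and "?" between its anchors, so "Although" in one sentence and "but" in the next do not count as a concessive cluster.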

Scoring

  • 0-5 scale, normalized from weighted detection density
  • 5.0: Perfect (0 issues)
  • 4.5-5.0: Native-level (minor style tweaks only)
  • 4.0-4.5: Near-native (phrase-level polish)
  • 3.0-4.0: Light polish needed (pattern-level)
  • 2.0-3.0: Full polish needed (sentence-level)
  • <2.0: Heavy rewrite needed (structural issues)

Scoring formula (per 200 words):

  • 0 weighted issues → score 5.0
  • 1-3 weighted issues (medium) → score 4.0-4.5
  • 4-8 weighted issues (heavy) → score 3.0-4.0
  • >8 weighted issues → score drops below 3.0
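
The density figure and the severity bands can be sketched as follows. The density arithmetic reproduces the sample report (24.4 weighted issues over 4820 words). The function names are hypothetical, and assigning each boundary value (4.5, 4.0, 3.0, 2.0) to the higher band is an assumption, since the documented ranges share their endpoints. The exact density-to-score normalization is not specified here, so it is left out.

```python
def density_per_200(weighted_issues: float, word_count: int) -> float:
    """Weighted issues per 200 words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return weighted_issues * 200 / word_count


def severity_label(score: float) -> str:
    """Map a 0-5 score to its severity band (boundaries go to the higher band)."""
    if score >= 4.5:
        return "Native-level"
    if score >= 4.0:
        return "Near-native"
    if score >= 3.0:
        return "Light polish needed"
    if score >= 2.0:
        return "Full polish needed"
    return "Heavy rewrite needed"


# Figures from the sample report: 24.4 weighted issues in 4820 words.
print(round(density_per_200(24.4, 4820), 1))  # 1.0
print(severity_label(4.9))                    # Native-level
```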


Workflow

Phase 0: Detection

Run automated detection before deciding to polish:

# Quick check
python3 scripts/detector.py check chapter.md

# Decision matrix:
# Score < 3:   Full polish (Phase 1 → 2 → 3)
# Score 3-4:   Light polish (Phase 2 → 3)
# Score 4+:    Style-only (Phase 3)
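
The decision matrix above translates directly into a small dispatch function. This is a sketch with a hypothetical name, phases_for_score; it only encodes the three thresholds and does not call the scripts.

```python
from typing import List


def phases_for_score(score: float) -> List[int]:
    """Return the polish phases to run for a Phase 0 detector score."""
    if not 0.0 <= score <= 5.0:
        raise ValueError("score must be between 0 and 5")
    if score < 3.0:
        return [1, 2, 3]  # full polish
    if score < 4.0:
        return [2, 3]     # light polish
    return [3]            # style-only


for score in (2.4, 3.6, 4.9):
    print(score, phases_for_score(score))
```

A score of exactly 3.0 falls into light polish and exactly 4.0 into style-only, following the "< 3", "3-4", "4+" wording of the matrix.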

Phase 1: Full Chapter Polish (score < 3)

For each chapter:

  1. Read-through: Understand argument structure, key claims, data points
  2. Sentence-by-sentence reform: Fix grammar, syntax, flow WITHOUT changing:
    • Any factual claim
    • Any analytical judgment
    • The chapter's core argument structure
    • Technical terms (DO NOT rename "chiplet" → "modular chip" etc.)
  3. Style pass: Check for
    • Overuse of "However," at sentence start (replace with "But" or restructure)
    • Overuse of "In the previous chapter..." transitions (merge into judgments)
    • Chinese-pattern "Not only...but also..." (use sparingly)
    • "This is because..." → restructure
    • "Therefore, we can see that..." → delete, just state the conclusion
  4. Diff output: Produce markdown diff showing before/after

Phase 2: Light Polish (score 3-4)

Target specific patterns only:

  • Article corrections (a/an/the)
  • Preposition fixes
  • Verb tense adjustments
  • Comma splice repairs
  • Chinglish pattern replacement (see 19-pattern library)

Phase 3: Style-Only (score 4+)

  • Rhythm adjustments (vary sentence length)
  • Passive→active where natural
  • Word choice: elevate from "correct but flat" to "natural and precise"

Quality Gates

Gate 1: Content preservation check

  • Every data point in the original must still appear in the polished version
  • Every core argument must be present
  • Nothing added that changes meaning
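
The data-point rule in Gate 1 can be partly automated. The sketch below is an assumption about how one might do it, not part of the shipped scripts: it extracts numeric tokens with a simple regex and reports any that appear in the original but not in the polished text. It cannot verify arguments or meaning, and it will flag legitimate reformatting such as "35%" rewritten as "35 percent".

```python
import re
from collections import Counter
from typing import List

# Digits with optional decimal or thousands separators and an optional "%".
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


def missing_numbers(original: str, polished: str) -> List[str]:
    """Numeric tokens present in the original but absent from the polish."""
    remaining = Counter(NUMBER.findall(original)) - Counter(NUMBER.findall(polished))
    return sorted(remaining.elements())


original = "Capex rose 35% to $4.2 billion in 2023."
good_polish = "In 2023, capex climbed 35% to $4.2 billion."
bad_polish = "Capex rose sharply to $4.2 billion."

print(missing_numbers(original, good_polish))  # []
print(missing_numbers(original, bad_polish))   # ['2023', '35%']
```

Counting tokens rather than using a set means a figure that appears twice in the original must also appear twice in the polished version.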

Gate 2: Nativization check

  • Read-aloud test: does it sound like a native speaker wrote it?
  • If you stumble while reading aloud, mark the passage and fix it
  • No Chinese syntax traces: subject-drop, topic-comment structure, redundant "the" usage

Gate 3: Style consistency check

  • Tone matches the original author's voice (not overwritten with a new style)
  • Technical terms preserved
  • The polish feels invisible: the reader shouldn't notice "this was translated" or "this was AI-polished"

Output Format

For each polished chapter:

# Chapter X: Title

## Polish Summary
- Original word count: X
- Changed words: X
- Sentences rewritten: X
- Nativization score: before X / after Y

## Changes
### Major Restructures (>50% rewritten)
1. [Original sentence]
   → [Polished sentence]
   Reason: [syntax pattern detected + fix]

### Moderate Changes (sentence-level)
1. [Original fragment]
   → [Polished fragment]

### Word-Level Tweaks
1. [original word] → [new word]: reason

## Critical Notes
- Any factual concerns found during polish
- Any argument structure notes for author review
- Questions about intended meaning (for ambiguous phrases)

Integration with Star1

When used for 算力简史 polishing:

  1. Reference StarKB for technical terms that should NOT be changed
  2. Preserve the Wang Jialin-style analytical voice (judgments before details, facts before emotions)
  3. Do NOT:
    • Add unnecessary hedging
    • Soften strong judgments
    • Remove skepticism/critical analysis
    • Convert "this is bad" to "this may be considered challenging"

Dependencies

  • Python 3 (stdlib only — re, sys, json)
  • Star1 or any advanced LLM (for sentence rewriting)
  • diff command (for before/after comparison)
  • No external API required

Directory Structure

english-polish/
├── SKILL.md                  ← This file
├── Makefile                  ← CI targets (check, test, baseline, save-baseline, clean)
├── config/
│   ├── default.json          ← Built-in configuration
│   ├── user.json             ← User custom patterns & replacements (optional)
│   ├── baseline-reference.json ← Saved book baseline for regression checking
│   └── save-baseline.py      ← Baseline compare/save script
├── scripts/
│   ├── detector.py           ← Phase 0: Chinglish pattern detection (19 classes)
│   ├── detector.py.bak       ← Backup of original 12-class version
│   └── polish.py             ← Phase 1-3: full polish pipeline
├── tests/
│   ├── test_suite.py         ← Automated test suite (29 test cases)
│   └── known_chinglish.md    ← 19-pattern test corpus
└── references/
    └── (optional: comparison reports, benchmark data)