Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Simple CSC

v1.0.1

Use the simple-csc repository to perform Chinese Spelling Correction (CSC) and Chinese Character Error Correction (C2EC) using large language models in a tra...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for jacob-zhou/simple-csc.

Install the skill "Simple CSC" (jacob-zhou/simple-csc) from ClawHub.
Skill page: https://clawhub.ai/jacob-zhou/simple-csc
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install simple-csc

ClawHub CLI


npx clawhub@latest install simple-csc
Security Scan
VirusTotal: Suspicious
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the instructions. All requested actions (clone repo, install Python deps, download LLMs from HuggingFace, run scripts and an API server) are exactly what a CSC/C2EC toolkit would need.
Instruction Scope
SKILL.md restricts itself to repository-relative paths, model/dataset download and running provided scripts/APIs. It does not instruct reading unrelated system files or harvesting credentials. It does recommend setting MODELSCOPE_CACHE (optional) and shows how to run a local API server bound to 127.0.0.1.
Install Mechanism
This is an instruction-only skill (no install spec). The instructions clone a public GitHub repo and use pip for dependencies (including compiling native packages such as flash-attn). That is standard for ML projects; there are no opaque download URLs or archive extracts in the skill content itself.
Credentials
The skill declares no required environment variables or credentials. It optionally references MODELSCOPE_CACHE to change the model mirror location and notes that models auto-download from HuggingFace. Network access to download models and datasets is expected for this purpose.
Persistence & Privilege
always is false and the skill is user-invocable. There is no instruction to modify other skills or system-wide agent configs. No elevated or persistent privileges are requested.
Assessment
This skill is a usage guide that expects you to clone and run the external simple-csc repository. Before installing or running anything:

  1. Review the GitHub repository (https://github.com/Jacob-Zhou/simple-csc) yourself; this skill only points to that code.
  2. Expect large model downloads (HuggingFace) and significant GPU/VRAM use.
  3. pip-installing native libraries (e.g., flash-attn) compiles code on your machine; review and build in a controlled environment if concerned.
  4. The API server examples bind to localhost (127.0.0.1), which is safe for local use, but avoid exposing the server to the public internet without proper access controls.
  5. If you use an alternative model mirror (MODELSCOPE_CACHE), make sure you trust the mirror source.

Overall, the instructions are coherent with the skill's purpose, but the real code and runtime behavior live in the external repository, so inspect that repo before running anything.


latest vk97ap577dq3kafgbs703xprecx839pzq
116 downloads · 1 star · 2 versions · Updated 1mo ago
v1.0.1
MIT-0

Simple CSC

A training-free approach to Chinese Spelling Correction using LLMs as pure language models with beam search and distortion modeling.

Prerequisites

This skill is a usage guide for the simple-csc repository. Before using any commands or APIs described here, clone the repository and work from its root:

git clone https://github.com/Jacob-Zhou/simple-csc.git
cd simple-csc

All paths referenced below (e.g., configs/, scripts/, data/, eval/, datasets/) are relative to this repository root. The repository contains the actual code, config files, data dictionaries, and scripts — this skill provides the knowledge of how to use them.

Quick Reference

Environment Setup

# Standard setup (creates venv, installs deps)
bash scripts/set_environment.sh

# For Qwen3 models
bash scripts/set_environment_qwen3.sh

# Recommended: install flash-attn for better performance and lower VRAM
pip install flash-attn --no-build-isolation

Qwen2/Qwen2.5 warning: Without flash-attn, set torch_dtype=torch.bfloat16 to avoid unexpected behavior.
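To apply the warning above automatically, a small check can detect whether flash-attn is importable before choosing a dtype. This is a minimal sketch, not part of simple-csc: the helper names `flash_attn_available` and `needs_bf16` are hypothetical, and matching "qwen2" in the model ID is a heuristic assumption for spotting Qwen2/Qwen2.5 models.

```python
import importlib.util
from typing import Optional


def flash_attn_available() -> bool:
    """Return True if the flash_attn package can be found in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


def needs_bf16(model_name: str, has_flash_attn: Optional[bool] = None) -> bool:
    """Hypothetical helper applying the Qwen2/Qwen2.5 warning.

    Returns True when the model looks like a Qwen2-family model and flash-attn
    is missing, i.e. when torch_dtype=torch.bfloat16 should be passed.
    """
    if has_flash_attn is None:
        has_flash_attn = flash_attn_available()
    # Heuristic: matches "Qwen2-..." and "Qwen2.5-..." IDs, not "Qwen3-...".
    is_qwen2_family = "qwen2" in model_name.lower()
    return is_qwen2_family and not has_flash_attn
```

For example, `needs_bf16("Qwen/Qwen2.5-7B", has_flash_attn=False)` returns `True`, signalling that `torch_dtype=torch.bfloat16` should be passed to `LMCorrector`.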

Python API

import torch
from lmcsc import LMCorrector

corrector = LMCorrector(
    model="Qwen/Qwen2.5-7B",
    prompted_model="Qwen/Qwen2.5-7B",       # use same model to save VRAM
    config_path="configs/c2ec_config.yaml",   # or "configs/default_config.yaml" for substitution-only
    torch_dtype=torch.bfloat16,               # recommended for Qwen2/2.5 without flash-attn
)

# Single sentence: 机智 ("clever") is corrected to 机制 ("mechanism")
outputs = corrector("完善农产品上行发展机智。")
# => [('完善农产品上行发展机制。',)]

# Batch: a list of sentences ("句子一"/"句子二" are placeholders meaning "sentence one"/"sentence two")
outputs = corrector(["句子一", "句子二"])

# With context: contexts must be a list the same length as the inputs ("患者提问:" means "Patient question:")
outputs = corrector(["未挨前兆"], contexts=["患者提问:"])

# Streaming (batch_size=1 only)
for output in corrector("完善农产品上行发展机智。", stream=True):
    print(output[0][0], end="\r", flush=True)
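To see exactly which characters were changed, the corrected string can be diffed against the input. This sketch assumes, as the sample output above suggests, that each element returned by `corrector(...)` is a tuple whose first entry is the top correction; `char_edits` is a hypothetical helper, not part of `lmcsc`. It uses Python's standard `difflib` rather than a position-by-position comparison so that insertions and deletions from the C2EC config are still aligned correctly.

```python
import difflib
from typing import List, Tuple

# (operation, start, end, source_text, corrected_text); start/end index into source
Edit = Tuple[str, int, int, str, str]


def char_edits(source: str, corrected: str) -> List[Edit]:
    """List the character-level edits that turn `source` into `corrected`."""
    matcher = difflib.SequenceMatcher(a=source, b=corrected, autojunk=False)
    edits: List[Edit] = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # keep only replace / insert / delete spans
            edits.append((op, i1, i2, source[i1:i2], corrected[j1:j2]))
    return edits


# Possible usage with the corrector above (assumes outputs[i][0] is the top correction):
# sentences = ["完善农产品上行发展机智。"]
# for src, out in zip(sentences, corrector(sentences)):
#     print(char_edits(src, out[0]))
```

For the sample sentence this yields a single `replace` of 智 by 制 at position 10.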

Config Selection

Config                        Use case
configs/default_config.yaml   Substitution-only CSC (v1.0.0 style)
configs/c2ec_config.yaml      Full C2EC with insert/delete support (v2.0.0)
configs/demo_config.yaml      Same as c2ec_config, used by the demo app

Key difference: c2ec_config.yaml adds the ROR (reorder), MIS (missing character), and RED (redundant character) distortion types, plus the length_immutable_chars data file.
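The table above boils down to one decision: whether corrections may insert, delete, or reorder characters. A minimal sketch of that choice, where `pick_config` and the `CONFIGS` keys are hypothetical names and only the file paths come from this guide:

```python
from pathlib import Path

# Config files from the table above, relative to the simple-csc repository root.
CONFIGS = {
    "csc": "configs/default_config.yaml",  # substitution-only (v1.0.0 style)
    "c2ec": "configs/c2ec_config.yaml",    # adds ROR, MIS, RED distortion types
    "demo": "configs/demo_config.yaml",    # same as c2ec, used by the demo app
}


def pick_config(allow_insert_delete: bool, repo_root: str = ".") -> Path:
    """Hypothetical helper: choose c2ec_config when length-changing edits are wanted."""
    key = "c2ec" if allow_insert_delete else "csc"
    return Path(repo_root) / CONFIGS[key]
```

The result can be passed as `config_path=str(pick_config(True))` to `LMCorrector`.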

Recommended Models

  • v2.0.0 (C2EC): Qwen/Qwen2.5-7B or Qwen/Qwen2.5-14B — best performance/speed balance
  • v1.0.0 (CSC): baichuan-inc/Baichuan2-13B-Base — best performance
  • Always prefer Base models over Instruct/Chat variants
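Because the guide recommends Base models, a quick guard can catch an Instruct or Chat model ID before the weights are downloaded. This is a heuristic sketch based only on the naming convention of the model IDs listed above; `looks_like_base_model` is a hypothetical helper, not part of simple-csc.

```python
import re

# Heuristic: instruction-tuned and chat variants usually carry these suffixes.
_NON_BASE = re.compile(r"instruct|chat", re.IGNORECASE)


def looks_like_base_model(model_id: str) -> bool:
    """Return False for IDs such as "Qwen/Qwen2.5-7B-Instruct" or "...-Chat"."""
    name = model_id.rsplit("/", 1)[-1]  # ignore the organization prefix
    return _NON_BASE.search(name) is None
```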

RESTful API Server

python api_server.py \
    --model "Qwen/Qwen2.5-7B" \
    --prompted_model "Qwen/Qwen2.5-7B" \
    --config_path "configs/c2ec_config.yaml" \
    --host 127.0.0.1 --port 8000 --workers 1 --bf16

Endpoints:

  • GET /health — health check
  • POST /correction (JSON body: {"input": "...", "stream": false, "contexts": null})
# Non-streaming
curl -X POST 'http://127.0.0.1:8000/correction' \
  -H 'Content-Type: application/json' \
  -d '{"input": "完善农产品上行发展机智。"}'

# With context
curl -X POST 'http://127.0.0.1:8000/correction' \
  -H 'Content-Type: application/json' \
  -d '{"input": "未挨前兆", "contexts": "患者提问:"}'
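The same calls can be made from Python with only the standard library. This sketch uses the endpoints, host, port, and body fields shown above; the function names are hypothetical, only the non-streaming mode is covered, and because the response schema is not documented here, `correct` simply returns whatever JSON the server sends back (assuming it replies with JSON).

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://127.0.0.1:8000"  # matches --host/--port in the server command above


def build_correction_request(text: str, contexts: Optional[str] = None,
                             base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a non-streaming POST /correction request with the documented body."""
    body = {"input": text, "stream": False, "contexts": contexts}
    return urllib.request.Request(
        f"{base_url}/correction",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def correct(text: str, contexts: Optional[str] = None,
            base_url: str = BASE_URL, timeout: float = 60.0) -> Any:
    """Send the request and return the decoded JSON response (schema assumed unknown)."""
    req = build_correction_request(text, contexts, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def health(base_url: str = BASE_URL, timeout: float = 5.0) -> int:
    """Return the HTTP status of GET /health; urlopen raises HTTPError on error codes."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return resp.status
```

With the server running, `correct("未挨前兆", contexts="患者提问:")` mirrors the second curl example.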

For detailed API parameters, config options, evaluation pipeline, and dataset formats, see references/details.md.

Key Architecture Concepts

The approach works by:

  1. Using an LLM as a pure language model (left-to-right generation)
  2. At each step, computing a distortion probability for each candidate token based on how "similar" it is to the observed (possibly erroneous) character
  3. Combining LM probability with distortion probability via beam search
  4. Distortion types encode the relationship between observed and candidate characters (identical, same pinyin, similar shape, etc.)
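The four steps can be illustrated with a toy, substitution-only beam search. This is a sketch of the general technique under stated assumptions, not the toolkit's implementation: a hand-written bigram table (`BIGRAM`) stands in for the LLM, `SIMILAR` is a hypothetical confusion set for "same pinyin / similar shape", and the distortion probabilities 0.9 and 0.05 are made-up values. The real system also handles insert/delete/reorder distortions and the prompted model, which are omitted here.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical toy bigram LM standing in for the LLM: P(char | previous char).
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "发"): 0.5, ("发", "展"): 0.8, ("展", "机"): 0.3,
    ("机", "制"): 0.6, ("机", "智"): 0.01,
}
# Hypothetical confusion set: observed char -> candidates with same pinyin or shape.
SIMILAR: Dict[str, Set[str]] = {"智": {"制", "志"}}


def lm_logprob(prev: str, cand: str) -> float:
    return math.log(BIGRAM.get((prev, cand), 1e-4))  # small floor for unseen pairs


def distortion_logprob(observed: str, cand: str) -> float:
    """Toy distortion model: keeping the observed char is likely, a similar char less so."""
    return math.log(0.9) if cand == observed else math.log(0.05)


def beam_correct(source: str, beam_size: int = 3) -> str:
    beams: List[Tuple[str, float]] = [("", 0.0)]  # (prefix, cumulative log score)
    for observed in source:  # left to right, one output char per input char
        expanded = []
        for prefix, score in beams:
            prev = prefix[-1] if prefix else "<s>"
            # Candidates: the observed char itself plus its similar characters.
            for cand in {observed} | SIMILAR.get(observed, set()):
                s = score + lm_logprob(prev, cand) + distortion_logprob(observed, cand)
                expanded.append((prefix + cand, s))
        expanded.sort(key=lambda b: b[1], reverse=True)
        beams = expanded[:beam_size]  # keep only the best hypotheses
    return beams[0][0]
```

Here `beam_correct("发展机智")` returns `"发展机制"`: the LM strongly prefers 制 after 机, which outweighs the distortion penalty for changing the observed 智.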

The prompted_model parameter adds a second probability source: a prompt-based LLM that scores candidates given the full input sentence as context, improving correction quality.
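The scoring can be summarized as a per-step sum of log probabilities. This is a simplified sketch, assuming substitution-only alignment (output character y_j is paired with observed character x_j) and a hypothetical log-linear weight λ for the prompted model; this guide does not specify how the toolkit actually combines the two sources. Setting λ = 0 recovers the pure-LM scoring of steps 1 to 4.

```latex
\operatorname{score}(y_{1:i}) = \sum_{j=1}^{i} \Big[
    \log P_{\mathrm{LM}}(y_j \mid y_{<j})
  + \log P_{\mathrm{DM}}(x_j \mid y_j)
  + \lambda \log P_{\mathrm{prompt}}(y_j \mid x, y_{<j})
\Big]
```

Here x is the observed (possibly erroneous) sentence, P_DM is the distortion model, and beam search keeps the highest-scoring prefixes at each step.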
