Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Model Migrate FlagOS

v1.0.0

Migrate a model from the latest vLLM upstream repository into the vllm-plugin-FL project (pinned at vLLM v0.13.0). Use this skill whenever someone wants to a...

by Flagos (@wbavon)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for wbavon/model-migrate-flagos.

Prompt preview: Install & Setup
Install the skill "Model Migrate FlagOS" (wbavon/model-migrate-flagos) from ClawHub.
Skill page: https://clawhub.ai/wbavon/model-migrate-flagos
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Canonical install target

openclaw skills install wbavon/model-migrate-flagos

ClawHub CLI


npx clawhub@latest install model-migrate-flagos
Security Scan
VirusTotal: Suspicious (view report)
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name/description align with the provided artifacts: the SKILL.md plus scripts and reference docs implement a copy-then-patch migration pipeline (clone upstream vLLM, copy model files, apply compatibility patches, register in plugin, run validation/benchmark/serve/E2E). The included scripts and docs are coherent with migrating models into a vllm-plugin pinned to v0.13.0.
Instruction Scope
The SKILL.md instructs the agent to perform many privileged or system-wide actions: clone upstream repos, write/copy/patch plugin source files, modify vllm_plugin entrypoints, run pytest, start/stop servers, run benchmarks, and manage remote GT servers via SSH. Operational rules explicitly tell the agent to 'NEVER ask whether to continue', to 'ALWAYS' create a TaskList and auto-resume, and to 'forcefully release GPUs' by killing processes. These behaviors go beyond a narrowly-scoped helper and grant the agent broad discretion to modify local and remote systems without further user confirmation.
Install Mechanism
There is no install spec (instruction-only), and all code files are included in the skill bundle. No external downloads or arbitrary URL extract/install steps are present in the metadata. Risk from install mechanism is low, though the included scripts when executed will change local files and run commands.
Credentials
The skill declares no required env vars or credentials, but the instructions assume access to SSH keys (e.g. instructions to run ssh-copy-id and to use ~/.ssh/id_ed25519), read/write/execute access to the plugin directory, read access to /usr/local/lib (installed vLLM), and access to /models and GPUs. It also instructs setting env vars when invoking vllm (VLLM_USE_DEEP_GEMM, VLLM_FL_PREFER_ENABLED). Those permission and credential needs are substantial and not explicitly declared in requires.env or config paths, which is disproportionate to a simple skill invocation and should be made explicit.
Persistence & Privilege
always:false (good) but the SKILL.md's operational rules demand creating TaskList entries for all 13 steps, auto-resuming work after interruptions, and 'NEVER ask whether to continue.' Combined with normal autonomous invocation this yields a high risk of the agent continuing to make file and system changes (including killing GPU processes and running remote SSH commands) without re-confirmation. The skill also encourages 'work-until-done' behavior that could cause prolonged or destructive activity.
Scan Findings in Context
[base64-block] expected: The E2E test prompts include small inline base64-encoded images for multimodal tests, which explains the base64-block detection. This appears legitimate for multimodal correctness checks, but embedded base64 content is why the scanner flagged it.
What to consider before installing
What to consider before installing or running this skill:

  • Functionally coherent: the skill appears to do what it claims (migrating vLLM models and running E2E verification). The included scripts implement the pipeline end-to-end.
  • High-privilege actions: the instructions expect to read/write plugin source, run tests, start/stop local servers, manage a remote GT server over SSH, and forcibly kill GPU-using processes (nvidia-smi | xargs kill -9). These can affect other users, processes, and system state.
  • Automation without confirmation: the skill's operational rules explicitly tell the agent to auto-resume, to "NEVER ask whether to continue", and to "work-until-done". If you allow the agent to run autonomously with this skill enabled, it may continue making changes without further prompts.
  • Missing declared credentials: the skill does not declare required env variables or credentials, yet it assumes SSH key access and permission to read /usr/local/lib and /models and to modify the plugin directory. Expect to provide or confirm SSH access and to run in an environment where these assumptions are acceptable.

Recommendations:

  1. Review the code before running: inspect e2e_remote_serve.sh, validate_migration.py, serve.sh, request.sh, and any scripts that execute shell commands (look for network endpoints or unexpected commands).
  2. Run in an isolated environment: execute the migration in a disposable VM or container where you control SSH keys, GPU processes, and filesystem snapshots. Back up your vllm-plugin-FL repo first (git branch or clone).
  3. Require manual confirmation: if you let an agent use this skill, configure it not to auto-resume unattended, or to require user approval before steps that modify files, kill processes, or SSH to remote hosts.
  4. Validate SSH usage: do not blindly run ssh-copy-id or any script that writes to ~/.ssh/authorized_keys without verifying the target host and keys.
  5. If you need more assurance: ask the skill author for provenance (source repo URL, maintainer identity) and for an explicit list of all commands the skill will run, so you can audit them.

Given these factors, treat the skill as useful but potentially risky: proceed only after inspection and with controls in place (isolated environment, backups, manual confirmations).
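The backup step in recommendation 2 can be as simple as a branch snapshot. A minimal sketch, assuming your vllm-plugin-FL checkout is a git repository; the demo below creates a throwaway repo purely so the commands are runnable as-is:

```shell
# Demo repo so the commands run standalone; in practice, cd into your
# real vllm-plugin-FL checkout instead of /tmp/demo-plugin.
rm -rf /tmp/demo-plugin && git init -q /tmp/demo-plugin && cd /tmp/demo-plugin
echo "stub" > plugin.py
git add . && git -c user.email=you@example.com -c user.name=you commit -qm "baseline"

# Cheap, local snapshot to roll back to if the migration goes wrong.
git branch backup/pre-migration
git branch --list 'backup/*'
```

Rolling back is then `git checkout backup/pre-migration` (or a hard reset against it), with no network or remote involved.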

Like a lobster shell, security has layers — review code before you run it.

latest: vk972tfe72kv3dyvyg0v72x2x2h83esac
106 downloads · 0 stars · 1 version
Updated 1mo ago
v1.0.0
MIT-0

FL Plugin — Model Migration Skill

Usage

/model-migrate-flagos <model_name> [upstream_folder] [plugin_folder]
| Argument | Required | Default |
| --- | --- | --- |
| model_name | Yes | (none) |
| upstream_folder | No | /tmp/vllm-upstream-ref |
| plugin_folder | No | current working directory |

Execution

Step 1: Parse arguments and validate paths

Extract from user input:

  • {{model_name}} = first argument (required, snake_case)
  • {{upstream_folder}} = second argument or /tmp/vllm-upstream-ref
  • {{plugin_folder}} = third argument or current working directory

If {{upstream_folder}} doesn't exist, ask user whether to clone it. If {{plugin_folder}} doesn't exist, error out.

→ Tell user: Confirm parsed model name and paths.
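The parsing and validation above can be sketched in bash. This is a hypothetical illustration of Step 1's logic, not a script shipped with the skill; the function and variable names are mine:

```shell
# Hypothetical sketch of Step 1 (parse arguments, validate paths).
parse_args() {
  local model_name="${1:?model_name is required}"
  local upstream_folder="${2:-/tmp/vllm-upstream-ref}"   # default upstream clone
  local plugin_folder="${3:-$PWD}"                       # default: current dir

  # model_name must be snake_case
  if [[ ! "$model_name" =~ ^[a-z0-9_]+$ ]]; then
    echo "error: model_name must be snake_case" >&2
    return 1
  fi

  # The plugin folder must exist; a missing upstream folder only triggers
  # an offer to clone it, per Step 1.
  if [[ ! -d "$plugin_folder" ]]; then
    echo "error: plugin folder not found: $plugin_folder" >&2
    return 1
  fi
  [[ -d "$upstream_folder" ]] || echo "note: upstream missing, offer to clone" >&2

  echo "model=$model_name upstream=$upstream_folder plugin=$plugin_folder"
}

parse_args kimi_k25
```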

Step 2: Load references and resolve placeholders

Read these files (relative to this SKILL.md):

  • references/procedure.md — step-by-step migration procedure
  • references/compatibility-patches.md — 0.13.0 patch catalog
  • references/operational-rules.md — communication, TaskList, bash rules, resilience

The procedure references executable scripts in scripts/:

  • scripts/validate_migration.py — automated code review (Step 6)
  • scripts/benchmark.sh — benchmark verification (Step 9)
  • scripts/serve.sh — serve model locally (Step 10.1, also used for E2E)
  • scripts/request.sh — test request (Step 10.2)
  • scripts/e2e_eval.py — E2E correctness verification (Step 11)
  • scripts/e2e_test_prompts.json — test prompts for E2E (5 text + 5 multimodal)
  • scripts/e2e_config.template.json — E2E config template (copy to e2e_config.json and fill in)
  • scripts/e2e_remote_serve.sh — manage GT server on remote machine via SSH

Then investigate upstream source + HuggingFace to resolve all placeholders:

| Placeholder | How to derive |
| --- | --- |
| {{model_name}} | Direct from argument |
| {{model_name_lower}} | Lowercase of model_name (usually identical, e.g. qwen3_5); used in file paths |
| {{MODEL_DISPLAY_NAME}} | From upstream code or HF model card |
| {{ModelClassName}} | From upstream model class (PascalCase) |
| {{model_type}} | From HF config.json model_type field |
| {{ConfigClassName}} | From upstream, or derive from model_type |
| {{skill_root}} | Absolute path to this skill's folder (the directory containing this SKILL.md) |

Naming conventions vary per model — always verify from actual source, never guess.

→ Tell user: Present all resolved values. Use AskUserQuestion if anything is ambiguous.
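As an illustration of how {{model_type}} and {{ConfigClassName}} might be resolved, here is a hedged sketch against a made-up config.json. The PascalCase derivation is only a common convention (exactly why the note above says to verify against actual source rather than guess):

```shell
# Illustrative only: a fake HF config.json stands in for a real model card.
cat > /tmp/demo_config.json <<'EOF'
{ "model_type": "qwen3_5", "hidden_size": 4096 }
EOF

# {{model_type}} comes straight from config.json.
model_type=$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["model_type"])' /tmp/demo_config.json)

# A *guess* at {{ConfigClassName}} via PascalCase conversion; confirm it
# against the upstream class, since naming conventions vary per model.
config_class=$(python3 -c 'import sys; print("".join(p.capitalize() for p in sys.argv[1].split("_")) + "Config")' "$model_type")

echo "model_type=$model_type config_class=$config_class"
```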

Step 3: Execute procedure

With placeholders resolved, execute every step in procedure.md sequentially. Apply patches from compatibility-patches.md during the copy-then-patch step. Follow operational-rules.md throughout.

→ Tell user: Before starting, output a numbered plan. Report progress at each step boundary.

Scripts Reference

| Script | Step | Description |
| --- | --- | --- |
| validate_migration.py | 6 | Automated import/API/registration checks |
| benchmark.sh | 9 | vllm bench throughput with dummy weights |
| serve.sh | 10, 11 | Start local vLLM server (port 8122, VLLM_FL_PREFER_ENABLED=false) |
| request.sh | 10 | Quick smoke-test request |
| e2e_eval.py | 11 | Token-level comparison vs upstream GT server |
| e2e_test_prompts.json | 11 | 5 text + 5 multimodal test prompts |
| e2e_config.template.json | 11 | Config template (GT machine, local port, eval params) |
| e2e_remote_serve.sh | 11 | SSH-based GT server lifecycle (start/stop/status/logs) |

Examples

Example 1: Typical new model

User says: "/model-migrate-flagos kimi_k25"
Actions:
  1. Parse → model_name=kimi_k25, defaults for upstream/plugin paths
  2. Clone upstream, find vllm/model_executor/models/kimi_k25.py
  3. Discover it wraps DeepseekV2 → follow kimi_k25 (wrapper) pattern
  4. Copy file, apply P1+P2 patches, create config bridge
  5. Register, validate, test, benchmark, serve+request
  6. E2E verification against upstream GT
Result: kimi_k25 fully working in plugin, all 11 steps passed

Example 2: Re-run after upstream update

User says: "migrate qwen3_5 again, upstream updated"
Actions:
  1. Idempotent re-run — overwrite existing files with fresh upstream copy
  2. Re-apply patches, re-validate, re-test
  3. Re-run E2E to confirm no regression
Result: qwen3_5 updated to match latest upstream, no regressions

Troubleshooting

General principle: When any runtime error occurs, first compare vLLM upstream code against both the plugin adaptation and the installed 0.13.0 environment. The diff is the fastest path to root cause. See operational-rules.md § Debugging Priority: Upstream-First for the full protocol.

| Problem | Typical Cause | Fix |
| --- | --- | --- |
| ImportError after copy-then-patch | Missing P1 fix (relative → absolute imports) | Verify every "from .xxx" is converted to "from vllm.*" or "from vllm_fl.*" |
| AttributeError: module 'vllm' has no attribute X | API doesn't exist in 0.13.0 | Check P3 in compatibility-patches.md; stub or remove |
| Config not recognized by vLLM | model_type mismatch or config bridge missing | Verify _CONFIG_REGISTRY[model_type] matches HF config.json exactly |
| Registration has no effect | Class name or import path typo | Compare with existing registrations in __init__.py |
| Benchmark KeyError on config field | Config bridge missing a field | Compare upstream config class vs bridge; add missing fields with defaults |
| Benchmark/serve fails with OOM or "insufficient memory" | GPUs occupied by other processes | Kill GPU processes: nvidia-smi --query-compute-apps=pid --format=csv,noheader \| xargs -r kill -9, then retry. Never skip these steps. |
| Model outputs garbled/gibberish text | ColumnParallelLinear used for merged projections with different sub-dimensions (TP sharding mismatch) | Override __init__ to use MergedColumnParallelLinear(output_sizes=[...]); see P8 in compatibility-patches.md |
| AssertionError: Duplicate op name | Child class imports a custom op from a different module path than the parent | Use the same import path as the parent module (e.g. vllm_fl.ops.fla, not vllm_fl.models.fla_ops); see P11 |
| AttributeError on fused_recurrent_* during CUDA graph warmup | __init__ override with nn.Module.__init__(self) missed attributes used by the inherited _forward_core | Create ALL attributes from the parent's __init__, especially custom ops; see P12 |
| E2E: local server not reachable | serve.sh port doesn't match the e2e_config.json local port | Ensure both use the same port (default 8122) |
| E2E: GT server not reachable | GT machine down, or wrong docker/conda env | Check e2e_remote_serve.sh status, or SSH manually |
| E2E: early token divergence (first 5 tokens) | Weight-loading bug or TP sharding error | Check load_weights, stacked_params_mapping, MergedColumnParallelLinear |
| E2E: late minor divergence (token #15+) | Numerical noise from different op implementations | Usually acceptable; document in the report |
| resolve_op fails with VLLM_FL_PREFER_ENABLED=false | Op not registered in dispatch, no fallback | Add a try/except fallback to flag_gems in the op import code |
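The P1 import fix in the first row can be illustrated with a single sed pass. This is a hedged sketch: the real patches in compatibility-patches.md may handle more forms (e.g. "from . import x" or multi-line imports), and the absolute package prefix below is an assumption for illustration:

```shell
# Toy model file with the relative imports that break after copy-then-patch.
cat > /tmp/demo_model.py <<'EOF'
from .interfaces import SupportsMultiModal
from .utils import maybe_prefix
EOF

# Rewrite "from .xxx" to an absolute path; the vllm.model_executor.models
# prefix is assumed here, not taken from the patch catalog.
# (GNU sed; BSD/macOS sed needs -i '' instead.)
sed -i 's/^from \./from vllm.model_executor.models./' /tmp/demo_model.py
cat /tmp/demo_model.py
```

After the pass, no line should still start with "from ." when the ImportError check in the table above is re-run.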
