Install
openclaw skills install @eddygk/proxmox-dc-lifecycle
Rebuild a domain controller that lives as a VM on a Proxmox hypervisor, one DC at a time, without corrupting Active Directory: safe demotion, metadata cleanup, OS reinstall, and re-promotion. This skill was distilled from production DC rebuilds; the flow rebuilds any additional (non-last) DC.
Do not corrupt Active Directory. Everything else is negotiable. Concretely:
- Verify health before you act: repadmin /replsummary (expect 0 fails) and netdom query fsmo. A green replication picture is your license to proceed; anything else means stop and diagnose.
- AD treats a machine as a DC for as long as an nTDSDSA (NTDS Settings) object exists for it under CN=Sites. That object's absence (not the VM being off, not the computer object being gone) is the gate that unlocks the OS rebuild.
- Snapshot before every destructive step: qm snapshot <vmid> <label> costs seconds and is the only instant rollback.

All environment-specific values live here, in one place. Fill this in for YOUR environment before running, then confirm against live state (qm config, qm list, a query to a healthy DC), because recalled facts go stale. Everything downstream (reference files, examples) refers to these names rather than hardcoding values.
{
"domain": "example.local",
"subnet": "10.0.0.0/24",
"gateway": "10.0.0.1",
"timezone": "UTC",
"fsmoHolder": "DC1", // holds all 5 FSMO; NEVER the rebuild target
"dcs": [
{ "name": "DC1", "vmid": 0, "ip": "10.0.0.10" }, // authority / DNS
{ "name": "DC2", "vmid": 0, "ip": "10.0.0.20" }
// ...one entry per DC; rebuild one at a time
],
"primaryDns": "10.0.0.10", // a healthy DC; the new DC points here during build
"bridge": "vmbr0",
"vlanTag": 0, // 0 / omit if untagged
"isoStorage": "local", // Proxmox storage holding the two ISOs below
"serverIso": "<windows-server-eval>.iso",
"virtioIso": "virtio-win-<ver>.iso",
"pxeSource": null, // optional: { name, vmid } of a WDS/MDT server on the same VLAN
"credentialStore": "BRING YOUR OWN. The host has no secret manager. Generate/store secrets on a machine that does, and apply them by piping into the guest's stdin over SSH (e.g. <secret-source> | ssh root@<pvehost> 'qm guest exec <vmid> --pass-stdin 1 -- ...'). See references/credential-handling.md."
}
- The skill runs on the Proxmox host itself, with sudo and qm at /usr/sbin/qm. There is no separate SSH hop: "ssh root@host" collapses to local sudo qm.
- Guest commands run through the QEMU guest agent (QGA): sudo qm guest exec <vmid> ... powershell.exe .... This is the execution backbone; see references/qga-execution.md. WinRM/PowerShell Remoting between DCs is NOT used (it fails between DCs; see Gotchas).

Work the phases below top to bottom. Each phase ends with a verification gate. Do not cross a gate that isn't green. Use a task list so a long rebuild survives interruptions.
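As a rough illustration of the QGA execution path described above, the sketch below encodes a PowerShell script for powershell.exe -EncodedCommand, which expects base64 over the UTF-16LE bytes of the script. The function names encode_ps and guest_ps_command are hypothetical and are not the contents of scripts/run-guest-ps.sh, which is not shown here. Only the script text is encoded; secrets still travel over stdin, never inside the encoded command.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not taken from the skill's scripts.

# Encode a PowerShell script read from stdin as base64 over UTF-16LE bytes,
# the form powershell.exe -EncodedCommand accepts.
encode_ps() {
  iconv -f UTF-8 -t UTF-16LE | base64 | tr -d '\n'
}

# Print (do not run) the qm command line for a given VMID and script text.
# Printing first makes it easy to review before anything touches a guest.
guest_ps_command() {
  local vmid=$1 script=$2 encoded
  encoded=$(printf '%s' "$script" | encode_ps)
  printf 'sudo qm guest exec %s -- powershell.exe -NoProfile -NonInteractive -EncodedCommand %s\n' \
    "$vmid" "$encoded"
}

# Example (prints the command only):
#   guest_ps_command 101 'hostname'
```

Printing the command rather than executing it is a deliberate choice for a sketch: the real wrapper's timeout and backgrounding behavior is documented in references/qga-execution.md.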
- Confirm where you are: hostname, ip -4 addr, sudo qm list. The DC VMIDs must be present.
- Confirm the guest agent answers: sudo qm guest cmd <vmid> ping for each DC.
- Baseline AD health: netdom query fsmo, repadmin /replsummary, each DC's SYSVOL/NETLOGON shares and DFSR state. See references/verification.md for exact commands.

Gate: replication shows 0 fails, FSMO are where you expect, the target DC holds none. Record this; it's your "known good."

Snapshot the target before demoting: sudo qm snapshot <vmid> predemote_<date> --description "...". Confirm with qm listsnapshot <vmid>.
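The "0 fails" part of the gate above can be checked mechanically instead of by eye. The sketch below sums the fails column of repadmin /replsummary text, assuming the usual layout in which each DSA row carries a "fails / total" pair. The function names are hypothetical, and the authoritative commands remain the ones in references/verification.md.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading repadmin /replsummary output on the host.

# Sum the "fails" figure from every row containing a "<fails> / <total>" pair.
# Reads the report text from stdin and prints a single integer.
count_repl_fails() {
  awk '
    match($0, /[0-9]+ *\/ *[0-9]+/) {
      pair = substr($0, RSTART, RLENGTH)   # e.g. "0 /  10"
      split(pair, parts, "/")
      fails += parts[1] + 0                # numeric value before the slash
    }
    END { print fails + 0 }
  '
}

# Gate wrapper: succeeds only when the summed fails count is zero.
repl_gate() {
  local fails
  fails=$(count_repl_fails)
  if [ "$fails" -eq 0 ]; then
    echo "GREEN: 0 fails"
  else
    echo "RED: $fails fails"
    return 1
  fi
}
```

A zero here covers replication only; FSMO placement and SYSVOL/NETLOGON state still need their own checks before the gate counts as green.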
Run the demotion locally inside the target guest via QGA (NOT remoting from another DC). See references/demotion.md for the script and the two gotchas that will otherwise hang or fail it:
- pass -LocalAdministratorPassword (else it prompts and blocks under -NonInteractive),
- pass -RemoveDnsDelegation:$false when the parent DNS zone is on external nameservers (Cloudflare, etc.) you don't manage via Windows DNS; otherwise it stalls for minutes on RPC timeouts trying to delete a delegation it can't reach.

If graceful demotion is genuinely impossible, fall back to forced removal + metadata cleanup, but only after intentionally taking the DC offline. See references/demotion.md.
Gate: from a surviving DC, the target no longer appears in Get-ADDomainController, and replication among the remaining DCs is still 0 fails.
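To make the two demotion parameters concrete, here is a minimal sketch of a shell function that emits a PowerShell demotion script. It is not the script from references/demotion.md: the function name demote_script and the convention of reading the new local Administrator password as one line from stdin are assumptions, and any domain credential your environment requires is omitted.

```shell
#!/usr/bin/env bash
# Hypothetical generator for a demotion script; review before use.

# Print the PowerShell that would run inside the target guest.
# The quoted heredoc keeps $ and backticks literal for PowerShell.
demote_script() {
  cat <<'PS'
# Password arrives on stdin (one line), never on the command line.
$plain = [Console]::In.ReadLine()
$pw = ConvertTo-SecureString -String $plain -AsPlainText -Force
Remove-Variable plain
Import-Module ADDSDeployment
Uninstall-ADDSDomainController `
  -LocalAdministratorPassword $pw `
  -RemoveDnsDelegation:$false `
  -Force
PS
}

# Example: inspect the generated script before sending it anywhere.
#   demote_script | less
```

Keeping -RemoveDnsDelegation:$false in the generated text reflects the external-nameserver case above; where Windows DNS owns the parent zone, that switch may not be what you want.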
A graceful demote usually cleans its own metadata. A forced/interrupted removal does NOT; you must clean it. Check from a surviving DC for residue: the nTDSDSA object (should be gone), an empty CN=<dc> server object under Sites (delete it), the computer object, DFSR/FRS member objects, and stale DNS records. The full procedure and the "what each leftover means" table are in references/metadata-cleanup.md.
Then run the full health verification (repadmin /replsummary, dcdiag on the FSMO holder and each surviving DC). A lone DFSREvent warning referencing the just-removed DC is expected and benign (it's retrying a partner that's gone); everything else should pass.
Gate (this is the big one): No nTDSDSA for the target anywhere. Get-ADDomainController lists only the surviving DCs (the target is absent). Replication 0 fails. Only now may you touch the VM's disk.
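One way to test the nTDSDSA condition in this gate is to search the Sites container from a surviving DC. The sketch below builds such a query from the "domain" config value. The function names are hypothetical, the query is an illustration rather than the procedure in references/metadata-cleanup.md, and it assumes the ActiveDirectory PowerShell module is available on the DC that runs it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for building the "is it still a DC?" query.

# Convert a DNS domain name to its distinguished name:
# example.local -> DC=example,DC=local
domain_to_dn() {
  local IFS=. part out=""
  for part in $1; do                 # unquoted on purpose: split on dots
    out="${out:+$out,}DC=$part"
  done
  printf '%s\n' "$out"
}

# Print a PowerShell pipeline that lists any nTDSDSA object still present
# for the named DC. Empty output from that pipeline means the gate is green.
ntdsdsa_check_ps() {
  local dc=$1 domain=$2 base
  base="CN=Sites,CN=Configuration,$(domain_to_dn "$domain")"
  printf "Get-ADObject -SearchBase '%s' -LDAPFilter '(objectClass=nTDSDSA)' | Where-Object { \$_.DistinguishedName -like '*,CN=%s,CN=Servers,*' }\n" \
    "$base" "$dc"
}

# Example:
#   ntdsdsa_check_ps DC2 example.local
```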
Keep the same VMID, MAC, and disk — reinstall onto the existing VM, don't recreate it (that's how MAC/IP/config are preserved).
The disk wipe is irreversible, so gate it. Step 3 wipes the boot disk. Before running it: confirm the Phase-3 gate is green (no nTDSDSA for this DC anywhere, meaning AD does not see it as a DC), confirm the <vmid> is the intended target and nothing else, and get explicit operator confirmation to wipe that specific VM. Never wipe while AD still sees the box as a DC, and never run this step unattended.
1. Inspect qm config <vmid>: note the boot disk bus (virtio0 ⇒ viostor driver; a scsiN disk ⇒ vioscsi), BIOS (OVMF ⇒ UEFI/GPT partitioning), MAC, net bridge/VLAN.
2. Build autounattend.xml from assets/autounattend.xml.template. It injects virtio drivers in WinPE, partitions UEFI/GPT, installs Server Core, sets name + static IP + DNS, and enables the guest agent + WinRM at first logon. Fill the placeholders (see template header). Verify the exact image name with wiminfo against the ISO's install.wim (e.g. Windows Server 2025 SERVERSTANDARDCORE).
3. Package the answer file as an ISO, attach it alongside the install media, and boot: scripts/build-unattend-iso.sh does the packaging. See references/os-reinstall.md for the exact qm set invocations and the destructive shutdown→boot sequence.

Gate: C:\oobe_done.txt exists, hostname/IP/timezone are correct, time is in sync. The box is a clean domain-member-to-be (not yet a DC).

Install the AD DS role, then run Install-ADDSDomainController (replica) against the domain, supplying domain creds + DSRM password. Point the new DC's DNS at a healthy DC during promotion; after it's a DC, set DNS to itself + a partner. Full script in references/promotion.md.
Gate: new DC appears in Get-ADDomainController with an nTDSDSA object, repadmin /replsummary is 0 fails including the new DC in both directions, SYSVOL/NETLOGON shared, dcdiag clean (modulo transient DFSR-initial-sync warnings). Re-confirm FSMO unchanged.
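For orientation, the promotion step can be sketched as a shell function that emits the PowerShell to run in the rebuilt guest. This is not the script in references/promotion.md. The function name promote_script, the __DOMAIN__ placeholder, and the three-line stdin order (domain user, domain password, DSRM password) are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical generator for a replica-promotion script; review before use.

# Print the PowerShell for promoting the rebuilt server in the given domain.
# The domain is substituted into a literal template so PowerShell's own
# $variables and backticks survive untouched.
promote_script() {
  local domain=$1
  sed "s/__DOMAIN__/$domain/" <<'PS'
# Assumed stdin order: domain user, domain password, DSRM password.
$user = [Console]::In.ReadLine()
$pass = ConvertTo-SecureString ([Console]::In.ReadLine()) -AsPlainText -Force
$dsrm = ConvertTo-SecureString ([Console]::In.ReadLine()) -AsPlainText -Force
$cred = New-Object System.Management.Automation.PSCredential($user, $pass)
Install-WindowsFeature AD-Domain-Services -IncludeManagementTools
Install-ADDSDomainController -DomainName '__DOMAIN__' -Credential $cred `
  -SafeModeAdministratorPassword $dsrm -InstallDns -Force
PS
}

# Example: review the generated script for the configured domain.
#   promote_script example.local
```

Whether -InstallDns is appropriate depends on whether your DCs host DNS; the config block's primaryDns entry suggests they do, but confirm against live state.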
Detach install ISOs, set boot back to the disk, remove the unattend ISO and any temp cred files, and (optionally) prune now-stale pre-rebuild snapshots once the new DC has been healthy for a while. Then — and only then — move to the next DC.
Secrets are radioactive. Follow references/qga-execution.md precisely: pass passwords to the guest over QGA stdin (--pass-stdin), never on a process command line or -EncodedCommand. If a creds.txt convention is used, chmod 600, read it silently, delete it the moment it's loaded. Any cred file written into a guest must be deleted as the first action of the script that consumes it. Never echo a secret into chat, logs, or task XML. Show a sha256 fingerprint, never the value.
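The "show a fingerprint, never the value" rule above can be sketched in a few lines. The helper names new_secret and fingerprint are hypothetical, and truncating the digest to 12 hex characters is an arbitrary choice for readability, not something the skill prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for throwaway secrets and their fingerprints.

# Generate a throwaway random: 24 bytes from the kernel CSPRNG,
# base64-encoded to 32 printable characters.
new_secret() {
  head -c 24 /dev/urandom | base64 | tr -d '\n'
}

# Read a secret from stdin and print only the first 12 hex characters
# of its SHA-256 digest. The secret itself is never echoed.
fingerprint() {
  sha256sum | cut -c1-12
}

# Example: log which secret was applied without revealing it.
#   secret=$(new_secret)
#   printf '%s' "$secret" | fingerprint
```

Because printf is a shell builtin, piping the variable this way keeps the value out of any process argument list.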
Know the boundary: this skill runs on the Proxmox host, which has no password manager. It can generate throwaway randoms (local Administrator, DSRM) and apply them, but it cannot store or later reveal them — and must not pretend to. After a rebuild, hand off to the operator to set known break-glass values (and rotate any exposed credential) from a machine that has their secret manager. Full detail: references/credential-handling.md.
This skill makes real changes to a live directory. Stop and confirm with the user when: a gate isn't green and you can't explain why; the target turns out to hold FSMO or be the last DC; a demotion would need -ForceRemoval; or you're about to run the first destructive step of a phase (snapshot-and-demote, disk wipe, promotion). Recommend the safe default, explain the risk in one or two lines, and let them decide.
- references/qga-execution.md: the QGA-over-stdin pattern, the run_guest helper, secret handling, timeout/backgrounding behavior. Read this first; it's how every other phase actually executes.
- references/verification.md: exact read-only commands for FSMO, replication, dcdiag, SYSVOL/DFSR, and how to read the results.
- references/demotion.md: graceful demotion script + the LocalAdministratorPassword and RemoveDnsDelegation gotchas; forced-removal fallback.
- references/metadata-cleanup.md: finding and removing DC residue; what each leftover object means; how to tell a completed-but-messy removal from a real failure.
- references/os-reinstall.md: autounattend build, ISO attach, boot sequence, virtio driver selection, the qm set commands.
- references/promotion.md: AD DS role install + replica promotion + post-promotion DNS and verification.
- references/gotchas.md: the catalog of failure modes seen in the field, each with its signature and fix. Skim it before starting and consult it the instant something hangs or errors.
- references/credential-handling.md: what the skill does with secrets vs. what it must NOT pretend to do (no password manager on the host); the throwaway-random + hand-off model. The vault/rotation mechanics live off-host (on a machine with a secret manager) because that's where they can run.
- assets/autounattend.xml.template: parameterized Windows Server 2025 Core unattend (UEFI/GPT, virtio injection, static IP, agent+WinRM).
- scripts/run-guest-ps.sh: wrapper that runs base64'd PowerShell in a guest via QGA, with stdin secret passing.
- scripts/build-unattend-iso.sh: packages an autounattend.xml into an attachable ISO.