Install
openclaw skills install linux-system-health
Diagnose Linux OS-level issues: slow server, OOM kills, disk full, high CPU/load, DNS failures, connection timeouts, port exhaustion, too many open files, zombie processes, browser automation failures, locale problems, and kernel misconfigurations.
You are a Linux OS diagnostic expert. When a user reports any of the following problems, use this skill:
Use the judgment rules below to systematically diagnose OS-level root causes.
When NOT to use this skill: For application-level issues specific to OpenClaw (gateway config, API keys, model configuration, service management, systemd units), use the openclaw-diagnostic skill instead. This skill only covers OS-level diagnostics.
Diagnostic workflow:
Commands: Run the corresponding section in scripts/diagnostics.sh. Run as root with export LANG=C.
Issue Registry: See reference.md for severity level definitions and the complete issue name table.
Data access scope — this skill collects OS-level diagnostic data. Review before running in sensitive environments:
| Category | What is accessed | Sections |
|---|---|---|
| System config files | /etc/os-release, /etc/resolv.conf, /etc/security/limits.conf, /etc/default/locale, /etc/locale.conf, /etc/systemd/journald.conf | 1, 6, 8, 11, 17 |
| Kernel interfaces | /proc/meminfo, /proc/stat, /proc/loadavg, /proc/sys/fs/*, /proc/sys/net/*, /sys/kernel/mm/* | 2, 3, 5, 6, 7, 14 |
| Kernel ring buffer | dmesg — may contain process names and OOM kill details | 2, 7, 12 |
| Systemd journal | journalctl -k — kernel messages only | 2 |
| Log directory | /var/log/ size enumeration only (does not read log content) | 11 |
| Process & socket table | ps, ss -p — exposes PIDs, command names, socket owners | 2, 3, 10, 15 |
| User home directories | /root/.cache/ms-playwright, /home/*/.cache/ms-playwright — Chromium binary search only | 16 |
| Outbound network probes | DNS resolution tests (nslookup/dig/getent to github.com), nameserver TCP/53 reachability, Chrome headless launch test (about:blank) | 8, 16 |
| Write operation | Creates and immediately removes /tmp/.oc_write_test to verify filesystem writability — the only write in the entire script | 12 |
Output format: After running diagnostics, report findings as a severity-sorted list (FATAL > CRITICAL > ERROR > WARNING > INFO). For each issue found, include:
OpenClaw.Memory.SystemMemoryCritical)
Collect OS context for subsequent analysis.
Judgment rules:
Detect low memory and past OOM kills that affect any workload on this server.
Judgment rules:
Resource contention causes slow responses; high iowait indicates disk bottlenecks.
Judgment rules:
nproc → OpenClaw.CPU.SystemLoadHigh (WARNING)
/proc/stat) → OpenClaw.CPU.HighIOWait (WARNING)
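A minimal sketch of how the two CPU rules above could be evaluated. The thresholds are not visible in this section, so LOAD_PER_CORE_LIMIT and IOWAIT_PCT_LIMIT are placeholder assumptions, and the function names are hypothetical rather than taken from scripts/diagnostics.sh. The /proc/stat counters are cumulative since boot, so a real check would probably sample twice and compare the difference.

```shell
# Assumed thresholds for illustration only; not taken from diagnostics.sh.
LOAD_PER_CORE_LIMIT=2    # flag when the 1-minute load exceeds 2x the core count
IOWAIT_PCT_LIMIT=20      # flag when iowait exceeds 20% of total CPU time

# load_is_high LOAD1 CORES: succeeds when LOAD1 > CORES * LOAD_PER_CORE_LIMIT.
load_is_high() {
    awk -v l="$1" -v c="$2" -v k="$LOAD_PER_CORE_LIMIT" \
        'BEGIN { exit !(l > c * k) }'
}

# iowait_pct CPU_LINE: prints iowait as an integer percentage of total CPU time.
# CPU_LINE is the aggregate "cpu" row of /proc/stat, whose columns are:
#   cpu user nice system idle iowait irq softirq steal ...
iowait_pct() {
    printf '%s\n' "$1" | awk '{
        total = 0
        for (i = 2; i <= 9; i++) total += $i
        if (total == 0) { print 0; exit }
        printf "%d\n", ($6 * 100) / total
    }'
}

# Example wiring (reads live system state):
#   load_is_high "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)" &&
#       echo "OpenClaw.CPU.SystemLoadHigh (WARNING)"
#   [ "$(iowait_pct "$(grep "^cpu " /proc/stat)")" -gt "$IOWAIT_PCT_LIMIT" ] &&
#       echo "OpenClaw.CPU.HighIOWait (WARNING)"
```

Passing the values in as arguments keeps the comparison logic separate from the data collection, so it can be exercised without a loaded machine.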
Basic network configuration, DNS, IPv6, and firewall state.
Judgment rules:
:: but upstream resolves to IPv4 only → OpenClaw.Network.IPv6Mismatch (WARNING)
NODE_OPTIONS='--dns-result-order=ipv4first' or sysctl -w net.ipv6.conf.all.disable_ipv6=1
Disk space exhaustion and inotify limits cause "ENOSPC" errors.
Judgment rules:
max_user_watches < 65536 → OpenClaw.Disk.InotifyWatchesTooLow (ERROR)
echo 'fs.inotify.max_user_watches=524288' >> /etc/sysctl.d/99-inotify.conf && sysctl -p /etc/sysctl.d/99-inotify.conf
max_user_instances < 256 → OpenClaw.Disk.InotifyInstancesTooLow (WARNING)
echo 'fs.inotify.max_user_instances=512' >> /etc/sysctl.d/99-inotify.conf && sysctl -p /etc/sysctl.d/99-inotify.conf
Low ulimits cause "too many open files" (EMFILE) errors under load.
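As a rough sketch, the three file-descriptor rules in this section (nofile below 4096, nofile above fs.nr_open, and file-nr usage above 80%) could be evaluated like this. The function names and argument order are hypothetical. Which nofile value the second rule inspects is not spelled out here, so the sketch assumes it is the value configured in /etc/security/limits.conf and takes it as a separate argument.

```shell
# is_number VALUE: succeeds for a non-empty string of digits.
is_number() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# check_fd_limits SOFT CONFIGURED NR_OPEN ALLOCATED FILE_MAX
# Prints one issue name per matching rule. SOFT is the output of "ulimit -n";
# CONFIGURED is the nofile value from limits.conf (an assumption, see above).
check_fd_limits() {
    soft=$1 configured=$2 nr_open=$3 allocated=$4 file_max=$5

    # "unlimited" is not numeric and is treated as not too low.
    if is_number "$soft" && [ "$soft" -lt 4096 ]; then
        echo "OpenClaw.Limits.NofileTooLow (ERROR)"
    fi
    if is_number "$configured" && [ "$configured" -gt "$nr_open" ]; then
        echo "OpenClaw.Limits.NofileExceedsKernelMax (CRITICAL)"
    fi
    # awk is used for the ratio because fs.file-max can be too large for
    # shell integer multiplication on some kernels.
    if awk -v a="$allocated" -v m="$file_max" \
        'BEGIN { exit !(m > 0 && a / m > 0.8) }'; then
        echo "OpenClaw.Limits.SystemFileDescriptorsHigh (WARNING)"
    fi
}

# Example wiring (reads live system state; 65536 stands in for the configured value):
#   set -- $(cat /proc/sys/fs/file-nr)    # allocated, unused, max
#   check_fd_limits "$(ulimit -n)" 65536 "$(cat /proc/sys/fs/nr_open)" "$1" "$3"
```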
Judgment rules:
ulimit -n < 4096 → OpenClaw.Limits.NofileTooLow (ERROR)
* soft nofile 65536 and * hard nofile 65536 to /etc/security/limits.conf; re-login
nofile value > fs.nr_open → OpenClaw.Limits.NofileExceedsKernelMax (CRITICAL)
fs.nr_open first: sysctl -w fs.nr_open=1048576 and persist in /etc/sysctl.d/
file-nr allocated / max > 80% → OpenClaw.Limits.SystemFileDescriptorsHigh (WARNING)
ls /proc/*/fd 2>/dev/null | wc -l); increase fs.file-max if needed
nf_conntrack, TCP tuning, and somaxconn affect high-concurrency workloads.
Judgment rules:
nf_conntrack_max < 65536 → OpenClaw.Kernel.NfConntrackMaxTooLow (ERROR)
sysctl -w net.netfilter.nf_conntrack_max=262144 and persist in /etc/sysctl.d/99-sysctl.conf
nf_conntrack_max; check for connection leaks
somaxconn < 1024 → OpenClaw.Kernel.SomaxconnTooLow (WARNING)
sysctl -w net.core.somaxconn=4096 and persist
tcp_max_tw_buckets < 10000 → OpenClaw.Kernel.TcpMaxTwBucketsTooLow (WARNING)
sysctl -w net.ipv4.tcp_max_tw_buckets=262144
tcp_tw_reuse = 0 → OpenClaw.Kernel.TcpTwReuseNotEnabled (WARNING)
sysctl -w net.ipv4.tcp_tw_reuse=1
ss -s > 10000 → OpenClaw.Kernel.TimeWaitOverflow (WARNING)
tcp_tw_reuse, increase tcp_max_tw_buckets, reduce tcp_fin_timeout
/proc/net/netstat → OpenClaw.Kernel.TcpListenOverflows (WARNING)
somaxconn and application backlog setting
vm.overcommit_memory = 2 and swap < 1 GB → OpenClaw.Kernel.StrictOvercommitWithLowSwap (WARNING)
vm.overcommit_memory=0
Broken or slow DNS causes EAI_AGAIN errors, API timeouts, and silent connectivity failures.
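A minimal sketch of the nameserver rule in this section, assuming the check simply counts uncommented nameserver lines in /etc/resolv.conf. The function names are hypothetical; the file path is passed as an argument so the logic can be tried against a copy.

```shell
# count_nameservers FILE: prints how many active "nameserver" lines FILE has.
# A missing or unreadable file counts as zero.
count_nameservers() {
    n=$(grep -c '^[[:space:]]*nameserver[[:space:]]' "$1" 2>/dev/null)
    echo "${n:-0}"
}

# dns_config_issue FILE: prints the issue name when no nameserver is configured.
dns_config_issue() {
    if [ "$(count_nameservers "$1")" -eq 0 ]; then
        echo "OpenClaw.Network.NoDNSNameservers (ERROR)"
    fi
}

# Example: dns_config_issue /etc/resolv.conf
```

Lines such as "#nameserver 1.1.1.1" are not counted because the pattern only allows whitespace before the keyword.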
Judgment rules:
/etc/resolv.conf is empty or has zero nameserver lines → OpenClaw.Network.NoDNSNameservers (ERROR)
echo 'nameserver 8.8.8.8' >> /etc/resolv.conf; for systemd-resolved check /etc/systemd/resolved.conf
nslookup, dig, and getent all fail for a known-good domain → OpenClaw.Network.DNSResolutionFailed (CRITICAL)
/etc/resolv.conf; consider adding a backup nameserver
Clock drift causes SSL/TLS certificate validation failures, API auth token rejection, and log timestamp inconsistencies.
Judgment rules:
None of chronyd, ntpd, or systemd-timesyncd is active → OpenClaw.Time.NTPServiceNotRunning (ERROR)
yum install chrony && systemctl enable --now chronyd (RHEL/CentOS) or apt install chrony && systemctl enable --now chrony (Debian/Ubuntu)
timedatectl shows "NTP synchronized: no" → OpenClaw.Time.ClockNotSynchronized (CRITICAL)
chronyc sources or ntpq -p); check firewall allows UDP port 123
chronyc tracking shows system clock offset > 3 seconds, or hwclock drift > 5 seconds from system time → OpenClaw.Time.ClockDriftDetected (WARNING)
chronyc makestep or ntpdate -u pool.ntp.org; investigate why drift occurred (suspended VM, unreachable NTP server)
Zombie processes indicate child process leaks; D-state (uninterruptible sleep) processes signal I/O hangs that block system operations.
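A small sketch of how zombie and D-state processes could be counted from ps output, in the same spirit as the ps and awk pipeline quoted in this section. The function names are hypothetical, and no thresholds are applied because they are not visible here.

```shell
# count_by_state LETTER: reads "pid ppid stat comm" rows on stdin and prints
# how many rows have a STAT value starting with LETTER
# (Z = zombie, D = uninterruptible sleep).
count_by_state() {
    awk -v s="$1" 'substr($3, 1, 1) == s { n++ } END { print n + 0 }'
}

# zombie_parents: reads the same rows and prints "COUNT PPID" for every
# parent that currently owns at least one zombie child.
zombie_parents() {
    awk '$3 ~ /^Z/ { c[$2]++ } END { for (p in c) print c[p], p }'
}

# Example wiring (the trailing "=" signs suppress the ps header row):
#   ps -eo pid=,ppid=,stat=,comm= | count_by_state Z
#   ps -eo pid=,ppid=,stat=,comm= | count_by_state D
#   ps -eo pid=,ppid=,stat=,comm= | zombie_parents
```

Grouping zombies by parent PID points directly at the process that is failing to reap its children.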
Judgment rules:
ps -eo pid,ppid,stat,comm | awk '$3~/Z/'); the parent is not reaping children; restart or fix the parent process
dmesg | grep -i error), NFS mounts (mount -t nfs), and storage subsystem; these processes cannot be killed normally
kernel.pid_max → OpenClaw.Process.TotalProcessCountHigh (WARNING)
ps -eo user --sort=user | uniq -c | sort -rn | head); increase kernel.pid_max if needed
Systemd journal grows unbounded on long-running servers, silently consuming disk space, a common hidden root cause of "disk full" events.
Judgment rules:
journalctl --vacuum-size=500M; set SystemMaxUse=500M in /etc/systemd/journald.conf and restart systemd-journald
/var/log total size > 5 GB → OpenClaw.Logs.VarLogOversized (WARNING)
find /var/log -type f -size +100M); configure logrotate; clean old rotated logs
Read-only filesystem (from ext4/xfs journal errors) prevents writing session data, logs, and PID files. Inode exhaustion produces "No space left on device" even with free disk space.
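A sketch of the two read-only signals this section relies on: the ro mount flag and a write probe. The function names are hypothetical. The probe file name mirrors the /tmp/.oc_write_test file listed in the data access table; the mount table layout assumed is that of /proc/mounts.

```shell
# ro_mounts FILE: prints mount points whose option list contains the "ro" flag.
# FILE uses the /proc/mounts layout: device mountpoint fstype options dump pass
ro_mounts() {
    awk '{
        n = split($4, opt, ",")
        # Compare whole options so "errors=remount-ro" is not mistaken for "ro".
        for (i = 1; i <= n; i++) if (opt[i] == "ro") { print $2; break }
    }' "$1"
}

# can_write DIR: creates and immediately removes a probe file in DIR.
# Succeeds only if the file could be created.
can_write() {
    probe=$1/.oc_write_test
    ( : > "$probe" ) 2>/dev/null && rm -f "$probe"
}

# Example wiring:
#   ro_mounts /proc/mounts
#   can_write /tmp || echo "OpenClaw.Disk.ReadOnlyFilesystem (CRITICAL)"
```

Some mounts are read-only by design, so a real check would likely restrict the first function to filesystems that are expected to be writable.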
Judgment rules:
ro flag, or /tmp write test fails → OpenClaw.Disk.ReadOnlyFilesystem (CRITICAL)
dmesg for filesystem errors; run fsck on the affected partition (requires unmount or single-user mode); may indicate disk hardware failure
find / -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head -10); clean up session/temp files
dmesg contains EXT4-fs error, XFS error, or read-only remount messages → OpenClaw.Disk.FilesystemErrorsDetected (CRITICAL)
fsck at next maintenance window; check disk SMART status (smartctl -a /dev/sdX)
Firewall rules blocking inbound or outbound traffic are the #1 cause of "port not reachable" and "API connection refused" in self-hosted deployments.
Judgment rules:
iptables -L -n -v for detailed hit counts
ufw status shows default deny incoming (informational only) → OpenClaw.Network.UFWDefaultDeny (INFO)
ufw allow <port>/tcp)
Transparent Huge Pages (THP) can cause latency spikes and memory fragmentation for Node.js workloads. Multiple database and runtime vendors recommend disabling it on servers.
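The THP rules in this section depend on which value is marked active in the sysfs files, where the kernel shows the current setting in square brackets (for example "always [madvise] never"). A minimal sketch of extracting that value, with hypothetical function names:

```shell
# thp_active CONTENTS: prints the bracketed (active) value from the contents
# of a transparent_hugepage sysfs file.
thp_active() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z_+]*\)\].*/\1/p'
}

# thp_issues ENABLED_CONTENTS DEFRAG_CONTENTS: prints matching issue names.
thp_issues() {
    if [ "$(thp_active "$1")" = "always" ]; then
        echo "OpenClaw.Kernel.THPEnabled (WARNING)"
    fi
    if [ "$(thp_active "$2")" = "always" ]; then
        echo "OpenClaw.Kernel.THPDefragEnabled (INFO)"
    fi
}

# Example wiring:
#   thp_issues "$(cat /sys/kernel/mm/transparent_hugepage/enabled)" \
#              "$(cat /sys/kernel/mm/transparent_hugepage/defrag)"
```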
Judgment rules:
enabled is set to [always] → OpenClaw.Kernel.THPEnabled (WARNING)
echo never > /sys/kernel/mm/transparent_hugepage/enabled; persist via systemd unit or /etc/rc.local
defrag is set to [always] → OpenClaw.Kernel.THPDefragEnabled (INFO)
echo never > /sys/kernel/mm/transparent_hugepage/defrag; reduces latency spikes from compaction
Excessive network connections exhaust file descriptors, memory, and conntrack table capacity, degrading system-wide performance.
Judgment rules:
close() on sockets: identify the leaking process and restart it; this is an application bug
sysctl -w net.ipv4.ip_local_port_range='1024 65535'; enable tcp_tw_reuse; check for connection leaks
OpenClaw skills that use browser automation (Playwright, Puppeteer) require Chromium shared libraries and headless mode. The diagnostic first tests whether Chrome can actually launch in headless mode. Dependency diagnosis is only performed when Chrome fails or is absent.
Judgment rules:
--headless=new --dump-dom about:blank) succeeds → no issue, skip dependency checks
ldconfig -p → OpenClaw.Browser.ChromiumDependenciesMissing (ERROR)
apt install -y libnss3 libatk-bridge2.0-0 libgbm1 libxkbcommon0 libdrm2 libgtk-3-0 libasound2; on RHEL/CentOS: yum install -y nss atk at-spi2-atk mesa-libgbm libxkbcommon libdrm gtk3 alsa-lib
ldd on chromium binary shows one or more "not found" entries → OpenClaw.Browser.ChromiumBinaryLddFailures (CRITICAL)
ldd; run ldconfig after installation to update the dynamic linker cache
/proc/sys/kernel/unprivileged_userns_clone is 0 → OpenClaw.Browser.UserNamespaceDisabled (ERROR)
sysctl -w kernel.unprivileged_userns_clone=1 and persist in /etc/sysctl.d/99-userns.conf; or configure Chromium with --no-sandbox (less secure, not recommended for production)
Missing or misconfigured locale causes garbled text output, incorrect sorting in logs, and subtle bugs like backspace deleting two characters over SSH (when client sends UTF-8 but server expects ASCII). OpenClaw's text processing relies on correct UTF-8 support.
Judgment rules (use the persistent LANG value read from /etc/default/locale or /etc/locale.conf, not the runtime $LANG which may be overridden to C by the diagnostic runner):
LANG is empty, unset, or set to POSIX/C → OpenClaw.Locale.LocaleNotConfigured (ERROR)
apt install locales && dpkg-reconfigure locales, then set LANG=en_US.UTF-8 in /etc/default/locale; on RHEL/CentOS: localectl set-locale LANG=en_US.UTF-8
LANG value does not appear in locale -a output (configured but not generated/installed) → OpenClaw.Locale.LocaleNotGenerated (WARNING)
/etc/locale.gen and run locale-gen; on RHEL/CentOS: localedef -i en_US -f UTF-8 en_US.UTF-8
LANG does not contain UTF-8 or utf8 → OpenClaw.Locale.NonUTF8LocaleDetected (WARNING)
localectl set-locale LANG=en_US.UTF-8; re-login for the change to take effect
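The three locale rules above can be sketched as a single classifier. This is an illustration under stated assumptions, not the logic of scripts/diagnostics.sh: the function names are hypothetical, the persistent value is read from a simple LANG= line, and names are compared after lowercasing and removing hyphens because locale -a commonly lists en_US.utf8 for a configured en_US.UTF-8.

```shell
# persistent_lang FILE...: prints the LANG value from the first readable file
# that defines one (for example /etc/default/locale or /etc/locale.conf).
persistent_lang() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        v=$(sed -n 's/^LANG=//p' "$f" | tail -n 1 | tr -d "\"'")
        if [ -n "$v" ]; then echo "$v"; return 0; fi
    done
    return 0
}

# norm_locale TEXT: lowercases and strips hyphens, line by line.
norm_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# classify_lang LANG_VALUE AVAILABLE: AVAILABLE is the output of "locale -a".
# Prints one issue name per matching rule.
classify_lang() {
    lang=$1 available=$2
    case $lang in
        ''|C|POSIX)
            echo "OpenClaw.Locale.LocaleNotConfigured (ERROR)"
            return 0 ;;
    esac
    if ! norm_locale "$available" | grep -qxF -- "$(norm_locale "$lang")"; then
        echo "OpenClaw.Locale.LocaleNotGenerated (WARNING)"
    fi
    case $lang in
        *UTF-8*|*utf8*) ;;
        *) echo "OpenClaw.Locale.NonUTF8LocaleDetected (WARNING)" ;;
    esac
}

# Example wiring:
#   classify_lang "$(persistent_lang /etc/default/locale /etc/locale.conf)" "$(locale -a)"
```

Reading the value from the configuration files rather than from $LANG matches the note above about the diagnostic runner overriding the runtime locale to C.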