Install
openclaw skills install lemonade-server-manager
Manage Lemonade Servers natively. Use when checking system info, health status, listing available models, pulling or loading new models, completing LLM chats, or generating Stable Diffusion images on a local or remote AI NPU/GPU cluster.
Interact with and manage local or remote Lemonade AI Server hardware directly via standard HTTP requests (curl).
Target the server_url the user provides. If none is specified, assume http://localhost:8000. Use the LEMONADE_API_KEY environment variable to authenticate when the server requires a key.
The skill only contacts the configured server_url (default http://localhost:8000), optionally with LEMONADE_API_KEY; it does not instruct reading arbitrary host files or other environment variables. If a remote URL is supplied, the agent will send user content and model commands there (expected for this skill, but worth the user's attention).
Get hardware capabilities and device enumeration limits.
Returns: JSON object with hardware capabilities.
# Example Request
curl -X GET "http://localhost:8000/api/v1/system-info" \
-H "Authorization: Bearer ${LEMONADE_API_KEY}"
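The example above hard-codes http://localhost:8000 and always sends the Authorization header, even though LEMONADE_API_KEY is optional. The following is a minimal POSIX sh sketch, not part of Lemonade or OpenClaw, of a wrapper that takes server_url as its first argument (falling back to http://localhost:8000) and adds the bearer header only when the key is set. The names lemonade_url and lemonade_get are hypothetical.

```shell
# Hypothetical helper names (lemonade_url, lemonade_get); not part of Lemonade.
# Bodies use ( ... ) so their variables stay local to a subshell.

# Join server_url and an API path. An empty server_url falls back to
# http://localhost:8000; a trailing slash is dropped to avoid "//api".
lemonade_url() (
  base="${1:-http://localhost:8000}"
  printf '%s%s\n' "${base%/}" "$2"
)

# GET an endpoint, sending the bearer token only when LEMONADE_API_KEY is set.
lemonade_get() (
  url="$(lemonade_url "$1" "$2")"
  if [ -n "${LEMONADE_API_KEY:-}" ]; then
    curl -sS -X GET "$url" -H "Authorization: Bearer ${LEMONADE_API_KEY}"
  else
    curl -sS -X GET "$url"
  fi
)

# Usage:
#   lemonade_get "" /api/v1/system-info
#   lemonade_get "http://192.168.1.50:8000" /api/v1/system-info   # example remote host
```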
Verify status and monitor currently loaded models to prevent VRAM overflow.
Returns: JSON object containing server status and currently loaded models.
# Example Request
curl -X GET "http://localhost:8000/api/v1/health" \
-H "Authorization: Bearer ${LEMONADE_API_KEY}"
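Because the health endpoint reports server status, it can also serve as a readiness probe, for instance right after starting a server on a remote machine. A minimal sketch, assuming only that a healthy server answers this path without an HTTP error: it polls with curl until success or until the attempts run out. The function name wait_for_lemonade and its retry defaults are hypothetical.

```shell
# Hypothetical helper: poll /api/v1/health until it answers without an HTTP error.
# Args: server_url (default http://localhost:8000), attempts (default 10),
#       delay between attempts in seconds (default 2).
# The ( ... ) body keeps variables local; exit leaves only that subshell.
wait_for_lemonade() (
  base="${1:-http://localhost:8000}"
  attempts="${2:-10}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP status >= 400; --max-time bounds each try.
    if [ -n "${LEMONADE_API_KEY:-}" ]; then
      curl -sf --max-time 5 -H "Authorization: Bearer ${LEMONADE_API_KEY}" \
        "${base%/}/api/v1/health" >/dev/null 2>&1 && exit 0
    else
      curl -sf --max-time 5 "${base%/}/api/v1/health" >/dev/null 2>&1 && exit 0
    fi
    # Sleep only if another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  exit 1
)

# Usage: wait_for_lemonade "" 30 2 && echo "server is up"
```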
Get an array of downloaded models available to load into memory.
Returns: JSON list of all downloaded and available models.
# Example Request
curl -X GET "http://localhost:8000/api/v1/models" \
-H "Authorization: Bearer ${LEMONADE_API_KEY}"
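This section only says the response is a JSON list. A sketch for checking whether a model is already downloaded before pulling it, assuming (unverified here) an OpenAI-style body in which each entry carries an "id" field, and using grep and sed instead of a JSON parser. model_ids and has_model are hypothetical names; where jq is available, a real JSON query would be more robust.

```shell
# Print one model id per line from a /api/v1/models response.
# ASSUMPTION: entries look like {"id": "llama3", ...}; verify against your server.
# grep -o is supported by GNU, BSD and BusyBox grep, though it is not POSIX.
model_ids() {
  printf '%s\n' "$1" | grep -o '"id" *: *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Succeed only if the exact model name appears among the ids.
has_model() {
  model_ids "$2" | grep -qxF "$1"
}

# Usage:
#   models="$(curl -s "http://localhost:8000/api/v1/models" -H "Authorization: Bearer ${LEMONADE_API_KEY}")"
#   has_model llama3 "$models" || echo "llama3 is not downloaded yet; pull it first"
```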
Download and install a new model, identified by its name string, onto the target machine.
Returns: JSON stream or object confirming download status.
# Example Request
curl -X POST "http://localhost:8000/api/v1/pull" \
-H "Authorization: Bearer ${LEMONADE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{"model": "llama3"}'
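The pull request body is a single {"model": "..."} field, and the load and unload requests below use the same shape. When the model name comes from user input, splicing it into JSON unchecked can yield an invalid or altered body. A minimal sketch that rejects names outside a conservative character set before building the body; the allowed set and the name model_body are assumptions, not documented Lemonade rules.

```shell
# Hypothetical helper: build the {"model": "..."} body for pull, load or unload.
# The name is inserted without JSON escaping, so anything outside a conservative
# character set is rejected. ASSUMPTION: real model names fit this set.
model_body() {
  case "$1" in
    ''|*[!A-Za-z0-9._:/-]*)
      printf 'model_body: invalid model name: %s\n' "$1" >&2
      return 1
      ;;
  esac
  printf '{"model": "%s"}\n' "$1"
}

# Usage (same pull request as above, with a checked body):
#   body="$(model_body llama3)" && curl -X POST "http://localhost:8000/api/v1/pull" \
#     -H "Authorization: Bearer ${LEMONADE_API_KEY}" \
#     -H "Content-Type: application/json" -d "$body"
```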
Load a model into VRAM/NPU to prepare for prompt responses.
Returns: JSON object confirming model loaded into memory.
# Example Request
curl -X POST "http://localhost:8000/api/v1/load" \
-H "Authorization: Bearer ${LEMONADE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{"model": "llama3"}'
Force-unload a model to free memory before loading a larger variant.
Returns: JSON object confirming model unloaded.
# Example Request
curl -X POST "http://localhost:8000/api/v1/unload" \
-H "Authorization: Bearer ${LEMONADE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{"model": "llama3"}'
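The unload-then-load sequence described above can be wrapped so the larger model is only requested after the unload succeeds. A sketch under stated assumptions: lemonade_post, swap_model and the LEMONADE_DRY_RUN switch (which prints each request instead of sending it) are hypothetical, curl's -f flag is relied on to treat HTTP errors as failure, and llama3-70b is a placeholder model name.

```shell
# Hypothetical helpers; LEMONADE_DRY_RUN=1 prints each request instead of sending it.
# The ( ... ) body keeps variables local; exit leaves only that subshell.
lemonade_post() (
  url="${1:-http://localhost:8000}"
  url="${url%/}$2"
  if [ "${LEMONADE_DRY_RUN:-0}" = 1 ]; then
    printf 'POST %s %s\n' "$url" "$3"
    exit 0
  fi
  if [ -n "${LEMONADE_API_KEY:-}" ]; then
    curl -sf -X POST "$url" -H "Authorization: Bearer ${LEMONADE_API_KEY}" \
      -H "Content-Type: application/json" -d "$3"
  else
    curl -sf -X POST "$url" -H "Content-Type: application/json" -d "$3"
  fi
)

# Unload OLD, then load NEW; skip the load if the unload request fails.
# Args: server_url (empty for the default), old model, new model.
swap_model() {
  lemonade_post "$1" /api/v1/unload "{\"model\": \"$2\"}" || return 1
  lemonade_post "$1" /api/v1/load "{\"model\": \"$3\"}"
}

# Usage: swap_model "" llama3 llama3-70b
```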
Send a standard chat request to the LLM backend.
Returns: JSON object containing a standard chat completion response.
# Example Request
curl -X POST "http://localhost:8000/api/v1/chat/completions" \
-H "Authorization: Bearer ${LEMONADE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3",
"messages": [
{"role": "user", "content": "Hello, world!"}
]
}'
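The example embeds the prompt directly in a single-quoted JSON string, which breaks as soon as the prompt contains a quote, backslash, or newline. A minimal sketch of building the same body from arbitrary text with sed and awk; json_escape and chat_body are hypothetical names, and tabs and other control characters are deliberately left unhandled, so a proper JSON tool (such as jq) is the safer choice for untrusted input.

```shell
# Escape a string for a JSON string literal: backslashes, double quotes and
# newlines only. Tabs and other control characters are NOT handled.
json_escape() {
  printf '%s\n' "$1" |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Build a single-turn chat body with the same shape as the example above.
chat_body() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Usage:
#   curl -X POST "http://localhost:8000/api/v1/chat/completions" \
#     -H "Authorization: Bearer ${LEMONADE_API_KEY}" \
#     -H "Content-Type: application/json" \
#     -d "$(chat_body llama3 'He said "hi" and left')"
```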
Submit a Stable Diffusion prompt for image generation.
Returns: JSON object containing base64-encoded image data.
# Example Request
curl -X POST "http://localhost:8000/api/v1/images/generations" \
-H "Authorization: Bearer ${LEMONADE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"model": "sdxl",
"prompt": "A majestic lion standing on a cliff, digital art"
}'
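The image arrives base64-encoded inside the JSON response, so it has to be extracted and decoded before it can be viewed. A sketch assuming, without confirmation from this doc, that the data sits in an OpenAI-style "b64_json" field on a single line of a single-image response, and that base64 -d is available (as in GNU coreutils and BusyBox). save_b64_image is a hypothetical name.

```shell
# Hypothetical helper: decode the image from a single-image response to a file.
# ASSUMPTION: the response carries "b64_json": "<base64>" on one line.
# The ( ... ) body keeps variables local; exit leaves only that subshell.
save_b64_image() (
  b64="$(printf '%s\n' "$1" | sed -n 's/.*"b64_json" *: *"\([^"]*\)".*/\1/p' | head -n 1)"
  if [ -z "$b64" ]; then
    echo "save_b64_image: no b64_json field found" >&2
    exit 1
  fi
  printf '%s' "$b64" | base64 -d > "$2"
)

# Usage (request body from the example above):
#   resp="$(curl -s -X POST "http://localhost:8000/api/v1/images/generations" \
#     -H "Authorization: Bearer ${LEMONADE_API_KEY}" -H "Content-Type: application/json" \
#     -d '{"model": "sdxl", "prompt": "A majestic lion standing on a cliff, digital art"}')"
#   save_b64_image "$resp" lion.png
```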
The flm, ryzenai-llm, and whispercpp backends are mutually exclusive on an NPU. Always call the /api/v1/unload endpoint on conflicting LLMs before attempting to load a Whisper model on an NPU.
Query /api/v1/health first to assess available hardware VRAM before pulling or loading a multi-gigabyte LLM, to avoid out-of-memory errors on the host.
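The NPU rule can be turned into a small planning step that lists which loaded models to unload first. A sketch under an assumed reading of the rule, namely that a model conflicts only with loaded models on a different one of the three named backends; npu_conflicts is a hypothetical name, the name=backend input format is invented for illustration, and the model names in the usage line are placeholders.

```shell
# Hypothetical planner for the NPU rule (a sketch, not Lemonade logic).
# Assumed reading: a model on flm / ryzenai-llm / whispercpp conflicts with
# loaded models on a *different* one of those three backends.
# Args: TARGET_BACKEND, then loaded models as name=backend pairs. How to learn
# each loaded model's backend is not covered here.
npu_conflicts() (
  target="$1"
  shift
  case "$target" in
    flm|ryzenai-llm|whispercpp) ;;
    *) exit 0 ;;  # not one of the NPU backends named in the rule
  esac
  for pair in "$@"; do
    name="${pair%%=*}"
    backend="${pair#*=}"
    case "$backend" in
      flm|ryzenai-llm|whispercpp)
        if [ "$backend" != "$target" ]; then
          printf '%s\n' "$name"
        fi
        ;;
    esac
  done
)

# Usage: send each printed name to /api/v1/unload before loading the Whisper model.
#   npu_conflicts whispercpp llama3=flm qwen=ryzenai-llm   # prints llama3, then qwen
```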