Install
openclaw skills install hugging-face-apiSearch and discover Hugging Face open-source models and datasets, then run OpenAI-compatible chat or embedding inference securely with cost control.
openclaw skills install hugging-face-apiA playbook for agents that use the Hugging Face MCP server. Follow these steps in order. Discover for free first; run billed inference only against confirmed-supported models.
Hugging Face — open-source model and dataset discovery plus OpenAI-compatible inference (chat and embeddings) across inference providers, via 7 MCP tools.
Use this skill to find open-source models and datasets on the Hugging Face Hub, confirm which models are runnable through the Inference router, and run chat completions and embeddings — while controlling cost, respecting licenses, and keeping the access token secret.
Use it when the task involves:
Set one secret:
| Variable | Required | Notes |
|---|---|---|
HF_TOKEN | Yes | hf_.... Get it at https://huggingface.co/settings/tokens. Never expose it. |
Optional: HF_HUB_BASE_URL, HF_ROUTER_BASE_URL, HF_TIMEOUT_MS, HF_MAX_RETRIES, LOG_LEVEL.
| Tool | Use it to | Cost |
|---|---|---|
hf_search_models | Search Hub models | Free |
hf_model_info | Inspect one model (license, task) | Free |
hf_search_datasets | Search Hub datasets | Free |
hf_list_inference_models | List models runnable via router | Free |
hf_chat | OpenAI-style chat completion | Billed |
hf_embeddings | Embedding vectors | Billed |
hf_request | Reach any other Hub/router endpoint | Depends |
Do this first; it costs nothing.
hf_search_models — find candidates by task/author/popularity.hf_model_info — check pipeline_tag and cardData.license.hf_search_datasets — find data if needed.hf_list_inference_models — confirm the chosen model is actually runnable.hf_list_inference_models.hf_chat with OpenAI-style messages and a bounded max_tokens.hf_embeddings with a batch of inputs (default model sentence-transformers/all-MiniLM-L6-v2).usage.max_tokens on hf_chat.inputs) instead of per-item calls.| Error | Reaction |
|---|---|
model_not_supported (402/403) | Call hf_list_inference_models, pick a listed model, retry. |
401 invalid token | Stop. Fix HF_TOKEN. Do not retry blindly. |
402 credits | Stop. Add credits or use a cheaper/free model. |
429 rate limit | Back off (server retries); slow down, batch, cache. |
hf_ token. The server redacts it; do not undo that.your_hf_token) in any shared config.hf_model_info → cardData.license).hf_search_models / hf_model_info (free).hf_list_inference_models.max_tokens (chat) / batched inputs (embeddings).usage.hf_search_models → hf_model_info → hf_list_inference_models → hf_chat. See recipes/find-and-run-model.md.hf_embeddings (batch) → store → query. See recipes/build-embeddings.md.hf_search_datasets → hf_request for details. See recipes/dataset-discovery.md.hf_chat before confirming the model is supported (causes model_not_supported).max_tokens, leading to runaway generation cost.hf_list_inference_models rather than hardcoding ids.HF_TOKEN periodically.