Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

NFS-e Parser — Brazilian invoice field extraction

v0.1.0

NFS-e field extractor for Brazilian agents. 100% field accuracy on São Paulo NFS-e invoices (auxiliar-nfs-e + Surya). Extracts CNPJ, prestador, tomador, valo...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for tlalvarez/nfs-e-parser.

Prompt Preview: Install & Setup
Install the skill "NFS-e Parser — Brazilian invoice field extraction" (tlalvarez/nfs-e-parser) from ClawHub.
Skill page: https://clawhub.ai/tlalvarez/nfs-e-parser
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: python3, git
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Canonical install target

openclaw skills install tlalvarez/nfs-e-parser

ClawHub CLI

Package manager switcher

npx clawhub@latest install nfs-e-parser
Security Scan
VirusTotal
Suspicious
View report →
OpenClaw
Benign
medium confidence
Purpose & Capability
Name/description match the actions in SKILL.md. Required binaries (python3, git) and the instructions to run an OCR CLI and a Python parser are appropriate for extracting structured fields from NFS‑e PDFs.
Instruction Scope
SKILL.md instructs the agent to run Surya OCR on provided PDFs, read the resulting text files, and parse them into JSON — all expected for the stated task. The instructions also include example workflows that operate on arbitrary filesystem paths (e.g., Dunas/CNPJ/...). There is no instruction to exfiltrate data to unexpected external endpoints, but the agent will read arbitrary invoice files provided by the user (which is necessary for the skill).
Install Mechanism
Although this is an instruction-only skill (no bundled code), the runtime steps instruct the user/agent to `pip install surya-ocr 'transformers<5.0.0'` and `git clone https://github.com/Tlalvarez/Auxiliar-ai.git` then copy and run parser.py. Fetching and executing third‑party code/packages from PyPI/GitHub is a moderate risk: GitHub and PyPI are common release hosts, but the skill provides no pinned versions or checksum, and the repo/package authors are not verified here.
Credentials
No environment variables, API keys, or unrelated credentials are requested. The skill's external accesses are limited to cloning a GitHub repo and installing public Python packages per its instructions.
Persistence & Privilege
The skill does not request always:true, does not declare any system/config modifications, and is user‑invocable only. It does not demand permanent agent presence or access to other skills' configs.
Assessment
This skill appears to do what it promises (São Paulo NFS‑e field extraction), but it instructs the agent/user to download and run third-party code and to pip install packages. Before installing or running:

1. Review the GitHub repository (Tlalvarez/Auxiliar-ai) and the parser.py source to ensure no unexpected behavior or network calls.
2. Inspect the pip packages (surya-ocr, transformers) and prefer pinned versions or checksums.
3. Run initial tests in an isolated environment or container with non-sensitive sample invoices.
4. Monitor network activity the first time you run it, to detect any unexpected exfiltration.
5. If you cannot review the code, avoid running it on sensitive or production data and consider asking for a signed release or a vetted package instead.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Bins: python3, git
Tags: bookkeeping, brazilian-invoice, claude-code, cnpj, contabilidade, dunas, invoice-extraction, iss, latest, nfs-e, nfse, nota-fiscal, nota-fiscal-eletronica, openclaw, sao-paulo, task-ranker
26 downloads · 0 stars · 1 version
Updated 6h ago
v0.1.0
MIT-0

nfs-e-parser

Brazilian NFS-e (Nota Fiscal Eletrônica de Serviços) field extraction for agents. When your Claude Code / OpenClaw agent needs to extract structured fields from a São Paulo NFS-e PDF — for bookkeeping, reimbursement, accountant handoff — install this skill + Surya OCR. 100% field accuracy (41/41 fields) on the published corpus.

Raw OCR (Tesseract, Surya, Google Document AI) gives you text. This skill turns that text into typed JSON: prestador_cnpj, tomador_cnpj, valor_servico, codigo_servico, codigo_verificacao, data_emissao, ISS fields, retenções, and more.

When to invoke this skill

Use nfs-e-parser when the agent:

  • Receives a Brazilian NFS-e PDF and needs structured fields, not raw text
  • Is doing bookkeeping for a Brazilian SMB (Dunas-style workflow)
  • Needs to validate CNPJ check digits before writing to a ledger
  • Batches invoices for accountant handoff (tomador summary per month, total valor_servico per prestador, etc.)
  • Needs the codigo_servico (LC 116/2003 code) for ISS reconciliation
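The CNPJ check-digit validation mentioned above follows the standard modulus-11 scheme. A minimal standalone sketch of that scheme, for orientation only (the actual `validate_cnpj` shipped in parser.py may be implemented differently):

```python
import re

def validate_cnpj(cnpj: str) -> bool:
    """Check the two modulus-11 verification digits of a CNPJ.

    Accepts formatted ("29.291.029/0001-51") or bare 14-digit strings.
    """
    digits = re.sub(r"\D", "", cnpj)
    if len(digits) != 14 or len(set(digits)) == 1:
        return False
    for size in (12, 13):
        # Weights are 5..2 then 9..2 for the first check digit (over 12 digits),
        # and 6..2 then 9..2 for the second (over 13 digits).
        weights = list(range(size - 7, 1, -1)) + list(range(9, 1, -1))
        total = sum(int(d) * w for d, w in zip(digits[:size], weights))
        check = 11 - (total % 11)
        if check >= 10:
            check = 0
        if int(digits[size]) != check:
            return False
    return True
```

An OCR misread of a single digit almost always breaks the check digits, which is why the skill recommends validating before writing to a ledger.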

How it works

Step 1. Install dependencies

# Python + Surya OCR (best accuracy, ~22s/page on CPU)
python3 -m venv .venv && source .venv/bin/activate
pip install surya-ocr 'transformers<5.0.0'

# Clone the parser (PyPI publish pending)
git clone https://github.com/Tlalvarez/Auxiliar-ai.git /tmp/auxiliar
cp /tmp/auxiliar/scripts/walkthroughs/nfs-e-extraction/parser.py ./nfse_parser.py

Step 2. OCR the PDF

surya_ocr path/to/nfse.pdf --output_dir /tmp/ocr/
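In an agent loop the same OCR step can be driven from Python. A sketch, assuming the output layout `<output_dir>/<pdf stem>/<pdf stem>.txt` that this skill's own examples use:

```python
import subprocess
from pathlib import Path

def ocr_text_path(pdf_path: str, out_dir: str = "/tmp/ocr") -> Path:
    """Where Surya writes the OCR text for a given PDF, per this skill's layout."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / stem / f"{stem}.txt"

def ocr_pdf(pdf_path: str, out_dir: str = "/tmp/ocr") -> Path:
    """Run surya_ocr on one PDF and return the expected text-file path."""
    subprocess.run(["surya_ocr", pdf_path, "--output_dir", out_dir], check=True)
    return ocr_text_path(pdf_path, out_dir)
```

`check=True` makes a failed OCR run raise immediately instead of silently producing a missing text file downstream.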

Step 3. Parse + validate

import json
from nfse_parser import parse, validate_cnpj

with open('/tmp/ocr/nfse/nfse.txt', 'r') as f:
    text = f.read()

result = parse(text)

# Validate CNPJs before writing to ledger
if result.prestador.cpf_cnpj and not validate_cnpj(result.prestador.cpf_cnpj):
    print("WARNING: prestador CNPJ check digits invalid — OCR may have misread")

if result.missing_fields:
    print(f"WARNING: missing fields: {result.missing_fields}")

print(json.dumps(result.to_dict(), ensure_ascii=False, indent=2))

What you get back

{
  "numero_nota": "00002419",
  "codigo_verificacao": "JTT2-GSZX",
  "data_emissao": "27/03/2026",
  "hora_emissao": "22:09:44",
  "municipio_emissor": "São Paulo",
  "chave_acesso": "20260327u29291029000151",
  "prestador": {
    "nome": "MAGALHAES, NOGUEIRA SOCIEDADE DE ADVOGADOS",
    "cpf_cnpj": "29.291.029/0001-51",
    "inscricao_municipal": "5.866.516-1",
    "endereco": "R TABAPUA 474, CONJ 113 - ITAIM BIBI - CEP: 04533-001",
    "cep": "04533-001",
    "municipio": "São Paulo",
    "uf": "SP"
  },
  "tomador": {
    "nome": "DUNAS DESENVOLVIMENTO DE SOFTWARE LTDA",
    "cpf_cnpj": "64.717.332/0001-74",
    "inscricao_municipal": "0.152.976-5",
    "endereco": "AV PAULISTA 1636, CONJ 4 - BELA VISTA - CEP: 01310-200",
    "municipio": "São Paulo",
    "uf": "SP"
  },
  "valor_servico": "R$ 3.900,00",
  "iss": {
    "codigo_servico": "03220",
    "descricao_servico": "Advocacia",
    "aliquota": "*",
    "valor_iss": "0,00"
  },
  "retencoes": {
    "inss": "0,00",
    "irrf": "0,00",
    "csll": "0,00",
    "cofins": "0,00",
    "pis_pasep": "0,00",
    "ipi": "0,00"
  },
  "missing_fields": []
}

Example: bookkeeping batch

Agent workflow: "For all NFS-e PDFs in Dunas/CNPJ/Contabilidade/2026/03-Março/Notas-Fiscais-Recebidas/, extract fields and produce a summary per prestador."

from pathlib import Path
import subprocess
from nfse_parser import parse, validate_cnpj
from collections import defaultdict

pdfs = Path("Dunas/CNPJ/Contabilidade/2026/03-Março/Notas-Fiscais-Recebidas/").glob("*.pdf")
by_prestador = defaultdict(list)
warnings = []

for pdf in pdfs:
    subprocess.run(["surya_ocr", str(pdf), "--output_dir", "/tmp/ocr/"], check=True)
    text = Path(f"/tmp/ocr/{pdf.stem}/{pdf.stem}.txt").read_text()
    result = parse(text)

    if result.missing_fields:
        warnings.append((pdf.name, "missing:", result.missing_fields))
    if result.prestador.cpf_cnpj and not validate_cnpj(result.prestador.cpf_cnpj):
        warnings.append((pdf.name, "invalid CNPJ:", result.prestador.cpf_cnpj))

    key = result.prestador.cpf_cnpj or "unknown"
    by_prestador[key].append(result)

for cnpj, invoices in by_prestador.items():
    total = sum(float(r.valor_servico.replace("R$", "").replace(".", "").replace(",", ".").strip()) for r in invoices if r.valor_servico)
    print(f"{cnpj}: {len(invoices)} invoices, R$ {total:,.2f}")
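The inline currency handling above can be factored into a helper. A sketch assuming the "R$ 1.234,56" formatting shown in the sample output; Decimal is used here instead of float because ledger sums should not accumulate binary rounding error:

```python
from decimal import Decimal

def parse_brl(value: str) -> Decimal:
    """Convert a Brazilian-formatted amount like 'R$ 3.900,00' to a Decimal.

    Strips the R$ prefix, drops thousands separators ('.'),
    and swaps the decimal comma for a point.
    """
    cleaned = value.replace("R$", "").replace(".", "").replace(",", ".").strip()
    return Decimal(cleaned)
```

With this helper the per-prestador total becomes `sum(parse_brl(r.valor_servico) for r in invoices if r.valor_servico)`.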

Eval scorecard

On the 2-doc São Paulo NFS-e corpus (real Dunas invoices — gitignored source, committed ground truth):

OCR upstream | Field accuracy | Notes
Surya | 100% (41/41) | Best; preserves the line-level ordering the parser relies on
Google Document AI | 87.8% (36/41) | ~$0.002/page, 1,000 pages/mo free tier
Tesseract | 63.4% (26/41) | Fastest, but retention-table reorders break positional parsing

Full methodology + reproducible command: https://auxiliar.ai/solve/nfs-e-extraction/

Known limitations (v0.1)

  • São Paulo only. Other municipalities' NFS-e forms have different layouts. Contributions welcome for Rio, Curitiba, Belo Horizonte, etc.
  • Retention values for non-zero retentions not end-to-end tested. The corpus has all-zero retentions (both prestadores are Simples Nacional). Parser handles the position-based logic but hasn't been validated against non-Simples documents.
  • CPF (11-digit) vs CNPJ (14-digit) tomadores. Both supported; CNPJ is the common case for business invoices.
  • No XML API integration. This is a PDF-first parser. For direct Prefeitura queries, use the SP NFS-e API.
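The CPF-vs-CNPJ distinction noted above comes down to digit count. A hypothetical helper (not part of parser.py) an agent could use to branch on tomador type:

```python
import re

def classify_documento(doc: str) -> str:
    """Classify a Brazilian taxpayer ID by digit count: 11 -> CPF, 14 -> CNPJ."""
    digits = re.sub(r"\D", "", doc)
    if len(digits) == 11:
        return "CPF"
    if len(digits) == 14:
        return "CNPJ"
    return "unknown"
```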

Related

  • auxiliar-solve — the meta-ranker skill that directs agents to this skill for NFS-e queries
  • auxiliar-mcp — the MCP server exposing solve_task(task_slug="nfs-e-extraction") for in-loop queries
  • /solve/nfs-e-extraction — the full methodology page with eval breakdown: https://auxiliar.ai/solve/nfs-e-extraction/
  • /solve/pdf-text-extraction-mcp — the upstream OCR ranking (for choosing the OCR engine)

License

MIT (parser code + this skill). Your NFS-e PDFs remain yours; this parser runs locally.
