Aws Emr Skill

AWS EMR interaction skill for managing EMR Serverless, EMR on EC2, and EMR on EKS. Submit and manage Spark, Hive, and PySpark jobs across all three EMR deplo...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 60 · 0 current installs · 0 all-time installs
MIT-0
Security Scan
VirusTotalVirusTotal
Benign
View report →
OpenClawOpenClaw
Benign
high confidence
Purpose & Capability
Name/description (EMR Serverless / EMR on EC2 / EMR on EKS) match the included scripts (boto3 clients, EMR Serverless, EMR on EC2, emr-containers code). Required binary (python3) and required env var (AWS_REGION) are proportionate. Optional env vars and config keys align with EMR job submission and log retrieval needs.
Instruction Scope
SKILL.md instructions restrict behavior to EMR operations, log retrieval from S3, and using boto3's default credential chain. The skill documents where it may write temporary files (./tmp) and asks the agent to add the skill root to sys.path; these are expected for an embedded Python skill. The skill does read an optional simple KEY=VALUE config file if pointed to, but that file is documented and only used for EMR-related config keys.
Install Mechanism
No install spec — instruction-only install is used and the code lives in the skill bundle. There are no downloads from arbitrary URLs or extract/install steps in the skill manifest.
Credentials
Declared required env is only AWS_REGION (appropriate). The skill documents additional optional env vars (application IDs, EMR cluster/virtual-cluster IDs, execution role ARNs, S3 log URIs) which are reasonable for EMR operations. Note: primaryEnv is set to AWS_REGION (region is not a credential) — this is unusual labeling but not harmful.
Persistence & Privilege
always:false and model invocation not disabled (normal). The skill does not request persistent elevated platform privileges or attempt to modify other skills or global agent settings.
Assessment
This skill appears coherent and implements EMR management using boto3. Before installing: - Verify you trust the repository/source (source is listed as unknown in the registry metadata). Review the repo or vendor identity if possible. - Use least-privilege AWS credentials or an IAM role with only the EMR/S3 permissions the skill needs (list/describe/submit/cancel and S3 read for logs). Avoid granting broad admin rights. - If you set an optional config file via EMR_SKILL_CONFIG / EMR_SKILLS_CONFIG, ensure it does not point at sensitive system files. The skill will read that file if present. - Confirm S3 log buckets referenced (EMR_SERVERLESS_S3_LOG_URI, cluster LogUri) are buckets you control and that the skill only needs read access for logs (and write access only if job artifacts are uploaded). - If you need higher assurance, run the skill in a sandboxed agent or review the small set of Python files (they primarily call boto3) before enabling autonomous invocation.

Like a lobster shell, security has layers — review code before you run it.

Current versionv2.0.0
Download zip
latestvk978rccea4xpqd2j9nhy04tt25835g7j

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

Clawdis
Binspython3
EnvAWS_REGION
Primary envAWS_REGION

SKILL.md

AWS EMR Skills

A Python skill for interacting with AWS EMR across three deployment modes: EMR Serverless, EMR on EC2, and EMR on EKS. Submit Spark and Hive jobs, manage clusters and applications, monitor job status, and retrieve logs.

When to Use (Trigger Phrases)

Invoke this skill when the user mentions:

"Submit a Spark job on EMR"
"List EMR Serverless applications"
"Add a step to my EMR cluster"
"Get EMR job logs"
"Check EMR job status"
"Cancel running EMR job"
"List EMR clusters"
"Create an EMR on EKS virtual cluster"
"Submit PySpark to EMR Serverless"
"Get step logs from EMR cluster"

Any request involving EMR Serverless applications/jobs, EMR on EC2 clusters/steps, or EMR on EKS virtual clusters/job runs.

Feature List

EMR Serverless

  • Applications: List, describe, start, stop EMR Serverless applications
  • Job Submission: Submit Spark SQL, Spark JAR, PySpark, and Hive jobs (sync/async)
  • Job Lifecycle: Get status, cancel, list job runs
  • Results: Retrieve SQL query results from S3
  • Logs: Get driver stdout/stderr logs with secret masking

EMR on EC2

  • Clusters: List, describe, terminate EMR clusters
  • Step Submission: Add Spark, PySpark, and Hive steps via command-runner.jar
  • Step Lifecycle: List, describe, cancel steps
  • Logs: Get step logs (stderr, stdout, controller, syslog) from S3

EMR on EKS

  • Virtual Clusters: List, describe, create, delete virtual clusters
  • Job Submission: Submit Spark and Spark SQL jobs to EKS
  • Job Lifecycle: Describe, list, cancel job runs
  • Logs: Get job logs from S3

Initial Setup

  1. Python 3.8+ with boto3>=1.26.0:

    pip install boto3>=1.26.0
    
  2. AWS credentials via boto3 default chain (env vars, config files, IAM roles).

  3. Environment variables (all optional, validated at point of use):

    export AWS_REGION="us-east-1"
    
    # EMR Serverless
    export EMR_SERVERLESS_APP_ID="00abcdef12345678"
    export EMR_SERVERLESS_EXEC_ROLE_ARN="arn:aws:iam::123456789:role/emr-role"
    export EMR_SERVERLESS_S3_LOG_URI="s3://my-bucket/emr-logs/"
    
    # EMR on EC2
    export EMR_CLUSTER_ID="j-XXXXXXXXXXXXX"
    
    # EMR on EKS
    export EMR_EKS_VIRTUAL_CLUSTER_ID="abc123def456"
    export EMR_EKS_EXEC_ROLE_ARN="arn:aws:iam::123456789:role/emr-eks-role"
    

How to Manage EMR

1. EMR Serverless

Fully managed serverless Spark/Hive execution. No infrastructure to manage.

  • Application management: scripts/on_serverless/emr_serverless_cli.py — 14 @tool functions
  • Detailed guide: references/emr_serverless/application_guide.md — Application lifecycle
  • Detailed guide: references/emr_serverless/job_guide.md — Job submission, results, logs

2. EMR on EC2

Traditional EMR clusters on EC2 instances. Submit work as Steps.

  • Cluster & step management: scripts/on_ec2/emr_on_ec2_cli.py — 10 @tool functions
  • Detailed guide: references/emr_on_ec2/cluster_guide.md — Cluster lifecycle
  • Detailed guide: references/emr_on_ec2/step_guide.md — Step submission, logs

3. EMR on EKS

Spark workloads on Amazon EKS via the emr-containers API.

  • Virtual cluster & job management: scripts/on_eks/emr_on_eks_cli.py — 10 @tool functions
  • Detailed guide: references/emr_on_eks/virtual_cluster_guide.md — Virtual cluster lifecycle
  • Detailed guide: references/emr_on_eks/job_run_guide.md — Job submission, logs

Available Scripts

ScriptDescription
scripts/on_serverless/emr_serverless_cli.pyEMR Serverless @tool functions (14 tools)
scripts/on_ec2/emr_on_ec2_cli.pyEMR on EC2 @tool functions (10 tools)
scripts/on_eks/emr_on_eks_cli.pyEMR on EKS @tool functions (10 tools)
scripts/config/emr_config.pyUnified configuration management
scripts/client/boto_client.pyboto3 client factory

References

DocumentDescription
references/emr_serverless/application_guide.mdEMR Serverless application management guide
references/emr_serverless/job_guide.mdEMR Serverless job submission and management guide
references/emr_on_ec2/cluster_guide.mdEMR on EC2 cluster management guide
references/emr_on_ec2/step_guide.mdEMR on EC2 step submission and management guide
references/emr_on_eks/virtual_cluster_guide.mdEMR on EKS virtual cluster management guide
references/emr_on_eks/job_run_guide.mdEMR on EKS job run management guide

Requirements

  • When writing temporary files (scripts, notes, etc.), place them in the ./tmp folder.
  • When importing scripts packages, add the skill root to path: sys.path.append(${emr_skill_root})
  • AWS credentials are handled by boto3's default credential chain — never pass access keys directly.
  • All configuration environment variables are optional and validated at the point of use.

Data Privacy & Trust

  • No credential storage: AWS credentials are resolved via boto3 default chain. No keys are stored or logged.
  • Secret masking: Log retrieval functions automatically mask potential AWS credentials in output.
  • Read-only by default: Most operations are read-only queries. Write operations (job submission, cluster termination) require explicit user action.

External Endpoints

This skill connects to:

  • AWS EMR Serverless API (emr-serverless.{region}.amazonaws.com)
  • AWS EMR API (elasticmapreduce.{region}.amazonaws.com)
  • AWS EMR Containers API (emr-containers.{region}.amazonaws.com)
  • AWS S3 API (s3.{region}.amazonaws.com) — for log and result retrieval

Files

26 total
Select a file
Select a file to preview.

Comments

Loading comments…