SageMaker Training Job

v1.0.2

Submit ML training jobs to AWS SageMaker — package code, upload to S3, launch on GPU/CPU instances, poll status, download artifacts. Use when training machin...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for zyyhhxx/sagemaker-training-job.

Prompt Preview: Install & Setup
Install the skill "SageMaker Training Job" (zyyhhxx/sagemaker-training-job) from ClawHub.
Skill page: https://clawhub.ai/zyyhhxx/sagemaker-training-job
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: python3
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install sagemaker-training-job

ClawHub CLI


npx clawhub@latest install sagemaker-training-job
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description align with included scripts and docs: packaging/uploading source, submitting SageMaker jobs, polling status, downloading artifacts, listing jobs, and cost estimation. The included Python scripts implement the advertised functionality and the reference docs describe required IAM roles and S3 setup.
Instruction Scope
SKILL.md and references limit actions to packaging source, calling AWS (boto3) APIs, and running local smoke tests. The instructions require AWS credentials (via standard boto3 chain) and reference only expected paths and endpoints (S3, SageMaker, CloudWatch). There are no instructions to read unrelated system files or exfiltrate data to unexpected endpoints. The smoke test and packaging steps do create temporary files and upload to S3 as expected.
Install Mechanism
No install spec (instruction-only) — scripts rely on python3 and Python packages (boto3, optional sagemaker). Nothing is downloaded from arbitrary URLs or extracted; the skill ships its Python scripts and docs. This is a low-risk install pattern for this kind of tool.
Credentials
Primary declared variable is AWS_DEFAULT_REGION (region), and the skill relies on boto3's normal credential chain (instance profile, AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY, or configured profile). The SKILL.md and references clearly explain the need for AWS credentials and specific IAM roles. It would be clearer if requires.env explicitly listed the possible credential env vars, but the current setup (using the standard boto3 chain and recommending instance profiles) is proportionate to the skill's purpose.
Persistence & Privilege
The skill is not always-enabled and allows user invocation. It does not request system-wide privileges or modify other skills. It performs normal actions (create S3 objects, call SageMaker APIs) with the provided AWS permissions; these are expected for the stated purpose.
Assessment
This skill appears to do exactly what it claims. Before installing or running it:

1. Be prepared to provide AWS credentials (prefer an EC2 instance profile or a scoped IAM user) and create two IAM roles with the least privilege needed for S3 and SageMaker as described.
2. Review the source packaging --dry-run output to avoid unintentionally uploading secrets (don't point --source-dir at your home directory).
3. Run the smoke test in a controlled account/bucket to verify behavior and cost (it will submit a real SageMaker job and incur charges).
4. Ensure the Caller role is tightly scoped to your S3 bucket and the PassRole action is limited to the specific SageMaker execution role ARN.

If you want stricter metadata, ask the maintainer to declare the AWS credential env vars explicitly in requires.env so the platform makes credential needs clearer.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Bins: python3
Primary env: AWS_DEFAULT_REGION
Latest: vk97dhtj2h9dd1dkh4dzc4nj6xn848hwt
Downloads: 96 · Stars: 1 · Versions: 3
Updated 3w ago
v1.0.2
MIT-0

SageMaker Training

Submit ML training jobs to AWS SageMaker from the command line. Supports PyTorch, TensorFlow, scikit-learn, and XGBoost with managed spot training for cost savings.

Prerequisites

  • boto3 Python package installed (pip install boto3); the sagemaker package is recommended but optional
  • AWS credentials available — EC2 instance profile (recommended), or aws configure / env vars (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  • S3 bucket for training artifacts
  • Two IAM roles configured — see references/setup.md for exact policies:
    • Role A (Caller): SageMaker job management + S3 access + ECR image pull
    • Role B (Execution): S3 data access + CloudWatch logs + ECR images

Security Notes

  • AWS credentials are never logged, embedded in scripts, or uploaded to S3. boto3 resolves credentials from the standard chain (environment variables → shared credentials/config file → instance profile).
  • Source packaging excludes .git, .env, venv, __pycache__, and other non-essential files. Use --source-dir to explicitly scope what gets packaged. Always review --dry-run output before submitting to production.
  • IAM scope: Both caller and execution role policies should be scoped to your specific S3 bucket and SageMaker execution role ARN. See references/setup.md.
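The exclusion rules above can be sketched with a stdlib tar filter. This is a minimal illustration, not the skill's actual packaging code (which lives in scripts/sagemaker_train.py); the exclusion set mirrors the patterns listed here.

```python
import tarfile
from pathlib import Path

# Hypothetical exclusion set mirroring the patterns this page lists;
# the real list may be longer.
EXCLUDE = {".git", ".env", "venv", "__pycache__"}

def tar_filter(info: tarfile.TarInfo):
    """Drop any member whose path contains an excluded component."""
    if any(part in EXCLUDE for part in Path(info.name).parts):
        return None  # returning None skips this member entirely
    return info

def package_source(source_dir: str, out_path: str = "source.tar.gz") -> str:
    """Tar up source_dir, silently omitting excluded files and directories."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(source_dir, arcname=".", filter=tar_filter)
    return out_path
```

Reviewing the --dry-run output remains the authoritative check, since a filter like this only catches known patterns, not arbitrary secrets.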

Quick Start

1. Write a training script

Follow the SageMaker training script contract: read data from SM_CHANNEL_TRAIN, save model to SM_MODEL_DIR. See references/training-scripts.md for templates.
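A minimal sketch of the contract, assuming the standard SageMaker container environment variables (SM_CHANNEL_TRAIN, SM_MODEL_DIR); the CSV layout, the "target" column name, and the trivial mean "model" are illustrative only — see references/training-scripts.md for real templates.

```python
import csv
import json
import os
import pickle
import statistics

def main():
    # SageMaker injects these paths inside the training container;
    # the defaults below are the conventional container locations.
    train_dir = os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    hyperparams = json.loads(os.environ.get("SM_HPS", "{}"))

    # "Training": compute the mean of an illustrative target column.
    # A real script would fit PyTorch/XGBoost/etc. here, using hyperparams.
    values = []
    with open(os.path.join(train_dir, "train.csv")) as f:
        for row in csv.DictReader(f):
            values.append(float(row["target"]))
    model = {"mean": statistics.mean(values), "hyperparams": hyperparams}

    # Anything written to SM_MODEL_DIR is tarred and uploaded as the
    # job's model artifact when the container exits successfully.
    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, "model.pkl"), "wb") as f:
        pickle.dump(model, f)

# In the container this runs as: python train.py -> main()
```

The key invariant: read inputs only from the channel directories, write the model only to SM_MODEL_DIR, and exit zero on success.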

2. Submit a training job

python3 scripts/sagemaker_train.py \
  --job-name my-experiment-001 \
  --script ./train.py \
  --role arn:aws:iam::ACCOUNT:role/SageMakerRole \
  --bucket my-sagemaker-bucket \
  --instance-type ml.g5.xlarge \
  --spot \
  --framework pytorch \
  --input-data s3://my-bucket/data/train/ \
  --hyperparameters '{"epochs":"50","lr":"0.001"}' \
  --output-dir ./results

The script packages your code, uploads to S3, submits the job, polls until complete, and downloads model artifacts to --output-dir.
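Under the hood, a submission like this boils down to a SageMaker CreateTrainingJob API call. The sketch below builds a plausible request body using real field names from that API; the exact values (image URI, volume size, S3 prefix) are assumptions, and the skill's actual request construction lives in scripts/sagemaker_train.py.

```python
def build_training_request(job_name: str, role_arn: str, bucket: str,
                           instance_type: str, image_uri: str,
                           spot: bool = False, max_runtime: int = 3600) -> dict:
    """Assemble a CreateTrainingJob request dict (field names per the
    SageMaker API; defaults here are illustrative)."""
    req = {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,  # the execution role SageMaker assumes
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime},
    }
    if spot:
        # Managed spot requires a max wait time >= the max runtime.
        req["EnableManagedSpotTraining"] = True
        req["StoppingCondition"]["MaxWaitTimeInSeconds"] = max_runtime
    return req

# Submitting is then roughly:
#   boto3.client("sagemaker").create_training_job(**req)
```

Seeing the request shape helps when reading --dry-run output or debugging a rejected submission.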

3. Check cost

# Estimate before running
python3 scripts/sagemaker_cost.py --instance-type ml.g5.xlarge --duration 3600 --spot

# Check actual cost after job completes
python3 scripts/sagemaker_cost.py --job-name my-experiment-001
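The estimate is simple arithmetic: hourly rate × duration, with a discount applied for spot. A back-of-envelope sketch, using the on-demand rates quoted later on this page (which may be out of date) and an assumed 70% spot discount as the upper bound:

```python
# Illustrative on-demand prices (USD/hr) from this page's instance guidance;
# check current AWS pricing before relying on them.
ON_DEMAND = {
    "ml.m5.2xlarge": 0.54,
    "ml.g4dn.xlarge": 0.74,
    "ml.g5.xlarge": 1.41,
    "ml.p3.2xlarge": 4.28,
}

def estimate_cost(instance_type: str, duration_s: int,
                  spot: bool = False, spot_discount: float = 0.7) -> float:
    """Rough job cost in USD; spot_discount=0.7 is the optimistic end of
    the 30-70% savings range, not a guarantee."""
    cost = ON_DEMAND[instance_type] * duration_s / 3600
    if spot:
        cost *= 1 - spot_discount
    return round(cost, 4)
```

The real sagemaker_cost.py presumably also reads actual billed seconds from the completed job description, which a static table like this cannot do.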

4. List recent jobs

python3 scripts/sagemaker_list.py --max 5
python3 scripts/sagemaker_list.py --status Failed

Key Options

Flag             Purpose                                                 Default
--spot           Managed spot training (up to 70% savings)               off
--instance-type  Compute instance                                        ml.g5.xlarge
--max-runtime    Kill job after N seconds                                3600
--framework      pytorch, tensorflow, sklearn, xgboost                   pytorch
--image-uri      Custom Docker image (overrides framework)               auto
--requirements   requirements.txt for extra deps                         none
--dry-run        Print config without submitting                         off
--no-wait        Submit and exit without polling                         off
--resume JOB     Reconnect to a running/completed job (skip submission)  none
--source-dir     Directory with all training code                        script's parent
--input-data     S3 input(s), format: channel:s3://...                   none
--env            JSON environment variables                              {}
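The channel:s3://... format for --input-data maps a named SageMaker input channel (exposed in the container as SM_CHANNEL_&lt;NAME&gt;) to an S3 prefix. A plausible parser for that format — an assumption about how the script interprets the flag, including the "train" default when no channel prefix is given:

```python
def parse_input_data(spec: str) -> tuple[str, str]:
    """Split 'channel:s3://...' into (channel, uri).

    A bare 's3://...' with no channel prefix is assumed to default to
    the 'train' channel.
    """
    if ":" in spec and not spec.startswith("s3://"):
        channel, uri = spec.split(":", 1)
        return channel, uri
    return "train", spec
```

So `--input-data validation:s3://my-bucket/data/val/` would surface inside the container as SM_CHANNEL_VALIDATION.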

Instance Selection

For tabular/Kaggle workloads:

  • Gradient boosting (LightGBM/XGBoost): ml.m5.2xlarge (CPU, $0.54/hr)
  • Small neural nets: ml.g4dn.xlarge (T4, $0.74/hr) — cheapest GPU
  • Standard deep learning: ml.g5.xlarge (A10G, $1.41/hr) — best price/performance
  • Heavy training: ml.p3.2xlarge (V100, $4.28/hr)

Always use --spot for non-urgent training — typical savings of 30-70%.

Workflow Integration

For autonomous agents running training jobs in a loop:

  1. Prepare data locally or upload to S3
  2. Write training script following the contract in references/training-scripts.md
  3. Use --dry-run first to validate config
  4. Submit with sagemaker_train.py — it blocks until completion by default
  5. Results download automatically to --output-dir
  6. Parse metrics from the output for experiment tracking

For parallel experiments, use --no-wait and poll with sagemaker_list.py.
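A minimal polling loop for the parallel case might look like the sketch below. The describe callable is injected so the loop is testable; in practice it would wrap the real SageMaker API, e.g. `boto3.client("sagemaker").describe_training_job(TrainingJobName=name)["TrainingJobStatus"]`. This is an illustration of the pattern, not the code inside sagemaker_list.py.

```python
import time

# Terminal statuses per the SageMaker TrainingJobStatus field.
TERMINAL = {"Completed", "Failed", "Stopped"}

def wait_for_jobs(job_names, describe, interval: float = 30.0) -> dict:
    """Poll describe(name) -> status until every job is terminal.

    Returns {job_name: final_status}. `describe` is any callable
    returning a TrainingJobStatus string.
    """
    pending = set(job_names)
    results = {}
    while pending:
        for name in list(pending):
            status = describe(name)
            if status in TERMINAL:
                results[name] = status
                pending.discard(name)
        if pending:
            time.sleep(interval)  # avoid hammering the API
    return results
```

With --no-wait you submit N jobs up front, then hand all N names to a loop like this and branch on the Failed ones.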

Smoke Test

Verify the entire pipeline works end-to-end (~$0.01, takes ~3 min):

python3 scripts/sagemaker_smoke_test.py \
  --role arn:aws:iam::ACCOUNT:role/SageMakerTrainingExecutionRole \
  --bucket my-sagemaker-bucket

This runs a local pre-flight, submits a minimal job to SageMaker, verifies the downloaded model artifact, and checks cost. Use --keep to preserve output files.
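The local pre-flight step can be approximated by cheap format checks before any AWS call is made. The checks below are illustrative guesses at what a pre-flight might validate (the real ones live in scripts/sagemaker_smoke_test.py); the regexes encode the standard IAM role ARN shape and S3 bucket naming rules.

```python
import re

# 12-digit account ID, then a role path/name of IAM-legal characters.
ROLE_ARN_RE = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")
# Simplified S3 bucket rule: 3-63 chars, lowercase/digits/dots/hyphens,
# starting and ending with a letter or digit.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def preflight(role_arn: str, bucket: str) -> list[str]:
    """Return a list of problems; empty means the cheap checks passed."""
    errors = []
    if not ROLE_ARN_RE.match(role_arn):
        errors.append(f"malformed role ARN: {role_arn}")
    if not BUCKET_RE.match(bucket):
        errors.append(f"invalid bucket name: {bucket}")
    return errors
```

Checks like these fail in milliseconds instead of after a job submission, which matters when each failed submission still costs a round trip to AWS.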
