Numerai Tournament

v1.0.0

Autonomous Numerai tournament participation — train models, submit predictions, and earn NMR cryptocurrency.

0· 327·0 current·0 all-time
byVeselin Vasilev@obekt
Security Scan
VirusTotalVirusTotal
Benign
View report →
OpenClawOpenClaw
Benign
high confidence
Purpose & Capability
Name/description ask for Numerai tournament functionality and the SKILL.md only requests Python, pip, and Numerai API credentials (public id + secret key) — all directly relevant to downloading datasets, training models, and submitting predictions.
Instruction Scope
Instructions describe creating a credentials file (~/.numerai/credentials.json) or setting the two environment variables, installing Python packages, downloading datasets from numer.ai, training LightGBM models locally, and submitting predictions via the NumerAPI client. There is no guidance to read unrelated files, exfiltrate arbitrary data, or post to endpoints outside numer.ai.
Install Mechanism
This is an instruction-only skill with no install spec or remote downloads; recommended installs are via pip (numerapi, lightgbm, etc.), which is appropriate and expected for this purpose.
Credentials
Only NUMERAI_PUBLIC_ID and NUMERAI_SECRET_KEY are required and the secret key is declared as the primary credential. No unrelated secrets or system config paths are requested.
Persistence & Privilege
The skill is not marked always:true and does not request persistent system-wide changes. Autonomous invocation is enabled by default (normal for skills) and is consistent with the skill's purpose of automated submissions.
Assessment
This skill appears coherent and requests only the Numerai API credentials and Python tooling it needs. Before installing, verify you trust the skill source (none provided here). Treat the NUMERAI_SECRET_KEY like any other API secret: store it securely (chmod 600, use a dedicated credentials file or environment variables), consider using a separate Numerai account with limited NMR if you plan to allow autonomous submissions, and review any executable code if the skill later adds code files. Expect large data (~8GB) and significant local compute for training; run in an environment where that is acceptable. If you want stronger assurance, ask the publisher for a code repository or a vetted implementation you can inspect before giving the secret key.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Binspython3, pip
EnvNUMERAI_PUBLIC_ID, NUMERAI_SECRET_KEY
Primary envNUMERAI_SECRET_KEY
cryptocurrencyvk976qyhhqzw5vyprv1r9fe8qqs81yg00data-sciencevk976qyhhqzw5vyprv1r9fe8qqs81yg00financevk976qyhhqzw5vyprv1r9fe8qqs81yg00latestvk976qyhhqzw5vyprv1r9fe8qqs81yg00lightgbmvk976qyhhqzw5vyprv1r9fe8qqs81yg00machine-learningvk976qyhhqzw5vyprv1r9fe8qqs81yg00numeraivk976qyhhqzw5vyprv1r9fe8qqs81yg00tradingvk976qyhhqzw5vyprv1r9fe8qqs81yg00
327downloads
0stars
1versions
Updated 1mo ago
v1.0.0
MIT-0

Numerai Tournament

Participate autonomously in the Numerai data science tournament. Numerai is a hedge fund that crowdsources stock market predictions from data scientists. You submit predictions on obfuscated financial data and earn (or lose) NMR cryptocurrency based on performance.

Overview

  • What: Predict stock market returns using obfuscated tabular features
  • How: Download data, train a model, submit predictions each round
  • Reward: Stake NMR on predictions; earn or lose based on correlation with targets
  • Frequency: New rounds open Tue–Sat at 13:00 UTC; scores resolve ~31 days later

Setup

1. Create a Numerai Account

# Visit https://numer.ai to sign up
# Then create API keys at https://numer.ai/account
# Store credentials:
mkdir -p ~/.numerai
cat > ~/.numerai/credentials.json << 'CREDS'
{
  "public_id": "YOUR_PUBLIC_ID",
  "secret_key": "YOUR_SECRET_KEY"
}
CREDS
chmod 600 ~/.numerai/credentials.json

Alternatively, set environment variables:

export NUMERAI_PUBLIC_ID="YOUR_PUBLIC_ID"
export NUMERAI_SECRET_KEY="YOUR_SECRET_KEY"

2. Install Dependencies

python3 -m venv venv && source venv/bin/activate
pip install numerapi lightgbm pandas numpy cloudpickle scikit-learn

On macOS ARM (Apple Silicon), LightGBM also requires:

brew install libomp

3. Download Tournament Data

from numerapi import NumerAPI
from pathlib import Path

napi = NumerAPI()  # No auth needed for data download
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

# Current dataset version is v5.2
napi.download_dataset("v5.2/train.parquet", dest_path=str(data_dir / "train.parquet"))
napi.download_dataset("v5.2/validation.parquet", dest_path=str(data_dir / "validation.parquet"))
napi.download_dataset("v5.2/live.parquet", dest_path=str(data_dir / "live.parquet"))
napi.download_dataset("v5.2/features.json", dest_path=str(data_dir / "features.json"))
napi.download_dataset("v5.2/live_benchmark_models.parquet", dest_path=str(data_dir / "live_benchmark_models.parquet"))

Note: Training data is ~8GB. Only live.parquet and features.json are needed for prediction.

Training a Model

The recommended approach is a LightGBM ensemble trained on multiple targets. This provides strong and stable performance.

Feature Selection

import json

with open("data/features.json") as f:
    feature_metadata = json.load(f)

# Three feature set sizes:
# "small"  — ~42 features (fast iteration)
# "medium" — ~780 features (good tradeoff)
# "all"    — ~2748 features (maximum signal, slow)
features = feature_metadata["feature_sets"]["medium"]

Target Selection

The main target is target. Additional targets improve ensemble diversity:

TargetDescription
targetPrimary tournament target
target_teager2b_20Current payout-correlated target
target_cyrusd_20Complementary target for ensemble diversity

LightGBM Training

import lightgbm as lgb
import pandas as pd
import pickle

train = pd.read_parquet("data/train.parquet", columns=["era"] + features + targets)

lgbm_params = {
    "n_estimators": 5000,       # Use 20000 for production quality
    "learning_rate": 0.005,
    "max_depth": 6,
    "num_leaves": 64,
    "min_child_samples": 5000,
    "colsample_bytree": 0.1,
    "subsample": 0.8,
    "subsample_freq": 1,
    "reg_alpha": 0.1,
    "reg_lambda": 1.0,
    "verbose": -1,
    "n_jobs": -1,
}

models = {}
for target in targets:
    X = train[features]
    y = train[target]
    mask = y.notna()
    model = lgb.LGBMRegressor(**lgbm_params)
    model.fit(X[mask], y[mask])
    models[target] = model

with open("models/ensemble_models.pkl", "wb") as f:
    pickle.dump(models, f)

Validation

Evaluate per-era correlation and Sharpe ratio:

val = pd.read_parquet("data/validation.parquet", columns=["era"] + features + targets)
predictions = pd.DataFrame(index=val.index)

for target, model in models.items():
    raw = model.predict(val[features])
    predictions[target] = pd.Series(raw, index=val.index).rank(pct=True)

ensemble = predictions.mean(axis=1).rank(pct=True)

corrs = []
for era in val["era"].unique():
    m = val["era"] == era
    pred_era = ensemble[m]
    tgt = val.loc[m, "target"]
    if tgt.notna().sum() >= 10:
        corrs.append(pred_era.corr(tgt))

corrs = pd.Series(corrs)
print(f"Mean Corr: {corrs.mean():.4f}")
print(f"Sharpe:    {corrs.mean() / corrs.std():.2f}")
print(f"% Positive: {(corrs > 0).mean() * 100:.1f}%")

Target validation performance: Mean Corr > 0.02, Sharpe > 1.0, >90% positive eras.

Submitting Predictions

Option A: Upload a Predictions CSV (Manual)

import json
from numerapi import NumerAPI

with open("~/.numerai/credentials.json") as f:
    creds = json.load(f)

napi = NumerAPI(creds["public_id"], creds["secret_key"])

# Check round status
current_round = napi.get_current_round()
is_open = napi.check_round_open()
print(f"Round {current_round}, Open: {is_open}")

if is_open:
    # Download live data
    napi.download_dataset("v5.2/live.parquet", dest_path="data/live.parquet")
    live = pd.read_parquet("data/live.parquet")

    # Generate predictions (same ensemble logic as validation)
    predictions = pd.DataFrame(index=live.index)
    for target, model in models.items():
        raw = model.predict(live[features])
        predictions[target] = pd.Series(raw, index=live.index).rank(pct=True)
    ensemble = predictions.mean(axis=1).rank(pct=True)

    # Save and submit
    submission = ensemble.to_frame("prediction")
    submission.to_csv("predictions.csv")
    napi.upload_predictions("predictions.csv", model_id="YOUR_MODEL_ID")

Option B: Upload a Model Pickle (Zero-Maintenance)

Upload a pickled function and Numerai runs it daily — no cron, no server.

Critical constraints for model upload:

  • Must be a pickled function (not a class), loaded via pd.read_pickle()
  • Must use Python 3.12 (Numerai's max supported version)
  • Must match Numerai runtime packages: lightgbm==4.5.0, numpy==2.1.3, pandas==2.3.1
  • Runtime limits: 1 CPU, 4GB RAM, 10 minute timeout
  • Use native LightGBM Boosters (not sklearn wrappers) to avoid dependency issues
# Build the upload pickle (run with Python 3.12!)
import cloudpickle
import lightgbm as lgb
import pandas as pd
import pickle

# Load trained sklearn models and extract native boosters
with open("models/ensemble_models.pkl", "rb") as f:
    sklearn_models = pickle.load(f)

boosters = {}
for name, model in sklearn_models.items():
    bstr = model.booster_.model_to_string()
    boosters[name] = lgb.Booster(model_str=bstr)

feature_cols = features  # medium feature set list
models = boosters

def predict(live_features: pd.DataFrame, live_benchmark_models: pd.DataFrame = None) -> pd.DataFrame:
    predictions = pd.DataFrame(index=live_features.index)
    for target, booster in models.items():
        raw = booster.predict(live_features[feature_cols])
        predictions[target] = pd.Series(raw, index=live_features.index).rank(pct=True)
    ensemble = predictions.mean(axis=1).rank(pct=True)
    return ensemble.to_frame("prediction")

with open("models/model_upload.pkl", "wb") as f:
    cloudpickle.dump(predict, f)

Then upload via the Numerai web UI at https://numer.ai or via API:

napi.upload_model("models/model_upload.pkl", model_id="YOUR_MODEL_ID")

Checking Performance

from numerapi import NumerAPI

napi = NumerAPI(public_id, secret_key)

# Round status
print(f"Current round: {napi.get_current_round()}")

# Get model performance (scores resolve after ~31 days)
# Check via https://numer.ai/models/YOUR_USERNAME

Tournament Rules & Key Facts

  • Dataset: v5.2 — obfuscated financial features, ~2748 total features
  • Rounds: Open Tue–Sat at 13:00 UTC. Weekday windows: ~1hr. Saturday: ~49hrs.
  • Scoring: 20D2L framework, ~31 day resolution
  • Payout formula: stake * clip(payout_factor * (0.75*CORR + 2.25*MMC), -0.05, +0.05)
    • CORR = correlation of predictions with target
    • MMC = meta-model contribution (originality bonus)
  • Staking: Optional — stake NMR to earn/lose based on performance. Start with 0 stake until the model proves consistent.
  • Current payout target: Resembles target_teager2b_20

Tips for Strong Performance

  1. Ensemble multiple targets — reduces variance, improves Sharpe
  2. Rank-normalize predictions — use .rank(pct=True) before averaging and after
  3. Use early stopping — prevent overfitting with lgb.early_stopping(300)
  4. Feature neutralization — improves MMC by decorrelating from common factors
  5. Era-aware validation — always evaluate per-era, never row-level metrics
  6. Don't overfit to validation — Numerai data is non-stationary; keep models simple

External Endpoints

This skill interacts with the following external services:

  • api.numer.ai — Numerai GraphQL API for round status, submissions, and scores
  • numer.ai — Data downloads (tournament datasets)

Security & Privacy

  • Your NUMERAI_PUBLIC_ID and NUMERAI_SECRET_KEY are sent to api.numer.ai for authentication
  • Predictions (stock return rankings) are uploaded to Numerai's servers
  • No other data leaves your machine
  • Store credentials in ~/.numerai/credentials.json with chmod 600 permissions

Comments

Loading comments...