
v0.1.1 · MIT-0 · by ALAGENT-HKU (@patrick-lew)

X2Strategy

Any research input → Strategy spec → Executable code → Backtest → Diagnosis.

Capabilities

| Capability | What it does | Deep dive |
|---|---|---|
| paper2spec | Any document (PDF/MD/DOCX/TXT) → structured strategy specification | references/paper2spec.md |
| spec2code | Strategy spec → Backtrader code → validate → backtest → diagnosis | references/spec2code.md |

Input format auto-detected from extension:

| Format | Extension | Notes |
|---|---|---|
| PDF (papers) | .pdf | PyMuPDF → Mode A (direct) or Mode B (FAISS) |
| Markdown (drafts) | .md, .markdown | Direct text read |
| DOCX (reports) | .docx | python-docx (requires `uv sync --extra docx`) |
| Plain text | .txt | Direct read |
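A minimal sketch of the extension-based detection, assuming a simple suffix lookup (the real dispatch inside scripts/parse.py may differ):

```python
from pathlib import Path

# Hypothetical mapping from file extension to parser mode; mirrors the
# table above but is not the skill's actual implementation.
FORMAT_BY_EXT = {
    ".pdf": "pdf",
    ".md": "markdown",
    ".markdown": "markdown",
    ".docx": "docx",
    ".txt": "text",
}

def detect_format(path: str) -> str:
    """Return the input format inferred from the file extension."""
    ext = Path(path).suffix.lower()
    try:
        return FORMAT_BY_EXT[ext]
    except KeyError:
        raise ValueError(f"Unsupported input format: {ext}")
```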

Interaction Principles

You are the executor. The user is the requester.

  • Run tools silently, present results and insights in natural language.
  • Never show CLI commands (uv run python scripts/...) unless user asks.
  • Offer next actions conversationally: "Would you like me to implement the second strategy as well?"

When reporting results, focus on what you found, not how:

❌ Bad:  "I ran `uv run python scripts/analyze.py paper.pdf` and got 3 strategies."
✅ Good: "This paper contains 3 independent strategies: [1] minimum distance method, [2] ADF stationarity, and [3] Johansen cointegration. Which one should I implement?"

Use interactive tools aggressively. When your platform provides interactive question tools — vscode_askQuestions (VS Code Copilot), AskUserQuestion (Claude Code), or equivalent — use them for ALL user-facing choices. Interactive tools present clickable options, which is faster and less error-prone than asking the user to type.

Apply interactive tools to:

  • First-Run Setup choices (workspace path, API provider, key input)
  • Gate 1 confirmation (proceed / adjust settings)
  • Gate 2 action menu (implement / deep dive / compare / adjust / export / re-extract)
  • Search result selection (pick papers from a numbered list)
  • Any scenario where the user picks from options

If no interactive tool is available, fall back to numbered text menus.
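The numbered-text fallback can be sketched with two small helpers, `render_menu` and `parse_choice` (both hypothetical names), where `parse_choice` accepts replies like "2" or "1, 3" as in the search-result gate:

```python
def render_menu(prompt: str, options: list[str]) -> str:
    """Render a numbered text menu as a single string."""
    lines = [prompt] + [f"  {i}. {opt}" for i, opt in enumerate(options, 1)]
    return "\n".join(lines)

def parse_choice(reply: str, options: list[str]) -> list[str]:
    """Parse replies like '2' or '1, 3' into the chosen options."""
    picks = [int(tok) for tok in reply.replace(",", " ").split()]
    return [options[i - 1] for i in picks]
```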


First-Run Setup

On first use, walk through three steps. Skip any already-configured step. Persist all choices to .env (gitignored) for session stability.

Step 1 — Workspace Location

Present choice via interactive tool:

  • ./library/ (default, recommended)
  • Custom path

Write PAPER2SPEC_LIBRARY_PATH=/absolute/path to .env. Scan the directory for existing metadata.json to detect prior analyses.
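A sketch of this step, assuming hypothetical helpers `persist_library_path` and `find_prior_analyses` (the skill only specifies the variable name and the metadata.json convention):

```python
from pathlib import Path

def persist_library_path(path: str, env_file: str = ".env") -> None:
    # Append the chosen workspace to .env as an absolute path.
    line = f"PAPER2SPEC_LIBRARY_PATH={Path(path).resolve()}\n"
    with open(env_file, "a", encoding="utf-8") as f:
        f.write(line)

def find_prior_analyses(library: str) -> list[Path]:
    # Each previously analyzed paper directory contains a metadata.json.
    return sorted(Path(library).glob("*/metadata.json"))
```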

Step 2 — LLM API Key

Check env for DEEPSEEK_API_KEY, OPENROUTER_API_KEY, OPENAI_API_KEY. If none found, present via interactive tool:

An LLM API key is required for strategy extraction and code generation. Recommended options:
  1. DeepSeek (best cost-performance, about ¥0.7 per paper) → https://platform.deepseek.com
  2. OpenRouter (one key for access to multiple models) → https://openrouter.ai/keys
Please provide your API key and tell me which provider it belongs to.

Do NOT check for or suggest ANTHROPIC_API_KEY.

Once received, write key + matching model to .env, then verify: uv run python -c "from paper2spec.llm import chat; print(chat('Say OK'))".

See references/skill-internals.md for .env format examples per provider.
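The key-detection order can be sketched as below; only openai/gpt-4o-mini is a documented default here, and the DeepSeek/OpenRouter model strings are illustrative assumptions:

```python
import os

# Provider priority as described above; ANTHROPIC_API_KEY is deliberately
# excluded. DeepSeek/OpenRouter model names are illustrative assumptions.
PROVIDERS = [
    ("DEEPSEEK_API_KEY", "deepseek/deepseek-chat"),
    ("OPENROUTER_API_KEY", "openrouter/deepseek/deepseek-chat"),
    ("OPENAI_API_KEY", "openai/gpt-4o-mini"),
]

def detect_provider(env=os.environ):
    """Return (env var, model) for the first configured key, else None."""
    for key, model in PROVIDERS:
        if env.get(key):
            return key, model
    return None  # no key found: trigger the interactive prompt
```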

Step 3 — Python Environment

cd <skill-path>
uv sync --all-extras    # Recommended: installs everything

If uv is unavailable: pip install -e ".[codegen,agent,dev]". When uv is available, always execute scripts with uv run (it auto-activates the correct venv).

See references/skill-internals.md for selective install options and non-uv alternatives.

Completion

Once configured, confirm naturally with examples:

✅ Setup complete. You can now ask me for tasks directly, for example:

  • "Analyze this paper" + attach a PDF file
  • "Search for papers about momentum trading"
  • "Implement this strategy based on this paper" + provide the file path
  • "I wrote a strategy draft in Markdown; extract the spec and generate code"
  • "Compare the strategy differences between these two papers"

Just tell me what you want to do, and I will handle the rest.

Routing

| User Intent | Route | Action |
|---|---|---|
| "Analyze this paper/doc" | paper2spec | Parse + extract specs |
| "Search for papers about X" | paper2spec | Search → Gate 1 |
| "Here's my strategy draft" (MD/DOCX/TXT) | paper2spec | Auto-detect format, extract |
| "Generate code / Implement this" | spec2code | Spec → code → validate → backtest |
| "Run a backtest" | spec2code | Execute strategy.py |
| "End to end from paper" | both | paper2spec → Gate 2 → spec2code |
| "Compare results with paper" | spec2code | Read backtest output + spec, compare metrics |
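As a rough illustration, the routing table could be approximated with keyword matching (the agent's actual intent detection is semantic, not string-based; the keyword sets here are assumptions):

```python
# Hypothetical keyword-based router mirroring the routing table.
# Order matters: the end-to-end route is checked before sub-routes.
ROUTES = [
    (("end to end",), "both"),
    (("implement", "generate code", "backtest", "compare results"), "spec2code"),
    (("analyze", "search", "draft"), "paper2spec"),
]

def route(intent: str) -> str:
    text = intent.lower()
    for keywords, dest in ROUTES:
        if any(k in text for k in keywords):
            return dest
    return "paper2spec"  # default: start from document analysis
```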

Interaction Gates

Two mandatory HITL gates. Skip only when user says "fully automatic" / "end to end without stopping".

Always present gate choices through interactive tools when available.

Gate 1 — Input Confirmation

When: After receiving/finding input, BEFORE extraction.

Three scenarios — present via interactive tool (or numbered text menu):

Scenario A — User provided a file:

📄 Received: "Tactical Asset Allocation" (Faber, 2007)
   Format: PDF, 18 pages
   Abstract: [first 2 sentences]

I'll extract trading strategies. ~30-60s, ~$0.01.
→ Proceed with extraction?
→ Or adjust settings first? (parser mode, model, output location)

Scenario B — Search results returned:

🔍 Found 8 papers for "momentum trading strategy":
  1. ⭐ "Time Series Momentum" (Moskowitz et al., 2012) — 847 citations
  2. "Momentum Crashes" (Daniel & Moskowitz, 2016) — 523 citations
  ...
Which paper to analyze? (pick number, "1, 3" for multiple, or refine search)

Do NOT auto-analyze. Always let user pick.

Scenario C — Raw text / strategy idea:

📝 I see you've described: "[brief summary]"
   I'll structure this into a formal spec. → Proceed? → Add more details first?

Keep it light for straightforward inputs — single confirm with default-proceed.

Gate 2 — Spec Review & Action Menu

When: After extraction completes, BEFORE code generation.

Show extraction summary, then present action menu via interactive tool:

✅ Strategy Extraction Complete
📋 Paper: "Pairs Trading: Does Volatility Timing Matter?"
   Detected: 3 independent strategies

   [1] Minimum Distance Method
       • 4 indicators (spread, SMA, Z-score, distance)
       • Entry: spread Z-score > 2σ, Exit: mean reversion

   [2] Stationarity-Based (ADF Test)
       • 3 indicators, Entry: cointegrated pair + spread deviation

   [3] Cointegration (Johansen)
       • 5 indicators, Entry: Johansen test + Z-score threshold

Then 6 actions:

  1. 🚀 Implement — Generate executable code (pick strategy # or "all")
  2. 🔍 Deep dive — Explain a strategy's logic in detail
  3. 📊 Compare — Side-by-side of detected strategies
  4. ✏️ Adjust — Modify spec parameters/constraints
  5. 💾 Export only — Save specs, stop here
  6. 🔄 Re-extract — Different model or parser mode

Key behaviors:

  • "Implement" → confirm which strategy index before generating code.
  • "Deep dive" → explain, then return to the same menu.
  • After code gen + backtest → present results, offer next decision.
  • Never silently chain extraction → code generation.

Gate Bypass

If user says "end to end" / "fully automatic" / "don't stop", collapse gates into inline status:

📄 Parsing paper... ✓ (3 strategies detected)
💻 Generating code for strategy 1... ✓
📊 Running backtest... ✓
📈 Results ready — see below.

Still stop on unexpected issues (0 strategies, errors, validation failures).


Agent Pipeline Flow

1. Receive input (file / search query / text)
2. ── Gate 1: Input Confirmation ──
3. [paper2spec] Parse document, extract strategy specs
4. ── Gate 2: Spec Review & Action Menu ──
5. User selects strategy + action
6. [spec2code] For each selected strategy:
   a. Read spec.json + reference docs
   b. Generate self-contained Backtrader strategy.py
   c. Validate (AST + structural checks)
   d. Run backtest, compare metrics vs paper
7. Present results + diagnosis
8. Offer next actions
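Step 6c's validation can be sketched as a syntax check plus one structural check (the real scripts/validate_strategy.py is presumably stricter):

```python
import ast

def validate_strategy(source: str) -> list[str]:
    """Minimal sketch of step 6c: parse the generated file and check
    that some class subclasses Strategy (bt.Strategy or Strategy).
    Returns a list of error strings; empty means the checks passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc}"]
    has_strategy = any(
        isinstance(node, ast.ClassDef)
        and any(
            (isinstance(b, ast.Attribute) and b.attr == "Strategy")
            or (isinstance(b, ast.Name) and b.id == "Strategy")
            for b in node.bases
        )
        for node in ast.walk(tree)
    )
    if not has_strategy:
        return ["no class subclassing bt.Strategy found"]
    return []
```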

For code generation patterns: references/spec2code.md
For Backtrader patterns: references/backtrader_patterns.md


Internal Toolchain

Agent-only. Run silently; present results in natural language.

# End-to-end: any document → spec
uv run python scripts/analyze.py <file> -o library/<slug>/

# Validate generated code
uv run python scripts/validate_strategy.py library/<slug>/strategy_1.py

# Run backtest
uv run python library/<slug>/strategy_1.py

# Search papers
uv run python scripts/search.py "<query>" -n 5

# Step-by-step
uv run python scripts/parse.py <file> -o content.json
uv run python scripts/extract.py content.json -o spec.json

For full flags, output formats, and library management: references/skill-internals.md


Configuration

| Variable | Default | Purpose |
|---|---|---|
| PAPER2SPEC_LIBRARY_PATH | ./library | Output root |
| PAPER2SPEC_MODEL | openai/gpt-4o-mini | Default LLM (litellm-supported) |
| DEEPSEEK_API_KEY | — | DeepSeek (recommended) |
| OPENROUTER_API_KEY | — | OpenRouter (multi-model) |
| OPENAI_API_KEY | — | OpenAI direct |
| PAPER2SPEC_ARXIV_MIN_INTERVAL | 3.0 | Seconds between arXiv requests |
| PAPER2SPEC_SEARCH_MAX_RETRIES | 3 | Retry on HTTP 429/5xx |

Any litellm-supported model works. The --model flag on any script overrides PAPER2SPEC_MODEL. Full config + .env examples: references/skill-internals.md
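Reading this configuration with its documented defaults might look like the following (`load_config` is a hypothetical helper, not part of the skill's API):

```python
import os

def load_config(env=os.environ) -> dict:
    """Read the documented environment variables with their defaults."""
    return {
        "library_path": env.get("PAPER2SPEC_LIBRARY_PATH", "./library"),
        "model": env.get("PAPER2SPEC_MODEL", "openai/gpt-4o-mini"),
        "arxiv_min_interval": float(env.get("PAPER2SPEC_ARXIV_MIN_INTERVAL", "3.0")),
        "search_max_retries": int(env.get("PAPER2SPEC_SEARCH_MAX_RETRIES", "3")),
    }
```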


References

Read on demand for implementation details:

  • references/paper2spec.md — document parsing and spec extraction
  • references/spec2code.md — code generation, validation, backtest, diagnosis
  • references/backtrader_patterns.md — Backtrader patterns for generated code
  • references/skill-internals.md — .env formats, script flags, library management

Limitations

  • Mode A truncates at 100K chars (first 90K + last 10K). Use Mode B for >100 page papers.
  • Tables/formulas: not yet extracted from PDFs.
  • Multi-strategy: conservative — may merge borderline-distinct strategies.
  • DOCX: paragraph text only (tables, images not preserved — use PDF for rich docs).
  • SSRN search: best-effort HTML scraping, may break on layout changes.
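The Mode A truncation described above (first 90K + last 10K characters past the 100K limit) amounts to:

```python
def truncate_mode_a(text: str, limit: int = 100_000,
                    head: int = 90_000, tail: int = 10_000) -> str:
    """Keep the first 90K and last 10K chars when text exceeds 100K."""
    if len(text) <= limit:
        return text
    return text[:head] + text[-tail:]
```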
