Test Time Compute Guide
v1.0.0Learn to enhance LLM performance using test-time compute with parallel sampling, sequential revision, and process reward models for better reasoning.
MIT-0
Security Scan
OpenClaw
Benign
high confidencePurpose & Capability
Name/description (test-time compute, CoT, PRMs) align with the content. Declared dependencies (Transformers, custom PRM) are reasonable for implementing the described techniques. No unrelated environment variables, binaries, or config paths are requested.
Instruction Scope
SKILL.md contains explanatory text and small example snippets (Python pseudocode) showing how to implement parallel sampling, sequential revision, and beam search with a PRM. It does not instruct the agent to read arbitrary files, access unrelated environment variables, or transmit data to unexpected external endpoints.
Install Mechanism
No install spec is provided (instruction-only), so nothing will be downloaded or written to disk by the skill itself. This is the lowest-risk installation model.
Credentials
The skill does not request credentials or environment variables. The suggested dependencies (Transformers, custom PRM) are appropriate for the task. There are no suspiciously broad or unrelated secret requirements.
Persistence & Privilege
always is false and the skill is user-invocable only. It does not request permanent presence or make changes to other skills or system-wide settings.
Assessment
This is a documentation/guide skill with example code — not an executable package. Before implementing the examples, review any third-party PRM or model code you incorporate (the guide mentions a "custom process reward model") and avoid sending sensitive data to remote APIs or untrusted models. Expect higher compute and token costs when using best-of-N/parallel sampling and beam-search strategies. If you later install or copy third-party PRM implementations, vet their source, dependencies, and network behavior before running them.Like a lobster shell, security has layers — review code before you run it.
ailatestllmreasoningtest-time-compute
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
SKILL.md
test-time-compute-guide
Description
Master test-time compute and chain-of-thought reasoning techniques for LLMs. Learn how to effectively use "thinking time" to improve model performance through parallel sampling, sequential revision, and process reward models.
Implementation
Test-time compute (TTC) and Chain-of-Thought (CoT) have led to significant improvements in LLM performance. The core idea is enabling models to "think" longer before producing final answers, similar to human System 2 thinking.
Key Concepts:
- Parallel Sampling: Generate multiple outputs simultaneously and select the best using verifiers or process reward models
- Sequential Revision: Iteratively refine responses by asking the model to reflect on and correct mistakes
- Process Reward Models (PRM): Guide beam search candidate selection during decoding
- Self-Consistency: Use majority voting among multiple CoT rollouts when ground truth isn't available
When to Use Each Approach:
- Easier questions: Benefit from purely sequential test-time compute
- Harder questions: Perform best with optimal ratio of sequential to parallel compute
Code Examples
Example 1: Basic Chain-of-Thought Prompting
def cot_prompt(problem):
"""Generate chain-of-thought prompt for math problems"""
return f"""Solve this step by step:
Problem: {problem}
Let's think step by step:
"""
# Usage
problem = "What is 12345 times 56789?"
prompt = cot_prompt(problem)
Example 2: Best-of-N Sampling
import random
def best_of_n_sampling(model, prompt, n=5, scorer=None):
"""Generate N samples and return the highest scoring one"""
samples = []
for _ in range(n):
sample = model.generate(prompt, temperature=random.uniform(0.7, 1.2))
score = scorer(sample) if scorer else len(sample) # Simple length-based scoring
samples.append((sample, score))
return max(samples, key=lambda x: x[1])[0]
Example 3: Beam Search with Process Reward
def beam_search_with_prm(model, prm_model, prompt, beam_width=5, max_steps=10):
"""Beam search guided by process reward model"""
beams = [(prompt, 0.0)] # (sequence, cumulative_reward)
for step in range(max_steps):
candidates = []
for seq, reward in beams:
# Generate next tokens
next_tokens = model.generate_next_tokens(seq, top_k=beam_width)
for token in next_tokens:
new_seq = seq + token
# Get process reward for this step
step_reward = prm_model.evaluate(new_seq)
candidates.append((new_seq, reward + step_reward))
# Keep top beam_width candidates
beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_width]
return beams[0][0] # Return highest reward sequence
Dependencies
- Python 3.8+
- Transformers library (for LLM integration)
- Custom process reward model implementation
Files
2 totalSelect a file
Select a file to preview.
Comments
Loading comments…
