Test Time Compute Guide

v1.0.0

Learn to enhance LLM performance using test-time compute with parallel sampling, sequential revision, and process reward models for better reasoning.

MIT-0
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (test-time compute, CoT, PRMs) align with the content. Declared dependencies (Transformers, custom PRM) are reasonable for implementing the described techniques. No unrelated environment variables, binaries, or config paths are requested.
Instruction Scope
SKILL.md contains explanatory text and small example snippets (Python pseudocode) showing how to implement parallel sampling, sequential revision, and beam search with a PRM. It does not instruct the agent to read arbitrary files, access unrelated environment variables, or transmit data to unexpected external endpoints.
Install Mechanism
No install spec is provided (instruction-only), so nothing will be downloaded or written to disk by the skill itself. This is the lowest-risk installation model.
Credentials
The skill does not request credentials or environment variables. The suggested dependencies (Transformers, custom PRM) are appropriate for the task. There are no suspiciously broad or unrelated secret requirements.
Persistence & Privilege
The "always" flag is false and the skill is user-invocable only. It does not request permanent presence or make changes to other skills or system-wide settings.
Assessment
This is a documentation/guide skill with example code — not an executable package. Before implementing the examples, review any third-party PRM or model code you incorporate (the guide mentions a "custom process reward model") and avoid sending sensitive data to remote APIs or untrusted models. Expect higher compute and token costs when using best-of-N/parallel sampling and beam-search strategies. If you later install or copy third-party PRM implementations, vet their source, dependencies, and network behavior before running them.

Like a lobster shell, security has layers — review code before you run it.

Tags: ai · latest · llm · reasoning · test-time-compute

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

test-time-compute-guide

Description

Master test-time compute and chain-of-thought reasoning techniques for LLMs. Learn how to effectively use "thinking time" to improve model performance through parallel sampling, sequential revision, and process reward models.

Implementation

Test-time compute (TTC) and Chain-of-Thought (CoT) have led to significant improvements in LLM performance. The core idea is enabling models to "think" longer before producing final answers, similar to human System 2 thinking.

Key Concepts:

  • Parallel Sampling: Generate multiple outputs simultaneously and select the best using verifiers or process reward models
  • Sequential Revision: Iteratively refine responses by asking the model to reflect on and correct mistakes
  • Process Reward Models (PRM): Guide beam search candidate selection during decoding
  • Self-Consistency: Use majority voting among multiple CoT rollouts when ground truth isn't available
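The self-consistency idea above reduces to a majority vote over the final answers extracted from several CoT rollouts. A minimal sketch (it assumes answer extraction from each rollout has already happened):

```python
from collections import Counter

def self_consistency_vote(answers):
    """Return the most common final answer among multiple CoT rollouts."""
    counts = Counter(answers)
    # most_common(1) returns [(answer, count)] for the top answer
    answer, _ = counts.most_common(1)[0]
    return answer

# Majority vote across five rollouts of the same problem
final = self_consistency_vote(["42", "41", "42", "42", "43"])
```

Because no verifier is needed, this is the usual fallback when ground truth or a reward model is unavailable.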

When to Use Each Approach:

  • Easier questions: Benefit from purely sequential test-time compute
  • Harder questions: Perform best with optimal ratio of sequential to parallel compute
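One toy way to act on this guidance is to split a fixed generation budget between sequential revision rounds and parallel samples based on an estimated difficulty score. This is a hypothetical heuristic, not a tuned policy; `allocate_compute` and its 0-to-1 `difficulty` input are assumptions for illustration:

```python
def allocate_compute(difficulty, budget=16):
    """Split a fixed generation budget between parallel and sequential compute.

    difficulty: estimated problem difficulty in [0, 1] (hypothetical score).
    Easy problems (difficulty near 0) get mostly sequential revisions;
    hard problems shift the budget toward parallel samples.
    """
    parallel_samples = max(1, round(budget * difficulty))
    sequential_rounds = max(1, budget // parallel_samples)
    return {"parallel_samples": parallel_samples,
            "sequential_rounds": sequential_rounds}
```

In practice the difficulty estimate itself might come from a cheap first pass or a learned predictor.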

Code Examples

Example 1: Basic Chain-of-Thought Prompting

def cot_prompt(problem):
    """Generate chain-of-thought prompt for math problems"""
    return f"""Solve this step by step:

Problem: {problem}

Let's think step by step:
"""

# Usage
problem = "What is 12345 times 56789?"
prompt = cot_prompt(problem)

Example 2: Best-of-N Sampling

import random

def best_of_n_sampling(model, prompt, n=5, scorer=None):
    """Generate N samples and return the highest scoring one"""
    samples = []
    for _ in range(n):
        sample = model.generate(prompt, temperature=random.uniform(0.7, 1.2))
        score = scorer(sample) if scorer else len(sample)  # fallback: naive length-based score when no verifier is supplied
        samples.append((sample, score))
    
    return max(samples, key=lambda x: x[1])[0]

Example 3: Beam Search with Process Reward

def beam_search_with_prm(model, prm_model, prompt, beam_width=5, max_steps=10):
    """Beam search guided by a process reward model.

    model.generate_next_tokens and prm_model.evaluate are assumed
    interfaces (pseudocode), not calls from a specific library.
    """
    beams = [(prompt, 0.0)]  # (sequence, cumulative_reward)
    
    for step in range(max_steps):
        candidates = []
        for seq, reward in beams:
            # Generate next tokens
            next_tokens = model.generate_next_tokens(seq, top_k=beam_width)
            for token in next_tokens:
                new_seq = seq + token
                # Get process reward for this step
                step_reward = prm_model.evaluate(new_seq)
                candidates.append((new_seq, reward + step_reward))
        
        # Keep top beam_width candidates
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_width]
    
    return beams[0][0]  # Return highest reward sequence
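Example 4: Sequential Revision

The sequential-revision strategy listed under Key Concepts can be sketched as follows, assuming the same `model.generate(prompt)` interface used in the earlier examples:

```python
def sequential_revision(model, problem, num_rounds=3):
    """Iteratively ask the model to critique and correct its own answer."""
    answer = model.generate(f"Solve this step by step:\n\nProblem: {problem}\n")
    for _ in range(num_rounds):
        revision_prompt = (
            f"Problem: {problem}\n\n"
            f"Previous answer:\n{answer}\n\n"
            "Review the answer above for mistakes. If you find any, "
            "write a corrected answer; otherwise restate it unchanged."
        )
        answer = model.generate(revision_prompt)
    return answer

# Usage with a toy stand-in model (any object exposing .generate works)
class EchoModel:
    """Stand-in that just numbers its responses."""
    def __init__(self):
        self.calls = 0
    def generate(self, prompt):
        self.calls += 1
        return f"draft {self.calls}"

model = EchoModel()
final_answer = sequential_revision(model, "What is 12345 times 56789?")
```

Each round spends one extra generation on reflection, so the token cost grows linearly with `num_rounds`.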

Dependencies

  • Python 3.8+
  • Transformers library (for LLM integration)
  • Custom process reward model implementation

Files

2 total
