Paper Results Reverse Engineer

Stable v3.0.3 release of the psychology Results reverse-engineering skill. Uses three-axis classification, study-profile-first workflow, design-adaptive branches (A–I), source verification, causal-language-by-design guardrails, anti-template-contamination checks, and branch-specific rules for experiments, surveys, RCTs, psychometrics, meta-analyses, neuroimaging coordinate-based meta-analyses, qualitative studies (thematic analysis, IPA, GT, CGT), and methodological simulations. Default output: standard mode. Optional Module H Writer Transfer Packet for downstream use with academic-results-writer.

Install

openclaw skills install paper-results-reverse-engineer

Paper Results Reverse Engineer (v3.0.3)

Deconstruct and learn from the Results section of psychology papers across all subfields. The Results section is a guided narrative, not a data dump — this skill reverse-engineers that narrative.

Supported subfields: Cognitive / Social / Personality / Developmental / Educational / Clinical / Psychometrics / Cognitive Neuroscience / fMRI / EEG / Meta-analysis / Qualitative / Mixed methods / Methodological / Simulation.

When to Use

Activate when the user pastes a Results section, uploads a PDF, provides figure captions + paragraphs, or requests "拆这篇结果部分" ("break down this Results section") / "这张图怎么讲" ("how should I present this figure") / writing-strategy extraction / PPT scripts / statistical reporting checks.

Output Depth Modes

| Mode | Trigger | Output |
|------|---------|--------|
| quick | "快速看一下" ("take a quick look") / "大概拆一下" ("break it down roughly") | Study Profile + B + D (core figures) + E + self-check. No Module C or F. |
| standard (default) | (no mode specified) / "正常生成" ("generate normally") | Study Profile + A–G. Module C: paragraph/cluster level (2–4 clusters per ¶). Module F: PPT page suggestions + one-liners + evidence boundaries only. |
| close-reading | "逐句拆解" ("sentence-by-sentence breakdown") / "完整精读" ("full close reading") / "做 PPT" ("make slides") / "汇报讲稿" ("presentation script") | Study Profile + A–G at max depth. Module C: sentence-level. Module F: full verbatim scripts + Q&A + backup slides. Phased execution allowed for long papers. |

Mode-adaptive chat prompt (mandatory): After each output, add one line indicating the current mode and available alternatives. See docs/execution-constraints.md for the full template.
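
The mode selection above can be sketched as a small lookup. This is a minimal illustration, not the skill's actual routing: `resolve_mode` and `mode_footer` are hypothetical names, substring matching is an assumption, and the footer wording is a placeholder for the real template in docs/execution-constraints.md.

```python
# Hypothetical helper. The trigger phrases come from the mode table;
# matching them as substrings is an assumption of this sketch.
MODE_TRIGGERS = {
    "quick": ("快速看一下", "大概拆一下"),
    "close-reading": ("逐句拆解", "完整精读", "做 PPT", "汇报讲稿"),
    "standard": ("正常生成",),
}


def resolve_mode(request: str) -> str:
    """Return the output depth mode for a user request."""
    for mode, phrases in MODE_TRIGGERS.items():
        if any(phrase in request for phrase in phrases):
            return mode
    return "standard"  # default when no mode is specified


def mode_footer(mode: str) -> str:
    """One-line note naming the current mode and the alternatives (placeholder wording)."""
    others = [m for m in MODE_TRIGGERS if m != mode]
    return f"Current mode: {mode}. Also available: {', '.join(others)}."
```

A request with no recognised trigger falls through to standard, which mirrors the "(no mode specified)" row.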

Execution Constraints (Hard Limits)

  1. Self-Check: Max 1 per file. Auto-Patch: Max 1 after failed check, then stop and report.
  2. No recursive self-invocation — prompt user for new analysis rounds.
  3. PDF read limit: 2 reads per round max.
  4. Chat output: file-first — only path + 3–5 core findings + self-check + manual-review items.
  5. Context overflow: stop, prompt user to split into phases.

Full details: docs/execution-constraints.md
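
Limits 1 and 3 are countable, so they can be pictured as a per-round budget. The sketch below is only an illustration under that assumption; `RoundBudget` and its method names are hypothetical and do not come from docs/execution-constraints.md.

```python
from dataclasses import dataclass

MAX_SELF_CHECKS_PER_FILE = 1
MAX_AUTO_PATCHES = 1
MAX_PDF_READS_PER_ROUND = 2


@dataclass
class RoundBudget:
    """Hypothetical counter for the hard limits on self-checks, patches, and PDF reads."""
    self_checks: int = 0
    auto_patches: int = 0
    pdf_reads: int = 0

    def allow_pdf_read(self) -> bool:
        if self.pdf_reads >= MAX_PDF_READS_PER_ROUND:
            return False
        self.pdf_reads += 1
        return True

    def allow_self_check(self) -> bool:
        if self.self_checks >= MAX_SELF_CHECKS_PER_FILE:
            return False
        self.self_checks += 1
        return True

    def allow_auto_patch(self, check_failed: bool) -> bool:
        # A patch is only allowed after a failed check, and only once.
        if not check_failed or self.auto_patches >= MAX_AUTO_PATCHES:
            return False
        self.auto_patches += 1
        return True
```

When any method returns False, the expected behaviour is to stop and report rather than retry.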

Supporting-File Loading Policy (Mandatory)

Before executing any branch-specific rule, module specification, or guardrail that references a docs/ file, read the corresponding file. The condensed rules in this SKILL.md are summaries; the full validated rule set is in docs/.

Docs reading table — read the file when the trigger condition is met:

| Trigger | Read |
|---------|------|
| Branch B/C/E/F active | docs/branch-a-b-c-d-e-f.md |
| Branch G active | docs/branch-g-meta-analysis.md |
| Branch H active | docs/branch-h-qualitative.md |
| Branch I active | docs/branch-i-simulation.md |
| Module H triggered | docs/module-h-spec.md |
| G0–G8 self-check / source verification / B0 heading detection | docs/source-verification.md |
| Causal-language audit / PPT three-layer separation | docs/causal-language-guardrails.md |
| G3 anti-template contamination check | docs/anti-template-contamination.md |
| Long PDF / phased execution / Phase cleanup | docs/execution-constraints.md |

Fail-open rule: If the required supporting file cannot be accessed, do NOT claim the full detailed rule set was applied. Continue with the condensed SKILL.md rules and explicitly report: supporting-file unavailable; condensed-rule mode used.
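
The loading policy and the fail-open rule together amount to "try to read the file, and say so if you could not". A minimal sketch, assuming the docs live under a local root directory; `load_branch_rules` and `BRANCH_DOCS` are hypothetical names, and only the branch rows of the reading table are encoded.

```python
from pathlib import Path
from typing import Optional, Tuple

# Branch rows copied from the docs reading table.
BRANCH_DOCS = {
    "B": "docs/branch-a-b-c-d-e-f.md",
    "C": "docs/branch-a-b-c-d-e-f.md",
    "E": "docs/branch-a-b-c-d-e-f.md",
    "F": "docs/branch-a-b-c-d-e-f.md",
    "G": "docs/branch-g-meta-analysis.md",
    "H": "docs/branch-h-qualitative.md",
    "I": "docs/branch-i-simulation.md",
}
FALLBACK_NOTICE = "supporting-file unavailable; condensed-rule mode used"


def load_branch_rules(branch: str, root: Path) -> Tuple[Optional[str], Optional[str]]:
    """Return (rules_text, notice); notice is set only when falling back."""
    rel = BRANCH_DOCS.get(branch)
    if rel is None:
        return None, None  # no supporting file listed for this branch
    try:
        return (root / rel).read_text(encoding="utf-8"), None
    except OSError:
        # Fail open: continue with condensed rules, but report it explicitly.
        return None, FALLBACK_NOTICE
```

The notice string is returned rather than logged so that the caller can place it in the final report.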

Input Types

| # | Input | Handle by |
|---|-------|-----------|
| 1 | Full Results section | Study Profile → Modules A–G |
| 2 | Single Results subsection | B–E; flag missing context |
| 3 | Figure caption + paragraph | D + C; brief Study Profile |
| 4 | Abstract + Methods + Results | Study Profile → Full A–G |
| 5 | PDF excerpt (pasted) | Treat as type 1–3 |
| 6 | PDF file upload | pdftotext → detect Results → A–G |
| 7 | Figure/table screenshot | Vision model → cross-ref caption/text → D |
| 8 | Open-ended request | Clarify; route to appropriate modules |

Prompt templates: references/prompt-templates.md


Workflow: Study Profile First

Phase 0.3: Three-Axis Classification (Mandatory — Before Study Profile)

Classify every paper on three independent axes before filling the Study Profile. Never use single-axis labels.

| Axis | Name | Values (non-exhaustive) |
|------|------|-------------------------|
| Axis 1 | Article Type | Experiment / Survey / Longitudinal / RCT / Psychometric validation / Meta-analysis / Methodological simulation / Qualitative / Mixed / Review |
| Axis 2 | Substantive Domain | Cognitive / Social / Developmental / Educational / Clinical / Cognitive Neuroscience / fMRI / EEG / Psychometrics / Health / Sleep / Meta-science |
| Axis 3 | Data/Method Modality | Behavioral accuracy/RT / Questionnaire scores / Clinical diagnosis / fMRI activation / fMRI RSA/MVPA / ALE coordinates / EEG/ERP / SEM/mediation / Meta-analytic ES / Monte Carlo simulation / Qualitative themes |

Axis 1 determines Adaptive Branch (A–I). Axis 2 and 3 are secondary tags guiding terminology, chart types, and interpretation boundaries.

Critical distinction: Never conflate Axis 1 with Axis 2. Studying meta-analytic methods ≠ doing a meta-analysis (→ Branch I, not G). fMRI experiment ≠ fMRI coordinate-based meta-analysis (→ Branch F, not G subbranch).
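
The "route on Axis 1 only" idea can be illustrated with a small record type. This is a sketch under assumptions: `Classification` and `primary_branch` are hypothetical names, and only the article types with an unambiguous branch are mapped. Anything else, such as an experiment that may belong to Branch A or Branch F, returns None and is left for manual routing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Classification:
    article_type: str  # Axis 1
    domain: str        # Axis 2
    modality: str      # Axis 3


# Partial mapping; an assumption of this sketch, not the full routing rule set.
AXIS1_TO_BRANCH = {
    "Survey": "B",
    "RCT": "C",
    "Psychometric validation": "E",
    "Meta-analysis": "G",
    "Qualitative": "H",
    "Methodological simulation": "I",
}


def primary_branch(c: Classification) -> Optional[str]:
    """Route on Axis 1 only; Axis 2 and Axis 3 never override it."""
    return AXIS1_TO_BRANCH.get(c.article_type)


# A simulation study about meta-analytic methods: the domain touches
# meta-analysis, but the article type is a simulation, so it goes to Branch I.
paper = Classification("Methodological simulation", "Meta-science", "Monte Carlo simulation")
print(primary_branch(paper))  # I
```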

Study Profile Template

## Study Profile

### 三轴分类
| 轴 | 类别 | 值 | 来源 |
|----|------|----|------|
| Axis 1 | Article Type | ... | [原文Methods] |
| Axis 2 | Substantive Domain | ... | [正文推断] |
| Axis 3 | Data/Method Modality | ... | [原文Methods] |
| Primary Branch | Branch A–I | ... | [教学性说明] |

### 基本信息
| 维度 | 内容 | 来源 |
|------|------|------|
| 样本信息 | N, population, age, sex, inclusion/exclusion | [原文Methods] |
| 任务或测量工具 | Task/questionnaire/interview/intervention | [原文Methods] |
| 核心变量 | IV/DV or predictor/outcome or mediator/moderator | [原文Methods] |
| 主要统计方法 | t/ANOVA/regression/SEM/meta-analysis/thematic analysis etc. | [原文Methods] |
| Results 小节标题 | (list all Results subsection titles) | [原文直接报告] |
| 核心表格和图表 | (Table/Figure numbers + brief description) | [caption] |
| 理论/模型预期 | (Introduction 中的理论预测 — NOT study's own hypotheses) | [原文Introduction] |
| 本研究直接检验的问题 | (Author's explicit questions in Introduction/Results) | [原文直接报告] |
| ⚠️ 假设性质说明 | (If "Intriguingly"/"Surprisingly" appear, note possible non-a-priori) | [正文推断] |
| Results 直接发现 | (1-2 sentence summary) | [原文直接报告] |
| Discussion 中的解释 | (Author's interpretation in Discussion) | [原文Discussion] |

Rules: Every field with source tag. N from Methods, never from df. [无法确定] if unavailable. Split hypothesis fields: 理论预期 / 检验问题 / Discussion 解释 are distinct. Task terminology must match paper (recognition ≠ recall).
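
The "every field with source tag" rule lends itself to a mechanical check. The sketch below assumes the Study Profile has already been parsed into a mapping of field name to (content, source); `fields_missing_source` is a hypothetical helper and the sample values are invented for illustration.

```python
import re
from typing import Dict, List, Tuple

# A source cell must be exactly one bracketed tag, e.g. [原文Methods].
SOURCE_TAG = re.compile(r"\[[^\[\]]+\]")


def fields_missing_source(profile: Dict[str, Tuple[str, str]]) -> List[str]:
    """List Study Profile fields whose source cell is not a bracketed tag."""
    return [
        field
        for field, (_content, source) in profile.items()
        if not SOURCE_TAG.fullmatch(source.strip())
    ]


# Hypothetical profile rows, for illustration only.
profile = {
    "样本信息": ("N = 48", "[原文Methods]"),
    "主要统计方法": ("[无法确定]", "[无法确定]"),
    "核心变量": ("IV: condition; DV: accuracy", ""),
}
print(fields_missing_source(profile))  # ['核心变量']
```

The check only confirms that a tag is present; whether the tag is the right one still needs comparison against the paper.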

Phase 0.5: Evidence Validation Rules (Mandatory)

Rule 1: Day/Session Strong Evidence Rule

Only write "Day1/Day2" / "两天实验" ("two-day experiment") when the paper explicitly uses these markers. For procedural "then/after/subsequently" without day markers → use phase-based description. Likewise, never infer "同一天" ("same day") from the absence of markers.
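
Rule 1 can be approximated with a marker search. This is a rough sketch: the regular expression only recognises English "Day N" / "Session N" forms, which is an assumption about what counts as an explicit marker, and `timing_description` is a hypothetical name.

```python
import re

# Assumed marker forms: "Day 1", "Day2", "Session 2" (case-insensitive).
DAY_MARKER = re.compile(r"\b(?:day|session)\s*\d+\b", re.IGNORECASE)


def timing_description(methods_text: str) -> str:
    """Choose day-based wording only when the text carries an explicit marker."""
    if DAY_MARKER.search(methods_text):
        return "day-based"
    # No marker: describe phases, and do not claim "same day" either.
    return "phase-based"
```

Words such as "then" or "subsequently" deliberately do not count as markers here.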

Rule 2: Stimulus Pool vs Actual Task Exposure

Separate "候选材料池" (the candidate stimulus pool) from "实际任务数量" (the number of items actually presented in the task). If Methods says "12 videos created, participants viewed 10" → report "10 videos viewed," not "12 videos."

Rule 3: Study Design Taxonomy

Use precise labels. Never write "observational" for controlled laboratory tasks. "Within-subject experiment" is valid. Only use "RCT" with random assignment. Only use "observational" with no stimulus/condition manipulation. Distinguish cluster vs individual randomization.

Rule 4: Closed-Loop Phase Precision Guardrail

For CL-TMR / closed-loop auditory stimulation papers, always separate into four timing components:

  • (a) Detection phase — when the algorithm detects a target event (e.g., SO up-state)
  • (b) Stimulus onset delay — fixed or variable delay between detection and stimulus delivery
  • (c) Stimulus duration — actual length of the auditory stimulus
  • (d) Actual stimulation phase variability — where does the sound actually fall relative to the ongoing oscillation?

Never infer "down-state stimulation" unless the paper explicitly reports it. Never infer "phase-locked stimulation" unless the paper reports measured phase precision metrics.

Supplementary guardrail: If Supplementary material contains phase analysis but was not read in quick/standard mode, mark ⚠️ Supplementary phase analysis not read; actual stimulation phase unverified in G5 manual review. Never draw conclusions about stimulation phase from the main text alone.

Rule 5: Sham-Control Trial Type Distinction

Distinguish between:

  • Physiological sham/control trials — within-participant control trials where target EEG events are detected but no stimulus is delivered (e.g., "sham" in CL-TMR)
  • Behavioral control conditions — separate experimental conditions manipulating task parameters during wake
  • Active acoustic control — a different sound delivered during sleep (e.g., white noise, reversed speech)

If sham = no sound (silent SO detection), do NOT label it as "active control" or "acoustic control." Label it as "physiological sham (no stimulus delivered)." Note the limitation: silent sham cannot control for non-specific arousal effects of sound presentation.
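
The three control types can be told apart with two questions: did the trial occur during sleep, and was a sound delivered? The sketch below reduces the rule to those two flags, which is a simplifying assumption; `ControlType` and `label_control` are hypothetical names.

```python
from enum import Enum


class ControlType(Enum):
    PHYSIOLOGICAL_SHAM = "physiological sham (no stimulus delivered)"
    ACTIVE_ACOUSTIC = "active acoustic control"
    BEHAVIORAL = "behavioral control condition"


def label_control(during_sleep: bool, sound_delivered: bool) -> ControlType:
    """Label a control condition without upgrading a silent sham to an active control."""
    if not during_sleep:
        return ControlType.BEHAVIORAL
    if sound_delivered:
        return ControlType.ACTIVE_ACOUSTIC
    # Event detected, nothing played: cannot control for non-specific arousal from sound.
    return ControlType.PHYSIOLOGICAL_SHAM
```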


Phase 1: Adaptive Branch Selection

Based on Axis 1 (Article Type):

| Branch | Article Type | Key Focus |
|--------|--------------|-----------|
| A | Experiment with random assignment | Manipulation check, main effect, interaction, simple effects, post-hoc, ES |
| B | Survey / Correlational | Descriptive, reliability, correlation, regression, mediation, moderation |
| C | Intervention / RCT | Baseline, CONSORT flow, primary outcome, secondary, AE, follow-up |
| D | Developmental / Educational | Age/grade differences, growth curve, multilevel, measurement invariance |
| E | Psychometric / Scale Development | Item analysis, EFA/CFA, reliability, validity (convergent/discriminant/criterion), invariance |
| F | Neuroimaging / fMRI / EEG | Task phase, neural measure, ROI/electrode, activation/RSA/ERP, multiple comparison correction |
| G | Meta-analysis / Systematic Review | Inclusion/exclusion, k, pooled ES, heterogeneity (Q/I²/τ²), moderator, bias, sensitivity |
| H | Qualitative | Coding, themes, subthemes, quotes, saturation, triangulation, reflexivity |
| I | Methodological / Simulation | Simulation factors, performance metrics (Type I error, power, RMSE, coverage), method comparison |

Branch-Specific Key Rules

| Branch | Key focus | Full spec |
|--------|-----------|-----------|
| B (Survey) | Cross-sectional mediation guardrail, hypothesis direction, measurement quality, internal inconsistency (B1–B9) | docs/branch-a-b-c-d-e-f.md |
| C (RCT) | AE/safety, clinical significance 6-layer, active comparator, Module B 14-block (C1–C6, C1a–C1h) | docs/branch-a-b-c-d-e-f.md |
| D (Developmental) | Age/group comparisons, longitudinal wording, nesting, measurement invariance | docs/branch-a-b-c-d-e-f.md |
| E (Psychometric) | Evidence taxonomy, diagnostic wording, cutoff, classic scale rule, table orientation (Rules 1–9) | docs/branch-a-b-c-d-e-f.md |
| F (fMRI/EEG) | Task-phase, correction method, ROI source, brain-behavior wording, mechanism guardrail | docs/branch-a-b-c-d-e-f.md |
| G (Meta-analysis) | Moderator guardrail, PRISMA, publication bias, coordinate-based meta subbranch (G1–G17) | docs/branch-g-meta-analysis.md |
| H (Qualitative) | Theme detection, reflexivity grading, intercoder reliability, demographic audit, IPA/GT/CGT subtypes (H1–H23) | docs/branch-h-qualitative.md |
| I (Simulation) | N/A rule, heatmap precision, evidence boundary, anti-template 4-tier (I1–I6) | docs/branch-i-simulation.md |

Phase 2: Modules A–G

All module content references the Study Profile and selected branch. Never carry over terms or statistics from a previous paper.

Module A: Study Profile Extended

The Study Profile from Phase 0, extended with three-axis fields first, then traditional fields. Always use source tags.

Module B: Results Structure Map

For each subsection/paragraph cluster: subsection title, question answered, data/analysis used, corresponding table/figure, main result (1–2 sentences), author's intended conclusion, annotation (original heading vs teaching supplement).

B0: Results Heading Detection Rule (Universal): Scan for ALL heading signals (bold, standalone phrases, Title Case, functional labels). Do not rely on Markdown ##/###. Always separate "原文显式小节标题" from "Skill 教学性补充分块 [教学性补充]". Never write "原文无显式小节标题" without full-text scan. Full specification: docs/source-verification.md

Module C: Results Paragraph/Sentence Annotation

Print label legend first. Then annotate per mode: quick → skip; standard → paragraph/cluster level (2–4 clusters per ¶, function label + one note); close-reading → sentence-level with individual annotations.

14 Function Labels: 1-Restate aim/Q | 2-Restate method | 3-Overview trend | 4-Invite to view figure/table | 5-Report specific result | 6-Report statistical evidence | 7-Evaluative emphasis | 8-Compare with prior work | 9-Compare with prediction/model | 10-Explain/interpret | 11-Note non-significant/inconsistent | 12-Acknowledge limitation | 13-Hint at implication | 14-Transition to Discussion

Label Rules: L1 — "presented in"/"see Fig." → Label 4 takes priority. L2 — missing-data/cannot-compute → Label 11+12. L3 — dual-purpose sentences may carry multiple labels. L4 — "Interestingly"/"Surprisingly" → add Label 7 + flag as potentially exploratory.

Detailed examples: references/function-labels.md
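
Label Rules L1, L2, and L4 are keyword-driven, so a first-pass suggestion can be sketched with a few patterns. The cue phrases for missing data are assumptions, `suggest_labels` is a hypothetical helper, and the result is only a starting point for manual annotation.

```python
import re
from typing import List

INVITE = re.compile(r"presented in|\bsee (?:fig|table)", re.IGNORECASE)
SURPRISE = re.compile(r"\b(?:interestingly|surprisingly)\b", re.IGNORECASE)
# Assumed cue phrases for rule L2; real papers vary.
MISSING = re.compile(r"missing data|could not be (?:computed|calculated)", re.IGNORECASE)


def suggest_labels(sentence: str) -> List[int]:
    """Suggest function labels from surface cues, primary label first."""
    labels: List[int] = []
    if INVITE.search(sentence):
        labels.append(4)       # L1: Label 4 takes priority, so it is listed first
    if MISSING.search(sentence):
        labels += [11, 12]     # L2: non-significant/inconsistent + limitation
    if SURPRISE.search(sentence):
        labels.append(7)       # L4: also flag as potentially exploratory
    return labels
```

Returning a list rather than a single label follows rule L3, which lets dual-purpose sentences carry several labels.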

Module D: Table/Figure Explanation

For core figures/tables: question answered, structure, author's guide sentence, key pattern, primary vs auxiliary, PPT narrative logic, 1-minute script (Chinese), easily misinterpreted points.

Figure analysis modes: Core hypothesis figures → full image mode (vision model). Supplementary → caption + body text mode. Flag: ⚠️ 未对此图进行图像分析 and use [caption] / [正文推断] tags only. Figure fallback rule: if image recognition fails → ⚠️ 图像识别失败; describe only what caption/body text confirms; never fabricate visual details.

Module E: Evidence Strength & Interpretation Boundary

Three layers separated: 原文直接结果 [原文直接报告] / 作者解释 [原文Discussion] / 教学性总结 [教学性说明].

Seven items: 1) Core claim 2) Evidence type 3) Alternative explanations 4) Evidence chain strength 5) Causal language audit 6) Missing links 7) What this study does NOT prove.

See docs/causal-language-guardrails.md for the full causal language ladder and three-layer separation rules.

Module F: PPT / Presentation Scripts

Output depth per mode. All modes enforce: three-layer separation (Result/Interpretation/Teaching), causal language check, branch-specific presentation angles. PPT scripts must never present Discussion interpretation as Results fact.

See docs/causal-language-guardrails.md for PPT causal language check rules.

Module G: Self-Check & Anti-Template Contamination

  • G0: Source verification. Compare generated claims against the original paper (not the generated file). Use the verification template with verbatim source quotes.
  • G1: File completeness. Search for truncated/TODO/待补充.
  • G2: Module completeness checklist.
  • G3: Anti-template contamination. Tier a (pollution, delete) / b (method background, allow) / c (N/A contrast, allow) / d (audit checklist only, allow).
  • G4: Task type confusion check.
  • G5: Manual review with Critical/Important/Minor grading.
  • G6: Time-structure audit.
  • G7: Source verification audit.
  • G8: Three-axis classification self-check.

Full specification: docs/source-verification.md and docs/anti-template-contamination.md
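
The G1 file-completeness check is essentially a text search. A minimal sketch, assuming the three markers are matched as literal strings (treating "truncated" as a literal marker is a guess); `find_placeholders` is a hypothetical name.

```python
from typing import List, Tuple

PLACEHOLDER_MARKERS = ("TODO", "待补充", "truncated")


def find_placeholders(markdown: str) -> List[Tuple[int, str]]:
    """Return (line number, marker) for every unfinished-content marker found."""
    hits = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        lowered = line.lower()
        for marker in PLACEHOLDER_MARKERS:
            if marker.lower() in lowered:
                hits.append((lineno, marker))
    return hits


report = "# Analysis\nTODO: fill Module D\nModule E complete\n待补充"
print(find_placeholders(report))  # [(2, 'TODO'), (4, '待补充')]
```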


Source Attribution Conventions

| Tag | Meaning |
|-----|---------|
| [原文直接报告] | Directly from Results/Methods |
| [原文Discussion] | Author's interpretation from Discussion |
| [原文Methods] | Factual details from Methods |
| [图片识别] | Read from figure via vision model |
| [正文推断] | Inferred from body text |
| [教学性说明] | Agent's educational commentary |
| [无法确定] | Cannot determine from available sources |

Author-disclosed vs skill-inferred rule: Never tag a limitation as [原文Discussion] unless authors explicitly state it. Full rules: docs/source-verification.md.


Statistical Language Rules

  • Derived Clinical Metric Rule: NNT/NNH/ARR/RR/OR/d calculated by skill → [Calculated by skill / 教学性计算].
  • Standardized Effect Size Precision Rule: When no Cohen's d/OR/RR reported → "No standardized between-group effect size was reported" (NOT "No effect size reported"). List what clinical effect information WAS reported.
  • Never fabricate statistics. Never rewrite one statistic as another (r ≠ t ≠ F). For model fit, report multiple indices, not just χ².
  • Full spec: docs/causal-language-guardrails.md
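
When the skill derives a clinical metric itself, the arithmetic is short: ARR is the control event rate minus the treatment event rate, and NNT is 1/ARR rounded up. The sketch below uses exact fractions to avoid floating-point rounding surprises. The function names are hypothetical and the event counts are invented for illustration, not taken from any paper.

```python
import math
from fractions import Fraction

SKILL_TAG = "[Calculated by skill / 教学性计算]"


def absolute_risk_reduction(ctrl_events: int, ctrl_n: int, trt_events: int, trt_n: int) -> Fraction:
    """ARR = control event rate minus treatment event rate (exact arithmetic)."""
    return Fraction(ctrl_events, ctrl_n) - Fraction(trt_events, trt_n)


def number_needed_to_treat(arr: Fraction) -> int:
    """NNT = 1 / ARR, rounded up; undefined when ARR is not positive."""
    if arr <= 0:
        raise ValueError("NNT is undefined for ARR <= 0")
    return math.ceil(1 / arr)


def report(ctrl_events: int, ctrl_n: int, trt_events: int, trt_n: int) -> str:
    arr = absolute_risk_reduction(ctrl_events, ctrl_n, trt_events, trt_n)
    nnt = number_needed_to_treat(arr)
    # Every derived value carries the skill-calculation tag, never a paper tag.
    return f"ARR = {float(arr):.3f}, NNT = {nnt} {SKILL_TAG}"


# Invented counts: 30/100 events in control, 20/100 in treatment.
print(report(30, 100, 20, 100))
# ARR = 0.100, NNT = 10 [Calculated by skill / 教学性计算]
```

With plain floats, 0.3 - 0.2 falls slightly below 0.1 and the rounded-up NNT would come out as 11, which is why the sketch keeps the subtraction exact.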

Causal Language Ladder (Summary)

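| Design | Allowed | Prohibited |
|--------|---------|------------|
| Cross-sectional / Correlational / Survey | 相关、关联、预测 (is related to, is associated with, predicts) | 导致、影响、证明机制 (causes, affects, proves the mechanism) |
| Experimental (random assignment) | 操纵X导致Y差异 (manipulating X caused a difference in Y) | (still note boundary conditions) |
| Longitudinal | X预测后续Y (X predicts subsequent Y) | X导致Y变化 (X causes change in Y) (without experiment) |
| RCT | 干预效果显著 (the intervention effect was significant) | (note attrition, baseline, blinding) |
| Meta-analysis | 总体证据显示、pooled effect 提示 (the overall evidence shows, the pooled effect suggests) | 单一实验因果证明、证明方案最优 (single-experiment causal proof, proves the protocol is optimal) |
| Qualitative | 主题显示、参与者叙述反映 (the themes show, participants' accounts reflect) | 统计因果 (statistical causation) |
| Simulation | 在这些模拟条件下 (under these simulation conditions) | 证明某方法最好、证明某效应不存在 (proves a method is best, proves an effect does not exist) |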

Universal prohibition (all non-manipulation studies): "证明" ("proves") / "直接导致" ("directly causes") / "确定是因为" ("is definitely because"). Full specification: docs/causal-language-guardrails.md
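
The ladder can be turned into a simple phrase audit. This is a sketch, not the full rule set in docs/causal-language-guardrails.md: the design keys are hypothetical, only literal phrases are matched, the longitudinal rule is reduced to the single word "导致", and treating experimental and RCT designs as the manipulation designs is an assumption.

```python
from typing import List

# Design-specific prohibited phrases, taken from the ladder summary.
PROHIBITED = {
    "cross-sectional": ("导致", "影响", "证明机制"),
    "longitudinal": ("导致",),
    "qualitative": ("统计因果",),
}
# Applies to every design that does not manipulate a variable.
UNIVERSAL = ("证明", "直接导致", "确定是因为")
MANIPULATION_DESIGNS = {"experimental", "rct"}


def audit_causal_language(sentence: str, design: str) -> List[str]:
    """Return the prohibited phrases that appear in a sentence for this design."""
    terms = list(PROHIBITED.get(design, ()))
    if design not in MANIPULATION_DESIGNS:
        terms += UNIVERSAL
    return sorted({term for term in terms if term in sentence})
```

A hit is a prompt for review rather than an automatic rewrite, since a phrase can appear inside a quotation or a negation.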

Mechanism-Wording Guardrail for EEG/ERP/ERSP Studies

When describing brain-behavior relationships in EEG/ERP/ERSP/MEG/fMRI studies:

  • Prohibited: "mediate/mediates/mediation" — unless the paper explicitly reports a formal statistical mediation model (e.g., bootstrap indirect effect, Sobel test, SEM path model)
  • Use instead: "correlate of," "marker of," "associated with," "may be related to," "predictor of" (for within-subject time-frequency analyses), "electrophysiological signature of"
  • When a formal mediation IS reported: still audit whether temporal precedence can be established (EEG data within same sleep epoch may not satisfy mediation assumptions)
  • Applies to: Module B (results structure), Module C (paragraph commentary), Module D (figure narration), Module E (evidence strength), Module F (PPT scripts), Study Profile (理论预期 field)
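
The wording guardrail above can be checked mechanically before manual review. A minimal sketch: `flag_mediation_wording` is a hypothetical helper, and whether a formal mediation model was reported has to be supplied by the caller from the paper itself.

```python
import re
from typing import List

# Matches mediate/mediates/mediated/mediating/mediation/mediator(s) as whole words,
# so "immediately" is not flagged.
MEDIATION_WORD = re.compile(r"\bmediat(?:e|es|ed|ing|ion|or|ors)\b", re.IGNORECASE)


def flag_mediation_wording(sentence: str, formal_model_reported: bool) -> List[str]:
    """Return mediation wording that lacks a formal mediation model behind it."""
    if formal_model_reported:
        # Wording is allowed; temporal precedence still needs a manual audit.
        return []
    return [match.group(0) for match in MEDIATION_WORD.finditer(sentence)]
```

Flagged sentences can then be rewritten with one of the permitted alternatives such as "associated with" or "marker of".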

Module H: Writer Transfer Packet (Optional)

Compressed transfer packet for the academic-results-writer Target-paper Results Style Adaptation Mode. Triggered by a user request for "写作迁移包" ("writing transfer packet") / "给 academic-results-writer 使用" ("for use with academic-results-writer").

Structure: H1-Source Identity, H2-Design Transfer Summary (with compatibility rating), H3-Results Organization Template (Transfer/Partial/Do not transfer), H4-Paragraph Writing Patterns (abstracted), H5-Figure/Table Narrative Patterns, H6-Results–Discussion Boundary, H7-Risk Flags, H8-Recommended Writer Mode.

Constraints: 1–2 pages max. No target paper original sentences — abstract function labels only. No target paper statistics for writer to apply. All target paper risks in H7. Partial extraction → H1 must mark coverage: partial. Design-incompatible → H8 must recommend fallback.

Full specification: docs/module-h-spec.md


File Output Template

→ Use the Module A–G headers listed in Phase 2 above, plus the Module H header when the Writer Transfer Packet is triggered. Metadata Date Safety Rule: never fabricate the generation date; use [无法确定] if it cannot be reliably confirmed via date/session_status.


Do-Not Rules (Core)

See Failure Modes table below for full list. Most critical:

  • ❌ Don't invent data / fabricate statistics / pull N from df / fabricate generation date.
  • ❌ Don't write Discussion as Results (three-layer separation).
  • ❌ Don't write correlation as causation; don't use "mediate" for EEG/ERP without formal mediation model.
  • ❌ Don't carry over previous paper terms (G3 anti-template contamination).
  • ❌ Don't mislabel: simulation ≠ meta-analysis (I vs G); lab task ≠ observational; sham ≠ active control.
  • ❌ Don't infer stimulation phase / day-session / metadata without explicit paper evidence.
  • ❌ Don't skip Study Profile, Module G, or label legend in Module C.
  • ❌ Don't print full analysis in chat (file-first). Self-check against original paper, not generated output.

Failure Modes (Summary)

| Failure | Prevention |
|---------|------------|
| Fabricated statistics | Enforce [无法确定] |
| N from df | Pull from Methods |
| Template pollution | G3 search |
| Discussion → Results | Three-layer separation |
| Correlation → causation | Causal ladder |
| "Mediate" without mediation model | Mechanism-wording guardrail |
| Phase inference without Supplementary | Closed-loop phase precision guardrail |
| Sham = "active control" | Sham-control distinction rule |
| Inferred limitation → [原文Discussion] | Author-disclosed vs skill-inferred rule |
| Fabricated generation date | Metadata date safety rule |
| Wrong branch | Study Profile → Axis 1 |
| Day/Session invented | Rule 1: require explicit markers |
| Stimulus pool = task count | Rule 2: separate pool vs actual |
| Lab task = observational | Rule 3: precise design taxonomy |
| Simulation → meta-analysis | I vs G distinction |
| ALE → pooled ES | G10 |
| Phase titles in merged file | Phase 5 merge back |
| Phase files not cleaned | Cleanup verification |

Output Directory & Naming

~/Desktop/OpenClaw_Paper_Analysis/
├── outputs_md/reverse_engineer/{FirstAuthor}_{Year}_Results_Reverse_Analysis.md
├── outputs_md/results_writer/
├── logs/
├── figures_notes/
└── templates/

Phase temp files: temp/{FirstAuthor}_{Year}/. Final output: only one H1 title, no Phase N titles.
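
The naming scheme above can be expressed as two path builders. This is a sketch: `output_path` and `temp_dir` are hypothetical helpers, the example author and year are invented, and the temp directory is kept relative because its parent folder is not stated here.

```python
from pathlib import Path

BASE_DIR = Path("~/Desktop/OpenClaw_Paper_Analysis")


def output_path(first_author: str, year: int) -> Path:
    """Final Markdown file for one reverse-engineering analysis."""
    name = f"{first_author}_{year}_Results_Reverse_Analysis.md"
    return BASE_DIR.expanduser() / "outputs_md" / "reverse_engineer" / name


def temp_dir(first_author: str, year: int) -> Path:
    """Relative directory for phase temp files."""
    return Path("temp") / f"{first_author}_{year}"


print(output_path("Smith", 2021).name)  # Smith_2021_Results_Reverse_Analysis.md
```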


Long PDF: Phased Execution

When PDF > ~20 pages or context is tight:

| Phase | Content | Output |
|-------|---------|--------|
| Phase 1 | Study Profile | Study Profile table |
| Phase 2 | Module A–B | A + B |
| Phase 3 | Module C–D | C + D |
| Phase 4 | Module E–G | E + F + G |
| Phase 5 | Merge + final self-check | Complete Markdown |

Phases write to temp/, merged to final output. Clean temp files on success unless debug_mode: true.

Full details: docs/execution-constraints.md
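
The Phase 5 merge has two visible requirements: a single H1 title and no "Phase N" titles in the final file. The sketch below assumes the phase files are Markdown with `#` headings, which may not match the real temp-file format; `merge_phases` is a hypothetical name.

```python
import re
from typing import List

# Any heading level whose text starts with "Phase <number>".
PHASE_HEADING = re.compile(r"^#{1,6}\s*Phase\s+\d+\b.*$", re.IGNORECASE)


def merge_phases(title: str, phase_texts: List[str]) -> str:
    """Join phase outputs under one H1, dropping Phase N headings."""
    body = []
    for text in phase_texts:
        for line in text.splitlines():
            if PHASE_HEADING.match(line):
                continue  # phase titles do not belong in the final file
            if line.startswith("# "):
                line = "#" + line  # demote stray H1s so only one H1 remains
            body.append(line)
    return "\n".join([f"# {title}", *body]) + "\n"
```

Demoting extra H1 headings instead of deleting them keeps the module structure readable in the merged file.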


Patch Mode

When the user says "小修改" ("small revision") / "优化一下" ("polish it a bit"): modify only the issues pointed out. Append a Revision log table at the end of the file. Keep the original structure.


  • Public version: 3.0.3
  • Internal version: psychology-results-reverse-analysis-v3.0.3-bridge
  • Scope: General psychology literature (all subfields)
  • Analysis mode: Study Profile first, three-axis classification (Article Type × Domain × Data Modality), design-adaptive branching (A–I), source-verified evidence
  • Output mode: File-first (Markdown to desktop folder; chat = summary only)
  • Key features: Cross-type validated across all 9 branches (A–I). Default standard mode. Retains quick/standard/close-reading modes. Optional Module H Writer Transfer Packet for academic-results-writer integration.
  • Documentation: Branch-specific rules in docs/, examples in examples/, changelog in CHANGELOG.md