Install
openclaw skills install strict-paper-judge
Strictly judge whether a research paper is worth following, reading, or recommending. Use for paper triage, paper reviews, literature evaluation, arXiv screening, research taste checks, and deciding whether a method is genuinely valuable. Defaults to rejection unless the paper proves a clean new abstraction, solves a real bottleneck, changes an important trade-off, works in hard regimes, or can shape future work. Ignores citation counts, venues, author prestige, affiliations, and hype.
Judge whether a paper has real value and long-term influence potential.
When evaluating, do not use citation count, author affiliation, author title, venue rank, journal rank, or institutional prestige. Judge only:
Default stance: start skeptical, and keep a paper only when it earns it.
Treat the paper as incremental unless it proves at least one of the following:
Do not paraphrase the abstract. Compress the method into:
What exactly changed? Did it change the model, sampling, scheduling, cache, attention, training objective, kernel, system architecture, or evaluation?
Then judge:
Good methods can usually be stated in one sentence, and their ablations prove that the gain comes from the core idea.
Search for and compare these categories:
Do not rely only on the related work section. Authors often soften or bury the closest prior work.
Classify the method into one of four groups.
Keep.
Signals:
Example patterns:
From cache-then-reuse to cache-then-forecast.
From predicting tokens to predicting verifier outcomes.
From single-request optimization to multi-tenant resource reuse.
Keep, but do not oversell it as a method breakthrough.
Signals:
Keep cautiously.
Signals:
Ban.
Signals:
If several of these trigger, ban directly.
The paper moves a common method from field A to field B without solving a problem specific to field B.
Typical signals:
The paper is only a local enhancement on an existing path.
Typical signals:
Engineering complexity is not automatically bad. Penalize ugliness when:
Even if this works, it is an engineering paper, not a high-taste method.
Discount the result when the paper:
Be especially suspicious when the paper:
Example:
If the paper gives itself an extra GPU and gets 30% lower latency, that is not same-hardware algorithmic acceleration. It may still be valuable, but it must say so honestly.
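The extra-GPU example can be checked with a small calculation. The sketch below is illustrative only: the 100 ms and 70 ms latencies, the GPU counts, and the helper name gpu_ms_per_request are hypothetical, and multiplying latency by device count is one simple normalization, not a standard metric.

```python
def gpu_ms_per_request(latency_ms: float, num_gpus: int) -> float:
    """GPU-milliseconds consumed per request: latency times devices held."""
    return latency_ms * num_gpus


# Hypothetical numbers: baseline on 1 GPU, proposed method on 2 GPUs.
baseline_latency, baseline_gpus = 100.0, 1
proposed_latency, proposed_gpus = 70.0, 2

latency_reduction = 1 - proposed_latency / baseline_latency
baseline_cost = gpu_ms_per_request(baseline_latency, baseline_gpus)
proposed_cost = gpu_ms_per_request(proposed_latency, proposed_gpus)
cost_change = proposed_cost / baseline_cost - 1

print(f"latency reduction: {latency_reduction:.0%}")  # 30%
print(f"GPU-time change:   {cost_change:+.0%}")       # +40%
```

Under these assumed numbers the headline latency drops by 30% while GPU-time per request rises by 40%, so the result is a hardware trade-off rather than a same-hardware speedup.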
Lower the influence estimate sharply if the method only works for:
The more of these apply, the higher the score.
Examples:
The problem itself should have growth pressure.
Excellent papers often do not just add 1%. They change the constraint relationship:
Do not judge only easy settings.
Good work should show value in hard regimes:
If it wins only in easy regimes, its value is limited.
Always ask:
If the core module is removed, does the gain remain?
If removing the core module gives nearly the same result, the core claim is not supported.
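One way to make the ablation question concrete is to compute how much of the total gain disappears when the core module is removed. This is a minimal sketch, assuming a higher-is-better metric; the function name core_gain_share and the scores are hypothetical.

```python
def core_gain_share(baseline: float, full: float, without_core: float) -> float:
    """Fraction of the total gain over baseline that is lost without the core module.

    Assumes a higher-is-better metric (e.g. accuracy).
    """
    total_gain = full - baseline
    if total_gain == 0:
        raise ValueError("no gain over baseline to attribute")
    return (full - without_core) / total_gain


# Hypothetical scores: baseline 70.0, full method 75.0, core module removed 74.5.
share = core_gain_share(baseline=70.0, full=75.0, without_core=74.5)
print(f"share of gain from the core module: {share:.0%}")  # 10%
```

A share near 0 means the gain survives without the core module, so the core claim is not supported. A share near 1 means the gain comes from the core idea.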
Strong work can usually combine with other routes:
The stronger the composability, the higher the long-term value.
Strong keep.
Conditions:
Keep, but do not hype.
Conditions:
Borderline.
Conditions:
Do not recommend following.
Conditions:
For every paper evaluation, use this format:
Conclusion:
Keep / Borderline / Ban
Rating: A- / B+ / B / C
One-sentence judgment:
State the paper's real value, or why it is not worth following.
1. What it actually does
Explain the core method in your own words. Do not paraphrase the abstract.
2. Relationship to related work
List the closest prior work, same-track SOTA, and adjacent routes.
Judge whether this is a new paradigm, strong systematization, a small trick, or a re-skin.
3. Method originality
Judge whether it introduces a new abstraction or only combines existing components.
4. Whether the effect holds up
Check hard regimes, wall-clock time, real metrics, ablations, and fair baselines.
5. What I like most
Only mention genuinely strong points.
6. What I dislike most
Point out ugly method design, weak baselines, narrow settings, dependency on assumptions, or insufficient metrics.
7. Influence potential
Judge whether it can become a later baseline, or whether it will likely be absorbed or replaced quickly.
Final judgment:
State whether it is banned under the strict standard, and who should read it.
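If evaluations are produced programmatically, the header of the format above can be validated before rendering. This is only a sketch: the Verdict class and its field names are hypothetical, and only the allowed conclusion and rating values are taken from the format itself.

```python
from dataclasses import dataclass

CONCLUSIONS = ("Keep", "Borderline", "Ban")
RATINGS = ("A-", "B+", "B", "C")


@dataclass
class Verdict:
    """Header of one paper evaluation (hypothetical helper)."""

    conclusion: str
    rating: str
    one_sentence: str

    def __post_init__(self) -> None:
        # Reject values outside the format's allowed sets.
        if self.conclusion not in CONCLUSIONS:
            raise ValueError(f"conclusion must be one of {CONCLUSIONS}")
        if self.rating not in RATINGS:
            raise ValueError(f"rating must be one of {RATINGS}")

    def render(self) -> str:
        return "\n".join([
            "Conclusion:",
            self.conclusion,
            f"Rating: {self.rating}",
            "One-sentence judgment:",
            self.one_sentence,
        ])


verdict = Verdict("Borderline", "B", "Solid systematization, narrow setting.")
print(verdict.render())
```

The numbered sections 1 to 7 and the final judgment remain free text and are not modeled here.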
When evaluating a paper, search at least:
1. Paper title
2. Core method keywords + arxiv
3. The closest baseline names from the paper
4. Core method keywords + survey / SOTA / benchmark
5. Paper method + github / implementation
6. Paper method + follow-up / extension
7. Similar methods in adjacent fields
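Items 1 to 6 of the search list are mechanical enough to generate from a few inputs. The sketch below assumes a hypothetical helper build_queries; the paper title, keywords, and baseline names in the usage lines are placeholders. Item 7 needs judgment about which fields are adjacent, so it is not generated.

```python
def build_queries(title: str, keywords: str, method: str,
                  baselines: list[str]) -> list[str]:
    """Build search queries for items 1-6 of the search list."""
    queries = [title, f"{keywords} arxiv"]              # items 1 and 2
    queries += baselines                                # item 3
    queries += [f"{keywords} {suffix}"                  # item 4
                for suffix in ("survey", "SOTA", "benchmark")]
    queries += [f"{method} {suffix}"                    # item 5
                for suffix in ("github", "implementation")]
    queries += [f"{method} {suffix}"                    # item 6
                for suffix in ("follow-up", "extension")]
    return queries


# Placeholder inputs, not a real paper.
queries = build_queries(
    title="Example Paper Title",
    keywords="feature cache forecasting",
    method="ExampleMethod",
    baselines=["BaselineA", "BaselineB"],
)
for query in queries:
    print(query)
```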
Search to determine:
Compress the taste into one sentence:
A good paper is not "a little more work"; it makes you feel that this is how the problem should be thought about from now on.
More specifically: