Install
openclaw skills install blind-review-sanitizer
One-click removal of author names, affiliations, acknowledgments, and excessive self-citations from manuscripts to meet double-blind peer review requirements. Preserves document structure while anonymizing sensitive information.
Automatically anonymize academic manuscripts for double-blind peer review by removing author identifiers, institutional affiliations, acknowledgments, and excessive self-citations while preserving document formatting and scholarly content integrity.
Key Capabilities:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --input | str | Yes | - | Path to input manuscript file (DOCX, MD, or TXT) |
| --output | str | Yes | - | Path for sanitized output file |
| --authors | list[str] | No | - | List of author names to redact |
| --keep-acknowledgments | bool | No | false | Whether to preserve acknowledgment sections |
| --highlight-self-cites | bool | No | false | Only highlight self-citations without replacement |
✅ Use this skill when:
❌ Do NOT use when:
cover-letter-drafter instead
citation-formatter for bibliography management
hipaa-compliance-auditor for medical data
Related Skills:
cover-letter-drafter, citation-formatter, conflict-of-interest-checker
journal-club-presenter, conference-abstract-adaptor
Upstream Skills:
cover-letter-drafter: Generate cover letters AFTER manuscript sanitization to avoid including blinded content in correspondence
citation-formatter: Format citations BEFORE sanitization to ensure proper numbering and formatting
conflict-of-interest-checker: Check co-author conflicts BEFORE anonymization to maintain disclosure accuracy
Downstream Skills:
journal-club-presenter: Create presentation materials using sanitized versions for external review
conference-abstract-adaptor: Adapt abstracts for conferences that may have different anonymity requirements
Complete Workflow:
Manuscript Writing → citation-formatter → conflict-of-interest-checker → blind-review-sanitizer → cover-letter-drafter → Submission
Systematically identify and remove author names, institutional affiliations, and contact information from manuscripts using pattern recognition and user-specified rules.
from scripts.main import BlindReviewSanitizer
# Initialize sanitizer with known author names
sanitizer = BlindReviewSanitizer(
    authors=["Zhang San", "Li Si", "Wang Wu"],
    keep_acknowledgments=False,
    highlight_self_cites=False,
)
# Process text content
text = """Zhang San¹, Li Si²
¹Tsinghua University Computer Science Department
²Peking University School of Information
Email: zhangsan@tsinghua.edu.cn"""
sanitized = sanitizer.sanitize_text(text)
print(sanitized)
Parameters:
| Parameter | Type | Required | Description | Default |
|---|---|---|---|---|
| authors | List[str] | No | List of author names to redact. Improves accuracy when specified. | None |
| case_sensitive | bool | No | Whether author name matching is case-sensitive | False |
| partial_match | bool | No | Allow partial name matching (e.g., "Zhang" matches "Zhang San") | True |
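How `case_sensitive` and `partial_match` might interact can be sketched with a single regular expression. This is a minimal illustration, not the skill's actual implementation: the function name `redact_authors`, the `[AUTHOR NAME]` placeholder default, and the rule that name parts are derived by splitting on whitespace are all assumptions.

```python
import re


def redact_authors(text, authors, case_sensitive=False, partial_match=True,
                   placeholder="[AUTHOR NAME]"):
    """Replace listed author names (and optionally their parts) with a placeholder."""
    targets = set(authors)
    if partial_match:
        # Assumed rule: each whitespace-separated part of a name is also a target,
        # so "Zhang" alone is redacted when "Zhang San" is listed.
        for name in authors:
            targets.update(part for part in name.split() if len(part) > 1)
    if not targets:
        return text
    # Longest alternatives first so full names win over their parts.
    ordered = sorted(targets, key=lambda t: (-len(t), t))
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b", flags
    )
    # One pass over the text, so inserted placeholders are never re-scanned.
    return pattern.sub(placeholder, text)


print(redact_authors("Zhang San and Li Si thank Zhang.", ["Zhang San", "Li Si"]))
```

Partial matching on short name parts such as "Li" is one plausible source of common words being flagged as names; passing `partial_match=False` in this sketch restricts matching to the full names given.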
Best Practices:
Common Issues and Solutions:
Issue: Common words flagged as author names
Issue: Author names in citations not detected
Automatically detect and replace institutional identifiers including universities, research institutes, departments, and laboratories with generic placeholders.
from scripts.main import BlindReviewSanitizer
sanitizer = BlindReviewSanitizer()
# Institutional detection uses pattern matching
text_with_institutions = """
Department of Computer Science, Stanford University
Max Planck Institute for Informatics
MIT CSAIL Laboratory
"""
# Process institutional information
result = sanitizer._remove_institutions(text_with_institutions)
print(result)
# Output: [INSTITUTION], [INSTITUTION], [INSTITUTION]
Parameters:
| Parameter | Type | Required | Description | Default |
|---|---|---|---|---|
| institution_keywords | List[str] | No | Custom keywords for institution detection | Predefined list |
| strict_mode | bool | No | Only match explicit institutional patterns, reduces false positives | False |
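Keyword-based institution detection, and the effect of `strict_mode`, could look roughly like the sketch below. The keyword tuple, the regular expression, and the function name `mask_institutions` are assumptions made for illustration; the skill's predefined keyword list and matching rules are not shown in this document.

```python
import re

# Assumed keyword list; the skill's predefined list may differ.
KEYWORDS = ("University", "Institute", "Department", "Laboratory", "College")

_CAP = r"[A-Z][\w&-]*"          # one capitalized word, e.g. "Stanford"
_KW = "|".join(KEYWORDS)

# Matches "<Capitalized words> Keyword" and "Keyword of/for <Capitalized words>".
# Only spaces and tabs join the words, so a match never spans two lines.
INSTITUTION_RE = re.compile(
    rf"\b(?:{_CAP}[ \t]+)*(?:{_KW})\b(?:[ \t]+(?:of|for)[ \t]+{_CAP}(?:[ \t]+{_CAP})*)?"
)


def mask_institutions(text, strict_mode=False, placeholder="[INSTITUTION]"):
    """Replace institution-like phrases with a placeholder."""
    def _replace(match):
        # In strict mode a bare keyword ("University research") is left alone.
        if strict_mode and match.group(0) in KEYWORDS:
            return match.group(0)
        return placeholder

    return INSTITUTION_RE.sub(_replace, text)


print(mask_institutions("Department of Computer Science, Stanford University"))
print(mask_institutions("University research is costly", strict_mode=True))
```

In this sketch, strict mode only skips keywords that appear with no surrounding name, which is one way to avoid turning generic phrases into placeholders.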
Best Practices:
Common Issues and Solutions:
Issue: Generic words flagged as institutions
Issue: Multi-campus institutions not fully masked
Intelligently identify and handle acknowledgment sections, funding disclosures, and personal thanks that may reveal author identity or institutional affiliations.
from scripts.main import BlindReviewSanitizer
# Initialize without keeping acknowledgments
sanitizer = BlindReviewSanitizer(keep_acknowledgments=False)
# Sample acknowledgment section
acknowledgment_text = """Acknowledgments
We thank Professor Johnson for valuable discussions and the NSF Grant #12345 for funding.
This work was conducted at the Advanced Computing Center.
References"""
lines = acknowledgment_text.split('\n')
processed_lines = sanitizer.remove_acknowledgments(lines)
print('\n'.join(processed_lines))
# Output: [ACKNOWLEDGMENTS REMOVED] followed by References section
Parameters:
| Parameter | Type | Required | Description | Default |
|---|---|---|---|---|
| keep_acknowledgments | bool | No | Retain acknowledgment section instead of removing | False |
| acknowledgment_titles | List[str] | No | Custom section titles to recognize | Predefined list |
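A line-based approach to section removal can be sketched as follows. The title sets, the heading normalization, and the function name `strip_acknowledgments` are illustrative assumptions; they stand in for the skill's predefined title list, which is not reproduced here.

```python
import re

# Assumed title lists for illustration only.
ACK_TITLES = {"acknowledgments", "acknowledgements", "acknowledgment", "acknowledgement"}
NEXT_SECTION_TITLES = {"references", "bibliography", "appendix"}


def _heading_text(line):
    """Normalize a possible heading: drop '#', numbering, trailing colon, and case."""
    text = line.strip().lstrip("#").strip()
    text = re.sub(r"^\d+(?:\.\d+)*\.?\s+", "", text)
    return text.rstrip(":").lower()


def strip_acknowledgments(lines, keep_acknowledgments=False,
                          placeholder="[ACKNOWLEDGMENTS REMOVED]"):
    """Drop everything from an acknowledgment heading up to the next section."""
    if keep_acknowledgments:
        return list(lines)
    out, skipping = [], False
    for line in lines:
        title = _heading_text(line)
        if title in ACK_TITLES:
            skipping = True
            out.append(placeholder)
            continue
        # The section ends at a known title or at any Markdown heading.
        if skipping and (title in NEXT_SECTION_TITLES or line.lstrip().startswith("#")):
            skipping = False
        if not skipping:
            out.append(line)
    return out


sample = ["Acknowledgments", "We thank Professor Johnson.", "References"]
print(strip_acknowledgments(sample))
```

A heading that uses a title outside the assumed sets (for example "Funding") would not be detected by this sketch, which mirrors the undetected-section issue listed for this capability.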
Best Practices:
Common Issues and Solutions:
Issue: Acknowledgment section not detected
Issue: Essential content in acknowledgment section
keep_acknowledgments with manual redaction
Identify excessive self-citations and first-person references to previous work that could deanonymize the submission, replacing them with neutral language.
from scripts.main import BlindReviewSanitizer
# Mode 1: Replace self-citations
sanitizer_replace = BlindReviewSanitizer(highlight_self_cites=False)
text_with_self_cites = """
As we showed in our previous work [1], the algorithm achieves 95% accuracy.
Our earlier study demonstrated similar findings [2].
In our prior research, we found that...
"""
result = sanitizer_replace.sanitize_text(text_with_self_cites)
print(result)
# Output: As [PREVIOUS WORK] described in their previous study [1]...
# Mode 2: Highlight only (for manual review)
sanitizer_highlight = BlindReviewSanitizer(highlight_self_cites=True)
result_highlighted = sanitizer_highlight.sanitize_text(text_with_self_cites)
print(result_highlighted)
# Output: As [SELF-CITE: we showed in our previous work] [1]...
Parameters:
| Parameter | Type | Required | Description | Default |
|---|---|---|---|---|
| highlight_self_cites | bool | No | Only highlight self-citations without replacing | False |
| neutral_replacements | Dict[str, str] | No | Custom replacement phrases | Default mappings |
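The two modes (replace versus highlight) can be illustrated with a small phrase table. The patterns, the neutral phrases, and the function name `neutralize_self_citations` below are assumptions chosen for the example; the skill's default mappings are not documented here and its output wording may differ.

```python
import re

# Assumed phrase table: (pattern, neutral replacement).
SELF_CITE_PATTERNS = [
    (re.compile(r"\bour (?:previous|earlier|prior) (?:work|study|research)\b",
                re.IGNORECASE), "prior work"),
    (re.compile(r"\bwe (?:previously )?(?:showed|demonstrated|reported)\b",
                re.IGNORECASE), "it was shown"),
]


def neutralize_self_citations(text, highlight_only=False):
    """Replace first-person references to earlier work, or only mark them."""
    def make_sub(replacement):
        def _sub(match):
            original = match.group(0)
            if highlight_only:
                return f"[SELF-CITE: {original}]"
            # Keep sentence-initial capitalization.
            if original[0].isupper():
                return replacement[0].upper() + replacement[1:]
            return replacement
        return _sub

    for pattern, replacement in SELF_CITE_PATTERNS:
        text = pattern.sub(make_sub(replacement), text)
    return text


print(neutralize_self_citations("As we showed in our previous work [1]."))
print(neutralize_self_citations("See our prior research.", highlight_only=True))
```

Only the phrase is rewritten; citation markers such as `[1]` are never touched, so reference numbering stays intact in this sketch.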
Best Practices:
Common Issues and Solutions:
Issue: Legitimate references flagged as self-citations
Issue: Citations broken after text replacement
Process manuscripts in DOCX, Markdown, and plain text formats with format-aware handling to preserve structure while sanitizing content.
from pathlib import Path
from scripts.main import BlindReviewSanitizer, get_processor
# Initialize sanitizer
sanitizer = BlindReviewSanitizer(authors=["Dr. Smith", "Prof. Jones"])
# Process different file formats
input_files = [
    Path("paper.docx"),
    Path("manuscript.md"),
    Path("article.txt"),
]
for input_file in input_files:
    # Skip files that are not present in the working directory
    if input_file.exists():
        processor = get_processor(input_file, sanitizer)
        output_file = input_file.parent / f"{input_file.stem}-blinded{input_file.suffix}"
        processor.process(input_file, output_file)
        print(f"Processed: {input_file} → {output_file}")
Supported Formats:
| Format | Extension | Features Preserved | Special Handling |
|---|---|---|---|
| Microsoft Word | .docx | Styles, tables, formatting | python-docx library required |
| Markdown | .md | Headers, lists, links | Line-based processing |
| Plain Text | .txt | Line breaks, spacing | Line-based processing |
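For the line-based Markdown and plain-text handling listed in the table, the main risks are encoding damage and altered line endings. The sketch below shows one way to avoid both with the standard library; the function name `process_text_file` and the `transform` callback are hypothetical and are not the skill's processor classes.

```python
import os
import tempfile


def process_text_file(input_path, output_path, transform):
    """Apply transform(line) to each line, preserving encoding and line endings."""
    # newline="" disables newline translation, so "\r\n" and "\n" survive as-is.
    with open(input_path, encoding="utf-8", newline="") as src:
        lines = src.read().splitlines(keepends=True)
    with open(output_path, "w", encoding="utf-8", newline="") as dst:
        dst.writelines(transform(line) for line in lines)


# Demo on a temporary file with non-ASCII text and mixed line endings.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "article.txt")
    target = os.path.join(workdir, "article-blinded.txt")
    with open(source, "wb") as handle:
        handle.write("张三 wrote this\r\nSecond line\n".encode("utf-8"))
    process_text_file(source, target,
                      lambda line: line.replace("张三", "[AUTHOR NAME]"))
    with open(target, "rb") as handle:
        result_bytes = handle.read()

print(result_bytes)
```

Opening both files with an explicit `encoding="utf-8"` avoids depending on the platform default, which is a common cause of corrupted non-ASCII characters.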
Best Practices:
Common Issues and Solutions:
Issue: DOCX formatting lost after processing
Issue: Unicode/UTF-8 characters corrupted in text files
Generate comprehensive logs of all sanitization actions for transparency, quality assurance, and compliance verification.
from scripts.main import BlindReviewSanitizer
sanitizer = BlindReviewSanitizer(
    authors=["Alice Chen", "Bob Smith"]
)
# Process document
text = """Alice Chen and Bob Smith from MIT present their findings.
Contact: alice@mit.edu"""
sanitized = sanitizer.sanitize_text(text)
# Review audit trail
print("=== Audit Trail ===")
for item in sanitizer.removed_items:
    print(f"- {item}")
# Output shows:
# - Institution: MIT
# - Email: alice@mit.edu
Audit Information Captured:
| Information Type | Description | Use Case |
|---|---|---|
| author_names | List of author names redacted | Verification, re-identification post-review |
| institutions | Institutional affiliations masked | Compliance checking |
| contact_info | Emails and phone numbers removed | Privacy verification |
| acknowledgments | Whether acknowledgment section was removed | Journal requirement verification |
| self_citations | Count and type of self-citations neutralized | Review bias prevention |
Best Practices:
Common Issues and Solutions:
Issue: Audit log too verbose
Issue: Sensitive information in audit logs
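Both issues can be approached by post-processing the audit entries. The sketch below assumes each entry in `removed_items` is a string of the form `"Category: value"`, as in the sample output above; the helper names `summarize_audit` and `mask_audit_item` are hypothetical. It condenses a verbose log into per-category counts and replaces each redacted value with a short SHA-256 fingerprint so the log itself does not repeat the sensitive text.

```python
import hashlib
from collections import Counter


def summarize_audit(removed_items):
    """Count audit entries per category, e.g. {'Institution': 1, 'Email': 2}."""
    return dict(Counter(item.split(":", 1)[0].strip() for item in removed_items))


def mask_audit_item(item):
    """Replace the redacted value with a 12-character SHA-256 fingerprint."""
    category, _, value = item.partition(":")
    digest = hashlib.sha256(value.strip().encode("utf-8")).hexdigest()[:12]
    return f"{category.strip()}: sha256:{digest}"


items = ["Institution: MIT", "Email: alice@mit.edu", "Email: bob@example.org"]
print(summarize_audit(items))
for entry in items:
    print(mask_audit_item(entry))
```

The fingerprint still lets two logs be compared for the same redacted value without storing it in clear text; short or guessable values can be recovered by brute force, so this is a convenience rather than strong protection.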
From input to output for double-blind journal submission:
# Step 1: Prepare input manuscript
cp my_paper.docx manuscript.docx
# Step 2: Run sanitization with explicit author list
python scripts/main.py \
  --input manuscript.docx \
  --authors "Zhang San,Li Si,Wang Wu" \
  --output manuscript-blinded.docx
# Step 3: Review highlighted self-citations (optional but recommended)
python scripts/main.py \
  --input manuscript-blinded.docx \
  --highlight-self-cites \
  --output manuscript-reviewed.docx
# Step 4: Verify audit trail
cat sanitization_report.txt
# Step 5: Final check for remaining identifiers
# (a .docx file is a ZIP archive, so search the extracted document XML, not the file itself)
unzip -p manuscript-blinded.docx word/document.xml | grep -i "zhang\|tsinghua\|peking" || echo "No matches found - good!"
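The final check can also be scripted in Python. A DOCX file is a ZIP archive whose main text lives in `word/document.xml`, and Word may split a single word across several runs, so the sketch below strips the XML tags before searching. The function name `find_identifiers_in_docx` and the in-memory demo document are illustrative assumptions, not part of the skill.

```python
import io
import re
import zipfile


def find_identifiers_in_docx(docx_file, terms):
    """Return the terms that still occur in the document body (case-insensitive)."""
    with zipfile.ZipFile(docx_file) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # Dropping the tags joins text that Word split across runs.
    text = re.sub(r"<[^>]+>", "", xml).lower()
    return [term for term in terms if term.lower() in text]


# Demo: a minimal stand-in archive in which "Tsinghua" is split across two runs.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "word/document.xml",
        "<w:document><w:p><w:r><w:t>Tsing</w:t></w:r>"
        "<w:r><w:t>hua University</w:t></w:r></w:p></w:document>",
    )
buffer.seek(0)

leftovers = find_identifiers_in_docx(buffer, ["zhang", "tsinghua", "peking"])
print(leftovers or "No matches found - good!")
```

This only inspects the main document part; headers, footers, footnotes, and comments are stored in separate parts of the archive and would need the same treatment.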
Python API Usage:
from pathlib import Path
from scripts.main import BlindReviewSanitizer, get_processor
def sanitize_for_submission(
    input_path: Path,
    authors: list[str],
    output_dir: Path
) -> Path:
    """
    Complete sanitization workflow for journal submission.
    """
    # Initialize sanitizer with strict settings
    sanitizer = BlindReviewSanitizer(
        authors=authors,
        keep_acknowledgments=False,
        highlight_self_cites=False
    )
    # Determine output path (uses input_path; the original referenced an undefined input_file)
    output_path = output_dir / f"{input_path.stem}-blinded{input_path.suffix}"
    # Get appropriate processor
    processor = get_processor(input_path, sanitizer)
    # Process document
    processor.process(input_path, output_path)
    # Log results
    print("Sanitization complete:")
    print(f" Input: {input_path}")
    print(f" Output: {output_path}")
    print(f" Items redacted: {len(sanitizer.removed_items)}")
    # Generate summary by category
    categories = {}
    for item in sanitizer.removed_items:
        category = item.split(":")[0]
        categories[category] = categories.get(category, 0) + 1
    print(" Breakdown:")
    for cat, count in categories.items():
        print(f" - {cat}: {count}")
    return output_path

# Execute workflow
authors = ["Zhang San", "Li Si", "Wang Wu"]
output = sanitize_for_submission(
    Path("paper.docx"),
    authors,
    Path("./submissions/")
)
Expected Output Files:
submissions/
├── manuscript-blinded.docx # Anonymized manuscript
├── sanitization_report.txt # Audit trail of all redactions
└── verification_checklist.md # Pre-submission verification
Scenario: Preparing a research paper for submission to Nature, Science, or IEEE Transactions with strict double-blind review.
{
  "input_file": "neural_network_study.docx",
  "authors": ["Alice Chen", "Bob Smith", "Carol Wang"],
  "keep_acknowledgments": false,
  "highlight_self_cites": false,
  "expected_processing": [
    "Remove all author names from title page",
    "Replace institutional affiliations with [INSTITUTION]",
    "Remove complete acknowledgment section",
    "Neutralize all self-citations",
    "Remove email addresses and ORCID IDs"
  ]
}
Workflow:
Output Example:
Original: Alice Chen¹, Bob Smith²
¹MIT CSAIL, ²Stanford AI Lab
Email: achen@mit.edu
Sanitized: [AUTHOR NAME]¹, [AUTHOR NAME]²
¹[INSTITUTION], ²[INSTITUTION]
[EMAIL]
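Contact details such as email addresses and ORCID iDs follow regular formats, so they lend themselves to pattern matching. The sketch below is an assumption about how that step could work, not the skill's code: the patterns, the `[ORCID]` placeholder, and the function name `remove_contact_info` are hypothetical, and the audit strings imitate the `"Category: value"` style shown earlier.

```python
import re

# An optional "Email:" label is consumed together with the address.
EMAIL_RE = re.compile(
    r"(?:e-?mail:\s*)?([\w.+-]+@[\w-]+(?:\.[\w-]+)+)", re.IGNORECASE
)
# ORCID iDs are four groups of four characters; the final one may be "X".
ORCID_RE = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{3}[\dX]\b")


def remove_contact_info(text):
    """Return (sanitized_text, audit_entries) for emails and ORCID iDs."""
    removed = []

    def _email(match):
        removed.append(f"Email: {match.group(1)}")
        return "[EMAIL]"

    def _orcid(match):
        removed.append(f"ORCID: {match.group(0)}")
        return "[ORCID]"

    text = EMAIL_RE.sub(_email, text)
    text = ORCID_RE.sub(_orcid, text)
    return text, removed


clean, audit = remove_contact_info("Email: achen@mit.edu")
print(clean)
print(audit)
```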
Scenario: Submitting to computer science conference (e.g., ICML, NeurIPS) that requires anonymization during review but allows deanonymization after acceptance.
{
  "input_file": "deep_learning_paper.pdf",
  "authors": ["Anonymous"],
  "keep_acknowledgments": true,
  "highlight_self_cites": true,
  "special_requirements": [
    "Preserve citations to arXiv preprints",
    "Keep URLs and repository links",
    "Anonymize GitHub repositories"
  ]
}
Workflow:
Output Example:
Original: Our implementation is available at
github.com/alicechen/bert-optimizer
Sanitized: Our implementation is available at
[ANONYMOUS_REPOSITORY]
(link will be provided upon acceptance)
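Repository links usually expose a user or organization name in the URL path. A minimal sketch of the replacement shown above, assuming only GitHub-style `github.com/<owner>/<repo>` links need masking; the pattern and the function name `anonymize_repositories` are illustrative, and other hosts would need their own patterns.

```python
import re

# Owner and repository segments, plus an optional deeper path such as /tree/main.
# Each segment must end in a word character so sentence punctuation is not swallowed.
GITHUB_REPO_RE = re.compile(
    r"(?:https?://)?(?:www\.)?github\.com/[\w.-]+/[\w.-]*\w(?:/[\w./-]*\w)?",
    re.IGNORECASE,
)


def anonymize_repositories(text, placeholder="[ANONYMOUS_REPOSITORY]"):
    """Replace GitHub repository links; other URLs are left untouched."""
    return GITHUB_REPO_RE.sub(placeholder, text)


print(anonymize_repositories(
    "Our implementation is available at github.com/alicechen/bert-optimizer."
))
print(anonymize_repositories("See https://arxiv.org/abs/1810.04805 for details."))
```

Because only the GitHub pattern is matched, arXiv citations and other URLs pass through unchanged, in line with the special requirements listed for this scenario.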
Scenario: Submitting to medical journal requiring funding disclosure but author anonymity during review.
{
  "input_file": "clinical_trial_results.docx",
  "authors": ["Dr. Sarah Johnson", "Prof. Michael Lee"],
  "keep_acknowledgments": true,
  "special_handling": [
    "Anonymize investigator names in acknowledgments",
    "Keep funding sources with neutral language",
    "Remove institutional affiliations",
    "Preserve ethics committee information"
  ]
}
Workflow:
Output Example:
Original: We thank Dr. Sarah Johnson and the Clinical Research
Team at Mayo Clinic. Funded by NIH R01CA12345.
Sanitized: We thank [INVESTIGATOR] and the Clinical Research
Team at [INSTITUTION]. Funded by [FUNDING_SOURCE].
Scenario: Revising and resubmitting a previously rejected manuscript to a new journal, requiring fresh anonymization.
{
  "input_file": "revised_paper.docx",
  "authors": ["Original Author", "New Collaborator"],
  "updated_metadata": {
    "added_authors": ["New Collaborator"],
    "removed_content": ["previous_institutional_mention"],
    "new_acknowledgments": ["new_funding_source"]
  },
  "verification_steps": [
    "Check for editor response letter remnants",
    "Remove previous submission tracking numbers",
    "Update all institutional references",
    "Verify no reviewer comments remain"
  ]
}
Workflow:
Output Example:
Before: "We have revised the manuscript based on Nature
Medicine reviewer comments..."
After: Complete removal of all previous submission references
Pre-sanitization Checks:
During Sanitization:
Post-sanitization Verification:
Before Submission:
Input Preparation Issues:
❌ Processing tracked changes without accepting → Hidden revision marks reveal author identity
❌ Ignoring document metadata → File properties contain author name and institution
❌ Forgetting supplementary materials → Author info in supplementary PDFs not sanitized
❌ Incomplete author lists → Co-author names appear in text unrecognized
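The metadata pitfall above can be checked before submission. In a DOCX file, the author and last editor are stored in the `docProps/core.xml` part of the ZIP archive. The sketch below reads those two fields with the standard library; the function name `read_docx_author_metadata` and the in-memory demo archive are assumptions for illustration, and the skill itself may or may not perform this check.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_docx_author_metadata(docx_file):
    """Return identifying core properties found in a DOCX file, if any."""
    with zipfile.ZipFile(docx_file) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read("docProps/core.xml"))
    found = {}
    for label, path in (("creator", "dc:creator"),
                        ("lastModifiedBy", "cp:lastModifiedBy")):
        element = root.find(path, NAMESPACES)
        if element is not None and element.text:
            found[label] = element.text
    return found


# Demo: a stand-in archive containing only the core-properties part.
core_xml = (
    '<cp:coreProperties xmlns:cp="' + NAMESPACES["cp"] + '" '
    'xmlns:dc="' + NAMESPACES["dc"] + '">'
    "<dc:creator>Alice Chen</dc:creator>"
    "<cp:lastModifiedBy>Bob Smith</cp:lastModifiedBy>"
    "</cp:coreProperties>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", core_xml)
buffer.seek(0)

metadata = read_docx_author_metadata(buffer)
print(metadata)
```

A non-empty result means the file properties still name a person and should be cleared in the word processor before the manuscript is uploaded.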
Sanitization Strategy Issues:
❌ Over-aggressive replacement → Legitimate citations and references damaged
❌ Under-sanitization → Subtle identifiers remain (e.g., "our previous work")
❌ Inconsistent handling → Some instances replaced, others missed
❌ Context-insensitive replacement → "University research" becomes "[INSTITUTION] research"
Output Validation Issues:
❌ Assuming perfect automation → Automated tools miss edge cases
❌ Submitting without verification → Undetected author info reaches reviewers
❌ Losing audit trail → No record of what was changed for post-review
❌ Forgetting downstream effects → Citations broken, cross-references lost
Problem: Author names still appear in output
Problem: Excessive false positives
Problem: Document formatting corrupted
Problem: Self-citations not detected
Problem: Acknowledgment section not removed
Problem: References/citations broken
Problem: python-docx import error
pip install python-docx
Available in references/ directory:
External Resources:
Located in scripts/ directory:
main.py - Main sanitization engine with document processing logic
⚠️ Important Limitations:
Not Foolproof: Automated sanitization cannot guarantee complete anonymity. Always perform manual verification.
Context Blindness: Pattern matching may miss context-dependent identifiers or incorrectly flag legitimate content.
Image Processing: This tool processes text only. Images, figures, and embedded objects may contain identifying information not detected.
LaTeX Support: Limited support for LaTeX source files. Consider using LaTeX-specific tools for LaTeX manuscripts.
Language Support: Optimized for English and Chinese. Other languages may have reduced accuracy.
⚠️ Ethical and Legal Considerations:
Last Updated: 2026-02-09
Skill ID: 162
Version: 2.0 (K-Dense Standard)