PoliBERT Sentiment Analysis

v1.0.0

Political sentiment analysis using PoliBERTweet - a RoBERTa model pre-trained on 83M political tweets. Analyzes support, opposition, and stance toward politi...

by Yirong (@erongcao)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for erongcao/polibert-sentiment.

Install the skill "PoliBERT Sentiment Analysis" (erongcao/polibert-sentiment) from ClawHub.
Skill page: https://clawhub.ai/erongcao/polibert-sentiment
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install polibert-sentiment

ClawHub CLI

npx clawhub@latest install polibert-sentiment

Security Scan

Capability signals: Crypto, Requires sensitive credentials
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.

VirusTotal: Benign
OpenClaw: Benign (high confidence)

Purpose & Capability
The skill's name and description (PoliBERT sentiment analysis, Reddit integration, Polymarket integration) align with the included code, with a few small mismatches. SKILL.md lists 'twitter' as a source, but the code contains no Twitter API integration. requirements.txt pins heavy packages (numpy, pandas, scikit-learn) that the main script does not use, and its pinned versions (e.g., transformers==4.48.0, torch==2.6.0) are stricter than the ranges in the SKILL.md prose (transformers>=4.18.0, torch>=1.10.2). These mismatches are plausibly benign, but the unused dependencies and strict version pins are disproportionate to the core task.
Instruction Scope
Runtime instructions and code focus on local text, batch files, and Reddit. The main script downloads a HuggingFace model at first run (explicitly documented). The SKILL.md claims Reddit read-only access without credentials; the code uses praw with client_id/client_secret set to None (intended for read-only usage) but does not explicitly call praw.Reddit(read_only=True). The skill does not attempt to read unrelated local system credentials or network endpoints beyond HuggingFace/Reddit. test_sample.sh references an absolute local path and activates a virtualenv in that path, which could surprise users if run without adjusting paths.
Install Mechanism
There is no automated install spec in the registry entry (instruction-only), which is low-risk. The package includes a requirements.txt listing heavy ML packages (torch, transformers, numpy, pandas, scikit-learn, praw). Installing these will pull large binaries (torch, transformers) and the model download (~500MB) will occur at first run — expected for this use case but resource-heavy. No downloads from unknown/untrusted hosts are present in install metadata; the model comes from HuggingFace (model name provided).
Credentials
The skill declares no required environment variables or credentials. The code attempts to use PRAW in an unauthenticated/read-only manner (client_id/client_secret set to None) so no API keys are required for its documented Reddit behavior. No credentials for unrelated services are requested.
Persistence & Privilege
The skill does not request persistent platform privileges (always:false) and does not modify other skills or system-wide settings. It only downloads model files to the user's environment on first run and writes no agent config.
Assessment
This skill is coherent for political sentiment analysis, but review these points before installing:

  1. Installing will pull heavy ML dependencies and the PoliBERT model (~500MB); expect large downloads and GPU/CPU resource usage.
  2. requirements.txt pins versions and includes packages (pandas, scikit-learn) that the main script doesn't appear to need; consider installing only the packages you require to reduce footprint.
  3. The code intends to fetch Reddit in read-only mode without credentials; PRAW usage here is uncredentialed, but verify behavior in your environment (and rate limits).
  4. test_sample.sh uses an absolute user path and activates a venv at that path; do not run it without adjusting the path to your environment.
  5. The model comes from the HuggingFace handle in the SKILL.md; if provenance matters, verify the model owner and license on HuggingFace before use.
  6. As with any political-analysis tool, results can be biased; validate outputs and consider ethical/privacy implications when analyzing user data or large social datasets.

Tags: latest, political-analysis, sentiment
License: MIT-0

PoliBERT Sentiment Analysis

Political sentiment analysis skill powered by PoliBERTweet - a transformer model trained on 83 million political tweets (Georgetown University, LREC 2022).

Overview

This skill provides political sentiment analysis capabilities using a specialized NLP model trained on political content. It can analyze sentiment toward political candidates, issues, and events from various data sources including Reddit, local files, or direct text input.

Features

  • Sentiment Classification: Support / Oppose / Neutral toward political targets
  • Stance Detection: Issue-specific stance analysis (e.g., pro/anti immigration)
  • Entity Targeting: Analyze sentiment toward specific politicians
  • Confidence Scoring: Probability scores for each classification
  • Reddit Data Integration: Auto-fetch political discussions from Reddit (free, read-only)
  • Batch Processing: Analyze multiple texts from files or stdin
  • JSON Output: Machine-readable results for integration with other tools

When to Use

Use this skill when you need to:

  • Analyze public sentiment toward political candidates or figures
  • Track political opinion trends on social media
  • Complement prediction market data with social sentiment
  • Monitor political discourse around specific issues
  • Aggregate opinions from Reddit political communities

Model Information

  • Model: PoliBERTweet
  • Architecture: RoBERTa (Robustly Optimized BERT Pretraining Approach)
  • Training Data: 83 million political tweets (2016-2020 US elections)
  • HuggingFace Hub: kornosk/polibertweet-political-twitter-roberta-mlm
  • Model Size: ~500MB
  • Academic Paper: LREC 2022
  • Institution: Georgetown University DataLab

Installation

Prerequisites

# Python 3.9 or higher
python --version

# Install core dependencies
# (quote the version specifiers so the shell does not treat ">" as a redirect)
pip install "transformers>=4.18.0" "torch>=1.10.2"

# Optional: Reddit data fetching
pip install "praw>=7.8.1"

First Run

On first execution, the model will be automatically downloaded from HuggingFace Hub (~500MB):

python polibert_sentiment.py --text "Test"

Data Sources

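| Source      | Method   | Cost | Data Quality   | Use Case                          |
|-------------|----------|------|----------------|-----------------------------------|
| Reddit      | --reddit | Free | High           | Real-time political discussions   |
| Local File  | --file   | -    | User-dependent | Batch analysis of collected data  |
| Stdin       | --stdin  | -    | User-dependent | Pipeline integration              |
| Direct Text | --text   | -    | User-dependent | Quick testing and single analysis |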
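
The four input methods above could be wired up with `argparse` roughly as follows. This is a minimal sketch, not the skill's actual parser: the flag names are taken from the usage examples in this README, while the mutually exclusive grouping, the default `--limit`, and the function name `build_parser` are assumptions made for illustration.

```python
import argparse


def split_csv(value: str) -> list[str]:
    """Turn 'politics,environment' into ['politics', 'environment']."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(prog="polibert_sentiment.py")

    # Assumption: exactly one input source is chosen per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="analyze a single string")
    source.add_argument("--file", help="file with one text per line")
    source.add_argument("--stdin", action="store_true", help="read texts from stdin")
    source.add_argument("--reddit", action="store_true", help="fetch posts from Reddit")

    parser.add_argument("--candidate", help="political figure to target")
    parser.add_argument("--query", help="free-text search query")
    # Assumption: the real default limit is not documented; 50 is a placeholder.
    parser.add_argument("--limit", type=int, default=50, help="max items to fetch")
    parser.add_argument("--subreddits", type=split_csv, default=None,
                        help="comma-separated subreddit names")
    parser.add_argument("--json", action="store_true", help="emit JSON output")
    return parser


args = build_parser().parse_args(
    ["--query", "climate policy", "--reddit", "--subreddits", "politics,environment"]
)
print(args.subreddits)
```

Grouping the sources as mutually exclusive makes conflicting invocations (for example `--text` together with `--file`) fail early with a usage error; whether the real script enforces this is not stated in the source.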

Reddit Data

Default Subreddits: r/politics, r/Conservative, r/democrats, r/Republican, r/PoliticalDiscussion

Note: Reddit data fetching uses read-only mode (no API credentials required). Rate limits apply.

Usage Examples

1. Single Text Analysis

python polibert_sentiment.py --text "J.D. Vance is the future of the Republican party"

Output:

Text: J.D. Vance is the future of the Republican party
Sentiment: SUPPORT (78.3% confidence)

2. Reddit Sentiment Analysis

# Analyze J.D. Vance sentiment from Reddit
python polibert_sentiment.py --candidate "J.D. Vance" --reddit --limit 50

# Analyze specific query
python polibert_sentiment.py --query "2028 election" --reddit --limit 100

# Custom subreddits
python polibert_sentiment.py --query "climate policy" --reddit --subreddits politics,environment

3. Batch File Analysis

# File with one text per line
python polibert_sentiment.py --candidate "Trump" --file tweets.txt
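
For reference, reading a "one text per line" file can be sketched as below. The helper names `iter_texts` and `load_texts` are hypothetical, and skipping blank lines is an assumption about how the script treats its input; the same iterator would also work on `sys.stdin` for the `--stdin` source.

```python
from typing import Iterable, Iterator


def iter_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield one stripped text per non-empty line."""
    for line in lines:
        text = line.strip()
        if text:  # assumption: blank lines carry no text and are skipped
            yield text


def load_texts(path: str, encoding: str = "utf-8") -> list[str]:
    """Read a batch file such as tweets.txt into a list of texts."""
    with open(path, encoding=encoding) as handle:
        return list(iter_texts(handle))


# Works on any iterable of lines, including an open file or sys.stdin.
sample_lines = ["First post\n", "\n", "  Second post  \n"]
print(list(iter_texts(sample_lines)))
```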

4. JSON Output (for integration)

python polibert_sentiment.py --candidate "Biden" --reddit --json

Output:

{
  "candidate": "Biden",
  "total_analyzed": 47,
  "sentiment_breakdown": {
    "support": {"count": 15, "percentage": 31.9},
    "oppose": {"count": 22, "percentage": 46.8},
    "neutral": {"count": 10, "percentage": 21.3}
  },
  "net_sentiment": -14.9,
  "average_confidence": 72.4
}
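
The figures in this output are consistent with a simple aggregation over per-text labels, sketched below. The function name `summarize` and the input shape are hypothetical, and the formula for `net_sentiment` (support percentage minus oppose percentage) is inferred from the example numbers (31.9 - 46.8 = -14.9) rather than confirmed by the source.

```python
from collections import Counter

LABELS = ("support", "oppose", "neutral")


def summarize(candidate: str, results: list[tuple[str, float]]) -> dict:
    """Aggregate (label, confidence_percent) pairs into a summary dict."""
    total = len(results)
    if total == 0:
        raise ValueError("no results to summarize")

    counts = Counter(label for label, _ in results)
    breakdown = {
        label: {
            "count": counts[label],
            "percentage": round(100 * counts[label] / total, 1),
        }
        for label in LABELS
    }
    # Assumption: net sentiment is the support share minus the oppose share.
    net = round(100 * (counts["support"] - counts["oppose"]) / total, 1)
    avg_conf = round(sum(conf for _, conf in results) / total, 1)

    return {
        "candidate": candidate,
        "total_analyzed": total,
        "sentiment_breakdown": breakdown,
        "net_sentiment": net,
        "average_confidence": avg_conf,
    }


# Made-up inputs matching the counts in the example above (15 / 22 / 10).
fake = [("support", 72.4)] * 15 + [("oppose", 72.4)] * 22 + [("neutral", 72.4)] * 10
print(summarize("Biden", fake)["net_sentiment"])
```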

Integration with Other Skills

With Polymarket

Polymarket (market odds)  →  PoliBERT (social sentiment)  →  Prediction synthesis
     18.6% (Vance)                    35% Support                      Combined signal

With Prediction Skill

Use PoliBERT sentiment as an input factor in the BRACE forecasting framework:

  • Base rate: Historical election patterns
  • Sentiment: Social media trends (via PoliBERT)
  • Market: Prediction market odds (via Polymarket)

Example Workflow

# 1. Get market data
python polymarket.py search "presidential election winner 2028" --json

# 2. Get social sentiment
python polibert_sentiment.py --candidate "J.D. Vance" --reddit --limit 100 --json

# 3. Synthesize in prediction framework
# (Use prediction skill to combine signals)

Output Format

Human-Readable Output

📊 Sentiment Analysis: J.D. Vance
Source: Reddit | Total analyzed: 47

Support: 31.9% (15)
Oppose: 46.8% (22)
Neutral: 21.3% (10)

Net Sentiment: -14.9%
Avg Confidence: 72.4%

JSON Output Structure

{
  "candidate": "string",
  "total_analyzed": "integer",
  "sentiment_breakdown": {
    "support": {"count": "integer", "percentage": "float"},
    "oppose": {"count": "integer", "percentage": "float"},
    "neutral": {"count": "integer", "percentage": "float"}
  },
  "average_confidence": "float",
  "net_sentiment": "float",
  "sample_results": [
    {"text": "string", "sentiment": "string", "confidence": "float"}
  ]
}
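
A downstream consumer might check the report against this structure before using it. The sketch below is one way to do that with the standard library; `parse_report` is a hypothetical helper, `sample_results` is treated as optional because the earlier example output omits it, and the check that the three counts sum to `total_analyzed` is an assumption that happens to hold for the example figures.

```python
import json

LABELS = ("support", "oppose", "neutral")
REQUIRED_FIELDS = {
    "candidate": str,
    "total_analyzed": int,
    "sentiment_breakdown": dict,
    "average_confidence": (int, float),
    "net_sentiment": (int, float),
}


def parse_report(raw: str) -> dict:
    """Parse the --json output and verify the documented fields."""
    report = json.loads(raw)
    for key, expected in REQUIRED_FIELDS.items():
        if key not in report:
            raise ValueError(f"missing field: {key}")
        if not isinstance(report[key], expected):
            raise ValueError(f"unexpected type for field: {key}")

    breakdown = report["sentiment_breakdown"]
    for label in LABELS:
        bucket = breakdown.get(label)
        if not isinstance(bucket, dict) or not {"count", "percentage"} <= bucket.keys():
            raise ValueError(f"malformed breakdown entry: {label}")

    # Assumption: every analyzed text receives exactly one of the three labels.
    if sum(breakdown[label]["count"] for label in LABELS) != report["total_analyzed"]:
        raise ValueError("label counts do not add up to total_analyzed")
    return report
```

Validating at the boundary keeps later steps (for example, combining sentiment with market odds) from silently working on a partial or malformed report.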

Limitations and Considerations

Model Limitations

  1. Training Data: The model was trained on 2016-2020 tweets and may not capture 2024-2028 linguistic patterns
  2. Context Sensitivity: May miss sarcasm, irony, or cultural references
  3. Temporal Drift: Political language evolves; model accuracy may degrade over time
  4. Confidence Calibration: Confidence scores are model outputs, not calibrated probabilities
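
To illustrate point 4, the sketch below assumes the reported confidence is the largest softmax probability over three class logits, which is common for classifier heads but is not confirmed for this skill. The logit values are invented. Scaling the same logits changes the "confidence" without changing the predicted label, which is why such scores should not be read as calibrated probabilities.

```python
import math

LABELS = ("SUPPORT", "OPPOSE", "NEUTRAL")


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: list[float]) -> tuple[str, float]:
    """Return the top label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


base = [2.0, 0.5, 0.1]            # hypothetical logits
sharper = [2 * x for x in base]   # same ranking, larger scale

print(classify(base))
print(classify(sharper))
```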

Data Limitations

  1. Reddit Sample Bias: Reddit users skew younger, more educated, and more liberal than the general population
  2. Selection Bias: Active Reddit users are not representative of voters
  3. Timing: Social sentiment can shift rapidly; a snapshot may not represent election-day mood
  4. Volume: Topics behind low-liquidity prediction markets may have few social media discussions

Best Practices

  • Use as one input among many, not as the sole basis for a prediction
  • Combine with prediction markets, polling data, economic indicators
  • Track sentiment trends over time, not single snapshots
  • Adjust for platform demographics (Reddit ≠ Twitter ≠ general population)
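
Tracking trends rather than single snapshots can be as simple as smoothing a series of `net_sentiment` readings. The sketch below uses a trailing rolling mean; the function name `rolling_mean`, the window size, and the sample readings are all hypothetical and only illustrate the idea.

```python
from collections import deque


def rolling_mean(values: list[float], window: int) -> list[float]:
    """Trailing mean over the last `window` readings, rounded to 1 decimal."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent: deque[float] = deque(maxlen=window)
    smoothed = []
    for value in values:
        recent.append(value)  # the oldest reading drops out automatically
        smoothed.append(round(sum(recent) / len(recent), 1))
    return smoothed


# Made-up daily net_sentiment readings for one candidate.
daily = [10.0, 20.0, 30.0, 40.0]
print(rolling_mean(daily, window=2))
```

A trailing window only uses past readings, so the smoothed value for a given day never depends on data collected later.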

Citation

If you use this skill or PoliBERTweet model in research, please cite:

@inproceedings{kawintiranon2022polibertweet,
  title={{P}oli{BERT}weet: A Pre-trained Language Model for Analyzing Political Content on {T}witter},
  author={Kawintiranon, Kornraphop and Singh, Lisa},
  booktitle={Proceedings of the Language Resources and Evaluation Conference (LREC)},
  year={2022},
  pages={7360--7367},
  publisher={European Language Resources Association}
}

License

  • Skill Code: MIT License
  • PoliBERTweet Model: Subject to HuggingFace Hub and original paper terms

Related Skills

  • polymarket-unified - Prediction market data for political forecasting
  • prediction - BRACE framework for calibrated forecasting
  • ai-model-team - Multi-model prediction system for financial markets

Version History

  • v1.0.0 (2026-04-17): Initial release
    • PoliBERTweet model integration
    • Reddit data source support
    • Sentiment analysis pipeline
    • JSON and human-readable output formats
    • Batch processing capabilities
