Zing Forum

AI-Law-Sentiment: An Automated Open-Source Dataset for Tracking Public Opinion Trends on AI Regulation

An open-source project that automatically tracks and analyzes public sentiment on AI regulation. It collects data from news, academic papers, Reddit communities, and regulatory data sources, uses VADER and FinBERT for sentiment analysis, and automatically updates daily via GitHub Actions to generate visual reports.

Tags: AI Regulation · Sentiment Analysis · Public Opinion Monitoring · VADER · FinBERT · GitHub Actions · Open-Source Dataset · NLP
Published 2026-05-26 21:55 · Recent activity 2026-05-26 22:20 · Estimated read 7 min

Section 01

Core Introduction to the AI-Law-Sentiment Project

AI-Law-Sentiment is an open-source dataset project that automatically tracks public opinion trends on AI regulation. It collects multi-source data from news, academic papers, Reddit communities, and regulatory data sources, uses VADER and FinBERT for sentiment analysis, and automatically updates daily via GitHub Actions to generate visual reports. Its aim is to provide policy makers and researchers with supporting data on public opinion related to AI governance.

Section 02

Project Background and Source Information

  • Original Author/Maintainer: Felipe-ML-Projects
  • Source Platform: GitHub
  • Original Title: AI-Law-Sentiment
  • Original Link: https://github.com/Felipe-ML-Projects/AI-Law-Sentiment
  • Release Date: May 26, 2026

The core goal of the project is to build an open, reproducible dataset that records how public opinion on AI governance issues evolves. All data and analysis results are released publicly on GitHub, and the project plans to support peer-reviewed academic research.

Section 03

Data Sources and Coverage

The project integrates four types of data sources:

  1. News Media: Mainstream tech and legal outlets (e.g., Ars Technica, The Verge, MIT Technology Review, Lawfare, EFF)
  2. Academic Literature: Preprint papers on AI law and governance from arXiv
  3. Social Media: Reddit communities (r/law, r/AIPolicy, r/MachineLearning, etc.)
  4. Regulatory Documents: Official regulatory documents and public comment solicitations on Regulations.gov
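
Combining four kinds of sources usually means mapping each item onto one shared record shape before analysis. The sketch below shows one way that could look. The `Entry` class and its field names are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# The four source types named in the list above.
SOURCE_TYPES = {"news", "academic", "social", "regulatory"}


@dataclass(frozen=True)
class Entry:
    """One collected item in a shared shape (hypothetical schema)."""
    source_type: str     # one of SOURCE_TYPES
    source_name: str     # e.g. "Ars Technica", "arXiv", "r/law", "Regulations.gov"
    title: str
    url: str
    published: datetime  # timezone-aware timestamp

    def __post_init__(self):
        # Reject records that would break later date filtering or grouping.
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source_type: {self.source_type!r}")
        if self.published.tzinfo is None:
            raise ValueError("published must be timezone-aware")


# Placeholder values, not real collected data.
entry = Entry(
    source_type="social",
    source_name="r/law",
    title="Example post title",
    url="https://example.com/post",
    published=datetime(2026, 5, 26, 12, 0, tzinfo=timezone.utc),
)
record = asdict(entry)  # plain dict, ready to serialize
```

Validating at construction time keeps malformed items from one crawler out of the shared dataset.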

Section 04

Technical Implementation Analysis

Sentiment Analysis Models:

  • VADER: A rule-based fast tool suitable for large-scale data processing
  • FinBERT (optional): A Transformer model trained on financial-domain text, used as a slower, context-aware complement to VADER that is intended to handle professional terminology better than a word-list approach
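
To make "rule-based" concrete, here is a toy lexicon scorer in the spirit of VADER. It is not VADER itself and not the project's code: the word list and valences are invented for the example, and negation is handled with a plain sign flip, whereas VADER uses a much larger lexicon and dampens negated words. The final normalization, `x / sqrt(x² + 15)`, is the one VADER uses to squash a raw sum into a compound score between -1 and 1.

```python
import math
import re

# Tiny illustrative lexicon; these valences are made up for the example.
LEXICON = {"good": 1.9, "support": 1.5, "safe": 1.4,
           "bad": -2.5, "harm": -2.0, "ban": -1.2}
NEGATIONS = {"not", "no", "never"}
ALPHA = 15  # normalization constant used for VADER's compound score


def compound_score(text: str) -> float:
    """Sum word valences, flip the sign after a negation, squash into [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token, 0.0)
        if valence and i > 0 and tokens[i - 1] in NEGATIONS:
            valence = -valence  # simplification of VADER's negation handling
        total += valence
    return total / math.sqrt(total * total + ALPHA)


positive = compound_score("This rule is good")
negative = compound_score("This rule is not good")
neutral = compound_score("The committee met on Tuesday")
```

Because scoring is a dictionary lookup per word, this style of analysis stays cheap on large volumes of text, which is the trade-off the VADER bullet describes.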

Content Understanding Capabilities:

  • Topic Tagging: Identifies 12 AI law subfields (bias, liability, privacy, etc.)
  • Stance Detection: Determines whether content supports, opposes, or is neutral toward AI regulation
  • Keyword Cloud: Generates hot topic visual charts
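
The source does not say how topic tagging or stance detection is implemented. One simple possibility is keyword matching, sketched below. The three topics come from the list above. The keyword sets, cue words, and function names are assumptions chosen for illustration.

```python
import re

# Three of the subfields named above; the keyword sets are hypothetical.
TOPIC_KEYWORDS = {
    "bias": {"bias", "discrimination", "fairness"},
    "liability": {"liability", "liable", "negligence"},
    "privacy": {"privacy", "surveillance", "biometric"},
}
# Hypothetical cue words for a three-way stance label.
SUPPORT_CUES = {"support", "welcome", "endorse"}
OPPOSE_CUES = {"oppose", "reject", "overreach"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def tag_topics(text: str) -> list[str]:
    """Return every topic with at least one keyword in the text."""
    words = tokenize(text)
    return sorted(t for t, kws in TOPIC_KEYWORDS.items() if words & kws)


def detect_stance(text: str) -> str:
    """Compare support and oppose cue counts; ties count as neutral."""
    words = tokenize(text)
    support = len(words & SUPPORT_CUES)
    oppose = len(words & OPPOSE_CUES)
    if support > oppose:
        return "support"
    if oppose > support:
        return "oppose"
    return "neutral"


headline = "Advocates welcome new privacy and bias rules"
topics = tag_topics(headline)
stance = detect_stance(headline)
```

A matcher like this labels anything without a cue word as neutral, so a large neutral share in the output can partly reflect thin cue lists instead of genuinely neutral text.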

Automated Workflow: A GitHub Actions job runs daily and performs these steps in order: crawl the latest content → keep only items from the past 2 days → run sentiment analysis and topic classification → generate reports and visualizations → update the repository
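
The "past 2 days" filtering step can be sketched as a cutoff comparison on timestamps. The `published` field name and the `filter_recent` helper are assumptions for the example. The project's own code may store and compare dates differently.

```python
from datetime import datetime, timedelta, timezone


def filter_recent(entries: list[dict], now: datetime, days: int = 2) -> list[dict]:
    """Keep entries published within the last `days` days, up to `now`."""
    cutoff = now - timedelta(days=days)
    return [e for e in entries if cutoff <= e["published"] <= now]


# Fixed "now" so the example is deterministic; a real run would use
# datetime.now(timezone.utc).
now = datetime(2026, 5, 26, 6, 0, tzinfo=timezone.utc)
entries = [
    {"title": "A", "published": datetime(2026, 5, 26, 1, 0, tzinfo=timezone.utc)},
    {"title": "B", "published": datetime(2026, 5, 24, 12, 0, tzinfo=timezone.utc)},
    {"title": "C", "published": datetime(2026, 5, 20, 9, 0, tzinfo=timezone.utc)},
]
recent = filter_recent(entries, now)
recent_titles = [e["title"] for e in recent]
```

A 2-day window on a daily job means consecutive runs overlap by a day, so a pipeline built this way would typically also need to skip items it has already stored.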

Section 05

Data Outputs and Current Overview

Data Outputs:

  • Raw Data: Daily JSON snapshots in data/raw/
  • Processed Data: CSV files with sentiment scores in data/processed/
  • Analysis Reports: Daily Markdown summaries + visualizations in reports/
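
As a rough picture of this layout, the sketch below writes one day's raw JSON snapshot and a processed CSV into `data/raw/` and `data/processed/`. Only those two directory names come from the source. The date-based file names, the column names, and the `write_outputs` helper are assumptions.

```python
import csv
import json
import tempfile
from datetime import date
from pathlib import Path

FIELDS = ["title", "source", "vader_compound"]  # hypothetical column names


def write_outputs(root: Path, day: date, entries: list[dict]) -> tuple[Path, Path]:
    """Write a raw JSON snapshot and a processed CSV for one day."""
    raw_path = root / "data" / "raw" / f"{day.isoformat()}.json"
    csv_path = root / "data" / "processed" / f"{day.isoformat()}.csv"
    for path in (raw_path, csv_path):
        path.parent.mkdir(parents=True, exist_ok=True)

    raw_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(entries)
    return raw_path, csv_path


# Demo in a throwaway directory with one placeholder row (not real data).
sample = [{"title": "Example headline", "source": "news", "vader_compound": 0.5}]
with tempfile.TemporaryDirectory() as tmp:
    raw_path, csv_path = write_outputs(Path(tmp), date(2026, 5, 26), sample)
    relative_raw = raw_path.relative_to(tmp).as_posix()
    with csv_path.open(newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
```

Keeping the untouched snapshot next to the scored CSV lets anyone re-run the analysis later with a different model without crawling again.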

Data as of May 26, 2026:

  • Total analyzed entries: 421
  • Coverage days: 6
  • Average VADER sentiment score: +0.4447
  • Hot topics in the past 30 days: Labor, U.S. Legislation, Privacy, Transparency, National Security
  • Stance distribution in the past 30 days: 30% Support · 70% Neutral · 0% Oppose
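
Summary figures like these can be derived from the processed rows with a mean and a count per stance label. The sketch below uses ten synthetic rows, chosen only so the split comes out at 30/70/0. They are not the project's data, and the `stance` and `vader_compound` field names are assumptions.

```python
from collections import Counter
from statistics import mean

STANCES = ("support", "neutral", "oppose")


def summarize(rows: list[dict]) -> dict:
    """Compute entry count, mean sentiment score, and stance percentages."""
    counts = Counter(r["stance"] for r in rows)
    total = len(rows)
    return {
        "entries": total,
        "mean_compound": round(mean(r["vader_compound"] for r in rows), 4),
        "stance_pct": {s: round(100 * counts[s] / total) for s in STANCES},
    }


# Synthetic rows for illustration only.
rows = (
    [{"stance": "support", "vader_compound": 0.6}] * 3
    + [{"stance": "neutral", "vader_compound": 0.2}] * 7
)
summary = summarize(rows)
```

With only 421 entries over 6 days, averages and percentages like those reported above can shift noticeably from one daily update to the next.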

Section 06

Application Scenarios and Value

  • Academic Research: Provides reproducible datasets for fields like AI governance and tech policy, supporting trend analysis and cross-platform comparisons
  • Policy Making: Helps policy makers understand public attention and sentiment trends toward AI regulation
  • Media Monitoring: Provides automated public opinion tools for journalists and analysts
  • Open-Source Learning: Demonstrates a complete data pipeline (multi-source crawling, NLP analysis, CI/CD automation), serving as an example for data engineering and NLP learning

Section 07

Project Highlights and Conclusion

Project Highlights:

  1. Multi-source Integration: Covers four types of data sources (news, academic, social, official)
  2. Dual-Model Analysis: VADER for fast processing, plus optional FinBERT for context-aware analysis
  3. Fully Automated Operation: Executed daily via GitHub Actions with zero manual intervention
  4. Open Licensing: Data under CC BY 4.0, code under MIT License
  5. Academic-Friendly: Provides BibTeX citation format

Conclusion: AI-Law-Sentiment is an open-source community contribution to the field of AI governance. By tracking public opinion automatically and continuously, it provides data infrastructure for studying how public attitudes toward AI regulation evolve, which can help make technology governance more transparent and participatory.