# AI-News-Ranker: Architecture Design and Implementation of an Intelligent News Aggregation System

> A real-time AI news aggregation platform that integrates AI domain information from over 50 trusted sources using semantic deduplication, intelligent scoring, and topic clustering technologies to build a modern information distribution system.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-05-22T22:13:27.000Z
- Last activity: 2026-05-22T22:20:41.420Z
- Popularity: 163.9
- Keywords: news aggregation, AI news, Next.js, Supabase, semantic deduplication, vector search, pgvector, Redis caching, Claude, real-time push
- Page link: https://www.zingnex.cn/en/forum/thread/ai-news-ranker
- Canonical: https://www.zingnex.cn/forum/thread/ai-news-ranker
- Markdown source: floors_fallback

---

## AI-News-Ranker: Overview of the Intelligent News Aggregation System

AI-News-Ranker is a real-time AI news aggregation platform that pulls content from more than 50 trusted sources and applies semantic deduplication, intelligent scoring, and topic clustering before distribution. It is built on Next.js, Supabase with pgvector, Redis caching, and Claude, and targets four pain points: information overload, repeated content, uneven quality, and slow updates.

## Background & Core Problems Addressed

Keeping up with AI news is a pain point for developers and researchers: papers, blog posts, and product releases are scattered across many sites, and tracking them by hand is time-consuming and error-prone. AI-News-Ranker targets four problems: information overload, repeated content, uneven quality, and lack of real-time updates. Its core approach is cross-source corroboration: events reported by multiple trusted sources rank higher, which makes the ranking harder to game with clickbait or bot activity.
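
A minimal TypeScript sketch of the corroboration idea, under stated assumptions: the `Article` shape, the source names, and the weights are hypothetical, and counting each trusted source once per topic is one plausible reading of cross-source corroboration, not the project's confirmed formula.

```typescript
// Hypothetical article shape; field names are illustrative only.
interface Article {
  topicId: string;  // topic cluster the article belongs to
  sourceId: string; // feed that published it
}

// Sum the trust weights of the distinct sources covering each topic.
// A source counts once per topic, so reposts from one feed add nothing,
// and sources missing from the weight table contribute nothing.
function corroborationByTopic(
  articles: Article[],
  sourceWeights: Record<string, number>,
): Record<string, number> {
  const counted: Record<string, boolean> = {}; // "topic|source" pairs already seen
  const totals: Record<string, number> = {};
  for (const article of articles) {
    const trusted = Object.prototype.hasOwnProperty.call(sourceWeights, article.sourceId);
    if (!trusted) continue;
    const pairKey = article.topicId + "|" + article.sourceId;
    if (counted[pairKey]) continue;
    counted[pairKey] = true;
    totals[article.topicId] = (totals[article.topicId] || 0) + sourceWeights[article.sourceId];
  }
  return totals;
}

// Assumed weights for three made-up sources.
const weights: Record<string, number> = { "lab-blog": 3, newsletter: 2, arxiv: 2 };

const totals = corroborationByTopic(
  [
    { topicId: "model-release", sourceId: "lab-blog" },
    { topicId: "model-release", sourceId: "newsletter" },
    { topicId: "model-release", sourceId: "arxiv" },
    { topicId: "viral-post", sourceId: "newsletter" },
    { topicId: "viral-post", sourceId: "newsletter" }, // repost from the same feed
    { topicId: "viral-post", sourceId: "unknown-aggregator" }, // not in the weight table
  ],
  weights,
);
console.log(totals);
```

Here `model-release` totals 7 from three distinct trusted feeds, while `viral-post` stays at 2 despite the repost and the unknown aggregator.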

## System Architecture Overview

The system uses a layered architecture:
- Frontend: Next.js 16 (App Router, TypeScript) with Tailwind CSS v4.
- Load Balancing: Nginx acts as reverse proxy and load balancer; adding upstream nodes to its configuration scales the app tier horizontally.
- Data Layer: three stores: Supabase (PostgreSQL + pgvector) for core data, Redis for region-keyed key-value caching, and S3 for thumbnail storage.
- Workflow Layer: independent worker containers handle content ingestion, enrichment (LLM summaries and scores), topic clustering, and notification pushes.

## Core Function Implementations

Key functions include: 
1. Multi-source aggregation: 50+ verified sources (labs like OpenAI/DeepMind, infrastructure vendors, researcher blogs, newsletters, safety orgs, academic resources like arXiv). Each source undergoes strict validation (URL reachability, RSS parsing, content freshness).
2. Semantic deduplication: Uses Voyage AI's voyage-3 model to generate 1024-dimensional vectors, with pgvector for cosine similarity calculation to merge similar articles.
3. Intelligent scoring: Claude Haiku assigns 0-100 scores based on the sum of source weights, topic size, LLM-assessed importance, and time decay. Clicks and views are not inputs.
4. Real-time push: High-priority content via Discord webhook; Supabase Realtime WebSocket for frontend updates without polling.
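
The semantic deduplication step above can be sketched in TypeScript as follows. This is an illustration under assumptions, not the project's code: the 0.9 threshold, the greedy grouping strategy, and the `EmbeddedArticle` shape are hypothetical, and the toy vectors have 3 dimensions rather than 1024. In the system described, the comparison runs inside pgvector, whose `<=>` operator returns cosine distance (one minus the similarity computed here).

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical shape: an article id plus its embedding.
interface EmbeddedArticle {
  id: string;
  embedding: number[];
}

// Greedy grouping: an article joins the first group whose representative
// (its first member) is at least `threshold` similar; otherwise it starts
// a new group. Returns the article ids of each group.
function groupDuplicates(articles: EmbeddedArticle[], threshold: number): string[][] {
  const groups: EmbeddedArticle[][] = [];
  for (const article of articles) {
    let placed = false;
    for (const group of groups) {
      if (cosineSimilarity(group[0].embedding, article.embedding) >= threshold) {
        group.push(article);
        placed = true;
        break;
      }
    }
    if (!placed) groups.push([article]);
  }
  return groups.map((group) => group.map((member) => member.id));
}

const duplicateGroups = groupDuplicates(
  [
    { id: "a", embedding: [1, 0, 0] },
    { id: "b", embedding: [0.98, 0.2, 0] }, // near-duplicate of "a"
    { id: "c", embedding: [0, 1, 0] },      // unrelated story
  ],
  0.9, // assumed threshold; the source does not state one
);
console.log(duplicateGroups); // [["a", "b"], ["c"]]
```

Comparing only against each group's first member keeps the sketch short. A production pipeline would more likely issue a nearest-neighbour query against the indexed vectors.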

## Technical Highlights

Notable technical features: 
- Generic crawler adapter: Config-driven for non-RSS sources (CSS selectors for content extraction, no code changes needed).
- Source validation: `scripts/verify-sources.mjs` checks HTTP status, parsing correctness, article count, freshness (60-day window), and optional Claude relevance scoring.
- Data pipeline validation: `scripts/verify-pipeline.sql` ensures end-to-end data flow correctness (items.xml, region defaults, S3 thumbnails, deduplication effect).
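
The source validation checks listed above can be sketched as a pure function. This is not the content of `scripts/verify-sources.mjs`: the `SourceProbe` shape, the problem labels, the `minArticles` default, and the rule that one article inside the window is enough to count as fresh are all assumptions for illustration. Only the 60-day window comes from the description.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical result of fetching and parsing one source.
interface SourceProbe {
  httpStatus: number;
  parsedOk: boolean;   // whether the feed or page parsed without error
  publishedAt: Date[]; // one publication date per parsed article
}

// Returns a list of problems; an empty list means the source passes.
function validateSource(
  probe: SourceProbe,
  now: Date,
  minArticles = 1,
  windowDays = 60,
): string[] {
  const problems: string[] = [];
  if (probe.httpStatus < 200 || probe.httpStatus >= 300) {
    problems.push("http " + probe.httpStatus);
  }
  if (!probe.parsedOk) problems.push("parse failed");
  if (probe.publishedAt.length < minArticles) problems.push("too few articles");
  // Invalid dates yield NaN from getTime() and fail the comparison.
  const cutoff = now.getTime() - windowDays * DAY_MS;
  const fresh = probe.publishedAt.some((date) => date.getTime() >= cutoff);
  if (!fresh) problems.push("stale");
  return problems;
}

const checkedAt = new Date("2026-05-22T00:00:00Z");

const healthy = validateSource(
  { httpStatus: 200, parsedOk: true, publishedAt: [new Date("2026-04-01T00:00:00Z")] },
  checkedAt,
);
const stale = validateSource(
  { httpStatus: 200, parsedOk: true, publishedAt: [new Date("2026-01-01T00:00:00Z")] },
  checkedAt,
);
console.log(healthy, stale); // [] and ["stale"]
```

Returning a list of problems rather than a boolean lets a report show every failed check for a source in one pass.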

## Deployment & Scaling

Deployment:
- POC: Docker Compose (copy `.env.example` to `.env.local`, configure keys, run `docker compose up --build`).
- Horizontal scaling:
  1. Copy app service to app2/app3.
  2. Add upstream nodes in `nginx.conf`.
  3. Optional `least_conn` scheduling.
  4. Redeploy. Redis and Supabase act as shared state layers for new instances.

## Engineering Practices & Improvement Directions

Engineering practices: 
- Region-keyed cache with conservative invalidation to ensure freshness.
- Degradation paths: Fallback to database if Redis fails, placeholder for missing S3 thumbnails, skip summaries if LLM API is limited.
- Observability: topic_engagement table for user behavior analysis (not used for sorting).
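
The region-keyed cache and its Redis degradation path might look like the sketch below. The `KeyValueCache` interface, the `feed:<region>` key format, the 60-second TTL, and the in-memory stand-ins are all hypothetical; a real deployment would wrap an actual Redis client and a Supabase query behind the same shapes. The point of the sketch is that a cache failure on read or write never fails the request: it falls through to the database loader.

```typescript
// Hypothetical minimal cache interface standing in for a Redis client.
interface KeyValueCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Hypothetical loader standing in for the database query.
type FeedLoader = (region: string) => Promise<string[]>;

async function getFeed(
  region: string,
  cache: KeyValueCache,
  loadFromDb: FeedLoader,
  ttlSeconds = 60, // assumed short TTL to keep the feed fresh
): Promise<string[]> {
  const key = "feed:" + region; // one cache entry per region
  try {
    const cached = await cache.get(key);
    if (cached !== null) return JSON.parse(cached) as string[];
  } catch {
    // Cache unavailable: degrade to the database instead of failing.
  }
  const fresh = await loadFromDb(region);
  try {
    await cache.set(key, JSON.stringify(fresh), ttlSeconds);
  } catch {
    // A failed cache write must not fail the request either.
  }
  return fresh;
}

// In-memory stand-in for tests; it ignores the TTL.
function createMemoryCache(): KeyValueCache {
  const store: Record<string, string> = {};
  return {
    get: async (key) => (key in store ? store[key] : null),
    set: async (key, value) => {
      store[key] = value;
    },
  };
}

// Stand-in that simulates a Redis outage.
const brokenCache: KeyValueCache = {
  get: async () => {
    throw new Error("cache down");
  },
  set: async () => {
    throw new Error("cache down");
  },
};
```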

Applicable scenarios: AI community portal, enterprise tech intelligence, research literature tracking, media topic assistant.

Future plans: Multi-language support, personalized recommendations, more notification channels (Slack/email), third-party API access.

## Conclusion

AI-News-Ranker covers the full pipeline of a modern news aggregator: multi-source collection, semantic deduplication, intelligent scoring, and real-time push. Its cross-source corroboration approach offers a sustainable way to assess information quality without relying on clicks or views. For developers, it offers runnable code and clear architecture docs; for users, it shows how an aggregation tool decides what to surface.
