Zing Forum

AI-News-Ranker: Architecture Design and Implementation of an Intelligent News Aggregation System

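A real-time AI news aggregation platform that combines semantic deduplication, intelligent scoring, and topic clustering to consolidate AI news from more than 50 trusted sources into a modern information distribution system.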

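Tags: news aggregation, AI news, Next.js, Supabase, semantic deduplication, vector search, pgvector, Redis cache, Claude, real-time push
Published 2026/05/23 06:13 | Last activity 2026/05/23 06:20 | Estimated reading time 7 minutes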

Section 01

AI-News-Ranker: Overview of the Intelligent News Aggregation System

AI-News-Ranker is a real-time AI news aggregation platform that draws on more than 50 trusted sources. It applies semantic deduplication, intelligent scoring, and topic clustering to build a modern information distribution system. The stack includes Next.js, Supabase, pgvector, Redis caching, and Claude, and it targets four pain points: information overload, duplicated content, uneven quality, and stale updates.

Section 02

Background & Core Problems Addressed

In an era of information overload, getting high-quality AI news efficiently is a real pain point for developers and researchers. Manually tracking scattered AI content (papers, blogs, product releases) is time-consuming and error-prone. AI-News-Ranker targets four core problems: information overload, duplicated content, uneven quality, and a lack of real-time updates. Its core approach is cross-source corroboration: events reported by multiple trusted sources rank higher, which makes the ranking resistant to clickbait and bot manipulation.
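
To make the corroboration idea concrete, here is a minimal sketch in TypeScript. It assumes each report is already mapped to an event and that every source carries a trust weight; the Report shape, the weight values, and the function name are hypothetical and not taken from the project. The point it illustrates is that only distinct sources count, so one source posting the same story many times gains nothing.

```typescript
// Hypothetical shape: the post does not publish the project's schema.
interface Report {
  eventId: string;  // the deduplicated event this report belongs to
  sourceId: string; // the source that published the report
}

// Sum the trust weights of the DISTINCT sources covering each event.
// Repeated reports from the same source are counted once, so posting
// volume alone cannot push an event up the ranking.
function corroborationScores(
  reports: Report[],
  weights: Record<string, number>, // assumed per-source trust weights
): Record<string, number> {
  const counted: Record<string, boolean> = {};
  const scores: Record<string, number> = {};
  for (const report of reports) {
    const pairKey = report.eventId + "\u0000" + report.sourceId;
    if (counted[pairKey]) continue; // this source already counted for this event
    counted[pairKey] = true;
    const weight = weights[report.sourceId] ?? 0; // unknown sources contribute 0
    scores[report.eventId] = (scores[report.eventId] ?? 0) + weight;
  }
  return scores;
}
```

With weights of 3 and 2 for two labs and 1 for a low-trust source, an event covered once by each lab scores 5, while an event posted five times by the low-trust source still scores 1.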

Section 03

System Architecture Overview

The system uses a modern layered architecture:

  • Frontend: Built with Next.js 16 (App Router, TypeScript) and Tailwind CSS v4 for a modern UI.
  • Load Balancing: Nginx serves as reverse proxy and load balancer; horizontal scaling is done by adding upstream nodes to its configuration.
  • Data Layer: Three tiers: Supabase (PostgreSQL + pgvector) for core data, Redis for region-keyed key-value caching, and S3 for thumbnail storage.
  • Workflow Layer: Independent worker containers handle content ingestion, enrichment (LLM summaries and scores), topic clustering, and notification pushes.

Section 04

Core Function Implementations

Key functions include:

  1. Multi-source aggregation: 50+ verified sources, including labs such as OpenAI and DeepMind, infrastructure vendors, researcher blogs, newsletters, safety organizations, and academic resources such as arXiv. Each source goes through strict validation (URL reachability, RSS parsing, content freshness).
  2. Semantic deduplication: Voyage AI's voyage-3 model generates 1024-dimensional vectors, and pgvector computes cosine similarity so that similar articles can be merged.
  3. Intelligent scoring: A 0-100 score is produced from the summed source weights, topic size, importance as assessed by the LLM (Claude Haiku), and time decay. Clicks and views are excluded.
  4. Real-time push: High-priority content goes out through a Discord webhook, and Supabase Realtime delivers frontend updates over WebSocket without polling.
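
The deduplication step in item 2 can be illustrated with a small in-memory sketch. The project is described as running the similarity computation in pgvector, so this TypeScript version only shows the underlying idea. The Article shape, the greedy grouping strategy, and the 0.9 threshold are assumptions made for illustration, not details from the post.

```typescript
// Hypothetical shape; in the described system the embedding would be a
// 1024-dimensional vector stored in a pgvector column.
interface Article {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / Math.sqrt(normA * normB);
}

// Greedy single-pass grouping: an article joins the first group whose
// representative (its first member) is at least `threshold` similar;
// otherwise it starts a new group. The 0.9 default is an assumed value.
function groupDuplicates(articles: Article[], threshold = 0.9): string[][] {
  const groups: { representative: Article; ids: string[] }[] = [];
  for (const article of articles) {
    let placed = false;
    for (const group of groups) {
      const similarity = cosineSimilarity(
        group.representative.embedding,
        article.embedding,
      );
      if (similarity >= threshold) {
        group.ids.push(article.id);
        placed = true;
        break;
      }
    }
    if (!placed) groups.push({ representative: article, ids: [article.id] });
  }
  return groups.map((group) => group.ids);
}
```

Greedy grouping depends on input order and compares only against each group's first member. It is the simplest option rather than the most accurate one; a database-side nearest-neighbor query would replace the inner loop in practice.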

Section 05

Technical Highlights

Notable technical features:

  • Generic crawler adapter: Non-RSS sources are handled through configuration. CSS selectors define how content is extracted, so adding a source needs no code changes.
  • Source validation: scripts/verify-sources.mjs checks HTTP status, parsing correctness, article count, and freshness (60-day window), with optional Claude relevance scoring.
  • Data pipeline validation: scripts/verify-pipeline.sql checks end-to-end data flow correctness (items.xml, region defaults, S3 thumbnails, deduplication effectiveness).
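
The freshness part of source validation can be sketched as follows. This is not the content of scripts/verify-sources.mjs, which the post does not show. It assumes a source passes when its newest parsed article falls inside the 60-day window; the FeedItem and FreshnessResult types and the function name are hypothetical.

```typescript
// Hypothetical shapes for parsed feed entries and the check result.
interface FeedItem {
  title: string;
  publishedAt: string; // ISO 8601 timestamp as found in the feed
}

interface FreshnessResult {
  ok: boolean;                  // true if the newest item is inside the window
  itemCount: number;            // how many items the feed yielded
  newestAgeDays: number | null; // null when no item has a parseable date
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed rule: a source is fresh if its newest article is at most
// `windowDays` old. `now` is passed in so the check is deterministic.
function checkFreshness(
  items: FeedItem[],
  now: Date,
  windowDays = 60,
): FreshnessResult {
  let newest: number | null = null;
  for (const item of items) {
    const timestamp = Date.parse(item.publishedAt);
    if (isNaN(timestamp)) continue; // skip entries with unparseable dates
    if (newest === null || timestamp > newest) newest = timestamp;
  }
  if (newest === null) {
    return { ok: false, itemCount: items.length, newestAgeDays: null };
  }
  const newestAgeDays = (now.getTime() - newest) / DAY_MS;
  return {
    ok: newestAgeDays <= windowDays,
    itemCount: items.length,
    newestAgeDays,
  };
}
```

Passing the current time as a parameter instead of reading the clock inside the function keeps the check easy to test against fixed dates.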

Section 06

Deployment & Scaling

Deployment:

  • POC: Docker Compose (copy .env.example to .env.local, configure keys, run docker compose up --build).
  • Horizontal scaling:
      1. Copy the app service to app2/app3.
      2. Add upstream nodes in nginx.conf.
      3. Optionally enable least_conn scheduling.
      4. Redeploy. Redis and Supabase act as shared state layers for new instances.

Section 07

Engineering Practices & Improvement Directions

Engineering practices:

  • Region-keyed cache with conservative invalidation to keep content fresh.
  • Degradation paths: fall back to the database if Redis fails, show a placeholder for missing S3 thumbnails, and skip summaries if the LLM API is rate-limited.
  • Observability: a topic_engagement table records user behavior for analysis (it is not used for ranking).

Applicable scenarios: AI community portals, enterprise tech intelligence, research literature tracking, and media topic-selection assistants.

Future plans: multi-language support, personalized recommendations, more notification channels (Slack, email), and third-party API access.
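
The Redis degradation path listed under the engineering practices (fall back to the database when the cache fails) can be sketched as a read-through helper. The Cache interface, the readThrough name, and the key format in the usage note are hypothetical stand-ins; the project's actual Redis and Supabase client code is not shown in the post.

```typescript
// Hypothetical minimal cache interface; a real Redis client would sit behind it.
interface Cache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Read-through with graceful degradation: any cache failure (connection
// error, corrupt payload) falls through to the database loader, and a
// failed cache write never fails the request.
async function readThrough<T>(
  cache: Cache,
  key: string,
  ttlSeconds: number,
  loadFromDb: () => Promise<T>,
): Promise<T> {
  try {
    const hit = await cache.get(key);
    if (hit !== null) return JSON.parse(hit) as T;
  } catch {
    // Cache unavailable or payload unreadable: continue to the database.
  }
  const fresh = await loadFromDb();
  try {
    await cache.set(key, JSON.stringify(fresh), ttlSeconds);
  } catch {
    // Best effort only: serving the response matters more than caching it.
  }
  return fresh;
}
```

A region-keyed call might look like readThrough(cache, "feed:" + region, 60, () => loadFeed(region)), where the key format, the TTL, and loadFeed are illustrative assumptions.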

Section 08

Conclusion

AI-News-Ranker demonstrates a complete tech stack for modern news aggregation, with engineering thought applied at every stage (multi-source collection, semantic deduplication, smart scoring, real-time push). Its cross-source corroboration approach offers a sustainable way to assess information quality. For developers, it provides runnable code and clear architecture docs; for users, it helps explain how aggregation tools work.