Zing Forum


TopicOps: A Control Plane and Topic Governance Framework for Open Source Intelligence Collection

A declarative intelligence collection control plane that integrates scattered crawlers, OSINT workflows, and data sources into versionable, testable, and auditable topic packs, supporting direct operation by AI agents.

Tags: TopicOps, Intelligence Collection, OSINT, Control Plane, Declarative Configuration, MCP Protocol, Data Lineage, Open Source Intelligence
Published 2026-04-29 04:13 · Recent activity 2026-04-29 04:21 · Estimated read 11 min

Section 01

Introduction / Main Floor: TopicOps: A Control Plane and Topic Governance Framework for Open Source Intelligence Collection

A declarative intelligence collection control plane that integrates scattered crawlers, OSINT workflows, and data sources into versionable, testable, and auditable topic packs, supporting direct operation by AI agents.


Section 02

Project Positioning: Not a Crawler, but a Governance Layer Above Crawlers

The project's developers draw a clear distinction between TopicOps and traditional crawler tools. TopicOps does not perform web crawling itself; instead, it manages 'why to collect' and 'what to collect', i.e. the collection intent. It provides a declarative language for describing Topics, Queries, Sources, Schedules, Scoring Rules, and Data Lineage.

The advantage of this layered architecture is that the bottom layer can integrate a variety of collection tools (custom crawlers, RSS readers, API clients) while the upper layer presents a unified governance interface. Whether a security researcher is monitoring threat intelligence or a market analyst is tracking competitors, both can express and manage their information needs with the same set of abstractions.
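The layering described above can be sketched in code. Note that the `SourceAdapter` protocol, the `RawItem` record, and their field names are hypothetical illustrations of the design idea, not TopicOps's actual API:

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass
class RawItem:
    """A single record returned by any underlying collector."""
    source: str      # e.g. "arxiv", "rss", "github"
    url: str
    title: str
    fetched_at: str  # ISO-8601 timestamp


class SourceAdapter(Protocol):
    """Hypothetical interface the governance layer programs against.

    Custom crawlers, RSS readers, and API clients would each implement
    this, so topic definitions stay agnostic of how data is fetched.
    """
    name: str

    def search(self, query: str) -> Iterable[RawItem]: ...


class JsonlAdapter:
    """Toy in-memory adapter, a stand-in for the local-JSONL
    test adapter the article mentions."""
    name = "local-jsonl"

    def __init__(self, records: list[RawItem]):
        self.records = records

    def search(self, query: str) -> Iterable[RawItem]:
        # Naive substring match; a real adapter would query its backend.
        q = query.lower()
        return [r for r in self.records if q in r.title.lower()]
```

Because every adapter satisfies the same structural interface, swapping an RSS feed for a GitHub search changes only a topic's source configuration, not its governance logic.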


Section 03

Core Concept: Topic Pack

The core abstraction of TopicOps is the 'Topic', which defines a specific information collection goal. A Topic Pack is a collection of multiple related topics that can be version-controlled, shared, and reused.

A typical topic definition includes the following elements:

Identification and Metadata: Topic ID, name, version, owner, priority, and status. These metadata enable topics to be tracked and managed in team collaboration.

Collection Intent: A natural language description of the problem this topic aims to solve. For example: 'Track the latest research in the field of AI agent security, including identity authentication, permission management, and threat models.'

Query Statements: Specific search expressions used to retrieve relevant content from various data sources. Multiple sets of queries are supported to cover synonyms and different expressions.

Negative Terms: Negative keywords used to filter noise. For example, when researching 'AI agents', you may need to exclude irrelevant content like 'real estate agents' and 'insurance agents.'

Data Source Configuration: Defines which channels to collect information from. TopicOps ships multiple built-in adapters, including local JSONL files (for testing), GitHub repository search, arXiv, the Hugging Face model hub, and RSS/Atom feeds.

Schedule: Defines the collection frequency, such as executing every 120 minutes.
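Putting the elements above together, a topic definition might look like the following sketch. The exact field names are hypothetical (the article does not show the real schema), and the small lint pass only illustrates, in the spirit of `topicops lint`, what checking required elements could look like:

```python
# Hypothetical topic definition mirroring the elements described above;
# the real TopicOps YAML schema may use different field names.
topic = {
    "id": "ai-agent-security",
    "name": "AI agent security research",
    "version": "1.2.0",
    "owner": "analyst-a",
    "priority": "high",
    "status": "active",
    "intent": ("Track the latest research in the field of AI agent "
               "security, including identity authentication, "
               "permission management, and threat models."),
    "queries": ['"AI agent" security', "agent authentication threat model"],
    "negative_terms": ["real estate agent", "insurance agent"],
    "sources": [
        {"adapter": "arxiv", "categories": ["cs.CR", "cs.AI"]},
        {"adapter": "rss", "url": "https://example.org/feed.xml"},
    ],
    "schedule": {"every_minutes": 120},
}

REQUIRED = {"id", "name", "version", "intent", "queries", "sources", "schedule"}


def lint_topic(t: dict) -> list[str]:
    """Toy lint pass: report missing required elements and empty queries."""
    problems = [f"missing field: {k}" for k in sorted(REQUIRED - t.keys())]
    if not t.get("queries"):
        problems.append("topic has no queries")
    return problems
```

In the real tool this definition would live as YAML in a Git repository, which is what makes the workflow in the next section possible.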


Section 04

Declarative Configuration and Git-Native Workflow

TopicOps configurations are written in YAML, which fits naturally into Git version control. Intelligence collection logic can therefore be branched, code-reviewed, and change-tracked like code: when Analyst A modifies a topic's query, Analyst B can see exactly what changed via git diff and roll back if necessary.

The project provides a rich set of CLI tools to operate these configurations:

  • init: Initialize a new topic pack workspace
  • lint: Check the syntax and semantic correctness of configuration files
  • dedupe: Detect and merge duplicate or nearly duplicate topics
  • diff: Compare the differences between two configuration versions
  • simulate: Simulate topic execution on specified data sources to preview collection results
  • run: Actually execute the topic collection task
  • export: Export the topic into a shareable topic pack format

Section 05

Data Lineage and Auditability

Intelligence collection has strict traceability requirements. TopicOps therefore provides a complete data lineage mechanism ensuring that every collected artifact can answer the following questions:

  • Which topic and topic version produced this artifact?
  • Which configuration hash defined this collection behavior?
  • Which adapter and source query obtained this data?
  • What is the collection timestamp?
  • What is the original data source record?

Each run generates a Manifest file, stored in the .topicops/artifacts/manifests/ directory. At the same time, normalized artifacts are appended to the .topicops/artifacts/normalized/artifacts.jsonl file. A local SQLite database (by default located at .topicops/topicops.db) maintains a complete run history and metadata index.
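The paragraph above can be sketched end to end. The manifest fields, table layout, and `config_hash` value below are illustrative guesses rather than TopicOps's actual on-disk format; only the directory and file names come from the article:

```python
import json
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical workspace mirroring the paths named in the article.
root = Path(tempfile.mkdtemp()) / ".topicops"
(root / "artifacts/manifests").mkdir(parents=True)
(root / "artifacts/normalized").mkdir(parents=True)

run = {
    "run_id": "run-0001",
    "topic_id": "ai-agent-security",
    "topic_version": "1.2.0",
    "config_hash": "3f9a12",          # hash of the config that defined this run
    "adapter": "arxiv",
    "query": '"AI agent" security',
    "collected_at": datetime.now(timezone.utc).isoformat(),
}

# 1. Per-run manifest answering the lineage questions.
(root / "artifacts/manifests" / f"{run['run_id']}.json").write_text(json.dumps(run))

# 2. Normalized artifacts appended to a shared JSONL file.
artifact = {**run, "url": "https://example.org/paper", "title": "Example paper"}
with open(root / "artifacts/normalized/artifacts.jsonl", "a") as f:
    f.write(json.dumps(artifact) + "\n")

# 3. Run history indexed in a local SQLite database.
db = sqlite3.connect(str(root / "topicops.db"))
db.execute("CREATE TABLE IF NOT EXISTS runs (run_id TEXT PRIMARY KEY, "
           "topic_id TEXT, topic_version TEXT, config_hash TEXT, collected_at TEXT)")
db.execute("INSERT INTO runs VALUES (?, ?, ?, ?, ?)",
           (run["run_id"], run["topic_id"], run["topic_version"],
            run["config_hash"], run["collected_at"]))
db.commit()
```

The key property is redundancy with distinct roles: the manifest is the per-run audit record, the JSONL file is the append-only artifact log, and SQLite provides a queryable index over both.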


Section 06

MCP Protocol Support and AI Agent Integration

A forward-looking design choice is support for the Model Context Protocol (MCP). The 'topicops mcp' command starts an MCP server that exposes TopicOps capabilities to compatible AI assistants and agent systems.

The functions exposed by the MCP layer include:

  • Read-only access to topic resources
  • Query of configuration schemas
  • Prompt templates (to guide AI on how to interact with TopicOps)
  • Tool calls (lint, dedupe, diff, simulate, dry run, etc.)
  • Cost estimation

This means that AI agents can read topic definitions directly, run simulated collections, evaluate query effectiveness, and even draft new topics. For example, a security analyst can have an AI assistant generate a monitoring topic configuration from a threat description.
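At the wire level, MCP is JSON-RPC 2.0, so an agent's tool invocation is a message like the following. The `tools/call` method comes from the MCP specification, but the tool name `simulate` and its arguments are hypothetical, since the article does not list the server's exact tool identifiers:

```python
import json

# JSON-RPC 2.0 request an MCP client might send to the `topicops mcp`
# server to invoke a tool; tool and argument names are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "simulate",
        "arguments": {"topic_id": "ai-agent-security", "source": "local-jsonl"},
    },
}

wire = json.dumps(request)  # what actually travels over stdio or HTTP
```

Because the protocol is uniform, any MCP-capable assistant can drive lint, dedupe, diff, and simulation without TopicOps-specific glue code.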


Section 07

Ethical and Compliance Design

The project documentation explicitly emphasizes respect for web ethics and applicable laws and regulations:

  • Comply with the robots.txt protocol
  • Respect API terms of service
  • Do not bypass paywalls, captchas, login restrictions, or access controls
  • Do not collect credentials, personal privacy data, or protected information
  • Do not provide functions to evade detection
  • Use clear user-agent strings
  • Prioritize the use of official APIs, public data sources, and public datasets
  • Store keys only in environment variables; never write them into manifests, logs, or artifacts

These principles reflect the developers' commitment to responsible data collection and give enterprise users confidence in compliant use.


Section 08

Application Scenarios

TopicOps is suitable for various intelligence collection scenarios:

Threat Intelligence Monitoring: Security teams can define topics to track the latest developments on specific vulnerabilities, attack techniques, or threat actors.

Academic Research Tracking: Researchers can monitor arXiv and related conferences to obtain the latest papers in specific fields as they appear.

Open Source Intelligence (OSINT): Investigative journalists and analysts can systematically collect public information related to an investigation target.

Competitive Intelligence: Enterprises can monitor competitors' product updates, technology stack changes, and hiring trends.

Technology Trend Tracking: Developers can follow developments in open source projects around specific technologies (such as large language models or vector databases).