Zing Forum


LLM-Enhanced Semantic Networks: A New Framework for Improving Cross-Sectional Stock Return Prediction

This paper proposes a two-stage framework that uses large language models (LLMs) to filter spurious edges out of financial networks built from text similarity. The filtered network better reflects genuine economic relationships and significantly improves the performance of pairs trading strategies.

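Financial networks · Cross-sectional return prediction · Large language models · Text mining · Pairs trading · Network filtering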
Published 2026-04-21 21:59

Section 01

[Introduction] LLM-Enhanced Semantic Networks: A New Framework for Improving Stock Return Prediction

This paper proposes a two-stage framework that uses large language models (LLMs) to filter spurious edges out of financial networks built from text similarity. Filtering makes the network reflect real economic relationships more faithfully and significantly improves the risk-adjusted returns of pairs trading strategies.


Section 02

Background: Potential and Existing Pitfalls of Text-Based Financial Networks

In quantitative investing, text-based financial networks can capture cross-sectional relationships between companies and support pairs trading strategies, which bet that the prices of related stocks will revert toward equilibrium after they diverge. In practice, however, networks built from text similarity often contain spurious connections (e.g., firms that merely share common words or fashionable buzzwords). These connections contaminate the network structure and introduce systematic biases into the strategies built on it.
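The mean-reversion bet behind pairs trading can be illustrated with a minimal sketch. It assumes a fixed hedge ratio, a 20-day rolling window and a ±2 standard deviation entry band; these are illustrative choices, not parameters from the paper. The hypothetical helpers `spread_zscore` and `pair_signal` compute the rolling z-score of the price spread and map it to a long/short decision.

```python
import numpy as np


def spread_zscore(prices_a, prices_b, hedge_ratio=1.0, window=20):
    """Rolling z-score of the spread a - hedge_ratio * b (NaN until a full window exists)."""
    a = np.asarray(prices_a, dtype=float)
    b = np.asarray(prices_b, dtype=float)
    spread = a - hedge_ratio * b
    z = np.full(spread.shape, np.nan)
    for t in range(window - 1, len(spread)):
        w = spread[t - window + 1 : t + 1]
        std = w.std()
        if std > 0:
            z[t] = (spread[t] - w.mean()) / std
    return z


def pair_signal(z, entry=2.0):
    """Map z-scores to a position in stock A (with the opposite position in B).

    -1: spread unusually wide   -> short A, long B, betting it narrows
    +1: spread unusually narrow -> long A, short B, betting it widens back
     0: inside the band, or no z-score yet (NaN comparisons are False)
    """
    z = np.asarray(z, dtype=float)
    return np.where(z > entry, -1, np.where(z < -entry, 1, 0))


if __name__ == "__main__":
    b = np.full(21, 100.0)
    a = b + np.array([0.0, 1.0] * 10 + [10.0])  # spread jumps on the last day
    z = spread_zscore(a, b)
    print(round(float(z[-1]), 2), int(pair_signal(z)[-1]))
```

In a network setting, the pairs to trade would come from the edges of the (filtered) graph rather than being chosen by hand.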


Section 03

Methodology: Two-Stage Framework and Signal Aggregation Strategy

Two-Stage Framework:
1. Candidate graph construction: extract text from the 10-K filings of U.S.-listed companies, generate semantic embeddings, and build a sparse candidate graph by keeping only pairs whose similarity exceeds a high threshold.
2. LLM-enhanced edge filtering: use engineered prompts to have an LLM judge whether each candidate connection reflects a genuine economic relationship (competition, supply chain, etc.), and keep only the edges that do.
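The two stages can be sketched as follows. The sketch assumes precomputed 10-K embeddings (one row per firm), cosine similarity with an illustrative threshold of 0.8, and a hypothetical `judge` callable standing in for the LLM call. `PROMPT_TEMPLATE`, the relation labels and all function names are assumptions; the paper's actual prompts and label set are not given in this summary.

```python
import numpy as np

# Hypothetical label set; the paper mentions competition and supply-chain links.
RELATION_TYPES = {"competitor", "supplier", "customer"}

# Hypothetical prompt; a real judge would send this to an LLM and parse the reply.
PROMPT_TEMPLATE = (
    "Firm A: {desc_a}\nFirm B: {desc_b}\n"
    "Is there a real economic relationship between A and B? "
    "Answer with one of: competitor, supplier, customer, none."
)


def build_prompt(desc_a, desc_b):
    """Fill the hypothetical prompt with two firms' business descriptions."""
    return PROMPT_TEMPLATE.format(desc_a=desc_a, desc_b=desc_b)


def candidate_edges(embeddings, tickers, threshold=0.8):
    """Stage 1: keep firm pairs whose cosine similarity is at least `threshold`."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-length rows
    sim = x @ x.T                                      # pairwise cosine similarity
    edges = []
    for i in range(len(tickers)):
        for j in range(i + 1, len(tickers)):
            if sim[i, j] >= threshold:
                edges.append((tickers[i], tickers[j], float(sim[i, j])))
    return edges


def llm_filter(edges, judge):
    """Stage 2: keep only edges the judge labels with a recognised relation type."""
    kept = []
    for a, b, s in edges:
        label = judge(a, b).strip().lower()
        if label in RELATION_TYPES:
            kept.append((a, b, s, label))
    return kept


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
    edges = candidate_edges(emb, ["AAA", "BBB", "CCC"])
    fake_judge = lambda a, b: "competitor"  # stand-in for an LLM call
    print(llm_filter(edges, fake_judge))
```

Keeping the threshold high in stage 1 keeps the number of LLM calls manageable, since stage 2 cost grows with the number of candidate edges.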

Signal Aggregation: signals from connected firms are combined in a relationship-aware way (different relationship types receive different weights) and weighted by network distance (influence decays as distance grows). The aggregated signal is then converted into trading decisions, with position size increasing with signal strength.
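One way to combine the two weighting schemes is sketched below, as an illustration only. The relation weights in `REL_WEIGHTS`, the per-hop `decay` of 0.5, the two-hop limit and the choice to aggregate neighbours' recent returns are all assumptions. Each neighbour found by breadth-first search contributes its return, weighted by the product of relation weights along the search path and by `decay` raised to (hops - 1). Positions are then demeaned and scaled to unit gross exposure, so position size grows with signal strength.

```python
from collections import deque

import numpy as np

# Hypothetical relation weights; unknown relation types get weight 0.
REL_WEIGHTS = {"competitor": 1.0, "supplier": 0.7, "customer": 0.7}


def build_adjacency(edges):
    """Undirected adjacency list from (firm_a, firm_b, relation) triples."""
    adj = {}
    for a, b, rel in edges:
        adj.setdefault(a, []).append((b, rel))
        adj.setdefault(b, []).append((a, rel))
    return adj


def network_signal(node, adj, returns, max_hops=2, decay=0.5):
    """Weighted average of neighbours' returns within `max_hops` of `node`.

    A neighbour at distance d, reached by breadth-first search, gets weight
    (product of relation weights along the BFS path) * decay ** (d - 1).
    """
    seen = {node}
    queue = deque([(node, 0, 1.0)])
    total, norm = 0.0, 0.0
    while queue:
        cur, dist, path_w = queue.popleft()
        if dist == max_hops:
            continue
        for nbr, rel in adj.get(cur, []):
            if nbr in seen:
                continue
            seen.add(nbr)
            nbr_w = path_w * REL_WEIGHTS.get(rel, 0.0)
            weight = nbr_w * decay ** dist  # nbr sits at distance dist + 1
            total += weight * returns[nbr]
            norm += weight
            queue.append((nbr, dist + 1, nbr_w))
    return total / norm if norm > 0 else 0.0


def positions(signals):
    """Dollar-neutral weights proportional to demeaned signals, unit gross exposure."""
    s = np.asarray(signals, dtype=float)
    s = s - s.mean()
    gross = np.abs(s).sum()
    return s / gross if gross > 0 else s
```

Under a reversion reading of pairs trading, one might trade the gap `network_signal(t, adj, returns) - returns[t]`, going long firms that lagged their peers; the summary does not say whether the paper uses this exact construction.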


Section 04

Evidence: Significant Improvement in Backtest Performance

Backtests on S&P 500 constituents from 2011 to 2019 show that, after LLM filtering, the Sharpe ratio of the long-short portfolio rose from 0.742 to 0.820 (about +10.5%) and the maximum drawdown narrowed from -10.47% to -7.85%. LLM filtering also outperformed traditional filtering methods (industry matching, keyword matching, etc.), which suggests that LLMs have a real advantage in understanding business semantics.
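The headline numbers can be checked with simple arithmetic: 0.820 / 0.742 ≈ 1.105, i.e. about a 10.5% relative gain in Sharpe ratio, while the drawdown shrinks by roughly a quarter (7.85 / 10.47 ≈ 0.75). The sketch below uses the standard definitions of both metrics. The 252-period annualization and zero risk-free rate are assumptions, since the summary does not state how the paper computed them.

```python
import numpy as np


def sharpe_ratio(returns, periods_per_year=252, risk_free=0.0):
    """Annualised Sharpe ratio of periodic returns (risk_free is an annual rate)."""
    excess = np.asarray(returns, dtype=float) - risk_free / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)


def max_drawdown(returns):
    """Largest peak-to-trough fall of compounded wealth, as a negative fraction."""
    growth = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    wealth = np.concatenate(([1.0], growth))  # include starting capital as a peak
    running_peak = np.maximum.accumulate(wealth)
    return float((wealth / running_peak - 1.0).min())


def relative_change(before, after):
    """Signed change relative to the magnitude of the starting value."""
    return (after - before) / abs(before)


if __name__ == "__main__":
    # Positive drawdown change means the drawdown became shallower.
    print(f"Sharpe:   {relative_change(0.742, 0.820):+.1%}")
    print(f"Drawdown: {relative_change(-0.1047, -0.0785):+.1%}")
```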


Section 05

Methodological Implications: LLMs as Network Quality Enhancers

This study demonstrates the value of LLMs for improving network structure by judging relationships and filtering noise, and the same paradigm could be applied to supply chain networks, knowledge graphs, social networks, and other domains. The authors argue that using LLMs as an intermediate layer that improves input quality is more robust than using them for end-to-end prediction, which opens new directions for AI in finance.


Section 06

Limitations and Improvement Opportunities

1. High computational cost: running LLM judgments over large networks incurs significant cost and latency.
2. Static network: the network is only rebuilt periodically, so it may not capture changing relationships in a timely manner.
3. LLM bias: the LLM may misjudge relationships involving emerging or niche industries.
Improvement directions: incremental update mechanisms, real-time filtering, and bias correction.
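The incremental-update direction could start from something as simple as caching LLM verdicts per firm pair, so that a periodic rebuild only calls the LLM for pairs that are new or whose underlying filings changed. The sketch below assumes a hypothetical `judge` callable and a hypothetical `version` tag (for example a filing year or a hash of both firms' texts); neither comes from the paper.

```python
from typing import Callable, Dict, Tuple


class EdgeJudgmentCache:
    """Reuse earlier LLM verdicts; only call `judge` for new or changed pairs.

    `version` is a hypothetical tag identifying the texts a verdict was based on,
    e.g. a filing year or a hash of both firms' 10-K sections.
    """

    def __init__(self, judge: Callable[[str, str], str]):
        self.judge = judge
        self._cache: Dict[Tuple[str, str], Tuple[str, str]] = {}  # pair -> (version, label)
        self.llm_calls = 0

    @staticmethod
    def _key(a: str, b: str) -> Tuple[str, str]:
        # Edges are undirected, so (A, B) and (B, A) share one cache entry.
        return (a, b) if a <= b else (b, a)

    def label(self, a: str, b: str, version: str) -> str:
        key = self._key(a, b)
        hit = self._cache.get(key)
        if hit is not None and hit[0] == version:
            return hit[1]  # unchanged pair: no LLM call needed
        self.llm_calls += 1
        verdict = self.judge(*key)  # judge in canonical order for consistent labels
        self._cache[key] = (version, verdict)
        return verdict
```

With this design, cost on each rebuild scales with the number of changed pairs rather than the size of the whole candidate graph.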

Section 07

Conclusion: Towards More Intelligent Financial Network Analysis

This study combines LLMs with quantitative methods to enhance the authenticity of financial networks and improve strategy performance. In the future, LLM-enhanced frameworks are expected to expand to scenarios such as bond pricing and commodity analysis, feeding back into network science research and unlocking greater value.