# Semantic Triplet Reduction: A New Protocol for LLMs to Truly Understand Table Structures

> Researchers propose the Semantic Triplet Reduction (STR) protocol, which rewrites table cells into atomic fact triplets, eliminating the markup overhead of HTML/Markdown representations. It matches or outperforms HTML-based baselines on four Chinese and English table question-answering benchmarks while reducing input tokens.

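- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-29T17:10:25.000Z
- Last activity: 2026-06-01T03:27:17.555Z
- Popularity: 103.7
- Keywords: Semantic Triplet Reduction, table question answering, table understanding, STR protocol, TripletQL, hierarchical headers, semantic representation, large language models, HTML alternative
- Page link: https://www.zingnex.cn/en/forum/thread/llm-arxiv-2605-31550v1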
- Canonical: https://www.zingnex.cn/forum/thread/llm-arxiv-2605-31550v1

---

## Introduction: The Semantic Triplet Reduction Protocol for More Efficient LLM Table Understanding

Researchers propose the Semantic Triplet Reduction (STR) protocol, which rewrites table cells into atomic fact triplets, removing the markup overhead of HTML/Markdown representations and reducing input tokens. On four Chinese and English table question-answering benchmarks, STR matches or outperforms HTML-based baselines. It is paired with TripletQL, a query-aware router that filters the triplets passed to the model. This article covers the background, method, experimental results, applications, and limitations.

## Research Background: Challenges in Table Understanding and Limitations of Existing Methods

Tables are a key way of organizing information, but they are hard for LLMs to interpret: the semantics of a 2D layout must be encoded, merged cells carry implicit relationships, and hierarchical headers pass attributes down to the cells beneath them. Existing HTML/Markdown representations add substantial markup overhead, leave these semantics implicit, and place a heavy inference burden on the model, which raises resource consumption.
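
To make the header-inheritance problem concrete, here is a minimal sketch that resolves a two-level header with merged cells into one full path per column. The grid convention (`None` for a cell covered by a merge to its left, `""` for a level with no label) and the function name `resolve_header_paths` are assumptions made for illustration, not part of the STR protocol as described.

```python
from typing import List, Optional


def resolve_header_paths(header_rows: List[List[Optional[str]]]) -> List[List[str]]:
    """Return the full header path of every column.

    Assumed conventions for this sketch:
      - None means "covered by the merged cell to the left" (horizontal merge)
      - ""   means "no label at this level" (e.g. a vertically merged header)
    """
    n_cols = len(header_rows[0])
    paths: List[List[str]] = [[] for _ in range(n_cols)]
    for row in header_rows:
        current: Optional[str] = None
        for col, cell in enumerate(row):
            if cell is not None:
                current = cell          # a new header cell starts here
            if current:                 # skip empty labels
                paths[col].append(current)
    return paths


# Hypothetical header: "Revenue" and "Profit" each span two year columns.
header = [
    ["Region", "Revenue", None, "Profit", None],
    ["", "2023", "2024", "2023", "2024"],
]

for path in resolve_header_paths(header):
    print(" > ".join(path))
```

In an HTML or Markdown rendering the model has to perform this resolution implicitly for every cell it reads; a representation that carries the full path with each value removes that step.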

## Core Methods: STR Protocol and TripletQL Router

The STR protocol converts each cell into an `<item path, feature path, value>` triplet, making the entity-attribute-value relationship explicit. TripletQL, a query-aware router, then selects the triplets relevant to a question in four steps: question analysis, relevance matching, subset selection, and format optimization.
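
The two components can be sketched roughly as follows. This is an illustrative approximation, not the authors' implementation: the `Triplet` type, the `" > "` path separator, the `|`-delimited output format, and the keyword-overlap scoring that stands in for TripletQL's relevance matching are all assumptions.

```python
import re
from typing import List, NamedTuple, Sequence, Set


class Triplet(NamedTuple):
    item_path: str     # row-side path, e.g. "Asia > Japan"
    feature_path: str  # column-side path, e.g. "Revenue > 2024"
    value: str


def table_to_triplets(
    row_paths: Sequence[Sequence[str]],
    col_paths: Sequence[Sequence[str]],
    cells: Sequence[Sequence[str]],
    sep: str = " > ",
) -> List[Triplet]:
    """Emit one (item path, feature path, value) triplet per non-empty cell."""
    triplets = []
    for r_path, row in zip(row_paths, cells):
        for c_path, value in zip(col_paths, row):
            if value != "":
                triplets.append(Triplet(sep.join(r_path), sep.join(c_path), value))
    return triplets


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def route(question: str, triplets: Sequence[Triplet], top_k: int = 3) -> List[Triplet]:
    """Keyword-overlap stand-in for a query-aware router (assumed scoring)."""
    q = _tokens(question)                                    # question analysis
    scored = [
        (len(q & _tokens(f"{t.item_path} {t.feature_path}")), t)
        for t in triplets
    ]                                                        # relevance matching
    ranked = sorted(scored, key=lambda st: st[0], reverse=True)
    return [t for score, t in ranked[:top_k] if score > 0]   # subset selection


def render(triplets: Sequence[Triplet]) -> str:
    """One triplet per line; the delimiter is an assumption."""
    return "\n".join(f"{t.item_path} | {t.feature_path} | {t.value}" for t in triplets)


# Hypothetical table with hierarchical row and column headers.
rows = [["Asia", "China"], ["Asia", "Japan"]]
cols = [["Revenue", "2023"], ["Revenue", "2024"]]
cells = [["10", "12"], ["7", "8"]]

triplets = table_to_triplets(rows, cols, cells)
selected = route("What was Japan's revenue in 2024?", triplets, top_k=1)
print(render(selected))  # prints: Asia > Japan | Revenue > 2024 | 8
```

Because every triplet carries its full item and feature paths, each line is interpretable on its own, which is what allows a router to pass only a subset of the table to the model.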

## Experimental Evidence: Performance and Efficiency Improvements Across Benchmarks

Across four Chinese and English table question-answering benchmarks, STR matches or outperforms HTML baselines while significantly reducing input tokens, which lowers inference cost. Smaller models benefit relatively more, making STR well suited to resource-constrained scenarios. The advantage is more pronounced on long tables, where markup overhead grows more slowly.
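
This summary gives no token figures, so the claim is best checked on your own tables. The sketch below estimates how much of an HTML serialization is markup rather than content. It uses a regex-based token proxy instead of a real tokenizer, which is an assumption; for real measurements, substitute the tokenizer of the target model. The function names and the sample table are hypothetical.

```python
import re


def approx_tokens(text: str) -> int:
    """Crude token proxy: each word and each punctuation mark counts as one token."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def markup_share(html: str) -> float:
    """Fraction of proxy tokens spent on tags rather than cell content."""
    total = approx_tokens(html)
    content = approx_tokens(re.sub(r"<[^>]+>", " ", html))  # strip tags
    return (total - content) / total if total else 0.0


# Hypothetical one-row table in HTML.
html = (
    "<table><tr><th>City</th><th>Population</th></tr>"
    "<tr><td>Oslo</td><td>700000</td></tr></table>"
)

print(f"proxy tokens: {approx_tokens(html)}")
print(f"markup share: {markup_share(html):.2f}")
```

Running the same `approx_tokens` function on a triplet serialization of the same table, with and without routing, gives a like-for-like comparison under the same proxy.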

## Technical Depth: STR's Design Philosophy and Paradigm Shift

STR shifts table representation from a visual orientation to a semantic one, an approach described as simulating human cognitive processes. From an information-theoretic perspective, the representation eliminates redundancy, is explicitly structured, and is composable. Drawing on cognitive science, it presents entity-attribute-value relationships directly.

## Application Prospects: Value in Enterprise, Scientific Research, and Open Data Fields

STR can be applied to enterprise data analysis (financial reports, sales data queries), scientific research assistance (experimental data extraction), and open data portals (government data utilization) to improve natural language query efficiency.

## Limitations and Future Directions: Challenges and Prospects

Current limitations include the loss of information carried by complex visual patterns, the need to preprocess unstructured tables, and insufficient support for multimodal tables. Future directions include adaptive STR, multilingual expansion, integration with visual models, and real-time learning.

## Conclusion: Rethinking Structured Information Presentation for AI

STR improves LLM table understanding efficiency through a semantic-oriented representation. It is a reminder that optimizing how information is encoded can unlock the potential of existing models, and it offers a new path for presenting structured information to them.
