# TYPO3 LLMs.txt Extension: Building Website Content Indexes for AI Crawlers

> A TYPO3 CMS extension that automatically generates an `llms.txt` file and Markdown versions of pages so that AI/LLM crawlers can discover and read website content efficiently. It supports multilingual sites and optional API key protection.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-04-25T10:14:53.000Z
- Last activity: 2026-04-25T10:51:22.032Z
- Popularity: 161.4
- Keywords: llms.txt, TYPO3, AI crawlers, LLM, content indexing, Markdown, multilingual, RAG, content optimization
- Page link: https://www.zingnex.cn/en/forum/thread/typo3-llms-txt-ai
- Canonical: https://www.zingnex.cn/forum/thread/typo3-llms-txt-ai
- Markdown source: floors_fallback

---

## Background: New Protocols Needed for AI Crawlers

With the rise of AI applications such as ChatGPT, Claude, and Perplexity, websites face a new audience: machine crawlers rather than human visitors. These AI crawlers consume content differently from traditional search engines. They need a structured overview of the site, clean content formats, and clear access guidelines.

`llms.txt` is a proposed standard created for this purpose. Published at llmstxt.org, it aims to give large language models a standardized way to discover and access website content. Where `robots.txt` tells crawlers which pages may be crawled, `llms.txt` tells AI systems how best to consume your content.

## Project Overview: Official Extension for TYPO3 CMS

`rtfirst/llms-txt` is an extension for the TYPO3 content management system that implements the llmstxt.org specification. It generates a standard `llms.txt` index file and also serves page content as Markdown, so AI systems can fetch clean, structured text directly.

The extension supports TYPO3 versions 13.0 to 14.x, requires PHP 8.2 or higher, and offers rich configuration options including multilingual support, page-level metadata control, and optional API key protection.

## Core Concept: Two-Layer Content Access Architecture

This extension adopts the two-layer architecture recommended by the llmstxt.org specification:

## Layer 1: llms.txt Index File

This is a single file located in the website's root directory, containing:

- **Website Metadata**: Title, description, domain name, language
- **Page Structure**: Complete site navigation tree with SEO descriptions and keywords for each page
- **Access Guidelines**: Instructions on how to obtain full page content

This file acts as an "entry guide" for AI crawlers, helping them quickly understand the website structure and find content of interest.
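
To make the shape of such an index concrete, here is a minimal Python sketch that renders a page tree into the general layout described by llmstxt.org (an H1 title, a blockquote summary, then lists of links). The `Page` record, the `render_llms_txt` function, and the "Pages" section name are hypothetical and chosen for illustration; the extension's real output may be structured differently.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical page record; field names are illustrative only."""
    title: str
    url: str
    description: str = ""
    children: list["Page"] = field(default_factory=list)


def render_llms_txt(site_title: str, summary: str, pages: list[Page]) -> str:
    """Render an index with an H1 title, a blockquote summary and a link list."""
    lines = [f"# {site_title}", "", f"> {summary}", "", "## Pages", ""]

    def walk(page: Page, depth: int) -> None:
        # Nested pages are indented by two spaces per level (an assumption).
        entry = f"{'  ' * depth}- [{page.title}]({page.url}.md)"
        if page.description:
            entry += f": {page.description}"
        lines.append(entry)
        for child in page.children:
            walk(child, depth + 1)

    for page in pages:
        walk(page, 0)
    return "\n".join(lines) + "\n"


tree = [
    Page(
        "About",
        "https://example.com/about",
        "Company background",
        [Page("Team", "https://example.com/about/team")],
    )
]
index_text = render_llms_txt("Example Site", "Demo site.", tree)
print(index_text)
```

Each entry links to the `.md` variant of the page, so a crawler that reads the index can go straight to the Markdown version.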

## Layer 2: Markdown Content Format

By adding the `.md` suffix to any page URL, AI crawlers can obtain the page's content in Markdown format. This content includes:

- **YAML Frontmatter Metadata**: Title, description, language, date, canonical URL, etc.
- **Clean Markdown Body**: Removes distracting elements like HTML tags, ads, and navigation
- **Structured Heading Hierarchy**: Makes it easy for LLMs to understand content hierarchy

This format is particularly suitable for RAG (Retrieval-Augmented Generation) systems because Markdown retains structural information while being easy to parse and process.
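
On the consuming side, a RAG pipeline would typically separate the frontmatter from the body before chunking. The sketch below shows one way to do that in Python using only the standard library. The function name `split_frontmatter` and the sample document are assumptions, and the parser only handles flat `key: value` pairs; a full YAML parser would be needed for nested or multi-line values.

```python
def split_frontmatter(document: str) -> tuple[dict[str, str], str]:
    """Split a Markdown document into (metadata, body).

    Only flat `key: value` lines between the two `---` markers are parsed.
    """
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        # No closing marker: treat the whole text as body.
        return {}, document
    metadata: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" not in line:
            continue
        # Split on the first colon only, so URLs in values stay intact.
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body


# Hypothetical response body for a page requested with the `.md` suffix.
SAMPLE = """---
title: "About Us"
language: en
canonical: https://example.com/about
---

# About Us

We build things.
"""

meta, body = split_frontmatter(SAMPLE)
print(meta["title"], meta["canonical"])
print(body)
```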

## Multilingual Support: A Concise and Powerful Solution

Instead of generating a separate `llms.txt` file for each language, the extension takes a simpler approach:

- **Single llms.txt File**: Contains the site structure in the default language
- **Language-Specific URL Prefixes**: Access different language versions by combining the `.md` suffix with language prefixes

For example:
- Default language: `https://example.com/about.md`
- English: `https://example.com/en/about.md`
- German: `https://example.com/de/ueber-uns.md`

This design aligns better with how multilingual websites actually work and avoids the complexity of maintaining multiple llms.txt files.
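
As an illustration of the URL scheme, the following Python sketch derives the Markdown URL from a regular page URL, leaving any language prefix in the path untouched. The helper name `markdown_url` is hypothetical, and the handling of trailing slashes and of the site root is an assumption rather than documented behavior of the extension.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url: str) -> str:
    """Return the `.md` variant of a page URL, keeping its language prefix."""
    parts = urlsplit(page_url)
    # Assumption: a trailing slash is dropped before the suffix is added.
    path = parts.path.rstrip("/")
    if not path:
        # Assumption: the bare site root has no slug to attach a suffix to.
        raise ValueError("root URL has no page slug to suffix with .md")
    if not path.endswith(".md"):
        path += ".md"
    return urlunsplit(parts._replace(path=path))


print(markdown_url("https://example.com/about"))
print(markdown_url("https://example.com/de/ueber-uns/"))
```

Because the language lives in the path prefix, the same helper works for every language version without any per-language configuration.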

## Automatic Regeneration on Cache Clear

When the TYPO3 cache is cleared, the extension automatically regenerates the `llms.txt` file. This keeps the index in sync with the website content as of the last cache clear, with no manual maintenance required.
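
The regenerate-on-clear pattern can be sketched in a few lines of Python. This is a generic illustration of the idea, not TYPO3's PHP event API: `CacheManager`, `on_flush`, and `make_regenerator` are hypothetical names, and the temporary-file swap is an assumption about how one might avoid serving a half-written index.

```python
import tempfile
from pathlib import Path
from typing import Callable


class CacheManager:
    """Hypothetical stand-in for a CMS cache; not TYPO3's actual API."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}
        self._flush_listeners: list[Callable[[], None]] = []

    def on_flush(self, listener: Callable[[], None]) -> None:
        self._flush_listeners.append(listener)

    def flush(self) -> None:
        self._entries.clear()
        # Notify every listener after the cache has been emptied.
        for listener in self._flush_listeners:
            listener()


def make_regenerator(target: Path, render: Callable[[], str]) -> Callable[[], None]:
    def regenerate() -> None:
        # Write to a temporary file first, then swap it into place.
        tmp = target.with_suffix(".tmp")
        tmp.write_text(render(), encoding="utf-8")
        tmp.replace(target)

    return regenerate


with tempfile.TemporaryDirectory() as tmp_dir:
    index = Path(tmp_dir) / "llms.txt"
    cache = CacheManager()
    cache.on_flush(make_regenerator(index, lambda: "# Example Site\n"))
    cache.flush()  # clearing the cache rewrites the index
    regenerated = index.read_text(encoding="utf-8")

print(regenerated)
```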
