Zing Forum


Batch Rewriting Product Descriptions Using Large Language Models: A Practical Solution for SEO Optimization

This article introduces a product description rewriting project based on multiple open-source and closed-source LLMs (including GPT-4o-mini, Mistral, Qwen2, Phi-3, DeepSeek), covering model selection, evaluation metrics, engineering optimization, and practical deployment experience, providing a reproducible technical path for e-commerce content optimization.

Tags: LLM, SEO, e-commerce, product description, text generation, GPT-4o, Mistral, Qwen, quantization, batch processing
Published 2026-04-09 03:24 · Recent activity 2026-04-09 03:58 · Estimated read: 11 min

Section 01

[Main Floor/Introduction] Practical SEO Optimization Solution for Batch Rewriting Product Descriptions Using Multiple LLMs

This project uses multiple open-source and closed-source LLMs, including GPT-4o-mini, Mistral, Qwen2, Phi-3, and DeepSeek, to rewrite product descriptions, covering model selection, evaluation metrics, engineering optimization, and practical deployment experience. The core goal is to solve the problems of traditional manual copywriting, namely high labor costs and the difficulty of keeping keyword placement and brand tone consistent, and to achieve batch automated SEO optimization for over 2,000 SKU descriptions.


Section 02

Project Background and Core Challenges

In e-commerce operations, the quality of product descriptions directly affects search rankings and user conversion rates. Traditional manual writing faces two major challenges: first, labor costs are high when dealing with thousands of SKUs; second, it is hard to keep keyword placement, semantic coherence, and brand tone consistent across descriptions. This project addresses the need to rewrite over 2,000 product descriptions and explores how Large Language Models (LLMs) can automate the optimization in batches. The core challenge: generate text that better meets SEO standards and has stronger marketing appeal while preserving the original meaning and the accuracy of key information, such as numbers and entity names.


Section 03

Technical Architecture and Model Selection

The project adopts a multi-model comparison strategy and evaluates five mainstream models simultaneously:

  • GPT-4o-mini: OpenAI's lightweight model with low API call cost and fast response speed
  • Mistral-7B-Instruct: Open-source instruction-tuned model, friendly for local deployment
  • Qwen2-7B-Instruct: Alibaba Cloud's open-source model with outstanding Chinese understanding ability
  • Phi-3-medium-128k-instruct: Microsoft's lightweight model with a long context window of up to 128K
  • DeepSeek-LLM-7B-Chat: DeepSeek's open-source dialogue model with excellent Chinese generation quality

The multi-model parallel design stems from the complexity of e-commerce scenarios: different product categories have different requirements for language style and professional terminology, and no single model covers all of them well. Horizontal comparison makes it possible to match the optimal model to each business line.
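As a sketch, the five candidates could sit behind a small registry so the same rewriting pipeline can be pointed at any of them; the Hugging Face model ids and the registry layout below are illustrative assumptions, not the project's actual configuration:

```python
# Hypothetical registry for horizontal model comparison. GPT-4o-mini is
# reached via API; the other four are locally deployed open-source models.
MODEL_REGISTRY = {
    "gpt-4o-mini": {"backend": "openai", "id": "gpt-4o-mini"},
    "mistral":     {"backend": "hf", "id": "mistralai/Mistral-7B-Instruct-v0.3"},
    "qwen2":       {"backend": "hf", "id": "Qwen/Qwen2-7B-Instruct"},
    "phi-3":       {"backend": "hf", "id": "microsoft/Phi-3-medium-128k-instruct"},
    "deepseek":    {"backend": "hf", "id": "deepseek-ai/deepseek-llm-7b-chat"},
}

def backends_in_use(registry):
    """Return the distinct inference backends referenced by the registry."""
    return sorted({cfg["backend"] for cfg in registry.values()})
```

A per-business-line mapping can then select a key from this registry once the comparison results are in.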


Section 04

Core Dependencies and Engineering Implementation

The project is built on the Hugging Face ecosystem. Main dependencies:

  • AutoTokenizer: converts input text into token sequences the model understands, handling each model's special format requirements
  • AutoModelForCausalLM: loads pre-trained causal language models and performs the core text-generation task
  • BitsAndBytesConfig: enables 4-bit/8-bit quantization, reducing the memory usage of large models by more than 50%, so 7B-class models run smoothly on consumer-grade GPUs
  • accelerate: provides distributed execution and mixed-precision support to improve inference throughput
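A minimal sketch of this loading path, assuming the transformers + bitsandbytes stack; the NF4/bf16 quantization settings below are illustrative choices, not confirmed project configuration:

```python
# Sketch: load one of the 7B candidates in 4-bit so it fits on a
# consumer-grade or Colab L4 GPU. NF4 + bf16 compute are assumed settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2-7B-Instruct"  # any of the open-source candidates

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weight quantization
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed
    bnb_4bit_use_double_quant=True,         # also quantize the quant constants
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # accelerate places layers on the available GPU
)
```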

The key function tokenizer.apply_chat_template() uniformly handles multi-turn dialogue formats, ensuring consistent instruction following across models. This design greatly reduces the adaptation cost of switching between them.
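For instance, a single model-agnostic message list can be rendered into each model's native prompt format; the helper and the system-prompt wording below are hypothetical:

```python
def build_messages(product_text, keywords):
    """Build a model-agnostic chat message list for one rewrite request."""
    return [
        {"role": "system",
         "content": "You rewrite e-commerce product descriptions for SEO. "
                    "Keep all numbers, brands, and model names unchanged."},
        {"role": "user",
         "content": f"Keywords to retain: {', '.join(keywords)}\n\n{product_text}"},
    ]

# The same messages render into each model's native prompt via:
# prompt = tokenizer.apply_chat_template(
#     build_messages(text, kws), tokenize=False, add_generation_prompt=True)
```

Because apply_chat_template reads the chat template shipped with each tokenizer, the message list itself never needs to change when swapping models.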


Section 05

Evaluation System Design

Product description rewriting cannot rely solely on manual review; the project established a four-layer quantitative evaluation framework:

  • Semantic similarity: BERTScore and cosine similarity measure semantic consistency before and after rewriting, preventing information distortion caused by the model's "over-performance".
  • Vocabulary overlap: ROUGE and BLEU metrics monitor the keyword retention rate, which is crucial for SEO; if core search terms are replaced during rewriting, page rankings suffer directly.
  • Text length control: the length ratio before and after rewriting is checked to avoid content that is too long or too short, either of which can hurt page layout and user experience.
  • Key information preservation: the accuracy of numbers (prices, specification parameters) and entity names (brands, models) is verified independently, a hard requirement in e-commerce scenarios.
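The length check and the number-preservation part of the fourth layer reduce to simple string operations. A simplified sketch (a real pipeline would also normalize units and match entity names against a product catalog):

```python
import re

def numbers_preserved(original: str, rewritten: str) -> bool:
    """Check that every number (price, spec parameter) in the original
    still appears verbatim in the rewritten text."""
    nums = re.findall(r"\d+(?:\.\d+)?", original)
    return all(n in rewritten for n in nums)

def length_ratio(original: str, rewritten: str) -> float:
    """Rewritten/original character-length ratio, used to flag
    over-long or truncated outputs."""
    return len(rewritten) / max(len(original), 1)
```

Descriptions failing either check can be routed to regeneration or manual review rather than shipped.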


Section 06

Performance Optimization Strategies

To meet the demand of batch-processing over 2,000 items, the project implemented two key optimizations:

  • Batch API calls: merge single requests into batched submissions to reduce network round-trip overhead. In testing, throughput increased roughly 3-5x in batch mode while API costs fell accordingly.
  • Prompt caching: cache and reuse recurring context, such as product categories and brand style guides, instead of retransmitting it with every request. For long descriptions, this saves 20%-30% of token consumption.
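Both optimizations can be illustrated in a few lines; chunked and shared_context are hypothetical helpers, and the cached context is a placeholder rather than the project's actual brand guide:

```python
from functools import lru_cache

def chunked(items, batch_size):
    """Split pending rewrite requests into batches, one API call each."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

@lru_cache(maxsize=None)
def shared_context(category: str, brand: str) -> str:
    """Build the reusable prompt prefix once per category/brand pair.

    Keeping this prefix identical and at the front of every prompt also
    lets provider-side prompt caching avoid reprocessing it per request."""
    return f"Category: {category}. Brand voice for {brand}: concise, benefit-led."
```

With batches of, say, 8-16 requests, the per-request network overhead is amortized, which is where the reported 3-5x throughput gain comes from.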

At the hardware level, the project runs in a Google Colab L4 GPU environment (22.5GB VRAM, 53GB RAM) and relies on quantization to achieve single-machine batch processing, with no need to purchase additional expensive computing resources.


Section 07

Practical Insights and Recommendations

This project verifies the feasibility boundary of LLMs in e-commerce content production, with key experiences as follows:

  • Model selection should be combined with specific scenarios: GPT-4o-mini is suitable for rapid iteration and English content; Qwen2 and DeepSeek have more advantages in Chinese marketing copy; Phi-3's long context capability is suitable for handling descriptions with detailed specification parameters.
  • Evaluation metrics must be multi-dimensional: a single metric easily leads to suboptimal results; over-pursuing BLEU scores, for example, may produce rigid text at the expense of marketing appeal.
  • Engineering optimization is the key to deployment: an unoptimized solution is untenable in both cost and latency; batch processing and caching strategies are what turn a prototype into a production system.

For developers who want to reproduce this solution, it is recommended to start by clarifying the evaluation standards, first verifying the model-metric combination on a small batch of data, then gradually expanding to the full dataset. Also retain a manual review step as a quality gate, especially for verifying sensitive information such as prices and specifications, which cannot be left entirely to automation.