# Practical Multimodal Embedding Pipeline: E-commerce Product Vector Retrieval System Based on Gemini Batch API

> A complete ETL pipeline project demonstrating how to generate text+image multimodal vectors for over 100,000 products with the Gemini Embedding 2 model via the Batch API and store them in Qdrant for efficient retrieval, at roughly half the cost of the synchronous API.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-27T14:33:29.000Z
- Last activity: 2026-04-27T14:54:37.800Z
- Popularity: 161.7
- Keywords: Gemini, Multimodal Embedding, Batch API, Qdrant, Vector Retrieval, E-commerce, ETL Pipeline, HNSW, Matryoshka
- Page URL: https://www.zingnex.cn/en/forum/thread/embedding-gemini-batch-api
- Canonical: https://www.zingnex.cn/forum/thread/embedding-gemini-batch-api
- Markdown source: floors_fallback

---

## Introduction: Practical E-commerce Multimodal Vector Retrieval System Based on Gemini Batch API

This project demonstrates how to build a multimodal vector generation pipeline that can handle over 100,000 products. Using the Batch API of the Google Gemini Embedding 2 model, it produces a unified vector representation of text and images at roughly half the cost of synchronous calls, and stores the vectors in the Qdrant vector database for efficient retrieval, addressing complex needs in e-commerce search, recommendation, and similar scenarios.

## Project Background and Core Challenges

Traditional product search relies on keyword matching, which struggles with complex semantic needs (e.g., "find a style similar to a red dress but with shorter sleeves"). Multimodal embedding enables cross-modal semantic search by encoding product text descriptions and images into the same vector space. This project is built on the H&M e-commerce dataset (about 105,000 products with text attributes and images). The core challenges:
1. Efficient batch processing at this data scale;
2. Controlling API call costs;
3. Keeping a long-chain ETL pipeline reliable;
4. Effectively fusing text and image embeddings.

## Technical Architecture Overview

The pipeline uses an eight-stage design with the following data flow: HuggingFace dataset → data ingestion → image download → shard construction → batch submission → Gemini Batch API → result collection → collection initialization → vector storage → Qdrant vector retrieval. Core components:
- Embedding model: Gemini Embedding 2 (1536-dimensional Matryoshka representation);
- API mode: Batch API (saves 50% of cost);
- Vector database: Qdrant (locally deployed, with HNSW indexing and binary quantization);
- State management: SQLite (WAL mode, resumable after interruption); a minimal sketch of the state table follows.
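The post does not show the state schema, so the following is a minimal sketch of what the SQLite tracking table might look like. The table and column names (`products`, `download_status`, `embed_status`) are illustrative assumptions, not the project's actual schema.

```python
import sqlite3

# Hypothetical state database; table and column names are illustrative.
conn = sqlite3.connect("pipeline_state.db")
conn.execute("PRAGMA journal_mode=WAL")  # WAL: concurrent readers, safer resumption
conn.execute("""
    CREATE TABLE IF NOT EXISTS products (
        product_id      TEXT PRIMARY KEY,
        image_url       TEXT,
        download_status TEXT DEFAULT 'pending',  -- pending / done / failed
        embed_status    TEXT DEFAULT 'pending',  -- pending / submitted / done
        shard_id        INTEGER                  -- JSONL shard this record went into
    )
""")
conn.commit()

# Each stage is idempotent: it selects only records still pending for that
# stage, so an interrupted run can simply be restarted.
pending = conn.execute(
    "SELECT product_id, image_url FROM products WHERE download_status = 'pending'"
).fetchall()
```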

## Cost Advantages of Gemini Batch API

Gemini Batch API offers about a 50% discount compared to synchronous calls. Cost comparison for a typical record (1 image + ~50 text tokens):

| Records | Synchronous | Batch API |
|---------|-------------|-----------|
| 1k      | $0.13       | ~$0.065   |
| 10k     | $1.30       | ~$0.65    |
| 100k    | $13         | ~$6.50    |
| 1M      | $130        | ~$65      |

The estimated cost to process the 105k-record H&M dataset is about $6.8, which makes large-scale multimodal embedding feasible for small and medium-sized projects.
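As a sanity check on these figures, the implied per-record price is about $0.00013 synchronous and half that for Batch; a short calculation reproduces the ~$6.8 estimate for 105k records:

```python
# Back-of-envelope check using the per-record prices implied by the table above.
SYNC_PER_RECORD = 0.13 / 1_000          # $0.13 per 1k records, synchronous
BATCH_PER_RECORD = SYNC_PER_RECORD / 2  # Batch API: ~50% of the synchronous price

records = 105_000  # approximate size of the H&M dataset
print(f"synchronous: ${records * SYNC_PER_RECORD:.2f}")   # roughly $13.65
print(f"batch:       ${records * BATCH_PER_RECORD:.2f}")  # roughly $6.8, matching the estimate
```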

## Detailed Explanation of the 8-Stage Pipeline

The eight stages are as follows:
1. Data ingestion: Load the dataset from HuggingFace, write to SQLite state database, discard precomputed embedding columns;
2. Image download: Asynchronous HTTP/2 client for concurrent downloads (default concurrency 32, up to 5 retries), SHA256-named cache files, with download status recorded in the state database (a downloader sketch follows this list);
3. Shard construction: Divide pending records into JSONL shards, including base64-encoded images;
4. Batch submission: Check Tier 1 rate limits (≤432k tokens), cap the number of concurrent batch jobs (≤9), poll job status, and automatically reset failed jobs (a submit-and-poll sketch follows this list);
5. Result collection: Download results, parse embeddings and write to Parquet, update state database;
6. Qdrant collection initialization: Create the hm_products collection with 1536-dimensional cosine similarity, an HNSW index, binary quantization, and a payload keyword index (see the collection-setup sketch after this list);
7. Vector storage: Read vectors from Parquet, generate point IDs with UUID5, and batch-upsert into Qdrant (default concurrency 4);
8. Validation: Record count matching, random sampling of vector quality, self-search test, cross-modal search test.
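Stage 2 is described but not shown in the post; below is a minimal sketch of what the concurrent downloader could look like, using `httpx` for async HTTP/2 and a SHA256-named cache directory. The function names, backoff policy, and cache layout are illustrative assumptions, not the project's actual code.

```python
import asyncio
import hashlib
from pathlib import Path

import httpx  # pip install "httpx[http2]" for HTTP/2 support

CACHE_DIR = Path("image_cache")
CACHE_DIR.mkdir(exist_ok=True)

async def download_one(client: httpx.AsyncClient, url: str,
                       sem: asyncio.Semaphore, retries: int = 5) -> Path | None:
    """Download one image into a cache file named by the SHA256 of its URL."""
    path = CACHE_DIR / hashlib.sha256(url.encode()).hexdigest()
    if path.exists():  # idempotent: anything already cached is skipped
        return path
    async with sem:
        for attempt in range(retries):
            try:
                resp = await client.get(url)
                resp.raise_for_status()
                path.write_bytes(resp.content)
                return path
            except httpx.HTTPError:
                await asyncio.sleep(2 ** attempt)  # simple exponential backoff
    return None  # caller records the failure in the state database

async def download_all(urls: list[str], concurrency: int = 32):
    sem = asyncio.Semaphore(concurrency)  # default 32 concurrent downloads
    async with httpx.AsyncClient(http2=True, timeout=30.0) as client:
        return await asyncio.gather(*(download_one(client, u, sem) for u in urls))
```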
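For stage 4, the `google-genai` SDK exposes file-based batch jobs through `client.files.upload` and `client.batches`. The sketch below shows one plausible submit-and-poll loop; the embedding model string, the JSONL request schema, and the polling interval are assumptions that should be checked against the current Batch API docs.

```python
import time

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Upload one JSONL shard (one {"key": ..., "request": {...}} object per line;
# the exact request schema is an assumption here) and submit it as a batch job.
src = client.files.upload(
    file="shards/shard_000.jsonl",
    config=types.UploadFileConfig(display_name="shard_000", mime_type="jsonl"),
)
job = client.batches.create(
    model="gemini-embedding-001",  # placeholder model string
    src=src.name,
    config=types.CreateBatchJobConfig(display_name="shard_000"),
)

# Poll until the job reaches a terminal state; the pipeline resets failed
# jobs and resubmits them (the stage-4 behavior described above).
TERMINAL = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}
while job.state.name not in TERMINAL:
    time.sleep(30)
    job = client.batches.get(name=job.name)
print(job.name, job.state.name)
```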
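Stage 6 maps directly onto the `qdrant-client` API. The sketch below shows one plausible configuration; the payload field `product_type` is a hypothetical example, and the `m`/`ef_construct` values are common defaults rather than values stated in the post.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # local Docker deployment

client.create_collection(
    collection_name="hm_products",
    vectors_config=models.VectorParams(
        size=1536,                        # Matryoshka-truncated Gemini embedding
        distance=models.Distance.COSINE,  # cosine similarity, as configured above
    ),
    hnsw_config=models.HnswConfigDiff(m=16, ef_construct=100),
    quantization_config=models.BinaryQuantization(
        binary=models.BinaryQuantizationConfig(always_ram=True),
    ),
)

# Keyword payload index so metadata filters stay fast at scale.
client.create_payload_index(
    collection_name="hm_products",
    field_name="product_type",  # hypothetical payload field
    field_schema=models.PayloadSchemaType.KEYWORD,
)
```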

## Key Design Decisions

The main decisions are as follows:
1. Idempotency and resumability: Each stage is idempotent, and the SQLite state database tracks per-record status, so an interrupted run can be resumed;
2. Rate limiting and quota management: The pipeline conservatively uses the Gemini Tier 1 quota (reserving a 10% buffer);
3. Deterministic ID generation: UUID5 derives vector point IDs from product IDs, so repeated runs never create duplicate points (see the sketch after this list);
4. Local-first architecture: Qdrant runs locally, avoiding the pay-as-you-go costs of cloud services.
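Decision 3 is easy to make concrete: UUID5 is a deterministic hash-based UUID, so the same product ID always maps to the same point ID and a re-run overwrites instead of duplicating. A minimal sketch, with a namespace constant and payload chosen purely for illustration:

```python
import uuid

from qdrant_client import QdrantClient, models

# Any fixed namespace works; keeping it constant makes IDs reproducible.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "hm-products")  # hypothetical choice

def point_id(product_id: str) -> str:
    """Same product ID in, same UUID5 out — upserts are naturally idempotent."""
    return str(uuid.uuid5(NAMESPACE, product_id))

client = QdrantClient(url="http://localhost:6333")
client.upsert(
    collection_name="hm_products",
    points=[
        models.PointStruct(
            id=point_id("0108775015"),  # example article ID
            vector=[0.0] * 1536,        # placeholder; real vectors come from Parquet
            payload={"product_type": "dress"},
        ),
    ],
)
```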

## Application Scenarios and Expansion Ideas

Application scenarios:
- Visually similar product recommendation: Real-time retrieval of nearest neighbors to recommend similar styles;
- Cross-modal search: Text-to-image/image-to-text search (e.g., using a street photo to find matching mall products; see the query sketch below);
- Intelligent tag generation: Automatically complete product attribute tags;
- Duplicate product detection: Identify duplicate/highly similar products.
Expansion ideas:
- Integrate real-time data streams to support incremental updates;
- Integrate lightweight models to reduce latency;
- Add user behavior data for personalized re-ranking.
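To make the cross-modal search scenario concrete, here is a hedged sketch of a text-to-product query: embed the query text, then run nearest-neighbor search in Qdrant. The model string is a placeholder (the post's "Gemini Embedding 2" naming may not match the API's identifier), and `output_dimensionality=1536` mirrors the collection configuration above.

```python
from google import genai
from google.genai import types
from qdrant_client import QdrantClient

gemini = genai.Client()  # reads GEMINI_API_KEY from the environment
qdrant = QdrantClient(url="http://localhost:6333")

# Embed the free-text query; the model name here is a placeholder.
resp = gemini.models.embed_content(
    model="gemini-embedding-001",
    contents="red dress with short sleeves",
    config=types.EmbedContentConfig(output_dimensionality=1536),
)
query_vector = resp.embeddings[0].values

# Nearest-neighbor search over the product vectors.
hits = qdrant.query_points(
    collection_name="hm_products",
    query=query_vector,
    limit=5,
).points
for hit in hits:
    print(hit.id, hit.score, hit.payload)
```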

## Quick Start and Project Conclusion

Quick start steps:
1. Clone the repository and install dependencies: `git clone <repo-url>` → `cd gemini-multimodal-embeddings` → `uv sync`;
2. Configure the API key: `cp .env.example .env`, then edit the file to set `GEMINI_API_KEY`;
3. Start a local Qdrant: `docker compose up -d`;
4. Pilot run: `make pilot` (a ~500-record test run is recommended);
5. Full run: `make full`;
6. Check progress: `uv run gme status`.

Conclusion: Multimodal embedding reshapes the e-commerce search and recommendation experience. By exploiting the cost advantage of the Gemini Batch API, this project provides a technical blueprint that scales to 100k-level product catalogs and can serve as a reference for developers building similar systems.
