Zing Forum

Reading

LLM-based E-commerce Intelligent Analytics Assistant: Fusion Practice of RAG and NL-to-SQL

An e-commerce data analysis system integrating LLM, Retrieval-Augmented Generation (RAG), and Natural Language to SQL (NL-to-SQL) technologies, enabling business personnel to obtain in-depth data insights using natural language.

LLMRAGNL-to-SQL电商数据分析自然语言查询FAISSGroqStreamlit智能分析
Published 2026-06-07 02:45Recent activity 2026-06-07 02:47Estimated read 7 min
LLM-based E-commerce Intelligent Analytics Assistant: Fusion Practice of RAG and NL-to-SQL
1

Section 01

Introduction: E-commerce Intelligent Analytics Assistant Integrating LLM, RAG, and NL-to-SQL

This article introduces an e-commerce data analysis system that combines Large Language Model (LLM), Retrieval-Augmented Generation (RAG), and Natural Language to SQL (NL-to-SQL) technologies, aiming to enable business personnel to obtain in-depth data insights through natural language. The project is maintained by Rihanashariff and was published on the GitHub platform on June 6, 2026, with the original title "LLM-Powered-Analytics-Assistant-with-RAG."

2

Section 02

Project Background: Pain Points and Needs of E-commerce Data Analysis

In e-commerce operations, business personnel face two major challenges: First, massive structured data (transactions, user behavior) requires complex SQL skills to extract value, and relying on the data team leads to long waiting cycles; second, unstructured data such as customer reviews is difficult to perform semantic understanding and sentiment analysis through traditional BI tools. The need is to build a natural language interaction system that allows business personnel to directly ask questions to the data and get visual results.

3

Section 03

System Architecture and Technology Selection

Data Layer

Uses the public dataset of Brazilian e-commerce Olist (about 100,000 orders, 8 tables, 40,000 customer reviews, 2016-2018), stored in SQLite.

Core Engine

  • NL-to-SQL module: Converts English questions to SQL
  • RAG retrieval module: Processes unstructured data based on FAISS vector database
  • Hybrid reasoning module: Intelligently coordinates SQL and RAG
  • Visualization module: Automatically generates Plotly charts

Technology Stack

LLM API (Groq), embedding model (Sentence Transformers), vector database (FAISS), frontend (Streamlit), data processing (Pandas/NumPy)

4

Section 04

Detailed Explanation of Three Query Modes

Structured Data Query (SQL Mode)

For metrics like order volume and sales, automatically generates SQL, e.g.:

  • Top 5 product categories with the highest revenue
  • Average order delivery time

Unstructured Data Analysis (RAG Mode)

Processes customer reviews through FAISS semantic retrieval + LLM summary/sentiment analysis, e.g.:

  • Customer reviews on delivery services
  • Main reasons for low-rated reviews

Hybrid Analysis Mode (HYBRID Mode)

Combines structured and unstructured data, e.g.:

  • Impact of delivery time on customer satisfaction
  • Product categories with low sales and poor reviews

Hybrid Analysis Mode (HYBRID Mode)

Combines structured and unstructured data, e.g.:

  • Impact of delivery time on customer satisfaction
  • Product categories with low sales and poor reviews
5

Section 05

Intelligent Routing and Visualization Capabilities

Intelligent Routing

LLM automatically judges the query type:

  • Only needs structured data → SQL mode
  • Only needs text analysis → RAG mode
  • Combination of both → HYBRID mode

Visualization

Automatically recommends chart types (bar charts, line charts, etc.), generates Plotly interactive charts, and achieves the experience of "get charts immediately after asking questions."

6

Section 06

Project Structure and Practical Value

Project Structure

  • data/: Database and vector indexes
  • sql/: NL-to-SQL conversion
  • rag/: Embedding generation and semantic retrieval
  • llm/: Routing, sentiment analysis, result synthesis, chart generation
  • app.py: Streamlit interface

Practical Value

  1. Natural language interface lowers the threshold for data analysis
  2. RAG effectively handles unstructured text
  3. Hybrid reasoning answers complex business questions
  4. Open-source tech stack (Groq replaces OpenAI, FAISS replaces commercial vector databases) controls costs
7

Section 07

Summary and Insights

This project provides a practical reference implementation for e-commerce data analysis, showing the complete path from data preparation to interface design. With the improvement of LLM capabilities and cost reduction, conversational data analysis is expected to become a standard configuration of enterprise BI tools, and its open-source implementation provides valuable practical experience for the industry.