Zing Forum


WhatsApp Group Chat Podcast Generator: An Open-Source Tool to Convert Chat Logs into Professional Podcasts

This project is a set of command-line tools and Python libraries that can automatically convert WhatsApp group chat logs into two-person conversational podcasts, integrating a complete workflow including message segmentation, script generation, speech synthesis, and audio splicing.

Podcast generation, WhatsApp, chat logs, speech synthesis, large language models, content conversion, open-source tools, AI applications
Published 2026-05-17 22:14 | Recent activity 2026-05-17 22:25 | Estimated read 7 min

Section 01

WhatsApp Group Chat Podcast Generator: Core Features and Value Overview

generative-ai-group, a project by Sanand0, is an open-source set of command-line tools and Python libraries that automatically converts WhatsApp group chat logs into two-person podcasts. The workflow covers message segmentation, script generation, speech synthesis, and audio splicing. It addresses a common problem: knowledge shared in technical community group chats is fragmented and hard to circulate beyond the group.


Section 02

Project Background and Creative Origin

As generative AI develops rapidly, discussions in technical communities carry a great deal of useful knowledge, but chat logs are fragmented and hard for a wider audience to consume. Sanand0's generative-ai-group project addresses this by converting WhatsApp group chat logs into podcasts, which lowers the barrier to knowledge sharing and offers a new approach to redistributing community content.


Section 03

System Architecture and Core Processing Flow

The core system flow is divided into four stages:

  1. Message Segmentation and Organization: split_whatsapp_messages.py merges the JSON files and fixes format issues, then stores the messages in weekly segments anchored on Sunday (Monday-to-Saturday messages go into the current week's Sunday file; Sunday messages go into the next week's file). Messages with missing timestamps are saved to unknown-time.json;
  2. Threaded Transcription: Identify reply relationships between messages and organize them into a structured conversation context;
  3. AI Script Generation: Call the OpenAI gpt-5.4-mini model to convert the organized logs into a two-person conversation script;
  4. Speech Synthesis and Splicing: Use Gemini's gemini-3.1-flash-tts-preview interface to generate audio clips with a different voice per speaker, then splice them into a complete podcast with ffmpeg. Prompts, TTS styles, and voice characteristics can be customized in config.toml.
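
The Sunday-anchor rule in stage 1 can be sketched as follows. This is a minimal illustration of one reading of that rule, not the project's actual code: it assumes the file for a given Sunday collects the preceding Monday through Saturday, and that a message sent on a Sunday rolls over to the following Sunday's file. The function names, the `time` field, and the `YYYY-MM-DD.json` file naming are all hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from typing import Optional


def week_anchor(day: date) -> date:
    """Return the Sunday whose file a message from `day` would land in.

    Assumed rule: Monday to Saturday map to the coming Sunday, and a
    Sunday message rolls over to the following Sunday.
    """
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_ahead = (5 - day.weekday()) % 7 + 1  # always between 1 and 7
    return day + timedelta(days=days_ahead)


def bucket_messages(messages: list[dict]) -> dict[str, list[dict]]:
    """Group messages by anchor file name; untimed ones go to unknown-time.json."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for message in messages:
        stamp: Optional[str] = message.get("time")  # hypothetical field name
        if not stamp:
            buckets["unknown-time.json"].append(message)
            continue
        day = datetime.fromisoformat(stamp).date()
        buckets[f"{week_anchor(day).isoformat()}.json"].append(message)
    return dict(buckets)
```

Keeping `week_anchor` as a pure function of the date makes the bucketing rule easy to test in isolation, whichever reading of the rule the real script follows.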

Section 04

Technical Implementation Highlights

The project's technical highlights include:

  1. Pure Functions and Type Hints: The code is written in a pure-function style with Python type hints, which keeps it readable and maintainable;
  2. Environment Variable Management: API keys (OPENAI_API_KEY, GEMINI_API_KEY, etc.) are read from environment variables, which keeps sensitive values out of the code and makes configuration flexible;
  3. uv Toolchain Integration: uv is the recommended package manager; it resolves dependencies quickly and keeps environments consistent;
  4. RSS Subscription Support: A podcast.xml RSS feed is generated so that listeners can subscribe through a podcast client.
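
Highlights 1 and 2 combine naturally, as in the minimal sketch below. It is an illustration only, not the project's code: the `load_api_keys` helper and the `REQUIRED_KEYS` constant are hypothetical names, and only the two environment variable names come from the description above.

```python
from typing import Mapping

# The two key names mentioned above; the project may read others as well.
REQUIRED_KEYS: tuple[str, ...] = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def load_api_keys(
    env: Mapping[str, str],
    required: tuple[str, ...] = REQUIRED_KEYS,
) -> dict[str, str]:
    """Return the required keys from `env`, or raise if any are missing or empty.

    The environment is passed in as an argument instead of being read from
    os.environ inside the function, so the function stays pure and testable.
    """
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in required}
```

In a real script this would typically be called once at startup as `load_api_keys(os.environ)`, so a missing key fails early, before any API call is made.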

Section 05

Usage Scenarios and Value

The tool has a wide range of application scenarios:

  • Technical community operators: Convert high-quality group discussions into podcasts to extend content lifecycle;
  • Knowledge sharers: Break through the limitations of text to reach audio-consuming audiences;
  • Community members: Review real-time discussions they missed.

More broadly, this project demonstrates the potential of AI for converting content from one form to another: it reconstructs unstructured, fragmented conversations into structured narrative audio, which involves NLP tasks such as information extraction and content reorganization.

Section 06

Scalability, Customization, and CLI Design

Scalability and Customization: Via config.toml, you can modify the podcast prompts, adjust the overall TTS style, and configure a distinct voice for each speaker, adapting the output to communities with different themes.

CLI Design: The basic usage is uv run podcast.py, which automatically processes all weekly records. The tts-script subcommand takes a script file for synthesis testing, the --describe option shows interface descriptions, and the --format json option outputs structured data for easy integration.


Section 07

Summary and Insights

The generative-ai-group project is a well-built AI application that combines large language model capabilities with conventional software engineering to solve a practical content production problem. It is both a technical tool and a concrete expression of a content-operations idea: using AI to amplify the value of human discussions so that knowledge can circulate in richer forms. For developers working on AI content generation, the project offers a complete reference implementation from data preprocessing to audio output, and each stage is worth studying in detail.