WebRTC Peer-to-Peer Video Communication
Video calls run browser-to-browser over WebRTC. Media flows directly between peers, which reduces the load on relay servers. SRTP encrypts the media in transit, the ICE framework establishes connectivity across NATs and firewalls, and bitrate and resolution adapt dynamically to network conditions to keep the call smooth.
Multimodal Emotion Reasoning Engine
Built on PyTorch, the engine integrates computer vision and natural language processing models. It extracts facial-expression feature vectors from the video stream and acoustic features from the audio stream, then models them jointly to output an emotion classification. Fusing the modalities improves accuracy and robustness.
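The joint-modeling step can be illustrated with a minimal sketch of feature-level fusion: both feature vectors are concatenated and passed through a single linear layer followed by softmax. This is a dependency-free illustration in plain Python rather than the project's PyTorch model; the label set, feature sizes, weights, and the name `fuse_and_classify` are all assumptions made for the example.

```python
import math

EMOTIONS = ["neutral", "happy", "angry"]  # hypothetical label set


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(face_features, audio_features, weights, bias):
    """Concatenate both modalities, then apply one linear layer and softmax."""
    fused = list(face_features) + list(audio_features)
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs


# Toy inputs: 2 facial features and 2 acoustic features (made-up values).
face = [0.9, 0.1]
audio = [0.2, 0.8]

# One weight row per emotion, one column per fused feature (made-up values;
# a real model would learn these during training).
W = [
    [0.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 2.0, 1.0, 0.0],
]
b = [0.5, 0.0, 0.0]

label, probs = fuse_and_classify(face, audio, W, b)
print(label, [round(p, 3) for p in probs])
```

Concatenation is only one way to fuse modalities; the description above does not say which fusion strategy the engine uses.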
Speaker-Aware Transcription System
Using voiceprint recognition, the system first performs speaker diarization, then transcribes each segment to produce speaker-labeled text. The labeled transcript supports later retrieval and per-speaker insights.
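The diarize-then-transcribe flow can be sketched as follows. The diarization output is assumed to be a list of speaker turns with start and end times, and `fake_transcribe` is a hypothetical stand-in for a real speech recognizer. Merging consecutive turns by the same speaker is an extra step assumed here for readability, not something the description above specifies.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str = ""


def label_transcript(turns, transcribe):
    """Transcribe each diarized turn; merge consecutive turns by one speaker."""
    labeled = []
    for turn in sorted(turns, key=lambda t: t.start):
        text = transcribe(turn.start, turn.end)
        if labeled and labeled[-1].speaker == turn.speaker:
            # Same speaker keeps talking: extend the previous entry.
            prev = labeled[-1]
            prev.end = turn.end
            prev.text = f"{prev.text} {text}".strip()
        else:
            labeled.append(Turn(turn.speaker, turn.start, turn.end, text))
    return labeled


# Hypothetical recognizer output, keyed by (start, end) of each turn.
FAKE_ASR = {
    (0.0, 2.5): "Let's start with the roadmap.",
    (2.5, 4.0): "First item is the release date.",
    (4.0, 7.0): "I think March is realistic.",
}


def fake_transcribe(start, end):
    """Stand-in for a real speech-to-text call on one audio segment."""
    return FAKE_ASR[(start, end)]


turns = [Turn("S1", 0.0, 2.5), Turn("S1", 2.5, 4.0), Turn("S2", 4.0, 7.0)]
lines = [
    f"[{t.speaker} {t.start:.1f}-{t.end:.1f}] {t.text}"
    for t in label_transcript(turns, fake_transcribe)
]
print("\n".join(lines))
```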
RAG-Driven Meeting Record Retrieval
Transcripts are embedded with the Nomic embedding model and stored as vectors. At query time, the system retrieves the most relevant segments and injects them into the large language model prompt to generate an answer. This supports semantic matching, and answers can be traced back to their source segments.
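A minimal sketch of the retrieve-then-prompt flow is shown below. To stay self-contained it replaces the Nomic embedding model with a toy bag-of-words vector (`toy_embed`), so it matches on shared words rather than on meaning. The segment fields, the prompt wording, and all function names are assumptions made for the example.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(segments):
    """Embed every segment once and keep the vector next to its metadata."""
    return [(seg, toy_embed(seg["text"])) for seg in segments]


def retrieve(query, index, k=2):
    """Return the k segments whose vectors are closest to the query vector."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [seg for seg, _ in ranked[:k]]


def build_prompt(query, hits):
    """Inject retrieved segments, tagged with ids so answers stay traceable."""
    context = "\n".join(f"[{h['id']}] {h['speaker']}: {h['text']}" for h in hits)
    return (
        "Answer using only the meeting excerpts below and cite excerpt ids.\n"
        f"{context}\n"
        f"Question: {query}"
    )


segments = [
    {"id": "seg-1", "speaker": "S1", "text": "The release date moves to March."},
    {"id": "seg-2", "speaker": "S2", "text": "Marketing needs the new budget by Friday."},
    {"id": "seg-3", "speaker": "S1", "text": "Lunch is at noon."},
]
index = build_index(segments)
question = "When is the release date?"
hits = retrieve(question, index, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Keeping the segment id in the prompt is what lets a generated answer point back to the exact transcript excerpt it relied on.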
AI-Generated Meeting Insights
Structured reports are generated automatically from the transcription and emotion analysis results. They cover meeting duration statistics, key topics, decision items, emotion trends, and speaking fairness, among other items. Visualizations help readers assess meeting quality at a glance.
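Two of the report fields, meeting duration and speaking fairness, can be derived directly from the speaker-labeled turns. The sketch below is one possible approach: the fairness metric (normalized entropy of each speaker's share of talk time) is an assumption chosen for illustration, since the description above does not define how fairness is measured, and `speaking_stats` is a hypothetical name.

```python
import math
from collections import defaultdict


def speaking_stats(turns):
    """Summarize a list of (speaker, start_seconds, end_seconds) turns."""
    per_speaker = defaultdict(float)
    for speaker, start, end in turns:
        per_speaker[speaker] += end - start

    total = sum(per_speaker.values())
    shares = {s: t / total for s, t in per_speaker.items()} if total else {}

    # Normalized entropy of talk-time shares (assumed metric):
    # 1.0 means perfectly even speaking time, 0.0 means a single speaker.
    if len(shares) > 1:
        entropy = -sum(p * math.log(p) for p in shares.values() if p > 0)
        balance = entropy / math.log(len(shares))
    else:
        balance = 0.0

    # Wall-clock span from the first turn's start to the last turn's end.
    duration = max(end for _, _, end in turns) - min(start for _, start, _ in turns)
    return {"duration_s": duration, "shares": shares, "balance": balance}


turns = [("S1", 0.0, 30.0), ("S2", 30.0, 45.0), ("S1", 45.0, 60.0)]
report = speaking_stats(turns)
print(report["duration_s"], report["shares"], round(report["balance"], 3))
```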