Section 01
Foley-Omni: Introduction to the Unified Multimodal Audio Generation Model
Foley-Omni is an open-source multimodal audio generation model that generates speech, sound effects, and music from text descriptions and video content, enabling end-to-end soundtrack synthesis for video. Traditional audio production for video is time-consuming and requires professional expertise; the project aims to address both problems with a unified model architecture, lowering the barrier to audio production.