Section 01
[Main Thread Guide] BERT-Knowledge-Based-Systems: An Ensemble Solution for Domain Text Embedding Optimization
This project addresses the limitations of single pre-trained models in professional scientific literature retrieval, proposing an ensemble selection scheme for large language models based on fuzzy set methods and genetic algorithms. It improves semantic retrieval accuracy by automatically screening the optimal model subset. The core innovation lies in transforming model selection into a combinatorial optimization problem, designing a complete three-stage workflow (data processing → embedding training → ensemble optimization), and open-sourcing the code and model weights to provide a new framework for domain-adaptive text embeddings.