Section 01
GhostLM: An Open-Source Language Model Built for Cybersecurity (Introduction)
GhostLM is an open-source language model built from scratch in PyTorch and designed specifically for the cybersecurity domain. The v1.0 training data comprises 516,000 records and approximately 363 million tokens across six domains, including code, general language, and mathematical reasoning. The project aims to address limitations of general-purpose LLMs in cybersecurity work, such as shallow domain knowledge and inaccurate code understanding, and to support professional scenarios like code security auditing and threat intelligence processing. Its open-source release is also meant to encourage community collaboration, although the model still faces challenges such as keeping its knowledge current.