Section 01
[Introduction] Systematic Analysis of Large Language Models for Binary Deobfuscation and the BinDeObfBench Benchmark
This paper systematically evaluates the performance of Large Language Models (LLMs) on binary deobfuscation tasks by constructing the BinDeObfBench benchmark. The key findings are: reasoning ability and domain expertise matter more than model size; supervised fine-tuning (SFT) on deobfuscation tasks outperforms general pre-training alone; and models with reasoning capabilities are more robust under heavy obfuscation and generalize better across instruction-set architectures. The release of BinDeObfBench provides a standardized evaluation foundation for research on LLM-assisted deobfuscation.