Deep learning models have achieved remarkable success in fields such as image recognition and natural language processing, but their internal workings are often a "black box": researchers and developers can observe inputs and outputs, yet struggle to understand how millions of internal parameters interact to produce a result. This opacity creates several problems: model biases are hard to detect, security vulnerabilities are difficult to uncover, and model theft is hard to prevent.
In recent years, model reverse engineering has gradually become an important branch of AI security research. By analyzing a model's input-output behavior, researchers attempt to reconstruct its internal structure; this not only helps explain how the model works but also supports evaluation of its robustness and security.
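To make the idea concrete, here is a minimal sketch of the simplest form of this approach: treat the target as an opaque function that can only be queried, record its input-output pairs, and train a surrogate model to mimic that behavior. The setup is entirely illustrative; an sklearn MLP stands in for the black-box target, and a decision tree serves as the surrogate. The names and parameters are assumptions for the example, not part of any specific published attack.

```python
# Illustrative black-box model extraction: we assume only query access
# to the target's predict() function, never its weights or architecture.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-in for the opaque target model (hypothetical; in a real attack
# this would be a remote API we can only query).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
target = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                       random_state=0).fit(X, y)

# Step 1: probe the black box with synthetic queries and record outputs.
queries = rng.normal(size=(5000, 10))
labels = target.predict(queries)  # observed input-output behavior only

# Step 2: fit a surrogate model to reproduce that behavior.
surrogate = DecisionTreeClassifier(max_depth=8, random_state=0)
surrogate.fit(queries, labels)

# Step 3: measure functional agreement on fresh held-out queries.
test = rng.normal(size=(1000, 10))
agreement = accuracy_score(target.predict(test), surrogate.predict(test))
print(f"surrogate agrees with target on {agreement:.1%} of queries")
```

The agreement score measures how faithfully the surrogate reproduces the target's behavior, which is also why the same procedure doubles as a security evaluation: a high agreement from few queries suggests the model is easy to steal.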