Section 01
MoRFI: A New Method to Locate the Neural Mechanism of Large Model Hallucinations
Core Idea: Large models become prone to hallucination when fine-tuned on new knowledge, and the mechanism behind this has long been unclear. MoRFI applies sparse autoencoders to residual-stream activations, identifies latent directions causally linked to hallucination, and restores the model's knowledge-retrieval ability by intervening on a single latent dimension, offering a new path to mitigating hallucinations.
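The core mechanics described above can be sketched in a toy form: encode a residual-stream vector with a sparse autoencoder, zero out one latent dimension, and patch the difference back into the residual stream. This is an illustrative sketch only, not MoRFI's actual implementation; the weights here are random stand-ins for a trained SAE, and latent index 7 is a hypothetical "hallucination" direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: residual-stream width and SAE dictionary size (real SAEs are much larger)
d_model, d_sae = 16, 64

# Random stand-ins for trained SAE weights (in practice, trained on residual activations)
W_enc = rng.standard_normal((d_model, d_sae)) / np.sqrt(d_model)
W_dec = rng.standard_normal((d_sae, d_model)) / np.sqrt(d_sae)
b_enc = np.zeros(d_sae)

def sae_encode(x):
    # ReLU encoder: a sparse latent code for a residual-stream vector
    return np.maximum(x @ W_enc + b_enc, 0.0)

def sae_decode(z):
    # Linear decoder back into the residual stream
    return z @ W_dec

def ablate_latent(x, idx):
    """Zero one SAE latent and patch only that change back into the residual stream.

    Adding (decode(z_ablated) - decode(z)) to x preserves the SAE's
    reconstruction error and removes exactly the chosen latent's contribution.
    """
    z = sae_encode(x)
    z_ablated = z.copy()
    z_ablated[..., idx] = 0.0
    return x + sae_decode(z_ablated) - sae_decode(z)

x = rng.standard_normal(d_model)   # one residual-stream activation vector
x_edit = ablate_latent(x, idx=7)   # single-dimension intervention on latent 7
```

The patching step (rather than replacing `x` with the full SAE reconstruction) is a common choice in SAE intervention work, since it keeps everything the autoencoder fails to reconstruct untouched.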