After identifying the sensitive heads, CausalLens applies a three-tier intervention mechanism:
Sensitivity-Guided Intervention: Using the sensitivity scores, CausalLens selectively adjusts the outputs of high-risk attention heads, reducing their activation strength when visual evidence is insufficient.
Multi-Head Causal Intervention: Hallucinations often arise from the combined action of attention heads across multiple layers. CausalLens therefore intervenes simultaneously across a specified layer range (e.g., layers 10 to 20) so that the effect of the intervention propagates into the deeper layers of the model.
Adaptive Mixing Strategy: Completely replacing the attention output may cause information loss. CausalLens instead blends the original representation with the intervened representation, with the balance controlled by an adjustable mixing parameter (gamma_mix).
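
The three components above can be sketched together in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the CausalLens implementation: the function name intervene_heads, the threshold and damping parameters, the linear scaling rule, and the convention that gamma_mix = 1 means "fully intervened" are all hypothetical. Sensitivity scores are assumed to lie in [0, 1], and the check for insufficient visual evidence is assumed to happen before the function is called.

```python
import numpy as np

def intervene_heads(head_out, sensitivity, layer_idx,
                    layer_range=(10, 20), threshold=0.5,
                    damping=0.5, gamma_mix=0.7):
    """Hypothetical sketch of a sensitivity-guided, layer-ranged, mixed intervention.

    head_out:    array of shape (num_heads, seq_len, head_dim)
    sensitivity: array of shape (num_heads,), assumed to be in [0, 1]
    layer_idx:   index of the layer that produced head_out
    """
    lo, hi = layer_range
    # Multi-head causal intervention: only act inside the chosen layer range
    # (inclusive on both ends, an assumption).
    if not (lo <= layer_idx <= hi):
        return head_out

    # Sensitivity-guided intervention: damp heads whose score exceeds the
    # threshold; the linear rule 1 - damping * sensitivity is an assumption.
    high_risk = sensitivity > threshold
    scale = np.where(high_risk, 1.0 - damping * sensitivity, 1.0)
    intervened = head_out * scale[:, None, None]

    # Adaptive mixing: blend original and intervened outputs with gamma_mix.
    return (1.0 - gamma_mix) * head_out + gamma_mix * intervened


# Example call: two heads, the second one flagged as high-risk.
example_out = intervene_heads(np.ones((2, 3, 4)), np.array([0.2, 0.8]), layer_idx=12)
```

With gamma_mix = 0 the function returns the original output unchanged, and with gamma_mix = 1 it returns the fully damped output, so the parameter directly trades off hallucination suppression against information loss.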