Section 01
Influcoder: A Guide to the Efficient Data Attribution Method
Source: arXiv paper, June 2026, 'Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution' (link: http://arxiv.org/abs/2606.13668v1).
Influcoder is a data attribution method for Large Language Model (LLM) training data. Traditional influence functions are slow and carry a high storage overhead when used to attribute LLM behavior to training examples. Influcoder addresses both problems by distilling the gradient-based influence rankings produced with the decoder into a lightweight encoder. The encoder can then compute influence quickly and at low cost on large-scale datasets, which helps data attribution move from academic research toward practical applications.
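The general pattern described above, an expensive gradient-based teacher whose rankings are distilled into a cheap student scorer, can be illustrated with a toy NumPy sketch. Everything in it is an assumption for illustration and not the paper's method: the "decoder" is a fixed linear model with squared loss, the teacher score is a single-checkpoint gradient dot product (TracIn-style), and the "encoder" is a bilinear scorer over fixed quadratic input features fitted by least squares, rather than a neural encoder trained with the paper's objective. The helper names (`teacher_scores`, `student_features`, `fit_bilinear_student`, `top_k`) are hypothetical.

```python
# Hypothetical toy sketch of distilling gradient-based influence scores into a cheap scorer.
# Not the Influcoder implementation: model, teacher score, and student are all assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_query = 4, 200, 20

# Toy "decoder" (assumption): a fixed linear model with squared loss, standing in for an LLM.
w = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
y_train = X_train @ w + rng.normal(scale=0.5, size=n_train)
X_query = rng.normal(size=(n_query, d))
y_query = X_query @ w + rng.normal(scale=0.5, size=n_query)


def per_example_grads(X, y, w):
    """Gradient of 0.5 * (w.x - y)**2 with respect to w, one row per example."""
    residual = X @ w - y
    return residual[:, None] * X


def teacher_scores(Xq, yq, Xt, yt, w):
    """Expensive teacher (assumed TracIn-style): dot products of per-example gradients."""
    return per_example_grads(Xq, yq, w) @ per_example_grads(Xt, yt, w).T


def student_features(X, y):
    """Fixed quadratic features of the raw inputs; uses neither w nor any gradient."""
    outer = np.einsum("nj,nl->njl", X, X).reshape(len(X), -1)
    return np.hstack([outer, y[:, None] * X])


def fit_bilinear_student(Fq, Ft, T):
    """Least-squares fit of M so that Fq @ M @ Ft.T approximates the teacher scores T."""
    # Pair (q, i) gives one linear equation: sum_ab Fq[q, a] * M[a, b] * Ft[i, b] = T[q, i].
    A = np.einsum("qa,ib->qiab", Fq, Ft).reshape(len(Fq) * len(Ft), -1)
    m, *_ = np.linalg.lstsq(A, T.reshape(-1), rcond=None)
    return m.reshape(Fq.shape[1], Fq.shape[1])


def top_k(scores, k):
    """Indices of the k highest-scoring training examples for each query."""
    return np.argsort(-scores, axis=1)[:, :k]


T = teacher_scores(X_query, y_query, X_train, y_train, w)
Fq = student_features(X_query, y_query)
Ft = student_features(X_train, y_train)
M = fit_bilinear_student(Fq, Ft, T)

# Cheap path: training-side vectors are computed once and reused for every new query.
train_index = Ft @ M.T          # row i equals M @ Ft[i]
student = Fq @ train_index.T    # same as Fq @ M @ Ft.T, no gradients needed

overlap = np.mean([len(set(a) & set(b)) / 10 for a, b in zip(top_k(T, 10), top_k(student, 10))])
print(f"mean top-10 overlap with the teacher ranking: {overlap:.2f}")
```

Once `M` is fitted, scoring a new query needs only its input features and one matrix product against the precomputed `train_index`; no per-example gradients have to be computed or stored at query time. In this toy the teacher scores happen to be exactly representable by the bilinear student, so the fit is essentially exact, and because its gradients are only 4-dimensional it does not demonstrate any storage savings. With a real LLM a student can only approximate the teacher, so a ranking-agreement measure such as the top-10 overlap printed here is one reasonable way to check the distillation.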