Large Language Models (LLMs) such as GPT and Llama have demonstrated impressive text generation capabilities, but they share a well-known problem: "hallucination". A model may confidently generate information that sounds plausible but is factually wrong. This poses a serious challenge in applications that require high accuracy, such as medical consultation, legal advice, and news reporting.
Traditional solutions include using more powerful models, adding training data, or fine-tuning, but these methods are costly and cannot completely eliminate hallucinations. The research team at VU Amsterdam took a different approach: since the model cannot be fully prevented from hallucinating, they build a "fact audit" layer on top of the model output that verifies the accuracy of the generated content against external knowledge bases.
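To make the idea of a "fact audit" layer concrete, here is a minimal Python sketch. It is an illustration only, not the VU Amsterdam team's implementation: the knowledge base is a toy in-memory dictionary, the claims are assumed to have already been extracted from the model output as (subject, relation, object) triples, and the names `KNOWLEDGE_BASE`, `Verdict`, `audit_claim`, and `audit_output` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base mapping (subject, relation) -> object.
# A real system would query an external knowledge base instead.
KNOWLEDGE_BASE = {
    ("Amsterdam", "capital_of"): "Netherlands",
    ("Paris", "capital_of"): "France",
}


@dataclass
class Verdict:
    claim: tuple  # (subject, relation, object) as extracted from the output
    status: str   # "supported", "contradicted", or "unverifiable"


def audit_claim(claim, kb=KNOWLEDGE_BASE):
    """Check a single (subject, relation, object) claim against the knowledge base."""
    subject, relation, obj = claim
    known = kb.get((subject, relation))
    if known is None:
        # The knowledge base has no entry, so the claim can be neither confirmed nor refuted.
        return Verdict(claim, "unverifiable")
    return Verdict(claim, "supported" if known == obj else "contradicted")


def audit_output(claims, kb=KNOWLEDGE_BASE):
    """Audit every claim extracted from one piece of generated text."""
    return [audit_claim(c, kb) for c in claims]


if __name__ == "__main__":
    # Claims assumed to have been extracted from a model response.
    extracted = [
        ("Paris", "capital_of", "France"),
        ("Paris", "capital_of", "Germany"),
        ("Berlin", "capital_of", "Germany"),
    ]
    for verdict in audit_output(extracted):
        print(verdict.claim, "->", verdict.status)
```

The sketch separates "contradicted" from "unverifiable" on purpose: a claim missing from the knowledge base is not necessarily a hallucination, so an audit layer of this kind would likely want to treat the two cases differently.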