How Can the Factuality of Large Language Models Be Automatically Evaluated?
In December 2024, Google DeepMind introduced FACTS Grounding, a benchmark that could revolutionise the evaluation of the factuality of large language models (LLMs). It is the first benchmark to enable automated verification of responses against long source documents of up to 32,000 tokens, with a particular focus on source fidelity.
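To make the idea of automated, document-grounded verification concrete, here is a minimal Python sketch of an LLM-as-judge grounding check. The prompts, the `Judge` callable, and the helper `grounding_score` are illustrative assumptions rather than DeepMind's exact protocol; what the sketch does mirror is the published setup, in which responses are first filtered for eligibility (do they actually answer the request?) and then judged for grounding, with verdicts averaged across several frontier judge models.

```python
"""Minimal sketch of an automated grounding check in the spirit of
FACTS Grounding. `Judge` is a hypothetical hook standing in for a real
LLM API call; the prompts and aggregation are illustrative only."""

from statistics import mean
from typing import Callable

# A judge maps a prompt string to the judge model's raw text reply.
Judge = Callable[[str], str]

ELIGIBILITY_PROMPT = (
    "Does the RESPONSE actually address the REQUEST? Answer YES or NO.\n\n"
    "REQUEST:\n{request}\n\nRESPONSE:\n{response}"
)

GROUNDING_PROMPT = (
    "Is every factual claim in the RESPONSE supported by the DOCUMENT? "
    "Answer YES or NO.\n\n"
    "DOCUMENT:\n{document}\n\nRESPONSE:\n{response}"
)


def grounding_score(
    document: str, request: str, response: str, judges: list[Judge]
) -> float:
    """Average the verdicts of several judge models to reduce
    single-judge bias; ineligible responses score 0 outright."""
    scores = []
    for judge in judges:
        # Phase 1: filter out responses that dodge the request.
        eligible = judge(
            ELIGIBILITY_PROMPT.format(request=request, response=response)
        ).strip().upper().startswith("YES")
        if not eligible:
            scores.append(0.0)
            continue
        # Phase 2: check source fidelity against the long document.
        grounded = judge(
            GROUNDING_PROMPT.format(document=document, response=response)
        ).strip().upper().startswith("YES")
        scores.append(1.0 if grounded else 0.0)
    return mean(scores)
```

In practice, each entry in `judges` would wrap a call to a different frontier model, so that no single model's biases dominate the final score.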