Overview of the AMMA-UQ Framework
AMMA-UQ (Adaptive Multi-Modal Attention for Uncertainty Quantification) is a framework that extends the Consistency Hypothesis work of Xiao et al. (UAI 2025). It targets uncertainty quantification for black-box LLMs and introduces three key technical innovations, aiming to produce more accurate confidence estimates from fewer samples.
The core idea of the framework is that uncertainty quantification should not be a crude "sample many times and look at the disagreement" procedure, but a refined process that intelligently fuses multi-dimensional signals and dynamically adjusts the sampling strategy.
Key Innovation 1: Adaptive Sampling Strategy
Traditional methods typically draw a fixed number of samples (e.g., 10 or 20) regardless of the actual difficulty of the question. AMMA-UQ breaks this paradigm with a dynamic sampling mechanism based on entropy stabilization.
Specifically, the framework continuously monitors the entropy of the output distribution as samples arrive. When the entropy stabilizes, i.e., new samples no longer significantly change the characteristics of the output distribution, sampling stops automatically. This strategy yields significant efficiency gains: experiments show that, compared to fixed-budget sampling, AMMA-UQ reduces the number of samples required by 48.7% on average while maintaining or even improving quantification accuracy.
The value of this adaptive mechanism is a more sensible allocation of compute: easy questions get fewer samples and hard ones get more, instead of a one-size-fits-all budget.
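The stopping rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `draw_sample` and `cluster_of` callables, the `eps` threshold, and the `patience` window are all hypothetical stand-ins for an LLM call, a response-clustering step, and whatever stabilization criterion AMMA-UQ actually uses.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy of a cluster-count distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def adaptive_sample(draw_sample, cluster_of, max_samples=20, eps=0.01, patience=3):
    """Draw samples until the entropy of the cluster distribution stabilizes.

    draw_sample: callable returning one model output (stand-in for an LLM call).
    cluster_of:  maps an output to a cluster id (e.g. by semantic equivalence).
    Stops once the entropy change stays below eps for `patience` consecutive
    samples, or when max_samples is reached.
    """
    counts = Counter()
    samples = []
    prev_h, stable = 0.0, 0
    for _ in range(max_samples):
        s = draw_sample()
        samples.append(s)
        counts[cluster_of(s)] += 1
        h = entropy(counts)
        if abs(h - prev_h) < eps:
            stable += 1
            if stable >= patience:
                break  # entropy has stabilized: stop sampling early
        else:
            stable = 0
        prev_h = h
    return samples
```

A model that answers consistently triggers the early stop almost immediately, while a model whose answers keep shifting the distribution uses the full budget, which is exactly the easy-vs-hard allocation described above.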
Key Innovation 2: Multi-Modal Similarity Fusion
A single similarity metric is often insufficient to capture the full range of differences between text outputs. AMMA-UQ therefore fuses three complementary similarity signals:
Lexical Similarity: Based on traditional metrics like ROUGE-L, it measures the degree of lexical overlap between output texts. These metrics are computationally efficient and sensitive to surface-level changes.
Semantic Similarity: Using pre-trained models like SBERT, it captures the distance between outputs in semantic embedding space. This lets the framework recognize outputs that are semantically equivalent but worded differently.
Task-Specific Similarity: Metrics designed for a particular task, such as answer-correctness agreement in question answering or information-coverage evaluation in summarization.
By fusing these three types of signals, AMMA-UQ constructs a more robust and comprehensive representation of output similarity than any single metric provides.
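The fusion above can be sketched as a weighted combination of the three signals. The ROUGE-L implementation below is a standard LCS-based F1; the `semantic_sim` and `task_sim` callables and the equal default weights are illustrative assumptions, since the source does not specify how the signals are combined (in AMMA-UQ the combination is presumably learned, see the attention mechanism below).

```python
def lcs_len(a, b):
    """Length of the longest common subsequence between two token lists."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[m][n]

def rouge_l(a, b):
    """ROUGE-L F1 between two strings (the lexical signal)."""
    ta, tb = a.split(), b.split()
    lcs = lcs_len(ta, tb)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(tb), lcs / len(ta)
    return 2 * p * r / (p + r)

def fused_similarity(a, b, semantic_sim, task_sim, weights=(1/3, 1/3, 1/3)):
    """Combine lexical, semantic, and task-specific signals into one score.

    semantic_sim and task_sim are caller-supplied callables (e.g. SBERT
    cosine similarity and an exact-answer match); both names and the
    equal weighting are illustrative, not taken from the paper.
    """
    sims = (rouge_l(a, b), semantic_sim(a, b), task_sim(a, b))
    return sum(w * s for w, s in zip(weights, sims))
```

In practice one would swap the placeholder callables for a real embedding model and a task-appropriate checker, and let the weights be fit rather than fixed.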
Key Innovation 3: Attention Mechanism Aggregation
After obtaining the multi-modal similarity matrix, AMMA-UQ introduces an attention mechanism to learn discriminative weight allocation.
Unlike simple averaging or weighted summation, the attention layer automatically learns how important different sample pairs are. In the concrete implementation, the framework uses an attention network with a hidden-layer dimension of 64: the input is pairwise similarity features and the output is aggregation weights. This data-driven aggregation lets the framework adapt both the weights of the different similarity signals and the contribution of each sample pair to the specific task and data at hand.
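A minimal sketch of this aggregation step is shown below. The hidden dimension of 64 matches the figure stated above; everything else (tanh activation, random initialization, reducing the attention-weighted features to a single confidence score) is an illustrative assumption, since in AMMA-UQ these parameters would be learned rather than fixed.

```python
import numpy as np

def attention_aggregate(pair_feats, rng=None):
    """Aggregate pairwise similarity features into one confidence score.

    pair_feats: array of shape (num_pairs, num_signals), one row of
    multi-modal similarities per sample pair. A one-hidden-layer scorer
    (hidden dim 64) assigns each pair an attention weight via softmax;
    the weights here are randomly initialized for illustration only.
    """
    rng = rng or np.random.default_rng(0)
    d = pair_feats.shape[1]
    W1 = rng.normal(scale=0.1, size=(d, 64))   # input -> hidden (dim 64)
    w2 = rng.normal(scale=0.1, size=64)        # hidden -> scalar score
    scores = np.tanh(pair_feats @ W1) @ w2     # one score per sample pair
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                         # softmax attention weights
    # confidence estimate: attention-weighted mean of the fused similarities
    return float(attn @ pair_feats.mean(axis=1))
```

With identical feature rows the softmax degenerates to uniform weights, recovering plain averaging; the learned version departs from that baseline only where the data justifies it, which is the point of making the aggregation data-driven.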