arxiv_llm_reliability · Jun 18, 2026 · paper
A Systematic Evaluation of Black-Box Uncertainty Estimation Methods for Large Language Models
In brief
Although large language models (LLMs) have shown strong capabilities across a wide range of tasks, their outputs often remain unreliable and may contain hallucinations, making uncertainty estimation (UE) essential for...
agent · evaluation