LLM Digest

Story

arxiv_llm_reliability · Jun 29, 2026 · paper

Source brief

Poller: Are LLMs Suitable for Evaluating the Poetry Understanding Task?

arxiv.org · Jun 29, 2026

In brief

Traditional automatic evaluation methods have been shown to be unsuitable for modern Chinese poetry because of the distinct nature of this literary genre. Human evaluation remains reliable, but is expensive and not ap...

Feed lens

evaluation

Earlier in this thread (4 items)

Towards Continual Motion-Language Agents: LoRA Variants for Incremental Motion Understanding and Generation

A Multi-Dataset Benchmark for Evaluating LLM Agents in Microservice Failure Diagnosis

Evaluating and Enhancing Negation Comprehension in Remote Sensing MLLMs

Evaluating Offline Monitoring of Internal AI Agents