Story

arxiv_llm_reliability · Jun 29, 2026 · paper

Source brief

Poller: Are LLMs Suitable for Evaluating the Poetry Understanding Task?

arxiv.org · Jun 29, 2026

In brief

Traditional automatic evaluation methods have been shown to be unsuitable for modern Chinese poetry because of the distinct nature of this literary genre. Human evaluation remains reliable, but is expensive and not ap...

Feed lens
evaluation
