📰 Story

arxiv_llm_reliability · Jun 12, 2026 · paper


CORA: Analyzing and bridging thinking-answer gap in Multimodal RLVR via Consistency-Oriented Reasoning Alignment

In brief

Reinforcement learning with verifiable rewards (RLVR) has successfully elicited the reasoning capabilities of large language models, motivating its extension to multimodal scenarios. Existing methods primarily focus o...

evaluation
Read the original at arxiv.org
