Story
arxiv_cs_cl · Jun 25, 2026 · paper
Source brief
Paved with True Intents: Intent-Aware Training Improves LLM Safety Classification Across Training Regimes
arxiv.org · Jun 25, 2026
In brief
We argue that safety classifiers should model user intent as an explicit signal between the prompt and the final label. To study this, we introduce AIMS, a human-annotated dataset of 1,724 difficult safety prompts, ea...
Feed lens
eval