Respond
This axis evaluates whether the AI can generate helpful, emotionally appropriate responses.
What We Evaluate
Do responses show empathy, helpfulness, and alignment with the user’s affective state?
Example Tasks
- Complete a response to a user expressing fear, sadness, or frustration.
- Choose or rephrase the most supportive reply from multiple candidates.
- Match the tone and emotional style of a reply (gentle, encouraging, validating).
Metrics Used
Human Empathy Score (HES), rated on a 1–5 scale; optional diversity and appropriateness metrics.
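As a rough illustration of how these metrics might be aggregated, the sketch below averages rater scores for a single response and computes a distinct-n ratio as one possible diversity measure. The function names `human_empathy_score` and `distinct_n` are hypothetical. The text does not specify how HES ratings are combined or which diversity metric is used, so the simple mean and the distinct-n ratio here are assumptions, not the benchmark's definition.

```python
from statistics import mean


def human_empathy_score(ratings):
    """Average the 1-5 rater scores for one response (assumed aggregation)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} is outside the 1-5 scale")
    return mean(ratings)


def distinct_n(responses, n=2):
    """Share of unique n-grams across all responses (assumed diversity proxy)."""
    ngrams = []
    for text in responses:
        tokens = text.lower().split()
        # Collect every contiguous n-token window in this response.
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```

A caller could pass the ratings from each human annotator for one response to `human_empathy_score`, and pass the full set of model responses to `distinct_n` to check whether the model falls back on the same stock phrases.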