Sense
This axis evaluates an AI system’s ability to detect and recognize emotional content in its input.
What We Evaluate
Can the model identify expressed or implied emotions in user dialogue, even when they are indirect or masked?
Example Tasks
- Classify a text utterance as expressing one of 10 emotion categories.
- Estimate continuous valence/arousal from user messages.
- Distinguish literal from sarcastic emotional tone.
Metrics Used
Macro-F1 for emotion label classification; concordance correlation coefficient (CCC) for valence/arousal regression.
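As a rough illustration of how these two metrics could be computed, here is a minimal standard-library sketch. The function names `macro_f1` and `ccc`, the choice to average F1 over every label seen in either the references or the predictions, the use of population (1/n) moments for CCC, and the value returned when both series are constant and equal are all assumptions made for this example, not the benchmark's actual scoring code.

```python
from collections import Counter
from statistics import fmean


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (hypothetical helper).

    Assumption: the class set is every label that appears in
    y_true or y_pred, and each class counts equally.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return fmean(scores)


def ccc(x, y):
    """Concordance correlation coefficient (hypothetical helper).

    Assumption: population (1/n) variances and covariance.
    """
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    denom = vx + vy + (mx - my) ** 2
    # Assumption: two identical constant series count as perfect agreement.
    return 2 * cov / denom if denom else 1.0
```

For instance, `ccc([1, 2, 3], [2, 3, 4])` evaluates to 4/7 (about 0.571) even though the two series are perfectly correlated, because CCC also penalizes the constant offset between predictions and references.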