Sense

This axis evaluates an AI system’s ability to detect and recognize emotional content in user input.

What We Evaluate

Can the model identify emotions that are expressed or implied in user dialogue, even when they are conveyed indirectly or masked?

Example Tasks

Metrics Used

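Macro-F1 for emotion label classification, and CCC (concordance correlation coefficient) for valence/arousal regression. Macro-F1 averages the per-class F1 scores with equal weight, so rare emotion labels count as much as frequent ones; CCC rewards predictions that both correlate with the reference ratings and match them in mean and scale.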
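As an illustration of the two metrics, the following is a minimal sketch in plain Python. It is not this benchmark's scoring code: the function names `macro_f1` and `ccc`, the sample labels, and the choice to average over every label seen in either the references or the predictions are assumptions made for the example. The CCC here uses population (divide-by-n) variance and covariance.

```python
from statistics import fmean


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); score 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return fmean(scores)


def ccc(x, y):
    """Concordance correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    var_x = sum((a - mx) * (a - mx) for a in x) / n
    var_y = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both sequences are the same constant")
    return 2 * cov / denom


# Hypothetical emotion labels and valence ratings, for illustration only.
gold_labels = ["joy", "sadness", "joy", "anger"]
pred_labels = ["joy", "sadness", "sadness", "anger"]
print(round(macro_f1(gold_labels, pred_labels), 4))

gold_valence = [1.0, 2.0, 3.0]
pred_valence = [2.0, 3.0, 4.0]
print(round(ccc(gold_valence, pred_valence), 4))
```

In the valence example the predictions are perfectly correlated with the references but shifted by a constant, so the CCC falls well below 1. That sensitivity to offset is the reason CCC, rather than plain Pearson correlation, is commonly used for valence/arousal regression.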

Constructs for Sense