SERA-X axes diagram

Tentative axes:

Sense, Explain, Respond, Adapt, Extended

Sense

This axis evaluates an AI system’s ability to detect and recognize emotional content in its input.

What We Evaluate

Can the model identify expressed or implied emotions in user dialogue, even when they are indirect or masked?

Example Tasks

Metrics Used

Macro-F1 for emotion label classification; concordance correlation coefficient (CCC) for valence/arousal regression.
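As a rough illustration of how these two metrics could be computed, here is a minimal standard-library sketch. It assumes that CCC means Lin's concordance correlation coefficient with population (1/n) moments, and that macro-F1 averages over every label that appears in either the gold or the predicted labels. The function names and the sample data are hypothetical and are not taken from the benchmark itself.

```python
from statistics import fmean


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no support and no predictions scores 0
        scores.append(2 * tp / denom if denom else 0.0)
    return fmean(scores)


def ccc(x, y):
    """Concordance correlation coefficient (assumed: Lin's formula, 1/n moments)."""
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    denom = vx + vy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both series are the same constant")
    return 2 * cov / denom


# Hypothetical gold labels and predictions, for illustration only
gold = ["joy", "joy", "anger", "sad"]
pred = ["joy", "anger", "anger", "sad"]
print(round(macro_f1(gold, pred), 4))

# Predictions with perfect correlation but a constant offset are penalized by CCC
print(round(ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]), 4))
```

Unlike Pearson correlation, CCC drops below 1 when predictions are shifted or rescaled relative to the gold ratings, which is presumably why it is used for valence/arousal regression.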

Explain

This axis tests whether the system can infer the causes or appraisals that explain an emotion.

What We Evaluate

Can the system identify what situational factors or beliefs caused an emotion?

Example Tasks

Metrics Used

Appraisal vector accuracy, cause-label accuracy, and partial credit for next-emotion inference.
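The text does not define these scores, so the sketch below shows only one plausible reading. It assumes that appraisal vector accuracy is the fraction of appraisal dimensions predicted within a tolerance of the gold value, and that partial credit is the reciprocal rank of the gold next emotion in the model's ranked guesses. Both definitions, along with every name used here, are assumptions made for illustration.

```python
def appraisal_vector_accuracy(gold, pred, tol=0.0):
    """Fraction of appraisal dimensions with |pred - gold| <= tol, pooled over items.

    Assumed definition; `gold` and `pred` are lists of equal-length numeric vectors.
    """
    hits = 0
    total = 0
    for g_vec, p_vec in zip(gold, pred):
        if len(g_vec) != len(p_vec):
            raise ValueError("gold and predicted appraisal vectors differ in length")
        for g, p in zip(g_vec, p_vec):
            hits += abs(g - p) <= tol
            total += 1
    return hits / total if total else 0.0


def next_emotion_partial_credit(gold_label, ranked_predictions):
    """Reciprocal rank of the gold label: 1.0 if ranked first, 0.0 if absent.

    Hypothetical partial-credit scheme, not confirmed by the source.
    """
    for rank, label in enumerate(ranked_predictions, start=1):
        if label == gold_label:
            return 1.0 / rank
    return 0.0


# Hypothetical binary appraisal vectors (e.g. one entry per appraisal dimension)
gold_vectors = [[1, 0, 1], [0, 0, 1]]
pred_vectors = [[1, 1, 1], [0, 0, 0]]
print(appraisal_vector_accuracy(gold_vectors, pred_vectors))
print(next_emotion_partial_credit("relief", ["joy", "relief", "anger"]))
```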

Respond

This axis evaluates whether the AI can generate helpful, emotionally appropriate responses.

What We Evaluate

Do responses show empathy, helpfulness, and alignment with the user’s affective state?

Example Tasks

Metrics Used

Human Empathy Score (HES), rated on a 1–5 scale.
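How individual ratings are combined into a single HES is not stated. One simple option, sketched here as an assumption, is to average the ratings each response receives and then average across responses, so that a response with more raters does not count for more. The function name and the sample ratings are hypothetical.

```python
from statistics import fmean


def human_empathy_score(ratings_per_response):
    """Mean of per-response mean ratings; every rating must lie on the 1-5 scale."""
    if not ratings_per_response:
        raise ValueError("no responses to score")
    per_response = []
    for ratings in ratings_per_response:
        if not ratings:
            raise ValueError("each response needs at least one rating")
        if any(not 1 <= r <= 5 for r in ratings):
            raise ValueError("ratings must be between 1 and 5")
        per_response.append(fmean(ratings))
    return fmean(per_response)


# Hypothetical ratings: three raters for the first response, two for the second
print(human_empathy_score([[4, 5, 3], [2, 2]]))
```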

Adapt

This axis measures how well the model adjusts to new cultural, linguistic, or situational emotion norms.

What We Evaluate

Does the system retain old skills while adapting to new affective expressions and norms?

Example Tasks

Metrics Used

Forward-transfer gain and catastrophic forgetting, measured on “Experience Pack” tasks.
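The exact formulas are not given here. The sketch below uses definitions common in the continual-learning literature and assumes that each Experience Pack is treated as one task in a fixed sequence. `acc[i][j]` is the score on task `j` after training through task `i`, and `baseline[j]` is the score of the unadapted model on task `j`. These names and all the numbers are hypothetical.

```python
from statistics import fmean


def forward_transfer(acc, baseline):
    """Mean gain on each task just before training on it, relative to the baseline."""
    n_tasks = len(acc)
    if n_tasks < 2:
        raise ValueError("need at least two tasks")
    return fmean(acc[j - 1][j] - baseline[j] for j in range(1, n_tasks))


def forgetting(acc):
    """Mean drop from the best earlier score on each task to its final score."""
    n_tasks = len(acc)
    if n_tasks < 2:
        raise ValueError("need at least two tasks")
    final = acc[n_tasks - 1]
    drops = []
    for j in range(n_tasks - 1):
        # Best score on task j at any point from learning it up to the penultimate step
        best_earlier = max(acc[i][j] for i in range(j, n_tasks - 1))
        drops.append(best_earlier - final[j])
    return fmean(drops)


# Hypothetical scores for three packs (rows: after training pack i; columns: pack j)
acc = [
    [0.90, 0.40, 0.30],
    [0.80, 0.85, 0.50],
    [0.70, 0.80, 0.90],
]
baseline = [0.20, 0.30, 0.30]

print(round(forward_transfer(acc, baseline), 3))
print(round(forgetting(acc), 3))
```

Under these definitions, a positive forward-transfer value means earlier packs helped on packs not yet seen, and a forgetting value near zero means earlier skills were retained.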

Extended

The Extended Axis evaluates AI emotional intelligence across interactions, social contexts, and system-level effects.

What We Evaluate

This axis evaluates emotional intelligence at multiple levels of human-AI interaction, including the interaction itself, its social context, and system-level effects.

Example Tasks

Metrics Used

Human-AI dyad empathy gain, interaction cost, and scenario goal completion.