Our Research

Current phase: Phase 3 – finalizing the multi-turn Promptfoo benchmarking harness, prompt weights, and scoring plans so Phase 4 can run the pilot and publish results, while we continue to back-propagate feedback from earlier axis reviews.

Phase 1 — Motivation & Literature Phase 2 — Pilot Methodology (SER axes) Phase 3 — Pilot Readiness & Reporting Prep Phase 4

Research Questions

The primary question guiding this project is what is the best way to evaluate emotional intelligence in AI systems?

Several secondary questions help explore this focus:

What frameworks have been used so far?
Where do they fall short?
What patterns can be seen in existing axes, and where are there gaps?

Bigger Picture Goal

How can we understand emotional intelligence in LLMs in a way that can help us develop prosocial AI systems for public benefit?

See the table below for the theoretical foundations that inform our methodology, or visit the README for additional context.

Phase 1 — Motivation & Literature

Goal

Establish a theoretical foundation for EQ benchmarking through literature synthesis, culminating in a peer-reviewed manuscript now under consideration.

Methodology

Systematic reviews across Philosophy, Psychology, Neuroscience, and Computer Science
Comparative analysis of existing EQ benchmarking frameworks and methodologies
Database creation detailing axes, definitions, measurement methods, and Sense, Explain, Respond (SER) alignment

The Phase 1 outputs ground the axes and terminology that inform every subsequent phase.

Phase 2 translates this outline into a reproducible, text-only, non-branching pilot over the SER axes.

Learn more about Phase 1

Phase 2 — Pilot Methodology (SER axes)

Goal

Turn the literature outline into a runnable pilot focused on the Sense, Explain, Respond (SER) axes.

Methodology

Design text-only, non-branching micro-dialogues with fixed user turns (2–5 turns per item)
Author rubrics and safety gates for SER scoring, emphasizing empathy, grounding, and guardrails
Build evaluation harnesses, datasheets, and logging for reproducible pilot runs

Phase 2 locks the pilot scope and deliverables ahead of Phase 3 execution.

Learn more about Phase 2

Phase 3 — Pilot Readiness & Reporting Prep

Goal

Finalize the text-only, non-branching SER pilot configuration—including prompts, weights, and rubric documentation—in preparation for Phase 4 model evaluations.

Methodology

Iterate on fixed-turn micro-dialogues and scoring rubrics using targeted spot-checks for clarity and safety
Lock SER subscores and the MDB-Pilot composite (0.30·Sense + 0.30·Explain + 0.40·Respond) so Phase 4 scoring is reproducible
Update safety gates and extension hooks (branching, multimodal, cultural coverage) based on findings from earlier phases

Phase 3 delivers the polished materials and reporting templates that Phase 4 will use for comprehensive pilot evaluations and public reporting.

Learn more about Phase 3

Phase 4: Empirical Pilot Execution and Iterative Refinement

Goal

Empirically validate and refine benchmarking methodology through practical evaluations.

Methodology

Empirical testing on diverse AI platforms
Quantitative data (accuracy, fairness metrics) and qualitative data (user experiences)
Mixed-methods iterative refinement

Learn more about Phase 4

Phase 5 — Benchmark Framework Development (planned)

Pending completion of the pilot in Phase 4, the team will expand into a comprehensive benchmark framework that incorporates multimodal sensing, cross-cultural validation, longitudinal tracking, and richer adaptation pathways. Detailed documentation will be published once the Phase 4 evaluations conclude.

Ethical Considerations and Transparency

Regular ethical review and IRB approval
Detailed methodological transparency and reproducibility

Theoretical Foundations

The table below is rendered from theoretical_foundations.json so that every page cites the same source metadata.

Foundational frameworks that guide SERA-X benchmark design
Framework	Description	Disciplinary origin	Reference	Role in SERA-X
Loading foundations…