DeepDialogue

A Multi-Turn Emotionally-Rich Spoken Dialogue Dataset

Politecnico di Torino, Kore University of Enna

Abstract

Recent advances in conversational AI have demonstrated impressive capabilities in single-turn responses, yet multi-turn dialogues remain challenging for even the most sophisticated language models. Current dialogue datasets are limited in their emotional range, domain diversity, and turn depth, and are predominantly text-only, hindering progress in developing more human-like conversational systems across modalities. To address these limitations, we present DeepDialogue, a large-scale multimodal dataset containing 40,150 high-quality multi-turn dialogues spanning 41 domains and incorporating 20 distinct emotions with coherent emotional progressions. Our approach pairs 9 different language models (4B-72B parameters) to generate 65,600 initial conversations, which we then evaluate through a combination of human annotation and LLM-based quality filtering. The resulting dataset reveals fundamental insights: smaller models fail to maintain coherence beyond 6 dialogue turns; concrete domains (e.g., "cars," "travel") yield more meaningful conversations than abstract ones (e.g., "philosophy"); and cross-model interactions produce more coherent dialogues than same-model conversations. A key contribution of DeepDialogue is its speech component, where we synthesize emotion-consistent voices for all 40,150 dialogues, creating the first large-scale open-source multimodal dialogue dataset that faithfully preserves emotional context across multi-turn conversations.

Dataset Overview

Key Features

Diverse conversation domains
Emotional annotations for each utterance
Paired audio from two different TTS models
Conversations generated from multiple LLM combinations
JSON format for easy parsing and analysis
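
Because each conversation is stored as JSON with an emotion label per utterance, a record can be inspected with the Python standard library alone. The sketch below is illustrative only: the field names (`domain`, `turns`, `speaker`, `emotion`, `text`), the emotion labels, and the sample record are assumptions made for this example, not the dataset's documented schema, and should be adjusted to match the actual files.

```python
import json
from collections import Counter

# Hypothetical record: the keys and labels below are assumed for
# illustration and may not match the real DeepDialogue schema.
SAMPLE = """
{
  "domain": "travel",
  "turns": [
    {"speaker": "A", "emotion": "excited", "text": "I just booked a trip to Lisbon!"},
    {"speaker": "B", "emotion": "curious", "text": "Nice, when do you leave?"},
    {"speaker": "A", "emotion": "excited", "text": "Next month, I can't wait."}
  ]
}
"""


def emotion_counts(conversation: dict) -> Counter:
    """Count how often each emotion label appears across the turns."""
    return Counter(turn["emotion"] for turn in conversation["turns"])


conversation = json.loads(SAMPLE)
counts = emotion_counts(conversation)
print(conversation["domain"], len(conversation["turns"]), dict(counts))
```

The same function could be applied to every file in a domain to compare emotion distributions, once the real key names are substituted.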


Research Applications

Conversational Analysis

Investigate the dynamics of human-like conversations generated by LLMs
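
One simple way to study such dynamics is to count how emotions follow one another from turn to turn. The following is a minimal sketch, assuming the per-utterance emotion labels of a conversation have already been extracted into a list; the function name and the example labels are hypothetical and not taken from the dataset.

```python
from collections import Counter


def emotion_transitions(emotions: list[str]) -> Counter:
    """Count consecutive (previous, next) emotion pairs in one dialogue."""
    return Counter(zip(emotions, emotions[1:]))


# Hypothetical per-turn labels for a single conversation.
labels = ["neutral", "curious", "curious", "happy"]
transitions = emotion_transitions(labels)

for (previous, following), n in transitions.items():
    print(f"{previous} -> {following}: {n}")
```

Summing these counters over many conversations would give an empirical transition table, which is one possible starting point for examining emotional progressions.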

Emotional Model Pretraining

Leverage large-scale data for model pretraining on synthetic emotional speech

Conversational AI

Develop more natural-sounding dialogue systems with appropriate emotional inflection

BibTeX

Complete BibTeX citation will be updated soon.

@misc{koudounas2025deepdialoguemultiturnemotionallyrichspoken,
      title={DeepDialogue: A Multi-Turn Emotionally-Rich Spoken Dialogue Dataset}, 
      author={Alkis Koudounas and Moreno La Quatra and Elena Baralis},
      year={2025},
      eprint={2505.19978},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.19978}, 
}