Datasets
Spoken Emotion Recognition Datasets: A collection of datasets for emotion recognition/detection in speech. The table is ordered chronologically, newest first, and describes the content of each dataset along with the emotions it covers.
Dataset |
Year |
Content |
Emotions |
Format |
Size |
Language |
Paper |
Access |
License |
---|---|---|---|---|---|---|---|---|---|
2022 |
864 audio files of single-word emotional utterances with Mexican cultural shaping. |
6 emotions: anger, disgust, fear, happiness, neutral, and sadness. |
Audio |
0.097 GB |
Spanish (Mexican) |
The Mexican Emotional Speech Database (MESD): elaboration and assessment based on machine learning |
Open |
||
2022 |
Synthesized database of three basic emotions and a neutral expression, generated by rule-based manipulation for a diphone synthesizer and released to the public. |
997 utterances including 6 emotions: angry, bored, happy, neutral, sad and scared |
Audio |
941 MB |
German |
Open |
|||
2021 |
~32700 audio recordings produced by 154 speakers. Each recording corresponds to one English numeral (from "zero" to "billion"). |
Intonations: neutral, bored, excited and question |
Audio |
2.27 GB |
– |
– |
Open |
Unknown |
|
2021 |
~13285 audio files collected from movies, TV shows, and YouTube, containing speech and non-speech. |
12 different natural emotions (boredom, neutral, happiness, sadness, anger, fear, surprise, disgust, excitement, pleasure, pain, disappointment) with 2 levels of intensity. |
Audio |
2 GB |
Chinese, English, French, Russian and others |
– |
Open |
Unknown |
|
2021 |
29 hours, 3500 sentences, by 10 native English speakers and 10 native Chinese speakers. |
5 emotions: angry, happy, neutral, sad, and surprise. |
Audio, Text |
2.4 GB (zip) |
Chinese, English |
Seen And Unseen Emotional Style Transfer For Voice Conversion With A New Emotional Speech Dataset |
Open |
Academic License |
|
2021 |
40 hours, 6,000+ recordings of 25,000+ sentences by 70+ English speakers (see db link for details). |
Continuous emotion dimensions characterized using valence, arousal, and trustworthiness. |
Audio, Video, Text |
15 GB |
English |
Restricted |
Academic License & Commercial License |
||
2020 |
100 hours by over 100 speakers (see db link for details). |
This corpus is annotated with emotional labels using attribute-based descriptors (activation, dominance and valence) and categorical labels (anger, happiness, sadness, disgust, surprised, fear, contempt, neutral and other). |
Audio |
– |
– |
Restricted |
Academic License & Commercial License |
||
2020 |
Recordings and their associated transcriptions by a diverse group of speakers. |
4 emotions: general, joy, anger, and sadness. |
Audio, Text |
– |
Korean |
– |
Partially open |
||
2020 |
400 utterances by 38 speakers (27 male and 11 female). |
4 emotions: angry, happy, neutral, and sad. |
Audio |
0.072 GB |
Urdu |
Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages |
Open |
– |
|
2020 |
1935 recordings by 61 speakers (45 male and 16 female). |
3 levels of emotion. |
Audio |
0.195 GB |
Arabic |
– |
Open |
– |
|
2020 |
Non-speech: 1085 audio files by 12 speakers. |
6 non-speech emotions: achievement, anger, fear, pain, pleasure, and surprise, each at 4 emotional intensities (low, moderate, strong, peak). |
Audio |
– |
– |
– |
Restricted |
||
2019 |
More than 2000 minutes of audio-visual data of 398 people (201 male and 197 female) from 6 cultures. |
Emotions are characterized using valence and arousal. |
Audio, Video |
– |
Chinese, English, German, Greek, Hungarian and Serbian |
SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild |
Restricted |
||
2019 |
1400 dialogues and 14000 utterances from Friends TV series by multiple speakers. |
7 emotions: Anger, disgust, sadness, joy, neutral, surprise and fear. MELD also has sentiment (positive, negative and neutral) annotation for each utterance. |
Audio, Video, Text |
10.1 GB |
English |
MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations |
Open |
||
2019 |
3000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data from online radio plays by 87 native-Persian speakers. |
6 emotions: anger, fear, happiness, sadness, neutral and surprise. |
Audio |
0.101 GB |
Persian |
ShEMO: a large-scale validated database for Persian speech emotion detection |
Open |
– |
|
2019 |
9365 emotional and 332 neutral samples produced by 68 native speakers (23 females, 45 males). |
7/6 emotions: anger, sadness, happiness, fear, surprise, disgust, and the secondary emotion guilt. |
Audio |
– |
Italian |
DEMoS: An Italian emotional speech corpus. Elicitation methods, machine learning, and perception |
Restricted |
EULA: End User License Agreement |
|
2018 |
Around 500 utterances by a diverse group of actors (over 5 actors) simulating various emotions. |
5 emotions: anger, disgust, fear, happiness, and sadness. |
Audio |
0.392 GB |
Greek |
Open |
– |
||
2018 |
Recordings by 4 speakers (2 males and 2 females). |
The emotional styles are neutral, sleepiness, anger, disgust and amused. |
Audio |
5.88 GB |
English |
The emotional voices database: Towards controlling the emotion dimension in voice generation systems |
Open |
– |
|
2018 |
7356 recordings by 24 actors. |
7 emotions: calm, happy, sad, angry, fearful, surprise, and disgust |
Audio, Video |
24.8 GB |
English |
Open |
|||
2018 |
2400 recordings of 240 sentences by 4 actors (2 males and 2 females). |
5 primary emotions: angry, sad, neutral, happy, excited. 5 secondary emotions: anxious, apologetic, pensive, worried, enthusiastic. |
Audio |
– |
English |
An Open Source Emotional Speech Corpus for Human Robot Interaction Applications |
Open |
||
2018 |
6 different sentences by 12 speakers (6 females + 6 males). |
7 emotions: happy, sad, angry, fearful, surprise, disgust and neutral. Each emotion is acted in 2 different intensities. |
Audio |
2 GB |
French (Canadian) |
– |
Open |
||
2018 |
1115 audio instances (sentences) extracted from various films. |
5 emotions: anger, contempt, happiness, fear, and sadness. |
Audio |
– |
English, Italian & Spanish |
Categorical vs Dimensional Perception of Italian Emotional Speech |
Restricted |
EULA: End User License Agreement |
|
2018 |
1384 recordings by multiple speakers. |
3 emotions: angry, happy, surprised. |
Audio |
2 GB |
Arabic |
Open |
|||
2018 |
144 audio files labelled by 40 listeners. |
Emotion (no speech) defined in terms of valence and arousal. |
Audio |
0.1034 GB |
– |
The Perceived Emotion of Isolated Synthetic Audio: The EmoSynth Dataset and Results |
Open |
||
2018 |
65 hours of annotated video from more than 1000 speakers and 250 topics. |
6 emotions (happiness, sadness, anger, fear, disgust, surprise) + Likert scale. |
Audio, Video |
– |
English |
Multi-attention Recurrent Network for Human Communication Comprehension |
Open |
||
2018 |
14 different phrases by 12 speakers (6 female + 6 male) for a total of 1167 recordings. |
7 emotions: Happiness, Disgust, Fear, Neutral, Anger, Surprise, Sadness |
Audio |
– |
Portuguese |
VERBO: Voice Emotion Recognition dataBase in Portuguese Language |
Restricted |
Available for research purposes only |
|
2017 |
2199 opinion utterances with annotated sentiment. |
Sentiment annotated between very negative to very positive in seven Likert steps. |
Audio, Video |
– |
English |
Multi-attention Recurrent Network for Human Communication Comprehension |
Open |
||
2017 |
20 sentences by 12 actors. |
4 emotions: angry, sad, happy, neutral, other, without agreement |
Audio, Video |
– |
English |
MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception |
Restricted |
Academic License & Commercial License |
|
2017 |
7442 clips of 12 sentences spoken by 91 actors (48 males and 43 females). |
6 emotions: angry, disgusted, fearful, happy, neutral, and sad |
Audio, Video |
– |
English |
Open |
|||
Example emotion videos used in investigation of emotion perception in schizophrenia |
2017 |
6 videos: two example videos from each emotion category (angry, happy and neutral) by one female speaker. |
3 emotions: angry, happy and neutral. |
Audio, Video |
0.063 GB |
English |
– |
Open |
|
2014 |
6 actors who played 14 sentences. |
6 emotions: disgust, fear, anger, joy, surprise, sadness. |
Audio |
0.355 GB |
Italian |
Open |
– |
||
2013 |
3.8 hours of recordings by 46 participants. |
negative and positive sentiment (valence and arousal). |
Audio, Video |
– |
– |
Introducing the RECOLA Multimodal Corpus of Remote Collaborative and Affective Interactions |
Restricted |
Academic License & Commercial License |
|
2012 |
Videos: 10 actors portraying 10 states. |
12 emotions: amusement, anxiety, cold anger (irritation), despair, hot anger (rage), fear (panic), interest, joy (elation), pleasure(sensory), pride, relief, and sadness. Plus, 5 additional emotions: admiration, contempt, disgust, surprise, and tenderness. |
Audio, Video |
– |
French |
Introducing the Geneva Multimodal Expression Corpus for Experimental Research on Emotion Perception |
Restricted |
– |
|
2012 |
9114 spontaneous utterances and 2656 acted utterances by 4 professional actors (two male and two female). |
9 emotional states: fear, surprise, sadness, disgust, anger, anticipation, joy, acceptance and the neutral state. |
Audio |
– |
Japanese |
Restricted |
– |
||
2012 |
347 dialogs with 9,083 system-user exchanges. |
Emotions classified as garbage, non-angry, slightly angry and very angry. |
Audio |
1.1 GB |
– |
A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let's Go Bus Information System |
Open |
License available with the data. Free of charges for research purposes only. |
|
2012 |
95 dyadic conversations from 21 subjects. Each subject converses with another playing one of four characters with emotions. |
5 FeelTrace annotations: activation, valence, dominance, power, intensity |
Audio, Video, Text |
104 GB |
English |
Restricted |
Academic EULA |
||
2011 |
480 British English utterances by 4 male actors. |
7 emotions: anger, disgust, fear, happiness, sadness, surprise and neutral. |
Audio, Video |
– |
English (British) |
Restricted |
Free of charges for research purposes only. |
||
2010 |
2800 recordings by 2 actresses. |
7 emotions: anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral. |
Audio |
– |
English |
Open |
|||
2007 |
26 text passages read by 10 speakers. |
4 main emotions: joy, sadness, anger and neutral. |
– |
0.352 GB |
Estonian |
Open |
|||
2007 |
12 hours of audiovisual data by 10 actors. |
5 emotions: happiness, anger, sadness, frustration and neutral. |
– |
– |
English |
IEMOCAP: Interactive emotional dyadic motion capture database |
Restricted |
||
2006 |
A set of human speech with vocal emotion spoken by a Japanese male speaker. |
47 emotions including angry, joyful, disgusting, downgrading, funny, worried, gentle, relief, indignation, shameful, etc. |
Audio |
– |
Japanese |
EMOTIONAL SPEECH SYNTHESIS USING SUBSPACE CONSTRAINTS IN PROSODY |
Restricted |
Available for research purposes only. |
|
2005 |
800 recordings spoken by 10 actors (5 males and 5 females). |
7 emotions: anger, neutral, fear, boredom, happiness, sadness, disgust. |
Audio |
– |
German |
Open |
– |
||
2005 |
Videos by 42 subjects, coming from 14 different nationalities. |
6 emotions: anger, fear, surprise, happiness, sadness and disgust. |
Audio, Video |
0.8 GB |
German |
– |
Open |
Free of charges for research purposes only. |
|
2002 |
4 speakers (2 males and 2 females). |
5 emotions: neutral, surprise, happiness, sadness and anger |
– |
– |
Danish |
– |
– |
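
For readers who want to filter the table programmatically, the following is a minimal sketch, assuming a few rows have been copied by hand into Python records. The names `DatasetRow`, `ROWS`, and `open_datasets` are hypothetical and not part of this repository. The paper title is used as the identifier, and a missing size in the table is stored as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetRow:
    """One table row, reduced to the columns used for filtering (hypothetical type)."""
    paper: str                # value of the "Paper" column, used here as an identifier
    year: int
    language: str
    size_gb: Optional[float]  # None when the table lists no size
    access: str               # "Open", "Restricted", ...


# A few rows copied from the table above; values are taken as listed there.
ROWS = [
    DatasetRow("Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages",
               2020, "Urdu", 0.072, "Open"),
    DatasetRow("SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild",
               2019, "Chinese, English, German, Greek, Hungarian and Serbian", None, "Restricted"),
    DatasetRow("MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations",
               2019, "English", 10.1, "Open"),
    DatasetRow("ShEMO: a large-scale validated database for Persian speech emotion detection",
               2019, "Persian", 0.101, "Open"),
]


def open_datasets(rows: List[DatasetRow],
                  max_size_gb: Optional[float] = None) -> List[DatasetRow]:
    """Return openly accessible rows, newest first.

    When max_size_gb is given, rows with an unknown size or a larger size are skipped.
    """
    selected = []
    for row in rows:
        if row.access != "Open":
            continue
        if max_size_gb is not None and (row.size_gb is None or row.size_gb > max_size_gb):
            continue
        selected.append(row)
    # Sort by year, newest first, matching the ordering of the table.
    return sorted(selected, key=lambda r: r.year, reverse=True)


if __name__ == "__main__":
    for row in open_datasets(ROWS, max_size_gb=1.0):
        print(row.year, row.language, row.size_gb, "GB")
```

Rows with no listed size are excluded whenever a size limit is requested, which is a conservative choice; a caller who prefers to keep them could relax that check.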
References
Swain, Monorama & Routray, Aurobinda & Kabisatpathy, Prithviraj, Databases, features and classifiers for speech emotion recognition: a review, International Journal of Speech Technology, paper1
Dimitrios Ververidis and Constantine Kotropoulos, A State of the Art Review on Emotional Speech Databases, Artificial Intelligence & Information Analysis Laboratory, Department of Informatics, Aristotle University of Thessaloniki, paper2
Florian Eyben, Anton Batliner and Bjoern Schuller, Towards a standard set of acoustic features for the processing of emotion in speech, Acoustical Society of America, paper3
Aeluri Pramod Reddy and V Vijayarajan, Extraction of Emotions from Speech-A Survey, VIT University, International Journal of Applied Engineering Research, paper4
Emotional Speech Databases, document
Expressive Synthetic Speech, http://emosamples.syntheticspeech.de/
Contributing
All contributions are welcome! If you know a dataset that belongs here (see criteria) but is not listed, please feel free to add it. For more information on Contributing, please refer to CONTRIBUTING.md.
If you notice a typo or a mistake, please report this as an issue and help us improve the quality of this list.
Disclaimer
The maintainer and the contributors try their best to keep this list up-to-date, and to only include working links (using automated verification with the help of the urlchecker-action). However, we cannot guarantee that all listed links are up-to-date. Read more in DISCLAIMER.md.