Datasets#

Spoken Emotion Recognition Datasets: A collection of datasets for the purpose of emotion recognition/detection in speech. The table is chronologically ordered and includes a description of the content of each dataset along with the emotions included.

SER-Datasets#
Dataset	Year	Content	Emotions	Format	Size	Language	Paper	Access	License
nEmo	2024	3 hours of samples recorded with the participation of nine actors.	6 emotions: anger, fear, happiness, sadness, surprised, and neutral.	Audio	0.434 GB	Polish	nEMO: Dataset of Emotional Speech in Polish	Open	CC BY 4.0
MDER	2024	2000 voice records of people speaking Moroccan dialect.	5 emotions: Neutral, Happy, Sad, Angry and Fearful.	Audio	0.187 GB	Arabic Moroccan	–	Open	CC BY 4.0
EMOVOME	2024	999 spontaneous voice messages from 100 Spanish speakers, collected from real conversations on a messaging app.	Valence & arousal dimensions and 7 emotions: happiness, disgust, anger, surprise, fear, sadness, and neutral.	Audio	–	Spanish	EMOVOME Database: Advancing Emotion Recognition in Speech Beyond Staged Scenarios	Partially open	CC BY 4.0
EMNS	2023	1206 high quality labeled utterances by one female speaker (2-3 hours).	Anger, excitement, disgust, happiness, surprise, sadness, and neutral (plus sarcasm)	Audio	0.042 GB	English (British)	EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels	Open	Apache 2.0
CAVES	2023	Full HD visual recordings of 10 native Cantonese speakers uttering 50 sentences.	Anger, happiness, sadness, surprise, fear, disgust and neutral	Audio	47 GB	Chinese (Cantonese)	A Cantonese Audio-Visual Emotional Speech (CAVES) dataset	Open	Available for research purposes only
BANSpEmo	2023	792 utterance recordings from 22 non-professional speakers (11 males and 11 females), covering six basic emotional reactions across two sets of sentences.	angry, disgusted, happy, surprised, sad, fear	Audio	0.555 GB	Bangla	BANSpEmo: A Bangla Emotional Speech Recognition Dataset	Open	CC BY 4.0
KBES	2023	900 audio signals from 35 actors (20 females and 15 males). Each emotion is represented with two intensity levels (low & high)	angry, disgusted, happy, neutral, sad	Audio	0.337 GB	Bangla	KBES: A dataset for realistic Bangla speech emotion recognition with intensity level	Open	CC BY 4.0
RESD	2022	Russian emotional speech dialogue dataset ~3.5 hours of actor-voiced dialogues, each ~3 minutes long, with speech files (16000 or 44100Hz), with speech-to-text transcripts	anger, disgust, fear, enthusiasm, happiness, neutral, sadness	Audio	0.48 GB	Russian	EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark	Open	MIT
Hi, KIA	2022	A shared short Wakeup Word database focusing on perceived emotion in speech. The dataset contains 488 Wakeup Word speech utterances.	angry, happy, sad, neutral	Audio	0.75 GB	Korean	Hi, KIA: A Speech Emotion Recognition Dataset for Wake-Up Words	Open	CC BY-SA 4.0
Emozionalmente	2022	6902 labeled samples acted out by 431 amateur actors while verbalizing 18 different sentences	anger, disgust, fear, joy, sadness, surprise, neutral	Audio	0.581 GB	Italian	–	Open	CC BY 4.0
BanglaSER	2022	1467 Bangla speech-audio recordings by 34 non-professional participating actors (17 male and 17 female) from diverse age groups between 19 and 47 years.	angry, happy, neutral, sad, surprise	Audio	0.425 GB	Bangla	BanglaSER: A speech emotion recognition dataset for the Bangla language	Open	CC BY 4.0
B-SER	2022	1224 speech-audio recordings by 34 non-professional participating actors (17 male and 17 female) from diverse age groups between 19 and 47 years.	angry, happy, sad and surprise	Audio	0.363 GB	Bangla	–	Open	CC BY 4.0
Kannada	2022	468 audio samples, six different sentences, pronounced by thirteen people (four male and nine female), in five basic emotions plus one neutral emotion	Anger, Sadness, Surprise, Happiness, Fear, Neutral	Audio	0.1661 GB	Kannada	–	Open	CC BY 4.0
Quechua-SER	2022	12420 audio recordings (~15 hours) and their transcriptions by 7 native speakers.	Emotional labels using dimensions: valence, arousal, and dominance.	Audio	3.53 GB	Quechua Collao	A speech corpus of Quechua Collao for automatic dimensional emotion recognition	Open	CC BY 4.0
MESD	2022	864 audio files of single-word emotional utterances with Mexican cultural shaping.	6 emotions: anger, disgust, fear, happiness, neutral, and sadness.	Audio	0.097 GB	Spanish (Mexican)	The Mexican Emotional Speech Database (MESD): elaboration and assessment based on machine learning	Open	CC BY 4.0
SyntAct	2022	Synthesized database with 997 utterances of three basic emotions and neutral expression based on rule-based manipulation for a diphone synthesizer which we release to the public	6 emotions: angry, bored, happy, neutral, sad and scared	Audio	0.941 GB	German	SyntAct: A Synthesized Database of Basic Emotions	Open	CC BY-SA 4.0
BEAT	2022	76 hours of recordings from 30 speakers in 4 languages: English (60h), Chinese (12h), Spanish (2h) and Japanese (2h).	8 emotions: happiness, anger, disgust, sadness, contempt, surprise, fear, and neutral	Audio, Video	42 GB	English, Chinese, Spanish, Japanese	A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis	Open	Non-commercial license
Dusha	2022	300000 audio recordings (~350 hours) of Russian speech, their transcripts and emotional labels. The dataset has two subsets: acted and real-life	4 emotions: angry, happy, sad and neutral. Arousal and valence metrics are also available.	Audio	58 GB	Russian	Large Raw Emotional Dataset with Aggregation Mechanism	Open	Public license with attribution and conditions reserved
MAFW	2022	10045 video-audio clips in the wild.	11 single-label emotion categories (anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment) and 32 multi-label emotion categories.	Audio, Video	–	–	MAFW: A Large-scale, Multi-modal, Compound Affective Database for Dynamic Facial Expression Recognition in the Wild	Restricted	Non-commercial research purposes
EMOVIE	2021	9724 samples with audio files and their human-labeled emotion annotations.	Polarity metrics (positive:+1, negative:-1)	Audio	0.572 GB	Chinese (Mandarin)	EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model	Open	CC BY-NC-SA 2.0
emoUERJ	2021	Ten sentences recorded by eight actors (equally divided between genders), who were free to choose which phrases to record, in four emotions (377 audio files).	happiness, anger, sadness or neutral	Audio	0.1051 GB	Portuguese (Brazilian)	–	Open	CC BY 4.0
Thorsten-Voice Dataset 2021.06 emotional	2021	2400 normalized mono recordings by one person (Thorsten Müller) representing 300 sentences.	Amusement, Disgust, Anger, Surprise and Neutral (plus drunk, whispering and sleepy states)	Audio	0.399 GB	German	–	Open	CC0: Public Domain
ASED	2021	2474 recordings by 65 participants (25 females and 40 males). Recordings were judged and rejected according to the opinion of eight judges.	Five emotions: anger, happiness, fear, sadness and neutral	Audio	0.135 GB	Amharic	A New Amharic Speech Emotion Dataset and Classification Benchmark	Open	–
ESCorpus-PE	2021	Peruvian Spanish speech gathered from Spanish interviews, TV reports, political debates and testimonials. It contains 3749 utterances from 80 speakers (44 male and 36 female), created from YouTube audio.	Valence, Arousal and Dominance	Audio	1.9 GB	Spanish (Peruvian)	–	Open	CC BY-SA 4.0
SUBESCO	2021	7000 sentence-level utterances of the Bangla language by 20 professional actors (10 males and 10 females), covering 10 sentences for 7 target emotions.	Anger, Disgust, Fear, Happiness, Neutral, Sadness and Surprise	Audio	1.7 GB	Bangla	SUST Bangla Emotional Speech Corpus (SUBESCO): An audio-only emotional speech corpus for Bangla	Open	CC BY 4.0
Audio-Speech-Sentiment	2021	Audio Speech Sentiment Dataset	4 emotions: anger, happiness, sadness, and neutral.	Audio	1.1 GB	English	–	Open	CC0: Public Domain
LSSED	2021	LSSED: A Large-Scale Dataset and Benchmark for Speech Emotion Recognition	Anger, happiness, sadness, disappointment, boredom, disgust, excitement, fear, surprise, normal, and other.	Audio	90 GB	English	LSSED: A Large-Scale Dataset and Benchmark for Speech Emotion Recognition	Restricted	–
MLEnd	2021	~32700 audio recording files produced by 154 speakers. Each audio recording corresponds to one English numeral (from “zero” to “billion”)	Intonations: neutral, bored, excited and question	Audio	2.27 GB	–	–	Open	Unknown
ASVP-ESD	2021	~13285 audio files collected from movies, TV shows and YouTube containing speech and non-speech.	12 different natural emotions (boredom, neutral, happiness, sadness, anger, fear, surprise, disgust, excitement, pleasure, pain, disappointment) with 2 levels of intensity.	Audio	2 GB	Chinese, English, French, Russian and others	–	Open	Unknown
ESD	2021	29 hours, 3500 sentences, by 10 native English speakers and 10 native Chinese speakers.	5 emotions: angry, happy, neutral, sad, and surprise.	Audio, Text	2.4 GB	Chinese, English	Seen And Unseen Emotional Style Transfer For Voice Conversion With A New Emotional Speech Dataset	Open	Academic License
MuSe-CAR	2021	40 hours, 6,000+ recordings of 25,000+ sentences by 70+ English speakers (see db link for details).	continuous emotion dimensions characterized using valence, arousal, and trustworthiness.	Audio, Video, Text	15 GB	English	The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements	Restricted	Academic License & Commercial License
THAI SER	2021	The recordings are 41 hours, 36 minutes long (27,854 utterances), and were performed by 200 professional actors (112 female, 88 male).	5 main emotions assigned to actors: Neutral, Anger, Happiness, Sadness, and Frustration.	Audio	12 GB	Thai	–	Open	CC BY-SA 4.0
French Emotional Speech Database - Oréau	2020	79 utterances with 10 to 13 utterances per emotion by 32 non-professional speakers.	7 emotions: sadness, anger, disgust, fear, surprise, joy, neutral.	Audio	0.264 GB	French	–	Open	CC BY 4.0
Att-HACK	2020	25 speakers interpreting 100 utterances in 4 social attitudes, with 3-5 repetitions each per attitude for a total of around 30 hours of speech.	expressive speech in French, 100 phrases with multiple versions (3 to 5) in four social attitudes (friendly, distant, dominant and seductive).	Audio	6.6 GB	French	Att-HACK: An Expressive Speech Database with Social Attitudes	Open	CC BY-NC-ND 4.0
MSP-Podcast corpus	2020	100 hours by over 100 speakers (see db link for details).	This corpus is annotated with emotional labels using attribute-based descriptors (activation, dominance and valence) and categorical labels (anger, happiness, sadness, disgust, surprised, fear, contempt, neutral and other).	Audio	13.4 GB	English	The MSP-Conversation Corpus	Restricted	Academic License & Commercial License
AISHELL-3	2020	Roughly 85 hours of emotion-neutral recordings spoken by 218 native Mandarin Chinese speakers, totaling 88035 utterances.	Neutral	Audio	19 GB	Chinese (Mandarin)	AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines	Open	Apache 2.0
BEASC	2020	Bangla Emotional Audio-Speech Corpus	6 emotions: anger, happiness, sadness, fear, surprise, and neutral.	Audio	9 GB	Bangla	BEASC: Bangla Emotional Audio-Speech Corpus - A Speech Emotion Recognition Corpus for the Low-Resource Bangla Language	Open	CC BY 4.0
emotiontts open db	2020	Recordings and their associated transcriptions by a diverse group of speakers.	4 emotions: general, joy, anger, and sadness.	Audio, Text	–	Korean	–	Partially open	CC BY-NC-SA 4.0
URDU-Dataset	2020	400 utterances by 38 speakers (27 male and 11 female).	4 emotions: angry, happy, neutral, and sad.	Audio	0.072 GB	Urdu	Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages	Open	–
BAVED	2020	1935 recordings by 61 speakers (45 male and 16 female).	3 levels of emotion.	Audio	0.195 GB	Arabic	–	Open	–
VIVAE	2020	non-speech, 1085 audio files by 11 speakers.	non-speech 6 emotions: achievement, anger, fear, pain, pleasure, and surprise with 4 emotional intensities (low, moderate, strong, peak).	Audio	0.0935 GB	Nonverbal (English)	The Variably Intense Vocalizations of Affect and Emotion (VIVAE) corpus prompts new perspective on nonspeech perception	Restricted	CC BY-NC-SA 4.0
VESUS	2019	252 distinct phrases, each read by 10 actors totalling 6 hours of speech.	5 emotions: anger, happiness, sadness, fear and neutral.	Audio	–	English	VESUS: A Crowd-Annotated Database to Study Emotion Production and Perception in Spoken English	Restricted	Academic EULA
Morgan Emotional Speech Set	2019	–	Valence & arousal dimensions and 4 emotions: happiness, anger, sadness, and calmness.	Audio	0.192 GB	English	Categorical and Dimensional Ratings of Emotional Speech: Behavioral Findings From the Morgan Emotional Speech Set	Open	CC BY 4.0
PMEmo	2019	Emotion annotations of 794 songs together with simultaneous electrodermal activity (EDA) signals, collected in a music emotion experiment with 457 subjects.	Valence, Arousal	Audio, EDA	1.3 GB	Chinese, English	The PMEmo Dataset for Music Emotion Recognition	Open	CC BY-SA 4.0
SEWA	2019	more than 2000 minutes of audio-visual data of 398 people (201 male and 197 female) coming from 6 cultures.	emotions are characterized using valence and arousal.	Audio, Video	–	Chinese, English, German, Greek, Hungarian and Serbian	SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild	Restricted	SEWA EULA
MELD	2019	1400 dialogues and 14000 utterances from Friends TV series by multiple speakers.	7 emotions: Anger, disgust, sadness, joy, neutral, surprise and fear. MELD also has sentiment (positive, negative and neutral) annotation for each utterance.	Audio, Video, Text	10.1 GB	English	MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations	Open	MELD: GPL-3.0 License
ShEMO	2019	3000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data from online radio plays by 87 native-Persian speakers.	6 emotions: anger, fear, happiness, sadness, neutral and surprise.	Audio	0.101 GB	Persian	ShEMO: a large-scale validated database for Persian speech emotion detection	Open	–
DEMoS	2019	9365 emotional and 332 neutral samples produced by 68 native speakers (23 females, 45 males).	7/6 emotions: anger, sadness, happiness, fear, surprise, disgust, and the secondary emotion guilt.	Audio	2.5 GB	Italian	DEMoS: An Italian emotional speech corpus. Elicitation methods, machine learning, and perception	Restricted	EULA: End User License Agreement
AESDD	2018	around 500 utterances by a diverse group of actors (over 5 actors) simulating various emotions.	5 emotions: anger, disgust, fear, happiness, and sadness.	Audio	0.392 GB	Greek	Speech Emotion Recognition for Performance Interaction	Open	–
Emov-DB	2018	Recordings for 4 speakers (2 males and 2 females).	The emotional styles are neutral, sleepiness, anger, disgust and amused.	Audio	5.88 GB	English	The emotional voices database: Towards controlling the emotion dimension in voice generation systems	Open	–
OMG Emotion	2018	420 relatively long emotion videos with an average length of 1 minute, collected from a variety of YouTube channels.	7 emotions: anger, disgust, fear, happy, sad, surprise and neutral. Plus valence, arousal.	Audio, Video	–	English	The OMG-Emotion Behavior Dataset	Open	CC BY-NC-SA 3.0
RAVDESS	2018	7356 recordings by 24 actors.	7 emotions: calm, happy, sad, angry, fearful, surprise, and disgust	Audio, Video	24.8 GB	English	The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English	Open	CC BY-NC-SA 4.0
JL corpus	2018	2400 recordings of 240 sentences by 4 actors (2 males and 2 females).	5 primary emotions: angry, sad, neutral, happy, excited. 5 secondary emotions: anxious, apologetic, pensive, worried, enthusiastic.	Audio	1.9 GB	English	An Open Source Emotional Speech Corpus for Human Robot Interaction Applications	Open	CC0 1.0
CaFE	2018	6 different sentences by 12 speakers (6 females + 6 males).	7 emotions: happy, sad, angry, fearful, surprise, disgust and neutral. Each emotion is acted in 2 different intensities.	Audio	2 GB	French (Canadian)	–	Open	CC BY-NC-SA 4.0
EmoFilm	2018	1115 audio instances of sentences extracted from various films.	5 emotions: anger, contempt, happiness, fear, and sadness.	Audio	0.277 GB	English, Italian, Spanish	Categorical vs Dimensional Perception of Italian Emotional Speech	Restricted	EULA: End User License Agreement
ANAD	2018	1384 recordings by multiple speakers.	3 emotions: angry, happy, surprised.	Audio	2 GB	Arabic	Arabic Natural Audio Dataset	Open	CC BY-NC-SA 4.0
EmoSynth	2018	144 audio files labelled by 40 listeners.	Emotion (no speech) defined in terms of valence and arousal.	Audio	0.1034 GB	–	The Perceived Emotion of Isolated Synthetic Audio: The EmoSynth Dataset and Results	Open	CC BY 4.0
CMU-MOSEI	2018	65 hours of annotated video from more than 1000 speakers and 250 topics.	6 emotions (happiness, sadness, anger, fear, disgust, surprise) + Likert scale.	Audio, Video	190.1 GB	English	Multi-attention Recurrent Network for Human Communication Comprehension	Open	CMU-MOSEI License
VERBO	2018	14 different phrases by 12 speakers (6 female + 6 male) for a total of 1167 recordings.	7 emotions: Happiness, Disgust, Fear, Neutral, Anger, Surprise, Sadness	Audio	–	Portuguese	VERBO: Voice Emotion Recognition dataBase in Portuguese Language	Restricted	Available for research purposes only
CMU-MOSI	2017	2199 opinion utterances with annotated sentiment.	Sentiment annotated between very negative to very positive in seven Likert steps.	Audio, Video	4.3 GB	English	Multi-attention Recurrent Network for Human Communication Comprehension	Open	CMU-MOSI License
MSP-IMPROV	2017	20 sentences by 12 actors.	4 emotions: angry, sad, happy, neutral (plus other and without agreement)	Audio, Video	3.4 GB	English	MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception	Restricted	Academic License & Commercial License
CREMA-D	2017	7442 clips of 12 sentences spoken by 91 actors (48 males and 43 females).	6 emotions: angry, disgusted, fearful, happy, neutral, and sad	Audio, Video	0.607 GB	English	CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset	Open	Open Database License & Database Content License
Example emotion videos used in investigation of emotion perception in schizophrenia	2017	6 videos: two example videos from each emotion category (angry, happy and neutral) by one female speaker.	3 emotions: angry, happy and neutral.	Audio, Video	0.063 GB	English	–	Open	Permitted Non-commercial Re-use with Acknowledgment
EMOVO	2014	6 actors who played 14 sentences.	6 emotions: disgust, fear, anger, joy, surprise, sadness.	Audio	0.355 GB	Italian	EMOVO Corpus: an Italian Emotional Speech Database	Open	–
RECOLA	2013	3.8 hours of recordings by 46 participants.	negative and positive sentiment (valence and arousal).	Audio, Video	–	–	Introducing the RECOLA Multimodal Corpus of Remote Collaborative and Affective Interactions	Restricted	Academic License & Commercial License
GEMEP corpus	2012	Videos of 10 actors portraying 10 states.	12 emotions: amusement, anxiety, cold anger (irritation), despair, hot anger (rage), fear (panic), interest, joy (elation), pleasure (sensory), pride, relief, and sadness. Plus, 5 additional emotions: admiration, contempt, disgust, surprise, and tenderness.	Audio, Video	–	French	Introducing the Geneva Multimodal Expression Corpus for Experimental Research on Emotion Perception	Restricted	–
OGVC	2012	9114 spontaneous utterances and 2656 acted utterances by 4 professional actors (two male and two female).	9 emotional states: fear, surprise, sadness, disgust, anger, anticipation, joy, acceptance and the neutral state.	Audio	5.3 GB	Japanese	Naturalistic emotional speech collection paradigm with online game and its psychological and acoustical assessment	Restricted	–
LEGO corpus	2012	347 dialogs with 9,083 system-user exchanges.	Emotions classified as garbage, non-angry, slightly angry and very angry.	Audio	1.1 GB	–	A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let’s Go Bus Information System	Open	License available with the data. Free of charges for research purposes only.
SEMAINE	2012	95 dyadic conversations from 21 subjects. Each subject converses with another playing one of four characters with emotions.	5 FeelTrace annotations: activation, valence, dominance, power, intensity	Audio, Video, Text	104 GB	English	The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent	Restricted	Academic EULA
SAVEE	2011	480 British English utterances by 4 male actors.	7 emotions: anger, disgust, fear, happiness, sadness, surprise and neutral.	Audio, Video	–	English (British)	Multimodal Emotion Recognition	Restricted	Free of charges for research purposes only.
TESS	2010	2800 recordings by 2 actresses.	7 emotions: anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral.	Audio	–	English	BEHAVIOURAL FINDINGS FROM THE TORONTO EMOTIONAL SPEECH SET	Open	CC BY-NC-ND 4.0
EEKK	2007	26 text passages read by 10 speakers.	4 main emotions: joy, sadness, anger and neutral.	–	0.352 GB	Estonian	Estonian Emotional Speech Corpus	Open	CC-BY license
IEMOCAP	2007	12 hours of audiovisual data by 10 actors in 5 sessions.	Full: neutral state; happiness; sadness; anger; surprise; fear; disgust; frustration; excited; other. Balance 5 emotions: happiness, anger, sadness, frustration and neutral. Three dimensions: valence, arousal, dominance	Audio, Video, Text	17.7 GB	English	IEMOCAP: Interactive emotional dyadic motion capture database	Restricted	IEMOCAP license
Keio-ESD	2006	A set of human speech with vocal emotion spoken by a Japanese male speaker.	47 emotions including angry, joyful, disgusting, downgrading, funny, worried, gentle, relief, indignation, shameful, etc.	Audio	0.0435 GB	Japanese	EMOTIONAL SPEECH SYNTHESIS USING SUBSPACE CONSTRAINTS IN PROSODY	Restricted	Available for research purposes only.
EMO-DB	2005	800 recordings spoken by 10 actors (5 males and 5 females).	7 emotions: anger, neutral, fear, boredom, happiness, sadness, disgust.	Audio	0.049 GB	German	A Database of German Emotional Speech	Open	–
eNTERFACE05	2005	Videos by 42 subjects, coming from 14 different nationalities.	6 emotions: anger, fear, surprise, happiness, sadness and disgust.	Audio, Video	0.8 GB	English	–	Open	Free of charges for research purposes only.
DES	2002	4 speakers (2 males and 2 females).	5 emotions: neutral, surprise, happiness, sadness and anger	–	–	Danish	Documentation of the Danish Emotional Speech Database	–	–
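
Because the table is tab-separated, it can be filtered programmatically, for example to shortlist open English corpora. The following is a minimal sketch, assuming the rows have been saved as tab-separated text; `SAMPLE_TSV`, `load_rows` and `select` are hypothetical names, and the sample keeps only five of the ten columns for four rows copied from the table.

```python
import csv
import io

# A few rows copied from the table above, reduced to five columns for brevity.
SAMPLE_TSV = (
    "Dataset\tYear\tLanguage\tAccess\tLicense\n"
    "nEmo\t2024\tPolish\tOpen\tCC BY 4.0\n"
    "MELD\t2019\tEnglish\tOpen\tMELD: GPL-3.0 License\n"
    "IEMOCAP\t2007\tEnglish\tRestricted\tIEMOCAP license\n"
    "EMO-DB\t2005\tGerman\tOpen\t–\n"
)


def load_rows(tsv_text):
    """Parse tab-separated text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def select(rows, language=None, access=None):
    """Keep rows whose Language contains `language` and whose Access equals `access`."""
    selected = []
    for row in rows:
        if language is not None and language.lower() not in row["Language"].lower():
            continue
        if access is not None and row["Access"].lower() != access.lower():
            continue
        selected.append(row)
    return selected


rows = load_rows(SAMPLE_TSV)
open_english = select(rows, language="English", access="Open")
print([row["Dataset"] for row in open_english])  # ['MELD']
```

The language match is a substring test so that multilingual entries such as "Chinese, English" are also found; the license column still needs to be read by hand before using any dataset.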

References#

Swain, Monorama & Routray, Aurobinda & Kabisatpathy, Prithviraj, Databases, features and classifiers for speech emotion recognition: a review, International Journal of Speech Technology, paper1
Dimitrios Ververidis and Constantine Kotropoulos, A State of the Art Review on Emotional Speech Databases, Artificial Intelligence & Information Analysis Laboratory, Department of Informatics, Aristotle University of Thessaloniki, paper2
Florian Eyben, Anton Batliner and Björn Schuller, Towards a standard set of acoustic features for the processing of emotion in speech, Acoustical Society of America, paper3
Aeluri Pramod Reddy and V Vijayarajan, Extraction of Emotions from Speech-A Survey, VIT University, International Journal of Applied Engineering Research, paper4
Emotional Speech Databases, document
Expressive Synthetic Speech, http://emosamples.syntheticspeech.de/

Contributing#

All contributions are welcome! If you know a dataset that belongs here (see criteria) but is not listed, please feel free to add it. For more information on Contributing, please refer to CONTRIBUTING.md.

If you notice a typo or a mistake, please report this as an issue and help us improve the quality of this list.

Disclaimer#

The maintainer and the contributors try their best to keep this list up-to-date, and to only include working links (using automated verification with the help of the urlchecker-action). However, we cannot guarantee that all listed links are up-to-date. Read more in DISCLAIMER.md.

Recommended tools#

Nkululeko

This toolkit has a data directory containing a Python preprocessing script for most datasets in this list. Each script splits the data into train, validation, and test sets and saves them as CSV files with file paths and labels. You can then run experiments to detect emotions from speech on that dataset with Nkululeko or other tools.
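
To illustrate the kind of preprocessing described above, here is a minimal sketch of a random train/validation/test split written out as CSV files. It is not Nkululeko's actual code: the function names, the `file`/`emotion` column names, the split fractions and the example file paths are all assumptions made for illustration.

```python
import csv
import random
import tempfile
from pathlib import Path


def split_samples(samples, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle (file, label) pairs and cut them into train/val/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    return {
        "test": shuffled[:n_test],
        "val": shuffled[n_test:n_test + n_val],
        "train": shuffled[n_test + n_val:],
    }


def write_split_csvs(splits, out_dir):
    """Write one CSV per split with a header row of file path and label."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, rows in splits.items():
        path = out_dir / f"{name}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["file", "emotion"])  # assumed column names
            writer.writerows(rows)
        paths[name] = path
    return paths


# Hypothetical file names and labels standing in for a real dataset.
labels = ["angry", "happy", "sad", "neutral"] * 5
samples = [(f"audio/clip_{i:03d}.wav", label) for i, label in enumerate(labels)]

splits = split_samples(samples)
paths = write_split_csvs(splits, tempfile.mkdtemp())
print({name: len(rows) for name, rows in splits.items()})
# {'test': 2, 'val': 2, 'train': 16}
```

A purely random split like this can place the same speaker in both train and test sets; for speaker-independent evaluation, the split would need to be grouped by speaker instead.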

ERTK

Similar to Nkululeko, ERTK (emotion recognition toolkit) also has a dataset directory that can load most datasets in this list.