Speech and Dialogue | Doctoral Program - Information Engineering and Computer Science

Speech and Dialogue

Computational Models for Analyzing Affective Behavior and Personality from Speech and Text

Firoj Alam

Email: alam [at] disi.unitn.it


Automatic analysis and summarization of affective behavior and personality from human-human interactions are becoming a central theme in many research areas, including computer science, the social sciences, and psychology. Affective behaviors are defined as short-term states that are brief in duration, arise in response to a relevant event or situation, and change rapidly over time. They include empathy, anger, frustration, satisfaction, and dissatisfaction. Personality is defined as an individual's longer-term characteristics, which are stable over time and describe the individual's true nature. In psychology, stable personality traits have been captured by the Big-5 model, which includes the following traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism.

Traditional approaches to measuring behavioral information and personality use either observer-assessed or self-assessed questionnaires. Observers usually monitor overt signals and label interactional scenarios, whereas self-assessors evaluate what they perceive from those scenarios. Using this measured behavioral and personality information, a descriptive summary is typically designed to support domain experts' decision-making. However, such a manual approach is time-consuming and expensive, which motivated us to design automated computational models. Moreover, the motivation for studying affective behavior and personality is to build a behavioral profile of an individual, from which one can understand or predict how that individual interprets or values a situation. Therefore, the aim of this work is to design automated computational models that analyze affective behaviors, such as empathy, anger, frustration, satisfaction, and dissatisfaction, as well as the Big-5 personality traits, using behavioral signals expressed in conversational interactions.

Spoken Dialog Systems

Alessandra Cervone

Email: alessandra.cervone [at] unitn.it


My research focuses on open-domain spoken dialog systems.

Deep Learning for Distant Speech Recognition

Mirco Ravanelli

Email: mirco.ravanelli [at] unitn.it


Building computers that understand speech is a crucial step towards easy-to-use human-machine interfaces. During the last decade, much research has been devoted to improving Automatic Speech Recognition (ASR) technologies, resulting in several popular applications ranging from web search to car control and radiological reporting, to name just a few. Unfortunately, most state-of-the-art systems provide satisfactory performance only in close-talking scenarios, where the user is forced to speak very close to a microphone-equipped device. Given the growing interest in speech recognition and the increasing use of this technology in everyday life, it is easy to predict that future users will prefer not to hold or wear any device to access speech recognition services. This requires technologies able to cope with distant-talking interactions, even in challenging acoustic environments. A challenging but worthwhile scenario is far-field speech recognition in the domestic environment, where users might prefer to interact freely with their home appliances without wearing or even holding any microphone-equipped device.

A promising approach to improving current distant-talking ASR systems is the use of Deep Neural Networks (DNNs). In particular, designing a suitable DNN paradigm for a multi-channel far-field scenario could help overcome the major limitations of current distant-talking technologies. To reach this ambitious goal, my efforts focus not only on studying suitable neural network architectures, but also on devising novel learning algorithms and training strategies better suited to distant-talking speech recognition.