Magdalena Igras-Cybulska, Bartosz Ziółko, Piotr Żelasko, Marcin Witkowski
Statistics of pauses in Polish speech are described as a potential source of biometric
information for automatic speaker recognition. The use of three main types of
acoustic pauses (silent, filled and breath pauses) and of syntactic pauses
(punctuation marks in speech transcripts) was investigated quantitatively in three
types of spontaneous speech (presentations, simultaneous interpretation
and...
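The per-type quantities such a study works with (how often each pause type occurs and how long it lasts, plus how densely punctuation marks syntactic pauses in a transcript) can be sketched as follows. This is only an illustration under stated assumptions: the `Pause` record, the function names and the sample data are hypothetical, pauses are assumed to be already annotated with type and time stamps, and the choice of rate per minute and marks per 100 words is ours, not necessarily the authors'.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Pause:
    # Hypothetical annotation record: pause type and time stamps in seconds.
    kind: str  # "silent", "filled" or "breath"
    start: float
    end: float


def acoustic_pause_stats(pauses, total_duration_s):
    """Per-type pause rate (pauses per minute) and mean duration (seconds)."""
    durations = defaultdict(list)
    for p in pauses:
        durations[p.kind].append(p.end - p.start)
    minutes = total_duration_s / 60.0
    return {
        kind: {"rate_per_min": len(d) / minutes, "mean_dur_s": mean(d)}
        for kind, d in durations.items()
    }


def syntactic_pause_rate(transcript):
    """Punctuation marks (syntactic pauses) per 100 words of a transcript."""
    words = re.findall(r"\w+", transcript)
    marks = re.findall(r"[.,;:!?]", transcript)
    return 100.0 * len(marks) / len(words) if words else 0.0


pauses = [
    Pause("silent", 1.0, 1.5),
    Pause("silent", 10.0, 10.3),
    Pause("filled", 20.0, 20.4),
    Pause("breath", 30.0, 30.2),
]
stats = acoustic_pause_stats(pauses, total_duration_s=60.0)
print(stats["silent"])  # rate 2.0 per minute, mean duration about 0.4 s
print(syntactic_pause_rate("To jest test, prawda? Tak."))  # 60.0
```

Per-speaker vectors of such statistics could then serve as features for a speaker model, which is the use the abstract points to.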
In this article, we propose a new set of acoustic features for automatic emotion
recognition from audio. The features are based on the perceptual quality metrics
defined in Perceptual Evaluation of Audio Quality (PEAQ), standardized as ITU-R
Recommendation BS.1387. Starting from the outer and middle ear models of the
auditory system, we base our features on the masked perceptual loudness, which
defi...
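The first stage mentioned above, the outer and middle ear model, amounts to a frequency weighting of the short-time spectrum. A minimal sketch, assuming the Terhardt-style weighting curve that, to our understanding, BS.1387 uses in its FFT-based ear model; it stops at the weighted spectrum and does not compute masking or loudness, and the function names, the 48 kHz rate and the 2048-sample frame are illustrative choices rather than the authors' configuration.

```python
import numpy as np


def outer_middle_ear_weight_db(freq_hz):
    """Outer/middle-ear frequency weighting in dB (Terhardt-style curve,
    as used in the FFT-based ear model of ITU-R BS.1387, to our understanding)."""
    f_khz = np.asarray(freq_hz, dtype=float) / 1000.0
    return (-0.6 * 3.64 * f_khz ** -0.8
            + 6.5 * np.exp(-0.6 * (f_khz - 3.3) ** 2)
            - 1e-3 * f_khz ** 3.6)


def ear_weighted_spectrum(frame, sample_rate):
    """Hann-windowed magnitude spectrum of one frame, scaled by the ear weighting."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # The weighting diverges at 0 Hz, so evaluate it no lower than 1 Hz.
    gain = 10.0 ** (outer_middle_ear_weight_db(np.maximum(freqs, 1.0)) / 20.0)
    return freqs, spectrum * gain


sr = 48000
t = np.arange(2048) / sr
freqs, weighted = ear_weighted_spectrum(np.sin(2 * np.pi * 3000 * t), sr)
print(outer_middle_ear_weight_db([100, 1000, 3300]))
```

The curve strongly attenuates low frequencies and peaks around 3 to 4 kHz, mimicking the ear-canal resonance; later stages of the PEAQ model (spreading, masking, loudness) would operate on this weighted spectrum.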
Voice activity detection (VAD) based on deep neural networks (DNNs) has
demonstrated good performance in adverse acoustic environments. Current
DNN-based VAD methods optimize a surrogate function, e.g., minimum cross-entropy
or minimum squared error, at a given decision threshold. However, VAD usually
operates on the fly with a dynamic decision threshold, and the receiver operating
characteristic (ROC) curv...
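The contrast drawn above, one fixed threshold versus performance over all thresholds, can be made concrete by comparing frame accuracy at single thresholds with the area under the ROC curve (AUC) computed from frame-level speech scores. This is a generic evaluation sketch, not the training objective proposed in the (truncated) abstract; the function names and the toy scores are hypothetical, and both classes are assumed to be present.

```python
import numpy as np


def roc_points(scores, labels):
    """(FPR, TPR) arrays obtained by sweeping the threshold over every distinct score.

    A frame is labelled speech when its score is >= the threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.unique(scores)[::-1]  # strictest threshold first
    tpr = np.array([np.mean(scores[labels] >= th) for th in thresholds])
    fpr = np.array([np.mean(scores[~labels] >= th) for th in thresholds])
    # Prepend the (0, 0) point of a threshold above every score.
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))


def roc_auc(scores, labels):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = roc_points(scores, labels)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def accuracy_at(scores, labels, threshold):
    """Frame accuracy at one fixed decision threshold."""
    decisions = np.asarray(scores) >= threshold
    return float(np.mean(decisions == np.asarray(labels, dtype=bool)))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]  # hypothetical per-frame speech posteriors
labels = [1, 1, 0, 1, 0, 0]              # 1 = speech, 0 = non-speech
print(roc_auc(scores, labels))           # 8/9, about 0.889
print(accuracy_at(scores, labels, 0.5), accuracy_at(scores, labels, 0.85))
```

Accuracy changes with the chosen threshold (5/6 at 0.5, 4/6 at 0.85 here), whereas the AUC summarizes the whole ROC curve in one number, which is why it suits a detector whose operating threshold moves at run time.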