Magdalena Igras-Cybulska, Bartosz Ziółko, Piotr Żelasko, Marcin Witkowski
This paper describes statistics of pauses in Polish speech as a potential source of biometric information for automatic speaker recognition. The use of three main types of acoustic pauses (silent, filled and breath pauses) and of syntactic pauses (punctuation marks in speech transcripts) was investigated quantitatively in three types of spontaneous speech (presentations, simultaneous interpretation and......
In this article, we propose a new set of acoustic features for automatic emotion recognition from audio. The features are based on the perceptual quality metrics defined in the Perceptual Evaluation of Audio Quality (PEAQ) standard, ITU-R Recommendation BS.1387. Starting from the outer and middle ear models of the auditory system, we base our features o......
Voice activity detection (VAD) based on deep neural networks (DNN) has demonstrated good performance in adverse acoustic environments. Current DNN-based VAD optimizes a surrogate function, e.g., minimum cross-entropy or minimum squared error, at a given decision threshold. However, VAD usually works on-the-fly with a dynamic decision threshold, and the receiver operating characteristic (ROC) curv......