Analysis of speech production real-time MRI

Computer Speech & Language, Volume 52, Pages 1-22, 2018
Vikram Ramanarayanan1,2, Sam Tilsen3, Michael Proctor4, Johannes Töger5, Louis Goldstein6, Krishna S. Nayak6, Shrikanth Narayanan6
1Educational Testing Service R&D, San Francisco, CA, United States
2University of California, San Francisco, CA, United States
3Cornell University, Ithaca, NY, United States
4Macquarie University, New South Wales, Australia
5Lund University, Lund, Sweden
6University of Southern California, Los Angeles, CA, United States
