Adjustable deterministic pseudonymization of speech

Computer Speech & Language - Tập 72 - Trang 101284 - 2022
S. Pavankumar Dubagunta1,2, Rob J.J.H. van Son3,4, Mathew Magimai.-Doss1
1Idiap Research Institute, Martigny, Switzerland
2École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
3Netherlands Cancer Institute, Amsterdam, The Netherlands
4ACLC, University of Amsterdam, Amsterdam, The Netherlands

Tài liệu tham khảo

Almaadeed, 2016, Text-independent speaker identification using vowel formants, J. Signal Process. Syst., 82, 345, 10.1007/s11265-015-1005-5 Ardila, 2019 Boersma, P., Weenink, D., Praat: Doing Phonetics by Computer (Computer program). Version 6.1.06. Christensen, 2018 De Jong, 2009, Praat script to detect syllable nuclei and measure speech rate automatically, Behav. Res. Methods, 41, 385, 10.3758/BRM.41.2.385 Dromey, 2013, Assessing correlations between lingual movements and formants, Speech Commun., 55, 315, 10.1016/j.specom.2012.09.001 Eyben, 2016, The geneva minimalistic acoustic parameter set (gemaps) for voice research and affective computing, IEEE Trans. Affect. Comput., 7, 190, 10.1109/TAFFC.2015.2457417 Fang, 2019, Speaker anonymization using X-vector and neural waveform models, 155 Finck, 2020, They who must not be identified—distinguishing personal from non-personal data under the GDPR, Int. Data Privacy Law, 10, 11, 10.1093/idpl/ipz026 Fradette, 2003, Conventional and robust paired and independent-samples t tests: Type i error and power rates, J. Modern Appl. Statist. Methods, 2, 481, 10.22237/jmasm/1067646120 Harper, 2017, Quantifying labial, palatal, and pharyngeal contributions to third formant lowering in American english/r/, J. Acoust. Soc. Am., 142, 10.1121/1.5014445 Kent, 1989, Relationships between speech intelligibility and the slope of second-formant transitions in dysarthric subjects, Clinical Linguist. & Phonet., 3, 347, 10.3109/02699208908985295 Korshunov, 2017, Presentation attack detection in voice biometrics Kucur Ergunay, 2015, On the vulnerability of speaker verification to realistic voice spoofing, 1 Kung, 2018, A compressive privacy approach to generalized information bottleneck and privacy funnel problems, J. Franklin Inst. B, 355, 1846, 10.1016/j.jfranklin.2017.07.002 Lammert, 2015, On short-time estimation of vocal tract length from formant frequencies, PLOS ONE, 10, 10.1371/journal.pone.0132193 Lee, 1988, On robust linear prediction of speech, IEEE Trans. Acoust. Speech Signal Process., 36, 642, 10.1109/29.1574 Lee, 2015, Relationships between formant frequencies of sustained vowels and tongue contours measured by ultrasonography, Am. J. Speech-Lang. Pathol., 24, 739, 10.1044/2015_AJSLP-14-0063 Mawalim, 2020, X-VEctor singular value modification and statistical-based decomposition with ensemble regression modeling for speaker anonymization system, 1703 McKell, 2016 Moulines, 1990, Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones, Speech Commun., 9, 453, 10.1016/0167-6393(90)90021-Z Ning, 2019, A review of deep learning based speech synthesis, Appl. Sci., 9, 4050, 10.3390/app9194050 O’Shaughnessy, 2000, Speaker recognition, 437 Panayotov, 2015, Librispeech: an ASR corpus based on public domain audio books, 5206 Patino, 2020 Patino, 2020 Povey, 2016, Purely sequence-trained neural networks for ASR based on lattice-free mmi., 2751 R: A Language and Environment for Statistical Computing. http://www.R-project.org/. Ribeiro, 2018 Richardson, 2017, Discrimination and identification of a third formant frequency cue to place of articulation by young children and adults, Lang. Speech, 60, 27, 10.1177/0023830915625680 Rubinstein, 2016, Anonymization and risk, Wash. Law Rev., 91, 59 Rudzicz, 2012, The TORGO database of acoustic and articulatory speech from speakers with dysarthria, Lang. Res. Eval., 46, 523, 10.1007/s10579-011-9145-0 Sapir, 2010, Formant centralization ratio: A proposal for a new acoustic measure of dysarthric speech, J. Speech, Lang., Hear. Res., 114, 10.1044/1092-4388(2009/08-0184) Sapir, 2007, Effects of intensive voice treatment (the lee silverman voice treatment [lsvt]) on vowel articulation in dysarthric individuals with idiopathic parkinson disease: Acoustic and perceptual findings, J. Speech, Lang., Hear. Res., 899, 10.1044/1092-4388(2007/064) Snyder, 2018, X-vectors: Robust DNN embeddings for speaker recognition, 5329 Soldo, 2012, Synthetic references for template-based ASR using posterior features Soldo, 2011, Posterior features for template-based ASR van Son, R.J.J.H., Pseudonymize Speech, [Online; accessed 10th May 2020]. https://robvanson.github.io/PseudonymizeSpeech/. van Son, 2020 van Son, 2020 van Son, 2020 van Son, 2018, Vowel space as a tool to evaluate articulation problems., 357 Srivastava, 2020, Evaluating voice conversion-based privacy protection against informed attackers Stalla-Bourdillon, 2017, Anonymous data v. Personal data – a false debate: An EU perspective on anonymization, pseudonymization and personal data, Wis. Int. Law J., 34, 39 Tomashenko, 2020, The voiceprivacy 2020 challenge Tomashenko, N., et al., The VoicePrivacy 2020 Challenge Evaluation Plan, Online; accessed 1st April 2020], https://www.voiceprivacychallenge.org/docs/VoicePrivacy_2020_Eval_Plan_v1_2.pdf. Tomashenko, 2020, Introducing the voiceprivacy initiative, 1693 Tomashenko, 2021 Tomashenko, 2021 Ullmann, 2015, Objective speech intelligibility assessment through comparison of phoneme class conditional probability sequences, 4924 Van Son, 2001, The IFA corpus: a phonemically segmented dutch ”open source” speech database, 2051 Wang, X., et al., The VoicePrivacy 2020 Challenge Subjective evaluation-1. https://www.voiceprivacychallenge.org/docs/6_Subjective_evaluation_1_naturalness_intelligibility_speaker_verifiability_X_Wang.pdf. Accessed on 25.05.2021. Yamagishi, 2019 Zhang, 2017, Advanced data exploitation in speech analysis: An overview, IEEE Signal Process. Mag., 34, 107, 10.1109/MSP.2017.2699358