High Parameter Frequency Resolution Encoding Scheme for Spatial Audio Objects Using a Stacked Sparse Autoencoder

Springer Science and Business Media LLC - Volume 54 - Pages 817-833 - 2021

Yulin Wu¹,², Ruimin Hu¹,², Xiaochen Wang¹,³, Chenhao Hu¹, Shanfa Ke¹

¹National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China

²Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China

³Research Institute of Wuhan University in Shenzhen, Shenzhen, China

Abstract

Object-based audio systems have become popular in recent years because they offer flexibility across many listening scenarios, such as virtual reality gaming, interactive theater, and spatial audio communication. To save bandwidth, multiple audio objects are compressed into a mono downmix signal plus side-information parameters. However, the frequency resolution of these side-information parameters is too low, which causes aliasing distortion. To overcome this problem, this paper proposes a new coding scheme based on high parameter frequency resolution (224 sub-bands per frame). The high-resolution side-information parameters are compressed and reconstructed with a stacked sparse autoencoder (SSAE), and the reconstructed parameters are then used to recover the audio objects. The proposed method is compared with existing SAOC (spatial audio object coding) methods at the same overall bitrate, using both objective and subjective evaluation. The evaluation shows that the method can deliver high-quality spatial audio objects.
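As a rough illustration of the compression step described above, the following sketch passes 224-dimensional side-information vectors through a two-layer stacked sparse autoencoder and evaluates a reconstruction loss with a KL-divergence sparsity penalty. Everything beyond the 224 sub-bands is an assumption made for illustration only: the hidden sizes (112 and 56), the sigmoid activations, the sparsity target `rho`, the weight `beta`, the random untrained weights, and the random input data are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def kl_sparsity(rho, rho_hat, eps=1e-8):
    """KL divergence between a target activation rho and mean activations rho_hat."""
    rho_hat = np.clip(rho_hat, eps, 1.0 - eps)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def init_layer(rng, n_in, n_out):
    # Small random weights and zero biases (untrained, illustration only).
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

rng = np.random.default_rng(0)
N_SUBBANDS = 224      # sub-bands per frame, as stated in the abstract
HIDDEN = [112, 56]    # hypothetical layer sizes, not from the paper

sizes = [N_SUBBANDS] + HIDDEN
encoder = [init_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]
# Decoder mirrors the encoder: 56 -> 112 -> 224.
decoder = [init_layer(rng, b, a) for a, b in zip(sizes[:-1], sizes[1:])][::-1]

def encode(x):
    """Return the compact code and the hidden activations of every encoder layer."""
    activations = []
    h = x
    for W, b in encoder:
        h = sigmoid(h @ W + b)
        activations.append(h)
    return h, activations

def decode(code):
    h = code
    for W, b in decoder:
        h = sigmoid(h @ W + b)
    return h

def loss(x, rho=0.05, beta=0.1):
    """Mean squared reconstruction error plus a weighted sparsity penalty."""
    code, acts = encode(x)
    x_hat = decode(code)
    mse = np.mean((x - x_hat) ** 2)
    penalty = sum(kl_sparsity(rho, a.mean(axis=0)) for a in acts)
    return mse + beta * penalty

# Eight frames of placeholder side-information parameters scaled to [0, 1].
frames = rng.uniform(0.0, 1.0, size=(8, N_SUBBANDS))
code, _ = encode(frames)
recon = decode(code)
print(code.shape, recon.shape, float(loss(frames)))
```

In a scheme of this kind, only the low-dimensional code would be transmitted alongside the downmix, and the decoder side would expand it back to the full 224 sub-band parameters. The sketch omits training and quantization entirely.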

Keywords

Object-based audio; spatial audio object coding; stacked sparse autoencoder; high parameter frequency resolution.

