Architecture for parallel marker-free variable length streams decoding

Journal of Real-Time Image Processing, Vol. 16, pp. 2127-2146, 2017
Yousef Baroud, José Manuel Mariños Velarde, Zhe Wang, Steffen Kieß, Seyyed Mahdi Najmabadi, Jajnabalkya Guhathakurta, Sven Simon
Institut für Parallele und Verteilte Systeme, University of Stuttgart, Stuttgart, Germany

Abstract

Because real-time compression of modern image and video data streams requires throughputs above 1 gigapixel/s, parallel encoding and decoding are unavoidable. A well-established technique for parallel decoding is to insert markers into the variable length code (VLC) stream. By locating these markers, the decoder can extract sub-streams, which are then decoded in parallel. However, markers degrade compression, especially when a high degree of parallelism is required. In this paper, we propose an architecture for marker-free parallel decoding of VLC streams. Instead of multiple local entropy decoders, the proposed architecture uses a single parallel entropy decoder in conjunction with a novel format for constructing the VLC stream. The approach runs at high clock rates while scaling to a large number of decoders: a synthesized clock frequency well above 110 MHz is achieved for up to 20 decoders on a medium-sized FPGA.
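The marker-based baseline that the abstract contrasts against can be sketched as follows. As an assumed concrete instance, the sketch uses JPEG-style restart markers (RST0 to RST7, i.e. the byte pairs 0xFF 0xD0 to 0xFF 0xD7) and JPEG-style byte stuffing, where every literal 0xFF data byte is followed by a 0x00. The function `unstuff` is a hypothetical stand-in for a real per-sub-stream entropy decoder, and the 3.5-bit average padding in `marker_overhead_bits` is an assumption (byte-alignment padding of 0 to 7 bits, uniformly distributed). This illustrates only the prior marker-based technique, not the paper's marker-free format.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")

# JPEG-style restart markers RST0..RST7 are 0xFF followed by 0xD0..0xD7.
RST_MIN, RST_MAX = 0xD0, 0xD7


def split_at_restart_markers(stream: bytes) -> List[bytes]:
    """Split an entropy-coded stream into sub-streams at restart markers.

    Assumes JPEG-style byte stuffing: a literal 0xFF in the coded data is
    always followed by 0x00, so 0xFF followed by 0xD0..0xD7 must be a marker.
    """
    segments: List[bytes] = []
    start = 0
    i = 0
    while i < len(stream) - 1:
        if stream[i] == 0xFF and RST_MIN <= stream[i + 1] <= RST_MAX:
            segments.append(stream[start:i])  # sub-stream before the marker
            i += 2                            # skip the two marker bytes
            start = i
        else:
            i += 1
    segments.append(stream[start:])           # trailing sub-stream
    return segments


def unstuff(segment: bytes) -> bytes:
    """Hypothetical placeholder decoder: only removes stuffed 0x00 bytes."""
    out = bytearray()
    i = 0
    while i < len(segment):
        out.append(segment[i])
        if segment[i] == 0xFF and i + 1 < len(segment) and segment[i + 1] == 0x00:
            i += 2  # drop the 0x00 stuffed after a literal 0xFF
        else:
            i += 1
    return bytes(out)


def decode_in_parallel(stream: bytes, decode: Callable[[bytes], T],
                       workers: int = 4) -> List[T]:
    """Locate markers serially, then decode the sub-streams concurrently."""
    segments = split_at_restart_markers(stream)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode, segments))  # results keep stream order


def marker_overhead_bits(num_substreams: int, padding_bits: float = 3.5) -> float:
    """Bits spent on separators: N sub-streams need N - 1 markers, each
    costing 16 marker bits plus the assumed byte-alignment padding."""
    return (num_substreams - 1) * (16 + padding_bits)
```

For example, `decode_in_parallel(stream, unstuff)` returns one decoded result per sub-stream in stream order. Under these assumptions, splitting one image into 21 sub-streams costs `marker_overhead_bits(21)`, or 390 bits, and the cost grows linearly with the number of sub-streams. This is why markers hurt compression most at high degrees of parallelism.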
