lintoto.me
URL: https://lintoto.me/
Submission: On June 11 via api from US — Scanned from DE
FANGZHENG LIN
MSc student @ Tokyo Institute of Technology, Waseda University Alumni

Contacts:
csl#lunlimited.net
lin.f.ab#m.titech.ac.jp
lin_toto#toki.waseda.jp

I am an MSc student at the CARAS Lab, Department of Information and Communications Engineering, Tokyo Institute of Technology. My current research field is Computer Architecture and Security.

I am an alumnus of the Katto Lab, School of Fundamental Science and Engineering, Waseda University. My research interests during this period were Learned Image Compression, Parallel Processing, and High Performance Computing.

Download my CV here.

SELECTED PUBLICATIONS

2023

1. Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability
Fangzheng Lin, Kasidis Arunruangsirilert, Heming Sun, and 1 more author
In International Conference on Parallel Processing (ICPP), 2023

Entropy coding is essential to data compression, image and video coding, etc. The Range variant of Asymmetric Numeral Systems (rANS) is a modern entropy coder, featuring superior speed and compression rate. As rANS is not designed for parallel execution, the conventional approach to parallel rANS partitions the input symbol sequence and encodes partitions with independent codecs, and more partitions bring extra overhead. This approach is found in state-of-the-art implementations such as DietGPU. It is unsuitable for content-delivery applications, as the parallelism is wasted if the decoder cannot decode all the partitions in parallel, but all the overhead is still transferred. To solve this, we propose Recoil, a parallel rANS decoding approach with decoder-adaptive scalability. We discover that a single rANS-encoded bitstream can be decoded from any arbitrary position if the intermediate states are known. After renormalization, these states also have a smaller upper bound, which can be stored efficiently.
We then split the encoded bitstream using a heuristic to evenly distribute the workload, and store the intermediate states and corresponding symbol indices as metadata. The splits can then be combined simply by eliminating extra metadata entries. The main contribution of Recoil is reducing unnecessary data transfer by adaptively scaling parallelism overhead to match the decoder capability. The experiments show that Recoil decoding throughput is comparable to the conventional approach, scaling massively on CPUs and GPUs and greatly outperforming various other ANS-based codecs.

2. Multistage Spatial Context Models for Learned Image Compression
Fangzheng Lin, Heming Sun, Jinming Liu, and 1 more author
In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Recent state-of-the-art Learned Image Compression methods feature spatial context models, achieving great rate-distortion (RD) improvements over hyperprior methods. However, the autoregressive context model requires serial decoding, limiting runtime performance. The Checkerboard context model allows parallel decoding at a cost of reduced RD performance. We present a series of multistage spatial context models allowing both fast decoding and better RD performance. We split the latent space into square patches and decode serially within each patch while different patches are decoded in parallel. The proposed method features a comparable decoding speed to Checkerboard while reaching, and even outperforming, the RD performance of Autoregressive. Inside each patch, the decoding order must be carefully decided, as a bad order negatively impacts performance; therefore, we also propose a decoding order optimization algorithm.

2022

1.
Streaming-Capable High-Performance Architecture of Learned Image Compression Codecs
Fangzheng Lin, Heming Sun, and Jiro Katto
In IEEE International Conference on Image Processing (ICIP), 2022

Learned image compression achieves state-of-the-art accuracy and compression ratios, but its relatively slow runtime performance limits its usage. While previous attempts at optimizing learned image codecs focused more on the neural model and entropy coding, we present an alternative method for improving the runtime performance of various learned image compression models. We introduce multi-threaded pipelining and an optimized memory model to enable asynchronous execution of GPU and CPU workloads, fully taking advantage of computational resources. Our architecture alone already produces excellent performance without any change to the neural model itself. We also demonstrate that combining our architecture with previous tweaks to the neural models can further improve runtime performance. We show that our implementations excel in throughput and latency compared to the baseline and demonstrate the performance of our implementations by creating a real-time video streaming encoder-decoder sample application, with the encoder running on an embedded device.

© Copyright 2024 Fangzheng Lin. Powered by Jekyll with al-folio theme. Hosted by GitHub Pages. Last updated: June 07, 2024.