URL: https://lintoto.me/


FANGZHENG LIN

MSc student @ Tokyo Institute of Technology, Waseda University alumnus

Contacts: csl#lunlimited.net lin.f.ab#m.titech.ac.jp lin_toto#toki.waseda.jp

I am an MSc student at the CARAS Lab, Department of Information and
Communications Engineering, Tokyo Institute of Technology. My current research
field is Computer Architecture and Security.

I am an alumnus of the Katto Lab, School of Fundamental Science and Engineering,
Waseda University. My research interests during this period were Learned Image
Compression, Parallel Processing, and High Performance Computing.

Download my CV here.


SELECTED PUBLICATIONS


2023

 1. Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability
    Fangzheng Lin, Kasidis Arunruangsirilert, Heming Sun, and 1 more author
    In International Conference on Parallel Processing (ICPP), 2023
    
    Entropy coding is essential to data compression, image and video coding,
    etc. The Range variant of Asymmetric Numeral Systems (rANS) is a modern
    entropy coder, featuring superior speed and compression rate. As rANS is not
    designed for parallel execution, the conventional approach to parallel rANS
    partitions the input symbol sequence and encodes each partition with an
    independent codec, but more partitions incur extra overhead. This approach
    is found in
    state-of-the-art implementations such as DietGPU. It is unsuitable for
    content-delivery applications, as the parallelism is wasted if the decoder
    cannot decode all the partitions in parallel, but all the overhead is still
    transferred. To solve this, we propose Recoil, a parallel rANS decoding
    approach with decoder-adaptive scalability. We discover that a single
    rANS-encoded bitstream can be decoded from an arbitrary position if the
    intermediate states are known. After renormalization, these states also have
    a smaller upper bound, which can be stored efficiently. We then split the
    encoded bitstream using a heuristic to evenly distribute the workload, and
    store the intermediate states and corresponding symbol indices as metadata.
    The splits can then be combined simply by eliminating extra metadata
    entries. The main contribution of Recoil is reducing unnecessary data
    transfer by adaptively scaling parallelism overhead to match the decoder
    capability. The experiments show that Recoil decoding throughput is
    comparable to the conventional approach, scaling massively on CPUs and GPUs
    and greatly outperforming various other ANS-based codecs.
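The key observation above can be illustrated with a toy rANS codec. This is a hypothetical sketch for illustration only, not the Recoil implementation: the alphabet, frequencies, and checkpoint interval are made up, and real codecs add renormalization. The point it shows is that recording intermediate encoder states lets independent decoders start from arbitrary positions in a single bitstream.

```python
# Toy rANS: encode into one integer state, record checkpoint states,
# then decode disjoint ranges independently (i.e. in parallel).

FREQ = {"a": 3, "b": 2, "c": 3}           # made-up symbol frequencies, M = 8
M = sum(FREQ.values())
CUM, acc = {}, 0
for sym, f in FREQ.items():               # cumulative frequency table
    CUM[sym] = acc
    acc += f

def encode(symbols, checkpoint_every=4):
    """Encode all symbols into one rANS state, recording intermediate states."""
    x = 1                                 # initial state
    checkpoints = {0: x}
    for i, s in enumerate(symbols, 1):
        x = (x // FREQ[s]) * M + CUM[s] + (x % FREQ[s])
        if i % checkpoint_every == 0:
            checkpoints[i] = x            # state after encoding i symbols
    return x, checkpoints

def decode_range(state, count):
    """Decode `count` symbols backwards starting from `state` (rANS is LIFO)."""
    out, x = [], state
    for _ in range(count):
        slot = x % M
        s = next(k for k in FREQ if CUM[k] <= slot < CUM[k] + FREQ[k])
        out.append(s)
        x = FREQ[s] * (x // M) + slot - CUM[s]
    return out[::-1], x                   # symbols in original order

msg = list("abcabcba")
final, cps = encode(msg)
# Two decoders can now work independently on disjoint halves:
second_half, _ = decode_range(final, 4)   # symbols 5..8 from the final state
first_half, _ = decode_range(cps[4], 4)   # symbols 1..4 from a checkpoint
assert first_half + second_half == msg
```

In a real implementation, renormalization (omitted here) keeps the intermediate states within a small bound, which is why the checkpoint metadata can be stored compactly.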

 2. Multistage Spatial Context Models for Learned Image Compression
    Fangzheng Lin, Heming Sun, Jinming Liu, and 1 more author
    In IEEE International Conference on Acoustics, Speech and Signal Processing
    (ICASSP), 2023
    
    Recent state-of-the-art Learned Image Compression methods feature spatial
    context models, achieving great rate-distortion improvements over hyperprior
    methods. However, the autoregressive context model requires serial decoding,
    limiting runtime performance. The Checkerboard context model allows parallel
    decoding at a cost of reduced RD performance. We present a series of
    multistage spatial context models allowing both fast decoding and better RD
    performance. We split the latent space into square patches and decode
    serially within each patch while different patches are decoded in parallel.
    The proposed method features decoding speed comparable to Checkerboard
    while matching, and even surpassing, the RD performance of Autoregressive.
    Inside each patch, the decoding order must be
    carefully decided as a bad order negatively impacts performance; therefore,
    we also propose a decoding order optimization algorithm.
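The patch-wise decoding schedule described above can be sketched as follows. This is a hypothetical illustration with made-up function and parameter names, not the paper's model: it only assigns serial stage indices, whereas the real decoder conditions each stage on a learned spatial context.

```python
def decoding_stages(h, w, p, order=None):
    """Assign each latent position a serial decoding stage (lower = earlier).

    The h x w latent is tiled into p x p patches. Positions sharing the same
    intra-patch offset are decoded together across all patches, so there are
    exactly p*p serial stages regardless of the latent size.
    """
    if order is None:                 # default intra-patch order: raster scan
        order = [(dy, dx) for dy in range(p) for dx in range(p)]
    stage = [[0] * w for _ in range(h)]
    for s, (dy, dx) in enumerate(order):
        for y in range(dy, h, p):     # same offset in every patch
            for x in range(dx, w, p):
                stage[y][x] = s       # -> decoded in the same stage
    return stage

stages = decoding_stages(8, 8, p=2)
assert max(max(row) for row in stages) + 1 == 4   # 4 serial stages, not 64
```

With 2x2 patches the whole latent decodes in 4 serial stages instead of the h*w stages a fully autoregressive model needs; permuting `order` corresponds to the intra-patch decoding order that the paper's optimization algorithm searches over.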


2022

 1. Streaming-Capable High-Performance Architecture of Learned Image Compression
    Codecs
    Fangzheng Lin, Heming Sun, and Jiro Katto
    In IEEE International Conference on Image Processing (ICIP), 2022
    
    Learned image compression achieves state-of-the-art accuracy and
    compression ratios, but its relatively slow runtime performance limits its
    adoption. While previous attempts at optimizing learned image codecs
    focused mostly on the neural model and entropy coding, we present an
    alternative method for improving the runtime performance of various
    learned image compression models. We introduce multi-threaded pipelining
    and an optimized memory model to enable asynchronous execution of GPU and
    CPU workloads, fully utilizing available computational resources. Our
    architecture alone already produces excellent performance without any change
    to the neural model itself. We also demonstrate that combining our
    architecture with previous tweaks to the neural models can further improve
    runtime performance. We show that our implementations excel in throughput
    and latency compared to the baseline and demonstrate the performance of our
    implementations by creating a real-time video streaming encoder-decoder
    sample application, with the encoder running on an embedded device.
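The pipelining idea from the abstract above can be sketched with plain threads and bounded queues. This is a hypothetical illustration, not the paper's architecture: the stage names and string-tagging workloads are made-up stand-ins for CPU preprocessing, GPU inference, and entropy coding.

```python
import threading
import queue

def run_stage(fn, inq, outq):
    """Pull items from inq, apply fn, push results to outq.
    None is the end-of-stream sentinel and is propagated downstream."""
    while True:
        item = inq.get()
        if item is None:
            if outq is not None:
                outq.put(None)
            break
        result = fn(item)
        if outq is not None:
            outq.put(result)

# Bounded queues provide backpressure between stages.
q_in, q_mid, q_out = queue.Queue(2), queue.Queue(2), queue.Queue(2)
coded = []
threads = [
    threading.Thread(target=run_stage, args=(lambda f: f + "|pre", q_in, q_mid)),
    threading.Thread(target=run_stage, args=(lambda f: f + "|net", q_mid, q_out)),
    threading.Thread(target=run_stage,
                     args=(lambda f: coded.append(f + "|coded"), q_out, None)),
]
for t in threads:
    t.start()
for i in range(4):
    q_in.put("frame%d" % i)          # stages process different frames at once
q_in.put(None)                       # signal end of stream
for t in threads:
    t.join()
# coded == ["frame0|pre|net|coded", ..., "frame3|pre|net|coded"]
```

Because each stage runs in its own thread, frame N can be entropy-coded while frame N+1 is in "inference" and frame N+2 is being preprocessed, which is the source of the throughput gain the abstract describes.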


© Copyright 2024 Fangzheng Lin. Powered by Jekyll with al-folio theme. Hosted by
GitHub Pages. Last updated: June 07, 2024.