
Last modified 10/07/2024 19:19:12 CST  


      ALIAN RESEARCH GROUP

            Computer Systems Laboratory
            School of Electrical and Computer Engineering
            Cornell University




Links: LinkedIn | GitHub


ABOUT

The computation in future datacenters will be distributed over a heterogeneous
array of processing elements, packaged modularly within a server's boundaries.
Inter- and intra-server data movement will bottleneck such a computing
landscape. The vision of ARG is to seamlessly integrate processor, memory, and
network architecture through co-design with the operating system, the network
software stack, and software libraries to minimize data movement in future
datacenters. ARG is part of the Computer Systems Lab at ECE Cornell.

We have several open Ph.D. positions and one postdoc position. Please read this
if you are interested in joining us!


CURRENT PROJECTS


SYSTEM DESIGN FOR NEAR-MEMORY ACCELERATION AT SCALE

The strict separation of responsibilities between compute and memory has given
rise to the memory wall. While there is extensive research on accelerating
application kernels near or inside memory, the adoption of such accelerators at
the system level remains an open research question. We are designing cross-stack
solutions to distribute computations to memory without intrusive hardware and
software changes, aiming to accelerate large-scale services.

 * Near-memory acceleration of datacenter taxes [MICRO'23][HPCA'24]
 * Application-transparent, near-memory distributed processing [MICRO'18]


FUSION OF ACCELERATORS, IO, AND GENERAL PURPOSE CORES

Future datacenters will integrate a diverse array of heterogeneous accelerators
alongside general-purpose cores. As applications execute on this platform, each
phase will run on either a general-purpose or specialized compute element. We
are developing solutions to seamlessly fuse these general-purpose and
specialized compute elements at runtime, creating a composable, accelerated
compute chain.

 * Data motion acceleration [HPCA'24]
 * Intelligent data direct IO [ISCA'21][MICRO'22]
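To make the idea of a composable compute chain concrete, here is a minimal software sketch, assuming an application can be split into phases that each carry a placement label. The `Stage` type, the `run_chain` and `count_handoffs` helpers, and the "cpu"/"accel" labels are hypothetical illustrations, not ARG's design; every stage runs on the host CPU here, and `zlib` merely stands in for work an accelerator might take over.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One phase of an application pipeline.

    `target` is a hypothetical placement label (e.g. "cpu" or "accel");
    in this sketch every stage actually runs in software on the host.
    """
    name: str
    target: str
    fn: Callable[[bytes], bytes]


def run_chain(stages: List[Stage], data: bytes) -> Tuple[bytes, List[str]]:
    """Feed each stage's output directly into the next stage, recording placement."""
    trace: List[str] = []
    for stage in stages:
        data = stage.fn(data)
        trace.append(f"{stage.name}@{stage.target}")
    return data, trace


def count_handoffs(stages: List[Stage]) -> int:
    """Count boundaries where data moves between different compute elements."""
    return sum(1 for prev, cur in zip(stages, stages[1:]) if prev.target != cur.target)


if __name__ == "__main__":
    chain = [
        Stage("decompress", "accel", zlib.decompress),
        Stage("transform", "cpu", bytes.upper),
        Stage("compress", "accel", zlib.compress),
    ]
    result, trace = run_chain(chain, zlib.compress(b"hello datacenter"))
    print(zlib.decompress(result))  # b'HELLO DATACENTER'
    print(trace)  # ['decompress@accel', 'transform@cpu', 'compress@accel']
    print(count_handoffs(chain))  # 2
```

Each handoff counted above is a point where, on real hardware, data would cross from one compute element to another; fusing elements at runtime aims to make those crossings cheap.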


DEVELOPMENT AND DEPLOYMENT CO-DESIGN OF MICRO-SERVICES

Due to the sheer scale of the cloud services we use every minute of our lives,
clear abstraction layers have emerged in the development and deployment of
large-scale services. While this decoupling enhances programmer productivity, it
can also lead to significant inefficiencies, especially with the ever-increasing
heterogeneity in datacenter compute fabrics. We are designing hardware and
software solutions to efficiently deploy large-scale services in the
heterogeneous cloud.

 * Profiling Service Weaver [Under Development]
 * Specialized hardware threads for networking [Under Development]


ACCELERATION OF COMPOUND AI SYSTEMS

Today's AI systems are not just LLMs; they combine LLMs with other components
that together form a Compound AI System. We are designing systems to accelerate
Compound AI Systems end to end.

 * Near-memory Acceleration of Retrieval Augmented Generation (RAG) [ASPLOS'25]
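As an illustration of what "more than an LLM" means, the sketch below shows the retrieval half of a RAG pipeline, assuming documents are already embedded as vectors and ranked by inner product. The function names, toy vectors, and prompt format are hypothetical, and the LLM call itself is not modeled. This is a generic exact (brute-force) search, not the method of the ASPLOS'25 paper; note that it scores every stored embedding for each query, so its cost grows with corpus size.

```python
import heapq
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product similarity between two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(query: Sequence[float],
                   corpus: Dict[str, Sequence[float]],
                   k: int) -> List[str]:
    """Exact search: score every stored embedding and keep the k best IDs."""
    best = heapq.nlargest(k, corpus.items(), key=lambda item: dot(query, item[1]))
    return [doc_id for doc_id, _ in best]


def build_prompt(question: str, doc_ids: List[str], texts: Dict[str, str]) -> str:
    """Prepend the retrieved passages to the question for the generation step."""
    context = "\n".join(texts[d] for d in doc_ids)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; a real system would use a learned encoder.
    embeddings = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.9, 0.1, 0.0]}
    passages = {"a": "doc A", "b": "doc B", "c": "doc C"}
    ids = retrieve_top_k([1.0, 0.0, 0.0], embeddings, k=2)
    print(ids)  # ['a', 'c']
    # The resulting prompt would then be passed to an LLM (not modeled here).
    print(build_prompt("What is RAG?", ids, passages))
```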


SPECIALIZING MEMORY SUBSYSTEM

There is significant focus on specializing compute, but less emphasis on
specializing the memory hierarchy to support that compute. We are developing a
holistic understanding of various memory and interconnection technologies to
tailor the memory subsystem accordingly.

 * Survey of various DRAM technologies [Under Development]
 * Per-bank bandwidth regulation for shared Last-Level Cache [RTSS'24]
 * Near-memory datacenter network [MICRO'19]


ARCHITECTURAL SIMULATION AND TOOL DEVELOPMENT

Software-based simulation is the backbone of computer architecture research and
development. Architectural simulators such as gem5 are widely used by academia
and industry. Traditionally, however, architectural simulators have focused
primarily on CPU and memory subsystems, often overlooking the I/O subsystem and
the complex interplay between software, OS, hardware, and
network. We are extending the gem5 simulator to model modern network
technologies and run the latest software stack. Additionally, we are using
generative AI to reduce the steep learning curve of gem5 and to increase the
productivity of design space exploration.

 * Accurate network simulation in gem5 [IISWC'18][ISPASS'20][ISPASS'24]
 * Accelerating gem5 [ISPASS'17][ISPASS'23]


PUBLICATIONS

 * Derrick Quinn, Mohammad Nouri, Neel Patel, Alireza Salimi, Sukhan Lee, Hamed
   Zamani, and Mohammad Alian, "Accelerating Retrieval-Augmented Generation,"
   ASPLOS 2025 [paper][slides]
 * Connor Sullivan, Alex Manley, Mohammad Alian, Heechul Yun, "Per-Bank
   Bandwidth Regulation of Shared Last-Level Cache for Real-Time Systems," RTSS
   2024 [paper][slides]
 * Johnson Umeike, Siddharth Agarwal, Nikita Lazarev, Mohammad Alian, "Userspace
   Networking in gem5," ISPASS 2024 [paper][open source][slides]
 * Rohan Mahapatra, Soroush Ghodrati, Byung Hoon Ahn, Sean Kinzer, Shu-Ting
   Wang, Hanyang Xu, Lavanya Karthikeyan, Hardik Sharma, Amir Yazdanbakhsh,
   Mohammad Alian, and Hadi Esmaeilzadeh, "Domain-Specific Computational Storage
   for Serverless Computing," ASPLOS 2024 [paper][slides]
 * Neel Patel, Amin Mamandipoor, Mohammad Nouri, and Mohammad Alian, "SmartDIMM:
   In-Memory Acceleration of Upper Layer I/O Protocols," HPCA 2024
   [paper][slides] [artifacts available]
 * Shu-Ting Wang, Hanyang Xu, Amin Mamandipoor, Rohan Mahapatra, Byung Hoon Ahn,
   Soroush Ghodrati, Krishnan Kailas, Mohammad Alian, Hadi Esmaeilzadeh, "Data
   Motion Acceleration: Chaining Cross-Domain Multi Accelerators," HPCA 2024
   [paper][slides]
 * Neel Patel, Amin Mamandipoor, Derrick Quinn, and Mohammad Alian, "XFM:
   Accelerated Software-Defined Far Memory," MICRO 2023 [artifacts available,
   functional, and reproduced][paper][slides]
 * Johnson Umeike, Neel Patel, Alex Manley, Amin Mamandipoor, Heechul Yun,
   Mohammad Alian, "Profiling gem5 Simulator," ISPASS 2023 [paper][slides]
 * Mohammad Alian, Siddharth Agarwal, Jongmin Shin, Neel Patel, Yifan Yuan,
   Daehoon Kim, Ren Wang, Nam Sung Kim, "IDIO: Network-driven, inbound network
   data orchestration on server processors," MICRO 2022 [paper][slides]
 * Ki-Dong Kang, Gyeongseo Park, Hyosang Kim, Mohammad Alian, Nam Sung Kim, and
   Daehoon Kim, "NMAP: Power Management Based on Network Packet Processing Mode
   Transition for Latency-Critical Workloads," MICRO 2021 [paper]
 * Yifan Yuan, Mohammad Alian, Yipeng Wang, Ilia Kurakin, Ren Wang, Charlie Tai,
   Nam Sung Kim, "Don't Forget the I/O When Allocating Your LLC," ISCA 2021
   [technology adopted by Intel®] [paper][slides]
 * Mohammad Alian, Jongmin Shin, Ki-Dong Kang, Ren Wang, Alexandros Daglis,
   Daehoon Kim, Nam Sung Kim, "IDIO: Orchestrating Inbound Network Data on Server
   Processors," IEEE Computer Architecture Letters (CAL) 2020 [paper]
 * Soroush Ghodrati, Byung Hoon Ahn, Joon Kyung Kim, Sean Kinzer, Brahmendra
   Yatham, Navateja Alla, Hardik Sharma, Mohammad Alian, Eiman Ebrahimi, Nam Sung
   Kim, Cliff Young, Hadi Esmaeilzadeh, "Planaria: Dynamic architecture fission
   for spatial multi-tenant acceleration of deep neural networks," MICRO 2020
   [paper][slides]
 * Jason Lowe-Power, Abdul Mutaal Ahmad, Ayaz Akram, Mohammad Alian, et al., "The
   gem5 simulator: Version 20.0+," arXiv preprint 2020 [paper]
 * Mohammad Alian, Yifan Yuan, Jie Zhang, Ren Wang, Myoungsoo Jung, and Nam Sung
   Kim, "Data direct I/O characterization for future I/O system exploration,"
   ISPASS 2020 [paper][slides]
 * Mohammad Alian and Nam Sung Kim, "NetDIMM: Low-latency, near-memory network
   interface architecture," MICRO 2019 [paper][slides]
 * Mohammad Alian, Seung Won Min, Hadi Asgharimoghaddam, Ashutosh Dhar, Dong Kai
   Wang, Thomas Roewer, Adam McPadden, Oliver O'Halloran, Deming Chen, Jinjun
   Xiong, Daehoon Kim, Wen-mei Hwu, and Nam Sung Kim, "Application-transparent
   near-memory processing architecture with memory channel network," MICRO 2018
   [best paper nominee][industry product] [paper][slides]
 * Youjie Li, Jongsea Park, Mohammad Alian, Yifan Yuan, Qu Zheng, Petian Pan, Ren
   Wang, Alexander Gerhard Schwing, Hadi Esmaeilzadeh, and Nam Sung Kim, "A
   network-centric hardware/algorithm co-design to accelerate distributed
   training of deep neural networks," MICRO 2018 [hardware prototype
   demonstration] [paper][slides]
 * Mohammad Alian, Krishna Parasuram Srinivasan, and Nam Sung Kim, "Simulating
   PCI-Express interconnect for future system exploration," IISWC 2018 [best
   paper nominee] [paper][slides]
 * Mohammad Alian, Gabor Dozsa, Umur Darbaz, Stephan Diestelhorst, Daehoon Kim,
   and Nam Sung Kim, "dist-gem5: Distributed simulation of computer clusters,"
   ISPASS 2017 [best paper nominee][open source] [paper][slides]
 * Mohammad Alian, Ahmed Abulila, Lokesh Jindal, Daehoon Kim, and Nam Sung Kim,
   "NCAP: Network-driven, packet context-aware power management for
   client-server architecture," HPCA 2017 [best paper nominee][IEEE Micro
   honorable mention] [paper][slides]


CURRENT MEMBERS

 * Mohammad Alian: Principal Investigator, Assistant Professor
 * Neel Maulik Patel: Ph.D. Student
 * Derrick Quinn: Ph.D. Student
 * Mohammad Nouri: Ph.D. Student
 * Huy Tran: Ph.D. Student
 * Alex Manley: M.S. Student


PAST MEMBERS

 * Johnson Umeike: M.S. Graduate; first employment: Ph.D. student at the
   University of Maryland


TOOLS

 * dist-gem5
 * DPDK and DDIO on gem5


RESEARCH SPONSORS

 * National Science Foundation
 * Semiconductor Research Corporation
 * Samsung Electronics
 * NVIDIA (Equipment Donation)
 * Ampere Computing (Equipment Donation)