@mdaew.dcgnuudim.raanet [unscramble]


I am a Research MSc student at Mila, affiliated with the University of Montréal, supervised by Irina Rish.

I am also the Founder and President at Landskape AI, a theoretical and analytical deep learning non-profit research organization.

I broadly work on robust and interpretable learning under constraints, with a focus on, but not limited to, the following domains:

 * Diffusion Models and Prompting
 * Compression (MoE, Sparsity)
 * Vision Language Models
 * Continual Learning



I am always open to collaboration. Feel free to set up a call with me if you would like to discuss my current research or new and interesting ideas!

Looking for PhD positions for the Fall 2024 cohort.

CV  /  Google Scholar  /  GitHub  /  Blog  /  Certificates




NEWS



 * October 2023: Our work on Mitigating Mode Collapse in Sparse Mixture of
   Experts is accepted to New in ML workshop, NeurIPS, 2023.
 * October 2023: Our work on Shapley Interactions for Complex Feature
   Attribution is accepted to NeurIPS ATTRIB workshop, 2023.
 * August 2023: Our new preprint on Reprogramming under constraints is now out
   on ArXiv.
 * August 2023: Gave an invited talk at TU Eindhoven on Learning under constraints.
 * June 2023: I will be joining HSL, CMU in Fall 2023 as a Visiting Researcher.
 * May 2023: Our work on Challenging Common Assumptions about Catastrophic
   Forgetting got accepted to CoLLAs, 2023.
 * May 2023: Gave an invited talk at VITA, UT-Austin on Multi-Domain Expert
   Layers.
 * April 2023: Our work on Beyond the Imitation Game: Quantifying and
   extrapolating the capabilities of language models got accepted to TMLR.
 * March 2023: Our work on Pruning CodeBERT for Improved Code-to-Text Efficiency
   is accepted to the Sparsity in Neural Network (SNN) workshop @ ICLR, 2023.
 * November 2022: Gave a talk titled Modality agnostic adaptation in deep
   learning at the IBM Generalisation talk series.
 * November 2022: Our work on APP: Anytime Progressive Pruning is accepted to
   the SlowDNN workshop, 2023.
 * November 2022: Our work on APP: Anytime Progressive Pruning is accepted to
   the Continual Lifelong Learning (CLL) workshop at ACML, 2022.
 * July 2022: Our work on APP: Anytime Progressive Pruning is accepted to the
   Sparsity in Neural Network (SNN) workshop, 2022.
 * June 2022: Our work on Scaling the Number of Tasks in Continual Learning got
   accepted to the CoLLAs 2022 workshop.
 * June 2022: Our work on APP: Anytime Progressive Pruning is accepted to the
   Dynamic Neural Network (DyNN) workshop at ICML, 2022.
 * May 2022: Awarded the MILA Entrepreneurs Grant worth CAD$5,000.
 * May 2022: Awarded the AI Week 2022 Student Travel Bursary worth CAD$1,500.
 * April 2022: Awarded the UNIQUE AI Excellence Scholarship worth CAD$10,000.
 * April 2022: The preprint of our paper APP: Anytime Progressive Pruning is out
   now.
 * April 2022: I am starting as a researcher at Morgan Stanley.
 * March 2022: Awarded the DIRO x Quebec Ministry of Higher Education international students scholarship worth CAD$4,000.
 * February 2022: I will be serving as a Program Committee member for the Conference on Lifelong Learning Agents (CoLLAs) 2022.
 * January 2022: I am selected to be a part of the MILA Winter 2022
   Entrepreneurs Cohort.
 * December 2021: I will be serving as a teaching assistant for INF8225: Probabilistic Learning at Polytechnique Montréal, taught by Christopher J. Pal, for the Winter 2022 semester.
 * September 2021: I will be serving as a reviewer at WACV 2022.
 * August 2021: Our fine-grained tense modification task was accepted to Google's BIG-bench.
 * July 2021: I am also joining VITA, UT-Austin as a Visiting Research Scholar to work on sparsity under the guidance of Assistant Professor Zhangyang Wang.
 * May 2021: We are organizing the Spring Edition of the Weights & Biases ML
   Reproducibility Challenge. Visit our page to learn more.
 * May 2021: I will be joining MILA as a graduate student this fall '21.
 * January 2021: Our WACV paper's video is now out on YouTube. Watch it here.
 * January 2021: I will be speaking at the W&B Deep Learning Salon on "From Smooth Activations to Robustness to Catastrophic Forgetting". I will be joined by Maithra Raghu from Google Brain. Watch it here.
 * December 2020: I'm starting full time as a Machine Learning Engineer at
   Weights & Biases.
 * October 2020: Our paper Rotate to Attend: Convolutional Triplet Attention
   Module is accepted to WACV 2021.
 * September 2020: Gave a talk on my paper on Mish at the Robert Bosch Bangalore
   Research Office.
 * August 2020: I completed my undergraduate degree in Electronics and Electrical Engineering from Kalinga Institute of Industrial Technology (KIIT).
 * August 2020: Gave a talk on Mish and Non-Linear Dynamics at Computer Vision
   Talks. Watch here.
 * July 2020: My paper Mish: A Self Regularized Non-Monotonic Neural Activation
   Function is accepted at BMVC 2020.
 * July 2020: CROWN: A comparison of morphology for Mish, Swish and ReLU
   produced in collaboration with Javier Ideami. Watch here.
 * May 2020: Participated in an AMA for my paper on Mish at the Weights & Biases
   reading group.
 * April 2020: Presented my views on and discussed Data Science on The World is Ending podcast. Listen to the episode here.
 * February 2020: Talk on Mish and Non-Linear Dynamics at Sicara is out now.
   Watch here.
 * February 2020: Podcast episode on Mish at Machine Learning Café is out now.
   Listen here.
 * November 2019: Presented a talk on my paper on Mish at the University of
   Athens.








   Research Experience

Research Associate I, Oct. 2023 - Present
Carnegie Mellon University (CMU), Human Sensing Lab (HSL)
Supervisor: Prof. Fernando De la Torre
Research Area: Transfer of personalization under continual updates of diffusion models.


Machine Learning Researcher, Apr. 2022 - Feb. 2023
Morgan Stanley
Supervisor: Kashif Rasul
Research Area: Continual Learning, Time Series, Model Reprogramming


 

Remote Visiting Research Scholar, Aug. 2021 - Present
VITA, University of Texas at Austin
Supervisor: Dr. Zhangyang Wang
Research Area: Sparsity, Robustness and Knowledge Distillation.


Research Affiliate, Feb. 2020 - Present
Laboratory of Space Research (LSR), University of Hong Kong
Supervisor: Dr. Quentin A. Parker
Research Area: Computer Vision applications in PNe Exploration.


Research Intern, Jun. 2018 - Aug. 2018
NVIDIA AI Lab, Bennett University
Supervisors: Dr. Deepak Garg and Dr. Suneet Gupta
Research Area: Large Scale Visual Recognition.


   Industrial and Leadership Experience

Founder, President and Researcher, Sept. 2019 - Present
Landskape AI
Mentors: Assoc. Prof. Jaegul Choo, Javier Ideami and Federico Lois
Research Area: Analytical Deep Learning Theory.


Technical Content and Course Developer, Nov. 2023 - Present
Towards AI
Area: RAG, LLMs, LangChain, LlamaIndex


Machine Learning Engineer, Dec. 2020 - Oct. 2021
Weights & Biases
Team: Frameworks and Integrations.


Technical Content Developer, Jun. 2020 - Jan. 2021
Paperspace
Blog
Topic Area: Computer Vision (Attention Mechanisms).


Publications
* indicates equal contribution


Mish: A Self Regularized Non-Monotonic Neural Activation Function
Diganta Misra




BMVC, 2020
project / paper / abstract / bibtex

We propose Mish, a novel self-regularized non-monotonic activation function which can be mathematically defined as f(x) = x tanh(softplus(x)). As activation functions play a crucial role in the performance and training dynamics of neural networks, we validated Mish experimentally on several well-known benchmarks against the best combinations of architectures and activation functions. We also observe that data augmentation techniques have a favorable effect on benchmarks like ImageNet-1k and MS-COCO across multiple architectures. For example, Mish outperformed Leaky ReLU on YOLOv4 with a CSP-DarkNet-53 backbone in average precision (AP50 val) by 2.1% in MS-COCO object detection, and ReLU on ResNet-50 on ImageNet-1k in Top-1 accuracy by ≈1%, while keeping all other network parameters and hyperparameters constant. Furthermore, we explore the mathematical formulation of Mish in relation to the Swish family of functions and propose an intuitive understanding of how the first-derivative behavior may act as a regularizer helping the optimization of deep neural networks.
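For reference, a minimal PyTorch sketch of the activation exactly as defined above; the function name mish is just for illustration (recent PyTorch releases also ship a built-in torch.nn.Mish).

import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    # Mish: f(x) = x * tanh(softplus(x)), matching the definition above.
    return x * torch.tanh(F.softplus(x))

# Example: apply Mish to a random activation tensor.
x = torch.randn(4, 8)
y = mish(x)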

@article{misra2019mish,
title={Mish: A self regularized non-monotonic neural activation function},
author={Misra, Diganta},
journal={arXiv preprint arXiv:1908.08681},
volume={4},
pages={2},
year={2019},
publisher={CoRR}
}
CV Talk Episode / ML Cafe Episode / Sicara Talk / W&B Salon Episode / CROWN

   
 
For those who are curious, the name Mish was coined by my girlfriend. 👩‍💻

Rotate to Attend: Convolutional Triplet Attention Module
Diganta Misra*, Trikay Nalamada*, Ajay Uppili Arasanipalai*, Qibin Hou




WACV, 2021
project / paper / supplementary / video / abstract / bibtex

Benefiting from the capability of building interdependencies among channels or
spatial locations, attention mechanisms have been extensively studied and
broadly used in a variety of computer vision tasks recently. In this paper, we
investigate light-weight but effective attention mechanisms and present triplet
attention, a novel method for computing attention weights by capturing
cross-dimension interaction using a three-branch structure. For an input tensor,
triplet attention builds inter-dimensional dependencies by the rotation
operation followed by residual transformations and encodes inter-channel and
spatial information with negligible computational overhead. Our method is simple
as well as efficient and can be easily plugged into classic backbone networks as
an add-on module. We demonstrate the effectiveness of our method on various
challenging tasks including image classification on ImageNet-1k and object
detection on MSCOCO and PASCAL VOC datasets. Furthermore, we provide extensive
insight into the performance of triplet attention by visually inspecting the
GradCAM and GradCAM++ results. The empirical evaluation of our method supports
our intuition on the importance of capturing dependencies across dimensions when
computing attention weights.
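To make the three-branch rotation idea concrete, below is a hedged PyTorch sketch of a triplet-attention-style module. The Z-pool (max plus mean over the channel axis), the 7x7 convolution, and the class names follow my reading of the design and may differ in detail from the official implementation.

import torch
import torch.nn as nn

class ZPool(nn.Module):
    # Concatenate max- and mean-pooled features along the channel dimension.
    def forward(self, x):
        return torch.cat([x.max(dim=1, keepdim=True)[0], x.mean(dim=1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    # One branch: Z-pool -> 7x7 conv + BN -> sigmoid attention map applied to the input.
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(1),
        )
    def forward(self, x):
        return x * torch.sigmoid(self.conv(self.pool(x)))

class TripletAttentionSketch(nn.Module):
    # Three branches capture (C, W), (C, H) and (H, W) interactions via dimension rotation.
    def __init__(self):
        super().__init__()
        self.cw = AttentionGate()
        self.ch = AttentionGate()
        self.hw = AttentionGate()
    def forward(self, x):  # x: (N, C, H, W)
        x_cw = self.cw(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)  # rotate H into the channel slot
        x_ch = self.ch(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)  # rotate W into the channel slot
        x_hw = self.hw(x)                                          # plain spatial branch
        return (x_cw + x_ch + x_hw) / 3.0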

@inproceedings{misra2021rotate,
title={Rotate to attend: Convolutional triplet attention module},
author={Misra, Diganta and Nalamada, Trikay and Arasanipalai, Ajay Uppili and
Hou, Qibin},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of
Computer Vision},
pages={3139--3148},
year={2021}
}

APP: Anytime Progressive Pruning
Diganta Misra*, Bharat Runwal*, Tianlong Chen, Zhangyang Wang, Irina Rish




DyNN workshop at ICML, 2022
SNN, 2022
CLL workshop at ACML, 2022
SlowDNN workshop, 2023
project / paper / webpage / poster / abstract / bibtex

With the latest advances in deep learning, there has been a lot of focus on the online learning paradigm due to its relevance in practical settings. Although many methods have been investigated for optimal learning settings in scenarios where the data stream is continuous over time, training sparse networks in such settings has often been overlooked. In this paper, we explore the problem of training a neural network with a target sparsity in a particular case of online learning: the anytime learning at macroscale paradigm (ALMA). We propose a novel way of progressive pruning, referred to as Anytime Progressive Pruning (APP); the proposed approach significantly outperforms the baseline dense and Anytime OSP models across multiple architectures and datasets under short, moderate, and long-sequence training. Our method, for example, shows an improvement in accuracy of ≈7% and a reduction in the generalization gap by ≈22%, while being ≈1/3rd the size of the dense baseline model in few-shot Restricted ImageNet training. We further observe interesting nonmonotonic transitions in the generalization gap in ALMA with a high number of megabatches. The code and experiment dashboards can be accessed at https://github.com/landskape-ai/Progressive-Pruning and https://wandb.ai/landskape/APP, respectively.
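As a rough illustration of progressive pruning in an ALMA-style stream (not the exact APP schedule or pruning criterion), the sketch below raises global magnitude-pruning sparsity toward a target each time a new megabatch is consumed; the linear schedule, the layer selection, and all names are assumptions.

import torch
import torch.nn as nn

@torch.no_grad()
def progressive_prune(model: nn.Module, step: int, total_steps: int, target_sparsity: float) -> None:
    # Increase global unstructured sparsity as more megabatches are seen (hypothetical linear schedule).
    sparsity = target_sparsity * (step + 1) / total_steps
    weights = [m.weight for m in model.modules() if isinstance(m, (nn.Linear, nn.Conv2d))]
    scores = torch.cat([w.abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(scores, k).values  # global magnitude threshold
    for w in weights:
        w.mul_((w.abs() > threshold).to(w.dtype))  # zero out the smallest-magnitude weights

# Example: prune a toy model toward 90% sparsity over 8 megabatches.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
for megabatch in range(8):
    # ... train on the current megabatch here ...
    progressive_prune(model, megabatch, total_steps=8, target_sparsity=0.9)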

@misc{misra2022app,
title={APP: Anytime Progressive Pruning},
author={Diganta Misra and Bharat Runwal and Tianlong Chen and Zhangyang Wang and
Irina Rish},
year={2022},
eprint={2204.01640},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
NSL presentation / MLC Research Jam #8 / MLC Research Jam #9 / Continual AI Seminar

     
D2-Sparse: Navigating the low data learning regime with sparse networks   New!
Diganta Misra*, Niklas Nolte*, Lu Yin*




Under Review, 2023

abstract

Learning under constraints has been a fundamental avenue of research in deep learning since the advent of modern deep neural networks. In parallel to the upwards trajectory of scaling neural networks, one practical constraint that has embodied efficient deep learning has been that of sparsity. Unstructured weight sparsity has been the cornerstone of pioneering works in the space of pruning and the lottery ticket hypothesis. In this paper, we propose D2-Sparse, a novel dual dynamic sparse learning system for the low-data learning regime. Our paper combines two popular constraints in deep learning, namely sparsity and low-data learning, often studied in disjoint paradigms, thus opening new directions of research in sparsity. D2-Sparse outperforms standard iterative pruning schemes when coupled with standard deep networks in computer vision tasks like image classification and in natural language processing tasks like code generation, with no extra overhead cost at inference. Compared to iterative pruning on a 1/8th total data budget, D2-Sparse achieves a ≈4% top-1 accuracy boost for ResNet-18 on the CIFAR-100 classification task. Further, we demonstrate the effectiveness of the proposed method in anytime learning scenarios and provide extensive analysis into the evolution of sparse masks in D2-Sparse over the training process. Code, dashboard, and model weights will be open-sourced for public access upon acceptance.

GitChameleon: Breaking the version barrier for code generation models   New!
Justine Gehring*, Nizar Islah*, Diganta Misra*, Massimo Caccia, Irina Rish




Under Review, 2024

abstract

The ever-changing landscape of programming languages poses a significant
challenge in the development and training of models designed for code
generation. Code, being a dynamic and constantly evolving environment,
necessitates a continuous process of adaptation to stay in sync with the rapidly
shifting paradigms, frameworks, and methodologies within the software
development domain. The inherent variability in coding styles, the emergence of
new programming languages, and the continuous evolution of libraries and
packages underscore the imperative for an active approach in updating code
generation models. In response to this challenge, we introduce GitChameleon, an
innovative dataset comprising more than 12,000 version-sensitive examples in
Python, designed to facilitate research into the adaptation of code generation
models to the rapidly changing landscape of programming languages. Furthermore,
we assess the performance of state-of-the-art code models and demonstrate their
inadequacy in generating version-specific code. For example, the latest
CodeLlama-70B only achieves a 46.76% exact string match score when evaluated on
GitChameleon.

SPIRIT: Zero Shot Information Retrieval Domain Transfer with Soft Prompts   New!
Ethan Kim, Diganta Misra




Under Review, 2023

abstract

Dense information retrieval yields strong in-domain performance, but often struggles with out-of-domain generalization, lagging behind unsupervised methods. Retrieval tasks can vary across a number of dimensions including domain, query intent, and language. Using a single dense retrieval model for all tasks often underperforms lexical methods such as BM25. For practical information retrieval systems, it is expensive to deploy a different model for each task. Therefore, our motivation is to develop a cheap and effective information retrieval model that maintains strong performance across different domains while easily adapting to any new domain. Other approaches to domain transfer in information retrieval rely on large auxiliary language models or datasets and create a separate model for each task. In this work, we develop a method utilizing prompt tuning to efficiently adapt dense retrievers with a minimal amount of additional computation. By combining models trained on a variety of different domains, we can effectively boost performance on a target task in a new domain. Specifically, we train dense retrieval models using prompt tuning on a large number of information retrieval tasks across diverse domains and types of query intents. To adapt to a new domain, we create new prompt embeddings by averaging the prompt embeddings from a set of source tasks selected in an unsupervised manner. We evaluate zero-shot transfer performance across a wide variety of information retrieval domains and show competitive performance while leveraging a minimal amount of compute. Notably, our SPIRIT method achieves this while being extremely lightweight and practical to deploy in production.
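The prompt-averaging step described above can be sketched in a few lines of PyTorch; the unsupervised selection of source tasks and the dense retriever itself are assumed to already exist, and the prompt shapes and names are illustrative.

import torch

def average_source_prompts(source_prompts) -> torch.Tensor:
    # Build a soft prompt for a new domain by averaging prompts from selected source tasks.
    # Each entry: a (prompt_length, embedding_dim) tensor of learned prompt embeddings.
    return torch.stack(source_prompts, dim=0).mean(dim=0)

# Example with three hypothetical source-task prompts of length 16 and dimension 768.
prompts = [torch.randn(16, 768) for _ in range(3)]
new_domain_prompt = average_source_prompts(prompts)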

Shapley Interactions for Complex Feature Attribution   New!
Divyansh Singhvi, Andrej Erkelens, Raghav Jain, Diganta Misra, Naomi Saphra




NeurIPS ATTRIB workshop, 2023

abstract

Feature interaction is an established approach to understanding complex patterns
of attribution in many models. In this paper, we use Shapley Taylor interaction
indices (STII) to analyze how linguistic structure influences language model
output in masked and auto-regressive language models (MLMs and ALMs). We find
that ALMs, and to a lesser degree MLMs, tend to combine pairs of tokens with
more nonlinear interactions if they co-occur in the same idiomatic multiword
expression. We also find that while ALMs tend to become more linear in their
interactions at greater positional distances, in MLMs this linearity is scaled
by syntactic distance, implying that the learned structure in MLMs relies more
on syntax than the recency-based structure favored natively by ALMs.

Mitigating Mode Collapse in Sparse Mixture of Experts   New!
Nizar Islah, Diganta Misra, Timothy Nest, Matthew Riemer




NeurIPS New in ML workshop, 2023

abstract

The recent success of Sparse Mixture-of-Experts (SMoEs) models has sparked
renewed interest in routed networks in deep learning. A prominent aspect of the
SMoE is the scaling of the number of total parameters in a model, effectively
increasing capacity while keeping computation costs similar to dense models.
Yet, these models pose optimization challenges as inputs are routed discretely
to experts in each layer. Often, a regularization term is added to the loss
function to penalize the imbalanced selection of experts. We aim to demonstrate
that the heuristic regularization strategies used in recent SMoEs, while
successful in some tasks, have significant limitations which we aim to address.
In multi-domain or multi-task settings, without explicit knowledge of the task or domain, the network suffers from a tradeoff between mode collapse and performance, in which some experts receive significantly less training signal or performance on some tasks degrades. Second, we derive a theoretical basis for the various routing functions, with entropy maximization as a common objective.
Third, we will demonstrate a first application of Generative Flow Networks
(GFlowNets) to SMoEs, with a state, policy, and action space, represented at a
particular layer of the model by the input, routing network, and sampling from
expert probabilities, respectively. We aim to show that SMoEs trained with the
Trajectory Balance objective from GFlowNet literature can achieve competitive
performance with state of the art routing methods, such as Switch Transformer,
and suffer less from expert collapse in multi-task (NYUv2, Pascal-Context) and
multi-domain (Omniglot) settings. This work lays some foundations for further
exploration of theoretically motivated approaches to routing in sparse MoEs.
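For context on the heuristic regularizer the abstract refers to, below is a sketch of a Switch-Transformer-style load-balancing auxiliary loss that penalizes imbalanced expert selection. It is illustrative background, not the GFlowNet-based routing objective proposed in this work, and the function name and shapes are assumptions.

import torch

def load_balancing_loss(router_probs: torch.Tensor, expert_indices: torch.Tensor, num_experts: int) -> torch.Tensor:
    # router_probs: (num_tokens, num_experts) softmax outputs of the router.
    # expert_indices: (num_tokens,) index of the expert each token was routed to.
    # Fraction of tokens dispatched to each expert (hard assignment).
    token_fraction = torch.zeros(num_experts, dtype=router_probs.dtype, device=router_probs.device)
    token_fraction.scatter_add_(0, expert_indices, torch.ones_like(expert_indices, dtype=router_probs.dtype))
    token_fraction = token_fraction / expert_indices.numel()
    # Mean routing probability assigned to each expert (soft assignment).
    prob_fraction = router_probs.mean(dim=0)
    # Minimized when both distributions are uniform over experts.
    return num_experts * torch.dot(token_fraction, prob_fraction)

# Example: 1,024 tokens routed across 8 experts with top-1 routing.
probs = torch.softmax(torch.randn(1024, 8), dim=-1)
chosen = probs.argmax(dim=-1)
aux_loss = load_balancing_loss(probs, chosen, num_experts=8)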

Pruning CodeBERT for Improved Code-to-Text Efficiency   New!
Alex Gu, Ria Sonecha, Saaketh Vedantam, Bharat Runwal, Diganta Misra




SNN workshop at ICLR, 2023

abstract

The size and prevalence of large language models (LLMs) make them an apt target
for model compression. Most LLMs consist of a Transformer encoder and decoder,
which each have 6 to 12 layers of multiheaded self-attention blocks, along with
fully connected layers. This results in a large number of parameters, making
them quite expensive to train and query. Our work focuses on finding techniques
to prune CodeBERT, a specific LLM trained to work multimodally between text and
code. We explore the effects of structured and unstructured magnitude pruning on
the encoder layers of CodeBERT, evaluating on the task of generating natural
language comments from a piece of Ruby code.

Poster
Uncovering the Hidden Cost of Model Compression   New!
Diganta Misra, Agam Goyal, Bharat Runwal, Pin Yu Chen




Under Review, 2023

paper / abstract / bibtex

In the era of resource-intensive foundation models, efficient adaptation in
downstream tasks has become paramount. Visual Prompting (VP), inspired by
prompting in Large Language Models (LLMs), has emerged as a key transfer
learning method in computer vision. Aligned with the growing significance of
efficiency, research in model compression has become pivotal to alleviate the
computational burden in both training and deploying over-parameterized neural
networks. A key goal in model compression is the development of sparse models
capable of matching or surpassing the performance of their over-parameterized,
dense counterparts. While prior research has explored the impact of model
sparsity on transfer learning, its effects on visual prompting-based transfer
remain unclear. This study addresses this gap, revealing that model sparsity
adversely affects the performance of visual prompting-based transfer,
particularly in low-data-volume scenarios. Furthermore, our findings highlight
the negative influence of sparsity on the calibration of downstream
visual-prompted models. This empirical exploration calls for a nuanced
understanding beyond accuracy in sparse settings, opening avenues for further
research in Visual Prompting for sparse models.
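For readers unfamiliar with Visual Prompting, the sketch below shows the basic input-space formulation the abstract builds on: a learnable border perturbation added to each image before it enters a frozen (and possibly sparse) backbone. The padding width, image size, and class name are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class BorderVisualPrompt(nn.Module):
    # Learnable additive prompt restricted to a border of the input image.
    def __init__(self, image_size: int = 224, pad: int = 16):
        super().__init__()
        self.prompt = nn.Parameter(torch.zeros(1, 3, image_size, image_size))
        mask = torch.zeros(1, 1, image_size, image_size)
        mask[..., :pad, :] = 1.0
        mask[..., -pad:, :] = 1.0
        mask[..., :, :pad] = 1.0
        mask[..., :, -pad:] = 1.0
        self.register_buffer("mask", mask)  # keep the prompt confined to the border region
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.prompt * self.mask

# Example: prompt a batch of images before feeding a frozen backbone.
vp = BorderVisualPrompt()
images = torch.rand(8, 3, 224, 224)
prompted = vp(images)  # only vp.prompt is trained; the backbone stays frozen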

@article{misra2023reprogramming,
title = {Reprogramming under constraints: Revisiting efficient and reliable transferability of lottery tickets},
author = {Diganta Misra and Agam Goyal and Bharat Runwal and Pin Yu Chen},
year = {2023},
journal = {arXiv preprint arXiv: 2308.14969}
}
Cohere ForAI Lightning Talk / Google Sparsity Reading Group Talk / MLC Research Jam 17

     
Challenging Common Assumptions about Catastrophic Forgetting
Timothée Lesort, Oleksiy Ostapenko, Diganta Misra, Md Rifat Arefin, Pau
Rodriguez, Laurent Charlin, Irina Rish




CoLLAs workshop, 2022
CoLLAs, 2023
paper / abstract / bibtex

Standard gradient descent algorithms applied to sequences of tasks are known to produce catastrophic forgetting in deep neural networks. When trained on a new task in a sequence, the model updates its parameters on the current task, forgetting past knowledge. This article explores scenarios where we scale the number of tasks in a finite environment. Those scenarios are composed of a long sequence of tasks with reoccurring data. We show that in such a setting, stochastic gradient descent can learn, progress, and converge to a solution that, according to existing literature, needs a continual learning algorithm. In other words, we show that the model performs knowledge retention and accumulation without specific memorization mechanisms. We propose a new experimentation framework, SCoLe (Scaling Continual Learning), to study the knowledge retention and accumulation of algorithms in potentially infinite sequences of tasks. To explore this setting, we performed a large number of experiments on sequences of 1,000 tasks to better understand this new family of settings. We also propose slight modifications to vanilla stochastic gradient descent to facilitate continual learning in this setting. The SCoLe framework represents a good simulation of practical training environments with reoccurring situations and allows the study of convergence behavior in long sequences. Our experiments show that previous results on short scenarios cannot always be extrapolated to longer scenarios.
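A toy sketch of the kind of long task stream with reoccurring data that SCoLe studies is shown below; the class pool, classes-per-task, sequence length, and function name are arbitrary illustrations rather than the framework's actual configuration.

import random

def task_stream(num_classes: int = 100, classes_per_task: int = 2, num_tasks: int = 1000, seed: int = 0):
    # Yield a long sequence of small tasks whose classes reoccur over time.
    rng = random.Random(seed)
    for _ in range(num_tasks):
        yield rng.sample(range(num_classes), classes_per_task)

# Example: classes reappear many times across 1,000 tasks, so knowledge can be re-learned.
stream = list(task_stream())
print(stream[:3])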

@article{lesort2022scaling,
title = {Scaling the Number of Tasks in Continual Learning},
author = {Timothée Lesort and Oleksiy Ostapenko and Diganta Misra and Md Rifat
Arefin and Pau Rodríguez and Laurent Charlin and Irina Rish},
year = {2022},
journal = {arXiv preprint arXiv: Arxiv-2207.04543}
}

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of
language models
Diganta Misra, Mukund Varma T., Multiple authors




TMLR, 2023
project / paper / abstract / bibtex

Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative
impact, these new capabilities are as yet poorly characterized. In order to
inform future research, prepare for disruptive new model capabilities, and
ameliorate socially harmful effects, it is vital that we understand the present
and near-future capabilities and limitations of language models. To address this
challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench).
BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132
institutions. Task topics are diverse, drawing problems from linguistics,
childhood development, math, common-sense reasoning, biology, physics, social
bias, software development, and beyond. BIG-bench focuses on tasks that are
believed to be beyond the capabilities of current language models. We evaluate
the behavior of OpenAI's GPT models, Google-internal dense transformer
architectures, and Switch-style sparse transformers on BIG-bench, across model
sizes spanning millions to hundreds of billions of parameters. In addition, a
team of human expert raters performed all tasks in order to provide a strong
baseline. Findings include: model performance and calibration both improve with
scale, but are poor in absolute terms (and when compared with rater
performance); performance is remarkably similar across model classes, though
with benefits from sparsity; tasks that improve gradually and predictably
commonly involve a large knowledge or memorization component, whereas tasks that
exhibit "breakthrough" behavior at a critical scale often involve multiple steps
or components, or brittle metrics; social bias typically increases with scale in
settings with ambiguous context, but this can be improved with prompting.

@article{srivastava2022beyond,
title = {Beyond the Imitation Game: Quantifying and extrapolating the
capabilities of language models},
author = {Aarohi Srivastava and Abhinav Rastogi and Abhishek Rao and Abu Awal Md
Shoeb and Abubakar Abid and Adam Fisch and Adam R. Brown and Adam Santoro and
Aditya Gupta and Adrià Garriga-Alonso and Agnieszka Kluska and Aitor Lewkowycz
and Akshat Agarwal and Alethea Power and Alex Ray and Alex Warstadt and
Alexander W. Kocurek and Ali Safaya and Ali Tazarv and Alice Xiang and Alicia
Parrish and Allen Nie and Aman Hussain and Amanda Askell and Amanda Dsouza and
Ameet Rahane and Anantharaman S. Iyer and Anders Andreassen and Andrea Santilli
and Andreas Stuhlmüller and Andrew Dai and Andrew La and Andrew Lampinen and
Andy Zou and Angela Jiang and Angelica Chen and Anh Vuong and Animesh Gupta and
Anna Gottardi and Antonio Norelli and Anu Venkatesh and Arash Gholamidavoodi and
Arfa Tabassum and Arul Menezes and Arun Kirubarajan and Asher Mullokandov and
Ashish Sabharwal and Austin Herrick and Avia Efrat and Aykut Erdem and Ayla
Karakaş and B. Ryan Roberts and Bao Sheng Loe and Barret Zoph and Bartłomiej
Bojanowski and Batuhan Özyurt and Behnam Hedayatnia and Behnam Neyshabur and
Benjamin Inden and Benno Stein and Berk Ekmekci and Bill Yuchen Lin and Blake
Howald and Cameron Diao and Cameron Dour and Catherine Stinson and Cedrick
Argueta and César Ferri Ramírez and Chandan Singh and Charles Rathkopf and
Chenlin Meng and Chitta Baral and Chiyu Wu and Chris Callison-Burch and Chris
Waites and Christian Voigt and Christopher D. Manning and Christopher Potts and
Cindy Ramirez and Clara E. Rivera and Clemencia Siro and Colin Raffel and
Courtney Ashcraft and Cristina Garbacea and Damien Sileo and Dan Garrette and
Dan Hendrycks and Dan Kilman and Dan Roth and Daniel Freeman and Daniel Khashabi
and Daniel Levy and Daniel Moseguí González and Danny Hernandez and Danqi Chen
and Daphne Ippolito and Dar Gilboa and David Dohan and David Drakard and David
Jurgens and Debajyoti Datta and Deep Ganguli and Denis Emelin and Denis Kleyko
and Deniz Yuret and Derek Chen and Derek Tam and Dieuwke Hupkes and Diganta
Misra and Dilyar Buzan and Dimitri Coelho Mollo and Diyi Yang and Dong-Ho Lee
and Ekaterina Shutova and Ekin Dogus Cubuk and Elad Segal and Eleanor Hagerman
and Elizabeth Barnes and Elizabeth Donoway and Ellie Pavlick and Emanuele Rodola
and Emma Lam and Eric Chu and Eric Tang and Erkut Erdem and Ernie Chang and
Ethan A. Chi and Ethan Dyer and Ethan Jerzak and Ethan Kim and Eunice Engefu
Manyasi and Evgenii Zheltonozhskii and Fanyue Xia and Fatemeh Siar and Fernando
Martínez-Plumed and Francesca Happé and Francois Chollet and Frieda Rong and
Gaurav Mishra and Genta Indra Winata and Gerard de Melo and Germán Kruszewski
and Giambattista Parascandolo and Giorgio Mariani and Gloria Wang and Gonzalo
Jaimovitch-López and Gregor Betz and Guy Gur-Ari and Hana Galijasevic and Hannah
Kim and Hannah Rashkin and Hannaneh Hajishirzi and Harsh Mehta and Hayden Bogar
and Henry Shevlin and Hinrich Schütze and Hiromu Yakura and Hongming Zhang and
Hugh Mee Wong and Ian Ng and Isaac Noble and Jaap Jumelet and Jack Geissinger
and Jackson Kernion and Jacob Hilton and Jaehoon Lee and Jaime Fernández Fisac
and James B. Simon and James Koppel and James Zheng and James Zou and Jan Kocoń
and Jana Thompson and Jared Kaplan and Jarema Radom and Jascha Sohl-Dickstein
and Jason Phang and Jason Wei and Jason Yosinski and Jekaterina Novikova and
Jelle Bosscher and Jennifer Marsh and Jeremy Kim and Jeroen Taal and Jesse Engel
and Jesujoba Alabi and Jiacheng Xu and Jiaming Song and Jillian Tang and Joan
Waweru and John Burden and John Miller and John U. Balis and Jonathan Berant and
Jörg Frohberg and Jos Rozen and Jose Hernandez-Orallo and Joseph Boudeman and
Joseph Jones and Joshua B. Tenenbaum and Joshua S. Rule and Joyce Chua and Kamil
Kanclerz and Karen Livescu and Karl Krauth and Karthik Gopalakrishnan and
Katerina Ignatyeva and Katja Markert and Kaustubh D. Dhole and Kevin Gimpel and
Kevin Omondi and Kory Mathewson and Kristen Chiafullo and Ksenia Shkaruta and
Kumar Shridhar and Kyle McDonell and Kyle Richardson and Laria Reynolds and Leo
Gao and Li Zhang and Liam Dugan and Lianhui Qin and Lidia Contreras-Ochando and
Louis-Philippe Morency and Luca Moschella and Lucas Lam and Lucy Noble and
Ludwig Schmidt and Luheng He and Luis Oliveros Colón and Luke Metz and Lütfi
Kerem Şenel and Maarten Bosma and Maarten Sap and Maartje ter Hoeve and Madotto
Andrea and Maheen Farooqi and Manaal Faruqui and Mantas Mazeika and Marco
Baturan and Marco Marelli and Marco Maru and Maria Jose Ramírez Quintana and
Marie Tolkiehn and Mario Giulianelli and Martha Lewis and Martin Potthast and
Matthew L. Leavitt and Matthias Hagen and Mátyás Schubert and Medina Orduna
Baitemirova and Melody Arnaud and Melvin McElrath and Michael A. Yee and Michael
Cohen and Michael Gu and Michael Ivanitskiy and Michael Starritt and Michael
Strube and Michał Swędrowski and Michele Bevilacqua and Michihiro Yasunaga and
Mihir Kale and Mike Cain and Mimee Xu and Mirac Suzgun and Mo Tiwari and Mohit
Bansal and Moin Aminnaseri and Mor Geva and Mozhdeh Gheini and Mukund Varma T
and Nanyun Peng and Nathan Chi and Nayeon Lee and Neta Gur-Ari Krakover and
Nicholas Cameron and Nicholas Roberts and Nick Doiron and Nikita Nangia and
Niklas Deckers and Niklas Muennighoff and Nitish Shirish Keskar and Niveditha S.
Iyer and Noah Constant and Noah Fiedel and Nuan Wen and Oliver Zhang and Omar
Agha and Omar Elbaghdadi and Omer Levy and Owain Evans and Pablo Antonio Moreno
Casares and Parth Doshi and Pascale Fung and Paul Pu Liang and Paul Vicol and
Pegah Alipoormolabashi and Peiyuan Liao and Percy Liang and Peter Chang and
Peter Eckersley and Phu Mon Htut and Pinyu Hwang and Piotr Miłkowski and Piyush
Patil and Pouya Pezeshkpour and Priti Oli and Qiaozhu Mei and Qing Lyu and
Qinlang Chen and Rabin Banjade and Rachel Etta Rudolph and Raefer Gabriel and
Rahel Habacker and Ramón Risco Delgado and Raphaël Millière and Rhythm Garg and
Richard Barnes and Rif A. Saurous and Riku Arakawa and Robbe Raymaekers and
Robert Frank and Rohan Sikand and Roman Novak and Roman Sitelew and Ronan LeBras
and Rosanne Liu and Rowan Jacobs and Rui Zhang and Ruslan Salakhutdinov and Ryan
Chi and Ryan Lee and Ryan Stovall and Ryan Teehan and Rylan Yang and Sahib Singh
and Saif M. Mohammad and Sajant Anand and Sam Dillavou and Sam Shleifer and Sam
Wiseman and Samuel Gruetter and Samuel R. Bowman and Samuel S. Schoenholz and
Sanghyun Han and Sanjeev Kwatra and Sarah A. Rous and Sarik Ghazarian and Sayan
Ghosh and Sean Casey and Sebastian Bischoff and Sebastian Gehrmann and Sebastian
Schuster and Sepideh Sadeghi and Shadi Hamdan and Sharon Zhou and Shashank
Srivastava and Sherry Shi and Shikhar Singh and Shima Asaadi and Shixiang Shane
Gu and Shubh Pachchigar and Shubham Toshniwal and Shyam Upadhyay and Shyamolima
and Debnath and Siamak Shakeri and Simon Thormeyer and Simone Melzi and Siva
Reddy and Sneha Priscilla Makini and Soo-Hwan Lee and Spencer Torene and
Sriharsha Hatwar and Stanislas Dehaene and Stefan Divic and Stefano Ermon and
Stella Biderman and Stephanie Lin and Stephen Prasad and Steven T. Piantadosi
and Stuart M. Shieber and Summer Misherghi and Svetlana Kiritchenko and Swaroop
Mishra and Tal Linzen and Tal Schuster and Tao Li and Tao Yu and Tariq Ali and
Tatsu Hashimoto and Te-Lin Wu and Théo Desbordes and Theodore Rothschild and
Thomas Phan and Tianle Wang and Tiberius Nkinyili and Timo Schick and Timofei
Kornev and Timothy Telleen-Lawton and Titus Tunduny and Tobias Gerstenberg and
Trenton Chang and Trishala Neeraj and Tushar Khot and Tyler Shultz and Uri
Shaham and Vedant Misra and Vera Demberg and Victoria Nyamai and Vikas Raunak
and Vinay Ramasesh and Vinay Uday Prabhu and Vishakh Padmakumar and Vivek
Srikumar and William Fedus and William Saunders and William Zhang and Wout
Vossen and Xiang Ren and Xiaoyu Tong and Xinyi Wu and Xudong Shen and Yadollah
Yaghoobzadeh and Yair Lakretz and Yangqiu Song and Yasaman Bahri and Yejin Choi
and Yichi Yang and Yiding Hao and Yifu Chen and Yonatan Belinkov and Yu Hou and
Yufang Hou and Yuntao Bai and Zachary Seid and Zhao Xinran and Zhuoye Zhao and
Zijian Wang and Zijie J. Wang and Zirui Wang and Ziyi Wu},
year = {2022},
journal = {arXiv preprint arXiv: Arxiv-2206.04615}
}
Tense task

   
Genetic Algorithm Optimized Inkjet Printed Electromagnetic Absorber on Paper
Substrate
Diganta Misra, Rahul Pelluri, Vijay Kumar Verma, Bhargav Appasani, Nisha Gupta




IEEE AESPC, 2018
paper / abstract / bibtex

Printable electronics based electromagnetic absorbers are receiving increasing attention from the electromagnetic community because of their unprecedented advantages. This paper presents the design of printable electromagnetic absorbers for the X band. The design of the absorber is optimized using the Genetic Algorithm (GA) to enhance the absorptivity and the absorption bandwidth. The design involves the placement of several square-shaped patches of conductive ink at optimal locations on the paper substrate such that the desired absorption characteristics are obtained. Simulations are carried out using the HFSS simulation software. The optimized structure offers an absorptivity of more than 90% in the X band, thereby proving to be a viable solution for stealth applications.
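To illustrate only the optimization loop (the real objective is absorptivity computed from HFSS simulations, replaced here by a placeholder), a minimal genetic algorithm over a binary grid of ink-patch placements could look like the following sketch; the population size, mutation rate, and all names are assumptions.

import random

GRID, POP, GENS, MUT = 8, 20, 50, 0.05  # grid size, population, generations, mutation rate

def fitness(layout):
    # Placeholder for the HFSS-simulated absorptivity of a layout (list of 0/1 patch flags).
    return sum(layout)  # stand-in objective; the real one comes from EM simulation

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(layout):
    return [1 - g if random.random() < MUT else g for g in layout]

population = [[random.randint(0, 1) for _ in range(GRID * GRID)] for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=fitness, reverse=True)
    parents = population[: POP // 2]  # keep the fitter half
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    population = parents + children

best = max(population, key=fitness)  # best ink-patch layout found by the GA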

@inproceedings{misra2018genetic,
title={Genetic Algorithm Optimized Inkjet Printed Electromagnetic Absorber on
Paper Substrate},
author={Misra, Diganta and Pelluri, Rahul and Verma, Vijay Kumar and Appasani,
Bhargav and Gupta, Nisha},
booktitle={2018 International Conference on Applied Electromagnetics, Signal
Processing and Communication (AESPC)},
volume={1},
pages={1--3},
year={2018},
organization={IEEE}
}
Large-Scale Meta-Analysis of Genes Encoding Pattern in Wilson’s Disease   (Best
Paper Award)
Diganta Misra, Anurag Tiwari, Amrita Chaturvedi




Springer IC4S, 2018
paper / abstract / bibtex

In this paper, we propose an unsupervised learning approach aimed at understanding gene expression for the analysis of Wilson's disease in the liver of Mus musculus. We obtained the best parameters for cluster division to correctly group gene expression sets, so as to best capture the effect and characteristics of the disease at the genome level. The clustering proved beneficial in capturing the correct genetic analogy of Wilson's disease. Analytical experiments were carried out using various clustering algorithms and were evaluated using performance metrics including silhouette score analysis and the Calinski-Harabasz index.
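The evaluation described above can be sketched with scikit-learn: cluster the expression vectors for several cluster counts and compare silhouette and Calinski-Harabasz scores. Synthetic data stands in for the actual gene expression matrix, and the parameter choices are illustrative.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score

# Synthetic stand-in for a (samples x genes) expression matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, silhouette_score(X, labels), calinski_harabasz_score(X, labels))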

@inproceedings{misra2019large,
title={Large-Scale Meta-Analysis of Genes Encoding Pattern in Wilson’s Disease},
author={Misra, Diganta and Tiwari, Anurag and Chaturvedi, Amrita},
booktitle={Advances in Computer Communication and Computational Sciences: Proceedings of IC4S 2018},
pages={389--400},
year={2019},
organization={Springer}
}
Convoluted Cosmos: Classifying Galaxy Images Using Deep Learning
Diganta Misra, Sachi Nandan Mohanty, Mohit Agarwal, Suneet K Gupta




Springer ICDMAI, 2019 (Proceedings of the AISC)
paper / abstract / bibtex

In this paper, a deep learning-based approach has been developed to classify images of galaxies into three major categories, namely elliptical, spiral, and irregular. The classifier successfully classified the images with an accuracy of 97.3958%, outperforming conventional classifiers like Support Vector Machines and Naive Bayes. The convolutional neural network architecture consists of one input convolution layer with 16 filters, followed by 4 hidden layers, 1 penultimate dense layer, and an output softmax layer. The model was trained on 4,614 images for 200 epochs on an NVIDIA DGX-1 Tesla V100 machine and was subsequently tested on new images to evaluate its robustness and accuracy.
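Below is a hedged PyTorch sketch of a small CNN in the spirit of the description above (a 16-filter input convolution, a few hidden blocks, a penultimate dense layer, and a softmax output over three classes); the exact layer sizes, input resolution, and class name are assumptions, since the paper's precise architecture is only summarized here.

import torch
import torch.nn as nn

class GalaxyCNNSketch(nn.Module):
    # Toy 3-class galaxy classifier loosely following the description above.
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # input conv, 16 filters
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # hidden conv blocks
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, num_classes))
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Softmax is applied here for clarity; in training one would use CrossEntropyLoss on raw logits.
        return torch.softmax(self.classifier(self.features(x)), dim=-1)

# Example forward pass on a batch of 64x64 RGB galaxy images.
model = GalaxyCNNSketch()
probs = model(torch.rand(4, 3, 64, 64))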

@incollection{misra2020convoluted,
title={Convoluted cosmos: classifying galaxy images using deep learning},
author={Misra, Diganta and Mohanty, Sachi Nandan and Agarwal, Mohit and Gupta,
Suneet K},
booktitle={Data Management, Analytics and Innovation},
pages={569--579},
year={2020},
publisher={Springer}
}

Research under progress

Dynamic Sparse Upcycling
Diganta Misra, Mohammed Muqeeth (UNC Chapel Hill), Nizar Islah (Mila), Shiwei
Liu (VITA, UT-Austin), Zhangyang "Atlas" Wang (VITA, UT-Austin), Conglong Li
(Microsoft)




Open Source Frameworks & Projects  

Avalanche: an End-to-End Library for Continual Learning
Dec'20 - Present

I am an active lead maintainer of Avalanche's Reproducible Continual Learning framework and also work on Avalanche's evaluation framework, mainly on integrating the Weights & Biases API.

   



Echo
Jun'19 - Present

Echo is an OSS deep learning package with support for TensorFlow, PyTorch and
MegEngine, containing novel validated methods, components and building blocks
used in deep learning.

     



Evonorm
Apr'20

Created the most popular open-source reimplementation of Evolving Normalization-Activation Layers by Liu et al.

 



ECANets
Jan'21

Reproduced the CVPR 2020 paper: ECA-Net: Efficient Channel Attention for Deep
Convolutional Neural Networks for the ML Reproducibility Challenge 2020.
Integrated with Weights & Biases.

     



Big Bench
Aug'21

Our fine-grained tense modification task was accepted to Google's BIG-bench for testing large LMs. In collaboration with Mukund Varma T.

 



MDEL
April 2023 - Present

I currently lead the modelling effort for Multi-Domain Expert Layers (MDEL) Training: How to increase knowledge without breaking the bank?, a collaborative project coordinated by Ontocord AI. My team works on different aspects of architecture design and training of the MDEL model on the SUMMIT supercomputer cluster as part of the INCITE allocation.







   Education

Masters in Machine Learning, September 2021 - Present
Montréal Institute for Learning Algorithms (Mila)
Advisor: Professor Irina Rish
Montréal, Canada

 

Masters in Computer Science (MSc CS), September 2021 - Present
University of Montréal
Advisor: Professor Irina Rish
Montréal, Canada


Bachelor of Technology (B.Tech) in EEE, Jun. 2016 - May 2020
Kalinga Institute of Industrial Technology (KIIT)
Advisor: Asst. Prof. Dr. Bhargav Appasani
Bhubaneswar, India

   Internships and Exchange Programs

Data Science Intern, Jun. 2018 - Feb. 2019
CSIR-CDRI

During this internship, I was involved in building the analytics pipeline, data collection, data preprocessing and cleaning, geospatial analysis, and documentation for a project on understanding the demographics of venture capital and early seed investments. As part of a team of three, I was advised and mentored by Dr. Sukant Khurana.

Remote

Summer Intern, May 2018 - Jun. 2018
IIT-Kharagpur

Studied basic algorithmic techniques using the programming languages Lisp and Prolog under the guidance of Assoc. Prof. Pawan Kumar.

Kharagpur, India

 

Summer Exchange Intern, Jun. 2017 - Aug. 2017
Bangkok University

Served as a primary instructor for cultural engagements, along with teaching basic English and computer science to primary-grade students at RangsonWittaya School, Nakhon Sawan, under the AIESEC SDG #4 programme. I was also part of cultural exchange, entrepreneurship and social service programs at Bangkok University.

Bangkok, Thailand

Initiatives and Academic Services

NeuroMatch Academy

I was responsible for developing the content for the Strategies section of the Continual Learning lecture of the Deep Learning Cohort of Neuromatch Academy 2021.

W&B ML Reproducibility Challenge

I was the lead organizer of the W&B MLRC 2021, where I actively supported our challenge participants. Our mission in organizing this challenge was to make machine learning research reproducible, transparent and accessible to everyone. This initiative was also supported by our W&B MLRC Grant of $500 for each participant.

INF8225: Probabilistic Learning

I was a teaching assistant for the INF8225: Probabilistic Learning course at Polytechnique Montréal taught by Christopher J. Pal for the Winter 2022 semester.

INF8245E: Machine Learning

I was a teaching assistant for the INF8245E: Machine Learning course at Polytechnique Montréal taught by Sarath Chandar for the Fall 2023 semester.

IFT6390: Machine Learning

I was a teaching assistant for the IFT6390: Machine Learning course at UdeM taught by Ioannis Mitliagkas for the Fall 2023 semester.

IFT6760A: Towards AGI: Scaling, Emergence, Alignment

I am a teaching assistant for the IFT6760A: Towards AGI: Scaling, Emergence, Alignment course at UdeM taught by Irina Rish for the Winter 2024 semester.

Deep Learning Theory Reading Group, Mila

I was an organizer of the DL Theory Reading Group at Mila, Montreal.

Mila 2022 Entrepreneurs Cohort Program

I was selected as one of the entrepreneurs in residence and pitched my startup idea called 9CLEF (Elevator Pitch).

Mila 2022 TRAIL

I was selected as one of the first students in the Trustworthy Responsible AI Learning certificate (TRAIL) program at Mila, Montreal. (Certificate)

McMedHacks 2023

I was selected as one of the mentors of McMedHacks 2023 and will be mentoring the participants on the topic of AI in Healthcare.

NewinML workshop @ NeurIPS, 2023

I am serving as an organizer, Program Chair (PC) and Area Chair (AC) for the NewinML workshop at NeurIPS 2023.

Volunteer @ NeurIPS, 2023

I will be serving as a volunteer at the NeurIPS 2023 conference.

Served as a Reviewer / Program Committee member for:

CVPR 2024 (R); Workshop on Data-centric Machine Learning Research (DMLR): Harnessing Momentum for Science @ ICLR, 2024 (R); Workshop on Secure and Trustworthy Large Language Models @ ICLR, 2024 (R); TMLR (R); Conference on Lifelong Learning Agents (CoLLAs) 2022 (PC); Conference on Lifelong Learning Agents (CoLLAs) 2023, 2024 (R); ICASSP 2023 (R) (Certificate); Efficient Systems for Foundation Models workshop at ICLR 2023 (R); Continual Learning AI Un-Conference (R); Springer Soft Computing (R).

Achievements
(Complete list available upon request)

Quebec Ministry of Higher Education International Students Scholarship
2022

I was awarded the DIRO x Quebec Ministry of Higher Education international students scholarship worth CAD$4,000 for the academic year 2022.

Quebec Ministry of Higher Education International Students Scholarship
2023

I was awarded the DIRO x Quebec Ministry of Higher Education international students scholarship worth CAD$3,000 for the academic year 2023.

UNIQUE AI Excellence Scholarship
2022

I was awarded the UNIQUE AI Excellence Scholarship worth CAD$10,000 for the academic year 2022. Under this scholarship, I will be working with Irina Rish and Pouya Bashivan on dynamic sparsity based research.

PaperswithCode Top Contributor Award
2022

I was awarded the PaperswithCode Top Contributor award for the academic year 2022.

MILA Entrepreneurs Grant
2022

I was awarded the MILA Entrepreneurs Grant worth CAD$5,000 to pursue my startup venture 9CLEF (Elevator Pitch) and build an early prototype.

AMII AI Week Travel Bursary
2022

I was awarded the AMII AI Week 2022 Student Travel Bursary worth CAD$1,500.






Updated on: 22nd January, 2024

Thank you, Jon Barron!