
AMANUEL MERSHA

A passionate AI/ML researcher and engineer, currently serving as a Lecturer at the
School of Information Technology and Engineering, Addis Ababa Institute of
Technology, Ethiopia.

Addis Ababa Institute of Technology

King George St, 5 Kilo

Addis Ababa, Ethiopia

I recently graduated with a Master of Science in Artificial Intelligence from
Addis Ababa University. I am broadly interested in deep learning models that are
efficient, robust to distribution shift, and able to acquire new knowledge over
time. These properties are essential across subfields of learning systems such as
natural language processing, computer vision, and robotics, so I am interested in
the foundational concepts shared by these domains. I also enjoy developing
products and have contributed to several machine learning projects where I built
robust AI systems.

My research interests include:

 * Efficient Deep Neural Networks
 * Deep Learning Theory
 * Out-of-Distribution Generalization


NEWS

May 1, 2024: I will be working as a visiting researcher at the SymbioticLab in
the CSE department at the University of Michigan until the end of July. My
research will focus on developing algorithms for large-scale distributed LLM and
MMLM training and inference.

Jul 17, 2023: I will be attending the DLRL Summer School at MILA in Montreal,
Canada, from July 17th to 21st, 2023.

Jul 14, 2023: I finished my M.Sc. thesis, titled “Reinforcement Learning Based
Layer Skipping Vision Transformer for Efficient Inference”, at Addis Ababa
Institute of Technology.


SELECTED PUBLICATIONS

 1. DynamicViT: Making Vision Transformer faster through layer skipping
    Amanuel Mersha, and Sammy Assefa
    Vision Transformers: Theory and Applications Workshop at NeurIPS 2022, Nov
    2022
    
    Recent deep learning breakthroughs in language and vision tasks can be
    mainly attributed to large-scale transformers. Unfortunately, the massive
    size and high compute requirement of these models have limited their use in
    resource-constrained environments. Dynamic neural networks present a unique
    opportunity to reduce the amount of compute requirement as these models
    enable dynamically adjusting the computational path given an input. We
    propose a layer-skipping dynamic vision transformer (ViT) network that skips
    layers for each sample based on decisions given by a reinforcement learning
    agent. Extensive experiments on CIFAR-10 and CIFAR-100 showed that this
    dynamic ViT model gains an average of 40% throughput increase in the
    inference phase when evaluated on different batch sizes ranging from 1 to
    1024.

 2. DistillEmb: Distilling Word Embeddings via Contrastive Learning
    Amanuel Mersha, and Stephen Wu
    Transfer Learning for NLP Workshop at NeurIPS 2022, Nov 2022
    
    Word embeddings powered the early days of neural network-based NLP research.
    Their effectiveness in small data regimes makes them still relevant in
    low-resource environments. However, they are limited in two critical ways:
    linearly increasing memory requirements and out-of-vocabulary token
    handling. In this work, we present a distillation technique of word
    embeddings into a CNN network using contrastive learning. This method allows
    embeddings to be regressed given the characters of a token. It is then used
    as a pretrained layer, replacing word embeddings. Low-resource languages are
    the primary beneficiary of this method and hence, we show its effectiveness
    on two morphology-rich Semitic languages, and in a multilingual NER task
    comprised of 10 African languages. Apart from being data and memory
    efficient, the model significantly increases performance across several
    benchmarks and is capable of transferring word representations.

 3. Dynamic Transformer Network
    Amanuel Mersha
    Workshop on Dynamic Neural Networks at ICML 2022, Jul 2022
    
    The recent deep learning breakthroughs in language and vision tasks can be
    mainly attributed to large-scale transformers. Unfortunately, their massive
    size and high compute requirement have limited their use in
    resource-constrained environments. Dynamic neural networks could potentially
    reduce the amount of compute requirement by dynamically adjusting the
    computational path based on the input. Similar to soft attention, this work
    presents a simple way of constructing an oracle function that enables a
    transformer network to determine the dependency between its layers. It can
    then be used as a strategy to skip layers without a reinforcement learning
    agent. We show that such a model learns to skip, on average, half of its
    layers for each sample in a batch input.

 4. Morphology-rich Alphasyllabary Embeddings
    Amanuel Mersha, and Stephen Wu
    Proceedings of the 12th Language Resources and Evaluation Conference, Jan 2020
    
    Word embeddings have been successfully trained in many languages. However,
    both intrinsic and extrinsic metrics are variable across languages,
    especially for languages that depart significantly from English in
    morphology and orthography. This study focuses on building a word embedding
    model suitable for the Semitic language of Amharic (Ethiopia), which is both
    morphologically rich and written as an alphasyllabary (abugida) rather than
    an alphabet. We compare embeddings from tailored neural models, simple
    pre-processing steps, off-the-shelf baselines, and parallel tasks on a
    better-resourced Semitic language – Arabic. Experiments show our model’s
    performance on word analogy tasks, illustrating the divergent objectives of
    morphological vs. semantic analogies.
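

ILLUSTRATIVE CODE SKETCHES

To give a concrete feel for the per-sample layer-skipping idea behind
publications 1 and 3, here is a minimal PyTorch sketch. It is not the code from
either paper: the class names and sizes are hypothetical, the controller is just
a small linear policy head reading the [CLS] token, and for simplicity every
layer is still computed and then mixed in or out per sample (an
efficiency-oriented implementation would only run a layer for the sub-batch that
chooses to execute it).

    import torch
    import torch.nn as nn
    from torch.distributions import Categorical

    class SkipController(nn.Module):
        """Tiny policy head that decides, per sample, whether to run a layer."""
        def __init__(self, dim):
            super().__init__()
            self.head = nn.Linear(dim, 2)              # logits for {skip, execute}

        def forward(self, cls_token):
            dist = Categorical(logits=self.head(cls_token))
            action = dist.sample()                     # 0 = skip, 1 = execute
            return action, dist.log_prob(action)

    class LayerSkippingEncoder(nn.Module):
        """Transformer encoder whose layers can be skipped per sample."""
        def __init__(self, dim=192, depth=6, heads=3):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
                for _ in range(depth)
            )
            self.controllers = nn.ModuleList(SkipController(dim) for _ in range(depth))

        def forward(self, tokens):
            log_probs = []
            for layer, ctrl in zip(self.layers, self.controllers):
                action, logp = ctrl(tokens[:, 0])      # decide from the [CLS] token
                keep = action.float().view(-1, 1, 1)
                tokens = keep * layer(tokens) + (1.0 - keep) * tokens
                log_probs.append(logp)
            return tokens, torch.stack(log_probs, dim=1)

    tokens = torch.randn(4, 65, 192)                   # 4 images, 64 patches + [CLS]
    out, log_probs = LayerSkippingEncoder()(tokens)
    print(out.shape, log_probs.shape)                  # (4, 65, 192) and (4, 6)

In the reinforcement learning formulation of publication 1, the stacked
log-probabilities would feed a policy-gradient loss (e.g. REINFORCE) whose reward
trades accuracy against the number of executed layers; publication 3 instead
learns the skipping decision without an RL agent.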
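
In the same spirit, here is a rough sketch of the idea in publication 2
(DistillEmb): a character-level CNN regresses a word vector from a token's
characters and is trained to agree with a pretrained embedding table through a
contrastive, InfoNCE-style objective. This is an assumption-laden illustration
rather than the paper's implementation; the alphabet size, kernel widths, and
loss details are placeholders.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CharCNNEmbedder(nn.Module):
        """Regresses a word embedding from the word's character sequence."""
        def __init__(self, n_chars=128, char_dim=32, out_dim=300):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
            self.convs = nn.ModuleList(
                nn.Conv1d(char_dim, 64, kernel_size=k, padding=k // 2) for k in (2, 3, 4)
            )
            self.proj = nn.Linear(64 * 3, out_dim)

        def forward(self, char_ids):                   # (batch, max_len) int64
            x = self.char_emb(char_ids).transpose(1, 2)           # (batch, char_dim, max_len)
            feats = [conv(x).amax(dim=2) for conv in self.convs]  # max-pool over time
            return self.proj(torch.cat(feats, dim=1))  # (batch, out_dim)

    def contrastive_distill_loss(pred, target, temperature=0.1):
        """InfoNCE-style loss: each predicted vector should match its own teacher
        embedding and not the other teacher embeddings in the batch."""
        pred = F.normalize(pred, dim=1)
        target = F.normalize(target, dim=1)
        logits = pred @ target.t() / temperature       # (batch, batch) similarities
        labels = torch.arange(pred.size(0))
        return F.cross_entropy(logits, labels)

    model = CharCNNEmbedder()
    char_ids = torch.randint(1, 128, (16, 20))         # a toy batch of 16 "words"
    teacher = torch.randn(16, 300)                     # stand-in for pretrained embeddings
    loss = contrastive_distill_loss(model(char_ids), teacher)
    loss.backward()

Once trained, the CNN can replace the embedding table as a pretrained layer, so
memory no longer grows with vocabulary size and out-of-vocabulary tokens still
receive usable vectors, which is the property the abstract highlights for
low-resource languages.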

© Copyright 2024 Amanuel Negash Mersha.