leobitz.github.io
AMANUEL MERSHA

A passionate AI/ML Researcher and Engineer, currently serving as a Lecturer at the School of Information Technology and Engineering, Addis Ababa Institute of Technology, Ethiopia.

Addis Ababa Institute of Technology
King George St, 5 Kilo
Addis Ababa, Ethiopia

I recently graduated with a Master of Science in Artificial Intelligence from Addis Ababa University. I am broadly interested in deep learning models that are efficient, robust to distribution shift, and able to acquire new knowledge over time. These concerns cut across all subfields of learning systems, such as natural language processing, computer vision, and robotics, so I am interested in the foundational concepts underlying all of these domains. I also enjoy developing products, and I have participated in several machine learning projects where I built robust AI systems.

My research interests include:
* Efficient deep neural networks
* Deep learning theory
* Out-of-distribution generalization

NEWS

May 1, 2024: I will be working as a visiting researcher at the SymbioticLab in the CSE department at the University of Michigan until the end of July. My research there will focus on developing algorithms for large-scale distributed LLM and multimodal LLM training and inference.

Jul 17, 2023: I will be attending the DLRL Summer School at Mila in Montreal, Canada from July 17th to 21st, 2023.

Jul 14, 2023: I finished my M.Sc. thesis, "Reinforcement Learning Based Layer Skipping Vision Transformer for Efficient Inference", at Addis Ababa Institute of Technology.

SELECTED PUBLICATIONS

1. DynamicViT: Making Vision Transformer faster through layer skipping
Amanuel Mersha and Sammy Assefa
Vision Transformers: Theory and Applications Workshop at NeurIPS 2022, Nov 2022

Recent deep learning breakthroughs in language and vision tasks can be mainly attributed to large-scale transformers. Unfortunately, the massive size and high compute requirements of these models have limited their use in resource-constrained environments. Dynamic neural networks present a unique opportunity to reduce the compute requirement, since they can adjust the computational path for each input. We propose a layer-skipping dynamic vision transformer (ViT) that skips layers for each sample based on decisions made by a reinforcement learning agent. Extensive experiments on CIFAR-10 and CIFAR-100 show that this dynamic ViT model gains an average 40% throughput increase at inference when evaluated on batch sizes ranging from 1 to 1024.
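To make the per-sample layer-skipping idea concrete, here is a minimal PyTorch sketch. It is an illustration under assumptions, not the paper's implementation: the tiny `SkipPolicy` head, the hard threshold rule, and the model dimensions are all hypothetical stand-ins for the reinforcement learning agent described in the abstract.

```python
import torch
import torch.nn as nn

class SkipPolicy(nn.Module):
    """Tiny per-layer policy head: given the [CLS] token, output a keep probability.
    Illustrative stand-in for the paper's reinforcement learning agent."""
    def __init__(self, dim):
        super().__init__()
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, cls_token):                    # cls_token: (batch, dim)
        return torch.sigmoid(self.head(cls_token)).squeeze(-1)  # (batch,)

class LayerSkippingEncoder(nn.Module):
    """ViT-style encoder that may skip each block per sample at inference time."""
    def __init__(self, dim=192, depth=12, heads=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
            for _ in range(depth)
        )
        self.policies = nn.ModuleList(SkipPolicy(dim) for _ in range(depth))

    def forward(self, x, threshold=0.5):             # x: (batch, tokens, dim), token 0 = [CLS]
        for block, policy in zip(self.blocks, self.policies):
            keep = policy(x[:, 0]) >= threshold      # (batch,) bool: execute this block?
            if keep.any():
                out = block(x[keep])                  # run the block only on kept samples
                x = x.clone()
                x[keep] = out
            # samples with keep == False pass through unchanged (layer skipped)
        return x

# usage sketch
model = LayerSkippingEncoder()
tokens = torch.randn(8, 65, 192)                     # e.g. 8 images, 64 patches + [CLS]
features = model(tokens)
```

One design choice worth noting in this sketch: all samples stay in one batch and each block runs only on the subset that was not skipped, which is one plausible way to realize throughput gains across batch sizes from 1 to 1024.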
2. DistillEmb: Distilling Word Embeddings via Contrastive Learning
Amanuel Mersha and Stephen Wu
Transfer Learning for NLP Workshop at NeurIPS 2022, Nov 2022

Word embeddings powered the early days of neural network-based NLP research. Their effectiveness in small data regimes keeps them relevant in low-resource environments. However, they are limited in two critical ways: linearly increasing memory requirements and out-of-vocabulary token handling. In this work, we present a technique for distilling word embeddings into a CNN using contrastive learning. The method allows embeddings to be regressed from the characters of a token, and the resulting network is then used as a pretrained layer, replacing word embeddings. Low-resource languages are the primary beneficiaries of this method, and we show its effectiveness on two morphology-rich Semitic languages and on a multilingual NER task covering 10 African languages. Apart from being data and memory efficient, the model significantly increases performance across several benchmarks and is capable of transferring word representations.

3. Dynamic Transformer Network
Amanuel Mersha
Workshop on Dynamic Neural Networks at ICML 2022, Jul 2022

The recent deep learning breakthroughs in language and vision tasks can be mainly attributed to large-scale transformers. Unfortunately, their massive size and high compute requirements have limited their use in resource-constrained environments. Dynamic neural networks could reduce the compute requirement by adjusting the computational path based on the input. Similar to soft attention, this work presents a simple way of constructing an oracle function that enables a transformer network to determine the dependencies between its layers. The oracle can then be used as a strategy to skip layers without a reinforcement learning agent. We show that such a model learns to skip, on average, half of its layers for each sample in a batch.

4. Morphology-rich Alphasyllabary Embeddings
Amanuel Mersha and Stephen Wu
Proceedings of the 12th Language Resources and Evaluation Conference, Jan 2020

Word embeddings have been successfully trained in many languages. However, both intrinsic and extrinsic metrics vary across languages, especially for languages that depart significantly from English in morphology and orthography. This study focuses on building a word embedding model suitable for Amharic (Ethiopia), a Semitic language that is both morphologically rich and written in an alphasyllabary (abugida) rather than an alphabet. We compare embeddings from tailored neural models, simple pre-processing steps, off-the-shelf baselines, and parallel tasks on a better-resourced Semitic language, Arabic. Experiments show our model's performance on word analogy tasks, illustrating the divergent objectives of morphological vs. semantic analogies.

© Copyright 2024 Amanuel Negash Mersha.