About  Projects  Blog  All

MIXED PRECISION
by Benjamin Warner

Recent Posts

Aug 16, 2023
FlashAttention with PyTorch Compile
Benchmarking FlashAttention and FlashAttention-2 on a Consumer GPU

FlashAttention-2 builds on FlashAttention, yielding significant speedups on server-class GPUs. Unlike the PyTorch implementation of FlashAttention, FlashAttention-2 currently cannot be compiled into a single CUDA graph via PyTorch 2.0's torch.compile. Does this matter, and if so, at what model sizes and sequence lengths? In this post I attempt to answer these questions by benchmarking FlashAttention and FlashAttention-2 on a consumer GPU. (A minimal benchmarking sketch follows at the end of this page.)

Jul 28, 2023
Creating a Transformer From Scratch
Part Two: The Rest of the Transformer

In this post, I will show you how to build the rest of the Transformer. By the end of this post, you will be familiar with all the pieces of a Transformer model and, combined with your knowledge of Attention, will be able to write an entire Transformer from scratch.

Jul 1, 2023
Creating a Transformer From Scratch
Part One: The Attention Mechanism

You cannot create a Transformer without Attention. In this post, I will show you how to write an Attention layer from scratch in PyTorch. By the end of this post, you will be familiar with all three flavors of Attention: Bidirectional, Causal, and Cross Attention, and you should be able to write your own implementation of the Attention mechanism in code. (A from-scratch Attention sketch follows at the end of this page.)

May 10, 2023
How to Quickly Finetune Your Transformer
Performance Tips for Faster Training

While recent releases of language models have emphasized the "large" in Large Language Models, most everyday NLP work uses smaller language models finetuned on custom or task-specific datasets. In this post, I will show how to achieve fast finetuning performance on modern GPUs using tools like PyTorch 2.0's torch.compile and FlashAttention. (A minimal finetuning sketch follows at the end of this page.)

All Posts

© 2024 Benjamin Warner
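To give the Aug 16 benchmark a concrete shape, here is a minimal sketch of timing PyTorch's built-in FlashAttention kernel, exposed through scaled_dot_product_attention, with and without torch.compile. The tensor shapes and the 100-iteration count are illustrative assumptions rather than the post's settings, the post's separate flash-attn (FlashAttention-2) comparison is omitted, and a CUDA GPU with half-precision support is required.

import torch
import torch.nn.functional as F
from torch.utils import benchmark

def attention(q, k, v):
    # PyTorch dispatches to its FlashAttention kernel when inputs allow it
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

compiled_attention = torch.compile(attention)

def time_fn(fn, q, k, v, label):
    timer = benchmark.Timer(
        stmt="fn(q, k, v)",
        globals={"fn": fn, "q": q, "k": k, "v": v},
        label=label,
    )
    print(timer.timeit(100))

if __name__ == "__main__":
    # batch, heads, sequence length, head dim: arbitrary example sizes
    q, k, v = (torch.randn(8, 12, 1024, 64, device="cuda", dtype=torch.float16)
               for _ in range(3))
    time_fn(attention, q, k, v, "eager SDPA")
    time_fn(compiled_attention, q, k, v, "compiled SDPA")

The first compiled call pays a one-time compilation cost, so torch.utils.benchmark's warmup behavior matters when comparing the two paths.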
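In the spirit of the Jul 1 from-scratch post, here is a minimal single-head Causal Attention layer. The post covers Bidirectional, Causal, and Cross Attention; this sketch, including the hidden_size and context_size naming, is my own illustrative implementation under those assumptions, not the post's code.

import math
import torch
from torch import nn

class CausalAttention(nn.Module):
    def __init__(self, hidden_size: int, context_size: int):
        super().__init__()
        # one projection each for queries, keys, and values
        self.Wq = nn.Linear(hidden_size, hidden_size)
        self.Wk = nn.Linear(hidden_size, hidden_size)
        self.Wv = nn.Linear(hidden_size, hidden_size)
        # causal mask: position i may only attend to positions <= i
        mask = torch.tril(torch.ones(context_size, context_size, dtype=torch.bool))
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, S, H = x.shape
        q, k, v = self.Wq(x), self.Wk(x), self.Wv(x)
        # scaled dot-product scores, masked so future tokens are ignored
        scores = q @ k.transpose(-2, -1) / math.sqrt(H)
        scores = scores.masked_fill(~self.mask[:S, :S], float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

x = torch.randn(2, 16, 64)   # batch of 2, 16 tokens, 64-dim embeddings
attn = CausalAttention(hidden_size=64, context_size=128)
print(attn(x).shape)         # torch.Size([2, 16, 64])

Dropping the masked_fill line turns this into Bidirectional Attention; feeding keys and values from a second sequence turns it into Cross Attention.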
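And for the May 10 post, a minimal sketch of the compile-for-finetuning idea: wrap the model with torch.compile before the training loop so the forward and backward passes run as optimized kernels. The model, data, and hyperparameters here are stand-in assumptions, not the post's recipe, and a CUDA GPU is assumed.

import torch
from torch import nn

# stand-in model; in practice this would be a pretrained Transformer
model = nn.Sequential(nn.Linear(128, 256), nn.GELU(), nn.Linear(256, 2)).cuda()
model = torch.compile(model)  # one-line change; the first batch pays the compile cost

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(10):
    # random stand-in batch; a real loop would iterate a DataLoader
    x = torch.randn(32, 128, device="cuda")
    y = torch.randint(0, 2, (32,), device="cuda")
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()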