MIXED PRECISION

by Benjamin Warner

Recent Posts

Aug 16, 2023

FlashAttention with PyTorch Compile
Benchmarking FlashAttention and FlashAttention-2 on a Consumer GPU

FlashAttention-2 builds on FlashAttention, yielding significant speedups on
server-class GPUs. Unlike the PyTorch implementation of FlashAttention,
FlashAttention-2 currently cannot be compiled into a single CUDA Graph via
PyTorch 2.0's torch.compile. Does this matter, and if so, at what model sizes
and sequence lengths? In this post I attempt to answer these questions by
benchmarking FlashAttention and FlashAttention-2 on a consumer GPU.
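
As a rough flavor of the setup (a minimal sketch with assumed batch, head, and
sequence sizes, not the post's actual benchmark harness), timing eager versus
compiled scaled_dot_product_attention might look like this:

    # Minimal timing sketch; shapes and iteration counts are placeholder assumptions.
    import torch
    import torch.nn.functional as F

    def attention(q, k, v):
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)

    compiled_attention = torch.compile(attention)

    # Placeholder shapes: batch 8, 12 heads, sequence length 1024, head dim 64.
    q, k, v = (torch.randn(8, 12, 1024, 64, device="cuda", dtype=torch.float16)
               for _ in range(3))

    # Restrict SDPA to the FlashAttention kernel for the measurement.
    with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False,
                                        enable_mem_efficient=False):
        for name, fn in (("eager", attention), ("compiled", compiled_attention)):
            for _ in range(10):  # warmup; also triggers compilation
                fn(q, k, v)
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
            for _ in range(100):
                fn(q, k, v)
            end.record()
            torch.cuda.synchronize()
            print(f"{name}: {start.elapsed_time(end) / 100:.3f} ms per call")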

Jul 28, 2023

Creating a Transformer From Scratch
Part Two: The Rest of the Transformer

In this post, I will show you how to build the rest of the Transformer. By the
end of this post, you will be familiar with all the pieces of a Transformer
model and, combined with your knowledge of Attention, will be able to write an
entire Transformer from scratch.
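
As a preview, a standard pre-norm Transformer block pairs Attention with a
feed-forward network, residual connections, and layer normalization. The sketch
below uses PyTorch's nn.MultiheadAttention as a stand-in for the from-scratch
Attention layer, so it is illustrative rather than the post's exact code:

    # Sketch of a standard pre-norm Transformer block (common formulation).
    import torch.nn as nn

    class TransformerBlock(nn.Module):
        def __init__(self, dim: int, num_heads: int, ff_mult: int = 4):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm2 = nn.LayerNorm(dim)
            # Position-wise feed-forward network: expand, activate, project back.
            self.ff = nn.Sequential(
                nn.Linear(dim, dim * ff_mult),
                nn.GELU(),
                nn.Linear(dim * ff_mult, dim),
            )

        def forward(self, x):
            # Residual connections around both sublayers.
            h = self.norm1(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
            x = x + attn_out
            return x + self.ff(self.norm2(x))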

Jul 1, 2023

Creating a Transformer From Scratch
Part One: The Attention Mechanism

You cannot create a Transformer without Attention. In this post, I will show you
how to write an Attention layer from scratch in PyTorch. By the end of this
post, you will be familiar with all three flavors of Attention (Bidirectional,
Causal, and Cross Attention) and should be able to write your own implementation
of the Attention mechanism.
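
To give a taste, the heart of Causal Attention fits in a few lines of PyTorch.
The sketch below is the standard scaled dot-product formulation, not
necessarily the post's final implementation:

    # Minimal causal scaled dot-product attention sketch (standard formulation).
    import math
    import torch

    def causal_attention(q, k, v):
        # q, k, v: (batch, heads, seq_len, head_dim)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        # Mask out future positions so each token only attends to the past.
        seq_len = q.size(-2)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                     device=q.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v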

May 10, 2023

How to Quickly Finetune Your Transformer
Performance Tips for Faster Training

While recent releases of language models have emphasized the large in Large
Language Models, most everyday NLP work uses smaller language models, finetuned
on custom or task-specific datasets. In this post, I will show how to achieve
fast finetuning performance on modern GPUs using tools like PyTorch 2.0's
torch.compile and FlashAttention.
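
The core recipe looks roughly like the sketch below, with a placeholder model
and random data standing in for a real finetuning task; the post covers these
tools in detail:

    # Rough sketch: torch.compile plus mixed precision training.
    # The tiny model and random data are placeholder assumptions.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 2)).cuda()
    model = torch.compile(model)  # compile once; first batch triggers compilation
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()  # loss scaling for float16 gradients

    for _ in range(10):  # placeholder training loop with random data
        x = torch.randn(32, 512, device="cuda")
        y = torch.randint(0, 2, (32,), device="cuda")
        optimizer.zero_grad(set_to_none=True)
        # Mixed precision: run the forward pass in float16 where safe.
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            loss = nn.functional.cross_entropy(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()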

© 2024 Benjamin Warner