www.cockroachlabs.com
2a05:d014:58f:6201::1f4
Public Scan
Submitted URL: https://friends.cockroachlabs.com/MzUwLVFJTi04MjcAAAGQMM7_XJMuOFVS6pwMAIxMOQrv4elzqzT6GdrSsdzguHCGBjUvDYTHF0O8fwAi-x1rYPj876E=
Effective URL: https://www.cockroachlabs.com/big-ideas-podcast/andy-pavlo-ottertune/?utm_campaign=eblast-podcast-big-ideas-one-off-promo-best...
Submission: On January 02 via api from US — Scanned from DE
Form analysis
3 forms found in the DOM

<form id="footer-mktoForm_1480_5" class="mkto-install-form mkto-footer-form m-auto p-0 mktoForm mktoHasWidth mktoLayoutLeft" __bizdiag="-1089065311" __biza="WJ__" novalidate="novalidate"
style="font-family: Helvetica, Arial, sans-serif; font-size: 13px; color: rgb(51, 51, 51);">
<style type="text/css">
.mktoForm .mktoButtonWrap.mktoSimple .mktoButton {
color: #fff;
border: 1px solid #75ae4c;
padding: 0.4em 1em;
font-size: 1em;
background-color: #99c47c;
background-image: -webkit-gradient(linear, left top, left bottom, from(#99c47c), to(#75ae4c));
background-image: -webkit-linear-gradient(top, #99c47c, #75ae4c);
background-image: -moz-linear-gradient(top, #99c47c, #75ae4c);
background-image: linear-gradient(to bottom, #99c47c, #75ae4c);
}
.mktoForm .mktoButtonWrap.mktoSimple .mktoButton:hover {
border: 1px solid #447f19;
}
.mktoForm .mktoButtonWrap.mktoSimple .mktoButton:focus {
outline: none;
border: 1px solid #447f19;
}
.mktoForm .mktoButtonWrap.mktoSimple .mktoButton:active {
background-color: #75ae4c;
background-image: -webkit-gradient(linear, left top, left bottom, from(#75ae4c), to(#99c47c));
background-image: -webkit-linear-gradient(top, #75ae4c, #99c47c);
background-image: -moz-linear-gradient(top, #75ae4c, #99c47c);
background-image: linear-gradient(to bottom, #75ae4c, #99c47c);
}
</style>
<div class="mktoFormRow">
<div class="mktoFieldDescriptor mktoFormCol" style="margin-bottom: 10px;">
<div class="mktoOffset" style="width: 10px;"></div>
<div class="mktoFieldWrap mktoRequiredField input-group float-none"><label for="Email" id="LblEmail" class="mktoLabel mktoHasWidth" style="width: 17px;">
<div class="mktoAsterix">*</div>
</label>
<div class="mktoGutter mktoHasWidth" style="width: 10px;"></div><input id="Email" name="Email" placeholder="Email*" maxlength="255" aria-labelledby="LblEmail InstructEmail" type="email"
class="mktoField mktoEmailField mktoHasWidth mktoRequired form-control border-0" aria-required="true" style="">
<div class="mktoButtonRow" style="display: flex;"><span class="mktoButtonWrap mktoSimple" style=""><button type="submit" class="mktoButton">Subscribe</button></span></div><span id="InstructEmail" tabindex="-1" class="mktoInstruction"></span>
<div class="mktoClear"></div>
</div>
<div class="mktoClear"></div>
</div>
<div class="mktoClear"></div>
</div>
<div class="mktoFormRow"><input type="hidden" name="Subscription_Podcast__c" class="mktoField mktoFieldDescriptor mktoFormCol" value="TRUE" style="margin-bottom: 10px;">
<div class="mktoClear"></div>
</div>
<div class="mktoFormRow"><input type="hidden" name="Would_you_like_to_receive_email_updates__c" class="mktoField mktoFieldDescriptor mktoFormCol" value="Yes" style="margin-bottom: 10px;">
<div class="mktoClear"></div>
</div>
<div class="mktoFormRow"><input type="hidden" name="optin" class="mktoField mktoFieldDescriptor mktoFormCol" value="TRUE" style="margin-bottom: 10px;">
<div class="mktoClear"></div>
</div>
<div class="mktoFormRow"><input type="hidden" name="utm_adgroup__c" class="mktoField mktoFieldDescriptor mktoFormCol" value="" style="margin-bottom: 10px;">
<div class="mktoClear"></div>
</div>
<div class="mktoFormRow"><input type="hidden" name="utm_campaign__c" class="mktoField mktoFieldDescriptor mktoFormCol" value="eblast-podcast-big-ideas-one-off-promo-best-of-2023" style="margin-bottom: 10px;">
<div class="mktoClear"></div>
</div>
<div class="mktoFormRow"><input type="hidden" name="utm_content__c" class="mktoField mktoFieldDescriptor mktoFormCol" value="listen-now" style="margin-bottom: 10px;">
<div class="mktoClear"></div>
</div>
<div class="mktoFormRow"><input type="hidden" name="utm_medium__c" class="mktoField mktoFieldDescriptor mktoFormCol" value="email" style="margin-bottom: 10px;">
<div class="mktoClear"></div>
</div>
<div class="mktoFormRow"><input type="hidden" name="utm_source__c" class="mktoField mktoFieldDescriptor mktoFormCol" value="mkto" style="margin-bottom: 10px;">
<div class="mktoClear"></div>
</div>
<div class="mktoFormRow"><input type="hidden" name="utm_term__c" class="mktoField mktoFieldDescriptor mktoFormCol" value="episode" style="margin-bottom: 10px;">
<div class="mktoClear"></div>
</div>
<div class="mktoFormRow"><input type="hidden" name="utm_sfcamp" class="mktoField mktoFieldDescriptor mktoFormCol" value="" style="margin-bottom: 10px;">
<div class="mktoClear"></div>
</div>
<div class="mktoFormRow"><input type="hidden" name="gclid_field" class="mktoField mktoFieldDescriptor mktoFormCol" value="" style="margin-bottom: 10px;">
<div class="mktoClear"></div>
</div><input type="hidden" name="formid" class="mktoField mktoFieldDescriptor" value="1480"><input type="hidden" name="munchkinId" class="mktoField mktoFieldDescriptor" value="350-QIN-827">
</form>
<form novalidate="novalidate" class="mktoForm mktoHasWidth mktoLayoutLeft" style="font-family: Helvetica, Arial, sans-serif; font-size: 13px; color: rgb(51, 51, 51); visibility: hidden; position: absolute; top: -500px; left: -1000px; width: 1600px;"
__bizdiag="-1305081519" __biza="WJ__"></form>
<form class="mkto-install-form mkto-footer-form m-auto p-0 mktoForm mktoHasWidth mktoLayoutLeft" __bizdiag="-1305081519" __biza="WJ__" novalidate="novalidate"
style="font-family: Helvetica, Arial, sans-serif; font-size: 13px; color: rgb(51, 51, 51); visibility: hidden; position: absolute; top: -500px; left: -1000px; width: 1600px;"></form>
Text Content
Big Ideas in App Architecture, Episode 8: Database Benchmarking Efficiency with OtterTune's Andy Pavlo

Guest: Andy Pavlo, Associate Professor of Databaseology at Carnegie Mellon and Co-Founder at OtterTune

EPISODE SUMMARY

Database building is not for the faint of heart! It's a grueling process that can take years to master. This week, we're with one of those masters of database building: Andy Pavlo, an Associate Professor with indefinite tenure of databaseology in the computer science department at Carnegie Mellon University and the Co-Founder of OtterTune. Andy discusses how his introduction to "databaseology" changed the way databases are not only built but also studied for more efficiency at companies. From optimizing databases for efficient testing and usage to building effective databases using the right knobs and building blocks, Andy shares his insights and expertise on how to improve the performance and reliability of your applications. Join as we discuss:

* The emerging field of databaseology and its importance for efficient testing and usage.
* The design and implementation of effective databases using the right knobs and building blocks.
* The latest trends and techniques in database management and design that can improve application performance.

TRANSCRIPT

Tim Veil: Well, welcome to another episode of Big Ideas in App Architecture. I am super excited today to welcome Andy Pavlo as a guest on the show. Andy, as you and I have talked, I wanted to make sure I properly introduced you with your proper title, but then you threw me for a loop with this very long thing that I was too embarrassed or afraid to restate.
So, I've asked you to introduce yourself properly, and then we can jump right into all the fun stuff.

Andy Pavlo: Again, to be very clear, it's not like I go to bars and introduce myself with these titles. If someone has to put a name tag up, this is what it says.

Tim Veil: I fully understand.

Andy Pavlo: I am an Associate Professor with indefinite tenure of Databaseology in the Computer Science Department at Carnegie Mellon University.

Tim Veil: We are thrilled to have you on the show today, and thank you for explaining that title. So, welcome. You and I got to know each other maybe informally. It's a couple of years ago now when I was asked to poke around a benchmarking framework that you had something to do with. So, I thought maybe we'd start there. But maybe even before we get into database benchmarking and OLTP Bench and the like, do tell us a little bit about you, how you got to Carnegie Mellon, what your background is, because there are some really interesting things, obviously, that you've been doing, been researching, been looking at. Then obviously, we want to spend a lot of time talking about your newest venture, which is OtterTune. So, before we get into all that, maybe just spend a few minutes and tell us a little bit about how you got to where you are.

Andy Pavlo: I mean, what is databaseology? First of all, it's not a real term, to be very clear here. As I said, when they put name tags up, they put that on, because the university treats it like ecology or neurology, right? It's made up, but my area of research is focused on database systems. So, that means I'm interested in how you design and optimize software systems to efficiently store, retain, and process queries for data. So, prior to being at Carnegie Mellon, and this is actually my 10th year, I completed my PhD at Brown University under Stanley Zdonik and Mike Stonebraker.
The Stonebraker name should resonate with your audience because he's the inventor of Postgres and Ingres and the Turing Award winner in 2014. So, when I was a grad student, I worked on a system called H-Store that was commercialized as VoltDB. When I was graduating with my PhD, I didn't really know what I wanted to do. I knew what I didn't want to do: work at a big bank or something like that. So, I applied to a bunch of startups, research labs, universities. I did not think Carnegie Mellon would hire me. So, I was very relaxed when I came here and interviewed, and they wanted someone that does databases. I've been here ever since and I love it here. This leads into, "Why did we build OLTP Bench, or how did I meet you?" When I was in grad school, I wrote four different variations of TPCC. That's the standard OLTP benchmark that everyone has used since 1992 to measure these systems. So, when I started thinking about going into academia and what I wanted to do next, I realized that if I have students, I don't want them to have to write TPCC four times. So, we thought, "Let's write it once in a single framework, have a bunch of other workloads that we can reuse, and just have everyone take advantage of these things."

Tim Veil: It's been a fantastic tool. So, just a little bit of context on how I stumbled upon it. Working at CockroachDB, people come to us all the time and say, "Geez, how does your performance stack up against some other database?" Database benchmarking is one way people ascertain whether or not your product is better or worse than some other thing. What we have found, and that's why we were so happy to stumble upon the work that you had done, is that the benchmarking frameworks that were widely available were in various states of, I think, maturity, various states of supporting one database or another. Some were just more accessible than others. I mean, I was a Java guy and spent a lot of time writing Java.
I could understand and reason about how you'd put together OLTP Bench. We looked at some of the other things that were popular at the time, and this one is written in Tcl or Lua or some other thing. I don't know what this is. So, I ended up spending quite a bit of time in there and trying to make it work with Cockroach, and the rest is history. But to this day, it comes up all the time: hey, how can we test the database against some other thing? What has now become BenchBase is what we talk to folks about. But before we go down that path a little bit more, I wanted to ask you another question. I don't know why. I was back in my hometown this past week. I was born in Orlando, and we were down there for the Gartner Analytics Summit. It just had me reflecting on childhood a little bit. When I graduated college, I didn't know what I wanted to do either, I suppose, but I ended up dual majoring in, I think, MIS and finance. It's funny you say big bank; I realized very quickly I didn't want to go work for a bank either. So, I stayed and did MIS. How did you end up getting into databases as a field? I mean, what was the thing that got you down this path to begin with? I'm just so curious about how folks end up in the industries they chose. Do you remember what drove you down this path?

Andy Pavlo: Absolutely, yes. So, in undergrad, I think a recurring theme that I experienced was that for whatever reason, I understood databases better than everyone else. My first interaction with MySQL, and this is going back to 1999 with MySQL 3, was when I used to work for a... I don't want to go into details. It was a sketchy startup. My boss was a crook, and he was doing shady stuff, and his business partners fired him. Then they were like, "Okay, well, you were doing web programming stuff, start working on this new project." I had to learn what MySQL was. So, this is when I was in high school. For me, it just clicked, and relational databases made sense.
Throughout my undergraduate career, and then when I went off to do a pre-doc at the University of Wisconsin, the recurring theme that I noticed was, for whatever reason, I seemed to understand what databases were doing much more easily than other people. I'm not saying that I'm smarter than everyone else, of course. I realized this was my thing. Then when it came time to go to grad school, I had the fortunate opportunity to hook up with Stan and Stonebraker and start building this database system from scratch. Then it's the same thing. I learned a lot as I went along, and I realized that, incredibly, I enjoyed this. It's fun. Databases are awesome because they're in everything. Yeah, it was just one of the things that I just picked up, maybe in my early twenties, that this is something that I'm just better at than everyone else. So, I just keep going and see how far I can go with this.

Tim Veil: It's funny you say that. I've found a similar thing. I think they rule the world. I mean, they run the world. There's very little out there that, in some way, shape, or form, you can't trace the lineage of what's happening back to something being stored in a database.

Andy Pavlo: I tell my students to think of the classes of software that are super important. There are several categories, but: operating systems, of course; the web browsers everyone's using; and databases. At the university, we don't really teach a course on how to build a web browser, but we do teach a course on how to build a database system. It's that important. I can't really think of any other class of applications above the operating system where you have a whole dedicated course on how to build a specific type of software, because as you said, it's in everything. They're not going away. They're the backbone of every major application.

Tim Veil: Yeah, I totally agree.
So, going back to this idea of OLTP Bench. You wrote a paper, and please do correct me, because I'm doing some of this from memory, but you wrote this paper where you described this approach to benchmarking databases. In that paper, you described not only TPCC, which I think many people would know about, but a handful of other well-known or proposed benchmarks for doing this, which again was super fascinating for me in my reading, just because, working at a database company, people are always like, "How do I test? What do I do?" Can you talk just a little bit about some of the thinking behind creating that? And then I would love to talk a little bit about how OLTP Bench has been constructed, how it evolved over the years, and where it is today. Then I think that's a nice segue into OtterTune, which I'm really looking forward to hearing about.

Andy Pavlo: OLTP Bench, and I'm sure you guys will provide the GitHub link in the description, actually was the precursor to OtterTune. That's a natural segue. So, the original OLTP Bench project was a collaboration between myself; Carlo Curino, who was a postdoc at MIT and is now at Microsoft; and Philippe Cudré-Mauroux. He's Swiss, I'm butchering it, but he's at the University of Fribourg, along with his student Djellel. So, the MIT guys had this other project, Relational Cloud, and they had a Java-based benchmarking framework. Then when I was working on H-Store, I had my own Java-based benchmarking framework, and we realized, "Okay, we basically have the same thing. Other people implemented the same thing. Let's implement this once and for all and see how far we can push it." The reason why we had so many workloads beyond TPCC is because in academia, it's very hard to get access to real workloads to run experiments. If you're building a system, it's a research project; you don't have customers. You don't have anything to try things out on other than synthetic workloads.
So, that was the additional side of OLTP Bench. In addition to building a single framework, we then tried to find other workloads that we could port into OLTP Bench. So, for example, we have a workload based on what we think of as the access patterns in Twitter. We actually took the Wikipedia source code, the MediaWiki PHP source code, and converted its transactions into Java. Then there's a benchmark for that. So, obviously, the data's all synthetic, but it's based on actually analyzing some real workload traces and understanding what they actually do. That was the original motivation: "Okay, it's hard to get real workloads in academia. Let's just have a suite that has a bunch of them that we can use for all our different projects."

Tim Veil: I think that's so important, because it really started, just as you described, as this extensible way to add additional workloads to simulate new kinds of things. Because certainly in our work, somebody may describe what they're currently doing today in a database, and we have to do this pattern matching between what their workload looks like and what their schema looks like versus something else. Having lots of different options matters, because not every workload looks like TPCC or looks like YCSB, which is another certainly well-known one. So, I think one of the really neat things that drew us initially to OLTP Bench was that there were these different flavors. So, we could say, "You know what? Okay, I get it. What you're trying to do is more like this thing than this other thing, and let's let it rip." Then the other thing is there's tons of customization you can do. So, this idea of parallelism: I'm going to launch any number of these threads. I want to generate this amount of work. So, you could really get in there and fine-tune it to be at least as close as possible to representing some workload, to give people a sense of what it is.
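The knobs Tim mentions here, thread count and the amount of work to generate, are the core of any benchmark driver. As a rough illustration of the idea only, not BenchBase's actual implementation, a driver that fans a fixed amount of work out over a configurable number of worker threads might look like this (the `run_transaction` stub stands in for a real database call):

```python
import threading
import time

def run_transaction():
    """Stand-in for one unit of benchmark work (e.g., one TPCC-style transaction)."""
    time.sleep(0.001)  # simulate a ~1 ms round trip to the database

def run_benchmark(num_threads, txns_per_thread):
    """Fan txns_per_thread transactions out over num_threads worker threads."""
    completed = []            # per-worker completion counts
    lock = threading.Lock()   # guards the shared list

    def worker():
        count = 0
        for _ in range(txns_per_thread):
            run_transaction()
            count += 1
        with lock:
            completed.append(count)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    total = sum(completed)
    return total, total / elapsed  # (transactions completed, throughput in txn/sec)

if __name__ == "__main__":
    total, tps = run_benchmark(num_threads=4, txns_per_thread=50)
    print(f"{total} transactions at {tps:.0f} txn/sec")
```

A real framework adds rate limiting, warm-up phases, and latency histograms on top of this skeleton, but varying `num_threads` against a fixed workload is essentially how the parallelism knob works.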
Of course, you supported multiple databases, which again was nice. You happened to support Postgres, which was a close cousin of Cockroach.

Andy Pavlo: Well, you asked about the early days. DB sent us patches for their data system. Cockroach did a lot of work as well. We'll talk about that later, but we had some vendors actually send patches for their databases as well. The only challenge with that is that some of them we can't test, because of the cloud platforms or because they're proprietary. So, I think we've gotten better at making sure we test and make sure that things aren't broken for the different databases. But that's the goal: the idea is a single framework that supports different workloads for different databases. So, you can start doing true apples-to-apples comparisons, which is hard to do.

Tim Veil: Yeah, it's super, super hard to do. I may have this wrong, but I think over the years, you've had your students participate in adding to and evolving it. Isn't that correct?

Andy Pavlo: Yes. So, yeah, over the years, I've had new students get started... I'm not saying I'm vetting them or that it's a job interview, but a new student comes to me and says, "I want to work on databases." So, before we start throwing them on a bigger project that's more complicated, we give them something smaller in BenchBase, OLTP Bench, and see how they do with it. Part of the reason we changed the name from OLTP Bench to BenchBase is that we started adding analytical workloads like TPCH and something like TPCDS. We're working on that. But new students add new workloads over the years. So, even though it's a 10-year-old project at this point, we are still trying to keep things up to date and add new workloads as we need them, as they come along.

Tim Veil: Yeah, it's great.
Then, my recollection of this might be different than reality, but at one point, at the direction of my boss at the time, who wanted me to go and see if I could get OLTP Bench to work with Cockroach, I forked it. And I have this bad habit of just doing all these things at once. So, I made all these crazy changes, and I think I realized not terribly long after I did that, there's no way. Now the train has left the station. There's no way these guys are ever going to want any of my stuff back in OLTP Bench. Then I randomly got this email one day, and I feel like this was a long time after I'd gone down this path, where either you or one of the students reached out: "Hey, we're trying to merge this stuff." I'm like, "Really? Oh, okay. I'd love to help. That's awesome."

Andy Pavlo: You did a hard fork, and I think a student found you. I forget what they were searching for, because it's not like you showed up in the GitHub fork list. And these changes were amazing. We wanted all these things. Let's see what we can do to get you to help us put it back in.

Tim Veil: So that was fun. We worked for a while to get everything back in there, but it's been fantastic. I appreciate you all taking a look at the work that I did, because I've so enjoyed learning a lot, not only about databases, but about benchmarking and how to write code. We actually learned a ton about Cockroach, I think, in the process, because just as you described, the ability to run these multiple workloads that have different flavors, different query patterns, different requirements for indexing, different types of schema, and doing that in an efficient way has been a neat process for us.

Andy Pavlo: I would say also, recently, Carlo Curino's team at Microsoft has been helping out and contributing, not just making it work better on SQL Server, but also fixing other bugs and issues like that.
So, they've stepped up in the last year or so and been contributing back to the project as well. It's good to see Carlo and his team, ten years on, come back around and still work with it.

Tim Veil: Yeah. As I've said to you over Slack and in other places, I've been a very bad helper recently by getting distracted with other things. So, for Cockroach, the interesting thing has been that we're running all these workloads and we're finding things that we can internally do better. It's been a really good way for us to identify potential optimizations within our code as a database product. It's been a great way, and we've had a number of prospects and customers start their journey with Cockroach using BenchBase. But I think that's a nice segue into what you've been doing on the side, which is this startup that you've created called OtterTune. I did not realize you guys had been at it for as long as you have. The more I learn and read about it, the more excited I am about what you all are trying to do. So, maybe we use that as a jumping-off point into what is OtterTune. Because look, we named a database after a cockroach. We have this funny name. So, now I have this interest in naming. I'd love to know why or how you came up with this name, but then, yeah, tell us all about what you've been doing with OtterTune.

Andy Pavlo: There is a connection between BenchBase and OtterTune; I'll discuss it in a second. The name OtterTune is just a play on the word autotune. So, my PhD student, Dana Van Aken, she went to, I think, some zoo, liked otters, got a t-shirt of an otter, and then she's like, "Oh, we should call this OtterTune." I'm like, "Brilliant, let's do it." As for the name, there's some a cappella group at some university that's also called the OtterTunes, but that was our only competition for this. So, it's just a play on autotune.

Tim Veil: Okay. I thought, like cockroaches, they'll survive anything.
I thought maybe with otters there's something about the animal that was indicative of high performance or something.

Andy Pavlo: We did not realize until later that otters, the animal, are vicious animals.

Tim Veil: Really?

Andy Pavlo: Oh yeah, go Google it. Look on YouTube, type "otter fights" plus whatever other animal. They'll fight. They look all cute and cuddly, but they're vicious.

Tim Veil: Oh, that's funny. I didn't know that. I've heard that about pandas, by the way. Or is it pandas or koala bears? I think they're cuddly.

Andy Pavlo: No, koalas, they're like brain-dead, because the eucalyptus doesn't give them enough energy.

Tim Veil: Oh, really?

Andy Pavlo: Yeah.

Tim Veil: So, panda is it.

Andy Pavlo: So, I think there was a database conference in Australia at some point. We went down, and there's some petting zoo where you can touch koalas, and they smelled terrible. They barely moved because the eucalyptus doesn't give them enough nutrients.

Tim Veil: I had no idea.

Andy Pavlo: I'll describe what OtterTune is in a second, but I'll tell one quick story about how vicious otters are. For marketing reasons, we were going to sponsor an otter at the Pittsburgh Zoo, the local zoo, because one of our investors is the founder of Duolingo. They sponsored an owl, after their mascot, at the National Aviary here in the city. So, we're like, "Okay, let's sponsor an otter. That can't cost that much." So, when we called the zoo to do it, someone was like, "Oh, yeah, let me go check with the trainers and so forth," but then she's like, "Just so you know, you can't go into the enclosure with them." She's like, "They're so vicious that even the handlers don't want to go in there, because they'll fight and kill anything." But we didn't know this before. We were like, "Oh, otters are cute." Then apparently they're like murderers.

Tim Veil: Have you thought of changing the name at all, or now it's just like, we're embracing the violence of this mascot?
Andy Pavlo: I think that as long as we're careful in what we say... again, this is your podcast, I don't want to get you fired.

Tim Veil: You just lean into it. You'll just lean into the name.

Andy Pavlo: Well, yes, absolutely, yes. So, as I said, OtterTune, the name is a play on autotune, and the big picture of what we're trying to do is to apply machine learning techniques to optimize database systems. The original research project at the university was on doing knob configuration tuning. These are runtime parameters that control the behavior of the system. Every database system has them, and they're a huge pain. They're basically buffer pool sizes, caching policies, log file sizes: things you can tune as the end user for how the database system is going to be used by the application. The reason why these knobs exist is that when the developer is actually building the database system, at some point they have to make a decision about how much memory to allocate for a hash table. Then, instead of putting a #define in the source code or some hard-coded value, they expose it as a knob, because they assume someone else who knows more about databases or knows more about the application will come along and tune it, but that never happens. So, you just accumulate these knobs over time. Now, the new version of OtterTune that we're releasing... by the time this podcast comes out, it'll be out. The new version that we're putting out this year expands and goes beyond knob tuning. We're doing index tuning, we're doing query tuning. We'll talk about why that matters as well, but the original research project of OtterTune was just doing knob tuning.
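At its simplest, automated knob tuning is a search over a configuration space against a measured objective. This toy sketch does a seeded random search; the knob names and the synthetic throughput model are invented for illustration, and the real OtterTune uses far more sophisticated machine learning than this:

```python
import random

# Hypothetical knob space: (low, high) bounds for each tunable parameter, in MB.
KNOBS = {
    "buffer_pool_mb": (128, 8192),
    "log_file_mb": (16, 1024),
}

def measure_throughput(config):
    """Stand-in for running a benchmark against a database with this config.
    This synthetic model just rewards larger settings with diminishing
    returns; a real tuner would measure an actual running system."""
    return (1000 * (config["buffer_pool_mb"] / 8192) ** 0.5
            + 100 * (config["log_file_mb"] / 1024) ** 0.5)

def random_search(trials=50, seed=42):
    """Try random configurations and keep the best-performing one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        config = {k: rng.randint(lo, hi) for k, (lo, hi) in KNOBS.items()}
        score = measure_throughput(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Even this naive search beats the "default" (minimum) settings under the toy model; the hard parts that OtterTune addresses are measuring real systems cheaply and searching the space with far fewer trials.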
So, how this relates to BenchBase, or OLTP Bench: one of the things that we were building out in OLTP Bench when I started at CMU was the ability to collect the runtime metrics from the database system, the internal telemetry that every system generates: pages read, pages written, locks, and so forth. OLTP Bench would then upload them automatically to a website to keep track of these things. We were heavily inspired by the Codespeed project from PyPy: thinking of continuous integration, keeping track of the performance of things over time. What I really wanted was to be able to run experiments, store it all in a single repository, and click one button to make the graph that I could put in my research papers. That was what I really, really wanted. Then from there, I realized, "Oh, well, okay, if you have all this telemetry, what can you start to do with it?" My PhD thesis was on using ML-like techniques before ML was a big thing, basically automated techniques to optimize databases. But as I said, the big challenge I was facing was that it's impossible to get real workloads from customers. So, we were relying on the synthetic workloads in OLTP Bench. So, the idea with the original version of OtterTune was, "What can I optimize in a database system automatically, using a machine learning or automation technique, that does not actually need to look at the queries?" The idea was that this runtime telemetry is a stand-in or representation of what the workload actually is, without actually seeing the workload. So, we were trying to use that as the signal to decide how to optimize the system. Also, at the time, there hadn't been a lot of work on automated knob tuning. There was maybe some work done in the mid-2000s, but there wasn't the long history of research projects like there is for index tuning or query tuning.
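The "telemetry as a stand-in for the workload" idea can be sketched as follows: summarize each workload as a vector of runtime counters and match new telemetry to the nearest previously seen workload. The metric names and numbers below are invented for illustration; this shows the general shape of the technique, not OtterTune's actual code:

```python
import math

# Each known workload is summarized by internal telemetry counters
# (hypothetical values, imagined as normalized per-second rates).
KNOWN_WORKLOADS = {
    "oltp-like": {"pages_read": 500.0, "pages_written": 400.0, "lock_waits": 50.0},
    "analytics-like": {"pages_read": 9000.0, "pages_written": 20.0, "lock_waits": 1.0},
}

METRICS = ["pages_read", "pages_written", "lock_waits"]

def distance(a, b):
    """Euclidean distance between two metric vectors."""
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in METRICS))

def nearest_workload(observed):
    """Map observed telemetry to the most similar known workload,
    without ever looking at the queries themselves."""
    return min(KNOWN_WORKLOADS,
               key=lambda name: distance(observed, KNOWN_WORKLOADS[name]))
```

Once a new deployment is mapped to a familiar workload this way, tuning knowledge gathered on the familiar workload can be reused, which is the payoff of never needing to see the queries.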
So, then the OLTP-Bench project, that website, morphed into the first version of OtterTune, where we just used simple machine learning techniques to tune basic knobs, and then it got more sophisticated over time.

Tim Veil: Yeah, I remember making some of my changes. I remember wondering, what the hell is it uploading? What is this IP address? Where is this going? I think one of the first changes I made was ripping this out. I don't know what this is doing here, I can't use this. But now it makes sense what you all were doing.

Andy Pavlo: So basically, what happened was we published the first paper about OtterTune at SIGMOD, which is the top conference in databases. The other academics were like, "Oh, this seems cool, this seems nice," but nobody in industry paid attention to it. Then we met the guy that runs all of Amazon's machine learning division, or group, whatever you want to call it. He came to Carnegie Mellon. I got five minutes of his time, just to thank him for giving Dana, my PhD student, a couple thousand dollars to run her experiments on EC2. He was like, "Great, can you write a blog article for us? We just started the new Amazon AI blog. We need material." So, we converted the SIGMOD paper into a blog article and it got published on Amazon's site. Then that's when everyone started emailing us and saying, "We have the exact same problem. We'll give you money to fly a student out and set up OtterTune for us." This happened so many times, we were like, "Okay. Clearly, there's a signal here. We should go do a startup."

Tim Veil: Well, it's funny. I think it's such an interesting thing, because again, I take more of a blue collar approach to this, but obviously, in the applications I've been building, the teams I've run, and even at Cockroach when we're working on evaluations for folks, people are always trying to squeeze as much out of the database as they can. We do this today too. We start at the schema. What does the schema look like? What do your queries look like?
Are those properly tuned? Do we have indexes? But I know Cockroach does this, and many databases I've worked with have hundreds of these knobs, as you call them, to tune. You're out there searching and it's like, "Oh, somebody said do this. Somebody said do that." You end up with this list of 5 or 10, 15, 20 things you're going to adjust, but you have no idea whether it's working. You run something and it's like, "Well, it didn't break, so maybe somewhere it's going to be better." So, I find this fascinating. How do you all determine what knobs to even touch, and how do you set expectations about whether a change is even relevant to whatever I'm trying to do? That to me was one of the biggest questions I had as I was reading through. How do you know what would make sense?

Andy Pavlo: So, the first step, as you said, is you've got to figure out what knobs to actually tune. There are about 500 knobs in MySQL and 400 knobs in Postgres, but not all of them are things you would actually want to tune automatically. There are directory names, file names, port numbers. If you tune those, the system doesn't work. So, we obviously put those on a denial list.

Tim Veil: We really like this port over here. Why don't you try this?

Andy Pavlo: Yes. So, then the next step is you do have to do some manual curation to find knobs, or particular values for knobs, that could affect the safety of the database. The most obvious thing is if you turn off disk writes, calling fsync to flush the log to disk when you commit a transaction: the machine learning algorithms figure out really quickly that not writing things to disk makes your database go faster. But if you then crash, you lose the last 10 seconds of the log, you lose your data. There's an external cost that the machine learning models or algorithms can't reason about.
So, a human has to come in and say, "Okay, we don't turn off fsync," because that's an external cost the algorithms can't even reason about. Then the next step is we basically did a random walk, if you will, trying out different knobs on different workloads, different situations, just to collect training data. Then you run a ranking algorithm to figure out which knobs have the most impact on performance. It actually works reasonably well. To no surprise, in MySQL it's the InnoDB buffer pool size; in Postgres it's the shared buffers size. The buffer pool is almost always the biggest thing, because if you're reading from disk, your performance is terrible. So, we basically looked at the list and it seemed reasonable. It's hard to say whether one ranking is better than another. In some situations, depending on the workload, the ranking might differ, but the top 8 or top 10 are almost always the same. So, that gives you the knobs you think you should target first. The original version of OtterTune would take an incremental approach: maybe tune 5 knobs, let it go for a little while, then tune 8, then 10, expanding it, because the knobs that have the most impact will give you the most benefit right away. There are other methods in the academic literature that try them all at once using deep nets or reinforcement learning. So, there are improved techniques for looking at a wider number of knobs than the original version of OtterTune did. Then your next question is, okay, how do you determine what you think is going to make a difference, or what do you tell the user? Do you want to know how the algorithms figure out these are the knobs to tune this way?

Tim Veil: Yeah. Not only that, but I would imagine I could do something and it could have a negative impact for some reason. So, how the system makes sure that I'm continually making forward progress is an interesting thing.
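The ranking step Andy just described, sample random configurations, measure performance, then rank knobs by impact, can be sketched as a toy example. This is a hedged illustration with made-up knob names and a synthetic workload, not OtterTune's actual algorithm (the published system used Lasso regression; plain correlation is used here to keep the sketch self-contained):

```python
import random

random.seed(0)

def pearson(xs, ys):
    # Pearson correlation: a cheap stand-in for a real importance measure.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

# "Random walk": try random knob settings and record a performance metric.
# The synthetic workload makes the buffer pool dominate, the log size matter
# a little, and one knob be pure noise, mirroring what Andy reports.
samples = []
for _ in range(200):
    cfg = {"buffer_pool_gb": random.uniform(1, 32),
           "log_file_mb": random.uniform(64, 1024),
           "some_noise_knob": random.uniform(0, 1)}
    throughput = 100 * cfg["buffer_pool_gb"] + 2 * cfg["log_file_mb"] + random.gauss(0, 50)
    samples.append((cfg, throughput))

# Rank knobs by how strongly each one correlates with throughput.
ranking = sorted(
    samples[0][0],
    key=lambda k: abs(pearson([c[k] for c, _ in samples], [t for _, t in samples])),
    reverse=True,
)
print(ranking)
```

On this synthetic data the buffer pool knob ranks first and the noise knob last, which is the shape of result Andy says the real ranking produced.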
Andy Pavlo: So, this is a good place to note the difference between the academic version and the commercial version. In the academic version, we made the assumption that people would not tune production databases with something like OtterTune, because it's machine learning and the models have to learn. That means there may be times when it tries out things that make performance worse. We also assumed that because you're not running on production, you're running on a clone, and that you would have a workload trace that you can run repeatedly, over and over, to see whether things are getting better. Once the models converge, you think you have the best recommendation, and then you can apply it to the production database. Again, when we were at the university, everybody we talked to could do this. We did a deployment at a major bank in France; they had a whole team that could set things up on clones. We talked to the patent office. We talked to other pretty big companies that could do this. Since we commercialized it, we've realized that not everyone can. Most people cannot capture a workload trace, here are all the SQL queries executed, and then run that on the side. Most people also don't want to set up another clone or machine, because it's expensive to run experiments for OtterTune to tune against. So, in the commercial version, we're actually tuning the production database directly. That means we have to be more cautious in the recommendations we make and set up more guardrails to make sure that things don't go wrong. It'll be less aggressive in exploring the solution space, and we try to use training data we collected from previous databases to help make sure that the first things we try out on your database are not way out of line, way out of whack.
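The "guardrails" Andy mentions for tuning production directly can be sketched like this. The denial list, the 25% step limit, and the knob names are illustrative assumptions for the sketch, not OtterTune's actual rules; the point is only the two mechanisms: refuse unsafe knobs outright, and bound how far any value can move per tuning round:

```python
# Knobs an automated tuner should never touch in production
# (hypothetical denial list, in the spirit of the fsync example above).
DENY_LIST = {"fsync", "port", "data_directory"}

# Cap exploration: move a numeric knob at most 25% per round
# (an assumed limit for illustration).
MAX_RELATIVE_STEP = 0.25

def safe_recommendation(knob, current, proposed):
    """Clamp a tuner's proposal so production exploration stays conservative."""
    if knob in DENY_LIST:
        return current  # leave safety-critical knobs alone entirely
    limit = current * MAX_RELATIVE_STEP
    return max(current - limit, min(current + limit, proposed))

# The model wants an 8x jump in the buffer size; the guardrail allows 25%.
print(safe_recommendation("shared_buffers_mb", 1024, 8192))  # prints 1280.0
# The model wants to disable fsync; the guardrail refuses.
print(safe_recommendation("fsync", 1, 0))                    # prints 1
```

Repeated rounds still converge toward a good configuration, just without the risky jumps a clone-based setup could afford to try.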
Tim Veil: I'm always surprised when I go and visit prospects or customers, maybe not even so much now, just in previous work, by, quite honestly, people's existing knowledge of databases or whatever the technology is. I think sometimes you assume that people who have been at a company for many years, working on a product, have this deep knowledge. It's not necessarily true. There are folks out there doing important things who maybe don't understand all the details. I'm curious, to the extent that you can share: is there a, "Hey, we walked into one place and we were able to do this amazing work"? In other words, is there an outlier where OtterTune came in and just blew somebody's doors off because their existing setup was so problematic? I'm just curious what the average-

Andy Pavlo: I don't want this to sound like I'm bragging or come off as pretentious, but we have found that for knob tuning, the algorithms work better in the real world than they did at the university. Because as a baseline, when we did our experiments for the research projects, we assumed that people had done a bit more tuning than they actually do. We assumed the user would be a bit more sophisticated than we are finding. In both cases, though, there are people that have done almost no tuning, or actually zero tuning. Since we're targeting Postgres and MySQL running on Amazon, customers have told us that they thought Amazon had already tuned their database for them. They're not. Jeff Bezos is not doing anything to your database. But even in places where they had in-house DBAs that had tuned the database, the OtterTune algorithms can still carve off another 20, 25% improvement, because it's just hard. There are so many different things you have to deal with.
Even if you have a full-time DBA, if you have a lot of databases, which you would if you're going to pay for a DBA, they're doing so many other things that they don't really have time to tune every single database exactly-

Tim Veil: Look, as somebody who works with databases every day and represents a database company, our own documentation and our own list of knobs, it's incredible. It is daunting to understand and fully appreciate what each and every change does. So, that doesn't surprise me, because it's hard enough to do all the other stuff right; getting into these esoteric settings is a whole other ball of wax. You had said, though, that you started with knobs, but you all are moving into index tuning, query tuning, and the like. What does that look like? The reason I'm asking is that I'm curious, as you've gotten into that, where's the biggest bang for the buck? Is it schema and index tuning? Is it query plans? Is it all of this stuff? Have you even measured that? What are you doing and what are your thoughts on that?

Andy Pavlo: So, we haven't measured it, but it varies. For some applications, they have all the right indexes and the query plans could use some work, but the biggest win they can get right away is knob tuning. In other cases, if you don't have the right indexes and all your queries are sequential scans, or you're doing terrible nested loop joins, then we can tune the knobs all day, but we can't magically make your query run faster, because it doesn't have the index. So, I would say it varies, but even then, everybody needs everything. You know what I mean? We have not come across any database where it's like, "Oh, my God. Here's your money back. You don't need us." There's always something. The reason is that databases aren't static, in that they're ingesting new data, but also, upstream, the application's never static.
The only place we've ever come across where someone told us, "Yeah, our application hasn't changed in three years," was the patent office, and rightfully so. What changes? But everybody else is like any other company: they're putting out new features based on what customers want and so forth. New features mean new queries, new complexity, and new data for the database. So, people struggle with understanding how, as their application evolves, the configuration and tuning setup for the database needs to evolve over time as well. There are other things too, where people may know how to do certain things, but it just falls through the cracks or people forget. One example: we had somebody who had backups turned off on their production database. The reason is that the database used to be the staging database, and then they were upgrading from MySQL 5.7 to 8 or something like that. When they upgraded to 8, the staging database became the production database, and someone forgot to turn on backups. Things like that. It's really just hard to know what the best practices for databases are, because again, most of these companies don't have DBAs. It's developers. It's people who are writing application code, and it's somebody who maybe set up the database at their last job, so they're responsible for setting up the database at this job, but they're doing a bunch of other stuff. There are just too many things going on.

Tim Veil: Yeah, it's always somebody else's responsibility. We see that a lot and I've always seen that. It's interesting.

Andy Pavlo: So, the new version of OtterTune that we're putting out now, at its core, is still trying to optimize the performance and efficiency of the database through knob tuning, indexes, query tuning, and so forth. But the way we're pitching it is that we're selling peace of mind. I realize that's a fuzzy term to use about your database, especially as a scientist.
It seems like a marketing thing, and it's hard to quantify what it means to have peace of mind about your database. But this is what people tell us: they just don't know what they don't know about their database, and they don't know what they should be doing. So, the new version of OtterTune does all the things I said before, but it's also checking to make sure your database is set up correctly, like the backups example. As your application evolves over time, OtterTune is there with you, seeing how it evolves and making suggestions along the way. The original version of OtterTune, at least the academic project, was like, "Okay, you tune once and then you're done." But it's really this long-term lifecycle of the database. That's really what people need help with. That's where we're going.

Tim Veil: Again, something we see play out over and over again, because once people understand a particular technology, they tend to use it for everything, whether they should or shouldn't. So, we've certainly seen many instances where a database designed, sized, and provisioned for a certain task or workload is all of a sudden accepting work from some other thing, and it's just no longer suited for that. So, yeah, that makes total sense to me. This ability to go in and constantly reevaluate whether things are set up correctly based on new workloads makes a ton of sense. I'm curious, just on that, because this is a topic that comes up for us a little bit: this idea of multi-tenancy. I don't know if that makes sense or how you think about it. It's this idea that maybe a single database instance is used to serve multiple workloads, maybe different applications out of different schemas. Do you run into that? Do you have challenges with that? I'd imagine that would be a tricky thing.
Or are you just saying, "Hey look, that's not exactly how we would advise you to set things up anyway," which is a totally fair response, by the way.

Andy Pavlo: So, in the current version of OtterTune, we see the metadata about what it is, but we're not differentiating which of the databases are being taxed the most. The thing we're working on now is looking at the fleet of databases holistically. What I mean by that is not just tuning this one instance to get the best performance out of that one thing, but really understanding how that instance interacts with and is related to other databases in the fleet. Now, some things are obvious. You have a replication setup; you know which is the primary, the reader or the writer. Amazon sees that relationship. But oftentimes there are implicit relationships, and there are optimizations or recommendations you can make if you understand how databases are related to each other. A classic example would be staging and production. Amazon doesn't know which is the staging database and which is the production database, but because we can see the schemas of the databases, we can identify, "Oh, they're actually the same." Therefore, if we see a schema migration happen on the staging database, we can learn, "Okay, it's going to get applied to the production database in one week, two weeks," or we can ask the user when they're going to apply it. You can start making recommendations like, "Okay, you're going to add a column. That's expensive; do it at this time so it won't interfere with things," or, "You're renaming a column. That's cheap. Do it at this time." You can make those recommendations if you understand how these things are related at a logical application level. Another example: we had a customer that deployed their application in two different locations. So, it was two different front-end applications, two different database instances.
One in the US, one in the EU. Same schema, same workload, same application code, just different physical instances. Amazon doesn't know that they're related, because they're not replicating to each other. But then what happened was they found that the query latency on the EU database was 10X slower than on the US one. It's because an index they had added to the US one, they forgot to add to the EU one. So, the new version of OtterTune can identify that these schemas are the same and prompt the user: "Hey, look, you added this index over here. You really should be adding it over there too, because we think they're the same. Yes or no?" Again, none of that is machine learning; it's just identifying that these things are related, but it matters a lot. This is what the new version of OtterTune is trying to provide.

Tim Veil: Well, it's like you say, though, it's peace of mind. You're running a complex system. You're doing tough stuff, and this is hard. It's hard to be an expert at everything. So, the idea that somebody can be out there auto, or otter, tuning this is pretty neat. I know we're running up toward the end of the time we have allotted. I wanted to ask you, though: we've been talking a lot about the technology, but obviously starting a business, starting a startup, can't be for the faint of heart. I've been privileged to watch from the inside as we've been trying to build Cockroach. I'm just curious about your take on trying to build this thing, getting investors, getting customers. What's that journey been like for you? If you want to share; you certainly don't have to. But I'm just curious, less technical and more business, what has this been like?

Andy Pavlo: I mean, it's definitely scratching an itch. Many academics do startups. I was very fortunate that my fellow co-founders are former students that I'd worked with before on OtterTune. Dana Van Aken is the CTO.
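The missing-index story above boils down to a plain set comparison, no machine learning required, just as Andy says. Here is a minimal sketch; the table and index names are made up for illustration, and real schema introspection would of course read them from the database catalog rather than hard-code them:

```python
# Index inventories for two instances that share a schema
# (hypothetical names, standing in for the US and EU databases above).
us_indexes = {"orders": {"orders_pkey", "orders_customer_idx"},
              "users":  {"users_pkey", "users_email_idx"}}
eu_indexes = {"orders": {"orders_pkey"},
              "users":  {"users_pkey", "users_email_idx"}}

def missing_indexes(reference, other):
    """Report indexes present on the reference instance but absent elsewhere."""
    gaps = {}
    for table, idxs in reference.items():
        diff = idxs - other.get(table, set())
        if diff:
            gaps[table] = diff
    return gaps

print(missing_indexes(us_indexes, eu_indexes))
# flags orders_customer_idx as missing on the EU instance
```

Once two instances are identified as logically the same application, a check like this is enough to surface the 10X latency gap before a user ever notices it.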
She did her PhD with me at Carnegie Mellon, and Bohan Zhang is also a co-founder. He did his master's degree at CMU and worked on the first version of OtterTune. So, I was fortunate that I could start the company with people I'd worked with and trust, and they're smarter than I am. They just don't know it. Then when we raised money, we raised it at the beginning of the pandemic, which was choppy waters, but we used the funds we got at the very beginning to go hire my best former students. Again, I'm very fortunate that they came along with me, and they've been fantastic. That part's been good: I get to work with the students I liked at the university, with me at the company. There are ebbs and flows, of course. There are lows and highs. The one thing I would say, coming from academia, that I underappreciated was the importance of marketing, sales, and operations. It's one of those things where you don't know what you don't know. Now that we have people handling most of these things, I'm like, "Okay, now I see it." Our operations manager is fantastic. She does a bunch of stuff for me and has been a huge lifesaver. If I had known that sooner, I would have hired people like that earlier. I think that's the thing I've learned the most. I've enjoyed it, because I'm seeing more databases, more things, more real use cases than I would have just building stuff at the university. In some cases, there have been problems that have come up with customers that OtterTune is not really in a position to solve. But then there are things like, "Oh, this is a hard problem," and that guides my own research back at the university. One example would be proxies, like PgBouncer or Odyssey from Yandex. I don't think people even realize how widely deployed these things are.

Tim Veil: Everywhere.

Andy Pavlo: Everywhere.

Tim Veil: They're everywhere.
Andy Pavlo: Nobody does any research on them, and they're actually not very good. They're not very high performance. The Yandex one is pretty good. I have a PhD student working on proxy stuff now, because I saw them a lot at OtterTune. It's not a problem OtterTune can solve, so we've been looking at it.

Tim Veil: Can I give you just a little bit of an anecdote there?

Andy Pavlo: Yes.

Tim Veil: We see PgBouncer everywhere. It frustrates us because the documentation isn't great and nobody really understands how to tune it, but it sits in the middle of every single connection to the database. So, if that thing goes sideways or is suboptimal, it can look like the database is not doing its job, and that may not be the case. So, that's really interesting, because it is pervasive in the Postgres world, obviously, and it is not well understood. It's not well understood here either, in part because the whole thing seems very opaque, the documentation, everything. So, that's really fascinating.

Andy Pavlo: We've come across people that run two or three layers of PgBouncer. PgBouncer talks to PgBouncer, which talks to PgBouncer, which talks to Postgres. It's insane.

Tim Veil: Yeah, it's everywhere.

Andy Pavlo: Yeah. So, like I said, doing a startup, plus I'm now teaching full-time at the university again. I'm back; I did a one-year leave of absence. Plus having a three-year-old daughter. I don't know how long I can keep doing this, but I enjoy it.

Tim Veil: All right. Well, first of all, I know this is mostly audio only, but for those on video, I'll probably get shot if I don't ask what is behind you there. At least in my view, there is a mannequin. I feel like you owe it to the people who may be on video to explain what that is.

Andy Pavlo: Yeah, that's Little Billy. That's the child mannequin I got when I was in grad school; it's how I proposed to my wife.
We can get into that story if you want, but when I came to Carnegie Mellon, she said, "You can't bring that child mannequin anywhere until you get tenure, because it creeps people out." So, I got tenure last year. And I taught SIMD, or vectorized instructions, in my database class, so we got to use the mannequin as a prop.

Tim Veil: That's so awesome. Well, we'll definitely do the proposal story on part two of our conversation. So, I've been ending a lot of the podcasts like this, because I know we're running out of time. For us, it's the beginning of a new fiscal year. It's spring. It's this time of optimism. I'm just curious, what are you looking forward to this year? You've got a lot going on. You've got a daughter, you've got this startup, you've got things at the university. What's exciting? What are you looking forward to this coming year?

Andy Pavlo: Yeah, so I was very COVID cautious, and I'm actually starting to travel more and go visit places and give talks, so I'm looking forward to seeing all my database friends at universities and other places in ways that I hadn't in recent years. My daughter, she can't program yet, but if you ask her what her favorite programming language is, she says SQL. So far so good there. I'm looking forward to seeing how she grows. Then my wife is fantastic, so I want to spend as much time with her as possible. It's hard with a three-year-old. On the database side, with the economy looking dicey, I'm interested in seeing what the startup landscape looks like for database companies. I realize you're at a database startup, although you guys have been around for a while, but I think it's going to shake out a lot of the weaker companies, and I'm interested in what that looks like this year. I don't think there's any exciting hardware on the horizon that Intel's putting out or Nvidia's putting out that will change how we think about building database systems.
There's nothing really, as far as I know, that's going to change anything. Then the large language models and the ChatGPT stuff, I think that's super fascinating. We've tried using it to tune databases. It doesn't always work, because it doesn't know what your database is actually doing. If you ask it for certain things, it just regurgitates Stack Overflow, which isn't always correct. But I'm really interested in seeing where that goes next in the context of databases.

Tim Veil: Yeah, we ran into a company at Gartner this week that was using it as an interface to query databases, which was really fascinating.

Andy Pavlo: At some point, I want to start building something new. I don't know what it's going to be yet. I don't have any free time, but I thought I might do generative art for databases. You upload your schema and it makes a pretty picture or something like that, using Midjourney or DALL-E. We'll see.

Tim Veil: Well, what I'm going to do after this, because I did this for something at our recent sales kickoff: I want to see what DALL-E or one of these things produces for an otter fighting a cockroach.

Andy Pavlo: Yes.

Tim Veil: See what kind of savagery is revealed there. Well, listen, I've always enjoyed our chats, and certainly the work that you've done and the projects that you've started. I think what you all are doing at OtterTune is incredibly fascinating. So, I appreciate you joining and telling us a little bit about it. If you're willing, I'd love to have you on again sometime in the future, and we can talk about even more interesting things.

Andy Pavlo: Absolutely. Like I told you, I can do this all day. I really appreciate you having me.

Tim Veil: Yeah, it's been great. So, thanks again, Andy. We'll talk soon.

Andy Pavlo: Okay, thanks, Tim. See you.

Tim Veil: Thanks again for listening to this week's episode. If you're a fan of the show, be sure to subscribe to the podcast to get every new episode in your feed as they're available.
Also, rate us five stars on your favorite podcast platform. If you like what you heard, you can also watch Big Ideas in App Architecture on our YouTube page, linked in the description. Thanks again. Bye.

Big Ideas in App Architecture: a podcast for architects and engineers who are building modern, data-intensive applications and systems, hosted by Tim Veil of Cockroach Labs.
Building conversational AI to improve the customer experience Akshay Kayastha Senior Engineering Manager at ConverseNow Engineering resilient systems: Rescuing old treasures and unleashing modern capabilities Marianne Bellotti Author, Engineering Leader, Systems Geek The Full Package: How Route architects its all-in-one post-purchase platform Siddhartha Sandhu Engineering Manager at Route A historical journey in developer technologies Mike Willbanks CTO at Spark Labs From Legacy to Cloud: Success stories from migrating mission-critical applications Kishore Koduri Senior Director of Enterprise Architecture at Ameren Building purpose-driven engineering cultures Jason Valentino Head of Engineering Enablement at BNY Mellon Modernizing Insurance Application Architecture at New York Life Mike Murphy Corporate Vice President and Life Insurance Domain Architect at New York Life Innovation and Disruption: How Materialize pioneered a new era in data streaming Arjun Narayan Co-Founder and CEO at Materialize Stories from an SRE: How Hans Knecht builds better developer experiences Hans Knecht Cloud Consultant at Knechtions Consulting (Ex: Capital One; Ex: Mission Lane) Inside Chick-fil-A’s infrastructure recipe for a perfect customer experience Brian Chambers Chief Architect at Chick-fil-A Corporate Modernizing from the Mainframe: An Exploration of Distributed Systems Chris Stura Director, PwC UK IoT Standards & Data Mesh: Utility Facility App Architecture Grant Muller Vice President, Applications and Technology Architecture at Xylem Relational Data Problems: Doubble Dating Application Architecture Mattias Siø Fjellvang CTO & Co-Founder at Doubble From Legacy Systems to Limitless Scaling with Paycor’s Systems Engineering Fellow Adam Koch Systems Engineering Fellow at Paycor How to Understand Problems & Build Better Software with Technical Leader Joe Lynch Joe Lynch Technical Leader Observability in the Cloud & Dataflow Modifications with Yolanda Davis from Cloudera Yolanda 
Davis Principal Software Engineer, Data Flow Operations Early Days at Google & Building CockroachDB with Peter Mattis Peter Mattis Co-Founder and CTO of Cockroach Labs Database Benchmarking Efficiency with OtterTune’s Andy Pavlo Andy Pavlo Associate Professor of Databaseology at Carnegie Mellon and Co-Founder at OtterTune Observability & Statelessness with TripleLift’s Chief Architect Dan Goldin Chief Architect at TripleLift Understanding AI: PubNub CTO Stephen Blum’s Key to Faster App Development Stephen Blum PubNub Building reliable systems with DoorDash's Matt Ranney Matt Ranney DoorDash Real-Time Data Capturing: The Future of Fitness Technology Paul Lawler Head of Software at Wahoo Fitness Building Efficient App Architecture with Alloy Automation’s Gregg Mojica Gregg Mojica Co-Founder and CTO Alloy Automation Unleashing the Power of Hiring Software with Greenhouse CTO Mike Boufford Mike Boufford CTO at Greenhouse Software Decoding Data Warehousing: Insights from Ken Pickering, SVP of Engineering at Starburst Data Ken Pickering Senior Vice President of Engineering, at Starburst Data PRODUCT * CockroachDB * CockroachDB Dedicated * Pricing * Get CockroachDB * Sign In * Download RESOURCES * Guides * Video & Webinars * Podcast * Compare * Architecture Overview * FAQ * Security LEARN MORE * Docs * University * GitHub SUPPORT CHANNELS * Forum * Slack * Support Portal * Contact us COMPANY * About * Blog * Careers * Customers * Partners * Events * News * Trust * Privacy * Legal Notices Ask AI COOKIE CONSENT We use cookies to personalise content and ads, to provide social media features and to analyse our traffic. We also share information about your use of our site with our social media, advertising and analytics partners.Privacy Policy Cookies Settings Reject All Accept All Cookies PRIVACY PREFERENCE CENTER When you visit any website, it may store or retrieve information on your browser, mostly in the form of cookies. 