www.dolthub.com Open in urlscan Pro
2600:1f14:9ee:eb04:7998:d0c5:dea7:5082  Public Scan

Submitted URL: https://databaseversioncontrol.com/
Effective URL: https://www.dolthub.com/blog/2021-09-17-database-version-control/
Submission: On July 10 via automatic, source certstream-suspicious — Scanned from DE

Form analysis 1 forms found in the DOM

<form data-cy="mailing-list-form" data-hs-cf-bound="true">
  <div class="mailing-list__form"><input type="email" placeholder="your email address" data-cy="mailchimp-input" class="mailing-list__form-input"><button class="Button_button__mradH Button_white__ThttD" type="submit" data-testid="default-button"
      data-cy="mailchimp-submit-button" disabled="">Submit</button></div>
  <div class="mailing-list__message" data-testid="submitting-message"></div>
</form>

Text Content

DatabasesPricingDocumentationBlog

Discord17.4kSign In

Blog


SO YOU WANT DATABASE VERSION CONTROL?

Tim Sehn
September 16, 2021
REFERENCE
6 min read

Database Version Control is a poorly ranked Google search. For me, it starts
with a horizontally scroll-able section on code version control: irrelevant.



Then, next up is ads for Redgate, Liquibase, and verta.ai. The rest of the
non-sponsored results are a mixture of Liquibase, Redgate, and some not very
useful articles about why you need to version control your database. Liquibase
and Redgate offer schema migration tools, version control for your database
schema. Google maps this search term to a specific aspect of database version
control. In 2021, this query response to "database version control" is both too
specific and antiquated.

We have new technology! We now have true database version control, versioning
both schema and data. This blog surveys the space and attempts to answer the
query more aptly for the state of technology in 2021.


WHAT DO YOU MEAN BY DATABASE VERSION CONTROL?

What do you mean by database? Do you mean relational databases? If relational,
do you care about schema or just data (or vice versa)? For most of this blog
article, we're going to assume you mean a SQL database though towards the end, a
graph database will make an appearance.

What do you want to version control? Do you care about rollback? Do you want
time travel? Lineage, who changed what and when? Diff, do you want to be able to
quickly calculate the difference between versions of the schema or data?
Branch/merge, multiple editors making concurrent long running changes? What do
you want to do when a conflict happens? What's your conflict detection and
resolution strategy?


SCHEMA

The term of art for version controlling database schema is database migrations.
You start with a core schema, check that into version control, and then any
schema changes you do, you apply as a patch to that schema, also checking that
into version control.

At DoltHub, we run a PostgreSQL database backing the DoltHub website and we
check the schema and schema patches into Git. As part of our development
environment set up and production deployment process we reference this schema
and patches. This is common practice.

There are a bunch of tools that help with various pieces of this process if you
don't want to roll your own. For this article, we focused on some of the most
popular and interesting tools, Liquibase, Redgate Deploy, and Planetscale. We
dive a bit deeper into what they do so you can get a clearer picture of the
space.


DATA

If you want your application to be aware of data versioning, you can create some
semblance of version control with schema. The term of art for this is Slowly
Changing Dimension. This is not true version control. You concoct a schema using
columns that indicate inactive or active or "active from" to "active to" dates.
Version number or timestamp columns on rows can also be used. Then, your
application uses these columns to construct present and historical views of the
data. Your application never deletes, just changes state from active to
inactive, called soft deletes. I don't think anyone would consider this true
version control as the application or an operator can still make a mistake and
do really bad things. But, a number of databases are designed this way to give
version control like features to their applications.

If you just want rollback, backups and transaction logs are supported by every
database solution. Maybe you take a nightly backup and you keep transaction logs
since the last backup. A mistake that requires using these backups is usually
corrected in hours at best but most likely days. This AWS Elastic Load Balancer
outage is case in point. True version control means rollback on the order of
seconds.

We have SQL "as of" syntax. This is in the SQL standard as of 2011 (pun
intended) and supported by Oracle, Microsoft SQL Server, and MariaDB. With this
syntax you can configure a table to be versioned or a "temporal table" and then
query a version using as of <timestamp>. You don't get diffs and merges but you
do get instant rollbacks.

We now have the technology to fully version (ie. lineage, branch, merge, diff)
large data! Noms laid the groundwork with a core storage format called a prolly
tree, a content addressed binary tree. This data structure allows for fast diff
and merge on binary trees, the core data structure in most databases, without
compromising too much read or write performance. Recently, a couple new
databases have emerged using this or similar technology, TerminusDB for graph or
document databases and Dolt for relational databases.

We're biased but we think database version control means full Git-style
versioning, branch, merge, and diff, of schema and data.


THE OPTIONS

With that introduction, let's consider our options. If we missed you, please
email me at tim@dolthub.com. We broke the tools down into two categories:
Database Migration Tools and Version Controlled Databases.




DATABASE MIGRATION TOOLS


LIQUIBASE

Tagline Version Control for Databases Initial Release April 2012 GitHub
https://github.com/liquibase/liquibase

Liquibase formalizes your database migration language with a configuration
language for database schema and alterations. The examples lean into XML but
further digging say they support SQL, JSON, and YAML as well. Using something
other than SQL makes Liquibase cross platform. You can use the same Liquibase
XML descriptors to migrate from say, PostgreSQL to MySQL, which is nice.

The Liquibase magic is how it applies these changes to your database. Liquibase
supports rollbacks and preview.

Liquibase has been around for a decade and enjoys high relevance in this
category. But there's been a lot of innovation in this space, so keep reading
and check out what's new. If you're looking for the leading database migration
tool, Liquibase is for you.


REDGATE DEPLOY

Tagline Source Control Your Database Initial Release Earliest Mention 2014
GitHub Not Open Source

Redgate deploy is for Microsoft SQL Server only. The main selling point is its
integration with the Microsoft Developer ecosystem including Visual Studio. The
approach is the same as Liquibase, define your schema and alterations and apply
those in a principled way.

The one feature that seems cool from a versioning perspective is that Redgate
deploy allows for configurable version control of "static table data". From the
documentation, this feature is "intended only where the table has fewer than a
thousand rows". If you want to version more than 1000 rows, keep reading.


PLANETSCALE

Tagline The database for developers Initial Release March 2016 GitHub
https://github.com/planetscale

Planetscale is awesome and takes database migrations to the next level. Imagine
if someone ran the database for you and controlled how schema patches were
applied. What would be possible?

Planetscale is run by the good folks who wrote Vitess. Vitess is an open source
"database clustering system for horizontal scaling of MySQL". Dolt is a heavy
user of Vitess' MySQL dialect parsing code. We wouldn't be here without them.

So, on Planetscale, you get all the schema branch/merge functionality of
Liquibase or Redgate. Additionally you also get a world class, modern deployment
environment for your changes. On the downside, if this is a downside for you,
Planetscale runs your database for you. Planetscale is MySQL only so if you have
some aversion to that database format, that's also a downside.

Note, for the release date, I went with the first release of Planetscale's fork
of Vitess. I'm not exactly sure when Planetscale in its current form launched
and my cursory research didn't turn it up.


VERSION CONTROLLED DATABASES


TERMINUS DB

Tagline Making Data Collaboration Easy Initial Release October 2019 GitHub
https://github.com/terminusdb/terminusdb

TerminusDB has full schema and data versioning capability but offers a graph
database interface using a custom query language called Web Object Query
Language (WOQL). WOQL is schema optional. TerminusDB just released the option to
query JSON directly, similar to MongoDB, giving users a more document database
style interface.

The versioning syntax is exposed via TerminusDB Console or a command line
interface. The versioning metaphors are similar to Git. You branch, push, and
pull. See their how to documentation for more information.

TerminusDB is new but we like what we see. The company is very responsive, has
an active Discord, and is well funded. If you think your database version
control makes more sense in graph or document form, check them out.


DOLT

Tagline It's Git for Data Initial Release August 2019 GitHub
https://github.com/dolthub/dolt

Dolt takes “Database Version Control” rather literally. Dolt implements the Git
command line and associated operations on table rows instead of files. Data and
schema are modified in the working set using SQL. When you want to permanently
store a version of the working set, you make a commit. In SQL, dolt implements
Git read operations (ie. diff, log) as system tables and write operations (ie.
commit, merge) as functions. Dolt produces cell-wise diffs and merges, making
data debugging between versions tractable. That makes Dolt the only SQL database
on the market that has branches and merges. You can run Dolt offline, treating
data and schema like source code. Or you can run Dolt online, like you would
PostgreSQL or MySQL.

We are biased but we think if you want Database Version Control for a SQL
database, there is only one product that fits that label and that's Dolt.

SHARE


Blog
Dolt is open source
JOIN THE DATA EVOLUTION


GET STARTED WITH DOLT

Install DoltCreate an account

Or join our mailing list to get product updates.

Submit

 * Pricing
 * Documentation
 * Blog
 * Team
 * Security
 * Privacy
 * Terms


© 2024 DoltHub, Inc. All rights reserved.