AMENDING SCHEMAS

info
This documentation only applies to Snowplow BDP Enterprise and Snowplow
Community Edition. See the feature comparison page for more information about
the different Snowplow offerings.

Sometimes, small mistakes creep into your schemas. For example, you might mark
an optional field as required, or make a typo in the name of one of the fields.
In these cases, you will want to update the schema to correct the mistake.


TREAT SCHEMAS AS IMMUTABLE

It might be tempting to somehow “overwrite” the schema without updating the
version. But this can bring several problems:

 * Events that were previously valid could become invalid against the new
   changes.
 * Your warehouse loader, which updates the table according to the schema, could
   get stuck if it’s not possible to cast the data in the existing table column
   to the new definition (e.g. if you change a field type from a string to a
   number).
 * Similarly, data models or other applications consuming the data downstream
   might not be able to deal with the changes.

The best approach is to create a new schema version and update your tracking
code to use it. However, there are two alternatives for situations where this is
not practical.


PATCHING THE SCHEMA

If you are working on a new schema version in a development environment, there
is usually little risk in overwriting the schema instead of creating a new
version. That’s because the new schema version has not made it to production, so
changing it will not corrupt any production data. Moreover, if you overwrite all
incorrect schema versions, you will be left with a neat and tidy schema version
history.

Before:

1-0-0, 1-0-1, 1-0-2 (incorrect)

After:

1-0-0, 1-0-1, 1-0-2 (corrected)

We call this approach “patching”. To patch the schema, i.e. apply changes to it
without updating the version:

 * If you are using Snowplow BDP, select the “Patch” option in the UI when
   saving the schema
 * If you are using Snowplow Community Edition, do not increment the schema
   version when uploading it with igluctl

danger

Never patch schemas in a production environment. This can break your loading,
especially if your patch contains breaking changes (see above).

Also, never patch a schema version that exists in a production environment, even
if you are doing the patching in a development environment. This will lead to
problems later when you try to promote that schema to production.

For Snowplow BDP customers, patching is disabled for production pipelines.
Community Edition users have to explicitly enable patching (if desired) in the
Iglu Server configuration (patchesAllowed) at their own risk.


MARKING THE SCHEMA AS SUPERSEDED

If your events are failing in production because of an incorrect schema, you
might not be able to instantly update the tracking code to use a new schema
version. This is a common situation for mobile tracking, for example. You can
resolve this by marking the old schema version as superseded by the new schema
version.

note

You need to be on Enrich 3.8.0+ and Iglu Server 0.11.0+ to use this feature.
Additionally, if you are using Snowplow Mini or Snowplow Micro, you will need
version 0.17.0+ or 1.7.1+ respectively.

Before:

1-0-0, 1-0-1, 1-0-2 (incorrect)

After:

1-0-0, 1-0-1, 1-0-2 (incorrect), 1-0-3 (corrected)
1-0-3 supersedes 1-0-2

Here’s how this works, at a glance:

 * Suppose schema 1-0-2 is wrong.
 * Draft a new schema version correcting the issue.
 * In the new schema, add the following field at the root: "$supersedes":
   ["1-0-2"].
 * Set the version of the new schema as usual, i.e. 1-0-3 if there are no
   breaking changes or 2-0-0 if there are.
 * Add the new schema to your production environment.
 * Events or entities that use schema 1-0-2 will now be automatically updated
   (in the Enrich application) to use version 1-0-3, and will be validated
   against that version. (A special entity will be added to these events to
   record this fact.)
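
The drafting steps above can be sketched in Python. This is only an illustration of how the new schema document differs from the old one: `draft_superseding_schema` is a hypothetical helper, not part of any Snowplow tool, and the sample schema is an assumption made for the sketch.

```python
import copy
import json


def draft_superseding_schema(old_schema, new_version, extra_properties):
    """Return a copy of old_schema, bumped to new_version, that supersedes it.

    Hypothetical helper for illustration; it only edits the JSON document
    and does not talk to Iglu Server.
    """
    new_schema = copy.deepcopy(old_schema)
    old_version = old_schema["self"]["version"]
    new_schema["self"]["version"] = new_version
    # "$supersedes" sits at the root and is always an array of strings.
    new_schema["$supersedes"] = [old_version]
    # Apply the correction (here: add the properties that were missing).
    new_schema.setdefault("properties", {}).update(extra_properties)
    return new_schema


# Assumed sample schema, trimmed to the parts the helper touches.
old = {
    "self": {
        "vendor": "com.acme",
        "name": "geolocation",
        "format": "jsonschema",
        "version": "1-0-2",
    },
    "type": "object",
    "properties": {"latitude": {"type": "number"}},
    "additionalProperties": False,
}

new = draft_superseding_schema(old, "1-0-3", {"altitude": {"type": "number"}})
print(json.dumps(new, indent=4))
```

The old document is left untouched, which matches the idea that version 1-0-2 stays in the registry as it was and only the new version declares the relationship.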


EXAMPLE

Let’s say we have a mobile application. We are sending certain events from this
application, and these events contain entities with the following schema:

Geolocation 1-0-2

{
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "description": "Schema for client geolocation contexts",
    "self": {
        "vendor": "com.acme",
        "name": "geolocation",
        "format": "jsonschema",
        "version": "1-0-2"
    },
    "type": "object",
    "properties": {
        "latitude": {
            "type": "number"
        },
        "longitude": {
            "type": "number"
        }
    },
    "additionalProperties": false
}


Later, we realize that when implementing tracking, we have mistakenly included
an altitude field in the entity objects:

Wrong tracking code (iOS)

let event = ScreenView(name: "Screen")
event.entities.add(
    SelfDescribingJson(schema: "iglu:com.acme/geolocation/jsonschema/1-0-2",
        andDictionary: [
            "latitude": 38.7223,
            "longitude": 9.1393,
            "altitude": 20 // extra field not defined in the schema
        ])!)
tracker.track(event)


Since additionalProperties is set to false, all events with the altitude field
end up as failed events.
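
To see why the extra field causes a failure, here is a deliberately simplified Python sketch of the `additionalProperties` check. It is not how Enrich validates events (a real JSON Schema validator checks much more); `unexpected_fields` is a hypothetical name used only for this illustration.

```python
def unexpected_fields(schema, data):
    """List the keys in data that the schema does not allow.

    Simplified: only a literal `false` for "additionalProperties" is
    treated as forbidding extra keys; anything else is treated as allowing them.
    """
    if schema.get("additionalProperties", True) is not False:
        return []
    allowed = set(schema.get("properties", {}))
    return sorted(key for key in data if key not in allowed)


geolocation_1_0_2 = {
    "type": "object",
    "properties": {
        "latitude": {"type": "number"},
        "longitude": {"type": "number"},
    },
    "additionalProperties": False,
}
entity = {"latitude": 38.7223, "longitude": 9.1393, "altitude": 20}

print(unexpected_fields(geolocation_1_0_2, entity))  # ['altitude']
```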

We can create a new schema with version 1-0-3 that contains the altitude field
and then use this schema in the next version of the application. This would make
the events valid. However, users will not all update to the new version of the
application at once. Events from older versions will keep arriving, so there
will still be failed events until every user has upgraded.

To solve this problem, we simply add the $supersedes definition to the new
schema.

Geolocation 1-0-3 with $supersedes

{
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "$supersedes": ["1-0-2"],
    "description": "Schema for client geolocation contexts",
    "self": {
        "vendor": "com.acme",
        "name": "geolocation",
        "format": "jsonschema",
        "version": "1-0-3"
    },
    "type": "object",
    "properties": {
        "latitude": {
            "type": "number"
        },
        "longitude": {
            "type": "number"
        },
        "altitude": {
            "type": "number"
        }
    },
    "additionalProperties": false
}


Now, when we receive events from the mobile application that use schema 1-0-2,
these events will be updated to use schema 1-0-3 and will be validated against
that schema. Therefore, these events will be valid.

To record this fact, an extra entity will be added to all such events:

{
    "schema": "iglu:com.snowplowanalytics.iglu/validation_info/jsonschema/1-0-0",
    "data": {
        "originalSchema": "iglu:com.acme/geolocation/jsonschema/1-0-2",
        "validatedWith": "1-0-3"
    }
}
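
As a rough mental model of this step, the following Python sketch rewrites a schema URI and builds the accompanying entity. It is a conceptual illustration, not Enrich's actual code: `apply_supersession` and the `superseded_by` mapping are hypothetical stand-ins for what the pipeline learns from Iglu Server.

```python
VALIDATION_INFO = "iglu:com.snowplowanalytics.iglu/validation_info/jsonschema/1-0-0"


def apply_supersession(schema_uri, superseded_by):
    """Return (URI to validate against, validation_info entity or None).

    superseded_by maps an old version to the version that replaces it,
    for example {"1-0-2": "1-0-3"} (an assumed input for this sketch).
    """
    prefix, version = schema_uri.rsplit("/", 1)
    new_version = superseded_by.get(version)
    if new_version is None:
        # Not superseded: validate as usual, no extra entity.
        return schema_uri, None
    info = {
        "schema": VALIDATION_INFO,
        "data": {"originalSchema": schema_uri, "validatedWith": new_version},
    }
    return f"{prefix}/{new_version}", info


uri, info = apply_supersession(
    "iglu:com.acme/geolocation/jsonschema/1-0-2", {"1-0-2": "1-0-3"}
)
print(uri)   # iglu:com.acme/geolocation/jsonschema/1-0-3
print(info["data"]["originalSchema"])  # iglu:com.acme/geolocation/jsonschema/1-0-2
```

Downstream, filtering on the presence of the `validation_info` entity is one way to find events that were sent with a superseded schema version.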


Finally, if we browse schema version 1-0-2, we will see that Iglu Server
automatically keeps track of which schema supersedes which. Specifically, it
will now contain a $supersededBy definition:

Geolocation 1-0-2 with $supersededBy

{
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "$supersededBy": "1-0-3",
    "description": "Schema for client geolocation contexts",
    "self": {
        "vendor": "com.acme",
        "name": "geolocation",
        "format": "jsonschema",
        "version": "1-0-2"
    },
    "type": "object",
    "properties": {
        "latitude": {
            "type": "number"
        },
        "longitude": {
            "type": "number"
        }
    },
    "additionalProperties": false
}



USAGE

The $supersedes field states that the schema version defined in the self part
supersedes the schema versions listed in the $supersedes field (one or more).
Its value must be an array of strings (even if it only includes one item). For
example:

...
"$supersedes": ["1-0-2", "1-0-3"],
...
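
A pre-upload check for this format could look like the Python sketch below. The function name and the version pattern are assumptions made for illustration (the pattern approximates MODEL-REVISION-ADDITION with a model starting at 1); Iglu Server performs its own validation.

```python
import re

# Assumed approximation of the MODEL-REVISION-ADDITION version format.
SCHEMA_VER = re.compile(r"^[1-9][0-9]*-(0|[1-9][0-9]*)-(0|[1-9][0-9]*)$")


def supersedes_is_well_formed(value):
    """True if value is a non-empty list of version strings such as "1-0-2"."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(v, str) and SCHEMA_VER.match(v) for v in value)
    )


print(supersedes_is_well_formed(["1-0-2", "1-0-3"]))  # True
print(supersedes_is_well_formed("1-0-2"))             # False: must be an array
```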


Patching and superseding

Once you’ve defined the $supersedes field for a schema version, you can’t update
it — even in the development environment where patching is allowed. However, you
can change which schema version supersedes which by creating new schema
versions.

For example, if version 1-0-2 is defined to supersede version 1-0-1, and you
create version 1-0-3 which also supersedes 1-0-1, then 1-0-1 will be superseded
by the newest version, i.e. 1-0-3. See diagrams below for more information on
how this is determined.


RULES

A SCHEMA VERSION CAN ONLY SUPERSEDE PREVIOUS VERSIONS

For example, 1-0-2 can supersede 1-0-1, but can’t supersede 1-0-3, 1-1-0, or
2-0-0. Iglu Server will reject a schema with a definition that breaks this rule.

✅ OK: 1-0-2 supersedes 1-0-1
❌ Invalid: 1-0-2 supersedes 2-0-0
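
This rule can be checked locally before uploading a schema. The Python sketch below assumes that "previous" means a lower version when the three numbers are compared in order; that interpretation and the function names are assumptions, and Iglu Server remains the authority on what it accepts.

```python
def parse_version(version):
    """Split "1-0-2" into the tuple (1, 0, 2)."""
    model, revision, addition = (int(part) for part in version.split("-"))
    return (model, revision, addition)


def can_supersede(new_version, old_version):
    """True if old_version comes strictly before new_version."""
    return parse_version(old_version) < parse_version(new_version)


print(can_supersede("1-0-2", "1-0-1"))  # True
print(can_supersede("1-0-2", "2-0-0"))  # False
```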

A SCHEMA VERSION CAN SUPERSEDE MULTIPLE PREVIOUS VERSIONS AT ONCE

Events referencing any of those previous versions will be treated as explained
above.

✅ OK: 1-0-3 supersedes 1-0-1 and 1-0-2

AT ANY GIVEN MOMENT, A SCHEMA VERSION CAN ONLY BE SUPERSEDED BY A SINGLE SCHEMA
VERSION

Iglu Server automatically upholds this rule.

For example, if you specify that 1-0-3 supersedes 1-0-2 and (later) that 1-0-4
also supersedes 1-0-2, the latest schema — 1-0-4 — will automatically become the
one that supersedes 1-0-2.

Specified: 1-0-3 supersedes 1-0-2; 1-0-4 supersedes 1-0-2
Becomes: 1-0-4 supersedes 1-0-2

The same happens if you specify “chains”, e.g. 1-0-3 supersedes 1-0-2 and 1-0-4
supersedes 1-0-3. This will be automatically updated so that 1-0-4 supersedes
1-0-2 and 1-0-3.

Specified: 1-0-3 supersedes 1-0-2; 1-0-4 supersedes 1-0-3
Becomes: 1-0-4 supersedes 1-0-2; 1-0-4 supersedes 1-0-3
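
The two cases above (several versions superseding the same one, and chains) can be modeled with a small Python sketch. This is an assumed reconstruction of the described behaviour, not Iglu Server's implementation; `effective_superseded_by` and its input shape are hypothetical.

```python
def effective_superseded_by(declared):
    """Map each superseded version to the single version that replaces it.

    declared maps a schema version to the list in its "$supersedes" field.
    Assumes every entry only supersedes earlier versions, so chains always
    move upwards and the loop below terminates.
    """
    def key(version):
        return tuple(int(part) for part in version.split("-"))

    # Several versions superseding the same one: the newest wins.
    direct = {}
    for newer, olds in declared.items():
        for old in olds:
            if old not in direct or key(newer) > key(direct[old]):
                direct[old] = newer

    # Chains: follow the links until reaching a version that is not superseded.
    resolved = {}
    for old, target in direct.items():
        while target in direct:
            target = direct[target]
        resolved[old] = target
    return resolved


print(effective_superseded_by({"1-0-3": ["1-0-2"], "1-0-4": ["1-0-2"]}))
# {'1-0-2': '1-0-4'}
print(effective_superseded_by({"1-0-3": ["1-0-2"], "1-0-4": ["1-0-3"]}))
# {'1-0-2': '1-0-4', '1-0-3': '1-0-4'}
```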

Last updated on Nov 27, 2023
Copyright © 2023 Snowplow Analytics Ltd. Built with Docusaurus.