QUICK START GUIDE

info
This documentation only applies to Snowplow Open Source. See the feature
comparison page for more information about the different Snowplow offerings.

This guide will take you through how to spin up an open source pipeline using
the Snowplow Terraform modules. (Not familiar with Terraform? Take a look at
Infrastructure as code with Terraform.)

Skip installation
Would you like to explore Snowplow for free with zero setup? Check out our free
trial.



PREREQUISITES

Install Terraform 1.0.0 or higher. Follow the instructions to make sure the
terraform binary is available on your PATH. You can also use tfenv to manage
your Terraform installation.
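
For example, a quick check that your installation is ready (the version shown is
only an illustration; any 1.x release at or above 1.0.0 works):

terraform version     # should report Terraform v1.0.0 or higher
# If you manage versions with tfenv instead:
tfenv install latest
tfenv use latest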

Clone the repository at https://github.com/snowplow/quickstart-examples to your
machine:

git clone https://github.com/snowplow/quickstart-examples.git


 * AWS
 * GCP
 * Azure 🧪

Install AWS CLI version 2.

Configure the CLI against a role that has the AdministratorAccess policy
attached.

caution

AdministratorAccess allows all actions on all AWS services and shouldn't be used
in production.

Details on how to configure the AWS Terraform Provider can be found on the
registry.
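
As a quick sanity check before running Terraform, you can confirm which identity
the CLI is configured with (a minimal sketch):

aws configure                 # enter the access key, secret key and default region for your admin role
aws sts get-caller-identity   # prints the account and role ARN that Terraform will use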

Install Google Cloud SDK.

Make sure the following APIs are active in your GCP account (this list might not
be exhaustive and is subject to change as GCP APIs evolve):

 * Compute Engine API
 * Cloud Resource Manager API
 * Identity and Access Management (IAM) API
 * Cloud Pub/Sub API
 * Cloud SQL Admin API

Configure a Google Cloud service account. See details on using the service
account with the Cloud SDK. You will need to:

 * Navigate to your service account on Google Cloud Console
 * Create a new JSON Key and store it locally
 * Create the environment variable by running export
   GOOGLE_APPLICATION_CREDENTIALS="KEY PATH" in your terminal
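
For illustration, here is a hedged sketch of the corresponding gcloud commands
(the project ID and key path are placeholders; API service names may change as
GCP evolves):

gcloud auth login
gcloud config set project <your-project-id>

# Enable the APIs listed above
gcloud services enable \
  compute.googleapis.com \
  cloudresourcemanager.googleapis.com \
  iam.googleapis.com \
  pubsub.googleapis.com \
  sqladmin.googleapis.com

# Point the Cloud SDK and Terraform at the service account key you downloaded
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json"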

Early release

Currently, Azure support is in its pilot phase and we encourage you to share
your thoughts and feedback with us on Discourse to help shape future
development.

If you are interested in deploying BDP Enterprise (a private SaaS with extra
features and SLAs) in your Azure account, please join our waiting list.

Alternatively, if you prefer loading data into Snowflake or Databricks hosted on
Azure without managing any non-Azure infrastructure, consider BDP Cloud - our
SaaS offering that fully supports these destinations.

Install the Azure CLI.

If your organisation has an existing Azure account, make sure your user has been
granted the following roles on a valid Azure Subscription:

 * Contributor
 * User Access Administrator
 * Storage Blob Data Contributor

caution

User Access Administrator allows the user to modify, create and delete
permissions across Azure resources, and shouldn’t be used in production.
Instead, you can use a custom role with the following permissions:

 * Microsoft.Authorization/roleAssignments/write to deploy the stacks below
 * Microsoft.Authorization/roleAssignments/delete to destroy them

If you don’t have an Azure account yet, you can get started with a new
pay-as-you-go account.

Details on how to configure the Azure Terraform Provider can be found on the
registry.
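
Before running Terraform, sign in with the CLI and select the subscription you
intend to deploy into (a minimal sketch; the subscription ID is a placeholder):

az login
az account set --subscription "<subscription-id>"
az account show --output table   # confirm the active subscription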


STORAGE OPTIONS

The sections below will guide you through setting up your destination to receive
Snowplow data, but for now here is an overview.

Warehouse            | AWS | GCP | Azure
Postgres             | ✅  | ✅  | ❌
Snowflake            | ✅  | ❌  | ✅
Databricks           | ✅  | ❌  | ✅
Redshift             | ✅  | —   | —
BigQuery             | —   | ✅  | —
Synapse Analytics 🧪 | —   | —   | ✅

 * AWS
 * GCP
 * Azure 🧪

There are four main storage options for you to select: Postgres, Redshift,
Snowflake and Databricks. Additionally, there is an S3 option, which is
primarily used to archive enriched (and/or raw) events and to store failed
events.

We recommend loading data into a single destination only, but nothing prevents
you from loading into multiple destinations with the same pipeline (e.g. for
testing purposes).

There are two alternative storage options for you to select: Postgres and
BigQuery.

We recommend loading data into a single destination only, but nothing prevents
you from loading into multiple destinations with the same pipeline (e.g. for
testing purposes).

There are two storage options for you to select: Snowflake and data lake (ADLS).
The latter option enables querying data from Databricks and Synapse Analytics.

We recommend loading data into a single destination only (Snowflake or data
lake), but nothing prevents you from loading into both with the same pipeline
(e.g. for testing purposes).


SET UP A VPC TO DEPLOY INTO

 * AWS
 * GCP
 * Azure 🧪

AWS provides a default VPC in every region for your sub-account. Take a note of
the identifiers of this VPC and the associated subnets for later parts of the
deployment.
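
If you prefer the CLI to the console, here is a hedged sketch for looking these
up (the VPC ID below is a placeholder):

aws ec2 describe-vpcs --query 'Vpcs[?IsDefault].VpcId' --output text
aws ec2 describe-subnets --filters Name=vpc-id,Values=<default-vpc-id> \
  --query 'Subnets[].SubnetId' --output text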

GCP provides a default VPC for your project. In the steps below, it is
sufficient to set network = default and leave subnetworks empty, and Terraform
will discover the correct network to deploy into.

Azure does not provide a default VPC or resource group, so we have added a
helper module to create a working network we can deploy into.

To use our out-of-the-box network, you will need to navigate to the
terraform/azure/base directory in the quickstart-examples repository and update
the input variables in terraform.tfvars.

Once that’s done, you can use Terraform to create your base network.

terraform init
terraform plan
terraform apply


After the deployment completes, you should get an output like this:

...
vnet_subnets_name_id = {
  "collector-agw1" = "/subscriptions/<...>/resourceGroups/<...>/providers/Microsoft.Network/virtualNetworks/<...>/subnets/collector-agw1"
  "iglu-agw1" = "/subscriptions/<...>/resourceGroups/<...>/providers/Microsoft.Network/virtualNetworks/<...>/subnets/iglu-agw1"
  "iglu1" = "/subscriptions/<...>/resourceGroups/<...>/providers/Microsoft.Network/virtualNetworks/<...>/subnets/iglu1"
  "pipeline1" = "/subscriptions/<...>/resourceGroups/<...>/providers/Microsoft.Network/virtualNetworks/<...>/subnets/pipeline1"
}


These are the subnet identifiers, e.g.
"/subscriptions/<...>/resourceGroups/<...>/providers/Microsoft.Network/virtualNetworks/<...>/subnets/pipeline1"
is the identifier of the pipeline1 subnet. Take note of these four identifiers,
as you will need them in the following steps.
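
If you need these identifiers again later, you can re-print them at any time
from the terraform/azure/base directory:

terraform output vnet_subnets_name_id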


SET UP IGLU SERVER

The first step is to set up the Iglu Server stack required by the rest of your
pipeline.

This will allow you to create and evolve your own custom events and entities.
Iglu Server stores the schemas for your events and entities and fetches them as
your events are processed by the pipeline.


STEP 1: UPDATE THE IGLU_SERVER INPUT VARIABLES

Once you have cloned the quickstart-examples repository, you will need to
navigate to the iglu_server directory to update the input variables in
terraform.tfvars.

 * AWS
 * GCP
 * Azure 🧪

cd quickstart-examples/terraform/aws/iglu_server/default
nano terraform.tfvars # or other text editor of your choosing


cd quickstart-examples/terraform/gcp/iglu_server/default
nano terraform.tfvars # or other text editor of your choosing


cd quickstart-examples/terraform/azure/iglu_server
nano terraform.tfvars # or other text editor of your choosing


If you used our base module, you will need to set these variables as follows:

 * resource_group_name: use the same value as you supplied in base
 * subnet_id_lb: use the identifier of the iglu-agw1 subnet from base
 * subnet_id_servers: use the identifier of the iglu1 subnet from base

To update your input variables, you’ll need to know a few things:

 * Your IP Address. Help.
 * A UUIDv4 to be used as the Iglu Server’s API Key. Help.
 * How to generate an SSH Key.

tip

On most systems, you can generate an SSH Key with: ssh-keygen -t rsa -b 4096.
This will output where your public key is stored, for example: ~/.ssh/id_rsa.pub.
You can get the value with cat ~/.ssh/id_rsa.pub.
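
Similarly, a quick way to obtain the other two values (a hedged sketch; any
"what is my IP" service or UUID generator will do):

curl https://checkip.amazonaws.com   # your public IP address
uuidgen                              # a random UUIDv4 to use as the Iglu Server API key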

Telemetry notice

By default, Snowplow collects telemetry data for each of the Quick Start
Terraform modules. Telemetry allows us to understand how our applications are
used and helps us build a better product for our users (including you!).

This data is anonymous and minimal, and since our code is open source, you can
inspect what’s collected.

If you wish to help us further, you can optionally provide your email (or just a
UUID) in the user_provided_id variable.

If you wish to disable telemetry, you can do so by setting telemetry_enabled to
false.



See our telemetry principles for more information.


STEP 2: RUN THE IGLU_SERVER TERRAFORM SCRIPT

You can now use Terraform to create your Iglu Server stack.

 * AWS
 * GCP
 * Azure 🧪

You will be asked to select a region; you can find more information about
available AWS regions here.

terraform init
terraform plan
terraform apply


The deployment will take roughly 15 minutes.

terraform init
terraform plan
terraform apply


terraform init
terraform plan
terraform apply


Once the deployment is done, it will output iglu_server_dns_name. Make a note of
this, as you'll need it when setting up your pipeline. If you have attached a
custom SSL certificate and set up your own DNS records, then you don't need this
value.
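
As a quick smoke test, you can hit the Iglu Server health endpoint once the
stack is up (a sketch assuming the standard /api/meta/health route; substitute
your own DNS name):

curl http://<iglu_server_dns_name>/api/meta/health
# expected response: OK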


PREPARE THE DESTINATION

Depending on the destination(s) you've chosen, you might need to perform a few
extra steps to prepare for loading data there.

tip

Feel free to go ahead with these while your Iglu Server stack is deploying.

 * Postgres
 * Redshift
 * BigQuery
 * Snowflake
 * Databricks
 * Synapse Analytics 🧪

No extra steps needed — the necessary resources like a PostgreSQL instance,
database, table and user will be created by the Terraform modules.

Assuming you already have an active Redshift cluster, execute the following SQL
(replace the ${...} variables with your desired values). You will need the
permissions to create databases, users and schemas in the cluster.

-- 1. (Optional) Create a new database - you can also use an existing one if you prefer
CREATE DATABASE ${redshift_database};
-- Log back into Redshift with the new database:
-- psql --host <host> --port <port> --username <admin> --dbname ${redshift_database}

-- 2. Create a schema within the database
CREATE SCHEMA IF NOT EXISTS ${redshift_schema};

-- 3. Create the loader user
CREATE USER ${redshift_loader_user} WITH PASSWORD '${redshift_password}';

-- 4. Ensure the schema is owned by the loader user
ALTER SCHEMA ${redshift_schema} OWNER TO ${redshift_loader_user};


note

You will need to ensure that the loader can access the Redshift cluster over
whatever port is configured for the cluster (usually 5439).
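
For example, you can verify connectivity with the loader credentials before
deploying the pipeline (a sketch assuming psql is installed locally and the
default port; it will prompt for the password):

psql --host <redshift-cluster-endpoint> --port 5439 \
     --username ${redshift_loader_user} --dbname ${redshift_database} -c 'SELECT 1;'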

No extra steps needed.

Execute the following SQL (replace the ${...} variables with your desired
values). You will need access to both the SYSADMIN and SECURITYADMIN roles to
run these statements:

-- 1. (Optional) Create a new database - you can also use an existing one if you prefer
CREATE DATABASE IF NOT EXISTS ${snowflake_database};

-- 2. Create a schema within the database
CREATE SCHEMA IF NOT EXISTS ${snowflake_database}.${snowflake_schema};

-- 3. Create a warehouse which will be used to load data
CREATE WAREHOUSE IF NOT EXISTS ${snowflake_warehouse} WITH WAREHOUSE_SIZE = 'XSMALL' WAREHOUSE_TYPE = 'STANDARD' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

-- 4. Create a role that will be used for loading data
CREATE ROLE IF NOT EXISTS ${snowflake_loader_role};
GRANT USAGE, OPERATE ON WAREHOUSE ${snowflake_warehouse} TO ROLE ${snowflake_loader_role};
GRANT USAGE ON DATABASE ${snowflake_database} TO ROLE ${snowflake_loader_role};
GRANT ALL ON SCHEMA ${snowflake_database}.${snowflake_schema} TO ROLE ${snowflake_loader_role};

-- 5. Create a user that can be used for loading data
CREATE USER IF NOT EXISTS ${snowflake_loader_user} PASSWORD='${snowflake_password}'
  MUST_CHANGE_PASSWORD = FALSE
  DEFAULT_ROLE = ${snowflake_loader_role}
  EMAIL = 'loader@acme.com';
GRANT ROLE ${snowflake_loader_role} TO USER ${snowflake_loader_user};

-- 6. (Optional) Grant this role to SYSADMIN to make debugging easier from admin users
GRANT ROLE ${snowflake_loader_role} TO ROLE SYSADMIN;


Azure-specific instructions

On Azure, we currently support loading data into Databricks via a data lake. You
can still follow Step 1 below to create the cluster, however you should skip the
rest of these steps. Instead, proceed with deploying the pipeline — we will
return to configuring Databricks at the end of this guide.

STEP 1: CREATE A CLUSTER

note

The cluster spec described below should be sufficient for a monthly event volume
of up to 10 million events. If your event volume is greater, then you may need
to increase the size of the cluster.

Create a new cluster, following the Databricks documentation, with the following
settings:

 * the runtime version must be 13.0 or greater (but not 13.1 or 13.2)
 * single node cluster
 * "smallest" size node type
 * auto-terminate after 30 minutes.

Advanced cluster configuration (optional)

You might want to configure cluster-level permissions, by following the
Databricks instructions on cluster access control. Snowplow's RDB Loader must be
able to restart the cluster if it is terminated.

If you use AWS Glue Data Catalog as your metastore, follow these Databricks
instructions for the relevant spark configurations. You will need to set
spark.databricks.hive.metastore.glueCatalog.enabled true and
spark.hadoop.hive.metastore.glue.catalogid <aws-account-id-for-glue-catalog> in
the spark configuration.

You can configure your cluster with an instance profile if it needs extra
permissions to access resources, for example if the S3 bucket holding the delta
lake is in a different AWS account.

STEP 2: NOTE THE JDBC CONNECTION DETAILS

 1. In the Databricks UI, click on "Compute" in the sidebar
 2. Click on the RDB Loader cluster and navigate to "Advanced options"
 3. Click on the "JDBC/ODBC" tab
 4. Note down the JDBC connection URL, specifically the host, the port and the
    http_path

STEP 3: CREATE AN ACCESS TOKEN FOR THE LOADER

caution

The access token must not have a specified lifetime. Otherwise, the loader will
stop working when the token expires.

 1. Navigate to the user settings in your Databricks workspace
    * For Databricks hosted on AWS, the "Settings" link is in the lower left
      corner in the side panel
    * For Databricks hosted on Azure, "User Settings" is an option in the
      drop-down menu in the top right corner.
 2. Go to the "Access Tokens" tab
 3. Click the "Generate New Token" button
 4. Optionally enter a description (comment). Leave the expiration period empty
 5. Click the "Generate" button
 6. Copy the generated token and store it in a secure location

STEP 4: CREATE THE CATALOG AND THE SCHEMA

Execute the following SQL (replace the ${...} variables with your desired
values). The default catalog is called hive_metastore and is what you should use
in the loader unless you specify your own.

-- USE CATALOG ${catalog_name}; -- Uncomment if you want to use a custom Unity catalog and replace with your own value.

CREATE SCHEMA IF NOT EXISTS ${schema_name}
-- LOCATION s3://<custom_location>/ -- Uncomment if you want tables created by Snowplow to be located in a non-default bucket or directory.
;


Advanced security configuration (optional)

The security principal used by the loader needs a Databricks SQL access
permission, which can be enabled in the Admin Console.

Databricks does not have table access enabled by default. Enable it with an
initialization script:

dbutils.fs.put("dbfs:/databricks/init/set_spark_params.sh","""
|#!/bin/bash
|
|cat << 'EOF' > /databricks/driver/conf/00-custom-table-access.conf
|[driver] {
|  "spark.databricks.acl.sqlOnly" = "true"
|}
|EOF
""".stripMargin, true)


After adding the script, you need to restart the cluster. Verify that changes
took effect by evaluating spark.conf.get("spark.databricks.acl.sqlOnly"), which
should return true.

Follow the rest of the quick start guide so that the loader creates the required
tables. Afterwards, reconfigure the permissions:

-- Clean initial permissions from tables
REVOKE ALL PRIVILEGES ON TABLE <catalog>.<schema>.events FROM `<principal>`;
REVOKE ALL PRIVILEGES ON TABLE <catalog>.<schema>.manifest FROM `<principal>`;
REVOKE ALL PRIVILEGES ON TABLE <catalog>.<schema>.rdb_folder_monitoring FROM `<principal>`;

-- Clean initial permissions from schema
REVOKE ALL PRIVILEGES ON SCHEMA <catalog>.<schema> FROM `<principal>`;

-- Loader will run CREATE TABLE IF NOT EXISTS statements, so USAGE and CREATE are both required.
GRANT USAGE, CREATE ON SCHEMA <catalog>.<schema> TO `<principal>`;

-- COPY TO statement requires ANY FILE and MODIFY for the receiving table
GRANT SELECT ON ANY FILE TO `<principal>`;
GRANT MODIFY  ON TABLE  <catalog>.<schema>.events TO `<principal>`;

-- These tables are used to store internal loader state
GRANT MODIFY, SELECT ON TABLE  <catalog>.<schema>.manifest TO `<principal>`;
GRANT MODIFY, SELECT ON TABLE  <catalog>.<schema>.rdb_folder_monitoring TO `<principal>`;


No extra steps needed. Proceed with deploying the pipeline — we will return to
configuring Synapse at the end of this guide.


SET UP THE PIPELINE

In this section, you will update the input variables for the Terraform module,
and then run the Terraform script to set up your pipeline. At the end you will
have a working Snowplow pipeline ready to receive web, mobile or server-side
data.


STEP 1: UPDATE THE PIPELINE INPUT VARIABLES

Navigate to the pipeline directory in the quickstart-examples repository and
update the input variables in terraform.tfvars.

 * AWS
 * GCP
 * Azure 🧪

cd quickstart-examples/terraform/aws/pipeline/default
nano terraform.tfvars # or other text editor of your choosing


cd quickstart-examples/terraform/gcp/pipeline/default
nano terraform.tfvars # or other text editor of your choosing


cd quickstart-examples/terraform/azure/pipeline
nano terraform.tfvars # or other text editor of your choosing


If you used our base module, you will need to set these variables as follows:

 * resource_group_name: use the same value as you supplied in base
 * subnet_id_lb: use the identifier of the collector-agw1 subnet from base
 * subnet_id_servers: use the identifier of the pipeline1 subnet from base

Confluent Cloud

If you already use Confluent Cloud, you can opt to create the necessary message
topics there, instead of relying on Azure Event Hubs. This way, you will also
benefit from features like Stream Lineage.

To do this, you will need to:

 * Set the stream_type variable to confluent_cloud
 * Create 3 or 4 topics manually in your Confluent cluster and add their names
   in the respective variables (confluent_cloud_..._topic_name)
 * Create an API key and fill the relevant fields (confluent_cloud_api_key,
   confluent_cloud_api_secret)
 * Add a bootstrap server in confluent_cloud_bootstrap_server

Topic partitions

If you need to stay within the free tier for your Confluent cluster, make sure
to select no more than 2 partitions for each topic.
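
For reference, here is a hedged sketch of creating the topics with the Confluent
CLI (the cluster ID and topic names are placeholders; use the names you put in
the confluent_cloud_..._topic_name variables):

confluent login
confluent kafka cluster use <cluster-id>
confluent kafka topic create <raw-topic-name> --partitions 2
confluent kafka topic create <enriched-topic-name> --partitions 2
confluent kafka topic create <bad-topic-name> --partitions 2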

To update your input variables, you’ll need to know a few things:

 * Your IP Address. Help.
 * Your Iglu Server’s domain name from the previous step
 * Your Iglu Server’s API Key from the previous step
 * How to generate an SSH Key.

tip

On most systems, you can generate an SSH Key with: ssh-keygen -t rsa -b 4096.
This will output where your public key is stored, for example: ~/.ssh/id_rsa.pub.
You can get the value with cat ~/.ssh/id_rsa.pub.

DESTINATION-SPECIFIC VARIABLES

 * AWS
 * GCP
 * Azure 🧪

As mentioned above, there are several options for the pipeline’s destination
database. For each destination you’d like to configure, set the
<destination>_enabled variable (e.g. redshift_enabled) to true and fill all the
relevant configuration options (starting with <destination>_).

When in doubt, refer back to the destination setup section where you have picked
values for many of the variables.

caution

For all active destinations, change any _password setting to a value that only
you know.

If you are using Postgres, set the postgres_db_ip_allowlist to a list of CIDR
addresses that will need to access the database — this can be systems like BI
Tools, or your local IP address, so that you can query the database from your
laptop.

As mentioned above, there are two options for the pipeline's destination
database. For each destination you'd like to configure, set the
<destination>_enabled variable (e.g. postgres_db_enabled) to true and fill all
the relevant configuration options (starting with <destination>_).

Postgres only

Change the postgres_db_password setting to a value that only you know.

Set the postgres_db_authorized_networks to a list of CIDR addresses that will
need to access the database — this can be systems like BI Tools, or your local
IP address, so that you can query the database from your laptop.

As mentioned above, there are two options for the pipeline’s destination:
Snowflake and data lake (the latter enabling Databricks and Synapse Analytics).
For each destination you’d like to configure, set the <destination>_enabled
variable (e.g. snowflake_enabled) to true and fill all the relevant
configuration options (starting with <destination>_).

When in doubt, refer back to the destination setup section where you have picked
values for many of the variables.

caution

If loading into Snowflake, change the snowflake_loader_password setting to a
value that only you know.


STEP 2: RUN THE PIPELINE TERRAFORM SCRIPT

 * AWS
 * GCP
 * Azure 🧪

You will be asked to select a region; you can find more information about
available AWS regions here.

terraform init
terraform plan
terraform apply


This will output your collector_dns_name, postgres_db_address, postgres_db_port
and postgres_db_id.

terraform init
terraform plan
terraform apply


This will output your collector_ip_address, postgres_db_address,
postgres_db_port, bigquery_db_dataset_id, bq_loader_dead_letter_bucket_name and
bq_loader_bad_rows_topic_name.

terraform init
terraform plan
terraform apply


This will output your collector_lb_ip_address and collector_lb_fqdn.

Make a note of the outputs: you'll need them when sending events and (in some
cases) connecting to your data.

Empty outputs

Depending on your cloud and chosen destination, some of these outputs might be
empty — you can ignore those.

If you have attached a custom SSL certificate and set up your own DNS records,
then you don't need collector_dns_name, as you will use your own DNS record to
send events from the Snowplow trackers.
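
Once the pipeline is up, a quick way to confirm the collector is reachable is
its health endpoint (a sketch assuming the standard /health route; on GCP and
Azure use the collector IP address or FQDN output instead of the DNS name):

curl http://<collector_dns_name>/health
# expected response: OK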

Terraform errors

For solutions to some common Terraform errors that you might encounter when
running terraform plan or terraform apply, see the FAQs section.


CONFIGURE THE DESTINATION

 * Postgres
 * Redshift
 * BigQuery
 * Snowflake
 * Databricks
 * Synapse Analytics 🧪

No extra steps needed.

No extra steps needed.

No extra steps needed.

No extra steps needed.

Azure-specific instructions

On Azure, we currently support loading data into Databricks via a data lake. To
complete the setup, you will need to configure Databricks to access your data on
ADLS.

First, follow the Databricks documentation to set up authentication using either
Azure service principal, shared access signature tokens or account keys. (The
latter mechanism is not recommended, but is arguably the easiest for testing
purposes.)

You will need to know a couple of things:

 * Storage account name — this is the value of the storage_account_name variable
   in the pipeline terraform.tfvars file
 * Storage container name — lake-container

Once authentication is set up, you can create an external table using Spark SQL
(replace <storage-account-name> with the corresponding value):

CREATE TABLE events
LOCATION 'abfss://lake-container@<storage-account-name>.dfs.core.windows.net/events/';


Your data is loaded into ADLS. To access it, follow the Synapse documentation
and use the OPENROWSET function.

You will need to know a couple of things:

 * Storage account name — this is the value of the storage_account_name variable
   in the pipeline terraform.tfvars file
 * Storage container name — lake-container

Example query

SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<storage-account-name>.blob.core.windows.net/lake-container/events/',
    FORMAT = 'delta'
) AS events;


We recommend creating a data source, which simplifies future queries (note that
unlike the previous URL, this one does not end with /events/):

CREATE EXTERNAL DATA SOURCE SnowplowData
WITH (LOCATION = 'https://<storage-account-name>.blob.core.windows.net/lake-container/');


Example query with data source

SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'events',
    DATA_SOURCE = 'SnowplowData',
    FORMAT = 'delta'
) AS events;


Fabric and OneLake

You can also consume your ADLS data via Fabric and OneLake:

 * First, create a Lakehouse or use an existing one.
 * Next, create a OneLake shortcut to your storage account. In the URL field,
   specify
   https://<storage-account-name>.blob.core.windows.net/lake-container/events/.
 * You can now use Spark notebooks to explore your Snowplow data.

Do note that currently not all Fabric services support nested fields present in
the Snowplow data.

--------------------------------------------------------------------------------

If you are curious, here’s what has been deployed. Now it’s time to send your
first events to your pipeline!
