Blue-Green Deployment with dbt and Snowflake

Blue-Green deployment is a major upgrade for organisations looking for more stability in their dbt continuous integration/continuous deployment (CI/CD) pipeline. It allows you to test before publishing your data, and it always gives you a readily available backup for when things go awry. This can save you from damaging downtimes and after-the-fact test failures, which will help your stakeholders be more confident in your data pipeline and reports.

Beyond dbt itself, leveraging Snowflake-specific features such as SWAP WITH and zero-copy cloning makes implementing Blue-Green deployment simpler and more efficient. This article covers what Blue-Green deployment is and how to implement it using dbt and Snowflake.

What is Blue-Green deployment?

Blue-Green deployment is a methodology that aims to create a stable and reliable production environment by running and rotating two identical databases. One database (prod) has the most up-to-date data and is the target of all downstream applications like Looker, Sigma, or any other production use case. The other database (stage) has the previous, slightly stale state of data, which is not the target of any querying and will stand idle until the next swap.

There are many benefits to implementing this methodology. To name a few, Blue-Green deployment lets you push new deployments without risking downtime caused by broken data, gives you an easy way to roll back if a deployment does not go as expected, and allows you to run tests and validations before publishing changes to production.

The worst thing that can happen with Blue-Green is that your data warehouse becomes slightly stale, which is easy to fix and far better than heading into the weekend with a non-functioning pipeline.

Blue-Green deployment is implemented by executing dbt commands in the following order:

  1. dbt run: When a dbt job is executed, dbt will target the staging database to perform the run without affecting the production database.
  2. dbt test: After the dbt run is completed in the staging database, dbt will run all tests on the models to make sure the project passes all requirements. If any test fails, the job will exit, and it will not proceed to the next step.
  3. dbt run-operation swap_database: At this final step, the staging database gets promoted to host production data, and the previous hosting database gets demoted to staging and becomes idle.

 

Tutorial

1. Create and configure the staging database

The first step in implementing Blue-Green deployment is to create a staging database with the same configuration as production. The role used by the tool that runs the dbt project must be granted read and write access on both the production and staging databases. Cloning all schemas and tables from production to staging is cost-efficient thanks to zero-copy cloning, but it is optional, as dbt will rebuild all models from scratch regardless.

Below is an example of the Snowflake SQL commands to create a new staging database named stage_db and grant ownership to dbtcloud_role, the role we configured dbt Cloud to use when running transformations.

-- Create a new database and grant ownership
create or replace database stage_db;
grant ownership on database stage_db to role dbtcloud_role;


-- Optional step: clone production schemas to the staging database
create or replace schema stage_db.dbt_analytics
  clone analytics_db.dbt_analytics;
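
The dbt role also needs equivalent access on the production database. If analytics_db, the production database used in the examples that follow, is not already owned by dbtcloud_role, a grant along the following lines sets that up (a minimal sketch using the same naming assumptions as above):

-- Assumes analytics_db already exists as the production database.
-- COPY CURRENT GRANTS preserves any privileges previously granted to
-- downstream roles (e.g. BI tool roles) when ownership changes hands.
grant ownership on database analytics_db to role dbtcloud_role copy current grants;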

2. Configure dbt environment to target the staging database

Next, configure the environment that dbt uses to connect to the data warehouse so that dbt targets the staging database when it runs. This step can be done in two ways, depending on the orchestration tool used to run dbt transformations.

dbt Cloud

dbt Cloud has a native interface to configure dbt environments. Create a new environment and set the target database to use the staging database. This will be the environment to push new deployments to production and perform database swaps.

 

 

Other tools

The configuration for other tools will most likely be in the profiles.yml file. Input the newly created staging database as the default target.

database: stage_db
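
For context, here is a minimal profiles.yml sketch with the staging database as the target. Only database: stage_db is prescribed by this setup; the profile name, account, user, warehouse, and authentication values are placeholders, and dbtcloud_role and dbt_analytics simply follow the naming used in this article.

# Hypothetical profiles.yml entry; replace the placeholder values with your own connection details.
my_dbt_project:
  target: blue_green
  outputs:
    blue_green:
      type: snowflake
      account: <your_account>
      user: <your_user>
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: dbtcloud_role
      warehouse: <your_warehouse>
      database: stage_db        # dbt builds everything into the staging database
      schema: dbt_analytics
      threads: 8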

3. Create swap_database operation

Create a dbt operation named swap_database in the macros folder. After all runs and tests have completed, this operation promotes the staging database to production. The SWAP WITH command is unique to Snowflake and renames both databases in a single operation.

{% macro swap_database() %}
    {# Promote the staging database to production by swapping database names #}
    {% set sql = 'alter database analytics_db swap with stage_db' %}
    {% do run_query(sql) %}
    {{ log("database swapped", info=True) }}
{% endmacro %}
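
The operation can also be triggered on its own, for example while validating the setup, using the same command that will later run as the final job step:

dbt run-operation swap_database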

4. Dealing with Snapshots

The best practice for snapshot models is to give them a dedicated database. Swapping databases would break the continuity of any snapshot table stored inside them, leading to incomplete histories. By writing snapshot models to a third database, they are unaffected by the swap.

Below are four steps to configure snapshot models for Blue-Green deployment and migrate existing snapshots to the new database. If your current dbt project does not have any snapshot models, only follow steps A and B.

A. In Snowflake, create a new database and a schema to store the snapshot models.

-- Create a new database and grant ownership
create or replace database snapshots_db;
grant ownership on database snapshots_db to role dbtcloud_role;

-- Create a new schema
create or replace schema snapshots_db.dbt_snapshots;

 

B. In your dbt project, configure all snapshots to write to the new target database and schema. This could be done in the dbt_project.yml file by defining the target schema and database destination under snapshots.

snapshots:
  datatonic:
    +target_database: snapshots_db
    +target_schema: dbt_snapshots

 

C. In Snowflake, migrate any existing snapshot tables from the prod database to the new snapshot database and schema. Snapshot models are stored as transient tables in Snowflake, so the clone function does not work on them. Instead, copy the records into a newly created table in the target destination, as in the example below.

-- Create a new snapshot table and insert records from original target
create or replace transient table snapshots_db.dbt_snapshots.cdc_datatonic__subscribers as
select * from analytics_db.dbt_analytics.cdc_datatonic__subscribers;

 

D. This step is optional, but it is highly recommended: delete the old snapshot tables from the prod and staging databases.


-- Drop tables from the old targets
drop table if exists analytics_db.dbt_analytics.cdc_datatonic__subscribers;
drop table if exists stage_db.dbt_analytics.cdc_datatonic__subscribers;

5. Overwrite your ref macro

When using Blue-Green deployment, the databases are renamed on every successful run. This renaming causes issues whenever a view is queried in production, because Snowflake, like many other database services, stores views as saved queries. By default, dbt's ref function compiles to the complete object address (database.schema.object). When a view is queried, its underlying query will therefore reference the staging database instead of the production database, because staging was set as the default target in step 2. To make the ref function reference only schema.object, overwrite the ref macro to exclude the database name. Create a new macro named ref in the macros folder and include the following code snippet:

{% macro ref(model_name) %}
    {% set rel = builtins.ref(model_name) %}
    {% if rel.database == 'snapshots_db' %}
        {# Snapshot models live outside the swapping databases, so keep the full address #}
        {% do return(builtins.ref(model_name)) %}
    {% else %}
        {# Drop the database name so the compiled view survives the swap #}
        {% do return(builtins.ref(model_name).include(database=false)) %}
    {% endif %}
{% endmacro %}

With the ref macro overwritten, references compile to the Snowflake address without the database name, as in the example below. With this setup, views no longer break when the databases are swapped. The database name is not needed as long as the schema and table exist in both the prod and stage databases.


-- Before dbt Compile
select * from {{ ref('dim_subscribers') }}

-- After dbt Compile
select * from dbt_analytics.dim_subscribers

Note: We added a condition to the ref macro so that if the database of the referenced table is snapshots_db, the database name is kept. This condition is needed to tell Snowflake where to find referenced tables that live outside the prod and stage databases, such as snapshot models, because they are not part of the swap.

-- Before dbt Compile
select * from {{ ref('cdc_datatonic__subscribers') }}

 

-- After dbt Compile
select * from snapshots_db.dbt_snapshots.cdc_datatonic__subscribers

6. Configure dbt job commands

The final step of implementing Blue-Green is adding the swap_database operation as the last command in the dbt Cloud deployment job. Create a dbt job with the dbt commands in the order shown below, and don't forget to set the job to use the environment created in step 2.
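
The job simply runs the three commands described earlier, with the swap last:

dbt run
dbt test
dbt run-operation swap_database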

Jobs that perform partial dbt runs should use an environment that writes directly to the production database. Only apply the swap_database operation to jobs that perform full runs; otherwise, end users will receive incomplete data.
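
As a sketch, a partial-run job could look like the example below. The tag:hourly selector is purely hypothetical; the point is that the job runs in an environment targeting analytics_db directly and never calls swap_database.

# Hypothetical partial-run job; its environment targets analytics_db
# (production) directly, and the swap operation is never invoked.
dbt run --select tag:hourly
dbt test --select tag:hourly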


Conclusion

With two new databases, two new macros, and a few tweaks to your dbt Cloud settings, you now have Blue-Green deployment: a simple, robust technique to add to your dbt and Snowflake project. It gives stakeholders more confidence in your data stack by providing a more stable production environment without risking damaging downtime.

Datatonic has a wealth of experience in data + AI, including dbt and Snowflake migrations and optimisation. To discuss your current challenges or work on a project together, contact us here.
