dbt, BigQuery and Looker: Your Modern BI Tech Stack
Author: David Badovinac, Senior BI Developer
Datatonic have recently partnered dbt labs, to enable seamless BI transformation for clients by efficiently connecting their data warehouse and enabling BI teams to transform their data into a format that serves them the best.
With market leaders such as Sky, Inmarsat, Pets at Home, and MoneySupermarket choosing BigQuery for their modern data warehouse, dbt + Looker are fast becoming the go-to data transformation tools to complete the modern-day data stack.
Clients often ask us: How do dbt and Looker work well together? What are the benefits of using the two tools together? When should you model data in Looker vs. dbt?
In this article, we dwell deeper into dbt and Looker, how they work together, and how they complement BigQuery.
1. About dbt + Looker – The Basics
dbt
dbt (see more here) is a data modelling tool combining software engineering best practices and analytics best practices to build modular models that can be easily reused across the business and used as the source for your BI needs.
dbt allows you to:
- Apply testing to ensure the integrity of your data.
- Visualise complex DAGs and understand your data journey from source in an easy-to-understand view.
- Write amazing documentation and describe your models.
- Simply integrate various transform materialisations.
dbt is open-source, so engineers own their code and have complete control over it.
Looker
Looker is a leading enterprise data platform for business intelligence, enabling embedded analytics and driving and self-service analytics. It is completely cloud-native, extensible, and fully web-based with a full API First extensibility.
LookML, a powerful data modelling layer that sits behind Looker, enables you to:
- Model data and define metrics in a single unified way.
- Write custom SQL queries and surface them to the end-users such as Persistent Derived Tables.
- Keep code dry bridging methodologies used in software development, like version controlling code and collaborative development with git integration.
- Create a modelling layer between the database and the end-user by informing Looker in abstracting SQL.
Looker also has powerful security features that allow you to implement columnar and row-level security on top of limiting the data based on the end-user needs. Find out more on implementing security in Looker (see more here).
2. How do dbt + Looker work together?
Now that we have covered a brief overview of the tools, you might have noticed that some of their functionalities overlap. Indeed, clients often ask us why and when it makes sense to use Looker and dbt together.
To answer “why”, we take a look at how Looker and dbt complement each other in the following ways:
- Both tools have version-controlled business logic.
- The tools have isolated development and production environments.
- They allow you to keep your code base dry via modularity and reusability of code.
- You can write unit tests and test your data before it reaches the end-user, minimising the chances of downtime.
To address when to use these tools together, it’s important to note that both Looker and dbt serve a specific purpose when building a modern BI stack.
Overall, Looker is a great tool to get started as quickly as possible. With its powerful modelling layer, you can connect all your source tables directly to Looker, potentially cutting back on your data warehouse design. When BI teams want to get up-to-speed quickly, modelling within the tool itself enables them to do this by reading directly from Looker’s staging or source tables. Many analysts opt to build all of their business logic in Looker, such as:
- Applying transformations to fields.
- Adding Derived Tables, persistent derived tables.
- Creating complex logic within LookML
All of this can be done in Looker, however, it’s not a scalable solution. You will be required to decouple various layers of code and logic to apply changes each time the underlying data changes. This can lead to a significant overhead cost for your analyst, and a very large, complex codebase of LookML difficult to navigate and understand.
Let’s look at an example: A company just acquired Looker and they want to get up to speed as quickly as possible to be able to serve their end-user, building all the self-service layer, and ultimately, the content being consumed. This will result in a lot of business logic created in the LookML layers (see below).
This is a great way to get started and it works perfectly within the Looker application. The limitation arises if you wanted to use this query outside of Looker, for example:
- If anyone else wants to consume this data for other purposes, such as a data scientist, they will either have to do API calls to get this data or manually download it from Looker.
- PDT’s essentially allow you to persist a table back to your warehouse, allowing you to query it. However, Looker becomes the orchestration tool for this, so if you use another tool for the rest of your tech stack (e.g. Airflow), you don’t have any out-of-the-box integrations thus, you would need to develop your own triggers between the various jobs.
- If you have to change anything later, you’re going to have to figure out exactly what this query does (which could be an issue if it was built years ago). You then might need to build another derived table with slight variations, adding another layer of dependency.
You can prevent any of the above limitations created by Looker by shifting this logic to dbt. It works the same way as if you were to take your derived table queries, and break them up into dbt models, creating modular queries. Adding modularity allows you to be more flexible with your models, meaning they can be reused in various locations. They are easier to read, and you can see how everything is connected through DAGs.
Therefore all your transformations and logic is sitting in your warehouse, creating one single source of truth accessible to everyone.
Overall, combining the power of dbt and Looker allows you to build a robust, scalable warehouse layer, so your analysts can focus on delivering an amazing user experience in the front-end for data consumers.
3. Choosing what to model in dbt vs. LookML
One of the common questions that arise from integrating these two tools is: What gets modelled where?
dbt has written an insightful article around this going into more detail, but here is our recommendation:
dbt
- All transformations should stay in dbt.
- All tests – for data integrity.
- Logical statements, such as case whens, extracts flattening, and anything related to building your final models to be consumed by Looker.
- Model documentation – dbt allows you to create amazing documentation for your models that are easy to access and understand.
Looker
- Code that enhances user experience such as all your customisation, labelling descriptions, custom HTML.
- Access rights and user attributes – Looker is a great tool for limiting or filtering the data for the end-user. Read more on adjusting access rights in Looker here.
- Anything that needs to be computed at query time, such as aggregations, sums, and averages on the top of your fields. E.g. in an e-commerce setup where you have a table with all orders per customer, you don’t want to build a model in dbt where you aggregate all orders per customer, because Looker already does a fantastic job abstracting the SQL from LookML and creating the most optimal SQL for your queries.
4. Why choose dbt, Looker + BigQuery to build your modern BI stack
BigQuery is a scalable modern data warehouse, with real-time capabilities. It is quick and easy to integrate and has unique activation and analytics capabilities via Looker and dbt (e.g. BigQuery ML). We cover more about harnessing the power of BigQuery with Looker in this free online workshop.
BigQuery ML (BQML) integrates seamlessly with Looker and dbt such that data from trained models never leaves BigQuery. This removes the need for additional data pipelines and management and ensures faster delivery of insights to the customers. One of our favourite features is the model descriptions in dbt that flow through to BigQuery and Looker, so once it’s defined, it’s accessible across all tools. To find out more about using BQML, have a look at our how-to blog.