dbt, also known as Data Build Tool, is a transformation workflow that enables business analysts with knowledge of SQL to independently design and implement data transformations. This reduces dependencies on technical teams, allowing analysts to collaborate on data models and implement analytical code with software engineering best practices such as modularity, portability, CI/CD and documentation. With features such as version control in Data Build Tool, analysts can first test their analytical code in the development phase before safely deploying it to the production environment.
In essence, dbt focuses on transforming data within the domain of analytics, as opposed to traditional ETL (Extract, Transform, Load) processes. It adopts the ELT (Extract, Load, Transform) approach, where data is first extracted from various sources and then loaded into a data warehouse or database. Transformations are then applied to the raw data within the data warehouse itself.
Core principles of dbt
- Data Warehouse-Centric: Raw data is fed into the data warehouse, leveraging the computational power of modern data warehouses such as Snowflake, BigQuery or Redshift for in-database transformations.
- ELT Workflow: Instead of transforming data beforehand (ETL), dbt supports ELT, where raw data is loaded into the data warehouse (Extract, Load) and then transformed using SQL-based models (Transform).
- SQL as the DSL (Domain-Specific Language): Dbt uses SQL as its domain-specific language, eliminating the need for proprietary transformation languages or GUI-based ETL tools, with direct and transparent transformation logic.
- Git-based Version Control: Dbt projects are typically versioned with Git, allowing branch-based development, commit history and collaboration via pull-requests.
- Testing and documentation: dbt streamlines the process of automated testing for data models, ensuring precision and consistency in the data. Furthermore, it promotes the habit of documenting data models to enhance comprehension and facilitate knowledge sharing.
dbt products
dbt Core
dbt Core is the open-source version of dbt. It can be installed locally on your machine and it requires dbt infrastructure management, version control and possibly an orchestration platform to run and monitor tasks on a schedule.
dbt Cloud
Dbt Cloud is a cloud-based service built around dbt Core, and provides a Web-based UI for accessibility, a dbt cloud-powered command line (CLI) for development and execution of dbt projects, and a hosted environment for faster deployment. It offers additional features such as metadata, in-app job scheduler, observability and integrations with other tools. Dbt Cloud also manages infrastructure, database connections and credentials, and provides advanced analytics and reporting.
For a more comprehensive comparison between the two versions, you can read our blog on dbt-core and dbt-cloud: cloud vs. core