dbt, the Data Build Tool, has become the foundation for organizations managing data transformations. In our previous blog about dbt Mesh, we explored its potential and its ability to bring together different dbt functions, which enables scalability and security within and across projects. In this blog, we discuss how it has evolved since the last Coalesce.
The need for multi-platform flexibility
dbt has grown rapidly over the past eight years. In the beginning, a project with a few hundred models was already considered quite complex. Today, 5% of dbt projects have over 5,000 models. This implies that both the adoption of dbt and the complexity of the data it manages have increased tremendously. With this growth, more value is driven by the dbt workflow, which leads to more user engagement and more assets under management.
This complexity now needs to be tamed, and practitioners need better approaches and tools to do so. This is one of the most interesting and important challenges not only for dbt, but for the data industry in general.
One dimension of complexity that has not been addressed until now is the ability to work across multiple data platforms. dbt Mesh already handles multi-project deployments well within a single data platform. However, it has been limited in its ability to operate across different platforms. This limitation implies that:
- Practitioners across different business units can’t discover or re-use the contents of other platforms
- Architects are not able to govern how data is exchanged
- Work happens in silos
Introducing cross-platform dbt Mesh
dbt Mesh’s current features include:
- breaking down a monolithic application into constituent parts
- governing how different datasets can be used downstream
- discovery of lineage across multiple projects
Since Coalesce 2024, cross-platform ref support has been added to dbt Mesh. Cross-platform dbt Mesh enables the creation of a unified, organization-wide data model that all teams can collaborate on, regardless of the technologies they use. By providing a shared foundation, cross-platform dbt Mesh bridges the diversity of our data ecosystem, making it possible for everyone to contribute and consume together.
How does cross-platform dbt Mesh work?
Cross-platform mesh leverages open table formats, in particular Apache Iceberg, to interchange data. Iceberg support exists today, but in a limited form, since each platform only supports a subset of the Iceberg spec. The goal is open table format support that lets data platforms interchange data seamlessly, which is what enables cross-platform mesh.
Adopting Iceberg is a prerequisite for using cross-platform mesh. This can be seen as a limitation, but you do not need to migrate every table you manage to an Iceberg catalog. What is required is access to an Iceberg catalog where public models are “staged” so that downstream projects can reference them. Once you have access, you can “share” public models across warehouses without copying data.
The cross-platform dbt Mesh beta will include support for Athena, Databricks, Redshift and Snowflake.
Example
An upstream project on Athena can be referenced by a downstream project on Redshift. No data is copied or duplicated in this process.
At the model level, it works as follows:
- Integrate both the upstream and downstream warehouses with the same Iceberg catalog
- In the upstream project: classify your model as public via dbt Mesh and configure it to be written into your Iceberg catalog
- In the downstream project: ref the upstream model
- Under the hood: dbt Cloud translates the ref to point to the right place in the Iceberg catalog
- When dbt Cloud executes the upstream: it writes data into the Iceberg catalog
- When dbt Cloud executes the downstream: it looks up the table in the catalog and then loads data directly from the Iceberg store
Conclusion: you can build a model in an upstream project, ref it from a model in a downstream project, and the newly built model will be immediately available for consumption by the downstream model.
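To make the flow above more concrete, here is a minimal toy model in Python. It is not how dbt Cloud is implemented: the `IcebergCatalog` and `Project` classes, the project names `finance` and `marketing`, and the `orders` model are all hypothetical, and the catalog is just an in-memory dictionary. The sketch only illustrates the idea that the upstream run writes a public model into a shared catalog, and the downstream ref is translated into a lookup in that same catalog rather than a copy.

```python
from dataclasses import dataclass, field


@dataclass
class IcebergCatalog:
    """Toy stand-in for a shared Iceberg catalog (hypothetical, in-memory)."""
    tables: dict = field(default_factory=dict)

    def write(self, identifier, rows):
        # One entry per table: every reader sees the same stored data.
        self.tables[identifier] = list(rows)

    def read(self, identifier):
        return self.tables[identifier]


@dataclass
class Project:
    """Toy stand-in for a dbt project running on one data platform."""
    name: str
    platform: str
    catalog: IcebergCatalog
    public_models: set = field(default_factory=set)

    def build(self, model, rows, public=False):
        # Upstream run: write the model into the shared catalog.
        if public:
            self.public_models.add(model)
        self.catalog.write(f"{self.name}.{model}", rows)

    def ref(self, upstream, model):
        # Downstream run: translate the ref into a catalog lookup.
        if model not in upstream.public_models:
            raise PermissionError(f"{upstream.name}.{model} is not public")
        return self.catalog.read(f"{upstream.name}.{model}")


# Both projects are integrated with the same catalog.
catalog = IcebergCatalog()
upstream = Project("finance", "athena", catalog)
downstream = Project("marketing", "redshift", catalog)

upstream.build("orders", [{"order_id": 1, "amount": 40}], public=True)
rows = downstream.ref(upstream, "orders")
```

In this sketch the downstream project receives the very same stored rows that the upstream project wrote, which mirrors the "no data is copied" property described above. Only models marked as public can be referenced from another project.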
Improvement points
Although cross-platform mesh brings many benefits, there is still room for improvement. The two main points are complete compatibility with the Iceberg spec and support for a broader set of platforms beyond the ones mentioned above.
Conclusion
The new cross-platform feature of dbt Mesh marks a significant milestone in the evolution of the data ecosystem. By enabling seamless collaboration across data platforms, practitioners can break down silos and build a unified, organization-wide data model. Leveraging open table formats like Apache Iceberg ensures that data can be shared and consumed without duplication, which can drive efficiency and reduce complexity.
While the current beta supports key platforms like Athena, Databricks, Redshift and Snowflake, the journey doesn’t stop here. Future efforts will focus on achieving full compatibility with the Iceberg spec and extending support to a broader range of platforms, further democratizing data access and collaboration.