In data engineering, teams often receive daily full snapshots of data from legacy systems or third-party sources. Traditionally, ingesting these periodic snapshots and identifying what changed (inserts, updates, deletes) each day has been cumbersome: engineers have had to write and maintain bespoke Change Data Capture (CDC) logic for every data source. Databricks has introduced Lakeflow Spark Declarative […]
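The teaser cuts off before the details, but based on the existing DLT/Lakeflow Python API, snapshot-based CDC looks roughly like the sketch below. The table names are placeholders, and `dlt.apply_changes_from_snapshot` (exposed as `create_auto_cdc_from_snapshot_flow` in newer Lakeflow releases) appears to be the call this feature builds on.

```python
# Hedged sketch of snapshot-based CDC in a declarative pipeline.
# Table names are placeholders for illustration only.
import dlt

# The target streaming table that will hold the change-tracked result.
dlt.create_streaming_table("customers")

# Each pipeline update diffs the latest full snapshot against the target,
# deriving inserts, updates, and deletes automatically.
dlt.apply_changes_from_snapshot(
    target="customers",
    source="daily_customer_snapshot",  # full snapshot delivered by the source system
    keys=["customer_id"],              # primary key used to match rows
    stored_as_scd_type=2,              # keep history as SCD Type 2
)
```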
The data industry has been focused on optimizing backend infrastructure, centralizing information into Data Lakes and Warehouses to break down data silos. While the ability to report on this data has improved significantly, we’ve encountered a “last mile” problem. Although data is easily accessible for analysis, delivering it into actionable, interactive user experiences has proven […]
Using semantic links in Microsoft Fabric became a lot easier with the introduction of Semantic Link Labs. Semantic Link Labs is an open-source Python library built on the existing Semantic Link implementation in Fabric. As of this writing, Semantic Link Labs is still in early access, but a lot of useful […]
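Since the excerpt is truncated before any examples, here is a minimal sketch of the underlying Semantic Link API (`sempy.fabric`) that Labs builds on, run from a Fabric notebook; the model, measure, and column names are hypothetical.

```python
# Runs inside a Microsoft Fabric notebook, where Semantic Link (sempy) is preinstalled.
import sempy.fabric as fabric

# List the semantic models visible in the current workspace.
print(fabric.list_datasets())

# Evaluate a DAX measure against a model; "Sales Model", "Total Revenue",
# and "Customer[Country]" are hypothetical names for illustration.
df = fabric.evaluate_measure(
    dataset="Sales Model",
    measure="Total Revenue",
    groupby_columns=["Customer[Country]"],
)
print(df.head())
```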
Snowflake has introduced Workspaces in Snowsight – an integrated, browser-based IDE designed to streamline the way data engineers develop within the Snowflake ecosystem. With Workspaces, users can now develop, test and manage their code all from one interface. Key benefit: with the introduction of this feature, Snowflake consolidates the data engineering lifecycle into a single […]
Databricks has introduced Power BI as a task within its Workflows feature, allowing users to automate the publishing of semantic models directly from Unity Catalog to Power BI. This enhancement bridges the gap between data engineering and business intelligence, enabling faster, more reliable, and secure reporting. What is Databricks Workflows? Databricks Workflows is a managed […]
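The excerpt stops before showing the task itself. As a rough illustration under stated assumptions, such a job could be created through the Jobs REST API (`POST /api/2.1/jobs/create`); the `power_bi_task` field names below are hypothetical, inferred from the feature description rather than a verified schema.

```python
# Hedged sketch: creating a job with a Power BI publish task via the Jobs REST API.
# The power_bi_task block is hypothetical; consult the Jobs API docs for the real schema.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-....azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]

job_spec = {
    "name": "publish-semantic-model",
    "tasks": [
        {
            "task_key": "publish_to_power_bi",
            # Hypothetical task block: publishes Unity Catalog tables as a semantic model.
            "power_bi_task": {
                "tables": [{"catalog": "main", "schema": "sales", "name": "orders"}],
                "refresh_after_update": True,
            },
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job", resp.json()["job_id"])
```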
In our previous blog post, we explored the BEAM✲ (Business Event Analysis & Modeling) methodology for data warehouse design, emphasizing the importance of structuring event-based tables to capture key business processes effectively. The BEAM✲ method is described by Lawrence Corr and Jim Stagnitto in their book Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to […]
As in any area of technological development and design, agile has made its way to data warehouse design. Over the past few decades, the agile mindset has gained momentum, becoming the standard in software and technology development. The well-known Agile Manifesto sets out four values and twelve principles; the values can be summarized as: Individuals and interactions over processes and tools […]
On June 10th, 2024, Apache Iceberg tables became generally available on Snowflake. This blog will focus on creating Iceberg tables on Snowflake and illustrate how they integrate with dbt. What are Iceberg Tables? Iceberg tables utilize the Apache Iceberg open table format, which acts as an abstraction layer over data files stored in open […]
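To make the truncated intro concrete, here is a minimal sketch of creating a Snowflake-managed Iceberg table from Python; the connection parameters, external volume, and table names are placeholders.

```python
# Sketch: creating a Snowflake-managed Iceberg table via the Python connector.
# All object names (ICEBERG_VOL, ANALYTICS, orders, ...) are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="COMPUTE_WH", database="ANALYTICS", schema="PUBLIC",
)

conn.cursor().execute("""
    CREATE OR REPLACE ICEBERG TABLE orders (
        order_id INT,
        amount   NUMBER(10, 2)
    )
    CATALOG = 'SNOWFLAKE'            -- Snowflake manages the Iceberg catalog
    EXTERNAL_VOLUME = 'ICEBERG_VOL'  -- external volume pointing at cloud storage
    BASE_LOCATION = 'orders/'        -- where Parquet data and metadata files land
""")
```

On the dbt side, recent dbt-snowflake releases can materialize models as Iceberg tables via a table-format config, though the exact config keys depend on the adapter version.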
In a previous DataTalks, Wouter Pardon talked about how to implement CI/CD for Azure Data Factory and SQL Server, focusing on making deployments easier and improving the quality of data pipelines. In this post, he dives deeper into unit testing for Azure Data Factory (ADF) and explains how Azure DevOps can help you run […]
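The post is cut off before its examples, but a pipeline-level test that an Azure DevOps job could execute with pytest might look like this sketch; it assumes the `azure-identity` and `azure-mgmt-datafactory` packages, and the factory, pipeline, and parameter names are placeholders.

```python
# Sketch: a pytest check that triggers an ADF pipeline run and asserts it succeeds.
# Assumes azure-identity and azure-mgmt-datafactory; all names are placeholders.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-data-test"
FACTORY_NAME = "adf-test"


def test_copy_pipeline_succeeds():
    client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    # Trigger the pipeline with test parameters.
    run = client.pipelines.create_run(
        RESOURCE_GROUP, FACTORY_NAME, "pl_copy_sales", parameters={"env": "test"}
    )

    # Poll until the run reaches a terminal state.
    status = "Queued"
    while status in ("Queued", "InProgress"):
        time.sleep(15)
        status = client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id).status

    assert status == "Succeeded"
```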
Managing and extracting actionable insights from documents can be a time-consuming and error-prone task, especially when it comes to complex technical documents. These documents are often filled with critical information that needs to be efficiently analyzed and organized for operational decisions. That’s where Snowflake’s Document AI comes in. Snowflake’s advanced AI-powered platform is revolutionizing the […]
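As a taste of what the (truncated) post likely covers, the documented Document AI pattern is to call `<model_build>!PREDICT` over staged files; in this sketch the database, stage, and model build names are placeholders.

```python
# Sketch: querying a trained Document AI model build from Python.
# The stage and model names are placeholders; <model>!PREDICT with
# GET_PRESIGNED_URL is the documented Document AI extraction pattern.
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = conn.cursor()
cur.execute("""
    SELECT relative_path,
           DOC_AI_DB.INVOICES.INVOICE_MODEL!PREDICT(
               GET_PRESIGNED_URL(@doc_stage, relative_path), 1
           ) AS extracted
    FROM DIRECTORY(@doc_stage)
""")
for path, extracted in cur.fetchall():
    print(path, extracted)  # extracted is a JSON object of fields plus confidence scores
```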