Business stakeholders are usually not data experts. They are primarily interested in getting valuable insights out of data so that they can take appropriate action. As a result, it is difficult for stakeholders to know what is possible until an analyst shows them. They also often have little idea of the effort required to deliver valuable insights.
In today’s fast-changing market, it is nearly impossible for business stakeholders to define clear and feasible goals upfront, because they may not yet know what those goals should be. Moreover, when an opportunity presents itself, the company needs to explore and act on it faster than its competitors.
Consequently, requirements often arrive very late and with tight deadlines. Even when the analytics team is finally able to deliver the required insights, the objectives might already have changed, or the result might not be quite what the business expected. The delivered analytics can also raise more questions than they answer: they enable stakeholders to understand the business in new ways, which leads to a new series of questions, requirements and objectives.
Agile methodologies have grown considerably in popularity over the last few years. That is why we have tailored an agile methodology into a “multiple speed approach”. This approach allows us to divide a Data Hub, as defined in our previous blog post, into two main parts. The first part is the integration of all data source systems, while the second part covers the consumption of the data by analytics and applications.
The integration layer contains all integrated and historicized source data, without any business rules (transformations, filters, functions) applied. This means that the data captured in this layer represents the version of the facts. It is a more technical layer that reflects the data as it is stored within the data sources. In the consumption layer, business rules are applied for data analytics and application purposes. The raw data captured in the integration layer is transformed in the consumption layer, where it represents the version of the truth. It is a functional, business-oriented layer where the data is transformed into valuable and understandable data products.
How does this approach solve the challenge?
- Changing business requirements:
Business requirements will change over time, because today’s use case will differ from the one a day, a month or a year from now. That is why the consumption layer should be resilient to change. By dividing the data hub into two layers, the consumption layer can change whenever the business wants, because the integration layer keeps all data unaltered, with no business rules applied to it.
- Divide and conquer:
- When implementing the data hub, we split the team into two sub-teams. One team focuses on integrating data entities coming from different data sources. All functionally relevant data is integrated, even if the end user does not need it at that moment. This process is automated as much as possible by applying data warehouse automation, continuous integration and continuous delivery (CI/CD). This approach gives organizations scalability and flexibility when adding new data sources and new data entities from existing data sources.
- The other team focuses on transforming the raw data (stored in the integration layer) into valuable data products that will be used by business stakeholders. When requirements change and, for example, new data is needed in the consumption layer, it can be sourced from the integration layer, because all functionally relevant data is already present there.
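To make the split between the two layers more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not our actual implementation: the names `IntegrationLayer`, `RawRecord`, `build_data_product` and both rule functions are hypothetical, as are the sample CRM records. The sketch shows raw data being stored once, unchanged and with a load timestamp, while two different versions of a business rule are applied on top of it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class RawRecord:
    """One source record exactly as delivered (hypothetical structure)."""
    source: str
    payload: dict
    loaded_at: datetime


class IntegrationLayer:
    """Append-only store: keeps source data as-is, applies no business rules."""

    def __init__(self) -> None:
        self._records: List[RawRecord] = []

    def load(self, source: str, payload: dict) -> None:
        # Copy the payload and stamp it with a load time so history is kept.
        record = RawRecord(source, dict(payload), datetime.now(timezone.utc))
        self._records.append(record)

    def records(self) -> Tuple[RawRecord, ...]:
        # Return an immutable view so consumers cannot alter the raw history.
        return tuple(self._records)


# A business rule maps a raw record to a data-product row, or None to filter it out.
BusinessRule = Callable[[RawRecord], Optional[dict]]


def build_data_product(layer: IntegrationLayer, rule: BusinessRule) -> List[dict]:
    """Consumption layer: derive a data product by applying one business rule."""
    rows = []
    for record in layer.records():
        row = rule(record)
        if row is not None:
            rows.append(row)
    return rows


def customers_rule_v1(record: RawRecord) -> Optional[dict]:
    # Assumed initial requirement: report only active customers.
    p = record.payload
    if p["status"] != "active":
        return None
    return {"customer": p["name"], "revenue": p["revenue"]}


def customers_rule_v2(record: RawRecord) -> Optional[dict]:
    # Assumed changed requirement: trial customers now count as well.
    p = record.payload
    if p["status"] not in ("active", "trial"):
        return None
    return {"customer": p["name"], "revenue": p["revenue"], "status": p["status"]}


# Integration team: load everything that is functionally relevant.
hub = IntegrationLayer()
hub.load("crm", {"name": "Acme", "status": "active", "revenue": 1200})
hub.load("crm", {"name": "Globex", "status": "trial", "revenue": 300})
hub.load("crm", {"name": "Initech", "status": "churned", "revenue": 0})

# Consumption team: the rule changes, the raw data does not.
product_v1 = build_data_product(hub, customers_rule_v1)
product_v2 = build_data_product(hub, customers_rule_v2)
print(product_v1)
print(product_v2)
```

Because the trial customer was integrated from the start, the second version of the rule can be served without going back to the source system. In a real data hub the in-memory list would of course be replaced by a historicized data store.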