From BEAM✲ table to star schema

In our previous blog post, we explored the BEAM✲ (Business Event Analysis & Modeling) methodology for data warehouse design, emphasizing the importance of structuring event-based tables to capture key business processes effectively. The BEAM✲ method is described by Lawrence Corr and Jim Stagnitto in their book Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema. By leveraging BEAM✲ tables, we established a foundation for an event-driven data model that facilitates analytical insights.

The next step in this agile approach is the transformation of our BEAM✲ tables into a structured star schema. This involves identifying key dimensions and facts, establishing business keys, and ensuring that the model supports meaningful reporting and analytics. Through this transformation, we aim to create a dimensional data model that enables stakeholders to gain deeper insights into their operations while maintaining a flexible and scalable design.

Use Case - Frosty Adventures

Remember Frosty Adventures? This multinational B2B company contacted us to build a scalable, centralized data warehouse solution so that they could gain critical insights into their processes. In a previous article, we defined a BEAM✲ table for their customer orders process. The final BEAM✲ table is briefly repeated below.

At this point, we’ve completed our workshop and documented all their events in BEAM✲ tables. With this information we can start designing our data warehouse solution. We aim to develop a dimensional data warehouse with dimension and fact tables. The dimensions are the entities of the event with descriptive attributes: who, what, when, where, why and how. The facts hold the quantities and measurable entities described by the attribute how many.

Modelling the dimensions

Detailed, descriptive dimensions allow meaningful reporting. Stakeholders need descriptive attributes to show the information from different perspectives, filter the reports, and roll up and drill down measures as needed.

The dimensions are discovered in the event stories and listed in the BEAM✲ event table. All the event details that have additional descriptive attributes are modelled as dimensions in the data warehouse. Typically, these are six of the seven Ws of the event story: who, what, when, where, why and how. Other event details that do not have additional descriptions are translated directly into fact tables as degenerate dimensions (e.g. the order ID). A degenerate dimension has no descriptive attributes of its own; anything descriptive about it can be moved to the fact table or to another dimension.

Dimensions are modelled using tables similar to the event tables. Preferably, the same examples as the event table are used. To demonstrate this, the product dimension is modelled in this section. We start by drawing a table similar to the event table with the name of the dimension and the values from the event table.

The next step is defining the granularity of the dimension. The question we need answered is “what uniquely identifies the dimension?”. The granularity is defined by the business keys: the set of columns that uniquely identifies the subject. This means that the fields in this set are mandatory and stable, and that the set uniquely defines the row.

The business keys are usually the primary keys of the source system. We add the identity columns to the table and tag them with BK (business key) and MD (mandatory). In our use case we have a centrally managed product catalogue that assigns a unique code to each product. In other situations, we might be less fortunate and have to combine the keys from multiple sources to find a set of business keys.

We are now ready to discover the attributes of the dimension. We look for attributes that are interesting for reporting and that stakeholders would like to sort, group or filter on. The attributes are added to the BEAM✲ table and examples are filled in for each. Using examples is a good way to

  • align on the definition, because examples make it clear and tangible to the stakeholders (descriptions might be too abstract to think of any special cases),
  • discuss the attributes and how they are retrieved (maybe some of them are nice-to-haves that don’t exist yet, or some are too difficult to extract for the current scope),
  • check that each attribute belongs in the dimension and has a single value per dimension member (granularity).

The figure below shows the attributes of the product dimension. During the modelling, consider adding descriptive translations of cryptic codes and decoding the hidden meaning of smart keys into separate attributes. Encoding this information in the data warehouse avoids having to define it in the BI applications, and prevents multiple versions of the truth and outdated logic when, for example, the smart keys are replaced.

The BEAM✲ method recommends adding one or a few examples with missing values. This shows the preferred way of displaying missing values (N/A, Missing, Not Applicable, Error, …). Even for the mandatory fields, a missing value should be added. This allows us to add a dummy key for the dimension that can be used when the event does not yet have a value for the dimension. Using this dummy key provides a meaningful missing value in the reports rather than Null.

When the modelstorming gets stuck, the 7 Ws can be used to dig out details about the dimension. Not all of them will make sense in every situation.

For the product dimension we can ask:

  • What does the product belong to? A product category.
  • Under what regulation does the product fall? Maybe the product is ECO rated.
  • Who made the product? Origin might be a valid attribute.
  • Since when has the product been available? Since the launch date.

BEAM✲ uses change stories to describe the business rules for handling slowly changing dimensions (SCD). This temporal property is the last piece of information we need to add to the BEAM✲ table. It is important to align on this property with the stakeholders as this affects the design of the warehouse. There are four common types of SCD:

  • Fixed Value or type 0: The value of the attribute never changes once it’s set. Example: A product’s initial release date does not change and therefore does not warrant updates. Note: changes in the source system are not reflected in the dimension’s attribute.
  • Current Value or type 1: The value of the attribute is updated to reflect the most recent value. Previous values are overwritten. Example: The product description of the product with product key 2001 is changed from “Helmet” to “Cool Helmet”.
  • Historic Value or type 2: A new row is added to the dimension table whenever an attribute changes, in order to preserve the historical values. Three additional columns track the changes: a ‘From EFFECTIVE DATE’, a ‘To END DATE’ and an ‘is CURRENT’ flag.

    In the following example, the product description is changed in the same way as in the previous example:

  • Previous Value or type 3: The previous value is stored alongside the current value, allowing you to retain a limited history. This can be done by adding an extra column.

    Continuing with the previous example, the history in an SCD3 would be stored as follows:

Using a change story makes it easier to capture the requirements of the temporal property. A possible story for the product list is: “The manufacturer of the gloves relocates to another country. This effectively changes the value for country of manufacturing. When this happens, do we need to assign everything to the new country?”. If the answer is yes, we have a type 1; if not, we have a type 2.
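To make the type 2 behaviour concrete, here is a minimal SQL sketch of how such a change could be applied during loading. The table and column names are hypothetical, assuming the EFFECTIVE DATE, END DATE and CURRENT columns described above:

    -- A change arrives from the source on @ChangeDate (hypothetical date)
    DECLARE @ChangeDate date = '2025-01-01';

    -- Type 2: close off the current row for the changed product...
    UPDATE [DimProduct]
    SET [EndDate] = @ChangeDate,
        [IsCurrent] = 'N'
    WHERE [ProductCode] = 'IPPBS16G'
      AND [IsCurrent] = 'Y';

    -- ...and insert a new current row carrying the new description
    INSERT INTO [DimProduct]
        ([ProductCode], [ProductDescription], [EffectiveDate], [EndDate], [IsCurrent])
    VALUES
        ('IPPBS16G', 'Cool Helmet', @ChangeDate, '9999-12-31', 'Y');

A type 1 change would instead be a single UPDATE that overwrites the attribute on the existing row.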

We have now sufficiently modelled the product dimension. After doing the same for the other dimensions, we can move on to the next phase and start the technical part of the modelling.

Data Profiling

With the events and dimensions modelled, we can start to design the star schema. The first step is to identify candidate data sources for the events and dimensions. This process of examining data sources to learn about their structure, content and data quality is called data profiling. There are fancy and sophisticated tools for data profiling, but simple SQL scripts also suffice.

  1. Missing values

Check for missing values by counting the occurrences of Null values or blanks in each candidate source column and calculating the percentage missing. This is especially important for fields that stakeholders have identified as mandatory.

    -- Count the Null values and the percentage missing in a candidate column
    -- (count([SourceColumn]) ignores Nulls, count(*) does not)
    SELECT count(*) - count([SourceColumn]) AS missing_count,
           (count(*) - count([SourceColumn])) * 100.0 / count(*) AS missing_pct
    FROM [SourceTable];

  2. Unique values and frequency

A second important property of a candidate source is its cardinality: the number of unique values and the frequency of each value. A good candidate for a business key is a source column with 100% unique values.

    -- Count the distinct values and the uniqueness percentage
    -- (100.0 forces decimal division; integer division would truncate to 0)
    SELECT count(DISTINCT [SourceColumn]) AS unique_count,
           count(DISTINCT [SourceColumn]) * 100.0 / count(*) AS unique_pct
    FROM [SourceTable];

  3. Data ranges and lengths

Identify source data ranges by querying

  • the minimum, maximum and average values for numeric columns
  • earliest and latest date or timestamps in datetime columns
  • shortest and longest strings for character columns

This information will help you define data types and date ranges, but also identify outliers that represent errors.
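A sketch of such a profiling query, using the same placeholder naming as the scripts above and assuming SQL Server style functions (e.g. len for string length):

    -- Profile ranges and lengths in one pass (placeholder column names)
    SELECT min([NumericColumn])   AS min_value,
           max([NumericColumn])   AS max_value,
           avg([NumericColumn])   AS avg_value,
           min([DatetimeColumn])  AS earliest_date,
           max([DatetimeColumn])  AS latest_date,
           min(len([TextColumn])) AS shortest_length,
           max(len([TextColumn])) AS longest_length
    FROM [SourceTable];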

Now that we have our profiling results, we need a convenient way to present them to the stakeholders, so that they can review data issues. There is no better way to present the results than in a format they’re already familiar with: the BEAM✲ table. We’ll annotate the BEAM✲ model with our data profiling results.

For each column in the BEAM✲ table, we’ll record its best candidate source within braces ({Source.Table.Column}). If the source table is the same for all columns in the BEAM✲ table, it’s placed at the top and we only mention the individual column name between braces. If a table or column is derived from multiple sources, you can comma delimit them within the braces. During this phase, we’ll not only identify column sources, but also record their data type and length.

It’s very likely that you will discover relevant data that the stakeholders didn’t request; add it to the model for review. Those new columns are written in bold italics. Conversely, if you cannot find an appropriate data source for a column or a table, use bold strikethrough on the column and its examples. We can also apply bold strikethrough on column codes when there is an attribute definition mismatch. For example, according to the stakeholders a column should be mandatory, but the data profiling results show that it has 20% missing values, so its MD code is struck through. Another useful practice is to denote columns as ‘not Null’ (NN) when profiling shows they never contain missing values. This opens up new areas of analysis that previously weren’t considered.

Schema design

Once the profiling results are added to the BEAM✲ table, you should discuss them with the stakeholders and define priorities. Within iterations, you’ll create a dimensional model and draw the star schemas that will be used by the DW/BI team to generate the physical data warehouse. In order to call a BEAM✲ business model a dimensional model, the following tasks need to be done:

  • Adding keys: primary, foreign, surrogate and business keys

    During the modelstorming phase, we’ve defined at least one attribute in each dimension as a business key (BK) to uniquely identify each dimension member. For example, the product code uniquely identifies a member of the dimension PRODUCT. However, from a technical point of view, using the business key as a primary key is not a great idea, as the primary key needs to be stable, minimal and not null across multiple business processes. Instead, we add a leading surrogate key column (SK), an integer sequence number assigned to each dimension table row, and use this as the primary key of the dimension. This primary/surrogate key is used as a foreign key to join dimensions to fact tables (see the DDL sketch after this list).

  • How to handle changes

    Previously, you modelled how each dimensional attribute should handle change by denoting attributes as fixed values (FV), current values (CV) or historic values (HV). This principle matches the different types of slowly changing dimensions (SCD) explained above.

  • Adding ETL and Audit attributes

    By modelling how the dimension should track history (SCD), you’ve already added the following new administrative attributes to your dimensions:

    • EFFECTIVE DATE
    • END DATE
    • CURRENT

    Effective date and end date define the date range within which each dimension row is valid. There should be no overlaps or gaps. The most recent record does not really have an end date, but we set it to the maximum date in the future supported by the database. This way we can use BETWEEN logic in our SQL scripts. The current flag is set to ‘Y’ for the current record and ‘N’ for the old records.

    There are five additional administrative attributes that you should also consider adding to every dimension:

    • MISSING
    • CREATED DATE
    • CREATED BY
    • UPDATED DATE
    • UPDATED BY

    The missing flag is set to ‘Y’ for the special missing row, which has surrogate key value 0 and default values for the other attributes. This row acts as a default row when a dimension lookup finds no match. The CREATED and UPDATED attributes provide basic ETL audit information.
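Bringing the keys, SCD columns and administrative attributes together, a minimal DDL sketch for the product dimension could look like this. All table and column names are hypothetical and the syntax assumes SQL Server:

    -- Hypothetical product dimension: surrogate key, business key,
    -- SCD2 validity columns and administrative attributes
    CREATE TABLE [DimProduct] (
        [ProductSK]          int IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key (SK)
        [ProductCode]        varchar(20)  NOT NULL, -- business key (BK, MD)
        [ProductDescription] varchar(100) NOT NULL, -- historic value (HV) -> SCD type 2
        [ProductCategory]    varchar(50)  NOT NULL, -- current value (CV) -> SCD type 1
        [LaunchDate]         date NOT NULL,         -- fixed value (FV) -> SCD type 0
        [EffectiveDate]      date NOT NULL,         -- row valid from
        [EndDate]            date NOT NULL,         -- row valid to; maximum date for the current row
        [IsCurrent]          char(1) NOT NULL,      -- 'Y' for the current row, 'N' otherwise
        [IsMissing]          char(1) NOT NULL,      -- 'Y' only for the dummy row with surrogate key 0
        [CreatedDate]        datetime NOT NULL,     -- ETL audit attributes
        [CreatedBy]          varchar(50) NOT NULL,
        [UpdatedDate]        datetime NOT NULL,
        [UpdatedBy]          varchar(50) NOT NULL
    );

The BETWEEN logic mentioned above is what the ETL uses to find the dimension row that was valid on the event date when assigning surrogate keys to fact rows:

    -- Look up the surrogate key that was valid on the order date
    -- (@ProductCode and @OrderDate are parameters supplied by the ETL process)
    SELECT d.[ProductSK]
    FROM [DimProduct] d
    WHERE d.[ProductCode] = @ProductCode
      AND @OrderDate BETWEEN d.[EffectiveDate] AND d.[EndDate];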

Notice the change in product description for product code ‘IPPBS16G’ from ‘Helmet’ to ‘Cool Helmet’.

Fact tables

Since all the dimensions have been updated with surrogate keys, we can convert the event tables into fact tables by replacing their who, what, when, where and why details with dimension foreign keys, while leaving quantities (how many) and degenerate dimension (how) details in place. As we will change the final BEAM✲ table, save copies of the original event tables for future modelstorming.

In our customer orders example, we can rename the event name ‘CUSTOMER ORDERS’ to the fact table name ‘ORDERS FACT’ and change the story type ‘DE’ to the fact table type ‘TF’ (transaction fact). Moreover, we replace all the dimensional details with surrogate keys by renaming the columns and changing their type to ‘SK’. In the BEAM✲ table below, for example, we see that the CUSTOMER column has been replaced by an appropriately named surrogate key column and its examples changed to surrogate key integer values. Degenerate dimensions (DD) such as ORDER ID are not replaced by a surrogate key, because they have no additional descriptive attributes that need to be referenced. The remaining quantity columns are defined as facts, which should be modelled in their most additive form so that they can be easily aggregated at query time. Additivity defines how easy it is to sum up a fact and get a meaningful result.

There are three types of facts:

  • Fully Additive fact (FA): this fact can be summed across any of its dimensions.
  • Non-Additive fact (NA): this fact cannot be meaningfully summed at all (e.g. a ratio or a unit price).
  • Semi-Additive fact (SA): this fact can be summed across certain dimensions only (e.g. an account balance can be summed across accounts but not over time).

In the example below, ORDER QUANTITY, REVENUE and DISCOUNT are defined as fully additive (FA). They must be stored with a consistent unit of measurement (UOM): for ORDER QUANTITY this is units, but REVENUE and DISCOUNT need to be converted into a single currency during ETL.
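To see what full additivity buys us, here is a sketch of an aggregation over the resulting star schema; the table and column names are hypothetical:

    -- Fully additive facts can be summed across any dimension,
    -- here rolled up by product category (hypothetical names)
    SELECT d.[ProductCategory],
           sum(f.[OrderQuantity]) AS total_units,
           sum(f.[Revenue])       AS total_revenue,
           sum(f.[Discount])      AS total_discount
    FROM [FactOrders] f
    JOIN [DimProduct] d
      ON d.[ProductSK] = f.[ProductSK]
    GROUP BY d.[ProductCategory];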

Star schema

Finally, we have converted all our BEAM✲ tables to dimensional models and we are ready to draw our star schemas. As a best practice, you should use a graphical modelling tool for this. This step is mostly straightforward: you draw one star schema for each fact table and add its relevant dimensions.

However, by doing these two small things, you can create an enhanced star schema instead of a standard star schema:

  • Use a consistent star schema layout

    This may sound trivial, but the DW/BI team scans multiple schemas every day and finds it helpful if each star schema uses the same layout. Based on the 7Ws framework, the recommended layout for star schemas reserves the four corners for the four major Ws: who, what, when and where.

  • Display BEAM✲ short codes on star schema

    Enhancing a star schema with BEAM✲ codes allows you to document dimensional properties and design decisions that are otherwise not supported by standard ER modelling tools. In the example below, we describe the ORDERS fact as a transaction fact (TF), and the table-level codes describe the default slowly changing policy for each dimension.

By now, we’ve explained the whole BEAM✲ technique in this article, and it wouldn’t be an agile approach to dimensional modelling if we didn’t follow the agile principle of early delivery of working software. As soon as you have a star schema, validate the design by prototyping the reports and dashboards that the stakeholders have wanted to talk about all along. They finally get the chance to see a working product in practice. You cannot know how well your data warehouse design matches the available data until you try to load it, nor how well it matches the stakeholders’ actual BI requirements until they use it. Load your design with a small amount of real data; often the last two years will suffice for the stakeholders to get a true feel of what the final solution will look like. Organize workshops to help stakeholders get their hands dirty using their design with real data and real BI tools.

Conclusion

Through the BEAM✲ methodology, we have successfully transformed Frosty Adventures’ business processes into a well-structured dimensional model, enabling a scalable and insightful data warehouse. By systematically defining event-driven dimensions and fact tables, applying agile data profiling, and incorporating best practices for schema design, we have laid a solid foundation for meaningful business intelligence.

The final star schema design ensures that the data warehouse not only aligns with stakeholder requirements but also remains adaptable to future changes. By emphasizing early validation with real data and interactive BI tools, we create an iterative process where stakeholders can actively engage with their data, ensuring the final solution meets their analytical needs.

With this agile approach, Frosty Adventures is now equipped with a robust, scalable, and user-friendly data warehouse—empowering them to derive actionable insights and optimize their business processes with confidence.

Sources

Corr, Lawrence, with Jim Stagnitto. Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema. DecisionOne Press.

https://www.bisystembuilders.com/beam/
