Express Analytics Logo
express analyticsAI-Powered Smarter Marketing
DATA ENGINEERING

How to DeploySlowly Changing Dimensions?

February 27, 2025
By Express Analytics
In order to track the changes in a dimension, the SCD Slowly Changing Dimensions Type 2 technique is used. In this type when values for current dimension record change, the current record is marked as closed and the new record gets inserted.
How to Deploy Slowly Changing Dimensions?

In the previous blog, I had explained the different types of Slowly Changing Dimensions (SCD) that can be used in dimensional modeling for tracking changes. In this blog, I shall be focusing on the Type 2 form of SCD (Slowly Changing Dimensions) and the issues associated with its implementation.

Type 2 Slowly Changing Dimensions (SCD) Implementation Recap

Before that, using an example, I would like to do a quick recap on Type 2 Slowly Changing Dimensions (SCD) implementation. While building star schema's in a data warehouse, the dimensions tables are joined with the fact tables. To track the changes in a dimension, the Type 2 technique of Slowly Changing Dimensions (SCD) is used. In this type when values for the current dimension record change, the current record is marked as "closed", and the new record gets inserted.

So, for example, if Mike moves from Los Angeles to New York, then there will be 2 records associated with Mike in the updated dimension table. The latest record will be marked as open/active while the old record will be marked as inactive/closed.

Original Record

Customer IDMaster Customer IDNameCityIsActive
11111111MikeLos Angeles1
22222222JaneSan Francisco1

Updated Record

Customer IDMaster Customer IDNameCityIsActive
11113333MikeLos Angeles0
22222222JaneSan Francisco1
33333333MikeNew York1

Implementation Challenges

There are a few challenges one is likely to face while fetching the latest information related to a customer. Let's take the example of Mike and Jane which I had talked of in the first part while explaining a Type 2 SCD (slowly changing dimensions type 2), further.

Let's suppose both Mike and Jane have one order containing one product when they were living in Los Angeles and San Francisco, respectively. After moving to New York, Mike updated his city information on the e-commerce website but did not purchase any product.

Sales Data

Customer IDOrder IDProduct IDOrder Amount
1111ABCD123599
2222EFGH1234499
1111IJKL12201

In the above scenario, if all we care about is Mike and Jane's Master Customer ID and associated sale, then we can simply join the customer table with the sales table using customer id and roll the information by Master Customer ID. The query should look like below:

Select cu.Master Customer ID,
sum(sl.Order Amount) as Total Amount
From Customer cu
Join Sales sl on cu.Customer ID = sl.Customer ID
Group by cu.Master Customer ID

The result would look like below:

Master Customer IDOrder Amount
3333800
2222400

However, if we need some more recent information about the customer then the above method will not work. There are a few ways by which you can achieve this.

Achieve business success with our data analytics consulting services

Solution 1: Joining the Customer Table Twice

In this method, there is another join needed with the customer table in the above query so that the customers' latest information is selected and displayed in the result. Modified query is copied below:

Select cu.Master Customer ID,
Mast.Name,
Mast.City,
sum(Order Amount) as Total Amount
From Customer cu
Join Sales sl on cu.Customer ID = sl.Customer ID
Join Customer Mast on cu.Master Customer ID = Mast.Customer ID and Mast.IsActive = 1
Group by cu.Master Customer ID,
Mast.Name,
Mast.City

Although the above method is straightforward, it increases the cost of the query, especially if the number of columns in the customer table is more. This results in slow query performance.

The above query still gives faster results for columnar databases such as Redshift, Actian, MonetDB. But row-based databases such as Oracle or Teradata really struggle in producing the output quickly. To mitigate this, 2 views can be created:

  1. One view will have only 2 columns containing Master Customer ID and related customer IDs
  2. Another view will have a snapshot of the latest customer information frequently needed

Two customer joins mentioned in the above query will be replaced by these 2 views.

Solution 2: Using Analytic Functions

We can get similar results by using Analytic functions such as First_value. Modified query is copied below:

Select cu.Master Customer ID,
cu.Name,
FIRST_VALUE(cu.City) OVER (PARTITION BY Master Customer ID ORDER BY Customer ID DESC ROWS UNBOUNDED PRECEDING) AS LatestCity,
sum(NVL(Order Amount,0)) as Total Amount
From Customer cu
Left Join Sales sl on cu.Customer ID = sl.Customer ID
Group by cu.Master Customer ID,
cu.Name,
LatestCity
;

The result is copied below:

Master Customer IDNameCityOrder Amount
3333MikeNew York800
2222JaneSan Francisco400

The above query should produce similar results.

Drawbacks of Analytic Functions Approach

There are some drawbacks to the above method:

  1. Left Join Requirement: We have to create a left join between Customer & Sales. If we create an inner join, then the results will be incorrect.

  2. Additional Rows: Apart from that, the left join will create an additional row for the records which do not have associated sales information.

    Example: In the case of Mike, a record with null order information will be created for Customer ID 3333 since that ID does not have any corresponding sales information. This can be tackled by adding the NVL function to replace the null order amount with 0.

  3. Null Record Handling: If we do not want to aggregate then we can write a nested query where we can add a not null condition in where clause to remove the null record associated with blank order id.

Best Practices for Type 2 SCD Implementation

  1. Performance Optimization: Use views for frequently accessed customer information
  2. Query Optimization: Consider database type (columnar vs row-based) when writing queries
  3. Data Integrity: Always use proper joins to maintain referential integrity
  4. Monitoring: Track query performance and optimize as needed
  5. Documentation: Maintain clear documentation of SCD implementation strategy

Conclusion

It is clear that although SCDs type 2 (slowly changing dimensions type 2) are used to track historical changes one needs to be careful while writing queries & fetching the correct information. The choice between different approaches depends on:

  • Database type (columnar vs row-based)
  • Query performance requirements
  • Data volume and complexity
  • Business requirements for historical tracking

Proper implementation of Type 2 SCD requires careful consideration of these factors to ensure both data accuracy and query performance.

Share this article

Ready to Transform Your Data Strategy?

Get expert guidance on data cleaning, analytics, and business intelligence solutions tailored to your needs.

Tags

#slowly changing dimensions type 2#SCD type 2#data warehouse deployment#dimensional modeling#ETL implementation#data engineering#SQL optimization#analytical functions#customer data management#historical tracking