How to Deploy Slowly Changing Dimensions?

Miscellaneous

Hemant Warudkar | 03 Dec 2019

How to Deploy Slowly Changing Dimensions?

In the previous blog, I had explained the different types of Slowly Changing Dimensions (SCD) that can be used in dimensional modeling for tracking changes. In this blog, I shall be focusing on the Type 2 form of SCD (Slowly Changing Dimensions) and the issues associated with its implementation.

Type 2 Slowly Changing Dimensions (SCD) Implementation Recap

Before that, using an example, I would like to do a quick recap on Type 2 Slowly Changing Dimensions (SCD) implementation. While building star schema’s in a data warehouse, the dimensions tables are joined with the fact tables. To track the changes in a dimension, the Type 2 technique of Slowly Changing Dimensions (SCD) is used. In this type when values for the current dimension record change, the current record is marked as “closed”, and the new record gets inserted.
So, for example, if Mike moves from Los Angeles to New York, then there will be 2 records associated with Mike in the updated dimension table. The latest record will be marked as open/active while the old record will be marked as inactive/closed.

Original Record

Customer ID	Master Customer ID	Name	City	IsActive
1111	1111	Mike	Los Angeles	1
2222	2222	Jane	San Francisco	1

Updated Record

Customer ID	Master Customer ID	Name	City	IsActive
1111	3333	Mike	Los Angeles	0
2222	2222	Jane	San Francisco	1
3333	3333	Mike	New York	1

There are a few challenges one is likely to face while fetching the latest information related to a customer. Let’s take the example of Mike and Jane which I had talked of in the first part while explaining a Type 2 SCD (slowly changing dimensions type 2), further.

Let’s suppose both Mike and Jane have one order containing one product when they were living in Los Angeles and San Francisco, respectively. After moving to New York, Mike updated his city information on the e-commerce website but did not purchase any product.

Customer ID	Order ID	Product ID	Order Amount
1111	ABCD	123	599
2222	EFGH	1234	499
1111	IJKL	12	201

In the above scenario, if all we care about is Mike and Jane’s Master Customer ID and associated sale, then we can simply join the customer table with the sales table using customer id and roll the information by Master Customer ID. The query should look like below

Select cu.Master Customer ID,
sum(sl.Order Amount) as Total Amount
From Customer cu
Join Sales sl on cu.Customer ID = sl.Customer ID
Group by cu.Master Customer ID

The result would look like below.

Master Customer ID	Order Amount
3333	800
2222	400

However, if we need some more recent information about the customer then the above method will not work. There are a few ways by which you can achieve this.

Achieve business success with our data analytics consulting services

LEARN MORE

Joining the customer table twice
In this method, there is another join needed with the customer table in the above query so that the customers’ latest information is selected and displayed in the result. Modified query is copied below:

Select cu.Master Customer ID,
Mast.Name,
Mast.City,
sum(Order Amount) as Total Amount
From Customer cu
Join Sales sl on cu.Customer ID = sl.Customer ID
Join Customer Mast on cu.Master Customer ID = Mast.Customer ID and Mast.IsActive = 1
Group by cu.Master Customer ID,
Mast.Name,
Mast.City

Although the above method is straightforward, it increases the cost of the query, especially if the number of columns in the customer table is more. This results in slow query performance.

The above query still gives faster results for columnar databases such as Redshift, Actian, MonetDB. But row-based databases such as Oracle or Teradata really struggle in producing the output quickly. To mitigate this, 2 views can be created. One will have only 2 columns containing Master Customer ID and related customer IDs. The other view will have a snapshot of the latest customer information frequently needed. Two customer joins mentioned in the above query will be replaced by these 2 views.

Using Analytic Functions:

We can get similar results by using Analytic functions such as First_value. Modified query is copied below:

Select cu.Master Customer ID,
cu.Name,
FIRST_VALUE(cu.City) OVER (PARTITION BY Master Customer ID ORDER BY Customer ID DESC ROWS UNBOUNDED PRECEDING) AS LatestCity,
sum(NVL(Order Amount,0)) as Total Amount
From Customer cu
Left Join Sales sl on cu.Customer ID = sl.Customer ID
Group by cu.Master Customer ID,
cu.Name,
LatestCity
;

The result is copied below:

Master Customer ID	Name	City	Order Amount
3333	Mike	New York	800
2222	Jane	San Francisco	400

The above query should produce similar results.

There are some drawbacks to the above method. We have to create a left join between Customer & Sales. If we create an inner join, then the results will be incorrect. Apart from that, the left join will create an additional row for the records which do not have associated sales information.

E.g.: In the case of Mike, a record with null order information will be created for Customer ID 3333 since that ID does not have any corresponding sales information. This can be tackled by adding the NVL function to replace the null order amount with 0. If we do not want to aggregate then we can write a nested query where we can add a not null condition in where clause to remove the null record associated with blank order id.

In conclusion: It is clear that although SCDs type 2 (slowly changing dimensions type 2) are used to track historical changes one needs to be careful while writing queries & fetching the correct information.

An Engine That Drives Customer Intelligence

Oyster is not just a customer data platform (CDP). It is the world’s first customer insights platform (CIP). Why? At its core is your customer. Oyster is a “data unifying software.”

Explore More

Download case studies

See how Express Analytics helped a department store and a restaurant chain bridge the digital-physical divide.

download

How to Deploy Slowly Changing Dimensions?

Type 2 Slowly Changing Dimensions (SCD) Implementation Recap

Achieve business success with our data analytics consulting services

An Engine That Drives Customer Intelligence

Leave a comment Cancel reply

Recent Posts

Download case studies

Do you want more customers

How to Deploy Slowly Changing Dimensions?

Type 2 Slowly Changing Dimensions (SCD) Implementation Recap

Achieve business success with our data analytics consulting services

An Engine That Drives Customer Intelligence

Liked This Article?

Leave a comment Cancel reply

Recent Posts

Download case studies

Do you want more customers