Slowly changing dimensions

Slowly changing dimensions in MSTR

Slowly changing dimensions (SCDs) are a common characteristic in many business intelligence environments. Usually, dimensional hierarchies are presented as independent of time. For example, a company may annually reorganize their sales organization or recast their product hierarchy for each retail season. “Slowly” typically means after several months or even years. Indeed, if dimensional relationships change more frequently, it may be better to model separate dimensions.

SCDs are well documented in the data warehousing literature. Ralph Kimball has been particularly influential in describing dimensional modeling techniques for SCDs (see The Data Warehouse Toolkit, for instance). Kimball has further coined different distinctions among ways to handle SCDs in a dimensional model. For example, a Type I SCD presents only the current view of a dimensional relationship, a Type II SCD preserves the history of a dimensional relationship, and so forth.

The discussion below is based on an example sales organization that changes slowly in time as the territories are reorganized; for example, sales representatives switch districts in time.

As-is vs. as-was analysis

One of the capabilities available with slowly changing dimensions is the ability to perform either “as-is” analysis or “as-was” analysis:

•

“As-is” analysis presents a current view of the slowly changing relationships. For example, you can display sales by District according to the way Districts are organized today.

•

“As-was” analysis presents a historical view of the slowly changing relationships. For example, you can display sales by District according to the way Districts were organized at the time the sales transactions occurred.

The techniques described here provide the flexibility to perform either type of analysis. They also provide you an easy way to specify which type of analysis you would like to perform.

Example 1: Compound key with Effective Date and End Date

One way to physically store an SCD is to employ Effective Date and End Date columns that capture the period of time during which each element relationship existed. In the example below, Sales Rep Jones moved from District 37 to District 39 on 1/1/2004, and Kelly moved from District 38 to 39 on 7/1/2004.

For information on compound keys, please refer to Lookup tables: Attribute storage.

LU_SALES_REP

Sales_Rep_ID	Sales_Rep_Name	District_ID	Eff_Dt	End_Dt
1	Jones	37	1/1/1900	12/31/2003
2	Smith	37	1/1/1900	12/31/2099
3	Kelly	38	1/1/1900	6/30/2004
4	Madison	38	1/1/1900	12/31/2099
1	Jones	39	1/1/2004	12/31/2099
3	Kelly	39	7/1/2004	12/31/2099

When using this type of dimensional lookup table, the fact table must include a date field, such as a transaction date.

FACT_TABLE

Sales_Rep_ID	Trans_Dt	Sales
1	9/1/2003	100
2	9/10/2003	200
3	9/15/2003	150
1	3/1/2004	200
2	3/10/2004	250
3	3/15/2004	300
2	9/5/2004	125
3	9/15/2004	275
4	9/20/2004	150

To specify the MicroStrategy schema

Create a logical view to represent just the current District-Sales Rep relationships.

LVW_CURRENT_ORG

select Sales_Rep_ID, District_ID
from LU_SALES_REP
where End_Dt = '12/31/2099'

Create another logical view that performs the “as-was” join between the lookup table and fact table, resulting in a fact view at the District level.

The resulting view is an “as-was” or historical view, which captures the Sales Rep-District relationships that existed at the time the transactions occurred.

LVW_HIST_DISTRICT_SALES

select District_ID, Trans_Dt, sum(sales)
sales 
from LU_SALES_REP L
join FACT_TABLE F
on(L.Sales_Rep_ID = F.Sales_Rep_ID)
where F.Trans_Dt between L.Eff_Dt and
L.End_Dt
group by District_ID, Trans_Dt

Create a table alias LU_CURRENT_DISTRICT for LU_DISTRICT.

Define the following attributes:

•

Sales Rep:

—

@ID = sales_rep_id; @Desc = sales_rep_name

—

Tables: LU_SALES_REP (lookup), LVW_CURRENT_ORG, FACT_TABLE

•

Current District:

—

@ID = district_id; @Desc = district_name

—

Tables: LU_CURRENT_DISTRICT (lookup), LVW_CURRENT_ORG

—

Child: Sales Rep

•

Historical District:

—

@ID = district_id; @Desc = district_name

—

Tables: LU_DISTRICT (lookup), LU_SALES_REP, LVW_HIST_DISTRICT_SALES

—

Child: Sales Rep

•

Date:

—

@ID = date_id, trans_dt

—

Tables: LU_TIME (lookup) , FACT_TABLE, LVW_HIST_DISTRICT_SALES

•

Month:

—

@ID = MONTH_ID

—

Tables: LU_TIME (lookup)

Define the Sales fact:

•

Expression: sales

•

Tables: FACT_TABLE, LVW_HIST_DISTRICT_SALES

Define the metric as required:

•

Sales: SUM(sales)

The result of this is a logical schema that looks like the following:

As-was analysis

Specify the “as-was” analysis by using the Historical District attribute on reports:

•

Report definition: Historical District, Month, Sales

•

Resulting SQL

Select a11.District_ID District_ID,
max(a13.District_Name) District_Name,
a12.Month_ID Month_ID,
sum(a11.SALES) WJXBFS1
From (select District_ID, Trans_dt,sum(sales) sales
from LU_SALES_REP L
join FACT_TABLE F
on (L.Sales_rep_ID = F.Sales_rep_ID)
where F.trans_dt between L.EFF_DT and
L.END_DT
group by District_ID, Trans_dt)
a11
join LU_TIME a12
on (a11.Trans_dt = a12.Date_ID)
join LU_DISTRICT a13
on (a11.District_ID = a13.District_ID)
group by a11.Distrcit_ID,
a12.Month_ID

•

Report results

As-is analysis

Specify the “as-is” analysis by using the Current District attribute on reports:

•

Report definition: Current District, Month, Sales

•

Resulting SQL

select a12.District_ID District_ID,
max (a14.District_Name) District_Name,
a13.Month_ID Month_ID,
sum(a11.SALES) WJXBFS1
from FACT_TABLE a11
join (select Sales_rep_ID, District_ID
from LU_SALES_REP
where END_DT = '12/31/2099')a12

on (a11.Sales_Rep_ID =
a12.Sales_Rep_ID)
join LU_TIME a13
on (a11.Trans_dt = a13.Date_ID)
join LU_DISTRICT a14
on (a12.District_ID = a14.District_ID)
group by a12.District_ID,
a13.Month_ID

• Report result

Example 2: New surrogate key for each changing element

A more flexible way to physically store a SCD is to employ surrogate keys and introduce new rows in the dimension table whenever a dimensional relationship changes. Another common characteristic is to include an indicator field that identifies the current relationship records. An example set of records is shown below.

LU_SALES_REP

Sales_Rep_CD	Sales_Rep_ID	Sales_Rep_Name	District_ID	Current_Flag
1	1	Jones	37	0
2	2	Smith	37	1
3	3	Kelly	38	0
4	4	Madison	38	1
5	1	Jones	39	1
6	3	Kelly	39	1

When using this type of dimensional lookup table, the fact table must also include the surrogate key. A transaction date field may or may not exist.

FACT_TABLE

Sale-Rep_CD	Sale
1	100
2	200
3	150
5	200
2	250
3	300
2	125
6	275
4	150

Specifying the MicroStrategy schema

Create a logical view to represent just the current District-Sales Rep relationship.

LVW_CURRENT_ORG

select Sales_rep_ID, District_ID
from LU_SALES_REP
where Current_flag = 1

Create a table alias LU_CURRENT_DISTRICT for LU_DISTRICT.

Define the following attributes:

•

Sales Rep Surrogate:

—

@ID = sales_rep_cd

—

Tables: LU_SALES_REP (lookup), FACT_TABLE

•

Sales Rep:

—

@ID = sales_rep_id; @Desc = sales_rep_name

—

Tables: LU_SALES_REP (lookup), LVW_CURRENT_ORG

—

Child: Sales Rep Surrogate

•

Current District:

—

@ID = district_id; @Desc = district_name

—

Tables: LU_CURRENT_DISTRICT (lookup), LVW_CURRENT_ORG

—

Child: Sales Rep

•

Historical District:

—

@ID = district_id; @Desc = district_name

—

Tables: LU_DISTRICT (lookup), LU_SALES_REP

—

Child: Sales Rep

•

Date:

—

@ID = date_id, trans_dt

—

Tables: LU_TIME (lookup), FACT_TABLE

•

Month:

—

@ID = MONTH_ID

—

Tables: LU_TIME (lookup)

—

Child: Date

Define the Sales fact:

•

Expression: sales

•

Tables: FACT_TABLE, LVW_HIST_DISTRICT_SALES

Define the metric as required:

•

Sales: SUM(sales)

The result is a logical schema as follows:

As-was analysis

Specify the “as-was” analysis by using the Historical District attribute on reports:

•

Report definition: Historical District, Month, Sales

•

Resulting SQL

select a12.District_ID District_ID,
max(a14.Distrcit_Name) Distrcit_Name,
a13.Month_ID Month_ID,
sum(a11.SALES) WJXBFS1
from FACT_TABLE a11
join LU_SALES_REP a12
on (a11.Sales_Rep_CD =
a12.Sales_Rep_CD)
join LU_TIME a13
on (a11.Trans_dt = a13.Date_ID)
join LU_DISTRICT a14
on (a12.District_ID =
a14.District_ID)
group by a12.District_ID, 
a13.Month_ID

•

Report results

As-is analysis

Specify the “as-is” analysis by using the Current District attribute on reports:

•

Report definition: Current District, Month, Sales

•

Resulting SQL:

select a13.District_ID District_ID,
max(a15.Distrcit_Name) District_Name,
a14.Month_ID Month_ID,
sum(a11.SALES) WJXBFS1
from FACT_TABLE a11
join LU_SALES_REP a12
on (a11.Sales_Rep_CD =
a12.Sales_Rep_CD)
join (select Sales_rep_ID, District_ID
from LU_SALES_REP
where current_flag = 1) 
a13
on (a12.Sales_Rep_ID =
a13.Sales_Rep_ID)
join LU_TIME a14
on (a11.Trans_dt = a14.Date_ID)
join LU_DISTRICT a15
on (a13.District_ID =
a15.District_ID)
group by a13.District_ID,
a14.Month_ID

•

Report result

Microstrategy Analytics

Search This Blog