Microstrategy Analytics

Star schemas and aggregate (or summary) fact tables

Aggregate tables can further improve query performance by reducing the number of rows over which higher-level metrics must be aggregated.

However, the use of aggregate tables with dimension tables is not a valid physical modeling strategy. Whenever aggregation is performed over fact data, it is a general requirement that tables joined to the fact table must be at the same attribute level or at a higher level. If the auxiliary table is at a lower level, fact rows will be replicated prior to aggregation and this will result in inflated metric values (also known as "multiple counting").

With the above Time dimension table, a fact table at the level of Day functions correctly because there is exactly one row in DIM_TIME for each day. To aggregate the facts to the level of Quarter, it is valid to join the fact table to the dimension table and group by the quarter ID from the dimension table.

Sql

select DT.QUARTER_ID,

max(DT.QUARTER_DESC) Quarter_Desc

sum(FT.REVENUE) Revenue

from DAY_FACT_TABLE FT

join DIM_TIME DT

on (FT.DAY_ID = DT.DAY_ID)

group by DT.QUARTER_ID

If, however, there is an aggregate fact table already at the level of Quarter, the results will not be correct. This is because the query must join on Quarter ID, but the quarter ID is not a unique key of the dimension table. Because any given quarter of a year contains 90, 91 or 92 days, the dimension table will contain that many rows with the same quarter ID. Thus fact rows will be replicated prior to taking the sum, and the sum will be too high.

Sql

select FT.QUARTER_ID,

max(DT.QUARTER_DESC) Quarter_Desc

sum(FT.REVENUE) Revenue

from QTR_FACT_TABLE FT

join DIM_TIME DT

on (FT.QUARTER_ID = DT.QUARTER_ID)

group by FT.QUARTER_ID

This is a generally recognized problem with star schemas, and is not strictly a MicroStrategy limitation.

Star schemas will function correctly with MicroStrategy SQL Generation Engine 8.x as long as they obey the general data warehousing principle that fact tables should not be at a higher level than the dimension tables to which they are joined.

If aggregate tables are required, it is necessary to provide higher-level lookup tables with unique rows corresponding to each aggregate table's key. Logical views are a way to do this without adding tables or views to the warehouse. For example, LWV_LU_QUARTER may be defined using the following SQL statement:

Sql

select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID

from DIM_TIME

With this logical view, it becomes possible for MicroStrategy SQL Generation Engine 8.x to query the quarter-level fact table as follows. Since the logical view has distinct rows per quarter, multiple counting will not occur in this query.

Sql

select FT.QUARTER_ID,

max(LQ.QUARTER_DESC) Quarter_Desc

sum(FT.REVENUE) Revenue

from QTR_FACT_TABLE FT

join (select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID

from DIM_TIME) LQ

on (FT.QUARTER_ID = LQ.QUARTER_ID)

group by FT.QUARTER_ID

For more information on the use of logical views in MicroStrategy SQL Generation Engine 8.1.x and 9.x, consult the MicroStrategy Project Design Guide manual, Appendix B: Logical Tables, "Creating logical tables."

Aggregate tables store pre-summarized totals at a higher level of aggregation than the most granular fact table. They allow reports to be generated from small, rather than large, tables; therefore, performance is enhanced. A successful aggregation strategy seeks to choose aggregate tables that will have the most impact while taking the least amount of space.

Aggregation decisions are driven by the following factors:

Usage patterns: Build aggregate tables that are likely to be used the most.
Compression ratios: The compression ratio between two tables is defined as the size of the aggregate compared to the size of the smallest table from which the aggregate can be derived.
Volatility: Changes in hierarchies over time impact the accuracy of aggregate tables. Sometimes aggregate tables must be rebuilt as a result of changes in dimensions.

A good candidate for aggregation should have at most 10-15 percent of the size of the smallest table from which it is derived.

EXAMPLE:

The MicroStrategy Tutorial project uses aggregate tables by default. A simple metric sum (Revenue) will go to different aggregate tables depending on the attributes on the template.

Create a report with Year on the rows and Revenue on the columns.
Execute the report and view the SQL:
Drill from Year to Item and view the SQL:

The query will go from using ORDER_FACT to ORDER_DETAIL. When Year is on the template, the engine selects the smaller table and the fact is calculated as:

sum(a11.ORDER_AMT)

instead of:

sum((a11.QTY_SOLD * (a11.UNIT_PRICE - a11.DISCOUNT)))

Comments

hinaFebruary 5, 2021 at 12:19 PM
Is there any solution now to use aggregate tables with star schema without creating logical tables?
ReplyDelete
Replies

Add comment

Microstrategy Analytics

Search This Blog

Star schemas and aggregate (or summary) fact tables

Labels

Comments

Post a Comment

Popular posts from this blog

Case functions Microstrategy

Microstrategy Dossiers explained

Microstrategy Caches explained

Derived metric based on attribute values

mstrio – Python and R wrappers for the MicroStrategy

MicroStrategy URL API Parameters

Export a Report Services document to Excel with formatting using URL API

Microstrategy Dashboard performance improvements steps

Microstrategy Custom number formatting symbols

Prompt-in-prompt(Nested Prompts) in Microstrategy