Star schemas and aggregate (or summary) fact tables
Aggregate tables can further improve query performance by reducing the number of rows over which higher-level metrics must be aggregated.
However, the use of aggregate tables with dimension tables is not a valid physical modeling strategy. Whenever aggregation is performed over fact data, it is a general requirement that tables joined to the fact table must be at the same attribute level or at a higher level. If the auxiliary table is at a lower level, fact rows will be replicated prior to aggregation and this will result in inflated metric values (also known as "multiple counting").
With the above Time dimension table, a fact table at the level of Day functions correctly because there is exactly one row in DIM_TIME for each day. To aggregate the facts to the level of Quarter, it is valid to join the fact table to the dimension table and group by the quarter ID from the dimension table.
Sql
select DT.QUARTER_ID,
max(DT.QUARTER_DESC) Quarter_Desc
sum(FT.REVENUE) Revenue
from DAY_FACT_TABLE FT
join DIM_TIME DT
on (FT.DAY_ID = DT.DAY_ID)
group by DT.QUARTER_ID
If, however, there is an aggregate fact table already at the level of Quarter, the results will not be correct. This is because the query must join on Quarter ID, but the quarter ID is not a unique key of the dimension table. Because any given quarter of a year contains 90, 91 or 92 days, the dimension table will contain that many rows with the same quarter ID. Thus fact rows will be replicated prior to taking the sum, and the sum will be too high.
Sql
select FT.QUARTER_ID,
max(DT.QUARTER_DESC) Quarter_Desc
sum(FT.REVENUE) Revenue
from QTR_FACT_TABLE FT
join DIM_TIME DT
on (FT.QUARTER_ID = DT.QUARTER_ID)
group by FT.QUARTER_ID
This is a generally recognized problem with star schemas, and is not strictly a MicroStrategy limitation.
Star schemas will function correctly with MicroStrategy SQL Generation Engine 8.x as long as they obey the general data warehousing principle that fact tables should not be at a higher level than the dimension tables to which they are joined.
If aggregate tables are required, it is necessary to provide higher-level lookup tables with unique rows corresponding to each aggregate table's key. Logical views are a way to do this without adding tables or views to the warehouse. For example, LWV_LU_QUARTER may be defined using the following SQL statement:
Sql
select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
from DIM_TIME
With this logical view, it becomes possible for MicroStrategy SQL Generation Engine 8.x to query the quarter-level fact table as follows. Since the logical view has distinct rows per quarter, multiple counting will not occur in this query.
Sql
select FT.QUARTER_ID,
max(LQ.QUARTER_DESC) Quarter_Desc
sum(FT.REVENUE) Revenue
from QTR_FACT_TABLE FT
join (select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
from DIM_TIME) LQ
on (FT.QUARTER_ID = LQ.QUARTER_ID)
group by FT.QUARTER_ID
For more information on the use of logical views in MicroStrategy SQL Generation Engine 8.1.x and 9.x, consult the MicroStrategy Project Design Guide manual, Appendix B: Logical Tables, "Creating logical tables."
Aggregate tables store pre-summarized totals at a higher level of aggregation than the most granular fact table. They allow reports to be generated from small, rather than large, tables; therefore, performance is enhanced. A successful aggregation strategy seeks to choose aggregate tables that will have the most impact while taking the least amount of space.
Aggregation decisions are driven by the following factors:
- Usage patterns: Build aggregate tables that are likely to be used the most.
- Compression ratios: The compression ratio between two tables is defined as the size of the aggregate compared to the size of the smallest table from which the aggregate can be derived.
- Volatility: Changes in hierarchies over time impact the accuracy of aggregate tables. Sometimes aggregate tables must be rebuilt as a result of changes in dimensions.
EXAMPLE:
The MicroStrategy Tutorial project uses aggregate tables by default. A simple metric sum (Revenue) will go to different aggregate tables depending on the attributes on the template.
- Create a report with Year on the rows and Revenue on the columns.
- Execute the report and view the SQL:
- Drill from Year to Item and view the SQL:
The query will go from using ORDER_FACT to ORDER_DETAIL. When Year is on the template, the engine selects the smaller table and the fact is calculated as:
sum(a11.ORDER_AMT)
instead of:
sum((a11.QTY_SOLD * (a11.UNIT_PRICE - a11.DISCOUNT)))
Is there any solution now to use aggregate tables with star schema without creating logical tables?
ReplyDelete