
Star schemas and aggregate (or summary) fact tables


Aggregate tables can further improve query performance by reducing the number of rows over which higher-level metrics must be aggregated. 
However, the use of aggregate tables with dimension tables is not a valid physical modeling strategy. Whenever aggregation is performed over fact data, it is a general requirement that tables joined to the fact table must be at the same attribute level or at a higher level. If the auxiliary table is at a lower level, fact rows will be replicated prior to aggregation and this will result in inflated metric values (also known as "multiple counting").

Consider a Time dimension table, DIM_TIME, keyed by day. A fact table at the level of Day functions correctly because there is exactly one row in DIM_TIME for each day. To aggregate the facts to the level of Quarter, it is valid to join the fact table to the dimension table and group by the quarter ID from the dimension table.

Sql
select DT.QUARTER_ID,
   max(DT.QUARTER_DESC) Quarter_Desc,
   sum(FT.REVENUE) Revenue
from DAY_FACT_TABLE FT
   join DIM_TIME DT
      on (FT.DAY_ID = DT.DAY_ID)
group by DT.QUARTER_ID

If, however, the query runs against an aggregate fact table that is already at the level of Quarter, joining it to DIM_TIME will not return correct results. The query must join on Quarter ID, but the quarter ID is not a unique key of the dimension table. Because any given quarter of a year contains 90, 91 or 92 days, the dimension table contains that many rows with the same quarter ID. Each fact row is therefore replicated that many times before the sum is taken, and the sum is too high.

Sql
select FT.QUARTER_ID,
   max(DT.QUARTER_DESC) Quarter_Desc,
   sum(FT.REVENUE) Revenue
from QTR_FACT_TABLE FT
   join DIM_TIME DT
      on (FT.QUARTER_ID = DT.QUARTER_ID)
group by FT.QUARTER_ID
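
The inflation described above can be reproduced on a toy dataset. The following is a minimal sketch, assuming an in-memory SQLite database and hypothetical data (a single quarter with only three days, each with revenue 10); the table and column names follow the queries above, but the values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table DIM_TIME (DAY_ID integer primary key, QUARTER_ID integer, QUARTER_DESC text);
    create table DAY_FACT_TABLE (DAY_ID integer, REVENUE real);
    create table QTR_FACT_TABLE (QUARTER_ID integer, REVENUE real);
""")

# Hypothetical toy data: quarter 20231 has only 3 days, revenue 10 per day.
days = [(1, 20231, "Q1 2023"), (2, 20231, "Q1 2023"), (3, 20231, "Q1 2023")]
conn.executemany("insert into DIM_TIME values (?, ?, ?)", days)
conn.executemany("insert into DAY_FACT_TABLE values (?, ?)", [(d[0], 10.0) for d in days])
conn.execute("insert into QTR_FACT_TABLE values (20231, 30.0)")  # pre-summed quarter total

# Day-level fact joined on the dimension's unique key: correct total.
day_level = conn.execute("""
    select DT.QUARTER_ID, sum(FT.REVENUE)
    from DAY_FACT_TABLE FT join DIM_TIME DT on (FT.DAY_ID = DT.DAY_ID)
    group by DT.QUARTER_ID
""").fetchall()

# Quarter-level fact joined on a non-unique column: each fact row matches 3 dimension rows.
qtr_level = conn.execute("""
    select FT.QUARTER_ID, sum(FT.REVENUE)
    from QTR_FACT_TABLE FT join DIM_TIME DT on (FT.QUARTER_ID = DT.QUARTER_ID)
    group by FT.QUARTER_ID
""").fetchall()

print(day_level)  # [(20231, 30.0)]
print(qtr_level)  # [(20231, 90.0)]
```

The quarter-level result is inflated by exactly the number of dimension rows that share the quarter ID (three here, roughly ninety in a real calendar).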

This is a generally recognized problem with star schemas, and is not strictly a MicroStrategy limitation.

Star schemas will function correctly with MicroStrategy SQL Generation Engine 8.x as long as they obey the general data warehousing principle that fact tables should not be at a higher level than the dimension tables to which they are joined.

If aggregate tables are required, it is necessary to provide higher-level lookup tables with unique rows corresponding to each aggregate table's key. Logical views are a way to do this without adding tables or views to the warehouse. For example, LWV_LU_QUARTER may be defined using the following SQL statement:

Sql
select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
from DIM_TIME

 

With this logical view, it becomes possible for MicroStrategy SQL Generation Engine 8.x to query the quarter-level fact table as follows. Since the logical view has distinct rows per quarter, multiple counting will not occur in this query.

Sql
select FT.QUARTER_ID,
   max(LQ.QUARTER_DESC) Quarter_Desc,
   sum(FT.REVENUE) Revenue
from QTR_FACT_TABLE FT
   join (select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
         from DIM_TIME) LQ
      on (FT.QUARTER_ID = LQ.QUARTER_ID)
group by FT.QUARTER_ID
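
As a rough check of this behavior, the sketch below runs the same query shape against an in-memory SQLite database. The data is hypothetical (two quarters, three days in total). It also verifies one assumption the technique relies on: `select distinct` only yields one row per quarter if QUARTER_DESC and YEAR_ID are consistent for every day of that quarter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table DIM_TIME (DAY_ID integer primary key, QUARTER_ID integer,
                           QUARTER_DESC text, YEAR_ID integer);
    create table QTR_FACT_TABLE (QUARTER_ID integer, REVENUE real);
    -- Hypothetical toy data: two days in Q1, one day in Q2.
    insert into DIM_TIME values (1, 20231, 'Q1 2023', 2023),
                                (2, 20231, 'Q1 2023', 2023),
                                (3, 20232, 'Q2 2023', 2023);
    insert into QTR_FACT_TABLE values (20231, 30.0), (20232, 50.0);
""")

# Sanity check: the derived lookup must have exactly one row per QUARTER_ID.
dupes = conn.execute("""
    select QUARTER_ID
    from (select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID from DIM_TIME)
    group by QUARTER_ID
    having count(*) > 1
""").fetchall()

# Quarter-level fact joined to the distinct lookup: no row replication.
rows = conn.execute("""
    select FT.QUARTER_ID, max(LQ.QUARTER_DESC) Quarter_Desc, sum(FT.REVENUE) Revenue
    from QTR_FACT_TABLE FT
       join (select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
             from DIM_TIME) LQ
          on (FT.QUARTER_ID = LQ.QUARTER_ID)
    group by FT.QUARTER_ID
    order by FT.QUARTER_ID
""").fetchall()

print(dupes)  # []
print(rows)   # [(20231, 'Q1 2023', 30.0), (20232, 'Q2 2023', 50.0)]
```

If the duplicate check ever returns rows, the lookup is no longer unique per quarter and multiple counting would reappear.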

For more information on the use of logical views in MicroStrategy SQL Generation Engine 8.1.x and 9.x, consult the MicroStrategy Project Design Guide manual, Appendix B: Logical Tables, "Creating logical tables."

Aggregate tables store pre-summarized totals at a higher level of aggregation than the most granular fact table. They allow reports to be generated from small, rather than large, tables; therefore, performance is enhanced. A successful aggregation strategy seeks to choose aggregate tables that will have the most impact while taking the least amount of space.

Aggregation decisions are driven by the following factors:

  • Usage patterns: Build aggregate tables that are likely to be used the most.
  • Compression ratios: The compression ratio between two tables is defined as the size of the aggregate compared to the size of the smallest table from which the aggregate can be derived.
  • Volatility: Changes in hierarchies over time impact the accuracy of aggregate tables. Sometimes aggregate tables must be rebuilt as a result of changes in dimensions.

A good candidate for aggregation should be at most 10-15 percent of the size of the smallest table from which it can be derived.
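
The size guideline can be expressed as a small check. This is an illustrative sketch only: the function names are hypothetical, row counts are used as a stand-in for table size, and the 15 percent default simply takes the upper end of the range quoted above.

```python
def compression_ratio(aggregate_rows: int, base_rows: int) -> float:
    """Size of the aggregate relative to the smallest table it can be derived from."""
    if base_rows <= 0:
        raise ValueError("base_rows must be positive")
    return aggregate_rows / base_rows


def is_good_candidate(aggregate_rows: int, base_rows: int, threshold: float = 0.15) -> bool:
    """True when the aggregate is no larger than `threshold` of its source table."""
    return compression_ratio(aggregate_rows, base_rows) <= threshold


# Hypothetical row counts for two candidate aggregates of a 1,000,000-row table.
print(is_good_candidate(120_000, 1_000_000))  # True  (ratio 0.12)
print(is_good_candidate(400_000, 1_000_000))  # False (ratio 0.40)
```

Usage patterns and volatility still have to be weighed separately; a small aggregate that no report uses, or one that must be rebuilt after every hierarchy change, may not be worth keeping.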

EXAMPLE:
The MicroStrategy Tutorial project uses aggregate tables by default. A simple metric such as Sum(Revenue) is resolved against different tables depending on the attributes on the template.

  1. Create a report with Year on the rows and Revenue on the columns.
  2. Execute the report and view the SQL:

  3. Drill from Year to Item and view the SQL:

The query will go from using ORDER_FACT to ORDER_DETAIL. When Year is on the template, the engine selects the smaller table and the fact is calculated as:

sum(a11.ORDER_AMT)

instead of:

sum((a11.QTY_SOLD * (a11.UNIT_PRICE - a11.DISCOUNT)))
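
The two expressions should return the same total when the aggregate table is a faithful roll-up of the detail table. The sketch below illustrates that relationship in SQLite under loud assumptions: the column layout, the sample rows, and the idea that ORDER_FACT holds one pre-summed row per order are all hypothetical, and the real Tutorial schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical detail table: one row per order line.
    create table ORDER_DETAIL (ORDER_ID integer, ITEM_ID integer,
                               QTY_SOLD integer, UNIT_PRICE integer, DISCOUNT integer);
    insert into ORDER_DETAIL values (1, 100, 2, 10, 1),
                                    (1, 101, 1, 20, 0),
                                    (2, 100, 3, 10, 2);

    -- Hypothetical aggregate: revenue pre-summed to one row per order.
    create table ORDER_FACT as
        select ORDER_ID, sum(QTY_SOLD * (UNIT_PRICE - DISCOUNT)) as ORDER_AMT
        from ORDER_DETAIL
        group by ORDER_ID;
""")

# Expression used when the smaller aggregate table can answer the report.
from_aggregate = conn.execute(
    "select sum(a11.ORDER_AMT) from ORDER_FACT a11"
).fetchone()[0]

# Expression used when the report needs Item, which only the detail table carries.
from_detail = conn.execute(
    "select sum((a11.QTY_SOLD * (a11.UNIT_PRICE - a11.DISCOUNT))) from ORDER_DETAIL a11"
).fetchone()[0]

print(from_aggregate, from_detail)  # 62 62
```

The aggregate answers the same question from fewer rows, but it has no ITEM_ID column, which is why drilling to Item forces the query back to the detail table.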

