Skip to main content

Star schemas and aggregate (or summary) fact tables


Aggregate tables can further improve query performance by reducing the number of rows over which higher-level metrics must be aggregated. 
However, the use of aggregate tables with dimension tables is not a valid physical modeling strategy. Whenever aggregation is performed over fact data, it is a general requirement that tables joined to the fact table must be at the same attribute level or at a higher level. If the auxiliary table is at a lower level, fact rows will be replicated prior to aggregation and this will result in inflated metric values (also known as "multiple counting").

With the above Time dimension table, a fact table at the level of Day functions correctly because there is exactly one row in DIM_TIME for each day. To aggregate the facts to the level of Quarter, it is valid to join the fact table to the dimension table and group by the quarter ID from the dimension table.

Sql
select DT.QUARTER_ID,
   max(DT.QUARTER_DESC) Quarter_Desc
   sum(FT.REVENUE) Revenue
from DAY_FACT_TABLE FT
   join DIM_TIME DT
      on (FT.DAY_ID = DT.DAY_ID)
group by DT.QUARTER_ID

If, however, there is an aggregate fact table already at the level of Quarter, the results will not be correct. This is because the query must join on Quarter ID, but the quarter ID is not a unique key of the dimension table. Because any given quarter of a year contains 90, 91 or 92 days, the dimension table will contain that many rows with the same quarter ID. Thus fact rows will be replicated prior to taking the sum, and the sum will be too high.

Sql
select FT.QUARTER_ID,
   max(DT.QUARTER_DESC) Quarter_Desc
   sum(FT.REVENUE) Revenue
from QTR_FACT_TABLE FT
   join DIM_TIME DT
      on (FT.QUARTER_ID = DT.QUARTER_ID)
group by FT.QUARTER_ID

This is a generally recognized problem with star schemas, and is not strictly a MicroStrategy limitation.

Star schemas will function correctly with MicroStrategy SQL Generation Engine 8.x as long as they obey the general data warehousing principle that fact tables should not be at a higher level than the dimension tables to which they are joined.

If aggregate tables are required, it is necessary to provide higher-level lookup tables with unique rows corresponding to each aggregate table's key. Logical views are a way to do this without adding tables or views to the warehouse. For example, LWV_LU_QUARTER may be defined using the following SQL statement:

Sql
select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
from DIM_TIME

 

With this logical view, it becomes possible for MicroStrategy SQL Generation Engine 8.x to query the quarter-level fact table as follows. Since the logical view has distinct rows per quarter, multiple counting will not occur in this query.

Sql
select FT.QUARTER_ID,
   max(LQ.QUARTER_DESC) Quarter_Desc
   sum(FT.REVENUE) Revenue
from QTR_FACT_TABLE FT
   join (select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
         from DIM_TIME) LQ
      on (FT.QUARTER_ID = LQ.QUARTER_ID)
group by FT.QUARTER_ID

For more information on the use of logical views in MicroStrategy SQL Generation Engine 8.1.x and 9.x, consult the MicroStrategy Project Design Guide manual, Appendix B: Logical Tables, "Creating logical tables."

Aggregate tables store pre-summarized totals at a higher level of aggregation than the most granular fact table. They allow reports to be generated from small, rather than large, tables; therefore, performance is enhanced. A successful aggregation strategy seeks to choose aggregate tables that will have the most impact while taking the least amount of space.

Aggregation decisions are driven by the following factors:

  • Usage patterns: Build aggregate tables that are likely to be used the most.
  • Compression ratios: The compression ratio between two tables is defined as the size of the aggregate compared to the size of the smallest table from which the aggregate can be derived.
  • Volatility: Changes in hierarchies over time impact the accuracy of aggregate tables. Sometimes aggregate tables must be rebuilt as a result of changes in dimensions.
A good candidate for aggregation should have at most 10-15 percent of the size of the smallest table from which it is derived.

EXAMPLE:
The MicroStrategy Tutorial project uses aggregate tables by default. A simple metric sum (Revenue) will go to different aggregate tables depending on the attributes on the template.

  1. Create a report with Year on the rows and Revenue on the columns.
  2. Execute the report and view the SQL:

  3. Drill from Year to Item and view the SQL:

The query will go from using ORDER_FACT to ORDER_DETAIL. When Year is on the template, the engine selects the smaller table and the fact is calculated as:

sum(a11.ORDER_AMT)
    instead of:

    sum((a11.QTY_SOLD * (a11.UNIT_PRICE - a11.DISCOUNT)))

    Comments

    1. Is there any solution now to use aggregate tables with star schema without creating logical tables?

      ReplyDelete

    Post a Comment

    Popular posts from this blog

    Fact tables levels tables in Microstrategy explained

    Fact tables levels in Microstrategy: Fact tables are used to store fact data. Fact tables should contain attribute Id's and fact values which are measurable. All the descriptive information about the fact tables should stored in Dimension tables either in Star Schema fashion or Snow Flake Schema fashion which is best suited to your reporting solution. Since attributes provide context for fact values, both fact columns and attribute ID columns are included in fact tables. Facts help to link indirectly related attributes using these attribute ID columns. The attribute ID columns included in a fact table represent the level at which the facts in that table are stored. So the level of a fact table in the Fact_Item_Day_Customer can be the attribute Id's which is at Day, Item & Customer Id level. For example, fact tables containing sales and inventory data look like the tables shown in the following diagram: Base fact columns ver...

    Case functions Microstrategy

    Ca se functions Microstrategy Case functions return specified data in a SQL query based on the evaluation of user-defined conditions. In general, a user specifies a list of conditions and corresponding return values. Case This function evaluates multiple expressions until a condition is determined to be true, then returns a corresponding value. If all conditions are false, a default value is returned.  Case  can be used for categorizing data based on multiple conditions. This is a single-value function. Syntax Case ( Condition1 ,  ReturnValue1 ,  Condition2 , ReturnValue2 ,...,  DefaultValue ) Example Case(([Total Revenue] < 300000), 0, ([Total Revenue] < 600000), 1, 2) sum(Case (Day@DESC in (“Sat”,”Sun”), Sales, 0) {~+} Sum(Case(Category@DESC In("Books","Electronics"),Revenue,0)){~+} CaseV (case vector) CaseV  evaluates a single metric and returns different values according to the results. It can be used to perfo...

    Optimizing queries in Microstrategy using VLDB properties

    Optimizing queries in  Microstrategy using VLDB properties #vldb #vldbproperties The table b elow summarizes the Query Optimizations VLDB properties. Additional details about each property, including examples where necessary, are provided in the sections following the table. Property Description Possible Values Default Value Additional Final Pass Option Determines whether the Engine calculates...

    Microstrategy "Error type: Odbc error. Odbc operation attempted

     "Error type: Odbc error. Odbc operation attempted: SQLExecDirect. [HYT00:0: on SQLHANDLE] [MicroStrategy][ODBC Oracle Wire Protocol driver]Timeout expired" is shown when executing reports from Web When users are trying to execute some reports in MicroStrategy web in particular, they may receive the Error “SQL Generation Complete Index out of range” and “Timeout expired” error as shown below: Possible Causes: One possible cause is that the MicroStrategy Intelligence Server using a cached database connection that was already dropped by the RDBMS. To resolve this: Admin should delete the database connection caches and create a new DSNs in case they are sharing DSNs to connect to different databases. In addition, change the settings for the ‘Connection lifetime’ and the ‘Connection idle time out’.  Follow the steps below to perform the mentioned changes and verify the report after each step and some of the settings require i-server r...

    Settings for Outer Join between metrics in MicroStrategy

    Settings for Outer Join between metrics in MicroStrategy MicroStrategy adopts multi-pass logic to determine the execution plan for a report. This means that every metric is evaluated in separate SQL passes. Outer Joins come into play when MicroStrategy Engine merges the results from all SQL passes into one report. For a multi-pass report, different Outer Join behaviors can give the user completely different results. In addition, report metrics can be of different types which can, in some cases, influence the result of the outer join. In MicroStrategy, there are two settings that users can access to control Outer Join behavior : Formula Join Type and Metric Join Type . Metric Join Type: VLDB Setting at Database Instance Level Report and Template Levels Report Editor > Data > Report Data Options Metric Level   Metric editor > Tools > Metric Join Type Control Join between Metrics Formula Join Type: Only at Compound/Split...

    System Manager workflow to execute on a schedule

    Creating a System Manager workflow to execute on a schedule System Manager workflow can execute on a schedule or after an event has been triggered. This can be accomplished by creating a simple batch file, and scheduling that batch file to execute with a third-party tool like Microsoft Task Scheduler.   Note : To avoid user permission conflicts, the following steps must be performed with highest privileges.   In the below example, the workflow makes the i-server restarts every day.   1. The user must first have a valid workflow. This particular workflow is a template that is delivered out-of-the-box with System Manager.   2. Save the workflow in  .smw  format.   3. In a text editor (such as Notepad), enter the command line statement that the task scheduler should execute.     MASysMgr.exe -w C:\filename.smw” “UserName=User1 “Password=1234”   4. Save the file in  .bat  ...

    Multi-Table Data Import(MTDI) from one or more supported data sources

    Multi-Table Data Import(MTDI) from one or more supported data sources In MicroStrategy Analytics Enterprise Web 10 onewards, users can now simultaneously import two or more tables from one or more supported data sources, this feature is called Multi-Table Data Import (MTDI) which has been renamed as Super Cubes in MSTR 2019 (Does it sound like multisourcing for all the users without admin help?) Currently, all connectors in MicroStrategy Web 10 except " OLAP " and " Search Engine Indices " support Multi-Table Data Import. Users are able to add multiple tables/files when doing data import from single connector, as shown below: Users are also able to combine multiple tables/files from different sources and store them into one single Intelligent Cube, as shown below:

    Fiscal Week, Fiscal Month, Fiscal Quarter and Fiscal Year calculations in Microstrategy

    Fiscal Week, Fiscal Month, Fiscal Quarter and Fiscal Year calculations in Microstrategy FiscalWeek Returns the numeric position of a week within a fiscal year, for a given  input date. This function is useful in financial reporting when the start of the fiscal year is different than the start of the calendar year. Syntax FiscalWeek< firstWeekDay ,  firstMonth >( Date / Time ) Where: • Date / Time  is the input date or timestamp. • firstWeekDay  (default value is 1) is a parameter that determines which day of the week is considered as the first day of the week. You can type an integer value from 1 to 7, with 1 representing Sunday, 2 representing Monday, and so on until 7 representing Saturday. • firstMonth  (default value is 1) is a parameter that determines which month is considered as the start of the fiscal year. You can type an integer value from 1 to 12, with 1 representing January, 2 representing February, and so on until ...

    Microstrategy Latest Video Tutorials including Transaction Services

    Microstrategy Video Tutorials How to define VLDB properties? How to Configure Library Services: How to use Microstrategy for Tableau: Explore Sample Dossiers in Library How to add data to Dossiers How to explore and Search Library How to manage Dossier color palette How to create Combo Charts How to open a Dossier in New Window How to use sub-totals and sorting How to monitor with topology and platform analytics How to optimize performance with Cache MSTR on AWS: Cloud Console How to use bookmarks in Dossiers How to send Dossiers to user libraries How to create Histograms How to define relationship of multi table dataset How to create Line Charts How to add counter to your library: Microstrategy Dataset connection for Qlik Microstrategy Survey Monkey connector Transaction Services Tutorials Using Transaction ...

    Microstrategy Dossiers explained

    Microstrategy  Dossiers With the release of MicroStrategy 10.9, we’ve taken a leap forward in our dashboarding capabilities by simplifying the user experience, adding storytelling, and collaboration.MSTR has  evolved dashboards to the point that they are more than dashboards - they are  interactive, collaborative analytic stories . Ultimately, it was time to go beyond dashboards, both in concept and in name, and so  the've  renamed VI dashboards to  ‘ dossiers ’.  Dossiers can be created by using the new Desktop product or Workstation or simply from the Web interface which replaces Visual Insights. All the existing visual Insights dashboards will be converted to Dossiers   With MicroStrategy 10.9, there was an active focus on making it easier to build dashboards for the widest audience of end users. To achieve this, some key new capabilities were added that make it easier to author, read, interact and collaborate on dashboards ...