Skip to main content

Star schemas and aggregate (or summary) fact tables


Aggregate tables can further improve query performance by reducing the number of rows over which higher-level metrics must be aggregated. 
However, the use of aggregate tables with dimension tables is not a valid physical modeling strategy. Whenever aggregation is performed over fact data, it is a general requirement that tables joined to the fact table must be at the same attribute level or at a higher level. If the auxiliary table is at a lower level, fact rows will be replicated prior to aggregation and this will result in inflated metric values (also known as "multiple counting").

With the above Time dimension table, a fact table at the level of Day functions correctly because there is exactly one row in DIM_TIME for each day. To aggregate the facts to the level of Quarter, it is valid to join the fact table to the dimension table and group by the quarter ID from the dimension table.

Sql
select DT.QUARTER_ID,
   max(DT.QUARTER_DESC) Quarter_Desc
   sum(FT.REVENUE) Revenue
from DAY_FACT_TABLE FT
   join DIM_TIME DT
      on (FT.DAY_ID = DT.DAY_ID)
group by DT.QUARTER_ID

If, however, there is an aggregate fact table already at the level of Quarter, the results will not be correct. This is because the query must join on Quarter ID, but the quarter ID is not a unique key of the dimension table. Because any given quarter of a year contains 90, 91 or 92 days, the dimension table will contain that many rows with the same quarter ID. Thus fact rows will be replicated prior to taking the sum, and the sum will be too high.

Sql
select FT.QUARTER_ID,
   max(DT.QUARTER_DESC) Quarter_Desc
   sum(FT.REVENUE) Revenue
from QTR_FACT_TABLE FT
   join DIM_TIME DT
      on (FT.QUARTER_ID = DT.QUARTER_ID)
group by FT.QUARTER_ID

This is a generally recognized problem with star schemas, and is not strictly a MicroStrategy limitation.

Star schemas will function correctly with MicroStrategy SQL Generation Engine 8.x as long as they obey the general data warehousing principle that fact tables should not be at a higher level than the dimension tables to which they are joined.

If aggregate tables are required, it is necessary to provide higher-level lookup tables with unique rows corresponding to each aggregate table's key. Logical views are a way to do this without adding tables or views to the warehouse. For example, LWV_LU_QUARTER may be defined using the following SQL statement:

Sql
select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
from DIM_TIME

 

With this logical view, it becomes possible for MicroStrategy SQL Generation Engine 8.x to query the quarter-level fact table as follows. Since the logical view has distinct rows per quarter, multiple counting will not occur in this query.

Sql
select FT.QUARTER_ID,
   max(LQ.QUARTER_DESC) Quarter_Desc
   sum(FT.REVENUE) Revenue
from QTR_FACT_TABLE FT
   join (select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
         from DIM_TIME) LQ
      on (FT.QUARTER_ID = LQ.QUARTER_ID)
group by FT.QUARTER_ID

For more information on the use of logical views in MicroStrategy SQL Generation Engine 8.1.x and 9.x, consult the MicroStrategy Project Design Guide manual, Appendix B: Logical Tables, "Creating logical tables."

Aggregate tables store pre-summarized totals at a higher level of aggregation than the most granular fact table. They allow reports to be generated from small, rather than large, tables; therefore, performance is enhanced. A successful aggregation strategy seeks to choose aggregate tables that will have the most impact while taking the least amount of space.

Aggregation decisions are driven by the following factors:

  • Usage patterns: Build aggregate tables that are likely to be used the most.
  • Compression ratios: The compression ratio between two tables is defined as the size of the aggregate compared to the size of the smallest table from which the aggregate can be derived.
  • Volatility: Changes in hierarchies over time impact the accuracy of aggregate tables. Sometimes aggregate tables must be rebuilt as a result of changes in dimensions.
A good candidate for aggregation should have at most 10-15 percent of the size of the smallest table from which it is derived.

EXAMPLE:
The MicroStrategy Tutorial project uses aggregate tables by default. A simple metric sum (Revenue) will go to different aggregate tables depending on the attributes on the template.

  1. Create a report with Year on the rows and Revenue on the columns.
  2. Execute the report and view the SQL:

  3. Drill from Year to Item and view the SQL:

The query will go from using ORDER_FACT to ORDER_DETAIL. When Year is on the template, the engine selects the smaller table and the fact is calculated as:

sum(a11.ORDER_AMT)
    instead of:

    sum((a11.QTY_SOLD * (a11.UNIT_PRICE - a11.DISCOUNT)))

    Comments

    1. Is there any solution now to use aggregate tables with star schema without creating logical tables?

      ReplyDelete

    Post a Comment

    Popular posts from this blog

    Microstrategy Custom number formatting symbols

    Custom number formatting symbols If none of the built-in number formats meet your needs, you can create your own custom format in the Number tab of the Format Cells dialog box. Select  Custom  as the Category and create the format using the number format symbols listed in the table below. Each custom format can have up to four optional sections, one each for: Positive numbers Negative numbers Zeros Text Each section is optional. Separate the sections by semicolons, as shown in the example below: #,###;(#,###);0;"Error: Entry must be numeric" For more examples, see  Custom number formatting examples . To jump to a section of the formatting symbol table, click one of the following: Numeric symbols Character/text symbols Date and time symbols Text color symbols Currency symbols Conditional symbols Numeric symbols For details on how numeric symbols apply to the Big Decimal data type, refer to the  Project Design Guide . ...

    MicroStrategy URL API Parameters

    MicroStrategy URL Structure The following table summarizes the root URL structure used for every request to MicroStrategy Web. Environment Main Application URL Administration URL J2EE http://webserver/MicroStrategy/servlet/mstrWeb http://webserver/MicroStrategy/servlet/mstrWebAdmin .NET http://webserver/MicroStrategy/asp/Main.aspx http://webserver/MicroStrategy/asp/Admin.aspx Every request sent to MicroStrategy Web calls a central controller. Parameters are appended to  Main.aspx  or  mstrWeb  (in a .NET and J2EE environment, respectively) to indicate to the controller how the request should be internally forwarded and handled. The following examples show a URL for accessing a MicroStrategy folder when the user does not have an existing session. The URL contains not only the parameters needed to connect to MicroStrategy Web, but also the parameters needed to log on and create a session. J2EE environment: <a href="http:...

    Custom Tooltips in Microstrategy developer and Web

    Custom Tooltips in Microstrategy developer and Web The following table describes the macros you can use to customize graph tooltips in both MicroStrategy Developer and MicroStrategy Web: Macro Information Displayed {&TOOLTIP} All relevant labels and values associated with a graph item. {&GROUPLABEL} Name of the graph item's category. This value is often the graph item's attribute element information, as attributes are commonly used as the categories of graph reports. {&SERIESLABEL} Name of the graph item’s series. This value is often the graph item's metric name information, as metrics are commonly used as the series of graph reports. {&VALUE} The value of a given data point. {&XVALUE} The X-value of a data point. Only applicable to Bubble charts and Scatter plots. {&YVALUE} The Y-value of a data point. Only applicable to Bubble charts and Scatter plots. {&ZVALUE} The Z-value of a data point. Only applicable to Bubble charts and Scatter plots. {...

    Case functions Microstrategy

    Ca se functions Microstrategy Case functions return specified data in a SQL query based on the evaluation of user-defined conditions. In general, a user specifies a list of conditions and corresponding return values. Case This function evaluates multiple expressions until a condition is determined to be true, then returns a corresponding value. If all conditions are false, a default value is returned.  Case  can be used for categorizing data based on multiple conditions. This is a single-value function. Syntax Case ( Condition1 ,  ReturnValue1 ,  Condition2 , ReturnValue2 ,...,  DefaultValue ) Example Case(([Total Revenue] < 300000), 0, ([Total Revenue] < 600000), 1, 2) sum(Case (Day@DESC in (“Sat”,”Sun”), Sales, 0) {~+} Sum(Case(Category@DESC In("Books","Electronics"),Revenue,0)){~+} CaseV (case vector) CaseV  evaluates a single metric and returns different values according to the results. It can be used to perfo...

    Control the display of null and zero metric values

    Show   Control the display of null and zero metric values in a grid report You can determine how to display or hide rows and columns in a grid report that consist only of null or zero metric values. You can have MicroStrategy hide the rows and columns in the following ways: Hide rows and columns that consist only of null metric values Hide rows and columns that consist only of zero metric values Hide rows and columns that consist only of null or zero metric values (default) Once you have defined how MicroStrategy hides null and zero metric values in the grid, you can quickly show or hide the grid using the Hide Nulls/Zeros option in the Data menu, as described below, or by clicking the  Hide Nulls/Zeros  icon  in the Data toolbar. To determine how null and zero metric values are displayed or hidden in a grid report Open the report in Edit mode. From the  Tools  menu, select  Report Options . The Report Options...

    Multi-Table Data Import(MTDI) from one or more supported data sources

    Multi-Table Data Import(MTDI) from one or more supported data sources In MicroStrategy Analytics Enterprise Web 10 onewards, users can now simultaneously import two or more tables from one or more supported data sources, this feature is called Multi-Table Data Import (MTDI) which has been renamed as Super Cubes in MSTR 2019 (Does it sound like multisourcing for all the users without admin help?) Currently, all connectors in MicroStrategy Web 10 except " OLAP " and " Search Engine Indices " support Multi-Table Data Import. Users are able to add multiple tables/files when doing data import from single connector, as shown below: Users are also able to combine multiple tables/files from different sources and store them into one single Intelligent Cube, as shown below:

    Fiscal Week, Fiscal Month, Fiscal Quarter and Fiscal Year calculations in Microstrategy

    Fiscal Week, Fiscal Month, Fiscal Quarter and Fiscal Year calculations in Microstrategy FiscalWeek Returns the numeric position of a week within a fiscal year, for a given  input date. This function is useful in financial reporting when the start of the fiscal year is different than the start of the calendar year. Syntax FiscalWeek< firstWeekDay ,  firstMonth >( Date / Time ) Where: • Date / Time  is the input date or timestamp. • firstWeekDay  (default value is 1) is a parameter that determines which day of the week is considered as the first day of the week. You can type an integer value from 1 to 7, with 1 representing Sunday, 2 representing Monday, and so on until 7 representing Saturday. • firstMonth  (default value is 1) is a parameter that determines which month is considered as the start of the fiscal year. You can type an integer value from 1 to 12, with 1 representing January, 2 representing February, and so on until ...

    Execute Integrity manager test from Command line

    Execute Integrity manager test from Command line  MSTR Integrity Manager allows the user to execute a test without having to load the GUI, or to schedule a test to run later at specific times or dates. Go over using Windows AT command at: https://support.microsoft.com/en-us/help/313565/how-to-use-the-at-command-to-schedule-tasks https://www.lifewire.com/at-command-2618090 Here are the prerequisites to execute a test from the command line: Create a test and saved using the Integrity Manager graphical interface. Make sure that the users has the ' Use Integrity Manager ' privilege for that project(provided by the administrator) and the ' Execute ' permission for the reports to be tested. Keep in mind that MicroStrategy Integrity Manager can only test three-tier projects, i.e., projects which are connected to a MicroStrategy i- Server. Projects in Direct Connection (two-tier) mode cannot be tested with this tool...

    Predictive modelling in Data Science using Microstrategy

    Creating a predictive modelling in MicroStrategy MicroStrategy Data Mining Services has been evolving to include more data mining algorithms and functionality. One key feature is MicroStrategy Developer’s Training Metric Wizard. The Training Metric Wizard can be used to create several different types of predictive models including linear and exponential regression, logistic regression, decision tree, cluster, time series, and association rules. Linear and exponential regression The linear regression data mining technique should be familiar to you if you have ever tried to extrapolate or interpolate data, tried to find the line that best fits a series of data points, or used Microsoft Excel’s LINEST or LOGEST functions. Regression analyzes the relationship between several predictive inputs, or independent variables, and a dependent variable that is to be predicted. Regression finds the line that best fits the data, with a minimum of error. For example, you have a dataset ...

    Metric values are repeated across rows when a report is executed in MicroStrategy

    Metric values are repeated across rows when a report is executed in MicroStrategy When comparing report results between DB Query Tool and MicroStrategy, some reports show repeated metric values in MicroStrategy where there were none in DB Query Tool. To illustrate the issue, a fact table CAT_ITEM_SLS has been added into the MicroStrategy Tutorial project and populated with a small set of three rows. CAT_ID ITEM_ID REVENUE  1 1  10   1 2  20  2  2  30  Report results in DB Query Tool: Report results in MicroStrategy: In MicroStrategy, the row for "Art As Experience" in the Spring 2007 catalog repeats the $20 value from the Winter 2007 catalog, where DB Query Tool shows the $30 value from the fact table. CAUSE The discrepancy occurs because the attribute elements for Catalog and Item are in a many-to-many relationship, but the attribute relationship in the MicroStrategy schema is defined incorrectly w...