
Star schemas and aggregate (or summary) fact tables


Aggregate tables can further improve query performance by reducing the number of rows over which higher-level metrics must be aggregated.
However, joining aggregate fact tables directly to star-schema dimension tables is not a valid physical modeling strategy. Whenever fact data is aggregated, every table joined to the fact table must be at the same attribute level as the fact table or at a higher level. If the joined table is at a lower level, each fact row is replicated before aggregation, which inflates the metric values (also known as "multiple counting").

With the above Time dimension table, a fact table at the level of Day functions correctly because there is exactly one row in DIM_TIME for each day. To aggregate the facts to the level of Quarter, it is valid to join the fact table to the dimension table and group by the quarter ID from the dimension table.

Sql
select DT.QUARTER_ID,
   max(DT.QUARTER_DESC) Quarter_Desc,
   sum(FT.REVENUE) Revenue
from DAY_FACT_TABLE FT
   join DIM_TIME DT
      on (FT.DAY_ID = DT.DAY_ID)
group by DT.QUARTER_ID

If, however, the query uses an aggregate fact table that is already at the level of Quarter, joining it to DIM_TIME gives incorrect results. The query must join on quarter ID, but the quarter ID is not a unique key of the dimension table. Because any given quarter of a year contains 90, 91 or 92 days, the dimension table contains that many rows with the same quarter ID. Each fact row is therefore replicated before the sum is taken, and the sum is too high.

Sql
select FT.QUARTER_ID,
   max(DT.QUARTER_DESC) Quarter_Desc,
   sum(FT.REVENUE) Revenue
from QTR_FACT_TABLE FT
   join DIM_TIME DT
      on (FT.QUARTER_ID = DT.QUARTER_ID)
group by FT.QUARTER_ID
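
The inflation is easy to reproduce on a toy dataset. The following Python sketch uses an in-memory SQLite database, which is an assumption made purely for illustration (the article does not name a database). The 2023 calendar, the QUARTER_ID encoding and the revenue of 100 per quarter are invented sample data, and build_dim_time is a hypothetical helper. It builds a day-level DIM_TIME, loads one row per quarter into QTR_FACT_TABLE, and runs the same join on QUARTER_ID.

```python
import sqlite3
from datetime import date, timedelta


def build_dim_time(conn, year=2023):
    """Create a day-level DIM_TIME with exactly one row per calendar day."""
    conn.execute(
        "create table DIM_TIME (DAY_ID text primary key, QUARTER_ID integer, "
        "QUARTER_DESC text, YEAR_ID integer)"
    )
    day = date(year, 1, 1)
    while day.year == year:
        quarter = (day.month - 1) // 3 + 1
        conn.execute(
            "insert into DIM_TIME values (?, ?, ?, ?)",
            (day.isoformat(), year * 10 + quarter, f"{year} Q{quarter}", year),
        )
        day += timedelta(days=1)


conn = sqlite3.connect(":memory:")
build_dim_time(conn)

# Hypothetical aggregate fact table: one row per quarter, revenue 100 each.
conn.execute(
    "create table QTR_FACT_TABLE (QUARTER_ID integer primary key, REVENUE integer)"
)
conn.executemany(
    "insert into QTR_FACT_TABLE values (?, ?)",
    [(20231, 100), (20232, 100), (20233, 100), (20234, 100)],
)

# Joining the quarter-level fact to the day-level dimension replicates
# each fact row once per day of the quarter before the sum is taken.
inflated = conn.execute(
    """
    select FT.QUARTER_ID,
       max(DT.QUARTER_DESC) Quarter_Desc,
       sum(FT.REVENUE) Revenue
    from QTR_FACT_TABLE FT
       join DIM_TIME DT
          on (FT.QUARTER_ID = DT.QUARTER_ID)
    group by FT.QUARTER_ID
    order by FT.QUARTER_ID
    """
).fetchall()

for quarter_id, quarter_desc, revenue in inflated:
    print(quarter_id, quarter_desc, revenue)
```

With this sample data each quarter's revenue comes back multiplied by the number of days in that quarter (90, 91, 92 and 92 for 2023), so the query reports 9000, 9100, 9200 and 9200 where every true value is 100.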

This is a generally recognized problem with star schemas, and is not strictly a MicroStrategy limitation.

Star schemas will function correctly with MicroStrategy SQL Generation Engine 8.x as long as they obey the general data warehousing principle that fact tables should not be at a higher level than the dimension tables to which they are joined.

If aggregate tables are required, it is necessary to provide higher-level lookup tables with unique rows corresponding to each aggregate table's key. Logical views are a way to do this without adding tables or views to the warehouse. For example, LWV_LU_QUARTER may be defined using the following SQL statement:

Sql
select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
from DIM_TIME

With this logical view, it becomes possible for MicroStrategy SQL Generation Engine 8.x to query the quarter-level fact table as follows. Since the logical view has distinct rows per quarter, multiple counting will not occur in this query.

Sql
select FT.QUARTER_ID,
   max(LQ.QUARTER_DESC) Quarter_Desc,
   sum(FT.REVENUE) Revenue
from QTR_FACT_TABLE FT
   join (select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
         from DIM_TIME) LQ
      on (FT.QUARTER_ID = LQ.QUARTER_ID)
group by FT.QUARTER_ID
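
As a quick sanity check, the same query shape can be run against toy data. This is a minimal sketch that assumes Python with an in-memory SQLite database; the three days per quarter and the revenue figures are invented sample values, not data from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table DIM_TIME (DAY_ID integer primary key, QUARTER_ID integer,
                           QUARTER_DESC text, YEAR_ID integer);
    create table QTR_FACT_TABLE (QUARTER_ID integer primary key, REVENUE integer);
    """
)

# Toy day-level dimension: three days in each of two quarters.
days = [(20230101 + i, 20231, "2023 Q1", 2023) for i in range(3)]
days += [(20230401 + i, 20232, "2023 Q2", 2023) for i in range(3)]
conn.executemany("insert into DIM_TIME values (?, ?, ?, ?)", days)

# Toy quarter-level aggregate fact table.
conn.executemany(
    "insert into QTR_FACT_TABLE values (?, ?)", [(20231, 100), (20232, 250)]
)

# The derived table LQ has one row per quarter, so each fact row
# matches exactly once and the sum is not inflated.
corrected = conn.execute(
    """
    select FT.QUARTER_ID,
       max(LQ.QUARTER_DESC) Quarter_Desc,
       sum(FT.REVENUE) Revenue
    from QTR_FACT_TABLE FT
       join (select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
             from DIM_TIME) LQ
          on (FT.QUARTER_ID = LQ.QUARTER_ID)
    group by FT.QUARTER_ID
    order by FT.QUARTER_ID
    """
).fetchall()

for row in corrected:
    print(row)
```

The revenue returned for each quarter equals the value stored in QTR_FACT_TABLE (100 and 250 here), regardless of how many day rows the dimension holds for that quarter.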

For more information on the use of logical views in MicroStrategy SQL Generation Engine 8.1.x and 9.x, consult the MicroStrategy Project Design Guide manual, Appendix B: Logical Tables, "Creating logical tables."

Aggregate tables store pre-summarized totals at a higher level of aggregation than the most granular fact table. Because reports can then be generated from small tables rather than large ones, performance improves. A successful aggregation strategy chooses the aggregate tables that have the most impact while taking the least amount of space.

Aggregation decisions are driven by the following factors:

  • Usage patterns: Build aggregate tables that are likely to be used the most.
  • Compression ratios: The compression ratio between two tables is defined as the size of the aggregate compared to the size of the smallest table from which the aggregate can be derived.
  • Volatility: Changes in hierarchies over time impact the accuracy of aggregate tables. Sometimes aggregate tables must be rebuilt as a result of changes in dimensions.

A good candidate for aggregation should be no more than 10 to 15 percent of the size of the smallest table from which it can be derived.
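
The size rule can be expressed as a small calculation. The Python sketch below is illustrative only: it assumes row counts are an acceptable proxy for table size, takes the upper bound of 15 percent as the default threshold, and uses hypothetical function names and sample figures.

```python
def compression_ratio(aggregate_rows: int, source_rows: int) -> float:
    """Size of the aggregate relative to the smallest table it can be derived from."""
    if source_rows <= 0:
        raise ValueError("source_rows must be positive")
    return aggregate_rows / source_rows


def is_good_candidate(
    aggregate_rows: int, source_rows: int, threshold: float = 0.15
) -> bool:
    """True when the aggregate is at most `threshold` of its source's size."""
    return compression_ratio(aggregate_rows, source_rows) <= threshold


# Hypothetical figures: a 10,000-row source with two candidate aggregates.
small_candidate = is_good_candidate(1_000, 10_000)  # ratio 0.10
large_candidate = is_good_candidate(4_000, 10_000)  # ratio 0.40
print(small_candidate, large_candidate)
```

A candidate that passes this check still has to be weighed against usage patterns and volatility, since a small aggregate that is rarely queried or frequently rebuilt may not be worth maintaining.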

EXAMPLE:
The MicroStrategy Tutorial project uses aggregate tables by default. A simple metric such as Sum(Revenue) is sourced from different tables depending on the attributes on the template.

  1. Create a report with Year on the rows and Revenue on the columns.
  2. Execute the report and view the SQL.
  3. Drill from Year to Item and view the SQL.

After the drill, the query switches from ORDER_FACT to ORDER_DETAIL. When Year is on the template, the engine selects the smaller table, ORDER_FACT, and calculates the fact as:

sum(a11.ORDER_AMT)

instead of the ORDER_DETAIL expression:

sum((a11.QTY_SOLD * (a11.UNIT_PRICE - a11.DISCOUNT)))
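
The two expressions return the same total only when the aggregate table is kept consistent with the detail table. The sketch below illustrates that on toy data, assuming Python with an in-memory SQLite database. Only the table names and the ORDER_AMT, QTY_SOLD, UNIT_PRICE and DISCOUNT columns come from the example above; the ORDER_ID, ITEM_ID and YEAR_ID columns, the sample rows and the way ORDER_FACT is loaded are assumptions and do not reflect the actual Tutorial warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table ORDER_DETAIL (ORDER_ID integer, ITEM_ID integer, YEAR_ID integer,
                               QTY_SOLD integer, UNIT_PRICE integer, DISCOUNT integer);
    create table ORDER_FACT (ORDER_ID integer primary key, YEAR_ID integer,
                             ORDER_AMT integer);
    """
)

# Invented item-level rows: two orders, three order lines.
conn.executemany(
    "insert into ORDER_DETAIL values (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, 2023, 2, 50, 5),
        (1, 11, 2023, 1, 30, 0),
        (2, 10, 2023, 3, 50, 10),
    ],
)

# Assumed load step: derive the order-level aggregate from the detail rows.
conn.execute(
    """
    insert into ORDER_FACT
    select ORDER_ID, YEAR_ID, sum(QTY_SOLD * (UNIT_PRICE - DISCOUNT))
    from ORDER_DETAIL
    group by ORDER_ID, YEAR_ID
    """
)

# Year-level revenue from the smaller aggregate table.
from_aggregate = conn.execute(
    "select a11.YEAR_ID, sum(a11.ORDER_AMT) from ORDER_FACT a11 group by a11.YEAR_ID"
).fetchall()

# The same revenue computed from the item-level detail table.
from_detail = conn.execute(
    "select a11.YEAR_ID, sum((a11.QTY_SOLD * (a11.UNIT_PRICE - a11.DISCOUNT))) "
    "from ORDER_DETAIL a11 group by a11.YEAR_ID"
).fetchall()

print(from_aggregate, from_detail)
```

Both queries return 240 for 2023 in this toy setup, while the aggregate query reads two rows instead of three. On a real warehouse the row-count gap is what makes the smaller table attractive at the Year level.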

