Skip to main content

Star schemas and aggregate (or summary) fact tables


Aggregate tables can further improve query performance by reducing the number of rows over which higher-level metrics must be aggregated. 
However, the use of aggregate tables with dimension tables is not a valid physical modeling strategy. Whenever aggregation is performed over fact data, it is a general requirement that tables joined to the fact table must be at the same attribute level or at a higher level. If the auxiliary table is at a lower level, fact rows will be replicated prior to aggregation and this will result in inflated metric values (also known as "multiple counting").

With the above Time dimension table, a fact table at the level of Day functions correctly because there is exactly one row in DIM_TIME for each day. To aggregate the facts to the level of Quarter, it is valid to join the fact table to the dimension table and group by the quarter ID from the dimension table.

Sql
select DT.QUARTER_ID,
   max(DT.QUARTER_DESC) Quarter_Desc
   sum(FT.REVENUE) Revenue
from DAY_FACT_TABLE FT
   join DIM_TIME DT
      on (FT.DAY_ID = DT.DAY_ID)
group by DT.QUARTER_ID

If, however, there is an aggregate fact table already at the level of Quarter, the results will not be correct. This is because the query must join on Quarter ID, but the quarter ID is not a unique key of the dimension table. Because any given quarter of a year contains 90, 91 or 92 days, the dimension table will contain that many rows with the same quarter ID. Thus fact rows will be replicated prior to taking the sum, and the sum will be too high.

Sql
select FT.QUARTER_ID,
   max(DT.QUARTER_DESC) Quarter_Desc
   sum(FT.REVENUE) Revenue
from QTR_FACT_TABLE FT
   join DIM_TIME DT
      on (FT.QUARTER_ID = DT.QUARTER_ID)
group by FT.QUARTER_ID

This is a generally recognized problem with star schemas, and is not strictly a MicroStrategy limitation.

Star schemas will function correctly with MicroStrategy SQL Generation Engine 8.x as long as they obey the general data warehousing principle that fact tables should not be at a higher level than the dimension tables to which they are joined.

If aggregate tables are required, it is necessary to provide higher-level lookup tables with unique rows corresponding to each aggregate table's key. Logical views are a way to do this without adding tables or views to the warehouse. For example, LWV_LU_QUARTER may be defined using the following SQL statement:

Sql
select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
from DIM_TIME

 

With this logical view, it becomes possible for MicroStrategy SQL Generation Engine 8.x to query the quarter-level fact table as follows. Since the logical view has distinct rows per quarter, multiple counting will not occur in this query.

Sql
select FT.QUARTER_ID,
   max(LQ.QUARTER_DESC) Quarter_Desc
   sum(FT.REVENUE) Revenue
from QTR_FACT_TABLE FT
   join (select distinct QUARTER_ID, QUARTER_DESC, YEAR_ID
         from DIM_TIME) LQ
      on (FT.QUARTER_ID = LQ.QUARTER_ID)
group by FT.QUARTER_ID

For more information on the use of logical views in MicroStrategy SQL Generation Engine 8.1.x and 9.x, consult the MicroStrategy Project Design Guide manual, Appendix B: Logical Tables, "Creating logical tables."

Aggregate tables store pre-summarized totals at a higher level of aggregation than the most granular fact table. They allow reports to be generated from small, rather than large, tables; therefore, performance is enhanced. A successful aggregation strategy seeks to choose aggregate tables that will have the most impact while taking the least amount of space.

Aggregation decisions are driven by the following factors:

  • Usage patterns: Build aggregate tables that are likely to be used the most.
  • Compression ratios: The compression ratio between two tables is defined as the size of the aggregate compared to the size of the smallest table from which the aggregate can be derived.
  • Volatility: Changes in hierarchies over time impact the accuracy of aggregate tables. Sometimes aggregate tables must be rebuilt as a result of changes in dimensions.
A good candidate for aggregation should have at most 10-15 percent of the size of the smallest table from which it is derived.

EXAMPLE:
The MicroStrategy Tutorial project uses aggregate tables by default. A simple metric sum (Revenue) will go to different aggregate tables depending on the attributes on the template.

  1. Create a report with Year on the rows and Revenue on the columns.
  2. Execute the report and view the SQL:

  3. Drill from Year to Item and view the SQL:

The query will go from using ORDER_FACT to ORDER_DETAIL. When Year is on the template, the engine selects the smaller table and the fact is calculated as:

sum(a11.ORDER_AMT)
    instead of:

    sum((a11.QTY_SOLD * (a11.UNIT_PRICE - a11.DISCOUNT)))

    Comments

    1. Is there any solution now to use aggregate tables with star schema without creating logical tables?

      ReplyDelete

    Post a Comment

    Popular posts from this blog

    No 'Alert' option appear when trying to create an alert-based subscription in MicroStrategy Distribution Services

    The 'Alert' option does not appear when attempting to create an alert-based subscription in MicroStrategy Distribution Services In MicroStrategy Distribution Service 9.x and 10.x, and 11.x versions it is possible to create an alert-based subscription. When right-clicking the metric header of a report in MicroStrategy Web 9.0.x, the 'Alerts' option does not appear:    Cause : This issue occurs because the user attempting to create the alert does  not have all of the necessary privileges on alerts.   Fix : In order to create an alert-based subscription, the following privileges are required: In order ti get permissions to create alerts the user should be given the following privileges by the admin: New Version of Microstrategy 11.x: Server- Distribution: Older Versions of Microstrategy 9.x, 10.x etc..: Web Reporter > Web user Web Analyst > Web create alert   ...

    Scheduling a report or document to be sent to an FTP in MSTR

    Scheduling a report or document to be sent to an FTP server You can have a report or document automatically delivered to a location on your FTP server on a specific schedule. To do so, you must subscribe to the report or document, as described in the steps below. You can customize your subscription by typing macros in the  File Name ,  Sub-folder , or  Zip File Name  fields. These macros are automatically replaced with the appropriate text when the report or document is delivered. For example, you create a subscription to a document. If you type  {&Project}  in the  File Name field, the name of the project in which the document is saved is displayed in the name of the document when it is delivered. • This procedure assumes that an administrator has already added your FTP server as a new device in Developer. Steps to do so are included in the  System Administrator Help . To send a report or document to an FTP server on a schedule ...

    Create an alert-based subscription in MicroStrategy Distribution Services

    Create an alert-based subscription in MicroStrategy Distribution Services on Web Subscription to a report or Report Services document which will be executed when a certain conditional threshold is met based on another executing report. For example, a scheduled report executes which shows the Revenue by day for the past week. If the Revenue on any one day falls below a certain value, a subscription to another report or Report Services document can be triggered and delivered to a recipient. An alert based subscription can only be created directly on a report; however, another report or Report Services document can be delivered when the alert based subscription is triggered. Note: you need a grid report to create an alert and you cannot create if you want to create on a document with text boxes. The following example will walk through the basic steps on how to setup a subscription based on an alert like this: Follow the brief  steps bel...

    "System Prompt" and its uses in MicroStrategy

    System Prompt and its uses in MicroStrategy WHAT IS A "SYSTEM PROMPT"? "System Prompt" is a system object that was introduced back in version 8.0.0. The object is named as "User Login" and is implemented as a prompt object. The object can be found under Public Objects > Prompts > System prompts, as shown below: Unlike ordinary prompt objects, system prompts don't require any answers from the user. When a report containing a system prompt runs, the prompt is answered automatically with the login of the user who runs the report. On the other hand, like other prompt objects, answers to system prompts are used to match caches. Therefore, users don't share caches for reports that contain system prompts. For details on how caches are matched, refer to the following MicroStrategy Knowledge Base document: KB5300-7X0-0147 - How are caches matched in MicroStrategy Intelligence Server 7.x? WHEN ARE SYSTEM PROMPTS USED?    System pr...

    Custom Subtotal Displays in MicroStrategy

    Defining custom subtotal displays in MicroStrategy By default, when users apply subtotals in a report, the name of the subtotal is displayed in the subtotal line items that appear in the report. Users can use custom subtotals to give more control over the characteristics of a subtotal. Custom subtotals allow users to define custom subtotal line items that appear on the reports  U sers can make the subtotal name dynamic by typing special characters in the subtotal name field as listed in the following table. Character Description #A The name of the attribute under which the subtotal appears. #P The name of the attribute to the left of, or above the attribute under which the subtotal appears. #0 All the forms of the parent element. #1 The first form of the parent element reading from left to right or from top to bottom. #2 The second form of the parent element reading from left to right or from top to bottom. #3 The third form of th...

    Export a Report Services document to Excel with formatting using URL API

    Export a Report Services document to Excel with formatting using URL API in MSTR Web In order to export a document in excel format using the URL API, the executionMode must be set to 4.  If excutionMode is not provided in the URL, by default PDF will be used as executionMode.   Below are the list of parameters that the URL must contain in order to execute correctly.   evt= 3069 src= Main.aspx.3069 executionMode= 4 documentID= 7E1644CA424F482DA811569FCE8127FF( Replace the document Id with your document ID)   Sample URL for .NET environment: http://WebServerName/MicroStrategy/asp/Main.aspx?evt=3069&src=Main.aspx. 3069 &executionMode= 4 &documentID= 7E1644CA424F482DA811569FCE8127FF    

    Predictive modelling in Data Science using Microstrategy

    Creating a predictive modelling in MicroStrategy MicroStrategy Data Mining Services has been evolving to include more data mining algorithms and functionality. One key feature is MicroStrategy Developer’s Training Metric Wizard. The Training Metric Wizard can be used to create several different types of predictive models including linear and exponential regression, logistic regression, decision tree, cluster, time series, and association rules. Linear and exponential regression The linear regression data mining technique should be familiar to you if you have ever tried to extrapolate or interpolate data, tried to find the line that best fits a series of data points, or used Microsoft Excel’s LINEST or LOGEST functions. Regression analyzes the relationship between several predictive inputs, or independent variables, and a dependent variable that is to be predicted. Regression finds the line that best fits the data, with a minimum of error. For example, you have a dataset ...

    Microstrategy "Error type: Odbc error. Odbc operation attempted

     "Error type: Odbc error. Odbc operation attempted: SQLExecDirect. [HYT00:0: on SQLHANDLE] [MicroStrategy][ODBC Oracle Wire Protocol driver]Timeout expired" is shown when executing reports from Web When users are trying to execute some reports in MicroStrategy web in particular, they may receive the Error “SQL Generation Complete Index out of range” and “Timeout expired” error as shown below: Possible Causes: One possible cause is that the MicroStrategy Intelligence Server using a cached database connection that was already dropped by the RDBMS. To resolve this: Admin should delete the database connection caches and create a new DSNs in case they are sharing DSNs to connect to different databases. In addition, change the settings for the ‘Connection lifetime’ and the ‘Connection idle time out’.  Follow the steps below to perform the mentioned changes and verify the report after each step and some of the settings require i-server r...

    Prompts in URL API

    Prompts in URL API promptsAnswerXML An XML representation of a collection of prompt answers. Passing this parameter with any number of common events automatically applies the supplied prompt answers to as many of the prompts as are found in the particular report, document, or HTML document. originMessageID An existing report, document, or HTML document instance. This parameter is used by the resulting report, document, or HTML document to extract the prompt answers from the specified instance and use them as prompt answers to any corresponding prompts in the requested report, document, or HTML document. elementsPromptAnswers A convenience parameter that allows you to answer single or multiple element prompts by supplying individual prompt answers in the form of AttributeID;AttributeElementID^DisplayName . When there are multiple prompt answers, each individual answer is separated by a "," (comma) separator character. In addition, you can include multiple elem...

    Derived metric based on attribute values

    Derived metric based on attribute values Here is how could create and display data correctly on using below simple steps.  Create a report with Category, Subcategory and Revenue. Create New Metric in a report or VI.  Case((Category@ID = 1), Revenue, 0) Booksand Name it as Revenue for  where 2 is Category ID for "Books"  Report will display result as below.  Result for new metric is blank. Now to fix this create a new Derived metric on Category attribute first with formula as  Max(Category) {~ }  and calling Books Now Edit the "Revenue for Books metric and Replace Category@ID with this new Books metric formula would looks like this  Case((Books = 1), Revenue, 0).  Report result would now display as expected as shown below