Supporting many-to-many and joint child relationships in MSTR
While the topics are largely related to logical model design, a working knowledge of physical schemas is helpful when dealing with the challenges involved with these topics.
Before reading this section, you should know what logical data models and physical warehouse schemas are, and how to read and interpret them. Logical data models and physical warehouse schemas are discussed in The Logical Data Modeland Warehouse Structure for Your Logical Data Model respectively. These chapters discuss how to plan and create a conceptual framework for your business intelligence data.
Many-to-many relationships
The presence of many-to-many relationships introduces complexity during the warehouse design process. With the presence of many-to-many relationships, you must make additional considerations to effectively plan your design.Below are some real-life examples of many-to-many relationships which must be carefully handled in the data model and schema:
• | In a certain organization, each salesperson can work in more than one calling center. Likewise, each calling center has many salespeople. |
• | In a car manufacturing plant, many models of cars are produced, and each comes in several colors. That is, there are many colors for a single type of car, and many types of cars can be associated with the same color. |
Potential problems with many-to-many relationships usually come in the following forms, both of which can be avoided by correctly modeling the relationship:
• | Loss of analytical capability |
• | Multiple counting |
Loss of analytical capability
With the color and item many-to-many relationship, there are usually two business questions for which users want answers:1 | In what colors are certain items available? |
2 | How much of a particular item/color combination was sold? |
In many-to-many relationships this is not feasible. Rather, a distinct relationship table needs to be present in your warehouse. The following diagram shows the lookup and relationship tables for item and color:
The Rel_Color_Item table provides a row for every possible item/color combination.
Answering the second question requires a fact table that has sales information as well as color and item information. The following diagram shows the same scenario as before, but in addition it shows a simple fact table containing sales data keyed by item, color, and date.
The fact table in the above diagram alone is not sufficient to answer the first question. Only item and color combinations that were actually sold, and therefore have sales recorded, can be retrieved from this table. If you have item and color combinations that are available but that have never been sold, this fact table cannot provide a complete list of item and color combinations to answer question one.
In summary, to prevent any loss of analytical flexibility when dealing with a many-to-many attribute relationship, the following requirements must be met:
• | A distinct relationship table to identify all the possible combinations of attribute elements between attributes |
• | Both the attribute ID columns in the fact table |
Multiple counting
When dealing with many-to-many relationships, loss of analytical capability is only one challenge. Another equally significant issue is multiple counting. Multiple counting occurs when all of the following takes place:• | You attempt to aggregate data to the level of one of the attributes in the many-to-many relationship, or a higher level than one of the attributes in the many-to-many relationship. |
• | The relationship exists in a distinct relationship table. |
• | All of the attributes in the many-to-many relationship are not in the fact table. |
Assume that there are three items, including hats, dresses, and socks. These items come in three colors, including red, blue, and green, with the exception of socks, which come in only green and blue. The following diagram shows this data in the lookup tables as well as some simple sales data:
The risk of multiple counting occurs when you run a query requesting the sales by color, effectively aggregating to the item attribute level in the many-to-many relationship. This query would require both the fact table, which has the sales information by item, and the relationship table, since color is not recorded in the fact table.
The difficulty lies in the fact that color is not in the fact table. There is no way to directly relate the sales of an item in the fact table to the color of that particular item. For example, instead of calculating the sales of red items, the query aggregates sales for all items that come in red according to the relationship table. The sum includes all hats and all dresses, including blue ones and green ones. This obviously leads to numbers that are higher than the true sales for red items.
For example, using the given data, the following questions cannot all be answered accurately:
• | What are the total sales for hats? |
• | What are the total sales for red items? |
• | What are the total sales for red dresses? |
The following section describes several ways to prevent multiple counting when dealing with many-to-many relationships.
Working with many-to-many relationships
As you can see, seemingly simple questions can require you to take a number of steps to answer them when many-to-many relationships are involved.You can use one of three techniques to provide physical support to answer the types of questions that cannot be answered accurately when using many-to-many relationships. The three techniques all have differing levels of flexibility, and flexibility is always a trade-off with complexity. In all cases, the two fundamental components remain in place in one form or another:
• | A relationship table to define the attribute relationship |
• | Both the attribute’s ID columns in the fact table |
All of the following methods require additional data in the fact table. This means that you must capture the additional data in the source system. For example, you need to have data in the source system as to what the color is of each item sold. If this additional data was never captured in the source system, you cannot fully resolve the many-to-many relationship to calculate the amount of sales for items of a certain color.
Method 1
This method is the most straightforward way to effectively manage many-to-many relationships.
Method 1 requires you to create a separate relationship table (in this case, Rel_Color_Item) and add both attribute IDs to the fact table as shown in the following diagram.
Method 2
Method 2 eliminates the many-to-many relationship and the need for a distinct relationship table.
Here the many-to-many relationship is converted into a compound attribute relationship. You treat one attribute as a child of the other and have a compound key for the lower level attribute. Also, you add both attribute IDs, in this case Item_ID and Color_ID, to the fact table as shown in the following diagram.
While this method eliminates the need for a separate relationship table, you lose the ability to view items independent of color, or vice versa.
Method 3
Method 3 is the most versatile solution and has the following characteristics:
• | Further simplifies the compound attribute relationship from Method 2 into a simple attribute relationship |
• | Provides the ability to view item and color together or independently |
• | Requires only one attribute column in the fact table for complete flexibility, rather than two |
Finally, rather than including Color and Item in the fact table, you only need to include this new child attribute SKU, as shown in the following diagram.
This method is actually quite similar to Method 1. The major difference is that the distinct relationship table from Method 1 has an additional column, SKU, which extends the relationship of each item and color combination into a single value. Consequently, you can use this single value in the fact table.
The major disadvantage of Method 3 lies in creating the new attribute if your business model does not already use a similar structure, as well as possibly adding complexity to the ETL process.
Excerpt from https://www2.microstrategy.com/producthelp/10.7/ProjectDesignGuide/WebHelp/Lang_1033/Content/ProjectDesign/Supporting_many_to_many_and_joint_child_relationsh.htm#pd-attributes_overview_3631995489_1086149
I have found great and massive information.
ReplyDeleteMicrostrategy Online Course
Microstrategy Certification