Skip to main content

Designing a Normalized Database

Designing a Normalized Database from Microsoft

Tables representing propositions about entities of one type (that is, representing a single set) are fully normalized. Correct and complete mapping of a conceptual ORM model to a logical model yields fully normalized tables. Properly designed entities in an ER model lead to fully normalized tables as well. However, both ORM and ER modeling start with the business description of a problem; it is possible to miss some dependencies between entities and leave some tables denormalized. Of course, there could also be a bug in the tool that produces the DDL script from the ORM and ER models. However, any denormalization can lead to update anomalies. Data integrity and consistency are fundamental for databases. Remember that a database holds propositions, and propositions are facts. If propositions are not true, they are not facts; they are falsehoods. You need a logical method that yields a fully normalized database.
Normalization is the process of redesigning the model to unbundle any overlapping entities. The process involves decomposition; however, decomposition cannot yield a loss of information. You perform the decomposition by applying a linear progression of rules called normal forms. Normalization eliminates redundancy and incompleteness. Note the part that designers frequently overlook: normalization eliminates incompleteness, not just redundancy. Many normal forms (NFs) are defined; the first six are called first NF, second NF, third NF, Boyce-Codd NF, fourth NF, and fifth NF. If a database is in fifth NF, the database is fully normalized. Only the first three NFs are important; usually, if a database is in third NF, it is in fifth NF as well. You should understand the normalization form and use it to perform a final check of your database design, checking the model you created by using other methods.

First Normal Form

Imagine a table such as the one that Table 2-2 shows. The table holds information about sales. In this case, only the OrderId column is part of the primary key.
Cc505842.table_C02623422_2(en-us,TechNet.10).png
Table 2-2 Table Before First NF
With a design like this, you can have the following anomalies:
  • Insert How do you insert a customer without an order?
  • Update If item Bo is renamed, how do you perform an update?
  • Delete If order 3 is deleted, the data for customer 2 is lost.
  • Select How do you calculate the total quantity of bolts?
Note that only update and select anomalies deal with redundancy: they are problematic because the table contains redundant data. Insert and delete anomalies deal with incompleteness of the model. The rule for first NF is, “A table is in first normal form if all columns are atomic.” This means there can be no multi-valued columns—columns that would hold a collection such as an array or another table. First NF is somewhat redundant with the definition of a relational table or of a relation. A table is a relation if it fulfills the following conditions:
  • Values are atomic. The columns in a relational table are not a repeating group or arrays.
  • Columns are of the same kind. All values in a column come from the same domain.
  • Rows are unique. There is at least one column or set of columns, the values of which uniquely identify each row in the table.
  • The order of columns is insignificant. You can share the same table without worrying about table organization.
  • The sequence of rows is insignificant. A relational table can be retrieved in a different order and sequence.
  • Each column must have a unique name. This is required because the order of columns is not significant.
You can see in the example in Table 2-2 that the last column is multi-valued; it holds an array of items. Before starting with decomposition, let us briefly review the textual notation of a relational table. Remember the earlier example proposition, “Lubor Kollar was employed by Tail-spin Toys on March 19, 2004.” In a general form, you can write “Employee with (Name) was employed by (Company) on (EmploymentDate).” This generalized form of a proposition is a predicate. Terms in parentheses are value placeholders (entity attributes). A predicate defines the structure of a table. You can write the structure briefly as:
Employees(EmpId, EmployeeName, CompanyId, EmploymentDate) 
Underlined columns form the primary key. Actually, they form a candidate key, and a table can have multiple candidate keys. You could underline all candidate keys and double underline the primary key.
You decompose the table shown in Table 2-2 on the Items column. Every item leads to a new row, and every atomic piece of data of a single item (ProductId, ProductName, Quantity) leads to a new column. After the decomposition, you have multiple rows for a single order; therefore, you have to expand the primary key. You can compose the new primary key from the OrderId and ProductId columns. However, suppose you can allow multiple products on a single order, each time with a different discount, for example. Thus, you cannot use ProductId as part of the primary key. However, you can add the ItemId attribute and use it as a part of the new primary key. A decomposed table in first normal form would look like this:
Orders(OrderId, CustomerId, OrderDate, ItemId, ProductId, Quantity, 
ProductName)
Before moving to second NF, you have to understand a common misconception about first NF. You might have heard or read that you should not have a repeating group of columns. However, this advice is incorrect; repeating groups means you should not have a repeating group (that is, a collection) in a single column. For example, imagine this table:
Employees(EmployeeId, EmployeeName, Child1, Child2)
This table is perfectly in first NF. This design has a built-in constraint, allowing only employees who have two children. If you do not allow unknown (NULL) values for the Child1 and Child2 attributes, then you allow employees with exactly two children. This kind of constraint is not typical for business; nevertheless, it is a constraint built into the model, which is in first NF. Such constraints are rare, and a repeating group of columns typically represents a hidden collection. Take care not to decompose such groups automatically before checking whether this is a special constraint.

Second Normal Form

After achieving first NF, the decomposed table from Table 2-2 looks like Table 2-3.
Cc505842.table_C02623422_3(en-us,TechNet.10).png
Table 2-3 Table in First NF
You still have the following anomalies:
  • Insert How do you insert a customer without an order?
  • Update If customer 1 changes the order date for order 1, how do you perform the update? (In many places, possible inconsistencies could exist.)
  • Delete If you delete order 3, the data for customer 2 is lost.
To achieve second NF, a table must be in first NF, and every non-key column must be fully functionally dependent on the entire primary key. This means that no column can depend on part of the primary key only. In the example in Table 2-3, you know the customer and the order date if you know the value of the OrderId column; you do not need to know anything about ProductId, which is part of the primary key. The CustomerId and OrderDate columns depend on part of the primary key only—OrderId. To achieve second NF, you need to decompose the table into two tables:
Orders(OrderId, CustomerId, OrderDate)
OrderDetails(OrderId, ItemId, ProductId, Quantity, ProductName)
In the Orders table, you leave attributes that depend on OrderId only; then you introduce a new table, OrderDetails, to hold the other attributes. When achieving first NF, you are converting values from a multi-valued attribute to rows and changing the primary key; for second and all other NFs, you decompose tables into more tables. Second NF deals with relationships between columns that are part of a key and other columns.
After decomposing to multiple tables, you must have some common value that enables you to join the tables in queries; otherwise, you would lose some information. The decomposition has to be lossless. Of course, you need relationships between tables. A relationship is an association between two or more tables. Relationships are expressed in the data values of the primary and foreign keys. A primary key is a column or columns in a table whose values uniquely identify each row in the table. A foreign key is a column or columns whose values are the same as the primary key of another table—in other words, a copy of the primary key from another relational table. The relationship is made between two relational tables by matching the values of the foreign key with the values of the primary key.

Third Normal Form

After achieving second NF, the decomposed tables from Table 2-3 look like the tables in Table 2-4 and Table 2-5. Note that in the Orders table (Table 2-4), another attribute, CustomerName, is added to show that normalization violations can appear in any table.
Cc505842.Table_C02623422_4(en-us,TechNet.10).png
Table 2-4 Orders Table in Second NF
Cc505842.Table_C02623422_5(en-us,TechNet.10).png
Table 2-5 OrderDetails Table in Second NF
Second NF solves the update anomaly (if customer 1 changes the order date for order 1); however, you still have the following anomalies:
  • Insert How do you insert a customer without an order?
  • Delete If you delete order 3, the data for customer 2 is lost.
To achieve third NF, a table must be in second NF, and every non-key column must be non-transitively dependent on the primary key. For example, in Table 2-4, from OrderId, you can find CustomerId; then from CustomerId, you can get transitively to the CustomerName attribute value. Similarly, in Table 2-5, you can get transitively to ProductName through Pro-ductId from OrderId and ItemId. If you think of the rule for third NF from the non-key attributes point of view, it simply means you should have no functional dependencies between non-key columns. Non-key columns must depend on keys only. In the examples in Table 2-4 and Table 2-5, CustomerName depends on CustomerId, and ProductName depends on Pro-ductId. Thus, to achieve third NF, you must create new tables for dependencies between non-key columns:
Customers(CustomerId, CustomerName) Orders(OrderId, CustomerId, OrderDate) 
Products(ProductId, ProductName) 
OrderDetails(OrderId, ItemId, ProductId, Quantity)
This schema is free from all the update anomalies you had before normalization. However, it is not free from all update anomalies. For example, the schema itself cannot prevent you from inserting an unreasonable order date. (You will learn more about additional constraints in Chapter 3, “Designing a Physical Database.”) Note that this schema is also essentially the same (except for a couple of attributes omitted for the sake of brevity) as you received by using the ORM and ER approach. As mentioned earlier, use normalization for final checking and refining of your model.

Practice: Normalizing the Database

You are developing a database model that will support an application for managing projects (as in the Quick Check in Lesson 1, “Systematically Approaching Design Stages”). You collect the following information: each project has a single customer, each project can have many activities, and each project can have many employees assigned to it. You want to follow time spent (in hours) on projects by specific employee by activity for each day. Your initial design is:
Projects(ProjectId, ProjectName, CustomerId, CustomerName,
Activities(Activity1Id, Activity1Name, …, ActivityNId, 
ActivityNName), Employees(Employee1Id, Employee1Name, …, 
EmployeeNId, EmployeeNName), WorkDate, TimeSpent)

Exercise 1: Achieve the First Normal Form

In this exercise, you will bring this model to first NF. To achieve first NF, you need to eliminate all attributes that are collections.
  1. Check the Activities part of the table. Is this a collection?
  2. Check the Employees part of the table. Is this a collection? Your model should look like this:
Projects(ProjectId, ItemId, ProjectName, CustomerId, CustomerName, 
        ActivityId, ActivityName, EmployeeId, EmployeeName, WorkDate, 
        TimeSpent)

Exercise 2: Achieve the Second Normal Form

In this exercise, you will bring this model to second NF. To achieve second NF, you must make sure your model does not contain attributes that depend on only part of the primary key.
  1. The complete primary key in the table you created in Exercise 1, “Achieve the First Normal Form,” consists of ProjectId and ItemId.
  2. Do you really need both columns to find CustomerId and CustomerName associated with a project?
Your model should look like this:
Projects(ProjectId, ProjectName, CustomerId, CustomerName) 
        ProjectDetails(ProjectId, ItemId, ActivityId, ActivityName, 
        EmployeeId, EmployeeName, WorkDate, TimeSpent)

Exercise 3: Achieve the Third Normal Form

In this exercise, you will bring this model to third NF. To achieve third NF, you need to look at dependencies between non-key attributes.
  1. Is there any dependency between CustomerId and CustomerName?
  2. Is there any dependency between ActivityId and ActivityName?
  3. Is there any dependency between EmployeeId and EmployeeName? Your model should now look like this:
Projects(ProjectId, ProjectName, CustomerId)
       ProjectDetails(ProjectId, ItemId, ActivityId, EmployeeId, 
       WorkDate, TimeSpent) 
Customers(CustomerId, CustomerName) 
Activities(ActivityId, ActivityName)
Employees(EmployeeId, EmployeeName) 

Comments

Popular posts from this blog

Microstrategy Report Services documents and dashboards

Microstrategy Report Services documents vs Dashboards A MicroStrategy Report Services document displays data coming from multiple reports, with the data laid out and designed in presentation-quality format. Most data on a document is from one or more underlying datasets. A dataset is a standard MicroStrategy report. Other document components that do not originate from the dataset, such as static text used for a title or heading, page numbers, and images, are added by the document's designer and are stored in the document's definition. A Report Services (RS) dashboard is a special type of document. An RS dashboard is commonly only one page long, is intended to be viewed online, and usually provides interactive features that let analysts change how they view the dashboard’s data, as well as what data they are viewing. A broad selection of widgets and a wide variety of formatting options allow you to design a customized, interactive dashboard. Both documents and RS dashb...

Metric values are repeated across rows when a report is executed in MicroStrategy

Metric values are repeated across rows when a report is executed in MicroStrategy When comparing report results between DB Query Tool and MicroStrategy, some reports show repeated metric values in MicroStrategy where there were none in DB Query Tool. To illustrate the issue, a fact table CAT_ITEM_SLS has been added into the MicroStrategy Tutorial project and populated with a small set of three rows. CAT_ID ITEM_ID REVENUE  1 1  10   1 2  20  2  2  30  Report results in DB Query Tool: Report results in MicroStrategy: In MicroStrategy, the row for "Art As Experience" in the Spring 2007 catalog repeats the $20 value from the Winter 2007 catalog, where DB Query Tool shows the $30 value from the fact table. CAUSE The discrepancy occurs because the attribute elements for Catalog and Item are in a many-to-many relationship, but the attribute relationship in the MicroStrategy schema is defined incorrectly w...

MicroStrategy URL API Parameters

MicroStrategy URL Structure The following table summarizes the root URL structure used for every request to MicroStrategy Web. Environment Main Application URL Administration URL J2EE http://webserver/MicroStrategy/servlet/mstrWeb http://webserver/MicroStrategy/servlet/mstrWebAdmin .NET http://webserver/MicroStrategy/asp/Main.aspx http://webserver/MicroStrategy/asp/Admin.aspx Every request sent to MicroStrategy Web calls a central controller. Parameters are appended to  Main.aspx  or  mstrWeb  (in a .NET and J2EE environment, respectively) to indicate to the controller how the request should be internally forwarded and handled. The following examples show a URL for accessing a MicroStrategy folder when the user does not have an existing session. The URL contains not only the parameters needed to connect to MicroStrategy Web, but also the parameters needed to log on and create a session. J2EE environment: <a href="http:...

Microstrategy Dossiers explained

Microstrategy  Dossiers With the release of MicroStrategy 10.9, we’ve taken a leap forward in our dashboarding capabilities by simplifying the user experience, adding storytelling, and collaboration.MSTR has  evolved dashboards to the point that they are more than dashboards - they are  interactive, collaborative analytic stories . Ultimately, it was time to go beyond dashboards, both in concept and in name, and so  the've  renamed VI dashboards to  ‘ dossiers ’.  Dossiers can be created by using the new Desktop product or Workstation or simply from the Web interface which replaces Visual Insights. All the existing visual Insights dashboards will be converted to Dossiers   With MicroStrategy 10.9, there was an active focus on making it easier to build dashboards for the widest audience of end users. To achieve this, some key new capabilities were added that make it easier to author, read, interact and collaborate on dashboards ...

Activate MicroStrategy Geospatial Services

Activate MicroStrategy Geospatial Services MicroStrategy 10.11 introduces our new mapping capability: MicroStrategy Geospatial Services, powered by Mapbox. This enhanced map visualization is available for dossiers on all interfaces including MicroStrategy Desktop, Workstation, Web and Library (Mobile included). With MicroStrategy Geospatial Services, MicroStrategy now offers advanced geospatial analytics features that allow users to get more out of their location data. This new feature is available in addition to the out-of-the-box ESRI maps. MicroStrategy Geospatial Services allows users to: Plot polygon shapes for most countries, down to the zip code level Perform powerful interaction between layers (progressively hide or show data layers as zoom levels change) Identify and resolve location name conflicts Add thresholds to data points, size markers for metrics, and color by for both attributes and metrics Fine tune clustering behavior when aggregating data on a ma...

Prompt-in-prompt(Nested Prompts) in Microstrategy

Prompt-in-prompt(Nested Prompts) in  Microstrategy Nested prompts allows you to create one prompt based on the other and other bases on another, nested prompts allows us to prompt the highest level(Like year) to middle level(like Quarter, then to the low level(like Month). Here you can see how to  create a 3-level deep nested prompt that will prompt the user to select a year, then a quarter within that year, then a month within that quarter. Prompt-in-prompt is a feature in which the answer to one prompt is used to define another prompt. This feature is only implemented for element list prompts . The following procedure describes how to achieve this: Create the highest level filter. This is a filter which contains a prompt on an attribute element list. Create a filter on the attribute "Year." Click "prompt on attribute element list" and click "Next" through the rest of the screens to accept the default values. Do not set any additio...

Custom formatting Microstrategy metric format into 1M 2M etc

Custom formatting metric values of 1,100,000 into a 1.1 million 1.1M type format  In addition to the the pre-defined options for metric formatting, MicroStrategy supports custom formatting. The MicroStrategy Tutorial project is used to explain how users can customize numbers from "1,000,000" to a "1.00" format. Consider a report containing row data values greater than a million, as illustrated below: To format these metric values to use a decimal (i.e., 1.1) instead of showing all the numerals, right-click on the metric (' Revenue ' , for this example) and select  Formatting > Revenue Values , as shown below: Select " Custom " as a category in the Number tab and enter ' 0,,.## ' (without quotation marks), as shown below: Confirm that the format applied properly:

Purge Web Caches in MicroStrategy Web

Purge Web Caches in MicroStrategy Web By executing the mstr server by embedding the blue text below with admin privileges purges web server cache. https:// mstrserver.com /MicroStrategy/asp/Admin.aspx//mstrWebAdmin/?pg=purgecaches MicroStrategy Web and Web Universal caches various properties that are related to the user, project, or the MicroStrategy Intelligence Server. This helps in reducing the response time for every request by delivering the properties from a closer location than the original MicroStrategy Intelligence Server. In majority of the cases, the default caching properties should be applicable to the business intelligence infrastructure. There are 3 types of caches used in MicroStrategy Web. These are described below: Session Cache - Specific to a user and exists only while the user is logged in. This cache is deleted when the user logs out or if there's no activity for a specified timeout interval. The default value for a timeout is 1200 seco...

mstrio – Python and R wrappers for the MicroStrategy

mstrio – Python and R wrappers for the MicroStrategy REST APIs Connecting to MicroStrategy  Create a connection to the Intelligence Server using   Connection()   and    connect()  in Python and R, respectively. Required arguments for the   Connection()  function are the URL for the MicroStrategy REST API server, MicroStrategy Intelligence Server username and password, as well as the MicroStrategy project name. By default, the   connect()  function anticipates your MicroStrategy Intelligence Server username and password. LDAP authentication is also supported. Use the optional argument    login_mode=16    in the    connect()  function for LDAP authentication.  Extract data from cubes and reports  To extract data from MicroStrategy cubes and reports, use the   get_cube()  and   get_report()  functions. Use...

Prompt-in-prompt (nested prompt) in a FreeForm SQL Report in MicroStrategy

Prompt-in-prompt (nested prompt) feature in a FreeForm SQL Report in MicroStrategy  The following procedure describes how to achieve prompt-in-prompt in a Freeform SQL report in MicroStrategy Tutorial project: Create a new filter, select Add an Attribute qualification and choose the highest level attribute Country. Make sure to Qualify On: Elements. Click Prompt as the image shown below. Accept all the default values without any changes in the popped up window after clicking on "Prompt" as shown below. Save the filter as "Country Filter". Click "Prompt", choose "Use a filter to reduce the number of elements" option and select Country Filter created in last step, as shown below. Save the filter as "Region Filter". Select Filter definition prompt -> Choose from an attribute element list as shown below. Choose attribute Call Center and use the Region Filter created in the previous st...