Skip to main content

Designing a Normalized Database

Designing a Normalized Database from Microsoft

Tables representing propositions about entities of one type (that is, representing a single set) are fully normalized. Correct and complete mapping of a conceptual ORM model to a logical model yields fully normalized tables. Properly designed entities in an ER model lead to fully normalized tables as well. However, both ORM and ER modeling start with the business description of a problem; it is possible to miss some dependencies between entities and leave some tables denormalized. Of course, there could also be a bug in the tool that produces the DDL script from the ORM and ER models. However, any denormalization can lead to update anomalies. Data integrity and consistency are fundamental for databases. Remember that a database holds propositions, and propositions are facts. If propositions are not true, they are not facts; they are falsehoods. You need a logical method that yields a fully normalized database.
Normalization is the process of redesigning the model to unbundle any overlapping entities. The process involves decomposition; however, decomposition cannot yield a loss of information. You perform the decomposition by applying a linear progression of rules called normal forms. Normalization eliminates redundancy and incompleteness. Note the part that designers frequently overlook: normalization eliminates incompleteness, not just redundancy. Many normal forms (NFs) are defined; the first six are called first NF, second NF, third NF, Boyce-Codd NF, fourth NF, and fifth NF. If a database is in fifth NF, the database is fully normalized. Only the first three NFs are important; usually, if a database is in third NF, it is in fifth NF as well. You should understand the normalization form and use it to perform a final check of your database design, checking the model you created by using other methods.

First Normal Form

Imagine a table such as the one that Table 2-2 shows. The table holds information about sales. In this case, only the OrderId column is part of the primary key.
Cc505842.table_C02623422_2(en-us,TechNet.10).png
Table 2-2 Table Before First NF
With a design like this, you can have the following anomalies:
  • Insert How do you insert a customer without an order?
  • Update If item Bo is renamed, how do you perform an update?
  • Delete If order 3 is deleted, the data for customer 2 is lost.
  • Select How do you calculate the total quantity of bolts?
Note that only update and select anomalies deal with redundancy: they are problematic because the table contains redundant data. Insert and delete anomalies deal with incompleteness of the model. The rule for first NF is, “A table is in first normal form if all columns are atomic.” This means there can be no multi-valued columns—columns that would hold a collection such as an array or another table. First NF is somewhat redundant with the definition of a relational table or of a relation. A table is a relation if it fulfills the following conditions:
  • Values are atomic. The columns in a relational table are not a repeating group or arrays.
  • Columns are of the same kind. All values in a column come from the same domain.
  • Rows are unique. There is at least one column or set of columns, the values of which uniquely identify each row in the table.
  • The order of columns is insignificant. You can share the same table without worrying about table organization.
  • The sequence of rows is insignificant. A relational table can be retrieved in a different order and sequence.
  • Each column must have a unique name. This is required because the order of columns is not significant.
You can see in the example in Table 2-2 that the last column is multi-valued; it holds an array of items. Before starting with decomposition, let us briefly review the textual notation of a relational table. Remember the earlier example proposition, “Lubor Kollar was employed by Tail-spin Toys on March 19, 2004.” In a general form, you can write “Employee with (Name) was employed by (Company) on (EmploymentDate).” This generalized form of a proposition is a predicate. Terms in parentheses are value placeholders (entity attributes). A predicate defines the structure of a table. You can write the structure briefly as:
Employees(EmpId, EmployeeName, CompanyId, EmploymentDate) 
Underlined columns form the primary key. Actually, they form a candidate key, and a table can have multiple candidate keys. You could underline all candidate keys and double underline the primary key.
You decompose the table shown in Table 2-2 on the Items column. Every item leads to a new row, and every atomic piece of data of a single item (ProductId, ProductName, Quantity) leads to a new column. After the decomposition, you have multiple rows for a single order; therefore, you have to expand the primary key. You can compose the new primary key from the OrderId and ProductId columns. However, suppose you can allow multiple products on a single order, each time with a different discount, for example. Thus, you cannot use ProductId as part of the primary key. However, you can add the ItemId attribute and use it as a part of the new primary key. A decomposed table in first normal form would look like this:
Orders(OrderId, CustomerId, OrderDate, ItemId, ProductId, Quantity, 
ProductName)
Before moving to second NF, you have to understand a common misconception about first NF. You might have heard or read that you should not have a repeating group of columns. However, this advice is incorrect; repeating groups means you should not have a repeating group (that is, a collection) in a single column. For example, imagine this table:
Employees(EmployeeId, EmployeeName, Child1, Child2)
This table is perfectly in first NF. This design has a built-in constraint, allowing only employees who have two children. If you do not allow unknown (NULL) values for the Child1 and Child2 attributes, then you allow employees with exactly two children. This kind of constraint is not typical for business; nevertheless, it is a constraint built into the model, which is in first NF. Such constraints are rare, and a repeating group of columns typically represents a hidden collection. Take care not to decompose such groups automatically before checking whether this is a special constraint.

Second Normal Form

After achieving first NF, the decomposed table from Table 2-2 looks like Table 2-3.
Cc505842.table_C02623422_3(en-us,TechNet.10).png
Table 2-3 Table in First NF
You still have the following anomalies:
  • Insert How do you insert a customer without an order?
  • Update If customer 1 changes the order date for order 1, how do you perform the update? (In many places, possible inconsistencies could exist.)
  • Delete If you delete order 3, the data for customer 2 is lost.
To achieve second NF, a table must be in first NF, and every non-key column must be fully functionally dependent on the entire primary key. This means that no column can depend on part of the primary key only. In the example in Table 2-3, you know the customer and the order date if you know the value of the OrderId column; you do not need to know anything about ProductId, which is part of the primary key. The CustomerId and OrderDate columns depend on part of the primary key only—OrderId. To achieve second NF, you need to decompose the table into two tables:
Orders(OrderId, CustomerId, OrderDate)
OrderDetails(OrderId, ItemId, ProductId, Quantity, ProductName)
In the Orders table, you leave attributes that depend on OrderId only; then you introduce a new table, OrderDetails, to hold the other attributes. When achieving first NF, you are converting values from a multi-valued attribute to rows and changing the primary key; for second and all other NFs, you decompose tables into more tables. Second NF deals with relationships between columns that are part of a key and other columns.
After decomposing to multiple tables, you must have some common value that enables you to join the tables in queries; otherwise, you would lose some information. The decomposition has to be lossless. Of course, you need relationships between tables. A relationship is an association between two or more tables. Relationships are expressed in the data values of the primary and foreign keys. A primary key is a column or columns in a table whose values uniquely identify each row in the table. A foreign key is a column or columns whose values are the same as the primary key of another table—in other words, a copy of the primary key from another relational table. The relationship is made between two relational tables by matching the values of the foreign key with the values of the primary key.

Third Normal Form

After achieving second NF, the decomposed tables from Table 2-3 look like the tables in Table 2-4 and Table 2-5. Note that in the Orders table (Table 2-4), another attribute, CustomerName, is added to show that normalization violations can appear in any table.
Cc505842.Table_C02623422_4(en-us,TechNet.10).png
Table 2-4 Orders Table in Second NF
Cc505842.Table_C02623422_5(en-us,TechNet.10).png
Table 2-5 OrderDetails Table in Second NF
Second NF solves the update anomaly (if customer 1 changes the order date for order 1); however, you still have the following anomalies:
  • Insert How do you insert a customer without an order?
  • Delete If you delete order 3, the data for customer 2 is lost.
To achieve third NF, a table must be in second NF, and every non-key column must be non-transitively dependent on the primary key. For example, in Table 2-4, from OrderId, you can find CustomerId; then from CustomerId, you can get transitively to the CustomerName attribute value. Similarly, in Table 2-5, you can get transitively to ProductName through Pro-ductId from OrderId and ItemId. If you think of the rule for third NF from the non-key attributes point of view, it simply means you should have no functional dependencies between non-key columns. Non-key columns must depend on keys only. In the examples in Table 2-4 and Table 2-5, CustomerName depends on CustomerId, and ProductName depends on Pro-ductId. Thus, to achieve third NF, you must create new tables for dependencies between non-key columns:
Customers(CustomerId, CustomerName) Orders(OrderId, CustomerId, OrderDate) 
Products(ProductId, ProductName) 
OrderDetails(OrderId, ItemId, ProductId, Quantity)
This schema is free from all the update anomalies you had before normalization. However, it is not free from all update anomalies. For example, the schema itself cannot prevent you from inserting an unreasonable order date. (You will learn more about additional constraints in Chapter 3, “Designing a Physical Database.”) Note that this schema is also essentially the same (except for a couple of attributes omitted for the sake of brevity) as you received by using the ORM and ER approach. As mentioned earlier, use normalization for final checking and refining of your model.

Practice: Normalizing the Database

You are developing a database model that will support an application for managing projects (as in the Quick Check in Lesson 1, “Systematically Approaching Design Stages”). You collect the following information: each project has a single customer, each project can have many activities, and each project can have many employees assigned to it. You want to follow time spent (in hours) on projects by specific employee by activity for each day. Your initial design is:
Projects(ProjectId, ProjectName, CustomerId, CustomerName,
Activities(Activity1Id, Activity1Name, …, ActivityNId, 
ActivityNName), Employees(Employee1Id, Employee1Name, …, 
EmployeeNId, EmployeeNName), WorkDate, TimeSpent)

Exercise 1: Achieve the First Normal Form

In this exercise, you will bring this model to first NF. To achieve first NF, you need to eliminate all attributes that are collections.
  1. Check the Activities part of the table. Is this a collection?
  2. Check the Employees part of the table. Is this a collection? Your model should look like this:
Projects(ProjectId, ItemId, ProjectName, CustomerId, CustomerName, 
        ActivityId, ActivityName, EmployeeId, EmployeeName, WorkDate, 
        TimeSpent)

Exercise 2: Achieve the Second Normal Form

In this exercise, you will bring this model to second NF. To achieve second NF, you must make sure your model does not contain attributes that depend on only part of the primary key.
  1. The complete primary key in the table you created in Exercise 1, “Achieve the First Normal Form,” consists of ProjectId and ItemId.
  2. Do you really need both columns to find CustomerId and CustomerName associated with a project?
Your model should look like this:
Projects(ProjectId, ProjectName, CustomerId, CustomerName) 
        ProjectDetails(ProjectId, ItemId, ActivityId, ActivityName, 
        EmployeeId, EmployeeName, WorkDate, TimeSpent)

Exercise 3: Achieve the Third Normal Form

In this exercise, you will bring this model to third NF. To achieve third NF, you need to look at dependencies between non-key attributes.
  1. Is there any dependency between CustomerId and CustomerName?
  2. Is there any dependency between ActivityId and ActivityName?
  3. Is there any dependency between EmployeeId and EmployeeName? Your model should now look like this:
Projects(ProjectId, ProjectName, CustomerId)
       ProjectDetails(ProjectId, ItemId, ActivityId, EmployeeId, 
       WorkDate, TimeSpent) 
Customers(CustomerId, CustomerName) 
Activities(ActivityId, ActivityName)
Employees(EmployeeId, EmployeeName) 

Comments

Popular posts from this blog

Case functions Microstrategy

Ca se functions Microstrategy Case functions return specified data in a SQL query based on the evaluation of user-defined conditions. In general, a user specifies a list of conditions and corresponding return values. Case This function evaluates multiple expressions until a condition is determined to be true, then returns a corresponding value. If all conditions are false, a default value is returned.  Case  can be used for categorizing data based on multiple conditions. This is a single-value function. Syntax Case ( Condition1 ,  ReturnValue1 ,  Condition2 , ReturnValue2 ,...,  DefaultValue ) Example Case(([Total Revenue] < 300000), 0, ([Total Revenue] < 600000), 1, 2) sum(Case (Day@DESC in (“Sat”,”Sun”), Sales, 0) {~+} Sum(Case(Category@DESC In("Books","Electronics"),Revenue,0)){~+} CaseV (case vector) CaseV  evaluates a single metric and returns different values according to the results. It can be used to perfo...

Microstrategy Dossiers explained

Microstrategy  Dossiers With the release of MicroStrategy 10.9, we’ve taken a leap forward in our dashboarding capabilities by simplifying the user experience, adding storytelling, and collaboration.MSTR has  evolved dashboards to the point that they are more than dashboards - they are  interactive, collaborative analytic stories . Ultimately, it was time to go beyond dashboards, both in concept and in name, and so  the've  renamed VI dashboards to  ‘ dossiers ’.  Dossiers can be created by using the new Desktop product or Workstation or simply from the Web interface which replaces Visual Insights. All the existing visual Insights dashboards will be converted to Dossiers   With MicroStrategy 10.9, there was an active focus on making it easier to build dashboards for the widest audience of end users. To achieve this, some key new capabilities were added that make it easier to author, read, interact and collaborate on dashboards ...

Microstrategy Caches explained

Microstrategy Caches Improving Response Time: Caching A  cache is a result set that is stored on a system to improve response time in future requests.  With caching, users can retrieve results from Intelligence Server rather than re-executing queries against a database. To delete all object caches for a project 1 In Developer, log into a project. You must log in with a user account that has administrative privileges. 2 From the  Administration  menu, point to  Projects , and then select  Project Configuration . The Project Configuration Editor opens. 3 Expand  Caching , expand  Auxiliary Caches , then select  Objects . To delete all configuration object caches for a server 1 Log in to the project source. 2 From the  Administration  menu in Developer, point to  Server , and then select  Purge Server Object Caches . 4 Click  Purge Now . To purge web cache follow the steps in the link ...

mstrio – Python and R wrappers for the MicroStrategy

mstrio – Python and R wrappers for the MicroStrategy REST APIs Connecting to MicroStrategy  Create a connection to the Intelligence Server using   Connection()   and    connect()  in Python and R, respectively. Required arguments for the   Connection()  function are the URL for the MicroStrategy REST API server, MicroStrategy Intelligence Server username and password, as well as the MicroStrategy project name. By default, the   connect()  function anticipates your MicroStrategy Intelligence Server username and password. LDAP authentication is also supported. Use the optional argument    login_mode=16    in the    connect()  function for LDAP authentication.  Extract data from cubes and reports  To extract data from MicroStrategy cubes and reports, use the   get_cube()  and   get_report()  functions. Use...

MicroStrategy URL API Parameters

MicroStrategy URL Structure The following table summarizes the root URL structure used for every request to MicroStrategy Web. Environment Main Application URL Administration URL J2EE http://webserver/MicroStrategy/servlet/mstrWeb http://webserver/MicroStrategy/servlet/mstrWebAdmin .NET http://webserver/MicroStrategy/asp/Main.aspx http://webserver/MicroStrategy/asp/Admin.aspx Every request sent to MicroStrategy Web calls a central controller. Parameters are appended to  Main.aspx  or  mstrWeb  (in a .NET and J2EE environment, respectively) to indicate to the controller how the request should be internally forwarded and handled. The following examples show a URL for accessing a MicroStrategy folder when the user does not have an existing session. The URL contains not only the parameters needed to connect to MicroStrategy Web, but also the parameters needed to log on and create a session. J2EE environment: <a href="http:...

Microstrategy Dashboard performance improvements steps

Microstrategy  Dashboard performance improvements steps: Many times, causes of poor performance can be simplified to specific components. To troubleshoot performance issues, users must identify these components, then make the appropriate modifications to the environment and/or to the MicroStrategy dashboard to reduce bottlenecks. Dashboard execution stages can be represented below: MicroStrategy Intelligence Server When an end user makes a  Document Execution Request  through any client (a web browser via MicroStrategy Web, the MicroStrategy Desktop/Developer client, the MicroStrategy Mobile app, or the MicroStrategy Office client), the request is sent to the MicroStrategy Intelligence Server, which processes the request and prepares the response. The MicroStrategy Intelligence Server will execute all children datasets on the dashboard by either generating SQL and running this against the data warehouse, or by fetching data from a cache. The Inte...

Derived metric based on attribute values

Derived metric based on attribute values Here is how could create and display data correctly on using below simple steps.  Create a report with Category, Subcategory and Revenue. Create New Metric in a report or VI.  Case((Category@ID = 1), Revenue, 0) Booksand Name it as Revenue for  where 2 is Category ID for "Books"  Report will display result as below.  Result for new metric is blank. Now to fix this create a new Derived metric on Category attribute first with formula as  Max(Category) {~ }  and calling Books Now Edit the "Revenue for Books metric and Replace Category@ID with this new Books metric formula would looks like this  Case((Books = 1), Revenue, 0).  Report result would now display as expected as shown below

Export a Report Services document to Excel with formatting using URL API

Export a Report Services document to Excel with formatting using URL API in MSTR Web In order to export a document in excel format using the URL API, the executionMode must be set to 4.  If excutionMode is not provided in the URL, by default PDF will be used as executionMode.   Below are the list of parameters that the URL must contain in order to execute correctly.   evt= 3069 src= Main.aspx.3069 executionMode= 4 documentID= 7E1644CA424F482DA811569FCE8127FF( Replace the document Id with your document ID)   Sample URL for .NET environment: http://WebServerName/MicroStrategy/asp/Main.aspx?evt=3069&src=Main.aspx. 3069 &executionMode= 4 &documentID= 7E1644CA424F482DA811569FCE8127FF    

Microstrategy Custom number formatting symbols

Custom number formatting symbols If none of the built-in number formats meet your needs, you can create your own custom format in the Number tab of the Format Cells dialog box. Select  Custom  as the Category and create the format using the number format symbols listed in the table below. Each custom format can have up to four optional sections, one each for: Positive numbers Negative numbers Zeros Text Each section is optional. Separate the sections by semicolons, as shown in the example below: #,###;(#,###);0;"Error: Entry must be numeric" For more examples, see  Custom number formatting examples . To jump to a section of the formatting symbol table, click one of the following: Numeric symbols Character/text symbols Date and time symbols Text color symbols Currency symbols Conditional symbols Numeric symbols For details on how numeric symbols apply to the Big Decimal data type, refer to the  Project Design Guide . ...

Custom Subtotal Displays in MicroStrategy

Defining custom subtotal displays in MicroStrategy By default, when users apply subtotals in a report, the name of the subtotal is displayed in the subtotal line items that appear in the report. Users can use custom subtotals to give more control over the characteristics of a subtotal. Custom subtotals allow users to define custom subtotal line items that appear on the reports  U sers can make the subtotal name dynamic by typing special characters in the subtotal name field as listed in the following table. Character Description #A The name of the attribute under which the subtotal appears. #P The name of the attribute to the left of, or above the attribute under which the subtotal appears. #0 All the forms of the parent element. #1 The first form of the parent element reading from left to right or from top to bottom. #2 The second form of the parent element reading from left to right or from top to bottom. #3 The third form of th...