
Designing a Normalized Database

Designing a Normalized Database from Microsoft

Tables representing propositions about entities of one type (that is, representing a single set) are fully normalized. Correct and complete mapping of a conceptual ORM model to a logical model yields fully normalized tables. Properly designed entities in an ER model lead to fully normalized tables as well. However, both ORM and ER modeling start with the business description of a problem; it is possible to miss some dependencies between entities and leave some tables denormalized. Of course, there could also be a bug in the tool that produces the DDL script from the ORM and ER models. However, any denormalization can lead to update anomalies. Data integrity and consistency are fundamental for databases. Remember that a database holds propositions, and propositions are facts. If propositions are not true, they are not facts; they are falsehoods. You need a logical method that yields a fully normalized database.
Normalization is the process of redesigning the model to unbundle any overlapping entities. The process involves decomposition; however, the decomposition must not lose information. You perform the decomposition by applying a linear progression of rules called normal forms. Normalization eliminates redundancy and incompleteness. Note the part that designers frequently overlook: normalization eliminates incompleteness, not just redundancy. Many normal forms (NFs) are defined; the first six are called first NF, second NF, third NF, Boyce-Codd NF, fourth NF, and fifth NF. If a database is in fifth NF, the database is fully normalized. In practice, the first three NFs are the most important; usually, if a database is in third NF, it is in fifth NF as well. You should understand the normal forms and use them to perform a final check of your database design, checking the model you created by using other methods.

First Normal Form

Imagine a table such as the one that Table 2-2 shows. The table holds information about sales. In this case, only the OrderId column is part of the primary key.
Table 2-2 Table Before First NF
With a design like this, you can have the following anomalies:
  • Insert: How do you insert a customer without an order?
  • Update: If item Bo is renamed, how do you perform an update?
  • Delete: If order 3 is deleted, the data for customer 2 is lost.
  • Select: How do you calculate the total quantity of bolts?
Note that only update and select anomalies deal with redundancy: they are problematic because the table contains redundant data. Insert and delete anomalies deal with incompleteness of the model. The rule for first NF is, “A table is in first normal form if all columns are atomic.” This means there can be no multi-valued columns—columns that would hold a collection such as an array or another table. First NF is somewhat redundant with the definition of a relational table or of a relation. A table is a relation if it fulfills the following conditions:
  • Values are atomic. The columns in a relational table do not hold repeating groups or arrays.
  • Columns are of the same kind. All values in a column come from the same domain.
  • Rows are unique. There is at least one column or set of columns, the values of which uniquely identify each row in the table.
  • The order of columns is insignificant. You can share the same table without worrying about table organization.
  • The sequence of rows is insignificant. A relational table can be retrieved in a different order and sequence.
  • Each column must have a unique name. This is required because the order of columns is not significant.
You can see in the example in Table 2-2 that the last column is multi-valued; it holds an array of items. Before starting with decomposition, let us briefly review the textual notation of a relational table. Remember the earlier example proposition, “Lubor Kollar was employed by Tailspin Toys on March 19, 2004.” In a general form, you can write “Employee with (Name) was employed by (Company) on (EmploymentDate).” This generalized form of a proposition is a predicate. Terms in parentheses are value placeholders (entity attributes). A predicate defines the structure of a table. You can write the structure briefly as:
Employees(EmpId, EmployeeName, CompanyId, EmploymentDate) 
Underlined columns form the primary key. Actually, they form a candidate key, and a table can have multiple candidate keys. You could underline all candidate keys and double underline the primary key.
You decompose the table shown in Table 2-2 on the Items column. Every item leads to a new row, and every atomic piece of data of a single item (ProductId, ProductName, Quantity) leads to a new column. After the decomposition, you have multiple rows for a single order; therefore, you have to expand the primary key. You can compose the new primary key from the OrderId and ProductId columns. However, suppose you can allow multiple products on a single order, each time with a different discount, for example. Thus, you cannot use ProductId as part of the primary key. However, you can add the ItemId attribute and use it as a part of the new primary key. A decomposed table in first normal form would look like this:
Orders(OrderId, CustomerId, OrderDate, ItemId, ProductId, Quantity, 
ProductName)
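The decomposition on the Items column can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows are invented (Table 2-2 is not reproduced here), the layout of each item as a (ProductId, ProductName, Quantity) tuple is assumed, and `to_first_nf` is a hypothetical helper name.

```python
# Hypothetical unnormalized rows: the "Items" value is a collection,
# which is what violates first NF. All data here is invented.
unnormalized = [
    {"OrderId": 1, "CustomerId": 1, "OrderDate": "2006-10-15",
     "Items": [("Bo", "Bolt", 5), ("Sc", "Screw", 2)]},
    {"OrderId": 2, "CustomerId": 1, "OrderDate": "2006-10-17",
     "Items": [("Bo", "Bolt", 3)]},
]


def to_first_nf(rows):
    """Turn every item into its own row and number it with ItemId."""
    result = []
    for row in rows:
        items = enumerate(row["Items"], start=1)
        for item_id, (product_id, product_name, quantity) in items:
            result.append({
                "OrderId": row["OrderId"],
                "CustomerId": row["CustomerId"],
                "OrderDate": row["OrderDate"],
                "ItemId": item_id,          # second part of the new key
                "ProductId": product_id,
                "Quantity": quantity,
                "ProductName": product_name,
            })
    return result


first_nf = to_first_nf(unnormalized)

# With atomic columns, the "select" question becomes a simple aggregate.
total_bolts = sum(r["Quantity"] for r in first_nf if r["ProductId"] == "Bo")
```

Each (OrderId, ItemId) pair identifies exactly one row of the result, which is why those two columns can serve as the expanded primary key.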
Before moving to second NF, you have to understand a common misconception about first NF. You might have heard or read that you should not have a repeating group of columns. However, this advice is imprecise; the rule against repeating groups means you should not have a repeating group (that is, a collection) in a single column. For example, imagine this table:
Employees(EmployeeId, EmployeeName, Child1, Child2)
This table is perfectly in first NF. This design has a built-in constraint, allowing only employees who have two children. If you do not allow unknown (NULL) values for the Child1 and Child2 attributes, then you allow employees with exactly two children. This kind of constraint is not typical for business; nevertheless, it is a constraint built into the model, which is in first NF. Such constraints are rare, and a repeating group of columns typically represents a hidden collection. Take care not to decompose such groups automatically before checking whether this is a special constraint.

Second Normal Form

After achieving first NF, the decomposed table from Table 2-2 looks like Table 2-3.
Table 2-3 Table in First NF
You still have the following anomalies:
  • Insert: How do you insert a customer without an order?
  • Update: If customer 1 changes the order date for order 1, how do you perform the update? (In many places, possible inconsistencies could exist.)
  • Delete: If you delete order 3, the data for customer 2 is lost.
To achieve second NF, a table must be in first NF, and every non-key column must be fully functionally dependent on the entire primary key. This means that no column can depend on part of the primary key only. In the example in Table 2-3, you know the customer and the order date if you know the value of the OrderId column; you do not need to know anything about ItemId, which is the other part of the primary key. The CustomerId and OrderDate columns depend on part of the primary key only: OrderId. To achieve second NF, you need to decompose the table into two tables:
Orders(OrderId, CustomerId, OrderDate)
OrderDetails(OrderId, ItemId, ProductId, Quantity, ProductName)
In the Orders table, you leave attributes that depend on OrderId only; then you introduce a new table, OrderDetails, to hold the other attributes. When achieving first NF, you are converting values from a multi-valued attribute to rows and changing the primary key; for second and all other NFs, you decompose tables into more tables. Second NF deals with relationships between columns that are part of a key and other columns.
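A partial dependency can be probed on sample data with a small functional-dependency check. The sketch below is an assumption-laden illustration: `fd_holds` is a hypothetical helper, and the rows are invented. Keep in mind that sample data can only refute a dependency; whether a dependency really holds is a business rule, not something the data proves.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if, in these rows, equal determinant values always
    come with equal dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # Remember the first value seen for this key; any later
        # mismatch refutes the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Invented rows in first NF; the primary key is (OrderId, ItemId).
rows = [
    {"OrderId": 1, "ItemId": 1, "CustomerId": 1,
     "OrderDate": "2006-10-15", "ProductId": "Bo", "Quantity": 5},
    {"OrderId": 1, "ItemId": 2, "CustomerId": 1,
     "OrderDate": "2006-10-15", "ProductId": "Sc", "Quantity": 2},
    {"OrderId": 2, "ItemId": 1, "CustomerId": 2,
     "OrderDate": "2006-10-17", "ProductId": "Bo", "Quantity": 3},
]

# CustomerId and OrderDate follow from OrderId alone: a partial dependency.
partial = fd_holds(rows, ["OrderId"], ["CustomerId", "OrderDate"])

# Quantity needs the whole key: OrderId alone does not determine it.
quantity_from_order = fd_holds(rows, ["OrderId"], ["Quantity"])
quantity_from_key = fd_holds(rows, ["OrderId", "ItemId"], ["Quantity"])
```

Columns for which the check succeeds with only part of the key are the ones that move to the Orders table.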
After decomposing to multiple tables, you must have some common value that enables you to join the tables in queries; otherwise, you would lose some information. The decomposition has to be lossless. Of course, you need relationships between tables. A relationship is an association between two or more tables. Relationships are expressed in the data values of the primary and foreign keys. A primary key is a column or columns in a table whose values uniquely identify each row in the table. A foreign key is a column or columns whose values are the same as the primary key of another table—in other words, a copy of the primary key from another relational table. The relationship is made between two relational tables by matching the values of the foreign key with the values of the primary key.
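The lossless property and the primary key/foreign key relationship can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, assuming SQLite as a stand-in database engine; the column types and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
CREATE TABLE Orders (
    OrderId    INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL,
    OrderDate  TEXT NOT NULL
);
CREATE TABLE OrderDetails (
    OrderId     INTEGER NOT NULL REFERENCES Orders(OrderId),
    ItemId      INTEGER NOT NULL,
    ProductId   TEXT NOT NULL,
    Quantity    INTEGER NOT NULL,
    ProductName TEXT NOT NULL,
    PRIMARY KEY (OrderId, ItemId)
);
""")

# Invented first-NF rows:
# (OrderId, CustomerId, OrderDate, ItemId, ProductId, Quantity, ProductName)
flat = [
    (1, 1, "2006-10-15", 1, "Bo", 5, "Bolt"),
    (1, 1, "2006-10-15", 2, "Sc", 2, "Screw"),
    (2, 2, "2006-10-17", 1, "Bo", 3, "Bolt"),
]

# Decompose: order-level columns go to Orders, the rest to OrderDetails.
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 sorted({r[:3] for r in flat}))
conn.executemany("INSERT INTO OrderDetails VALUES (?, ?, ?, ?, ?)",
                 [(r[0],) + r[3:] for r in flat])

# Joining on the common OrderId value reconstructs the original rows.
rejoined = conn.execute("""
    SELECT o.OrderId, o.CustomerId, o.OrderDate,
           d.ItemId, d.ProductId, d.Quantity, d.ProductName
    FROM Orders AS o
    JOIN OrderDetails AS d ON d.OrderId = o.OrderId
    ORDER BY o.OrderId, d.ItemId
""").fetchall()
lossless = rejoined == sorted(flat)

# The foreign key rejects a detail row that points at a missing order.
try:
    conn.execute("INSERT INTO OrderDetails VALUES (99, 1, 'Bo', 1, 'Bolt')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The copy of OrderId kept in OrderDetails is what makes the join possible; without it, the decomposition would lose the link between an order and its items.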

Third Normal Form

After achieving second NF, the decomposed tables from Table 2-3 look like the tables in Table 2-4 and Table 2-5. Note that in the Orders table (Table 2-4), another attribute, CustomerName, is added to show that normalization violations can appear in any table.
Table 2-4 Orders Table in Second NF
Table 2-5 OrderDetails Table in Second NF
Second NF solves the update anomaly (if customer 1 changes the order date for order 1); however, you still have the following anomalies:
  • Insert: How do you insert a customer without an order?
  • Delete: If you delete order 3, the data for customer 2 is lost.
To achieve third NF, a table must be in second NF, and every non-key column must be non-transitively dependent on the primary key. For example, in Table 2-4, from OrderId, you can find CustomerId; then from CustomerId, you can get transitively to the CustomerName attribute value. Similarly, in Table 2-5, you can get transitively to ProductName through ProductId from OrderId and ItemId. If you think of the rule for third NF from the non-key attributes point of view, it simply means you should have no functional dependencies between non-key columns. Non-key columns must depend on keys only. In the examples in Table 2-4 and Table 2-5, CustomerName depends on CustomerId, and ProductName depends on ProductId. Thus, to achieve third NF, you must create new tables for dependencies between non-key columns:
Customers(CustomerId, CustomerName)
Orders(OrderId, CustomerId, OrderDate)
Products(ProductId, ProductName)
OrderDetails(OrderId, ItemId, ProductId, Quantity)
This schema is free from all the update anomalies you had before normalization. However, it is not free from all update anomalies. For example, the schema itself cannot prevent you from inserting an unreasonable order date. (You will learn more about additional constraints in Chapter 3, “Designing a Physical Database.”) Note that this schema is also essentially the same (except for a couple of attributes omitted for the sake of brevity) as the one you obtained by using the ORM and ER approach. As mentioned earlier, use normalization for final checking and refining of your model.
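To see how the third NF schema handles the insert, delete, and update anomalies listed earlier, you can build it in an in-memory SQLite database from Python. Treat this as a sketch only: SQLite, the column types, the customer names, and all sample values are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customers (
    CustomerId   INTEGER PRIMARY KEY,
    CustomerName TEXT NOT NULL
);
CREATE TABLE Orders (
    OrderId    INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES Customers(CustomerId),
    OrderDate  TEXT NOT NULL
);
CREATE TABLE Products (
    ProductId   TEXT PRIMARY KEY,
    ProductName TEXT NOT NULL
);
CREATE TABLE OrderDetails (
    OrderId   INTEGER NOT NULL REFERENCES Orders(OrderId),
    ItemId    INTEGER NOT NULL,
    ProductId TEXT NOT NULL REFERENCES Products(ProductId),
    Quantity  INTEGER NOT NULL,
    PRIMARY KEY (OrderId, ItemId)
);
-- Invented sample data.
INSERT INTO Customers VALUES (1, 'Customer One'), (2, 'Customer Two');
INSERT INTO Products VALUES ('Bo', 'Bolt'), ('Sc', 'Screw');
INSERT INTO Orders VALUES (1, 1, '2006-10-15'), (3, 2, '2006-10-18');
INSERT INTO OrderDetails VALUES (1, 1, 'Bo', 5), (1, 2, 'Sc', 2), (3, 1, 'Bo', 3);
""")

# Insert: a customer can now exist without any order.
conn.execute("INSERT INTO Customers VALUES (3, 'Customer Three')")

# Delete: removing order 3 (details first, then the order) keeps customer 2.
conn.execute("DELETE FROM OrderDetails WHERE OrderId = 3")
conn.execute("DELETE FROM Orders WHERE OrderId = 3")
customer_2 = conn.execute(
    "SELECT CustomerName FROM Customers WHERE CustomerId = 2").fetchone()

# Update: renaming a product touches exactly one row.
renamed_rows = conn.execute(
    "UPDATE Products SET ProductName = 'Hex bolt' WHERE ProductId = 'Bo'"
).rowcount
```

The schema still accepts any text as an order date, which matches the point above: normalization alone does not supply every constraint.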

Practice: Normalizing the Database

You are developing a database model that will support an application for managing projects (as in the Quick Check in Lesson 1, “Systematically Approaching Design Stages”). You collect the following information: each project has a single customer, each project can have many activities, and each project can have many employees assigned to it. You want to follow time spent (in hours) on projects by specific employee by activity for each day. Your initial design is:
Projects(ProjectId, ProjectName, CustomerId, CustomerName,
Activities(Activity1Id, Activity1Name, …, ActivityNId, 
ActivityNName), Employees(Employee1Id, Employee1Name, …, 
EmployeeNId, EmployeeNName), WorkDate, TimeSpent)

Exercise 1: Achieve the First Normal Form

In this exercise, you will bring this model to first NF. To achieve first NF, you need to eliminate all attributes that are collections.
  1. Check the Activities part of the table. Is this a collection?
  2. Check the Employees part of the table. Is this a collection?
Your model should look like this:
Projects(ProjectId, ItemId, ProjectName, CustomerId, CustomerName,
ActivityId, ActivityName, EmployeeId, EmployeeName, WorkDate,
TimeSpent)

Exercise 2: Achieve the Second Normal Form

In this exercise, you will bring this model to second NF. To achieve second NF, you must make sure your model does not contain attributes that depend on only part of the primary key.
  1. The complete primary key in the table you created in Exercise 1, “Achieve the First Normal Form,” consists of ProjectId and ItemId.
  2. Do you really need both columns to find CustomerId and CustomerName associated with a project?
Your model should look like this:
Projects(ProjectId, ProjectName, CustomerId, CustomerName)
ProjectDetails(ProjectId, ItemId, ActivityId, ActivityName,
EmployeeId, EmployeeName, WorkDate, TimeSpent)

Exercise 3: Achieve the Third Normal Form

In this exercise, you will bring this model to third NF. To achieve third NF, you need to look at dependencies between non-key attributes.
  1. Is there any dependency between CustomerId and CustomerName?
  2. Is there any dependency between ActivityId and ActivityName?
  3. Is there any dependency between EmployeeId and EmployeeName?
Your model should now look like this:
Projects(ProjectId, ProjectName, CustomerId)
ProjectDetails(ProjectId, ItemId, ActivityId, EmployeeId,
WorkDate, TimeSpent)
Customers(CustomerId, CustomerName) 
Activities(ActivityId, ActivityName)
Employees(EmployeeId, EmployeeName) 
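As a final check of the practice model, you can create the five tables and confirm that the original requirement (time spent on projects by employee and activity) is still answerable with joins. The sketch below uses Python's sqlite3 module; the engine, column types, names, dates, and hours are all invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customers  (CustomerId INTEGER PRIMARY KEY, CustomerName TEXT NOT NULL);
CREATE TABLE Activities (ActivityId INTEGER PRIMARY KEY, ActivityName TEXT NOT NULL);
CREATE TABLE Employees  (EmployeeId INTEGER PRIMARY KEY, EmployeeName TEXT NOT NULL);
CREATE TABLE Projects (
    ProjectId   INTEGER PRIMARY KEY,
    ProjectName TEXT NOT NULL,
    CustomerId  INTEGER NOT NULL REFERENCES Customers(CustomerId)
);
CREATE TABLE ProjectDetails (
    ProjectId  INTEGER NOT NULL REFERENCES Projects(ProjectId),
    ItemId     INTEGER NOT NULL,
    ActivityId INTEGER NOT NULL REFERENCES Activities(ActivityId),
    EmployeeId INTEGER NOT NULL REFERENCES Employees(EmployeeId),
    WorkDate   TEXT NOT NULL,
    TimeSpent  REAL NOT NULL,          -- hours
    PRIMARY KEY (ProjectId, ItemId)
);
-- Invented sample data.
INSERT INTO Customers  VALUES (1, 'Customer One');
INSERT INTO Activities VALUES (1, 'Design'), (2, 'Testing');
INSERT INTO Employees  VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO Projects   VALUES (1, 'Project One', 1);
INSERT INTO ProjectDetails VALUES
    (1, 1, 1, 1, '2024-01-08', 4.0),
    (1, 2, 1, 1, '2024-01-09', 3.5),
    (1, 3, 2, 2, '2024-01-09', 2.0);
""")

# Hours per employee and activity for project 1.
hours = conn.execute("""
    SELECT e.EmployeeName, a.ActivityName, SUM(d.TimeSpent)
    FROM ProjectDetails AS d
    JOIN Employees  AS e ON e.EmployeeId = d.EmployeeId
    JOIN Activities AS a ON a.ActivityId = d.ActivityId
    WHERE d.ProjectId = 1
    GROUP BY e.EmployeeName, a.ActivityName
    ORDER BY e.EmployeeName, a.ActivityName
""").fetchall()
```

Names now live in one place each (Customers, Activities, Employees), so the report picks them up through the foreign keys instead of repeating them in every detail row.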
