Skip to main content

Designing a Normalized Database

Designing a Normalized Database from Microsoft

Tables representing propositions about entities of one type (that is, representing a single set) are fully normalized. Correct and complete mapping of a conceptual ORM model to a logical model yields fully normalized tables. Properly designed entities in an ER model lead to fully normalized tables as well. However, both ORM and ER modeling start with the business description of a problem; it is possible to miss some dependencies between entities and leave some tables denormalized. Of course, there could also be a bug in the tool that produces the DDL script from the ORM and ER models. However, any denormalization can lead to update anomalies. Data integrity and consistency are fundamental for databases. Remember that a database holds propositions, and propositions are facts. If propositions are not true, they are not facts; they are falsehoods. You need a logical method that yields a fully normalized database.
Normalization is the process of redesigning the model to unbundle any overlapping entities. The process involves decomposition; however, decomposition cannot yield a loss of information. You perform the decomposition by applying a linear progression of rules called normal forms. Normalization eliminates redundancy and incompleteness. Note the part that designers frequently overlook: normalization eliminates incompleteness, not just redundancy. Many normal forms (NFs) are defined; the first six are called first NF, second NF, third NF, Boyce-Codd NF, fourth NF, and fifth NF. If a database is in fifth NF, the database is fully normalized. Only the first three NFs are important; usually, if a database is in third NF, it is in fifth NF as well. You should understand the normalization form and use it to perform a final check of your database design, checking the model you created by using other methods.

First Normal Form

Imagine a table such as the one that Table 2-2 shows. The table holds information about sales. In this case, only the OrderId column is part of the primary key.
Cc505842.table_C02623422_2(en-us,TechNet.10).png
Table 2-2 Table Before First NF
With a design like this, you can have the following anomalies:
  • Insert How do you insert a customer without an order?
  • Update If item Bo is renamed, how do you perform an update?
  • Delete If order 3 is deleted, the data for customer 2 is lost.
  • Select How do you calculate the total quantity of bolts?
Note that only update and select anomalies deal with redundancy: they are problematic because the table contains redundant data. Insert and delete anomalies deal with incompleteness of the model. The rule for first NF is, “A table is in first normal form if all columns are atomic.” This means there can be no multi-valued columns—columns that would hold a collection such as an array or another table. First NF is somewhat redundant with the definition of a relational table or of a relation. A table is a relation if it fulfills the following conditions:
  • Values are atomic. The columns in a relational table are not a repeating group or arrays.
  • Columns are of the same kind. All values in a column come from the same domain.
  • Rows are unique. There is at least one column or set of columns, the values of which uniquely identify each row in the table.
  • The order of columns is insignificant. You can share the same table without worrying about table organization.
  • The sequence of rows is insignificant. A relational table can be retrieved in a different order and sequence.
  • Each column must have a unique name. This is required because the order of columns is not significant.
You can see in the example in Table 2-2 that the last column is multi-valued; it holds an array of items. Before starting with decomposition, let us briefly review the textual notation of a relational table. Remember the earlier example proposition, “Lubor Kollar was employed by Tail-spin Toys on March 19, 2004.” In a general form, you can write “Employee with (Name) was employed by (Company) on (EmploymentDate).” This generalized form of a proposition is a predicate. Terms in parentheses are value placeholders (entity attributes). A predicate defines the structure of a table. You can write the structure briefly as:
Employees(EmpId, EmployeeName, CompanyId, EmploymentDate) 
Underlined columns form the primary key. Actually, they form a candidate key, and a table can have multiple candidate keys. You could underline all candidate keys and double underline the primary key.
You decompose the table shown in Table 2-2 on the Items column. Every item leads to a new row, and every atomic piece of data of a single item (ProductId, ProductName, Quantity) leads to a new column. After the decomposition, you have multiple rows for a single order; therefore, you have to expand the primary key. You can compose the new primary key from the OrderId and ProductId columns. However, suppose you can allow multiple products on a single order, each time with a different discount, for example. Thus, you cannot use ProductId as part of the primary key. However, you can add the ItemId attribute and use it as a part of the new primary key. A decomposed table in first normal form would look like this:
Orders(OrderId, CustomerId, OrderDate, ItemId, ProductId, Quantity, 
ProductName)
Before moving to second NF, you have to understand a common misconception about first NF. You might have heard or read that you should not have a repeating group of columns. However, this advice is incorrect; repeating groups means you should not have a repeating group (that is, a collection) in a single column. For example, imagine this table:
Employees(EmployeeId, EmployeeName, Child1, Child2)
This table is perfectly in first NF. This design has a built-in constraint, allowing only employees who have two children. If you do not allow unknown (NULL) values for the Child1 and Child2 attributes, then you allow employees with exactly two children. This kind of constraint is not typical for business; nevertheless, it is a constraint built into the model, which is in first NF. Such constraints are rare, and a repeating group of columns typically represents a hidden collection. Take care not to decompose such groups automatically before checking whether this is a special constraint.

Second Normal Form

After achieving first NF, the decomposed table from Table 2-2 looks like Table 2-3.
Cc505842.table_C02623422_3(en-us,TechNet.10).png
Table 2-3 Table in First NF
You still have the following anomalies:
  • Insert How do you insert a customer without an order?
  • Update If customer 1 changes the order date for order 1, how do you perform the update? (In many places, possible inconsistencies could exist.)
  • Delete If you delete order 3, the data for customer 2 is lost.
To achieve second NF, a table must be in first NF, and every non-key column must be fully functionally dependent on the entire primary key. This means that no column can depend on part of the primary key only. In the example in Table 2-3, you know the customer and the order date if you know the value of the OrderId column; you do not need to know anything about ProductId, which is part of the primary key. The CustomerId and OrderDate columns depend on part of the primary key only—OrderId. To achieve second NF, you need to decompose the table into two tables:
Orders(OrderId, CustomerId, OrderDate)
OrderDetails(OrderId, ItemId, ProductId, Quantity, ProductName)
In the Orders table, you leave attributes that depend on OrderId only; then you introduce a new table, OrderDetails, to hold the other attributes. When achieving first NF, you are converting values from a multi-valued attribute to rows and changing the primary key; for second and all other NFs, you decompose tables into more tables. Second NF deals with relationships between columns that are part of a key and other columns.
After decomposing to multiple tables, you must have some common value that enables you to join the tables in queries; otherwise, you would lose some information. The decomposition has to be lossless. Of course, you need relationships between tables. A relationship is an association between two or more tables. Relationships are expressed in the data values of the primary and foreign keys. A primary key is a column or columns in a table whose values uniquely identify each row in the table. A foreign key is a column or columns whose values are the same as the primary key of another table—in other words, a copy of the primary key from another relational table. The relationship is made between two relational tables by matching the values of the foreign key with the values of the primary key.

Third Normal Form

After achieving second NF, the decomposed tables from Table 2-3 look like the tables in Table 2-4 and Table 2-5. Note that in the Orders table (Table 2-4), another attribute, CustomerName, is added to show that normalization violations can appear in any table.
Cc505842.Table_C02623422_4(en-us,TechNet.10).png
Table 2-4 Orders Table in Second NF
Cc505842.Table_C02623422_5(en-us,TechNet.10).png
Table 2-5 OrderDetails Table in Second NF
Second NF solves the update anomaly (if customer 1 changes the order date for order 1); however, you still have the following anomalies:
  • Insert How do you insert a customer without an order?
  • Delete If you delete order 3, the data for customer 2 is lost.
To achieve third NF, a table must be in second NF, and every non-key column must be non-transitively dependent on the primary key. For example, in Table 2-4, from OrderId, you can find CustomerId; then from CustomerId, you can get transitively to the CustomerName attribute value. Similarly, in Table 2-5, you can get transitively to ProductName through Pro-ductId from OrderId and ItemId. If you think of the rule for third NF from the non-key attributes point of view, it simply means you should have no functional dependencies between non-key columns. Non-key columns must depend on keys only. In the examples in Table 2-4 and Table 2-5, CustomerName depends on CustomerId, and ProductName depends on Pro-ductId. Thus, to achieve third NF, you must create new tables for dependencies between non-key columns:
Customers(CustomerId, CustomerName) Orders(OrderId, CustomerId, OrderDate) 
Products(ProductId, ProductName) 
OrderDetails(OrderId, ItemId, ProductId, Quantity)
This schema is free from all the update anomalies you had before normalization. However, it is not free from all update anomalies. For example, the schema itself cannot prevent you from inserting an unreasonable order date. (You will learn more about additional constraints in Chapter 3, “Designing a Physical Database.”) Note that this schema is also essentially the same (except for a couple of attributes omitted for the sake of brevity) as you received by using the ORM and ER approach. As mentioned earlier, use normalization for final checking and refining of your model.

Practice: Normalizing the Database

You are developing a database model that will support an application for managing projects (as in the Quick Check in Lesson 1, “Systematically Approaching Design Stages”). You collect the following information: each project has a single customer, each project can have many activities, and each project can have many employees assigned to it. You want to follow time spent (in hours) on projects by specific employee by activity for each day. Your initial design is:
Projects(ProjectId, ProjectName, CustomerId, CustomerName,
Activities(Activity1Id, Activity1Name, …, ActivityNId, 
ActivityNName), Employees(Employee1Id, Employee1Name, …, 
EmployeeNId, EmployeeNName), WorkDate, TimeSpent)

Exercise 1: Achieve the First Normal Form

In this exercise, you will bring this model to first NF. To achieve first NF, you need to eliminate all attributes that are collections.
  1. Check the Activities part of the table. Is this a collection?
  2. Check the Employees part of the table. Is this a collection? Your model should look like this:
Projects(ProjectId, ItemId, ProjectName, CustomerId, CustomerName, 
        ActivityId, ActivityName, EmployeeId, EmployeeName, WorkDate, 
        TimeSpent)

Exercise 2: Achieve the Second Normal Form

In this exercise, you will bring this model to second NF. To achieve second NF, you must make sure your model does not contain attributes that depend on only part of the primary key.
  1. The complete primary key in the table you created in Exercise 1, “Achieve the First Normal Form,” consists of ProjectId and ItemId.
  2. Do you really need both columns to find CustomerId and CustomerName associated with a project?
Your model should look like this:
Projects(ProjectId, ProjectName, CustomerId, CustomerName) 
        ProjectDetails(ProjectId, ItemId, ActivityId, ActivityName, 
        EmployeeId, EmployeeName, WorkDate, TimeSpent)

Exercise 3: Achieve the Third Normal Form

In this exercise, you will bring this model to third NF. To achieve third NF, you need to look at dependencies between non-key attributes.
  1. Is there any dependency between CustomerId and CustomerName?
  2. Is there any dependency between ActivityId and ActivityName?
  3. Is there any dependency between EmployeeId and EmployeeName? Your model should now look like this:
Projects(ProjectId, ProjectName, CustomerId)
       ProjectDetails(ProjectId, ItemId, ActivityId, EmployeeId, 
       WorkDate, TimeSpent) 
Customers(CustomerId, CustomerName) 
Activities(ActivityId, ActivityName)
Employees(EmployeeId, EmployeeName) 

Comments

Popular posts from this blog

No 'Alert' option appear when trying to create an alert-based subscription in MicroStrategy Distribution Services

The 'Alert' option does not appear when attempting to create an alert-based subscription in MicroStrategy Distribution Services In MicroStrategy Distribution Service 9.x and 10.x, and 11.x versions it is possible to create an alert-based subscription. When right-clicking the metric header of a report in MicroStrategy Web 9.0.x, the 'Alerts' option does not appear:    Cause : This issue occurs because the user attempting to create the alert does  not have all of the necessary privileges on alerts.   Fix : In order to create an alert-based subscription, the following privileges are required: In order ti get permissions to create alerts the user should be given the following privileges by the admin: New Version of Microstrategy 11.x: Server- Distribution: Older Versions of Microstrategy 9.x, 10.x etc..: Web Reporter > Web user Web Analyst > Web create alert   ...

Custom Subtotal Displays in MicroStrategy

Defining custom subtotal displays in MicroStrategy By default, when users apply subtotals in a report, the name of the subtotal is displayed in the subtotal line items that appear in the report. Users can use custom subtotals to give more control over the characteristics of a subtotal. Custom subtotals allow users to define custom subtotal line items that appear on the reports  U sers can make the subtotal name dynamic by typing special characters in the subtotal name field as listed in the following table. Character Description #A The name of the attribute under which the subtotal appears. #P The name of the attribute to the left of, or above the attribute under which the subtotal appears. #0 All the forms of the parent element. #1 The first form of the parent element reading from left to right or from top to bottom. #2 The second form of the parent element reading from left to right or from top to bottom. #3 The third form of th...

Create an alert-based subscription in MicroStrategy Distribution Services

Create an alert-based subscription in MicroStrategy Distribution Services on Web Subscription to a report or Report Services document which will be executed when a certain conditional threshold is met based on another executing report. For example, a scheduled report executes which shows the Revenue by day for the past week. If the Revenue on any one day falls below a certain value, a subscription to another report or Report Services document can be triggered and delivered to a recipient. An alert based subscription can only be created directly on a report; however, another report or Report Services document can be delivered when the alert based subscription is triggered. Note: you need a grid report to create an alert and you cannot create if you want to create on a document with text boxes. The following example will walk through the basic steps on how to setup a subscription based on an alert like this: Follow the brief  steps bel...

Microstrategy Custom number formatting symbols

Custom number formatting symbols If none of the built-in number formats meet your needs, you can create your own custom format in the Number tab of the Format Cells dialog box. Select  Custom  as the Category and create the format using the number format symbols listed in the table below. Each custom format can have up to four optional sections, one each for: Positive numbers Negative numbers Zeros Text Each section is optional. Separate the sections by semicolons, as shown in the example below: #,###;(#,###);0;"Error: Entry must be numeric" For more examples, see  Custom number formatting examples . To jump to a section of the formatting symbol table, click one of the following: Numeric symbols Character/text symbols Date and time symbols Text color symbols Currency symbols Conditional symbols Numeric symbols For details on how numeric symbols apply to the Big Decimal data type, refer to the  Project Design Guide . ...

Scheduling a report or document to be sent to an FTP in MSTR

Scheduling a report or document to be sent to an FTP server You can have a report or document automatically delivered to a location on your FTP server on a specific schedule. To do so, you must subscribe to the report or document, as described in the steps below. You can customize your subscription by typing macros in the  File Name ,  Sub-folder , or  Zip File Name  fields. These macros are automatically replaced with the appropriate text when the report or document is delivered. For example, you create a subscription to a document. If you type  {&Project}  in the  File Name field, the name of the project in which the document is saved is displayed in the name of the document when it is delivered. • This procedure assumes that an administrator has already added your FTP server as a new device in Developer. Steps to do so are included in the  System Administrator Help . To send a report or document to an FTP server on a schedule ...

"System Prompt" and its uses in MicroStrategy

System Prompt and its uses in MicroStrategy WHAT IS A "SYSTEM PROMPT"? "System Prompt" is a system object that was introduced back in version 8.0.0. The object is named as "User Login" and is implemented as a prompt object. The object can be found under Public Objects > Prompts > System prompts, as shown below: Unlike ordinary prompt objects, system prompts don't require any answers from the user. When a report containing a system prompt runs, the prompt is answered automatically with the login of the user who runs the report. On the other hand, like other prompt objects, answers to system prompts are used to match caches. Therefore, users don't share caches for reports that contain system prompts. For details on how caches are matched, refer to the following MicroStrategy Knowledge Base document: KB5300-7X0-0147 - How are caches matched in MicroStrategy Intelligence Server 7.x? WHEN ARE SYSTEM PROMPTS USED?    System pr...

Execute Integrity manager test from Command line

Execute Integrity manager test from Command line  MSTR Integrity Manager allows the user to execute a test without having to load the GUI, or to schedule a test to run later at specific times or dates. Go over using Windows AT command at: https://support.microsoft.com/en-us/help/313565/how-to-use-the-at-command-to-schedule-tasks https://www.lifewire.com/at-command-2618090 Here are the prerequisites to execute a test from the command line: Create a test and saved using the Integrity Manager graphical interface. Make sure that the users has the ' Use Integrity Manager ' privilege for that project(provided by the administrator) and the ' Execute ' permission for the reports to be tested. Keep in mind that MicroStrategy Integrity Manager can only test three-tier projects, i.e., projects which are connected to a MicroStrategy i- Server. Projects in Direct Connection (two-tier) mode cannot be tested with this tool...

Export a Report Services document to Excel with formatting using URL API

Export a Report Services document to Excel with formatting using URL API in MSTR Web In order to export a document in excel format using the URL API, the executionMode must be set to 4.  If excutionMode is not provided in the URL, by default PDF will be used as executionMode.   Below are the list of parameters that the URL must contain in order to execute correctly.   evt= 3069 src= Main.aspx.3069 executionMode= 4 documentID= 7E1644CA424F482DA811569FCE8127FF( Replace the document Id with your document ID)   Sample URL for .NET environment: http://WebServerName/MicroStrategy/asp/Main.aspx?evt=3069&src=Main.aspx. 3069 &executionMode= 4 &documentID= 7E1644CA424F482DA811569FCE8127FF    

Predictive modelling in Data Science using Microstrategy

Creating a predictive modelling in MicroStrategy MicroStrategy Data Mining Services has been evolving to include more data mining algorithms and functionality. One key feature is MicroStrategy Developer’s Training Metric Wizard. The Training Metric Wizard can be used to create several different types of predictive models including linear and exponential regression, logistic regression, decision tree, cluster, time series, and association rules. Linear and exponential regression The linear regression data mining technique should be familiar to you if you have ever tried to extrapolate or interpolate data, tried to find the line that best fits a series of data points, or used Microsoft Excel’s LINEST or LOGEST functions. Regression analyzes the relationship between several predictive inputs, or independent variables, and a dependent variable that is to be predicted. Regression finds the line that best fits the data, with a minimum of error. For example, you have a dataset ...

Personalizing file locations, email and file subscriptions using macros in Microstrategy

Personalizing file locations MSTr allows to dynamically specify the  File Location  and  Backup File Location  in a file device using macros.  For example, if you specify the  File Location  as  C:\Reports\{&RecipientName}\ ,  all subscriptions using that file device are delivered to subfolders of  C:\Reports\ . Subscribed reports or documents for each recipient are delivered to a subfolder with that recipient’s name, such as  C:\Reports\Jane Smith\  or  C:\Reports\Hiro Protagonist\ . The table below lists the macros that can be used in the  File Location  and  Backup File Location  fields in a file device: Description Macro Date on which the subscription is sent {&Date} Time at which the subscription is sent {&Time} Name of the recipient {&RecipientName} User ID (32-character GUID) of the recipient {&RecipientID} Distribution Services add...