Database design and normalization

I would like to compare and contrast two options for storing customer contact information in a relational database.

This is the first option, a “fat” customer table.

create table t_fat_customer (
  customer_id bigint not null primary key,
  customer_name varchar(255) not null,
  customer_phone_1 varchar(255) not null default '',
  customer_phone_2 varchar(255) not null default '',
  customer_tollfree_1 varchar(255) not null default '',
  customer_tollfree_2 varchar(255) not null default '',
  customer_mobile_1 varchar(255) not null default '',
  customer_mobile_2 varchar(255) not null default '',
  customer_record_created_on timestamp default current_timestamp,
  customer_record_updated_on timestamp default current_timestamp on update current_timestamp
);

insert into t_fat_customer (
  customer_id,
  customer_name,
  customer_phone_1,
  customer_phone_2,
  customer_tollfree_1,
  customer_tollfree_2,
  customer_mobile_1,
  customer_mobile_2
)
values (
  1,
  'John Doe',
  '123-456-7890',
  '123-456-7891',
  '800-123-4567',
  '',
  '123-456-7892',
  ''
), (
  2,
  'Jane Doe',
  '123-456-7894',
  '123-456-7895',
  '800-123-4569',
  '',
  '123-456-7896',
  ''
);

This is the second option: a “skinny”, “properly normalized” design. It splits the data across more, skinnier tables, which some will claim is the hallmark of “good design”.

create table t_skinny_customer (
  customer_id bigint not null primary key,
  customer_name varchar(255) not null,
  customer_record_created_on timestamp default current_timestamp,
  customer_record_updated_on timestamp default current_timestamp on update current_timestamp
);

create table t_skinny_customer_phone (
  customer_phone_id bigint not null primary key,
  customer_id bigint not null references t_skinny_customer( customer_id ) on delete cascade,
  phone_number varchar(255) not null,
  phone_type enum( 'phone', 'tollfree', 'mobile' ) not null default 'phone',
  phone_type_order tinyint unsigned not null default 0,
  phone_record_created_on timestamp default current_timestamp,
  phone_record_updated_on timestamp default current_timestamp on update current_timestamp
);

insert into t_skinny_customer (
  customer_id,
  customer_name
)
values (
  1,
  'John Doe'
), (
  2,
  'Jane Doe'
);

insert into t_skinny_customer_phone (
  customer_phone_id,
  customer_id,
  phone_number,
  phone_type,
  phone_type_order
)
values (
  11,
  1,
  '123-456-7890',
  'phone',
  1
), (
  12,
  1,
  '123-456-7891',
  'phone',
  2
), (
  13,
  1,
  '800-123-4567',
  'tollfree',
  1
), (
  14,
  1,
  '123-456-7892',
  'mobile',
  1
), (
  21,
  2,
  '123-456-7894',
  'phone',
  1
), (
  22,
  2,
  '123-456-7895',
  'phone',
  2
), (
  23,
  2,
  '800-123-4569',
  'tollfree',
  1
), (
  24,
  2,
  '123-456-7896',
  'mobile',
  1
);

Here are some stats from the design options:

                 Fat option   Skinny option
Lines of code:   42           85
Tables:          1            2
Columns:         10           11
Records:         2            10

The first option is simpler than the second option by every measure.

I know, you want to create a more general solution. I know, they say a hallmark of good database design and “proper normalization” is lots of skinny tables. I remind you: the first option is simpler than the second option by every measure.

My advice is don’t rush toward general solutions in an effort to solve problems you don’t have.

The UI isn’t coupled to the database design in any way. You could implement any type of UI on top of either database design alternative, such as:

            Primary        Secondary      Operations
Phone:      ____________   ____________   ...
Tollfree:   ____________   ____________   ...
Mobile:     ____________   ____________   ...

Or:

Phone Type     Phone Number     Operations

(My opinion is that the first UI option above is superior to the second, and it’s also exceedingly easy to implement with the “fat” customer table.)
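For example, with the “fat” table, UI option A is a single-row SELECT per customer (a sketch against the tables above; the column aliases are just illustrative):

select customer_phone_1    as phone_primary,
       customer_phone_2    as phone_secondary,
       customer_tollfree_1 as tollfree_primary,
       customer_tollfree_2 as tollfree_secondary,
       customer_mobile_1   as mobile_primary,
       customer_mobile_2   as mobile_secondary
from t_fat_customer
where customer_id = 1;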

Here are some implementation exercises to compare the difficulty of working with the database options:

  • given a customer ID, get the primary tollfree number, or the primary phone number if there’s no tollfree number on record (sketched after this list)
  • for each customer and each phone type (phone, tollfree, mobile), promote the secondary contact option to primary if there isn’t already a primary option on record
  • implement user interface option A above
  • implement user interface option B above
  • create a view to present the “fat” customer contact options as rows like in the “skinny” t_skinny_customer_phone table (sketched after this list)
  • create a view to present the “skinny” customer contact options as one row per customer like in the “fat” t_fat_customer table (sketched after this list)
  • enforce a business rule that no more than two contact options per phone type are stored in the database for each customer (sketched after this list)
  • check for duplicate contact phone numbers and remove them when found
  • put the customer contact information into a spreadsheet and email to marketing by COB
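A sketch of the first exercise, in MySQL syntax against the tables above. With the “fat” table it’s a one-row lookup; with the “skinny” design you have to pick one row out of several:

-- Fat option: one row per customer, so the fallback is a simple expression.
select coalesce(nullif(customer_tollfree_1, ''), customer_phone_1) as contact_number
from t_fat_customer
where customer_id = 1;

-- Skinny option: prefer the primary tollfree row, otherwise the primary phone row.
select phone_number
from t_skinny_customer_phone
where customer_id = 1
  and phone_type in ('tollfree', 'phone')
  and phone_type_order = 1
order by field(phone_type, 'tollfree', 'phone')
limit 1;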
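A sketch of the two view exercises, also in MySQL syntax (the view names are made up here). The “fat”-to-“skinny” direction is a union of the non-empty columns; the “skinny”-to-“fat” direction is conditional aggregation:

-- Present the fat table as one row per contact option.
create view v_fat_as_skinny as
select customer_id, customer_phone_1 as phone_number, 'phone' as phone_type, 1 as phone_type_order
  from t_fat_customer where customer_phone_1 <> ''
union all
select customer_id, customer_phone_2, 'phone', 2
  from t_fat_customer where customer_phone_2 <> ''
union all
select customer_id, customer_tollfree_1, 'tollfree', 1
  from t_fat_customer where customer_tollfree_1 <> ''
union all
select customer_id, customer_tollfree_2, 'tollfree', 2
  from t_fat_customer where customer_tollfree_2 <> ''
union all
select customer_id, customer_mobile_1, 'mobile', 1
  from t_fat_customer where customer_mobile_1 <> ''
union all
select customer_id, customer_mobile_2, 'mobile', 2
  from t_fat_customer where customer_mobile_2 <> '';

-- Present the skinny tables as one row per customer.
create view v_skinny_as_fat as
select c.customer_id,
       c.customer_name,
       coalesce(max(case when p.phone_type = 'phone'    and p.phone_type_order = 1 then p.phone_number end), '') as customer_phone_1,
       coalesce(max(case when p.phone_type = 'phone'    and p.phone_type_order = 2 then p.phone_number end), '') as customer_phone_2,
       coalesce(max(case when p.phone_type = 'tollfree' and p.phone_type_order = 1 then p.phone_number end), '') as customer_tollfree_1,
       coalesce(max(case when p.phone_type = 'tollfree' and p.phone_type_order = 2 then p.phone_number end), '') as customer_tollfree_2,
       coalesce(max(case when p.phone_type = 'mobile'   and p.phone_type_order = 1 then p.phone_number end), '') as customer_mobile_1,
       coalesce(max(case when p.phone_type = 'mobile'   and p.phone_type_order = 2 then p.phone_number end), '') as customer_mobile_2
  from t_skinny_customer c
  left join t_skinny_customer_phone p on p.customer_id = c.customer_id
 group by c.customer_id, c.customer_name;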
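And a sketch for the “no more than two contact options per phone type” rule. The “fat” table enforces it by its very shape; the “skinny” design needs something like a trigger (MySQL syntax; the trigger name is made up, and this ignores concurrent inserts):

delimiter //

create trigger trg_limit_contact_options
before insert on t_skinny_customer_phone
for each row
begin
  -- Reject the insert if this customer already has two numbers of this type.
  if ( select count(*)
         from t_skinny_customer_phone
        where customer_id = new.customer_id
          and phone_type  = new.phone_type ) >= 2 then
    signal sqlstate '45000'
      set message_text = 'at most two contact options per phone type per customer';
  end if;
end//

delimiter ;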

The Database Relational Fuckup

In his book The Database Relational Model, on page 25, Date says:

Here I would just comment–here comes the hindsight once again–that if as suggested earlier the database is to be regarded as a (correct!) logical system, it must never be allowed to contain any inconsistencies, at least from the user’s point of view. In other words, “remedying inconsistencies” needs to be done on an individual statement-by-statement basis (not even on a transaction-by-transaction basis). See reference [52] for further elaboration on this point.

Reference [52] is then:

C. J. Date and Hugh Darwen: Foundation for Object/Relational Databases: The Third Manifesto. Reading, Mass.: Addison-Wesley (1998). A second edition of this book, under the revised title Foundation for Future Database Systems: The Third Manifesto, is due to appear concurrently with the present book.

I think Date did a lot of harm with this comment. The classic example of a transaction is taking $100 from my account and depositing it in your account. There’s one statement to take the money out and another statement to put the money in, and between the first statement and the second the database necessarily passes through an inconsistent state: the money has left my account but hasn’t yet arrived in yours. If RDBMS integrity facilities worked on a transaction-by-transaction basis we could actually use them; since they don’t, we suffer. Either we do more work than necessary, or we under-specify our constraints, or some awkward combination of both.
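Here is the transfer spelled out, to make the point concrete (a sketch; the t_account table is hypothetical). Imagine a constraint saying the total of all balances never changes: it can only hold at the transaction boundary, never after the first statement alone.

start transaction;

-- $100 leaves my account; a "total of balances is constant" constraint is
-- now violated, so statement-by-statement checking would have to reject this.
update t_account
set account_balance = account_balance - 100
where account_id = 1;

-- $100 arrives in your account; the constraint holds again.
update t_account
set account_balance = account_balance + 100
where account_id = 2;

-- Transaction-by-transaction checking could be applied here, where the
-- database is once again consistent as a whole.
commit;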

A great MySQL session

I found this great MySQL session over here. It has a clever approach for creating a big table, and shows how to invoke shell commands from the `mysql` client.

USE test;
SET GLOBAL innodb_file_per_table=1;
SET GLOBAL autocommit=0;

-- Create an uncompressed table with a million or two rows.
CREATE TABLE big_table AS SELECT * FROM information_schema.columns;
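-- Each of the following self-inserts doubles the row count,
-- so ten of them multiply the original rows by 1024.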
INSERT INTO big_table SELECT * FROM big_table;
INSERT INTO big_table SELECT * FROM big_table;
INSERT INTO big_table SELECT * FROM big_table;
INSERT INTO big_table SELECT * FROM big_table;
INSERT INTO big_table SELECT * FROM big_table;
INSERT INTO big_table SELECT * FROM big_table;
INSERT INTO big_table SELECT * FROM big_table;
INSERT INTO big_table SELECT * FROM big_table;
INSERT INTO big_table SELECT * FROM big_table;
INSERT INTO big_table SELECT * FROM big_table;
COMMIT;
ALTER TABLE big_table ADD id int unsigned NOT NULL PRIMARY KEY auto_increment;

SHOW CREATE TABLE big_table\G

select count(id) from big_table;

-- Check how much space is needed for the uncompressed table.
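-- (The \! prefix runs a shell command from inside the mysql client.)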
\! ls -l data/test/big_table.ibd

CREATE TABLE key_block_size_4 LIKE big_table;
ALTER TABLE key_block_size_4 key_block_size=4 row_format=compressed;

INSERT INTO key_block_size_4 SELECT * FROM big_table;
commit;

-- Check how much space is needed for a compressed table
-- with particular compression settings.
\! ls -l data/test/key_block_size_4.ibd