Difference between revisions of "Database patterns"

From ProgClub
Jump to: navigation, search
Line 30: Line 30:
 
| char, code, name, etc.
 
| char, code, name, etc.
 
| not logged
 
| not logged
|
+
| lookup data is read-only and packaged with schema
 
|-
 
|-
 
| [[#ident|ident]]
 
| [[#ident|ident]]

Revision as of 23:15, 21 March 2020

This page documents the various types of database design patterns supported by the PHPBOM library.

Introduction

The page documents the type of tables we have in our databases.

Our schemas are acyclic. Tables can be self referential, otherwise their foreign keys must point to a previously defined table. If you find yourself with a cyclic situation you can fix it by splitting your two tables into three tables.

We have two types of data, being core mutable application data in data and hist tables and then everything else.

Many of our table patterns implement a parent pattern, being elemental data.

Elemental tables

The elemental data pattern is basically where we have a small thing, like say an DBT_UINT8, that points to a large thing, like say a DBT_HASH_BIN. The characteristics of elemental tables are that they are insert-only (updates and deletions are not allowed), and this property lends them to being manageable on a raw connection which uses low transaction isolation and atomic commits. Further, because inserts cannot be rolled back, we can cache all elemental mappings, so once an application server knows the small key for the corresponding long key it can cache that fact and in future does not need to request the values from the database again.

Note that most (but not all) elemental tables log the a_interaction_id for the interaction in which they were registered. Items only get registered once.

The table pattern which implement the elemental pattern are:

pattern small thing big things a_interaction_id note
lookup enum char, code, name, etc. not logged lookup data is read-only and packaged with schema
ident id creation date logged
particle id some larger integer or usually a case-sensitive string logged
token id DBT_TOKEN logged
name id some string value logged be careful with case-sensitivity
string id a string value and a DBT_HASH_BIN logged the DBT_HASH_BIN unsures case-sensitivity
jzon id a blob and a DBT_HASH_BIN logged
zlob id a blob and a DBT_HASH_BIN logged
upload id a DBT_HASH_BIN and other details logged
structure id a string value, a DBT_HASH_BIN, and relevant components logged the DBT_HASH_BIN unsures case-sensitivity

a_interaction_id

We have a standard ident table called t_ident_interaction which contains the serial number for an application interaction (e.g. a HTTP request). When most elemental data is registered (not lookup data) we log the a_interaction_id of the interaction that did the registration.

Patterns

about

The about tables contain database meta data and are accessed via the 'raw' connection.

An about table is named 't_about_{name}'.

lookup

The lookup tables contain hard-coded reference data and are accessed via the 'raw' connection.

A lookup table is named 't_lookup_{name}'.

A lookup table must contain a 'a_{name}_enum' column which identifies a row. The 'a_{name}_enum' column is usually be a DBT_UINT8.

The 'a_{name}_enum' must be the primary key.

If an optional 'a_{name}_char' column exists it must be of type DBT_ASCII_CHAR_BIN. The PHPBOM library will use lowercase char codes and applications can use uppercase char codes.

If you're extending a PHPBOM lookup table you should code from the max enum down as PHPBOM will code from the min enum up.

ident

The ident tables allocate unique identifiers and are accessed via the 'raw' connection.

An ident table is named 't_ident_{name}'.

An ident table has a 'a_{name}_id' column which is usually a DBT_ID24 or DBT_ID32.

particle

The particle tables contain particles and are accessed via the 'raw' connection. A particle table maps a wide column to a narrow column.

A particle table is named 't_particle_{name}'.

A particle table must contain a 'a_{name}_id' column which contains the narrow ID for the particle. The 'a_{name}_id' column is usually a DBT_ID24 or DBT_ID32.

A particle table must contain a 'a_{name}' column which is the wide value. When particle 'a_{name}' columns are strings they should be case-sensitive. If you want a case-insensitive string use a name table instead.

token

The token tables contain tokens and are accessed via the 'raw' connection.

A token table is named 't_token_{name}'.

A token table must contain a 'a_{name}_id' column which contains the ID for the token. The 'a_{name}_id' column is usually a DBT_ID24 or DBT_ID32.

The 'a_{name}_id' column must be the primary key.

A token table must contain a 'a_{name}' column which contains the token. The 'a_{name}' column is usually a DBT_TOKEN.

There should be a unique index on the 'a_{name}' column.

name

The name tables contain names and are accessed via the 'raw' connection. Names are case-insensitive strings.

A name table is named 't_name_{name}'.

A name table must contain a 'a_{name}_id' column which contains the ID for the name. The 'a_{name}_id' column is usually a DBT_ID24 or DBT_ID32.

A name table must contain a 'a_{name}' column which contains the name. The 'a_{name}' column is usually a DBT_ASCII_CI or DBT_UTF8_CI but it could be DBT_ASCII_BIN or DBT_UTF8_BIN depending on your requirements.

A name table must contain a 'a_{name}_lower' column which contains the lowercase name. The 'a_{name}_lower' column is usually a DBT_ASCII_BIN or DBT_UTF8_BIN. Ordinarily there will be a unique index on the 'a_{name}_lower' column, but you can make this a non-unique index depending upon your requirements.

string

The string tables contain strings and are accessed via the 'raw' connection. A string table maps a wide column to a narrow column. You would usually use a string table if a particle table is too small.

A string table is named 't_string_{name}'.

A string table must contain a 'a_{name}_id' column which contains the narrow ID for the string. The 'a_{name}_id' column is usually a DBT_ID24 or DBT_ID32.

A string table must contain a 'a_{name}' column which is the wide string value. The 'a_{name}' column should be case-insensitive. Case-sensitivity is ensured by the 'a_{name}_hash_bin' column, so the 'a_{name}' column can use a case-insensitive collation.

A string table must contain a 'a_{name}_hash_bin' column which contains the DBT_HASH_BIN binary hash of the string value.

jzon

The jzon tables contain compressed JSON and are accessed via the 'raw' connection.

A jzon table is named 't_jzon_{name}'.

A jzon table must contain a 'a_{name}_id' column which contains the narrow ID for the compressed data. The 'a_{name}_id' column is usually a DBT_ID24 or DBT_ID32.

A jzon table must contain a 'a_{name}' column which is the compressed binary data. The 'a_{name}' column is usually a DBT_MLOB.

A jzon table must contain a 'a_{name}_hash_bin' column which contains the DBT_HASH_BIN binary hash of the compressed data.

See the inflate.php program if you need to work with data in this format.

zlob

The zlob tables contain compressed blobs and are accessed via the 'raw' connection.

A zlob table is named 't_zlob_{name}'.

A zlob table must contain a 'a_{name}_id' column which contains the narrow ID for the compressed blob. The 'a_{name}_id' column is usually a DBT_ID24 or DBT_ID32.

A zlob table must contain a 'a_{name}' column which is the compressed binary data. The 'a_{name}' column is usually a DBT_MLOB.

A zlob table must contain a 'a_{name}_hash_bin' column which contains the DBT_HASH_BIN binary hash of the compressed blob.

See the inflate.php program if you need to work with data in this format.

upload

The upload tables contain data about uploaded files and are accessed via the 'raw' connection.

A upload table is named 't_upload_{name}'.

An upload table has 'a_{name}_id', 'a_{name}_hash_bin', 'a_{name}_file_name', 'a_{name}_file_type' and 'a_file_type_enum' columns. Usually the actual file will be stored in the file-system using the 'a_{name}_hash_hex' value for the file name.

structure

The structure tables contain structures and are accessed via the 'raw' connection. A structure table maps a wide column to a narrow column and constituent component columns.

A structure table is named 't_structure_{name}'.

A structure table must contain a 'a_{name}_id' column which contains the narrow ID for the structure. The 'a_{name}_id' column is usually a DBT_ID24 or DBT_ID32.

A structure table must contain a 'a_{name}' column which is the wide string value. The 'a_{name}' column should be case-insensitive. Case-sensitivity is ensured by the 'a_{name}_hash_bin' column, so the 'a_{name}' column can use a case-insensitive collation.

A structure table must contain a 'a_{name}_hash_bin' column which contains the DBT_HASH_BIN binary hash of the wide string value.

Other component columns will exist. For example for a 't_structure_email_address' there will be 'a_email_address_mailbox' and 'a_email_address_host' columns.

data

The data tables contain mutable application data and are accessed via the 'trn' connection. All data tables have a corresponding hist table.

A data table is named 't_data_{name}'.

A data table has a 'a_{name}_id' column which is is usually a DBT_ID24 or DBT_ID32.

A data table has a 'a_{name}_version' column which is usually a reference to t_ident_interaction.a_interaction_id.

A data table will have other columns too.

hist

The hist tables contain logs of changes to mutable application data stored in data tables and are accessed via the 'trn' connection. All hist tables have a corresponding data table.

A hist table is named 't_hist_{name}'.

A hist table has a 'a_{name}_hist_id' primary key which is is usually a DBT_ID32.

A hist table has a 'a_interaction_id' column, which is a reference to the t_ident_interaction.a_interaction_id in which the data was modified.

A hist table has a 'a_crud_enum' column which is a reference to the t_lookup_crud.a_crud_enum that indicates the CRUD operation which affected the data.

A hist table then contains a copy of all of the columns from its corresponding data table.

log

A log table contains log data and are accessed via the 'raw' connection.

A log table is named 't_log_{name}'.

There are no rules for the content of log tables, but you should usually (but not always) try to copy data into logs by value rather than by reference, because referenced data might change after the logs were written.

other

If we're dealing with tables which can't be categorised into one or more of the above patterns we call it other data and there are no restrictions on its format. The connection to use for 'other' tables is unspecified.

Database connections

In an application there are usually two separate connections to the database, being the 'raw' connection and the 'trn' connection. The database pattern indicates which connection to use when operating on any given type of table. Generally the 'log' connection is used for everything except for data and hist tables which use the 'trn' connection.

raw connection

The 'raw' connection uses the READ UNCOMMITTED transaction isolation level.

trn connection

The 'trn' connection uses the SERIALIZABLE transaction isolation level.

Database types

DBT_ID8

A DBT_ID8 is an auto generated integer from 1 to 255.

DBT_ID16

A DBT_ID16 is an auto generated integer from 1 to 65,535.

DBT_ID24

A DBT_ID24 is an auto generated integer from 1 to 16,777,215.

DBT_ID32

A DBT_ID32 is an auto generated integer from 1 to 4,294,967,295.

DBT_ID64

A DBT_ID64 is an auto generated integer from 1 to 264-1.

DBT_UINT8

A DBT_UINT8 is an integer from 0 to 255.

DBT_UINT16

A DBT_UINT16 is an integer from 1 to 65,535.

DBT_UINT24

A DBT_UINT24 is an integer from 1 to 16,777,215.

DBT_UINT32

A DBT_UINT32 is an integer from 1 to 4,294,967,295.

DBT_UINT64

A DBT_UINT64 is an integer from 1 to 264-1.

DBT_ASCII_CHAR_BIN

A DBT_ASCII_CHAR_BIN is a single case-sensitive ASCII character.

DBT_HASH_BIN

A DBT_HASH_BIN is a case-sensitive 28-byte binary string containing a SHA512/224 hash value.

DBT_HASH_HEX

A DBT_HASH_HEX is a case-insensitive 56-character hexadecimal string containing a SHA512/224 hash value.

DBT_TOKEN

A DBT_TOKEN is a case-sensitive 32-character alphanumeric ASCII string.

DBT_CREATED_ON

A DBT_CREATED_ON is the timestamp of when the record was created.

DBT_UPDATED_ON

A DBT_UPDATED_ON is the timestamp of when the record was last updated.

Standards

As standard all database tables should finish with DBT_CREATED_ON and DBT_UPDATED_ON columns. Even for insert-only data. The logic being that if insert-only data is modified then the DBT_CREATED_ON !== DBT_UPDATED_ON discrepancy might help reveal the issue.