Difference between revisions of "JJcode"

From ProgClub
m (→‎Tasks: s/Done/TODO/)
 
(21 intermediate revisions by the same user not shown)
Line 1: Line 1:
JJcode is the ProgClub key management software. That's the software that helps you manage identity keys in your applications from the database into the UI. For other projects see [[projects]].
+
JJcode is the ProgClub key management software. That's the software that helps you manage identity keys in your applications from the database through to the UI. For other projects see [[projects]].
  
 
= Status =
 
= Status =
  
Latest production version: 0.1.2.
+
Latest production version: 0.1.12.
Latest development version: 0.1.3.
+
Latest development version: 0.1.13.
  
 
See [[#Tasks|tasks]] for work that still needs to be done.
 
See [[#Tasks|tasks]] for work that still needs to be done.
Line 24: Line 24:
 
Upstream contributors:
 
Upstream contributors:
  
* Our Base56 encoding adapted from [https://github.com/stephen-hill/base58php base58php]
+
* Our Base47 encoding adapted from [https://github.com/stephen-hill/base58php base58php]
  
 
== Copyright ==
 
== Copyright ==
Line 38: Line 38:
 
Libraries, tools, services or media from third parties used under license:
 
Libraries, tools, services or media from third parties used under license:
  
* Our Base56 encoding adapted from [https://github.com/stephen-hill/base58php base58php] provided under the MIT license
+
* Our Base47 encoding adapted from [https://github.com/stephen-hill/base58php base58php] provided under the MIT license
  
 
= Resources =
 
= Resources =
Line 62: Line 62:
 
== Links ==
 
== Links ==
  
* [https://en.wikipedia.org/wiki/Base58 Base58 on Wikipedia], see note RE: Base56
+
* [https://en.wikipedia.org/wiki/Base58 Base58 on Wikipedia]
  
 
= Specifications =
 
= Specifications =
Line 76: Line 76:
 
# unique keys (similar to GUIDs but more compact)
 
# unique keys (similar to GUIDs but more compact)
 
# safe hashes (better than MD5 and SHA1, shorter than SHA256/SHA512)
 
# safe hashes (better than MD5 and SHA1, shorter than SHA256/SHA512)
# short/human-readable hashes/keys suitable for URLs (28-29 char strings)
+
# short/human-readable hashes/keys suitable for URLs (29-31 char strings)
 
# database integration (keys are binary(21) SQL in hex format 0x1234...)
 
# database integration (keys are binary(21) SQL in hex format 0x1234...)
# profanity filtering (for English profanity and 1337 5p34k)
+
# profanity free (for English profanity and 1337 5p34k)
  
 
A jjcode can be in one of four formats:
 
A jjcode can be in one of four formats:
  
 
# [[#bid_format|bid]]: binary format, 21 bytes
 
# [[#bid_format|bid]]: binary format, 21 bytes
# [[#cid_format|cid]]: short character format (human readable), 28-29 character string
+
# [[#cid_format|cid]]: short character format (human readable), 29-31 character string
 
# [[#hid_format|hid]]: hexadecimal format, 42 character string
 
# [[#hid_format|hid]]: hexadecimal format, 42 character string
 
# [[#sql_format|sql]]: hexadecimal format for SQL, 44 character string
 
# [[#sql_format|sql]]: hexadecimal format for SQL, 44 character string
Line 89: Line 89:
 
There are a bunch of functions for converting between formats, e.g. bid2sql. If you have a key but you're not sure of the format you can call e.g. key2bid, key2cid, etc. to get it into a known format.
 
There are a bunch of functions for converting between formats, e.g. bid2sql. If you have a key but you're not sure of the format you can call e.g. key2bid, key2cid, etc. to get it into a known format.
  
So a '[[#bid_format|bid]]' is 21 bytes and suitable for use in database tables; a '[[#cid_format|cid]]' is 28-29 human readable characters and suitable for use in URLs or APIs; a '[[#hid_format|hid]]' is a 42 character string suitable for debugging; and '[[#sql_format|sql]]' format is suitable for use in SQL queries. By default data is validated before it is converted between formats (you can override this behaviour).
+
So a '[[#bid_format|bid]]' is 21 bytes and suitable for use in database tables; a '[[#cid_format|cid]]' is 29-31 human readable characters and suitable for use in URLs or APIs; a '[[#hid_format|hid]]' is a 42 character string suitable for debugging; and '[[#sql_format|sql]]' format is suitable for use in SQL queries. By default data is validated before it is converted between formats (you can override this behaviour).
  
To generate the '[[#cid_format|cid]]' format codes we use the jjencode function, which is basically a Base56 encoder. Similarly to convert '[[#cid_format|cid]]' codes back into other formats we begin with jjdecode, the Base56 decoder.
+
To generate the '[[#cid_format|cid]]' format codes we use the jjencode function, which is basically a Base47 encoder. Similarly to convert '[[#cid_format|cid]]' codes back into other formats we begin with jjdecode, the Base47 decoder.
  
 
Inputs to jjcode hashing functions are converted to strings if necessary. If you want to hash data/objects that aren't strings you can serialize your input before passing it in (which will obviously couple you to PHP). We didn't want to tie our data formats to PHP so we require string inputs (i.e. we don't do input serialization for you, but it's easy to do yourself if that's what you want).
 
Inputs to jjcode hashing functions are converted to strings if necessary. If you want to hash data/objects that aren't strings you can serialize your input before passing it in (which will obviously couple you to PHP). We didn't want to tie our data formats to PHP so we require string inputs (i.e. we don't do input serialization for you, but it's easy to do yourself if that's what you want).
Line 111: Line 111:
 
It's worth noting that jjcodes aren't the final word in key management for your application. For example, if you were to tell your new customer that their customer ID was 3tt5dfie39y6hJJPQg6PsM8P46N48, they might tell you to get stuffed! If you need keys for use in HTML forms or other business logic see our [[#KKcode_functionality|kkcode functions]] for shorter and more user-friendly identifiers with built-in classification and redundancy checks.
 
It's worth noting that jjcodes aren't the final word in key management for your application. For example, if you were to tell your new customer that their customer ID was 3tt5dfie39y6hJJPQg6PsM8P46N48, they might tell you to get stuffed! If you need keys for use in HTML forms or other business logic see our [[#KKcode_functionality|kkcode functions]] for shorter and more user-friendly identifiers with built-in classification and redundancy checks.
  
Regarding profanity filtering: there is a list of blacklisted strings which are English words, or parts thereof, which could possibly be considered offensive. The list of English words is processed to generate 1337 5p34k variations, so e.g. 455 53x would be filtered. Note particularly that only English terms have been catalogued, there is no support for other languages at this time.
+
Regarding profanity filtering: because we leave out the vowels (and 0, 1, 3, and 4) it's not possible to have English or 1337 5p34k profanity. There is however a [https://www.progclub.org/pcrepo/jjcode/branches/0.1/php/jjfilter.php#l1 list of blacklisted strings] which are English abbreviations which could possibly be interpreted as offensive. The list of blacklisted terms is processed to generate 1337 5p34k variations. Note particularly that only abbreviated English/1337-5p34k terms have been catalogued, there is no support for profanity filtering in other languages at this time.
  
The catalogue of profanity is "append only". The developers can append blacklisted terms to the list, but removing terms or reordering terms will be problematic. If the list of profanity is changed by new additions then all previously generated keys/hashes will need to be upgraded. There is a jjcode_upgrade function for this purpose. Remember: you can never ever remove things from the profanity list or change their order.
+
The catalogue of profanity is "append only". The developers can append blacklisted terms to the list, but removing terms or reordering terms will be problematic. If the list of profanity is changed by new additions then all previously generated keys/hashes will need to be upgraded. There is a jjcode_upgrade function for this purpose. Remember: you can never ever remove things from the profanity list or change their order.  
  
 
==== bid format ====
 
==== bid format ====
Line 121: Line 121:
 
==== cid format ====
 
==== cid format ====
  
The 'cid' format is a human-readable string of characters that is between 28 and 29 characters long. It is suitable for use in your URLs or your APIs. The 'cid' format is checked for profanity. A 'cid' may contain any of the following characters:
+
The 'cid' format is a human-readable string of characters that is between 29 and 31 characters long. It is suitable for use in your URLs or your APIs. A 'cid' may contain any of the following characters:
  
  23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnpqrstuvwxyz
+
  256789BCDFGHJKLMNPQRSTVWXYZbcdfghjkmnpqrstvwxyz
  
 
==== hid format ====
 
==== hid format ====
Line 163: Line 163:
 
The technical specification describes how the project works.
 
The technical specification describes how the project works.
  
The software is implemented in two PHP libraries. The jjcode family of functions are in [https://www.progclub.org/pcrepo/jjcode/branches/0.1/src/jjcode.php src/jjcode.php] and the kkcode family of functions are in [https://www.progclub.org/pcrepo/jjcode/branches/0.1/src/kkcode.php src/kkcode.php].
+
The software is implemented in two PHP libraries. The jjcode family of functions are in [https://www.progclub.org/pcrepo/jjcode/branches/0.1/php/jjcode.php php/jjcode.php] and the kkcode family of functions are in [https://www.progclub.org/pcrepo/jjcode/branches/0.1/php/kkcode.php php/kkcode.php].
  
 
=== JJcode technicalities ===
 
=== JJcode technicalities ===
Line 183: Line 183:
 
  svn co https://www.progclub.org/svn/pcrepo/jjcode/tags/latest/0.1 jjcode
 
  svn co https://www.progclub.org/svn/pcrepo/jjcode/tags/latest/0.1 jjcode
  
Then include it in your application. If you want the jjcode family of functions:
+
Then include it in your PHP application. If you want the jjcode family of functions:
  
  require_once '/path/to/jjcode/src/jjcode.php';
+
  require_once '/path/to/jjcode/php/jjcode.php';
  
 
And if you want the kkcode family of functions:
 
And if you want the kkcode family of functions:
  
  require_once '/path/to/jjcode/src/kkcode.php';
+
  require_once '/path/to/jjcode/php/kkcode.php';
  
 
== Notes for developers ==
 
== Notes for developers ==
Line 196: Line 196:
  
 
See [[#Source_code|source code]] above RE: how to get the code for development purposes.
 
See [[#Source_code|source code]] above RE: how to get the code for development purposes.
 +
 +
=== Profanity filter configuration ===
 +
 +
If you're going to add to the profanity list (and you want to think really hard about doing that) then you need to bump the minor version before doing so and then you need to let all of your users know that they have to upgrade all of their keys if they move to the new version. Remember that the profanity list is "append only", you cannot reorder or otherwise modify released profanity content.
  
 
=== Doing a release ===
 
=== Doing a release ===
Line 212: Line 216:
  
 
= Tasks =
 
= Tasks =
 
[[Category:TODO]]
 
  
 
== TODO ==
 
== TODO ==
Line 220: Line 222:
  
 
* could have more comprehensive unit tests...
 
* could have more comprehensive unit tests...
 +
* should port to languages other than PHP...
 +
* should provide a database-upgrade script which can automatically upgrade all JJcodes
 +
** note: we don't need this functionality yet, but if we modify the profanity list we will
 +
** note: could assume that all binary(21) fields are JJcodes..?
  
 
== Done ==
 
== Done ==
Line 225: Line 231:
 
Stuff that's done. Latest stuff on top.
 
Stuff that's done. Latest stuff on top.
  
 +
* [[User:John|JE]] 2016-05-12: released version 0.1.12 (after minor updates)
 +
* [[User:John|JE]] 2016-05-12: released version 0.1.10 (passing $verify to validator)
 +
* [[User:John|JE]] 2016-05-12: released version 0.1.8 (after profanity filtering re-enabled)
 +
* [[User:John|JE]] 2016-05-12: released version 0.1.6 (after jjkey default format changed)
 +
* [[User:John|JE]] 2016-05-12: released version 0.1.4 (after directory layout changed)
 
* [[User:John|JE]] 2016-05-12: released version 0.1.2
 
* [[User:John|JE]] 2016-05-12: released version 0.1.2
 
* [[User:John|JE]] 2016-05-12: documented [[#KKcode_functionality|kkcode functionality]]
 
* [[User:John|JE]] 2016-05-12: documented [[#KKcode_functionality|kkcode functionality]]

Latest revision as of 16:14, 11 December 2017

JJcode is the ProgClub key management software. That's the software that helps you manage identity keys in your applications from the database through to the UI. For other projects see projects.

Status

Latest production version: 0.1.12. Latest development version: 0.1.13.

See tasks for work that still needs to be done.

Motivation

Why this software? The idea is to make it easy to manage keys in your application. The jjcode keys/hashes are strong, short, human-readable, and profanity free. The kkcode identifiers are opaque, classified, and redundancy checked; they are also shorter and more user-friendly than jjcodes, and so better suited to data entry.

Administration

Contributors

Members who have contributed to this project. Newest on top.

All contributors have agreed to the terms of the Contributor License Agreement. This excludes any upstream contributors who tend to have different administrative frameworks.

Upstream contributors:

  • Our Base47 encoding adapted from base58php

Copyright

Copyright 2016, Contributors.

License

Licensed under the MIT license.

Components

Libraries, tools, services or media from third parties used under license:

  • Our Base47 encoding adapted from base58php provided under the MIT license

Resources

Downloads

There are presently no downloads for this library. See below about how to get the source code.

Source code

The repository can be browsed online:

https://www.progclub.org/pcrepo/jjcode/branches/0.1

The latest stable released version of the code is available from:

https://www.progclub.org/svn/pcrepo/jjcode/tags/latest/0.1

Or if you want the latest version for development purposes:

https://www.progclub.org/svn/pcrepo/jjcode/branches/0.1

Links

  • Base58 on Wikipedia (https://en.wikipedia.org/wiki/Base58)

Specifications

Functional specification

The functional specification describes what the project does.

JJcode functionality

The jjcode functions provide:

  1. unique keys (similar to GUIDs but more compact)
  2. safe hashes (better than MD5 and SHA1, shorter than SHA256/SHA512)
  3. short/human-readable hashes/keys suitable for URLs (29-31 char strings)
  4. database integration (keys are binary(21) SQL in hex format 0x1234...)
  5. profanity free (for English profanity and 1337 5p34k)

A jjcode can be in one of four formats:

  1. bid: binary format, 21 bytes
  2. cid: short character format (human readable), 29-31 character string
  3. hid: hexadecimal format, 42 character string
  4. sql: hexadecimal format for SQL, 44 character string

There are a bunch of functions for converting between formats, e.g. bid2sql. If you have a key but you're not sure of the format you can call e.g. key2bid, key2cid, etc. to get it into a known format.

So a 'bid' is 21 bytes and suitable for use in database tables; a 'cid' is 29-31 human readable characters and suitable for use in URLs or APIs; a 'hid' is a 42 character string suitable for debugging; and 'sql' format is suitable for use in SQL queries. By default data is validated before it is converted between formats (you can override this behaviour).
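The lengths and character sets above are enough to tell the four formats apart. The following is a minimal sketch of that idea, not the library's own key2bid/key2cid logic: the function name example_detect_format and the constant EXAMPLE_CID_ALPHABET are hypothetical, and the real validator may apply further checks.

```php
<?php
// Alphabet copied from the 'cid format' section of this page.
const EXAMPLE_CID_ALPHABET = '256789BCDFGHJKLMNPQRSTVWXYZbcdfghjkmnpqrstvwxyz';

// Hypothetical helper (not part of the jjcode API): guess which of the
// four formats a key is in, using only the lengths and character sets
// stated on this page. Returns 'bid', 'cid', 'hid', 'sql', or null.
function example_detect_format(string $key): ?string {
    $length = strlen($key);
    if ($length === 21) {
        return 'bid'; // raw bytes, any byte values allowed
    }
    if ($length === 42 && ctype_xdigit($key)) {
        return 'hid'; // 21 bytes as hexadecimal
    }
    if ($length === 44 && substr($key, 0, 2) === '0x' && ctype_xdigit(substr($key, 2))) {
        return 'sql'; // hid with a leading '0x'
    }
    if ($length >= 29 && $length <= 31 && strspn($key, EXAMPLE_CID_ALPHABET) === $length) {
        return 'cid'; // only characters from the Base47 alphabet
    }
    return null; // not recognisable as any documented format
}
```

Because the four length ranges do not overlap, the order of the checks does not matter here.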

To generate the 'cid' format codes we use the jjencode function, which is basically a Base47 encoder. Similarly to convert 'cid' codes back into other formats we begin with jjdecode, the Base47 decoder.
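To make the Base47 idea concrete, here is a minimal sketch of an encoder and decoder over the alphabet listed in the cid format section. It is not the jjencode/jjdecode source: the names example_base47_encode and example_base47_decode are hypothetical, and it assumes the 21 bytes are treated as one big-endian integer with leading zero digits dropped, which is one way the encoded length could vary.

```php
<?php
// Alphabet copied from the 'cid format' section of this page (47 symbols).
const EXAMPLE_ALPHABET = '256789BCDFGHJKLMNPQRSTVWXYZbcdfghjkmnpqrstvwxyz';

// Hypothetical encoder: treat the bytes as one big-endian integer and
// repeatedly divide it by 47, collecting the remainders as digits.
function example_base47_encode(string $bytes): string {
    if ($bytes === '') {
        return '';
    }
    $digits = array_values(unpack('C*', $bytes)); // base-256 digits
    $out = '';
    while (count($digits) > 0) {
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $digit) {
            $acc = $remainder * 256 + $digit;
            $q = intdiv($acc, 47);
            $remainder = $acc % 47;
            if (count($quotient) > 0 || $q > 0) {
                $quotient[] = $q; // skip leading zeros of the quotient
            }
        }
        $out = EXAMPLE_ALPHABET[$remainder] . $out;
        $digits = $quotient;
    }
    return $out;
}

// Hypothetical decoder: multiply a fixed-width accumulator by 47 and add
// each digit in turn. The fixed width restores any dropped leading zeros.
function example_base47_decode(string $cid, int $length = 21): string {
    if ($cid === '') {
        throw new InvalidArgumentException('empty input');
    }
    $bytes = array_fill(0, $length, 0);
    foreach (str_split($cid) as $char) {
        $carry = strpos(EXAMPLE_ALPHABET, $char);
        if ($carry === false) {
            throw new InvalidArgumentException("invalid character: $char");
        }
        for ($i = $length - 1; $i >= 0; $i--) {
            $acc = $bytes[$i] * 47 + $carry;
            $bytes[$i] = $acc & 0xFF;
            $carry = $acc >> 8;
        }
        if ($carry !== 0) {
            throw new OverflowException('value does not fit in ' . $length . ' bytes');
        }
    }
    return pack('C*', ...$bytes);
}
```

Under these assumptions a 21-byte value never needs more than 31 characters, because 47 to the power 31 exceeds 2 to the power 168.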

Inputs to jjcode hashing functions are converted to strings if necessary. If you want to hash data/objects that aren't strings you can serialize your input before passing it in (which will obviously couple you to PHP). We didn't want to tie our data formats to PHP so we require string inputs (i.e. we don't do input serialization for you, but it's easy to do yourself if that's what you want).
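As a rough illustration of doing the serialization yourself, the sketch below turns an arbitrary PHP value into a string before it is handed to a hashing function. The helper name example_hash_input is hypothetical and not part of jjcode; using PHP's serialize is an assumption that ties the resulting hashes to PHP, as noted above.

```php
<?php
// Hypothetical caller-side helper: produce the string that would be
// passed to a jjcode hashing function.
function example_hash_input($value): string {
    if (is_string($value)) {
        return $value; // strings pass through untouched
    }
    if (is_scalar($value) || $value === null) {
        return (string) $value; // ints, floats, bools and null are cast
    }
    // Arrays and objects: PHP-specific serialization (couples you to PHP).
    return serialize($value);
}
```

If portability matters more than convenience, a language-neutral encoding such as JSON with a fixed key order could be swapped in for serialize.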

So here is an example key in various formats:

bid: string(21) ";!F7?z???f?NU?n?q???"
cid: string(29) "7vHkwFsRXVUje7HsuZzQYCMtwmvVw"
hid: string(42) "3b214637be7ac03faf6615b44e55a06ee771adaa89"
sql: string(44) "0x3b214637be7ac03faf6615b44e55a06ee771adaa89"

Remember:

  1. bid for databases
  2. cid for URLs and APIs
  3. hid for debugging
  4. sql for queries

It's worth noting that jjcodes aren't the final word in key management for your application. For example, if you were to tell your new customer that their customer ID was 3tt5dfie39y6hJJPQg6PsM8P46N48, they might tell you to get stuffed! If you need keys for use in HTML forms or other business logic see our kkcode functions for shorter and more user-friendly identifiers with built-in classification and redundancy checks.

Regarding profanity filtering: because we leave out the vowels (and 0, 1, 3, and 4) it's not possible to have English or 1337 5p34k profanity. There is however a list of blacklisted strings: English abbreviations that could possibly be interpreted as offensive. The list of blacklisted terms is processed to generate 1337 5p34k variations. Note particularly that only abbreviated English/1337-5p34k terms have been catalogued; there is no support for profanity filtering in other languages at this time.

The catalogue of profanity is "append only". The developers can append blacklisted terms to the list, but removing terms or reordering terms will be problematic. If the list of profanity is changed by new additions then all previously generated keys/hashes will need to be upgraded. There is a jjcode_upgrade function for this purpose. Remember: you can never ever remove things from the profanity list or change their order.
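To give a feel for how a blacklist check with 1337 5p34k handling might look, here is a minimal sketch. It is not the code in jjfilter.php: the function name, the digit-to-letter table, and the placeholder term in the usage note are all assumptions. It also swaps techniques: instead of expanding the blacklist into variations as described above, it normalises the candidate string before comparing.

```php
<?php
// Hypothetical check: does $cid contain any blacklisted term once case
// and digit substitutions are normalised away?
function example_contains_blacklisted(string $cid, array $blacklist): bool {
    // Assumed mapping for the digits that remain in the Base47 alphabet:
    // 2 => z, 5 => s, 6 => g, 7 => t, 8 => b, 9 => g.
    $normalised = strtr(strtolower($cid), '256789', 'zsgtbg');
    foreach ($blacklist as $term) {
        if ($term === '') {
            continue; // an empty term would match everything
        }
        if (strpos($normalised, strtolower($term)) !== false) {
            return true;
        }
    }
    return false;
}
```

For example, with the harmless placeholder term 'bts' in the list, the string 'xx875yy' would be flagged because 875 normalises to bts.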

bid format

The 'bid' format is a byte array 21 bytes long. It is suitable for use on your database tables, e.g. binary(21).

cid format

The 'cid' format is a human-readable string of characters that is between 29 and 31 characters long. It is suitable for use in your URLs or your APIs. A 'cid' may contain any of the following characters:

256789BCDFGHJKLMNPQRSTVWXYZbcdfghjkmnpqrstvwxyz

hid format

The 'hid' format is a hexadecimal string format that is 42 characters long. It is suitable for use in debugging.

sql format

The 'sql' format is a hexadecimal string format that is 44 characters long. It is similar to the 'hid' format except that it begins with '0x'. It is suitable for use in SQL queries.
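The relationship between the bid, hid, and sql formats can be sketched with PHP's built-in bin2hex and hex2bin. The function names below are hypothetical stand-ins, prefixed with example_ so they are not mistaken for the library's own converters such as bid2sql, and they skip the validation the library performs by default.

```php
<?php
// Hypothetical converters illustrating how the formats relate.

// bid (21 raw bytes) to hid (42 hexadecimal characters)
function example_bid2hid(string $bid): string {
    if (strlen($bid) !== 21) {
        throw new InvalidArgumentException('a bid must be exactly 21 bytes');
    }
    return bin2hex($bid);
}

// hid to sql: the same hexadecimal string with a leading '0x'
function example_hid2sql(string $hid): string {
    return '0x' . $hid;
}

// sql back to bid: strip the '0x' prefix and decode the hexadecimal
function example_sql2bid(string $sql): string {
    return hex2bin(substr($sql, 2));
}
```

Using the example key from this page, the hid 3b214637be7ac03faf6615b44e55a06ee771adaa89 becomes the sql literal 0x3b214637be7ac03faf6615b44e55a06ee771adaa89, which can be embedded in a query against a binary(21) column.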

KKcode functionality

The kkcode functions provide identifiers for use within your business logic that are:

  1. shorter than jjcodes (user-friendly for manual data entry)
  2. classified (based on single-letter classifier)
  3. opaque (don't trivially expose how many objects you have per class)
  4. redundant/verified (include two check digits)

kkcode format

The kkcode format is:

{$class}{$check-1}{$code}{$check-2}

Where:

  • $class: is a single upper case letter that indicates the class, e.g.
    • 'C' for customer,
    • 'P' for part...
  • $check-1: the 1st check digit (0-9)
  • $code: the underlying ID + bump value
  • $check-2: the 2nd check digit (0-9)

The underlying ID and bump value are nominated by the caller. The underlying ID is usually an auto-incremented ID from the database. The bump value can be any positive integer but defaults to 123. The bump value is added to the underlying ID to render the opaque 'code' used in the data format.
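The layout above can be sketched as follows. This is an illustration only: the page does not say how the two check digits are derived, so this sketch assumes they are the last two decimal digits of a CRC32 over the class letter and code. The function names are hypothetical and the real kkcode functions may compute the checks differently.

```php
<?php
// Assumption: both check digits come from crc32("$class$code").
function example_kk_checks(string $class, int $code): array {
    $crc = crc32($class . $code) & 0x7FFFFFFF; // non-negative on 32-bit builds too
    $twoDigits = $crc % 100;
    return [intdiv($twoDigits, 10), $twoDigits % 10];
}

// Build {$class}{$check-1}{$code}{$check-2}, where code = id + bump.
function example_kk_format(string $class, int $id, int $bump = 123): string {
    $code = $id + $bump;
    [$check1, $check2] = example_kk_checks($class, $code);
    return $class . $check1 . $code . $check2;
}

// Parse and verify; returns null when the layout or check digits are wrong.
function example_kk_parse(string $kkcode, int $bump = 123): ?array {
    if (!preg_match('/^([A-Z])(\d)(\d+)(\d)$/', $kkcode, $m)) {
        return null;
    }
    $code = (int) $m[3];
    [$check1, $check2] = example_kk_checks($m[1], $code);
    if ((int) $m[2] !== $check1 || (int) $m[4] !== $check2) {
        return null;
    }
    return ['class' => $m[1], 'id' => $code - $bump];
}
```

With the default bump of 123, a customer with database ID 1000 gets a code of the form C, one check digit, 1123, one check digit, so the identifier does not directly reveal the underlying row number.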

Technical specification

The technical specification describes how the project works.

The software is implemented in two PHP libraries. The jjcode family of functions are in php/jjcode.php and the kkcode family of functions are in php/kkcode.php.

JJcode technicalities

The underlying hash function used by jjcodes is SHA512. SHA512 was recommended over SHA256 for performance reasons (SHA512 is typically faster than SHA256 on 64-bit hardware).
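As a sketch of how a 64-byte SHA512 digest could yield a 21-byte bid, the example below keeps the first 21 bytes of the raw digest. The truncation rule and the function name example_hash_bid are assumptions; the page does not say how jjcode reduces the digest, and the real code also applies profanity filtering.

```php
<?php
// Hypothetical: derive a 21-byte value from a string via SHA512.
function example_hash_bid(string $input): string {
    $digest = hash('sha512', $input, true); // 64 raw bytes
    return substr($digest, 0, 21);          // assumed: keep the first 21 bytes
}
```

The result is deterministic, so hashing the same input twice gives the same 21 bytes, which is what makes it usable as a lookup key.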

KKcode technicalities

The underlying redundancy check is the PHP function crc32.

Notes

Notes for implementers

If you are interested in incorporating this software into your project, here's what you need to know:

Check out the latest stable code from svn (or configure an svn:externals in your project):

svn co https://www.progclub.org/svn/pcrepo/jjcode/tags/latest/0.1 jjcode

Then include it in your PHP application. If you want the jjcode family of functions:

require_once '/path/to/jjcode/php/jjcode.php';

And if you want the kkcode family of functions:

require_once '/path/to/jjcode/php/kkcode.php';

Notes for developers

If you're looking to set up a development environment for this project here's what you need to know:

See source code above RE: how to get the code for development purposes.

Profanity filter configuration

If you're going to add to the profanity list (and you want to think really hard about doing that), bump the minor version first, then let all of your users know that they have to upgrade all of their keys if they move to the new version. Remember that the profanity list is "append only": you cannot reorder or otherwise modify released profanity content.

Doing a release

To release a version of this project use the pcrepo-branch-release script from the jj5-bin project:

$ pcrepo-branch-release jjcode $MAJOR.$MINOR $REVISION

Where:

$MAJOR = the major version number, presently 0
$MINOR = the minor version number, presently 1
$REVISION = the revision number for this release

Note: the $REVISION number starts at 1 and is incremented for each release. The revision number is odd for development releases and even for production releases. See status for last production release and increment by two for next production release.
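The odd/even convention can be expressed in a couple of lines. The helper names below are hypothetical and are not part of the pcrepo-branch-release script; they only restate the rule in the note above.

```php
<?php
// Even revision numbers are production releases, odd ones are development.
function example_is_production_revision(int $revision): bool {
    return $revision % 2 === 0;
}

// The next production revision is the last production revision plus two.
function example_next_production_revision(int $lastProduction): int {
    return $lastProduction + 2;
}
```

So with the latest production version at 0.1.12 and development at 0.1.13, the next production release would carry revision 14.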

Tasks

TODO

Things to do, in rough order of priority:

  • could have more comprehensive unit tests...
  • should port to languages other than PHP...
  • should provide a database-upgrade script which can automatically upgrade all JJcodes
    • note: we don't need this functionality yet, but if we modify the profanity list we will
    • note: could assume that all binary(21) fields are JJcodes..?

Done

Stuff that's done. Latest stuff on top.

  • JE 2016-05-12: released version 0.1.12 (after minor updates)
  • JE 2016-05-12: released version 0.1.10 (passing $verify to validator)
  • JE 2016-05-12: released version 0.1.8 (after profanity filtering re-enabled)
  • JE 2016-05-12: released version 0.1.6 (after jjkey default format changed)
  • JE 2016-05-12: released version 0.1.4 (after directory layout changed)
  • JE 2016-05-12: released version 0.1.2
  • JE 2016-05-12: documented kkcode functionality
  • JE 2016-05-12: documented jjcode functionality
  • JE 2016-05-12: copied initial jjcode/kkcode implementation from jj5-test
  • JE 2016-05-12: created project in svn
  • JE 2016-05-12: created project page