Today while reading Writing better Regular Expressions in PHP I learned that most metacharacters are treated as literals inside character classes (the main exceptions are \, ], ^ and -, which keep a special meaning there). So
'/^(\d[.]\d)$/' will match
'1.2' but not
'1x2'. Who knew!?
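Here is a quick way to check that behaviour. This is only a sketch: it assumes the pattern behaves the same as a JavaScript regex literal as it does as a PHP/PCRE pattern string (for this simple pattern it should), and the variable names are made up for illustration.

```javascript
// The pattern from above as a JavaScript regex literal (assumption: the
// original is a PHP/PCRE pattern string; this simple pattern ports as-is).
const classDot = /^(\d[.]\d)$/;   // dot inside a character class: literal dot
const escapedDot = /^(\d\.\d)$/;  // the more common spelling: escaped dot
const bareDot = /^(\d.\d)$/;      // bare dot outside a class: any character

const results = {
  classDotMatchesDot: classDot.test('1.2'),
  classDotMatchesX: classDot.test('1x2'),
  escapedDotMatchesX: escapedDot.test('1x2'),
  bareDotMatchesX: bareDot.test('1x2'),
};

console.log(results);
```

So [.] and \. are interchangeable here, and only the bare dot lets '1x2' through.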
So I was having an issue with ViewVC wherein UTF-8 content (a copyright symbol) was being garbled in the web browser.
I chased a number of red herrings (Content-Type headers, http-equiv, XHTML vs HTML5) but eventually found the culprit in the viewvc.conf settings.
I needed to change the ‘detect_encoding’ setting from ‘1’ to ‘0’. Once that was done my content was presented correctly:
## detect_encoding: Should we attempt to detect versioned file
## character encodings? [Requires 'chardet' module, and is currently
## used only by the syntax coloration logic -- if enabled -- for the
## 'markup' and 'annotate' views; see 'enable_syntax_coloration'.]
# 2019-06-02 jj5 - OLD: this was bollocksing things up...
#detect_encoding = 1
# 2019-06-02 jj5 - NEW: so I changed it...
detect_encoding = 0
# 2019-06-02 jj5 - END
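For illustration, here is one common way a copyright symbol ends up garbled: its UTF-8 bytes get decoded as a single-byte encoding such as Latin-1. This is an assumption about the general failure mode, not a confirmed diagnosis of what ViewVC or chardet did in this case. The sketch uses the Node.js Buffer API.

```javascript
// Node.js sketch: what happens when UTF-8 bytes are decoded with the wrong
// encoding. Assumption: the wrong guess is Latin-1 (ISO-8859-1); the real
// misdetection in ViewVC may have been something else.
const original = '\u00A9'; // the copyright symbol

// In UTF-8 the copyright symbol is two bytes: 0xC2 0xA9.
const utf8Bytes = Buffer.from(original, 'utf8');

// Decoding those two bytes as Latin-1 yields two characters instead of one.
const misread = utf8Bytes.toString('latin1');

// Decoding them as UTF-8 gives the original character back.
const correct = utf8Bytes.toString('utf8');

console.log(utf8Bytes, JSON.stringify(misread), JSON.stringify(correct));
```

The misread string is the familiar "Â©" pair, which is a good hint that UTF-8 content is being treated as a single-byte encoding somewhere along the way.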
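Found this, which shows a jQuery trick for HTML-encoding a string: set it as the text of a detached element, then read back the element's HTML: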
$('<div/>').text('This is fun & stuff').html();
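Where there is no DOM or jQuery to lean on (in Node.js, say), a small replacement table can do roughly the same job. This is a minimal sketch, not the jQuery implementation; escapeHtml is a hypothetical helper name, and it also escapes quotes, which goes a little further than the element-text approach typically does.

```javascript
// Hypothetical helper: HTML-encode a string without a DOM.
// Escapes the five characters that matter in HTML text and attribute values.
function escapeHtml(text) {
  const entities = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  };
  // A single pass over the string, so an already-produced '&amp;' is never
  // escaped a second time.
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

const encoded = escapeHtml('This is fun & stuff');
console.log(encoded);
```

Doing the replacement in one pass avoids the classic bug where chained replace calls double-encode the ampersands they have just introduced.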
I needed to know my options for htmlentities character encoding support today. The PHP documentation had everything I needed to know. I ended up adding these constants to my code:
const UTF8_ENCODING = 'UTF-8';
// note: ISO-8859-1 is Latin-1, a superset of ASCII rather than ASCII itself
const ASCII_ENCODING = 'ISO-8859-1';
Here’s an interesting article from way back in 2004: XML on the Web Has Failed.
On my list of things to do is read the document Handling character encodings in HTML and CSS from the W3C. For some reason I can’t quite bring myself to concentrate on it right now.