This was a quick romp around “plain text”: Plain Text – Dylan Beattie – NDC Oslo 2021. Would recommend if you don’t already know about such things.
So I was having an issue with ViewVC wherein UTF-8 content (a copyright symbol) was being garbled in the web browser.
I chased a number of red herrings (Content-Type headers, http-equiv, XHTML vs HTML5) but eventually found the culprit in the viewvc.conf settings.
I needed to change the ‘detect_encoding’ setting from ‘1’ to ‘0’. Once that was done my content was presented correctly:
## detect_encoding: Should we attempt to detect versioned file
## character encodings?  [Requires 'chardet' module, and is currently
## used only by the syntax coloration logic -- if enabled -- for the
## 'markup' and 'annotate' views; see 'enable_syntax_coloration'.]
##
# 2019-06-02 jj5 - OLD: this was bollocksing things up...
#detect_encoding = 1
# 2019-06-02 jj5 - NEW: so I changed it...
detect_encoding = 0
# 2019-06-02 jj5 - END
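As an illustration only, here is a small Python sketch of what a wrong encoding guess does to a copyright symbol. It assumes the bad guess was Latin-1, which is a common failure mode; nothing above confirms what ViewVC actually guessed for this file.

```python
# Illustration only: UTF-8 bytes decoded with the wrong charset.
# Assumption: the wrong guess is Latin-1; the real guess is not known.
copyright_sign = "\u00a9"  # the copyright symbol

utf8_bytes = copyright_sign.encode("utf-8")  # two bytes: 0xC2 0xA9
garbled = utf8_bytes.decode("latin-1")       # one character per byte
correct = utf8_bytes.decode("utf-8")         # round-trips to the original

# ascii() escapes non-ASCII characters so this prints safely on any terminal.
print(ascii(garbled))
print(ascii(correct))
```

The garbled value is two characters (an A with circumflex followed by the copyright symbol), which is the classic symptom of UTF-8 being read as a single-byte encoding.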
So I was running this:
/var/www/jj-web-1-www.jj5.net-sixsigma:
  file.recurse:
    - clean: True
    - user: root
    - group: root
    - dir_mode: 755
    - file_mode: 644
    - source: salt://inst/mediawiki-1.29
    - require:
      - pkg: apache2
And getting an error like this:
----------
          ID: /var/www/jj-web-1-www.jj5.net-sixsigma
    Function: file.recurse
      Result: False
     Comment: #### /var/www/jj-web-1-www.jj5.net-sixsigma/vendor/james-heinrich/getid3/getid3/module.audio.ac3.php ####
              Source file 'salt://inst/mediawiki-1.29/vendor/james-heinrich/getid3/getid3/module.audio.ac3.php?saltenv=base' not found
              #### /var/www/jj-web-1-www.jj5.net-sixsigma/vendor/james-heinrich/getid3/getid3/module.audio-video.mpeg.php ####
              Source file 'salt://inst/mediawiki-1.29/vendor/james-heinrich/getid3/getid3/module.audio-video.mpeg.php?saltenv=base' not found
     Started: 14:27:18.352264
    Duration: 134735.945 ms
     Changes:
----------
The issue was that the source files mentioned weren’t in UTF-8 format. To convert the files I ran, e.g.:
$ iconv -f WINDOWS-1252 -t UTF-8//TRANSLIT < module.audio-video.mpeg.php.bak > module.audio-video.mpeg.php
(Actually, I couldn’t get the ‘iconv’ command to work, so I edited the files manually in Vim.)
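If iconv won't cooperate, the same conversion can be sketched in Python. This is a minimal sketch, assuming the files really are Windows-1252 as in the iconv command above; the helper names `is_valid_utf8` and `convert_to_utf8` are made up for this example, and the `.bak` backup just mirrors the naming used above.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(path: Path, source_encoding: str = "cp1252") -> bool:
    """Rewrite `path` as UTF-8 if it is not already valid UTF-8.

    Returns True if the file was rewritten. The original bytes are kept
    in a sibling file with a `.bak` suffix.
    Assumption: the file is in `source_encoding` (Windows-1252 by default).
    """
    data = path.read_bytes()
    if is_valid_utf8(data):
        return False  # nothing to do

    # Raises UnicodeDecodeError for the few bytes cp1252 leaves undefined,
    # which is a hint that the encoding assumption is wrong.
    text = data.decode(source_encoding)

    backup = path.with_name(path.name + ".bak")
    backup.write_bytes(data)
    path.write_bytes(text.encode("utf-8"))
    return True
```

Something like `convert_to_utf8(Path("module.audio-video.mpeg.php"))` would then handle one file, and running it a second time is a no-op because the file is already valid UTF-8.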
I was getting an error like this:
/etc/cron.daily/etckeeper:
bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 34: ordinal not in range(128)
Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 853, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 1055, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 661, in run_argv_aliases
    return self.run_direct(**all_cmd_args)
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 665, in run_direct
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/cleanup.py", line 122, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/cleanup.py", line 156, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/builtins.py", line 659, in run
    no_recurse, action=action, save=not dry_run)
  File "/usr/lib/python2.6/dist-packages/bzrlib/mutabletree.py", line 50, in tree_write_locked
    return unbound(self, *args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/mutabletree.py", line 521, in smart_add
    for subf in sorted(os.listdir(abspath)):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 34: ordinal not in range(128)
bzr 2.1.4 on python 2.6.5 (Linux-18.104.22.168-rscloud-x86_64-with-Ubuntu-10.04-lucid)
arguments: ['/usr/bin/bzr', 'add', '-q', '.']
encoding: 'ANSI_X3.4-1968', fsenc: 'ANSI_X3.4-1968', lang: None
plugins:
  bzrtools /usr/lib/python2.6/dist-packages/bzrlib/plugins/bzrtools [2.1.0]
  etckeeper /usr/lib/python2.6/dist-packages/bzrlib/plugins/etckeeper [unknown]
  launchpad /usr/lib/python2.6/dist-packages/bzrlib/plugins/launchpad [2.1.4]
  netrc_credential_store /usr/lib/python2.6/dist-packages/bzrlib/plugins/netrc_credential_store [2.1.4]
  news_merge /usr/lib/python2.6/dist-packages/bzrlib/plugins/news_merge [2.1.4]
*** Bazaar has encountered an internal error. This probably indicates a
    bug in Bazaar. You can help us fix it by filing a bug report at
    https://bugs.launchpad.net/bzr/+filebug
    including this traceback and a description of the problem.
etckeeper warning: bzr add failed
Committing to: /etc/
modified apache2/passwd.htdigest
modified apache2/sites-available/svn.jj5.net-ssl
Committed revision 87.
I’ve tried to fix it by adding:
export LANG=en_AU.UTF-8
export LANGUAGE=en_AU:en
I added these as lines 2 and 3 of /etc/cron.daily/etckeeper.
Now I’ll wait a day or two and see if it worked…
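The traceback reports the encoding as 'ANSI_X3.4-1968', which is the formal name for ASCII, and a lang of None, so bzr was decoding file names with the ASCII codec. The sketch below reproduces that class of error in isolation. The file name is hypothetical (the traceback doesn't show the real one); it just contains byte 0xc4 as the start of a two-byte UTF-8 sequence.

```python
# Hypothetical file name: the real one is not shown in the traceback.
# 0xC4 0x81 is the UTF-8 encoding of U+0101 (a with macron).
raw_name = b"example-\xc4\x81.conf"

ascii_error = None
try:
    raw_name.decode("ascii")  # what an ASCII locale effectively does
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

utf8_name = raw_name.decode("utf-8")  # what a UTF-8 locale would do

print(ascii_error)
print(ascii(utf8_name))
```

The ASCII attempt fails with the same "ordinal not in range(128)" complaint, while the UTF-8 decode succeeds. Whether exporting LANG in the cron script is enough to make bzr pick the UTF-8 codec is a separate question that this sketch doesn't answer.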
I needed to know my options for htmlentities character encoding support today. The PHP documentation had everything I needed to know. I ended up adding these constants to my code:
const UTF8_ENCODING = 'UTF-8';
const ASCII_ENCODING = 'ISO-8859-1';
Today I needed to convert a UTF-16 file to UTF-8 and I did it with iconv:
iconv -f UTF-16 -t UTF-8 /path/to/input > /path/to/output
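For comparison, here is a rough Python equivalent of that iconv call. It is a sketch only: `utf16_to_utf8` is a made-up helper name, and it reads the whole file into memory, which is fine for small files but not for huge ones. Python's "utf-16" codec uses the byte order mark to pick the byte order and drops it from the result; without a BOM it falls back to the platform's native byte order.

```python
from pathlib import Path


def utf16_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a UTF-16 file as UTF-8 (hypothetical helper)."""
    # Work on bytes so line endings pass through untouched.
    text = src.read_bytes().decode("utf-16")
    dst.write_bytes(text.encode("utf-8"))
```

Usage would look like `utf16_to_utf8(Path("/path/to/input"), Path("/path/to/output"))`.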
Here’s an interesting article from way back in 2004: XML on the Web Has Failed.
On my list of things to do is read the document Handling character encodings in HTML and CSS from the W3C. For some reason I can’t quite bring myself to concentrate on it right now.