Using gzip, or bzip2, or..?

I wanted to know a little more about the pros and cons of different compression tools and strategies.

I found the article A Quick Benchmark: Gzip vs. Bzip2 vs. LZMA, and going by its numbers gzip is a pretty clear winner for my concerns (i.e. I want fast compression rather than high compression). I also found gzip vs. bzip2, which reached basically the same conclusion.
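To get a feel for the speed versus size trade-off on your own data, here is a minimal sketch. It assumes the zlib extension is loaded and only compares gzip levels against each other; it is not a gzip vs. bzip2 vs. LZMA benchmark, and the sample data is made up for illustration.

```php
<?php
// Made-up sample input: repetitive text standing in for typical CSS.
$data = str_repeat( "body { margin: 0; padding: 0; }\n", 2000 );

// Compare a fast level (1), a middle level (6) and the slowest level (9).
foreach ( array( 1, 6, 9 ) as $level ) {
  $start   = microtime( true );
  $packed  = gzencode( $data, $level );
  $elapsed = microtime( true ) - $start;
  printf(
    "level %d: %d -> %d bytes in %.4f s\n",
    $level, strlen( $data ), strlen( $packed ), $elapsed
  );
}
```

Timings vary from machine to machine, so treat the numbers as a rough guide only.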

Compressing JavaScript in PHP (no comments or whitespace)

Note: if you’re a web developer you might be interested in registering to become a ProgClub member. ProgClub is a free international club for computer programmers and we run some mailing lists you might like to hang out on to chat about software development, life, etc.

Given that I’ve been working on compressing CSS and compressing HTML in PHP, it’s only natural that I’m interested in JavaScript compression too. I haven’t done much research on the topic — I’m sure there are better tools out there than the one I’ve cobbled together — but for the sake of it here’s my first take on JavaScript compression in PHP:

function slib_compress_script( $buffer ) {

  // JavaScript compressor by John Elliot <jj5@jj5.net>

  $replace = array(
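    // Note: the replacements must be single-quoted; in a double-quoted PHP
    // string "\1" is an octal escape, not a backreference.
    '#\'([^\n\']*?)/\*([^\n\']*)\'#' => '\'\1/\'+\'\'+\'*\2\'', // protect /* inside ' strings
    '#\"([^\n\"]*?)/\*([^\n\"]*)\"#' => '"\1/"+\'\'+"*\2"', // protect /* inside " strings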
    '#/\*.*?\*/#s'            => "",      // strip C style comments
    '#[\r\n]+#'               => "\n",    // remove blank lines and \r's
    '#\n([ \t]*//.*?\n)*#s'   => "\n",    // strip line comments (whole line only)
    '#([^\\\\])//([^\'"\n]*)\n#s' => "\\1\n",
                                          // strip line comments
                                          // (that aren't possibly in strings or regexes)
                                          // [^\\\\] is "not a backslash"; with only two
                                          // backslashes the character class never closes
    '#\n\s+#'                 => "\n",    // strip excess whitespace
    '#\s+\n#'                 => "\n",    // strip excess whitespace
    '#(//[^\n]*\n)#s'         => "\\1\n", // extra line feed after any comments left
                                          // (important given later replacements)
    '#/([\'"])\+\'\'\+([\'"])\*#' => "/*" // rejoin the /* sequences protected above
  );

  $search = array_keys( $replace );
  $script = preg_replace( $search, $replace, $buffer );

  $replace = array(
    "&&\n" => "&&",
    "||\n" => "||",
    "(\n"  => "(",
    ")\n"  => ")",
    "[\n"  => "[",
    "]\n"  => "]",
    "+\n"  => "+",
    ",\n"  => ",",
    "?\n"  => "?",
    ":\n"  => ":",
    ";\n"  => ";",
    "{\n"  => "{",
//  "}\n"  => "}", (because I forget to put semicolons after function assignments)
    "\n]"  => "]",
    "\n)"  => ")",
    "\n}"  => "}",
    "\n\n" => "\n"
  );

  $search = array_keys( $replace );
  $script = str_replace( $search, $replace, $script );

  return trim( $script );

}

It’s funny, but my original function actually choked on jQuery, because jQuery contains a few strings like "*/*" that look like the start of a comment. To fix the problem I first patched jQuery to use "*/"+"*", but then I decided to handle that case in my code instead. Of course jQuery comes pre-minified by tools much more sophisticated than mine. My tool compresses the 230 KB jQuery file to 150 KB, whereas the tool jQuery uses compresses it to 90 KB. So I think I have my work cut out for me! It was a fun hack though.
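The protect, strip, restore idea can be seen in isolation in the sketch below. The function name demo_strip_block_comments is made up for this example, and it only handles double-quoted strings; it is a cut-down illustration, not a replacement for slib_compress_script.

```php
<?php
// Hypothetical helper for illustration only.
function demo_strip_block_comments( $js ) {
  // 1. Split any /* inside a double-quoted string into "/"+''+"*".
  $js = preg_replace( '#"([^\n"]*?)/\*([^\n"]*)"#', '"\1/"+\'\'+"*\2"', $js );
  // 2. Strip real C style comments.
  $js = preg_replace( '#/\*.*?\*/#s', '', $js );
  // 3. Join the split strings back together.
  $js = preg_replace( '#/"\+\'\'\+"\*#', '/*', $js );
  return $js;
}

$js = 'var accept = "*/*"; /* a real comment */ var x = 1;';

// Stripping comments without protecting the string eats half the statement.
$naive = preg_replace( '#/\*.*?\*/#s', '', $js );
echo $naive, "\n"; // var accept = "* var x = 1;

// With the protect and restore steps the string survives intact.
$safe = demo_strip_block_comments( $js );
echo $safe, "\n";  // var accept = "*/*";  var x = 1;
```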

Compressing HTML in PHP (no comments or whitespace)

In addition to compressing CSS in PHP I’ve been compressing HTML. My HTML compressor is a bit of a hack. It doesn’t handle CDATA sections for instance. But it should generally work OK. Here it is:

function slib_compress_html( $buffer ) {
  $replace = array(
    "#<!--.*?-->#s" => "",      // strip comments
    "#>\s+<#"       => ">\n<",  // strip excess whitespace
    "#\n\s+<#"      => "\n<"    // strip excess whitespace
  );
  $search = array_keys( $replace );
  $html = preg_replace( $search, $replace, $buffer );
  return trim( $html );
}

I use this function to compress HTML generated by my PHP scripts by putting ob_start( 'slib_compress_html' ) at the beginning of my script (after the ob_start( 'ob_gzhandler' ) call) and ob_end_flush() at the end. My HTML compression code looks like this:

    if ( extension_loaded( 'zlib' ) ) { ob_start( 'ob_gzhandler' ); }
    ob_start( 'slib_compress_html' );

    run_app( $app_factory );

    ob_end_flush();
    if ( extension_loaded( 'zlib' ) ) { ob_end_flush(); }
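To see what the callback does to a page, here is a small self-contained sketch. It repeats the compressor under the made-up name demo_compress_html so that it runs on its own, and it uses a plain ob_start() as a stand-in for ob_gzhandler so that the result can be inspected instead of being gzipped and sent.

```php
<?php
// Copy of the compressor above under a hypothetical name.
function demo_compress_html( $buffer ) {
  $replace = array(
    "#<!--.*?-->#s" => "",      // strip comments
    "#>\s+<#"       => ">\n<",  // strip excess whitespace
    "#\n\s+<#"      => "\n<"    // strip excess whitespace
  );
  $search = array_keys( $replace );
  return trim( preg_replace( $search, $replace, $buffer ) );
}

ob_start();                       // outer buffer (stand-in for ob_gzhandler)
ob_start( 'demo_compress_html' ); // inner buffer runs the compressor

echo "<ul>\n  <li>one</li>\n  <!-- note -->\n  <li>two</li>\n</ul>\n";

ob_end_flush();                   // inner callback runs, result goes to outer
$captured = ob_get_clean();       // grab what the outer buffer received

echo $captured, "\n";
```

The inner buffer is flushed first, so the compressor sees the raw HTML and the outer handler only ever sees the compressed result.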

Compressing CSS in PHP (no comments or whitespace)

I was searching for methods to remove comments and whitespace from CSS files in PHP and I found this article (3 ways to compress CSS files using PHP).

The article suggests this code by Reinhold Weber, which I thought was a pretty good place to start:

<?php
  header('Content-type: text/css');
  ob_start("compress");
  function compress($buffer) {
    /* remove comments */
    $buffer = preg_replace('!/\*[^*]*\*+([^/][^*]*\*+)*/!', '', $buffer);
    /* remove tabs, spaces, newlines, etc. */
    $buffer = str_replace(array("\r\n", "\r", "\n", "\t", '  ', '    ', '    '), '', $buffer);
    return $buffer;
  }

  /* your css files */
  include('master.css');
  include('typography.css');
  include('grid.css');
  include('print.css');
  include('handheld.css');

  ob_end_flush();
?>

Later I found css_strip_whitespace by nyctimus in the user notes for the PHP function php_strip_whitespace. It looks like this:

function css_strip_whitespace($css)
{
  $replace = array(
    "#/\*.*?\*/#s" => "",  // Strip C style comments.
    "#\s\s+#"      => " ", // Strip excess whitespace.
  );
  $search = array_keys($replace);
  $css = preg_replace($search, $replace, $css);

  $replace = array(
    ": "  => ":",
    "; "  => ";",
    " {"  => "{",
    " }"  => "}",
    ", "  => ",",
    "{ "  => "{",
    ";}"  => "}", // Strip optional semicolons.
    ",\n" => ",", // Don't wrap multiple selectors.
    "\n}" => "}", // Don't wrap closing braces.
    "} "  => "}\n", // Put each rule on its own line.
  );
  $search = array_keys($replace);
  $css = str_replace($search, $replace, $css);

  return trim($css);
}

The latter function is the one that I used, and I’m quite happy with it. I’ve been working on HTML and JavaScript compressors too, and those are much more difficult file formats to deal with than CSS.
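One thing worth knowing about css_strip_whitespace is that str_replace applies an array of replacements one after another, so the order of the entries matters. The sketch below uses a made-up one-rule stylesheet and just two of the replacements to show the effect.

```php
<?php
$css = "a { color: red; }";

// Order used by css_strip_whitespace: "; " is handled before ";}".
$good = str_replace( array( "; ", ";}" ), array( ";", "}" ), $css );

// Reversed order: ";}" finds nothing yet, so the semicolon survives.
$bad = str_replace( array( ";}", "; " ), array( "}", ";" ), $css );

echo $good, "\n"; // a { color: red}
echo $bad, "\n";  // a { color: red;}
```

So if you add your own replacements to the array, put them where their input will already be in the form they expect.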