Read a little from the Open Web Application Security Project today.
Learned about a website today called A List Apart. It seems to have some interesting content centered on web design.
Read the spec for Mailman 3.0. Looks like it will be pretty good. The feature I’m interested in, and which I’m annoyed I can’t do with my current version of Mailman, is the ability to put a link to the web-archived copy of a message at the bottom of the outgoing SMTP message, i.e. so each message carries a link back to itself on the web. That would be really handy for referencing. At the moment, if I want a link I have to go to the web archive for the particular list and hunt the message down.
While I was reading the Mailman 3.0 spec I noticed a link to Postfix Virtual Domain Hosting Howto. I think I might have read (at least some of) that before. But… reading that is now definitely on my TODO list.
I have a computer sitting on my desk that is always on (it’s my file server) and it has a monitor attached which is almost never in use (I ssh to that server if I want to do things so it’s hardly ever logged in).
I thought it would be cool if that monitor showed the web-logs from all of the systems I manage, so I could keep an eye on things and maybe learn a thing or two about my web-sites and how people are using them.
So the first thing I did was write a script to grab any given web log:
root@orac:~# cat /root/get-web-log.sh
#!/bin/bash
echo Starting download of $3...
while : ; do
  su -c "ssh $1 tail -f /var/log/apache2/$2 < /dev/null" jj5 \
    | tee -a /var/log/web.log \
    | grep --line-buffered -v "Mozilla.5.0 .compatible. Googlebot.2.1. .http...www.google.com.bot.html." \
    | grep --line-buffered -v "Baiduspider...http...www.baidu.com.search.spider.htm." \
    | grep --line-buffered -v "Mozilla.5.0 .compatible. Baiduspider.2.0. .http...www.baidu.com.search.spider.html." \
    | grep --line-buffered -v "Mozilla.5.0 .compatible. Exabot.3.0. .http...www.exabot.com.go.robot." \
    | grep --line-buffered -v "Mozilla.5.0 .compatible. YandexBot.3.0. .http...yandex.com.bots." \
    > /var/log/web/$3
  sleep 60
  echo; echo; echo Restarting download of $3...; echo; echo;
done
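The grep patterns are just the bot user-agent strings with the regex special characters dotted out. If I wanted to tidy that up I could probably collapse the five filters into a single extended regex keyed on the bot names, something like this (untested):

  # untested: one extended-regex filter instead of five separate greps
  grep --line-buffered -v -E "Googlebot|Baiduspider|Exabot|YandexBot"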
Then I wrote a series of scripts which call the get-web-log.sh script for specific web-sites on specific servers, e.g.:
root@orac:~# cat /root/web-log/get-jsphp.co
#!/bin/bash
/root/get-web-log.sh honesty www.jsphp.co-access.log jsphp.co
exit
Then I wrote a main script, rather unoriginally called info.sh, that kicks off the web-log downloads and then monitors their progress as they come through:
root@orac:~# cat /root/info.sh
#!/bin/bash

# disable the screensaver
setterm -blank 0 -powersave off -powerdown 0

# start downloading the web-logs
cd /root/web-log
./get-jsphp.co &
sleep 1
#...all the other downloaders, one for each site

# watch the web-logs
cd /var/log/web
tail -f *

# stop downloading the web-logs
kill %1
#...all the other kills, one for each downloader

exit
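One wrinkle I haven’t tested: the kill lines only run after tail -f exits, and I need one kill per background job. A tidier sketch would be to trap the exit and kill whatever jobs are still running, something like this (untested):

  #!/bin/bash
  # untested sketch: kill all background downloaders when this script exits,
  # instead of one "kill %N" per job
  trap 'kill $(jobs -p) 2> /dev/null' EXIT
  cd /root/web-log
  ./get-jsphp.co &
  #...all the other downloaders, one for each site
  sleep 1
  cd /var/log/web
  tail -f *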
Then I edited /etc/init/tty1.conf so that on tty1, instead of having a login console, I automatically ran my info.sh script:
root@orac:~# cat /etc/init/tty1.conf
# tty1 - getty
#
# This service maintains a getty on tty1 from the point the system is
# started until it is shut down again.

start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]

respawn
#exec /sbin/getty -8 38400 tty1
exec /root/info.sh < /dev/tty1 > /dev/tty1 2>&1
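To get init to pick up the change without a reboot, something like this should work with the Upstart tools (from memory, I haven’t double-checked these commands):

  root@orac:~# initctl reload-configuration   # re-read /etc/init/*.conf
  root@orac:~# stop tty1
  root@orac:~# start tty1                     # restart the tty1 job so it runs info.sh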
And that was it. The only trick was that I needed to disable the screen saver (as shown in the info.sh script) so that the screen didn’t constantly blank.
And now I can watch the web activity on all of my sites in real time.
I found this article (Some Guidelines for Determining Web Page and File Size) today which talks about the average size of HTML and other files on the web. According to the article (and I’m not clear how they got their data) the average HTML file is 25k, JPEG 11.9k, GIF 2.9k, PNG 14.5k, SWF 32k, external scripts 11.2k and external CSS 17k, with the average total size of a web page being 130k. Interesting stuff. Particularly that scripts are typically 11.2k given that jQuery is 90k.
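Since I’m already collecting access logs into /var/log/web, I could get a rough number of my own for average response size. Something like this should do it (untested, and it assumes the combined log format where the 10th field is the response size in bytes, so it lumps HTML, images and everything else together):

  # untested: rough average response size from the collected logs
  # (combined log format: field 10 is the response size in bytes)
  awk '$10 ~ /^[0-9]+$/ { total += $10; n++ }
       END { if (n) printf "%d requests, average %.1f KB\n", n, total / n / 1024 }' /var/log/web/*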
So many pros and cons, and it’s all hypothetical… what I really need is data. Anyway, I don’t have data, nor do I really have the tools to get it. So given that I have to fly in the dark, here’s my plan: