Log analysis and stats

Introduction
Squid (in its default configuration) makes 4 logfiles: access.log, cache.log, store.log, and an index file in the cache directory.

The first three of these logfiles can safely be cycled (rotated). Although the file in the cache directory is called a log, it is actually an index of each object on disk: when Squid is started it reads this file rather than reading each and every object in the cache. It therefore cannot be cycled, though it does get smaller if you send Squid a 'kill -HUP'.

More information on the format of the logs can be found here, or in the Release Notes included in the distribution (these are the release notes for version 1.1; if you have a different version, you may want to check the doc directory in the source).

The logs can quickly start taking up a lot of space. Because of the way Unix filesystems work, simply deleting the files while Squid is writing to them will not free the space until every program that has them open closes them. You can get Squid to close its log files by sending it a 'kill -USR1'. Squid will then move the current access.log, cache.log and store.log to access.log.0, cache.log.0 and store.log.0. If these files already exist, it first moves the old access.log.0 to access.log.1 (and so on for older files) and then creates a new access.log.0 from the current data. The number at the end keeps increasing up to the value set for 'logfile_rotate' in the config file. Once the files have been moved and the new logs are being written, you can either analyse the old logs or simply delete them.
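To make the numbering scheme concrete, here is a minimal Python sketch that performs the same renaming on ordinary files. It is an illustration, not Squid's own code: the function name rotate() and its parameter keep are hypothetical, and it assumes that logfile_rotate = N keeps the generations .0 through .N-1, with the oldest one discarded. Don't run it against the log directory of a live Squid; Squid does this itself when it receives USR1.

```python
import os


def rotate(path: str, keep: int) -> None:
    """Rotate `path` the way the text describes (hypothetical helper).

    Assumes `keep` plays the role of logfile_rotate: path.0 .. path.(keep-1)
    are kept, and whatever was in path.(keep-1) is overwritten (discarded).
    """
    if keep <= 0:
        return  # assumption: no numbered copies are kept at all
    # Shift generations up by one, starting with the oldest, so that no
    # generation is overwritten before it has itself been moved.
    for i in range(keep - 1, 0, -1):
        older = f"{path}.{i - 1}"
        if os.path.exists(older):
            # os.replace() overwrites an existing destination, like rename(2).
            os.replace(older, f"{path}.{i}")
    # The current log becomes generation 0.
    if os.path.exists(path):
        os.replace(path, f"{path}.0")
    # Stand-in for Squid reopening its log: start a fresh, empty file.
    open(path, "w").close()
```

With keep=3, after four rotations access.log.0 holds the newest data, access.log.2 the oldest surviving data, and the first generation is gone. Note that renaming alone frees no space; the space comes back when you delete a rotated file that Squid no longer has open.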

Now that we know what logfiles there are, let's see what we can do with them:

Tools
The access.log file can be analysed with several tools:
  1. Original NLANR scripts (the page also contains a list of similar scripts)
    usage:
      access-extract.pl < access.log > summary
      access-extract-urls.pl < access.log >> summary
      access-summary.pl < summary > report.txt
    The file "report.txt" then contains all the relevant information. Here is a version that creates HTML output and also works with the Netscape cache logs.
  2. Calamaris
    usage: calamaris.pl < access.log > stats.html
  3. squidclients
    usage: squidclients -H < access.log > clients.html
  4. squidtimes
    usage: squidtimes < access.log > times.html
  5. pwebstats
    usage: see webpage
  6. PY_Squid_Stats
    usage: see webpage of PY_Squid_Stats
  7. Many more, not yet listed here. If you know of others, mail them in.
Analysis
These tools may not give you the stats that you need, or they may give you too many stats and take a long time to compute them. I can only suggest that you modify them slightly (this is the main advantage of free software :)

Not everyone expects the same thing from their proxy: the most common use is to save network traffic, though some people use it as an access-control system. Some of the utilities above will give you only part of the information relevant to these uses.
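If saving traffic is what you care about, the core numbers are easy to compute yourself. The Python sketch below reports the request hit ratio, the byte hit ratio and the heaviest clients by bytes. It assumes Squid's native access.log format (whitespace-separated: time, elapsed, client, action/status, bytes, method, URL, ident, hierarchy, type); the function name summarize() is hypothetical, and treating any action containing "HIT" as a hit is a simplification, since the exact result codes vary between Squid versions.

```python
from collections import Counter


def summarize(lines):
    """Compute traffic-saving statistics from access.log lines (a sketch).

    Assumes Squid's native access.log format, whitespace-separated:
    time elapsed client action/status bytes method URL ident hierarchy type
    """
    requests = hits = 0
    total_bytes = hit_bytes = 0
    bytes_per_client = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # blank or truncated line
        client, code, size = fields[2], fields[3], fields[4]
        action = code.split("/", 1)[0]
        if action.startswith("UDP_"):
            continue  # ICP queries from neighbour caches, not client requests
        try:
            nbytes = int(size)
        except ValueError:
            continue  # unexpected format; skip rather than guess
        requests += 1
        total_bytes += nbytes
        bytes_per_client[client] += nbytes
        if "HIT" in action:  # simplification: any *HIT* action, e.g. TCP_HIT
            hits += 1
            hit_bytes += nbytes
    return {
        "requests": requests,
        "hit_ratio": hits / requests if requests else 0.0,
        "byte_hit_ratio": hit_bytes / total_bytes if total_bytes else 0.0,
        "top_clients": bytes_per_client.most_common(10),
    }
```

Usage: `with open("access.log.0") as log: print(summarize(log))`. Because it reads the file line by line, it runs in constant memory regardless of log size; adding other breakdowns (per hour, per content type) is a matter of adding another Counter.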

You should run the analysis scripts on a different machine from your cache server, as Squid seems to get very touchy if its disk throughput is slow. It's probably best to set up a script that copies the logs to another machine in the middle of the night (using something like scp cache:/usr/local/squid/access.log.0 .). Don't use rcp, since it has no security.
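A nightly copy job along those lines might look like the following Python sketch, run on the analysis machine. The host name cache and the path come from the example above; the function names are hypothetical, and it assumes key-based (passwordless) ssh login to the cache host so that it can run unattended.

```python
import subprocess


def copy_command(host: str, remote_path: str, dest: str = ".") -> list[str]:
    """Build the scp command line that pulls one rotated log off the cache host."""
    return ["scp", f"{host}:{remote_path}", dest]


def copy_log(host: str, remote_path: str, dest: str = ".") -> None:
    """Copy the log; check=True raises CalledProcessError if scp fails."""
    subprocess.run(copy_command(host, remote_path, dest), check=True)
```

Usage: call `copy_log("cache", "/usr/local/squid/access.log.0")` from a small script scheduled with cron. Passing the command as a list avoids any shell quoting issues, and letting a failed copy raise an error means a broken nightly run doesn't go unnoticed.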

Note also that some of the utilities above report 'the average time to complete a request'. This figure is somewhat misleading. When Squid sends data to a client, it normally puts the data into a kernel buffer, and the kernel then handles transmitting it to the client. In most cases this buffer is larger than the object being sent, so an object that comes from the disk cache will appear to take a very short time to send: once the data is in a kernel buffer, Squid has no idea how long the kernel takes to deliver it.

If you want to know how loaded your cache is, make a request through the cache, from a completely unloaded machine, for a page that is 'close' to the cache network-wise (such as your local web server), and then make the same request to that server directly. The difference in latency shows whether the cache is slowing the connections down. See the performance section for more details.
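The comparison described above can be scripted. This Python sketch times the same request directly and through the cache and reports the difference. It assumes Squid accepts HTTP proxy requests on a known port (3128 is the usual default; check your http_port setting); the function names and the example URL and host are hypothetical. Keep in mind that a repeated request may be served as a cache hit, which is part of what you are measuring.

```python
import statistics
import time
import urllib.request


def time_fetch(url, proxy=None, repeats=5):
    """Median wall-clock seconds to fetch `url`, optionally through an HTTP proxy."""
    # An empty mapping disables any proxy picked up from the environment,
    # so the "direct" measurement really is direct.
    handler = urllib.request.ProxyHandler({"http": proxy} if proxy else {})
    opener = urllib.request.build_opener(handler)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        with opener.open(url, timeout=30) as response:
            response.read()  # include the transfer of the whole body
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def cache_overhead(url, proxy, repeats=5):
    """Extra latency (seconds) added by going through the cache; may be negative."""
    return time_fetch(url, proxy, repeats) - time_fetch(url, None, repeats)
```

Usage (hypothetical names): `cache_overhead("http://www.example.local/", "http://cache:3128")`. The median is used rather than the mean so that a single slow request does not dominate the result.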


The Squid Users Guide is copyright Oskar Pearson (oskar@is.co.za). This page is copyright Mark Visser and Oskar Pearson.

If you like the layout (I do), I can only thank William Mee, and hope he forgives me for stealing it. This section was almost entirely contributed by Mark Visser (mark@cal026031.student.utwente.nl). Thanks to Mark!