NLANR Hierarchical Caching System Usage Statistics


We collect a lot of statistics on our caches. We generate daily text reports for each individual cache, and one report for all caches combined. These reports tell us, for example, which clients and which servers are the most popular. For each of these, we show the hit rates for ICP and HTTP by access count and by traffic volume. The reports also break down the day's requests by type of object and by Top Level Domain. The reports are generated from summary data files, and the summaries are generated from the original log files. We cannot make the original log files available; however, we do make sanitized log files available.
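The difference between a hit rate "by access count" and one "by traffic volume" can be illustrated with a minimal sketch. The `Request` record and its fields are hypothetical, chosen for illustration only; they are not the layout of our summary files or of any real log format.

```python
from typing import Iterable, NamedTuple, Tuple


class Request(NamedTuple):
    """One summarized cache request (hypothetical record layout)."""
    is_hit: bool      # True if the cache served the object itself
    size_bytes: int   # bytes delivered to the client


def hit_rates(requests: Iterable[Request]) -> Tuple[float, float]:
    """Return (hit rate by request count, hit rate by byte volume)."""
    total_count = hit_count = 0
    total_bytes = hit_bytes = 0
    for req in requests:
        total_count += 1
        total_bytes += req.size_bytes
        if req.is_hit:
            hit_count += 1
            hit_bytes += req.size_bytes
    # Guard against empty input so an idle day does not divide by zero.
    count_rate = hit_count / total_count if total_count else 0.0
    byte_rate = hit_bytes / total_bytes if total_bytes else 0.0
    return count_rate, byte_rate


# Made-up sample: two small hits and two large misses.
sample = [
    Request(True, 1000),
    Request(True, 1000),
    Request(False, 8000),
    Request(False, 10000),
]
count_rate, byte_rate = hit_rates(sample)
print(f"by count: {count_rate:.0%}, by volume: {byte_rate:.0%}")
```

In this made-up sample half of the requests are hits, but they account for only a tenth of the bytes, which is why the two rates are reported separately.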

Our long-term graphs show the trend in cache activity since the start of the project. For each cache we show the number of HTTP requests, the number of ICP requests, and the volume of HTTP requests served to clients. The ICP graphs stop at around 475 days because we had to stop logging ICP requests to keep the log files to a manageable size. We also have one graph which shows the byte volume for all caches. Note how the most popular cache changes from time to time.

Every day we check the caches' vital statistics to make sure they are running efficiently. The vitals graphs plot important values such as the number of requests, disk usage, memory usage, CPU utilization, file descriptor usage, and page faults. With the CGI script it is possible to examine any time period. Pages exist to show the vitals for the past day, week, and month.

We look at the hierarchy stats to understand how the cache mesh/hierarchy is behaving. For every cache request, we categorize it in one of four ways, either a:

In addition to plotting those categories, we also plot the percentage of requests that experienced an ICP timeout. Ideally, the percentage of timeouts will be very close to zero.

We are also interested in how the composition of web traffic might be changing over time. For example, is the trend to have more or fewer images? Are HTML files getting larger or smaller? In the long-term per-type graphs we plot the number and byte volume values for the different object types. They are also shown in normalized plots. The final graph shows the average object size for each type. Note that around October 1996 we changed the script which determines the object type (basically by adding more types). There are some interesting variations in the data which we have not yet fully analyzed.

Our service times graph shows how the median request service time changes from day to day for each of the caches.

Similarly, our throughput graph shows the median HTTP request throughput from day to day for each of the caches.
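A per-cache, per-day median of this kind could be computed roughly as follows. This is only a sketch under assumed inputs: the cache names, dates, and the `(cache, day, value)` tuple layout are hypothetical and do not come from our scripts.

```python
from collections import defaultdict
from statistics import median


def daily_medians(samples):
    """Group (cache, day, value) samples and return {(cache, day): median}."""
    grouped = defaultdict(list)
    for cache, day, value in samples:
        grouped[(cache, day)].append(value)
    return {key: median(values) for key, values in grouped.items()}


# Hypothetical service times in milliseconds; one slow outlier on cache-a.
samples = [
    ("cache-a", "1998-10-27", 120),
    ("cache-a", "1998-10-27", 80),
    ("cache-a", "1998-10-27", 2500),
    ("cache-b", "1998-10-27", 200),
    ("cache-b", "1998-10-27", 300),
]
for (cache, day), value in sorted(daily_medians(samples).items()):
    print(cache, day, value)
```

For the made-up cache-a samples the median is 120 ms while the mean would be 900 ms, so a single slow request does not dominate the daily value.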

Our size-distribution graph shows how the median object size varies over time. The graph shows some anomalous data. These come from mostly idle caches which did not receive enough requests to provide a suitable population sample.

Our popularity graph is an initial attempt to quantify the nature of repeat accesses to the caches. It is based on similar work done for HTTP servers: Characterizing Reference Locality in the WWW.

Our Netdb graph shows how the median measured RTT varies over time. Squid uses ICMP to measure RTTs to origin servers and stores these in its `Network Probe Database.' We download the database and generate histograms of the RTT distributions to all of the origin servers. A cache with a lower median RTT presumably has better connectivity to the set of origin servers.
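The histogram and median step might look something like the sketch below. It assumes the RTT values have already been extracted into a plain list of milliseconds; the bucket width and the sample values are made up, and nothing here reads the actual database format.

```python
from collections import Counter
from statistics import median


def rtt_histogram(rtts_ms, bucket_ms=50):
    """Count RTT samples per fixed-width bucket, keyed by bucket lower bound."""
    counts = Counter((int(rtt) // bucket_ms) * bucket_ms for rtt in rtts_ms)
    return dict(sorted(counts.items()))


# Hypothetical RTTs (ms) from one cache to six origin servers.
rtts = [12, 48, 75, 90, 140, 310]
for lower, count in rtt_histogram(rtts).items():
    print(f"{lower:4d}-{lower + 49:4d} ms: {'#' * count}")
print("median RTT (ms):", median(rtts))
```

Comparing the median of such a list across caches gives a single number per cache per day, which is what a graph over time needs.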

We have a simple cachability experiment to see what the differences are between cachable and uncachable objects. This experiment might show us by what factor origin server access logs underestimate the true number of requests for their objects because some are served from caches.

We have some statistics-generating scripts for Squid and other software.


$Id: index.html,v 1.8 1998/10/28 17:33:57 wessels Exp $