
I wrote a bash script for my website that anonymises the visitor IPs in the Awstats logs by replacing the last octet with 0. It can either process all logfiles except the current month's, or only the previous month's. I used the latter mode in a cron job that runs on the first day of each month.
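
The core of the anonymisation step can be sketched in a few lines. This is Python rather than the original bash, and it is only a minimal sketch under two assumptions: only IPv4 addresses are rewritten (IPv6 passes through unchanged), and each data file can be rewritten in place via a temporary file. The function names `anonymise_line` and `anonymise_file` are hypothetical.

```python
import os
import re
import shutil
import tempfile

# Captures the first three octets of a dotted-quad IPv4 address.
# Assumption: only IPv4 is anonymised; IPv6 addresses are left untouched.
IPV4 = re.compile(r"\b(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}\b")


def anonymise_line(line: str) -> str:
    """Replace the last octet of every IPv4 address in `line` with 0."""
    return IPV4.sub(r"\1.0", line)


def anonymise_file(path: str) -> None:
    """Rewrite `path` in place.

    Output goes to a temp file in the same directory first, so a crash never
    leaves a half-written data file; newline="" preserves the line endings.
    """
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, encoding="utf-8", errors="surrogateescape", newline="") as src, \
         tempfile.NamedTemporaryFile("w", dir=directory, delete=False,
                                     encoding="utf-8", errors="surrogateescape",
                                     newline="") as dst:
        for line in src:
            dst.write(anonymise_line(line))
    shutil.copymode(path, dst.name)  # keep the original file permissions
    os.replace(dst.name, path)       # atomic swap on the same filesystem
```

Writing to a temporary file and swapping it in with `os.replace` means Awstats never sees a partially anonymised file, even if the job is interrupted.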

Everything worked flawlessly with test data, but on the server some visitor IPs were not anonymised, and all of them were from the last day of the previous month. The logfile's timestamp was indeed from the first of the current month, but not from 00:21, when my cron job runs; it had been modified around 14:30.

Then I realised that Awstats on the server appears to be configured to add new log entries in a batch once per day at 14:30, so when my cron job ran, the visitor data from between 14:30 and midnight on the last day of the month were not yet in the file!
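
The timing rule behind this can be made explicit: a month's data file is only complete once the first batch run after the month has ended, i.e. 14:30 on the 1st of the following month. A minimal Python sketch of that check, assuming the 14:30 daily batch observed on this particular server (`BATCH_TIME`, `first_of_next_month` and `month_is_complete` are hypothetical names, not part of Awstats):

```python
from datetime import date, datetime, time

# Assumption: the Awstats update runs once per day at 14:30 server time,
# as observed on this host; adjust for other configurations.
BATCH_TIME = time(14, 30)


def first_of_next_month(year: int, month: int) -> date:
    """The first day of the month after (year, month)."""
    if month == 12:
        return date(year + 1, 1, 1)
    return date(year, month + 1, 1)


def month_is_complete(year: int, month: int, when: datetime) -> bool:
    """True if `when` is at or after the first batch run following the month.

    Entries for the month's last day are only appended by the batch run on
    the 1st of the next month, so processing earlier misses them.
    """
    cutoff = datetime.combine(first_of_next_month(year, month), BATCH_TIME)
    return when >= cutoff
```

For a run at 00:21 on 1 February, `month_is_complete(2024, 1, ...)` is false, which is exactly the gap described above. Passing the file's own modification time, e.g. `datetime.fromtimestamp(os.path.getmtime(path))`, instead of the current time would check that the file itself has received the final batch.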

Solution: batch-process all previous logfiles once to clean them up, and reschedule the cron job to run at 00:21 on the 2nd of each month.
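
The two modes (clean up everything except the current month once, then only the previous month from cron) amount to a simple file selection. A minimal Python sketch, assuming the monthly data files follow the usual Awstats `awstatsMMYYYY.<config>.txt` naming; `example.org` in the usage is a placeholder config name, and `previous_month`/`select_files` are hypothetical helpers:

```python
import re
from datetime import date
from typing import Iterable

# Assumption: monthly data files are named like "awstats012024.example.org.txt".
NAME = re.compile(r"^awstats(\d{2})(\d{4})\.(.+)\.txt$")


def previous_month(today: date) -> tuple[int, int]:
    """(year, month) of the month before `today`."""
    if today.month == 1:
        return today.year - 1, 12
    return today.year, today.month - 1


def select_files(names: Iterable[str], today: date, mode: str) -> list[str]:
    """Pick the data files to anonymise.

    mode "all":      every monthly file except the current month's (one-off cleanup)
    mode "previous": only the previous month's file (the cron-job mode)
    """
    current = (today.year, today.month)
    prev = previous_month(today)
    chosen = []
    for name in names:
        m = NAME.match(name)
        if not m:
            continue  # not a monthly data file, skip it
        key = (int(m.group(2)), int(m.group(1)))  # (year, month)
        if mode == "all" and key != current:
            chosen.append(name)
        elif mode == "previous" and key == prev:
            chosen.append(name)
    return sorted(chosen)
```

Run on the 2nd, "previous" mode picks last month's file after the 1st's 14:30 batch has completed it. In crontab syntax that schedule is `21 0 2 * *` (minute 21, hour 0, day of month 2, every month).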

Comments
  • Why even capture stats for your site, which I assume is a static portfolio/blog?
  • @RememberMe That's correct, and it's because I want stats: browsers, OS, popular files, 404s. I think the IPs are the basis for deriving country rankings.