ライン

ポイント:欲しい機能がほぼ用意されたアクセスログ解析ツール

ライン

 はじめに

japanese/webalizer

 ウェブへのアクセスログ解析を行うツールで最も有名で使われているものの1つです。

 FreeBSD 9.1での例です。

 導入と設定

導入

  導入を開始します。

# cd /usr/ports/japanese/webalizer
# make config
# make install clean

欲しい物をチェックして導入を終えます。

# cd /usr/local/etc 
# cp ja-webalizer.conf-dist ja-webalizer.conf

として、設定ファイルを準備します。
 実際には、バーチャルドメインごとに設定ファイルを準備して処理を実行しています。

設定

 上記で作成した、設定ファイルを修正して保存します。

#
# Sample Webalizer configuration file
# Copyright 1997-2000 by Bradford L. Barrett (brad@mrunix.net)
#
# Distributed under the GNU General Public License.  See the
# files "Copyright" and "COPYING" provided with the webalizer
# distribution for additional information.
#
# This is a sample configuration file for the Webalizer (ver 2.01)
# Lines starting with pound signs '#' are comment lines and are
# ignored.  Blank lines are skipped as well.  Other lines are considered
# as configuration lines, and have the form "ConfigOption  Value" where
# ConfigOption is a valid configuration keyword, and Value is the value
# to assign that configuration option.  Invalid keyword/values are
# ignored, with appropriate warnings being displayed.  There must be
# at least one space or tab between the keyword and its value.
#
# As of version 0.98, The Webalizer will look for a 'default' configuration
# file named "webalizer.conf" in the current directory, and if not found
# there, will look for "/etc/webalizer.conf".


# LogFile defines the web server log file to use.  If not specified
# here or on on the command line, input will default to STDIN.  If
# the log filename ends in '.gz' (ie: a gzip compressed file), it will
# be decompressed on the fly as it is being read.

#LogFileの絶対パスで記述。
#ここには http-access.log.gz のような形式でも可能。
#コマンド入力時にログファイルを指定するのであれば、ここで指定しなくとも可。

#LogFile        /var/lib/httpd/logs/access_log
LogFile         /home/foo/log/http-access-output.log

# LogType defines the log type being processed.  Normally, the Webalizer
# expects a CLF or Combined web server log as input.  Using this option,
# you can process ftp logs as well (xferlog as produced by wu-ftp and
# others), or Squid native logs.  Values can be 'clf', 'ftp' or 'squid',
# with 'clf' the default.

#ログ形式を指定。apacheならCLFで良い。
#デフォルトであり、未指定のままで可。
#-Fで指定可能
#LogType        clf

# OutputDir is where you want to put the output files.  This should
# should be a full path name, however relative ones might work as well.
# If no output directory is specified, the current directory will be used.

#出力されるディレクトリ先を指定。
#指定しなければ実行したディレクトリにファイルは作成される。
#-oで指定可能
#OutputDir      /var/lib/httpd/htdocs/usage
OutputDir       /home/foo/log/

# HistoryName allows you to specify the name of the history file produced
# by the Webalizer.  The history file keeps the data for up to 12 months
# worth of logs, used for generating the main HTML page (index.html).
# The default is a file named "webalizer.hist", stored in the specified
# output directory.  If you specify just the filename (without a path),
# it will be kept in the specified output directory.  Otherwise, the path
# is relative to the output directory, unless absolute (leading /).

#12ヶ月分の集計データが書かれるファイル。OutputDirにできる。
#指定しなければ実行したディレクトリにファイルは作成される。
#HistoryName    webalizer.hist
HistoryName     /home/foo/log/webalizer.hist

# Incremental processing allows multiple partial log files to be used
# instead of one huge one.  Useful for large sites that have to rotate
# their log files more than once a month.  The Webalizer will save its
# internal state before exiting, and restore it the next time run, in
# order to continue processing where it left off.  This mode also causes
# The Webalizer to scan for and ignore duplicate records (records already
# processed by a previous run).  See the README file for additional
# information.  The value may be 'yes' or 'no', with a default of 'no'.
# The file 'webalizer.current' is used to store the current state data,
# and is located in the output directory of the program (unless changed
# with the IncrementalName option below).  Please read at least the section
# on Incremental processing in the README file before you enable this option.

#ログが何かのきっかけで分割するようなケースを扱う場合には yes を指定。
#-p指定で可能
#Incremental    no
Incremental     yes

# IncrementalName allows you to specify the filename for saving the
# incremental data in.  It is similar to the HistoryName option where the
# name is relative to the specified output directory, unless an absolute
# filename is specified.  The default is a file named "webalizer.current"
# kept in the normal output directory.  If you don't specify "Incremental"
# as 'yes' then this option has no meaning.

#IncrementalName        webalizer.current
IncrementalName         /home/foo/log/webalizer.current

# ReportTitle is the text to display as the title.  The hostname
# (unless blank) is appended to the end of this string (seperated with
# a space) to generate the final full title string.
# Default is (for english) "Usage Statistics for".

#「利用統計」となっている部分を他の言葉に置き換えたいときには指定
#コメント(#)をはずすと日本語の利用統計という文字は出なくなる
#ReportTitle    Usage Statistics for

# HostName defines the hostname for the report.  This is used in
# the title, and is prepended to the URL table items.  This allows
# clicking on URL's in the report to go to the proper location in
# the event you are running the report on a 'virtual' web server,
# or for a server different than the one the report resides on.
# If not specified here, or on the command line, webalizer will
# try to get the hostname via a uname system call.  If that fails,
# it will default to "localhost".

#タイトルで使用される
#ホスト名を記述 www.example.ne.jp のように記述すれば良い。
#-n で直接指定も可能
#HostName       localhost
HostName        example.ne.jp

# HTMLExtension allows you to specify the filename extension to use
# for generated HTML pages.  Normally, this defaults to "html", but
# can be changed for sites who need it (like for PHP embeded pages).

#出力させるHTMLの拡張子を決定。
#-x で指定可能
#HTMLExtension  html

# PageType lets you tell the Webalizer what types of URL's you
# consider a 'page'.  Most people consider html and cgi documents
# as pages, while not images and audio files.  If no types are
# specified, defaults will be used ('htm*', 'cgi' and HTMLExtension
# if different for web logs, 'txt' for ftp logs).

#-P で指定可能
PageType        htm*
PageType        cgi
PageType        shtml
#PageType       phtml
#PageType       php3
#PageType       php
#PageType       pl
PageType        rb

# UseHTTPS should be used if the analysis is being run on a
# secure server, and links to urls should use 'https://' instead
# of the default 'http://'.  If you need this, set it to 'yes'.
# Default is 'no'.  This only changes the behaviour of the 'Top
# URL's' table.

#webalizerの表示をhttps://として解釈する場合にはyesを指定
#UseHTTPS       no

# DNSCache specifies the DNS cache filename to use for reverse DNS lookups.
# This file must be specified if you wish to perform name lookups on any IP
# addresses found in the log file.  If an absolute path is not given as
# part of the filename (ie: starts with a leading '/'), then the name is
# relative to the default output directory.  See the DNS.README file for
# additional information.

# -D で直接指定可能
#DNSCache       dns_cache.db
DNSCache        /home/foo/log/dns_cache.db

# DNSChildren allows you to specify how many "children" processes are
# run to perform DNS lookups to create or update the DNS cache file.
# If a number is specified, the DNS cache file will be created/updated
# each time the Webalizer is run, immediately prior to normal processing,
# by running the specified number of "children" processes to perform
# DNS lookups.  If used, the DNS cache filename MUST be specified as
# well.  The default value is zero (0), which disables DNS cache file
# creation/updates at run time.  The number of children processes to
# run may be anywhere from 1 to 100, however a large number may effect
# normal system operations.  Reasonable values should be between 5 and
# 20.  See the DNS.README file for additional information.

# -N で直接指定も可能
#DNSChildren    0
DNSChildren     30

# HTMLPre defines HTML code to insert at the very beginning of the
# file.  Default is the DOCTYPE line shown below.  Max line length
# is 80 characters, so use multiple HTMLPre lines if you need more.

#HTMLPre <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

# HTMLHead defines HTML code to insert within the <HEAD></HEAD>
# block, immediately after the <TITLE> line.  Maximum line length
# is 80 characters, so use multiple lines if needed.

#<HEAD>〜</HEAD>に記述したいものを記載 1行は80バイトまでなれど、
#複数行の指定は可能。
#HTMLHead <META NAME="author" CONTENT="The Webalizer">
HTMLHead <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=x-euc-jp">

# HTMLBody defined the HTML code to be inserted, starting with the
# <BODY> tag.  If not specified, the default is shown below.  If
# used, you MUST include your own <BODY> tag as the first line.
# Maximum line length is 80 char, use multiple lines if needed.

#<BODY>の記述を書くことが可能。続きを書けば</BODY>に書くようなHTMLも記述可能
#これを書くともっとも最初の行から書ける
HTMLBody <BODY BGCOLOR="#E8E8E8" TEXT="#000000" LINK="#0000FF" VLINK="#FF0000">

# HTMLPost defines the HTML code to insert immediately before the
# first <HR> on the document, which is just after the title and
# "summary period"-"Generated on:" lines.  If anything, this should
# be used to clean up in case an image was inserted with HTMLBody.
# As with HTMLHead, you can define as many of these as you want and
# they will be inserted in the output stream in order of apperance.
# Max string size is 80 characters.  Use multiple lines if you need to.

HTMLPost        <BR CLEAR="all">

# HTMLTail defines the HTML code to insert at the bottom of each
# HTML document, usually to include a link back to your home
# page or insert a small graphic.  It is inserted as a table
# data element (ie: <TD> your code here </TD>) and is right
# alligned with the page.  Max string size is 80 characters.

#HTMLTail <IMG SRC="msfree.png" ALT="100% Micro$oft free!">

# HTMLEnd defines the HTML code to add at the very end of the
# generated files.  It defaults to what is shown below.  If
# used, you MUST specify the </BODY> and </HTML> closing tags
# as the last lines.  Max string length is 80 characters.

#HTMLEnd </BODY></HTML>

# The Quiet option suppresses output messages... Useful when run
# as a cron job to prevent bogus e-mails.  Values can be either
# "yes" or "no".  Default is "no".  Note: this does not suppress
# warnings and errors (which are printed to stderr).

#Webalizerの出力時メッセージ。デフォルトはno。
#-qで指定可能
#Quiet          no

# ReallyQuiet will supress all messages including errors and
# warnings.  Values can be 'yes' or 'no' with 'no' being the
# default.  If 'yes' is used here, it cannot be overriden from
# the command line, so use with caution.  A value of 'no' has
# no effect.

#エラーや警告を含めたエラーメッセージの出力。デフォルトはno。
#-Qで指定可能
#ReallyQuiet    no

# TimeMe allows you to force the display of timing information
# at the end of processing.  A value of 'yes' will force the
# timing information to be displayed.  A value of 'no' has no
# effect.

#処理終了後にタイミング情報を表示することができます。
#デフォルトはno
#TimeMe         no

# GMTTime allows reports to show GMT (UTC) time instead of local
# time.  Default is to display the time the report was generated
# in the timezone of the local machine, such as EDT or PST.  This
# keyword allows you to have times displayed in UTC instead.  Use
# only if you really have a good reason, since it will probably
# screw up the reporting periods by however many hours your local
# time zone is off of GMT.

#グリニッジ標準時間で出力する場合にはyesに。
#ローカル時間のままならnoのまま(デフォルト)
#GMTTime                no

# Debug prints additional information for error messages.  This
# will cause webalizer to dump bad records/fields instead of just
# telling you it found a bad one.   As usual, the value can be
# either "yes" or "no".  The default is "no".  It shouldn't be
# needed unless you start getting a lot of Warning or Error
# messages and want to see why.  (Note: warning and error messages
# are printed to stderr, not stdout like normal messages).

#デバックメッセージを付加したい場合にはyes。
#Debug          no

# FoldSeqErr forces the Webalizer to ignore sequence errors.
# This is useful for Netscape and other web servers that cache
# the writing of log records and do not guarentee that they
# will be in chronological order.  The use of the FoldSeqErr
# option will cause out of sequence log records to be treated
# as if they had the same time stamp as the last valid record.
# Default is to ignore out of sequence log records.

#シーケンスエラーを無視する場合にはno。
#-fで指定可能
#FoldSeqErr     no

# VisitTimeout allows you to set the default timeout for a visit
# (sometimes called a 'session').  The default is 30 minutes,
# which should be fine for most sites.
# Visits are determined by looking at the time of the current
# request, and the time of the last request from the site.  If
# the time difference is greater than the VisitTimeout value, it
# is considered a new visit, and visit totals are incremented.
# Value is the number of seconds to timeout (default=1800=30min)

#30分(1800秒)の間にアクセスがあった場合には継続した訪問として
#扱うというような設定。これ以上の間隔で訪問した場合には新規訪問
#として扱われる
#-mで指定可能
#VisitTimeout   1800

# IgnoreHist shouldn't be used in a config file, but it is here
# just because it might be usefull in certain situations.  If the
# history file is ignored, the main "index.html" file will only
# report on the current log files contents.  Usefull only when you
# want to reproduce the reports from scratch.  USE WITH CAUTION!
# Valid values are "yes" or "no".  Default is "no".

#hintファイルを使わないで新規で作成するような場合にはyesを指定。
#使い方には要注意
#-i で指定可能
#IgnoreHist     no

# Country Graph allows the usage by country graph to be disabled.
# Values can be 'yes' or 'no', default is 'yes'.

#国別表示を行う
#-Yで指定可能
#CountryGraph   yes

# DailyGraph and DailyStats allows the daily statistics graph
# and statistics table to be disabled (not displayed).  Values
# may be "yes" or "no". Default is "yes".

#日別グラフを行う
#日別状態表示を行う
#DailyGraph     yes
#DailyStats     yes

# HourlyGraph and HourlyStats allows the hourly statistics graph
# and statistics table to be disabled (not displayed).  Values
# may be "yes" or "no". Default is "yes".

#時間別
#-Gで指定可能
#-Hで指定可能
#HourlyGraph    yes
#HourlyStats    yes

# GraphLegend allows the color coded legends to be turned on or off
# in the graphs.  The default is for them to be displayed.  This only
# toggles the color coded legends, the other legends are not changed.
# If you think they are hideous and ugly, say 'no' here :)

#-Lで指定可能
#GraphLegend    yes

# GraphLines allows you to have index lines drawn behind the graphs.
# I personally am not crazy about them, but a lot of people requested
# them and they weren't a big deal to add.  The number represents the
# number of lines you want displayed.  Default is 2, you can disable
# the lines by using a value of zero ('0').  [max is 20]
# Note, due to rounding errors, some values don't work quite right.
# The lower the better, with 1,2,3,4,6 and 10 producing nice results.

#各グラフの中にガイドとして入れられている横線を何本入れるかを指定。
#2-6本がお勧め。
GraphLines      6

# The "Top" options below define the number of entries for each table.
# Defaults are Sites=30, URL's=30, Referrers=30 and Agents=15, and
# Countries=30. TopKSites and TopKURLs (by KByte tables) both default
# to 10, as do the top entry/exit tables (TopEntry/TopExit).  The top
# search strings and usernames default to 20.  Tables may be disabled
# by using zero (0) for the value.

#-Sで指定可能
TopSites        20
TopKSites       20
#-Uで指定可能
#TopURLs         30
#TopKURLs        10
#-Rで指定可能
TopReferrers    5
#-Aで指定可能
#TopAgents       15
#-Cで指定可能
TopCountries    10
#-eで指定可能
TopEntry        15
#-Eで指定可能
TopExit         15
TopSearch       50
#TopUsers        20

# The All* keywords allow the display of all URL's, Sites, Referrers
# User Agents, Search Strings and Usernames.  If enabled, a seperate
# HTML page will be created, and a link will be added to the bottom
# of the appropriate "Top" table.  There are a couple of conditions
# for this to occur..  First, there must be more items than will fit
# in the "Top" table (otherwise it would just be duplicating what is
# already displayed).  Second, the listing will only show those items
# that are normally visable, which means it will not show any hidden
# items.  Grouped entries will be listed first, followed by individual
# items.  The value for these keywords can be either 'yes' or 'no',
# with the default being 'no'.  Please be aware that these pages can
# be quite large in size, particularly the sites page,  and seperate
# pages are generated for each month, which can consume quite a lot
# of disk space depending on the traffic to your site.

#View All Search Stringsというリンクを各グラフの直後にリンクとして
#作成し、別ページで詳細を眺めることができるようにできる。
#デフォルトは作成しない。yesにすればつく
AllSites        yes
AllURLs         yes
AllReferrers    yes
AllAgents       yes
AllSearchStr    yes
#AllUsers       no

# The Webalizer normally strips the string 'index.' off the end of
# URL's in order to consolidate URL totals.  For example, the URL
# /somedir/index.html is turned into /somedir/ which is really the
# same URL.  This option allows you to specify additional strings
# to treat in the same way.  You don't need to specify 'index.' as
# it is always scanned for by The Webalizer, this option is just to
# specify _additional_ strings if needed.  If you don't need any,
# don't specify any as each string will be scanned for in EVERY
# log record... A bunch of them will degrade performance.  Also,
# the string is scanned for anywhere in the URL, so a string of
# 'home' would turn the URL /somedir/homepages/brad/home.html into
# just /somedir/ which is probably not what was intended.

#-Iで指定可能
#IndexAlias     home.htm
#IndexAlias     homepage.htm

# The Hide*, Group* and Ignore* and Include* keywords allow you to
# change the way Sites, URL's, Referrers, User Agents and Usernames
# are manipulated.  The Ignore* keywords will cause The Webalizer to
# completely ignore records as if they didn't exist (and thus not
# counted in the main site totals).  The Hide* keywords will prevent
# things from being displayed in the 'Top' tables, but will still be
# counted in the main totals.  The Group* keywords allow grouping
# similar objects as if they were one.  Grouped records are displayed
# in the 'Top' tables and can optionally be displayed in BOLD and/or
# shaded. Groups cannot be hidden, and are not counted in the main
# totals. The Group* options do not, by default, hide all the items
# that it matches.  If you want to hide the records that match (so just
# the grouping record is displayed), follow with an identical Hide*
# keyword with the same value.  (see example below)  In addition,
# Group* keywords may have an optional label which will be displayed
# instead of the keywords value.  The label should be seperated from
# the value by at least one 'white-space' character, such as a space
# or tab.
#
# The value can have either a leading or trailing '*' wildcard
# character.  If no wildcard is found, a match can occur anywhere
# in the string. Given a string "www.yourmama.com", the values "your",
# "*mama.com" and "www.your*" will all match.

#-sで指定可能
# Your own site should be hidden
#HideSite       *mrunix.net
#HideSite       localhost

#-rで指定可能
# Your own site gives most referrals
#HideReferrer   mrunix.net/
HideReferrer    example.ne.jp

# This one hides non-referrers ("-" Direct requests)
#HideReferrer   Direct Request

#-uで指定可能
# Usually you want to hide these
HideURL         *.gif
HideURL         *.GIF
HideURL         *.jpg
HideURL         *.JPG
HideURL         *.png
HideURL         *.PNG
HideURL         *.ra
HideURL         *Count.cgi
HideURL         *.css
HideURL         *.CSS
HideURL         *.ico

#-aで指定可能
# Hiding agents is kind of futile
#HideAgent      RealPlayer

#ウェブでの認証ユーザを記録させたくない場合には、ここにアカウントIDを記述しておく
# You can also hide based on authenticated username
#HideUser       root
#HideUser       admin

# Grouping options
#GroupURL       /cgi-bin/*      CGI Scripts
#GroupURL       /images/*       Images

#GroupSite      *.aol.com
#GroupSite      *.compuserve.com

#GroupReferrer  yahoo.com/      Yahoo!
#GroupReferrer  excite.com/     Excite
#GroupReferrer  infoseek.com/   InfoSeek
#GroupReferrer  webcrawler.com/ WebCrawler
#GroupReferrer  yahoo.co.jp/    Yahoo!Japan
#GroupReferrer  google.co.jp/   GoogleJapan
#GroupReferrer  infoseek.co.jp/ InfoSeekJapan
#GroupReferrer  goo.ne.jp/      Goo
#GroupReferrer  msn.co.jp/      MSNJapan

GroupReferrer   yahoo.co.jp/    Yahoo!
HideReferrer    yahoo.co.jp/
GroupReferrer   google          Google
HideReferrer    google
GroupReferrer   goo.ne.jp       goo
HideReferrer    goo.ne.jp
GroupReferrer   search.msn      MSN
HideReferrer    search.msn
GroupReferrer   lycos.co.jp     Lycos Japan
HideReferrer    lycos.co.jp
GroupReferrer   infoseek.co.jp  infoseek.co.jp
HideReferrer    infoseek.co.jp
GroupReferrer   example.ne.jp   example.ne.jp


#GroupUser      root            Admin users
#GroupUser      admin           Admin users
#GroupUser      wheel           Admin users

# The following is a great way to get an overall total
# for browsers, and not display all the detail records.
# (You should use MangleAgent to refine further...)

#GroupAgent     MSIE            Micro$oft Internet Exploder
#HideAgent      MSIE
#GroupAgent     Mozilla         Netscape
#HideAgent      Mozilla
#GroupAgent     Lynx*           Lynx
#HideAgent      Lynx*
GroupAgent      Opera/8         Browser: Opera v8.xx
HideAgent       Opera/8
GroupAgent      Opera/          Browser: Opera/ (unknown version)
HideAgent       Opera/
GroupAgent      Opera           Browser: Opera (unknown version)
HideAgent       Opera
GroupAgent      Firefox/1.0.4   Browser: Firefox 1.0.4
HideAgent       Firefox/1.0.4
GroupAgent      Firefox/        Browser: UNKNOWN Firefox (unknown version, not hidden)

# HideAllSites allows forcing individual sites to be hidden in the
# report.  This is particularly useful when used in conjunction
# with the "GroupDomain" feature, but could be useful in other
# situations as well, such as when you only want to display grouped
# sites (with the GroupSite keywords...).  The value for this
# keyword can be either 'yes' or 'no', with 'no' the default,
# allowing individual sites to be displayed.

#-Xで指定可能
#HideAllSites   no

# The GroupDomains keyword allows you to group individual hostnames
# into their respective domains.  The value specifies the level of
# grouping to perform, and can be thought of as 'the number of dots'
# that will be displayed.  For example, if a visiting host is named
# cust1.tnt.mia.uu.net, a domain grouping of 1 will result in just
# "uu.net" being displayed, while a 2 will result in "mia.uu.net".
# The default value of zero disable this feature.  Domains will only
# be grouped if they do not match any existing "GroupSite" records,
# which allows overriding this feature with your own if desired.

#-gで指定可能
#ピリオドの数でドメインを切り分けするexample.ne.jpは2つ
GroupDomains    2

# The GroupShading allows grouped rows to be shaded in the report.
# Useful if you have lots of groups and individual records that
# intermingle in the report, and you want to diferentiate the group
# records a little more.  Value can be 'yes' or 'no', with 'yes'
# being the default.

#GroupShading   yes

# GroupHighlight allows the group record to be displayed in BOLD.
# Can be either 'yes' or 'no' with the default 'yes'.

#GroupHighlight yes

# The Ignore* keywords allow you to completely ignore log records based
# on hostname, URL, user agent, referrer or username.  I hessitated in
# adding these, since the Webalizer was designed to generate _accurate_
# statistics about a web servers performance.  By choosing to ignore
# records, the accuracy of reports become skewed, negating why I wrote
# this program in the first place.  However, due to popular demand, here
# they are.  Use the same as the Hide* keywords, where the value can have
# a leading or trailing wildcard '*'.  Use at your own risk ;)

#IgnoreSite     bad.site.net
#IgnoreURL      /test*
IgnoreURL       /log*
#IgnoreReferrer file:/*
#IgnoreAgent    RealPlayer
#IgnoreUser     root

# The Include* keywords allow you to force the inclusion of log records
# based on hostname, URL, user agent, referrer or username.  They take
# precidence over the Ignore* keywords.  Note: Using Ignore/Include
# combinations to selectivly process parts of a web site is _extremely
# inefficent_!!! Avoid doing so if possible (ie: grep the records to a
# seperate file if you really want that kind of report).

# Example: Only show stats on Joe User's pages...
#IgnoreURL      *
#IncludeURL     ~joeuser*

# Or based on an authenticated username
#IgnoreUser     *
#IncludeUser    someuser

# The MangleAgents allows you to specify how much, if any, The Webalizer
# should mangle user agent names.  This allows several levels of detail
# to be produced when reporting user agent statistics.  There are six
# levels that can be specified, which define different levels of detail
# supression.  Level 5 shows only the browser name (MSIE or Mozilla)
# and the major version number.  Level 4 adds the minor version number
# (single decimal place).  Level 3 displays the minor version to two
# decimal places.  Level 2 will add any sub-level designation (such
# as Mozilla/3.01Gold or MSIE 3.0b).  Level 1 will attempt to also add
# the system type if it is specified.  The default Level 0 displays the
# full user agent field without modification and produces the greatest
# amount of detail.  User agent names that can't be mangled will be
# left unmodified.

#ユーザエージェントの名称で6つのレベルを設定することができます
#0(デフォルト:そのままを表示)        1 レベル2にシステムタイプを加えたもの
#2 ブラウザ名と(X.xx) a/b/c           3 ブラウザ名とメジャー.マイナー(X.xx)
#4 ブラウザ名とメジャー.マイナー(X.x) 5 ブラウザ名とメジャーバージョン(X)
#-Mで指定可能
MangleAgents    3

# The SearchEngine keywords allow specification of search engines and
# their query strings on the URL.  These are used to locate and report
# what search strings are used to find your site.  The first word is
# a substring to match in the referrer field that identifies the search
# engine, and the second is the URL variable used by that search engine
# to define it's search terms.

SearchEngine    yahoo.com       p=
SearchEngine    altavista.com   q=
SearchEngine    google.com      q=
SearchEngine    eureka.com      q=
SearchEngine    lycos.com       query=
SearchEngine    hotbot.com      MT=
SearchEngine    msn.com         MT=
SearchEngine    infoseek.com    qt=
SearchEngine    webcrawler      searchText=
SearchEngine    excite          search=
SearchEngine    netscape.com    search=
SearchEngine    mamma.com       query=
SearchEngine    alltheweb.com   query=
SearchEngine    northernlight.com  qr=

SearchEngine    yahoo.co.jp     p=
SearchEngine    google.co.jp    q=
SearchEngine    infoseek.co.jp  qt=
SearchEngine    msn.co.jp       q=
# ocn
SearchEngine    goo.ne.jp       MT=
SearchEngine    biglobe.ne.jp   q=
SearchEngine    nifty.com       Text=
# so-net odn
SearchEngine    excite.co.jp    search=
SearchEngine    livedoor.com    q=
SearchEngine    jp.aol.com      query=
#SearchEngine   .google.        q=
#SearchEngine   bulkfeeds.net   q=

# The Dump* keywords allow the dumping of Sites, URL's, Referrers
# User Agents, Usernames and Search strings to seperate tab delimited
# text files, suitable for import into most database or spreadsheet
# programs.

# DumpPath specifies the path to dump the files.  If not specified,
# it will default to the current output directory.  Do not use a
# trailing slash ('/').

#DumpPath       /var/lib/httpd/logs

# The DumpHeader keyword specifies if a header record should be
# written to the file.  A header record is the first record of the
# file, and contains the labels for each field written.  Normally,
# files that are intended to be imported into a database system
# will not need a header record, while spreadsheets usually do.
# Value can be either 'yes' or 'no', with 'no' being the default.

#DumpHeader     no

# DumpExtension allow you to specify the dump filename extension
# to use.  The default is "tab", but some programs are pickey about
# the filenames they use, so you may change it here (for example,
# some people may prefer to use "csv").

#DumpExtension  tab

# These control the dumping of each individual table.  The value
# can be either 'yes' or 'no'.. the default is 'no'.

#DumpSites      no
#DumpURLs       no
#DumpReferrers  no
#DumpAgents     no
#DumpUsers      no
#DumpSearchStr  no

# If you compiled Webalizer with GeoIP library, it becomes enabled
# by default. But if you wish to disable it, just set GeoIP to 'no'.
# You may also want to specify database file path manually, if you
# don't have one installed on system (in case of static build).

#GeoIP          yes
#GeoIPDatabase  /usr/local/share/GeoIP/GeoIP.dat

# End of configuration file...  Have a nice day!

 定時にcronで処理を実行するように設定を行いました。
これなら、朝5時の時点で更新したページを自動生成させることができるでしょう。
 /etc/crontab に記述する方法の例を記述します。

# Webalizer
0       5       *       *       *       root    /bin/sh /usr/local/etc/webalizer0.sh > /dev/null 2>&1 /dev/null

 また、webalizer0.shには以下の例のように記述しておきます。

#!/bin/sh

echo '*** step1 ***'
cd /home/foo/example.ne.jp/log
/usr/bin/perl /home/foo/log/webalizer-cnv.pl /home/foo/log/httpd-acess.log > /home/foo/log/httpd-access-output.log
/usr/local/bin/ja-webalizer -c /usr/local/etc/ja-webalizer.example.ne.jp.conf -Q

 webalizer-cnv.pl は、「アクセスログ解析ソフトwebalizer日本語化」を参考に作成させていただきました。
そのままの指定でも良いのですが、複数の文字化けを回避したいために、私はドメインにより指定をしたりしていなかったりで変えています。

 webalizer-cnv.pl には、Jcode.pl を利用していますので、これを導入する必要があります。
/usr/ports/japanese/p5-Jcode と /usr/ports/japanese/jcode.pl を導入しておきます。

# cd /usr/ports/japanese/p5-Jcode
# make clean install clean
# cd /usr/ports/japanese/jcode.pl
# make clean install clean
# rehash

 これで、前提がクリアです。
webalizer-cnv.pl は、以下のように修正した形で利用しています。

#!/usr/bin/perl
use lib ('/usr/local/lib/perl5/site_perl/5.8.9/mach');
use Jcode;

open (IN ,$ARGV[0]);
while (<IN>){
        $_ =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        Jcode::convert(\$_,euc);
        print $_;
}
close (IN);
exit;

 生成されるページは、特定の人が見る程度のアクセス制限でいいでしょう。
私は、.htaccess で制限を入れることにしています。

        AuthName        "ID/Password ?"
        AuthType        Basic
        AuthUserFile    /home/foo/auth/example.ne.jp.htpasswd
        Require         valid-user*

 パスワードの指定方法などはあえてここでは記述していませんが、htaccessで生成しておくようにしましょう。

【改訂履歴】作成:2009/03/08 追記 2013/04/06
【参考リンク】
Home of The Webalizer … Webalizerのオフィシャルページ

Copyright © 1996,1997-2006,2007- by F.Kimura,