O'Reilly Databases

oreilly.comSafari Books Online.Conferences.

We've expanded our coverage and improved our search! Search for all things Database across O'Reilly!

Search Search Tips

advertisement
AddThis Social Bookmark Button

Listen Print Discuss Subscribe to Databases Subscribe to Newsletters

Writing Apache's Logs to MySQL
Pages: 1, 2, 3

Querying the Database

Queries can run on the database immediately. The examples below are in straight SQL syntax. Some use the nested query syntax available only in MySQL 4.1 or above.

If you intend to write automated scripts in Perl or Python or whatever, it may be easier to run multiple SELECT statements or to compute derived values inside the scripting language itself. Whenever you request reports on web server performance, you'll need to specify the date range you want to cover.

Query: total bytes in and out

You want to find out how much network traffic your web server sends and receives. Keep in mind that this reflects traffic levels only at the application layer of the TCP/IP stack. It won't record the size of TCP or IP packets. That's a question SNMP polling tools can answer better.

select sum(bytesin), sum(bytesout) from blackbox \
	where datetime = curdate();

select sum(bytesin),sum(bytesout) from blackbox where datetime >= 
        date_sub(curdate(),interval 7 day);

The first example gives the total bytes in and bytes out for the given day. The second example gives the total for the past seven days.

Query: what percentage of hits goes to RSS syndication pages compared with HTML pages?

This example checks the number of hits against a specific RSS page against every possible hit against the server.

select (select count(url) from blackbox where url regexp '\.rss$')/
        (select count(url) from blackbox where url regexp '\.html$');

RSS hits are typically automated. Because of this, a lot of sites have a higher percentage of pure robot/agent traffic on their site than they may have had two years ago. By checking the RSS hits, you can determine whether the agent traffic is overwhelming your site.

Query: how often do users skip the Flash animation?

If you see a web site with one of those Flash animations on the main page, it usually has a "Skip" link at the bottom. If you're going to include one, throw in a meaningless query string at the end of the link, so you can determine which hits are redirects after the animation has finished and which ones come from people clicking on the link.

select (select count(url) from blackbox where url="/home.html?skipped"),
        (select count(url) from blackbox where url="/home.html"),

Verifying that the database is working

Here's a two-part script you can run to verify that the database really is logging hits from the servers. First, add this script to the cgi-bin directory of every web site that uses mod_log_mysql.

#!/usr/local/bin/perl

# returnid.pl

use strict;
use warnings;

print("X-ServerId: $ENV{'SERVERID'}\n");
print("X-UniqueId: $END{'UNIQUE_ID'}\n");
print "Content-type: text/plain\n\n";

print("Server id is $ENV{'SERVERID'}\n");
print("Unique id is $ENV{'UNIQUE_ID'}\n");

exit 0;

The server will return its serverid and unique ID for that particular hit. The next program will act as an HTTP client. It will retrieve this URL, then connect to the database and search the logs for the unique ID.

#!/usr/local/bin/perl

# checkid.pl

#
# Pragmas
#

use strict;
use warnings;

#
# Includes
#

use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

use DBI;

#
# Global Variables
#

use vars qw/$Url $Agent $Db $ServerId $UniqueId /;

use constant {
  DSN         => "DBI:mysql:database=apachelog",
  DBUSER      => "logwriter",
  DBPASS      => "logpass",
  QUERYSTRING => 
   "select datetime,uniqueid from blackbox where uniqueid=? and serverid=?",
  DEFAULTURL  => "http://www.example.com:8080/cgi-bin/returnid.pl",
};

#
# Subroutines
#

sub getId
{
    my ($agent,$url) = @_;
    my $request      = HTTP::Request->new(GET => $url);
    my $response     = $agent->request($request);

    if ($response->is_success)
    {
        my $serverid=$response->header("X-ServerId");
        my $uniqueid=$response->header("X-UniqueId");
        print("Unique ID from server is $uniqueid\n");
        print("Server ID is $serverid\n");
        return ($serverid,$uniqueid);
    }

    return undef;
}

sub findId
{
    my ($db,$serverid,$uniqueid) = @_;

    my $query                    = $db->prepare(QUERYSTRING);

    $query->execute($uniqueid,$serverid);

    if (my ($dt,$uid)=$query->fetchrow_array())
    {
        print("Database recorded $uid at $dt\n");
    } else {
        print("Can't find a database record for unique-id $uniqueid\n");
    }

    return;
}

#
# Main Block
#

# Initialize the user agent
$Agent = LWP::UserAgent->new();

# Initialize the database client
$Db = DBI->connect(DSN,DBUSER,DBPASS);

# URL
$Url = shift || DEFAULTURL;

if (($ServerId,$UniqueId) = getId($Agent,$Url))
{
        findId($Db,$ServerId,$UniqueId);
} else {
        print("Unable to get data from webserver");
        exit 1;
}

If you run this program at a regular polling interval, it will warn you when the remote database is not responding or if the Blackbox table is not recording hits from the web servers.

Final Thoughts

If you've read the first article, you should already understand why you want to log your server performance data. The core concepts are still the same; I'm just introducing a few variations on improving the process.

The two new logging directives provide more flexibility with virtual hosting environments. It also allows having just one Blackbox log file for each running server.

If you want to take the really big step, consider the option of writing your logs straight to a database. The initial setup process may be complex, but after that there is a huge administrative benefit. It's an ideal solution for dealing with large server farms.

Chris Josephes works as a system administrator for Internet Broadcasting.


Return to the Apache DevCenter.


What other reports do you use?
You must be logged in to the O'Reilly Network to post a talkback.
Post Comment
Full Threads Oldest First

Showing messages 1 through 1 of 1.

  • Database logging modules very bad idea
    2005-02-21 22:14:31  Coyote-Moon [Reply | View]

    It is a very bad idea to use an Apache module that logs directly to the database. I know there are countless implementations, but most are toys that don't handle even the most trivial real world issues such as connection pooling and temporarily inaccessible databases. Screw it up and cycling your database (e.g., to apply a security upgrade) could cause your web server to hang. Oops.

    A far better design is to use "piped logs" that write logs messages to a separate process's stdin. This gives you clean separation between Apache and the database-enabled logger and there are ways to ensure the pipe is never blocked. (E.g., one thread copies stdin to an internal buffer, a second thread consumes that buffer to feed the database. This allows you to support bursts that far exceed your ability to write to the database.)

    Could these tricks be done within an Apache module? Probably, but you would have to get it working reliably within a far more complex environment. Why not simplify your life and use a separate process?

    P.S., it shouldn't be hard to normalize your data before writing it to the logs, it's not like much needs to be done. In my implementation the only thing I needed to handle is a separate IP table that has the canonical host name, RBL information, etc. It's trivial to support with a placeholder and separate thread every time a previously unknown IP address is seen.


Tagged Articles

Post to del.icio.us

This article has been tagged:

apache

Articles that share the tag apache:

Multiuser Subversion (38 tags)

Introducing LAMP Tuning Techniques (32 tags)

Apache Web-Serving with Mac OS X: Part 1 (26 tags)

Introducing mod_security (25 tags)

Location, Location, Location: Tips for Storing Web Site Files (22 tags)

View All

mysql

Articles that share the tag mysql:

MySQL FULLTEXT Searching (155 tags)

Live Backups of MySQL Using Replication (152 tags)

Advanced MySQL Replication Techniques (125 tags)

Ten MySQL Best Practices (59 tags)

Rolling with Ruby on Rails (56 tags)

View All

tips

Articles that share the tag tips:

Top Ten Digital Photography Tips (165 tags)

Top Ten Mac OS X Tips for Unix Geeks (163 tags)

Top Ten Data Crunching Tips and Tricks (30 tags)

Ten Essential Development Practices (26 tags)

View All

web

Articles that share the tag web:

What Is Web 2.0 (2258 tags)

Rolling with Ruby on Rails (686 tags)

Very Dynamic Web Interfaces (362 tags)

Ajax on Rails (183 tags)

A Simpler Ajax Path (136 tags)

View All

webdev

Articles that share the tag webdev:

Rolling with Ruby on Rails (351 tags)

Very Dynamic Web Interfaces (163 tags)

Understanding MVC in PHP (96 tags)

A Simpler Ajax Path (93 tags)

Ajax on Rails (61 tags)

View All

Sponsored Resources

  • Inside Lightroom

Related to this Article

Understanding Oracle Clinical Understanding Oracle Clinical
by Joan M. Johnson
May 2007
$9.99 USD

Inside SQLite Inside SQLite
by Sibsankar Haldar
April 2007
$9.99 USD

Advertisement
O'Reilly Media

©2009, O'Reilly Media, Inc.
(707) 827-7000 / (800) 998-9938
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
About O'Reilly
Academic Solutions
Authors
Contacts
Customer Service
Jobs
Newsletters
O'Reilly Labs
Press Room
Privacy Policy
RSS Feeds
Terms of Service
User Groups
Writing for O'Reilly
Content Archive
Business Technology
Computer Technology
Google
Microsoft
Mobile
Network
Operating System
Digital Photography
Programming
Software
Web
Web Design
More O'Reilly Sites
O'Reilly Radar
Ignite
Tools of Change for Publishing
Digital Media
Inside iPhone
O'Reilly FYI
makezine.com
craftzine.com
hackszine.com
perl.com
xml.com

Partner Sites
InsideRIA
java.net
O'Reilly Insights on Forbes.com