2012.12.18

Using pgbadger to monitor your PostgreSQL activity

Erwan Ben Souiden

It's no secret that at Antelink we manage really large databases to index projects, repositories, versions, content, vulnerabilities and more (we previously published some stats about our databases and Antepedia). And all of these databases are powered by PostgreSQL servers.

As you can imagine, one of the most important things is knowing how the databases are used. If we know what the slowest queries are, the most frequent queries, and so on, we can optimize the databases (by adding an index, changing a query or modifying a schema/structure) and thus improve performance! To do this, you have to analyze the PostgreSQL activity logs. Doing that by hand can be extremely complicated, so you should use a log analyzer to process the data. This kind of tool helps you create reports and monitor your database activity.

The rest of this blog post presents how, and why, I use a log analyzer to enhance our performance.

At Antelink, I have always done this job with pgFouine (which is written in PHP). It is pretty cool, but several weeks ago I discovered pgBadger, which is written in Perl... and is a lot cooler :-). pgFouine does the job, but I noted several limitations:

  • It is written in PHP... I have nothing against PHP, but to use this language you need to install some extra packages (and that is always a little boring). Moreover, PHP is less performant than other scripting languages, such as Perl, when processing big files. And if you have any doubt: PostgreSQL logs can easily become very big.
  • The latest version of pgFouine was released in February 2010, and it seems that it is no longer maintained.
  • pgFouine does not have very many options, while pgBadger has plenty. pgBadger gives you the freedom to do whatever you want, for example: filter by query type, by user or by database, set a start date/time for the data to be parsed (handy for zooming in), and much more! I will show an example further down.
  • Finally, I find pgBadger's reports better designed than pgFouine's! An added bonus is that pgBadger also gives you the option to generate graphs... and everything is better with graphs (like Bluetooth)!

Now, I use pgBadger every time I have a PostgreSQL instance!

Installing pgBadger is quite simple:

wget https://github.com/downloads/dalibo/pgbadger/pgbadger-2.2-1.tar.gz
tar -xvzf pgbadger-2.2-1.tar.gz
cd pgbadger-2.2-1/
perl Makefile.PL
make && sudo make install

pgBadger is now installed in /usr/local/bin along with its man pages.
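
To quickly check that the installation went well, you can ask pgbadger for its version or open the man page (the -V/--version flag exists in the pgBadger releases I know of; check pgbadger --help if yours differs):

pgbadger --version
man pgbadger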

Now you have to configure the log settings of your PostgreSQL server. But be careful! When you enable these logs, your server writes a lot of stuff, and that can decrease its performance. Enable them only to obtain a snapshot of your database activity, then disable them again. This is why I work with two PostgreSQL configuration files that are identical except for one thing: one has the log settings enabled, and the other does not.

So the "ERROR REPORTING AND LOGGING" section of the PostgreSQL configuration file with log settings enabled looks like to this:

log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d.log'
log_rotation_age = 0
client_min_messages = notice
log_min_messages = warning
log_min_error_statement = error
log_min_duration_statement = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_duration = off
log_error_verbosity = verbose
log_hostname = on
log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u '
log_lock_waits = on
log_statement = 'none'
log_temp_files = 0
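
For the record, with log_min_duration_statement = 0 and the log_line_prefix above, every query ends up in the log looking roughly like this (an illustrative line with made-up database, user and values):

2012-12-14 10:23:45 CET [2817]: [3-1] db=mydb,user=myuser LOG:  duration: 12.345 ms  statement: SELECT * FROM projects WHERE id = 42;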

And here is the one with the log settings disabled:

log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-only_error.log'
log_rotation_age = 0
log_rotation_size = 10MB
client_min_messages = error
log_min_messages = fatal
log_min_error_statement = error
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_error_verbosity = default
log_hostname = off
log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u '
log_statement = 'none'

Then to enable one configuration, I just create a symlink to the one I want and reload my PostgreSQL server, as you can see here:

# ls /etc/postgresql/9.1/main/postgresql* -alh
lrwxrwxrwx 1 root     root      30 Dec 17 12:46 /etc/postgresql/9.1/main/postgresql.conf -> postgresql-LOG_DISABLE.conf
-rw-r--r-- 1 postgres postgres 19K Dec  6  2011 /etc/postgresql/9.1/main/postgresql-LOG_ENABLE.conf
-rw-r--r-- 1 postgres postgres 19K Dec  6  2011 /etc/postgresql/9.1/main/postgresql-LOG_DISABLE.conf

# /etc/init.d/postgresql reload
Reloading PostgreSQL 9.1 database server: main.

Note: the configuration file with log settings "disabled" actually still writes a log, but only to record fatal or critical errors! That is obviously not enough to generate stats and reports.
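
If you switch back and forth a lot, you can wrap the symlink dance in a tiny script. Here is a minimal sketch (the script name is mine and the paths assume the Debian layout shown above, so adapt it to your setup):

#!/bin/bash
# pg-toggle-logs: point postgresql.conf at one of the two configurations
# usage: pg-toggle-logs LOG_ENABLE|LOG_DISABLE
CONF_DIR='/etc/postgresql/9.1/main'
case "$1" in
    LOG_ENABLE|LOG_DISABLE)
        # relative symlink, exactly like in the ls output above
        ln -sf "postgresql-$1.conf" "$CONF_DIR/postgresql.conf"
        /etc/init.d/postgresql reload
        ;;
    *)
        echo "usage: $0 LOG_ENABLE|LOG_DISABLE" >&2
        exit 1
        ;;
esac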

Once logs are enabled, you can create a pgBadger report by launching a command like this one:

pgbadger -f stderr -s 10 -T "PGBadger - test.mydomain.net - 2012-12-14" -o /path/to/pgbadger-test.mydomain.net-2012-12-14.html /path/to/postgresql-2012-12-14.log

Then just use your favorite browser to read the report (here /path/to/pgbadger-test.mydomain.net-2012-12-14.html)! That's all.
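
And remember all those filtering options: you can, for example, restrict a report to a single database, a single user or a given time window. Something like this (mydb and myuser are made up; check pgbadger --help for the exact flags of your version):

pgbadger -f stderr -d mydb -u myuser -b "2012-12-14 08:00:00" -e "2012-12-14 12:00:00" -o /path/to/pgbadger-mydb-morning.html /path/to/postgresql-2012-12-14.log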

Reports look like this:

[pgBadger screenshots: summary report, graph by query types, query types pie chart, slowest queries report]

You can use a cron job to generate pgBadger reports automatically whenever a log has been created. Here is my /etc/cron.daily/pgbadger-daily_process:

 

#!/bin/bash

# automatically generate a pgbadger report from yesterday's PostgreSQL log

PGBADGER_SCRIPT='/usr/local/bin/pgbadger'
PGBADGER_OPTIONS="-f stderr -s 10 -T PGBadger-`hostname`-`date --date='yesterday' +%Y%m%d` -q"

PGLOG_PATH='/path/to/pg_log'
PGLOG_FILE_NAME="postgresql-`date --date='yesterday' +%Y-%m-%d`.log"

PGBADGER_RESULT_FILE="/www/pgbadger-reports.yourdomain.net/pgbadger-`hostname`-`date --date='yesterday' +%Y-%m-%d`.html"
PGBADGER_FULL_COMMAND="$PGBADGER_SCRIPT $PGBADGER_OPTIONS -o $PGBADGER_RESULT_FILE $PGLOG_PATH/$PGLOG_FILE_NAME"

#################################################################
echo "$0 - `date` - INFO - job starts"
if [ -e "$PGLOG_PATH/$PGLOG_FILE_NAME" ]
then
    echo "$0 - `date` - INFO - launching the creation of the pgbadger report with $PGLOG_PATH/$PGLOG_FILE_NAME..."
    $PGBADGER_FULL_COMMAND
    if [ $? -eq 0 ]
    then
        echo "$0 - `date` - OK - pgbadger report created successfully"
    else
        echo "$0 - `date` - ERROR - problem during the execution of: $PGBADGER_FULL_COMMAND"
        exit 1
    fi
else
    echo "$0 - `date` - INFO - log $PGLOG_PATH/$PGLOG_FILE_NAME not found"
fi
echo "$0 - `date` - INFO - job done"

With this cron job, a pgBadger report is created automatically every day from the previous day's PostgreSQL log.
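
One detail: scripts in /etc/cron.daily are run by run-parts, which silently skips files that are not executable. So don't forget:

chmod +x /etc/cron.daily/pgbadger-daily_process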

That's all! Now you know everything about your database activity, thanks to the very useful pgBadger.

Now it's time to analyze your reports and use PostgreSQL's EXPLAIN (typically on the slowest queries) to understand how to optimize your databases and queries. But that's another story... :-)
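
For example, take the worst offender from the "slowest queries" report and feed it to EXPLAIN ANALYZE to see its actual execution plan (the database and query here are obviously made up):

psql -d mydb -c "EXPLAIN ANALYZE SELECT * FROM projects WHERE name = 'foo';"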
