Recently, I came across a customer's need to load and manage OS log files, such as Apache logs, into an Oracle Database repository so that applications could query the data.
The first two things that came to my mind were Oracle Loader and Oracle external tables. Since I had quite a large backlog of data to import to begin with, I decided to create a script that would help us both automate the loading process and manage the ongoing flow of such application logs.
What is needed to run this script
As far as where to place the script, you have the following options:
- Place the script on the Mid Tier Server where the Apache logs are located and make use of the Perl/Modules available on that Node.
- Place the script on a different server that has a Perl distribution from an Oracle Home. If you choose this path, be aware that you will have to NFS-share the location of the OS logs between the Apache server and the server where the script runs.
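For the second option, the share can be sketched as follows (host names are examples only; keep the mount point identical on both sides so that the log paths passed to the script stay valid everywhere):

```
# On the Apache node, export the log directory (/etc/exports),
# assuming "scripthost" is the server that runs the script:
/shared/apache/logs   scripthost(ro,sync)

# On the script host, mount it under the same path:
mount -t nfs apachehost:/shared/apache/logs /shared/apache/logs
```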
Once you have defined which Oracle Home's Perl you will use, you need to set your session's environment variables.
In my case, I am using an Oracle Portal 11g Home located at /u01/app/oracle/product/Middleware/as_portal.
So, I go ahead and set the environment variables, replacing <ORACLE_HOME> with the path above.
Edit the script with your preferred editor and adjust its first line (the Perl shebang) according to your environment, replacing <ORACLE_HOME> with the correct path.
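For example, using the Portal 11g Home path from above, the edited first line would read:

```
#!/u01/app/oracle/product/Middleware/as_portal/perl/bin/perl
```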
Save and close the file.
At this point, we can run a few tests to verify that the Perl environment has been set up properly for the script.
Creating a database Schema
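A minimal sketch of the repository schema user, run as a DBA; the user name LLO follows the recommendation in the usage notes further below, while the password and tablespace are placeholders you should change:

```
sqlplus -s / as sysdba <<'EOF'
CREATE USER llo IDENTIFIED BY llopassword
  DEFAULT TABLESPACE users
  QUOTA UNLIMITED ON users;
-- the script needs to create tables, indexes and directory objects
GRANT CREATE SESSION, CREATE TABLE, CREATE ANY DIRECTORY TO llo;
EOF
```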
Defining a TNS Net Alias
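An entry like the following in the tnsnames.ora under $TNS_ADMIN (host, port and service name are placeholders for your repository database) lets the script resolve the alias it connects through:

```
LLODB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orcl))
  )
```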
Running the Script for the First Time
$ export ORACLE_HOME=/u01/app/oracle/product/Middleware/as_portal
$ export PERL5LIB=$ORACLE_HOME/perl/lib/site_perl/5.10.0/i686-linux-thread-multi/:$ORACLE_HOME/perl/lib/5.10.0
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib
$ export PATH=$ORACLE_HOME/perl/bin:$PATH
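With the environment in place, a sensible first run creates the repository objects before loading anything. A sketch, using the example connection details that appear later in this post:

```
$ perl log_loader.pl -createrepo -dbuser=llo -dbpassword=llopassword -tnsalias=llodb
```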
The script creates interval-partitioned tables and global indexes. Make sure you adapt the script according to your Oracle licensing: interval partitioning requires the separately licensed Partitioning option on Enterprise Edition.
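To illustrate what that means, interval-partitioned DDL looks like the following sketch (illustrative columns only, not the script's actual repository objects); the database automatically creates a new partition for each day of log data as rows arrive:

```
-- Illustrative only: one partition is created automatically per day
CREATE TABLE apache_access_log (
  log_time    DATE NOT NULL,
  client_ip   VARCHAR2(45),
  request     VARCHAR2(4000),
  status_code NUMBER(3)
)
PARTITION BY RANGE (log_time)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
(PARTITION p0 VALUES LESS THAN (DATE '2012-01-01'));
```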
Using the Script
In the usage synopsis below: & joins switches that must be supplied together (required), <> marks optional switches, and all other switches are required.
log_loader.pl [-help] |
              [-createrepo <-dropfirst>] -type= -logdir= <-logfiles= | -fileprefix=>
              -dbuser= -dbpassword= -tnsalias= <-defaulttbs=> <-movetodir= <-compresstool=>> |
              [-about]
Options: -help : Displays usage help.
-createrepo : Instructs the script to create the repository tables on the repository
database specified with the -dbuser, -dbpassword and -tnsalias parameters.
YOU MUST CREATE A DATABASE USER THAT WILL BE USED TO STORE THE REPOSITORY
TABLES PRIOR TO CREATING THE REPOSITORY. THE DATABASE USER MUST HAVE PRIVILEGES
TO CREATE TABLES, INDEXES and DIRECTORIES. The name "LLO" is recommended as the
database schema user.
-dropfirst : This switch is to be used in conjunction with the -createrepo switch. It defaults
to NULL, which means that not passing this switch will instruct the script NOT
to drop the objects already created. This is useful when you just want to add
new tables or objects to your repository database but you do not want any
existing objects which may contain data to be dropped.
Passing this switch will instruct the script to RECREATE all repository objects.
Use this switch with CAUTION.
-type= : Specify the log type the script is supposed to read.
Possible options are: APACHE|SYSLOG. However, at the time of this writing the
only supported value for -type is APACHE.
-logdir= : Specify the directory where the logs to be processed are located. The script
performs checks to confirm that the directory is valid.
-logfiles= : You may enter a comma-separated list of log file names to be imported.
There is no need to specify the full path, only the log file names as the path
will derive from the -logdir switch.
All log files specified must reside in the same directory defined by the
-logdir= parameter. If you pass a value for the -logfiles parameter, the
-fileprefix= parameter is ignored.
-fileprefix= : Log file name prefix: the script will search for and process only files that
begin with the specified string and are located in the directory passed through
the -logdir switch. This switch is ignored if you also pass a value for the
-logfiles= parameter.
PLEASE NOTE THAT THIS SCRIPT WILL MAKE USE OF /sbin/fuser IN ORDER TO IDENTIFY
THE APACHE LOG THAT IS CURRENTLY BEING WRITTEN TO. APACHE LOGS CURRENTLY BEING
WRITTEN TO ARE NOT PROCESSED BY THIS SCRIPT.
-dbuser= : Database user schema used to connect to the repository database. It refers to the
schema user that owns the repository objects created when -createrepo was used.
-dbpassword= : Password for the user passed in through the -dbuser parameter.
SECURITY WARNING: Yes, you will have to pass the password in plain text but
I am sure some clever guy will solve this issue. There are some Perl modules
that allow for masked or concealed input.
-tnsalias= : TNS alias defined in the tnsnames.ora located under the ORACLE_HOME that has been
set in your environment. Make sure the TNS_ADMIN environment variable is set accordingly.
-defaulttbs= : Optionally informs the tablespace where the log history table will be placed.
If omitted, or passed a NULL value, the script will attempt to use a default tablespace.
-movetodir= : You may optionally define a directory to which each log file that has been
successfully imported into the repository database will be moved.
IF THE REPOSITORY DATABASE IS LOCATED ON A DIFFERENT HOST THAN THE HOST WHERE
THE LOGS ARE BEING READ FROM, YOU MUST PRESENT THE SAME LOG LOCATION PATH TO
THE DATABASE SERVER VIA NFS.
-compresstool=: Full path to the compression tool used to compress each log file as it is
moved to the directory defined by -movetodir. This parameter is ignored if the
-movetodir parameter is not specified.
-about : Acknowledgments, Copyrights and general information about this script.
Let's walk through a first example. Suppose that:
- The location of the log files is /shared/apache/logs.
- You want the script to read and import all files that start with the string access_log1.
- You want to import all data into a repository user schema you created previously. The connection details for that user schema are as follows:
- Username: llo
- Password: llopassword
- TNS alias: llodb
- Besides importing the files, you want each imported file to be compressed with the gzip tool and, once compressed, moved to an archive directory located at /shared/apache/archive.
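Putting these requirements together, the invocation looks like the following (reconstructed from the switch reference above; adjust the paths to your environment):

```
$ perl log_loader.pl -type=APACHE -logdir=/shared/apache/logs \
    -fileprefix=access_log1 \
    -dbuser=llo -dbpassword=llopassword -tnsalias=llodb \
    -movetodir=/shared/apache/archive -compresstool=/bin/gzip
```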
Now, let's suppose you want to import only a few distinct APACHE log files instead of allowing the script to import all logs that match a supplied string.
In this case, you must make use of the -logfiles= switch in order to pass a comma-delimited list of log file names to be imported by the script.
In the following example, we will only be importing a few log files that reside on the directory defined by the -logdir switch:
$ perl log_loader.pl -type=APACHE -logdir=/shared/apache/logs \
  -logfiles=<logfile1>,<logfile2> -dbuser=llo \
  -dbpassword=llopassword -tnsalias=llodb \
  -movetodir=/shared/apache/archive -compresstool=/bin/gzip
In the previous example, we use the -logfiles= switch to instruct the script to load only the logs passed in the comma-delimited list. Note that you must pass pathless file names, in either compressed or uncompressed form.
Passing the current Apache log file in the comma-delimited list will cause the script to skip it: the script uses the Unix fuser command to check whether a file is in use, and it only processes files that are not in use.
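You can reproduce that check yourself (the file path below is an example):

```
# fuser prints the PIDs of processes holding the file open; empty output
# and a non-zero exit status mean the log is idle and will be processed
/sbin/fuser /shared/apache/logs/access_log1
```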
Import Only the Content that Matters to you
Script Error Codes and Messages
To automate the whole process, I scheduled the load through cron. The following crontab entry runs a wrapper shell script at 01:30 and 13:30 every day:
30 01,13 * * * "/backup/logs_apache/log_loader.sh" >/dev/null 2>&1
And the following is the content of the script. Remember that cron jobs run with a minimal environment, so the Oracle environment variables shown earlier must also be set inside it; replace <APACHE_LOG_DIR> with your actual log directory:
#!/bin/sh
perl /backup/logs_apache/log_loader.pl -type=APACHE \
-fileprefix=access_log -dbuser=llo -dbpassword=logloader -tnsalias=pdblog \
-movetodir=/backup/logs_apache/prod/autoloaded -compresstool=/bin/gzip \
-logdir=<APACHE_LOG_DIR>