Translate

Tuesday 7 January 2014

LOG_LOADER - Manage and Load your OS Logs into an Oracle Database



Recently, I came across a customer's need to load and manage OS log files like Apache logs into a Oracle Database repository in order to allow applications to query the data.
The first two things that came to my mind were Oracle Loader and Oracle External Tables. Since I had quite a huge back log of data that would have to be imported to start with, I decided to create a script that would help us to both automate the process of loading and managing the flow of such application logs.


Installation


The script, as it is, can be downloaded from here. A new window will be opened and the script code will be displayed. You may either copy and paste it or right-click and select Save as.

The script has been designed and tested on both RHEL and OEL 5.8 but it is believed that it will run on any version above 5.0.

The Perl used by the script is the Perl that is installed with Oracle Software like Oracle Database or components of Oracle Fusion Middleware, in this case, Oracle Portal.

I have tested the script as it using Perl and Modules that were already installed along with Oracle Portal 11g deployed onto Oracle Weblogic Server on the Mid Tier server and imported over 1 billion rows worth of APACHE logs into an Oracle 12c pluggable database.


What is needed to run this script


One of my objectives when creating the script was to minimize the need of downloading both Perl and any extra modules. I came to realize that all my Perl related needs, were already fulfilled as part of any Oracle product installation. All that is necessary to run this script is to set the environment variables to make use of the right Perl with the right Modules. That in itself sounds easy but it can be tricky to say the least.

This post will guide you through some steps to make sure you have all it takes to run the script with no errors or setting of hunting for astray Perl modules around the internet.

As far as where to place the script, you have the following options:


  • Place the script on the Mid Tier Server where the Apache logs are located and make use of the Perl/Modules available on that Node.
  • Place the script on a different server where you have Perl derived of some Oracle Home. If you choose this path, be aware that you will have to NFS share the location of the OS logs between the Apache server and the server where the script runs
For the sake of simplicity, I have decided to place my script on a NFS share that is presented to most of our production servers. This makes it easy to have access to it from any server where I may want to load logs from.

Script Dependencies


Dependencies : This script has been developed in Perl and it depends on the 
               following modules:

               Getopt::Long (Included in Oracle Home)
               DBD::Oracle (Included in Oracle Home
               File::Basename (Included in Oracle Home)

               The version and modules of Perl that are installed along most 
               Oracle products like Middleware and database are enough to get 
               the script running.
               Remember to set the following environment variables so that you 
               make use of the Oracle Supplied Perl that should include all you
               need to run this script.

First we need to set the environment so that we make use of the Perl and Modules from our Oracle Home.

Once you have defined from which Oracle Home you will use Perl, you need to set your session's environment variables

In My Case, I am using an Oracle Portal 11g Home as follows:

export ORACLE_HOME=/u01/app/oracle/product/Middleware/as_portal

So, I go ahead and set the environment variables replacing <ORACLE_HOME> with the path above

export PERL5LIB=$ORACLE_HOME/perl/lib/site_perl/5.10.0/i686-linux-thread-
multi/:$ORACLE_HOME/perl/lib/5.10.0
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$ORACLE_HOME/perl/bin:$PATH

Edit the script with your preferred editor and adjust the first line according to your environment. Replacing the <ORACLE_HOME> with the correct path

#!<ORACLE_HOME>/perl/bin

Save and close the file.

At this point, we may perform a few tests to find out if the Perl environment has been setup properly for the script to be run.


Testing DBD

This script makes use of this module to connect to the Oracle Repository Database. Now, because some Oracle supplied Perl scripts require to make connections to Oracle databases, the Perl version included in the Oracle Software already has it.

To find our if your Perl environment is set correctly and if it can find the DBD::Oracle module: 

perl -e 'use DBD::Oracle; print $DBD::Oracle::VERSION,"\n";'

1.20

If your environment is correctly setup, the above command should return the version of the Driver/Module.

Testing Getopt::Long


# I am glad to find out that the Getopt::Long is also included in the Perl provided by Oracle

perl -e 'use Getopt::Long; print $Getopt::Long::VERSION,"\n";'

2.37

Getting Started

Before you run the script for the first time, there are a few changes to be made in the script itself.

Even though the script's behavior can mostly be controlled via switches during invocation, some parameters must be changed inside the script.

The following parameters will be used by the script changes will be used one-time-only when the script is run passing the -createrepo switch

Make the following changes to the script to reflect your environment:

Open the script and locate/change the following lines inside the script to reflect the paths in your environment:

# CHANGE THE FOLLOWING PATHS TO MATCH YOUR PATH TO YOUR CHOSEN ORACLE_HOME
# ALTERNATIVELY, YOU MAY ADD EXTRA SWITCHES SO THAT THE USER CAN PASS THESE
# VALUES TO THE SCRIPT

$oracle_home = '/u01/app/oracle/product/Middleware/as_portal';

Change the above line to reflect the ORACLE HOME location you have chosen 

$tns_admin   = '/u01/app/oracle/product/Middleware/as_portal/network/admin';

Since this script will require a TNS Net Alias, you will have to add such entry to the tnsnames.ora file located under the path specified above

$ld_library_path = '/u01/app/oracle/product/Middleware/as_portal/lib';


Next, locate/change the following section in the script. Please note that the following directories must reflect the paths on the server where the repository DATABASE IS INSTALLED. For easy of use, you may have to create a NFS share between the database server and the server where the script is running. 


# CHANGE THE FOLLOWING LOCATIONS TO REFLECT PATH'S ACCESSIBLE TO THE SCRIPT
# ON THE SERVER WHERE THE DATABASE RESIDES

$ext_table_bad_files_path = '/backup/logs_apache/prod/log_loader/bad';
$ext_table_log_files_path = '/backup/logs_apache/prod/log_loader/log';
$zcat_path = '/backup/logs_apache/prod/log_loader';

If the script detects that a file to be loaded is compressed, it will make use of the zcat OS binary to preprocess the file. To avoid permission issues, make a copy of the zcat utility to the directory defined above.

Since the script makes use of Oracle External Tables to mine and upload the data into the repository database, you must define a location for your BAD, LOG and to the zcat binary.

Creating a database Schema

The purpose of this script is to load and manage OS log files into an Oracle Database. Create a user account with permissions to create and drop tables as well as indexes and Oracle directories. It is recommended that a separate tablespace is created and assigned as the default tablespace for the log loader schema user

create user LLO identified by <password> default tablespace LLO temporary tablespace temp;

grant create table, create any directory to LLO;

Defining a TNS Net Alias

On the machine where the script will be running, define an Oracle Net Service Alias to be used by the -tnsalias switch.

Example:


pdblog =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.248.139)(PORT = 1521))
    )
    (CONNECT_DATA =
        (SERVICE_NAME = pdblogdb.domain.gv.ao)
    )
  )

Running the Script for the First Time

Before the script can be run regularly, it must be invoked passing the -createrepo switch to instruct the script to create all necessary objects in the repository database.

Example:

If you have not set the environment as described above, set it as follows. Please note that if the script will be called subsequently from the cron, such environment variables must set in the OS session started by the cron job. This can be achieved by wrapping the script invocation in a shell script setting up the environment variables in the shell script.


$ export ORACLE_HOME=/u01/app/oracle/product/Middleware/as_portal
$ export PERL5LIB=$ORACLE_HOME/perl/lib/site_perl/5.10.0/i686-linux-thread-multi/:$ORACLE_HOME/perl/lib/5.10.0
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib

$ export PATH=$ORACLE_HOME/perl/bin:$PATH

Run the script passing the -createrepo switch as well as the other required parameters. 

$ perl log_loader.pl -createrepo -dbuser=llo -dbpassword=logloader -tnsalias=pdblog defaulttbs=users


The script validates the passed parameters as well as their dependencies. If any inconsistencies or insufficient parameters are passed to the script, it will return an LLO type error followed by a number and a message that should describe what went wrong. If everything is OK with the parameters, the script will attempt to connect to the database and perform all necessary actions. 
Bellow is the output of the script once it has connected and created the database objects:


Attempting to connect to the repository Database via TNS alias <pdblog>

User     : llo

Password : *************



  *** Setting Environment Variables ***


  ORACLE_HOME=/u01/app/oracle/product/as10g/MidTier/
  TNS_ADMIN=/u01/app/oracle/product/as10g/MidTier/network/admin
  LD_LIBRARY_PATH=/u01/app/oracle/product/as10g/MidTier/lib


10/02/2014 14:27:12: Connected.

Table LLO_APACHE_LOGS dropped.
Table LLO_APACHE_LOGS and Indexes Created.

Table LLO_SYSLOG_LOGS dropped.
Table LLO_SYSLOG_LOGS and Indexes Created.

Repository objects created successfully.

10/02/2014 14:27:12: Closing connection to the repository database.

As we can see from the above output, this version of the script as it is creates all necessary directories and two tables as well as it's indexes. One table is for APACHE logs and it is fully functional and the other would be for OS SYSLOGS, although it is not functional at this point. be the first to implement and share with us!
The script creates interval partitioned tables and global indexes. make sure you adapt the script according to your Oracle licensing.



Using the Script 


Now that the repository has been setup, the script should be ready to start loading and managing your APACHE logs. Be aware that this script as it is expects the APACHE contents to have the following format. Although not all columns need to have a value, it does expect to find all the following columns when it applies its field delimiters:

105.172.13.165 - DANIEL.DAMEDA [11/Feb/2014:01:00:05 +0100] "GET /portal/pls/sigfe/WWJS.GENERICO HTTP/1.1" 200 60859

You may have to make some adjustments to the script if you notice that your APACHE log format is not the same as the above.

The script's behavior is affected by the values passed to its switches. Therefore, it is important to understand that each switch does. The script itself does some validation of the switches to make sure the input values are correct as well as dependency checking. For example if you invoke the script in running mode by passing the -type=APACHE, it will expect that you also pass extra parameters like which logs to load, where to store it and what to do with each log file once it has been processed.
Depending on the values passed as parameters for the script's switches. 
To get a full list of the switches and their description, invoke the script passing the -help switch as follows:

Make sure your environment is set prior to running the script. Refer to the chapter "Script Dependencies" above for instructions on how to set your environment.

$ perl log_loader.pl -help

Not Copyrighted, 2013 Authorship Rights Reserved
http://www.thatoracledude.blogspot.com/
----------------------------------------------------------------------

Usage: |  = OR  operator
       &  = AND operator (Required)
       <> = Optional
       [] = Required
       log_loader.pl  [-help]                 |
                      [-createrepo]           &
                         <-dropfirst>            &
                         [-dbuser=]              &
                         [-dbpassword=]          &
                         [-tnsalias=]            &
                         <-defaulttbs=>          |
                      [-type=<APACHE|SYSLOG>] &
                         [-logdir=]              &
                            <-logfiles=>         |
                            [-fileprefix=]       &
                         [-dbuser=]              &
                         [-dbpassword=]          &
                         [-defaulttbs=]          |
                         [-tnsalias=]            |
                         <movetodir=>            |
                         <compresstool=>      |
                      [-about]

Options: -help         : Displays usage help.
         -createrepo   : Instructs the script to create the repository tables on the repository
                         database specified with the -dbuser, -dbpassword and -tnsalias parameters.
                         YOU MUST CREATE A DATABASE USER THAT WILL BE USED TO STORE THE REPOSITORY
                         TABLES PRIOR TO CREATING THE REPOSITORY. THE DATABASE USER MUST HAVE PRIVILEGES
                         TO CREATE TABLES, INDEXES and DIRECTORIES. The name "LLO" is recommended as the
-dropfirst    : This switch is to be used in conjunction with the -createrepo switch. It defaults
        to NULL, which means that not passing this switch will instruct the script NOT
to drop the objects already created. This is usefull when you just want to add 
new Tables or Objects into your repository database but you do not want any 
existing objects which may contain data to be dropped. 
Passing this switch will instruct the script to RECREATE all repository objects
Use this switch with CAUTION.
                         database schema user.                         
         -type=        : Specify the log type the script is supposed to read.
                         Possible options are: APACHE|SYSLOG. However, at the time of this writing the 
                         only supported value for -type is APACHE.
         -logdir=      : Specify the directory where the logs to be processed are located. The script 
        performs checks to confirm that the directory is valid.
         -logfiles=    : You may enter a comma-separated list of log files names to be imported.
        There is no need to specify the full path, only the log file names as the path
will derive from the -logdir switch.
                         All log files specified must reside on the same directory defined under the
                         -logdir= parameter. If you pass a value for the -logfiles parameter, the
                         -fileprefix= parameter is ignored.
         -fileprefix=  : Log File Name prefix so that the script will search and process only files that
                         begin with the specified string and are located on the directory passed through
                         the -logdir switch. This switch is ignored if you also pass a value for the 
                         -logfiles switch.
                         PLEASE NOTE THAT THIS SCRIPT WILL MAKE USE OF /sbin/fuser IN ORDER TO IDENTIFY
                         THE APACHE LOG THAT IS CURRENTLY BEING WRITTEN TO. APACHE LOGS CURRENTLY BEING
                         WRITTEN TO ARE NOT PROCESSED BY THIS SCRIPT. 
         -dbuser=      : Database user schema used to connect to the repository database. It refers to the
                schema user who owns the repository database objects created when -createrepo was
run. 
         -dbpassword=  : Password for the user passed in through the -dbuser parameter.
                         SECURITY WARNING: Yes, you will have to pass the password in plain text but
                         I am sure some clever guy will solve this issue. There are some Perl modules 
                         that allow for masked or concealed input.
         -tnsalias=    : TNS alias defined in the tnsnames.ora located under the ORACLE_HOME that has been
                         set on your environment. Make sure the environment variable TNS_ADMIN is set
appropriately.
         -defaulttbs=  : Optionally, Inform the tablespace where the log history table will be placed.
                         If ignored, or passed a NULL value, it will attempt to use a tablespace named
USERS.
         -movetodir=   : You may optionally define a directory directory to which each log file that has 
                been successfully imported into the repository database should be moved to. If
this parameter is specified, each file will be moved
                         to this directory once it has been successfully imported into the database.
                         IF THE REPOSITORY DATABASE IS LOCATED ON A DIFFERENT HOST THAN THE HOST WHERE
                         THE LOGS ARE BEING READ FROM, YOU MUST PRESENT THE SAME LOG LOCATION PATH TO
                         THE DATABASE SERVER VIA.
         -compresstool=: Full path to the compression tool to be used to compress each log file as they
                         are moved to the directory defined in -movetodir. This parameter is ignored if
                         -movetodir parameter is not specified.
         -about        : Acknowledgments, Copyrights and general information about this script.



Example:

Let's assume that you have a a few APACHE log files located on a directory that is shared between the server where the script will run and the server where the database is located via NFS.

  • The location for the log files is /shared/apache/logs.
  • You want the script to read and import all files that start with the string access_log1.
  • You want to import all data into a repository user schema you created previously. The connection details for such user schema are as follows
    • Username: llo
    • Password: llopassword
    • TNS alias: llodb
  • As well as importing the files, you want to make sure each file imported gets compressed using the gzip tool and once compressed they get moved to an archive directory located at /shared/apache/archive
The switches you would use to instruct the script to perform the above tasks are is as follows:

perl log_loader.pl -type=APACHE -logdir=/shared/apache/logs \
-fileprefix=access_log1 -dbuser=llo -dbpassword=llopassword -tnsalias=llodb \
-movetodir=/shared/apache/archive compresstool=/bin/gzip

When the script executes, it will first check all the switches for semantic errors, then for dependencies as well as check if any paths and binary files exist. Once all pre-checks are performed, the script will attempt to connect to the repository database using the credentials provided through the -dbuser=, -dbpassword= and -tnsalias= switches. Once connected to the database, the script searches the directory passed through the -logdir= switch for files that start with whatever string was passed through the -fileprefix= switch. Files that match the search patters will be displayed and processed one by one. 
If the script detects that a file is compressed, it will use zcat as a pre-processor for each file before it reads the data and imports it into the database repository tables.

The following is an output example from the script where it displays all the files that match a specific search pattern and are candidates for import.

Verifying files that match the criteria provided

================================================

   FILE NAME                                             FILE SIZE
   ----------------------------------------------------  ------------------------
   access_log.1392076800                                 129.64 MB
   access_log.1392120000                                 11.48 MB


Creating Temporary Oracle Directory for External Tables
=======================================================

Directories LLO_DIR, LLO_LOG, LLO_BAD and LLO_ZCAT Created.

Below is an output script example from the script where it details the steps a each file goes through to get imported, compressed (if nor already compressed) and moved to an archive destination.

Creating External Table on File /u01/app/oracle/product/as10g/MidTier/Apache/Apache/logs/access_log.1392076800
Loading Data . . . Imported.
Compressing File . . . Done.
Moving File . . . Done.
Skipping Current Log File access_log.1392120000.

In the last line of the above output, we can see that the script has skipped one particular log file named access_log.1392120000. It has done so because it uses the Unix fuser command to find out if the file is in use so it does not process the current APACHE log file.

Now, let's suppose you want to import only a few distinct APACHE log files instead of allowing the script to import all logs that match a supplied string.

In this case, you must make use of the -logfiles= switch in order to pass a comma-delimited list of log file names to be imported by the script.

In the following example, we will only be importing a few log files that reside on the directory defined by the -logdir switch:


perl log_loader.pl -type=APACHE -logdir=/shared/apache/logs \
-logfiles=mylog1.gz,mylog2.gz,mylog3.log,access_log.1392076930 -dbuser=llo \
-dbpassword=llopassword -tnsalias=llodb \
-movetodir=/shared/apache/archive compresstool=/bin/gzip

On the previous example, we make use of the -logfiles= switch to instruct the script to load only the logs passed in a comma-delimited list. Note that you must pass pathless names in either compressed or uncompressed format.
Passing the current APACHE log file in the comma-delimited list will cause the script to skip it as it makes use of the unix fuser command to check if the file is in use. The script will only process files that are not in use.

Import Only the Content that Matters to you

In this section I will explain how you can change the script so that it does not flood your repository database with OS log entries that you may not find relevant to your needs:

You may add filters to the part of the script where it performs a INSERT AS SELECT from an external table.

The above section is indicated in the script by the following comment:

# FILTER OUT WHAT THE ENTRIES YOU DO NOT WANT IMPORTED USING THE # WHERE CLAUSE OF THE FOLLOWING INSERT AS SELECT STATEMENT

Below the above message you will find the SQL statement itself which performs the direct path insert:


INSERT /*+ APPEND NOLOGGING PARALLEL */
  INTO LLO_APACHE_LOGS
                (HOST,
                 LOGNAME,
                 REMOTE_USER,
                 REQUEST_DATE,
                 REQUEST_LINE,
                 REQUEST_STATUS,
                 BYTES_SENT)
SELECT /*+ PARALLEL */
                 host,
                 logname,
                 remote_user,
                 CASE
                    WHEN
                       REGEXP_LIKE(request_date,'[[:digit:]]\{2\}/[[:alpha:]]\{3\}/[[:digit:]]\{4\}:[[:digit:]]\{2\}:[[:digit:]]\{2\}:[[:digit:]]\{2\}','i')
                    THEN
                       TO_DATE(request_date,'[DD/Mon/YYYY:HH24:MI:SS')
                    ELSE
                       TO_DATE('01/Jan/1970:00:00:00','DD/Mon/YYYY:HH24:MI:SS')
                    END REQUEST_DATE,
                 request_line,
                 request_status,
                 bytes_sent
  FROM LLO_APACHE_LOG_EXT
 WHERE request_line NOT LIKE '%POST /portal/pls/test/!PORTAL.wwpro_app_provider%'

If necessary, you may add as many "NOT LIKE" clauses as you want to make sure you filter out non necessary entries.
Also, you can make use of the REGEXP_LIKE function to filter out records that do not satisfy a specific format.

Scripts Error Codes and Messages


LLO-00010: Invalid Parameter passed to the script. Check each passed parameter and look for possible typos. Use the -help switch for correct parameters and syntax.

LLO-00020: Insufficient parameters passed to the script. Enter -help for parameters.
LLO-00030: Invalid value for parameter -logtype. Enter -help for parameters.
LLO-00040: nvalid value for either parameter -fileprefix or -log_files. Enter either a prefix for the log files to be read or a comma-delimited list of log files to be imported.
LLO-00050: Invalid or missing value for either -dbuser, -dbpassword or -tnsalias. Enter -help for parameters.
LLO-00060: When passing -createrepo, you must also provide value for parameters -dbuser, -dbpassword and -tnsalias. Enter -help for parameters.

LLO-00080: Invalid value for parameter -logdir. Unable to find directory $log_dig.
LLO-00100: It will output any database related error that happen during the creation of the external table directories.
LLO-00110It will output any database related error that happen while importing data into the repository database.
LLO-00120: It will report any error that happens while attempting to connect to the repository database.
LLO-00210: Unable to locate file $compress_tool passed through the parameter -compresstool.
LLO-00220: Unable to locate directory $move_to_dir passed through the parameter -movetodir.


Script Automation

Once the script is up and running manually, you may automate its executions using the unix cron. The following is an example where the script is wrapped on a shell script that sets the environment and then invokes log_loader.pl to perform the load and management of APACHE logs.


30 01,13 * * * "/backup/logs_apache/log_loader.sh" >/dev/null 2>&1

And the following is the content of the script:

export PERL5LIB=/u01/app/oracle/product/as10g/MidTier/perl/lib/site_perl/5.6.1/i686-linux:/u01/app/oracle/product/as10g/MidTier/perl/lib/5.6.1
export LD_LIBRARY_PATH=/u01/app/oracle/product/as10g/MidTier/lib
export PATH=/u01/app/oracle/product/as10g/MidTier/perl/bin:$PATH
perl /backup/logs_apache/log_loader.pl -type=APACHE \
-logdir=/u01/app/oracle/product/as10g/MidTier/Apache/Apache/logs \
-fileprefix=access_log -dbuser=llo -dbpassword=logloader -tnsalias=pdblog \
-movetodir=/backup/logs_apache/prod/autoloaded -compresstool=/bin/gzip \
> /backup/logs_apache/log_loader_last_import.log


Special consideration for VLDB's and Oracle Text


Note that if you end up with a very large table and you intend to search no a CLOB column, you may have to create an Oracle Text index on the column to allow your queries to run in a timely fashion.

Context indexes are not created by default by this script. If you feel the need to enable Oracle Text, you will have to check of Oracle Text is installed on your database and create an index on the CLOB column.

Also, make sure you grant the appropriate privileges to the database repository schema owner in order to be able to use Oracle Text.


Acknowledgments


Please feel free to use, modify and make this script better as you wish. 
The script as it is lays out the idea so that you can follow its principles and adequate it to your needs. 

However, please keep some kind of reference to my blog or my name in some shape or form as it would make me very happy to see people finding any use for something we started.

Also, please send comments, questions, suggestions and critics as well as ideas that you may want to share with me. Any feedback will be VERY appreciated even if is just a hello or to say that this may be totally useless or obsolete by some other technology already available.

Special thanks to my colleague Alex Zaballa who helped me with ideas and suggestions and helped me testing after we imported over 1 billion rows of APACHE logs into a 12c Pluggable database.

Have lots of fun and thanks for taking your time to read this!

Daniel Da Meda - 2014

No comments:

Post a Comment

Hello there, please leave your comment, complaints and questions!