Friday, October 30, 2009

Using Logparser to dump Bluecoat log files into SQL

Working with Bluecoat files in the raw can be time-consuming. Findstr and grep only work so fast. Windows grep is slow. I know SQL syntax OK, so I tend to dump logfiles into databases to analyze them for activity. There are certainly other ways to do it, such as using a reporting tool for Bluecoat. (Splunk's free Bluecoat application, e.g.).

Theoretically, Bluecoat logfiles are the same as W3C web server log files that logparser can consume via the -i:W3C directive.

You can see the fields in a Bluecoat log below.

#Fields: date time time-taken c-ip cs-username cs-auth-group x-exception-id sc-filter-result cs-categories cs(Referer) sc-status s-action cs-method rs(Content-Type) cs-uri-scheme cs-host cs-uri-port cs-uri-path cs-uri-query cs-uri-extension cs(User-Agent) s-ip sc-bytes cs-bytes x-virus-id

For some reason, Bluecoat leaves two spaces between cs(Referrer) and sc-Status, so all the columns to the right of sc(Referrer) past that will be one off. BlueCoat also leaves spaces in cs-categories and surrounds them with quotation marks, so you need to specify -dQuotes:on. Logparser doesn't have a quick and easy way to handle the double-spaces issue, so I wrote a VB Script to handle it. (VBScript is pretty quick at text handling and it's much faster than using search and replace in WordPad or Notepad on a 500-1000 MB File.)

Here's the VBScript:
'start

Set objFSO = CreateObject("Scripting.FileSystemObject")
'change this line to wherever you want to read the input from.
Set objTextFile = objFSO.OpenTextFile("c:\myBluecoatlog.log",1)
Set objNewFile = objFSO.CreateTextFile("c:\myCleanBlueCoatlog.log")
Do Until objTextFile.AtEndOfStream

myString = objTextFile.Readline
objNewFile.WriteLine(Replace (myString, " ", " "))
Loop
'end vbscript
Here's the logparser file:
-------------------start
SELECT TO_LOCALTIME(TO_TIMESTAMP(date, time)) AS date,
time-taken,
c-ip,
cs-username,
cs-auth-group,
x-exception-id,
sc-filter-result,
cs-categories,
cs(Referer) AS Referer,
sc-status AS scStatus,
s-action,
cs-method,
rs(Content-Type) AS ContentType,
cs-uri-scheme,
cs-host,
cs-uri-port,
cs-uri-path,
cs-uri-query,
cs-uri-extension,
cs(User-Agent) AS UserAgent,
s-ip,
sc-bytes,
cs-bytes,
x-virus-id

INTO BlueCoat4
FROM c:\myCleanBlueCoatlog.log
------------------end
And here's the command line for logparser. (Save the logparser file as c:\scripts\log\bluecoat.sql)

logparser file:c:\scripts\log\bluecoat.sql -i:W3C -o:SQL -server:sqlservername -database:BLUECOAT -createtable:ON -dQuotes:ON


Statistics:
-----------

Elements processed: 613076
Elements output: 613076
Execution time: 241.20 seconds (00:04:1.20)
About 2500 lines/sec. Processor utilization is almost zero for SQL and logparser, so it's all about disk time.

The above is from a file that's 310,935,417 bytes large. That means BlueCoat logs are about 507 bytes per line, or 0.5k per line before compression. The last time I checked BlueCoat gz compression, it was about 15% of the original file size. Compressed, the line would cost you 76 bytes.

No comments:

Post a Comment