Thursday, June 29, 2006

Network Analysis With Free EtherPeek: Ethereal Gets Some Competition

Way back in 1999, I was looking for a packet analyzer. I was familiar with EtherPeek for the Macintosh from a few years before, and I found that the AG Group was producing EtherPeek for Windows, too. The AG Group is now WildPackets, and they are exceedingly helpful to anyone who has to troubleshoot data networks. The AG Group always offered some cool network freebies: IP Subnet Calculator, netTools and a great protocol reference chart.

One of their people, J. Scott Haugdahl, has an excellent book, Network Analysis and Troubleshooting, which offers a bottom-up review of the OSI 7-layer model. (Which one are you: All People Say They Need Data Processing or Please Do Not Throw Sausage Pizza Away?)

I liked EtherPeek and the book so much that I bought both and paid out of my own pocket, even though my job was managing the network. Of course, this was back in the day when running tcpdump required you to know your IRQ, DMA and chipset (i.e., DEC Tulip). My job at the time was helping change a campus network from NetWare to TCP/IP, back when Windows and Macintosh didn't even install a TCP/IP stack by default. We went from three-and-a-half network protocols (two different NetWare frame types) to one and a half (we still had a couple of AppleTalk issues). Each computer was on the Internet with a public IP address and no firewall. The ping of death still worked against most machines, and we also got hit with Smurf and Trinoo attacks that would disrupt all online activity.

WildPackets makes some excellent packet analyzers for wired and wireless networks, and now their base-level product, OmniPeek Personal, is free. I have been using Ethereal since my old version of EtherPeek became obsolete along with the ancient Dell laptop it lived on, but I missed EtherPeek because it was the first packet analyzer I really got to know well. I could create filters and find exactly what I needed to find. EtherPeek also had good summary statistics, which could tell me who was producing the most traffic on my networks. OmniPeek Personal is better than my copy of EtherPeek was because it includes some expert analysis of bad packets and delayed response times. It also produces HTML statistics just like the original, and it has a better interface than Ethereal, using color to show differences between packets.

For those of you who underestimate the power of color, try printing a Google or MapQuest map in black and white and one in color and see which one is easier to read while you're driving. OmniPeek makes it easier to read your packet stats and is easier on your eyes than Ethereal. It's also supposed to do wireless captures -- I'll update when I get a wireless card with a compatible chipset.

Wednesday, June 28, 2006

The Visio 2007 Beta

I've loved Visio since I started using it in 1999 or so. Microsoft bought Visio for $1.5 billion, which at the time was the most Microsoft had ever paid for an acquisition. Since then, Microsoft has incorporated it into its Office line.

I usually don't run a lot of beta software, but I had no choice. Visio 2003 does not connect to MS SQL Server 2005, even with SQL Native Client installed on my laptop. I had two choices: download and install my (free student) copy of Visio for Enterprise Architects on my soon-to-be-dead laptop, or download the free Visio 2007 Beta.

I use Visio for diagramming almost anything technical, from rack diagrams to network and Active Directory diagrams to schoolwork like data flow diagrams, class diagrams, statecharts, and entity relationship diagrams. (I can't afford ERwin.) You can even export from MS Project into MS Visio to create Gantt and PERT charts that are more customizable than what you can do in Project. (Although for quickly updating diagrams on large projects, nothing beats Critical Tools, which does a much better job of creating Work Breakdown Structures and PERT charts than MS Project does.)

One of my favorite features in Visio is reverse-engineering databases. I find it much easier to create databases in Access and then reverse-engineer the diagram in Visio. I can also test out the Access database and see if I can get the reports I need with the right queries. (I hear that even Oracle DBAs with years of experience test things in Access.) I can also use this feature to investigate vendor-supplied databases. (One-size-fits-none databases tend to have hundreds of tables.)

In Visio, I just create a new database diagram, then select Database | Reverse Engineer and point it at my data source, which is still a little cumbersome to set up for a new non-Access database. After importing the tables, indexes and queries I need, I can select Database | Options | Document and check the boxes for cardinality, crow's feet, and relationship actions. This dialog has changed slightly in Visio 2007, and the IDEF1X symbol set looks new as well, which will be especially helpful to defense contractors.

Another good thing about Visio 2007 is that I can use all my old stencils, including the giant pack of slightly dated Altima stencils that came with a 3com switch. Since I can't afford to buy lots of custom stencils, I am very happy to see that more vendors are offering free equipment representations for their products at places like the Visio Cafe.

If you're looking for a free version of Visio to work with, the Visio 2007 Beta will work. Mine hasn't even crashed yet.

Saturday, June 24, 2006

Data Mining and Data Warehousing Might Just Protect Your Identity, Someday

In dealing with financial activities, our law enforcement/intelligence community is somewhere between Get Smart and Mission: Impossible, depending on which story you read in the newspaper.

Eighty-one people in 17 states used a California woman's Social Security number, according to the AP on June 18, 2006. You'd think the IRS or Social Security Administration would notice that 81 jobs on one number falls well outside the normal range, maybe even more than three standard deviations above the mean number of jobs a person holds in a given time period.

"They knew what was happening but wouldn't do anything," said Schmierer, 33, a housewife in this San Francisco suburb. "One name, one number; why can't they just match it up?"


Then on June 23, the New York Times broke a story about how the Treasury is overseeing a CIA program that monitors data going through the Society for Worldwide Interbank Financial Telecommunication (SWIFT). What do these reporters think FinCEN does? It looks for the same kind of activity the program the Times revealed does, and it has been doing it a lot longer than the CIA.

A former compliance officer for a major brokerage once told me that you might get away with insider trading once. After that the investigators would know the people with whom you attended kindergarten and might be in a position to give you insider information. That's link analysis.

The last time I bought AMEX traveler's cheques it took half an hour because of the paperwork required by the bank to satisfy post-9/11 financial tracking regulations, so it doesn't surprise me that the intelligence community is monitoring international transactions. (The paperwork is so tedious that I'm going to carry cash more often than not.)

Our government can access tons and tons of data about every transaction that travels across our borders, but without efficient algorithms for flagging suspicious activity, it will all be useless. Placing every tax return and W-2 statement into a single data warehouse would be a textbook exercise: Yahoo and Google probably generate more data in a week than all our tax returns and W-2s add up to in a year. You would think the Social Security Administration would be able to see the fraud in their systems. The ACLU wouldn't even be able to argue that our government isn't allowed to look at its own data.

Once the data were loaded, a few queries could spit out suspicious Social Security number users in a day or two, and the budget for this would be under a million or two.
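
A hedged sketch of the kind of query I mean, assuming a hypothetical w2_records table with ssn, employer_id and tax_year columns (none of these names come from any real system):

-- flag Social Security numbers reported by an unusual number of employers in one year
SELECT ssn, COUNT(DISTINCT employer_id) AS employer_count
FROM w2_records
WHERE tax_year = 2005
GROUP BY ssn
HAVING COUNT(DISTINCT employer_id) > 10 -- far more employers than one person plausibly has
ORDER BY employer_count DESC;

Anything that query returns deserves a phone call long before it becomes an 81-employer case.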

Tuesday, June 20, 2006

Counting Web Attacks

I see a lot of 404 errors in my Apache logs. A 404 error is a file not found, i.e., someone has requested a file that's not there. Often it means I made a typo in a configuration file or in some HTML someplace. More often, it means someone somewhere is probing my server for weak web applications.

Linux and open source software have made it easy to add web applications running under Apache and MySQL. The problem is that as more and more sites run these cool web applications, attackers find holes in them. The developers fix the holes and release patches, but many webmasters never apply them. Thus I see probes like the one below in my Apache logs:


212.83.253.101 - - [19/Jun/2006:09:24:49 -0400] "GET /a1b2c3d4e5f6g7h8i9/nonexistentfile.php HTTP/1.0" 404 320 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:49 -0400] "GET /adxmlrpc.php HTTP/1.0" 404 294 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:49 -0400] "GET /adserver/adxmlrpc.php HTTP/1.0" 404 303 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:49 -0400] "GET /phpAdsNew/adxmlrpc.php HTTP/1.0" 404 304 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:50 -0400] "GET /phpadsnew/adxmlrpc.php HTTP/1.0" 404 304 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:50 -0400] "GET /phpads/adxmlrpc.php HTTP/1.0" 404 301 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:50 -0400] "GET /Ads/adxmlrpc.php HTTP/1.0" 404 298 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:50 -0400] "GET /ads/adxmlrpc.php HTTP/1.0" 404 298 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:50 -0400] "GET /xmlrpc.php HTTP/1.0" 404 292 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:51 -0400] "GET /xmlrpc/xmlrpc.php HTTP/1.0" 404 299 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:51 -0400] "GET /xmlsrv/xmlrpc.php HTTP/1.0" 404 299 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:51 -0400] "GET /blog/xmlrpc.php HTTP/1.0" 404 297 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:51 -0400] "GET /drupal/xmlrpc.php HTTP/1.0" 404 299 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:52 -0400] "GET /community/xmlrpc.php HTTP/1.0" 404 302 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:52 -0400] "GET /blogs/xmlrpc.php HTTP/1.0" 404 298 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:52 -0400] "GET /blogs/xmlsrv/xmlrpc.php HTTP/1.0" 404 305 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:52 -0400] "GET /blog/xmlsrv/xmlrpc.php HTTP/1.0" 404 304 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:52 -0400] "GET /blogtest/xmlsrv/xmlrpc.php HTTP/1.0" 404 308 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:53 -0400] "GET /b2/xmlsrv/xmlrpc.php HTTP/1.0" 404 302 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:53 -0400] "GET /b2evo/xmlsrv/xmlrpc.php HTTP/1.0" 404 305 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:53 -0400] "GET /wordpress/xmlrpc.php HTTP/1.0" 404 302 "-" "-"
212.83.253.101 - - [19/Jun/2006:09:24:53 -0400] "GET /phpgroupware/xmlrpc.php HTTP/1.0" 404 305 "-" "-"


This is a probe, not an attack. There's nothing illegal about requesting files that aren't on my server, is there? But if I touch /var/www/html/adxmlrpc.php, we may find out what happens next. Note that most of these requests, while probing for different applications, share one thing in common: XML-RPC implemented in PHP.

Below is a chart of probes by date and request on this web server. There isn't enough space to list each request next to its color. (MS Excel shows me the data point details on mouseover in my pivot table.)


Attacks by Application
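
If the 404 lines were loaded into a database table, the counts behind a chart like this would come from a simple aggregation. A hedged sketch, assuming a hypothetical apache_404 table with request_time and request_path columns:

-- count probes by day and by requested path
SELECT CONVERT(VARCHAR(10), request_time, 120) AS probe_date, -- yyyy-mm-dd
       request_path,
       COUNT(*) AS probe_count
FROM apache_404
GROUP BY CONVERT(VARCHAR(10), request_time, 120), request_path
ORDER BY probe_date, probe_count DESC;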

Wednesday, June 14, 2006

Friendster and LinkedIn Meet Vonage and Skype, or Link Analysis and Transitive Closure

The ultimate social and business network would include more than just email contacts, as Friendster or LinkedIn do. It would include the people you call using Vonage and/or Skype. Using the Call Detail Records (as the National Security Agency might) to do what law enforcement calls "Link Analysis," a social or business network could connect you via phone numbers.

The telcos have been doing link analysis for years as part of their fraud detection programs, and what the NSA might be doing is not much different. Link analysis is really transitive closure, but most computer security and law enforcement people don't know relational algebra, so they call it link analysis.

Transitive closure (aka recursive closure), at its simplest is this: The transitive closure of relation [table] R with attributes [columns] (A1, A2) defined on the same domain is the relation R augmented with all tuples [rows] successively deduced by transitivity; that is if (a,b) and (b,c) are tuples of R, the tuple (a,c) is also added to the result. (From Connolly and Begg's Database Systems, in reference to Timothy Merrett's Relational Information Systems, 1984). Since I was interested in the relational algebra, I bought a "new" copy of Merrett's book from an Amazon reseller for $8. In defining closure, Merrett refers to Aho, Hopcroft and Ullman (1974), and says, "to do so here would involve too much of a mathematical digression." It's 2006, and a book published in 2005 references another book from 1984 (!) that refers to a book from 1974. The relational database model has not changed much since E.F. Codd's work in 1971. What has changed is the scalability of hardware that we use to run our relational database management systems.
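
In SQL terms, one step of that deduction is a self-join unioned with the original relation. A hedged sketch, treating R as a two-column table (A1, A2):

-- R plus every tuple (a,c) deduced from (a,b) and (b,c)
SELECT A1, A2 FROM R
UNION
SELECT r1.A1, r2.A2
FROM R AS r1
JOIN R AS r2 ON r1.A2 = r2.A1;

Full transitive closure repeats this step until no new tuples appear, which is exactly the iteration that needs controlling in the call-detail query later in this post.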

One example of transitive closure that project managers might understand is an exercise in the Merrett book: "Find the expression which gives PATHS the duration of all sets of activities in" a PERT chart, followed by the data for that chart. A query (or your relational algebra expression) would show all the paths through the network, and should probably show the critical path as well.

What makes the PERT chart example interesting is that it can show more than one path through a network between two nodes. When talking about link analysis using call detail records, many models show only single links between nodes. In Investigative Data Mining for Security and Criminal Detection, Jesus Mena lists a couple of COTS (commercial off-the-shelf) link analysis tools, ATAC and Analyst's Notebook. These systems can take call detail records, produce links and even chart them on graphs. Mena's book lists many tools, including some free applications and others with free demos. For the documentation of the tools alone, the book is worth the price. Mena's book details a lot of the history of AI and data mining in the security community, but it also muddles database terminology ("related relations," e.g.) to make it understandable to the law enforcement community. Despite this, Mena implies that law enforcement in the 21st century is going to need a lot more artificial intelligence and database experts.

Sample query to bring up people in your network:

-- first hop: people my target calls directly
SELECT callee
FROM cdr
WHERE caller = 'my_target_no'
UNION
-- second hop: people called by the people my target calls
SELECT callee
FROM cdr
WHERE caller IN (SELECT callee FROM cdr WHERE caller = 'my_target_no');

The trouble with this query, adapted from the manager-employee recursive example that everybody learns in database school, is that it would eventually return everyone with a telephone. The iterations must be controlled, so I would need to adapt the query above into a properly bounded recursive or iterative one if I were going to make it work on SQL 2005.
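
A hedged sketch of one way to bound it: SQL 2005's recursive common table expressions can carry a hop counter and cap it, assuming the same hypothetical cdr table with caller and callee columns:

WITH contacts (callee, hop) AS
(
    -- anchor: people my target calls directly
    SELECT callee, 1
    FROM cdr
    WHERE caller = 'my_target_no'
    UNION ALL
    -- recursive step: people called by anyone already in the network
    SELECT c.callee, ct.hop + 1
    FROM cdr AS c
    JOIN contacts AS ct ON c.caller = ct.callee
    WHERE ct.hop < 3 -- stop after three hops so the whole phone book doesn't come back
)
SELECT DISTINCT callee FROM contacts
OPTION (MAXRECURSION 10);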

Sunday, June 11, 2006

Connecting Sharepoint to SQL 2005 Report Server

It seemed simple: export OLAP reports from SQL 2005 Reporting Services into Sharepoint. I like Sharepoint because it solves a ton of problems in organizations. I'm still surprised at how many Microsoft shops don't use Sharepoint, because it's free and it integrates with Active Directory. (Sharepoint Portal Server, a different product, costs money, scales further and adds personalization.) All you need for Sharepoint is IIS and SQL Server or the MSDE, plus FrontPage 2003 if you want to edit graphics. Microsoft has a lot of Sharepoint resources available for download, but they're not well organized.

The details slowed me down a few hours. There are several different ways of configuring security contexts, and you will have to keep your accounts and passwords straight. I have yet to find a step-by-step on Technet, but I'm still looking. I did see a page showing cool OLAP reports in Sharepoint on Technet, but no link to help me set it up.

The biggest problem that I've seen many other folks have is the 400 Bad Request error in the /Reports (ReportManager) virtual directory. /ReportServer worked the first time, but without the ReportManager virtual directory, it's not so useful. At first I thought this was a DCOM security issue because of the event log entries I got. (Ten of these appeared on the first request for http://myreportserver/reports after restarting IIS, and then no more until the next IIS restart.)

The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID {BA126AD1-2166-11D1-B1D0-00805FC1270E} to the user NT AUTHORITY\NETWORK SERVICE SID (S-1-5-20). This security permission can be modified using the Component Services administrative tool.


The trouble with that message is that there's no DCOM component in Component Services that corresponds to the CLSID. This didn't stop me from searching the registry for a while, finding that the CLSID is involved with about a dozen basic network services, none of which are in the Component Services MMC.

I gave up searching the Registry, added NT Authority\Network Service to the DCOM users group on the local machine, and restarted IIS. No joy. That cleared the error out of the event log, but I still got the same error requesting http://myreportserver/Reports, just with no event log entries. I rechecked all the settings in SQL Server's Report Configuration Tool, which is very useful, but that still didn't solve the problem.

I googled the error text from the page's source code:

System.Net.WebException: The request failed with HTTP status 400: Bad Request.


and found a site at MIT concerning a totally unrelated application that threw the same error. I had one other virtual web on the machine, so I deleted it, reset my Default Web Site to All Unassigned IP addresses, and restarted IIS. Bingo. I can manage reports over the Web -- it just takes a while to start up the first time you request http://myreportserver/Reports. I can also access it from http://localhost/Reports on that box now; before, localhost requests failed and I didn't know why.

I still have to set the right permissions for everything. I also need to choose whether to share a data connection or use the web visitor's security context. Just listing all the security contexts makes me dizzy: The Sharepoint App Pool, the Report Server App Pool, SQL Report Server Data Sources, the DCOM permissions mentioned above, and finally, your users' accounts in Sharepoint and Reports.

Sharepoint doesn't hold the Report -- it just passes your request on to the Report Server. Thus, you'll need to set permissions for the Sharepoint and the SQL Report Server. If you have Sharepoint permissions but not Report Server permissions, the Report Explorer web part will be blank.

Steps that worked for me:
1. Start with a good SQL 2005 install with all necessary components -- like Reporting Services.
2. Install IIS and ASP.NET 2.0 if they're not installed already. I installed SQL 2005 Service Pack 1 after this step. (Make sure you have only a default web site in IIS to avoid the issues I ran into.)
3. Use the SQL 2005 Report Configuration Manager. You'll need to decide which security scheme you're going to use before you can complete this step. The Configuration Manager saves a lot of time because you won't have to touch IIS Manager. (The whole scripting-IIS-configurations-in-XML thing is going to make my IIS skills obsolete before long.)
4. Create a simple report. SQL Books Online has a tutorial using the Adventure Works database.
5. Verify that http://yourreportserver/Reports and http://yourreportserver/ReportServer work.

Now move to your Sharepoint box running WSS.

6. Use stsadm.exe to install the web parts. You will find the Report Explorer and Report Viewer web parts on your SQL box (search the Reporting Services documentation for Sharepoint for more details):
C:\Program Files\Microsoft SQL Server\90\Tools\Reporting Services\SharePoint\RSWebParts.cab
7. Open your SharePoint site and add the Report Explorer Web Part from the Virtual Server Gallery.
8. Point the Report Explorer at http://yourreportserver/Reports and leave the start path blank for now.
9. You should be able to see your SQL Reports on your Sharepoint site.

My example runs on two boxes: SQL 2005 and Reporting Services/IIS on one box (along with the Exchange 12 Beta), and Sharepoint on another box. Sharepoint doesn't seem to run on the same box as the Exchange 12 Beta.

Friday, June 9, 2006

Your (Firewall) Data are Ugly. Please Fix It.

Data warehousing and data marts would be simple to construct if only the data were in a standard format. Five years from now, businesses will take OLAP for granted. (OLAP is a fancy way of saying we're going to automate the sums and averages of your sales data over time so you don't have to do all that stuff in Excel any more.) Five to ten years from now, businesses will live or die by their data mining algorithms. (I classify DM as a step above standard OLAP.) Before this can happen, the data have to be available in a usable form.
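
To make that concrete, here is a hedged sketch of the kind of rollup OLAP automates, against a hypothetical sales table with sale_date, region and amount columns:

-- the sums and averages you would otherwise build by hand in Excel
SELECT YEAR(sale_date) AS sale_year,
       MONTH(sale_date) AS sale_month,
       region,
       SUM(amount) AS total_sales,
       AVG(amount) AS avg_sale
FROM sales
GROUP BY YEAR(sale_date), MONTH(sale_date), region;

An OLAP cube essentially precomputes aggregations like these so nobody has to rebuild them for every question.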

I come from an information security background, so I spend far too much time poring over computer logs: web server access logs, firewall logs, Windows event logs, not to mention /var/log/*. I have learned lots of stupid log tricks, like using logwatch, grep (my favorite), Snare to send Windows logs to syslog, and now Microsoft's free Logparser tool. Logparser has poor documentation but will certainly pay you back for the time it takes to learn it. There's even a non-Microsoft site dedicated to Logparser.

Note: syslog does not store data in 3NF rows. If you want to be able to sort by fields like destPort, sourcePort, sourceIP and destIP without doing text searches, you'll be doing a LOT of ETL work.
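
For what it's worth, here is a hedged sketch of the kind of table that ETL work would load into; the names are hypothetical, not any product's schema:

CREATE TABLE firewall_log (
    log_id     INT IDENTITY PRIMARY KEY,
    log_time   DATETIME NOT NULL,
    sourceIP   VARCHAR(15) NOT NULL,
    destIP     VARCHAR(15) NOT NULL,
    sourcePort VARCHAR(5) NULL,  -- stored as text; see the June 6 post on the sum of all ports
    destPort   VARCHAR(5) NULL,
    action     VARCHAR(10) NULL, -- drop, reject, accept
    raw_line   VARCHAR(500) NULL -- keep the original syslog text for reference
);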

This week I was thinking about replacing my firewall/router (a Netopia R9100 with the hardware VPN upgrade, which I trade off with a Linksys WRT-54GS (v3) when I'm not paranoid about using wireless). And yes, I'm not supposed to tell you that, but it doesn't really make a difference if we're both using nmap. So I looked at firewall vendors' websites to learn what I could about their logging capabilities. I'm slightly less concerned about security in my home lab than I am about collecting data on attacks. Firewalls have been around for over ten years now, so you'd think the vendors would have logging down.

Watchguard: several logging options, including syslog and XML, SNMP costs extra.

Juniper/NetScreen: syslog, SNMP, NetIQ (If I feel like paying for that, too.)

Checkpoint: "Eventia Reporter™ is a complete reporting system that delivers in-depth network security activity and event information from Check Point log data." This means I can look at CheckPoint logs, but I can't correlate them to anything else. This Checkpoint vs. Cisco page is also interesting.

SonicWall: "ViewPoint®, Local Log, Syslog, WebTrends" I can pay extra for SonicWall's "ViewPoint" product, but I still can't correlate SonicWall logs to any other logs. One SonicWall model includes a "secure" switch in the firewall: I would love to see what happens when I try an ARP spoof. (If I wanted a switch, I would buy one.)

Cisco PIX: SNMP, Syslog, and AAA ("Authentication, Authorization, and Accounting Support") It does Cisco logging. It also has a CLI. (Command-Line Interface.) Unless Cisco starts giving me free hardware, I'm not sure why I'd use a PIX. If I blow a command, my network is not secure. A CLI is fine when it's obvious if a command is working or not, as with routing, but with firewalls, it makes me nervous. Then again, you should test every port after entering a rule change on your firewall.

Microsoft ISA Server: "ISA Server 2004 provides detailed security and access logs in standard data formats, such as delimited text files, Microsoft SQL Server databases, or SQL Server 2000 Desktop Engine (MSDE) databases."

I don't even like software firewalls, but Microsoft makes it easy for me. At $1,500 plus $250 for decent software, Watchguard is more expensive than ISA server. Checkpoint and Juniper won't even tell me how much their products cost. Sonicwall, Watchguard, and ISA Server are all priced on CDW.

If firewall data are this disparate, I can't imagine what a pain it must be to build data warehouses with data from other sources. Current firewall products seem to create their own silos and make it difficult to track intruders across a network rather than just at the perimeter.

Tuesday, June 6, 2006

The Sum of All Ports, coming to a SQL server near you.

Using syslog, MS SQL 2005, SQL Server Analysis Services, and MS Excel, I can build a cube from my firewall log violations, import the cube into Excel, and produce pivot tables. While this might seem more complicated than it needs to be, I could produce a daily scorecard of attacks. The only catch is that I need a firewall that logs to SQL Server, or a syslog-to-SQL Server connector. The syslog => SQL connection would be tough because my router/firewall doesn't produce uniform syslog notifications. I know enterprise-level firewalls do much better logging, like the Watchguard X-series, which I was fond of just because I could make them do almost anything. The last time I checked, though, they still cost $1,500 for the base model plus $500 for the appropriate software.

With the Watchguard's new XML logging, I could create a SQL Server Integration Services package to import the data regularly. From there, I could get SQL Server Analysis Services to process my cube each night. Then I could use Microsoft Sharepoint's Scorecard or OLAP web part to display the statistics. Best of all, I wouldn't have to mess with doing my own manual extract-transform-load (ETL) of my router log data.

The graph below represents a simple count of attacks by port on my router. Port 0 corresponds to ICMP. (I don't respond to ping requests.) The rest of the ports are closed, except for port 80, which you're using now. I ban a few IPs on port 80 because they won't stop posting junk trackbacks onto my blog. The ports appear in alphabetical order rather than numerical order because I have to store them in text fields rather than numeric fields in the database. If the port numbers aren't text, SSAS will OLAP them and I'll end up with the sum of all ports, which is nonsense but nevertheless might make a good statistic for MBA types. While the graphic may not be all that impressive, the scalability is. Using SQL and SSAS, I could track probes and attacks on hundreds of firewalls at a time, track trends over time, and even predict the level of future probes.

Probes by Port
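
For the curious, here is a hedged sketch of the query behind a graph like this, assuming a hypothetical firewall_log table where destPort is stored as text:

-- count dropped packets by destination port; sorting a text column is
-- what puts the ports in alphabetical rather than numerical order
SELECT destPort, COUNT(*) AS probe_count
FROM firewall_log
WHERE action = 'drop'
GROUP BY destPort
ORDER BY destPort;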

Monday, June 5, 2006

Assessing Attacks; or 18th Century Epistolary Novels vs. Data Structures

Being assigned a data warehousing/data mining project for class sounds like fun, but where am I supposed to get a data set? I can buy a database of all area codes and exchanges with latitude and longitude, but I would still have to simulate a hundred million records to address scalability and query optimization issues. Then I could find out whether my estimates of record sizes are within a factor of ten, but the networks I see still wouldn't be "real" and I would have no idea whether that's what real social networks look like. (As an undergrad English Lit major, I was reading 18th Century epistolary novels instead of taking Data Structures like my Computer Science major classmates. The sad part is that Data Structures would have been more interesting.)

Fortunately, data magically appear on my Linux box every day.

Each morning at four a.m., logwatch runs on my Fedora Core 4 (Red Hat Linux) box. It tells me how many times nonexistent files on my web server have been requested, and how many router firewall violation attempts have been logged. It also tells me how many times Apache logged a "method not allowed" 405 code. I have several daily log files that give me useful information on attacks. The problem is that there are so many attacks that if I banned every IP that looked for a web application hole or probed a port, I wouldn't have time for anything else.

So it makes sense to look for attack source IP (Internet Protocol) addresses that probe my router AND request holes in web apps. To do this I need three files: my router log from syslog, and two greps of all my Apache logs, one looking for 404 errors and one for 405 errors. (grep -h will suppress the file names at the beginning of each line.) This gives me three tables, on which I can do inner joins by source IP, as sketched below. Of course, I have to do some tedious data cleanup to get the text log files into Excel and from there into Access. (I always underestimate the time it takes to clean up data.) From Access, I'm going to go to SQL 2005 and Analysis Services and build a cube. From there I should be able to "see" the attacks using pivot tables in Microsoft Excel.
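
The correlation itself is just a join. A hedged sketch, assuming hypothetical router_log, apache_404 and apache_405 tables that each carry a sourceIP column:

-- source IPs that show up in the router log AND both sets of Apache errors
SELECT DISTINCT r.sourceIP
FROM router_log AS r
JOIN apache_404 AS a4 ON a4.sourceIP = r.sourceIP
JOIN apache_405 AS a5 ON a5.sourceIP = r.sourceIP;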

If I see a source IP in my router log and Apache error logs, then it's probably worth banning. Correlating IP addresses to identify those involved in multiple methods of attack takes me from hundreds of IP addresses down to six.