Wednesday, June 14, 2006

Friendster and LinkedIn Meet Vonage and Skype, or Link Analysis and Transitive Closure

The ultimate social and business network would include more than just email contacts as Friendster or LinkedIn do. It would include the people you call using Vonage and/or Skype. Using the the Call Detail Records (as the National Security Agency might) to do what law enforcement calls "Link Analysis," a social or business network could connect you via phone numbers.

The telcos have been doing link analysis for years as part of their fraud detection programs, and what the NSA might be doing is not much different. Link analysis is really transitive closure, but most computer security and law enforcement people don't know relational algebra, so they call it link anlysis.

Transitive closure (aka recursive closure), at its simplest is this: The transitive closure of relation [table] R with attributes [columns] (A1, A2) defined on the same domain is the relation R augmented with all tuples [rows] successively deduced by transitivity; that is if (a,b) and (b,c) are tuples of R, the tuple (a,c) is also added to the result. (From Connolly and Begg's Database Systems, in reference to Timothy Merrett's Relational Information Systems, 1984). Since I was interested in the relational algebra, I bought a "new" copy of Merrett's book from an Amazon reseller for $8. In defining closure, Merrett refers to Aho, Hopcroft and Ullman (1974), and says, "to do so here would involve too much of a mathematical digression." It's 2006, and a book published in 2005 references another book from 1984 (!) that refers to a book from 1974. The relational database model has not changed much since E.F. Codd's work in 1971. What has changed is the scalability of hardware that we use to run our relational database management systems.

One example of transitive closure that project managers might understand is an exercise in the Merrett book: "Find the expression which gives PATHS the duration of all sets of activities in" and lists the data for the PERT chart. A query (or your relational algebra expression) would show all the paths through the network, and should probably show the critical path as well.

What makes the PERT chart example interesting is that it can show more than one path through a network between two nodes. When talking about link analysis using call detail records, many models show single links between nodes. In Investigative Data Mining for Security and Criminal Detection, Jesus Mena lists a couple of COTS off-the-shelf link analysis tools, ATAC and Analysts' Notebook. These systems can take call detail records and produce links and even chart them on graphs. Mena's book lists many tools, including some free applications and others with a free demo. For the documention of the tools alone, the book is worth the price. Mena's book details a lot of the history of AI and datamining in the security community, but it also confuses database terminology (related relations, e.g.) to make it understandable by the law enforcement community. Despite this, Mena implies that law enforcement in the 21st Century is going to need a lot more artificial intelligence and database experts.

Sample query to bring up people in your network:

SELECT callee
FROM table.cdr
WHERE callee = 'my_target_no'
UNION
SELECT callee
FROM table.cdr
WHERE caller = (SELECT callee FROM table.cdr WHERE caller = 'my_target_no');

The trouble with this query, adapted from the manger-employee recursive example that everybody learns in database school, is that it would eventually return everyone with a telephone. Thus the iterations must be controlled, and I need to adapt the query above from a recursive query into an iterative one if I were going to make it work on SQL 2005.

No comments:

Post a Comment