Wednesday, May 17, 2006

More on How the NSA Tracking System Might Work

The Washington Post's David Ignatius speculated about how the National Security Agency's call-tracking system might work, and in doing so he gave an excellent example of data mining. What the NSA is trying to do is both simple and complex: the data structure is simple, but the sheer volume of records makes the problem complex.
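Here is a minimal Python sketch of what "the data structure is simple" might mean. It assumes each call record boils down to a (caller, callee) pair. The record layout, the `build_profiles` function and the 555-01xx numbers are hypothetical; nothing here describes how the NSA actually stores its data. Each number's calling profile is just a count of whom it called. The hard part is that a real system would hold billions of these records.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical call record: (caller, callee). Real records would carry more
# fields (timestamps, durations, cell towers); they are left out here.
CallRecord = Tuple[str, str]


def build_profiles(records: Iterable[CallRecord]) -> Dict[str, Counter]:
    """Map each calling number to a Counter of the numbers it called."""
    profiles: Dict[str, Counter] = defaultdict(Counter)
    for caller, callee in records:
        profiles[caller][callee] += 1
    return dict(profiles)


records = [
    ("555-0100", "555-0111"),  # old phone -> spouse
    ("555-0100", "555-0122"),  # old phone -> office
    ("555-0100", "555-0111"),  # old phone -> spouse again
    ("555-0199", "555-0133"),  # an unrelated number
]
profiles = build_profiles(records)
print(profiles["555-0100"])  # Counter({'555-0111': 2, '555-0122': 1})
```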

The problem may seem hopelessly complex, but if you use common sense, you can see how the NSA has tried to solve it. Suppose you lost your own cellphone and bought a new one, and people really needed to find out that new number. If they could search all calling records, they would soon find a number with the same pattern of traffic as your old one -- calls to your spouse, your kids, your office, your golf buddies. They wouldn't have to listen to the calls themselves to know it was your phone. Simple pattern analysis would be adequate -- so long as they had access to all the records.


The trouble is that simple pattern analysis isn't so simple once you try to code it. Comparing every number against every other number would mean treating the records as a giant data cube and comparing millions of slices with each other. On the other hand, if you start from one target number, one query can pull all of its callees, and a second query can find the other numbers that call those same callees. You could then give each of those candidate numbers a score between 0 and 1, with 1 meaning its calling pattern is just like the original number's.
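The two-query approach and the 0-to-1 score can be sketched as follows. This is only an illustration under stated assumptions: it uses Jaccard similarity (shared callees divided by all callees of either number) as the score, which is one simple choice and not necessarily what any real system uses, and the `score_candidates` function, its input format and the phone numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def score_candidates(target: str, callees: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score other numbers by how closely their callees match the target's.

    callees maps each number to the set of numbers it called (hypothetical
    input format). The score is the Jaccard similarity |A & B| / |A | B|,
    which lies between 0 and 1; 1 means exactly the same set of callees.
    """
    # Inverted index: callee -> set of numbers that called it.
    callers_of: Dict[str, Set[str]] = defaultdict(set)
    for caller, called in callees.items():
        for c in called:
            callers_of[c].add(caller)

    # Query 1: pull all of the target's callees.
    target_set = callees.get(target, set())

    # Query 2: find every other number that called at least one of them.
    candidates: Set[str] = set()
    for c in target_set:
        candidates |= callers_of[c]
    candidates.discard(target)

    # Score each candidate against the target's calling pattern.
    scores: Dict[str, float] = {}
    for cand in candidates:
        other = callees[cand]
        scores[cand] = len(target_set & other) / len(target_set | other)
    return scores


calls = {
    "555-0100": {"555-0111", "555-0122", "555-0144"},  # old phone
    "555-0177": {"555-0111", "555-0122", "555-0144"},  # new phone, same pattern
    "555-0188": {"555-0122", "555-0155"},              # a coworker
}
print(sorted(score_candidates("555-0100", calls).items()))
# [('555-0177', 1.0), ('555-0188', 0.25)]
```

Only numbers that share at least one callee with the target are ever scored, which is what keeps this far cheaper than comparing every slice of the cube with every other.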

If voice matching can confirm that a high-scoring number really belongs to the target, you could design an artificial neural network that learns from each confirmation as it targets new numbers. Voice matching would require eavesdropping, but for a score close to 1 it would be worth the trouble. Over time, the neural network could learn which score between 0 and 1 should trigger voice matching.
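As a rough sketch of that learning step, the smallest possible neural network is a single sigmoid neuron, which is the same thing as logistic regression. This version is trained by gradient descent on (score, confirmed) pairs, where "confirmed" stands for a voice-matching result. The training history, learning rate, epoch count and function names are all hypothetical. The learned trigger point is the score at which the neuron's output crosses 0.5.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_trigger(samples: List[Tuple[float, int]],
                  lr: float = 1.0, epochs: int = 20000) -> Tuple[float, float]:
    """Fit p(confirmed | score) = sigmoid(w * score + b) by gradient descent.

    samples are (pattern_score, label) pairs; label is 1 if voice matching
    confirmed the number and 0 if it did not (hypothetical training data).
    """
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in samples:
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. w*x+b
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def trigger_threshold(w: float, b: float) -> float:
    """Score at which the neuron outputs 0.5, i.e. where w * x + b == 0."""
    return -b / w


# Hypothetical history. The labels overlap near the middle (0.4 confirmed,
# 0.6 not), so the fit has a finite optimum instead of w growing forever.
history = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 1),
           (0.6, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
w, b = train_trigger(history)
print(round(trigger_threshold(w, b), 2))  # 0.5
```

A real system would presumably feed the network more than one input (call timing, duration, location), but even this one-input version shows how confirmed and rejected matches can move the trigger point.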
