Thursday, July 20, 2006

Google vs. the National Security Agency

Apparently, watching Google is now as much sport as watching the NSA, according to the latest story in Baseline Magazine. Finding out how Google solves data-related problems may be more interesting because Google, unlike the NSA, is not encumbered by government contracting procedures and regulations.

Think about it: you can search the web with Google and find files faster than you can when you're looking for files on your own computer using Windows Search. To learn why it takes longer to search a hard disk on your own *#$% computer than the web, read the Baseline story.

The National Security Agency and Google are in the same business, essentially: take a firehose spitting out information and sort it into something useful. Both the NSA and Google keep their collective mouths shut about sources and methods. The NSA has been slightly better about keeping its purchases of high bandwidth out of the news, but only because it has the organizational advantage of operating outside the traditional business community (assuming Watkins-Johnson is not a normal business).

The Baseline story estimates the number of Google servers at around 450,000, but you should think of them as a much smaller number of MPP (massively parallel processing) supercomputers. Google initially had trouble because most data centers couldn't deliver enough watts per square foot to power dense blade server environments, so Google turned to AMD processors. That's a process of scaling up computing power, and I wonder how the NSA solved the same problem, although I assume they just pumped in more watts for processors and cooling.

Those of you familiar with Microsoft's current file system, NTFS, may know that you can set the disk cluster size anywhere from 4 Kbytes to 64 Kbytes. Google's file system uses 64-Mbyte chunks, its equivalent of a cluster. Google's files are large, and with large files a large chunk size means far fewer units to track, which makes the whole system more efficient. Google has re-engineered its kernel, its file systems, and who knows what else for scalability. Did they re-engineer from the ground up more efficiently than the NSA?
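
To put numbers on the cluster-size comparison, here is a small Python sketch that counts how many allocation units a 1-TiB file needs at each size. It treats NTFS clusters and GFS chunks as simple fixed-size units and ignores metadata overhead, small-file waste, and everything else a real file system does; `units_needed` is a hypothetical helper, not part of any Google or Microsoft API.

```python
KIB = 1024
MIB = 1024 * KIB
TIB = 1024 ** 4

def units_needed(file_size, unit_size):
    """Number of fixed-size units (clusters or chunks) needed to hold a file.

    Uses ceiling division, because a partially filled unit still counts.
    """
    return -(-file_size // unit_size)

for label, unit in [("NTFS, 4-KB clusters", 4 * KIB),
                    ("NTFS, 64-KB clusters", 64 * KIB),
                    ("GFS, 64-MB chunks", 64 * MIB)]:
    print(f"{label}: {units_needed(TIB, unit):,} units per TiB")
# NTFS, 4-KB clusters: 268,435,456 units per TiB
# NTFS, 64-KB clusters: 16,777,216 units per TiB
# GFS, 64-MB chunks: 16,384 units per TiB
```

Going from 4-Kbyte clusters to 64-Mbyte chunks cuts the count by a factor of 16,384, so there is far less bookkeeping per file.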

Another facet of the Baseline Google story is the office-in-a-box. As a former IT contractor for political campaigns, I had to figure out the cheapest, fastest way to set up a computing infrastructure for a field office in, say, Des Moines, Manchester, or Columbia. (Columbia is the capital of South Carolina, for those of you out of the primary calendar loop.) My setup was fairly simple: router, firewall, server (Domain Controller also running DHCP, DNS, and a Global Catalog, of course), and printer/copier.

Google has office IT-in-a-box that would put mine to shame. Google also has shipping containers converted into server infrastructures that it can ship anywhere. Baseline implies that the military's IT-infrastructures-in-a-shipping-container exist in PowerPoint only.

As far as the Google vs. NSA operating efficiency battle goes, at least there's more than one career option for deep geeks. I would have a hard time deciding between the two because they both offer serious computing power. As far as ethical considerations go, both have pluses and minuses. The NSA doesn't make money selling advertising; however hard Google tries, running a business requires some level of compromise to make money. The power of both organizations could be abused. The way things are going in the Intelligence Community, Google will be more secret than the NSA in five years.

What do you think?

Monday, July 17, 2006

Java vs. Python/Plone vs. PHP/XOOPS vs. J2EE vs. .Net

eWeek had one of the coolest lab tests I've seen in a while last week: they tested portal applications for speed on different architectures. This is exactly the kind of testing I'd do in my little lab if I had the resources.

eWeek tested JBoss on Windows, Plone on Windows, XOOPS on Windows, Plone on Linux (SUSE), XOOPS on Linux (SUSE), JBoss on Linux (CentOS), Liferay on CentOS, and Sharepoint Portal Server on Windows. The results were mixed: .Net had the highest throughput in KB per second, JBoss on Windows the highest number of transactions per second, J2EE/Liferay/Linux the most hits per second, and JBoss on Windows the lowest page load time. .Net and Windows JBoss were among the fastest portal applications, but the various LAMP flavors did OK, especially with some help to speed them up, such as Zend Technologies' PHP accelerator.
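
Part of why the results look mixed is that each metric rewards something different. The Python sketch below is not eWeek's methodology; it just summarizes a list of request timings three ways. The `Sample` records and the two servers are made up for illustration: server A serves small pages quickly, server B serves large pages more slowly.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    start: float  # seconds since the test began, when the request was sent
    end: float    # seconds since the test began, when the last byte arrived
    nbytes: int   # size of the response in bytes

def summarize(samples):
    """Return hits/sec, KB/sec, and mean page load time for one test run."""
    duration = max(s.end for s in samples) - min(s.start for s in samples)
    return {
        "hits_per_sec": len(samples) / duration,
        "kb_per_sec": sum(s.nbytes for s in samples) / 1024 / duration,
        "mean_load_s": sum(s.end - s.start for s in samples) / len(samples),
    }

# Hypothetical server A: four 10-KB pages in two seconds.
server_a = [Sample(0.0, 0.5, 10240), Sample(0.5, 1.0, 10240),
            Sample(1.0, 1.5, 10240), Sample(1.5, 2.0, 10240)]
# Hypothetical server B: two 100-KB pages in two seconds.
server_b = [Sample(0.0, 1.0, 102400), Sample(1.0, 2.0, 102400)]

print("A:", summarize(server_a))
print("B:", summarize(server_b))
```

Server A wins on hits per second and page load time, while server B moves five times as many KB per second, which is the same kind of split eWeek reported across platforms.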

The main point of all this testing is that no single portal architecture is necessarily better than the rest. If your organization has expertise in a specific portal platform, then that architecture will work for you. The key is to choose the architecture that matches your business.

Information Technology is changing, as always. What matters most is not the technology -- it's your understanding of your business and how to make that work faster and cheaper. IT and IS are just tools for your business.

Sunday, July 9, 2006

Sharepoint Version 3 Beta Install

Microsoft's future for the Office suite emphasizes online collaboration. Its purchase of Groove, along with the upcoming Office Groove and Groove Server, shows that Microsoft sees collaboration, online and off, as key to its future success. (Sign up for the free Office 2007 Beta already -- Microsoft is pushing it pretty hard.)

I have been using Sharepoint since version one to help clients solve simple problems with their online office space, including collaboration. I have used Sharepoint for everything from hosting photo albums and documents to really obvious things like "click here to connect to printer x in room y." I'm trying to see what Groove has to offer, but first I'm playing around with Sharepoint Version 3.

Sharepoint Version 3 Beta checks for three prerequisites before it installs: .Net Framework 2.0 (it's a .Net 2.0 application), ASP.NET 2.0, and Windows Workflow Foundation Beta 2, v.3.0.3807.7 or above. Windows Workflow Foundation is the new workflow framework that will be available in Office 2007.
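
The "or above" part of the Workflow Foundation requirement is a version comparison, which is easy to get wrong if it is done on strings. Here is a minimal Python sketch of such a prerequisite check, assuming plain dotted numeric versions. It is not how the Sharepoint installer actually performs the check; `meets_minimum` is a hypothetical name, and "3.0.3807.10" is a made-up build number used only to show the pitfall.

```python
MIN_WORKFLOW_VERSION = "3.0.3807.7"  # minimum build quoted by the installer

def parse_version(text):
    """Turn a dotted version like '3.0.3807.7' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def meets_minimum(installed, minimum=MIN_WORKFLOW_VERSION):
    # Tuples compare element by element, so build 3807.10 correctly beats 3807.7.
    return parse_version(installed) >= parse_version(minimum)

print(meets_minimum("3.0.3807.7"))    # True
print(meets_minimum("3.0.3807.10"))   # True
print(meets_minimum("3.0.3790.0"))    # False
print("3.0.3807.10" >= "3.0.3807.7")  # False: plain string comparison gets it wrong
```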

Once you have the installer running, it tells you that it can do an in-place upgrade of your Sharepoint v.2 site if your database is under 30 GB. However, it will need to restart IIS, Sharepoint, and the Sharepoint timer during the upgrade. This is where I ran into the first of two problems. I didn't look at all four tabs of the installer, so I neglected to specify that my Sharepoint was front-end only; its database is hosted on SQL 2005. As a result, the post-install script choked at step 5 of 10, stalling on registering SP services. The pre-upgrade script posts details in \PreupgradeReport_632880453377812500_Log.txt and PreupgradeReport_632880453377812500_Summary.xml, proving that in the future everything has an XML file.
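
The long number in those report file names looks like a .NET DateTime tick count (100-nanosecond intervals since January 1 of year 1). That is an assumption, not something the installer documents, but decoding it that way gives a timestamp on the date of this post, which makes it easy to match a report to an install attempt. The Python sketch below does the conversion; whether the ticks are local time or UTC isn't known, and `ticks_to_datetime` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def ticks_to_datetime(ticks):
    """Convert a .NET-style tick count to a Python datetime.

    A tick is 100 nanoseconds counted from 0001-01-01 00:00:00 in the
    proleptic Gregorian calendar, which datetime(1, 1, 1) also uses.
    Python datetimes stop at microseconds, so divide the ticks by 10.
    """
    return datetime(1, 1, 1) + timedelta(microseconds=ticks // 10)

log_name = "PreupgradeReport_632880453377812500_Log.txt"
ticks = int(log_name.split("_")[1])  # the number between the underscores
print(ticks_to_datetime(ticks))      # 2006-07-09 12:35:37.781250
```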

The other problem was a result of my original Sharepoint v.2 install. Sharepoint v.3 will not let you connect to a configuration database with an SQL account; you must use a domain account. I didn't want to use NT Authority\Network Service, so I tried to change the SQL 2005 permissions for the configuration database, to no avail. I ended up running a SQL Profiler trace of the connect step to see what I could change in SQL to make Sharepoint use a different account. The result is a little opaque: a lot of exec sp_resetconnection and exec dbo.proc_getObject @Id='68430B8A-6365-44B4-99E2-CC842773FCDA' calls, which return rows like:
68430B8A-6365-44B4-99E2-CC842773FCDA 8446FC57-4D84-4D79-8EA9-4B1C9C02C40C 9920F486-2FF4-4D10-9532-E01979826585 Central Administration
and more, none of which helped much. The Sharepoint Version 3 help was no aid here either.

Since I hadn't done much with the old Sharepoint except install the SQL Report Server web parts, I created a new site. The Sharepoint installer had already nuked my Default Web Site, so I didn't feel I had much to lose. To reconnect to the old database, I would have had to reset the password on the NT Authority\Network Service account. Once you have a configuration database, you can use SQL accounts for the individual web site connections.
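
The difference between the two kinds of accounts shows up in the connection string. As an illustration only, here are ADO.NET-style SQL Server connection strings built in Python: `Integrated Security=SSPI` logs in as the Windows (domain) account the process runs under, while `User ID`/`Password` uses a SQL login. The server, database, and login names are hypothetical, and this sketch does not show how Sharepoint stores its own connection settings.

```python
def sql_connection_string(server, database, sql_login=None, password=None):
    """Build an ADO.NET SqlClient-style connection string (illustrative only)."""
    parts = [f"Data Source={server}", f"Initial Catalog={database}"]
    if sql_login is None:
        # Windows authentication: the domain account the process runs as.
        # Per the post, this is the only option for the configuration database.
        parts.append("Integrated Security=SSPI")
    else:
        # SQL Server authentication with a SQL login; per the post, this is
        # acceptable for individual web site (content) database connections.
        parts.append(f"User ID={sql_login}")
        parts.append(f"Password={password}")
    return ";".join(parts) + ";"

# Hypothetical names, for illustration only.
config_cs = sql_connection_string("SQLBOX", "SharePoint_Config")
content_cs = sql_connection_string("SQLBOX", "WSS_Content", "wss_user", "secret")
print(config_cs)
print(content_cs)
```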

Installing the web parts again was no problem. I used the same web parts from the SQL 2005 install that I had used for Sharepoint v.2. Our old friend stsadm.exe hasn't changed; assuming Sharepoint and the SQL 2005 Reporting Services files are on the same box, the command is:
"C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN\STSADM.EXE" -o addwppack -filename "C:\Program Files\Microsoft SQL Server\90\Tools\Reporting Services\SharePoint\RSWebParts.cab"
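
If you script this install, passing the command as an argument list sidesteps the need to quote paths that contain spaces, such as "Program Files". A minimal Python sketch, assuming the default paths above; `addwppack_command` is a hypothetical helper, and the `subprocess.run` call is left commented out because it only works on the Sharepoint server itself.

```python
import subprocess

STSADM = r"C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN\STSADM.EXE"
CAB = r"C:\Program Files\Microsoft SQL Server\90\Tools\Reporting Services\SharePoint\RSWebParts.cab"

def addwppack_command(cab_path, stsadm=STSADM):
    """Build the stsadm addwppack call as an argument list.

    Each list element is passed as a single argument, so the spaces
    in the paths need no manual quoting.
    """
    return [stsadm, "-o", "addwppack", "-filename", cab_path]

cmd = addwppack_command(CAB)
# Show the equivalent Windows command line, with the paths quoted.
print(subprocess.list2cmdline(cmd))
# On the Sharepoint server itself:
# subprocess.run(cmd, check=True)
```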

There's a lot more to the Sharepoint 3 admin tool: you can select specific users for inbound and outbound mail, for instance. You can also back up and restore content databases and configure antivirus settings. There's even a built-in best practices analyzer tool.

Finally, there's a recycle bin. Now Sharepoint administrators won't have to figure out ways of protecting content from users. Next installment: how granular are the permissions?