Friday, August 4, 2006

Services for Unix in Six Easy Steps

After tiring of redoing samba.conf files over and over again, I finally tried out Microsoft's (free) Services for Unix for simple file sharing between my Fedora Core 4 box and servers on my domain.

1. Copy over /etc/passwd and /etc/group to a secure folder on the windows machine where you are going to install SfU.
2. Download and install Microsoft's Services for Unix, and tell it to use password files during the installation process. (This is not a lesson on setting up a NIS domain -- just connecting one Linux box to your Windows server quickly and reliably. SfU installs several other items by default, like Unix Perl and grep; ActiveState Perl is optional. You want NFS Server.) Reboot.
3. In Windows, right click on the folder you'd like to share, click and NFS sharing tab, and select "share this folder."
4. On your linux box, as root, add a line like this to /etc/fstab:
myserver.com:/somefolder /somefolder nfs defaults 0 0
5. On linux, mount /pub
6. cd /pub and ls -la to your heart's content.

There are many security implications for Windows and Linux, like sharing your password and group files, and I'll sniff and trying cracking the passwords later.

Thursday, August 3, 2006

Business Intelligence Studio: A Wizard for your Data Mart

My grad school project group finished our data mart for class. I learned a few things about SQL Server 2005 Analysis Services, Reporting Services, and Business Intelligence Studio along the way. One of our group members is a SAS programmer, so he provided us with simulated data: attendance records for a theoretical amusement park that included zip code and promotion type, with zip-code ACORN differences. Our idea was that if you came to the park with a coupon, we would know from the bar code where you came from; if you didn't have a coupon, the gate would ask you your zip code, just like at the retail store.

We had six million rows in our fact table, which included dateID, promotionID, zipID, and attendance, giving us three dimensions. We were going more for scalability rather than trying to pile in a lot of dimensions. I took the flat files from our SAS master and imported them into a SQL 2005 database. For some reason, SQL 2005's import tool defaults to a nvarchar(50) type. While six million records isn't much, the records were fixed-size and much smaller, so I was able to stuff them in an nchar type that was small and efficient because it doesn't require an offset column array. Our six million fact table records matched to 5 types of promotion, about 10,000 zip codes, and individual days for 3 years.

Once we created the database, I exited SQL Server Management Studio and opened up a new Business Intelligence Studio project. One difference between Management Studio and BI Studio is that BI Studio does not use SQL authentication. You must use Windows/Domain accounts. There is no sa in Business Intelligence Studio.

Once in BI Studio,you create a data source, just like with any project that involves a database. Then you create a data source view by selecting the tables you want to use and creating a dimensional model diagram. Microsoft calls this their Unified Dimensional Model. While your fact table doesn't need a primary key, your dimension tables do. The arrows should be pointing from your fact table to the dimension tables. I don't know why, but I often get this wrong when I'm creating a new data source view.

Finally, your fact table columns need to reference the right dimensions. Matching these up took a little longer than expected because I had to guess which fields were what, because our SAS genius didn't document which numbers were what. I looked at the 8-digit numbers and guessed wrong. The auto-build cube wizard worked fine, and the dimensions looked OK. Processing the cube failed when it timed out, and the error messages didn't say anything about orphaned fact rows.

Once the columns were matched up correctly, everything worked fine, and the auto-build cube wizard is pretty impressive. If you did everything right, building the cube should give you dimensions. You can create new dimensions if the wizard missed some. The time dimension has many options. At first I set time to be a regular dimension to avoid promblems: is day of week a number or a name? Monday or 1? While this will give you good reports, the order of weekdays and months will be alphabetical. (Which makes be think we should rename all month and day names so they can be alphabetical and still be in order, or re-sort them so that April is the first month and Friday the first day of the week.) Setting the time dimension as a time dimension rather than a "standard" dimension will fix this.

Processing the cube took under five minutes. (You also need apppropriate account permissions to process the cube) If the cube processes sucessfully, you can browse pivots immediately with the browser in BI Studio. Once you see that your dimensions work, you can start connecting to SQL Server Analysis services from Excel or from SS Report Server. (Again, with appropriate Windows Domain accounts.)

Microsoft's BI Studio Cube and Dimension wizards work the way they should: if you have defined your dimensional model accurately and your data are valid, they'll create the right dimensions for you. The wizards are no substitute for a valid dimensional model and they can't fix bad data or orphaned records.

Next: Creating Pivot Table Reports using our cube and SQL Server Reporting Services for IIS.