Ibrahim Hafidh: February 2007

Tuesday, February 27, 2007

Get Rid of those Distinct Count Blues

Do you have the case of the distinct count blues? Too many rows in your fact table to do a distinct count? Are your resources overwhelmed by the distinct count process?

No worries...SQL2005 to the rescue! Take advantage of the many-to-many relationship capabilities of SQL2005 to do counts on your dimension table instead of the usual distinct count on your fact table. Read more about the many-to-many distinct count from Marco Russo's white paper. Thanks Marco for your insights.
With this alternative method, you gain maximum performance.
Implementing is easy too:
1. Create a measure group on your dimension (e.g. Customer, Product) with a simple row count and not a distinct count
2. Define the M-M relationships in your Dimension Usage
And that's it!

I am so excited about SQL2005 and how you can now solve such business problems which were almost impossible in SQL2000.

Monday, February 26, 2007

Many-to-Many in SQL 2005

Anyone who has built cubes in SQL 2000 and had to deal with the issue of many-many relationships knows how difficult it was to deal with that.
Enter SQL 2005 and the all problems are solved. Really, it is that much easier to handle many-many scenarios in 2005. The implementation is very straightforward. You have your "bridge tables" in your database structure and all you need to do is set them up in the Dimension Usage in SQL 2005 Cube Designer (Visual Studio). Of course, you have to know your structure and data but the rest is easy.
Read more about it here from a white paper on many-to-many by Marco Russo.

Sunday, February 25, 2007

How to Downgrade a Database from SQL Server 2005 to SQL Server 2000

There really is no wizard or documented procedure from Microsoft regarding this issue. Also, there isn't a simple way of backing up in SQL 2005 and restoring in SQL 2000. Actually, you can't; because there is no backward compatibility between the two versions. You can backup in SQL 2000 and restore in SQL 2005 but not the other way around.

Is this a case of Microsoft forcing you to upgrade or are the two DB versions structurely too different and complex to have backward compatibility? You can decide for yourself.

In the meantime, someone posted the procedure here on a Microsoft forum.

Basically, there are 3 simple steps:
Step 1 - Generate Scripts for the Database Elements and Structures (excluding FK and triggers)
Step 2 - Move the data from 2005 to 2000 using the export wizard in 2005
Step 3 - Generate Scripts for the Database Foreign Keys and Triggers

This works fine if you have a small database on the same network. But if your database is bigger and you have to export across your network, then having a backup file to restore from would be a much easier task. Those steps would be:
1. Backup your database
2. Compress the db file
3. FTP to your target server
4. Restore on target server

Wednesday, February 07, 2007

Curse of the distinct count

Distinct counts are a pain in SQL AS.

DISTINCT COUNT
AS2k - based on the dimension structure.
AS2k5 - based on hierarchy and attribute structure. This allows the distinct count measure to be calculated on much larger volumes of data.
Some issues: distinct counts are typically calculated at run-time – i.e. at the moment the user asks the question.

OPTIMIZE DISTINCT COUNTS
The basic run down of these distinct count optimization techniques are to:
· SQL2005 - Create a separate measure group for each distinct count measure
. SQL2000 - Create a separate cube for each distinct count
· Create custom aggregations
· Create a partitioning strategy for your OLAP cube allowing yourself to “distribute” the data.

In my experience, larger data sets of distinct counts are very slow to process even when creating separate cubes. Unless maybe you create partitions. Have not done much with custom aggregations. That's on my list to do.

More about distinct counts here from Denny Lee.
More on building aggregations manually here from Chris Webb.

Also, look at the Microsoft SQL Server 2000 Analysis Services Performance Guide
for Storage Design Wizard, Aggregation Usage, and Cube Editor.

10 Shopping Tricks That Stores Hate

I like these tips from the consumerist:

Using Credit and Paying it Off on Time
Saying NO to the Extended Warranty
Shopping in the Store But Buying Online
Opening A Store Credit Card To Get A Discount, Then Cutting It Up

Basic Data Mining Concepts

CASE Table
A Case table has Attributes (columns) and States (values).
==> Equivalent to the Dimension Table in Data Warehousing
Case key = the PK of the table

NESTED Table
Nested case - the many part of the relationship.
==> Equivalent to the Fact Table in Data Warehousing
Nested key = Not the FK (i.e. the PK of the table). Important: It is rather the other attributes that make up the nested key.

Reference
Data Mining with SQL Server 2005 by ZhaoHui Tang and Jamie MacLennan

Analysis Services 2005 Performance Guide

Analysis Services 2005 Performance Guide is now available from the MSDN downloads. Good stuff on performance including: distinct counts, partitioning, many-many, etc.

Data Mining 001

Definition:
Data mining is about analyzing data and finding hidden patterns using automatic or semi-automatic means.

Three Parts to Data Mining:
1. Create the model - similar to "create table"
discrete = distinct categories
continuous = numeric columns

2. Train the model - similar to "insert into table"
a. processing the model ==> similar to processing a cube
b. training model ==> truth table

3. Predict the model - similar to "select from table"
a. closing the analysis loop

Reference
Data Mining with SQL Server 2005 by ZhaoHui Tang and Jamie MacLennan

Ibrahim Hafidh