1

I am having some trouble with SSAS and data mining - specifically the Microsoft Clustering package.

I intend to ultimately do my work in AMO and MDX, but for now, just happy to understand how it works in the BIDS via Visual Studio. One step at a time!

The whole problem is around clustering both "vertically" and "horizontally" (separately) from a table that is organized vertically. My main source data table in my OLTP database looks like =>

ID_NUM {numbers 1 - 20,000}

TECK_ID {numbers 1-500, {for each ID_NUM}} (though just grabbed a few of these for playing around with the data in the screencaps)

TECK_VALUE {a double, the 'fact' bit}

So- 10 million rows, of two ints and a double.

Which looks like this- https://i.stack.imgur.com/AUUPA.jpg

So I create a new Analysis Services project in Visual Studio, set up a Data Source, and bring in the above table, as well as two "dimension tables" (identity of what id_num is, names of what each teck_id is) into a Data Source View and link it up, matching up the appropriate keys.

Which looks like this- https://i.stack.imgur.com/hHkfy.jpg

Next I want to manipulate how my data is represented, so I go to set up a cube from this Data Source View. I create dimensions based on my two "dimension" tables (the above "id_num" primary key one, and the "teck_id" primary key one), and create a single measure (as sum) of the teck_value column from my main table. This all seems to compile successfully.

Which looks like this- https://i.stack.imgur.com/90t0Q.jpg

The reason I think everything has worked well is I can arrange my data how I want by browsing the cube. I am able to define my "rows" as both the id_num, or as the "teck_id", with the other one filling up the columns. The measure "Teck_value" always makes up the dataset of the table. This is exactly how I want it, the flexibility to arrange my data both ways.

Which looks like this- https://i.stack.imgur.com/uWL5a.jpg And this- https://i.stack.imgur.com/6tH9T.jpg

Beautiful! Now I wish to do some mining on this basis! I wish to, quite simply, using Microsoft Clustering to (separately) -

  • Assign each TECK_ID a cluster number based on how it varies on each ID_NUM
  • Assign each ID_NUM a cluster based on how it varies on each TECK_ID

Seemingly a simple requirement - just changing what is represented as "rows" and what as "columns" - which I already appear to be able to do through the cube browser. This seems to be one of the main points of OLAP rather than OLTP from my uneducated perspective!

Yet when I try to set this up I fail utterly!

The Clustering Wizard leaves me confounded and I come up with nonsense results. I am given the option of selecting a key (for which I can choose either of the above), but no option to parse by the other dimension. Indeed, the only thing I can choose to mine on is TECK_VALUE, which isn't any good as that doesn't separate out the different fields!

My wizard looks like this- https://i.stack.imgur.com/iuRiS.jpg

So, I am left in a pickle. I really don't want to go back and line up my OLTP databases horizontally because 1) this would mean having 20k columns when I try to categorize my TECK_IDs. and 2) I was hoping SSAS and OLAP can give me the flexibility I need to mine the fields that I want - isn't that part of the reason you set up a cube "chop up the data how you like" ?

Bonus points for helping me with the AMO / MDX side as well! :)

willy_pond
  • 47
  • 1
  • 5

0 Answers0