0

I am trying to create a calculated column that flags/counts the changes in values across rows in another column, in Spotfire. Below is an example of the data types I'm looking at and the desired results.

My hope is that for each Location, and ordered along Time, I can identify when the values of "colors" changes and have running count so that each cluster of similar values between changes is given the same label (Cluster Desire 1) for each Location. It would be best if the running count of clusters can restart at each location but this is not crucial. Any help would be more than appreciated!

Example data and desired outcome

NightLearner
  • 295
  • 1
  • 7
  • 21

1 Answers1

1

I thought of a way to do it, relying on one intermediate column (I used two just to make it a bit clearer).

First: the concatenation of values for each row within its Location: called [concatString]

Concatenate(Concatenate([Color]) over (Intersect([Location],AllPrevious([Time]))),', ')

Spotfire defaults to comma followed by space as a separator: I could not find a way of changing that in this kind of expression.

Then within each [concatString] I remove repeated values. The complication is that the last one did not have the comma+space, and I did not manage to make the regular expression I am using understand that. So my workaround was to add a final comma+space to [concatString]. Hence the extra Concatenate(..).

The formula for the column without repetitions, [consolidatString] is:

RXReplace([concatString],"(\\w+\,\\s)\\1+","$1","g")

Then what we have achieved is an individual value for each line we want to group. We can then simply rank [consolidatString] to achieve the desired column:

DenseRank([consolidatString],[Location])

Gaia Paolini
  • 1,044
  • 1
  • 5
  • 4
  • 1
    Spotfire wants two back slashes in front of special characters – Gaia Paolini Oct 07 '20 at 16:46
  • this is a really nice solution! I did have to add an extra \ after the +\ in the consolidatString formula (not sure if that is an issue you ran into as well). Out of interest, do you think there will be a limit to the characters that can be in the concatString rows? I'm going to test out on thousands of rows of data. Let's see! – NightLearner Oct 07 '20 at 19:07
  • Out of interest, would the solution be much different if I was interested in doing this same approach for intergers (0,1,2) rather than Strings like Red and Blue? Thanks! – NightLearner Oct 07 '20 at 19:12
  • Should be ok with integers, maybe a good idea to first encode your Red Blue etc into integers to save space (I don't know what the limit is, the main factor would be how many rows you have in each Location). You can simply encode with DenseRank([Color]) – Gaia Paolini Oct 08 '20 at 11:26