3

enter image description hereI've been tasked to set up a Tableau worksheet of counts of data (ultimately to create percentages) where the contrived incoming data looks like the following.

id  fruit
1   apple
1   orange
1   lemon
2   apple 
2   orange
3   apple 
3   orange
4   lemon
4   orange

The worksheet needs to look something like the following:

Count of ids
2   Lemons
2   No lemons

I've only been using Tableau for about 4 hours, so is this doable? Can anyone point me in the right direction?

The data is coming in from a SQL Server database in a format that I can control if that helps contribute towards a solution.

Doug Smith
  • 143
  • 1
  • 13
  • you need the data as 2 sets one for lemons and another for rest, Is that correct? – Siva Aug 28 '18 at 17:10
  • @Siva - Yes, provided that items aren't double counted. If an id is counted as a lemon then it should not be as a not lemon. (Other combinations may be required as well) – Doug Smith Aug 28 '18 at 17:29
  • Ok, so you need to count of all fruits and the same time no double count with same ID, correct? – Siva Aug 28 '18 at 17:35
  • No double counts of same id: so 50% of our total population has a fruit of lemon and 50% of the Ids have No Lemon – Doug Smith Aug 28 '18 at 17:46

2 Answers2

2

This is a good use for a set.

In the data pane on the left sidebar, right click on the Id field and create a set named "Ids that contain at least one lemon" (or use a shorter less precise name)

In the set definition dialog panel, define the set by choosing "Use all" from the General tab, and then on the Condition tab, define the condition by the formula max([Fruit]="lemon")

There are many ways to think of a set, but the most abstract is just as a mathematical set of Ids that satisfy the condition. Remember each Id has many data rows, so the condition is a function of many data rows and uses the aggregation function MAX(). For booleans, True is treated as greater than False, so MAX() will return True if at least one of the data rows satifies the condition. By contrast, MIN() is True only if ALL (non-null) data rows satisfy the condition.

Once you have a set that separates your ids into Lemon scented Ids and others, then you can use that set in many ways - in calculated fields, in filters, in combination with other sets to make new sets, and of course on shelves to make visualizations.

To get a result like your question seeks, you could put your new set on the Row shelf, and put CNTD(ID) on the text shelf or columns shelf. Make sure you understand why you need count distinct (CNTD) instead of SUM([Number of Records]) here.

BTW, the LOD calculation { fixed [Id] : max([Fruit]="lemon") } is effectively the same solution.

Alex Blakemore
  • 11,301
  • 2
  • 26
  • 49
  • First off, your explanation is greatly appreciated and explains quite a bit that I didn't understand before. However, the results I am getting give me a result count of 1 lemon and 1 non lemon vs the 2X2 that I was expecting. I must be missing something somehow. – Doug Smith Aug 29 '18 at 17:18
  • Are you sure you CNTD(ID)? Can you post an image of your worksheet? – Alex Blakemore Aug 29 '18 at 17:26
  • Attempted to add screen shot to post. Interestingly, I now get a 2X4 answer vs the 1X1, which is a step in the right direction I think – Doug Smith Aug 29 '18 at 19:29
  • Great answer. I was only thinking about LOD when read this question. – Jose Cherian Aug 29 '18 at 20:52
  • Doug. Its not possible to debug your example without also seeing your data and the definition of your set. Can you post. Your set should be a set of Encounters, not a set of Lemons. When you edit the set, and look at the summary section of the General tab, what Field is listed in the Summary? – Alex Blakemore Aug 30 '18 at 04:02
  • @AlexBlakemore sir, can you please help to explain instead of set and LoD why this statement `IF MAX([Fruit])="lemon" THEN "lemon" ELSE "No lemon" END` not working in normal calculated field.. – Siva Aug 30 '18 at 11:09
  • @AlexBlakemore - my apologies. I thought the formula should be applied to the fruit not the ID. That does make a difference!. I am however, accepting the solution by Jose because of its extendibility. Your explanations though are greatly appreciated. Thanks again. – Doug Smith Aug 30 '18 at 14:05
2

Alex's solution based on sets are very good for this scenario, but I would like to show that LODs can be more flexible if you need to extend your solution to include more categories. for the current scenario, create a calculation with below formula and create text table using COUNTD(Id)

{FIXED [Id]:IF MAX([Fruit]='lemon') THEN 'Lemon'  ELSE 'No Lemon' END}

enter image description here

Now for the extension part, you are considering below list where you want to count IDs with Lemon, Apple and others. Since no double counting of Ids are allowed, categorization will follow the order. (This kind of precedence will be a headache without LODs)

enter image description here

Now you can change your calculation as below:

{FIXED [Id]:IF MAX([Fruit]='lemon') THEN 'Lemon' 
            ELSEIF MAX([Fruit]='apple') THEN 'Apple' 
            ELSE 'No Lemon or Apple' END}

Now your visualization automatically changes to include the new category. This can be extended for any number of fruits.

enter image description here

Jose Cherian
  • 7,207
  • 3
  • 36
  • 39
  • Good point. Yes, LOD calcs are more general than sets. LOD calcs can have any value, whereas sets are effectively boolean valued functions - true or false, in or out. Just be clear you understand how your conditional expression works above. If there is at least one Lemon, then the value will be Lemon (even if there are also apples). I know that is just a simple example, but it would be less likely to be confusing if there was one more category for ids that had both an apple and a lemon. – Alex Blakemore Aug 30 '18 at 03:57
  • I'm accepting this answer by Jose, because of its extendibility. Thank you Jose, Alex and Siva for your time and help. – Doug Smith Aug 30 '18 at 14:07
  • @Jose Cherian.. can you please help to explain instead of set and LoD why this statement `IF MAX([Fruit])="lemon" THEN "lemon" ELSE "No lemon" END` not working in normal calculated field.. – Siva Sep 01 '18 at 18:36