1

I have a dataset with 4 columns: ID (unique identifier of user), Year, Country and Level in this format:

+----+------+---------+-------+
| ID | Year | Country | Level |
+----+------+---------+-------+
|  1 | 2015 | USA     |     1 |
|  1 | 2016 | China   |     2 |
|  2 | 2015 | China   |     2 |
|  2 | 2016 | Russia  |     2 |
|  3 | 2015 | Russia  |     1 |
|  3 | 2016 | China   |     2 |
|  4 | 2015 | USA     |     2 |
|  4 | 2016 | USA     |     3 |
|  5 | 2014 | China   |     1 |
|  5 | 2016 | USA     |     2 |
|  6 | 2015 | USA     |     1 |
|  6 | 2016 | USA     |     2 |
|  7 | 2015 | Russia  |     2 |
|  7 | 2016 | China   |     3 |
+----+------+---------+-------+

The user will be able to filter the dataset by country.

I want to create a table using the country filter that shows in a column if a user was the previous year in any of the countries selected aggregated by the level variable, apart from other variables only affected by the current country filter.

For example E.g., if I select China and USA:

+----+------+---------+-------+-----------------+
| ID | Year | Country | Level | In selection PY |
+----+------+---------+-------+-----------------+
|  1 | 2015 | USA     |     1 | No              |
|  1 | 2016 | China   |     2 | Yes             |
|  2 | 2015 | China   |     2 | No              |
|  3 | 2016 | China   |     2 | No              |
|  4 | 2015 | USA     |     2 | No              |
|  4 | 2016 | USA     |     3 | Yes             |
|  5 | 2014 | China   |     1 | No              |
|  5 | 2016 | USA     |     2 | No              |
|  6 | 2015 | USA     |     1 | No              |
|  6 | 2016 | USA     |     2 | Yes             |
|  7 | 2016 | China   |     3 | No              |
+----+------+---------+-------+-----------------+

The aggregated result will be:

+-------+-------------------+-----------------+
| Level | Number of records | In selection PY |
+-------+-------------------+-----------------+
|     1 |                 3 |               0 |
|     2 |                 6 |               2 |
|     3 |                 2 |               1 |
+-------+-------------------+-----------------+

Do you know any way to calculate this aggregated table efficiently? (this would be done in a dataset with millions of rows, with a variable set of countries to be selected)

AlexD
  • 21
  • 4
  • 1
    That last level of aggregation could be difficult to display. One option would be to pre preocess the data so that [In selection PY] is an actually field in the datasource – tyvich Oct 30 '17 at 21:09
  • The problem is that In selection PY depends on the filter applied (e.g. if the user filters some Countries, In selection PY needs to check if the ID was in the selected countries the previous year). The filter to country must apply to both country and country PY. – AlexD Oct 31 '17 at 08:34
  • You should still be able pre-compute your [In selection PY] field/column outside of Tableau. You want to group by both ID, and country and then check for a previous year. This is possible with something like [pandas](https://pandas.pydata.org/pandas-docs/stable/groupby.html). Then when you bring in the data to Tableau you already have your second table generated, and can still filter by country. – tyvich Nov 01 '17 at 16:53
  • You can't pre-compute the [In selection PY] field/column outside of Tableau, because it depends on the selection. [In selection PY] = 'Yes' if both Country and Country_PY is within selected countries. The country doesn't necessarily be the same (e.g. if you select countries USA and China in the filterm and if someone was in China and the next year is in USA, [In selection PY] = 'Yes'.) – AlexD Nov 02 '17 at 10:06

1 Answers1

1

I found a solution, will post in case it is helpful for someone else:

I change the Country filter to "Add to Context" and created this variable:

In Selection PY: if Year = 2016 then
{fixed [ID]:min(if Year = 2015 then 1 END)}
elseif Year = 2015 then
{fixed [ID]:min(if Year = 2014 then 1 END)}
elseif Year = 2014 then
{fixed [ID]:min(if Year = 2013 then 1 END)}

In this way the variable Selection PY is dynamically calculated according to the country filter.

It is only necessary to know in advance which years are stored in the dataset (or add more years to be safe).

AlexD
  • 21
  • 4