I have a dataset with 4 columns: ID (a unique user identifier), Year, Country, and Level, in this format:
+----+------+---------+-------+
| ID | Year | Country | Level |
+----+------+---------+-------+
| 1 | 2015 | USA | 1 |
| 1 | 2016 | China | 2 |
| 2 | 2015 | China | 2 |
| 2 | 2016 | Russia | 2 |
| 3 | 2015 | Russia | 1 |
| 3 | 2016 | China | 2 |
| 4 | 2015 | USA | 2 |
| 4 | 2016 | USA | 3 |
| 5 | 2014 | China | 1 |
| 5 | 2016 | USA | 2 |
| 6 | 2015 | USA | 1 |
| 6 | 2016 | USA | 2 |
| 7 | 2015 | Russia | 2 |
| 7 | 2016 | China | 3 |
+----+------+---------+-------+
The user will be able to filter the dataset by country.
Using that country filter, I want to build a table with a column that shows whether the user had a record in the previous year (Year - 1) in any of the selected countries, aggregated by Level, alongside other variables that are affected only by the current country filter.
For example, if I select China and USA:
+----+------+---------+-------+-----------------+
| ID | Year | Country | Level | In selection PY |
+----+------+---------+-------+-----------------+
| 1 | 2015 | USA | 1 | No |
| 1 | 2016 | China | 2 | Yes |
| 2 | 2015 | China | 2 | No |
| 3 | 2016 | China | 2 | No |
| 4 | 2015 | USA | 2 | No |
| 4 | 2016 | USA | 3 | Yes |
| 5 | 2014 | China | 1 | No |
| 5 | 2016 | USA | 2 | No |
| 6 | 2015 | USA | 1 | No |
| 6 | 2016 | USA | 2 | Yes |
| 7 | 2016 | China | 3 | No |
+----+------+---------+-------+-----------------+
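To pin down the rule the table above follows, here is a minimal sketch in plain Python. Since the question doesn't name a tool, no database or BI engine is assumed. `ROWS` and `flag_previous_year` are hypothetical names. The sketch assumes "previous year" means exactly Year - 1, which the example is consistent with: ID 5 in 2016 is "No" even though it has a China record in 2014. The previous-year record must itself fall inside the selection.

```python
# Sample data from the question: (ID, Year, Country, Level)
ROWS = [
    (1, 2015, "USA", 1), (1, 2016, "China", 2),
    (2, 2015, "China", 2), (2, 2016, "Russia", 2),
    (3, 2015, "Russia", 1), (3, 2016, "China", 2),
    (4, 2015, "USA", 2), (4, 2016, "USA", 3),
    (5, 2014, "China", 1), (5, 2016, "USA", 2),
    (6, 2015, "USA", 1), (6, 2016, "USA", 2),
    (7, 2015, "Russia", 2), (7, 2016, "China", 3),
]


def flag_previous_year(rows, selected):
    """Keep rows in the selected countries and flag each one with whether
    the same ID also has a selected-country record in Year - 1."""
    selected = set(selected)
    kept = [r for r in rows if r[2] in selected]
    # Hash set of every (ID, Year) pair inside the selection, for O(1) lookups.
    seen = {(id_, year) for id_, year, _country, _level in kept}
    return [
        (id_, year, country, level, (id_, year - 1) in seen)
        for id_, year, country, level in kept
    ]


for row in flag_previous_year(ROWS, {"China", "USA"}):
    print(row)
```

The set lookup keeps this a single linear pass over the filtered rows rather than a nested comparison of every row against every other row.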
The aggregated result will be:
+-------+-------------------+-----------------+
| Level | Number of records | In selection PY |
+-------+-------------------+-----------------+
| 1 | 3 | 0 |
| 2 | 6 | 2 |
| 3 | 2 | 1 |
+-------+-------------------+-----------------+
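The aggregation can be done in the same pass that computes the flag, so the per-row table never has to be materialized. Below is a minimal stdlib sketch, with `aggregate_by_level` as a hypothetical name. It returns `(Level, Number of records, In selection PY)` tuples sorted by Level.

```python
from collections import Counter

# Sample data from the question: (ID, Year, Country, Level)
ROWS = [
    (1, 2015, "USA", 1), (1, 2016, "China", 2),
    (2, 2015, "China", 2), (2, 2016, "Russia", 2),
    (3, 2015, "Russia", 1), (3, 2016, "China", 2),
    (4, 2015, "USA", 2), (4, 2016, "USA", 3),
    (5, 2014, "China", 1), (5, 2016, "USA", 2),
    (6, 2015, "USA", 1), (6, 2016, "USA", 2),
    (7, 2015, "Russia", 2), (7, 2016, "China", 3),
]


def aggregate_by_level(rows, selected):
    """Count records per Level, and how many of them had a selected-country
    record for the same ID in Year - 1."""
    selected = set(selected)
    kept = [r for r in rows if r[2] in selected]
    seen = {(id_, year) for id_, year, _country, _level in kept}
    records, in_prev = Counter(), Counter()
    for id_, year, _country, level in kept:
        records[level] += 1
        if (id_, year - 1) in seen:
            in_prev[level] += 1
    # Counter returns 0 for levels with no previous-year hits.
    return [(lvl, records[lvl], in_prev[lvl]) for lvl in sorted(records)]


print(aggregate_by_level(ROWS, {"China", "USA"}))
```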
Is there an efficient way to calculate this aggregated table? It would run on a dataset with millions of rows, and the set of selected countries varies.
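Because the selection changes but the data doesn't, one option is to move the expensive part (matching each row to the same ID's Year - 1 row) into a one-time preprocessing step. Each row then stores the previous year's country, and any selection is answered with two membership tests per row and no join. The minimal stdlib sketch below assumes at most one record per (ID, Year), as in the sample. `precompute_prev_country` and `aggregate` are hypothetical names. In SQL or a BI tool, the analogous step would be a one-time left self-join on `ID` and `Year - 1` that stores a previous-country column.

```python
from collections import Counter

# Sample data from the question: (ID, Year, Country, Level)
ROWS = [
    (1, 2015, "USA", 1), (1, 2016, "China", 2),
    (2, 2015, "China", 2), (2, 2016, "Russia", 2),
    (3, 2015, "Russia", 1), (3, 2016, "China", 2),
    (4, 2015, "USA", 2), (4, 2016, "USA", 3),
    (5, 2014, "China", 1), (5, 2016, "USA", 2),
    (6, 2015, "USA", 1), (6, 2016, "USA", 2),
    (7, 2015, "Russia", 2), (7, 2016, "China", 3),
]


def precompute_prev_country(rows):
    """One-time pass: attach the country of the same ID's Year - 1 record
    (None if there is none). Assumes at most one record per (ID, Year)."""
    country_at = {(id_, year): country for id_, year, country, _level in rows}
    return [
        (id_, year, country, level, country_at.get((id_, year - 1)))
        for id_, year, country, level in rows
    ]


def aggregate(enriched, selected):
    """Per-selection pass: no join, just two set-membership tests per row."""
    selected = set(selected)
    records, in_prev = Counter(), Counter()
    for _id, _year, country, level, prev_country in enriched:
        if country in selected:
            records[level] += 1
            if prev_country in selected:  # None is never in the selection
                in_prev[level] += 1
    return [(lvl, records[lvl], in_prev[lvl]) for lvl in sorted(records)]


ENRICHED = precompute_prev_country(ROWS)  # done once
print(aggregate(ENRICHED, {"China", "USA"}))  # repeated per filter change
```

The trade-off is one extra column of storage in exchange for making every filter change a single linear scan. If a user can have several records in the same year, the previous-year column would need to hold a set of countries instead of a single value.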