-1

My table is having duplicate rows listed based on a duplicates ID column. The duplicate rows may have one or more Characteristic columns having unique values. I am trying to get a count of which Characteristic columns in duplicate rows have unique values.

Before:

+-----+----------+-------------+-----------+------------+
| ID  | charType | charFlavour | charColor | charWeight |
+-----+----------+-------------+-----------+------------+
| 123 | gel      | mint        | blue      | 10gms      |
| 123 | liquid   | mint        | blue      | 10gms      |
| 123 | solid    | mint        | blue      | 10gms      |
| 456 | wood     | orange      | red       | 20gms      |
| 456 | wood     | vanilla     | red       | 20gms      |
| 456 | wood     | raspberry   | red       | 20gms      |
| 456 | wood     | strawberry  | red       | 20gms      |
| 789 | metal    | mango       | yellow    | 25gms      |
| 789 | metal    | mango       | yellow    | 30gms      |
| 789 | metal    | mango       | yellow    | 22gms      |
| 333 | silica   | NA          | magenta   | 11gms      |
| 333 | plastic  | NA          | white     | 11gms      |
| 333 | rubber   | NA          | teal      | 11gms      |
+-----+----------+-------------+-----------+------------+

After:

+-------------+-----+-----+-----+-----+-------+
|     ID      | 123 | 456 | 789 | 333 | Total |
+-------------+-----+-----+-----+-----+-------+
| charType    |   1 |   0 |   0 |   1 |     2 |
| charFlavour |   0 |   1 |   0 |   0 |     1 |
| charColor   |   0 |   0 |   0 |   1 |     1 |
| charWeight  |   0 |   0 |   1 |   0 |     1 |
+-------------+-----+-----+-----+-----+-------+

Is this format possible using a Pivot-table or Google Query?

tehhowch
  • 9,645
  • 4
  • 24
  • 42
sifar
  • 1,086
  • 1
  • 17
  • 43
  • If you have columns in the source as columns in the report, you can probably use `QUERY`. Or, you could `TRANSPOSE` the `QUERY` result to get the source columns as report rows. For your query, it seems like you want to report `"counta(unique(column))>1"-1` – tehhowch Jul 22 '18 at 19:08
  • @tehhowch yes, i tried transposing the source columns, but not getting counta function working in query....not sure if unique works also! – sifar Jul 22 '18 at 19:24
  • those are the worksheet functions that describe what you are trying to do. I doubt they are valid in the query language. – tehhowch Jul 22 '18 at 20:04
  • Yes, i am struggling with the same. – sifar Jul 22 '18 at 20:07

1 Answers1

1

Perhaps this isn't the most elegant solution you were looking for — it appears that no function in Google's query language returns all the unique values of a column. But this solution should successfully count, for each attribute, how many IDs correspond to more than one value of that attribute. For example, it will count how many IDs correspond to multiple charFlavours. Here are the two steps/queries to make:

  1. =QUERY(A1:E, "select A, max(B), min(B), max(C), min(C), max(D), min(D), max(E), min(E) group by A", 1): This will select the alphabetically/numerically maximum and minimum value for each attribute with respect to each ID. It will return one row per ID, containing the min and max attribute values.

  2. For each attribute, use something like =QUERY(G1:O, "select count(G) where H != I", 1). If you have four attributes, you will need four of these calls; just change where H != I to be the two columns corresponding to each attribute. Each of these QUERY calls will generate a table with just one value, the number of IDs having multiple values for a certain attribute.

MattS
  • 138
  • 9
  • Thanks @MattS. However, for example sake i listed only 4 attribute columns, but there are more than 20. So doing a max & min for each column would be lengthy. Instead of putting a query for every call, surely an ArrayFormula wrapped around it once should do, right? – sifar Jul 23 '18 at 09:07
  • I am not sure i understood the max and min on textual values. i am getting this [TABLE](https://imgur.com/u0SgU8G) and a single count. I am trying to get an accumulated count of characteristic columns for each duplicate set of rows, where the characteristic values may be unique. See my screenshots: [Before](https://imgur.com/GSjbwKc), [After](https://imgur.com/hy8QFfN). – sifar Jul 23 '18 at 09:24
  • Not entirely sure now whether this method solves your problem. The max and min values tell you whether, for that ID, there are multiple unique values for a certain attribute. In the screenshot you provided, for instance, the fact that both the max and min `charColor` attributes for 123 are `blue` shows that all the colors for 123 are blue. We know that there is only one color for 123. But if they were different, we wouldn't know how many different unique attributes values there were — two, five, or even fifty. – MattS Jul 23 '18 at 19:46
  • Hi @MattS, here is an example [sheet](https://docs.google.com/spreadsheets/d/1LW6GFOsHU9J5naI0QXTHWycUjVR-dqxkwAxY3RQonxk/edit?usp=sharing). It has an already finished example shown in 2 steps, but using different formulae and not as a single formula. Can you share how you have created the resultant table? – sifar Jul 23 '18 at 20:28
  • I added a third sheet in the document demonstrating what I meant in my answer. Not sure if it solves your problem but if not, hopefully it will at least provide some inspiration. If all else fails, you could also try just making a script using Google Apps Script to manually solve the counting problem. – MattS Jul 23 '18 at 21:55