1

I have a simple table with 2 columns: ID (integer) and Category (string), and each ID can repeat with a few categories, like so:

ID    Cat
---   ---
1     A
1     B
2     B
3     A
3     B
3     C

I want to reshape this table so that each unique Category would be a dummy variable (0/1 if ID has it):

ID    A   B   C
---   --  --  --
1     1   1   0
2     0   1   0
3     1   1   1

Now, if the set of unique categories is known (and small) this is an easy CASE WHEN statement x no. of unique categories.

My questions are:

a) What if it isn't unknown or really large? How do I create this 'CASE WHEN' effect automatically?

b) More importantly: I'm not necessarily interested in all categories (say only dummies for 'A' and 'B') but only categories which I have in a separate table called Cats, which is a simple 1 column holding my relevant categories (again, unknown how many), like:

 Cat
 ---
 A
 B

How do I create dummy variables for only the categories in this dynamic table?

Do you think all of this should really be done in other tools e.g. R?

Thanks!

(I'm using Teradata SQL with SQLA, but I think it's a general SQL question)

Giora Simchoni
  • 3,487
  • 3
  • 34
  • 72

1 Answers1

2

Just use table:

table(dat)
   Cat
ID  A B C
  1 1 1 0
  2 0 1 0
  3 1 1 1

and in case you want to have the binary table for a group of Cat:

table(subset(dat,Cat %in% c('A','B')))
   Cat
ID  A B
  1 1 1
  2 0 1
  3 1 1
agstudy
  • 119,832
  • 17
  • 199
  • 261