Each opportunityID have several products I want to have a binary column that says if an opportunity has this product or not. How to do that?
Input
+---+---------------+--------+----------+----------+
| | Opportunityid | Level | Product1 | Product2 |
+---+---------------+--------+----------+----------+
| 1 | 10 | Low | SS | ISP |
| 2 | 20 | High | ISP | Azure |
| 3 | 30 | Normal | Azure | ISP |
| 4 | 40 | | SS | |
| 5 | 50 | | ISP | |
+---+---------------+--------+----------+----------+
Expected Output(Checks over product 1 and product2)
+---+---------------+--------+----------+----------+--------+---------+-----------+
| | Opportunityid | Level | Product1 | Product2 | HasSS? | HasISP? | HasAzure? |
+---+---------------+--------+----------+----------+--------+---------+-----------+
| 1 | 10 | Low | SS | ISP | 1 | 1 | 0 |
| 2 | 20 | High | ISP | Azure | 0 | 1 | 1 |
| 3 | 30 | Normal | Azure | ISP | 0 | 1 | 1 |
| 4 | 40 | | SS | | 1 | | 0 |
| 5 | 50 | | ISP | | 0 | 1 | 0 |
+---+---------------+--------+----------+----------+--------+---------+-----------+
Code
library(caret)
Products <- data.frame(
Opportunityid=c(10, 20, 30, 40, 50),
Level=c('Low', 'High', 'Normal', '', ''),
Product1=c('SS', 'ISP', 'Azure', 'SS', 'ISP'),
Product2=c('ISP', 'Azure', 'ISP', '',''))
# dummify the data
dmy <- dummyVars(" ~ .", data = Products)
trsf <- data.frame(predict(dmy, newdata = Products))
trsf
PS: I have more than 100 products, so I want the process to be automated