I have a tibble that looks like this.
# A tibble: 1,000 x 3
  id    question                                   answer
  <chr> <chr>                                      <chr>
1 aaa   What is your favorite color?               Green
2 aaa   What is your favorite band?                Green Day
3 aaabb What is your favorite color?               Blue
4 aaabb What is your favorite band?                Blue
5 ccc   What is your favorite color?               Blue
6 ccc   What is the difference between you and me? Five bank accounts
# ... with more rows
I'd like to reshape it into a wide data frame, with one row per id and one column per question. I used this code:
aTibble %>% distinct() %>% spread(question, answer)
But I end up with a data frame in which every question column is NA:
# A tibble: 1,000 x 3
id V1 What is your favorite color? What is your favorite band? What is the difference between you and me?
1 aaa NA NA NA
2 aaa NA NA NA
3 aaabb NA NA NA
4 aaabb NA NA NA
5 ccc NA NA NA
6 ccc NA NA NA
# ... with more rows
In the original tibble, some rows have an ID but null for both question and answer. There are no duplicate questions for a single ID. That said, different IDs can answer different questions; they don't all have the same set of questions.
Additionally, I didn't create the V1 column, and it wasn't in my original tibble. It appeared after the spread().
The frustrating part is that when I run the same code on a small dataset, it works just fine. When I run it on the full dataset (~150K records), I get NAs.
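For reference, here is a minimal sketch of the small case that works, built only from the six rows printed at the top. The names smallTibble and wide are made up for this example, and it assumes dplyr and tidyr are loaded. It contains none of the rows with a null question and answer, so it does not reproduce the problem.

```r
library(dplyr)
library(tidyr)

# Hypothetical small sample: only the six rows shown in the printed tibble
smallTibble <- tibble(
  id = c("aaa", "aaa", "aaabb", "aaabb", "ccc", "ccc"),
  question = c(
    "What is your favorite color?",
    "What is your favorite band?",
    "What is your favorite color?",
    "What is your favorite band?",
    "What is your favorite color?",
    "What is the difference between you and me?"
  ),
  answer = c("Green", "Green Day", "Blue", "Blue", "Blue", "Five bank accounts")
)

# Same pipeline as the one applied to the full data
wide <- smallTibble %>% distinct() %>% spread(question, answer)
print(wide)
```

On this sample I get one row per id, with NA only where an id did not answer a given question, which is what I expected from the full dataset too.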