I originally tried to extract genres from the Kaggle IMDB data set:
https://www.kaggle.com/param1/d/deepmatrix/imdb-5000-movie-dataset/the-money-makers
The raw genre data comes in a format like Action_Adventure_Comedy. I used str_split to split the genres into separate columns, which gives data like this:
V1 V2 V3
Action Adventure Comedy
Adventure Comedy Horror
Action Adventure Horror
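To make the setup reproducible, here is a minimal sketch of the split step. It uses base R's strsplit rather than str_split so it runs without extra packages, and the sample vector genres_raw and the names parts and genre_cols are made up for illustration; the real data has more rows.

```r
# Hypothetical sample mirroring the raw format described above
genres_raw <- c("Action_Adventure_Comedy",
                "Adventure_Comedy_Horror",
                "Action_Adventure_Horror")

# Split each string on the underscore; fixed = TRUE treats "_" literally
parts <- strsplit(genres_raw, "_", fixed = TRUE)

# Pad shorter rows with NA so every row has the same number of columns
max_len <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA, max_len - length(p))))

# Bind the rows into a data frame with columns V1, V2, ...
genre_cols <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(genre_cols) <- paste0("V", seq_len(max_len))
```

The NA padding only matters when films have different numbers of genres; in the three sample rows every film has exactly three.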
What I want to create is a dummy variable for each genre, each in its own column. This should scan all of the split columns (V1 through V3 in the sample above) to see whether any of them contains the genre, and return 1 if so and 0 if not. The output I want is as follows:
Action Adventure Comedy Horror
1 1 1 0
0 1 1 1
1 1 0 1
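One way this transformation could be sketched in base R is shown below. It is only an illustration on the three sample rows, not a tested solution for the full data set; the data frame genre_cols and the names all_genres and dummies are placeholders I made up for the example.

```r
# Hypothetical data frame matching the split columns shown earlier
genre_cols <- data.frame(
  V1 = c("Action", "Adventure", "Action"),
  V2 = c("Adventure", "Comedy", "Adventure"),
  V3 = c("Comedy", "Horror", "Horror"),
  stringsAsFactors = FALSE
)

# Collect every distinct genre across all split columns, ignoring any NA
vals <- unlist(genre_cols, use.names = FALSE)
all_genres <- sort(unique(vals[!is.na(vals)]))

# For each genre, flag the rows where at least one column equals it
dummies <- sapply(all_genres, function(g) {
  as.integer(rowSums(genre_cols == g, na.rm = TRUE) > 0)
})
dummies <- as.data.frame(dummies)
```

Each column of dummies is named after a single genre, so on the sample rows it matches the table above.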
Please note that because I want one column per individual genre (e.g. Action) and not one per combination (e.g. Action_Adventure), I am unable to use model.matrix. Any help would be greatly appreciated.
Stu