I would like to run a regression on a training data frame that I have put into tidy text format. The original data include participants with a noted developmental disability and participants who may or may not have one. From a larger tidy text data frame, I created a data frame that picks out key words in my text files and counts how many times each word occurs in each document. Files for participants with a noted disability have "D" in front of the first name. It looks like this:
    Name of Text File   Word     n
    DAdam               autism   3
    DAdam               adhd     2
    DJane               autism   1
    Mark                adhd     4
    Joey                add      3
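For a reproducible example, the counts above can be rebuilt as an R data frame, with the outcome label derived from the "D" prefix. This is a sketch: the column names `file` and `n` are assumptions (only `word` appears in my code), and the pattern `^D[[:upper:]]` assumes the marker is always an uppercase "D" directly followed by the capitalised first name, so a name such as "Dan" is not flagged.

```r
# Example keyword counts from the tidy text step (column names are assumed)
one_dev <- data.frame(
  file = c("DAdam", "DAdam", "DJane", "Mark", "Joey"),
  word = c("autism", "adhd", "autism", "adhd", "add"),
  n    = c(3, 2, 1, 4, 3),
  stringsAsFactors = FALSE
)

# Outcome label: 1 if the file name carries the "D" marker, 0 otherwise.
# Assumes the marker is "D" followed immediately by an uppercase letter.
one_dev$disability <- as.integer(grepl("^D[[:upper:]]", one_dev$file))
```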
I then added a binary indicator column for each keyword, with 1 if the row's word is that keyword and 0 otherwise:
    library(dplyr)
    # 1 if this row's word is "autism", 0 otherwise (same pattern for "adhd" and "add")
    one_dev$autism <- if_else(one_dev$word == "autism", 1, 0)
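Rather than writing one `if_else()` line per keyword, the indicators can be created in a loop. A minimal sketch, assuming the long data frame is `one_dev` with a `word` column and that the keyword list is exactly "autism", "adhd" and "add" (the `keywords` vector is a hypothetical name):

```r
library(dplyr)

one_dev <- data.frame(
  file = c("DAdam", "DAdam", "DJane", "Mark", "Joey"),
  word = c("autism", "adhd", "autism", "adhd", "add"),
  n    = c(3, 2, 1, 4, 3),
  stringsAsFactors = FALSE
)

keywords <- c("autism", "adhd", "add")  # assumed keyword list

# One 0/1 column per keyword: 1 when this row's word is that keyword
for (kw in keywords) {
  one_dev[[kw]] <- if_else(one_dev$word == kw, 1, 0)
}
```

Adding a keyword then only means extending `keywords`.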
So now the data frame looks like this:
    Name of Text File   Word     n   autism   adhd   add
    DAdam               autism   3   1        0      0
    DAdam               adhd     2   0        1      0
    DJane               autism   1   1        0      0
    Mark                adhd     4   0        1      0
    Joey                add      3   0        0      1
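One way to collapse the indicator table above to one row per file is to group by file and take the maximum of each indicator, so a keyword gets 1 if any of that file's rows has a 1. A sketch using dplyr's `group_by()`, `summarise()` and `across()`, assuming the file-name column is called `file`:

```r
library(dplyr)

# Indicator table as shown above (column names assumed)
one_dev <- data.frame(
  file   = c("DAdam", "DAdam", "DJane", "Mark", "Joey"),
  word   = c("autism", "adhd", "autism", "adhd", "add"),
  n      = c(3, 2, 1, 4, 3),
  autism = c(1, 0, 1, 0, 0),
  adhd   = c(0, 1, 0, 1, 0),
  add    = c(0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# One row per file: each indicator is 1 if any row for that file has a 1
wide <- one_dev %>%
  group_by(file) %>%
  summarise(across(c(autism, adhd, add), max), .groups = "drop")
```

`summarise()` returns the rows sorted by file name (DAdam, DJane, Joey, Mark), so the order differs from the original listing but the values per file are the same.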
I would like to collapse it to one row per text file, like this:
    Name of Text File   autism   adhd   add
    DAdam               1        1      0
    DJane               1        0      0
    Mark                0        1      0
    Joey                0        0      1
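Alternatively, the indicator step can be skipped and the original count table reshaped directly with `tidyr::pivot_wider()`. A sketch, again assuming the columns are named `file`, `word` and `n`: `values_fill = 0` covers keywords that never occur in a file, and the final `mutate()` turns counts into 0/1 presence flags.

```r
library(dplyr)
library(tidyr)

# Original long count table (column names assumed)
one_dev <- data.frame(
  file = c("DAdam", "DAdam", "DJane", "Mark", "Joey"),
  word = c("autism", "adhd", "autism", "adhd", "add"),
  n    = c(3, 2, 1, 4, 3),
  stringsAsFactors = FALSE
)

wide <- one_dev %>%
  # One column per keyword holding its count; 0 where the word is absent
  pivot_wider(id_cols = file, names_from = word, values_from = n,
              values_fill = 0) %>%
  # Convert counts to presence indicators (1 = word occurs at least once)
  mutate(across(-file, ~ as.integer(.x > 0)))
```

Rows and keyword columns come out in order of first appearance, which matches the desired table. Dropping the last `mutate()` keeps the raw counts, which could also be tried as predictors.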
Then I would like to run a regression that predicts whether a given participant is likely to have a developmental disability.
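Because the outcome (noted disability or not) is binary, logistic regression with base R's `glm(..., family = binomial)` is a natural first model. A minimal sketch, assuming a wide table like the one above plus a 0/1 `disability` column derived from the "D" prefix; `fit_disability_model()` and `predict_disability()` are hypothetical helper names, not part of any package.

```r
# Wide table as in the desired output (column names assumed)
wide <- data.frame(
  file   = c("DAdam", "DJane", "Mark", "Joey"),
  autism = c(1, 1, 0, 0),
  adhd   = c(1, 0, 1, 0),
  add    = c(0, 0, 0, 1),
  stringsAsFactors = FALSE
)
# Outcome: 1 if the file name has the "D" marker (assumed convention)
wide$disability <- as.integer(grepl("^D[[:upper:]]", wide$file))

# Logistic regression: keyword indicators as predictors of the binary label
fit_disability_model <- function(data) {
  glm(disability ~ autism + adhd + add, data = data, family = binomial)
}

# Predicted probability of a noted disability for each row of new_data
predict_disability <- function(model, new_data) {
  predict(model, newdata = new_data, type = "response")
}
```

With only these four participants the model is saturated and `autism` alone separates the two groups perfectly, so R is likely to warn that fitted probabilities are numerically 0 or 1; the coefficients only become meaningful when the same steps are run on the full training data. A new participant could then be scored with, for example, `predict_disability(model, data.frame(autism = 0, adhd = 1, add = 0))`.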
Thank you!