I have a dataframe with a column that contains "Yes", "No" and "Maybe" values. Here is a sample of what the dataframe looks like for context (not the actual data I'm working with, as that is more sensitive):
State | City | Do you like the color Blue? | Yes | Maybe | No |
---|---|---|---|---|---|
Arizona | Phoenix | Yes | 1 | 0 | 0 |
Arizona | Phoenix | Yes | 1 | 0 | 0 |
Arizona | Phoenix | Maybe | 0 | 1 | 0 |
Arizona | Phoenix | No | 0 | 0 | 1 |
Arizona | Scottsdale | No | 0 | 0 | 1 |
Arizona | Scottsdale | Yes | 1 | 0 | 0 |
Arizona | Scottsdale | Maybe | 0 | 1 | 0 |
California | San Francisco | Yes | 1 | 0 | 0 |
California | San Francisco | No | 0 | 0 | 1 |
California | San Francisco | Maybe | 0 | 1 | 0 |
California | Los Angeles | Yes | 1 | 0 | 0 |
California | Los Angeles | Yes | 1 | 0 | 0 |
California | Los Angeles | No | 0 | 0 | 1 |

This is a two-part question:

1. I would like to recode the "Do you like the color Blue?" column so that "Yes" and "Maybe" become 1 and "No" becomes 0 (categorical to numeric), and add the result as a separate column.
2. I also want to make statistical comparisons between states (e.g. the proportion of respondents who said "No" in California versus Arizona). I was thinking of subsetting the data set by state and then making the comparisons, but the data set I'm working with has about 15 states. Is there a faster/more efficient way to do this?
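
For reference, here is a minimal sketch of what I'm trying to get to for both parts. It assumes Python with pandas (if you work in R the idea is the same, but the calls differ). The dataframe is rebuilt from the sample table above, and the column name `likes_blue` is just a placeholder I made up.

```python
import pandas as pd

# Sample data copied from the table above (illustrative only, not the real data).
df = pd.DataFrame({
    "State": ["Arizona"] * 7 + ["California"] * 6,
    "City": ["Phoenix"] * 4 + ["Scottsdale"] * 3
            + ["San Francisco"] * 3 + ["Los Angeles"] * 3,
    "Do you like the color Blue?": [
        "Yes", "Yes", "Maybe", "No",   # Phoenix
        "No", "Yes", "Maybe",          # Scottsdale
        "Yes", "No", "Maybe",          # San Francisco
        "Yes", "Yes", "No",            # Los Angeles
    ],
})

question = "Do you like the color Blue?"

# Part 1: recode "Yes"/"Maybe" as 1 and "No" as 0 in a new column.
# "likes_blue" is a hypothetical column name chosen for this sketch.
df["likes_blue"] = df[question].map({"Yes": 1, "Maybe": 1, "No": 0})

# Part 2: share of each answer within every state, without subsetting by hand.
# normalize="index" divides each row (state) by its own total.
props = pd.crosstab(df["State"], df[question], normalize="index")

# Proportion of "No" answers per state, one value per state.
no_share = props["No"]

# Share of 1s in the recoded column per state (the mean of a 0/1 column).
likes_share = df.groupby("State")["likes_blue"].mean()

print(props)
print(no_share)
print(likes_share)
```

Because the grouping is done on the `State` column, the same lines should work unchanged with 15 states. For an actual significance test, the raw counts from `pd.crosstab(df["State"], df[question])` (without `normalize`) could presumably be passed to something like `scipy.stats.chi2_contingency`, though I have not confirmed that is the right test for my data.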