0

Here, I am manipulating election data, and the current data is in the following format. Both a visual and coded example are included (while visual is a bit condensed). Moreover, values have been edited from their originals.

# Representative Example
library(tidyverse)
test.df <- tibble(yr=rep(1956),mn=rep(11),
             sub=rep("Alabama"),
             unit_type=rep("County"),
             unit_name=c("Autauga","Baldwin","Barbour"),
             TotalVotes=c(1000,2000,3000),
             RepVotes=c(500,1000,1500),
             RepCandidate=rep("Eisenhower"),
             DemVotes=c(500,1000,1500),
             DemCandidate=rep("Stevenson"),
             ThirdVotes=c(0,0,0),
             ThirdCandidate=rep("Uncommitted"),
             RepVotesTotalPerc=rep(50.00),
             DemVotesTotalPerc=rep(50.00),
             ThirdVotesTotalPerc=rep(0.00)
             )
----------------------------------------------------------------------------------------------------
yr   | mn  | sub  | unit_type | unit_name | TotalVotes | RepVotes |  RepCan  | DemVotes | DemCan 
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County    Autauga        1000        500      EisenHower   500     Stevenson
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County    Baldwin        2000       1000      EisenHower   1000    Stevenson
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County    Barbour        3000       2000      EisenHower   2000    Stevenson
----------------------------------------------------------------------------------------------------

I am trying to get a table that looks like the following:

----------------------------------------------------------------------------------------------------
yr   | mn  | sub  | unit_type | unit_name |   pty_n   |   can    |   TotalVotes   | CanVotes
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County     Autauga    Republican   Eisenhower     1000          500 
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County     Autauga    Democrat     Stevenson      1000          500 
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County     Autauga    Independent  Uncommitted    1000            0 
----------------------------------------------------------------------------------------------------
# and etc. for other counties in example (Baldwin, Barbour, etc)

As you can see, I pretty much want three observations per county, where candidates are all in one column, as well as their respective votes in another (CanVotes, or the like).

I have tried using things like pivot_longer() or spread(), but I am having a hard time visualizing these in code. Any help here would be greatly appreciated in sort of reorienting my data to get a candidate column, but also moving the rest of the data with it!

PageSim
  • 143
  • 1
  • 1
  • 8

2 Answers2

1

Here is a solution that first uses pivot_longer to bring the Votes into a long format. Then I use mutate with case_when to substitute the former column names with the actual candidate names and delete the single candidate columns:

long_table <- pivot_longer(test.df,
                           cols = c(RepVotes, DemVotes, ThirdVotes),
                           names_to = "pty_n",
                           values_to = "CanVotes") %>% 
  mutate(can = case_when(
    pty_n == "RepVotes" ~ RepCandidate,
    pty_n == "DemVotes" ~ DemCandidate,
    pty_n == "ThirdVotes" ~ ThirdCandidate
  ),
  pty_n = case_when(
    pty_n == "RepVotes" ~ "Republican",
    pty_n == "DemVotes" ~ "Democrat",
    pty_n == "ThirdVotes" ~ "Independent"
  )) %>% 
  select(-c(RepCandidate, DemCandidate, ThirdCandidate))
# A tibble: 9 x 12
     yr    mn sub     unit_type unit_name TotalVotes RepVotesTotalPerc DemVotesTotalPerc ThirdVotesTotalPe~ pty_n      CanVotes can       
  <dbl> <dbl> <chr>   <chr>     <chr>          <dbl>             <dbl>             <dbl>              <dbl> <chr>         <dbl> <chr>     
1  1956    11 Alabama County    Autauga         1000                50                50                  0 Republican      500 Eisenhower
2  1956    11 Alabama County    Autauga         1000                50                50                  0 Democrat        500 Stevenson 
3  1956    11 Alabama County    Autauga         1000                50                50                  0 Independe~        0 Uncommitt~
4  1956    11 Alabama County    Baldwin         2000                50                50                  0 Republican     1000 Eisenhower
5  1956    11 Alabama County    Baldwin         2000                50                50                  0 Democrat       1000 Stevenson 
6  1956    11 Alabama County    Baldwin         2000                50                50                  0 Independe~        0 Uncommitt~
7  1956    11 Alabama County    Barbour         3000                50                50                  0 Republican     1500 Eisenhower
8  1956    11 Alabama County    Barbour         3000                50                50                  0 Democrat       1500 Stevenson 
9  1956    11 Alabama County    Barbour         3000                50                50                  0 Independe~        0 Uncommitt~

I tried to build a custom spec, but it seems that the names have to be derived from the column names and can't be directly conditional on other columns.

starja
  • 9,887
  • 1
  • 13
  • 28
  • Thank you for drawing my attention to `case_when()`! Exactly what I was looking for! – PageSim Jun 08 '20 at 18:57
  • Any way you can think of to edit the answer, using @Wimpel 's method to maintain parties? I am thinking `parties <- gsub( "Candidate", "", grep( "^.*Candidate$", names(df), value = TRUE ) )`, but unsure where to include this in a pipe-able answer, since I want to maintain a column for party names. – PageSim Jun 08 '20 at 19:20
  • I am thinking ```mutate(RepCandidate=paste(RepCandidate,"-Rep"), DemCandidate=paste(DemCandidate,"-Dem"), ThirdCandidate=paste(ThirdCandidate,"-",ThirdParty)) %>% ```, then separate by `-` to get a party column before running your code? – PageSim Jun 08 '20 at 19:25
  • 1
    You can also use fixed names with `case_when`. I think, here it's again the easiest solution to get clean party names. – starja Jun 08 '20 at 19:36
  • This mostly works. I will edit slightly because third party candidates have specific parties. Thanks again! You're the bomb! – PageSim Jun 08 '20 at 19:46
1

Here is a data.table go at things

library( data.table )
#convert data to the data.table-format
setDT( test.df )
#get the different paries to update the variable balter in
parties <- gsub( "Candidate", "", grep( "^.*Candidate$", names( test.df ), value = TRUE ) )
#melt to each candidate and his/her votes
DT.melt <- melt(test.df, 
                id.vars = c("yr", "mn", "sub", "unit_type", "unit_name"),
                measure.vars = patterns( can = "^.*Candidate$",
                                         canVotes = "^(Rep|Dem|Third)Votes$" ),
                variable.name = "pty_n"
                )
#get the totals from the original date (by unit_name) through joining
DT.melt[ test.df, TotalVotes := i.TotalVotes, on = .(unit_name)]
#and pass the correct party name to the pty_n column
DT.melt[, pty_n := parties[ pty_n ] ][]

#      yr mn     sub unit_type unit_name pty_n         can canVotes TotalVotes
# 1: 1956 11 Alabama    County   Autauga   Rep  Eisenhower      500       1000
# 2: 1956 11 Alabama    County   Baldwin   Rep  Eisenhower     1000       2000
# 3: 1956 11 Alabama    County   Barbour   Rep  Eisenhower     1500       3000
# 4: 1956 11 Alabama    County   Autauga   Dem   Stevenson      500       1000
# 5: 1956 11 Alabama    County   Baldwin   Dem   Stevenson     1000       2000
# 6: 1956 11 Alabama    County   Barbour   Dem   Stevenson     1500       3000
# 7: 1956 11 Alabama    County   Autauga Third Uncommitted        0       1000
# 8: 1956 11 Alabama    County   Baldwin Third Uncommitted        0       2000
# 9: 1956 11 Alabama    County   Barbour Third Uncommitted        0       3000
Wimpel
  • 26,031
  • 1
  • 20
  • 37
  • Another excellent answer. Potentially a bit better because it maintains the party name column. What is `setDT` doing here? Is that avoiding the depreciated `melt` in the `library(data.table)`? – PageSim Jun 08 '20 at 19:16
  • Perhaps, we can use `mutate(RepCandidate=paste(RepCandidate,"-Rep"), DemCandidate=paste(DemCandidate,"-Dem"), ThirdCandidate=paste(ThirdCandidate,"-",ThirdParty)) %>%`, then `separate(can,c("can","pty_n"),"-") %>%` after @starja 's code to maintain party names? Excellent help, again. Cannot say thank you enough! – PageSim Jun 08 '20 at 19:28
  • 1
    `setDT()` is converting the data.frame/tibble to a data.table, so that use all the goodness of the `data.table`-package ;-) – Wimpel Jun 08 '20 at 19:31