Combining Rows in R with Pivot or Spread?

Question

Here, I am manipulating election data, and the current data is in the following format. Both a visual and coded example are included (while visual is a bit condensed). Moreover, values have been edited from their originals.

# Representative Example
library(tidyverse)
test.df <- tibble(yr=rep(1956),mn=rep(11),
             sub=rep("Alabama"),
             unit_type=rep("County"),
             unit_name=c("Autauga","Baldwin","Barbour"),
             TotalVotes=c(1000,2000,3000),
             RepVotes=c(500,1000,1500),
             RepCandidate=rep("Eisenhower"),
             DemVotes=c(500,1000,1500),
             DemCandidate=rep("Stevenson"),
             ThirdVotes=c(0,0,0),
             ThirdCandidate=rep("Uncommitted"),
             RepVotesTotalPerc=rep(50.00),
             DemVotesTotalPerc=rep(50.00),
             ThirdVotesTotalPerc=rep(0.00)
             )
----------------------------------------------------------------------------------------------------
yr   | mn  | sub  | unit_type | unit_name | TotalVotes | RepVotes |  RepCan  | DemVotes | DemCan 
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County    Autauga        1000        500      EisenHower   500     Stevenson
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County    Baldwin        2000       1000      EisenHower   1000    Stevenson
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County    Barbour        3000       2000      EisenHower   2000    Stevenson
----------------------------------------------------------------------------------------------------

I am trying to get a table that looks like the following:

----------------------------------------------------------------------------------------------------
yr   | mn  | sub  | unit_type | unit_name |   pty_n   |   can    |   TotalVotes   | CanVotes
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County     Autauga    Republican   Eisenhower     1000          500 
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County     Autauga    Democrat     Stevenson      1000          500 
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County     Autauga    Independent  Uncommitted    1000            0 
----------------------------------------------------------------------------------------------------
# and etc. for other counties in example (Baldwin, Barbour, etc)

As you can see, I pretty much want three observations per county, where candidates are all in one column, as well as their respective votes in another (CanVotes, or the like).

I have tried using things like pivot_longer() or spread(), but I am having a hard time visualizing these in code. Any help here would be greatly appreciated in sort of reorienting my data to get a candidate column, but also moving the rest of the data with it!

starja · Accepted Answer · 2020-06-08T19:36:03.377

Here is a solution that first uses pivot_longer to bring the Votes into a long format. Then I use mutate with case_when to substitute the former column names with the actual candidate names and delete the single candidate columns:

long_table <- pivot_longer(test.df,
                           cols = c(RepVotes, DemVotes, ThirdVotes),
                           names_to = "pty_n",
                           values_to = "CanVotes") %>% 
  mutate(can = case_when(
    pty_n == "RepVotes" ~ RepCandidate,
    pty_n == "DemVotes" ~ DemCandidate,
    pty_n == "ThirdVotes" ~ ThirdCandidate
  ),
  pty_n = case_when(
    pty_n == "RepVotes" ~ "Republican",
    pty_n == "DemVotes" ~ "Democrat",
    pty_n == "ThirdVotes" ~ "Independent"
  )) %>% 
  select(-c(RepCandidate, DemCandidate, ThirdCandidate))
# A tibble: 9 x 12
     yr    mn sub     unit_type unit_name TotalVotes RepVotesTotalPerc DemVotesTotalPerc ThirdVotesTotalPe~ pty_n      CanVotes can       
  <dbl> <dbl> <chr>   <chr>     <chr>          <dbl>             <dbl>             <dbl>              <dbl> <chr>         <dbl> <chr>     
1  1956    11 Alabama County    Autauga         1000                50                50                  0 Republican      500 Eisenhower
2  1956    11 Alabama County    Autauga         1000                50                50                  0 Democrat        500 Stevenson 
3  1956    11 Alabama County    Autauga         1000                50                50                  0 Independe~        0 Uncommitt~
4  1956    11 Alabama County    Baldwin         2000                50                50                  0 Republican     1000 Eisenhower
5  1956    11 Alabama County    Baldwin         2000                50                50                  0 Democrat       1000 Stevenson 
6  1956    11 Alabama County    Baldwin         2000                50                50                  0 Independe~        0 Uncommitt~
7  1956    11 Alabama County    Barbour         3000                50                50                  0 Republican     1500 Eisenhower
8  1956    11 Alabama County    Barbour         3000                50                50                  0 Democrat       1500 Stevenson 
9  1956    11 Alabama County    Barbour         3000                50                50                  0 Independe~        0 Uncommitt~

I tried to build a custom spec, but it seems that the names have to be derived from the column names and can't be directly conditional on other columns.

Thank you for drawing my attention to `case_when()`! Exactly what I was looking for! — PageSim, Jun 08 '20 at 18:57
Any way you can think of to edit the answer, using @Wimpel 's method to maintain parties? I am thinking `parties <- gsub( "Candidate", "", grep( "^.*Candidate$", names(df), value = TRUE ) )`, but unsure where to include this in a pipe-able answer, since I want to maintain a column for party names. — PageSim, Jun 08 '20 at 19:20
I am thinking ```mutate(RepCandidate=paste(RepCandidate,"-Rep"), DemCandidate=paste(DemCandidate,"-Dem"), ThirdCandidate=paste(ThirdCandidate,"-",ThirdParty)) %>% ```, then separate by `-` to get a party column before running your code? — PageSim, Jun 08 '20 at 19:25
You can also use fixed names with `case_when`. I think, here it's again the easiest solution to get clean party names. — starja, Jun 08 '20 at 19:36
This mostly works. I will edit slightly because third party candidates have specific parties. Thanks again! You're the bomb! — PageSim, Jun 08 '20 at 19:46

Wimpel · Answer 2 · 2020-06-08T19:32:23.647

Here is a data.table go at things

library( data.table )
#convert data to the data.table-format
setDT( test.df )
#get the different paries to update the variable balter in
parties <- gsub( "Candidate", "", grep( "^.*Candidate$", names( test.df ), value = TRUE ) )
#melt to each candidate and his/her votes
DT.melt <- melt(test.df, 
                id.vars = c("yr", "mn", "sub", "unit_type", "unit_name"),
                measure.vars = patterns( can = "^.*Candidate$",
                                         canVotes = "^(Rep|Dem|Third)Votes$" ),
                variable.name = "pty_n"
                )
#get the totals from the original date (by unit_name) through joining
DT.melt[ test.df, TotalVotes := i.TotalVotes, on = .(unit_name)]
#and pass the correct party name to the pty_n column
DT.melt[, pty_n := parties[ pty_n ] ][]

#      yr mn     sub unit_type unit_name pty_n         can canVotes TotalVotes
# 1: 1956 11 Alabama    County   Autauga   Rep  Eisenhower      500       1000
# 2: 1956 11 Alabama    County   Baldwin   Rep  Eisenhower     1000       2000
# 3: 1956 11 Alabama    County   Barbour   Rep  Eisenhower     1500       3000
# 4: 1956 11 Alabama    County   Autauga   Dem   Stevenson      500       1000
# 5: 1956 11 Alabama    County   Baldwin   Dem   Stevenson     1000       2000
# 6: 1956 11 Alabama    County   Barbour   Dem   Stevenson     1500       3000
# 7: 1956 11 Alabama    County   Autauga Third Uncommitted        0       1000
# 8: 1956 11 Alabama    County   Baldwin Third Uncommitted        0       2000
# 9: 1956 11 Alabama    County   Barbour Third Uncommitted        0       3000

Another excellent answer. Potentially a bit better because it maintains the party name column. What is `setDT` doing here? Is that avoiding the depreciated `melt` in the `library(data.table)`? — PageSim, Jun 08 '20 at 19:16
Perhaps, we can use `mutate(RepCandidate=paste(RepCandidate,"-Rep"), DemCandidate=paste(DemCandidate,"-Dem"), ThirdCandidate=paste(ThirdCandidate,"-",ThirdParty)) %>%`, then `separate(can,c("can","pty_n"),"-") %>%` after @starja 's code to maintain party names? Excellent help, again. Cannot say thank you enough! — PageSim, Jun 08 '20 at 19:28
`setDT()` is converting the data.frame/tibble to a data.table, so that use all the goodness of the `data.table`-package ;-) — Wimpel, Jun 08 '20 at 19:31

Combining Rows in R with Pivot or Spread?

2 Answers2