end_result_tbl
This end_result_tbl is an example of the ideal format, taken from a different voter file.
ID GEN_16 GEN_14 GEN_08 PP_16 PR_16 PR_15 PR_14
0001 1 1 1 1 0 0 0
0002 0 0 0 0 1 0 1
0003 1 1 1 0 0 0 0
0004 1 0 1 0 0 0 1
0005 1 0 1 1 1 0 1
raw_data_tbl
ID Voter_History
0001 GE 20161108;20121106 GE;20081104 GE;20080205 PP;General Election 2004
0002 2016 GENERAL ELECTION;2014 GENERAL ELECTION
0003 20121106 GE;20081104 GE;General Election 2006
0004 GE 20150910
0005 16 GENERAL ELECTION; 14 PRIMARY ELECTION
I want to create one variable per election, based on conditional string matches against each voter's history string.
Each election appears in about 9 variations (different ways the same election is written). If any variation for an election is matched, a "1" is recorded to show a VOTE in that election; if none are matched, a "0" is recorded for a NO VOTE.
Below are the variations for the November 2016 General Election:
GEN_16 <- c("20161108 GE",
            "16 GENERAL ELECTION",
            "GENERAL 2016",
            "GENERAL ELECTION 2016",
            "2016 GENERAL ELECTION",
            "GENERAL ELECTION, 2016",
            "GE 20161108")
Here is what I have tried (attempting only 2016 General Election):
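# Attempt 1: stri_detect from the stringi package
library(stringi)
raw_data_tbl$GEN_16 <-
  as.integer(stri_detect(raw_data_tbl$Voter_History, GEN_16))
# Attempt 2: exact matching with %in%
which(GEN_16 %in% raw_data_tbl$Voter_History)
# Attempt 3: grep each variation, then summarise with dplyr
require(dplyr)
Sequences <- GEN_16
Database <- raw_data_tbl$Voter_History
df <- as.data.frame(sapply(Sequences, function(x) grep(x, Database)))
stats <- df %>% summarise_all(funs(sum))
cbind(Sequences, as.numeric(stats))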
This is actually quite simple, albeit very long, code in SQL, but I am finding it hard to work out the equivalent in R.
raw_data_tbl has about 17 million voters in it.
Any direction is much appreciated. Thanks in advance.
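For reference, here is a minimal sketch of the rule described above (1 if any variation of an election appears in Voter_History, otherwise 0), using only base R. The helper `flag_election` and the list `election_patterns` are hypothetical names introduced for illustration, and the `GEN_08` variations are an illustrative subset rather than a full list. It assumes literal, case-sensitive substring matching is what is wanted, so "16 GENERAL ELECTION" also matches inside "2016 GENERAL ELECTION".

```r
# Toy copy of the raw data shown in the question.
raw_data_tbl <- data.frame(
  ID = c("0001", "0002", "0003", "0004", "0005"),
  Voter_History = c(
    "GE 20161108;20121106 GE;20081104 GE;20080205 PP;General Election 2004",
    "2016 GENERAL ELECTION;2014 GENERAL ELECTION",
    "20121106 GE;20081104 GE;General Election 2006",
    "GE 20150910",
    "16 GENERAL ELECTION; 14 PRIMARY ELECTION"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical named list: one character vector of variations per election.
election_patterns <- list(
  GEN_16 = c("20161108 GE", "16 GENERAL ELECTION", "GENERAL 2016",
             "GENERAL ELECTION 2016", "2016 GENERAL ELECTION",
             "GENERAL ELECTION, 2016", "GE 20161108"),
  GEN_08 = c("20081104 GE", "GE 20081104")  # illustrative subset only
)

# Returns 1L where any variation occurs as a literal substring, else 0L.
flag_election <- function(history, variants) {
  # One logical vector per variation, each as long as `history`.
  hits <- lapply(variants, function(v) grepl(v, history, fixed = TRUE))
  # Combine element-wise with OR, then convert TRUE/FALSE to 1/0.
  as.integer(Reduce(`|`, hits))
}

# Add one 0/1 column per election.
for (nm in names(election_patterns)) {
  raw_data_tbl[[nm]] <- flag_election(raw_data_tbl$Voter_History,
                                      election_patterns[[nm]])
}

print(raw_data_tbl[, c("ID", "GEN_16", "GEN_08")])
```

Each `grepl` call is vectorised over the whole Voter_History column, so the loop runs once per variation rather than once per voter. Whether that is fast enough for 17 million rows has not been tested here.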