
I've been working on something for a while now and still haven't figured out how to get it to work in my preferred way. Hoping someone can help me:

I have a data frame containing lots of data (5000+ observations) about city budgets, so one of the variables is, naturally, 'city'. I also have a separate list of 40 cities. For each city name in the data frame, I want to check whether it is also on that list and code it 1 if so, 0 otherwise. Below is an example with a smaller dataset:

city <- c(rep("city_a", 8), rep("city_b", 5), rep("city_c", 4), rep("city_d", 7),
          rep("city_e", 3), rep("city_f", 9), rep("city_g", 4))
school <- c(1:8, 1:5, 1:4, 1:7, 1:3, 1:9, 1:4)
df <- data.frame(city, school)

seperate_list <- tolower("City_A, City_B, City_E, City_G")
seperate_list <- gsub('[,]', '', seperate_list)
seperate_list <- strsplit(seperate_list, " ")[[1]]

Note: you may ask why I built the second part like that. My dataset is much larger, and I wanted to make the process more automatic, e.g. so I wouldn't have to manually delete all the commas and separate the city names from one another. Now that I have df and seperate_list, I want to combine them in df by adding a third column that specifies whether (1) or not (0) each city is in the separate list. I've tried a for loop and also lapply, but with no luck, since I'm not very skilled with either of those yet.
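As a side note on the parsing step: the three lines above can be collapsed into one by splitting on a comma followed by optional whitespace. This is a minimal sketch, assuming the city names themselves never contain commas; `raw_cities` is a hypothetical name for the pasted string. Unlike removing commas and then splitting on every space, it also keeps multi-word names such as "new york" intact.

```r
# Hypothetical raw input: one comma-separated string of city names
raw_cities <- "City_A, City_B, City_E, City_G"

# Lower-case, split on a comma plus any following whitespace,
# then trim stray spaces left at either end of the string
seperate_list <- trimws(strsplit(tolower(raw_cities), ",\\s*")[[1]])

print(seperate_list)
#> [1] "city_a" "city_b" "city_e" "city_g"
```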

I would appreciate a hint, so I can sort of figure it out myself!

Demi
  • What is the expected result? – Bernhard Sep 09 '21 at 12:24
  • Thanks for commenting! I described the expected result instead of showing it; maybe that was a bit unclear. Anyway, the solution provided by @danlooo was what I was looking for. – Demi Sep 09 '21 at 12:51
  • It was unclear to me whether "I want to combine them" meant more than just the additional column. Your intention was clear, however, the moment you accepted @danlooo's answer. – Bernhard Sep 09 '21 at 13:22
  • Note the spelling of *separate*. – G. Grothendieck Sep 09 '21 at 14:10
  • Thanks G., it was a long day at work for me, judging by the amount of typos in this post :) – Demi Sep 10 '21 at 09:35

1 Answer

library(tidyverse)

city <- c(rep("city_a", 8), rep("city_b", 5), rep("city_c", 4), rep("city_d", 7), 
          rep("city_e", 3), rep("city_f", 9), rep("city_g", 4)) 
school <- c(1:8, 1:5, 1:4, 1:7,1:3, 1:9, 1:4)
df <- data.frame(city, school)

seperate_list <- tolower("City_A, City_B, City_E, City_G")
seperate_list <- gsub('[,]', '', seperate_list)
seperate_list <- strsplit(seperate_list, " ")[[1]]


df %>%
  mutate(
    in_list = city %in% seperate_list
  ) %>%
  as_tibble()
#> # A tibble: 40 x 3
#>    city   school in_list
#>    <chr>   <int> <lgl>  
#>  1 city_a      1 TRUE   
#>  2 city_a      2 TRUE   
#>  3 city_a      3 TRUE   
#>  4 city_a      4 TRUE   
#>  5 city_a      5 TRUE   
#>  6 city_a      6 TRUE   
#>  7 city_a      7 TRUE   
#>  8 city_a      8 TRUE   
#>  9 city_b      1 TRUE   
#> 10 city_b      2 TRUE   
#> # … with 30 more rows

Created on 2021-09-09 by the reprex package (v2.0.1)
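The question asked for a 1/0 code rather than TRUE/FALSE. A minimal base-R sketch (no tidyverse needed) converts the logical result with `as.integer()`. For comparison only, it also shows an explicit `for` loop of the kind the question mentions; the column names `in_list` and `in_list_loop` are illustrative.

```r
city <- c(rep("city_a", 8), rep("city_b", 5), rep("city_c", 4), rep("city_d", 7),
          rep("city_e", 3), rep("city_f", 9), rep("city_g", 4))
school <- c(1:8, 1:5, 1:4, 1:7, 1:3, 1:9, 1:4)
df <- data.frame(city, school)
seperate_list <- c("city_a", "city_b", "city_e", "city_g")

# Vectorised: %in% gives TRUE/FALSE per row; as.integer() turns that into 1/0
df$in_list <- as.integer(df$city %in% seperate_list)

# Equivalent explicit loop, shown only for comparison (slower on large data)
df$in_list_loop <- 0L
for (i in seq_len(nrow(df))) {
  if (df$city[i] %in% seperate_list) {
    df$in_list_loop[i] <- 1L
  }
}

# Both approaches should produce the same column
stopifnot(identical(df$in_list, df$in_list_loop))
head(df)
```

The single vectorised line does all the work of the loop, which is why `%in%` (or the `mutate()` version above) is the usual idiom.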

I think you might also look into joining tables, storing the list of interest as a column of another table. This is exactly what databases and relational algebra are made for.
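A minimal base-R sketch of the join idea, assuming the list is stored as a one-column lookup table flagged with 1 (the names `lookup` and `joined` are hypothetical). `merge(..., all.x = TRUE)` performs a left join, so cities missing from the lookup come back as `NA`, which is then recoded to 0.

```r
city <- c(rep("city_a", 8), rep("city_b", 5), rep("city_c", 4), rep("city_d", 7),
          rep("city_e", 3), rep("city_f", 9), rep("city_g", 4))
school <- c(1:8, 1:5, 1:4, 1:7, 1:3, 1:9, 1:4)
df <- data.frame(city, school)

# Lookup table: one row per city of interest, flagged with 1
lookup <- data.frame(city = c("city_a", "city_b", "city_e", "city_g"),
                     in_list = 1L)

# Left join: keep every row of df, add in_list where the city matches
joined <- merge(df, lookup, by = "city", all.x = TRUE)

# Cities absent from the lookup get NA from the join; code them 0
joined$in_list[is.na(joined$in_list)] <- 0L

head(joined)
```

Two things to watch: `merge()` sorts the result by the key column by default, so row order may differ from `df`, and the lookup must not contain duplicate cities or matching rows will be duplicated. With the tidyverse loaded, `dplyr::left_join(df, lookup, by = "city")` performs the same left join while keeping the row order of `df`.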

danlooo