
I usually use dplyr to filter data. I now have a huge dataset (62176 entries) of banks operating in different countries. I'd like to subset/filter that dataset for Eurozone banks only.

I haven't found any approach other than typing out the names of all the Eurozone countries and then creating a new dataset with filter.

Is there any workaround for this problem?

Thank you!

chrtpmdr
  • Could you show small parts of your dataset so that other users can try alternative ways? – Abdur Rohman Mar 13 '22 at 14:42
  • It depends on the structure of your dataset. If you have only a column with country names and none identifying the currency, yes. Alternatively, you can search for a dataset containing the country names + currency and filter by the EUR ones. This way you won't have to paste the names. – Fla28 Mar 13 '22 at 14:44
  • Please post some data and examples of code you have tried that did not work. Without this it's very difficult to understand the issue and provide solutions. – Vinay Mar 13 '22 at 15:21
  • chrtpmdr, your recent trend of questions of the last year have all been lacking in reproducibility: providing a picture of data (other questions) assumes we are willing to painstakingly transcribe data you already have; and providing no clue requires us to guess/assume, and likely *make up* our own fake data to demonstrate a process. Both of those show an apparent lack of effort on your part, and I for one (perhaps others too) would much prefer that at a minimum you meet us halfway. If you are not willing to go that far, what is my motivation? – r2evans Mar 13 '22 at 15:27
  • Please see https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info for great discussions about improving questions to be more reproducible. Most of the techniques take minimal effort on your part and go very far in helping us help you. Thanks! – r2evans Mar 13 '22 at 15:27
  • Isn't it a 2-minute task to quickly google a table of Eurozone countries, paste that into R and then filter on this country vector? Admittedly, it might be a little annoying if the country names don't match due to different terminology, e.g. UK vs GB, but it's writing roughly 20 countries, one time. Asking this question probably took longer. :D – deschen Mar 13 '22 at 15:28

2 Answers

Without the data we can't give you a definitive answer. However, based on my understanding of the problem, here are some methods.

  1. Assuming your dataset already has a column holding each bank's operating country, you could create a vector of the countries you are interested in by hand and then filter the dataset for rows that match.
library(dplyr)

#manually assign countries to a vector (spelling must match how the countries are listed in your data)
#only a few Eurozone members are shown here; extend the vector with the rest
euro_countries <- c("Germany","France","Italy","Spain")

#then filter the dataset for matching rows; the column names are made up as I don't know your data
dataframe %>% filter(op_country %in% euro_countries)
  2. Alternatively, depending on your dataset, you can use the very helpful countrycode library in R. It ships an existing table, countrycode::codelist, that you can join to your dataset on the country column, and then use the codelist continent column to filter for countries in "Europe". Note that this selects all European countries, which is broader than the Eurozone, so you would still need to narrow the result (for example with the vector from method 1).
#join your dataset with the codelist table; the join key depends on the country column in your dataset
dataframe <- left_join(x=dataframe,y=countrycode::codelist,by=c("op_country"="country.name.en"))

#filter your dataset with the new column (keeps all of Europe, not only Eurozone members)
dataframe %>% filter(continent=="Europe")
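Since the continent column cannot separate Eurozone members from other European countries, a currency lookup, as suggested in the comments, may be closer to what you need. Here is a minimal sketch of that idea. The `banks` data, the `op_country` column and the hand-made `currencies` table are all made up for illustration; in practice you would replace the lookup with a complete country-to-currency table.

```r
library(dplyr)

# Hypothetical toy data; the column names are assumptions about your dataset
banks <- tibble(
  bank       = c("Bank A", "Bank B", "Bank C", "Bank D"),
  op_country = c("Germany", "Poland", "France", "United Kingdom")
)

# Hand-made lookup table (illustrative only): country name -> ISO 4217 currency code
currencies <- tibble(
  op_country = c("Germany", "France", "Poland", "United Kingdom"),
  currency   = c("EUR", "EUR", "PLN", "GBP")
)

# Attach the currency to each bank, then keep only the euro rows
euro_banks <- banks %>%
  left_join(currencies, by = "op_country") %>%
  filter(currency == "EUR")

# Rows whose country found no match in the lookup; check these for spelling differences
unmatched <- banks %>% anti_join(currencies, by = "op_country")
```

Checking `unmatched` after the join is a cheap way to catch naming differences (e.g. UK vs GB) before they silently drop banks from the result.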
alejandro_hagan

A programmatic approach, assuming the full dataset is a dataframe with "country" as a column heading:

  1. Manually create a comma-delimited text file, eurocountries.txt, of all Eurozone country names, using the naming conventions in your dataset. Place the file in the R working directory.

  2. run the following R code:

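     library(tidyverse)
     # read the comma-delimited file and split it into a character vector of names
     Eurolist <- read_file("eurocountries.txt") %>% str_split(",") %>% unlist() %>% str_trim()
     Eurolist  # check the content
     Eurocountry.dataset <- dataset %>% filter(country %in% Eurolist)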
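
To try the two steps above end to end without touching your real data, here is a small self-contained sketch. It writes a stand-in for eurocountries.txt to a temporary file and uses base R's `scan()` instead of `read_file()`. The file contents, the toy `dataset` and its column names are assumptions made for the example, and the country list is deliberately incomplete.

```r
library(dplyr)

# Stand-in for eurocountries.txt, written to a temporary file for the example
path <- tempfile(fileext = ".txt")
writeLines("Germany, France, Italy, Spain", path)

# Read the comma-delimited names into a character vector, trimming spaces
euro_list <- scan(path, what = character(), sep = ",",
                  strip.white = TRUE, quiet = TRUE)

# Hypothetical toy dataset with a "country" column
dataset <- tibble(
  bank    = c("Bank A", "Bank B", "Bank C"),
  country = c("Germany", "Poland", "Spain")
)

# Keep only the banks whose country appears in the list
euro_dataset <- dataset %>% filter(country %in% euro_list)
```

Keeping the names in a file rather than in the script means the list can be updated when Eurozone membership changes without editing the code.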
    
GGAnderson