
Is there a function or package in R that extracts the relevant country and its corresponding continent from a string variable?

Each observation in the variable is coded like this:


df<- c("random,thing,thing, United States", "site, level, state, information, Sweden")

and so on. The country is always the last item, after the final "," delimiter. I want to extract that item, or use a function that pulls the country out of a string like that.
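A minimal base R sketch of that extraction, assuming the country is always whatever follows the last comma. The names `x` and `country` are illustrative and not from the original data:

```r
# Example strings in the same shape as the data described above
x <- c("random,thing,thing, United States",
       "site, level, state, information, Sweden")

# The greedy ".*," consumes everything up to and including the last comma;
# trimws() then strips any leading or trailing whitespace
country <- trimws(sub(".*,", "", x))
country
```

A string with no comma at all is returned unchanged (apart from trimming), so it may be worth checking for that case in real data.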

Then I would need to create another column with the relevant continent. So the final output would look like this:


df2 <- data.frame(Country = c("United States", "Sweden"), Continent = c("North America", "Europe"))

df2 does not need to follow that exact nomenclature, i.e. United States can be written as USA or United States of America. I just need some sort of function/package in R that extracts the country and gives me the continent.

Thank you so much!

Mr. Biggums
  • In the absence of a package, you could always [`separate()`](https://tidyr.tidyverse.org/reference/separate.html) that second string by `",\\s?"` , and so convert it into five columns, the last of which is `country`. You could then import a dataset of countries with their information (including `continent`), and then simply [`*_join()`](https://dplyr.tidyverse.org/reference/mutate-joins.html) on `country`. – Greg Feb 03 '22 at 23:44
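The split-and-join idea from this comment can be sketched in base R, with `strsplit()` and `merge()` standing in for `separate()` and `*_join()`. Because the example strings do not all contain the same number of items, this sketch takes the last piece of each split rather than assuming five columns. The `lookup` table is a hypothetical stand-in for a real country dataset:

```r
x <- c("random,thing,thing, United States",
       "site, level, state, information, Sweden")

# Split on a comma followed by optional whitespace, then keep the last piece
parts <- strsplit(x, ",\\s*")
df2 <- data.frame(Country = vapply(parts, function(p) p[length(p)], character(1)))

# Hypothetical lookup table; a real one would be imported from an external dataset
lookup <- data.frame(
  Country   = c("Sweden", "United States"),
  Continent = c("Europe", "North America")
)

# Left join: all.x = TRUE keeps countries without a match (Continent becomes NA)
result <- merge(df2, lookup, by = "Country", all.x = TRUE)
result
```

A plain join like this only matches exact spellings, so "USA" would come back with `NA` unless the lookup table lists that variant too.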

1 Answer


OK, both steps are possible: extract the country with sub(), then look up the continent with the countrycode package:

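library(dplyr)        # re-exports tibble()
library(countrycode)

df <- c("random,thing,thing, United States", "site, level, state, information, Sweden")

# Keep only the text after the last comma, dropping any whitespace that follows it
df2 <- tibble(Country = sub(".*,\\s*", "", df))

# Map each country name to its continent; "country.name" also matches common variants
df2$Continent <- countrycode(sourcevar = df2$Country,
                             origin = "country.name",
                             destination = "continent")
df2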
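
One caveat, as far as I can tell: the `continent` destination labels the United States as "Americas" rather than "North America". If the finer split matters, the `region` destination (World Bank regions) may be closer to what the question asks for, although it uses different labels for other parts of the world. A sketch, where `countries` is an illustrative vector that also includes the spelling variants mentioned in the question:

```r
library(countrycode)

countries <- c("United States", "USA", "United States of America", "Sweden")

# "country.name" matches by regular expression, so common spelling variants resolve
continent <- countrycode(countries, origin = "country.name", destination = "continent")

# World Bank regions, which separate North America from the rest of the Americas
region <- countrycode(countries, origin = "country.name", destination = "region")

data.frame(Country = countries, Continent = continent, Region = region)
```

Names that cannot be matched come back as `NA` with a warning, so it is worth scanning the result for missing values.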
Bloxx