I want to make a world map with ggplot as follows:
library(ggplot2)
library(countrycode)
library(dplyr)
library(tidyverse)
worldmap_AMIS_Market <- map_data("world")
# Set colors
vec_AMIS_Market <- c("Canada", "China","United States of America", "Republic of Korea", "Russian Federation")
worldmap_AMIS_Market <- mutate(worldmap_AMIS_Market, fill = ifelse(region %in% vec_AMIS_Market, "green", "lightgrey"))
# Use scale_fill_identity to set correct colors
ggplot(worldmap_AMIS_Market, aes(long, lat, fill = fill, group=group)) +
geom_polygon(colour="gray") +
ggtitle("Availability of AMIS Supply and Demand Data - Monthly") +
scale_fill_identity()
As you can see, the US is not filled in green, because in the worldmap_AMIS_Market data the US is written as USA, while the vector uses United States of America. The same goes for Russia and South Korea. As I am going to go through this process for around 50 different datasets, I would prefer not to manually correct all the countries that do not match.
Is there any way to solve issues like this? I have a couple of ideas, but not an actual solution:
- I could do fuzzy matching, but that won't work for USA -> United States.
- I know the package countrycode can convert country names to ISO codes etc., but I don't think it has the option to correct country names (https://github.com/vincentarelbundock/countrycode).
- I could somehow collect all alternative naming conventions for all countries, and then do a fuzzy match on that. But I don't know where to get the alternative names from, and I am not sure I would be able to write the fuzzy-matching code for this scenario either.
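For the second idea, this is roughly what I have in mind: instead of correcting the names, convert both sides to ISO3 codes and compare those. It is only a sketch. It assumes that the "country.name" origin in countrycode recognises variants such as USA and United States of America, which I have not checked for all of my countries. The map_regions vector is a made-up stand-in for the region column of map_data("world").

```r
library(countrycode)

# Stand-in for unique(map_data("world")$region) (hypothetical subset)
map_regions <- c("USA", "Russia", "South Korea", "Canada", "China", "France")

# Names as they appear in my own data
amis_names <- c("Canada", "China", "United States of America",
                "Republic of Korea", "Russian Federation")

# Convert both naming conventions to ISO3 codes.
# warn = FALSE silences warnings for names that cannot be matched (they become NA).
map_iso  <- countrycode(map_regions, origin = "country.name",
                        destination = "iso3c", warn = FALSE)
amis_iso <- countrycode(amis_names, origin = "country.name",
                        destination = "iso3c", warn = FALSE)

# Compare on the codes instead of the raw names; unmatched regions stay grey
fill <- ifelse(!is.na(map_iso) & map_iso %in% amis_iso, "green", "lightgrey")
```

Some regions in the map data would probably not convert to a code at all and end up as NA, so I would still need a way to check which ones were left unmatched.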
Could someone perhaps help me fix this?