I am trying to split the strings shown below into 3 separate columns (Country, City, Count) in R, so that each row looks like this:
Country  City   Count
Japan    Tokyo  361
The data:
"country=Japan&city=Tokyo","361"
"country=Spain&city=Barcelona","359"
"country=United Kingdom&city=London","333"
"country=Japan&city=Fukuoka","259"
"country=United States of America&city=New York City","223"
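To make the example reproducible without `file.choose()`, the same five lines can be read from an inline string. This is a minimal base R sketch; the object name `raw` is only for illustration.

```r
# The sample data from above as an inline string (two quoted, comma-separated fields)
raw <- '"country=Japan&city=Tokyo","361"
"country=Spain&city=Barcelona","359"
"country=United Kingdom&city=London","333"
"country=Japan&city=Fukuoka","259"
"country=United States of America&city=New York City","223"'

# Read both fields as character columns V1 and V2, like the read.table() call below
df <- read.csv(text = raw, header = FALSE, colClasses = c("character", "character"))
str(df)
```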
I've tried this:
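library(data.table)
library(stringr)
# Read the two quoted, comma-separated fields as character columns V1 and V2
df <- read.table(file.choose(), header = FALSE, sep = ",", colClasses = c('character', 'character'), na.strings = 'null')
# Keep only the query-string column
df.1 <- data.table(str = as.character(df$V1))
# For rows matching the pattern, add country and city columns by reference (other rows get NA)
df.2 <- df.1[grepl("country=.+&city=\\w+", str),
             `:=`(country = str_extract(str, "(?<=country=)(.+)"),
                  city    = str_extract(str, "(?<=city=)(.+)"))]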
With this code the city column comes out the way I want, but the country column returns values like this:
Japan&city=Tokyo
I would like to remove the &city=Tokyo part so that only the country name remains.
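The country column over-matches because `.+` is greedy and runs to the end of the string. One possible fix, sketched here under the assumption that country names never contain `&`, is to match only non-`&` characters after the lookbehind. The sketch uses base R's `regexpr()`/`regmatches()` with `perl = TRUE`; with stringr the equivalent pattern would be `str_extract(str, "(?<=country=)[^&]+")`.

```r
x <- c("country=Japan&city=Tokyo",
       "country=Spain&city=Barcelona",
       "country=United Kingdom&city=London",
       "country=Japan&city=Fukuoka",
       "country=United States of America&city=New York City")

# Greedy: ".+" consumes everything after "country=", including "&city=..."
greedy <- regmatches(x, regexpr("(?<=country=).+", x, perl = TRUE))

# Bounded: "[^&]+" stops at the first "&", leaving only the country name
country <- regmatches(x, regexpr("(?<=country=)[^&]+", x, perl = TRUE))

greedy[1]  # "Japan&city=Tokyo"
country    # "Japan" "Spain" "United Kingdom" "Japan" "United States of America"
```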
After that, I would merge df and df.2 so that each row keeps its count. However, I suspect there is a smarter way to do this.
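One way to avoid the merge step, sketched here in base R, is to parse both parts of the string with a single pattern containing two capture groups and build the result directly from the two columns that `read.table()` returns. Because Country, City and Count are all derived row by row from the same data frame, they stay aligned without a join. This assumes every string has exactly the shape `country=...&city=...`; `sub()` leaves non-matching strings unchanged, so real data may need a check first. The column names follow the desired output at the top.

```r
# Two-column input as returned by read.table(..., header = FALSE): V1 = query string, V2 = count
df <- data.frame(
  V1 = c("country=Japan&city=Tokyo",
         "country=Spain&city=Barcelona",
         "country=United Kingdom&city=London",
         "country=Japan&city=Fukuoka",
         "country=United States of America&city=New York City"),
  V2 = c("361", "359", "333", "259", "223"),
  stringsAsFactors = FALSE
)

# Group 1 = country (no "&"), group 2 = city (rest of the string)
pattern <- "^country=([^&]+)&city=(.+)$"

result <- data.frame(
  Country = sub(pattern, "\\1", df$V1),
  City    = sub(pattern, "\\2", df$V1),
  Count   = as.integer(df$V2),  # counts were read as character
  stringsAsFactors = FALSE
)
result
```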
Please share your knowledge; I appreciate your help.