I am exploring how to extract the country name from author affiliations in PubMed articles. My sample data looks like this:
Mechanical and Production Engineering Department, National University of Singapore.
Cancer Research Campaign Mammalian Cell DNA Repair Group, Department of Zoology, Cambridge, U.K.
Cancer Research Campaign Mammalian Cell DNA Repair Group, Department of Zoology, Cambridge, UK.
Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN 46285.
Initially I removed the punctuation, split each affiliation into words, and compared the words with a list of country names from Wikipedia, but this was not successful.
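A minimal sketch of the word-matching approach described above, in base R. The `countries` vector is a hypothetical stand-in for the Wikipedia list (assumed to hold full country names only) and `match_by_word` is an illustrative name, so this is a reconstruction under those assumptions rather than the exact code that was tried.

```r
# Sample affiliations copied from the question.
affiliations <- c(
  "Mechanical and Production Engineering Department, National University of Singapore.",
  "Cancer Research Campaign Mammalian Cell DNA Repair Group, Department of Zoology, Cambridge, U.K.",
  "Cancer Research Campaign Mammalian Cell DNA Repair Group, Department of Zoology, Cambridge, UK.",
  "Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN 46285."
)

# Hypothetical stand-in for the Wikipedia country list
# (assumption: it contains full country names only).
countries <- c("Singapore", "United Kingdom", "United States")

# Strip punctuation, split on whitespace, and return the first word
# that appears in the country list (NA when no word matches).
match_by_word <- function(x, countries) {
  words <- strsplit(gsub("[[:punct:]]", "", x), "[[:space:]]+")
  vapply(words, function(w) {
    hit <- w[w %in% countries]
    if (length(hit) > 0) hit[1] else NA_character_
  }, character(1))
}

result <- match_by_word(affiliations, countries)
print(result)
```

Under these assumptions only the first affiliation yields a match. Stripping punctuation turns `U.K.` into `UK`, which is not a full country name; a multi-word name such as `United Kingdom` can never equal a single word; and the last affiliation gives only a US state abbreviation and ZIP code, with no country at all.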
Can anyone suggest a better way of doing this? I would prefer a solution in R, as I have to do further analysis and generate graphics in R.