This is something I can do easily in Excel, but I'm being confounded by R.
I would like to assign country names to a long list of strings ("affiliation"):
c("Department of Psychiatry and Behavioural Sciences, University College London Medical School, UK.",
"", "Ty Dewi Sant School of Nursing, University Hospital of Wales, College of Medicine, Cardiff.",
"University of Massachusetts Medical Center.", "Older Women's League.",
"Kimberly Quality Care, Boston, MA.", "Michaux Manor Living Center, Fayetteville, PA.",
"Florida Diagnostic and Learning Resources System, University of South Florida, Tampa 33613.",
"", "Bigel Institute for Health Policy, Brandeis University, Waltham, MA.",
"", "York Health Authority.", "Southern Illinois University, Edwardsville.",
"St. Joseph's Hospital, Memphis, TN.", "Long Term Home Care of the Frail Elderly Foundation, New York City.",
"Catholic University of America, Washington, DC.", "Mercy Health Center, Oklahoma City, OK.",
"", "Visiting Nurse Service of New York.", "RespiteCare Center, Evanston, IL.",
"Camden and Islington HA.", "National Advisory Council on Aging.",
"Visiting Nurse Service of New York.", "American Health Care Association, Washington, DC.",
"HealthCare Partners Medical Group, Los Angeles, CA 90015, USA.",
"Tad Publishing Company, Peoria, IL, USA.", "Child Health Investment Partnership, Roanoke, VA, USA.",
"School of Public Health, State University of New York, Albany 12237, USA.",
"Bundoora Extended Care Centre.", "", "", "Family Respite Center, Falls Church, VA, USA.",
"", "University of Victoria.", "", "Homemaker Health Aide Service of the National Capital Area.",
"West Lambeth Health Authority, London SE1 7EH.", "Bon Secours Hospital/Villa Maria Nursing Center, North Miami, FL 33161.",
"Alzheimer's Disease and Related Disorders Association, Syracuse, NY.",
"Alzheimer's Association, Washington DC.", "South Carolina Commission on Aging, Columbia.",
"University of New Mexico College of Nursing.", "Department of Human Development and Family Studies, University of Alabama, Tuscaloosa.",
"Ballard Health Care Residence, Des Plaines, IL.", "Bowman Gray School of Medicine of Wake Forest University, Winston-Salem, NC.",
"Case Western Reserve University.", "School of Public and Environmental Administration, Indiana University, Indianapolis 46202.",
"Manor HealthCare Corp, Silver Spring, MD.", "Relationship Builders, Napa, CA.",
"", "", "Medical University of South Carolina, USA.", "Tokyo Metropolitan Institute of Gerontology, Itabashi, Japan. tatsuro@tmig.or.jp",
"Medical University of South Carolina, USA.", "Royal Hospital for Sick Children, Bristol.",
"Barefield, Ennis, Co. Clare., Ireland.", "North Georgia College, Dahlonega 30597, USA.",
"Institute for Psychology (I), University of Wurzburg, Germany.",
"Camborne Redruth Community Hospital, Cornwall, United Kingdom.",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "Institute of Child Health and Great Ormond Street Hospital for Children NHS Trust, London, UK.",
"Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada. carol.cohen@sunnybrook.on.ca",
"Boston University School of Social Work, MA 02215, USA.", "",
"Neurosciences Unit, General Infirmary at Leeds.", "", "", "School of Kang-Ning Junior College of Nursing, Nei-Hu, Taiwan, ROC.",
"College of Nursing, South Dakota State University, USA.", "Department of Geriatric Medicine, University of Manchester, UK.",
"Southern Illinois University, Department of Social Work, Edwardsville 62026-1450, USA.",
"Redlands Community College, El Reno, Oklahoma, USA.", "", "",
"Department of Geriatric Medicine, Alexandra Hospital, Singapore.",
"School of Nursing and Midwifery, Department of Gerontological and Continuing Care Nursing, University of Sheffield, Sheffield, England. Liz.hanson@act.shef.ac.uk",
"", "State University of New York, Health Science Center at Syracuse, 13210, USA. HAMR@mailbox.hscsyr.edu",
"Div. of Active Palliative Care, Todachuo General Hospital.",
"Children and Young People's Kidney Unit, Nottingham City Hospital, NHS Trust, UK.",
"School of Nursing & Midwifery, Department of Gerontological & Continuing Care Nursing, University of Sheffield. liz.hanson@act.shef.ac.uk",
"Harrington Memorial Hospital, Southbridge, MA, USA.", "", "Department of Curriculum and Instruction, Iowa State University, Ames, 50011. USA.",
"Children & Young People's Kidney Unit, Nottingham City Hospital, U.K.",
"School of Social Work, Boston University, MA 02215, USA. freedman@bu.edu",
"Royal Free Hospital, London, UK.", "Humboldt State University, Department of Nursing, Arcata, CA, USA.",
"Department of Psychiatry, The University of Queensland, Mental Health Centre, Royal Brisbane Hospital, Herston, Australia. davidk@psychiatry.uq.edu.au",
"Centre for Evidence Based Nursing, University of York, Heslington, York, Nth Yorkshire, UK, YO1 5DG. cat4@york.ac.uk",
"School of Nursing, University of British Columbia, Vancouver. magenta@bc.sympatico.ca",
"Medisinsk avdeling, Lovisenberg Diakonale Sykehus, Oslo.", "School of Nursing, Yale University, USA.",
"Centre de la Mémoire, Hôpital Roger Salengro, Centre Hospitalier Universitaire, Lille.",
"University of Ulster and Eastern Health and Social Services Board, Ulster, Northern Ireland. r.mcconkey@ulst.ac.uk",
"Thames Valley Family Practice Research Unit, Department of Family Medicine's Centre for Studies in Family Medicine, University of Western Ontario (UWO), London. jbbrown@julian.uwo.ca",
"", "", "Department of Special Education, University of Nijmegen, The Netherlands. A.Hendriks@ped.kun.nl",
"European Institute of Health and Medical Sciences, University of Surrey, Guildford, England.",
"California State University School of Nursing, Chico, USA.")
Each string may or may not contain a substring referring to a location, which in turn may identify a country. The intended output is a data frame as follows:
Affiliation[1], matchedCountry
Affiliation[2], matchedCountry
...
Affiliation[n], matchedCountry
"matchedCountry" is meant to be determined by matching each string against several lookup lists (universities, UK cities, US states, etc.); NA is allowed where nothing matches. Some of these lists return only ISO codes rather than country names.
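To make the intended matching concrete, here is a minimal base R sketch of the idea, not the solution from the answer section. The `lookup` table, its patterns, and the `match_country()` helper are all hypothetical and only illustrate the shape of the problem: each row holds one regular expression and the country it implies, and earlier rows take priority over later ones.

```r
# A few affiliations taken from the list above, plus an empty string.
affiliation <- c(
  "Royal Free Hospital, London, UK.",
  "Kimberly Quality Care, Boston, MA.",
  "Tokyo Metropolitan Institute of Gerontology, Itabashi, Japan. tatsuro@tmig.or.jp",
  "Older Women's League.",
  ""
)

# Hypothetical lookup table (an assumption, far from complete):
# explicit country names first, then US state codes, then UK cities.
lookup <- data.frame(
  pattern = c(
    "\\b(UK|United Kingdom|England)\\b|U\\.K\\.",
    "\\bUSA\\b",
    "\\bJapan\\b",
    ", (MA|PA|TN|IL|CA|NY)\\b",
    "\\b(London|Cardiff|Bristol)\\b"
  ),
  country = c("United Kingdom", "United States", "Japan",
              "United States", "United Kingdom"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the country of the first matching pattern,
# or NA when no pattern matches.
match_country <- function(x, lookup) {
  out <- rep(NA_character_, length(x))
  for (i in seq_len(nrow(lookup))) {
    # Only fill entries that no earlier (higher-priority) pattern matched.
    hit <- is.na(out) & grepl(lookup$pattern[i], x, perl = TRUE)
    out[hit] <- lookup$country[i]
  }
  out
}

result <- data.frame(
  affiliation = affiliation,
  matchedCountry = match_country(affiliation, lookup),
  stringsAsFactors = FALSE
)
```

Because `grepl()` is vectorised over the affiliations, the loop runs once per pattern rather than once per string, which matters when the affiliation list is long and the lookup lists are comparatively short. Lists that yield only ISO codes could be placed in the same `country` column and translated to names in a separate step afterwards.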
Based on the feedback thus far (thanks @rbm), I have managed a solution (see answer section) that does the job quite well. That said, I am sure performance could still be improved. Thanks.
References:
- Simultaneously merge multiple data.frames in a list
- R grepl: quickly match multiple strings against multiple substrings, returning all matches
- R grep: Match one string against multiple patterns
- Speedy test on R data frame to see if row values in one column are inside another column in the data frame
- Extract & combine multiple substrings using multiple patterns from some but not all strings contained in list & return to list in R
- How to detect substrings from multiple lists within a string in R