Text mining in R, reading every row for a yes/no answer

Question

I've been trying to find a way to use R to search a CSV file, created from PubMed with the RISmed package, for certain terms, for example "latino". I would like to create a new variable, "Latino", that reads the whole row and records yes or no depending on whether the word is mentioned anywhere in that row.

How would I be able to do this, and which package do you recommend?

Here is a sample of my code:

library(RISmed)
library(dplyr) # tibble and other functions

RCT_topic <- 'randomized clinical trial'
RCT_query <- EUtilsSummary(RCT_topic, mindate=2016, maxdate=2017, retmax=100)
summary(RCT_query)
RCT_records <- EUtilsGet(RCT_query)

RCT_data <- data_frame('PMID'=PMID(RCT_records),
                       'Title'=ArticleTitle(RCT_records),
                       'Abstract'=AbstractText(RCT_records),
                       'YearPublished'=YearPubmed(RCT_records),
                       'Month.Published'=MonthPubmed(RCT_records),
                       'Country'= Country(RCT_records),
                       'Grant' =GrantID(RCT_records),
                       'Acronym' =Acronym(RCT_records),
                       'Agency' =Agency(RCT_records),
                       'Mesh'=Mesh(RCT_records))

Hi, have you tried str_detect() from the stringr package? It might be a suitable solution. In which variable are you interested in finding the keyword (Latino)? – Allan A, Feb 03 '19 at 19:17
No, I have not tried it; I will look it up and see if it works. I'm trying to find the keyword (Latino) in the Title, Abstract and Mesh variables. – Manny Ma, Feb 03 '19 at 19:47

Allan A · Answer 1 · 2019-02-04T10:25:47.000

1

This is one solution:

library(stringr)

RCT_data %>% str_detect("Latino")

This returns one TRUE/FALSE value per column, showing which columns contain "Latino". You can then apply the same function to one of those columns to find the matching rows, for instance the Abstract column:

RCT_data %>% mutate(new_variable = ifelse(Abstract %>% str_detect("Latino"), "yes", "no"))

This adds a new column called new_variable that contains "yes" for each row whose abstract contains "Latino" and "no" otherwise.
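Since the keyword may appear in either Title or Abstract and with different capitalization, the same idea can be extended as in the sketch below. It assumes a small made-up tibble, `toy`, standing in for RCT_data; the rows and the helper column `hit` are invented for illustration only.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for RCT_data; these rows are made up.
toy <- tibble(
  Title    = c("Diabetes care in Latino adults", "A trial of aspirin"),
  Abstract = c("We enrolled 200 participants.", NA)
)

# Case-insensitive pattern, so "Latino" and "latino" both match.
pat <- regex("latino", ignore_case = TRUE)

toy <- toy %>%
  mutate(
    # TRUE if either column matches; NA when nothing matched and a value is missing.
    hit    = str_detect(Title, pat) | str_detect(Abstract, pat),
    # coalesce() treats an NA result as "not found".
    Latino = if_else(coalesce(hit, FALSE), "yes", "no")
  ) %>%
  select(-hit)
```

Here a missing abstract counts as "no" rather than NA; drop the coalesce() call if missing values should stay visible.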

edited Feb 04 '19 at 10:25

answered Feb 03 '19 at 19:34

Allan A


Been trying to work with stringr, but once I try to apply it I get these two errors:
```
RCT_data %>% str_detect("Latino")
argument is not an atomic vector; coercing
[1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> + RCT_data %>% mutate(new_variable = ifelse(Abstract %>% str_detect("latino"), "yes", "no"))
Error in FUN(left) : invalid argument to unary operator
```
My guess would be because of the Mesh variable, which is a list – Manny Ma Feb 04 '19 at 00:49
That's strange; unfortunately I can't reproduce the error messages with the given code. str_detect works just fine on data frames, tibbles and lists, so that should not be the problem. – Allan A Feb 04 '19 at 10:24
As for the second error message, it seems to indicate a typo of some sort. It looks to be this part: "+ " – Allan A Feb 04 '19 at 10:32
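The coercion message appears because the whole data frame is passed to str_detect(), which expects a character vector. One way to get the per-column overview without it is to check each column separately. This is a minimal sketch on a made-up data frame, `toy`, not the actual RCT_data, and it assumes every column is a plain character vector (a list column such as Mesh would need separate handling).

```r
library(stringr)

# Hypothetical stand-in for RCT_data with character columns only.
toy <- data.frame(
  Title    = c("Latino health study", "Aspirin trial"),
  Abstract = c("No mention here.", "Nor here."),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per column: does any row of that column match?
col_has_term <- sapply(
  toy,
  function(col) any(str_detect(col, "Latino"), na.rm = TRUE)
)
```

The result is a named logical vector with one entry per column, which can be used to decide which columns are worth searching row by row.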

score 1 · Accepted Answer · answered Feb 03 '19 at 19:36

1

Why not use grepl to add a column indicating whether or not a search term is found in the abstract column of your search results? grepl returns a logical vector that is TRUE where your pattern is found and FALSE where it is not.

# There are no mentions of "Latino" or "latino" in your df. 
RCT_data$Latino <- grepl("Latino|latino",RCT_data$Abstract)

# There are several mentions of the word "pain":
RCT_data$Pain <- grepl("pain",RCT_data$Abstract)
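If capitalization varies, or the term may be in either the title or the abstract, a variation along these lines may help. It is only a sketch on a made-up data frame, `toy`, standing in for RCT_data. It uses grepl's ignore.case argument instead of listing both spellings, and turns the logical result into the yes/no values asked for in the question.

```r
# Hypothetical stand-in for RCT_data; these rows are made up.
toy <- data.frame(
  Title    = c("Pain management in LATINO patients", "Aspirin trial"),
  Abstract = c("Chronic pain was assessed.", NA),
  stringsAsFactors = FALSE
)

# ignore.case = TRUE matches "Latino", "latino", "LATINO", and so on.
in_title    <- grepl("latino", toy$Title, ignore.case = TRUE)
# grepl() returns FALSE (not NA) for missing values.
in_abstract <- grepl("latino", toy$Abstract, ignore.case = TRUE)

# "yes" if the term appears in either column, "no" otherwise.
toy$Latino <- ifelse(in_title | in_abstract, "yes", "no")
```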

answered Feb 03 '19 at 19:36

twb10


This worked perfectly well on Title and Abstract, but I can't make it work on the Mesh variable, which is a list; it does not pick up the word **Hispanic**: `head(RCT_true$Mesh) $`3108` Heading Type 12 Hispanic Americans Descriptor` – Manny Ma Feb 04 '19 at 05:17
Oh, and yes, I made the sample too small; I had to ramp it up to 20,000 just to get a few TRUE returns – Manny Ma Feb 04 '19 at 05:19
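For the Mesh column, grepl cannot be applied to the list directly; each element has to be searched on its own. The sketch below assumes, based on the output shown in the comment above, that each element is either a data frame with a Heading column or NA when a record has no MeSH terms. The list `mesh` and the helper `has_term` are made up for illustration and are not part of RISmed.

```r
# Hypothetical stand-in for RCT_data$Mesh: one element per article.
mesh <- list(
  data.frame(Heading = c("Humans", "Hispanic Americans"),
             Type    = c("Descriptor", "Descriptor"),
             stringsAsFactors = FALSE),
  NA,  # assumed shape of a record without MeSH terms
  data.frame(Heading = "Pain", Type = "Descriptor",
             stringsAsFactors = FALSE)
)

# Illustrative helper: TRUE if any heading in one entry matches the pattern.
has_term <- function(entry, pattern) {
  # Anything that is not a data frame is treated as "no MeSH data".
  if (!is.data.frame(entry)) return(FALSE)
  any(grepl(pattern, entry$Heading, ignore.case = TRUE))
}

# One logical value per article, then the yes/no coding.
mesh_hispanic <- vapply(mesh, has_term, logical(1), pattern = "hispanic")
mesh_flag     <- ifelse(mesh_hispanic, "yes", "no")
```

With the real data, the same call on RCT_data$Mesh would give a vector that can be assigned to a new column, provided the elements have the shape assumed here.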
