I have a list of negative words, negative, which has 4783 elements. I also have a data frame, tf2, with multiple variables: "user", "reuser", "full_text", "range", "user.location", "date2". I want to compare one column of the data frame with the negative words list. Based on the outcome, that is, whether any word in negative is present in tf2$full_text, I want to create another TRUE/FALSE column in tf2.
I am trying something like this:
tf3 <- apply(tf2, function(x) (x$negative <- intersect(x["full_text"], ng)))
But it does not work. Can we also use something like the Python-style expression any(ele in x.full_text.split() for ele in negative) inside the function?
I am adding 10 rows from the tf2 data frame below:
structure(list(user = c("jdugger2", "rustedshakles", "hhherm",
"KnightKiwi", "KeithGrayeb", "Clayconboy1", "goblinhunter44",
"migueli44271514", "hms_smeagol", "owlwoman911_"), reuser = c("TheOnion",
"TheOnion", "TheOnion", "TheOnion", "TheOnion", "GA_peach3102",
"TheOnion", "TheOnion", "TheOnion", "SSG_PAIN"), full_text = c("RT @TheOnion: Taliban Agrees To Peace Deal Despite Concerns About America’s Human-Rights Record .....co/zMTRk7p8J8 .....co/N1KRAX…",
"RT @TheOnion: Taliban Agrees To Peace Deal Despite Concerns About America’s Human-Rights Record .....co/zMTRk7p8J8 .....co/N1KRAX…",
"RT @TheOnion: Taliban Agrees To Peace Deal Despite Concerns About America’s Human-Rights Record .....co/zMTRk7p8J8 .....co/N1KRAX…",
"RT @TheOnion: Taliban Agrees To Peace Deal Despite Concerns About America’s Human-Rights Record .....co/zMTRk7p8J8 .....co/N1KRAX…",
"RT @TheOnion: Taliban Agrees To Peace Deal Despite Concerns About America’s Human-Rights Record .....co/zMTRk7p8J8 .....co/N1KRAX…",
"RT @GA_peach3102: A week-long REDUCTION in VIOLENCE between US, Taliban & Afghan forces is set to begin Friday at midnight\n\nThis will lead…",
"RT @TheOnion: Taliban Agrees To Peace Deal Despite Concerns About America’s Human-Rights Record .....co/zMTRk7p8J8 .....co/N1KRAX…",
"RT @TheOnion: Taliban Agrees To Peace Deal Despite Concerns About America’s Human-Rights Record .....co/zMTRk7p8J8 .....co/N1KRAX…",
"RT @TheOnion: Taliban Agrees To Peace Deal Despite Concerns About America’s Human-Rights Record .....co/zMTRk7p8J8 .....co/N1KRAX…",
"RT @SSG_PAIN: ⚡⚡\nUS, Taliban Announce Peace Deal to Be Signed Next Week .....co/5sEqGQw8K5"
), range = c(140L, 140L, 140L, 140L, 140L, 143L, 140L, 140L,
140L, 95L), user.location = c("Queens, NY", "", "", "Ecruteak City, Johto",
"", "Arizona, USA", "Gobowen, England", "", "San Francisco",
"HighRockyNews RT for planet)"), date2 = c(21022020L, 21022020L,
21022020L, 21022020L, 21022020L, 21022020L, 21022020L, 21022020L,
21022020L, 21022020L)), row.names = c(NA, 10L), class = "data.frame")
I don't know how to share the full list of 4783 negative words here. If we use an arbitrary list of some 20 negative words instead, I guess we can test this.
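For testing, here is a minimal sketch of the kind of check I have in mind, assuming a short made-up word list in place of the real one. The words in negative, the data frame name tf_demo, and the shortened tweet texts are placeholders I chose for illustration, and splitting on non-letters is only one possible way to tokenize. It lower-cases each text, splits it into words, and marks a row TRUE when any word appears in the list, which is roughly what the any(...) expression above would do.

```r
# Hypothetical stand-in for the 4783-word list (illustrative words only)
negative <- c("violence", "concerns", "war", "attack", "bad")

# Small data frame shaped like tf2; texts are shortened from the sample rows
tf_demo <- data.frame(
  full_text = c(
    "RT @TheOnion: Taliban Agrees To Peace Deal Despite Concerns About America's Human-Rights Record",
    "RT @GA_peach3102: A week-long REDUCTION in VIOLENCE between US, Taliban & Afghan forces",
    "RT @SSG_PAIN: US, Taliban Announce Peace Deal to Be Signed Next Week"
  ),
  stringsAsFactors = FALSE
)

# Lower-case each text and split it into tokens on runs of non-letter characters
tokens <- strsplit(tolower(tf_demo$full_text), "[^a-z]+")

# One logical per row: TRUE if at least one token is in the negative list
tf_demo$negative <- vapply(
  tokens,
  function(words) any(words %in% negative),
  logical(1)
)

print(tf_demo$negative)
```

With the real data, the same two steps would be applied to tf2$full_text and the full negative list. Matching is on whole words here, so "war" would not match inside "warning"; a substring match would need a different approach, such as grepl.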