
I have a dataset from Twitter. I need to remove tweets related to a specific word. I used the 'Filter Examples' operator and selected 'attribute_value' for 'Condition class'. I followed the RapidMiner guide, but it's not working. It says:

"This parameter is available when the parameter 'attribute_value_filter' is selected as condition class. The condition format is an Attribute name, followed by a comparison function and a value to match. Nominal Attributes can be compared by = and != with an arbitrary string, which can also include a regular expression."

Hence, I typed: text{=strike!=} and also tried: text=strike!=

'text' is the name of my attribute, and 'strike' is the word I want to remove.

However, I'm getting this error:

[Image: RapidMiner error message]

Can someone please point out what I am doing wrong? I've tried several variations, but for some idiot reason it's not working.

Please be kind, I'm quite new at this and I really need it for my thesis. Thank you so much!!!

Christian König

1 Answer


To filter out tweets containing a certain word, you need to use regular expression syntax. The simplest expression would be:

text != .*strike.* but this would also filter out texts where strike is part of another word (for example, striker), so the following is probably better suited:

text != .*\sstrike[\s\.\!\,\.\:$].*

This reads as: filter out any example whose text contains strike preceded by arbitrary characters and a whitespace, and followed by either a whitespace, a punctuation character, or the end of the line.
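If you want to check such expressions outside RapidMiner first, here is a rough sketch in Python. It is not RapidMiner itself. It assumes the filter has to match the whole attribute value (which is why the expressions are wrapped in .* and why `re.fullmatch` is used here), and the sample tweets are made up. The third pattern, using the word boundary `\b`, is an alternative of my own and not part of the expressions above. Note that in this sketch the second pattern does not catch tweets that begin or end with the word, since it requires a whitespace before it and one more character after it, and `$` inside a character class is treated as a literal dollar sign.

```python
import re

# Hypothetical sample tweets, for illustration only.
tweets = [
    "The strike continues today",
    "strike!",
    "Workers on strike",
    "A striker scored twice",
    "Lovely weather this morning",
]

# The two expressions from the answer (the second with a trailing .*),
# plus an assumed word-boundary variant that is not from the answer.
SIMPLE = re.compile(r".*strike.*")
SPACED = re.compile(r".*\sstrike[\s\.\!\,\.\:$].*")
WORD = re.compile(r".*\bstrike\b.*")


def keep(pattern, texts):
    """Return the texts that do NOT fully match the pattern (the != filter)."""
    return [t for t in texts if pattern.fullmatch(t) is None]


kept_simple = keep(SIMPLE, tweets)
kept_spaced = keep(SPACED, tweets)
kept_word = keep(WORD, tweets)

for name, kept in [("simple", kept_simple), ("spaced", kept_spaced), ("word", kept_word)]:
    print(name, kept)
```

On these samples the simple pattern also drops the "striker" tweet, the second pattern keeps it but also keeps "strike!" and "Workers on strike", and the word-boundary variant drops all three tweets that contain strike as a separate word while keeping "striker".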

David