I am trying to keep the rows of a CSV file that match a certain string, using a regex in RapidMiner on a Windows 8 machine. My regex selects the right rows, but the output does not retain the line breaks and appears as one continuous string. I would appreciate any suggestions on how to keep the line breaks.

My file looks like this:

"ABCDEF","text",numbers,"JAN 1, 2014","text",numbers,10
"BCDEFG","text",numbers,"JAN 1, 2014","text",numbers,1
"CDEFGH","text",numbers,"FEB 1, 2014","text",numbers,12
"CDEFGH","text",numbers,"DEC 1, 2013","text",numbers,8

The following regexes select the text of the correct rows (1-3) but drop the line breaks from the output:

"[A-Z]*".*2014.*?(?=[\r\n$]+)
"[A-Z]*".*2014.*?(?=([\r\n]{2}))
"[A-Z]*".*2014.*?(?=([\r\n]{2}[\r\n$]*))

I also tried multiline mode with the following regex, but got the same result:

(?m)^"[A-Z]*".*2014.*?(?=[\r\n]+)$

My output looks like the following:

"ABCDEF","text",numbers,"JAN 1, 2014","text",numbers,10 "BCDEFG","text",numbers,"JAN 1, 2014","text",numbers,1 "CDEFGH","text",numbers,"FEB 1, 2014","text",numbers,12

Thank you in advance.

EDIT: With hwnd's and others' excellent suggestions, I came up with the following expression that worked in RapidMiner: (?m)^("[A-Z]+".*2014.*)\r\n
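For anyone who wants to see the difference outside RapidMiner, here is a minimal Python sketch. It assumes the file uses Windows-style `\r\n` line endings and uses the sample rows above. Python's `re` module is not the Java regex engine that RapidMiner runs on, so treat this as an illustration of the idea rather than of RapidMiner's exact behaviour. The lookahead version leaves the line break out of each match, while the expression from the EDIT consumes `\r\n` as part of the match, so the breaks survive when the matches are put back together.

```python
import re

# Sample rows from the question; the CRLF line endings are an assumption.
data = (
    '"ABCDEF","text",numbers,"JAN 1, 2014","text",numbers,10\r\n'
    '"BCDEFG","text",numbers,"JAN 1, 2014","text",numbers,1\r\n'
    '"CDEFGH","text",numbers,"FEB 1, 2014","text",numbers,12\r\n'
    '"CDEFGH","text",numbers,"DEC 1, 2013","text",numbers,8\r\n'
)

# First attempt from the question: the lookahead only checks for a line
# break, it does not make the break part of the match.
lookahead = re.compile(r'"[A-Z]*".*2014.*?(?=[\r\n$]+)')
lookahead_parts = lookahead.findall(data)

# Expression from the EDIT: the trailing \r\n is consumed by the match.
consuming = re.compile(r'(?m)^("[A-Z]+".*2014.*)\r\n')
kept = "".join(m.group(0) for m in consuming.finditer(data))

print(lookahead_parts)  # three rows, none ending in a line break
print(repr(kept))       # three rows, each still ending in \r\n
```

Here `group(0)` is the whole match including the line break, and `group(1)` would be the row without it.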

raj_k

1 Answer

I would try the following instead of using a lookahead assertion.

(?m)^"[A-Z]+".*2014.*$
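As a rough check of what this pattern selects, here is a small Python sketch. It assumes the text has already been read with `\n` line endings and uses the sample rows from the question; RapidMiner uses Java's regex engine, so details may differ there. Each match is a single row without its line terminator, so the caller has to put the line breaks back when writing the output.

```python
import re

# Sample rows from the question; plain \n line endings are an assumption.
text = (
    '"ABCDEF","text",numbers,"JAN 1, 2014","text",numbers,10\n'
    '"BCDEFG","text",numbers,"JAN 1, 2014","text",numbers,1\n'
    '"CDEFGH","text",numbers,"FEB 1, 2014","text",numbers,12\n'
    '"CDEFGH","text",numbers,"DEC 1, 2013","text",numbers,8\n'
)

# In multiline mode, ^ and $ anchor at the start and end of every line.
pattern = re.compile(r'^"[A-Z]+".*2014.*$', re.MULTILINE)

# One full-row string per match; the newline is not part of any match.
matches = pattern.findall(text)

# Re-insert the line breaks explicitly when building the output.
output = "\n".join(matches)
print(output)
```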
hwnd
  • Thanks for your suggestions. I tried both expressions. It is the same result. They do not preserve the line breaks. – raj_k Aug 12 '14 at 17:58
  • I combined your suggestions with multiline mode. The following expression worked: (?m)^("[A-Z]+".*2014.*)\r\n - Thank you. – raj_k Aug 12 '14 at 18:11
  • Thanks for the green tick, glad I could help. – hwnd Aug 12 '14 at 18:57