I have a large text file of notes open in TextWrangler. I want to parse it with a regex and write the matches to a CSV file for MySQL import. Here is a sample of the source:
ARCHIVE
02.09.2014 22:35
title1
content
content
content
content
30.08.2014 18:13
title2
content
content with tab
content with tab
content
...
more notes as above
...
Each note starts with a date surrounded by returns, followed by a title and some content lines. I'm currently testing the following regex in the TextWrangler (TW) Find dialog, with Grep checked, to capture the date, title and content block of each note:
\r(\d\d\.\d\d\.\d\d\d\d \d\d:\d\d)\s*\r\r(.+)(?s)((?:(?!\r\d\d\.\d\d\.\d\d\d\d \d\d:\d\d\s*\r).)*)
This looks for a date surrounded by returns, then captures the title line, and finally captures all following lines up to the next date block. The last part uses a negative lookahead inside a non-capturing group. Just before it, DOTALL mode is enabled with (?s) so that the dot metacharacter also matches returns.
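To make the intent concrete outside TextWrangler, here is a minimal Python sketch of the same three-part pattern. It is a sketch under assumptions, not what TW does internally: line endings are taken to be "\n" rather than "\r", each date line is assumed to have a blank line before and after it, the mid-pattern (?s) is replaced by a scoped (?s:.), and the names NOTE_RE, SAMPLE, parse_notes and notes_to_csv are made up for illustration.

```python
import csv
import io
import re

DATE = r"\d\d\.\d\d\.\d\d\d\d \d\d:\d\d"

# Assumption: "\n" line endings (the TextWrangler pattern uses "\r").
NOTE_RE = re.compile(
    r"\n(" + DATE + r")\s*\n\n"                   # date line, then a blank line
    r"(.+)"                                       # title: rest of that line
    r"((?:(?!\n" + DATE + r"\s*\n)(?s:.))*)"      # content, up to the next date line
)

# Assumption: blank lines around each date, one tab-indented content line.
SAMPLE = (
    "ARCHIVE\n"
    "\n"
    "02.09.2014 22:35\n"
    "\n"
    "title1\n"
    "content\n"
    "content\n"
    "\n"
    "30.08.2014 18:13\n"
    "\n"
    "title2\n"
    "content\n"
    "\tcontent with tab\n"
    "content\n"
)


def parse_notes(text):
    """Return a list of (date, title, content) tuples found in text."""
    return [
        (date, title, content.strip("\n"))
        for date, title, content in NOTE_RE.findall(text)
    ]


def notes_to_csv(notes):
    """Serialise the tuples as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "title", "content"])
    writer.writerows(notes)  # multi-line content is quoted by the csv module
    return buf.getvalue()
```

Calling notes_to_csv(parse_notes(SAMPLE)) yields one CSV row per note, with the multi-line content quoted inside a single field. Whether MySQL import accepts embedded line breaks in that form depends on the import options, which I have not checked.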
With the sample source above the Find works for the first note but not for the second one, where some lines are indented with tabs. TW shows this error:
This is where I'm stuck. Can anyone give me a hint?