
I am just starting to learn regex, and I use every opportunity to understand how it works. Currently I am trying to extract dates from a text file (in fact a .vnt file from my mobile phone). It looks like this:

BEGIN:VNOTE
VERSION:1.1
BODY;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:18.07.=0A14.08.=0A15.09.=0A15.10.=
=0A13.11.=0A13.12.=0A12.01.=0A03.02. Grippe=0A06.03.=0A04.04.2015=0A0=
5.05.2015=0A03.06.2015=0A03.07.2015=0A02.08.2015=0A30.08.2015=0A28.09=
17.11.2017=0A
DCREATED:20171118T095601
X-IRMC-LUID:150
END:VNOTE

I want to extract all dates, so that the final list looks like this:

18.07.
14.08.
15.09.
15.10.

and so on. If a date also has a year, the year should be included as well.

I came close to detecting the dates with the following regex:

.+(\d\d\.\d\d\.(2015|2016|2017)?).+

But it only detects very few of the dates. The result is this:

BEGIN:VNOTE
VERSION:1.1
15.10.
04.04.2015
30.08.2015
24.01.2016
DCREATED:20171118T075601
X-IRMC-LUID:150
END:VNOTE

Then I tried adding a question mark, which makes the .+ non-greedy, as far as I have read in tutorials. The regex then looks like this:

.+?(\d\d\.\d\d\.(2015|2016|2017)?).+?

But the result is still not what I am looking for:

BEGIN:VNOTE
VERSION:1.1
21.03.20.04.18.05.18.06.18.07.14.08.15.09.15.10.
13.11.13.12.12.01.03.02.06.03.04.04.20150A0=
03.06.201503.07.201502.08.201530.08.20150A28.09=
28.10.201525.11.201528.12.201524.01.20160A
DCREATED:20171118T075601
X-IRMC-LUID:150
END:VNOTE

For someone who is familiar with regex, I am pretty sure this is very easy to solve, but I don't get it. It's very confusing when you are new to regex. I tried to find a hint in some tutorials and Stack Overflow posts, but all I found is this: "Notepad++ how to extract only the text field which is needed?" It doesn't work for me, though. I assume it might have something to do with the fact that my text file is not a single line.

I have my example on regex101 too. I would be very thankful if someone could give me a hint about what else I can try.

Edit: I would like to detect the dates with the regex and get a list containing only the dates (maybe this is called substitution?)

Edit 2: Sorry for not mentioning it earlier: I just want to use the regex in e.g. Notepad++ or on an online regex testing website, just to get the dates and save the result in a new txt file. I don't want to use the regex in a programming language. My apologies for not being precise before.

Edit 3: The result should be a list of the dates, each date on its own line, like this:

18.07.
14.08.
15.09.
15.10.
Bettina
  • In what language/tool do you plan to run the regex? Without knowing this, I don't think an exact answer can be given. – Tim Biegeleisen Nov 19 '17 at 00:53
  • Ah okay, I am sorry for not providing this information. My apologies! And thank you Tim for not giving up on me. I just want to have the result in Notepad++ or any online regex testing website. I am not using the regex in a programming language like JavaScript or PHP. – Bettina Nov 20 '17 at 08:02
  • It's going to be hard to handle this from Notepad++. It would be much easier using an app language like Java or JavaScript. – Tim Biegeleisen Nov 20 '17 at 08:07
  • What is your desired result from that piece of text in the end? – Jerry Nov 20 '17 at 08:32
  • @Jerry: Thank you for your question. I would like to have a list of all the dates, with one date per line. – Bettina Nov 20 '17 at 09:03
  • @Tim: I am familiar with JavaScript. So if this would be easier, then I will give it a try. – Bettina Nov 20 '17 at 09:05
  • @Bettina How about [this](https://regex101.com/r/EWrldt/3)? – Jerry Nov 20 '17 at 09:23
  • @Jerry: Yes, yes, yes!! This is it! Oh, so great! Exactly what I was looking for! Thank you so much Jerry! If you add it as an answer then I can mark it as solved. Also thank you Tim for not giving up on me! You are both so great, and I am really thankful for your help! – Bettina Nov 20 '17 at 09:29

2 Answers

I suggest this pattern:

(?:.*?|\G)(\d\d\.\d\d\.(?:\d{4})?)

This makes use of the \G anchor, which matches at the position where the previous match ended (or at the very start of the text for the first match). In this case it allows multiple consecutive matches from the very start without leaving any unmatched character between them, so that everything except the wanted dates can be removed.

If you want to remove the remaining non-date text as well, add |.* at the end:

(?:.*?|\G)(\d\d\.\d\d\.(?:\d{4})?)|.*

regex101 demo

In Notepad++, make sure the underlined options are selected and that the cursor is at the beginning of the file. In the picture below, I ran the replacement and then undid it, only to show that the matches were identified (16 replacements).

[Screenshot: Notepad++ Replace dialog]
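
The idea behind the second pattern, keeping the captured date and throwing away everything else, can also be tried outside Notepad++. The following is only a rough Python sketch and not the pattern from this answer: Python's built-in `re` module does not support `\G`, so it swaps in a simpler alternation that either captures a date or consumes one other character. The function name `keep_only_dates` and the sample text are made up for illustration.

```python
import re

# Either capture a date (group 1) or match any single other character.
# re.DOTALL lets "." also consume the line breaks between the dates.
DATE_OR_ANYTHING = re.compile(r"(\d\d\.\d\d\.(?:\d{4})?)|.", re.DOTALL)


def keep_only_dates(text: str) -> str:
    """Replace each date with itself plus a newline and drop everything else."""
    def replacement(match):
        date = match.group(1)  # None when the "any character" branch matched
        return date + "\n" if date else ""
    return DATE_OR_ANYTHING.sub(replacement, text)


# Hypothetical sample: already decoded text with some non-date content.
sample_text = "VERSION:1.1\n18.07.\n14.08.\n03.02. Grippe\n04.04.2015\n"

if __name__ == "__main__":
    print(keep_only_dates(sample_text), end="")
```

The replacement function plays the role of the `$1` substitution: a match that captured a date is kept, and any other match is replaced with nothing.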

Jerry

You can try using the following pattern:

\d{2}\.\d{2}\.(?:\d{4})?

This will match day.month dates of the form 18.07., but it also allows such a date to be followed by a four-digit year, e.g. 18.07.2017. While it would be nice to make the pattern more restrictive to avoid false positives, I do not see anything obvious which can be added to the above pattern. Follow the demo link below to see the pattern in action.

Demo
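
If a scripting language is an option after all, the same pattern can be applied there. Below is a minimal Python sketch, not part of the original answer: the function name `extract_dates` and the shortened sample body are made up for illustration, and it assumes the BODY value has already been cut out of the .vnt file. It first decodes the quoted-printable text with the standard `quopri` module, so that `=0A` becomes a newline and the soft line breaks (`=` at the end of a line) are joined, and then collects all matches with `re.findall`.

```python
import quopri
import re

# Same pattern as above: dd.mm. with an optional four-digit year.
DATE_PATTERN = re.compile(r"\d{2}\.\d{2}\.(?:\d{4})?")


def extract_dates(encoded_body: str) -> list:
    """Decode a quoted-printable body and return every date found in it."""
    decoded = quopri.decodestring(encoded_body.encode("ascii")).decode("utf-8")
    return DATE_PATTERN.findall(decoded)


# Hypothetical, shortened sample in the style of the note from the question.
# The last date is split by a soft line break ("=" followed by a newline).
sample_body = "18.07.=0A14.08.=0A03.02. Grippe=0A04.04.2015=0A0=\n5.05.2015=0A"

if __name__ == "__main__":
    print("\n".join(extract_dates(sample_body)))
```

Decoding first matters because a date that is split across two physical lines would otherwise not match the pattern. Writing the joined result to a new text file gives one date per line.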

Tim Biegeleisen
  • Thank you Tim for your help. Your solution looks much more professional than mine! But unfortunately the substitution with $1 or \1 doesn't work that well with your regex (even if I put the regex into a capturing group). – Bettina Nov 18 '17 at 10:52
  • Your comment suggests you are trying to do something other than what your question says. Please edit your question and tell us your actual requirements. – Tim Biegeleisen Nov 18 '17 at 10:56
  • Thank you for the hint. I have added additional information to my post. I hope this helps to explain what I want to do with my regex. – Bettina Nov 18 '17 at 19:19