
I am parsing text from a multi-line PDF document in Power Automate. The text looks like this:

xxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxx
Invoice Nº / Date
12345678 / 10.04.2023

Using Google, Stack Overflow and regex101 I got this far:

(?<=Invoice Nº / Date\r|Invoice Nº / Date\n)[^\r\n]+

That gets me this (am I far from the intended result? Please help):

12345678 / 10.04.2023

How can I obtain the Invoice Nº and the Date independently, so that I can populate Excel rows?

FYI, I can't use \K

Thanks in advance.

Castella
  • If you want to match both in a single regex, use [`\d{2}\.\d{2}\.\d{4}|\d+`](https://regex101.com/r/3cNOjt/1). – InSync May 26 '23 at 12:08
  • Because there are other lines with dates above these ones (this is a multi-line PDF), your suggestion will pick up a date above the reference I am after. I also want to obtain both results independently, i.e. one regex for the Invoice Nº and another for the Date. – Castella May 26 '23 at 12:49
  • You can add some lookbehind assertions then: [`(?<=Invoice Nº \/ Date\r?\n)\d+|(?<=Invoice Nº \/ Date\r?\n\d+ \/ )\d{2}\.\d{2}\.\d{4}`](https://regex101.com/r/3cNOjt/2) – InSync May 26 '23 at 12:54
  • I don't use Power Automate, so I'm not exactly sure how it handles the matching part. The regex recommended above has two parts: before and after the `|`; this means that you may also match each part independently. – InSync May 26 '23 at 12:57
  • This was perfect, I can now retrieve the parts I need independently. Thank you. – Castella May 26 '23 at 13:37

2 Answers

Use a character class to accept either a \r or a \n as the line ending:

(?<=Invoice Nº / Date[\r\n])[^\r\n]+

See live demo.

To capture the parts as groups:

(?<=Invoice Nº \/ Date[\r\n])(\S+) \/ (\S+)

See live demo.
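As a rough illustration of the capture-group idea, here is a minimal Python sketch. It is not Power Automate code: the function name `split_invoice_line` and the sample text are made up for the example, and the label is matched directly rather than through a lookbehind, because Python's `re` module only accepts fixed-width lookbehinds. The `\r?\n` is an assumption that lines end in either `\n` or `\r\n`.

```python
import re

# Match the label line, then capture the two fields on the following line.
# \r?\n accepts both \n and \r\n line endings (an assumption about the input).
PATTERN = re.compile(r"Invoice Nº / Date\r?\n(\S+) / (\S+)")


def split_invoice_line(text):
    """Return (invoice_number, date) from the line after the label, or None."""
    match = PATTERN.search(text)
    return match.groups() if match else None


# Hypothetical sample mirroring the question; the "x" lines are filler.
sample = "xxxxxxxx\r\nxxxxxx\r\nInvoice Nº / Date\r\n12345678 / 10.04.2023\r\n"
result = split_invoice_line(sample)
print(result)
```

Group 1 holds the invoice number and group 2 the date, so each can be written to its own Excel column.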

Bohemian

The answer was provided in InSync's second comment. Thank you very much.
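For readers who want to try the "one regex per field" idea outside Power Automate, the following Python sketch uses two independent patterns. It is an approximation rather than the regex from the comment: the variable-width lookbehinds are replaced with a single capture group in each pattern, since Python's `re` module requires fixed-width lookbehinds. The names `INVOICE_NO`, `INVOICE_DATE` and `first_group` and the sample text are hypothetical.

```python
import re

# One pattern per field; each anchors on the label line so that
# dates appearing earlier in the document are not picked up.
INVOICE_NO = re.compile(r"Invoice Nº / Date\r?\n(\d+)")
INVOICE_DATE = re.compile(r"Invoice Nº / Date\r?\n\d+ / (\d{2}\.\d{2}\.\d{4})")


def first_group(pattern, text):
    """Return the first capture group of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


# Hypothetical sample with an unrelated date above the label line.
sample = "Issued 01.01.2023\nInvoice Nº / Date\n12345678 / 10.04.2023\n"
invoice_no = first_group(INVOICE_NO, sample)
invoice_date = first_group(INVOICE_DATE, sample)
print(invoice_no, invoice_date)
```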


Castella