I want to extract the text between the ':' and '|' characters, but in the second and third fields there is a space after the ':'.

The input:

Referencia:22726| Referencia Cliente Ak: 233726 | Referencia histórica: 256726 | Suelo | AGOLADA (Pontevedra) -  CARPAZO O PE#A LONJA [EXTRACT]
Referencia:39766| Referencia Cliente Ak: 39767 | Referencia histórica: 39768 | Garaje | MOJACAR (Almería) -  URB.VILLA MIRADOR DEL MAR - MOD. # [EXTRACT]
Referencia:397A5| Referencia Cliente Ak: 397B5 | Referencia histórica: 397C5 | Garaje | MOJACAR (Almería) -  VILLA MIRADOR DEL MAR-MODULO #-PLAZA 4 [EXTRACT]
Referencia:AA39803| Referencia Cliente Ak: P_39803 | Referencia histórica: 200_39803 | Garaje | MOJACAR (Almería) -  VILLA MIRADOR DEL MAR - MODULO [EXTRACT]

Desired output:

22726
233726
256726
39766
39767
39768
397A5
397B5
397C5
AA39803
P_39803
200_39803

My first pattern: (?<=:)(\w{5,12}) This matches only the first column.

My second pattern: (?<=:\s)(\w{5,12}) This matches only the second and third columns.

So I expected my third pattern to be the correct one: (?<=:\s?)(\w{5,12}) But that pattern doesn't work.

Trimax
    You can use the `regex` module, which supports variable lengths in lookaheads and lookbehinds. It is planned to replace the standard `re` module, they say. You can install it with pip (`pip install regex`). – Brōtsyorfuzthrāx Aug 12 '14 at 23:26

2 Answers


A lookbehind can't be variable-length in Python. One way to work around this is to alternate two fixed-width lookbehinds:

(?:(?<=:\s)|(?<=:))(\w{5,12})

But since you are using a capturing group, the lookbehind is unnecessary:

:\s?(\w{5,12})
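
To make the difference concrete, here is a minimal sketch using Python's standard `re` module. The two rows are taken from the question with the trailing address text shortened, and the names `rows`, `alt_lookbehind`, `plain_group` and `lookbehind_ok` are only illustrative. It first shows that the original `(?<=:\s?)` lookbehind is rejected when the pattern is compiled, then runs both patterns from this answer.

```python
import re

# Two sample rows from the question (address part shortened for readability).
rows = [
    "Referencia:22726| Referencia Cliente Ak: 233726 | Referencia histórica: 256726 | Suelo | AGOLADA (Pontevedra)",
    "Referencia:AA39803| Referencia Cliente Ak: P_39803 | Referencia histórica: 200_39803 | Garaje | MOJACAR (Almería)",
]

# The variable-length lookbehind from the question fails at compile time.
try:
    re.compile(r"(?<=:\s?)(\w{5,12})")
    lookbehind_ok = True
except re.error as exc:
    lookbehind_ok = False
    print("rejected:", exc)

# Workaround 1: alternation of two fixed-width lookbehinds.
alt_lookbehind = re.compile(r"(?:(?<=:\s)|(?<=:))(\w{5,12})")

# Workaround 2: consume the colon and optional space, capture only the value.
plain_group = re.compile(r":\s?(\w{5,12})")

for row in rows:
    # With exactly one capturing group, findall returns that group's text.
    print(alt_lookbehind.findall(row))
    print(plain_group.findall(row))
```

Both patterns should return the same three values per row; the second one is simply easier to read because the capturing group already isolates the value.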
Casimir et Hippolyte
  • Oh, thanks. I didn't know that a lookbehind can't be variable-length in Python. I imagine a lookahead can't either. But is that the case only in Python, or in other languages too? – Trimax Feb 26 '14 at 09:45
  • A lookahead can be variable-length. – Casimir et Hippolyte Feb 26 '14 at 09:48
  • With .NET you can use a variable-length lookbehind. Java allows the `{n,m}` quantifier in a lookbehind, and PHP allows alternations. – Casimir et Hippolyte Feb 26 '14 at 09:50
  • That new solution is cleaner. Why can a lookahead be variable-length but not a lookbehind? – Trimax Feb 26 '14 at 09:51
  • I don't have a definitive answer about this; as you can see, the limitations differ between regex engines. People say: "Since a regex engine works from left to right, it is hard to implement a variable-length lookbehind." (to repeat three times before sleeping) – Casimir et Hippolyte Feb 26 '14 at 09:57

Remove the ? character from the lookbehind and move the \s? into the capturing group. Note that the captured text then includes the leading space when there is one:

(?<=:)(\s?\w{5,12})
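
As a rough illustration with Python's standard `re` module (the row is shortened from the question, and `raw` and `cleaned` are just illustrative names), this pattern keeps the optional space inside the captured text, so a `strip()` step is assumed afterwards to get the bare values:

```python
import re

# One sample row from the question (trailing address text omitted).
row = "Referencia:39766| Referencia Cliente Ak: 39767 | Referencia histórica: 39768 | Garaje"

# Fixed-width lookbehind for the colon; the optional space is part of the group.
pattern = re.compile(r"(?<=:)(\s?\w{5,12})")

raw = pattern.findall(row)                  # values may start with a space
cleaned = [value.strip() for value in raw]  # drop that leading space

print(raw)
print(cleaned)
```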
Nambi