0

In a Postgres query I wrote, I'm trying to find an empty char between a digit and a char.

The Raw-Data looks pretty similar to this:

"X  1111-11-112222-22-22YY 3333-33-334444-44-44ZZZ5555-55-556666-66-66AAA7777-77-778888-88-88B  9999-99-991111-11-11"

I would like to split this into following table:

X  1111-11-112222-22-22
YY 3333-33-334444-44-44
ZZZ5555-55-556666-66-66
AAA7777-77-778888-88-88
B  9999-99-991111-11-11

So, normally i would do this by defining the Regex (?<\d)(?=[A-z]) which is giving me the empty char between the chars and digits, but Postgres doesn't support lookbehinds.

Anyone an idea how to fix this?

Thanks in advance!

2 Answers2

1
 (\w+\s*[\d-]+)

This will give all the groups.See demo.

http://regex101.com/r/nW8dX7/1

vks
  • 67,027
  • 10
  • 91
  • 124
  • So if i understand it right, you're extracting the blocks of XXX1111-11-112222-22-22 ? – Michael Werner Aug 22 '14 at 10:06
  • Ok, because the actual Data consists of this type of blocks, seperated by either 10 or 13 (or, if combined, 23) blanks, which are extracted with this too. I've tried a non-greed verision, but I'm still stuck withe the blanks. Any idea? – Michael Werner Aug 22 '14 at 10:14
  • (\w+\s*[\d-]+)\s* Try this. – vks Aug 22 '14 at 10:15
  • Still leaves the blanks, even with the non-greedy version. – Michael Werner Aug 22 '14 at 10:18
  • http://regex101.com/r/nW8dX7/3 .have a look at this.dont know why yours is not working – vks Aug 22 '14 at 10:20
  • I think the problem is, that the expression is matching the blocks incl. spaces and excl. (if you roll over the text it highlights both versions). – Michael Werner Aug 22 '14 at 10:23
  • You have to extract groups.Dont match the while thing. – vks Aug 22 '14 at 10:25
  • The problem is, that i want to change all of this "blank-blocks" to pipes and the above discussed blocks as well. – Michael Werner Aug 22 '14 at 10:30
  • That has to be done by re.replace or some other function.It would require another regex. – vks Aug 22 '14 at 10:32
  • I'm open for that. I've now tried replacing all the space-blocks having \s{3,} (so the information-blocks don't get cutted). But regexp_replace(regexp_replace(mystring, '(\w+\s*[\d-]+?)','|','g'),'\s{3,}','|','g') only returns Pipes (meaning the info-blocks self are getting replaced by |, which i don't want) – Michael Werner Aug 22 '14 at 10:35
  • Use "(\w+\s*[\d-]+)\s*" and replace by "\1|" – vks Aug 22 '14 at 10:41
  • replace the first group which you match inside () – vks Aug 22 '14 at 10:47
  • Oh, understood. This even gives me nearly the desired output, except, that there are a lot of blank lines in between. Any way to work around this with the RegEx, or do I have to manualy clear them? – Michael Werner Aug 22 '14 at 10:49
  • If the answer is correct with blank lines its better to remove separately or else we will have to check input and make regex again :).Removing blank lines should be easy. – vks Aug 22 '14 at 10:52
  • Ok, thanks a lot for you help and patience :D (I would vote you up, but my reputation won't let me :/ ) – Michael Werner Aug 22 '14 at 10:54
  • I did the upvote for ya, I see @vks all the time when I'm browsing regex questions. – asontu Aug 22 '14 at 11:43
  • thanx a lot :P the irony is i cant ask anything........my question asking thing is locked. :( – vks Aug 22 '14 at 11:49
0

I would do a replace with a character you can then later split on. I'm not familiar with psql but something like:

split('~', replace(value, '\d(?=[A-z])', '$0~'))
asontu
  • 4,548
  • 1
  • 21
  • 29
  • This is what I'm already doing, still it inserts the char right in front of the digit, which transfers it into the next block – Michael Werner Aug 22 '14 at 10:33
  • Could you post that code? That seems odd as this snippet explicitly tells the replace to place the `~` _after_ the $0. – asontu Aug 22 '14 at 11:42