-1

I am trying to exclude delimiters within text qualifiers. For this, I am trying to use Regex. However, I am new to Regex and am not able to fully accomplish my needs. I would be very greatful if someone can help me out.

In Alteryx, I load a delimited flat text file as 'non-delimited' and say that it does not have text qualifiers. Thus, the input will look something like this:

  1. "aabb"|ccdd|eeff|gghh
  2. "aa|bb"|ccdd|eeff|gghh
  3. "aa|bb"|ccdd|"ee|ff"|gghh
  4. "aa|bb"|"cc|dd"|"ee|ff"|"gg|hh"
  5. "aabb"|"ccdd"|"eeff"|"gghh"
  6. "aabb"|"ccdd"|"eeff"|"gg|hh"
  7. aabb|ccdd|eeff|gghh
  8. "aa|bb"|ccdd|eeff|"gg|hh"
  9. aabb|cc|dd|eeff|gghh
  10. aabb|"cc||dd"|eeff|gghh
  11. aabb|"c|c|dd"|eeff|gghh
  12. "aa||bb"|ccdd|eeff|gghh
  13. "a|a|b|b"|ccdd|eeff|gghh
  14. "aabb"|ccdd|eeff|"g|g|hh"
  15. "aabb"|ccdd|eeff|"gg||hh"

I want to exclude all delimiters that are in between text qualifiers.

I have tried to use Regex to replace the delimiters within text qualifiers with nothing.

So far, I have tried the following Regex code for my target:

(")(.*?[^"])\|+(.*?)(")

And I have used the following for my replace: $1$2$3$4

However, this will not fix te lines 11, 13, 14 and 15.

I wish to obtain the following results:

  1. "aabb"|ccdd|eeff|gghh
  2. "aabb"|ccdd|eeff|gghh
  3. "aabb"|ccdd|"eeff"|gghh
  4. "aabb"|"ccdd"|"eeff"|"gghh"
  5. "aabb"|"ccdd"|"eeff"|"gghh"
  6. "aabb"|"ccdd"|"eeff"|"gghh"
  7. aabb|ccdd|eeff|gghh
  8. "aabb"|ccdd|eeff|"gghh"
  9. aabb|cc|dd|eeff|gghh
  10. aabb|"ccdd"|eeff|gghh
  11. aabb|"ccdd"|eeff|gghh
  12. "aabb"|ccdd|eeff|gghh
  13. "aabb"|ccdd|eeff|gghh
  14. "aabb"|ccdd|eeff|"gghh"
  15. "aabb"|ccdd|eeff|"gghh"

Thank you in advance for helping me out!

With kind regards, Robin

Kannappan Sirchabesan
  • 1,353
  • 11
  • 21
Robin
  • 1
  • 4
  • In my target, (")(.?[^"])\|+(.?)("), there is a star symbol in between the dot and the question mark. Somehow, the star sign * is not showing. – Robin Jan 20 '19 at 14:53
  • In line 14 and 15, I want my target to be: 14. "g|g|hh" 15. "gg||hh" However, the target start at the first " sign. I don't know how to ignore text qualifiers in my target that do not encapsulate a delimiter. – Robin Jan 20 '19 at 14:57

1 Answers1

0

I can't think of the correct syntax in REGEX unless you are putting in each pattern that could be found.

However, an easier way (maybe not as performant), would be to use a Text to Columns selecting Ignore delimiters in quotes. If you need it back together in one cell afterwards, you can transpose, then remove delimiters followed by a Summarize to concatenate each RecordID Group.

KaneG
  • 146
  • 4