I have written regexes to identify words that start with a single quote ('), end with one, or both start and end with one:

re_quotes_front = r"(?:\s+|^)(?P<g1>'\w+)(?=\s+|$)"
re_quotes_end = r"(?:\s+|^)(?P<g2>\w+')(?=\s+|$)"
re_quotes_front_end = r"(?:\s+|^)(?P<g3>'\w+')(?=\s+|$)"
generic_quotes = r"%s|%s|%s" % (re_quotes_front, re_quotes_end, re_quotes_front_end)

Now I want to replace each match so that there is a space between the ' and the matched word.

For example, let's say we have to match the regex with the below string:

snippet = "shot perhaps 'artistically' with 'handheld cameras 'devices'"

I want to replace all of them using one regex, i.e., generic_quotes. Can that be done?

The output should be:

"shot perhaps ' artistically ' with ' handheld cameras ' devices '"

Even with just one of the regexes, the call below does not work:

result = re.sub(re_quotes_front, r"\g<g1>".replace("'", " ' "), snippet)

I expected the word 'handheld to be converted to ' handheld (note the space before and after the quote), but that's not happening, and I don't understand why. Instead, I am getting the output:

"shot perhaps 'artistically' with'handheld cameras'em 'asdsd'"

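A minimal sketch of what the call above appears to be doing, run against the snippet as given. The `.replace()` is applied to the template string before `re.sub` ever sees it, and the template contains no quote, so it comes back unchanged. The leading whitespace is part of the overall match but not of `g1`, which would explain why the space in front of `'handheld` disappears. The name `re_front_split` and the capture-and-backreference variant are illustrative assumptions, along the lines of the suggestion in the comments.

```python
import re

re_quotes_front = r"(?:\s+|^)(?P<g1>'\w+)(?=\s+|$)"
snippet = "shot perhaps 'artistically' with 'handheld cameras 'devices'"

# str.replace runs first, on the literal template text, which has no quote in it.
template = r"\g<g1>".replace("'", " ' ")
unchanged = template == r"\g<g1>"  # True: the replace was a no-op

# The whole match (" 'handheld") is replaced by g1 alone ("'handheld"),
# so the whitespace consumed by (?:\s+|^) is lost.
result = re.sub(re_quotes_front, template, snippet)

# Hypothetical variant: capture the whitespace, the quote and the word
# separately, then rebuild them with a space after the quote.
re_front_split = r"(\s+|^)(')(\w+)(?=\s+|$)"
fixed = re.sub(re_front_split, r"\1\2 \3", snippet)

print(result)
print(fixed)
```

Only `'handheld` is touched here, because the lookahead rejects words that also end with a quote.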
How can I reference the matched word by its group name (e.g. g1) and modify it?
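One way to reference and modify a named group is to pass a function as the replacement argument of `re.sub`. The sketch below assumes the three patterns from the question are used unchanged; the helper name `space_quotes` is hypothetical. It relies on `match.lastgroup` to find which of `g1`, `g2` or `g3` took part in the match, and it keeps whatever whitespace the non-capturing prefix consumed.

```python
import re

re_quotes_front = r"(?:\s+|^)(?P<g1>'\w+)(?=\s+|$)"
re_quotes_end = r"(?:\s+|^)(?P<g2>\w+')(?=\s+|$)"
re_quotes_front_end = r"(?:\s+|^)(?P<g3>'\w+')(?=\s+|$)"
generic_quotes = r"%s|%s|%s" % (re_quotes_front, re_quotes_end, re_quotes_front_end)

snippet = "shot perhaps 'artistically' with 'handheld cameras 'devices'"


def space_quotes(match):
    """Hypothetical helper: put a space between each quote and the word."""
    name = match.lastgroup            # "g1", "g2" or "g3", whichever branch matched
    token = match.group(name)         # e.g. "'handheld" or "'devices'"
    # Whitespace consumed by (?:\s+|^) sits before the named group; keep it.
    prefix = match.group(0)[: match.start(name) - match.start()]
    word = token.strip("'")           # \w+ never contains a quote, so this is safe
    lead = "' " if token.startswith("'") else ""
    trail = " '" if token.endswith("'") else ""
    return prefix + lead + word + trail


result = re.sub(generic_quotes, space_quotes, snippet)
print(result)
```

A callback is somewhat slower than a plain replacement template, but it lets one combined regex handle all three cases.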

Pankaj
  • artistically and devices also start with a single quote. Why aren't they affected? – CinCout Sep 20 '19 at 07:13
  • I suspect you need `re.sub(r"(\s|^)(')(\w+)(?=\s|$)", r'\1\2 \3', text)`, see https://regex101.com/r/XZFqYG/2. Surely you may [use a callback](https://stackoverflow.com/questions/53282036) where you could modify the whole match value, but it will be less efficient – Wiktor Stribiżew Sep 20 '19 at 07:14
  • Can you modify the regex? Or only code? The point is, it is much harder to modify just a group contents inside `re.sub`, it is easier to capture substrings and use backreferences in the string replacement pattern. – Wiktor Stribiżew Sep 20 '19 at 07:18
  • @CinCout : The regex is to match only the words beginning with quote and not the word both beginning and ending with quote. Hence, only `'handheld` – Pankaj Sep 20 '19 at 07:50
  • So, please explain, update the question. – Wiktor Stribiżew Sep 20 '19 at 07:57
  • @WiktorStribiżew Updated. I'm not sure whether I framed it properly. Please let me know if the updated question is confusing. I wanted to understand why the `.replace("'"," ' ")` doesn't work in `re.sub()`. Thanks! – Pankaj Sep 20 '19 at 08:13
  • It is already answered on SO: the `r"\g<g1>".replace("'", " ' ")` just returns `\g<g1>` and this is the replacement pattern. You do not replace quotes in the match, you replace them in the `\g<g1>` string. – Wiktor Stribiżew Sep 20 '19 at 08:16
  • @WiktorStribiżew Thanks Wiktor. I got the point, I got a workaround as below, which is not efficient I know: `result = re.sub(generic_quotes, add_spaces, temp) def add_spaces(match_obj): match_str = match_obj.group(0) result = match_str.replace("'", " ' ") return " " + result.strip()` – Pankaj Sep 20 '19 at 08:53
  • Just use a lambda or a `def` as the replacement argument. – Wiktor Stribiżew Sep 20 '19 at 08:53
  • I have a question: if I call `match_obj.group(1)`, I get an error. I have modified the regexes as: `re_quotes_front = r"(?:\s+|^)(')(\w+)(?=\s+|$)" re_quotes_end = r"(?:\s+|^)(\w+)(')(?=\s+|$)" re_quotes_front_end = r"(?:\s+|^)(')(\w+)(')(?=\s+|$)" generic_quotes = r"%s|%s|%s" % (re_quotes_front, re_quotes_end, re_quotes_front_end)` There is a group 1 in every regex, so why the error? – Pankaj Sep 20 '19 at 08:53
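The traceback isn't shown, so the following is only a likely explanation. When the three patterns are joined with `|`, their groups are numbered once across the combined pattern (1-2, 3-4, 5-7) rather than restarting at 1 in each branch. If the second or third branch matches, `group(1)` is `None`, and calling a string method on it raises `AttributeError`. The sketch below assumes the modified patterns from the last comment and works from whichever groups actually took part in the match; the body of `add_spaces` is an illustrative rewrite, not the original workaround.

```python
import re

re_quotes_front = r"(?:\s+|^)(')(\w+)(?=\s+|$)"
re_quotes_end = r"(?:\s+|^)(\w+)(')(?=\s+|$)"
re_quotes_front_end = r"(?:\s+|^)(')(\w+)(')(?=\s+|$)"
generic_quotes = r"%s|%s|%s" % (re_quotes_front, re_quotes_end, re_quotes_front_end)

pattern = re.compile(generic_quotes)
group_count = pattern.groups          # groups are counted across all branches

snippet = "shot perhaps 'artistically' with 'handheld cameras 'devices'"

# The first match comes from the third branch, so groups 1-4 are None.
first_groups = pattern.search(snippet).groups()


def add_spaces(match_obj):
    """Join the groups that participated, separated by single spaces."""
    parts = [g for g in match_obj.groups() if g is not None]
    whole = match_obj.group(0)
    # Everything before the captured pieces is the consumed leading whitespace.
    leading = whole[: len(whole) - sum(len(p) for p in parts)]
    return leading + " ".join(parts)


result = pattern.sub(add_spaces, snippet)
print(first_groups)
print(result)
```

Named groups combined with `match.lastgroup` would be an alternative if the branch that matched needs to be identified explicitly.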
