5

While there are several similar posts on StackOverflow, none of them cover the case where the target string starts one space after one of the delimiting substrings.

I have the following string (example_string): <insert_randomletters>[?] I want this string.Reduced<insert_randomletters>

I want to extract "I want this string." from the string above. The randomletters will always change; however, the quote "I want this string." will always be between [?] (with a space after the last square bracket) and Reduced.

Right now, I can do the following to extract "I want this string".

target_quote_object = re.search('[?](.*?)Reduced', example_string)
target_quote_text = target_quote_object.group(1)
print(target_quote_text[2:])

This eliminates the ] and the space that always appear at the start of my extracted string, thus only printing "I want this string." However, this solution seems ugly, and I'd rather make re.search() return the correct target string without any modification. How can I do this?

jpp
Foobar

5 Answers

4

Your '[?](.*?)Reduced' pattern matches a literal ?, then captures any 0+ chars other than line break chars, as few as possible up to the first Reduced substring. That [?] is a character class formed with unescaped brackets, and the ? inside a character class is a literal ? char. That is why your Group 1 contains the ] and a space.
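To see the difference concretely, here is a minimal sketch that runs the question's unescaped pattern next to an escaped one on the example string. The variable names `unescaped` and `escaped` are only illustrative.

```python
import re

example_string = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"

# Unescaped: [?] is a character class that matches the single character "?".
unescaped = re.search(r'[?](.*?)Reduced', example_string)

# Escaped: \[\?\] matches the three literal characters "[?]".
escaped = re.search(r'\[\?\](.*?)Reduced', example_string)

print(repr(unescaped.group(1)))  # the "]" and the space end up in Group 1
print(repr(escaped.group(1)))    # only the leading space is left in Group 1
```

Escaping alone still leaves the leading space in the group, which is why the whitespace has to be matched outside the capturing parentheses.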

To make your regex match [?] you need to escape [ and ? and they will be matched as literal chars. Besides, you need to add a space after ] to actually make sure it does not land into Group 1. A better idea is to use \s* (0 or more whitespaces) or \s+ (1 or more occurrences).

Use

re.search(r'\[\?]\s*(.*?)Reduced', example_string)

See the regex demo.

import re
rx = r"\[\?]\s*(.*?)Reduced"
s = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"
m = re.search(rx, s)  # reuse the compiled-style pattern defined above
if m:
    print(m.group(1))
# => I want this string.

See the Python demo.
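As a rough comparison of the whitespace options mentioned above (a literal space, `\s*`, and `\s+`), the sketch below tries each on three made-up inputs. The helper `extract` and the sample strings are assumptions for illustration, not part of the original question.

```python
import re

def extract(pattern, text):
    """Return Group 1 of the first match, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

# Hypothetical inputs that differ only in what follows "[?]".
no_space = "x[?]I want this string.Reducedy"
one_space = "x[?] I want this string.Reducedy"
one_tab = "x[?]\tI want this string.Reducedy"

literal = r'\[\?] (.*?)Reduced'        # exactly one literal space
zero_or_more = r'\[\?]\s*(.*?)Reduced'  # any amount of whitespace, including none
one_or_more = r'\[\?]\s+(.*?)Reduced'   # at least one whitespace character

for name, text in [("no space", no_space), ("one space", one_space), ("tab", one_tab)]:
    print(name, extract(literal, text), extract(zero_or_more, text), extract(one_or_more, text))
```

If the input is guaranteed to have exactly one space, the literal-space pattern is the strictest choice; `\s*` is the most forgiving.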

Wiktor Stribiżew
  • Question - will this capture the target string if there is only one whitespace after the first substring? – Foobar Mar 31 '18 at 19:14
  • Yes, since the `\s*` will match zero or more whitespace characters. Check out the Python [docs](https://docs.python.org/2/library/re.html#re-syntax) if needed. – devsaw Mar 31 '18 at 19:22
  • @Roymunson If there is only one literal space, you may use a space. But if you are not sure, you may use any of the hints I added to the answer. BTW, `" +"` matches 1+ spaces, `" *"` matches 0 or more spaces. `\s+` will match 1+ whitespace chars, `\s*` will match 0+. – Wiktor Stribiżew Mar 31 '18 at 19:25
2

Regex may not be necessary for this, provided your string is in a consistent format:

mystr = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

res = mystr.split('Reduced')[0].split('] ')[1]

# 'I want this string.'
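If the format is not guaranteed, chained `split` calls can raise an IndexError or silently return the wrong piece. One possible variation, sketched here with `str.partition` rather than `split`, returns None when either marker is missing. The helper name `between` is hypothetical.

```python
def between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker, or None."""
    _, found_start, rest = text.partition(start_marker)
    if not found_start:  # start marker is absent
        return None
    target, found_end, _ = rest.partition(end_marker)
    if not found_end:  # end marker is absent after the start marker
        return None
    return target

mystr = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'
res = between(mystr, '[?] ', 'Reduced')
print(res)
```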
jpp
1

The solution turned out to be:

target_quote_object = re.search('] (.*?)Reduced', example_string)
target_quote_text = target_quote_object.group(1)
print(target_quote_text)

However, Wiktor's solution is better.

Foobar
1

You could (or should) use a positive lookbehind, (?<=\[\?\]):


import re
pattern=r'(?<=\[\?\])(\s\w.+?)Reduced'

string_data='<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

print(re.findall(pattern,string_data)[0].strip())

output:

I want this string.
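A possible variation, assuming there is always exactly one space after [?]: put the space inside the lookbehind and use a lookahead for Reduced, so the whole match is the target and neither a capture group nor strip() is needed. Python's re module requires a fixed-width lookbehind, which this one is (four characters).

```python
import re

string_data = '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'

# (?<=\[\?\] ) asserts "[?] " sits just before the match (fixed width: 4 chars).
# (?=Reduced) asserts "Reduced" follows, without consuming it.
pattern = r'(?<=\[\?\] ).*?(?=Reduced)'

match = re.search(pattern, string_data)
if match:
    print(match.group(0))
```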
Aaditya Ura
0

As another answer notes, regex might not be necessary here, although this approach may be too long-winded for Python. It uses find, one of the common string methods.

  • str.find(sub, start, end) returns the index of the first occurrence of sub within the slice str[start:end], or -1 if it is not found.
  • In each iteration, the index of [?] is retrieved, followed by the index of Reduced. The substring between them is printed.
  • After each [?]...Reduced pair is found, the search resumes from that point in the rest of the string.

Code

s = ' [?] Nice to meet you.Reduced  efweww  [?] Who are you? Reduced<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>'


idx = s.find('[?]')
while idx != -1:  # compare by value; "is not" tests identity, not equality
    start = idx
    end = s.find('Reduced', idx)
    # skip the 3 characters of '[?]' and trim surrounding whitespace
    print(s[start+3:end].strip())
    idx = s.find('[?]', end)

Output

$ python splmat.py
Nice to meet you.
Who are you?
I want this string.
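The same loop could be wrapped in a function that collects the results and stops cleanly when a [?] has no matching Reduced after it. This is only a sketch; the function name `extract_all` and its default arguments are assumptions.

```python
def extract_all(text, start_marker='[?]', end_marker='Reduced'):
    """Collect every stripped substring found between start_marker and end_marker."""
    results = []
    idx = text.find(start_marker)
    while idx != -1:
        content_start = idx + len(start_marker)
        end = text.find(end_marker, content_start)
        if end == -1:  # no closing marker: stop instead of slicing with -1
            break
        results.append(text[content_start:end].strip())
        # resume the search just past the closing marker
        idx = text.find(start_marker, end + len(end_marker))
    return results

s = (' [?] Nice to meet you.Reduced  efweww  [?] Who are you? Reduced'
     '<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>')
found = extract_all(s)
for item in found:
    print(item)
```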
devsaw