0

I have a string:

test=' 40  virtual asset service providers law, 2020e section 1  c law 14 of 2020   page 5  cayman islands'

I want to match all occurrences of a digit, then print not just the digit but the three characters either side of the digit.

At the moment, using re I have matched the digits:

import re
print(re.findall(r'\d+', test))
['40', '2020', '1', '14', '2020', '5']

I want it to return:

[' 40  v', 'w, 2020e s', 'aw 14 of', 'of 2020   ', 'ge 5  c']
agorapotatoes

3 Answers

5

Use . to match any character (except a newline) and {0,3} to match up to 3 of them on each side of the digits:

print(re.findall(r'.{0,3}\d+.{0,3}', test))
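For reference, here is a self-contained version of the same call on the question's string; the variable name matches is just for illustration. One thing to keep in mind is that re.findall returns non-overlapping matches, so the ' of' that follows 14 is consumed by that match and cannot also serve as leading context for the second 2020. The pattern also picks up the lone 1, which the question's desired list leaves out.

```python
import re

test = ' 40  virtual asset service providers law, 2020e section 1  c law 14 of 2020   page 5  cayman islands'

# Up to 3 of any character, then one or more digits, then up to 3 of any character.
# A raw string keeps the backslash in \d from being treated as a string escape.
matches = re.findall(r'.{0,3}\d+.{0,3}', test)
print(matches)
```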
Or Y
2
re.findall(r".{0,3}\d+.{0,3}", test)

The {0,3} quantifier is greedy: it matches up to 3 characters, taking as many as it can.
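If the context windows need to overlap (the question expects 'of 2020   ' even though ' of' also trails 14), a single findall pattern cannot return that, since each character is consumed by at most one match. A minimal sketch of one alternative, assuming it is acceptable to slice the string around each match; the helper name digits_with_context and its width parameter are made up for this example:

```python
import re

test = ' 40  virtual asset service providers law, 2020e section 1  c law 14 of 2020   page 5  cayman islands'

def digits_with_context(text, width=3):
    """Return each run of digits with up to `width` characters on either side."""
    return [
        # max(0, ...) stops the start index from wrapping around to the end of the string
        text[max(0, m.start() - width): m.end() + width]
        for m in re.finditer(r'\d+', text)
    ]

print(digits_with_context(test))
```

Because each slice is taken from the original string independently, neighbouring windows can share characters.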

1

Here you go:

re.findall('[^0-9]{0,3}[0-9]+[^0-9]{0,3}', test)

[EDIT]
Breaking the pattern down:
'[^0-9]{0,3}' matches up to 3 non-digit characters
'[0-9]+' matches one or more digits

The final pattern '[^0-9]{0,3}[0-9]+[^0-9]{0,3}' matches one or more digits surrounded by up to 3 non-digits on either side.

To reduce confusion, I prefer '[^0-9]{0,3}' over '.{0,3}' (used in the other answers), because it states explicitly that only non-digits should be matched around the number. '.' can be confusing because it matches any character except a newline, including digits.
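A small illustration of that difference, using a made-up input in which two digits sit fewer than 3 characters apart (sample and the two result names are hypothetical):

```python
import re

sample = 'a1b2c'  # hypothetical input, not from the question

# '.' can swallow a digit as "context", so both digits end up in one match.
any_char = re.findall(r'.{0,3}\d+.{0,3}', sample)

# '[^0-9]' stops at a digit, so each digit run gets its own match.
non_digit = re.findall(r'[^0-9]{0,3}[0-9]+[^0-9]{0,3}', sample)

print(any_char)   # ['a1b2c']
print(non_digit)  # ['a1b', '2c']
```

On the string from the question both patterns return the same list, because every number there is separated from the next by at least 3 non-digit characters.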

Puneet Singh
  • 2
    Hello! While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply. – SherylHohman Aug 13 '20 at 20:08
  • 1
    Thanks for bringing that to my notice. – Puneet Singh Aug 14 '20 at 06:45