How to capture a string of characters from a given starting point up to the first full stop followed by a line break?

Question

import re

x = """44
5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk"""

# Problem: both groups start capturing after the first line break instead of at the start of the string
# ((?:\w+)?)   ---> captures a run of alphanumeric characters (uppercase and lowercase), stopping at a space, a comma or a dot
# ((?:\w\s*)+)   ---> similar to the previous pattern, but it does not stop when it finds spaces
regex_patron_m1 = r"\s*((?:\w+)?) \s*\¿?(?:del |de |)\s*((?:\w\s*)+)\s*\??"

m1 = re.search(regex_patron_m1, x, re.IGNORECASE)  # check whether the regex matches before entering the block below

if m1:
    word, association = m1.groups()
    
    print(repr(word)) #print captured substring by first capture group
    print(repr(association)) #print captured substring by second capture group

The output that I get with these two patterns:

'5844'
'44554  Hi hi'
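A quick way to see why the match starts after the newline: the pattern requires a literal space right after the first group, and in `x` the text "44" is followed by "\n", so no match can begin at index 0. The sketch below only reuses `x` and `regex_patron_m1` from the question; it checks this with `re.match` and then inspects where `re.search` ends up.

```python
import re

x = """44
5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk"""

regex_patron_m1 = r"\s*((?:\w+)?) \s*\¿?(?:del |de |)\s*((?:\w\s*)+)\s*\??"

# re.match only tries index 0. "44" is followed by "\n", not by the literal
# space the pattern demands, so nothing can match from there.
print(re.match(regex_patron_m1, x, re.IGNORECASE))

# re.search slides forward until the pattern fits. The leading \s* then
# consumes the newline, and group 1 lands on the first word of line 2.
m1 = re.search(regex_patron_m1, x, re.IGNORECASE)
print(m1.start(), repr(m1.group(1)), repr(m1.group(2)))
```

The second group stops at "hi" because `(?:\w\s*)+` only accepts word characters and whitespace, and the next character is "!".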

What should I modify to get the following? I don't understand why both capture groups start their capture after the newline.

And what should I do so that the second capture group captures up to the full stop followed by a line break, e.g. with ".[\s|]*\n*" or ".\n*"? I want to get

'44'
'5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.'

And if I didn't want it to stop at the line break, to get something like this, what should I do?

'44'
'5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk'
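One possible way to get both desired outputs, offered as a sketch rather than a definitive fix: allow any whitespace (`\s+`) after the first group instead of a literal space, and let the second group be `.*\.` so that it runs to the last dot on its line. The names `prefix`, `m_line` and `m_rest` are illustrative, and the sketch assumes the first word is never empty and that the line ends at its last dot.

```python
import re

x = """44
5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk"""

# Assumption: the separator after the first word may be any whitespace
# (\s+), not only a literal space, so "44" followed by "\n" is accepted.
prefix = r"\s*(\w+)\s+¿?(?:del |de )?\s*"

# Variant 1: without re.DOTALL, "." never matches "\n", so the greedy ".*\."
# runs to the last dot on the current line and stops there.
m_line = re.search(prefix + r"(.*\.)", x, re.IGNORECASE)
print(repr(m_line.group(1)))
print(repr(m_line.group(2)))

# Variant 2: with re.DOTALL, "." also matches "\n", so ".*" continues
# across line breaks to the end of the string.
m_rest = re.search(prefix + r"(.*)", x, re.IGNORECASE | re.DOTALL)
print(repr(m_rest.group(1)))
print(repr(m_rest.group(2)))
```

If text could follow the last dot on the line, variant 1 would cut it off at that dot, so the second group may need adjusting for other inputs.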

Please stop using `print(repr(word))`. Use `print(word)` or you might see something you do not expect. — Wiktor Stribiżew, Dec 13 '22 at 13:12
I only use that in the test to check the data format, because trailing and leading spaces and line breaks are important. It is also a way to check whether using `.strip()` is justified. It is not meant to stay in the final code; it is there to make possible errors visible – Matt095, Dec 13 '22 at 13:24

D_action · Answer 1 · 2022-12-14T09:55:00.777

0

try this expression:

((\w+(\r|\n).*.)\n\w*)

group 1:

44
5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk

group 2:

44
5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.

Hope this is what you were looking for.

EDIT: I don't know if I understood your question well, but try this one:

(\w+)\n(.*\n\w+)

demo here
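For reference, the second expression can be tried against the string from the question as follows. This is only a minimal sketch: the variable names `first` and `rest` are illustrative, and the expression relies on the first word being followed directly by "\n" and on a third line existing.

```python
import re

x = """44
5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk"""

# (\w+) takes the first word, \n consumes the line break, and (.*\n\w+)
# takes the whole second line plus the word that starts the third line.
m = re.search(r"(\w+)\n(.*\n\w+)", x)

if m:
    first, rest = m.groups()
    print(repr(first))
    print(repr(rest))
```

With this input the second group spans two lines, which corresponds to the last case in the question (capture that does not stop at the line break).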

edited Dec 14 '22 at 09:55

answered Dec 13 '22 at 13:30

D_action


Does that expression correspond to the first or the second capture group? For some reason it is not working for me – Matt095 Dec 13 '22 at 13:37
The expression captures both groups. See the demo here: https://regex101.com/r/cUbVKQ/1 – D_action Dec 13 '22 at 13:38
I didn't try it with your code – D_action Dec 13 '22 at 13:39
The problem is that there are 2 capture groups, one that does not accept empty spaces, and the other that does, but both must be limited by line breaks. – Matt095 Dec 13 '22 at 13:50
I saw your last edit. I need 2 capture groups: `((?:\w+)?)` --> "44" and `((?:\w\s*)+)` --> `"5844 44554 Hi hi! , sahhashash; asakjas. jjksakjaskjas."`, but I can't capture those 2 substrings with my capture groups, so I need help finding capture groups that do work. I tried combining them with the capture groups that you gave me, `r"\s*((\w+(\r|\n).*.)\n\w*) \s*\¿?(?:del |de |)\s*(\w+)\n(.*\n\w+)\s*\??"`, but it did not work for me – Matt095 Dec 14 '22 at 14:03
So you want the first group to capture alphanumeric characters and stop at the first occurrence of a comma, dot or space, and the second group to capture from the beginning any alphanumeric characters together with commas, dots and spaces, BUT stop at the end of the line? Is that what you want? – D_action Dec 16 '22 at 13:41

score 0 · Answer 2 · answered Dec 13 '22 at 13:42


Create a string containing line breaks

Newline code \n (LF), \r\n (CR + LF)

Triple quote ''' or """

With indent

Concatenate a list of strings on new lines

Split a string into a list by line breaks: splitlines()

Remove or replace line breaks

Output with print() without a trailing newline
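Applied to the string from the question, the `splitlines()` item in the list above could look like the sketch below. It uses no regex at all and assumes the wanted pieces always sit on the first and second lines; the names `word`, `association` and `rest` are illustrative.

```python
x = """44
5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk"""

# splitlines() breaks the string at every line break and drops the breaks.
lines = x.splitlines()

word = lines[0]               # first line only
association = lines[1]        # second line, stopping at its line break
rest = "\n".join(lines[1:])   # everything after the first line, breaks kept

print(repr(word))
print(repr(association))
print(repr(rest))
```

This does not handle the optional "del"/"de" prefix from the question's pattern, so it only covers the simplest form of the input.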


Anmol Madhav-12215512


The problem is that there are 2 capture groups, one that does not accept spaces and one that does, but both must be limited by line breaks. The capture groups that I wrote do not start in the correct place (as seen in the example) but after a line break, and they also do not stop their capture in the proper place – Matt095 Dec 13 '22 at 13:52
