re.findall() returning a list of sets

Question

I made a basic regex to find a url:

([a-zA-Z0-9]+\.|)([a-zA-Z0-9\-])+\.[a-z]+[a-zA-Z0-9\?\/\=\-\_]*

The parts are:

([a-zA-Z0-9]+\.|) for a subdomain
([a-zA-Z0-9\-])+ for the hostname
\.[a-z]+ for the domain
[a-zA-Z0-9\?\/\=\-\_]* for the path

When I run this basic program

import re

text = "test.google.com test.google.com"
urls = re.findall(r"([a-zA-Z0-9]+\.|)([a-zA-Z0-9\-])+\.[a-z]+[a-zA-Z0-9\?\/\=\-\_]*", text)
print(urls)

I get this output: [('test.', 'e'), ('test.', 'e')]

I assume it has something to do with my regex, but what? Thanks!

Because re.findall returns only the captured text when the pattern contains capturing groups. – Avinash Raj, Jul 28 '19 at 10:46
You're capturing two groups, so you get a 2-tuple for each match, and then a list of those tuples because you're using findall. – 9769953, Jul 28 '19 at 10:47
You need to put that '+' inside the parentheses so that all the characters are captured (not only one). – Victor Ruiz, Jul 28 '19 at 10:49
See https://stackoverflow.com/questions/31915018/re-findall-behaves-weird – The fourth bird, Jul 28 '19 at 10:51

score -1 · Answer 1 · answered Jul 28 '19 at 10:48

The parentheses denote capture groups, and when a pattern contains capture groups, findall returns the groups rather than the whole match.
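
As a minimal sketch of that behaviour (the variable names are illustrative, not from the question), re.finditer exposes both the whole match and the groups, which shows what findall is picking up. A repeated group such as ([a-zA-Z0-9\-])+ keeps only its last repetition, which is where the lone 'e' comes from:

```python
import re

# The pattern from the question, written as a raw string.
pattern = r"([a-zA-Z0-9]+\.|)([a-zA-Z0-9\-])+\.[a-z]+[a-zA-Z0-9\?\/\=\-\_]*"
text = "test.google.com test.google.com"

# group(0) is the whole match; groups() holds the captured groups,
# which is what findall returns when the pattern has several groups.
whole_matches = [m.group(0) for m in re.finditer(pattern, text)]
group_tuples = [m.groups() for m in re.finditer(pattern, text)]

print(whole_matches)  # ['test.google.com', 'test.google.com']
print(group_tuples)   # [('test.', 'e'), ('test.', 'e')]
```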

Kevin Glasson

Avinash Raj · Answer 2 · 2019-07-28T10:58:25.090

-1

re.findall returns only the captured text when the pattern contains capturing groups. Removing the capturing groups, or turning them into non-capturing groups, makes it return the full text of each match.

(?:[a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+\.[a-z]+[a-zA-Z0-9\?\/\=\-\_]*

https://regex101.com/r/efXF9D/1/
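
A quick check of the non-capturing version against the sample text from the question could look like this (a sketch; url_pattern is just an illustrative name):

```python
import re

# Non-capturing version: (?:...) groups without capturing,
# so findall returns the full text of each match.
url_pattern = re.compile(
    r"(?:[a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+\.[a-z]+[a-zA-Z0-9\?\/\=\-\_]*"
)

text = "test.google.com test.google.com"
urls = url_pattern.findall(text)
print(urls)  # ['test.google.com', 'test.google.com']
```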

Or, if you want to capture each part separately, use a separate capturing group for each part:

(?:([a-zA-Z0-9]+)\.)?([a-zA-Z0-9\-]+)\.([a-z]+)([a-zA-Z0-9\?\/\=\-\_]*)

https://regex101.com/r/efXF9D/2/
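
With one group per part, findall returns a 4-tuple for each match. The sketch below runs this pattern on the sample text from the question; the names parts_pattern, subdomain, hostname, domain and path are only illustrative labels:

```python
import re

# One capturing group per part; the outer (?:...)? makes the
# subdomain optional without adding a group of its own.
parts_pattern = re.compile(
    r"(?:([a-zA-Z0-9]+)\.)?([a-zA-Z0-9\-]+)\.([a-z]+)([a-zA-Z0-9\?\/\=\-\_]*)"
)

text = "test.google.com test.google.com"
parts = parts_pattern.findall(text)

for subdomain, hostname, domain, path in parts:
    # path is an empty string here because the sample text has no path
    print(subdomain, hostname, domain, repr(path))
```

If the whole URL is needed as well as the parts, re.finditer with m.group(0) gives the full match alongside m.groups().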

edited Jul 28 '19 at 10:58

answered Jul 28 '19 at 10:49

Avinash Raj

Ah! Thank you for your help. I see where I went wrong :) – wtreston Jul 28 '19 at 10:50
