How can I get letters from an expression in Python

Question

I have this expression:

<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>

I need to get the 10 characters that follow "/dp/" (B01J5FGW66).

How can I write a function that does this?

FlyingTeller · Accepted Answer · 2018-09-07T16:23:38.037

2

Using regex:

import re
s = '<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>'
print(re.search(r"dp\/([A-Za-z0-9]{10})\/", s)[1])

Output: B01J5FGW66

Explanation:

begin at "dp/":

dp\/

a capture group, delimited by (), matching exactly 10 characters (through {10}), each a lowercase letter (a-z), an uppercase letter (A-Z) or a digit (0-9):

([A-Za-z0-9]{10})

end at "/":

\/

Using re.search, we can search for that expression in your string s and access the result of the 1st capture group with [1].

Note that you might want to add extra code in case no match is found:

m = re.search(r"dp\/([A-Za-z0-9]{10})\/", s)
if m is not None:
    print(m[1])
else:
    # if nothing is found, re.search returns None
    print("No match")
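
Since the question asks for a function, the check above can be wrapped up as a minimal sketch. The names `ID_PATTERN` and `extract_id_after_dp` are made up for illustration and are not part of the original answer. The pattern is the same one as above, written without the backslashes, since "/" does not need escaping in a Python regex.

```python
import re

# Same pattern as in the answer; "/" needs no escaping in Python.
ID_PATTERN = re.compile(r"dp/([A-Za-z0-9]{10})/")


def extract_id_after_dp(text):
    """Return the 10-character ID that follows "dp/", or None if absent."""
    m = ID_PATTERN.search(text)
    if m is None:
        # re.search returns None when the pattern is not found
        return None
    return m.group(1)
```

Calling `extract_id_after_dp(s)` on the string from the question should return the ID, and a string without a "dp/" segment should return None instead of raising an error.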

edited Sep 07 '18 at 16:23

answered Sep 07 '18 at 15:03


Thanks so much, I understand everything, but I have one last error: **return _compile(pattern, flags).search(string) TypeError: expected string or bytes-like object** I tried with str() but it does not work – M. T. Sep 07 '18 at 15:15
What is the type of the variable you are giving to `search` as second argument – FlyingTeller Sep 07 '18 at 16:14
the type is **** how can I resolve this? – M. T. Sep 07 '18 at 17:34
Try adding `["href"]` – FlyingTeller Sep 07 '18 at 17:45
What do you mean by "add"? I have a variable that contains the scrape of the page. How can I do the "adding"? – M. T. Sep 07 '18 at 17:49
if your variable was named `s`, try `re.search(r"dp\/([A-Za-z0-9]{10})\/", s["href"])` – FlyingTeller Sep 07 '18 at 17:51
it says `TypeError: 'NoneType' object is not subscriptable` – M. T. Sep 07 '18 at 17:55
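
The thread never shows the type of the scraped variable, so the following is only a sketch under one assumption: the page is available as plain HTML text. It uses the standard library's `html.parser` to collect the `href` of every `<a>` tag and then applies the regex from the answer to each one. `HrefCollector` and `ids_from_html` are hypothetical names chosen for this example.

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def ids_from_html(html_text):
    """Return every 10-character ID found after "dp/" in the links."""
    collector = HrefCollector()
    collector.feed(html_text)
    ids = []
    for href in collector.hrefs:
        m = re.search(r"dp/([A-Za-z0-9]{10})/", href)
        if m is not None:  # skip links without a matching segment
            ids.append(m.group(1))
    return ids
```

Because links without an `href` or without a "dp/" segment are skipped, this returns an empty list when nothing matches rather than failing on None.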

score 0 · Answer 2 · answered Sep 07 '18 at 15:08

I assume you always want whatever sits between the slashes right after dp (the next path segment), and that the length of 10 characters is not essential. A little clunky, but this works:

>>> x = '<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>'
>>> splits = x.split("/")
>>> dp_index = splits.index('dp')
>>> result = splits[dp_index+1] # Get the next one over
>>> result
'B01J5FGW66'

To put it in a function, you can do it like this:

def get_route_next_to_dp(html_str):
    splits = html_str.split("/")
    dp_index = splits.index('dp')
    result = splits[dp_index+1] # Get the next one over
    return result

Usage might look like:

html_str = '<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>'
route_next_to_dp = get_route_next_to_dp(html_str)
print(route_next_to_dp)

outputs

B01J5FGW66

as desired.
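
One caveat with the split approach: `list.index` raises ValueError when there is no 'dp' segment. A hedged variant that returns None in that case might look like the sketch below. The name `get_route_next_to_dp_or_none` is made up for this example.

```python
def get_route_next_to_dp_or_none(html_str):
    """Return the segment after 'dp', or None if there is no such segment."""
    splits = html_str.split("/")
    try:
        dp_index = splits.index("dp")
    except ValueError:
        # 'dp' does not appear as a standalone path segment
        return None
    if dp_index + 1 >= len(splits):
        # 'dp' is the last segment, so nothing follows it
        return None
    return splits[dp_index + 1]
```

The caller can then test the result against None before using it, instead of wrapping every call in its own try/except.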

score 0 · Answer 3 · answered Sep 07 '18 at 15:18

Try this: it uses a regular expression to match the 10 characters after "dp/" and prints them only if a match is found.

import re
my_string='<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>'
m = re.search(r"dp\/([A-Za-z0-9]{10})\/", my_string)
if m:  # re.search returns None when nothing matches, so test m itself
    print(m.group(1))
