for imgsrc in Soup.findAll('img', {'class': 'sizedProdImage'}):
    if imgsrc:
        imgsrc = imgsrc
    else:
        imgsrc = "ERROR"

patImgSrc = re.compile('src="(.*)".*/>')
findPatImgSrc = re.findall(patImgSrc, imgsrc)

print findPatImgSrc

'''
<img height="72" name="proimg" id="image" class="sizedProdImage" src="http://imagelocation" />

This is the tag I am trying to extract the src from, and I am getting:

findimgsrcPat = re.findall(imgsrcPat, imgsrc)
File "C:\Python27\lib\re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer

'''

soulcheck
phales15

4 Answers


There is a simpler solution:

soup.find('img')['src']
StanleyD

You're passing a BeautifulSoup node to re.findall. You have to convert it to a string first. Try:

findPatImgSrc = re.findall(patImgSrc, str(imgsrc))
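To see why the str() call matters, here is a minimal Python 3 sketch that needs only the standard library. `FakeTag` is a hypothetical stand-in for a BeautifulSoup node (an object that is not a string but renders its HTML through `str()`); it is not part of BeautifulSoup. The pattern uses `[^"]*` rather than the question's greedy `(.*)`, an assumption made so the capture stops at the closing quote even if more attributes follow src.

```python
import re


class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup node: not a str, but str() yields its HTML."""

    def __init__(self, html):
        self._html = html

    def __str__(self):
        return self._html


# Capture everything between src=" and the next double quote.
SRC_PATTERN = re.compile(r'src="([^"]*)"')


def find_srcs(node):
    """Return the src values in the node's HTML, converting the node to str first."""
    return SRC_PATTERN.findall(str(node))


tag = FakeTag('<img height="72" name="proimg" id="image" '
              'class="sizedProdImage" src="http://imagelocation" />')

try:
    SRC_PATTERN.findall(tag)  # passing the node itself raises TypeError
except TypeError as exc:
    print("TypeError:", exc)

print(find_srcs(tag))  # ['http://imagelocation']
```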

Better yet, use the tools BeautifulSoup provides:

[x['src'] for x in soup.findAll('img', {'class': 'sizedProdImage'})]

This gives you a list of the src attributes of all img tags with class 'sizedProdImage'.
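If BeautifulSoup is not available, the same class-filtered extraction can be approximated with Python 3's standard-library `html.parser`. This is a swapped-in technique, not BeautifulSoup's API; `ImgSrcCollector` and `sized_prod_image_srcs` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> whose class attribute includes a given class name."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names and passes attributes as (name, value) pairs;
        # self-closing <img ... /> tags are routed here via handle_startendtag.
        if tag != "img":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        src = attr_map.get("src")
        if self.wanted_class in classes and src is not None:
            self.srcs.append(src)


def sized_prod_image_srcs(html):
    """Return the src of each img tag with class 'sizedProdImage', in document order."""
    parser = ImgSrcCollector("sizedProdImage")
    parser.feed(html)
    parser.close()
    return parser.srcs


page = ('<img height="72" name="proimg" id="image" class="sizedProdImage" '
        'src="http://imagelocation" />'
        '<img class="thumb" src="http://other" />')
print(sized_prod_image_srcs(page))  # ['http://imagelocation']
```

Splitting the class attribute on whitespace treats it as a list of class names, so an element with class="sizedProdImage large" still matches.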

soulcheck

In my example, htmlText contains the img tags, but the same approach works on HTML fetched from a URL. See my answer here

# BeautifulSoup 3 import; this is Python 2 code
from BeautifulSoup import BeautifulSoup as BSHTML
htmlText = """<img src="https://src1.com/" /> <img src="https://src2.com/" /> """
soup = BSHTML(htmlText)
images = soup.findAll('img')  # every <img> tag in the document
for image in images:
    print image['src']
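For the URL case, here is a minimal Python 3 sketch using only the standard library: fetch the page with `urllib.request.urlopen`, decode it with the charset declared in the response headers (falling back to UTF-8, which is an assumption), and collect every img src with `html.parser` instead of BeautifulSoup. `ImgSrcParser`, `img_srcs`, and `img_srcs_from_url` are hypothetical names.

```python
import urllib.request
from html.parser import HTMLParser


class ImgSrcParser(HTMLParser):
    """Record the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.srcs.append(src)


def img_srcs(html):
    """Return the src of every img tag in an HTML string."""
    parser = ImgSrcParser()
    parser.feed(html)
    parser.close()
    return parser.srcs


def img_srcs_from_url(url):
    """Fetch url and return its img srcs; assumes the response body is HTML."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return img_srcs(html)


html_text = '<img src="https://src1.com/" /> <img src="https://src2.com/" />'
print(img_srcs(html_text))  # ['https://src1.com/', 'https://src2.com/']
```

Calling `img_srcs_from_url("https://example.com/")` would apply the same parsing to a live page; it needs network access, so only the string version is run above.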
Abu Shoeb

You're creating an re object, then passing it into re.findall, which expects a string as its first argument:

patImgSrc = re.compile('src="(.*)".*/>')
findPatImgSrc = re.findall(patImgSrc, imgsrc)

Instead, use the .findall method of the patImgSrc object you just created:

patImgSrc = re.compile('src="(.*)".*/>')
findPatImgSrc = patImgSrc.findall(imgsrc)
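For reference, `re.findall` also accepts an already-compiled pattern as its first argument, so the two calls are equivalent; both raise the same TypeError when the second argument is not a string, which is consistent with the follow-up comment. A minimal Python 3 sketch; `NotAString` is a hypothetical object standing in for a BeautifulSoup node.

```python
import re

pattern = re.compile(r'src="([^"]*)"')
html = '<img class="sizedProdImage" src="http://imagelocation" />'

# re.findall accepts a compiled pattern as well as a pattern string.
via_module = re.findall(pattern, html)
via_method = pattern.findall(html)
print(via_module == via_method, via_method)  # True ['http://imagelocation']


class NotAString:
    """Hypothetical object standing in for a BeautifulSoup node."""


tag_like = NotAString()
for call in (lambda: re.findall(pattern, tag_like), lambda: pattern.findall(tag_like)):
    try:
        call()
    except TypeError as exc:
        # Both forms fail the same way: the subject must be a str or bytes-like object.
        print("TypeError:", exc)
```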
Kirk Strauser
Still getting the error:
Traceback (most recent call last):
  File "C:\Users\BuyzDirect\Desktop\OverStock_Listing_Format_Tool.py", line 50, in <module>
    findPatImgSrc = patImgSrc.findall(imgsrc)
TypeError: expected string or buffer
– phales15 Nov 27 '11 at 23:49