
I have an issue with this: I am not sure how to go about showing a single image. For example:

<img srcset="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s180/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 180w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s390/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 390w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s458/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 458w" src="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s615/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg">

As you can see above, the tag lists several alternative images, but I am trying to scrape just one of them to display.

import bs4 as bs
import urllib.request
import datetime
import random 
import re


random.seed(datetime.datetime.now())

sauce = urllib.request.urlopen('http://www.manchestereveningnews.co.uk/news/greater-manchester-news').read()
soup = bs.BeautifulSoup(sauce, 'lxml')

# 




title = soup.title
link = soup.link
image = re.search(img 'srcset=img(.*?),)  
 # this doesn't work, not sure how to

strong = soup.strong
description = soup.description
location = soup.location


title = soup.find('h1', class_ ='publication-font', )   

image = soup.find('img')
strong = soup.find('strong')
location = soup.find('em').find('a')
description = soup.find('div', class_='description',to.text)


#Previous Code
print("H1:", title.text)
print("Article Link:", link)
print("Image Url:\n", image)
print("1st Paragraph:\n", strong.text)
print("2nd Paragraph:\n", description.string)
print("Location:\n", location.text)

My code is above. On my previous attempt, the output was:

Greater Manchester News
<link href="rss.xml" rel="alternate" title="Default home feed" type="application/rss+xml"/>

<img data-src="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s615/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg" data-srcset="http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s180/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 180w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s390/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 390w, http://i4.manchestereveningnews.co.uk/incoming/article13390833.ece/ALTERNATES/s458/Mike-Grimshaw-34-was-fatally-attacked-following-the-attack-outside-his-Trafford-home-last-Thursday.jpg 458w"/>
        Family of dad stabbed in the neck while defending his fiancée from thugs speak of their heartbreak
        Mike Grimshaw, 34, died after being stabbed in the neck outside his home in Trafford last Thursday

Trafford

The output shows multiple image URLs, but I only want to show a single image link. How do I go about doing this?

Any ideas would be much appreciated.

UmarZaii
Amir Shaw

1 Answer


You can read the data-src or data-srcset attribute to get the image you want:

image = soup.find('img')
single_img = image.get('data-src')  # returns the main image URL

or

import re
image = soup.find('img')
img_string = image.get('data-srcset')  # returns a single string that still has to be parsed
img_set = re.findall(r'(https?://[^\s]+)', img_string)  # regex that matches only the URLs

You can then pick whichever index you want from img_set; check the length of the list first so the index is in range.
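If you would rather choose by size than by position, one option is to keep the width descriptor (180w, 390w, ...) next to each URL and take the largest. The following is only a sketch using the standard library: the helper names parse_srcset and pick_widest and the example.com URLs are made up for illustration, and it assumes the URLs themselves contain no commas, which holds for the srcset in the question.

```python
def parse_srcset(srcset):
    """Split a srcset string into (url, width) pairs.

    Assumes candidates are comma-separated and the URLs contain no commas.
    The width is None when a candidate has no "<number>w" descriptor.
    """
    candidates = []
    for entry in srcset.split(","):
        parts = entry.split()
        if not parts:
            continue  # skip empty pieces, e.g. from a trailing comma
        url = parts[0]
        width = None
        if len(parts) > 1 and parts[1].endswith("w") and parts[1][:-1].isdigit():
            width = int(parts[1][:-1])
        candidates.append((url, width))
    return candidates


def pick_widest(srcset):
    """Return the URL with the largest width descriptor, or None if empty."""
    candidates = parse_srcset(srcset)
    if not candidates:
        return None
    # Candidates without a width are treated as width 0.
    return max(candidates, key=lambda c: c[1] or 0)[0]


# Hypothetical, shortened stand-in for the data-srcset value in the question.
example_srcset = (
    "http://example.com/s180/photo.jpg 180w, "
    "http://example.com/s390/photo.jpg 390w, "
    "http://example.com/s458/photo.jpg 458w"
)
print(pick_widest(example_srcset))
```

With BeautifulSoup you would pass in the attribute value, for example pick_widest(image.get('data-srcset', '')), so that a missing attribute gives None instead of raising an error.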

PRMoureu