I'm trying to scrape a table in Python using BeautifulSoup.

I've managed to scrape all the other data, but when I use .text on the fixture cell, it returns both the cell's own text and the text inside its nested span tag. I don't want the span text in that field.

I've tried .string and .text.text, but neither works.

Can anyone spot the problem here?

Here is my code:

from bs4 import BeautifulSoup
import urllib2
import MySQLdb

soup = BeautifulSoup(urllib2.urlopen('http://www.livefootballontv.com/').read())

# Open the connection once rather than once per row.
db = MySQLdb.connect("127.0.0.1", "root", "", "footballapp")
cursor = db.cursor()
sql = "INSERT INTO TVGuide(DATE, FIXTURE, COMPETITION, KICKOFF, CHANNELS) VALUES (%s,%s,%s,%s,%s)"

for row in soup('div', {'id': 'tv-guide'})[0]('ul'):
    tds = row('li')
    # tds[1].text is the problem: it also includes the text of the nested <span>.
    print tds[0].string, tds[1].text, tds[1].span.string, tds[2].string, tds[3].img['alt'], '\n'
    results = (str(tds[0].string), str(tds[1].text), str(tds[1].span.string), str(tds[2].string), str(tds[3].img['alt']))
    cursor.execute(sql, results)

# Commit all inserts, then close the connection.
db.commit()
db.close()

This prints:

Sunday 22 June 2014 USA vs PortugalBrasil World Cup 2014 Group G Brasil World Cup 2014 Group G 11:00pm BBC1

Tuesday 24 June 2014 Costa Rica vs EnglandBrasil World Cup 2014 Group D Brasil World Cup 2014 Group D 5:00pm ITV

  • possible duplicate of [Only extracting text from this element, not its children](http://stackoverflow.com/questions/4995116/only-extracting-text-from-this-element-not-its-children) – Steinar Lima Feb 12 '14 at 01:09

1 Answer

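Use .contents, which lists the tag's direct children, and access the entry you want. Here the fixture text is the first child of the li, ahead of the span.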

Example:

from bs4 import BeautifulSoup
import urllib2

soup = BeautifulSoup(urllib2.urlopen('http://www.livefootballontv.com/').read())

for row in soup('div', {'id': 'tv-guide'})[0]('ul'):
    tds = row('li')
    print tds[1].contents[0]

Output:

SV Hamburg vs Bayern Munich
Arsenal vs Manchester United
Napoli vs Roma
...
USA vs Portugal
Costa Rica vs England
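
To see why .text and .contents[0] differ without hitting the live site, here is a minimal sketch in Python 3 with bs4. The HTML snippet is an assumption reconstructed from the output in the question, not the page's actual markup, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's output; the real page may differ.
html = """
<div id="tv-guide">
  <ul>
    <li>Sunday 22 June 2014</li>
    <li>USA vs Portugal<span>Brasil World Cup 2014 Group G</span></li>
    <li>11:00pm</li>
    <li><img alt="BBC1"></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
fixture_cell = soup.find("div", id="tv-guide").find("ul").find_all("li")[1]

all_text = fixture_cell.text          # text of the cell plus all of its descendants
own_text = fixture_cell.contents[0]   # first direct child: the bare text node
span_text = fixture_cell.span.string  # text of the nested <span> only
mixed_string = fixture_cell.string    # None, because the cell has more than one child

print(repr(all_text))
print(repr(own_text))
print(repr(span_text))
print(repr(mixed_string))
```

This also suggests why .string did not help in the question: it only returns a value when a tag has a single child, and the fixture cell has two (a text node and a span).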
  • I found a [duplicate question](https://stackoverflow.com/questions/4995116/only-extracting-text-from-this-element-not-its-children) btw. You can also use `find(text=True, recursive=False)` – Steinar Lima Feb 12 '14 at 01:09
  • The first one worked perfectly, thank you very much :) Top geezer – Thomas Feb 12 '14 at 01:21
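
The find(text=True, recursive=False) alternative from the comments can be sketched the same way. This is Python 3 with bs4, using string=True, which is the name Beautiful Soup 4.4.0 and later use for the older text argument. The markup is again hypothetical, and the trailing " (live)" text is invented purely to show a cell with text on both sides of the span.

```python
from bs4 import BeautifulSoup

# Hypothetical cell; the trailing " (live)" is invented for illustration.
html = "<li>USA vs Portugal<span>Brasil World Cup 2014 Group G</span> (live)</li>"
cell = BeautifulSoup(html, "html.parser").li

# First text node that is a direct child of the cell (children are not searched).
first_text = cell.find(string=True, recursive=False)

# All direct text nodes, useful when text appears on both sides of the span.
direct_texts = cell.find_all(string=True, recursive=False)
own_text = "".join(direct_texts).strip()

print(repr(first_text))
print(repr(own_text))
```

Compared with contents[0], this does not depend on the text node being the first child, so it may be a safer choice if the markup varies between rows.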