
I am trying to parse Twitter search results. The desired outputs are the URL of the tweet, the date of the tweet, the sender, and the tweet text itself. There are no errors, but the result is empty, and I could not find the problem. The code is below. If you could help me out it would be great, since I will be using this data in my thesis.

from bs4 import BeautifulSoup
import urllib.request
import openpyxl
wb= openpyxl.load_workbook('dene1.xlsx')
sheet=wb.get_sheet_by_name('Sayfa1')
headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
url = 'https://twitter.com/search?q=TURKCELL%20lang%3Atr%20since%3A2012-01-01%20until%3A2012-01-09&src=typd&lang=tr'
req = urllib.request.Request(url, headers = headers)
resp = urllib.request.urlopen(req)
respData = resp.read()
soup = BeautifulSoup(respData , 'html.parser')
gdata = soup.find_all("div", {"class": "content"})
for item in gdata:
    try:
        items2 = item.find('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
        items21=items2.get('href')
        items22=items2.get('title')
    except:
        pass
    try:
        items1 = item.find('span', {'class': 'username js-action-profile-name'}).text
    except:
        pass
    try:
        items3 = item.find('p', {'class': 'TweetTextSize js-tweet-text tweet-text'}).text
        sheet1=sheet.append([items21, items22,items1,items3])
    except:
        pass
wb.save('dene1.xlsx')

regards

metiny

1 Answer


Every one of your try blocks raises an exception at least once, but you never see the errors because your blank excepts literally catch every exception. Check each `find` result for `None` instead:

import urllib.request
from bs4 import BeautifulSoup


headers = {
    'User-Agent': "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"}

url = 'https://twitter.com/search?q=TURKCELL%20lang%3Atr%20since%3A2012-01-01%20until%3A2012-01-09&src=typd&lang=tr'
req = urllib.request.Request(url, headers = headers)
resp = urllib.request.urlopen(req)
respData = resp.read()

soup = BeautifulSoup(respData, 'html.parser')
gdata = soup.find_all("div", {"class": "content"})
for item in gdata:
    items2 = item.find('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'}, href=True)
    if items2:
        items21 = items2.get('href')
        items22 = items2.get('title')
        print(items21)
        print(items22)
    items1 = item.find('span', {'class': 'username js-action-profile-name'})
    if items1:
        print(items1.text)
    items3 = item.find('p', {'class': 'TweetTextSize js-tweet-text tweet-text'})
    if items3:
        print(items3.text)

Now you can see lots of output.
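To see why the original version printed nothing, note that the bare excepts compound: if the first lookup fails, `items21` is never assigned, so the later block that uses it raises a `NameError`, which the next bare `except:` also swallows silently. A minimal sketch of that failure mode (using a plain dict in place of a BeautifulSoup tag, just for illustration):

```python
# Sketch of the bug: a failed lookup in the first try block leaves
# items21 undefined, and the bare except in the second block silently
# swallows the resulting NameError too -- so nothing is ever reported.
def scrape_row(item):
    rows = []
    try:
        items21 = item['href']       # KeyError when 'href' is missing
        rows.append(items21)
    except:                          # bare except: the KeyError vanishes
        pass
    try:
        rows.append(items21)         # NameError if the block above failed
    except:                          # ...also silently swallowed
        pass
    return rows

print(scrape_row({}))                # [] -- both errors hidden
print(scrape_row({'href': '/x'}))    # ['/x', '/x']
```

With the `if items2:`-style checks above, a missing element simply skips that field instead of hiding a cascade of errors.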

Padraic Cunningham