
I am trying to scrape a website with Python 2.7. The page contains a table that has to load first; when I scrape it, I only get "Loading" or "Sorry, we don't have any information about it", because the table is not there yet.

I read some articles and code samples, but nothing worked.

My code:

import urllib2
from BeautifulSoup import BeautifulSoup

site = "https://www.flightradar24.com/data/airports/bud/arrivals"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site, headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

# the airport name is part of the static HTML, so this works
nev = soup.find('h1', attrs={'class': 'airport-name'})
print nev

# the schedule table is filled in by JavaScript after the page loads,
# so this only finds the "Loading" placeholder
table = soup.find('div', {"class": "row cnt-schedule-table"})
print table

import urllib2
from bs4 import BeautifulSoup
import json

# new url
url = 'https://www.flightradar24.com/data/airports/bud/arrivals'

# read all data (this is an HTML page, not JSON)
page = urllib2.urlopen(url).read()

# this raises ValueError, because the response is HTML, not JSON text
data = json.loads(page)

print(data['row cnt-schedule-table'])
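For what it's worth, the second snippet fails before the final print: the arrivals URL returns an HTML document, and `json.loads` only accepts JSON text, so it raises a `ValueError`. A minimal standard-library illustration (no network needed):

```python
import json

# json.loads() only accepts JSON text; feeding it an HTML document
# (like the arrivals page) raises ValueError immediately
html = "<html><body>Loading...</body></html>"

try:
    json.loads(html)
    is_json = True
except ValueError:
    is_json = False

print(is_json)  # -> False
```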
tardos93
  • That data is usually loaded by ajax, and sometimes from JavaScript vars. You need to find the source and get the information from it. – VMRuiz Jul 25 '17 at 07:07
  • Use tools like Fiddler or Charles Proxy. In this instance, this is your ajax API call: https://api.flightradar24.com/common/v1/airport.json?code=bud&plugin[]=&plugin-setting[schedule][mode]=arrivals&plugin-setting[schedule][timestamp]=1500966512&page=2&limit=50&token= – Aki003 Jul 25 '17 at 07:11
  • This link is not a good idea for me, because some information is lost that way. – tardos93 Jul 25 '17 at 07:17
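Following Aki003's comment, the same data the table shows can be requested directly from the JSON endpoint. A sketch of building such a request URL (the parameter names are copied from the comment above; the timestamp and token values are placeholders, and the response layout is undocumented, so treat the details as assumptions):

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

# parameters taken from the ajax call quoted in the comment above;
# the timestamp and token are placeholders, not live values
params = [
    ('code', 'bud'),
    ('plugin[]', ''),
    ('plugin-setting[schedule][mode]', 'arrivals'),
    ('plugin-setting[schedule][timestamp]', '1500966512'),
    ('page', '1'),
    ('limit', '50'),
    ('token', ''),
]

url = ('https://api.flightradar24.com/common/v1/airport.json?'
       + urlencode(params))
print(url)
```

The response of that endpoint could then be passed to `json.loads` directly, instead of scraping the rendered HTML.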

1 Answer


I am also facing this issue. You can use the Python selenium package. We need to wait for the table to load, so I used time.sleep(), but this is not the correct way; you can use the wait.until(element) method instead. Please find below sample code:

from bs4 import BeautifulSoup
from selenium import webdriver
import time

# force an English locale so the page text is predictable
profile = webdriver.FirefoxProfile()
profile.set_preference("intl.accept_languages", "en-us")
driver = webdriver.Firefox(firefox_profile=profile)

driver.get("https://www.flightradar24.com/data/airports/bud/arrivals")

# crude wait for the JavaScript-rendered table; an explicit wait is better
time.sleep(10)

html_source = driver.page_source
soup = BeautifulSoup(html_source, "html.parser")
print soup

Reference link:

Selenium waitForElement
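A sketch of the explicit wait the answer recommends, using selenium's WebDriverWait and expected_conditions instead of a fixed time.sleep. The CSS class is taken from the question's code and may have changed on the live site, and running this needs a local Firefox/geckodriver, so treat it as a sketch:

```python
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("https://www.flightradar24.com/data/airports/bud/arrivals")

# wait up to 30 seconds for the schedule table to appear in the DOM,
# continuing as soon as it does instead of always sleeping 10 seconds
wait = WebDriverWait(driver, 30)
wait.until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, "div.cnt-schedule-table")))

soup = BeautifulSoup(driver.page_source, "html.parser")
print(soup.find("div", {"class": "row cnt-schedule-table"}))
driver.quit()
```

Unlike a fixed sleep, wait.until() raises a TimeoutException if the element never appears, so failures are explicit rather than silently scraping a half-loaded page.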

Nandha Kumar
  • Is it risky if I use time.sleep? Is time.sleep(10) enough, or does it depend on the hardware and the internet connection? – tardos93 Jul 25 '17 at 07:18
  • Yes, I know; that's why I mentioned this is not the correct way. You can use the selenium API's wait.until() instead; this method waits until the table content (table element) is populated. – Nandha Kumar Jul 25 '17 at 07:23
  • Hm. I tried to insert this time.sleep under the "page = urllib2..." line, but I got this error message: TypeError: unbound method implicitly_wait() must be called with WebDriver instance as first argument (got int instance instead). This is the code: webdriver.Firefox.implicitly_wait(30) – tardos93 Jul 25 '17 at 07:27
  • time.sleep is not suitable for urllib. You need to use the selenium package. – Nandha Kumar Jul 25 '17 at 07:30
  • Uh, can you update your answer with my code to get a readable version? Then I can accept it. – tardos93 Jul 25 '17 at 07:33
  • @karnaf a) as far as I know there is no conflict between `time.sleep` and `urllib`, b) use `#` for comments in Python (`//` is floor division). – t.m.adam Jul 25 '17 at 07:47