I am trying to develop a Python script that extracts prices by web scraping these sites:

https://p2p.binance.com/es/trade/all-payments/USDT?fiat=ARS

https://www.kucoin.com/es/otc/buy/USDT-ARS

As a guide I used a simple YouTube tutorial where they use the "Beautiful Soup" library to scrape data.

They use code like this to extract the data:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://resultados.as.com/resultados/futbol/primera/2021_2022/'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

# Teams: collect the first 20 team names
count = 0
eq = soup.find_all('a', class_='nombre-equipo')
equipos = list()
for i in eq:
    if count < 20:
        equipos.append(i.text)
    else:
        break
    count += 1

In this example, you simply set the web page the data will be scraped from (url) and the location of the data within it (soup.find_all(...)).

I did that with both URLs from Binance and KuCoin, and inspected the pages to obtain the correct data classes:

Following the example, the line eq=soup.find_all('a', class_='nombre-equipo') should become:

eq=soup.find_all('div', class_='css-1m1f8hn')

But I can't get it to scrape any data. Do you have any idea what is wrong, or maybe some other web scraper to use?
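
A quick way to see where it fails is to check whether the class I found by inspecting the page ever appears in the raw HTML that requests receives (a minimal check; the class name may have changed since I inspected it):

import requests

url = 'https://p2p.binance.com/es/trade/all-payments/USDT?fiat=ARS'
page = requests.get(url)

# if this prints False, the offers are not in the downloaded HTML at all -
# they are added later by JavaScript, which requests/BeautifulSoup cannot run
print('css-1m1f8hn' in page.text)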

  • Can you tell us where exactly you are stuck? – Himanshu Poddar Aug 16 '22 at 14:09
  • [Scrapy](https://docs.scrapy.org/en/latest/intro/overview.html) is a framework for crawling web sites and data mining – Vladislav Povorozniuc Aug 16 '22 at 14:20
  • That is the example that works. It doesn't work if you set the following parameters: – Alan Aug 16 '22 at 14:24

    from bs4 import BeautifulSoup
    import requests
    import pandas as pd
    url = 'https://p2p.binance.com/es/express/buy/USDT/ARS'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    # Teams
    count = 0
    eq = soup.find_all('text', class_='css-1c1ahuy')
    print(eq)
    equipos = list()
    for i in eq:
        if count < 10:
            equipos.append(i.text)
            print(equipos)
        else:
            break
        count += 1
    print(equipos)
  • @Alan it does not work for P2P because the two pages have different structures – Himanshu Poddar Aug 16 '22 at 14:33
  • @Himanshu any idea which library could work with that structure? – Alan Aug 16 '22 at 16:06
  • You'll have to code accordingly by studying the DOM of that website. – Himanshu Poddar Aug 16 '22 at 16:21
  • Different pages may have different structures and they need different code to get the data - there is NO extension to do it. You have to manually analyze the HTML on the page and create code exactly for that page. – furas Aug 16 '22 at 17:04
  • You may have the most common problem: the page may use `JavaScript` to add/update elements, but `BeautifulSoup`/`lxml` and `requests`/`urllib` can't run `JS`. You may need [Selenium](https://selenium-python.readthedocs.io/) to control a real web browser which can run `JS`. OR use (manually) `DevTools` in `Firefox`/`Chrome` (tab `Network`) to see if `JavaScript` reads data from some URL, and try to use this URL with `requests` (there is a sketch of this right after these comments). `JS` usually gets `JSON`, which can be easily converted to a Python dictionary (without `BS`). You can also check if the page has a (free) `API` for programmers. – furas Aug 16 '22 at 17:06
  • Thank you @furas! I think it might be that. I will try with Selenium. I didn't find anything in the API information regarding P2P. – Alan Aug 16 '22 at 18:55
  • I tried with Selenium but got the same results... any other idea? – Alan Aug 16 '22 at 20:36
  • If you tried with Selenium then you could show the code in the question. Sometimes JavaScript may need time to add all elements and you may need `sleep()` or a special function to wait for the data. But at least with Selenium you have an open browser and you can see if it loads the expected page. And you may check if it has the same classes as before - because sometimes pages use random names for classes to block bots/scripts. – furas Aug 16 '22 at 22:19
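
Following up on the `DevTools` suggestion above, here is a minimal sketch of that approach. The endpoint URL and the payload fields are assumptions based on what the Binance P2P page requested in the `Network` tab at the time of writing - it is an internal, undocumented endpoint, so verify it yourself before relying on it:

import requests

# internal, undocumented endpoint observed in the DevTools Network tab -
# an assumption that may change at any time
url = 'https://p2p.binance.com/bapi/c2c/v2/friendly/c2c/adv/search'

payload = {
    'asset': 'USDT',
    'fiat': 'ARS',
    'tradeType': 'BUY',   # offers selling USDT to you
    'page': 1,
    'rows': 10,
    'payTypes': [],
}

response = requests.post(url, json=payload)
data = response.json()

# each entry describes one advert; the price comes as a string
for item in data.get('data', []):
    print(item['adv']['price'])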

1 Answer

This page uses JavaScript to add items to the page - so you may need to use Selenium to control a real web browser which can run JavaScript.

But this brings other problems.

JavaScript sometimes needs time to add elements to the page, so you need sleep() or the special Waits to wait for the elements, as in the sketch below.
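
For example, a minimal sketch of an explicit wait, assuming the driver created in the full code below (the class name is the one from the question and may have changed):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# wait up to 15 seconds until at least one offer row is present,
# instead of sleeping for a fixed amount of time
wait = WebDriverWait(driver, 15)
wait.until(EC.presence_of_element_located((By.XPATH, '//div[@class="css-1m1f8hn"]')))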

The page asks about cookies and you have to accept them. It also displays popups with information which you have to close to get access to the elements behind them.

The page may have other problems - e.g. it may display the popups in a different order and you may need to click them in a different order.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
#from selenium.webdriver.common.keys import Keys
#from selenium.webdriver.support.ui import WebDriverWait
#from selenium.webdriver.support import expected_conditions as EC
#from selenium.common.exceptions import NoSuchElementException, TimeoutException

#from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.firefox import GeckoDriverManager

import time

url = 'https://p2p.binance.com/es/trade/all-payments/USDT?fiat=ARS'

#driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver = webdriver.Firefox(service=Service(GeckoDriverManager().install()))

print('> loading page')
driver.get(url)
time.sleep(10)

print('> accept cookies')
driver.find_element(By.XPATH, '//button[@id="onetrust-accept-btn-handler"]').click()
time.sleep(3)

print('> close popup with tutorials')
driver.find_element(By.XPATH, '//*[local-name() = "svg"][@class="css-1pcqseb"]').click()
time.sleep(3)

print('> close other popup')
driver.find_element(By.XPATH, '//button[@class=" css-fkqim7"]').click()
time.sleep(3)

print('> get data')
all_items = driver.find_elements(By.XPATH, '//div[@class="css-1m1f8hn"]')   
#all_items = driver.find_elements(By.XPATH, '//div[@class="css-1kj0ifu"]')
print('len(all_items):', len(all_items))

for item in all_items:
    print(item.text)
    #print(item.text.replace('\n', ' '))

BTW:

At the bottom of the page (in the footer) you can see the column Support with a link to APIs, which may show how to get some data without scraping - though for private endpoints this requires registering your own application to get a unique API key. But first you have to check if the API gives access to the data which you need (sometimes APIs give access only to selected elements, or you have to pay for access to more useful information).
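
As a quick check of what the documented API exposes, here is a minimal sketch using one of its public spot-market endpoints - it needs no API key, but note it covers spot markets, not the P2P offers from the question (the symbol is just an illustration):

import requests

# documented, public endpoint of the Binance spot REST API
url = 'https://api.binance.com/api/v3/ticker/price'

response = requests.get(url, params={'symbol': 'BTCUSDT'})
print(response.json())   # {'symbol': 'BTCUSDT', 'price': '...'}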

furas