
I use Selenium in Google Colab to download seller data. For the past few days I have been unable to download the page content.

In my own browser, the seller's details are visible in incognito mode. In a normal window, I have to log in to see the data.

How can I deal with this?

My code:

# install chromium, its driver, and selenium
!apt update
!apt install chromium-chromedriver
!pip install selenium
!pip install dnspython
!pip install pipedrive-python-lib
# imports used by the script
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import pandas as pd
import pymongo
from pymongo import MongoClient
from datetime import date
from pipedrive.client import Client

# set options to be headless
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--incognito')

url = 'https://allegro.pl/oferta/dr-coffee-f12-big-plus-ekspres-do-kawy-10196811305#aboutSeller'

wd = webdriver.Chrome(options=options)
wd.get(url)
wd.maximize_window()

soup = BeautifulSoup(wd.page_source, 'html.parser')

The content of the soup (a captcha challenge page instead of the offer page):

<html><head><title>allegro.pl</title><style>#cmsg{animation: A 1.5s;}@keyframes A{0% 
{opacity:0;}99%{opacity:0;}100%{opacity:1;}}</style><meta content="width=device-width, 
initial-scale=1.0" name="viewport"/></head><body style="margin:0"><script>var dd= 
{'cid':'AHrlqAAAAAMAPz5ltF-0LmMAI-N- 
0w==','hsh':'77DC0FFBAA0B77570F6B414F8E5BDB','t':'bv','s':29560,'host':'geo.captcha- 
delivery.com'}</script><script src="https://ct.captcha-delivery.com/c.js"></script> 
<script>if("string"==typeof navigator.userAgent&&navigator.userAgent.indexOf("Firefox")>-1) 
{var isIframeLoaded=!1,maxTimeoutMs=5e3;function iframeOnload(e){isIframeLoaded=!0;var 
a=document.getElementById("noiframe");a&&a.parentNode.removeChild(a)}var initialTime=(new 
Date).getTime();setTimeout(function(){isIframeLoaded||(new Date).getTime()- 
initialTime>maxTimeoutMs&&(document.body.innerHTML='<div id="noiframe">Please enable JS and 
disable any ad blocker</div>'+document.body.innerHTML)},maxTimeoutMs)}else function 
iframeOnload(){}</script><iframe border="0" frameborder="0" height="100%" 
onload="iframeOnload()" scrolling="yes" src="https://geo.captcha-delivery.com/captcha/? 
initialCid=AHrlqAAAAAMAPz5ltF-0LmMAI-N- 
0w%3D%3D&amp;hash=77DC0FFBAA0B77570F6B414F8E5BDB&amp;cid=T-jqitz6Xj5IAh.rlEft_uW8shQiyEx- 
q0h3fbxjp7ibFQdeKxAG4O8mJHUbhP_2L.dCWU9ZNi0VhRHr-_84zpxnvkfMwS- 
X8HKiPK8cue&amp;t=bv&amp;referer=https%3A%2F%2Fallegro.pl%2Foferta%2Fdr-coffee-f12-big-plus- 
ekspres-do-kawy-10196811305&amp;s=29560" style="height:100vh;" width="100%"></iframe>
</body></html>
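
The output above loads a script and an iframe from captcha-delivery.com, so the driver received a captcha challenge rather than the offer page. Below is a minimal sketch for detecting that case before parsing. It assumes the challenge can be recognised by that host name appearing in a script or iframe src, which is taken only from the output above; the names CHALLENGE_HOST and is_challenge_page are hypothetical helpers, not part of Selenium or BeautifulSoup.

```python
from html.parser import HTMLParser

# Assumption: this host name is the marker seen in the returned page above.
CHALLENGE_HOST = "captcha-delivery.com"


class _SrcCollector(HTMLParser):
    """Collects the src attribute of every <script> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "iframe"):
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def is_challenge_page(html):
    """Return True if any script/iframe src points at the challenge host."""
    collector = _SrcCollector()
    collector.feed(html)
    return any(CHALLENGE_HOST in src for src in collector.sources)
```

With the code from the question, this could be called as is_challenge_page(wd.page_source) right after wd.get(url), so the script can stop or retry instead of parsing a page that does not contain the seller data. It only detects the block; it does not get around it.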
dominik
  • You need to either solve the captcha or try harder not to look like a bot. There are lots of answers here for doing it either way. – pguardiario Mar 21 '21 at 01:09
  • @pguardiario I have used fake-user with the same result. How can I better imitate the user? – dominik Apr 10 '21 at 12:49
