How to bypass Mod_Security while scraping

Question

I tried running this Python script using the BeautifulSoup and requests modules:

from bs4 import BeautifulSoup as bs
import requests

url = 'https://udemyfreecourses.org/'
headers = {'UserAgent' : 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'
}
soup = bs(requests.get(url, headers= headers).text, 'lxml')

But when I run this line:

print(soup.get_text())

It doesn't scrape the page text. Instead, it returns this output:

Not Acceptable!Not Acceptable!An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.

I even passed headers when requesting the webpage so that the request would look like it came from a normal browser, but I'm still getting this message, which prevents me from accessing the real webpage.

Note: The webpage works perfectly when opened directly in a browser, but it returns only this message when I try to scrape it.

Is there any way, other than the headers I already tried, to send a request that the website accepts and get past this Mod_Security protection?

Any help would be much appreciated. Thanks.

ModSecurity is a web application firewall that is configured through rules, and it is smart enough not to tell you which rule rejected your traffic. I guess in your case the website wants to tell you that it does not like being scraped. – Klaus D., Dec 27 '20 at 14:10

wuerfelfreak · Accepted Answer · 2020-12-27T15:47:16.800

EDIT: The dash in "User-Agent" is essential.

Following this answer: https://stackoverflow.com/a/61968635/8106583

headers = {
     'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:55.0) Gecko/20100101 Firefox/55.0',
}

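Your User-Agent header is the problem: the key in your script is spelled 'UserAgent', without the dash, so the browser string is never sent as the User-Agent. The header above works for me.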
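To see why the dash matters, here is a minimal sketch that inspects the headers requests would send, without contacting the server. The helper name effective_user_agent is made up for this illustration; Session.prepare_request is the regular requests call that merges your headers with the library defaults. With the misspelled key, the default python-requests User-Agent is still sent, which is presumably what the ModSecurity rule reacted to.

```python
import requests

URL = "https://udemyfreecourses.org/"
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:55.0) "
    "Gecko/20100101 Firefox/55.0"
)


def effective_user_agent(headers):
    """Return the User-Agent requests would send (hypothetical helper, no network I/O)."""
    session = requests.Session()
    # prepare_request merges the given headers with the session defaults,
    # exactly as requests.get() would do before sending.
    prepared = session.prepare_request(
        requests.Request("GET", URL, headers=headers)
    )
    return prepared.headers["User-Agent"]


# Misspelled key: sent as an unrelated custom header, so the library
# default ("python-requests/<version>") remains the User-Agent.
print(effective_user_agent({"UserAgent": BROWSER_UA}))

# Correct key: the browser string replaces the library default.
print(effective_user_agent({"User-Agent": BROWSER_UA}))
```

The first call prints the library default and the second prints the browser string, which matches the behavior described in the comments under this answer.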

Also: your IP might be blocked by now :D
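If you want to notice a rejection before handing the body to BeautifulSoup, a rough check along these lines may help. It is only a heuristic sketch: the function name looks_blocked is made up here, and it assumes the "Not Acceptable!" page arrives with HTTP status 406 and mentions Mod_Security in its body, as in the question. The stand-in objects below replace real responses so the example runs offline.

```python
from types import SimpleNamespace


def looks_blocked(response):
    """Heuristic (assumption): True if the response resembles the ModSecurity rejection."""
    # Works with any object exposing .status_code and .text,
    # such as a requests.Response.
    return response.status_code == 406 or "Mod_Security" in response.text


# Stand-in objects for demonstration only; no request is sent.
rejected = SimpleNamespace(
    status_code=406,
    text="Not Acceptable! This error was generated by Mod_Security.",
)
accepted = SimpleNamespace(status_code=200, text="<html><body>Courses</body></html>")

print(looks_blocked(rejected))
print(looks_blocked(accepted))
```

With a real request you would call looks_blocked(requests.get(url, headers=headers)) and only parse the page when it returns False.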

edited Dec 27 '20 at 15:47

answered Dec 27 '20 at 14:27

So, is it really about macOS vs. Linux, which seems unlikely? Or does User-Agent need the dash? – Tearo Dactyl Dec 27 '20 at 14:50
Probably the latter. I didn't catch that either; I just tried a different one. – wuerfelfreak Dec 27 '20 at 14:53
It's because of the missing dash. Try it! – Coco Dec 27 '20 at 15:16
Thank you all so much – Dec 27 '20 at 16:09
